29 Jan

It's very easy to get carried away by classic data visualizations and the stories behind them, but the truth is that the bedrock of analysis is data cleaning. No data cleaning, no analysis, no visualization. Yes, data cleaning is that important.


However, data cleaning should not be done just for the fun of it. It has to follow a method to be efficient. As Donato Diorio puts it, “Without a systematic way to start and keep data clean, bad data will happen.” And I dare say, bad data will most surely result in bad analysis.


In this project, I was presented with a large, dirty dataset. See the raw data file here. The process can actually be boring, but a systematic approach to data cleaning added a bit of fun to it. Embedded below is the dirty dataset.

In cleaning the dataset, I took the following steps:

  1. I placed the headers in the correct row to get a definite data structure.
  2. I removed empty columns and rows.
  3. I changed the values in each column to the appropriate data type, matching the column header.
  4. I removed duplicate values.
  5. I checked each column for outliers.
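
I did all of this in Microsoft Excel and Power Query, but for readers who prefer code, the same five steps can be sketched in plain Python. This is only an illustration and not the workflow used in this project: the sample rows, the column names, the assumed schema in `casts`, and the outlier rule (a modified z-score based on the median absolute deviation, with the commonly used 3.5 cutoff) are all assumptions made for the example.

```python
from statistics import median

# Hypothetical raw export: a title row above the real header, an empty
# column, an empty row, a duplicate row, and one suspicious amount.
raw = [
    ["Sales export", "", "", ""],
    ["order_id", "customer", "", "amount"],
    ["1", "Ada", "", "102.50"],
    ["2", "Bola", "", "98.00"],
    ["", "", "", ""],
    ["2", "Bola", "", "98.00"],
    ["3", "Chidi", "", "101.25"],
    ["4", "Dayo", "", "99.75"],
    ["5", "Efe", "", "9999.00"],
]


def clean(rows, header_index):
    # Step 1: promote the real header row and drop everything above it.
    header, body = rows[header_index], rows[header_index + 1:]

    # Step 2: drop columns with no header and no values, then empty rows.
    keep = [i for i, name in enumerate(header)
            if name.strip() or any(r[i].strip() for r in body)]
    header = [header[i] for i in keep]
    body = [[r[i].strip() for i in keep] for r in body]
    body = [r for r in body if any(r)]

    # Step 3: convert each column to the type its header implies.
    casts = {"order_id": int, "amount": float}  # assumed schema
    records = [
        {name: casts.get(name, str)(value) for name, value in zip(header, r)}
        for r in body
    ]

    # Step 4: remove exact duplicate rows, keeping the first occurrence.
    seen, unique = set(), []
    for rec in records:
        key = tuple(rec.items())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def flag_outliers(records, column, threshold=3.5):
    # Step 5: flag values whose modified z-score exceeds the threshold.
    values = [rec[column] for rec in records]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [rec for rec in records
            if 0.6745 * abs(rec[column] - med) / mad > threshold]


cleaned = clean(raw, header_index=1)
outliers = flag_outliers(cleaned, "amount")
```

Note that the sketch only flags outliers and does not delete them. An unusual value may be a genuine record, so it is safer to review flagged rows before deciding what to do with them.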


This systematic approach gave me a dataset that is 98% clean. See the clean dataset here. A spreadsheet of the cleaned data is also embedded below.

 


Apart from being ready to draw insights from, the cleaned sample embedded above is visibly easier on the eye than the "chaotic" sample of the dirty data. What dirty data does to the eye is exactly what it does to analysis: it creates chaos and drags the correctness of the analysis down.


I'm glad I've not only shown you a project, but also helped you understand how important data cleaning is to data analysis.


TOOLS USED: MICROSOFT EXCEL, POWER QUERY

SHARED DATASET: AHMED OYELOWO


RELATED PROJECT: DIAGNOSTIC ANALYSIS TO SOLVE THE FINANCIAL CRISIS BEING EXPERIENCED BY KEYSTONE KITCHEN-WARE

RELATED PROJECT: EXPLORATORY AND PREDICTIVE ANALYSIS FOR THE SELLER SUPERMARKET TO DETERMINE WHICH CATEGORY PRODUCTS TO STOCK, WHICH CATEGORY PRODUCT HAS THE LEAST ORDER AND TO FIND A DATA-DRIVEN RECOMMENDATION ON THE LEAST PERFORMING PRODUCT
