It's very easy to get carried away by classical data visualizations and the stories behind them, but the truth is that the bedrock of analysis is data cleaning. No data cleaning, no analysis, no visualization. Yes, data cleaning is that important.
However, data cleaning should not be done just for the fun of it. Cleaning data efficiently takes a deliberate method. As Donato Diorio puts it, “Without a systematic way to make data clean, bad data will happen.” And I dare say, 'bad data will most surely result in bad analysis.'
In this project, I was presented with a large, dirty dataset. See the raw data file here. The process can be boring, but taking a systematic approach to the cleaning added a bit of fun to it. Embedded below is the dirty dataset.
In cleaning the dataset, I took the following steps:
This systematic approach left me with a dataset that is 98% clean. See the clean dataset here. Also embedded is the spreadsheet of the cleaned data.
Apart from being ready to draw insights from, the embedded sample of the cleaned data above is visibly easier on the eye than the embedded "chaotic" sample of the dirty data. What dirty data does to the eye is exactly what it does to analysis: it creates chaos and drags the correctness of one's analysis far below where it should be.
I'm glad I've not only shown you a project but also helped you see how important data cleaning is to data analysis.
TOOLS USED: MICROSOFT EXCEL, POWER QUERY
SHARED DATASET: AHMED OYELOWO
RELATED PROJECT: DIAGNOSTIC ANALYSIS TO SOLVE THE FINANCIAL CRISIS BEING EXPERIENCED BY KEYSTONE KITCHEN-WARE