Data Quality & Cleansing Tools’ Effect on Data Mining
People often assume that ETL (extract, transform, and load) is all it takes to ensure data quality. Data quality involves much more than ETL; even so, a data analyst should be familiar with ETL basics: processes, techniques, and tools. Data mining models may perform poorly when trained on inaccurate or dirty data. Sparse data pose a related risk: the analyst may spend more time exploring the data and searching for better training data, and the added time to train and test a model may cause a project to fail. Understanding the basics of data management, including data quality, may help a data analyst complete a data mining project in less time.
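As a rough illustration of what checking for dirty or sparse data can look like before any model is trained, the sketch below profiles a small set of records for missing values and duplicate rows. The sample records, the `is_missing` rule (None or a blank string counts as missing), and the `profile` function are all assumptions made for this example; they are not part of the assignment or of any particular ETL tool.

```python
from collections import Counter

# Hypothetical sample records; None or "" marks a missing value.
rows = [
    {"id": 1, "age": 34, "city": "Austin"},
    {"id": 2, "age": None, "city": "Boston"},
    {"id": 3, "age": 51, "city": ""},
    {"id": 2, "age": None, "city": "Boston"},  # exact duplicate of the second row
]


def is_missing(value):
    """Treat None and blank strings as missing (an assumed convention)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def profile(records):
    """Return the share of missing values per column and the number of duplicate rows."""
    columns = sorted({key for record in records for key in record})
    total = len(records)
    missing_rate = {
        col: (sum(is_missing(r.get(col)) for r in records) / total) if total else 0.0
        for col in columns
    }
    # Count identical rows; every copy beyond the first is a duplicate.
    counts = Counter(tuple(r.get(col) for col in columns) for r in records)
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    return {"missing_rate": missing_rate, "duplicate_rows": duplicates}


report = profile(rows)
print(report)
```

A high missing rate in a column suggests the data may be too sparse to train on without further collection or imputation, and duplicate rows are one common form of dirty data that a cleansing step would remove.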
For the Unit 2 assignment, you will research and write a short paper (3–4 pages for the body section) in APA (6th edition) style and format, with a minimum of five references, that covers the following topics:
Written communication: Written communication is free of errors that detract from the overall message.
APA formatting: Resources and citations are formatted according to APA (6th edition) style and formatting.
Length of paper: 3–4 pages, excluding the references page.
Font and font size: Times New Roman, 12 point.