Tag Archives: data quality

Contextual Data Completeness Metric Computation on Spark

Data quality is critical for the healthy operation of any data-driven enterprise. There are various kinds of data quality metrics. This post focuses on the completeness of data. Data quality from a completeness point of … Continue reading

Posted in Big Data, Data Science, ETL, Spark

Data Quality Control With Outlier Detection

For many Big Data projects, it has been reported that a significant part of the time, sometimes up to 70-80%, is spent on data cleaning and preparation. Typically, in most ETL tools, you define constraints and rules statically for … Continue reading

Posted in Big Data, Data Science, ETL, Hadoop and Map Reduce, Internet of Things, Outlier Detection, Statistics