Category Archives: Data Transformation

Combating High Cardinality Features in Supervised Machine Learning

A typical training data set for real-world machine learning problems has a mixture of different types of data, including numerical and categorical. Many machine learning algorithms cannot handle categorical variables. For those that can, categorical data can pose a serious problem … Continue reading

Posted in Big Data, Data Science, Data Transformation, ETL, Hadoop and Map Reduce, Predictive Analytic

Transforming Big Data

This is a sequel to my earlier posts on Hadoop-based ETL covering validation and profiling. Considering the fact that in most data projects more than 50% of the time is spent on data cleaning and munging, I have added significant … Continue reading

Posted in Big Data, Data Transformation, ETL, Hadoop and Map Reduce | 3 Comments