Category Archives: Hadoop and Map Reduce

Gaining Insight by Mining Simple Rules from Customer Service Call Data

Although the goal of most predictive analytics problems is to make predictions, sometimes we are more interested in the model learnt by the learning algorithm. If the learnt model can be expressed as a set of rules, then those rules … Continue reading

Posted in Big Data, Data Science, Hadoop and Map Reduce, Machine Learning, Rule Mining

Simple Sanity Checks for Data Correctness with Spark

Sometimes, when running a complex data processing pipeline with Hadoop or Spark, you may encounter a data set in which most of the records are grossly invalid. It could save a lot of pain and headache if we did some simple sanity checks before feeding … Continue reading

Posted in ETL, Hadoop and Map Reduce, Spark

Customer Lifetime Value, Present and Future

Customer lifetime value for a business is the monetary value associated with its relationship with a customer, although there have been attempts to include the non-monetary value associated with a customer. It’s an important metric to have for any marketing initiative … Continue reading

Posted in Big Data, Data Science, Hadoop and Map Reduce, Marketing Analytic, Statistics

Detecting Incidents with Context from Log Data

Analyzing vast amounts of machine-generated unstructured or semi-structured data is Hadoop’s forte. Many of us have gone through the exercise of searching log files, most likely with grep, for some pattern and then looking at the surrounding log lines … Continue reading

Posted in Big Data, Hadoop and Map Reduce, Log Analysis, Uncategorized, Web Analytic

Association Mining with Improved Apriori Algorithm

Association mining solves many real-life problems, e.g., finding items frequently bought together or songs frequently listened to together in one session. Apriori is a popular algorithm for mining frequent item sets. In this post, we will go over a Hadoop-based … Continue reading

Posted in Association Mining, Big Data, Data Mining, Data Science, Hadoop and Map Reduce, Marketing Analytic, Rule Mining

Transforming Big Data

This is a sequel to my earlier posts on Hadoop-based ETL covering validation and profiling. Considering that in most data projects more than 50% of the time is spent on data cleaning and munging, I have added significant … Continue reading

Posted in Big Data, Data Transformation, ETL, Hadoop and Map Reduce

Profiling Big Data

Data profiling is the process of examining data to learn about its important characteristics. It’s an important part of any ETL process, and it’s often necessary before embarking on any serious analytic work. I have implemented … Continue reading

Posted in Big Data, Data Profiling, data quality, ETL, Hadoop and Map Reduce