
Removing Duplicates from Order Data Using Spark

If you work with data, you have most likely run into duplicate records in your data set. Removing duplicates in Big Data is a computationally intensive process, and parallel cluster processing with Hadoop or Spark becomes …
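Deduplication parallelizes well because records can be hash-partitioned on the dedup key. Every copy of a duplicate then lands in the same partition, and each partition can be deduplicated on its own. The sketch below shows that shuffle-then-dedup idea in plain Python. It assumes each order is a hypothetical `(order_id, customer_id, amount)` tuple and that `order_id` alone identifies a duplicate. It is a sketch of the general pattern, not the post's implementation or Spark's internals.

```python
from typing import Iterable, List, Tuple

# Hypothetical order record: (order_id, customer_id, amount)
Order = Tuple[str, str, float]


def partition_by_key(orders: Iterable[Order], num_partitions: int) -> List[List[Order]]:
    """Hash-partition records on the dedup key so all copies share a partition."""
    partitions: List[List[Order]] = [[] for _ in range(num_partitions)]
    for order in orders:
        key = order[0]  # assumption: order_id alone identifies a duplicate
        partitions[hash(key) % num_partitions].append(order)
    return partitions


def dedup_partition(partition: List[Order]) -> List[Order]:
    """Keep the first record seen for each key within a single partition."""
    seen = set()
    kept: List[Order] = []
    for order in partition:
        if order[0] not in seen:
            seen.add(order[0])
            kept.append(order)
    return kept


def dedup_orders(orders: Iterable[Order], num_partitions: int = 4) -> List[Order]:
    """Partition by key, then deduplicate each partition independently.

    On a cluster each partition could go to a different worker; here the
    partitions are simply processed one after another.
    """
    result: List[Order] = []
    for part in partition_by_key(orders, num_partitions):
        result.extend(dedup_partition(part))
    return result


if __name__ == "__main__":
    orders = [
        ("A1", "c1", 10.0),
        ("A2", "c2", 5.5),
        ("A1", "c1", 10.0),
        ("A3", "c1", 7.25),
        ("A2", "c2", 5.5),
    ]
    print(sorted(dedup_orders(orders)))
```

Output order across partitions depends on the hash, so the example sorts the result before printing. In PySpark, the DataFrame equivalent is `df.dropDuplicates(["order_id"])`, where the `order_id` column name is again an assumption about the data.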

Posted in Big Data, Data Science, ETL, Spark