Tag Archives: spark

Contextual Outlier Detection with Statistical Modeling on Spark

Sometimes an outlier is defined with respect to a context. Whether a data point should be labeled as an outlier depends on the associated context. For a bank ATM, transactions that are considered normal between 6 AM and 10 PM, … Continue reading

Posted in Anomaly Detection, Big Data, Data Science, Spark | 3 Comments

Data Normalization with Spark

Data normalization is a required data preparation step for many Machine Learning algorithms. These algorithms are sensitive to the relative values of the feature attributes. Data normalization is the process of bringing all the attribute values within some desired range. Unless … Continue reading

Posted in Big Data, Data Science, ETL, Machine Learning, Spark | Leave a comment

Alarm Flooding Control with Event Clustering Using Spark Streaming

You show up at work in the morning and open your email to find 100 alarm emails in your inbox for the same error from an application running on some server within a short time window of 1 minute. You … Continue reading

Posted in Anomaly Detection, Big Data, Real Time Processing, Spark, stream processing | 1 Comment

Big Road Map for Big Data

The number of choices for big data solutions can be overwhelming and confusing. The purpose of this post is to lay out a road map for big data solutions. I will be categorizing the products under four different categories of … Continue reading

Posted in Big Data | 5 Comments

Bring some Spark into your life

Hadoop is a great cluster computing framework. But sometimes it may not be a great fit for the particular problem at hand. Or you may have Hadoop fatigue and want to explore other options. There are certain problems where … Continue reading

Posted in Big Data, Cluster Computation, Scala, Spark | 4 Comments