Category Archives: Data Warehouse

Bulk Mutation in an Integration Data Lake with Spark

Data lakes act as repositories for data from various sources, possibly in different formats. They can be used to build a data warehouse or to perform other data analysis activities. Data lakes are generally built on top of Hadoop Distributed File … Continue reading

Posted in Big Data, Data Warehouse, eCommerce, ETL, Spark

Making Hive Squawk like a Real Database

Hive is great for large-scale data warehousing applications. In one of my recent projects I was handed the interesting and challenging task of making Hive behave like an OLTP system, i.e., support update and delete. To be more … Continue reading

Posted in Big Data, Data Warehouse, Hive | 16 Comments

Hive Plays Well with JSON

Hive is an abstraction over Hadoop MapReduce. It provides a SQL-like interface for querying HDFS data, which accounts for most of its popularity. In Hive, table-structured data in HDFS is encapsulated as a table, as in an RDBMS. … Continue reading

Posted in Big Data, Data Warehouse, Hadoop and Map Reduce, Hive, Query | 42 Comments