Category Archives: Hive

Bulk Insert, Update and Delete in Hadoop Data Lake

Hadoop Data Lake, unlike a traditional data warehouse, does not enforce schema on write and serves as a repository of data in different formats from various sources. If the data collected in a data lake is immutable, it simply accumulates in an append-only …

Posted in Big Data, ETL, Hadoop and Map Reduce, Hive | 19 Comments

Making Hive Squawk like a Real Database

Hive is great for large-scale data warehousing applications. In one of my recent projects I was handed the interesting and challenging task of making Hive behave like an OLTP system, i.e., support update and delete. To be more …

Posted in Big Data, Data Warehouse, Hive | 16 Comments

Big Web Analytic

I had started a Hadoop-based open source web analytic project some time ago. Recently I did some work on it and decided to blog about the development I did on the project. The project is called visitante and …

Posted in Big Data, ETL, Hadoop and Map Reduce, Hive, Web Analytic | 9 Comments

Hive Plays Well with JSON

Hive is an abstraction over Hadoop Map Reduce. It provides a SQL-like interface for querying HDFS data, which accounts for most of its popularity. In Hive, structured data in HDFS is encapsulated in a table, as in an RDBMS. …

Posted in Big Data, Data Warehouse, Hadoop and Map Reduce, Hive, Query | 46 Comments