Apache Hive is a data warehouse infrastructure built on top of Hadoop that provides data summarisation, query, and analysis.

The Apache Hive™ data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto data and query it using a SQL-like language called HiveQL. It also lets traditional MapReduce programmers plug in custom mappers and reducers when a task is awkward to express in HiveQL alone.
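
To make the idea of plugging in a custom mapper concrete, here is a minimal sketch of a streaming script that a HiveQL TRANSFORM clause could call. It assumes Hive's default streaming format (tab-separated columns, one row per line). The table page_views, its columns user_id and url, and the file name mapper.py are hypothetical names chosen for illustration, not part of Hive.

```python
#!/usr/bin/env python3
"""Hypothetical custom mapper for Hive's TRANSFORM clause.

Assumed invocation from HiveQL (table and column names are illustrative):

    ADD FILE mapper.py;
    SELECT TRANSFORM (user_id, url)
    USING 'python3 mapper.py'
    AS (user_id, host)
    FROM page_views;
"""
import sys
from urllib.parse import urlparse


def map_row(line):
    """Map one tab-separated row (user_id, url) to (user_id, host).

    Returns None for rows that do not have exactly two fields.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 2:
        return None
    user_id, url = fields
    # netloc is empty when the value has no scheme, so fall back to a marker.
    host = urlparse(url).netloc or "unknown"
    return f"{user_id}\t{host}"


def main(lines, out):
    """Read rows from `lines` and write mapped rows to `out`."""
    for line in lines:
        mapped = map_row(line)
        if mapped is not None:
            out.write(mapped + "\n")


if __name__ == "__main__":
    main(sys.stdin, sys.stdout)
```

Hive streams the selected columns to the script on standard input and reads the rows it prints back as the columns named in the AS clause, so the script itself needs no Hive-specific library.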

Use Cases for Apache Hive

  • Querying and managing large datasets residing in distributed storage.
  • Reporting and ad hoc queries.
  • Data mining and analysis of monthly global users.
  • User analytics.
  • Dataset cleaning and machine learning R&D as part of a larger Hadoop pipeline that serves near-real-time web analytics.
  • Customer-facing analysis for hosted syslog and application log management services.
  • Tracking and analysing usage data for ads across an advertising network.

Key Benefits of Apache Hive

  • Reduces the time it takes to perform semantic checks.
  • Can mine and analyse specific data from large datasets.
  • Supports ad hoc querying.

Features of Apache Hive

  • Provides HiveQL, a SQL-like language for querying and managing large datasets residing in distributed storage.
  • Open-source software released under the Apache License 2.0.

