Apache HAWQ : An Introduction

[A presentation on this topic is here]

In-memory, distributed, SQL datastore that allows transactions (for catalog data : via postgresql , for user data : it is append only) as first-class hadoop service, with out of the box integration to HDFS, Hive, HBase.
40x faster than hive which has already got significant performance gains (35x-45x) with stinger initiative.
Open source. Apache project. Support (Hortonworks).
Hive/Tez for batch processing (alternatives being MR and Pig) and interactive queries, while HAWQ is a strong candidate for hosting reporting schema (fact tables) and serve as backend for BI tools.

below are few use cases that help illustrate hawq and its integration features. (Sample codes and results are linked).

Install hawq and PXF on a hadoop cluster (below installation section covers this).
Reasonble specs on each segment node (ex: 4 cores, 20 GB).

HAWQ external table using HDFS location of an existing compressed file via pxf.
Sample code to load data to hdfs, create hawq external tables and verify them is here and results are here.

HAWQ external table using existing Hive table.
Sample code to create and load hive tables, create corresponding hawq external tables and verify them is here and results are here.

HAWQ external table using existing hbase table.
Sample code to create and load one hbase table, create corresponding hawq external table and verify it is here and results are here (look for ‘order_lineitems_hawq=512071’).

In HDP 2.4 : try this for sandbox, same should work in a clustered mode with few modifications
In HDP 2.5 : HAWQ runs on slider
If you attempted installing HAWQ on 2.4 try removing HAWQ service upgrade to HDP 2.5 and then try adding service back. Only this time it would also auto suggest/select slider. Ensure you try installing master on a node that doesn’t have master binaries through previous attempts.
Install PXF (hawq extension framework) service through ambari (below section covers it) .

PXF

PXF installation in HDP 2.5 fails currently for lack of pxf json rpm. Go ahead and install PXF, it would setup other rpms and below steps would correct json rpm issue
build pxf-json rpm manually from here

For use cases 1,2,3

This post was later published on LinkedIn here.