Approach
- HBase is great at large number of (on-demand) columns, random writes and reads and maintaining history
- Against each node(row key), treat every neighbor as a column with a fixed (can be extended for weighted graphs) value of 1. Include self loop.
- Pick node and least neighbor and insert into another hbase table with just node and component-id(least neighbor)
- Use joins (based on GIMV) to refine component-ids. Iterate till convergence is achieved. (In several ways join is also similar to a BSP approach in Pregel/Giraph)
- Use hive as the language for above steps
Solution is here
Testing
Features
- Incremental processing of graphs is feasible
- Should theoretically scale for web scale graphs – sparse or dense.
- Though based on GIMV – Matrix Vector multiplication isnt needed. GIMV is a very useful base to model solutions like above for other graph problems that can be solved by it.
References
- Previous posts – part 2, part 1
- GIMV paper and pegasus project
- BSP based approach in Pregel paper
- Another GIMV/HBase approach paper
Illustration