Getting Started with Hadoop
Our fourth installment from Mitch Seigle’s time on “The Cube” at Hadoop Summit focuses on getting started with Hadoop. It covers three initial steps to consider:
- Experiment – We strongly recommend an experimentation phase, in which significant testing is performed in a Hadoop environment before anything is put into production
- Prioritize Data – Identify the high-value problems and prioritize the Hadoop projects aimed at solving those first
- Don’t Re-architect…Yet – Test before any strategy discussion about rebuilding data integration or data warehousing processes takes place
Syncsort DMExpress can help simplify many Hadoop processes, such as loading data into the Hadoop framework and improving MapReduce performance, and its self-tuning capabilities can help alleviate the skills gap associated with Hadoop.