The Best Books on Big Data for 2015
Hopefully, you’ve already read Kenneth Cukier and Viktor Mayer-Schönberger’s Big Data: A Revolution That Will Transform How We Live, Work, and Think, as well as the ultimate big data book, Tom White’s Hadoop: The Definitive Guide. Now you’re ready for the latest and greatest big data literature, and you’re in luck, because here’s your list!
Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2 by Arun C. Murthy, Vinod Kumar Vavilapalli, Doug Eadline, Joseph Niemiec, and Jeff Markham
Leverage big data with YARN in 2015 using this sensible guide.
Released in March 2014, this book goes beyond MapReduce and delves into the YARN breakthrough. Touted as “The Insider’s Guide to Building Distributed, Big Data Applications with Apache Hadoop YARN,” it offers faster and easier ways to adapt your existing code and develop new code that makes use of the latest Hadoop technologies.
It discusses scalability, cluster utilization, and new models and services for programmers. It also offers alternatives to Java and batch processing. Walking you step by step through the YARN lifecycle, it gives implementation examples to help you visualize using these techniques in your own work.
Apache Hadoop YARN is written with programmers in mind, so it isn’t the best book for executives and managers looking for ways to use big data, but it’s an excellent resource for analysts and IT departments. It is available for free download.
Hadoop Application Architectures by Mark Grover, Ted Malaska, Jonathan Seidman, Gwen Shapira
Don’t waste time and money. Get your Hadoop architecture right from the beginning with this helpful guide.
Released in July 2014, this Hadoop tutorial offers practical insight into the architectural considerations of using Hadoop. Complete with real-world examples, it delves into how to design and implement new Hadoop applications and how to incorporate Hadoop into an existing data infrastructure. It also provides guidelines on best practices for designing HDFS and HBase schemas.
Using Flume by Hari Shreedharan
Released in October 2014, Using Flume: Flexible, Scalable, and Reliable Data Streaming explains how to get data off front-end servers and into Hadoop in real time (or almost real time). It’s a great resource for learning how Flume collects, aggregates, and streams large data sets to the Hadoop Distributed File System, HBase, Elasticsearch, and more. Written for coders and engineers, it is best suited to highly technical readers.
Practical Hadoop Security by Bhushan Lakhe
Released in September 2014, Practical Hadoop Security is a product of the Cloudera engineers. Written for security architects and systems administrators, it discusses fitting Hadoop into the company’s overall security systems. It covers the risks involved in Hadoop deployment, as well as the hows and whys of developing the right policies for Hadoop security. Less technical than the previous books on this list, it is designed to address the concerns of enterprise security architects.
Learning Spark: Lightning-Fast Big Data Analytics by Holden Karau, Andy Konwinski, Patrick Wendell, and Matei Zaharia
Released in February 2015, this book is about efficiently managing the big data that streams in from websites. Focusing on Spark, it discusses cluster computing and ways to make programs run faster. Written by members of Spark’s development team, it’s an excellent introduction to using Spark in place of MapReduce to load data into memory and query it. This book is written with more technical readers in mind.