Data Science

How to Get a Handle on Big Data & Hadoop

Hadoop adoption rates are up, but not all initiatives are meeting their goals for generating revenue or creating cost savings. In fact, Gartner forecasts that about 70 percent of Hadoop initiatives will fail in this regard through 2018. Vendors tout easier-to-use products, so why are so many organizations struggling to get big data and Hadoop to perform?

Getting a Handle on Big Data & Hadoop Skills


The skills needed to effectively access, integrate, and analyze big data take a long time to master, and until recently there weren't many programs where people could learn them. The work demands an unusual mix of mathematics, technology, and business skills. On the positive side, data integration vendors like Syncsort have delivered technology, such as Syncsort DMX-h, that greatly simplifies data access and integration, with simple, efficient solutions for both batch and real-time streaming workloads between legacy and emerging platforms, including mainframes, Apache Hadoop, Spark, and Kafka.
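To make that integration work concrete, here is a minimal PySpark sketch of the kind of streaming ingestion involved: reading events from a Kafka topic and landing them in HDFS. The broker address, topic name, and paths are hypothetical, and this generic Spark Structured Streaming job is only an illustration, not how DMX-h itself operates.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (SparkSession.builder
         .appName("kafka-ingest-example")
         .getOrCreate())

# Read a streaming DataFrame from Kafka; requires the spark-sql-kafka
# connector package on the classpath. Broker and topic are hypothetical.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "transactions")
          .load())

# Kafka delivers key/value as binary; cast to strings before processing.
parsed = events.select(col("key").cast("string"),
                       col("value").cast("string"))

# Land the stream in HDFS as Parquet, with checkpointing for recovery.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "/data/landing/transactions")
         .option("checkpointLocation", "/data/checkpoints/transactions")
         .start())

query.awaitTermination()
```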

But for analytics, the skills it takes to get a grip on big data and wrangle some productivity out of Hadoop are still a bottleneck. Fortunately, new training programs (both online and at brick-and-mortar colleges) are popping up that should soon help close these skills gaps. The position of data scientist was recently named the Sexiest Job of the 21st Century, which should steer more students toward that career goal. For now, many businesses turn to outside contractors for the big data skills they need when hiring in-house proves difficult or impossible.

Getting a Handle on Ever-Changing Tools & Frameworks

Trouble finding the right talent naturally leads to other problems, one of which is keeping up with the rapidly evolving landscape of tools and frameworks in the Hadoop ecosystem. Syncsort has also addressed this challenge by delivering Intelligent Execution capabilities in DMX-h, which plan applications dynamically at run time based on the chosen framework, future-proofing them as the big data technology stack evolves. In fact, with Spark support in the latest release, customers can take jobs initially designed for MapReduce and run them natively in Spark simply by changing the execution framework from a drop-down menu, without rewriting or recompiling anything.
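As a rough illustration of the underlying idea of decoupling a job definition from its execution engine, here is a small, hypothetical Python sketch. It is a generic pattern, not Syncsort's implementation; the paths, job structure, and engine names are invented for the example.

```python
# A generic sketch of separating a job definition from its execution engine.
# This is NOT Syncsort's implementation; paths and structure are hypothetical.

JOB = {
    "source": "/data/landing/transactions",   # hypothetical input path
    "transform": lambda record: record.strip().lower(),
    "sink": "/data/curated/transactions",     # hypothetical output path
}

def run_local(job):
    """Run the job on a single machine, e.g. for development or small files."""
    with open(job["source"]) as src, open(job["sink"], "w") as dst:
        for record in src:
            dst.write(job["transform"](record) + "\n")

def run_spark(job):
    """Run the same job definition on Spark without changing its logic."""
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("portable-job").getOrCreate()
    rdd = spark.sparkContext.textFile(job["source"])
    rdd.map(job["transform"]).saveAsTextFile(job["sink"])

# The engine is a configuration choice; the job definition never changes.
ENGINES = {"local": run_local, "spark": run_spark}
ENGINES["local"](JOB)   # switch to "spark" without rewriting the job
```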

Getting a Handle on Big Data & Hadoop Storage

Storage is another decision that trips up big data projects: choosing the right infrastructure. One option with a growing number of use cases is a cloud-based infrastructure that can grow and adapt as data sets do. Depending on the type(s) of data the organization uses, storage can take the form of a data lake, a NoSQL database, or a traditional SQL database. Each has its struggles: SQL is inadequate for storing unstructured data, NoSQL comes with a learning curve, and data lakes can turn into data swamps where data becomes irretrievable.

Getting a Handle on Big Data Quality

With data streaming in from numerous sources, data quality becomes an issue. Companies that don't take the time and effort to cleanse and de-duplicate their data aren't able to get meaningful results out of their data analysis.

While many companies understand that not all data is good data and that quality matters, it remains a stumbling block. Fortunately, there are numerous de-duplication, cleansing, and related data quality services to help companies get their databases up to par so that the analytics done in Hadoop are meaningful and relevant.
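For a sense of what basic cleansing and de-duplication involve, here is a small, hypothetical Python example using pandas. Real pipelines typically add fuzzy matching and domain-specific rules; the sample records are invented for illustration.

```python
import pandas as pd

# Hypothetical customer records with casing, whitespace, and missing-value issues.
customers = pd.DataFrame({
    "name":  ["Acme Corp", "acme corp ", "Globex", "Globex"],
    "email": ["SALES@ACME.COM", "sales@acme.com", "info@globex.com", None],
})

# Normalize before de-duplicating, otherwise near-duplicates slip through.
customers["name"] = customers["name"].str.strip().str.lower()
customers["email"] = customers["email"].str.strip().str.lower()

# Drop rows missing required fields, then collapse exact duplicates.
clean = (customers.dropna(subset=["email"])
                  .drop_duplicates(subset=["name", "email"]))

print(clean)
```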

Getting a Handle on Projects That Deliver Quick ROI

Assuming companies have the right skills, tools, and data, picking the right use case to tackle with Hadoop is critical to success. Another area where vendors have made huge strides is ETL: accessing data from diverse sources and integrating it with Hadoop, Spark, and other emerging technologies in the Hadoop ecosystem. Without the right product, moving data from its sources or data warehouses into the big data infrastructure can be nothing short of daunting. Many businesses have invested tremendous amounts of time and money only to end up with a big data project that is just another silo where data sits isolated. However, excellent products and services are available to integrate all of a company's data into a single, cohesive solution.
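As a simple sketch of that kind of integration, the following PySpark job pulls a table from a relational warehouse over JDBC and lands it in HDFS as Parquet for downstream Hadoop and Spark jobs. The connection string, credentials, table, and partition column are all hypothetical, and a real job would also need the matching JDBC driver available to Spark.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("warehouse-to-hadoop")
         .getOrCreate())

# Extract a warehouse table with Spark's generic JDBC reader.
# URL, credentials, and table name below are placeholders.
orders = (spark.read
          .format("jdbc")
          .option("url", "jdbc:postgresql://warehouse:5432/sales")
          .option("dbtable", "public.orders")
          .option("user", "etl_user")
          .option("password", "etl_password")
          .load())

# Land the extract in HDFS as partitioned Parquet so downstream Hadoop and
# Spark jobs can query it without touching the warehouse again.
(orders.write
       .mode("overwrite")
       .partitionBy("order_date")
       .parquet("/data/warehouse/orders"))
```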

Syncsort leverages its extensive mainframe and big data expertise to simplify access and integration of diverse, enterprise-wide big data, including mainframe data, into Hadoop and Spark. This not only helps you get a handle on big data and Hadoop integration, but also reduces the new skills required to do it. Its Big Data solutions can help you begin gleaning value from big data today.


Authored by Christy Wilson

Syncsort contributor Christy Wilson began writing for the technology sector in 2011, and has published hundreds of articles related to cloud computing, big data analysis, and related tech topics. Her passion is seeing the fruits of big data analysis realized in practical solutions that benefit businesses, consumers, and society as a whole.