Do you have any “go-to” products that you rely upon for ETL when a customer wants to load a ton of data into Hadoop?
We try to pick the ETL tool that best matches the customer’s requirements. If the data is highly structured and is coming in from CSV or fixed-length files, we usually use multi-threaded HDFS client programs to bulk load the data from NFS-mounted file systems or local storage into HDFS. From there it can be loaded into Hive tables or HBase tables. Many times, a MapReduce program can assist in parallelizing the ingest process. If the data is coming from a relational database, then we try to use Sqoop or another JDBC-based connectivity tool. Sometimes, if the sources are web logs or server logs and are being ingested on a continuous basis, we can set up Flume agents.
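As an illustration only, the multi-threaded bulk load described above might be sketched in Python as follows. This is not the client program used in these engagements; it is a minimal sketch that assumes the standard `hdfs dfs -put` command is available on the loading host. The names `bulk_load` and `put_with_hdfs_cli`, the worker count, and the paths in the usage note are all hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional


def put_with_hdfs_cli(local_path: Path, hdfs_dir: str) -> None:
    """Copy one local file into HDFS by shelling out to `hdfs dfs -put`.

    Assumes the Hadoop client is installed and configured on this host.
    The -f flag overwrites a destination file that already exists.
    """
    subprocess.run(
        ["hdfs", "dfs", "-put", "-f", str(local_path), hdfs_dir],
        check=True,  # raise CalledProcessError if the copy fails
    )


def bulk_load(
    files: Iterable[Path],
    hdfs_dir: str,
    upload: Callable[[Path, str], None] = put_with_hdfs_cli,
    max_workers: int = 8,  # hypothetical default; tune to network and disk capacity
) -> Dict[Path, Optional[BaseException]]:
    """Upload files concurrently, one file per worker thread.

    Returns a dict mapping each path to None on success, or to the
    exception raised while uploading that file.
    """
    results: Dict[Path, Optional[BaseException]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(upload, path, hdfs_dir): path for path in files}
        for future in as_completed(futures):
            # exception() returns None when the upload finished cleanly
            results[futures[future]] = future.exception()
    return results
```

A call such as `bulk_load(Path("/mnt/nfs/export").glob("*.csv"), "/data/landing")` would then push every CSV file under a hypothetical NFS mount into a hypothetical HDFS landing directory, and any entry in the result that is not `None` marks a file to retry. Threads suit this job because each worker spends most of its time waiting on I/O rather than on the CPU.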
Eliassen Group has been around since 1989. Since the organization can take the long view of things, do you see Big Data as an incremental shift from distributed computing and HPC, or is Big Data something fundamentally different?
It’s fundamentally different for a lot of reasons. Having said that, not everybody will do it. Some of the key differences are the massive amounts of data that you can store, analyze and manipulate. Another difference is the ability to work with semi-structured and unstructured data (think YouTube video, text strings, Twitter streams, etc.). Why won’t some do it? Well, they’ll struggle with where to put the proverbial first stake in the ground, or they’ll worry about the cost, or their business folks won’t adopt it — lots of reasons. Here’s the trick to success: focus on how Big Data might enhance business value, obtain competitive advantage, better understand your clients, etc., and you’ll worry a lot less about the issues I’ve outlined.
One aspect of finding talent that hasn’t changed much is the search for vertical talent (i.e., knowledge of a specific product) vs. horizontal talent (hybrid analyst / architect roles). Some think that, as a result, internet search has turned into a mindless SEO exercise rather than a subtle matching of talent to need. Agree or disagree?
Completely and totally disagree. What somebody puts down on a resume as it relates to skills, competencies and experience — stuff that can be easily “mined” using search tools — is just the beginning. The fact that somebody is technically qualified to do the job isn’t remotely enough to convince today’s hiring managers that the candidate should be hired. The subtle matching of talent is super important, and our recruiters at Eliassen Group are particularly good at it. They work hard to understand the specific needs and requirements of the hiring manager, and ensure that their candidates are a perfect match. Questions like: “Will they fit into the culture of the employer?” “Are they willing to work extra hours?” “Are they willing to travel?” “Do they work well in teams?” “Can they work independently?” “Do they respond well to constructive criticism?” These are all things that cannot be determined by parsing a resume to see if somebody has the technical skills necessary.
Do you find that different sorts of talent are needed for long-term engagements with a single client vs. a project team that is dismantled at the end of the project? Or are long-term engagements now a thing of the past in IT?
The team that you put in place for a client for a long-term engagement can be a bit different, and longer-term engagements are definitely not a thing of the past. Many organizations today, particularly since recovering from the most recent recession, strategically maintain a healthy ratio of full-time to contract staff: say, for example, 75% employee-based staff and 25% contractor-based staff. Because these hiring ratios are deliberate, meaning that the contractors could be on board for quite a while, the type of work that can be assigned to them can certainly be more strategically important to the company. Often when constructing a team that will be in place for the long term, more consideration is given to “fit,” since in many instances the client will ask permission to hire the contractor.
Are clients beginning to ask for both Big Data and Software Defined Networking (SDN) talent? In the November 2014 IEEE Computer, the editor suggests that SDN is the second wave of cloud computing. Given the connection between Big Data and cloud computing, a connection with SDN might be a logical future step. Do you agree?
Absolutely. Big Data tools like Hadoop require very robust network infrastructure components to perform well. Big Data architects, developers and administrators must be knowledgeable about the network requirements, including deep buffering, non-blocking top-of-rack switches and aggregation switches that can handle the rack-to-rack traffic bursts that occur with MapReduce when you enter the shuffle-sort and reduce phases. They also have to be aware that the new generation of in-memory processing engines like Spark requires even more memory and network bandwidth to keep performance at acceptable levels. Software Defined Networks can assist in configuring network environments to direct network traffic to and from Big Data components more efficiently. As such, network architects and administrators will, in the future, be more a part of the teams that develop, deploy and maintain Big Data applications. Big Data and SDN go hand in hand.
What are the top three issues that you encounter when enterprises say they want “To do Big Data, like Hadoop, man?”
The first issue is matching the enterprise’s business requirements appropriately. Not every business requirement calls for a Big Data solution, and it doesn’t benefit the organization to use Big Data tools to solve a non-Big Data problem. If the requirements do call for a Big Data solution, then the next issue is how to pick a good pilot project or proof-of-concept project that shows the organization business value without being too large to complete in a short amount of time. Finally, when the organization chooses to deploy a Big Data application into production, we have to help implement the appropriate security levels in the Big Data environment, including authentication, encryption, AD/LDAP integration and the like. This is often a challenge with the newly emerging Big Data tools.
How much interest are you seeing from clients in graph databases for Big Data applications?
We haven’t seen that much interest so far, but many use cases strive to analyze relationships between entities, so we believe that as Big Data graph tools mature, they will become more a part of the Big Data architecture.
Are you seeing a much-anticipated shift toward streaming data sources, or is that still in the distance for most shops?
Thus far, our customers have been dabbling with streaming data sources, but it has not been a make-or-break technique for most business requirements. As technology improves and becomes more reliable, that may change, especially in the financial industry.
Is managing a Big Data project materially different from other IT projects? For example, are the prerequisite infrastructure needs more difficult to acquire or perhaps to justify to client management?
Yes, managing a Big Data project is materially different, but probably not for the reasons that you think. The technology complexities can be handled, so this is not the issue. If your project management approach embraces an agile development methodology, you’re in a good place to begin with, because your business counterparts already “get it.” If not, then the most important prerequisite is to secure senior level business buy-in up front, including direct and active involvement. Two key items must be tackled early in the project life: “What is the business problem or opportunity that we are trying to solve?” and “If we focus on delivering business value, how will we measure it and know that we were ultimately successful?” Defining what business value looks like and how it will be measured are key ingredients to ultimate success when launching Big Data initiatives.
What issues do you anticipate around Big Data security or privacy that are different from previous decades in your lifetime as a firm?
The important part of this question is the “… that are different …” segment. The issues related to Big Data security and privacy remain the same, but perhaps the exposure that could result from a data breach is much larger today due to the massive amounts of information that could be accessible via a breach. When implementing Big Data solutions, many companies will opt for a cloud solution, whether private or public (multi-tenant), since the costs are typically much lower than if a company were to put a solution in place themselves. Whenever a company outsources business to a cloud service provider, they worry about data security and privacy; but truth be told, most of the cloud providers today are better at securing data than the companies themselves.
Greg Palmer, an Eliassen Group Big Data business partner, also participated in answering some of these questions. He is the owner of Scalinear, a Big Data firm headquartered in Chantilly, VA.