Interview with Shanti Subramanyam of Orzota.com About Big Data as a Service
What insights did you take away from your time at Sun that most informs you at Orzota?
Sun was always at the forefront of new technology; in many cases coming up with ideas and technologies that the market was not yet ready for. On the other hand, they missed the boat on some important developments (e.g., Linux). I am mindful that as a bootstrapped company, we cannot afford to be way ahead of the market; at the same time, we need to demonstrate sufficient technological thought leadership to win new business while keeping up with the trends in the industry to ensure we don’t fall behind.
On your Big Data web page, the Orzota text hits the Syncsort ETL bull's-eye in the first sentence. But seriously, do prospective customers see the importance of ETL in exploiting Hadoop for Big Data applications? What aspects of ETL are most often overlooked?
ETL is THE best use case for Hadoop. Until a couple of years ago, this was quite evident. However, with the huge hype around data science and analytics, many customers are now confused and seem to think that Big Data means insights. It is never that easy. You need to crawl before you walk. The best thing we in the Big Data community can do is to get away from the hype and encourage enterprises to adopt Big Data with the most straightforward use cases … and ETL is certainly the first one.
Another angle to this is that many web companies, including new startups, rapidly start accumulating data and soon realize that they need Hadoop. They throw all their data into Hadoop without giving ETL a thought, the prevailing attitude being, "ETL is for enterprises, not for a cool internet company like ours!" The danger in not thinking about ETL in advance is that any analytics you do could very well be wrong if your data is not clean. I wrote an article in Datanami to counter this thinking.
Can you address the importance of workflow management in the Orzota Big Data Management Platform? It’s often overlooked when shopping lists are constructed. Why is that?
Thanks for asking. This is an extremely important requirement for building Big Data applications. ETL requires workflows, and so do analytic apps. It is overlooked because of a lack of understanding of the complexity of putting Big Data apps into production. The only way to do this is to design and build workflows – something that the enterprise world understands, but that is not stressed sufficiently by the Big Data players. At Orzota, we are plugging the holes in the Hadoop stack by ensuring that apps can be put into production easily.
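To make the point concrete: a production ETL workflow is essentially a dependency graph of tasks that must run in order, with each step's output feeding the next. The sketch below is purely illustrative (the task names, data, and the toy runner are the editor's assumptions, not Orzota's platform); it uses Python's standard-library topological sorter to order and run a three-stage extract/transform/load pipeline.

```python
# Minimal sketch of an ETL workflow as a dependency graph.
# Tasks, sample data, and runner are illustrative, not any vendor's API.
from graphlib import TopologicalSorter

def extract(ctx):
    # Pull raw records from a source (stubbed here with inline data).
    ctx["raw"] = [" Alice,30", "Bob,-1", "carol,25 "]

def transform(ctx):
    # Clean and validate: strip whitespace, normalize names,
    # and drop rows with invalid (negative) ages.
    cleaned = []
    for row in ctx["raw"]:
        name, age_str = row.strip().split(",")
        age = int(age_str)
        if age >= 0:
            cleaned.append((name.strip().title(), age))
    ctx["clean"] = cleaned

def load(ctx):
    # Write to the target store (stubbed as an in-memory table).
    ctx["table"] = dict(ctx["clean"])

# Declare the workflow: each task maps to the tasks it depends on.
workflow = {extract: set(), transform: {extract}, load: {transform}}

def run(workflow):
    ctx = {}
    for task in TopologicalSorter(workflow).static_order():
        task(ctx)  # runs in dependency order: extract, transform, load
    return ctx

result = run(workflow)
print(result["table"])  # {'Alice': 30, 'Carol': 25}
```

Real schedulers add what this toy omits – retries, scheduling, monitoring, backfills – which is exactly the production complexity the answer above refers to.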
What is the impact of streaming data applications, such as many anticipated from the Internet of Things, on the current skills in Big Data technical communities? Are they ready? Or are all the problems solely with technical concerns like infrastructure, bandwidth and security?
Good question. The valley moves at amazing speed – we are very good at inventing the next new technology and getting really excited about it. The likes of Twitter and Facebook either invent or adopt the technology, and we assume that it is ready for prime time. Streaming and real-time Big Data applications and IoT are now in this cycle. One thing to realize is that many enterprises don't even have their first Big Data application in production. It is still quite difficult to find Hadoop admins. Even a year after starting on Hadoop, many companies (technology companies included) are still struggling to streamline DevOps and tuning. While it's great that technologies such as Spark and Storm are going to bring many more applications with near real-time requirements into the Big Data realm, it is going to take a couple of years for people to be trained and for enterprises to adopt them. The good news is that, going forward, things are going to speed up, as no one is now questioning the value of Big Data or whether their company needs it!
You have a predictive analytics offering. What readiness factors do you look for when assessing whether a prospect can benefit?
First, the client needs to be a mature business with a sufficiently large number of customers. Ideally, they store as much as possible of the data pertaining to customer profiles, transactions and interactions including calls to customer support, emails and click streams.
Second, they must have a well-defined business problem they want to address – for example, understanding why some customers never return, or how to personalize cross-sells.
We can then go in and help the client. We usually start by building well-defined metrics to track. Just having the ability to track metrics can be an eye-opener for companies. Seeing the metrics and visualizing the trends is sufficient to spark discussion, and they intuitively know what actions to take. Adding the ability to predict certain behaviors and make recommendations then dramatically moves the needle and helps them achieve their business goals.
An Orzota case study discusses the scalable NoSQL you built to support ad hoc analytics and reporting. What issues were considered when selecting the NoSQL solution that was deployed?
This was a complex project, not just in terms of the technical architecture but because of the way the company was structured and the variety of groups and people who needed access to the data.
The primary consideration was, of course, query performance, given the scale of the data and the multiple streams of access, while still being able to ingest data from Hadoop. Then there were security considerations, given the sensitivity of the data. Finally, non-technical considerations such as stability, support, and cost also played a role in the decision.
Have you seen much demand for hybrid premises/cloud solutions?
We have been asked about this a few times, but I think it is going to be a few more years before we see this take off in a big way. The big challenges (in addition to regulatory and privacy concerns) are network performance and cost.
Many Big Data projects are green field, which may be part of the bloom in the Big Data garden. Are you seeing work with legacy applications as well?
Definitely. In fact, I would argue that many of the "hot" Big Data use cases such as fraud detection and customer churn prediction are alive and well using legacy data stores and technologies. What the new technologies bring is the ability to improve these applications using semi-structured and unstructured data while dramatically lowering their cost. If you look at the evolution of Hadoop, you can see that the technologies being added to it are all taken from the legacy world – role-based security, auditing, archiving, metadata, governance: all of these already existed for legacy databases.
What expertise niche is Orzota developing that’s different from what you expected when you started up?
We started as a horizontal play in the Hadoop infrastructure space. We could help any customer who was trying to get on the Big Data bandwagon. But once the data was in Hadoop, clients wanted to analyze it in myriad ways. When predictive analytics became the next big thing, we found ourselves moving up the stack to implement more complex business use cases. Today, we provide technology-enabled services, using our platforms and tools to speed up Big Data projects.