Expert Interview with Kole Hicks
Some GoGrid customers have hybrid cloud/premises needs. What ETL and related requirements do they have?
Our customers’ ETL requirements vary widely depending on the use case and industry vertical. Many of our largest customers in the advertising vertical are collecting data from various transaction and social endpoints, pushing that data to a NoSQL hub like HBase, MongoDB, Riak, or Cassandra, performing some type of MapReduce function, and then pushing the results to a cache to serve targeted advertising to customers.
In the healthcare vertical, we see requirements related to loading data from wearable devices or monitoring solutions, where the data may start out in the cloud and then move to a dedicated cluster to meet HIPAA compliance requirements. In most cases, there’s a need to quickly scale horizontally, provide multi-DC failover, meet some type of PCI or HIPAA requirement, accommodate a polyglot of DB solutions, and also network various applications together to solve for the specific use case and extract, transform, and load the data as needed.
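As a rough illustration of the advertising flow described above (collect events, store them in a hub, run a MapReduce-style aggregation, push the result to a cache), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the event fields, the function names, and the plain list and dict that stand in for a real NoSQL hub and cache. It is not GoGrid's or any customer's actual pipeline.

```python
from collections import defaultdict

# Hypothetical raw events, standing in for transaction and social endpoints.
events = [
    {"user": "u1", "category": "shoes"},
    {"user": "u1", "category": "shoes"},
    {"user": "u1", "category": "books"},
    {"user": "u2", "category": "travel"},
]


def extract(raw_events):
    """Extract: keep only well-formed events (stand-in for loading a NoSQL hub)."""
    return [e for e in raw_events if "user" in e and "category" in e]


def map_phase(stored_events):
    """Map: emit ((user, category), 1) for every stored event."""
    for e in stored_events:
        yield (e["user"], e["category"]), 1


def reduce_phase(pairs):
    """Reduce: sum the counts for each (user, category) key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return counts


def load_cache(counts):
    """Load: keep each user's most frequent category in a dict acting as the cache."""
    best = {}
    for (user, category), n in counts.items():
        if user not in best or n > best[user][1]:
            best[user] = (category, n)
    return {user: category for user, (category, _) in best.items()}


hub = extract(events)  # assumed in-memory stand-in for the NoSQL hub
ad_cache = load_cache(reduce_phase(map_phase(hub)))
print(ad_cache)  # {'u1': 'shoes', 'u2': 'travel'}
```

In a real deployment each stage would be a separate system, so the point of the sketch is only the shape of the data as it moves from raw events to a per-user lookup.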
Which of these three presents the biggest obstacle to new business: (1) prospect knowledge of big data tools and practices; (2) prospect wariness about cloud solutions; or (3) prospect difficulties in converting from legacy architectures and data repositories?
1 and 3 present the biggest challenges, depending on the size of the company and the age of the project/product they’re working on. Smaller developer-driven projects in many cases are starting fresh and are trying to evaluate which solutions are available to solve for their specific use cases.
Their questions are typically geared toward identifying whether they should go with something like an HBase/Hadoop or MongoDB/Hadoop solution for NoSQL and analytics, and they also ask about using technologies like Riak or MemSQL for caching. The answer is very specific to what they’re trying to do.
Larger enterprise customers are typically trying to start by leveraging big data tools to access information stored in legacy data structures throughout the organization. In these scenarios, connecting to a legacy structure and deciding what to keep and what to EOL becomes a very real hurdle to overcome.
What actions, if any, can your team take to mitigate these obstacles to adoption? Are the obstacles unique to your niche, or do they affect hosting enterprise generally?
We make it easy to evaluate and run a variety of options. We’ve found that our customers are typically running three to five open data solutions at the same time. For this reason, we’ve created our 1-Button Deploy™ technology to help them easily get up and running with a variety of NoSQL clusters. We make Riak, Cassandra, HBase, and MongoDB clusters available at the push of a button.
Additionally, we’ve purpose-built our infrastructure to support the three most typical types of database structures. We have SSD and Block for large RDBMS installs; Raw Disk machines designed for HBase with dedicated physical disks for each volume; orchestrated NoSQL cluster deployments; and even infrastructure that is purpose-built to run caching solutions like Memcached or other in-memory databases like MemSQL or Clustrix.
What impact might the Internet of Things have upon your business?
The Internet of Things means very interesting use cases are being solved across industries. Everything we do generates data. We need to continue to focus on providing our customers with the options they need to capture, store, analyze, and take action on that data. Our storage offerings need to continue to evolve rapidly to meet these needs, and we need to continue to virtualize and automate everything to ensure our customers can continue to be nimble.
The Internet of Things means massive growth in data across the board. The ability to handle that data and enable customers to take action on it as fast as possible will be key to the success of service providers. Our orchestration and management tools allow customers to meet these needs.
While Akamai dominates the Content Delivery Network segment, GoGrid also has an offering (http://bit.ly/1fxhzMz). You mention several use cases on your web site. Which of these capabilities is attracting the most interest, and what does it take to compete effectively against Akamai?
GoGrid partners with Edgecast to offer CDN services to our customers. In large part, the services offered by Akamai and Edgecast are very similar. CDN customers are price sensitive and typically want to avoid lock-in from Akamai. They generally choose GoGrid because they’re interested in our elastic cloud infrastructure and NoSQL solutions along with our CDN services. They want a unified experience.
In posts such as this one on your web site, you aim to allay security fears of prospective or current customers. What is their primary worry: PCI compliance, data leakage, data corruption, industrial espionage, confidentiality breach (anticipated by your HIPAA solution bundle), malware from other tenants, or something altogether different?
It really varies depending on the industry vertical and the customer-facing application they’re managing. For our healthcare customers, HIPAA is always the driver. They need to be compliant and ensure their applications are architected to be as highly available (HA) as possible. Advertising customers and eCommerce customers, in comparison, are much more concerned with PCI compliance.
What features has GoGrid been able to stand up that allow you to compete effectively versus AWS or Rackspace?
We have a different approach than both Rackspace and AWS. AWS is moving toward building proprietary PaaS offerings, whereas we enable our customers by providing an orchestrated deployment for the polyglot of open data solutions that are already available. In essence, we want our customers to run the best solutions possible.
With GoGrid, they’re able to orchestrate and manage those solutions across both cloud and dedicated infrastructure and across multiple data centers. They select whichever of the available open data solutions best fits their use case.
What is your take on Amazon’s recent Kinesis announcement? If customers began routinely demanding services such as this, what changes would you expect at GoGrid?
Kinesis is a PaaS offering, and although it’s neat, it locks a customer into AWS. Our customers want to run industry-leading solutions like Hadoop, HBase, MongoDB, Cassandra, and Riak across multiple environments. They may run part of their operation in the GoGrid cloud, part on dedicated infrastructure, and part in their own DC.
It’s about managing and deploying these solutions anywhere. Kinesis doesn’t allow this approach. Our customers are asking us to create toolsets that let them manage a polyglot of solutions, and that’s what we are building.
Do you have in-house requirements for Big Data, such as for tuning, performance measurement and billing? If so, are you using Big Data tools for these?
We use a variety of solutions to meet the needs of our organization. At the moment, we have several NoSQL data hubs combined with RDBMS solutions and caching solutions to handle the operation. The specific solution is very dependent on the use case. For example, we use our CDN and caching solutions to speed up our web front end, and store log and transaction data that gets collected in various NoSQL hubs as needed.
What big data trends, especially for Open Data Services, are you watching most keenly?
We’re watching open source NoSQL tool sets to see what technologies are growing rapidly and gaining traction within our customer base. Adoption trends of these technologies by enterprise customers and innovations that relate to Hadoop are also things we’re watching closely.
Do you or your team have a few favorite software tools that you feel are indispensable?
a. Ansible is an amazing tool for orchestrating and managing infrastructure.
b. I’m currently working on our backup and restore products, and Bacula is simply amazing in what it can do.
c. And of course I don’t know if any operation can live without Nagios.
What features in Big Data software do you feel are lacking?
Big Data is a broad term, and I think there are challenges to be solved at various points. Standard APIs among providers would be very helpful. Creating standards would make it incredibly easy to solve most ETL challenges by making the application platform easier to manage. I realize that’s asking for quite a lot, but I think it’s entirely possible – and much closer than we realize.
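To make the point about standard APIs concrete, here is a small Python sketch of what a shared interface could look like. No such standard exists today as far as this interview indicates; the `RecordStore` protocol, the two toy backends, and `copy_transformed` are all hypothetical names invented for illustration. The idea is only that an ETL step written once against a common interface would work with any backend that implements it.

```python
from typing import Callable, Dict, Iterable, List, Protocol


class RecordStore(Protocol):
    """Hypothetical common interface that every data store would expose."""

    def put(self, key: str, record: dict) -> None: ...

    def scan(self) -> Iterable[dict]: ...


class DictStore:
    """Toy backend keyed by string, standing in for a key-value store."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def put(self, key: str, record: dict) -> None:
        self._rows[key] = record

    def scan(self) -> Iterable[dict]:
        return list(self._rows.values())


class ListStore:
    """Toy append-only backend, standing in for a log-style store."""

    def __init__(self) -> None:
        self._rows: List[dict] = []

    def put(self, key: str, record: dict) -> None:
        self._rows.append({"key": key, **record})

    def scan(self) -> Iterable[dict]:
        return list(self._rows)


def copy_transformed(
    source: RecordStore,
    target: RecordStore,
    transform: Callable[[dict], dict],
) -> int:
    """Extract from source, transform each record, load into target."""
    count = 0
    for i, record in enumerate(source.scan()):
        target.put(str(i), transform(record))
        count += 1
    return count


src = DictStore()
src.put("a", {"amount_cents": 1250})
src.put("b", {"amount_cents": 300})

dst = ListStore()
moved = copy_transformed(
    src, dst, lambda r: {"amount_dollars": r["amount_cents"] / 100}
)
```

Because `copy_transformed` depends only on `put` and `scan`, swapping either backend requires no change to the ETL step, which is the kind of simplification the answer above is hoping for.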