At this stage in its adoption, Cloud has moved from being a tech buzzword to an established practice. But that doesn’t mean it’s old news. We’ve rounded up a number of our blog posts about cloud computing, including some recent Big Data advances and the cloud’s impact on the mainframe.
Big Data in the Cloud
As part of our 2017 Strata + Hadoop World recap, we noted there were a number of sessions that showcased advances in the usage of cloud computing for Big Data.
Apache Hadoop was born as an on-premises platform, and most of the use cases for early commercial Hadoop vendors – like Cloudera, Hortonworks and MapR – focused on on-premises implementations of the open source data analytics platform. Our post 5 Reasons to Run Hadoop in the Cloud reviews why and how to run Hadoop in the cloud in order to supercharge your data analytics operation.
Spark has also drawn attention for its Cloud integration. Our post It’s an Electrical Storm: Spark Goes Cloud (and How You Can Capitalize on It) explains that Spark doesn’t go in the cloud just because it’s trendy. There are some legit reasons it belongs there. For one, the matter of getting all that data into Spark is no trivial task.
We interviewed a number of experts who had a lot to say on Big Data in the cloud. First, we spoke with Doug Cutting, Cloudera’s Chief Architect and Hadoop Co-Founder. He saw the start of a trend of people moving to the public cloud in a big way. “There’s a lot of people staying on their on-premises data centers, but more and more we’re seeing people move to public cloud. We’re trying to see how we can make this open source Big Data ecosystem really work well in the cloud, too.”
In our interview with Cloudera’s Sean Anderson, he gave us an update on Spark 2.0, including a look at its Cloud support. He states, “For us, really understanding how we guide our customers on deploying in the Cloud is great.” Cloudera had also recently announced S3 integration for Apache Spark, which allows Spark jobs to run on data that already lives in S3.
And finally, we published a multi-part interview with Databricks’ Spark Community Evangelist Jules Damji, in which he touched on the overwhelming move of a lot of Big Data processing to the Cloud, the advantages of doing big data processing and data analysis work in the Cloud, and some specifics about the Databricks cloud.
In Part 3 of Damji’s interview, he states, “As more and more data is going into the Cloud, people are more and more worried about sensitive data, and how do you protect that? So, security comes as part of [Databricks’] augmented offering.”
In Part 4 of that conversation, Damji sums up his philosophy around cloud computing by treating it as a commodity, in the same way one might think of electricity. In his analogy below, Databricks is the refrigerator:
Edison’s partner said that, “The people who are going to make money are not only the power utility companies that are gonna provide the power, that’s going to be a commodity. The people who are gonna make money are the people who are gonna build appliances on top of that.” Refrigerators, lamps, toasters, TVs, all those appliance manufacturers are making money. And they depend on the grid being already there.
The benefits of storing and processing data in the Cloud are undeniable. Organizations of all sizes enjoy the ability to rapidly deploy, scale up and down quickly, and align costs to their specific big data application needs. Cloud ETL solutions, like those from Syncsort, can help you access and integrate the data that fuels your Big Data applications.
A Place for Mainframes in Cloud Computing
So, has cloud computing killed mainframes? You might think so. In fact, mainframes remain supremely important, even in the age of the cloud. Our post Why Mainframes Matter in the Age of the Cloud discusses why.
As noted in our post Finding Room for the Mainframe in Your Cloud Architecture, the mainframe is still an excellent tool for leveraging the cloud, and the modern mainframe is an ideal solution for mobile technologies. As many as 70 to 80 percent (depending on whom you ask) of all transactions conducted today are still done by the mainframe.
Expert Trevor Eddolls agrees. During our interview, he proposed a hybrid infrastructure built around the mainframe. “You can make it the center of your Cloud computing environment. Most of your end users won’t even know there’s a mainframe inside the Cloud.”
Avoiding Stormy Weather
While there are many benefits to cloud computing, you can’t always count on blue skies.
In our interview with Robert Corace of SoftServe, he named security as a chief concern and top challenge for his clients, but argued that the potential gains make the risk worthwhile:
I wouldn’t describe these as purely challenges, though, as these companies also stand to gain a lot. Digital asset management, Cloud computing, mobile technologies, and the Internet of Things (IoT) approached as a part of digital transformation efforts can bring a lot of benefits to consumer facing operations, retail, the finance and banking sector, and many others.
With cloud computing’s cost savings, better performance and simplified operations, you may be thinking, “What’s not to like?” Answer: Data loss and downtime that can plague migrations, along with the risk that comes with having all your eggs in one virtual basket. In order to truly make cloud services an asset and not a liability, you have to protect virtual workloads running in the cloud. Otherwise, you’re at extreme risk of losing time, money and data in an outage or emergency. What’s more, if you’re backing up virtual and Cloud servers to physical servers, you’ve failed to remove the risks inherent to physical servers.
Whether you’ve already adopted cloud and virtual servers, or you’re planning to in the near future, keep in mind that in order to have a solid business continuity plan, you have to have an HA/DR solution that is created specifically for Cloud and virtual servers.
Keeping Your Cloud in Check
One of the benefits of cloud storage is the idea of endless space. Some assume capacity management is no longer necessary because if you hit your limit, you just add in more capacity from this infinitely elastic Cloud. But even with physical limitations removed, you’re likely still facing a financial ceiling.
Cloud capacity management can help you balance your technical and financial needs to ensure you’re getting what you need and not paying more than you should.
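To make that financial ceiling concrete, here is a minimal Python sketch that estimates how many months of storage growth a fixed monthly budget can absorb. Everything in it is an assumption for illustration: the function name, the flat per-GB price, the steady compound growth rate, and the sample figures are hypothetical and do not reflect any particular provider’s pricing or any Syncsort tool.

```python
import math


def months_until_budget_ceiling(current_gb, monthly_growth_rate,
                                price_per_gb, monthly_budget):
    """Estimate whole months of growth before storage cost exceeds the budget.

    Assumes a flat price per GB and steady compound growth (both hypothetical).
    Returns 0 if the current cost is already over budget, and None if usage
    is not growing (the ceiling is never reached under these assumptions).
    """
    current_cost = current_gb * price_per_gb
    if current_cost > monthly_budget:
        return 0
    if current_cost <= 0 or monthly_growth_rate <= 0:
        return None
    # Cost after n months is current_cost * (1 + rate) ** n.
    # Solve for the largest whole n that keeps the cost within the budget.
    n = math.log(monthly_budget / current_cost) / math.log(1 + monthly_growth_rate)
    return math.floor(n)


# Hypothetical figures: 10 TB stored, growing 10% a month,
# $0.02 per GB per month, against a $400 monthly storage budget.
print(months_until_budget_ceiling(10_000, 0.10, 0.02, 400))
```

A real capacity plan would also account for tiered pricing, data transfer charges and uneven growth, but even a rough projection like this shows when “infinitely elastic” storage runs into a very finite budget.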
Check out the TDWI Checklist Report: Cloud Data-Quality Tool Considerations to review seven key considerations for organizations that are trying to decide what, if any, role the cloud should play in their data quality tool strategy.