Jorge A. Lopez

(Or 5 things anyone playing with elephants should know)

A couple of weeks ago, during the Strata Conference in Santa Clara, Syncsort announced a significant milestone in our quest to make Hadoop a more mature environment for the enterprise. As many of you know, we’ve been working closely with the Hadoop community for some time now, but you may wonder… how did it all start?

Well, the truth is that some of our customers were fascinated by the enormous potential of Hadoop from the very beginning. However, walking an uncharted path is never easy. Therefore, we worked with many of them to identify the most common barriers to wider Hadoop adoption within the enterprise. As early adopters and loyal Syncsort fans, our customers also helped us find many areas of opportunity to solve these issues (hint: stay tuned for more).

Over the past few weeks, we’ve seen more and more organizations calling out some of these obstacles as they get their hands dirty with Hadoop, so we thought it would be a good idea to document and share our findings.

So without further ado, I give you, hot off the press, our new eBook: 5 Pitfalls to Avoid with Hadoop. Now you can’t say you didn’t know!

P.S. Anything you want to add to this list? Please let us know.


For several years now I’ve been lucky enough to attend and present at MicroStrategy’s biggest user conference, MicroStrategy World, where thousands of professionals from many industries and places around the world meet for a week to talk about the latest trends in technology and, more specifically, business intelligence. (In the spirit of full disclosure, I’m a former MicroStrategist.)

This year, the meeting place was the Wynn in Las Vegas. As always, it was great to catch up with friends, colleagues, customers and partners. It was equally exciting to see how business intelligence continues to reinvent itself; this year it looks more energized than ever, thanks to advancements in mobile, social, cloud and Hadoop.

I’m sure there will be countless blogs and commentary about conference happenings. Therefore, I want to provide what I hope are different and valuable “takeaways” from the conference: a view through the data integration looking glass. So here we go:

1. Mobile and Social Intelligence are key drivers for Big Data. Mobile and social media are creating unprecedented amounts of information. Every Facebook check-in, every like and every comment on social media provides valuable information about consumer preferences, sentiment, habits, networks, etc. Organizations that can leverage this data will definitely have an edge over the competition.

2. Transforming data is the key to the fourth “V” in Big Data. Transformations (the “T” in ETL) remain one of the most critical challenges organizations face today as they try to leverage Big Data. With the increasing volume, velocity and variety of data, what will become even more important is finding ways to capitalize on the elusive fourth “V”: value. Organizations capable of transforming more data in less time, with fewer resources, will be able to answer more “big questions” and provide better products and services to their customers.

3. The elephant in the room is Hadoop. Hadoop has quickly emerged as the framework of choice for Big Data processing and analytics. As such, it is playing a key role in making data processing affordable and disrupting the status quo. During his presentation at “World,” Amr Awadallah, CTO and co-founder of Cloudera (a Syncsort partner), talked about how companies are offloading the “T” from expensive proprietary databases to Hadoop. Such a move can cut processing costs from as much as $100K per TB to as little as $1K per TB. As organizations implement Hadoop initiatives as a means to scale and reduce costs, they will need technologies to help them unlock Hadoop’s potential.

4. Don’t blame the messenger. In many cases, poor BI performance and stale data are really a data integration problem. Unfortunately, users often blame the tool that presents the information, in this case the BI tool. However, more people are starting to realize that behind every successful BI or data warehousing project there’s a strong ETL foundation. This is especially important when it comes to keeping BI data fresh. Therefore, it is critical to build a high-performance, scalable ETL environment that can seamlessly grow to suit the future needs of your organization.

5. Big Data requires new approaches. During a presentation by Netflix, one attendee asked why Netflix wasn’t using an enterprise data warehouse (one that is known to be very scalable but also expensive) for Big Data. The presenter’s answer was simple, but deadly: that data warehouse would never be able to reach the levels of scalability Netflix required. My take? Not all organizations manage petabytes of data the way Netflix or comScore (another Syncsort customer) do. However, they can still benefit from Big Data architectures. As organizations evolve their data processing environments, it’s important to adopt smarter approaches to data integration. A smart approach is one that will scale with the requirements of the business and will deliver results for fewer dollars and with fewer resources. This is exactly what a leading healthcare organization (another Syncsort customer) presented at “World” this year. As they migrated from their legacy ETL tool, they gained faster performance, better standards and best practices, faster deployment times, and enhanced scalability for future growth.

2013 is indeed looking like the year of Big Data, and MicroStrategy World provided more proof of how organizations are quickly embracing the “new normal.”


I’d like to share with you several short videos (no longer than a minute each) with observations from me and my colleagues who attended MicroStrategy World. The videos are short and unedited, and they provide an uncensored look at what happened at the show. And yes, there’s one of a partner juggling our branded Syncsort oranges, reflecting our theme of keeping Big Data fresh (his juggling is not bad at all). Enjoy, and let us know if you have suggestions for what you’d like to see at future shows.


‘Big Data’ was probably one of the most used and abused terms in the IT industry throughout 2012. Conferences, publications, vendors and analysts alike talked tirelessly about the opportunities and challenges created by Big Data. However, what is really amazing is the speed at which organizations are trying to learn and assimilate radically new architectures to process their data. For instance, recent estimates from IDC and GigaOM predict the Big Data market will reach $26B or more by 2016. It seems like 2012 was all about experimentation and setting expectations. Now, in 2013, it’s time to walk the talk! So, what can organizations expect as they embark on their Big Data journey? Well, for starters, it’s important to recognize some of the key challenges they will face.

The Big Data skills gap. Hadoop has emerged as the de facto platform to process Big Data. However, as my colleague Steve Totman pointed out in a recent ITWorks blog, technical skills in Hadoop, MapReduce and all things Big Data are becoming more and more expensive and difficult to find. Therefore, it is critical for organizations to find tools that leverage skills that already exist in-house (writing Java, designing data processing flows with a GUI) to take advantage of this highly scalable framework.

Weed out the noise. Early last year, Gartner pointed out the relationship between noise and Big Data. I couldn’t agree more. As companies start to collect, store and process Big Data, they need to be very careful to filter out bad data. As noise grows, the value of Big Data goes down exponentially. Therefore, organizations will need the right tools not only to connect to all relevant sources of data, but also to pre-process and cleanse that data before loading it into their data processing frameworks, which will most likely be a Hadoop environment.

It’s the economy, stupid. OK, OK, I knew that line would catch some attention, but it’s actually true. What we’re seeing is a shift from big, heavy architectures that demand exponential costs just to keep up (a.k.a. scale to meet the demand for more data) to seemingly low-cost, highly scalable approaches like Hadoop. While Hadoop can scale much more cost-effectively by adding commodity hardware (nodes), organizations will hit a wall at some point. Think about maintenance, cooling, power and even real-estate costs. As Hadoop implementations grow, tools that can maximize the performance of each node will become a critical factor of success.

2013 could be the year Big Data technology gains real traction, and this will only ramp up Hadoop adoption. Organizations looking to reap the benefits of Big Data have to be smart about their Big Data strategy, not only as it pertains to Hadoop (yes, Hadoop is not the holy grail for everything), but as they move along a path that should eventually lead them to answer the big questions. After all, isn’t that what Big Data is all about?
