Data Integration

Last week, I attended two days of sessions at GigaOM’s Structure:Data Conference in New York City, where over 700 attendees came together to discuss the business- and industry-transforming nature of Big Data and the latest technologies and approaches for managing it.

What struck me this year is how the conversation has evolved. Big Data is no longer treated as an infrastructure-only issue; there is now a realization that the Big Data stack requires contributions from every layer, from the infrastructure at the bottom up through the applications at the top.

The following key themes emerged from the onsite discussions and will be the focus as the community continues to develop the Big Data stack:

1) It’s all about high-performance computing and speeding up analytics as data volumes grow exponentially. The pain points for unstructured versus structured data are different. While unstructured data requires better visualization, structured data requires more cleansing, which makes filtering and grouping much more critical. One of the speakers referenced a quote from Clay Shirky: “Information overload is not the problem. It’s filter failure.”

2) The line between personal and business behavior is blurring as analytics moves out of the IT realm and into the hands of business users. As a result, there is an expectation that data will be delivered in forms that are easier to consume, such as through visualization and collaboration capabilities.

3) Real-time decision making through predictive analytics and machine learning is becoming essential, driven by sensor data, digital exhaust, and the need to gain insight into consumer behavior.

As such, there’s a realization that the Big Data market is fragmented, and there is plenty of opportunity to contribute to building the Big Data stack. Software packages and tools need to be built on top of Hadoop, for example, to increase enterprise adoption. Currently, most of the available enterprise software is proprietary. Offering applications layered on top of Hadoop will spur the Big Data market, leading to more open source contributions and additional opportunities for startups.

Syncsort has a lot to offer in the areas of performance, data integration, and processing – all critical components of the Big Data stack. We can deliver and run ETL over Hadoop without requiring a brand new development team and skill set. One of the speakers suggested that businesses should consider adopting Hadoop only if they are willing to dedicate a separate team. Syncsort’s offering eliminates this requirement for the enterprise. We can also efficiently move data in and out of Hadoop, which, as John Webster points out in his CNET post, continues to be an issue.

To reach the holy grail of Big Data management, the focus needs to be on building a top-to-bottom Big Data stack, which will require different segments of the market to come together.

{ 0 comments }

I recently came across a blog post from Susan Hall over at IT Business Edge on the “Seven Keys to Becoming a Data Integration Expert.” Naturally, the headline caught my attention and I soon learned that it was based on a recent post from David Linthicum on “Obtaining Mad Data Integration Skills.”

As I read through both of these posts, I started thinking: instead of keeping the order in which the seven keys were originally listed, what if I ranked them by how much time and money they cost organizations during an average month? Here is what I came up with:

  1. Performance
  2. Data governance (this could arguably be number 1, but most organizations I have seen aren’t really doing wholesale data governance)
  3. Security
  4. Rules and routine
  5. Database concepts
  6. Interfaces to data
  7. Data mediation and transformation

David’s performance criterion is “…the ability to define how a data integration solution will perform over time. This is very important.” I couldn’t agree more! Building performance and scalability into a DI approach is important not only today, but also for the Big Data requirements of the future. David goes on to say that many DI approaches “become useless after several years.”

We see this every day with our customers and partners. When they’ve hit the wall with their current approach, they often try one or more of the following:

  • Add hardware (CPU, memory) – this is expensive, adds to the software cost, and usually does not scale linearly
  • Fine-tune the approach/tool – this requires very senior IT staff and/or highly skilled (read: expensive) consultants from the vendor
  • Rip out the logic and push it into the database – now you have an ELT approach, with the cost and complexity pushed into hundreds of lines of SQL and PL/SQL

Syncsort helps customers solve their performance and scalability issues without needing to resort to stop-gap measures that accelerate costs.

Thanks to Susan and David for their posts and the inspiration they provided me to write this one. I look forward to following the discussion on their blogs and reading what they write about next. In the meantime, feel free to leave a comment or challenge me if you want to debate the way I’ve ranked the list above.

{ 2 comments }

Recently, I had the opportunity to attend and present at a conference in Madrid commemorating the 25th anniversary of NessPro, a key Syncsort partner in the EMEA region. More than 100 IT professionals attended the event at La Torre de Cristal (The Tower of Glass), at the core of Madrid’s financial district in the neighborhood of La Castellana. The theme of the event was increasing business profitability while reducing costs.

It’s hard to imagine a more appropriate topic for an event being held in a country struggling with growing debt and high unemployment. Indeed, the financial crisis has underscored the need to cut costs and improve productivity. However, there was a great sense of optimism and excitement on the 50th floor of La Torre de Cristal. As another presenter mentioned, the word crisis in Chinese is the combination of two characters: danger and opportunity.

There is definitely an opportunity today for organizations to optimize business processes. Information holds the keys to reduce inventory costs, adapt in real-time to customer demands, streamline operations, uncover new business opportunities, improve productivity, and more. That’s why organizations continue to invest significantly in IT solutions that enable them to uncover a vast array of opportunities.

This is especially true in the competitive banking industry. For instance, RSI (Rural Servicios Informaticos), a joint NessPro and Syncsort customer, presented at the event on how the company is using Syncsort technology to achieve up to 9x savings by gradually migrating most of their applications off the mainframe to open systems. Today, RSI supports Grupo Caja Rural, one of the top 5 banks in Spain with more than 7 million customers, running 75 percent of their batch loads on the open systems. Syncsort’s technology is delivering up to 60 percent faster processing times and is enabling significant savings – helping RSI remain competitive and successful during challenging times.

In the end, opportunity is not just about money. While reducing costs is an opportunity to increase profitability and competitiveness, I was very pleased to see that for my Spanish colleagues, there’s also a strong sense of corporate responsibility. There was a lot of discussion about reducing energy consumption and minimizing waste to preserve the environment, and ultimately bring sustainable economic growth and leadership back to Spain. There are green initiatives in many areas such as reducing paper use, printer toner, PC power consumption, and more.

Special thanks to all my colleagues at NessPro Spain for a very exciting event and equally productive week with the characteristic Spanish hospitality. And of course, many thanks to Alberto, who not only worked closely with me to ensure a successful presentation, but also volunteered to take me through a culinary “tapas” tour at the popular Mercado de San Miguel. For me, it was the perfect close to a great day of talking about technology and the opportunities ahead.

I wish nothing but the best for Spain and its people. I have no doubt they will emerge stronger from the challenges they have been facing. ¡Hala, España!

{ 0 comments }

Earlier this week, as millions of people in the United States went to the polls to cast their votes on Super Tuesday, I was busy getting ready to present with Forrester’s Sebastian Selhorst and Noel Yuhanna on “Big Data Integration: Achieving Positive ROI in the Era of Exponential Data Growth.”

It seems like the buzz about Big Data is everywhere lately. By now, you’ve likely read all about how Big Data is about the 3 V’s of volume, velocity and variety. However, nothing brings a topic like Big Data to life for me like a real-world example. Think for a minute about Super Tuesday. Not so long ago, people would go out and vote during the day. Later that night they might turn on the television at home for a hint at the early returns. With a bit of luck, the results would be published (and hopefully accurately!) in time to read about it over breakfast in the morning newspaper.

Taking a closer look at this scenario, the volume of information was relatively low in the form of a few pages in the newspaper. Variety was also low with information primarily coming from the newspaper, radio and television (and maybe that annoying neighbor who takes pride in being the first to know and share everything!). Velocity was also low in that you had to wait a good amount of time to get any meaningful data and results.

Fast forward to the present day. While millions of people still go out to vote, they access the latest information and trends in real time on Facebook, Twitter, and maybe an online news source like www.nytimes.com or www.cnn.com. In many cases, they are accessing this data from their iPhone or BlackBerry, or maybe even their iPad.

In addition to consuming information, they are also creating it. They tweet, post on Facebook, and share their views and preferences by commenting on blogs and pushing the ‘like’ button. They are able to track results almost in real time, and chances are they already know which candidates have won (or are very likely to win) which states before turning on the television at night to watch their favorite reality program or sporting event (I won’t even get into how the breakdown of statistics on ESPN contributes to the Big Data phenomenon!).

Clearly, volume is so vast that you could not fit all the articles, comments, photos, podcasts, etc. into a year’s worth of newspapers from the good old days! Variety is high with data coming from a myriad of sources including social media, mobile devices, blogs, newspapers, and more. Velocity is also high, with data coming in every second. That is exactly how Big Data is shaping our lives!

For politicians, the ability to leverage all of this data and the different channels of information can be a decisive factor in gaining a competitive advantage in an election. Of course, not all of us are fortunate (or unfortunate, depending on your view!) enough to be politicians. Whether in politics or business, it seems that an organization’s survival today is highly dependent on the ability (or lack thereof) to efficiently and cost-effectively leverage the enormous amount of information being created on a daily basis.

While it is hard to deny the impact that Big Data is having on all aspects of our daily lives, many organizations don’t think of themselves as ‘Big Data’ companies. As a result, I often get the question, “Is Big Data relevant to me?” when speaking to customers and prospects about their data integration challenges. This is an easy one and I almost always respond, “Yes, it is!” The reality is that most organizations are spending more time and resources to maintain their data integration SLAs. They are in need of tools that are easier to use and increase staff productivity. They need these same tools to be able to help them process more data in less time and with fewer resources. Ultimately, they need a more cost-effective solution that meets their expectations!

If these ideas resonate with you, I’d strongly encourage you to check out the replay of our ‘Big Data’ Tuesday webinar with Forrester. Additionally, you can download a full copy of Forrester’s report, “The Total Economic Impact of Syncsort DMExpress,” and play with our online calculator to get a better understanding of the positive impact that DMExpress can have on your business. If you have any questions or want someone from Syncsort to walk you through the online calculator, feel free to leave a comment and I’ll be happy to help connect you.

{ 0 comments }