Interview with Donato Diorio of RingLead on Big Data quality
RingLead is a firm that offers a suite of products to cleanse, protect and enhance company and contact information. The RingLead website suggests that data quality is a particular issue for CRM. Is there a reason for the focus in this area?
Today, many businesses hold their data in their CRM. “CRM” could just as easily be replaced with “database,” “marketing automation system,” or “data silo” – pick your term. The CRM is the repository, and that’s where RingLead comes in. We work with large sets of data to keep them clean and high quality.
RingLead has developed integrations with several cloud products, including Salesforce, Marketo and others. What technical challenges had to be overcome to develop these integrations?
Every system has its own language and its own rules for how you must integrate with it. For example, to write an application that is native inside of Salesforce, you must understand Apex programming. For Microsoft, you are writing in .NET languages. If you’re connecting with Marketo, you’re dealing with APIs or webhooks. The challenge is making sure that you’ve got someone who is talented and knows the space. When we started working in the Salesforce space, we enlisted the help of a Salesforce specialist, Michael Farrington. In-house experts are necessary in each of these systems in order to help unlock those challenges. When you have all of these different systems, how do you best develop for them? If you develop with the end in mind that you’re going to be integrating with multiple systems, you design differently from the start. For example, our products integrate with Salesforce, Marketo, Excel sheets, ODBC databases, and more.
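Designing “with the end in mind” for multiple target systems typically means putting each system behind a common interface, so the cleansing logic never cares where a record came from. This is a minimal illustrative sketch, not RingLead’s actual architecture; every class and field name here is hypothetical:

```python
# Hypothetical adapter-style sketch: each integration target (Salesforce,
# a CSV/Excel export, an ODBC database, ...) implements one interface,
# so downstream data-quality code sees a single record shape.
from abc import ABC, abstractmethod


class ContactSource(ABC):
    """Common interface every integration target implements."""

    @abstractmethod
    def fetch_contacts(self) -> list[dict]:
        ...


class SalesforceSource(ContactSource):
    def fetch_contacts(self) -> list[dict]:
        # A real adapter would call the Salesforce REST API or run as Apex;
        # a canned record stands in for that here.
        return [{"email": "a@example.com", "system": "salesforce"}]


class CsvSource(ContactSource):
    """Stands in for an Excel/CSV export already parsed into dicts."""

    def __init__(self, rows: list[dict]):
        self.rows = rows

    def fetch_contacts(self) -> list[dict]:
        return self.rows


def gather(sources: list[ContactSource]) -> list[dict]:
    """Merge contacts from every source into one uniform list."""
    contacts = []
    for src in sources:
        contacts.extend(src.fetch_contacts())
    return contacts
```

Adding a new integration then means writing one new adapter class, rather than reworking the cleansing pipeline.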
What additional features should the integration partners “unlock” to improve data quality?
You have to take an assessment-first approach. One of the challenges in data quality is the iterative approach. Tell me about your data. What happens here? What happens there? You go back and forth, and pretty soon you’ve got a lot of time wasted. An assessment-first approach is going to unlock what the issues are so you can dive in and solve each of the data quality issues.
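The assessment-first idea can be made concrete with a single upfront scan that surfaces every class of issue at once, instead of the iterative back-and-forth described above. The sketch below is illustrative only; the field names, regex, and report categories are assumptions, not a RingLead product feature:

```python
# Hypothetical "assessment-first" pass: one scan over the records that
# counts missing, invalid, and duplicate emails in a single report.
import re

# Deliberately simple email check for illustration; real validation is
# considerably more involved.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def assess(records: list[dict]) -> dict:
    """Return counts of common data-quality issues in one pass."""
    seen_emails = set()
    report = {
        "total": len(records),
        "missing_email": 0,
        "invalid_email": 0,
        "duplicates": 0,
    }
    for rec in records:
        email = (rec.get("email") or "").strip().lower()
        if not email:
            report["missing_email"] += 1
        elif not EMAIL_RE.match(email):
            report["invalid_email"] += 1
        elif email in seen_emails:
            report["duplicates"] += 1
        else:
            seen_emails.add(email)
    return report
```

With a report like this in hand, each issue can be tackled deliberately rather than discovered one painful iteration at a time.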
When writing about Big Data, Volume – Velocity – Variety are the three main V’s. Data Quality usually brings up the rear under “Veracity,” “Validity” or “Value.” What trends are you seeing around Big Data Quality?
Big data is not about the amount of data you have; it’s about what you do with it. The vision of possibility is exciting. There are opportunities for big data in every area imaginable – in what people eat, what they do, how they sell, even what their heartbeat is. The opportunity is in the analysis. Big data is fascinating, and there’s lots of opportunity.
Can Big Data analytics platforms be used to perform internal studies of data quality?
The assessment-based approach enables you to easily determine the problems you’re facing, come up with suggestions and move on. It’s not an iterative approach that sucks time away from everybody.
Is there one single thing that data stewards could do to improve data quality, or is it more a matter of developing a group of initiatives including training?
Take a single approach: understand the core issues of your data, break those down into components, and start tackling them in the right order. When I engage with a company and am brought in on its big data issues, I need to involve sales, marketing and IT. You need group buy-in and change management.
How many of the problems typically encountered around data quality can be improved through automation alone?
To make positive change, you need a combination of people and technology. Find the right mix of automation and human interaction. Humans drive it and define it, while technology automates it. People are very necessary in the automation process. For example, don’t automate the analysis of your analytics. People must do that. As Bill Gates says, “The first rule of any technology used in a business is that automation applied to an efficient operation will magnify the efficiency. The second is that automation applied to an inefficient operation will magnify the inefficiency.” Automate what you have perfected. Leave the rest to the people, and figure out that right balance between the two.
What role in the enterprises you service is most often assigned responsibility for data quality?
It really depends. I sometimes see marketing operations or sales operations, but it typically comes down to the department that owns the technology. Also, it depends on who has the budget, of course. It’s going to differ in each organization, but it is often a matter of determining who most feels the pain. Mark Cuban once said, “Don’t take advice from anybody that doesn’t have to live with the consequences of the decision.” This applies to data quality in any organization. If you’ve got to live with the consequences, you’re going to get buy-in; you’re going to alleviate that pain.
Should we expect that Big Data quality problems will scale linearly? Or will Big Data promote geometric effects on data quality?
Data quality impacts all aspects of the organization. There are hundreds of impacts of bad data that fall across all departments from the top down and the bottom up. The best way to tackle this is to work together as a team, no matter which department is more impacted, to get the necessary buy-in and implementation to fix the data issue across the entire organization.
If ETL tools are part of your data quality pipeline, check out the offerings at Syncsort.com such as Ironcluster Hadoop ETL for Amazon EMR, which can connect to virtually any data source including RDBMS, mainframe, HDFS, Salesforce.com, Redshift and S3.