Uniting Data Quality and Data Integration
Syncsort has recently added several companies to its portfolio, and an astute observer may notice that the product offerings from these companies are complementary. We are not planning to make any changes to these products. Our customers will continue to use them in their production systems, and we will continue to focus on providing excellent support.
On the other hand, a tighter integration of some of these products will be beneficial to many customers. Case in point: joining key data integration and data quality functionality. That’s what we did in creating Trillium Quality for Big Data.
About our Data Quality Technology
Syncsort’s Trillium data quality software has been a leader in data quality for many years. In fact, Gartner named Syncsort a Leader in its 2017 Magic Quadrant for Data Quality Tools for the twelfth consecutive year – every year since Gartner started publishing it.
Why? Gartner and customers alike value solid, stable core data quality functionality such as parsing, standardization, and cleansing, along with additional key capabilities like profiling, interactive visualization, matching, multi-domain support, and business-driven workflow.
Trillium Quality allows users to cleanse and de-dupe customer records. Why is this so important? Data scientists spend a significant amount of their time cleansing and de-duping data before running any analytics or machine learning algorithms on it. And if you want a Customer 360 view, you need de-duped customer records.
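To make the cleanse-and-match idea concrete, here is a toy sketch of record de-duplication: standardize the fields, then treat records as duplicates when a fuzzy name comparison clears a threshold and the postal codes agree. This is a generic illustration only, not Trillium’s actual algorithms; the field names and the 0.9 similarity threshold are assumptions for the example.

```python
from difflib import SequenceMatcher

def normalize(record):
    """Standardize fields before matching: lowercase, trim, collapse spaces."""
    return {k: " ".join(str(v).lower().split()) for k, v in record.items()}

def similar(a, b, threshold=0.9):
    """Fuzzy string comparison; the threshold is an illustrative choice."""
    return SequenceMatcher(None, a, b).ratio() >= threshold

def dedupe(records):
    """Keep the first record in each group of near-duplicate customers."""
    unique = []
    for rec in map(normalize, records):
        if not any(similar(rec["name"], u["name"]) and rec["zip"] == u["zip"]
                   for u in unique):
            unique.append(rec)
    return unique

customers = [
    {"name": "John  Smith", "zip": "02139"},
    {"name": "john smith", "zip": "02139"},   # near-duplicate survivor merge
    {"name": "Jane Doe", "zip": "10001"},
]
print(len(dedupe(customers)))  # 2
```

Real data quality engines use far richer parsing, reference data, and matching rules than this, but the shape of the problem – normalize, block, compare, survive – is the same.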
The cleansing and matching processes are set up as a Data Quality project in the Trillium Control Center UI. Users can then run the project on a stand-alone server. Nowadays, processing millions of customer transactions on a daily basis is very common. In some cases, the project may run for several hours and may not meet end-user SLAs.
About our Data Integration Technology
Syncsort’s DMX-h is a high-performance ETL tool that allows users to develop an ETL data flow once and run it on a standalone server or on a Big Data platform like Hadoop. They can even switch between distributed computing paradigms like MapReduce and Spark without making any changes to the data flow. The DMX-h Intelligent Execution (IX) engine dynamically produces an optimal execution plan for the chosen computing paradigm, on premises or in the cloud.
Typically, users develop an ETL data flow with the DMX-h UI as a DMX-h job. They verify that the job is set up correctly by running it on a small subset of data on a standalone server. Once it is debugged for correctness, they deploy it to run on distributed platforms on premises or in the cloud. Running the job on a cluster of tens or hundreds of compute nodes brings tremendous horizontal scaling.
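The deferred-platform idea can be sketched in miniature: write the business logic once as an ordinary function, and choose the execution backend only at run time. This is a simplified analogy, not DMX-h’s actual engine; the `backend` names and the thread-pool stand-in for a cluster are assumptions for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def transform(chunk):
    """The business logic: written once, unaware of where it will run."""
    return [x * 2 for x in chunk]

def run(chunks, backend="local"):
    """Pick the execution platform at run time, not at design time."""
    if backend == "local":                      # debug on a small subset
        return [transform(c) for c in chunks]
    with ThreadPoolExecutor() as pool:          # stand-in for scaling out
        return list(pool.map(transform, chunks))

data = [[1, 2], [3, 4]]
# Same data flow, same results, different execution platform:
assert run(data, "local") == run(data, "parallel") == [[2, 4], [6, 8]]
```

Because the logic never mentions its platform, validating it locally on a sample and then pointing it at a larger backend requires no changes to the data flow itself – which is the essence of “Design Once, Run Anywhere.”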
Putting it all Together with Trillium Quality for Big Data
Over the last several months, Trillium and DMX-h engineers collaborated to build a solution that puts Trillium Quality on steroids by allowing it to run under DMX-h IX. The fruit of their hard work is Trillium Quality for Big Data.
With Trillium Quality for Big Data, users design the Quality project in the Trillium Control Center. There is no change to that. They then choose to deploy the project to the Hadoop edge node. On the edge node, they run a shell script that automagically converts the deployed Trillium project to a DMX-h job and runs it on the MapReduce or Spark framework, as the user chooses. They still have the option to run the project standalone on the edge node. The “Design Once, Run Anywhere” mantra of DMX-h is now available for Trillium Quality projects. By deferring the choice of platform until run time, users can concentrate on the business logic of the project at design time.
The horizontal scaling of Trillium Quality projects is phenomenal. During our in-house testing, we saw projects that took hours to run stand-alone finish in minutes when run on a cluster. Already, a large European bank is testing this integrated solution on their transactional data.
Stay tuned for more exciting integrated product offerings from Syncsort that leverage our data integration and data quality technology in the near future!
Check out our eBook on 4 ways to measure data quality.