
Engineering Machine Learning – 5 Short Webcasts to Help Create a Data Pipeline

Over the past month, Syncsort has released a series of short webcasts focused on Engineering Machine Learning. Clearing the hurdle between designing a machine learning model and putting it into production is the key to getting value back, and it is the roadblock that stops many promising machine learning projects. After the data scientists have done their part, engineering a robust production data pipeline presents its own set of tough problems. See what our experts have to say in these 15-minute webcasts.


1. Pulling in Data from Multiple Sources

The first step is consolidating data from sources all over the enterprise. The data that feeds machine learning models comes from a wide variety of physical locations, technical platforms and storage formats. Watch the 15-minute webcast.
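To make the consolidation step concrete, here is a minimal Python sketch (not Syncsort's implementation) that reads one CSV export and one JSON export and maps both into a single common schema. The source names (`crm_csv`, `web_json`) and field mappings are hypothetical; the point is that each source names the same attributes differently, so every record is renamed into one schema and tagged with its origin.

```python
import csv
import io
import json

# Hypothetical field mappings: each source names the same attributes differently.
CSV_FIELDS = {"cust_id": "customer_id", "full_name": "name", "email_addr": "email"}
JSON_FIELDS = {"id": "customer_id", "customerName": "name", "email": "email"}


def normalize(record, mapping, source):
    """Rename source-specific keys to the common schema and tag the record's origin."""
    out = {target: record.get(src) for src, target in mapping.items()}
    out["source"] = source
    return out


def consolidate(csv_text, json_text):
    """Read one CSV export and one JSON export into a single list of records."""
    records = [normalize(row, CSV_FIELDS, "crm_csv")
               for row in csv.DictReader(io.StringIO(csv_text))]
    records += [normalize(obj, JSON_FIELDS, "web_json")
                for obj in json.loads(json_text)]
    return records
```

Keeping a `source` tag on every consolidated record is a cheap decision that pays off later, when lineage has to be traced back to where each value came from.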



2. Cleansing Data at Scale

Once you’ve pulled in data from multiple sources, you need to assess the mess. Nearly every data set has flaws: missing data, misspelled data, misfielded data, and dozens of other common problems that must be repaired before the data is ready to use. Watch the 15-minute webcast.
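The three flaw types named above can each be handled by a simple rule. The following is a minimal Python sketch under assumed conditions, not a production cleansing engine. The correction table `CITY_FIXES`, the `REQUIRED` field list, and the email pattern are hypothetical placeholders. The function moves an email address found in the name field into the email field (misfielded data), replaces known misspellings (misspelled data), and flags required fields that are still empty (missing data).

```python
import re

# Hypothetical reference data; real cleansing tools use much larger lookup tables.
CITY_FIXES = {"new yrok": "New York", "chicgo": "Chicago"}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
REQUIRED = ("name", "email", "city")


def cleanse(record):
    """Return a repaired copy of the record plus a list of issues that still need review."""
    rec = {k: (v.strip() if isinstance(v, str) else v) for k, v in record.items()}
    issues = []

    # Misfielded data: an email address sitting in the name field is moved to email.
    if rec.get("name") and EMAIL_RE.match(rec["name"]) and not rec.get("email"):
        rec["email"], rec["name"] = rec["name"], ""

    # Misspelled data: replace values that appear in the correction table.
    city = rec.get("city")
    if city and city.lower() in CITY_FIXES:
        rec["city"] = CITY_FIXES[city.lower()]

    # Missing data: flag required fields that are still empty after the repairs above.
    for field in REQUIRED:
        if not rec.get(field):
            issues.append(f"missing {field}")

    return rec, issues
```

Repairs that can be made safely are applied automatically, while gaps that cannot be filled are returned as issues, so a person or a downstream rule can decide what to do with them.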




3. Finding and Matching Duplicates at Scale

When you pull in data from different sources across the enterprise, chances are you have information about the same person, company, product, or other entity in multiple records. To get a full view of the data about that entity, you must find all the related records and combine them. Watch the 15-minute webcast.
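One common way to find duplicates at scale is to combine blocking with fuzzy matching. The following is a minimal Python sketch of that general technique and is not the matching method the webcast describes. Records are compared only within a block that shares a key, which avoids comparing every record with every other record. Within each block, names are scored with `difflib.SequenceMatcher`, and matches are merged into groups with union-find. The blocking key (`postcode`) and the 0.85 threshold are hypothetical choices.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize_name(name):
    """Lowercase, drop periods, and collapse whitespace before comparing."""
    return " ".join(name.lower().replace(".", "").split())


def block_key(rec):
    # Hypothetical blocking key: only records sharing a postcode are compared.
    return rec["postcode"]


def find_duplicates(records, threshold=0.85):
    """Return groups of record indexes that likely describe the same entity."""
    parent = list(range(len(records)))  # union-find forest over record indexes

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    blocks = defaultdict(list)
    for i, rec in enumerate(records):
        blocks[block_key(rec)].append(i)

    # Pairwise fuzzy comparison happens only inside each block.
    for members in blocks.values():
        for i, j in combinations(members, 2):
            a = normalize_name(records[i]["name"])
            b = normalize_name(records[j]["name"])
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(records)):
        groups[find(i)].append(i)
    return sorted(g for g in groups.values() if len(g) > 1)
```

Blocking involves a trade-off. A record for "John Smith" with a different postcode is never compared, so it is never matched, even when it refers to the same person. Real entity resolution pipelines often use several blocking keys to reduce these misses.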



4. Tracking Data Lineage from the Source

Once you’ve found and matched duplicates to resolve entities, the next step is to track the lineage of data as it moves from source to final analysis. Lineage tracking is required by several regulations, and by the need to verify the source of final decisions or recommendations. Watch the 15-minute webcast.
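At its simplest, data lineage is a graph in which every derived dataset records its inputs and the step that produced it. The following minimal Python sketch assumes an in-memory model. The `Dataset` class and `derive` helper are hypothetical and are not part of any lineage product's API. Walking the graph backward answers the audit question: which original sources fed this result?

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    step: str = "source"  # transformation that produced this dataset
    parents: list[Dataset] = field(default_factory=list)


def derive(name, step, *parents):
    """Create a dataset that records which inputs and which step produced it."""
    return Dataset(name, step, list(parents))


def trace_sources(ds):
    """Walk the lineage graph back to the original source datasets."""
    if not ds.parents:
        return {ds.name}
    found = set()
    for p in ds.parents:
        found |= trace_sources(p)
    return found


def lineage_tree(ds, depth=0):
    """Return an indented, human-readable lineage tree, one line per dataset."""
    lines = ["  " * depth + f"{ds.name} <- {ds.step}"]
    for p in ds.parents:
        lines += lineage_tree(p, depth + 1)
    return lines
```

For example, a feature table built from a CRM export and a web signup feed, using `derive("customers_merged", "consolidate", crm, web)` followed by cleansing and matching steps, traces back to exactly those two sources.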



5. Streaming New Data as It Changes

With entities resolved and lineage tracked, the next step is to set up a continuous streaming flow of data from the data sources, so that as the sources change, new data is automatically pushed through the same transformation and cleansing data flow and into the arms of machine learning models. Watch the 15-minute webcast.
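Change data capture (CDC) is the usual name for this pattern. The following is a deliberately simplified Python sketch that uses timestamp polling. It is an assumed stand-in, not the log-based CDC that production replication tools typically use. Each cycle selects only the rows changed since the last watermark and pushes them through the same `transform` function used for the initial load. The `updated_at` field, the in-memory source list, and the `sink` list are hypothetical.

```python
def poll_changes(source_rows, watermark):
    """Return rows modified after the watermark, oldest first, plus the new watermark."""
    changed = sorted((r for r in source_rows if r["updated_at"] > watermark),
                     key=lambda r: r["updated_at"])
    new_mark = changed[-1]["updated_at"] if changed else watermark
    return changed, new_mark


def stream_cycle(source_rows, transform, sink, watermark=0):
    """Run one polling cycle: push each changed row through transform into sink."""
    changed, watermark = poll_changes(source_rows, watermark)
    for row in changed:
        sink.append(transform(row))  # same transformation as the batch pipeline
    return watermark
```

Calling `stream_cycle` repeatedly and passing back the returned watermark processes each change once. Timestamp polling cannot see deleted rows, and it can miss writes that share a timestamp. That is one reason log-based CDC is generally preferred in production.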


Looking for more on Engineering Machine Learning? Download our eBook, Five Data Engineering Requirements for Enabling Machine Learning.
