Engineering Machine Learning – 5 Short Webcasts to Help Create a Data Pipeline
Over the past month, Syncsort has released a series of short webcasts focused on Engineering Machine Learning. Clearing the hurdle between designing a machine learning model and putting it into production is the key to getting value back, and it is the roadblock that stops many promising machine learning projects. After the data scientists have done their part, engineering a robust production data pipeline has its own set of tough problems to solve. See what our experts have to say in these 15-minute webcasts.
1. Pulling in Data from Multiple Sources
The first step is consolidating data from sources all over the enterprise. The data that machine learning models need comes from a wide variety of physical locations, technical platforms and storage formats. Watch 15-minute webcast
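As a rough illustration of what consolidation involves, here is a minimal Python sketch that reads the same kind of customer data from two storage formats and maps both onto one record layout. The inline sample data, the field names and the `source` tag are assumptions made up for this example; they do not come from the webcast.

```python
import csv
import io
import json

# Hypothetical sample data standing in for two enterprise sources.
CSV_SOURCE = "id,name\n1,Acme Corp\n2,Globex Inc\n"
JSON_SOURCE = '[{"customer_id": 3, "customer_name": "Initech"}]'


def read_csv_source(text):
    """Yield records from CSV text in the common {id, name, source} layout."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"id": int(row["id"]), "name": row["name"], "source": "csv"}


def read_json_source(text):
    """Yield records from JSON text, renaming its fields to the common layout."""
    for item in json.loads(text):
        yield {
            "id": item["customer_id"],
            "name": item["customer_name"],
            "source": "json",
        }


# One list, one layout, regardless of where each record started out.
consolidated = list(read_csv_source(CSV_SOURCE)) + list(read_json_source(JSON_SOURCE))
for record in consolidated:
    print(record)
```

A real pipeline would read from files, databases or mainframe extracts rather than inline strings, but the idea of one reader per source feeding one shared layout stays the same.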
2. Cleansing Data at Scale
Once you’ve got data pulled in from multiple sources, you need to assess the mess. In nearly every data set, there will be flaws. Missing data, misspelled data, misfielded data: there are dozens of common problems that need to be repaired before the data is ready to use. Watch 15-minute webcast
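To make a couple of those problems concrete, here is a small Python sketch that handles missing values and one kind of misfielded value. The placeholder list, the field names and the email-in-the-phone-field rule are assumptions chosen for illustration, and misspellings are not handled here.

```python
from typing import Optional

# Hypothetical placeholder strings that should be treated as missing values.
MISSING_TOKENS = {"", "n/a", "na", "null", "none", "unknown"}


def clean_value(value: Optional[str]) -> Optional[str]:
    """Trim and collapse whitespace, and map placeholder strings to None."""
    if value is None:
        return None
    collapsed = " ".join(value.split())
    if collapsed.lower() in MISSING_TOKENS:
        return None
    return collapsed


def clean_record(record: dict) -> dict:
    """Clean every field, then repair one kind of misfielded value."""
    cleaned = {key: clean_value(val) for key, val in record.items()}
    # Misfielded data: an email address typed into the phone field.
    phone = cleaned.get("phone")
    if phone and "@" in phone and not cleaned.get("email"):
        cleaned["email"] = phone
        cleaned["phone"] = None
    return cleaned


raw = {"name": "  Ada   Lovelace ", "email": "N/A", "phone": "ada@example.com"}
cleaned = clean_record(raw)
print(cleaned)
```

At scale, rules like these would run in a distributed engine rather than a single loop, but each rule stays this simple: detect one flaw, repair it, move on.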
3. Finding and Matching Duplicates at Scale
When you pull in data from different sources across the enterprise, chances are that you have information about the same person, company, product, or other entity in multiple records. To get a full view of the data regarding that entity, you must find all the records that relate and combine them. Watch 15-minute webcast
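One simple way to picture this step is fuzzy matching on a normalized name, followed by a merge of each group of matches. The sketch below uses Python's standard `difflib.SequenceMatcher`; the 0.85 threshold, the sample records and the "first non-missing value wins" merge rule are assumptions for illustration, not the method shown in the webcast.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and drop punctuation so trivial differences do not block a match."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def is_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two names as the same entity when their similarity meets the threshold."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def group_duplicates(records, threshold=0.85):
    """Greedily place each record in the first group whose first member matches."""
    groups = []
    for record in records:
        for group in groups:
            if is_match(record["name"], group[0]["name"], threshold):
                group.append(record)
                break
        else:
            groups.append([record])
    return groups


def merge(group):
    """Combine a group into one record, keeping the first non-missing value per field."""
    merged = {}
    for record in group:
        for key, value in record.items():
            if merged.get(key) is None:
                merged[key] = value
    return merged


records = [
    {"name": "Acme Corp.", "city": "Boston", "phone": None},
    {"name": "ACME Corp", "city": None, "phone": "555-0100"},
    {"name": "Globex Inc", "city": "Austin", "phone": None},
]
groups = group_duplicates(records)
entities = [merge(group) for group in groups]
print(entities)
```

Comparing every record against every group does not scale to enterprise volumes; production matching typically narrows the candidates first, which is part of what makes this step hard.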
4. Tracking Data Lineage from the Source
Once you’ve found and matched duplicates to resolve entities, the next step is to track the lineage of data as it moves from source to final analysis – which is required by several regulations and by the need to verify the source of final decisions or recommendations. Watch 15-minute webcast
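At its simplest, lineage means every value carries a record of where it came from and what was done to it. Here is a minimal Python sketch of that idea; the `TrackedValue` class, the step names and the `crm_export.csv` source name are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TrackedValue:
    """A value plus its original source and the ordered steps applied to it."""

    value: str
    source: str
    lineage: List[str] = field(default_factory=list)

    def apply(self, step_name: str, func: Callable[[str], str]) -> "TrackedValue":
        """Return a new value with the step appended to its lineage."""
        return TrackedValue(
            value=func(self.value),
            source=self.source,
            lineage=self.lineage + [step_name],
        )


raw = TrackedValue(value="  ACME corp ", source="crm_export.csv")
final = raw.apply("strip_whitespace", str.strip).apply("title_case", str.title)
print(final.value, final.source, final.lineage)
```

Because each step returns a new object, the original value and its empty lineage stay intact, so an auditor can compare the starting point with the final result.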
5. Streaming New Data as It Changes
Building on the process of finding and matching duplicates to resolve entities, the next step is to set up a continuous streaming flow of data from data sources so that, as the sources change, new data is automatically pushed through the same transformation and cleansing flow and into the arms of machine learning models. Watch 15-minute webcast
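The key point is that streamed records go through the same steps as the original batch. The Python sketch below shows that shape with a generator; the `cleanse` and `score` functions are simple stand-ins invented for this example, and the in-memory list stands in for what would normally be a message queue or change data capture feed.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, str]


def cleanse(record: Record) -> Record:
    """Stand-in for the batch cleansing step: trim whitespace in every field."""
    return {key: value.strip() for key, value in record.items()}


def stream_pipeline(
    changes: Iterable[Record], steps: List[Callable[[Record], Record]]
) -> Iterator[Record]:
    """Push each incoming record through the same steps, one record at a time."""
    for record in changes:
        for step in steps:
            record = step(record)
        yield record


def score(record: Record) -> int:
    """Hypothetical stand-in for a model: scores a record by comment length."""
    return len(record["comment"])


# Hypothetical change events arriving from a source system.
changes = iter(
    [
        {"id": "1", "comment": " late delivery "},
        {"id": "2", "comment": "great"},
    ]
)
scores = [score(record) for record in stream_pipeline(changes, [cleanse])]
print(scores)
```

Since the generator pulls one record at a time, nothing has to wait for a full batch; each change is cleansed and scored as soon as it arrives.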
Looking for more on Engineering Machine Learning? Download our eBook, Five Data Engineering Requirements for Enabling Machine Learning, today.