Expert Interview (Part 1): Kenny Scott on the Challenges of Data Management
At the Collibra Data Citizens event in May of this year, Paige Roberts had a chance to speak with Kenny Scott, a Data Management Consultant. In part one of this two-part series, Roberts and Scott speak about some of the challenges that come along with being a data management consultant.
Roberts: Let’s start off with an introduction of yourself for our readers.
Scott: My name is Kenny Scott. I started in banking 29 years ago, in the trustee department, providing a regulatory service to large investment companies. After a few years I moved to London, returning to Edinburgh three years later to take up a role in a newly formed Business Systems Team, working with business objects and database creation, and forming the bridge between the business and technology teams. There was a considerable amount of diverse data, from shareholder registers to complex market derivatives data, at a time when data governance was not as prominent a subject as it is now.
Looking back with the experience I have now, there were a few practices that would definitely not be allowed in today’s data environment.
Roberts: Wild, wild west.
Scott: I spent some time in Luxembourg, bringing data into a complex monitoring tool. I was working with a lot of good people, but it was quite a maverick environment. Around this time I started working on data quality. An opportunity came up for a business intelligence role. I didn't get it, as another candidate had better experience, but they liked my approach and attitude and said, "How would you like to be a Metadata Manager? No one knows metadata, but you can find out what it is and deliver it."
You can figure it out. [Laughter]
Within six months, I had a handle on metadata management and started to deploy it across the organization. We implemented hotkey functionality, which brought up the definitions and linked them to business processes. When we started, there were 25,000 business terms. That was too many, so we removed them and went back to basics. By the time I left, the organization had 600 approved business terms instead of 25,000 terms of little value.
After a year, I was given a data quality team to look after, because the data quality manager left. That’s when I started using Syncsort’s Trillium Data Quality software.
When was this?
Four and a half, five years ago. Ever since then I’ve been ensconced in Trillium software. The metadata and Trillium software were working together. They’re very complementary, as I was talking about in the presentation.
After a reorganization I found myself looking for new challenges, and took the opportunity to go the contracting route for a few years.
What problems did you have when you arrived? What was the driver for them to hire you?
There was a Data Foundation program already in place. The company was finding it difficult to attract anyone with data quality experience, especially anyone with exposure to Trillium software. They saw my CV and thought, "Well, somebody knows Trillium," because that's the tool they had bought.
They were using Trillium Data Quality software, and they had gotten training so they could produce some business insights, but they were giving bare figures to senior business managers: 80% and less, 60% and less, 40% and less. Businesses don't care about figures; they want to know the problem. They want it written back to them in English.
One of the challenges was that previous quality assessments could not be replicated due to a lack of documentation and process. They had done a huge customer analysis, and it gave some great insight and narrative, but nobody could tell us how they had cut the data, how they had sourced it, how it had been sliced, or what rules they had used to arrive at that focus.
The key is that everything we do is documented with a standard operating process. Those processes are used every time. If people find a better way of doing something, we tweak it and make it better, but we've written everything in such a way that anyone can come in off the street and, as soon as they have access to the data, run the processes.
What has really been your major stumbling block in implementation?
Access to data. Getting the data out of systems is hard when nobody wants to share anything because it could pull down the system or hamper performance, even in non-production environments. That's changing now, because they see the value of what we're doing and that there's no tax on the server.
You had to explain to them that Trillium software would not mess with their performance before they would let you pull the data?
Yeah. Those were the problems on the data governance side, which wasn't my task. What we were doing with data quality started to validate what we were doing in data governance.
When you say what you’re doing with the quality, you’re talking about the discovery aspect of it? You’re discovering the problems, and you could say, “Look, here they are.”
Yes, absolutely. What we do is focus on the customer data funnel, where we get people from in the business. For example, we were told that address line two must be kept blank. It's actually in the manual to leave that blank. Who leaves a critical field like address line two blank?
Why do you have address line two if you're never going to put data in it?
Exactly. [Laughter] Because you've got an inconsistency in your data mastering platform, you've got to make your algorithms more complex to harvest the data. The market sector we were working in was agriculture and farms in the UK. Normally you've got a house, a street, and a street type. Here you've effectively got a house name, a farm name, an area name, and a town. There are no numbers in those four strings, and when you're working with name and address data you look for numbers in the patterns.
And if you don't have those patterns, you can't find them.
They're also doing things like putting name information, Mister and Missus, into the address fields. "Mr. Scott" could be the first line of the address. You've got names and addresses all over the place. That's why we need consistency.
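The inconsistencies Scott describes here, name data leaking into address fields and rural addresses with no house numbers to anchor on, can be illustrated with a minimal sketch. This is not from the interview or from Trillium software; the function names and the list of title prefixes are hypothetical, purely to show the kind of pattern check involved.

```python
import re

# Hypothetical helpers (not from the interview) illustrating the pattern
# checks Scott alludes to: spotting name data in address fields, and
# noticing that rural UK addresses often carry no house number at all.
TITLE_PREFIXES = {"mr", "mrs", "ms", "miss", "dr"}

def looks_like_name(line: str) -> bool:
    """True if an address line appears to hold a person's name ("Mr. Scott")."""
    words = line.split()
    if not words:
        return False
    return words[0].lower().rstrip(".") in TITLE_PREFIXES

def has_house_number(line: str) -> bool:
    """True if the line starts with a house number ("12 High Street")."""
    return bool(re.match(r"\s*\d+", line))

# A farm address like "Hillside Farm" matches neither pattern, so simple
# number-based parsing rules find nothing to anchor on.
for line in ["Mr. Scott", "12 High Street", "Hillside Farm"]:
    print(line, looks_like_name(line), has_house_number(line))
```

A real data quality tool would go far beyond prefix and digit checks, but even this simple sketch separates lines that are clearly names, clearly numbered addresses, or neither.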
That’s a challenge.
That is the biggest challenge. Actually, with all of the data, I reckon we could get a 96-97% exact match to a postal address if we just structured it correctly, which is a pretty good place to be.
Make sure to check out part two, where Roberts and Scott speak about the final stages of a data consultant's project and where Scott's next move will take him.
Check out our eBook on 4 ways to measure data quality.