How Change Data Capture Can Power Your Next-Generation Analytics and Mobile Applications
Data integration tools excel at moving an enterprise’s disparate data from systems such as the mainframe, IBM i, relational databases, and data warehouses to modern distributed platforms such as Hadoop. Often, an organization begins its big data journey by tackling the challenge of “filling the data lake.” This involves sourcing many different data assets, standardizing their formats, assessing and improving their quality, and loading them into a suitable store for analytics and other business initiatives.
Once this initial need is met, however, keeping that data fresh can prove a harder challenge. Detecting changes to source data can place undue strain on the transactional systems that host it. More importantly, those updates must be made available to a variety of downstream consumers, each at its own application-specific delivery rate.
Distributed Messaging Framework
This growing requirement is prompting users to move away from point-to-point data replication topologies in big data environments. With many different downstream consumers of the data, designing, maintaining, and running a separate point-to-point delivery for each one is neither manageable nor scalable. Instead, organizations are increasingly adopting distributed messaging frameworks such as Apache Kafka as the backbone of their enterprise data architecture. Let’s look at how this strategy might play out at Gamma, a fictional large bank.
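To make the scaling argument concrete, here is a small sketch that counts two simplified costs under each topology: how many capture streams read the systems of record, and how many pipelines must be designed and maintained. The model is an assumption for illustration only (every consumer needs data from every source, and each point-to-point feed reads its source independently); `TopologyCost`, `point_to_point`, and `publish_subscribe` are hypothetical names.

```python
from typing import NamedTuple


class TopologyCost(NamedTuple):
    source_reads: int  # capture streams reading the systems of record
    links: int         # pipelines to design, deploy, and maintain


def point_to_point(sources: int, consumers: int) -> TopologyCost:
    # Every consumer gets its own feed from every source, so each source is
    # read once per consumer and every source/consumer pair is a pipeline.
    pairs = sources * consumers
    return TopologyCost(source_reads=pairs, links=pairs)


def publish_subscribe(sources: int, consumers: int) -> TopologyCost:
    # Each source is captured once and published to the messaging hub;
    # each consumer subscribes to the hub independently.
    return TopologyCost(source_reads=sources, links=sources + consumers)


if __name__ == "__main__":
    for s, c in [(1, 3), (20, 50)]:
        print(s, c, point_to_point(s, c), publish_subscribe(s, c))
```

For Gamma’s single Db2 source and three consumers, the pipeline count is roughly even (3 versus 4), but the system of record is read once instead of three times. With 20 sources and 50 consumers, the gap grows to 1,000 pipelines versus 70.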
Gamma uses the mainframe as its system of record. All deposits, withdrawals, and changes to account details and status ultimately trigger a transaction against an underlying Db2 database running on IBM z/OS. Gamma wants to use this account data in three ways:
- Gamma wants to improve customer experience and satisfaction by making account updates available in its clients’ mobile app as soon as they occur, so that, for example, a user does not see a stale balance after withdrawing cash from an ATM.
- To shift toward a more customer-focused business, Gamma wants to perform customer segmentation analysis and deliver targeted marketing campaigns, using the daily transaction data from Db2 as one of several sources for this initiative. The data will ultimately land in an analytics platform such as Impala, where the analysis will be run.
- Gamma is also concerned with fraud and money laundering detection, and therefore wants to use insights from the live transaction data to suggest corrective actions when behavior looks suspicious. Because fraud must be detected and acted on quickly, this application is highly sensitive to how fast changes reach it.
Delivery of Data
All three of these use cases rely on fresh data from the live transactional system of record, yet each requires a different delivery rate. The customer segmentation initiative needs only periodic (daily) delivery, while the mobile app and fraud detection use cases need data as soon as it is committed to Db2. Delivering fresh data point-to-point between Db2/z and each application individually would quickly become unwieldy, especially as Gamma deploys new applications and use cases in its big data environment. A publish/subscribe architecture is much better suited to this kind of delivery requirement: by replicating live changes to Kafka in real time, Gamma lets each application consume only the data it needs, as often as it needs it.
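The following minimal Python sketch illustrates the publish/subscribe idea with an in-memory, append-only change log in which each consumer group tracks its own read position, loosely mirroring how Kafka consumer groups track offsets independently. It is not the Kafka client API: `ChangeLog`, `ChangeEvent`, and the group names are hypothetical, and the events are simplified stand-ins for captured Db2 changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    """One captured row change (hypothetical, simplified format)."""
    offset: int
    account_id: str
    op: str
    balance: float


class ChangeLog:
    """In-memory append-only log standing in for a topic (not the Kafka API)."""

    def __init__(self) -> None:
        self._events: list[ChangeEvent] = []
        # Next offset to read, tracked separately for each consumer group.
        self._offsets: dict[str, int] = {}

    def publish(self, account_id: str, op: str, balance: float) -> None:
        # The capture side writes each committed change to the log once.
        self._events.append(ChangeEvent(len(self._events), account_id, op, balance))

    def poll(self, group: str) -> list[ChangeEvent]:
        # Return everything this group has not yet seen, then advance its offset.
        start = self._offsets.get(group, 0)
        batch = self._events[start:]
        self._offsets[group] = len(self._events)
        return batch


log = ChangeLog()
mobile_seen: list[ChangeEvent] = []
fraud_seen: list[ChangeEvent] = []

# Simulated committed transactions: (account, new balance).
for account, balance in [("A-100", 500.0), ("A-100", 300.0), ("B-200", 1200.0)]:
    log.publish(account, "UPDATE", balance)
    # Latency-sensitive consumers poll immediately after each commit.
    mobile_seen += log.poll("mobile-app")
    fraud_seen += log.poll("fraud-detection")

# The segmentation job reads the day's accumulated changes in one batch.
daily_batch = log.poll("segmentation")
print(len(mobile_seen), len(fraud_seen), len(daily_batch))
```

Because the log retains events and each group keeps its own offset, adding a new consumer in this sketch does not require a new feed from Db2; the new group simply starts reading from the log at its own pace.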
This simple example highlights how strategic initiatives that originate from business requirements, such as mobile adoption, customer-centric services, and anomaly and fraud detection, are driving enterprises to place new technologies and data delivery paradigms at the center of their IT strategy. Ultimately, an organization’s ability to liberate its valuable data assets and mine them for untapped insight will determine how well it can deliver value to its customers beyond what its fiercest competitors offer.
Check out our webcast for some of the advantages and disadvantages of various change data capture strategies.