Change Data Capture vs. Change Tracking: Three Real-World Examples
Today, ensuring that the data your business depends on for its daily operations is reliably up to date is an absolute necessity. That’s why two technologies, Change Tracking (CT) and Change Data Capture (CDC), have become vital tools for tracking changes made to the database systems at the heart of modern corporate IT.
Although they are similar in concept and function, CT and CDC differ in important ways. Let’s take a brief look at each of them and see how they compare.
Change Tracking (CT)
When Change Tracking is enabled for a table, the database maintains a hidden internal table that records which rows have changed, the type of each change (insert, update, or delete), and a version number that orders each change relative to the others. Applications read this information through a query function (CHANGETABLE in Microsoft SQL Server) rather than by querying the internal table directly. CT is often described as a “lightweight” solution because it stores only the most recent change to each row, not the changed data values or any change history.
A particular advantage of CT is that it operates synchronously: change information is recorded in the same transaction that modifies the row, so it is up to date as soon as that transaction commits.
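To make the “latest change only” behavior concrete, here is a minimal in-memory sketch in Python. It is a toy model, not how any database actually implements CT: the class and method names (`ChangeTrackedTable`, `changes_since`) are hypothetical, and simple integer version numbers stand in for the change versions a real system maintains.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeTrackedTable:
    """Toy model of Change Tracking (hypothetical, not a real database API).

    Keeps only the most recent change per row; old values are not retained.
    """
    rows: dict = field(default_factory=dict)      # primary key -> current row data
    tracking: dict = field(default_factory=dict)  # primary key -> (version, operation)
    current_version: int = 0

    def _record(self, key, operation):
        # Recorded in the same call that changes the row (synchronously),
        # so tracking info is never behind the table itself.
        self.current_version += 1
        self.tracking[key] = (self.current_version, operation)  # overwrites older entry

    def insert(self, key, data):
        self.rows[key] = data
        self._record(key, "I")

    def update(self, key, data):
        self.rows[key] = data
        self._record(key, "U")

    def delete(self, key):
        del self.rows[key]
        self._record(key, "D")

    def changes_since(self, version):
        """Return sorted (key, operation) pairs for rows changed after `version`."""
        return sorted(
            (key, op) for key, (v, op) in self.tracking.items() if v > version
        )


# Usage: two updates to row 1 collapse into a single "U" entry,
# and the intermediate value "alice v2" cannot be recovered.
table = ChangeTrackedTable()
table.insert(1, "alice")
table.insert(2, "bob")
baseline = table.current_version  # an application would save this value
table.update(1, "alice v2")
table.update(1, "alice v3")
table.delete(2)
print(table.changes_since(baseline))  # [(1, 'U'), (2, 'D')]
```

The model deliberately records only the last operation per row; real implementations may report net changes relative to the caller’s last version differently, so treat this as an illustration of the storage trade-off rather than exact semantics.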
Change Data Capture (CDC)
In contrast to CT, CDC maintains a history of row changes, including the actual data that was changed. It does this by asynchronously reading the database’s transaction log to detect changes after they have occurred. Because CDC is not part of the transactions that update the database, it typically adds little overhead to the update workload itself.
A potential disadvantage is that because CDC retains history information, storage overhead is higher than with CT. Also, due to CDC’s asynchronous update mechanism, there can be a brief delay between the time when the database is changed and when that change is reflected in the CDC change tables.
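The asynchronous, history-keeping behavior can be sketched the same way. In this minimal Python model (all names are hypothetical, and it does not mirror any product’s change-table format), writes only change the row and append to a simulated transaction log. A separate `run_capture` step, standing in for the background capture job, copies new log records into a change table with before and after images. Until that step runs, the change table lags behind the data, which is the delay described above.

```python
from dataclasses import dataclass, field


@dataclass
class LogRecord:
    lsn: int        # position in the simulated transaction log
    key: int        # primary key of the changed row
    operation: str  # "I" (insert), "U" (update) or "D" (delete)
    before: object  # row image before the change (None for inserts)
    after: object   # row image after the change (None for deletes)


@dataclass
class CdcModelTable:
    """Toy model of log-based CDC; hypothetical, not any product's API."""
    rows: dict = field(default_factory=dict)
    log: list = field(default_factory=list)           # simulated transaction log
    change_table: list = field(default_factory=list)  # captured change history
    captured_lsn: int = 0                             # last log position captured

    def _write(self, key, operation, after):
        # The write path changes the row and appends to the log, as the
        # database does anyway; nothing CDC-specific happens here.
        before = self.rows.get(key)
        if after is None:
            self.rows.pop(key, None)
        else:
            self.rows[key] = after
        self.log.append(LogRecord(len(self.log) + 1, key, operation, before, after))

    def insert(self, key, data):
        self._write(key, "I", data)

    def update(self, key, data):
        self._write(key, "U", data)

    def delete(self, key):
        self._write(key, "D", None)

    def run_capture(self):
        """Stand-in for the asynchronous capture job: run separately from the
        writes (e.g. on a schedule) to copy not-yet-captured log records."""
        new_records = [rec for rec in self.log if rec.lsn > self.captured_lsn]
        self.change_table.extend(new_records)
        if new_records:
            self.captured_lsn = new_records[-1].lsn
        return len(new_records)


# Usage: the change table lags until the capture step runs, then holds
# every version of the row, including the old values.
table = CdcModelTable()
table.insert(1, "alice")
table.update(1, "alice v2")
table.update(1, "alice v3")
print(len(table.change_table))  # 0
table.run_capture()
for rec in table.change_table:
    print(rec.lsn, rec.operation, rec.before, "->", rec.after)
# 1 I None -> alice
# 2 U alice -> alice v2
# 3 U alice v2 -> alice v3
```

Compared with the CT sketch, every change is kept, which is where both the extra storage and the ability to answer historical questions come from.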
Change Tracking and Change Data Capture Examples
One advantage common to both CT and CDC is that, when distributing or replicating data from a source database, only the changed data, rather than the entire database, needs to be transmitted.
CT is well suited to applications that need to know that a row has changed but don’t need a change history. A good example is an application that runs periodically rather than continuously. Each time it runs, it can query the CT information to see which rows have changed since its last run and then, if necessary, retrieve data only from those rows.
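The periodic pattern just described can be sketched as an incremental sync. This is a minimal Python illustration under simplifying assumptions: `sync_changes` is a hypothetical helper, the tracking data is a plain dict mapping each primary key to its latest (version, operation), and everything runs in one process, so it ignores the concurrency and version-validation issues a production sync must handle. Only rows that changed since the saved version are fetched, which is the transmission saving mentioned above.

```python
def sync_changes(source_rows, tracking, target, last_version):
    """Bring `target` up to date with `source_rows` using CT-style tracking data.

    source_rows:  primary key -> current row data in the source table
    tracking:     primary key -> (version, operation), latest change only
    target:       the downstream copy, updated in place
    last_version: the version saved at the end of the previous run
    Returns (new_last_version, number_of_rows_transferred).
    """
    changed = {k: (v, op) for k, (v, op) in tracking.items() if v > last_version}
    transferred = 0
    for key, (_version, op) in changed.items():
        if op == "D":
            target.pop(key, None)           # deletes need no row data
        else:
            target[key] = source_rows[key]  # fetch only the changed row
            transferred += 1
    # The next run resumes from the highest version seen (unchanged if none).
    new_version = max((v for v, _ in changed.values()), default=last_version)
    return new_version, transferred


# Usage: target was in sync at version 3; since then row 2 was updated
# (version 4) and row 3 deleted (version 5). Row 1 is not re-sent.
source = {1: "alice", 2: "bob v2"}
target = {1: "alice", 2: "bob", 3: "carol"}
tracking = {1: (1, "I"), 2: (4, "U"), 3: (5, "D")}
version, sent = sync_changes(source, tracking, target, last_version=3)
print(version, sent, target)  # 5 1 {1: 'alice', 2: 'bob v2'}
```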
On the other hand, CDC is most useful for transactional applications where maintaining historical data is important.
Here are some examples:
Data Extraction and Synchronization – When only current data is needed for analytics applications, or to keep separate databases in sync, CT may be the best solution. The target application or system can stay current by retrieving data only from those source database rows that have changed.
Data Warehouse and ETL – In general, a data warehouse that incorporates transactional or business intelligence (BI) information must maintain historical as well as current data. The ETL processes that initially load and continuously refresh such a warehouse will therefore usually benefit from CDC.
Business Intelligence (BI) – CDC is particularly useful for moving data updates from a mainframe into a Hadoop cluster, so that BI analytics don’t consume costly mainframe CPU cycles.
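To see why a warehouse needs CDC’s history rather than CT’s latest-change-only view, consider loading a type 2 slowly changing dimension, a common warehouse pattern in which each version of a row is kept together with the interval during which it was valid. The Python sketch below is a hypothetical, simplified loader: the `ChangeRecord` fields and integer commit times are assumptions, and it expects changes in commit order. Given only the final change for a row, it could not reconstruct the earlier versions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeRecord:
    commit_time: int       # simplified integer timestamp (assumption)
    key: int               # primary key of the changed row
    operation: str         # "I", "U" or "D"
    after: Optional[str]   # new row value (None for deletes)


@dataclass
class DimensionRow:
    key: int
    value: str
    valid_from: int
    valid_to: Optional[int]  # None means "still current"


def load_type2_history(changes):
    """Turn CDC change records into type 2 slowly changing dimension rows.

    Each insert or update opens a new version of the row; each update or
    delete closes the version that was current. Assumes commit order.
    """
    history = []
    current = {}  # key -> index in `history` of the open version
    for ch in changes:
        if ch.key in current:
            history[current.pop(ch.key)].valid_to = ch.commit_time
        if ch.operation != "D":
            current[ch.key] = len(history)
            history.append(DimensionRow(ch.key, ch.after, ch.commit_time, None))
    return history


# Usage: a customer tier that changes once and is later deleted.
changes = [
    ChangeRecord(100, 7, "I", "Gold"),
    ChangeRecord(250, 7, "U", "Platinum"),
    ChangeRecord(400, 7, "D", None),
]
for row in load_type2_history(changes):
    print(row.key, row.value, row.valid_from, row.valid_to)
# 7 Gold 100 250
# 7 Platinum 250 400
```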
Connect CDC captures data from IBM Db2 for z/OS, IBM Db2 for i, VSAM data sets, and other sources, and reliably replicates it, in near real time, to data warehouse and database targets such as Hadoop and Microsoft SQL Server. And it does so with minimal impact on database performance and network bandwidth.
To learn more about Connect CDC, watch our webcast, where we introduce Connect CDC’s capabilities and discuss how you can use it in a variety of use cases that help drive your business forward.