Search This Blog

Friday 8 August 2008

Tracking v/s Capturing changes

Change Data Capture has by far been my favourite feature everytime i think about SQL Server 2008. The really neat bit is something I missed during my learning process, There are two flavours to capturing changes, the change itself and the data that has changed, this is what distinguishes Change Data Tracking and Change Data Capture in SQL Server 2008

Change data capture provides historical change information for a user table by capturing both the fact that DML changes were made and the actual data that was changed. Changes are captured by using an asynchronous process that reads the transaction log and has a low impact on the system.. When you want to stage data in logical blocks such as a website publishing engine, or a clearing system this feature could prove very useful. This is mainly because of the granularity of the changes that are captured and the nature in which they are stored providing no coupling to the object whose changes are captured.

Change tracking on the other hand captures the rows in a table that changed, but does not capture the data itself. This allows applications to determine rows that have changed only with the latest row data being available in the user tables. Therefore, change tracking is more limited in the historical questions it can answer compared to change data capture. However, for applications that do not require historical information, there is far less storage overhead because of the changed data not being captured. It is the data captured which causes the database to grow. A synchronous tracking mechanism is used to track the changes and has been designed to have minimal overhead to the DML operations.

Either of these features can be used to synchronize applications or there database engines. Synchronization can be implemented in applications in two directions, one-way and two way.

In One-way synchronization applications, such as a client or mid-tier caching application, can be built that use change tracking. e.g, a caching application requires data to be stored in the database and to be cached in other data stores. In this scenario the application must be able to keep the cache up-to-date with any changes that have been made to the database. There are no changes to pass back to the Database Engine.

In two way synchronization, the data in the Database Engine is synchronized with one or more data stores. The data in those stores can be updated and the changes must be synchronized back to the database. A good example of two-way synchronization is an application which is occasionally connected such as a mobile application. In this type of application, a client application queries and updates a local store. When a connection is available between a client and server, the application will synchronize with a server, and changed data flows in both directions. In two-way synchronization applications must be able to detect conflicts. A conflict would occur if the same data was changed in both data stores in the time between synchronization's. With the ability to detect conflicts, an application can make sure that changes are not lost.

So my misinterpretation that change capture and change tracking meant the same proved to be wrong. This really useful feature could be put to use effectively in a scalable manner by choosing the right flavour of change capture which is based on the needs or nature of your application. There is no denial that applications on SQL Server versions prior to 2005 will need a major overhaul if there is an existing mechanism in place, that said it is best not to underestimate the implementation of Change Data Capture for a existing application. However new applications could base there designs around this feature and seek to benefit rapidly.

On this note a quick note to people who use Log shipping, The feature is useful when batch processing of transactions is to be done on a regular frequency, however it is still limited in not being able to identify each transaction individually, there is no denial however that is the best choice for Disaster Recovery options.

No comments: