Master-Slave Transactional Incremental
Replication
Version 1.0.0
Context
You are about to design the handling of transmissions on
a replication link. Your requirements are:
The replication set that you are sending to the target
consists only of changes made to the source data, so you are
designing an Incremental Replication.
All needed data must be available to the target
applications at any point in time; this includes related
data such as reference data in other tables.
You have decided to achieve this by replicating the
changes on a transactional level. Thus, you must retrieve
transactional information from the source, either by
accessing the transaction log of the database system or by
preparing the source database as described in the
Capture Transaction Details pattern. Since the
granularity of a transmission is a transaction, both source
and target have to be database management systems (DBMSs),
which we will describe in relational DBMS (RDBMS) terms.
Note: This pattern uses concepts, terms, and
definitions that are introduced in the
Data Replication architectural pattern. It
uses the services described in the Capture
Transaction Details pattern as well.
Problem
How do you design a replication link to support the
transmission of transactional changes, and to replay them on
the target while meeting all integrity requirements?
Forces
Any of the following compelling forces justify
using the solution described in this pattern:
Availability of consistent data on a complex target
schema. When applications read data from the target, all
related data (for example, referenced data in other tables)
must be available and consistently up-to-date. This requires
that each transaction that updates the source be transmitted
to the target as one transaction.
The following enabling forces facilitate the
adoption of the solution, and their absence may hinder such
a move:
Small volume of changes compared to the volume of the
replication set. The size of the changes being moved
from the source to the target on each transmission is
significantly smaller than the size of the replication set.
Thus, the changes will be transmitted faster than a new
snapshot of the replication set.
Similarity in the replication sets. The schemas
of both source and target must be nearly identical, although
you can tolerate the following types of differences:
Different data types for corresponding columns at
source and target.
One column of the source table can be split into
different columns in the target table.
Several columns of the source table can be combined
into a single column of the target table.
Solution
The solution is to acquire the information about
committed transactions from the source and to replay the
transactions in the correct sequence when they are written
to the target.
Note: This pattern uses the terms
'transaction' and 'operation' with the following
meanings:
A transaction is a collection of SQL commands
that form a unit of work. Depending on the RDBMS, a
transaction is started explicitly by a command like
Begin Transaction, or implicitly by the first SQL
command outside of a transaction. The transaction is
ended either explicitly by a commit or a rollback, or
implicitly at the end of every SQL command in autocommit
mode.
An operation is the change (INSERT, UPDATE, or
DELETE) of an individual row within a transaction.
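To make the distinction concrete, here is a minimal SQL
sketch; the table and column names are hypothetical, and the
exact transaction syntax depends on the RDBMS dialect:

-- One transaction consisting of two operations (hypothetical schema).
BEGIN TRANSACTION;                                         -- explicit start
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- operation 1: UPDATE of one row
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- operation 2: UPDATE of one row
COMMIT;                                                    -- explicit end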
Figure 1 shows the replication building block for this
type of replication.
Figure 1: Replication building block for Master-Slave
Transactional Incremental Replication
Depending on the features of the RDBMS and on the
requirements of the replication link, the transactional
information can be acquired either from the logging system
of the source or from additional schema objects on the
source. In both cases, the effects of the transaction are
acquired as a collection of operations on the rows being
updated by the transaction. The transmission to the target
results in the corresponding rows being updated by
transactions of the same size and in the correct sequence
(according to their completion time at the source).
The detailed description of this pattern is separated
into:
Background
Prerequisites
Elements of the Replication Building Block
Background
Before starting to describe the design of a transactional
replication link, here are two considerations of a more
general nature:
Transaction order for replay
Handling triggers, if the Write service uses standard
SQL
Transaction Order for Replay
When replaying the transactions on the target, you do not
have to mirror the database connection environment on the
target to achieve data integrity. As concurrent transactions
are isolated from each other, you can execute the
transactions on the target sequentially instead. Thus, you
can use a single database connection to replay all
transactions of the current transmission. However, you must
execute the individual transactions in the correct sequence,
which is given by the time of their completion on the
source.
The following example shows why the transactions must be
ordered by their completion time, not their start time.
Two tables are written by two concurrent transactions.
Figure 2 shows how both transactions are running in
parallel.
Figure 2: Two concurrent transactions
Transaction 1 starts first and updates tableA. Then
Transaction 2 starts, updates tableB, and completes with a
commit. Finally, Transaction 1 also updates tableB. After
both transactions, colB of tableB has a value of 20.
If the ordering were done on the transaction start time,
the update of Transaction 2 would be the last one. Thus,
colB of tableB would have a value of 10 at the end, instead
of 20. However, if the ordering is done on the end time of
each transaction, colB of tableB has a value of 20 at the
end, which is the correct value.
If Transaction 1 had written tableB before Transaction 2
(for example, instead of tableA), then Transaction 2 would
have been blocked until the end of Transaction 1.
Transaction 2 would have been executed after Transaction 1,
and colB of tableB would have a value of 10. Here again the
transactions have to be ordered on their completion, which
produces the correct result.
A sufficiently precise timestamp is needed to distinguish
the completion time of any two transactions and thereby
correctly order the transactions.
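To illustrate, here is the interleaving of Figure 2 written
as SQL; the table and column names follow the figure, while
the key value and the value written to colA are assumptions:

-- Connection 1 (Transaction 1)           -- Connection 2 (Transaction 2)
BEGIN TRANSACTION;
UPDATE tableA SET colA = 5 WHERE id = 1;
                                          BEGIN TRANSACTION;
                                          UPDATE tableB SET colB = 10 WHERE id = 1;
                                          COMMIT;  -- Transaction 2 completes first
UPDATE tableB SET colB = 20 WHERE id = 1;
COMMIT;  -- Transaction 1 completes second; tableB.colB is now 20

Ordering by completion time replays Transaction 2 before
Transaction 1, which again leaves colB with the value 20.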
Handling Triggers, If the Write Uses Standard SQL
Triggers are schema objects that perform additional
operations on behalf of an initial operation. Triggered
operations are also part of the initiating transaction and
are logged in the same way as any other operation.
The issue is how the RDBMS behaves when the logged changes
are applied to the target database. If you can apply the log
information to the target without ever firing a trigger,
there is no problem. But if applying a log record can fire a
trigger, you must take precautions.
This applies both to triggers that match those on the
source and to entirely different triggers that exist only on
the target.
For example, suppose a source table has a trigger that
fires on every UPDATE and INSERTs the old state of the row
into a history table, and a similar trigger exists on the
target table. On the source, the UPDATE of the row is
recorded in the log, and the triggered INSERT into the
history table is recorded as part of the same transaction.
When this transaction is executed on the target, the UPDATE
of the table is performed first. This fires the trigger on
the target, which inserts an entry into the history table.
But the source also recorded an INSERT into the history
table, so replaying the next operation performs a second
INSERT into the history table. Thus, the history table ends
up with two new entries instead of one.
Hence, if you do not take special precautions in handling
triggers, you might perform more operations on the target
than were performed on the source.
To solve the problem, you must eliminate the effect of
the trigger on the target during transmissions. To achieve
this, the replication link should connect to the target
database with a dedicated user or a dedicated role that is
only used for transmissions, but not by any other
applications. Additionally, the trigger has to be defined in
a manner (depending on the SQL dialect of the RDBMS) that
does not perform any operations if the database connection
uses this special user or role.
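For example, a history trigger on the target could be
amended as follows. This is a minimal sketch assuming SQL
Server syntax; the table, trigger, column, and account names
(orders, trg_orders_history, repl_writer) are hypothetical:

-- Suppress the triggered operation during transmissions; the source
-- log already contains the triggered INSERT, so replaying it would
-- otherwise duplicate the history entry.
CREATE TRIGGER trg_orders_history ON orders
AFTER UPDATE
AS
BEGIN
    IF SUSER_SNAME() = 'repl_writer'  -- dedicated replication account
        RETURN;
    INSERT INTO orders_history (order_id, old_status, changed_at)
    SELECT order_id, status, GETDATE()
    FROM deleted;                     -- old state of the updated rows
END;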
Prerequisites
This pattern depends on two features that the DBMS must
provide and on a prerequisite for the data model:
A fine-grained clock. The order in which
transactions are executed on the source must be the same as
the order in which they are replayed on the target. Thus,
the clock must provide a sufficiently fine resolution to
preserve the order. A clock grain of a millisecond is
generally sufficient; many systems even provide
microseconds. A clock that resolves only whole seconds will
definitely prevent the use of this pattern.
Transaction Identifiers. The RDBMS must provide a
means to identify the operations that belong to the same
transaction. This identifier is called a Transaction
Identifier throughout the remaining discussion. It can be an
opaque data type, and is generally provided to handle
distributed transactions.
Unique key. All tables of the replication set
must have either unique keys or another combination of
columns that uniquely identifies every row. The unique
identifier of every row is referred to as the Replication
Key throughout this document. After the Manipulate
process, the resulting Replication Key must also identify
every row in the target uniquely.
Elements of the Replication Building Block
The following paragraphs describe the elements of the
replication building block for this type of replication.
Source
The source contains the replication set, together with the
record of all changes to it that you want to acquire.
Acquire
The steps of Acquire are:
1. Connect to the source.
2. Find the transaction pending for transmission to the
target that has the oldest completion timestamp (a sketch of
this read follows the list).
3. Find the first operation of this transaction.
4. Pass the Transaction Identifier, the table name, the
type of operation (INSERT, UPDATE, or DELETE), and the names
and values of the columns to Manipulate.
5. Continue with step 4 until all operations of the
current transaction are read.
6. Delete the record of the transmitted transaction,
unless it is needed for other replication links.
7. Continue with step 2 until all transactions are read.
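The read in steps 2 through 5 might take the following
shape, assuming the source was prepared as described in
Capture Transaction Details; the capture tables txn_log and
txn_operations and all of their columns are assumptions for
illustration:

SELECT o.txn_id, o.table_name, o.op_type,
       o.column_names, o.column_values
FROM   txn_log        AS t
JOIN   txn_operations AS o ON o.txn_id = t.txn_id
WHERE  t.transmitted = 0   -- pending transactions only
ORDER  BY t.commit_ts,     -- completion time gives the transaction order
          o.op_seq;        -- operation order within each transaction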
Manipulate
This service performs the following steps:
1. Get the first row from Acquire.
2. If the table needs manipulation, perform the
appropriate action, such as converting data types, or
combining or splitting fields of the row (a sketch follows
the list).
3. Pass the Transaction Identifier, the table name, the
type of operation (INSERT, UPDATE, or DELETE), and the names
and values of the columns to the Write service.
4. Get the next row from Acquire and continue with step 2
until all rows are processed.
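As an example of step 2, splitting one source column into
two target columns could look like this; the column names
and the use of SQL Server string functions are assumptions
for illustration:

-- Split the source column full_name into first_name and last_name.
DECLARE @full_name nvarchar(100) = N'Ada Lovelace';
SELECT LEFT(@full_name, CHARINDEX(N' ', @full_name) - 1)           AS first_name,
       SUBSTRING(@full_name, CHARINDEX(N' ', @full_name) + 1, 100) AS last_name;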
Write
The Write service performs the following steps:
1. Get the first row from Manipulate.
2. If the Transaction Identifier differs from the
Transaction Identifier of the previously handled row, COMMIT
the current transaction.
3. Depending on the remaining attributes, build a SQL
statement that INSERTs, UPDATEs, or DELETEs a single row in
the target.
4. Get the next row from the Manipulate service and
continue with step 2 until all rows are processed.
5. After the last row has been processed, COMMIT the
final transaction.
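The result is a replayed transcript such as the following
sketch; the tables, values, and Replication Keys (order_id,
item_id) are hypothetical:

-- All operations of one source transaction, replayed in order.
BEGIN TRANSACTION;
INSERT INTO orders (order_id, status) VALUES (4711, 'NEW');
UPDATE stock SET quantity = 17 WHERE item_id = 42;  -- WHERE built from the Replication Key
COMMIT;  -- the next row from Manipulate carried a new Transaction Identifier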
Target
The target is the database where the transactions are
replayed. You must ensure that no triggered operations are
executed, as described above.
Example
The
Implementing Master-Slave Transactional Incremental
Replication Using SQL Server pattern presents an
implementation of this design pattern using the means
provided by Microsoft SQL Server.
Resulting Context
The use of this pattern inherits the benefits and
liabilities from
Master-Slave Replication and has the following
additional benefit and liability:
Benefit
Basis for other useful services. Other services
might have a similar need to use transactional information.
For example, the transmitted data could be spun off to a
historical store, such as a data warehouse.
Liability
Dependencies of schemas. This pattern depends
greatly on the similarity between the source and the target
schemas. If one of them changes, the other must change
accordingly. (For minor changes, it could be sufficient to
adapt Manipulate for the data conversion from the source
schema to the target schema.)
Security Considerations
The database connection used to replay the transactions
on the target must have sufficient access rights to INSERT,
UPDATE, or DELETE in all tables belonging to the replication
set. In addition, the Acquire database connection account
needs to have SELECT or READ privileges on the source
transaction objects.
It is recommended that you create a dedicated user in the
target database with appropriate privileges and use this
user for all transmissions. In general, this user will not
be used for any other purposes.
Defining such a user allows you to tailor the privileges
to the specific needs of Acquire or Write.
Operational Considerations
Before implementing Master-Slave Transactional
Incremental Replication, consider the following points
to achieve smooth-running operations:
Load on the source database server. When a large
number of targets use the same source for their replication
links, the source can have operational difficulties meeting
all their demands. In this case, consider using the
Master-Slave Cascading Replication pattern.
Load on the target database server. Transmissions
replay transactions in the same way they were executed on
the source. Although the transactions are executed
sequentially on the target, the transmission consumes
significant resources, such as CPU time and I/O activity, on
the target. This impacts the response times of the
applications during transmissions.
Space requirements for replaying the transactions on
the target. Every transaction on the source is replayed
as a single transaction on the target. Thus, you must
provide sufficient space in the logging system to complete
the largest transaction that might be executed on the
source. Although the source must also handle concurrent
transactions, which the target does not, a good starting
point is to configure the logging system of the target in
the same way as it is configured on the source.
Variants
Having introduced the solution for transmitting the
changes of a replication set from the source to the target,
two variants show possible enhancements. The first variant
shows a way to achieve greater robustness of the
replication. The second sketches the implementation of a
change history.
Higher Robustness
When writing the transactions to the target, you might
consider converting INSERTs into UPDATEs or vice versa.
Thus, if an INSERT raises a duplicate key error, you perform
an UPDATE instead; and if an UPDATE hits zero rows, you
perform an INSERT instead. This is sometimes referred to as
an UPSERT (see the sketch below).
Although such error handling is not necessary as long as
the content of the target corresponds exactly to the state
of the source before the transaction, it offers a higher
degree of robustness. If the content of the target has been
changed, for example, by an erroneous action of an operator
or administrator, the normal execution of the transaction
would fail; such error handling, however, will realign the
contents of the source and the target.
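A minimal sketch of such an UPSERT, assuming SQL Server
syntax; the customers table, its columns, and the variable
values are hypothetical, with customer_id as the Replication
Key:

DECLARE @id int = 42, @name nvarchar(50) = N'Contoso';
-- Try the UPDATE first; if no row was hit, fall back to an INSERT.
UPDATE customers SET name = @name WHERE customer_id = @id;
IF @@ROWCOUNT = 0
    INSERT INTO customers (customer_id, name) VALUES (@id, @name);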
Implementing a Change History
Instead of updating changed rows, you can append new
versions of them that carry a version number. The previous
version of the target row is kept, which retains a change
history at the target. For example, this information can be
used to trace a stock's performance record.
Additional Prerequisites
Some preparations have to be made for this variant to
work:
The data schema must be extended by a version column.
The version number must be managed by Write, which
increments the version number before it writes the new row
by performing an INSERT (see the sketch below).
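A minimal sketch of the versioned INSERT performed by
Write, assuming SQL Server syntax; the customers table, its
columns, and the variable values are hypothetical, with
customer_id as the Replication Key:

DECLARE @id int = 42, @name nvarchar(50) = N'Contoso';
-- Append a new version instead of updating in place; the aggregate
-- over an empty result yields NULL, so the first version becomes 1.
INSERT INTO customers (customer_id, version, name)
SELECT @id, COALESCE(MAX(version), 0) + 1, @name
FROM   customers
WHERE  customer_id = @id;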
Operational Considerations
Because rows will not be updated but inserted, the target
database grows faster than the source database. Therefore,
you must provide an appropriate amount of free disk space at
the target site.
Related Patterns
For more information, see the following related patterns:
Patterns That May Have Led You Here
Move Copy of Data. This is the root pattern of
this cluster; it presents the overall architecture for
maintaining copies of data after they have been updated.
Data Replication. This pattern presents
the architecture of a replication.
Master-Slave Replication. This pattern describes,
at a higher level, design considerations for transmitting
data from the source to the target by overwriting potential
changes in the target.
Capture Transaction Details. This pattern
describes the underlying change-capture service to provide
transactional information from the source when a DBMS log is
not available or is not to be used for this purpose.
Patterns That You Can Use Next
Implementing Master-Slave Transactional Incremental
Replication Using SQL Server. This pattern shows how
to implement Master-Slave Transactional Incremental
Replication by using SQL Server.
Other Patterns of Interest
Master-Slave Snapshot Replication. This pattern
presents a design for transmitting a complete replication
set. This can be used to equalize both databases as a
starting point before establishing an Incremental
Replication.
Master-Slave Cascading Replication. This
pattern shows replication topologies where Master-Slave
Transactional Incremental Replication can be used to
design the individual replication links of the topology.