Implementing Master-Slave Transactional
Incremental Replication Using SQL Server
Version 1.0.0
GotDotNet community for collaboration on
this pattern
Complete List of patterns & practices
Context
You want to build a
Master-Slave Transactional Incremental Replicationbetween
two Microsoft SQL Server databases, and you want to take
advantage of the integrated replication functionality of SQL
Server.
The schemas of the source and target do not need to be
identical; simple manipulations as defined in the
Data Replication pattern can be implemented in
the replication link. In addition, horizontal (row) and
vertical (column) filters can be used to restrict the
replication set.
Note: This pattern uses terms and concepts
from the following data patterns:
Move Copy of Data
Data Replication
Master-Slave Transactional Incremental Replication
Background
Before introducing the implementation with SQL Server,
this pattern covers the following topics:
A summary of the replication building block, as
described in the Master-Slave Transactional Incremental
Replication pattern
The SQL Server replication services that this pattern
uses (SQL Server transactional replication).
The mapping of the elements of the replication building
block to the services of SQL Server
Replication Building Block
The replication building block used in the
Master-Slave Transactional Incremental Replication
pattern consists of a source and a target that are connected
by a replication link, as Figure 1 shows.
Figure 1: Replication building block
Source
The source contains the replication set, which is the
data copied from the source and sent across a data movement
link to the target.
Acquire
The Acquire service reads the data changes that are
replicated.
Manipulate
The Manipulate service performs simple data
transformations, such as data type conversions and splitting
or combining columns.
Write
The Write service updates the target with the manipulated
rows.
Target
The target is the database where the replication set is
to be written.
The process of moving the replication set from source to
target according to defined functional and operational
requirements is called a transmission.
SQL Server Transactional Replication
SQL Server replication uses several standard services to
move data from a publication database to a subscription
database. Figure 2 shows the services and processes involved
in SQL Server transactional replication.
Figure 2: SQL Server transactional replication
Platform Roles
SQL Server defines three roles for the platforms involved
in the replication:
Publisher. The Publisher contains the source and
the Log Reader Agent. In this environment, the Subscriber
defines the replication set (publication) to be replicated.
Distributor. The Distributor holds the
distribution database. It runs the cleanup jobs for managing
the distribution database, and in a push replication also
runs the Distribution Agent.
Subscriber. The Subscriber contains the target
and creates a subscription to subscribe a publication. In a
pull replication, the Subscriber runs the Distribution
Agent.
The Distributor does not necessarily need to be a
separate platform. Its role can also be assigned to the
Publisher, where it is called a local Distributor.
Otherwise, it is called a remote Distributor.
Software Components
SQL Server transactional replication consists of two main
processes and a database:
Log Reader Agent. The Log Reader Agent extracts
changes defined by the publication from the transaction log
of the publication database, and stores the changes to be
replicated in an additional database, called the
distribution database.
Distribution database. The distribution database
is a SQL Server database and is located on the Publisher
(called a local Distributor), or on a dedicated platform
(called a remote Distributor). Because it stores changes in
a proprietary format, the distribution database can only be
used for SQL Server replication, not for any applications.
After the Log Reader Agent extracts the changes, the logging
system of the publication database is allowed to shrink the
transaction log, because the information for the
transmissions is saved in the distribution database.
Distribution Agent. The Distribution Agent
applies the initial snapshot to the subscription database by
moving transactions held in the distribution database to
Subscribers. The Distribution Agent runs at either the
distributor for push subscriptions, or at the Subscriber for
pull subscriptions. Before actually writing the changes, the
Distribution Agent can manipulate them using a Data
Transmission Services (DTS) package that the Subscriber
defines. If you need to perform manipulations before writing
the changes to the subscription database, you can do so
using such a DTS package.
Independent of the replication processes, cleanup jobs
run on the distribution database and perform the following
tasks:
Agent History Clean Up: Distribution. Removes
replication agent history from the distribution database.
Distribution Clean Up: Distribution. Removes
replicated transactions from the distribution database.
Expired Subscription Clean Up. Detects and
removes expired subscriptions from publication databases.
Reinitialize Subscriptions Having Data Validation
Failures. Reinitializes all subscriptions that have data
validation failures.
Replication Agents Checkup. Detects replication
agents that are not actively logging history.
A SQL Server replication is based on the following rules:
Any table that should be published needs a primary key
as a unique identifier for a row.
SQL Server transactional replication assumes that the
subscription database is read-only (unless updating
subscriptions are used). This means:
Any changes to the subscription database at the
Subscriber will be overwritten by the next transmission.
Any deletion of rows on the subscription database
will cause a failure in the replication process.
Mapping the Replication Building Block to SQL Server
SQL Server transactional replication implements the
replication building block elements used by Master-Slave
Transactional Incremental Replication as shown in Figure
3. Figure 3 is the result of merging Figures 1 and 2.
Figure 3: SQL Server implementation of Master-Slave
Transactional Incremental Replication
Source
Incremental replication transmits only the changes that
have been made to source rows that the target has copies of,
rather than getting full refreshes of the data. As described
in the design pattern Master-Slave Transactional
Incremental Replication, this type of replication needs
a recording of the source transactions to use for change
transmission. SQL Server replication preserves the changes
to the publication database in the distribution database for
transmission to the subscription database. The distribution
database is updated by the Log Reader Agent, which reads the
transactional information from the transaction log of the
relational database management system (RDBMS) and writes it
to the distribution database. This allows the RDBMS to
shrink the transaction log.
The changes remain available in the distribution database
for a specified period of time (the retention period).
Replication Set
In SQL Server, the replication set to be replicated from
the publication database to the subscription database is
called a publication. A publication consists of one or more
tables, or only parts of tables. You can define the parts of
tables in one of two ways:
Filter the rows to be replicated by defining a search
condition.
In the theory of relational databases, the result is
called a restriction.
Filter the columns to be replicated as a subset of the
table's columns.
In the theory of relational databases, the result is
called a projection.
Acquire
The Distribution Agent reads all the changed data from
the distribution database from the last transmission up to
the most recent transaction.
Manipulate
The Distribution Agent can manipulate the data. When the
Subscriber registers its subscription to the Distributor, it
can define a DTS package, which the Distribution Agent
invokes before sending the data to the Subscriber. Each
Subscriber can use its own package, so different Subscribers
can get various views of the same data.
Within such a DTS package, you can define any
manipulations that can be performed using the data of the
current row and the ActiveX script language. Each row from
the publication database can result in only one row at the
subscription database, or it can be skipped. Columns can be
split or combined. Additionally, any kind of data type
conversions and changes of column names can be done.
Write
The Distribution Agent writes the data to the
subscription database. The Write is done by default using a
stored procedure. Optionally, it can be done directly using
SQL.
In general, the Distribution Agent writes the changes
regardless of whether the contents of the subscription
database have been updated by any application. If you must
prevent updates from being overwritten, you have to use
updateable subscribers or configure a merge replication
instead of a transactional replication. A merge replication
is described in the pattern
Implementing Master-Master Row-Level Synchronization
Using SQL Server.
Target
The target is the subscription database, which contains
the replication set to be updated.
Implementation Strategy
To set up a new replication link, follow these steps:
1.Configure the Distributor:
a.Create the distribution database: This is an
internal SQL Server database used from the Distributor.
In most cases, you use the default options to create
this database, but you can also customize it.
b.Choose the snapshot folder: You can either choose
the default folder for exchanging the initial snapshot
files or use a custom folder.
c.Set up the Subscriber parameter: To define the
default parameter for all subscribers, you can confirm
the offered configuration options or customize them.
2.Define the publication:
a.Select the publication database that includes the
publication.
b.Decide if you want to transform the data during the
transmission. If so, any subscriber must have a DTS
package for the transformation.
c.Select the possible database systems the
Subscribers will use: SQL Server 2000, earlier versions
of SQL Server, or another RDBMS.
d.For each table, specify whether the entire table
should be replicated, or restrict the table by
specifying horizontal and vertical filters.
e.Decide whether anonymous Subscribers are allowed to
subscribe to the publication.
f.Decide how often the Snapshot Agent creates a new
snapshot.
3.Create a DTS package for the transformation, if you
have decided to use one.
4.Define the subscription:
a.Decide if you want a pull or a push subscription.
b.Select the publication you want to subscribe to.
c.Choose the subscription database.
d.Decide if you want to initialize the subscriber
using an initial snapshot from the publisher.
e.If you want to start the transmission
automatically, define a schedule.
f.Specify the DTS package you want to use for this
subscriber.
g.Verify that all required services are started.
At this point, all parts of the replication link have
been configured. From now on, both Publisher and Subscriber
will log all changes to the specified replication set using
triggers.
The SQL Server transactional replication runs different
jobs:
Snapshot Agent: Creates the initial snapshot on the
Publisher.
Distribution Agent: Applies the snapshots on the
Subscriber.
Cleanup jobs: Clean up the distribution database on the
Distributor.
The example that follows describes in detail how to use
the SQL Server wizards to set up such a replication link.
Example
This example shows how to configure transactional
replication based on a particular replication set. The
replication set used here is the authors table from the pubs
sample database, which is delivered with SQL Server.
Overview
The environment has two SQL Server computers: PUB_SERVER
and SUBS_SERVER. The Publisher has a database, PUB_DB, that
contains the authors table from the pubs sample database. To
use PUB_SERVER as a local Distributor, you create a
distribution database named Distribution. The Subscriber has
an empty database, SUBS_DB. Figure 4 shows the environment
of the example.
Figure 4: Example environment
This configuration creates a periodical incremental
transactional replication process with an initial snapshot.
The Subscriber initiates the replication (pull) every 15
minutes. The data from the Publisher is manipulated during
the replication by converting the first and the last name
from the authors table into uppercase. (This example does
not allow updateable or anonymous subscribers.)
Configuring the Publisher and the Distributor
To set up the example environment, follow these steps in
the Configure Publishing and Distribution Wizard:
1.In SQL Server Enterprise Manager, select the
publication database server, right-click Replication,
and then click Configure Publishing, Subscribers and
Distribution.
2.On the Select Distributor page, select the
Make Pub_Server its own Distributor
option, where Pub_Server is the name of the server
you want to configure as a distributor. Click Next.
Note: The Distributor can be located on the
same server as the Publisher, in which case it is called
a local Distributor, or on a remote server. This example
uses a local Distributor.
3.On the Specify Snapshot Folder page, type the
name of a shared folder on the Publisher where the Snapshot
Agent can store snapshot files.
Note: You must create and share this folder
manually, or use a previously defined system shared
folder for this purpose.
4.On the Customize the Configuration page, select
No, use the following default settings to create a
default distribution database.
Note: In this simple example, it is sufficient
to use the default settings. In a more complex
environment, you should manually configure the
distribution database.
Creating a Publication
Next, use the Create Publication Wizard to create a
publication on the Publisher.
1.In SQL Server Enterprise Manager, select the
publication database server, expand Replication,
right-click Publications, and then click New
Publication.
2.On the Welcome Screen, check the Show
advanced options check box to use this wizard with the
advanced options needed to enable a transformation of the
data.
3.On the Choose Publication Database page, select
Pub_DBas the publication database, where Pub_DB
is the name of your database for the replication.
4.On the Select Publication Type page, select
Transactional Publication to create a publication for a
transactional replication.
5.You do not want to allow the replication of changes
from the subscriber to the publisher; therefore, ensure that
no option is selected on the Updateable Subscription
page.
6.On the Transform Published Data page, select
Yes, transform the data to enable the transformation
during the replication.
7.On the Specify Subscriber Types page, select
only Servers running SQL Server 2000.
8.The Specify Articles page shows possible objects
for replication in the publication database. From the
Object Type Tables list, select the table authors.
Note: Clicking the ellipses ( ... )
button for each object shows additional properties for
the replication. You do not need them in this example.
9.On the Select Publication Name and Description
page, specify the Publication name as example.
The default publication description can be used.
10.On the Customize the Properties of the Publication
page, select the option beginning with No, create the
publication as specified, because you can use the
defaults options shown in the text box below the options.
11.On the Completing the Create Publication Wizard
page, click Finish to create the publication.
Defining a DTS Package for the Transformation
Before you can create a subscription for the defined
publication, you must create a DTS package for the
transformation.
1.In SQL Server Enterprise Manager, select the
publication database server, expand Replication, open
Publications, right-click on the example:pub_db
replication, and then click Properties.
2.On the Publication Properties page, click theSubscriptions
tab and then click Transformations.
3.On the Welcome Screen, click Next.
4.On the Choose a Destination page, select the
subscription database server and database. (See Figure 5.)
Figure 5: Choose a Destination page
5.On the Define Transformations page, click the
ellipses ( ... ) button from the article authors.
6.On the Column Mappings tab of the Column
Mappings and Transformations page, select Drop the
existing table and recreate it from the list box. Leave
the mappings unchanged. (See Figure 6.)
Figure 6: Column Mappings tab on the Column Mappings and
Transformations page
7.Click the Transformations tab. Click
Transform data using the following script and select
VB Script Language from the list box. (See Figure 7.)
Figure 7: Transformations tab on the Column Mappings and
Transformations page
8.Use the following code to transform the first and the
last name of the author to uppercase:
Function Main()
DTSDestination("au_id") = DTSSource("au_id")
DTSDestination("au_lname") = UCase(DTSSource("au_lname"))
DTSDestination("au_fname") = UCase(DTSSource("au_fname"))
DTSDestination("phone") = DTSSource("phone")
DTSDestination("address") = DTSSource("address")
DTSDestination("city") = DTSSource("city")
DTSDestination("state") = DTSSource("state")
DTSDestination("zip") = DTSSource("zip")
DTSDestination("contract") = DTSSource("contract")
Main = DTSTransformStat_OK
End Function
9.On the DTS Package Location page, select the
option to save the package on the Distributor. Use the SQL
Server authentication option, and insert the user and the
password to connect to the Distributor.
10.On the DTS Package Identification page, specify
the name of the package. In this example, the name should be
transform_sub. You do not need to define an owner
password.
11.On the Completing the Transform Published Data
Wizard page, click Finish to save the package as
specified.
Creating a Subscription
To complete the configuration you must create a pull
subscription for the defined publication on the Subscriber.
1.In SQL Server Enterprise Manager, select the
subscription database server, expand Replication,
right-click Subscriptions, and then click New Pull
Subscription.
2.On the Welcome Screen, click Next.
3.On the Look for Publications page, select
Look at publications from registered servers.
4.On the Choose Publication page, expand
PUB_SERVER and select the publication named
example:pub_db, where PUB_SERVER is the name of
your Publisher and example:pub_db is the name of your
publication. (See Figure 8.)
Figure 8: Choose Publication page
5.On the Specify Synchronization Agent Login page,
select Use SQL Server Authentication. Enter the login
and the password of a user account that is used to connect
to the Publisher during the replication.
6.On the Choose Destination Database page, select
Subs_DB as the database for the subscription, where
Subs_DB is the name of your subscription database.
7.On the Initialize Subscription page, select
Yes, initialize the schema and the data.
8.On the Snapshot Delivery page, specify the
snapshot folder. Use the default folder of the publication.
9.On the Set Distribution AgentSchedule page,
select Using the following schedule and click
Change to specify a new schedule for the Distribution
Agent.
10.On the Edit Recurring Job Schedule page, define
the schedule for the Distribution Agent as a daily run with
an interval of 15 minutes. (See Figure 9.)
Figure 9: Edit Recurring Job Schedule page
11.On the Specify DTS Package page, select
Distributor and click List packages. For the
transformation, select the transform_sub package that
you created previously. (See Figure 10.)
Figure 10: Specify DTS Package page
12.The Start Required Services page shows the
status of the SQL Server Agent on the Subscriber. If the SQL
Server Agent is not running, select the check box next to
the entry for the agent. The agent will be started after the
wizard creates the subscription.
13.On the Completing the Pull Subscription Wizard
page, review the options for the specified subscription.
Click Finish to create the subscription with these
options.
The configuration process is finished. Before starting
the first transmission, the following checks should be done:
Did the Snapshot Agent create the first initial
snapshot? It may be necessary to start the Snapshot Agent
manually on the publication database.
Did the Pull Agent initialize the schema and the first
replication? If this has not been done automatically, you
can start the process manually on the subscription database.
Starting and Restarting the Synchronization
1.In the Enterprise Manager, select the publication
database server, expand Replication Monitor, select
Publishers, Pub_Server, and open the
publication example:pub_db.
2.Right-click the SnapshotAgent and select
Start Agent to create an initial snapshot.
3.Right-click SUBS_Server:subs_db and select
Start Synchronizing to start the transactional
replication.
Testing the Example
You can easily test the functionality of the implemented
replication by changing some data in the publication,
starting the replication process, and checking to see if the
Subscriber has changed accordingly.
To check various kinds of data changes, perform INSERT,
UPDATE, and DELETE operations on the data in the publication
database.
1.Change data in the publication database. For example:
UPDATE authors SET au_lname = 'Smith' WHERE au_id = '807-91-6654'
INSERT INTO authors (au_id, au_lname, au_fname, phone, contract)
VALUES ('453-12-3255', 'Smith', 'John', '400 486-234', 1)
DELETE FROM authors WHERE au_id = '672-71-3249'
2.Manually start the replication process.
3.Check to see if the rows changed in the subscription
database. For example:
SELECT * FROM authors
endCode}
Resulting Context
Although this pattern describes a replication between SQL
Server databases, the subscription database can be any
heterogeneous database that provides an ODBC or OLEDB
interface, such as Oracle, DB2, or Microsoft Access.
Note: To use a heterogeneous database as a
publication database, you need a software component that
implements a dedicated interface - the replication
distributor interface. The examples that are delivered
with SQL Server show how to implement this software.
There are two ways to use a heterogeneous database as a
subscription databse:
Use ODBC or OLEDB and create a push subscription using
the wizard on the Publisher. This is the simplest way to
publish data to a heterogeneous Subscriber.
Create a publication and then create an application with
an embedded distribution control. The embedded control
implements the pull subscription from the Subscriber to the
Publisher.
For Subscribers, the subscribing database has no
administrative capabilities regarding the replication being
performed.
A closer look at the architecture of the SQL Server
elements reveals that the implementation corresponds to a
cascading synchronization with two replication links. One
link is from the publication database to the distribution
database, and the other link is from the distribution
database to the subscription database. The distribution
database plays the role of a cascade intermediary
target/source (CITS). It can serve additional replications
links to other targets. The
Master-Slave Cascading Replication pattern
descibes this design and the terms associated with it..
This pattern inherits the benefits and liabilities from
the pattern Master-Slave Transactional Incremental
Replication. Because this pattern actually implements a
cascading replication, it also inherits the benefits and
liabilities of the Master-Slave Cascading Replication
pattern. The use of this pattern also results in the
following additional benefits and liabilities:
Benefits
Integrated logging mechanism. If you track the
transaction log, you do not need a logging mechanism for
changes in the application or database.
Integrated into SQL Server Enterpise Manager. The
configuration and execution of the replication in SQL Server
is integrated in the SQL Server administration, and can be
easily done with the SQL Server Enterprise Manager.
Liabilities
Additional database to be managed. The
implementation requires the distribution database as an
additional database that must be managed.
Testing Considerations
After you set up the replication link as described in
this pattern, you must test it thoroughly. Your test cases
should cover these scenarios and others:
Correct transmission of transactions that update the
replication set
The largest transaction that might occur
The longest disconnection between Publisher and
Subscriber that you expect under production conditions
Network interruption between the Publisher and the
distribution database during transactions on the Publisher,
if you are using a remote distributor. (This does not apply
to the example in this pattern.)
Network interruption between the distribution database
and the Subscriber during a a running transmission
A hard abort of the database system on the Subscriber
during a running transmission
After each test, make sure that the Subscription is in
the correct state.
Also make sure that transactions that do not update the
replication set are not transmitted.
Finally, put the highest expected load on the Publisher
and verify that the replication still runs correctly.
Security Considerations
To secure the connection between the Distributor and the
Subscribers, do one of the following:
Use a SQL Server user account to connect to the
Distributor.
Place all computers in one Microsoft Windows domain, or
have a common user with the same password on all computers.
Then use a trusted Windows connection between Distributor
and Subscribers.
If you prefer a security approach that is independent of
the operating system, you should use SQL Server
authentication.
Note: The SQL Server Agents on all systems
must run on a local or a domain account, not on the
local system account. Otherwise, the replication does
not work.
Operational Considerations
The SQL Server Agent manages the different jobs of the
replication. Schedules for these jobs are defined during the
configuration. Alternatively, you can start each job using
the SQL Server Enterprise Manager.
The Replication Monitor in SQL Server Enterprise Manager
provides the following information:
A list of all publications and attached agents with the
time and the duration of the last execution and information
about the last action
A list of all agents grouped by task with current status
and a history of actions
The event log entries created by Replication Alerts,
which watch the replication process.
If a row was deleted on the subscription database and the
transmission tries to change this row, an error occurs. In
this case, there are two solutions:
Manually restore the row in the subscription database by
using an INSERT statement based on the data from the same
row in the publication database.
Have the replication automatically use a new snapshot to
rebuild the subscription database. This snaphot must be
newer than the first erroneous transmission.
If the replication fails, do the following:
Check to see if the accounts have network access rights.
Check the history of the subscription to determine the
reason the synchronization failed.
Check the Replication Monitor for highlighted
replication failures.
The replication usually runs on a defined schedule. If
you want to test the replication or start the merge
replication immediately, however, you need to start the
replication manually.
Hints: It is common practice to have the
distribution database on the same server as the
Publisher, for example to ensure frequent and rapid
movement of the transactional information from the
transaction log to the distribution database.
If there are many subscribers and a high rate of
operations on the publisher, the Distributor can become
a bottleneck. In this case, you should configure the
Distributor on an additional server to separate the
transmissions from the activity caused by the
applications.
Transferring a large amount of data over slow
conections can be problematic. An alternate solution is
to create the initial snapshot files and transaction
data using removeable media, replicate the database, and
then establish the transactional replication over the
slow network.
Related Patterns
For more information, see the following related patterns:
Patterns that May Have Led You Here
Move Copy of Data. This pattern is the root
pattern of this cluster; it presents the overall
architecture for maintaining copies of data.
Data Replication. This pattern presents the
architecture of data replication, which is a particular way
of moving copies of data.
Master-Slave Replication. This pattern describes
design considerations for transmitting data from the source
to the target by overwriting potential changes in the target
on a higher level than the Master-Slave Snapshot
Replication.
Master-Slave Transactional Incremental Replication.
This pattern presents a solution that transmits only the
changes from the source to the target on a
transaction-by-transaction basis.
Other Patterns of Interest
Master-Slave Snapshot Replication. This
pattern presents the design of snapshot replications, which
this pattern uses to prepare the target.
|