Master-Slave Cascading Replication
Version 1.0.0
Context
You are designing a replication solution for the
following requirements:
A replication set is to be replicated from a single
source to many targets that all require substantially the
same replication data.
The replicated data in the targets is read-only, or if
it is updated at the targets by any applications, it is
accepted that these updates can be overwritten by later
transmissions. This is called a master-slave relationship.
Hence the replication flow is one-way, from the source to
the targets, and neither conflict detection nor conflict resolution is triggered at the targets because of target changes.
Figure 1 summarizes this overall replication scenario.
Figure 1: Overall replication scenario
You know you could design direct replication links from
the source to each target, but the potential impact on the
source, and possibly the source availability, is a concern.
Therefore, you want to find another approach that reduces
this concern and is also an efficient way to replicate this
common replication set to many targets.
Problem
How can you optimize the replication to a set of targets
in a master-slave environment, and minimize the impact on
the source?
Forces
Any of the following compelling forces would
justify using the solution described in this pattern:
Too many passes on the source. Every replication
link that starts from a source requires a pass over the
replication set to acquire it. The resources (for example,
CPU time and I/O activity) needed for the required number of
passes might not be available on the source database server,
or they may cost too much.
Very large replication set. Even with a moderate
number of replication links to the source, the total
overhead on the source database server can become
unsustainable if the amount of data to be transmitted to the
targets is large.
Significant growth in replication needs anticipated.
Concerning both of the preceding forces, you anticipate a
significant growth in the number of targets and amount of
data to be transmitted. Therefore it is important to
implement a replication topology that can sustain the
predicted growth.
Need to offload replication set from source as
quickly as possible. Acquiring data impacts source
resources and you must minimize the duration of the impact.
For example, if you are replicating across a slow
communications link, you may prefer to offload the source
quickly and then replicate to the target from this offloaded
set.
No direct connection between source and target.
Due to your network topology, you might not be able to
directly link the source and target, but you can connect to
a third place.
The following enabling forces facilitate the move
to the solution, and their absence could hinder such a move:
Targets can tolerate the delays implied by
replication. The timeliness with which the data arrives
at any one of the targets depends on the replication link,
which frequently includes a network link. Adding more
replication links from the source to the final target
generally increases the delay until changes made to the
source replication set appear at the target.
Great similarity in the replication sets to be
replicated. The core of this pattern is that all the
replication data comes from the same original source
replication set. Within this fundamental constraint, each
replication link can have its own replication set to be
replicated, which can differ from the replication set of
other replication links. Although the structure differences
between each source/target pair might be fairly small, the
overall differences could be significant.
Data Replication requires that the source and the
target of every replication building block be very similar.
Master-Slave Cascading Replication requires that all databases along the whole chain of replication links be highly similar. Otherwise, an Extract-Transform-Load (ETL) approach would be more useful.
Solution
Increase the number of replication links between the source and the targets by adding one or more intermediary targets between the original source and the end target databases shown in Figure 1. Specifically, this arrangement adds the concept of the cascade intermediary target/source (CITS) to the topology, as Figure 2 shows. These intermediaries are data stores that take a replication set from the source and thus act as the target of a first replication link. They then act as sources that move the data along the next replication link, and so on, until the data reaches the cascade end targets (CETs).
Figure 2: Master-Slave Cascading Replication with a
single intermediate target/source
Figure 2 shows a very simple example of a Master-Slave
Cascading Replication topology. Each Acquire,
Manipulate, and Write (AMW) box in the figure represents a
replication link. For more information about the replication
building block, see the Data Replication pattern.
In general, several CITSs can be connected to the same
source and a CITS can also be connected to several other
CITSs. Regardless of the number of CITSs, Master-Slave Cascading Replication arranges them in a tree with the source as the root, the CITSs as inner nodes, and the CETs as the leaf nodes.
For discussion purposes, it is helpful to define a few
more specific terms for the replication links in a topology:
Initial link. The initial link connects a
source to a CITS.
Intermediary link. The intermediary link
connects a CITS to another CITS.
End link. The end link connects a CITS to
a CET.
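The tree structure and the link terminology just defined can be illustrated with a small model. The following Python sketch is illustrative only; the Node class, its role property, and the classify_link helper are invented names and are not part of any replication product.

# Minimal sketch of a Master-Slave Cascading Replication topology.
# Node and function names are illustrative, not from any product API.

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A database in the topology: the source, a CITS, or a CET."""
    name: str
    parent: Optional["Node"] = None            # upstream node in the chain
    children: List["Node"] = field(default_factory=list)

    def add_target(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child

    @property
    def role(self) -> str:
        if self.parent is None:
            return "source"                     # root of the tree
        return "CET" if not self.children else "CITS"


def classify_link(upstream: Node, downstream: Node) -> str:
    """Name the replication link according to the pattern's terminology."""
    if upstream.role == "source":
        return "initial link"
    if downstream.role == "CET":
        return "end link"
    return "intermediary link"


# Example: one source, one CITS, two CETs (the topology of Figure 2).
source = Node("Source")
cits = source.add_target(Node("CITS 1"))
cet1 = cits.add_target(Node("CET 1"))
cet2 = cits.add_target(Node("CET 2"))

print(classify_link(source, cits))   # initial link
print(classify_link(cits, cet1))     # end link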
The characteristics of the end links are the same as if
the targets were connected to the source directly. This
means that the end links can be configured for full or
incremental replication depending on the requirements, and
that they can start a transmission immediately after every
transaction, periodically, or on demand.
Hint: The addition of CITSs to the replication topology, however, impacts the service level offered to the CETs. The initial and intermediary links must transmit any data or changes early enough for the intermediary or end links that follow them. Thus, it is common
practice to design an immediate replication here. If all
end links only do periodic or on-demand replication, a
periodic replication on the initial and intermediary
links would be sufficient. For these reasons, you should
not design an on-demand replication on an initial or
intermediary link, because the timeliness of some of the CITSs and their corresponding targets would depend on a user or operator starting the transmission.
The choice of the replication frequency also impacts
the choice of the replication refresh policy. If the
initial and intermediary links have been configured for
immediate replication, you will have to use incremental
replication to transmit only the changes. Incremental
replication is also generally the best choice to
transmit changes for periodic replication at the initial
and intermediary links. If the replication sets are
small enough, another option is to use a snapshot
replication on the initial and intermediary replication
links.
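The guidance in this hint can be expressed as a simple configuration check. The following Python sketch is a hypothetical validation helper whose rules restate the hint above; the function and parameter names are invented.

# Hypothetical validation of link configurations against the hint above.
# Link type is initial, intermediary, or end; frequency is immediate,
# periodic, or on-demand; refresh is incremental or snapshot.

def check_link(link_type: str, frequency: str, refresh: str,
               small_replication_set: bool = False) -> list:
    """Return a list of warnings for a proposed link configuration."""
    warnings = []

    if link_type in ("initial", "intermediary"):
        if frequency == "on-demand":
            warnings.append(
                "On-demand replication on an initial or intermediary link "
                "makes downstream timeliness depend on an operator.")
        if frequency == "immediate" and refresh != "incremental":
            warnings.append(
                "Immediate replication should transmit only the changes; "
                "use incremental replication.")
        if refresh == "snapshot" and not small_replication_set:
            warnings.append(
                "Snapshot refresh upstream is only an option for small "
                "replication sets.")

    return warnings


# Example: an intermediary link configured for on-demand snapshots.
for w in check_link("intermediary", "on-demand", "snapshot"):
    print("Warning:", w)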
Next Considerations
To design a Master-Slave Cascading Replication
topology for your environment, you must do the following:
Determine the number of CITSs to use.
Design the replication links from the source to the CITSs and from the CITSs to the CETs.
Determine how much data is required for each CITS.
Define the data structure of the CITSs.
Define the manipulation in each replication link.
The following sections explore these issues.
Number of CITSs
A single CITS removes most of the load from the source
database server because there is only a single replication
link from the source to the CITS. Thus, the only overhead to
the source is that single replication link.
However, if you design just a single CITS, you introduce
two new single points of failure: the CITS and the
additional replication link. An additional CITS helps to
mitigate this effect because you can design an alternative
chain of replication links from the source to each of the
targets. Although the additional replication links that are
now connected to the source cause a slight increase in
replication overhead compared to a single replication link, the overall availability increases because the alternative chain acts as a backup to the standard chain, as Figure 3 shows.
Figure 3: Master-Slave Cascading Replication with an
alternative chain (dotted arrows)
Hint: After you have two (or more) CITSs
connected to the source, you can connect parts of the
CETs to each of them. This achieves some load balancing
on the CITSs because every CITS serves fewer CETs. In
case of a failure, the CETs are served by one of the
remaining CITSs.
The replication links to both CITSs must transmit the
same replication set. Additionally, the CITSs must not be
written to by any process but the replication link from the
source. If one of these conditions fails, the CITSs could
have different data. In that case, they would not be able to
serve as substitutes for each other.
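The load balancing and failover behavior described above can be pictured with a small assignment sketch. The following Python example is hypothetical; the node names and the health map are placeholders, and a real deployment would rely on the administration tools of the replication product. It assumes that both CITSs receive the same replication set and accept no other writes, so that they can substitute for each other.

# Hypothetical sketch: distribute CETs across two equivalent CITSs and
# reassign them when one CITS fails.

def assign_cets(cets, citss, healthy):
    """Round-robin CETs over the healthy CITSs; return {cet: cits}."""
    available = [c for c in citss if healthy.get(c, False)]
    if not available:
        raise RuntimeError("no healthy CITS available")
    return {cet: available[i % len(available)] for i, cet in enumerate(cets)}


cets = ["CET 1", "CET 2", "CET 3", "CET 4"]
citss = ["CITS A", "CITS B"]

# Normal operation: each CITS serves half of the CETs.
print(assign_cets(cets, citss, {"CITS A": True, "CITS B": True}))

# CITS A fails: all CETs are served by the remaining CITS.
print(assign_cets(cets, citss, {"CITS A": False, "CITS B": True}))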
More CITSs can also be added if a single CITS cannot
serve all CETs. Adding CITSs also increases scalability
because you can add new CITSs to accommodate a growing
number of CETs. If the number of CITSs consequently impacts
the source in an unsustainable way, you can even add another
layer of CITSs. This increases the chain length by one
replication link, but again frees the source database server
from the additional load.
Hint: Adding CITSs can also help you optimize
for different replication characteristics because the
CITSs can be structured in different ways. If, for
example, some of the CETs require snapshot replication,
while others require incremental replication, you can
optimize the structure of one of the CITSs for storing change data and the other one for storing the data itself. The CETs requesting the changes will connect to the first CITS, while the CETs requesting snapshots will connect to the CITS that is optimized for the snapshots.
Generally, you should look for clusters of CETs with
similar replication characteristics and then design a
dedicated CITS for each of these clusters. Thereafter,
you can optimize every CITS to best support the
replication links that have similar characteristics.
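As a rough illustration of this clustering step, the following Python sketch groups CETs by their replication characteristics and proposes one dedicated CITS per group. The CET names and requirements are invented for the example.

# Hypothetical sketch: cluster CETs by replication characteristics and
# dedicate one CITS to each cluster, as suggested in the hint above.

from collections import defaultdict

# Invented example data: each CET and the kind of replication it needs.
cet_requirements = {
    "CET 1": "incremental",
    "CET 2": "incremental",
    "CET 3": "snapshot",
    "CET 4": "snapshot",
    "CET 5": "incremental",
}

clusters = defaultdict(list)
for cet, characteristic in cet_requirements.items():
    clusters[characteristic].append(cet)

# One CITS per cluster, optimized for that cluster's characteristic:
# a change store for incremental CETs, a data store for snapshot CETs.
for i, (characteristic, members) in enumerate(sorted(clusters.items()), 1):
    print(f"CITS {i} (optimized for {characteristic}): serves {members}")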
Limiting the Number of Replication Links
You must have at least one chain of replication links
between the source and every CET to transmit the data or its
changes. As described earlier, you can design an alternative
chain of replication links from the source to every target
to achieve higher availability for the whole system. Do not
overdo it by designing too many alternative chains, however,
because the additional replication links increase the load
on the source. It is best to design at most one standard
chain of replication links plus one alternative chain.
Furthermore, designing additional replication links should
be reserved for when you feel that normal data availability
techniques, such as clustering, storage area networks, or
hot standbys, are not suitable.
Amount of Data for Each CITS
The portion of the source replication set that is stored on each CITS must satisfy the requirements of all the CETs connected to it. Thus, the amount of data stored on each CITS is determined by the logical union of the data requested by its CETs and by the type of replication being used.
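The logical union can be illustrated with a small calculation. In the following Python sketch, the table names and CET subscriptions are invented; the point is only that each CITS must hold (or track changes for) at least the union of what its connected CETs request.

# Hypothetical sketch: the replication set a CITS must hold is the union
# of the subsets requested by the CETs connected to it.

cet_subscriptions = {
    "CET 1": {"Customers", "Orders"},
    "CET 2": {"Orders", "OrderDetails"},
    "CET 3": {"Products"},
}

# The CITS must store (or track changes for) at least this union.
cits_replication_set = set().union(*cet_subscriptions.values())
print(sorted(cits_replication_set))
# ['Customers', 'OrderDetails', 'Orders', 'Products']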
Data Structure for Each CITS
To determine the data structure for each CITS, choose one of the following design options:
Matching the data structure of the CITS to the
source. This enables the movement of data from the
source to the CITS without any additional manipulation
overhead. This design is important if the main goal of your
cascading replication is to remove any avoidable load from
the source.
Matching the data structure of the CITS to the CET
superset. In this case, the manipulation is performed
only once, namely within the replication link from the
source to the CITS. The targets can be fed easily by the
contents of the CITS. This provides a higher overall
efficiency with the tradeoff of some impact on the source
that could have been avoided.
Designing a data structure that differs from both the
source and the CETs. If all replication links to the
CETs perform incremental replication, the CITSs do not have to store the data itself, only the changes. In this case, the data
structure of the CITSs can be designed for the storage of
changes only.
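For the third option, the CITS stores change records rather than the data itself. The following Python sketch shows one minimal, hypothetical shape for such a change record; real replication products define their own change-tracking structures.

# Hypothetical sketch of a changes-only store for a CITS, as in the third
# design option: the CITS keeps change records, not the data itself.

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class ChangeRecord:
    sequence: int                 # ordering of changes from the source
    table: str                    # affected table in the replication set
    operation: str                # "insert", "update", or "delete"
    key: Dict[str, Any]           # primary key of the affected row
    values: Dict[str, Any]        # new column values (empty for delete)
    captured_at: datetime         # when the change was acquired


change_log: List[ChangeRecord] = [
    ChangeRecord(1, "Orders", "insert", {"OrderID": 1001},
                 {"CustomerID": 42, "Status": "open"},
                 datetime.now(timezone.utc)),
    ChangeRecord(2, "Orders", "update", {"OrderID": 1001},
                 {"Status": "shipped"}, datetime.now(timezone.utc)),
]

# An end link performing incremental replication reads the log in order.
for change in change_log:
    print(change.sequence, change.operation, change.table, change.key)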
Examples
The following examples present two possible
configurations of Master-Slave Cascading Replication.
Different Lengths of Replication Chains
This first example assumes that you have a single source
and a large number of CETs. A small number of the CETs
receive snapshots, while the others are served by
incremental replication. The snapshot replication is
transmitted by way of a single CITS. The number of CETs
served by incremental replication is too large to be served
by a single CITS, however. To minimize the impact on the
source, you could design two levels of CITSs for the incremental replication, with only a single first-level CITS connected directly to the source. Figure 4 shows
the resulting replication topology where thick arrows
represent replication links with snapshot replication and
thin arrows represent replication links with incremental
replication.
Figure 4: Master-Slave Cascading Replication topology
with different chain lengths
Two Sources and Conflict Detection and Resolution
Figure 5 shows a replication topology in which a CET
participates in two master-slave cascading replications.
Figure 5: Master-Slave Cascading Replication from two
sources
If the replication sets of Source 1 and Source 2 do not
intersect, then replication from Source 1 by way of CITS 1
to CET 2 always affects different records than those from
Source 2 by way of CITS 2. Thus, no special attention is
required in CET 2 to handle both replication chains.
However, if the replication sets of Source 1 and Source 2
do intersect, the same CET 2 record can be affected by both
the replication from Source 1 through CITS 1 and Source 2
through CITS 2. Resolving the discrepancy requires the
ability to detect and resolve conflicts in CET 2. The same
applies if two or more sources feed the same CITS.
Note: The conflict detection and resolution is
not triggered by updates having occurred at the target,
which is why this is not a master-master pattern. In
this case, the trigger is that different updates
occurred at two sources. However, the concepts described in Master-Master Replication still apply to solving this problem.
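When the replication sets intersect, CET 2 needs a deterministic rule for records touched by both chains. The following Python sketch shows one common policy, last writer wins based on the timestamp of the change at its originating source; the record layout is invented, and other policies from Master-Master Replication, such as source priority, could be used instead.

# Hypothetical sketch of conflict resolution at CET 2 when the same row
# arrives from two sources. Policy shown: last writer wins, based on the
# timestamp of the change at its originating source.

def apply_change(current_rows, change):
    """Apply a change to CET 2, keeping the most recent source update."""
    key = change["key"]
    existing = current_rows.get(key)
    if existing is None or change["source_time"] >= existing["source_time"]:
        current_rows[key] = change          # incoming change wins
    # else: the row already holds a newer update; the incoming one loses


rows = {}
apply_change(rows, {"key": 7, "source": "Source 1",
                    "source_time": 10, "value": "A"})
apply_change(rows, {"key": 7, "source": "Source 2",
                    "source_time": 12, "value": "B"})
apply_change(rows, {"key": 7, "source": "Source 1",
                    "source_time": 11, "value": "C"})   # older, ignored

print(rows[7]["source"], rows[7]["value"])   # Source 2 B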
Resulting Context
This pattern inherits the benefits and liabilities from
the Data Replication pattern and has the following
additional benefits and liabilities:
Benefits
Source is freed from most of the replication load.
This is the most important benefit of a Master-Slave
Cascading Replication. Only the first replication link
adds load to the source. The remaining replication links do
not burden the source. The CITS generally should not serve
any applications so that conflicting operational demands
between applications and replication services can be
avoided.
CETs can be relatively autonomous. Using a CET is
a good way to provide data to other organizations because
you can pass raw data on to the organizations and they can
use the data however they want. Because you cannot force
another organization to pull the data frequently, though,
this could impact your source database system (for example,
if the organization connected to your database directly).
Master-Slave Cascading Replication liberates the source
from this impact; a CITS is more appropriate to handle the
impact because it does not serve any applications.
Adding more targets does not impact the source.
As your business requires more CETs, you can add them without
overburdening the source.
Liabilities
Increased latency. Because the chains from the
source to the targets are longer compared to direct
replication, the delays in getting the replication set to
the CETs can increase. Most implementations of this pattern
use an immediate replication on the replication links to
minimize this liability.
Potential for decreased availability. The longer
chains from the source to the targets have an impact on the
overall availability as well. As the number of links in the
chain increases, the opportunity for failures increases. You
can address this liability by adding a second CITS and
alternative chains in case of failures. A second CITS also
offers the opportunity for load balancing by connecting half
of the targets to each of the CITSs.
Additional administration and management. Master-Slave Cascading Replication adds databases and replication links that must be administered and managed. The whole replication environment should be controlled by management tools that automatically monitor the ongoing operation.
Extra storage cost. The CITS will add storage
requirements to the overall environment.
Additional change management. Structural changes
to the source or the CETs require more attention because the CITSs have to be adjusted appropriately. You should carefully plan and design the changes on all affected databases.
Operational Considerations
When you apply Master-Slave Cascading Replication, most of the replication overhead is shifted to the CITSs.
Hence, it is common practice that the CITSs do not serve any
applications. Instead, the applications are connected to the
source and the targets only. All applications requiring
write access to the database must be connected to the
source.
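This operational rule can be summarized by a simple connection-routing helper such as the hypothetical Python sketch below: writes go to the source, reads go to one of the CETs, and the CITSs are never handed out to applications.

# Hypothetical sketch of the connection rule above: applications write to
# the source, read from a CET, and never connect to a CITS.

import random

SOURCE = "source-db"
CITSS = ["cits-1", "cits-2"]          # reserved for replication only
CETS = ["cet-1", "cet-2", "cet-3"]


def connection_for(needs_write: bool) -> str:
    """Return the database an application should connect to."""
    if needs_write:
        return SOURCE                  # all writes go to the source
    return random.choice(CETS)         # reads are spread over the targets


print(connection_for(needs_write=True))    # source-db
print(connection_for(needs_write=False))   # one of the CETs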
Related Patterns
For more information, see the following related patterns:
Patterns That May Have Led You Here
Move Copy of Data. This is the root
pattern of this cluster. It presents the fundamental data
movement building block that consists of source, data
movement set, data movement link, and target. Transmissions
in such a data movement building block are done
asynchronously (or eventually) after the update of the
source. Thus, the target applications must tolerate a
certain amount of latency until changes are delivered.
Data Replication. This pattern presents
the architecture of a replication.
Master-Slave Replication. This pattern
presents the solution for a replication where the changes
are replicated to the target without taking changes of the
target into account. It will eventually overwrite any
changes on the target.
Patterns That You Can Use Next
Implementing Master-Slave Transactional Incremental
Replication Using SQL Server.
Other Patterns of Interest
Master-Slave Snapshot Replication. This
pattern presents a solution that transmits the whole
replication set from the source to the target on each
transmission.
Master-Slave Transactional Incremental Replication.
This pattern presents a solution that transmits only the
changes from the source to the target on a
transaction-by-transaction basis.