<<Project Name>>

Backup and Recovery Plan

Customer Name

Directions for using template:

Read the Guidance (Arial blue font in brackets) to understand the information that should be placed in each section of this template. Then delete the Guidance and replace the placeholder within <<Begin text here>> with your response. There may be additional Guidance in the Appendix of some documents, which should also be deleted once it has been used.

Some templates have four levels of headings. They are not indented, but can be differentiated by font type and size:

Heading 1 – Arial Bold 16 font
Heading 2 – Arial Bold Italic 14 font
Heading 3 – Arial Bold 13 font
Heading 4 – Arial Bold Italic 12 font

You may elect to indent sections for readability.

Author
Author Position
Date

Version: 1.0

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.

Microsoft and Visual Basic are either registered trademarks or trademarks of Microsoft in the United States and/or other countries.

Revision & Sign-off Sheet

Change Record

Date	Author	Version	Change Reference

Reviewers

Name	Version Approved	Position	Date

Distribution

Name	Position

Document Properties

Item	Details
Document Title	Backup and Recovery Plan
Author
Creation Date
Last Updated

[Description: The Backup and Recovery Plan presents the aspects of the solution relevant to backup and recovery, identifies and describes weaknesses in the system, and describes backup methods and recovery steps. The plan should cover several scenarios that account for different types of failure. These could include steps for replacing hardware; rebuilding, modifying, or replacing the operating system and applications; restoring data; or using hot backup systems that stand in for a failed solution.

Justification: This plan is a key component of the solution. Having the plan in place ensures that comprehensive backup and recovery steps will be included in the deployment process. This leads to a solution that meets its availability requirements even if something does fail. It also prevents the compounding of failures when they do occur. Continuous service by the solution will increase customer satisfaction and confidence in that solution.

{Team Role Primary: Release Management is responsible for developing the Backup and Recovery Plan. Development also plays a primary role in creating the plan’s content to ensure the feasibility of the technical implementation. Program Management will incorporate the Backup and Recovery Plan into the Master Project Plan.

Team Role Secondary: All team roles are responsible for reviewing the plan’s content to ensure its execution is feasible.}]

Summary

[Justification: Some project participants may need to know only the plan’s highlights, and a summary gives them that view. It also lets readers of the full document grasp its essence before they examine the details.]

Objectives

[Description: The Objectives section defines the objectives of the backup and recovery process. These objectives should be derived from the current operational environment as well as from the business requirements and functional specifications. One consistent objective critical to the customer is to ensure reliable solution operations with a minimum of downtime.

Justification: Identifying the objectives signals to the customer that Microsoft has carefully considered the present operational situation, the business requirements, and the solution and created an appropriate backup and recovery approach.]

Description of Solution

[Description: The Description of Solution section presents key aspects of the solution that are relevant to the backup and recovery process.

Justification: These solution aspects will drive the development of a viable backup and recovery plan.]

Recovery Response Time

[Description: The Recovery Response Time section defines, for each type of solution failure, the estimated time (minimum, average, maximum) to recover and resume operations.]
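As an illustration of how the minimum, average, and maximum recovery times might be tabulated for each failure type, the following Python sketch summarizes a set of estimates. The failure types, the durations, and the function name summarize_recovery_times are hypothetical placeholders chosen for this example, not values taken from any real solution.

```python
from statistics import mean

# Hypothetical recovery-time estimates in minutes, keyed by failure type.
# Replace these placeholder values with estimates for the actual solution.
recovery_estimates = {
    "disk failure": [30, 45, 90],
    "operating system corruption": [120, 180, 240],
}


def summarize_recovery_times(estimates):
    """Return {failure_type: (minimum, average, maximum)} in minutes."""
    return {
        failure: (min(times), mean(times), max(times))
        for failure, times in estimates.items()
    }


summary = summarize_recovery_times(recovery_estimates)
for failure, (shortest, average, longest) in summary.items():
    print(f"{failure}: min={shortest} avg={average:.0f} max={longest}")
```

The same table can be extended with one row per failure type identified elsewhere in the plan.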

Single Points of Failure

[Description: Critical solution components without redundancy constitute single points of failure; that is, their failure or degradation causes the solution to fail or to become degraded. The Single Points of Failure section identifies the solution components (hardware, operating system, applications, infrastructure, procedures, people) that are single points of failure.]
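One simple way to start this analysis is to list every critical component with the number of independent instances available and flag those with no spare. The sketch below assumes a hypothetical inventory; the component names and counts are illustrative only, and a real analysis would also need to consider shared dependencies between components.

```python
# Hypothetical inventory: component name -> number of independent instances.
inventory = {
    "web server": 3,
    "database server": 1,
    "power supply": 2,
    "backup operator": 1,
}


def single_points_of_failure(components):
    """Return the sorted names of components that have no redundant instance."""
    return sorted(name for name, count in components.items() if count <= 1)


spofs = single_points_of_failure(inventory)
print(spofs)
```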

Latency

[Description: Latency is the hidden and often unpredictable time from the occurrence of a failure (of a critical solution component or an entire solution) to the point where its effect on other components or systems has been recognized. The Latency section defines, for each type of failure, the other components and systems that may be affected, describes the effect, and estimates the range of latency times.]

System Redundancy

[Description: When critical solution components (hardware power supplies, CPUs, data storage devices, key people) fail or become degraded, solution failures can be avoided or minimized by providing redundant copies of these components that can be brought online quickly or that operate in parallel with their counterparts. The System Redundancy section identifies the critical solution components for which the solution provides redundancy and describes how the redundant components will be brought online.]

Data Integrity

[Description: The Data Integrity section describes the methods the solution will use to preserve data integrity, such as queuing or real-time backup. Data integrity becomes especially important where the solution uses systems that record online transactions or has elements that rely on data representing a snapshot from an earlier day's processing.

Justification: Data integrity must be planned for to prevent data loss or corruption that may result in significant disruption in the solution, thus impacting the users and potentially the business.]

Business Cost While Systems Are Down

[Description: The Business Cost While Systems Are Down section estimates, by period of time, the cost to the business of the solution being unavailable because of failure, preventative maintenance, or other reasons.]
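A first-pass estimate can be produced by multiplying an hourly cost of unavailability by the length of each outage period. The sketch below assumes that cost accrues linearly with time, which is a simplification; the hourly figure and the periods are hypothetical and should be replaced with values supplied by the business.

```python
# Hypothetical hourly cost of an outage; an assumption for illustration only.
HOURLY_COST = 2_000.0


def downtime_cost(hours, hourly_cost=HOURLY_COST):
    """Estimate outage cost, assuming cost accrues linearly with time."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return hours * hourly_cost


# Outage periods, in hours, for which the plan reports an estimated cost.
periods_hours = [1, 4, 24]
cost_table = {hours: downtime_cost(hours) for hours in periods_hours}

for hours, cost in cost_table.items():
    print(f"{hours:>3} h down: {cost:,.2f}")
```

Where costs escalate over time (for example, contractual penalties after a threshold), a per-period rate table would be a better fit than a single hourly figure.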

Backup and Recovery Methods

[Description: The Backup and Recovery Methods section describes the methods planned to back up the hardware, operating system(s), applications, infrastructure, resources, and data that comprise the solution. For each of these solution component classes, the description should include the type of backup, the location of backups, the backup procedures, and the backup responsibilities. For each backup method, describe the procedures for using the backup to restart the solution and recover the state of its operations and the solution data.]

Restore from Backup Media

[Description: At predetermined checkpoints (after key events or time periods), a solution may back up (store) a snapshot of its operational state and the information it has processed. Restoring the solution state and information from backup media (e.g., tape) enables past information to be reconstructed and the solution to resume operation with a minimum of lost data and time. The Restore from Backup Media section identifies solution checkpoints and the procedures for using the backed-up solution state to recover from solution failures or degradation.]

Replay Log Files

[Description: Operations personnel and operating systems maintain logs (log files) of solution events and their time of occurrence. Replaying log files often enables past information to be reconstructed. The Replay Log Files section describes the log files that operations will maintain, the procedures used to record events and time in the logs, and the procedures employed to reconstruct solution information from the log files.]
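The idea of reconstructing information by replaying a log can be sketched as follows. The example assumes a deliberately simple, hypothetical log format of (timestamp, key, value) entries applied on top of a snapshot; real log formats and replay procedures depend on the applications and database systems in the solution.

```python
def replay(snapshot, log_entries, until=None):
    """Rebuild state by applying logged events to a snapshot in time order.

    Entries are (timestamp, key, value) tuples. Entries with a timestamp
    later than `until` are ignored when `until` is given.
    """
    state = dict(snapshot)  # Copy so the original snapshot is not modified.
    for timestamp, key, value in sorted(log_entries, key=lambda e: e[0]):
        if until is not None and timestamp > until:
            break
        state[key] = value
    return state


# Hypothetical snapshot and log, for illustration only.
snapshot = {"balance": 100}
log = [(3, "balance", 90), (1, "balance", 120), (2, "status", "open")]

recovered = replay(snapshot, log, until=2)
print(recovered)
```

Stopping the replay at a chosen timestamp allows recovery to a point just before a known corruption event.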

Fail Over

[Description: The use of a fail-over system (redundant system[s] operating in parallel with a primary system) limits data loss to a minimal amount and allows the data on the primary system to be reconstructed. The Fail Over section identifies and describes fail-over systems, the procedures for keeping fail-over systems current with the primary system and for starting up their operations, and the procedures for reconstructing lost or corrupted data.]
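The selection step in a fail-over procedure can be sketched as choosing the first healthy system from a priority-ordered list. The system names and the health check below are hypothetical stand-ins; an actual deployment would rely on the monitoring and clustering facilities of its platform rather than on this simplified logic.

```python
def choose_active(systems, is_healthy):
    """Return the first healthy system in priority order, or None if none is."""
    for system in systems:
        if is_healthy(system):
            return system
    return None


# Hypothetical systems listed in priority order, with a simulated health state.
systems = ["primary", "standby-1", "standby-2"]
health = {"primary": False, "standby-1": True, "standby-2": True}

active = choose_active(systems, lambda name: health[name])
print(active)
```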

Recovery Steps

[Description: The Recovery Steps section describes the detailed procedures (with steps and decisions) for restarting solution operations and restoring solution data to the state recorded at the closest checkpoint prior to failure.]
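Locating the closest checkpoint prior to a failure can be sketched as a search over the recorded checkpoint times. The function name closest_checkpoint_before and the timestamps below are hypothetical; the sketch treats a checkpoint taken exactly at the failure time as usable, which is an assumption to confirm for the actual solution.

```python
from bisect import bisect_right
from datetime import datetime


def closest_checkpoint_before(checkpoints, failure_time):
    """Return the latest checkpoint at or before failure_time, or None."""
    ordered = sorted(checkpoints)
    index = bisect_right(ordered, failure_time)
    return ordered[index - 1] if index else None


# Hypothetical checkpoint schedule and failure time, for illustration only.
checkpoints = [
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 6, 0),
    datetime(2024, 1, 1, 12, 0),
]
failure = datetime(2024, 1, 1, 9, 30)

restore_point = closest_checkpoint_before(checkpoints, failure)
print(restore_point)
```

If no checkpoint precedes the failure, the function returns None, and the recovery procedure would need a documented fallback such as a full rebuild.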

Restoring Service from Backup Systems

[Description: The Restoring Service from Backup Systems section describes how service will be restored by using standby (backup) systems. This can consist of having a "hot standby" with automated fail over or of swapping the failed system with spare systems already configured for use.]

Hot Stand By

[Description: The Hot Stand By section describes the hot standby systems ready for use when needed.]

Spare Systems

[Description: The Spare Systems section describes the spare systems, identifies where they are located, and details the steps required to bring up the solution on a spare system.]

System Recovery

[Description: The System Recovery section describes how system recovery occurs.]

Data Recovery

[Description: The Data Recovery section defines how data will be recovered. The requirements for data recovery are primarily dependent on the application: