Directions for using this template:
Read the Guidance (Arial blue font in brackets) to understand the information that should be placed in each section of this template. Then delete the Guidance and replace the placeholder <<Begin text here>> with your response. Some documents contain additional Guidance in the Appendix; delete it as well once it has been used.
Some templates have four levels of headings. They are not indented, but they can be differentiated by font type and size.
You may elect to indent sections for readability.
Author |
Author Position |
Date |
© 2002 Microsoft Corporation. All rights reserved.
The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.
This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.
Microsoft and Visual Basic are either registered trademarks or trademarks of Microsoft in the United States and/or other countries.
Change Record
Date | Author | Version | Change Reference |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
Reviewers
Name | Version Approved | Position | Date |
| | | |
| | | |
| | | |
| | | |
Distribution
Name | Position |
| |
| |
| |
| |
Document Properties
Item | Details |
Document Title | Backup and Recovery Plan |
Author | |
Creation Date | |
Last Updated | |
[Introduction to the Template
Description:
The Backup and Recovery Plan presents the aspects of the solution relevant to backup and recovery, identifies and describes weaknesses in the system, and describes backup methods and recovery steps. The plan should cover several scenarios, accounting for different types of failure. It could include steps for replacing hardware; rebuilding, modifying, or replacing the operating system and applications; restoring data; or bringing hot backup systems online to stand in for a failed solution.
Justification:
This plan is a key component of the solution. Having the plan in place ensures that comprehensive backup and recovery steps are included in the deployment process. The result is a solution that meets its availability requirements even when something fails. The plan also prevents failures from compounding when they do occur. Continuous service from the solution increases customer satisfaction and confidence in that solution.
{Team Role Primary: Release Management is responsible for developing the Backup and Recovery Plan. Development also plays a primary role in creating the plan’s content to ensure the feasibility of the technical implementation. Program Management will incorporate the Backup and Recovery Plan into the Master Project Plan.
Team Role Secondary: All team roles are responsible for reviewing the plan’s content to ensure that its execution is feasible.}]
[Description:
Provide an overall summary of the contents of this document.
Justification:
Some project participants may need to know only the plan’s highlights, and a summary provides that view. It also lets readers of the full document grasp its essence before they examine the details.]
<<Begin text here>>
[Description:
The Objectives section defines the objectives of the backup and recovery process. These objectives should be derived from information about the current operational environment as well as from the business requirements and functional specifications. One consistent objective critical to the customer is reliable solution operation with a minimum of downtime.
Justification:
Identifying the objectives signals to the customer that Microsoft has carefully considered the present operational situation, the business requirements, and the solution, and has created an appropriate backup and recovery approach.]
<<Begin text here>>
[Description:
The Description of Solution section presents key aspects of the solution that are relevant to the backup and recovery process.
Justification:
These solution aspects will drive the development of a viable backup and recovery plan.]
[Description:
The Recovery Response Time section defines, for each type of solution failure, the estimated time (minimum, average, and maximum) to recover and resume operations.]
<<Begin text here>>
[Description:
Critical solution components without redundancy constitute single points of failure; that is, their failure or degradation causes the solution to fail or to become degraded. The Single Points of Failure section identifies the solution components (hardware, operating system, applications, infrastructure, procedures, people) that are single points of failure.]
<<Begin text here>>
[Description:
Latency is the hidden and often unpredictable time between the failure of a critical solution component (or of an entire solution) and the point at which its effect on other components or systems is recognized. The Latency section defines, for each type of failure, the other components and systems that may be affected, describes the effect, and estimates the range of latency times.]
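To make the latency estimates concrete, the ranges could be derived from past incident records. The Python sketch below is illustrative only: the record format (failure type, time of failure, time of recognition, in minutes) and the `latency_ranges` function are hypothetical and not part of any Microsoft tool. It groups incidents by failure type and reports the minimum, average, and maximum latency for each.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical incident record: (failure_type, failed_at, recognized_at),
# with both times expressed in minutes.
Incident = Tuple[str, float, float]


def latency_ranges(
    incidents: Iterable[Incident],
) -> Dict[str, Tuple[float, float, float]]:
    """Return (minimum, average, maximum) latency for each failure type."""
    by_type: Dict[str, List[float]] = defaultdict(list)
    for failure_type, failed_at, recognized_at in incidents:
        if recognized_at < failed_at:
            raise ValueError("a failure cannot be recognized before it occurs")
        by_type[failure_type].append(recognized_at - failed_at)
    return {t: (min(v), mean(v), max(v)) for t, v in by_type.items()}


if __name__ == "__main__":
    # Illustrative records only, not measured data.
    records = [("disk", 0, 5), ("disk", 10, 25), ("network", 0, 2)]
    for failure_type, (lo, avg, hi) in latency_ranges(records).items():
        print(f"{failure_type}: min={lo} avg={avg} max={hi}")
```

The same kind of summary could be applied to recorded recovery durations when estimating recovery response times.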
<<Begin text here>>
[Description:
When critical solution components (hardware power supplies, CPUs, data storage devices, key people) fail or become degraded, solution failures can be avoided or minimized by providing redundant copies of these components that can be brought online quickly or that operate in parallel with their counterparts. The System Redundancy section identifies the critical solution components for which the solution provides redundancy and describes how the redundant components will be brought online.]
<<Begin text here>>
[Description:
The Data Integrity section describes the methods the solution will use to maintain data integrity, such as queuing or real-time backup. Data integrity is especially important when the solution records online transactions or has elements that rely on data representing a snapshot of an earlier day's processing.
Justification:
Data integrity must be planned for to prevent data loss or corruption, which could significantly disrupt the solution and affect its users and, potentially, the business.]
<<Begin text here>>
[Description:
The Business Cost While Systems Are Down section estimates, for various periods of downtime, the cost to the business of the solution being unavailable because of failure, preventive maintenance, or other reasons.]
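One simple way to tabulate the cost by period is a linear model: a fixed cost per outage plus a constant cost per hour of unavailability. The sketch below assumes that model; the `downtime_cost_table` function and every figure in it are hypothetical, and real costs are often nonlinear (for example, contractual penalties that apply only after a service-level threshold is exceeded).

```python
from typing import Dict


def downtime_cost_table(
    hourly_cost: float,
    periods_hours: Dict[str, float],
    fixed_cost: float = 0.0,
) -> Dict[str, float]:
    """Estimate the business cost of an outage for each downtime period.

    Assumes a hypothetical linear model: a one-time fixed cost per outage
    plus a constant cost for each hour the solution is unavailable.
    """
    if hourly_cost < 0 or fixed_cost < 0:
        raise ValueError("costs must be non-negative")
    table: Dict[str, float] = {}
    for label, hours in periods_hours.items():
        if hours < 0:
            raise ValueError(f"negative duration for {label!r}")
        table[label] = fixed_cost + hourly_cost * hours
    return table


if __name__ == "__main__":
    # Hypothetical figures for illustration only (business hours assumed).
    periods = {"1 hour": 1, "4 hours": 4, "1 business day": 8, "1 week": 40}
    for label, cost in downtime_cost_table(2_500.0, periods, fixed_cost=1_000.0).items():
        print(f"{label:>15}: {cost:,.2f}")
```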
<<Begin text here>>
[Description:
The Backup and Recovery Methods section describes the methods planned to back up the hardware, operating system(s), applications, infrastructure, resources, and data that make up the solution. For each of these classes of solution component, the description should include the type of backup, the location of backups, the backup procedures, and the backup responsibilities. For each backup method, describe the procedures for using the backup to restart the solution and recover both the state of its operations and the solution data.]
[Description:
At predetermined checkpoints (after key events or time periods), a solution may back up (store) a snapshot of its operational state and of the information it has processed. Restoring the solution state and information from backup media (e.g., tape) allows past information to be reconstructed and the solution to resume operation with a minimum of lost data and time. The Restore from Backup Media section identifies the solution checkpoints and the procedures for using the backed-up solution status information to recover from solution failures or degradation.]
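The checkpoint idea above can be sketched in Python. The in-memory `CheckpointStore` class is hypothetical and merely stands in for real backup media; it records snapshots in time order and, on recovery, returns a copy of the last snapshot taken at or before the time of failure.

```python
import copy
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class CheckpointStore:
    """Hypothetical in-memory stand-in for backup media such as tape."""

    times: List[float] = field(default_factory=list)
    snapshots: List[Dict[str, Any]] = field(default_factory=list)

    def checkpoint(self, taken_at: float, state: Dict[str, Any]) -> None:
        """Store a snapshot of the operational state at a checkpoint."""
        if self.times and taken_at < self.times[-1]:
            raise ValueError("checkpoints must be recorded in time order")
        self.times.append(taken_at)
        # Deep copy so later changes to the live state do not alter the backup.
        self.snapshots.append(copy.deepcopy(state))

    def restore(self, failure_at: float) -> Optional[Dict[str, Any]]:
        """Return a copy of the last snapshot taken at or before failure_at."""
        i = bisect_right(self.times, failure_at)
        if i == 0:
            return None  # no checkpoint precedes the failure
        return copy.deepcopy(self.snapshots[i - 1])


if __name__ == "__main__":
    store = CheckpointStore()
    live = {"orders_processed": 5}
    store.checkpoint(10, live)
    live["orders_processed"] = 9
    store.checkpoint(20, live)
    # A failure at time 15 rolls back to the checkpoint taken at time 10.
    print(store.restore(15))
```

Any work done between the restored checkpoint and the failure is lost unless it can be reconstructed by other means, such as replaying log files.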
<<Begin text here>>
[Description:
Operations personnel and operating systems maintain logs (log files) of solution events and their times of occurrence. Replaying log files often enables past information to be reconstructed. The Replay Log Files section describes the log files that operations will maintain, the procedures used to record events and their times in the logs, and the procedures used to reconstruct solution information from the log files.]
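A minimal Python sketch of log replay, assuming a hypothetical log format in which each record is (timestamp, key, new value) and each event simply overwrites a value. Starting from a snapshot restored at a checkpoint, the hypothetical `replay_log` function reapplies only the events logged after that checkpoint and up to a stop time, such as the moment of failure.

```python
from typing import Any, Dict, Iterable, Tuple

# Hypothetical log record: (timestamp, key, new_value).
LogEntry = Tuple[float, str, Any]


def replay_log(
    snapshot: Dict[str, Any],
    checkpoint_at: float,
    log: Iterable[LogEntry],
    stop_at: float,
) -> Dict[str, Any]:
    """Rebuild state by reapplying logged events on top of a snapshot."""
    state = dict(snapshot)  # leave the restored snapshot itself untouched
    # Events at or before the checkpoint are already in the snapshot, and
    # events after stop_at (for example, the failure) are not replayed.
    pending = sorted(
        (entry for entry in log if checkpoint_at < entry[0] <= stop_at),
        key=lambda entry: entry[0],  # stable sort keeps log order for ties
    )
    for _, key, value in pending:
        state[key] = value
    return state


if __name__ == "__main__":
    snapshot = {"balance": 100}  # restored from the checkpoint at time 10
    log = [(5, "balance", 80), (12, "balance", 120), (18, "balance", 90)]
    print(replay_log(snapshot, checkpoint_at=10, log=log, stop_at=15))
```

Real transaction logs usually record operations rather than final values, so replay must apply them in their original order; the overwrite-only format here is a simplifying assumption.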
<<Begin text here>>
[Description:
A fail-over system (one or more redundant systems operating in parallel with a primary system) minimizes data loss and can be used to reconstruct the data on the primary system. The Fail Over section identifies and describes the fail-over systems, the procedures for keeping them current with the primary system and for starting their operation, and the procedures for reconstructing lost or corrupted data.]
<<Begin text here>>
[Description:
The Recovery Steps section describes the detailed procedures (with steps and decisions) for restarting solution operations and restoring solution data to the state recorded at the last checkpoint before the failure.]
[Description:
The Restoring Service from Backup Systems section describes how service will be restored by using standby (backup) systems. This can consist of a "hot standby" with automated fail-over, or of replacing the failed system with a spare system that is already configured for use.]
<<Begin text here>>
[Description:
The Hot Stand By section describes the hot standby systems that are ready for use when needed.]
<<Begin text here>>
[Description:
The Spare Systems section describes the spare systems, identifies where they are located, and details the steps required to bring up the solution on a spare system.]
<<Begin text here>>
[Description:
The System Recovery section describes how system recovery occurs.]
<<Begin text here>>
[Description:
The Data Recovery section defines how data will be recovered. The requirements for data recovery depend primarily on the application.]
<<Begin text here>>