Module 2-4 Disaster/Contingency Planning PDF
Summary
This document covers disaster/contingency/operational/crisis planning, including disaster planning and recovery, backup/restore processes, offsite storage, and alternate power. It also discusses what a disaster is and various types of disasters to consider.
Full Transcript
**Module 2-4. Disaster/Contingency/Operational/Crisis**

OBJECTIVE 4a. Identify basic facts and terms about disaster, contingency, operational, and crisis situations.

- Disaster Planning and Recovery
- Backup/Restore Process
- Offsite Storage
- Alternate Power

**DISASTER PLANNING AND RECOVERY**

Like every large network, AF networks are vulnerable to a variety of disruptions ranging from mild (e.g., short-term power outage, disk drive failure) to severe (e.g., equipment destruction, fire), arising from sources that range from natural disasters to terrorist actions. Vulnerabilities can be minimized or eliminated through technical, management, or operational solutions as part of the organization's risk management effort. However, it is virtually impossible to eliminate all risks. In many cases, critical resources may reside outside the organization's control (e.g., electric power or telecommunications), and the organization may be unable to ensure their availability. Thus, effective planning, execution, and testing are essential to mitigate the risk of system and service unavailability.

**What Is a Disaster?**

The textbook definition of a disaster is "a sudden, unplanned event that causes great damage and loss to an organization." The time factor determines whether an interruption in IT service delivery is an inconvenience or a disaster, and the tolerable length of an interruption varies from organization to organization.

What does the face of disaster look like? What types of disasters should you consider? The list below is by no means complete, but it should give you an appreciation of the types of disaster you might wish to evaluate.
- Air Conditioning Failure
- Blackout
- Blizzard
- Bomb Threat
- Chemical Spill
- Civil Unrest
- Communications Failure
- Computer Crime
- Disgruntled Employee
- Earthquake
- Explosion
- Fire
- Flood
- Hardware Failure
- Heat
- Human Error
- Hurricane
- Ice Storm
- Labor Dispute
- Lightning Strike
- Malicious Activity
- Mud Slide
- Plane Crash
- Power Outage
- Sabotage
- Sewage Backup
- Software Failure
- Sprinkler Failure
- Tornado
- Train Derailment
- Vandalism
- Virus
- Water

Time is of the essence when recovering your company's lost data. While IT is busy recovering your company data, your customers might be busy contacting other suppliers. Just-in-time manufacturing, distribution, and electronic commerce have all put a premium on systems availability and access to corporate data.

**What Is Disaster Recovery?**

Disaster recovery is your IT response to a sudden, unplanned event. It enables your organization to continue critical business functions until normal IT-related services can resume. Disaster recovery must address the continuation of critical business operations. A major incorrect assumption in our industry is that disaster recovery can be fully realized simply by prearranging hardware replacement with your business partner or channel distributor.

**Planning for Disaster**

Continuity and contingency planning are critical components of emergency management and organizational resilience. The two terms are often confused.

**Continuity Planning**

Normally applies to the mission/business itself. It is concerned with the ability to continue critical functions and processes during and after an emergency event.

**Contingency Planning**

Normally applies to information systems and provides the steps needed to recover the operation of all or part of designated information systems at an existing or new location in an emergency.

In general, universally accepted definitions for information system contingency planning and the related planning areas have not been available.
Occasionally, this leads to confusion regarding the actual scope and purpose of various types of plans. To provide a common basis of understanding regarding information system contingency planning, Table 2-4 identifies several other types of plans and describes their purpose and scope. Because of the lack of standard definitions for these types of plans, the scope of actual plans developed by organizations may vary from the descriptions below. This guide applies the descriptions and references in the sections below to security and emergency management-related plans.

**Continuity of Operations (COOP)**

COOP plans are mandated for organizations by the National Security Presidential Directives and by the Federal Executive Branch National Continuity Program and Requirements. These Federal Directives distinguish COOP plans as a specific type of plan, not to be confused with ISCPs, DRPs, or BCPs.

The COOP is one of the most important contingency plans. It focuses on restoring an organization's mission essential functions at an alternate site and performing those functions for up to 30 days before returning to normal operations. Additional functions, or those at a field office level, may be addressed by a BCP. Minor threats or disruptions that do not require relocation to an alternate site are typically not addressed in a COOP plan.

Standard elements of a COOP plan include:

- Program plans and procedures
- Continuity communications
- Risk management
- Vital records management
- Continuity facilities
- Human capital
- Essential functions
- Test, training, and exercise
- Order of succession
- Devolution
- Delegation of authority
- Reconstitution

**BACKUP/RESTORE PROCESS**

Everyone should be aware of the importance of backing up critical data. If you aren't aware, you will likely become painfully aware on the day after one of your servers or partitions crashes and there is no recoverable data.
As a system administrator, you need to bring all your key processes and procedures together through a backup solution that is reliable, flexible, architecturally compliant, and recoverable.

Data is the backbone of today's organizations, and information is your organization's most valuable asset. Immediate recovery of and access to data after an outage is therefore key to every business's survival. When data is lost, damaged, or simply unavailable, it disrupts your business and can halt it completely. Adequate backups protect against permanent loss of data, but without a recovery solution in place, the total time needed to rebuild and recover from a disaster can itself have catastrophic business consequences.

**Data Backup Strategy**

The goal of all data backup jobs is to ensure that lost data, no matter how it was lost, can be recovered quickly, efficiently, and as completely as possible. The backup process is a critical function for ensuring business continuity for the system. Backups must be executed regularly and conform to a concise recovery program design. The system administrator must be able to restore all of the system's data to a consistent, usable state while minimizing the impact on your applications. That is your primary goal. You should audit your backup and recovery program design and verify its completeness.

**Performing Data Backups**

How often you run a backup depends on how much data you can afford to lose if the hard drive fails. Depending on the mission you are supporting, even a few hours of data may be too much to lose. Remember that the backup is as critical as the current data. If the data changes at a rapid rate, you may have to run backups at a rapid pace as well.

**Data Backup Methods**

**Full Data Backup**

The Full Backup is the starting point for all other backups. The Full Backup contains all files and folders, regardless of whether the data is new or unmodified.
Each time you execute the Full Backup, the entire data set is copied. After backing up the files, a Full Backup unchecks (turns "off") each file's archive bit (flag). Because this backup method stores all files and folders every time it is run, frequent full backups result in faster and simpler recovery operations. However, this type of backup also requires the most disk space and time, because unchanged data that has already been backed up is stored again on every run. On the other hand, for some types of data (e.g., when you create a complete disk image) a Full Backup is the only effective option.

**Incremental Data Backup**

An Incremental Backup processes only new files or files changed since the last Full or Incremental Backup. It saves only those files whose archive bit is checked. After backing up these marked files, an Incremental Backup unchecks the archive bit for every file saved. When the data needs to be restored, the Full Backup is laid down first, and then each Incremental Backup is laid down on top of it in the proper sequence.

The advantage of an Incremental Backup is that only the files created or modified are backed up, so the process is faster and requires less storage space. The disadvantage is that it is the slowest of the backup methods to restore.

**Differential Data Backup**

Like an Incremental Backup, a Differential Backup copies only the files and folders changed since the last Full Backup, as indicated by the archive flag. The copied files and folders remain marked for subsequent backups, regardless of whether they are changed again. In other words, a Differential Backup does not uncheck the archive flag for files it has backed up. Therefore, this method copies all files modified since the last Full Backup. When data needs to be restored, the Full Backup is laid down first, and then the most recent Differential Backup is laid down on top of it.
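The archive-bit behavior of the three methods can be illustrated with a minimal Python sketch. It is a toy model, not how any particular backup product works: the `FileEntry` class, the function names, and the file names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    """Hypothetical stand-in for a file and its archive bit."""
    name: str
    archive_bit: bool = True  # set ("on") when a file is created or modified


def full_backup(files):
    """Copy every file, then clear every archive bit."""
    saved = [f.name for f in files]
    for f in files:
        f.archive_bit = False
    return saved


def incremental_backup(files):
    """Copy only flagged files, then clear the bit on each file copied."""
    flagged = [f for f in files if f.archive_bit]
    for f in flagged:
        f.archive_bit = False
    return [f.name for f in flagged]


def differential_backup(files):
    """Copy only flagged files and leave their archive bits set."""
    return [f.name for f in files if f.archive_bit]


def modify(files, name):
    """Simulate an edit: the archive bit is turned back on."""
    for f in files:
        if f.name == name:
            f.archive_bit = True


def demo(partial_backup):
    """One full backup, then two partial backups, each after one edit."""
    files = [FileEntry("a.doc"), FileEntry("b.doc"), FileEntry("c.doc")]
    full = full_backup(files)
    modify(files, "a.doc")
    first = partial_backup(files)
    modify(files, "b.doc")
    second = partial_backup(files)
    return full, first, second


if __name__ == "__main__":
    print(demo(incremental_backup))
    print(demo(differential_backup))
```

In this model the second incremental run saves only `b.doc`, so a restore needs the full backup plus both incremental sets in order. The second differential run saves both `a.doc` and `b.doc`, so a restore needs only the full backup plus that one, larger, differential set.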
A Differential Backup requires more disk space over time, because each Differential Backup created will be larger than the previous one. It does, however, enable faster restoration of data than Incremental Backups.

**PRIORITY RESTORATION PLAN**

As an AF network administrator, you may be required to follow an established contingency plan. Certain network systems and devices are of primary consideration in a contingency backup and restoration plan. The priority systems included in any contingency plan are:

- E-mail servers
- DHCP servers
- Domain Controllers
- File servers containing mission critical information
- Web servers
- Specialized systems necessary in a war zone environment

This list is just an example of items that should be included in the contingency plan. Each organization should have a list of critical devices needing protection. The list can be compiled by a network study group made up of IT professionals, management, and technicians in the network center.

The primary means of backing up these critical systems is the backup application/file system, in which all or most of the data residing on the systems listed above is placed on a backup medium. The contingency plan should prioritize the systems, both for protecting them from disasters and for the order in which they are brought back online after a disaster/contingency. By prioritizing, the network administrator can make more informed, tailored decisions about contingency resource allocations and expenditures, saving time, effort, and cost and allowing the organization's mission to continue with minimal delay.

It is important to remember that you must have not only backups of the data, but also the software and hardware needed to restore services in a timely manner. Backups may be rendered ineffective if you don't have the necessary software and hardware.
Administrators should keep copies of all software licenses and, at minimum, a listing of all equipment required to restore mission effectiveness. Some software requires dongles or hardware keys that must be physically attached to the system for it to function. Ensure these hardware devices are available should processing be moved to an alternate facility.

**OFFSITE STORAGE**

**ALTERNATE SITES**

Although major disruptions with long-term effects may be rare, they should be accounted for in any contingency plan. Thus, the Continuity of Operations Plan must include a strategy to recover and perform system operations at an alternate facility for an extended period.

**General Categories of Alternate Sites**

In general, three types of alternate sites are available:

- Dedicated site owned or operated by the organization
- Reciprocal agreement or memorandum of agreement with an internal or external entity
- Commercially leased facility

Regardless of the type of alternate site chosen, the facility must be able to support system operations as defined in the Continuity of Operations Plan. The three types of offsite facility may be categorized in terms of their operational readiness. Based on this factor, sites may be identified as cold sites, warm sites, or hot sites. These sites are not owned by the organization. They require an MOA prior to being listed in the plan.

**Types of Alternate Sites**

**Cold Sites**

A Cold Site typically consists of a facility with adequate space and infrastructure (electric power, telecommunications connections, and environmental controls) to support the IT system. The space may have raised floors and other attributes suited for IT operations. The site does not contain IT equipment and usually does not contain office automation equipment such as telephones, fax machines, or copiers. The organization using the cold site is responsible for providing and installing the necessary equipment as well as telecommunications capabilities.
It may take weeks to get the site activated and ready for work. The cold site could have equipment racks and dark fiber (fiber that does not have the circuit engaged) and maybe even desks, but it would require equipment from the client since it does not provide any. Although the cold site is the least expensive option, it takes the most time and effort to get up and functioning after a disaster.

**Warm Sites**

Warm Sites are partially equipped office spaces containing some or all of the system hardware, software, telecommunications, and power sources. The warm site is maintained in an operational status, ready to receive the relocated system. The site may need to be prepared before receiving the system and recovery personnel. In many cases, a warm site may serve as a normal operational facility for another system or function. In the event of Continuity of Operations Plan activation, those normal activities are displaced temporarily to accommodate the disrupted system.

**Hot Sites**

Hot Sites are office spaces appropriately sized to support system requirements and fully configured. They are ready to operate within a few hours. The only resources missing from a hot site are the data (which will be retrieved from a backup site) and the people to process the data. The equipment and system software must be compatible with the data being restored from the main site and must not cause any interoperability issues. These sites are a good choice for an organization that needs a site available as soon as possible. Most hot-site facilities support annual tests by the organization to ensure the site is functioning in the necessary state. This is the most expensive of the three types of offsite facility, and it can present problems if a company requires proprietary or unusual hardware and software.
**Mirrored Sites**

Some organizations choose to have redundant or mirrored sites, meaning one site is equipped and configured exactly like the primary site and serves as a redundant environment. These sites are owned by the organization and are mirrors of the original production environment. This is one of the most expensive backup facility options, because a full environment must be maintained even though it usually is not used for regular production activities until a disaster triggers the relocation of the mission to the redundant site. Many organizations are subject to regulations dictating that they must have redundant sites in place, so expense is not an issue in these situations.

**Mobile Hot Site**

Another type of facility-backup option is a rolling hot site, or mobile hot site. This type of site is a large truck or trailer turned into a data processing facility, allowing for immediate processing. The trailer can be brought to the company's parking lot or another location. Another solution is prefabricated buildings that can be quickly and easily assembled. Military organizations have rolling hot sites, or trucks preloaded with equipment, because they often need the flexibility to quickly relocate some or all of their processing facilities to different locations around the world, depending on where the need arises.

**Multiple Processing Centers**

Another option for organizations is to have multiple processing centers. An organization may have ten different facilities throughout the world, equipped with products and technologies that allow it to move all data processing from one facility to another in a matter of seconds when an interruption is detected. This technology can be implemented within the organization or from one facility to a third-party facility. Certain service bureaus provide this type of functionality to their customers.
So if an organization's data processing is interrupted, all or some of the processing can be moved to the service bureau's servers.

**Hotbox**

A hotbox, or recovery box, is a tool to aid in disaster recovery. The box contains the specific items that technical staff need to rebuild your servers if the building is not accessible: an up-to-date copy of the disaster recovery plan, all server software infrastructure components, and the recovery CDs that have been identified and designated for use when a disaster occurs. Seal the box, in keeping with security best practices.

Set up a two-hotbox rotation in which the contents of both boxes are identical. One box resides at the offsite storage; the other resides in the computer room for localized support issues and maintenance. For additional redundancy, a third box can be stored at your hot site or alternate processing facility. When changes to the contents of the hotboxes are necessary, the box at the home site is updated first. The updated box is then shipped to your offsite storage provider and swapped with the box currently stored there. The box and an inventory listing of its contents are both vital records and should be documented as such in the DR plan.

Here are some items that could be included in the hotbox:

- A complete, printed copy of the disaster recovery plan.
- A complete electronic copy of the DR plan on CD-ROM, suitable for viewing through a file browser.
- A sealed envelope containing the root, QSECOFR, or administrator passwords for every computer system. Whenever the seal is broken on an envelope containing these passwords, all of the passwords within that envelope must be changed to avoid security problems.
- Software and related items, including but not limited to operating systems, applications, backup software, licenses, and standalone equipment
- Documentation CDs:
  - Networking documentation set
  - Red Books documentation set
  - Cisco router and firewall documentation
  - Device support facilities documentation set
  - ERP software manuals (PeopleSoft, SAP, BPCS, etc.)

**ALTERNATE POWER**

Unscheduled Service Interruptions (USI) are unscheduled network, equipment, or application outages or degradations caused by such things as environmental problems (e.g., fire, flood, loss of power, loss of air conditioning), equipment malfunctions, and system crashes. In some cases, the electrical outage impacts identified during the preparation of a Contingency Plan may be mitigated or even eliminated through preventive measures that deter, detect, and/or reduce impacts to the system, such as an Uninterruptible Power Supply (UPS) or backup power (e.g., a generator).

**Electrical Power Supply**

A conditioned electrical power source provides a continuous power stream at the proper voltage and phasing to your server room. Power conditioning maintains the quality of the electricity for greater reliability of your servers. It involves filtering out stray magnetic and electrical fields that can induce unwanted inductance and capacitance, and suppressing surges to prevent severe voltage spikes. To further ensure power into your facility, select two separate demarcation points for utility power into your building.

Here are questions to consider about your power supply:

- What is the present electrical load capacity in your facility?
- Are you maxed out? Future capacity might not come cheap!
- What capability does the facility have to support itself if power from the main power grid becomes unavailable?
- What considerations do you make for static electricity and space heaters?

**UPS**

A UPS is a critical component of the server room's power-support system.
The UPS, sometimes referred to as the battery backup, is a device that maintains a continuous supply of electrical power to all your connected equipment. The UPS equipment is placed between the outside flow of power (your primary power source) from your regulated commercial power utility and the computer room hardware.

The UPS serves a dual purpose. First, it provides protection from power irregularities, such as spikes or brown-outs. These fluctuations in power can cause costly damage to computer equipment. Second, it acts as a bridge, supplying interim power between the time when external power is lost and the time your alternate power source (such as a diesel generator) kicks in.

UPS system batteries supply power in lieu of power from your utility for only a predetermined period of time. Typically, this is long enough for the power to be restored (in a short outage) or for the backup generator to be brought online (in a prolonged outage). A UPS provides protection from a momentary power interruption only. The amount of time it can supply electricity depends on the battery capacity of the UPS and the number of servers or other pieces of equipment connected to it.

There are several common power problems that UPS units correct:

- Power failure: The total loss of utility power
- Power sag (drop): A short-term under-voltage
- Power surge (spike): A quick burst of over-voltage
- Under-voltage (brownout): Low-line voltages for an extended period of time
- Over-voltage: Increased voltages for an extended period of time
- Frequency: A variation of the power waveform
- Harmonic distortion: A power frequency superimposed on the power waveform

**Diesel and Natural-Gas Generators**

You must always ask yourself: what capability does your facility have to support itself if power from the main utility grid goes down? The alternative usually is to invest in an emergency-power generator solution.
We have all witnessed the importance of power and the potential for future power disruptions, so the cost justification for a generator has become quite simple. Emergency generators provide power in the event of a prolonged interruption of utility power. Generators must be sized according to the load they are expected to support and the length of time they are expected to support that load.

A generator does not provide protection from a momentary power interruption or spike; that is the role of a UPS. The UPS provides enough standby power to allow the generator to come online properly and service your equipment. Generators can be switched into service either manually, through human intervention, or automatically, via an automatic transfer switch.

In a highly redundant power supply configuration such as the one in Figure 2-9, there must be at least two generators and two UPSs, each with the capacity to generate enough electricity at full load to power the entire computer room and, potentially, critical locations throughout your office facility.

The real question is, how often are the generators tested? Are the backup generators tested under full load for several hours to simulate real power demands? Put in place a regular testing schedule for the diesel generators to ensure that staff members are trained in their operation and that the generators work when you need them most, which is during a power outage.
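The sizing points in this section (UPS runtime depends on battery capacity and connected load; each unit in a redundant configuration must carry the full load alone) can be expressed as a rough Python sketch. Everything here is an assumption made for illustration: the function names, the 0.9 inverter efficiency, and the example wattages are hypothetical, and real sizing should follow the manufacturer's runtime charts and a proper load study.

```python
def ups_runtime_minutes(battery_wh, load_w, efficiency=0.9):
    """Rough runtime estimate: usable battery energy divided by the load.

    battery_wh -- battery capacity in watt-hours (assumed rating)
    load_w     -- total connected load in watts
    efficiency -- assumed inverter efficiency between 0 and 1
    """
    if battery_wh < 0 or load_w <= 0 or not 0 < efficiency <= 1:
        raise ValueError("capacity must be >= 0, load > 0, 0 < efficiency <= 1")
    return battery_wh * efficiency / load_w * 60


def is_fully_redundant(unit_capacities_w, full_load_w):
    """True when there are at least two units and each one alone can
    carry the full load, as the redundant configuration requires."""
    return (len(unit_capacities_w) >= 2
            and all(c >= full_load_w for c in unit_capacities_w))


if __name__ == "__main__":
    full_load_w = 10_000  # hypothetical computer-room load in watts

    # A 5 kWh battery bank bridging the gap until the generator starts.
    print(round(ups_runtime_minutes(5_000, full_load_w), 1), "minutes")

    # Two generators that can each carry the room, versus one undersized unit.
    print(is_fully_redundant([12_000, 12_000], full_load_w))
    print(is_fully_redundant([12_000, 8_000], full_load_w))
```

Adding equipment to the UPS raises `load_w` and shortens the estimated runtime, which is why the connected load should be rechecked whenever servers are added. The redundancy check fails as soon as any single unit cannot power the whole room by itself.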