Intro to Data Center Design PDF
Summary
This document provides an introduction to data center design, covering fundamental principles such as simplicity, flexibility, scalability and modularity for creating effective and efficient data centers.
Full Transcript
A Modern Pantheon A modern data center still shares many aspects with ancient architecture, structurally and in service. The form literally follows the function. The purpose of both the Pantheon and a data center is to provide services. To provide services, its requirements for continual functioning must be met. This is the design team's primary concern. The design of the data center must revolve around the care and feeding of the service providing equipment. These functional requirements of the data center are: A place to locate computer, storage, and networking devices safely and securely To provide the power needed to maintain these devices To provide a temperature-controlled environment within the parameters needed to run these devices To provide connectivity to other devices both inside and outside the data center In the design philosophy of this book, these needs must be met and in the most efficient way possible. The efficiency of the data center system relies entirely on the efficiency of the design. The fundamental principles of a data center philosophy should be your guiding principles. The phrase "design philosophy" could have many different meanings. For the purposes of this book we'll use the following definition: A design philosophy is the application of structure to the functional requirements of an object based on a reasoned set of values. Fundamentals of the Philosophy There are five core values that are the foundation of a data center design philosophy: simplicity, flexibility, scalability, modularity, and sanity. The last one might give you pause, but if you've had previous experience in designing data centers, it makes perfect sense. Design decisions should always be made with consideration to these values. Keep the Design as Simple as Possible A simple data center design is easier to understand and manage. A basic design makes it simple to do the best work and more difficult to do sloppy work. 
For example, if you label everything— network ports, power outlets, cables, circuit breakers, their location on the floor—there is no guess work involved. When people set up a machine, they gain the advantage of knowing ahead of time where the machine goes and where everything on that machine should be plugged in. It is also simpler to verify that the work was done correctly. Since the locations of all of the connections to the machine are pre-labeled and documented, it is simple to record the information for later use, should the machine develop a problem. Figure 1-1. Simple, Clean, Modular Data Center Equipment Room Design for Flexibility Nobody knows where technology will be in five years, but it is a good guess that there will be some major changes. Making sure that the design is flexible and easily upgradable is critical to a successful long-term design. Part of flexibility is making the design cost-effective. Every design decision has an impact on the budget. Designing a cost effective data center is greatly dependent on the mission of the center. One company might be planning a data center for mission critical applications, another for testing large-scale configurations that will go into a mission critical data center. For the first company, full backup generators to drive the entire electrical load of the data center might be a cost-effective solution. For the second company, a UPS with a 20-minute battery life might be sufficient. Why the difference? If the data center in the first case goes down, it could cost the company two million dollars a minute. Spending five million on full backup generators would be worth the expense to offset the cost of downtime. In the second case, the cost of down time might be $10,000 an hour. It would take 500 hours of unplanned downtime to recoup the initial cost of five million dollars of backup generators. Design for Scalability The design should work equally well for a 2,000, 20,000, or 2,000,000 square foot data center. 
Where a variety of equipment is concerned, the use of watts per square foot to design a data center does not scale because the needs of individual machines are not taken into consideration. This book describes the use of rack location units (RLUs) to design for equipment needs. This system is scalable and can be reverse-engineered. Use a Modular Design Data centers are highly complex things, and complex things can quickly become unmanageable. Modular design allows you to create highly complex systems from smaller, more manageable building blocks. These smaller units are more easily defined and can be more easily replicated. They can also be defined by even smaller units, and you can take this to whatever level of granularity necessary to manage the design process. The use of this type of hierarchy has been present in design since antiquity. Keep Your Sanity Designing and building a data center can be very stressful. There are many things that can, and will, go wrong. Keep your sense of humor. Find ways to enjoy what you're doing. Using the other four values to evaluate design decisions should make the process easier as they give form, order, and ways to measure the value and sense of the design decisions you're making. Primarily, they help to eliminate as many unknowns as possible, and eliminating the unknowns will make the process much less stressful. Top Ten Data Center Design Guidelines The following are the top ten guidelines selected from a great many other guidelines, many of which are described throughout this book. 1. Plan ahead. You never want to hear "Oops!" in your data center. 2. Keep it simple. Simple designs are easier to support, administer, and use. Set things up so that when a problem occurs, you can fix it quickly. 3. Be flexible. Technology changes. Upgrades happen. 4. Think modular. Look for modularity as you design. This will help keep things simple and flexible. 5. Use RLUs, not square feet. 
Move away from the concept of using square footage of area to determine capacity. Use RLUs to define capacity and make the data center scalable. 6. Worry about weight. Servers and storage equipment for data centers are getting denser and heavier every day. Make sure the load rating for all supporting structures, particularly for raised floors and ramps, is adequate for current and future loads. 7. Use aluminum tiles in the raised floor system. Cast aluminum tiles are strong and will handle increasing weight load requirements better than tiles made of other materials. Even the perforated and grated aluminum tiles maintain their strength and allow the passage of cold air to the machines. 8. Label everything. Particularly cabling! It is easy to let this one slip when it seems as if "there are better things to do." The time lost in labeling is time gained when you don't have to pull up the raised floor system to trace the end of a single cable. And you will have to trace bad cables! 9. Keep things covered, or bundled, and out of sight. If it can't be seen, it can't be messed with. 10. Hope for the best, plan for the worst. That way, you're never surprised. Chapter 2. Data Center Design Criteria "It is an old maxim of mine that when you have excluded the impossible, whatever remains, however improbable, must be the truth." (Sherlock Holmes, by Sir Arthur Conan Doyle) The criteria for a data center are the requirements that must be met to provide the system capacities and availability necessary to run the business. Due to the special circumstances of each facility, it would be difficult to give a comprehensive list of all criteria involved in data center design. The possibilities are vast, and it isn't the intention of this book to give a definitive set of design plans to follow, but rather to guide you toward your final design by listing and describing the most probable criteria. The goal of this chapter is to arm you with the knowledge you need to begin the design process.
This chapter contains the following sections: "Scope, Budget, and Criteria," "System Availability Profiles," "Insurance and Local Building Codes," and "Determining the Viability of the Project."
Scope, Budget, and Criteria An important distinction to make at this point is what really constitutes the elements of a data center. When we talk about the data center, we are talking about the site, the Command Center (if one is to be added), the raised floor (if one is to be added), the network infrastructure (switches, routers, terminal servers, and support equipment providing the core logical infrastructure), the environmental controls, and power. Though a data center contains servers and storage system components (usually contained in racks), these devices are contents of the data center, not part of the data center. They are transient contents just as DVDs might be considered the transient contents of a DVD player. The data center is more of a permanent fixture, while the servers and storage systems are movable, adaptable, interchangeable elements. However, just as the DVD is of no value without the player and the player is of no value without the DVD, a data center without equipment is an expensive empty room, and servers with no connection are just expensive paperweights. The design of the data center must include all of the elements. The essential elements are called the criteria. Project Scope Most often, it is the project scope that determines the data center design. The scope must be determined based on the company's data center needs (the desired or required capacities of the system and network infrastructure), as well as the amount of money available. The scope of the project could be anything from constructing a separate building in another state with offices and all the necessary utilities, to simply a few server and storage devices added to an existing data center.
In either case, those creating the project specifications should be working closely with those responsible for the budget. Budget Designing a data center isn't just about what the company needs or wants, it's what they're willing to pay for. Using project scope as a starting point, the criteria for the data center can be loosely determined, and a comparison between how much this will cost and the budget will determine the viability of the project. Is there too much money or too little? (Okay, in theory you could get more money for the data center than you need, but this rarely happens.) Then the balancing act begins. If there isn't enough money in the budget to cover the cost of essential elements, either more money must be allocated, or some creative modifications must be made to the project scope. The process for determining a budget, deciding what parts of the data center will receive what portion of it, and putting together a center based on designated funds is one of negotiation, tradeoffs, compromises, and creativity. Also, there is probably more than one budget for the data center, and how the money is allocated depends on numerous factors specific to the company. Planning a data center is part of larger business considerations, and both designers and those setting the budget must be flexible. Accountants telling the data center designers, "Here's how much you get. Make a data center," probably won't work. By the same token, designers demanding enough money for the ideal data center probably won't meet with approval by the accountants. When negotiating for funds, the best idea is to have several alternative plans. Some questions and considerations that must be examined in the beginning might include: What is the budget for the data center? Are the project scope and the budget a realistic balance? Is there enough money to create an adequate center for the company's needs? How much do you actually need to create the center? How will funds be distributed? 
Can funds be redistributed? Factor in running costs, servicing, and maintenance contracts with maintenance suppliers. Factor in redundancy of power/services/HVAC/UPS. Consider carefully all possible future modifications, upgrades, changes in power needs, and system additions in the design. The toughest thing about designing a data center is working within the budget. The budget will force you to make compromises and you must figure out whether or not you are making the right compromises. You might be able to cut costs by removing the backup generators from the budget, but you must weigh the risk of such a decision. There is the possibility that the data center power might fail and systems would be out of action without backup power. Every compromise carries a degree of risk. Do the risks outweigh the cost? Figuring out how to meet the budget is where your finance people and risk analysts really come into play. Use their expertise. Here are a few questions you might work out with your finance and risk team. If cost exceeds budget, can anything be removed or replaced with a less expensive alternative? Are all redundant systems really necessary? How much will projected failures (downtime) cost compared to initial costs for redundant systems? Is a separate Command Center necessary? Can amortization schedules be stretched from, for example, three years to five years so there is money available for other needs? Can certain areas be expanded or upgraded later? What is the best time to bring the facility online? In the U.S., amortization doesn't begin until you occupy the space. Would it be better to take the amortization hit this fiscal year or the next? A final point to consider: As with many aspects of data center design, the money spent on planning is invariably money well spent. It costs money to build a data center, and part of that expenditure comes right up front in coming up with a budget. 
Money spent on creating an accurate budget can actually save money in the long run. Build Budget and Run Budget The build budget is the money allocated to build and bring up the data center. The previous three sections describe what is covered by the build budget (or budgets, if separate). But you must also consider the run budget which is the amount of money allocated for yearly operating costs, maintenance, repair, ISP network connectivity, service and support agreements on computers, storage and network equipment, and the cost of electricity. These should be considered as part of the run budget. Criteria The most important criteria for a data center can be put into the following categories: Location (or site) Essential criteria Secondary criteria Location It would seem that the site you choose for your data center would be considered one of the essential criteria. It's true that where you choose to locate the data center site (region/building) is important, but this choice is based on many different factors. For example, a company wants to build a new data center near their corporate offices in Cleveland, Ohio. To meet project scope on the essential criteria, it is determined that several million dollars more are needed, just to secure the site location. Suddenly, building in Cleveland doesn't seem as critical if a few million dollars can be saved by locating the building one hundred and sixty miles away in Milford Center where land prices are much cheaper. Also, connectivity through the company's network infrastructure has made it possible for a data center to be located wherever it is practical and affordable. A data center can even use multiple locations, if necessary, connecting through the network. In this way, location is a very flexible and negotiable criteria. Essential Criteria There is a hierarchy of essential criteria. All data centers must have the following four elements in whatever capacities are needed or available. 
Though they are listed in order of importance, a data center cannot run without all of them working interdependently. It is only their values that are negotiable. Physical capacity. You must have space and weight capacity for equipment, and therefore, the other three criteria. There must be space for the equipment and the floor must be able to support the weight. This is a constant. Power. Without power nothing can run. Power is either on or off. Connections to different parts of the grid, a UPS, or both increase uptime. You must have physical capacity to have room for power and the equipment that needs power. Cooling. Without cooling nothing will run for long. This is either on or off, though redundancy increases uptime. You must have physical capacity and power to run HVACs. Bandwidth. Without connectivity, the data center is of little value. The type and amount of bandwidth is device dependent. You must have physical capacity, power, and cooling to even consider connectivity. Unless the data center will be used for non-mission-critical operations, the last three criteria should be designed to be up and running 100 percent of the time. The use of these elements is non-negotiable, but their values are negotiable. Consider a decision about power redundancy. A UPS system (batteries that kick in when the power goes out) is less expensive than creating a power generation plant, but it has a limited run time. For a mission-critical operation, the 20 minutes of power a UPS might give you could be insufficient. Let's say the UPS costs $1 million, and the power generation plant costs $3.5 million. The track record of the power company shows that they're down an average of 15 minutes once a year. For your company, a 15-minute power outage equals two hours for the outage and recovery time. Two hours of downtime costs the company $500,000.
With a UPS system, there would be no outage because the 20 minutes afforded by the batteries would easily cover the 15-minute outage and there would be no recovery time needed. Therefore, at $500,000 of avoided downtime per year, it would take two years to recover the $1 million cost of the UPS, whereas it would take seven years to recover the $3.5 million cost of the power generation plant. If the power company has a greater problem with power outages, the generators make sense. Or relocating to an area with more dependable power might make more sense. Secondary Criteria The essential criteria must be included in the design in whatever values are available. However, there are invariably other criteria that must be considered, but they are secondary. The level of importance of secondary criteria is wholly dependent on the company and project scope. It's conceivable that the budget could be trimmed, for example, in fixtures, but it's likely that you'll want to budget in overhead lighting so data center personnel won't have to work with flashlights held between their teeth. Still, you can see that some criteria are very flexible. Examples of secondary criteria are: Fixtures such as plumbing and lighting Walls, doors, windows, offices, loading dock All of the miscellaneous hardware, security cameras, card readers, door knobs, equipment cabinets, etc. Equipment such as forklifts and pallet jacks A Command Center These will vary depending on whether you're building a new structure or retrofitting an old one, but what is key is the negotiating value of these elements. The equation for the total budget is: essential criteria + secondary criteria + location = budget, or budget = essential criteria + secondary criteria + location. Using Rack Location Units A concept that will help the data center designer considerably in determining the essential criteria (how much equipment can the center support and what capacities are necessary to support the equipment) is that of rack location units (RLUs). These are numbers based on the operating requirements of each rack in the data center.
A rack could be considered to have specific RLU values based on its essential requirements (power, cooling, etc.) and these numbers could be used in relation to other devices with the same, or similar, requirements. In a data center with varied equipment, more than one RLU definition is usually required. For example, all of the storage racks in one section of the data center might be considered to be all RLU-A racks, and all the server racks might be considered RLU-B racks. This is a very important design concept to understand and is covered in greater detail in Chapter 4, "Determining Data Center Capacities." System Availability Profiles Companies with lower access considerations (businesses that aren't open all hours, such as retail chain stores) might have fewer availability requirements than, for example, businesses that do national banking, government agencies, and the health care industry. The availability needs of the data center equipment should be determined in the project scope. Knowing which devices, or groups of devices, are mission-critical (needed 24x7x365) and which devices are in any availability level below mission-critical is important in determining many aspects of data center design, primarily system redundancies. These include: Device redundancies. The number of backup devices that must be available in the event of equipment failures. Power redundancies. The number of feeds from different parts of the grid, the number of UPS systems, etc., that must be installed to make sure these systems stay running. Cooling redundancies. The number of extra HVAC units that must be available in the event that one or more units fail. Network redundancies. The amount of network equipment that must be available in the event of failure. The number of connections to your ISP. The number of network feeds needed to multiple ISPs in the event that one has a catastrophic failure. In most situations, a data center won't have a single availability profile. 
Several jobs could be going on from machine to machine, and some tasks have greater availability levels than others, some highly critical. Some might need to be highly available, but are less critical. Determining risk of all the operations is key to making many design decisions. Consider the following example. The Chekhovian Bank of Molière decides to upgrade their computer systems and install a data center to keep up with their massive transaction needs. When deciding how to outfit the data center, the question of how available the equipment must be comes up. There are several operations of the data center and they all have different availability profiles. Historical data of the company's operations and general trends help to determine the availability profile of their machines. The following graph shows the Chekhovian Bank's projected availability profile. Figure 2-1. Availability Profile of the Chekhovian Bank Data Center Here is an analysis of this profile: ATM transactions, which are highly utilized (mission-critical), must be available around the clock. Redundant systems are essential. Security and equities trading must be constantly available during business hours (mission-critical) and moderately available the remaining parts of the day. Redundant systems are essential. Home loans are important but some occasional downtime won't be disastrous. Redundancy is a good idea, though this is where corners can be cut. The Community Services Web site should be up and running around the clock so people can access the information, but this is a non-critical service and some downtime won't hurt. Redundancy is probably not worthwhile. The Community Services email mailers are sent only once a week in the evening and, though important, it won't hurt the company if the mailers go out late on occasion. No redundancy is required.
Risk-assessment analysts are hired to look at each part of the profile to determine the cost of downtime in each area and help decide the best course of action. They determine that the servers for ATM transactions and equity trading are mission-critical. Either department going down will cost the bank $500,000 per minute of downtime. Using the RLU model, the data center designer can calculate that these systems require 200kW of electricity. The cost of a 200kW generator is $2 million. The cost of a 20-minute UPS for 200kW is $450,000. So, for $2.45 million the bank can provide power to its configurations. Since all it would take is a 5-minute outage to lose $2.5 million, a generator and a UPS are considered a viable expenditure. The servers for the Home Loan portion of the bank require 100kW of power and the risk analysts determine that an outage to this department will cost $5,000 per minute. A 100kW generator would cost $1 million. A 20-minute UPS for 100kW would be $300,000. The risk analysts also went to the Artaudian Power & Electric Company and got historical information on power outages in the area during the last five years. This data shows that they will average 2 outages a year, but the duration of these outages will be less than ten minutes. Also, the ATM and equity trading groups need a 200kW 20-minute UPS. This UPS can be upgraded to a 300kW 20-minute UPS for only $150,000. At two 10-minute outages a year, or $100,000 of avoided losses annually, this UPS upgrade will pay for itself in a year and a half. This upgrade is deemed viable but the 100kW generator is not, because it would take 200 minutes of outages of more than 20 minutes to recoup the expenditure. The systems that run the Community Services web site and mailers represent no significant loss of revenue for the bank if they are down for even a few days. It is determined that no additional cost for increased availability will be approved for these systems.
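The arithmetic behind the Chekhovian Bank decisions can be checked with a short script. This is only a sketch: the function names `payback_years` and `break_even_minutes` are hypothetical helpers introduced here, and it assumes the worst case stated in the text (two outages a year of ten minutes each, fully covered by the UPS). All dollar figures come from the example above.

```python
def payback_years(capital_cost, avoided_loss_per_year):
    """Years of avoided outage losses needed to cover a capital cost."""
    return capital_cost / avoided_loss_per_year


def break_even_minutes(capital_cost, loss_per_minute):
    """Minutes of avoided downtime needed to cover a capital cost."""
    return capital_cost / loss_per_minute


# ATM and equity trading: 200kW generator plus 20-minute UPS,
# against a downtime cost of $500,000 per minute.
atm_capital = 2_000_000 + 450_000
atm_break_even_minutes = break_even_minutes(atm_capital, 500_000)

# Home loans: upgrading the shared UPS from 200kW to 300kW.
# Assumed worst case from the text: 2 outages a year, 10 minutes each.
home_loan_loss_per_year = 5_000 * 2 * 10
ups_upgrade_years = payback_years(150_000, home_loan_loss_per_year)

# Home loans: a dedicated 100kW generator only pays off in outages
# that outlast the 20-minute UPS.
home_loan_generator_minutes = break_even_minutes(1_000_000, 5_000)

print(f"ATM power protection breaks even after {atm_break_even_minutes:.1f} minutes of outage")
print(f"UPS upgrade pays for itself in {ups_upgrade_years:.1f} years")
print(f"100kW generator needs {home_loan_generator_minutes:.0f} minutes of long outages")
```

The same helper reproduces the earlier UPS-versus-generator comparison: with $500,000 of avoided downtime per year, `payback_years(1_000_000, 500_000)` gives two years for the UPS and `payback_years(3_500_000, 500_000)` gives seven years for the power generation plant.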
The cost of services to increase availability is a continuum. Each step in increasing availability has a cost. At some point, the cost of the next step might not be worth the amount of system downtime it prevents. So, the availability profile of a configuration is determined by the cost of having this configuration unavailable. As mentioned at the beginning of the "Budget" section, it is not about providing your customers with what they want. They always want it all. It's about how much money they are willing to spend to get what they want. It's a cost-effective trade-off. Insurance and Local Building Codes Insurance and local building codes will have an effect on many design decisions and should be considered in every aspect of the design process by the entire design team, including all building contractors. The contractors on the team will probably be aware of the constraints and specifications of the insurance carrier and local building codes, but the insurers and building authorities must approve the final plans. In the U.S., you need code approval twice: first for the building plans, then again after the construction is complete. The latter approval ensures that everything was installed according to code as it was documented in the approved plans. It is important for everyone working on the project to be aware of these constraints to avoid unnecessary changes to the plans at the last minute. The best assurance of time well spent is to have a continual dialog with insurers and building authorities during the design phases. Codes are covered in greater detail in Chapter 13, "Codes and Construction." Determining the Viability of the Project There are times when too many compromises must be made to make the data center project viable.
It might be something obvious (you can't get enough power from the local power company or there are frequent flooding problems), or it might be a number of small factors that, when looked at collectively, show that the project is a bad risk. Consider the following possible constraints on the project: Inadequate budget Retrofit problems such as grounding, cable routing, inadequate floor to ceiling height, no way to set up seismic restraints, etc. Better business decision to use co-location or ISP, if only temporarily Inadequate pool of qualified employees Overly expensive location Inadequate district or too remote Inadequate or inappropriate space Inadequate power. Can't connect to separate parts of the grid for redundancy Inadequate cooling capacity Inadequate ISP service Local building codes, insurance, or fire regulations are too restrictive Too many weather or seismic problems High history of fires Most of these problems have to do with the inadequacies of the location. For more information, see Chapter 5, "Site Selection." Chapter 3. Designing a Data Center "It is a capital mistake to theorize before one has data." (Sherlock Holmes, by Sir Arthur Conan Doyle) This chapter describes the most important design decisions that must be made in planning a data center. A few of the topics are described in more detail in later chapters. This chapter contains the following sections: "Design Process," "Data Center Structural Layout," "Data Center Support Systems," "Physical and Logical Security," "System Monitoring," "Remote Systems Management," and "Planning for Possible Expansion."
Design Process The design stages for the data center usually take the skills of architects, accountants, structural, mechanical, electrical, HVAC, system, and network engineers, project managers, and procurement personnel. Add also the probability of sales personnel, insurance carriers, and risk management analysts.
Overseeing the project is a data center design engineer whose task is to accommodate the requirements of the system and network engineers, and to work with the other members of the team to ensure that the data center requirements (based on the project scope) are met. As in any other design process, this is an iterative and recursive process. You have an initial set of criteria and you use this set of criteria to determine requirements. You define rack location units (RLUs, described in Chapter 4, "Determining Data Center Capacities") to ensure that the requirements match or exceed the criteria. At certain points other criteria will emerge. These, in turn, change the requirements. And additional or different RLUs will be needed to verify these requirements meet or exceed this new criteria. This is how the process is iterative. Other times, requirements change, and this changes the criteria which in turn changes the requirements. This is how the process is recursive. After several passes through this iterative recursion, a stable set of criteria and requirements will emerge. The changes become smaller in scope, and the process continues as before, albeit with a finer level of granularity. Just when you think you have a handle on the whole design, somebody tries to get code approval for something, won't get it, and you end up very close to square one. You then have a great screaming match with a white board marker because you're convinced it picked that exact moment to dry up on you. You're certain that its reason for doing this was just to annoy you (the fact that you left the cap off for three days is irrelevant). Finally, you decide to see how far you can throw it across the parking lot. Then, you and a few friends head off to the pub for a few pints. You become more rational and realize, "Oh, it's not that bad… We can just add another network POD in this other row and that will fix the problem, and I can figure that out tomorrow morning in fifteen minutes." 
Things get back to only mild insanity for a few days until a similar event triggers similar behavior. Over time, the problems get smaller and eventually the design meets the criteria. While the description of events above might seem a little over the top (you usually end up throwing your dead white board marker across your office rather than the parking lot), it is not that far from the truth. If you are embarking on designing and building a data center, remember this above all else: Find ways to have fun, enjoy the process, and learn to see the humor in some of the bizarre situations you'll find yourself in. If you don't, you might as well get a long-term lease on a padded cell and start your fittings for a jacket with sleeves that tie behind the neck. Design Drawings It should be kept in mind that the design of a data center should be structured but fluid, not only during the design process, but after construction. Computer environments constantly evolve to accommodate company needs, changes in technology, and the business landscape. Professional, detailed plans are necessary in the design stages, but it is important to keep updated working drawings of the data center and all support systems. Computer Aided Design (CAD) software is typically used. It is more efficient than drawing by hand, and creates plans that are clearly readable, easily reproduced, and easily modified. These blueprints allow for the continued updating of architectural, electrical, mechanical, and computer systems. The drawings can be used in site evaluations and future planning. Blueprints are particularly important when the project involves outside contractors. Some of the primary contractors are: Architectural firms. They might supply actual drawings of the building, showing a wall here, door there, lobby over there, where carpet will be installed, where concrete will be used. This represents the physical building. Interior designers. 
They create the "look" of the place, sometimes matching company specifications for consistency of styles, from trim to carpet. Structural engineers. They make sure the building will use materials and construction techniques that will keep the roof from collapsing under the weight of all those cooling towers. Electrical design firms and engineers. They deal with lighting plans, electrical distribution, wireways under the floor, breaker subpanels, power transformers, wiring for the fire detection system, and smoke alarms. HVAC design firms. They determine HVAC unit placement and whether they should be 20-ton or 30-ton, determine proper installation of piping that brings chilled fluids to units, and where cooling towers, compressors, and heat exchangers will be located. Some of these tasks, such as electrical and HVAC, might be handled by the same firm. It could depend on who is available in the area. It is a good idea to employ a project management firm to coordinate all of these different contractors. Figure 3-1. Large Scale Design Drawings from the General Contractor or the Project Management Company Thanks to the Internet, you can access the drawings electronically (Adobe® PDF format works well for this). This can reduce the time of the design/review/change process considerably. The CAD drawings are usually held by the building contractor who helps coordinate all the other subcontractors. PDFs are good, but, a few times in the cycle, you will need actual blueprints which are larger in scale than most computer monitors. These allow you to see very fine details that might be lost in a PDF file. Also, they provide a place to make notes directly on the drawings for later use. During the design process, you should also have several dozen pads of Post-It Notes for temporary comments on the blueprints and to bring certain details to the attention of others. You should also have a large white board with lots of dry erase markers in a variety of colors. 
(Remember to put the caps back on the markers when not in use.) Designing for Data Center Capacities A major problem in designing a data center is determining how to support equipment of known quantity and capacities, or determining the quantities of equipment of unknown capacities for a data center of known capacities. In other words, how do you make the equipment fit the room, or how do you make the room fit the equipment? There are many factors to consider and often these factors are limitations. Looking at the problem from the point of view of capacities is helpful, but you can also think of these as limitations, which is usually the case. The room might only be so big and the power company might only be able to give you so much electricity. Some of the major limitations are: Budget; District; Insurance and building code; Power; Cooling; Connectivity; Site; Space; Weight. A delicate balancing act must occur between many of the members of the design and build team to determine the capacities and limitations, and to work with them. With this knowledge, factors can be juggled to decide how to implement what is available to meet the project scope. If the limitations are too great, the project scope must change. This book offers a useful (some might say essential) tool for designing based on data center capacities called RLU. This is discussed in greater detail in Chapter 4, "Determining Data Center Capacities." Data Center Structural Layout The data center must be designed to accommodate diverse hardware designs and requirements, and possibly equipment from different manufacturers. Determining RLUs is the best way to decide how the space will be filled with equipment, and with this information, where the equipment will be placed. The following general guidelines should be used in planning the initial layout of the room, keeping future planning in mind. 
Note Though the plans for the data center do not include the storage and server equipment it will contain, it is necessary to know what the equipment will be to make many of the design decisions for the data center. Structural Considerations There are any number of structural issues to consider when designing a data center. Here is a sampling of some actual issues you might face: Building in an area with a subfloor to ceiling height of ten feet. By the time you add two feet for the raised floor, the height is reduced to eight feet. Now add the twelve inches needed for light fixtures and fire suppression systems, and your space is reduced to seven feet. The racks that will occupy this space are seven feet tall and exhaust heat out the top, or rather, they would if there were room. These racks will overheat real fast. This is not a realistic space in which to build a data center. Building in the basement of a building that overlooks a river. After construction is complete, you find out that the river overflows its banks every few years and you don't have any pumps in the basement to get the water out. Building on the 43rd floor of a high rise building along the San Andreas fault line. This is not a big deal until a magnitude 7 quake hits the area and you end up with several racks embedded in the walls because the building moves a good five feet in all directions at the level of the 43rd floor. If another space is not available, seismic restraints should be used. Building in a space with the restrooms built right in the middle. This really happened. The space was shaped like a square donut with the restrooms occupying a block in the middle. How do you efficiently cool a donut-shaped space? Having toilets in the middle of your data center is not the right way to add humidity to your HVAC system. If you must live with this type of room shape, you must. But if you have any say in the matter, look into other locations. 
Aisles aren't wide enough for newer or bigger machines. The people who move the equipment end up ripping massive holes in the walls trying to make the tight turns required to get from the loading dock to the staging area. Maybe a few dozen light fixtures along the corridor are taken out as well. Your building maintenance crews will get very angry when this is done on a weekly basis. Know how much space is needed to move and turn the racks and design in adequate aisle space. This means anticipating larger and heavier machines. Not knowing the structural load rating of raised floors and ramps. Imagine this: You acquire a space with an existing raised floor and ramps. This means a big chunk of the cost and design process has been taken care of! The day arrives when the storage and server racks begin moving in. Unfortunately, no one checked into the load rating for the floor and ramps. While rolling in a heavy rack, a portion of the floor gives way, taking the rack and several people with it into a big hole. You learn quickly about liability issues. Know the total weight that will go on the floor and ramps, and make sure existing floors and ramps meet these specifications. Raised Floor A raised floor is an option with very practical benefits. It provides flexibility in electrical and network cabling, and air conditioning. A raised floor is not the only solution. Power and network poles can be located on the floor and air conditioning can be delivered through ducts in the ceiling. Building a data center without a raised floor can address certain requirements in ISP/CoLo locations. Wire fencing can be installed to create cages that you can rent out. Without a raised floor, these cages can go from floor to ceiling, which prevents people from crawling beneath a raised floor to gain unauthorized access to cages rented by other businesses. 
Another problem this eliminates in an ISP/CoLo situation is the loss of cooling to one cage because a cage closer to the HVAC unit has too many open tiles that are decreasing subfloor pressure. However, some ISP/CoLo locations have built facilities with raised floor environments, because the benefits of a raised floor have outweighed the potential problems listed above. Drawbacks to the no-raised-floor system are the very inefficient cooling that cannot easily be rerouted to other areas, as well as the problems associated with exposed power and network cabling. A raised floor is a more versatile solution. Raised floors are covered in more detail in Chapter 6, "Implementing a Raised Floor." Aisles and Other Necessary Open Space Aisle space should allow for unobstructed passage and for the replacement of racks within a row without colliding with other racks. The optimal space would allow for the turn radius required to roll the racks in and out of the row. Also, rows should not be continuous. Unbroken rows make passage from aisle to aisle, or from the front of a rack to the back, very time consuming. Such clear passage is particularly important in emergency situations. The general rule of thumb for free floor space is between 40 and 50 percent of the square footage. FIGURE 3-2 gives an example of an appropriate layout. Figure 3-2. Proper Aisle Space and Non-Continuous Rows How aisle space is designed also depends upon air flow requirements and RLUs. When designing the center, remember that the rows of equipment should run parallel to the air handlers with little or no obstructions to the air flow. This allows for cold air to move to the machines that need it, and the unobstructed return of heated air back to the air conditioners. Be sure to consider adequate aisle space in the initial planning stages. In a walls-within-walls construction where the data center is sectioned off within a building, aisle space can get tight, particularly around the perimeter. 
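The 40 to 50 percent rule of thumb for free floor space is easy to check against a proposed layout. The following is a minimal sketch, not a planning tool: the function names and the sample room and equipment areas are made up for illustration, and "equipment area" is assumed to mean the total footprint of racks and other floor-standing gear.

```python
def free_floor_share(total_sq_ft: float, equipment_sq_ft: float) -> float:
    """Return the fraction of the room left as open floor."""
    if total_sq_ft <= 0:
        raise ValueError("total_sq_ft must be positive")
    if not 0 <= equipment_sq_ft <= total_sq_ft:
        raise ValueError("equipment_sq_ft must be between 0 and total_sq_ft")
    return (total_sq_ft - equipment_sq_ft) / total_sq_ft


def classify_layout(total_sq_ft: float, equipment_sq_ft: float,
                    low: float = 0.40, high: float = 0.50) -> str:
    """Compare the free-floor share with the 40 to 50 percent rule of thumb."""
    share = free_floor_share(total_sq_ft, equipment_sq_ft)
    if share < low:
        return "too crowded"
    if share > high:
        return "more open than the rule of thumb"
    return "within the rule of thumb"


# Hypothetical room: 10,000 sq ft with 5,600 sq ft of equipment footprint.
print(classify_layout(10_000, 5_600))
```

A layout that comes back "too crowded" is a signal to revisit aisle widths and row breaks before the drawings are finalized.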
Command Center Though an optional consideration, for some companies a separate Command Center (also called a Command and Control Center) is useful for controlling access to the consoles of critical systems. This is just one of the many security devices used in the data center. In disaster recovery scenarios or other critical times, the Command Center is a key area. In many corporations where computer technology is at the core of their business, this Command Center also serves as a "war room" in times of crisis. However, with companies moving to geographically distributed work forces, having only one way to monitor and work on equipment in the data center might not be a practical alternative. Being able to hire from a talent pool on a global scale increases your chances of getting better people because the pool is larger. This is also useful if you are in an area prone to bad weather. A person might not be able to get into the Command Center, but if the data center is remotely accessible and they have power and a phone line, they can still work. As more companies move to electronic ways of doing business, Command Centers are becoming public relations focal points. They can be designed as a glassed in box that looks into the computer room to give personnel a way to monitor security and allow visitors a view of the equipment without entering the restricted and environmentally controlled area. If the data center is a key component of the company's image, the Command Center can be designed to look "cool," an important PR tool. Whether it looks into the data center computer room or not, a modern, high tech Command Center room is an impressive location for executives to talk to the press, television, analysts, and shareholders. From a security standpoint, the Command Center is practical because physical access to the data center can be monitored from within the Command Center and possibly access can be allowed only through the Command Center. 
Since the Command Center could be the only way to connect to the administrative network, logical access to that network can be controlled within the Command Center as well. Data Center Support Systems A data center must provide certain services: Locations on the floor that can support the weight of the racks; Power to run the racks; Cooling to keep the racks from overheating; Connectivity to make the devices in the racks available to users; Planned redundancies. If any one of these services fails, the system will not run effectively, or at all. These support systems are how a data center supplies its intended services. They are also interdependent. If you can't place the server in the data center, it won't run. If you can't get enough power to run the server, it won't run. If you can't cool the server, it won't run for long, a few minutes at best. If you can't connect the server to the people who need to use it, what good is it? All of these requirements must be met simultaneously. If one of them fails, they all might as well fail. Your data center can only be as effective as its weakest support system. Space and Weight You have to be able to place the servers in the data center and, depending on the type of server, you might need even more space than its physical footprint to cool it. This is the cooling footprint. Weight is also a major consideration. If you have space for the machine, but your raised floor can't handle the weight load, it will crash through the raised floor. The ramps or lift you use to get the machine onto the raised floor must also be able to handle the weight load of the system. Power Requirements It is essential that the data center be supplied with a reliable and redundant source of power. If computers are subjected to frequent power interruptions and fluctuations, the components will experience a higher failure rate than they would with stable power sources. 
To assure that power is up constantly, multiple utility feeds, preferably from different substations or power utility grids, should be used. Also, the data center should have dedicated power distribution panels. Isolating the data center power from other power in the building protects the data center and avoids power risks outside your control. The power distribution system is covered in more detail in Chapter 7, "Power Distribution." HVAC and Air Flow Requirements Placement of the HVAC (air conditioning) units is highly dependent on the size and shape of the data center room, as well as the availability of connections to support systems. The primary concern in placement is for optimal effectiveness in dealing with the planned load. Air flow must be considered in the layout of the HVAC systems as well. Reducing obstructions under the floor will provide the best air flow to the areas where the air is needed. Air flow is also governed by under-floor pressure, so the placement and distribution of solid and perforated tiles on the raised floor should be carefully considered. You must maintain higher air pressure under the floor than in the data center space above the floor. Air conditioning and HVAC placement is covered in more detail in Chapter 8, "HVAC and Other Environmental Controls." Network Cabling Network cabling is essential to a data center. It must supply not only TCP/IP connectivity, but connectivity to Storage Area Networks (SAN) as well. Storage systems are becoming increasingly "network aware" devices. Whether this has to do with managing storage through TCP/IP networks or with using these devices on SANs, the requirements of the network cabling must be flexible and scalable. Most of these requirements can be met using Cat5 copper and multi-mode fibre. However, some single-mode fibre might also be needed to support WAN requirements. 
Understanding what equipment will go where and knowing the cabling requirements of each piece of equipment is integral to building data centers. Of all of these support systems, upgrading or adding more network cabling inside the data center is the least intrusive support system upgrade. Network cabling is covered in more detail in Chapter 9, "Network Cabling Infrastructure." Planned Redundancies It is important to consider all of the possible resources that will be needed for redundancy. Particularly, consider redundancy for power and environmental support equipment. Redundant systems allow for uninterrupted operation of the center during electrical and HVAC upgrades or replacements. A new HVAC unit can be run simultaneously with the hardware it is replacing rather than swapping the two. Redundancy assures that power and environmental controls are available in the event of power or equipment failures. Plan for at least the minimal amount of redundancy, but also plan for future redundancy based on projected growth and changes within the center. Will the focus of the center change from a development to a mission critical facility? Will redundant HVAC units be necessary and, if so, where will they be placed? Should greater capacity electrical wiring be pre-installed for future systems? It is important that the intentions for redundancy be maintained as the demands of the data center change and grow. Extra floor space or support systems that were planned for redundancy should not necessarily be used for expansion if this strategy means increasing the chances of downtime due to failures. Make sure the blueprints clearly indicate the intended purpose of the space and systems. The biggest problem with allocating less redundancy to create more capacity is in the area of sub-panel and circuit breaker space. You should allocate space for at least one additional subpanel and breakers in the mechanical room for each megawatt of power you have in the data center. 
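The subpanel guideline above reduces to simple arithmetic. This is a minimal sketch under one assumption the text does not state: a partial megawatt is rounded up to a whole one. The function name and the sample load are illustrative only.

```python
import math


def spare_subpanels_to_reserve(data_center_load_kw: float) -> int:
    """Reserve space for at least one extra subpanel per megawatt of load.

    Assumption: a partial megawatt counts as a full one (round up).
    """
    if data_center_load_kw <= 0:
        raise ValueError("data_center_load_kw must be positive")
    return math.ceil(data_center_load_kw / 1000)


# Hypothetical 2.5 MW data center.
print(spare_subpanels_to_reserve(2500))
```

The result is a minimum count of reserved positions in the mechanical room, which should appear on the blueprints so the space is not consumed by later expansion.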
Also, consider redundancy for UPS and emergency power generators. While these are large expenditures and twice as large if they are totally redundant, in a mission critical data center where even one minute of downtime can cost millions of dollars, they could be a prudent investment. Use the resources of your risk analysts to determine the cost-effectiveness of these redundant systems. Physical and Logical Security Two types of security must be addressed in the data center design. It is important to limit access of unauthorized people into the data center proper, and to prevent unauthorized access to the network. Physical Access Restrictions Access to the data center should be strictly regulated, limited to personnel necessary to keep the equipment in operation. It should not be necessary for anyone else to enter the data center. Those allowed access should have a clear understanding of the sensitivities of the hardware to avoid accidental contact with buttons, cable connections, terminals, or emergency response controls. All points of access should be controlled by checkpoints, and coded card readers or cipher locks. Figure 3-3 shows these two restricted access features for entry into secure areas. Figure 3-3. Cipher Lock (Left) and Card Reader (Right) at Restricted Access Doorways For added security, cameras can be installed at entry points to be monitored by security personnel. Logical Access Restrictions The ability to access the physical console of a system over a network has many advantages, including: The ability to administer machines in a different region, even a different country; The ability to work remotely, from home, a hotel, or even a conference. However, this also means that anyone on the network could gain unauthorized access to the physical console. 
Ways to reduce this risk include: Creating several levels of authentication; Placing limits on who can log in to the console servers; Putting consoles on an administrative network that can be accessed only from the Command Center, and only over authentication through a VPN. Network security is an important issue, but it's not within the bounds of this book to recommend network security practices. There are, however, many articles on the subject at http://www.sun.com/blueprints/online.html. At this website you'll also find information on "The Solaris™ Security Toolkit" by Alex Noordergraaf and Glenn Brunette. System Monitoring Monitoring system status, health, and load is a useful tool for understanding how each system is working, by itself and in relationship to other connected systems. It is not within the scope of this book to cover the "how" of system monitoring, as there are many other sources for this information. However, whatever software you use for monitoring should conform to industry standard interfaces like Simple Network Management Protocol (SNMP). Even HVAC systems and UPS systems can be connected to the network and run SNMP agents to give useful information on the health of the data center and support systems. Remote Systems Management Remote systems management, like remote access, has many advantages. It offers the ability to work remotely, whether you are snowed in at home or at a hotel attending a conference. It allows you to get the best people available from the largest labor pool. Monitoring and Management systems like Sun™ Management Center (Sun MC), BMC Patrol, and others allow you to monitor and manage devices from pagers and cell phones from anywhere around the world. Effective systems management is critical to a smoothly running data center, and even more critical when managing remotely. System configuration information (hardware, software, patches, etc.) is the foundation for remote systems management. 
It is critical that this information be accurate and reliable. Elizabeth Purcell, a Sun performance availability engineering systems engineer, presented a paper on this topic at the October 2001 SuperG conference. For more in-depth information on this subject, see Ms. Purcell's paper reproduced in Appendix A, "Managing System Configurations." Remote systems management is also handy if you have multiple data centers around the world. For example, you might have data centers in Los Angeles, London, and Tokyo. Each city is roughly 8 hours from the next. The administrative networks at each of these Command Centers have connections to the administrative networks of the other two sites. The London crew has coverage of all systems in all three locations from 9:00 a.m. to 5:00 p.m. GMT. At 5:00 p.m. when the crew in London is done for the day, it is 9:00 a.m. in Los Angeles and the L.A. crew takes the next shift. At 5:00 p.m. in L.A. it is 9:00 a.m. the next day in Tokyo. The Tokyo crew takes over until 5:00 p.m. Tokyo time, when the London crew is back at work at 9:00 a.m. London time. This is built-in 24 hour, 7 day a week coverage of all data centers, with no one crew having to work "graveyard" hours. The biggest disadvantage to remote systems management is the possibility of security violations such as someone cracking into the administrative networks. While tools like firewalls and secure shell can help reduce this risk, it is highly recommended that you have your own data center personnel who specialize in this area of technology. Or, you can outsource this type of work to firms that specialize in data, system, and network security. This flexibility is not without risk and cost; however, you might find that these risks and costs are more than offset by the flexibility and productivity you can achieve through remote systems management. 
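The three-site shift handoff described in this section can be checked with a few lines of arithmetic. The sketch below assumes the idealized spacing used in the example (each site exactly 8 hours from the next) and a 9:00 a.m. to 5:00 p.m. local shift at every site. The function names are hypothetical, and the offsets are whole hours from GMT with no daylight saving adjustment.

```python
def utc_hours_covered(utc_offset_hours: int,
                      start_local: int = 9, end_local: int = 17) -> set[int]:
    """Return the GMT hours (0-23) staffed by a site working start..end local time."""
    return {(hour - utc_offset_hours) % 24 for hour in range(start_local, end_local)}


def uncovered_utc_hours(site_offsets: dict[str, int]) -> list[int]:
    """Return the GMT hours during which no site has a crew on shift."""
    covered: set[int] = set()
    for offset in site_offsets.values():
        covered |= utc_hours_covered(offset)
    return sorted(set(range(24)) - covered)


# Idealized spacing from the example: each city 8 hours from the next.
idealized = {"London": 0, "Los Angeles": -8, "Tokyo": 8}
print(uncovered_utc_hours(idealized))  # an empty list means full 24-hour coverage
```

Real offsets are not always a clean 8 hours apart. Passing standard-time offsets of 0, -8, and +9 for the same three cities leaves one GMT hour uncovered, so actual shift boundaries might need a small overlap.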
Planning for Possible Expansion In general, the next generation of hardware will take up less room for the same or larger computational and storage capacities. Therefore, more capacity can be put into an existing space. However, since computing needs rise faster than computing power, don't expect the space needs of the data center to shrink. Most data centers have been able to continue within the same area without having to take up more real estate. However, power and cooling requirements increase. Even if you have the physical space to expand, you might not be able to accommodate the additional power or cooling requirements of expansion. Also, sometimes a direct addition to an operational data center is an even tougher design and construction challenge than building a new facility. What is more likely is that a future expansion would be treated as a separate space from the existing data center, and you can use the networking infrastructure of the existing data center to "link up" the expansion data center with the existing one. Using RLUs to determine data center capacities is the best method for planning for future expansion. RLUs will give you the tools to define your space, structural needs, in-feeds (including power and cooling), etc. and therefore give you a clear picture of remaining capacities. For more information on defining RLUs, see Chapter 4, "Determining Data Center Capacities."
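To make the idea of "remaining capacities" concrete, here is a minimal sketch of the bookkeeping. It is not the RLU definition from Chapter 4: the four fields, the class and function names, and every number are assumptions chosen for illustration. The point is only that the tightest resource, not floor space alone, decides how much more equipment fits.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Capacity:
    """Hypothetical per-rack or per-room totals; not the book's RLU fields."""
    power_watts: int
    cooling_btu_hr: int
    weight_lbs: int
    floor_sq_ft: int


def additional_racks_that_fit(room: Capacity, per_rack: Capacity,
                              installed: int) -> tuple[int, str]:
    """Return how many more racks fit and which resource runs out first."""
    limits = []
    for f in fields(Capacity):
        left = getattr(room, f.name) - installed * getattr(per_rack, f.name)
        limits.append((max(0, left // getattr(per_rack, f.name)), f.name))
    return min(limits)


# Illustrative numbers only.
room = Capacity(power_watts=400_000, cooling_btu_hr=1_400_000,
                weight_lbs=300_000, floor_sq_ft=4_000)
rack = Capacity(power_watts=4_000, cooling_btu_hr=13_650,
                weight_lbs=1_200, floor_sq_ft=24)
print(additional_racks_that_fit(room, rack, installed=60))
```

In this made-up room, power runs out long before floor space does, which is the situation the text warns about.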