Terotechnology Past Paper PDF 2007

Summary

This document is a preface and table of contents for a terotechnology textbook. The book is for industrial technology students or professional engineers, and focuses on plant maintenance and reliability during product design.

Full Transcript


MARCH 2007, NJORO KENYA

Preface

Various studies have indicated that for large manufacturing systems or pieces of equipment, maintenance and its support account for as much as 60 to 75 percent or more of their life cycle costs. The increasing demand for high quality products has brought the maintenance problem into even sharper focus. This, in turn, has put more emphasis on maintainability during product design.

Terotechnology is the process of optimising the life cycle costs of an asset or piece of equipment. In optimising life cycle costs, a thorough understanding of plant reliability and maintainability is crucial. An attempt has therefore been made to present reliability and maintainability concepts in this book to meet the challenges of modern manufacturing system design. In doing so, every effort has been made to treat the topics in such a manner that the reader will need minimal previous knowledge to understand the contents.

This introductory book describes how to design for plant reliability and ease of maintenance from the first stages of product design. In addition the book stresses how to:
- Improve product performance
- Minimize downtime
- Reduce the frequency of maintenance
- Lower the cost of maintenance
- Use life cycle costing in the process of choosing and purchasing equipment
- Decide when and how to replace equipment
- Acquire data for reliability growth

The book is intended for Bachelor of Industrial Technology students and other engineering students, as well as professional engineers and design and maintenance managers. This book will enable manufacturing firms to stay ahead of the rest in product quality, efficiency and profitability.

Dr. Charles M.M. Ondieki
MSc. (Mech. Engg.), PhD (BAdmin.)
Lecturer, Egerton University, Njoro Kenya

Contents

ITEC 236: TEROTECHNOLOGY
Preface
Contents
1. Terotechnology
   1.1 Introduction
2. Plant Reliability
   2.1 Specifications for Reliability
   2.2 Reliability of Parts and Components
   2.3 Parts in series
   2.4 Reliability and Quality
   2.5 The Role of Design in Reliability
   2.6 Improving Reliability
      2.6.1 Design Methods Used to Improve Reliability
      2.6.2 Causes of Unreliability
   2.7 Cost of Reliability
   2.8 Basic stages in the achievement of reliability
   2.9 Reliability and Failure Patterns
      (e) Calculating Reliability when Failure Rate is Constant
   2.10 Redundancy
3.0 Maintainability
   3.1 The importance, Purpose, and Results of maintainability efforts
   3.2 Maintainability cost Considerations
   3.3 Maintainability Costs
   3.4 Maintainability Design Considerations
   3.5 Maintainability Tools
   3.6 Maintainability and Safety
   3.7 Safety and Human Behaviour
   3.8 General Maintainability Design Guidelines
   3.9 Maintainability of new Equipment
   3.10 Designs for Ease of Operation
   3.11 Design for ease of maintenance
   3.12 Design for Serviceability
4.0 Plant Maintenance
   4.1 Preventive Maintenance
   4.2 Breakdown Programs
   4.3 Replacement and Maintenance
   4.4 Maintenance Models
   4.5 Maintenance cost estimation models
   4.5 Availability
      4.5.1 Availability and Scheduled maintenance
      4.5.2 Losses caused by non-availability of the system
   4.6 Downtime and Maintenance Strategies
      4.6.1 Mean Downtime (MDT) and Mean time to repair (MTTR) [or Repair Time]
      4.6.2 Active Repair Time
      4.6.3 Factors Influencing Downtime
   4.7 Comparisons of Maintainability and maintenance costs
   4.8 Comparisons of Reliability and maintenance Costs
      4.8.1 Factors affecting Reliability and Maintenance Costs
   4.9 Reliability-Centred Maintenance (RCM)
      4.9.1 Basic steps in RCM Process
      4.9.2 Methods of Monitoring Equipment condition
      4.9.3 The Benefits of RCM Application
5. Life Cycle Cost and the Cost of an Equipment
   5.1 The costs of quality
   The Life Cycle Costs
      5.2.1 Life Cycle Costing (LCC)
      5.2.2 Life cycle costing steps are:
      5.2.3 Advantages and Disadvantages of Life-cycle costing
      5.2.4 Why Use LCC?
      5.2.5 The Conversion or Decommission Phase of Life Cycle Costing
      5.2.6 Life Expectancies
      5.2.7 Life Cycle Cost models
         Example 1
         Manufacturer B's Electric generator
   5.3 Cost and Performance of Equipment
      5.3.1 Factors related to reliability
      5.3.2 The Design Profile
6.0 Maintainability and Reliability terms and definitions
7.0 Depreciation and Equipment Replacement
   7.1 Economic Life and Obsolescence
   7.2 Depreciation
   7.3 Obsolescence
   7.4 Causes of Depreciation
      7.4.1 Methods of calculating Depreciation
   7.5 The Decision Whether to Purchase
   7.6 Installation of new Equipment
   7.7 Equipment Replacement
      7.7.1 Reasons for Replacement of Equipment
      7.7.2 Equipment Replacement Policy
      7.7.3 Guidelines in Replacement Analysis
      7.7.4 Methods used for Replacement
         Solution
         Machine A
         Machine B
         (f) MAPI Method
8.0 Acquisition of Failure Data
   8.1 Reasons for data collection
   8.2 Information and Difficulties
   8.3 Best Practice and Recommendations
   8.4 Analysis and Presentation of Results
   8.5 Sources of Reliability Information

1. Terotechnology

1.1 Introduction

Terotechnology is the process of optimising the life cycle cost of physical assets. Life cycle cost is the sum of all costs incurred during the lifetime of an asset, that is, the total of procurement and ownership costs. Life cycle costs are categorised as: cost of acquisition, cost of use, and cost of administration. The Life Cycle Cost (LCC) of any physical asset is influenced by plant reliability and plant maintainability.
Maintainability is the action taken during the design and development of assets to include features that will increase ease of maintenance and ensure that, when used in the field, the asset will have minimum downtime and life-cycle support costs; that is, its serviceability, reparability, and cost-effectiveness of maintenance are increased.

Reliability is the probability that an item will carry out its stated function adequately for the specified time interval when operated according to the designed conditions. To define the reliability of any equipment, we must state the planned working life: for example, a new car might be very reliable if we only expect it to last for 5 years, less reliable over a period of 10 years, and completely unreliable if we are expecting a useful life of, say, 40 years. Similarly, we need to know the intended conditions of use and the routine maintenance which is required: if a car engine seizes because there is no water in the radiator, this is a failure of maintenance rather than a failure of reliability; if a car is driven carelessly and fails, this is a misuse failure.

Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR): The MTBF tells us how long, on average, equipment operates before it fails, and this we want to be as long as possible. MTBF, therefore, depends on reliability. The MTTR tells us how long, on average, it takes to put the equipment right after it has failed, and this we want to be as short as possible. MTTR, therefore, depends on maintainability.

2. Plant Reliability

Reliability is the probability that an item will carry out its stated function adequately for the specified time interval when operated according to the designed conditions.
Because no two pieces of equipment are identical, owing to manufacturing differences that remain however hard the designer and control engineers try to eliminate defects, reliability is given as a percentage (for mathematical purposes it is expressed as a decimal fraction of 1.00). Suppose that out of every 100 cars of a particular type, 99 prove to be trouble free if used and maintained correctly, and one fails to work as intended. Then we can say that the reliability of each car is 99 percent, meaning that the chances are 99 in 100 that it will prove reliable. The longer we expect anything to last, the more likely it is to fail during that time, i.e. reliability falls as time increases.

Reliability at time t = R(t) = (No. surviving at instant t) / (No. at start, when t = 0)

2.1 Specifications for Reliability

It is usually best to express a customer or market specification in terms of the service to be performed, or the result to be achieved, rather than of the hardware envisaged. The specification must contain full information about everything which is required. The required reliability must be expressed in figures. There are three main ways of expressing reliability in a specification:

(i) Directly in terms of reliability for a specified useful life. Because reliability is related to a particular life span, this is not always convenient, and MTBF or failure rate is usually preferred.
(ii) MTBF or MTTF: This method is common, especially in the electronics industry, where the failure rate is often approximately constant.
(iii) Failure rate: Since the failure rate is directly related to the MTBF, it can be used provided it is reasonably constant.

The reliability of large installations is not necessarily quoted as a single figure. The central unit, for example, may be assigned a higher reliability than some of the ancillary and support equipment.
Ideally, the minimum acceptable reliability or MTBF, or the maximum failure rate, should be quoted, but when the system is of very new design it may be more realistic to set a target reliability, MTBF or failure rate. In deciding what this should be, we must consider the conditions of use, the duty cycle, what the maintenance requirements are likely to be, and how long the system can be out of use while maintenance is done.

2.2 Reliability of Parts and Components

A system is made of parts and components, and since in some cases the failure of one of these may cause the whole system to fail, it must be ensured that each is as reliable as possible. Further, the greater the number of parts, the greater the risk of including one which is faulty. Hence there are two basic rules:

(i) Use as few parts as possible.
(ii) Ensure that each part is reliable.

2.3 Parts in series

Suppose we have a system consisting of a number of parts, and:
- we know the reliability of each part;
- every part is vital in the sense that, if one fails, the whole system will fail.

Example: Consider a transformer and rectifier set, used to convert mains electricity to a suitable voltage and frequency. Suppose each part has a reliability of 0.9. If we require only a transformer and nothing else, then the system will have the same reliability as the one part it contains. Therefore:

for 1 part, reliability = 0.9

If, however, we also require a rectifier, then we have two things which can go wrong. Therefore:

for 2 parts, reliability = 0.9 x 0.9 = 0.81
for 3 parts, reliability = (0.9)^3 = 0.73
for 10 parts, reliability = (0.9)^10 = 0.35

2.4 Reliability and Quality

Quality is sometimes defined as "fitness for purpose" and can be broken roughly into:
- Physical features, e.g. whether an item has a satisfactory appearance, all its dimensions within limits, etc.
- Performance, i.e. whether it works correctly.
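As an aside on the series calculation in section 2.3, the multiplication of part reliabilities can be sketched in a few lines of Python. This is a minimal illustration only: the function name series_reliability is hypothetical, and the product rule assumes that the parts fail independently of one another, which the worked example implies but does not state.

```python
from math import prod


def series_reliability(part_reliabilities):
    """Reliability of a system in which every part is vital (parts in series).

    Assumes the parts fail independently, so the system reliability
    is the product of the individual part reliabilities.
    """
    for r in part_reliabilities:
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"reliability must lie between 0 and 1, got {r}")
    return prod(part_reliabilities)


# Reproduce the worked example: identical parts, each with reliability 0.9.
for n_parts in (1, 2, 3, 10):
    system = series_reliability([0.9] * n_parts)
    print(f"{n_parts} part(s): reliability = {system:.2f}")
```

Rounded to two decimal places, the results agree with the figures 0.9, 0.81, 0.73 and 0.35 quoted in the example, and they show how quickly system reliability falls as the number of vital parts grows.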
Reliability is the probability that an item will perform as required, under stated conditions, for a stated period of time. Since performance is an aspect of quality, we might say that reliability is the probability that an item will retain its quality, under stated conditions, for a stated period of time. Thus quality and reliability are very closely related, and the quality of a product from the manufacturer will affect its reliability. The quality of the product is also affected by:

(i) The method of manufacture
(ii) Production equipment
(iii) Inspection and test equipment
(iv) Supplies and/or selection of raw materials and parts, etc.

(All these assume that the design and development of the product have been done correctly.)

2.5 The Role of Design in Reliability

According to the definition of reliability, design is the keystone. The design strategy used to ensure reliability can fall between two broad extremes. The fail-safe approach is to identify the weak spot in the system or component and provide some way to monitor that weakness. When the weak link fails, it is replaced. At the other extreme is an approach where all the product components are designed to have equal life, so the system will fall apart at the end of its useful lifetime. The absolute worst-case approach is also frequently used: the worst combination of parameters is identified and the design is based on the premise that all can go wrong at the same time. This is a very conservative approach, and it often leads to over-design.

Two major areas of engineering activity determine the reliability of an engineering system. First, provision for reliability must be established during the earliest design concept stage, carried through the detailed design development, and maintained during the many steps in manufacture. Second, once the system becomes operational, it is imperative that provision be made for its continued maintenance during its service.
2.6 Improving Reliability

Because overall system reliability is a function of the reliability of individual components, improvement in their reliability can increase system reliability. System reliability can also be increased by the use of backup components (i.e. redundancy). Failures in actual use can often be reduced by upgrading user education and refining maintenance recommendations or procedures. It may be possible to increase the overall reliability of the system by simplifying it (thereby reducing the number of components that could cause the system to fail) or by altering component relationships (e.g. increasing the reliability of interfaces). Generally the potential ways to improve reliability are:
- Improve component design
- Improve production and/or assembly techniques
- Improve testing
- Use redundancy
- Improve preventive maintenance procedures
- Improve user education
- Improve system design

2.6.1 Design Methods Used to Improve Reliability

The following methods are used in engineering design practice to improve reliability (and therefore minimize failure):

i) Margin of safety
Variability in the strength properties of materials and in loading conditions (stress) leads to a situation in which the overlapping statistical distributions can result in failures. Variability in strength therefore has a major impact on the probability of failure, so that failure can be reduced with no change in the mean value if the variability of the strength can be reduced.

ii) Derating
The analogue of using a factor of safety in structural design is the derating of electrical, electronic, and mechanical equipment. The reliability of such equipment is increased if the maximum operating conditions (power, temperature, etc.) are derated below their nameplate values. As the load factor of equipment is reduced, so is the failure rate. Conversely, when equipment is operated in excess of rated conditions, failure will ensue rapidly.
iii) Redundancy
Redundancy is the most effective way of increasing reliability. In parallel redundant designs the same system functions are performed at the same time by two or more components, even though the combined outputs are not required. The existence of parallel paths may result in load sharing, so that each component is derated and has a longer than normal life. Another method of providing redundancy is to have inoperative or idling standby units that cut in and take over when an operating unit fails. The standby unit wears out much more slowly than the operating unit does. Therefore, the operating strategy often is to alternate units between full-load and standby service. The standby unit must be provided with sensors to detect the failure and switching gear to place it in service. The sensors and/or switching units frequently are the weak link in a standby redundant system.

iv) Durability
The material selection and design details should be performed with the objective of producing a system that is resistant to degradation from such factors as corrosion, erosion, foreign object damage, fatigue, and wear. This usually requires the decision to spend more money on high-performance materials so as to increase service life and reduce maintenance costs. Life cycle costing is the technique used to justify this type of decision.

v) Damage tolerance
Crack detection and propagation have taken on great importance since the development of the fracture mechanics approach to design. A damage-tolerant material or structure is one in which a crack, when it occurs, will be detected soon enough after its occurrence that the probability of encountering loads in excess of the residual strength is very remote.

vi) Ease of inspection
The product should be designed so that it is possible to employ visual methods of crack detection.
In critically stressed structures, special features to permit reliable non-destructive testing (NDT) by ultrasonic or eddy current techniques may be required. If the structure is not capable of ready inspection, then the stress level must be lowered until the initial crack cannot grow to a critical size during the life of the structure. For that situation the inspection costs will be low, but the structure will carry a weight penalty because of the low stress level.

vii) Simplicity
Simplification of components and assemblies reduces the chance for error and increases reliability. The components that can be adjusted by operation or maintenance personnel should be restricted to the absolute minimum. The simpler the equipment needed to meet the performance requirements, the better the design.

viii) Specificity
The greater the degree of specificity, the greater the inherent reliability of the design. Whenever possible, be specific with regard to material characteristics, sources of supply, tolerances and characteristics of the manufacturing process, tests required for qualification of materials and components, and procedures for installation, maintenance, and use. Specifying standard items increases reliability: it usually means that the materials and components have a history of use, so that their reliability is known. Also, replacement items will be readily available. When it is necessary to use a component with a high failure rate, the design should especially provide for the easy replacement of that component.

2.6.2 Causes of Unreliability

The malfunctions that an engineering system can experience can be classified into five general categories.

1. Design mistakes: Among the common design errors are failure to include all important operating factors, incomplete information on loads and environmental conditions, erroneous calculations, and poor selection of materials.

2. Manufacturing defects: Although the design may be free from error, defects introduced at some stage in manufacturing may degrade it. Some common examples are (1) poor surface finish or sharp edges (burrs) that lead to fatigue cracks and (2) decarburization or quench cracks in heat-treated steel. Elimination of defects in manufacturing is a key responsibility of the manufacturing engineering staff, but a strong relationship with the R&D function may be required to achieve it. Manufacturing errors produced by the production work force are due to such factors as lack of proper instructions or specifications, insufficient supervision, poor working environment, unrealistic production quotas, inadequate training, and poor motivation.

3. Maintenance: Most engineering systems are designed on the assumption that they will receive adequate maintenance at specified periods. When maintenance is neglected or is improperly performed, service life will suffer. Since many consumer products do not receive proper maintenance from their owners, a good design strategy is to make the products maintenance-free.

4. Exceeding design limits: If the operator exceeds the limits of temperature, speed, etc., for which the equipment was designed, it is likely to fail.

5. Environmental factors: Subjecting equipment to environmental conditions for which it was not designed, e.g. rain, high humidity, and ice, usually greatly shortens its service life.

2.7 Cost of Reliability

Reliability costs money, but the cost nearly always is less than the cost of unreliability. The cost of reliability comes from the extra costs associated with designing and producing more reliable components, testing for reliability, and training and maintaining a reliability organization. The figure below shows the cost to a manufacturer of increasing the reliability of a product. The costs of design and manufacture increase with product reliability.
Moreover, the slope of the curve increases, and each incremental increase in reliability becomes harder to achieve. The costs of the product after delivery to the customer (chiefly warranty or replacement costs, damage to the reputation of the supplier, etc.) decrease with increasing reliability. The summation of these two curves produces the total cost curve, which has a minimum at an optimum level of reliability. Other types of analyses establish the optimum schedule for part replacement to minimize cost.

[Figure: Influence of Reliability on Cost. Curves: Total Cost, Cost of Design and Manufacture, Costs after Delivery. Vertical axis: Cost. Horizontal axis: Reliability. Rm = Optimum Reliability.]

2.8 Basic stages in the achievement of reliability

Achievement of reliability can be divided into eight stages:

(i) The customer or market specification: We must ascertain as accurately as possible precisely what our customer requires.

(ii) Prepare the design and express it as a manufacturing specification: The designer must specify what has to be made in order to satisfy the requirements of the customer.

(iii) Prove the design: Wherever the design is a departure from previous practice, we shall need tests on prototypes to demonstrate that what is proposed will achieve the reliability demanded.

(iv) Manufacture to specification: A high standard of quality control will be necessary to ensure manufacture to specification at minimum cost, in the time required.

(v) Packaging and transport: We must ensure that the equipment is packaged and then transported to the customer's site without incurring any significant damage.

(vi) Purchase, storage, installation and commissioning of new equipment: We next look at reliability from the customer's point of view, and consider how the customer decides what to purchase and how it is stored, installed and commissioned for use.

(vii) Operation and maintenance: The customer must use and maintain the equipment as intended, employing operators with adequate skills and training.
If there are any difficulties, the manufacturer should be anxious to help, not merely in the interests of good customer relations but also in order to learn for the future.
(viii) Reliability management and prediction – Finally we consider the overall management of reliability, and how the reliability of a proposed design can be predicted.
2.9 Reliability and Failure Patterns
(a) Definition: When an item no longer works as intended we say it has failed. Failure, therefore, is the termination of the ability of an item to perform its required function.
(b) Classification of Failures
Failures are classified according to:
i) Cause – A misuse failure is a failure attributable to the application of stresses beyond the stated capability of the item, i.e. the item was ill-treated. An inherent weakness failure is a failure attributable to weakness inherent in the item itself when subjected to stresses within the stated capabilities of the item, i.e. the failure is probably due to a design or manufacturing fault.
ii) Suddenness – A sudden failure is one which could not be anticipated by prior examination. A gradual failure is one which could be anticipated by prior examination, i.e. it is possible to predict that it will occur since it takes place gradually.
iii) Degree – A partial failure is one resulting from deviations in characteristics beyond specified limits, but not such as to cause complete lack of the required function, i.e. the item does not work as well as it should, but it has not completely failed. A complete failure is one resulting from deviations in characteristics beyond specified limits, such as to cause complete lack of the required function.
iv) Combination of the above terms – A catastrophic failure is one which is both sudden and complete. A degradation failure is one which is both gradual and partial.
(c) Failure rate
Failure rate is the probability of failure in unit time of an item which is still working satisfactorily.
From a sample of 1000 parts, suppose that 100 hours after the start of a test we notice that 221 parts have failed, leaving 779 still working. Then:
Observed reliability over 100 hours = (No. surviving at t = 100)/(No. at start of the test at t = 0) = 779/1000 = 0.779.
After the test had run 200 hours the number of failures might have risen to 400, and then:
Observed reliability over 200 hours = 600/1000 = 0.600.
Suppose that at the instant when t = 200 hours, we are able to find out that parts are failing at exactly 1.5 per hour. Since 600 are still working, we can express the rate at which they are failing as:
Failure rate λ = (No. failing per hour at instant t)/(No. still surviving at instant t)
λ = 1.5/600 = 0.0025 per hour = 25 × 10^-4 per hour.
(d) Failure Probability Density Function
The failure probability density function gives, for any instant of time t, the probability of failure in unit time of an item which was working satisfactorily at time t = 0. In the example above, the parts were failing at exactly 1.5 per hour at the instant when t = 200 hours, so we can write:
Observed value of failure probability density function for instant t = (No. failing per hour at instant t)/(No. at start when t = 0)
Therefore f(t) = 1.5/1000 = 15 × 10^-4 per hour, and f(t) summed over all time totals 1.00.
NB: Reliability is given by:
Reliability at time t = R(t) = (No. surviving at instant t)/(No. at start when t = 0)
Failure rate = λ(t) = (No. failing per hour at instant t)/(No. still surviving at instant t)
Therefore R(t) × λ(t) = (No. failing per hour at instant t)/(No. at start when t = 0) = f(t), or f(t) = R(t) × λ(t) (= reliability × failure rate).
For the instant when t = 200 hours: R(t) = 600/1000 = 0.6 and λ(t) = 25 × 10^-4 per hour. Therefore f(t) = 0.6 × 25 × 10^-4 = 0.0015 failures per hour, or f(t) = 15 × 10^-4 failures per hour.
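The arithmetic above can be checked with a short Python sketch. The function name `observed_metrics` and its argument names are illustrative assumptions, not part of the original text; the numbers are those of the 1000-part test at t = 200 hours.

```python
def observed_metrics(n_start, n_surviving, failing_per_hour):
    """Return observed (R, lambda, f) at one instant of a life test.

    n_start          -- number of parts on test at t = 0
    n_surviving      -- number still working at instant t
    failing_per_hour -- number failing per hour at instant t
    """
    reliability = n_surviving / n_start            # R(t)
    failure_rate = failing_per_hour / n_surviving  # lambda(t)
    density = failing_per_hour / n_start           # f(t)
    return reliability, failure_rate, density


# Values from the example: 600 of 1000 parts survive at t = 200 hours,
# and parts are failing at 1.5 per hour at that instant.
R, lam, f = observed_metrics(1000, 600, 1.5)
print(f"R = {R:.3f}, lambda = {lam:.4f} per hour, f = {f:.4f} per hour")
```

The three values satisfy f(t) = R(t) × λ(t), which is the relationship stated in the NB above.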
(e) Calculating Reliability when Failure Rate is Constant
Reliability at time t = (Number surviving at instant t)/(Number at start when t = 0)
If we start with, say, 1000 parts and suppose that exactly 100 hours after the start of the test we notice that 221 parts have failed, leaving 779 still working, then observed reliability over 100 hours = 779/1000 = 0.779. If, after the test had run 200 hours, the number of failures had risen to 400, we should have: observed reliability over 200 hours = 600/1000 = 0.6.
Suppose that at the instant when t = 200 hours, we are able to find out that parts are failing at exactly 1.5 per hour; since 600 are still working, we can express the rate at which they are failing as follows:
Observed instantaneous failure rate = (Number failing per hour at instant t)/(Number still surviving at instant t)
Therefore, instantaneous failure rate (or hazard rate) λ = 1.5/600 = 25 × 10^-4 per hour.
Having counted how many parts failed during each hour of the test, we could have said that x1 per cent of the 1000 parts failed during the first hour, x2 per cent during the second hour, and so on up to, say, xn per cent during the hour when the 1000th part failed. Since all the parts have now failed, if we add x1 + x2 + x3 + … + xn, the total must be 100% or 1.00.
At the instant t = 200 hours, parts were failing at exactly 1.5 per hour:
Observed value of failure probability density function for instant t = (Number failing per hour at instant t)/(Number at start when t = 0)
Therefore, f(t) = 1.5/1000 = 15 × 10^-4 per hour.
If we add up the fractions which fail for every instant of time, the total must be 1.00. For a continuous mathematical function we can write this as $\int_{0}^{\infty} f(t)\,dt = 1.00$. But f(t) = R(t) × λ, or f(t) = λR(t); e.g. at t = 200, R(t) = 600/1000 = 0.6 and λ = 1.5/600 = 25 × 10^-4; therefore, f(t) = λR(t) = 25 × 10^-4 × 0.6 = 15 × 10^-4 (as above).
Reliability is cumulative.
When we say that the reliability of 100 parts over the period from time t = 0 to time t = t1 was observed to be 0.70, we mean that if we add up all those which failed between the start of the test and time t1, there would be a total of 30 failures out of 100. Suppose we continued to a new time t2, and during this extension a further 3 parts failed, so that the reliability is now 0.67; therefore:
Observed change in reliability from time t1 to t2 = New reliability – Previous reliability = 0.67 - 0.70 = -0.03 (the negative sign means that the reliability is reduced).
But, taking the interval from t1 to t2 as one unit of time, the magnitude of this change is the observed value of the failure probability density function for instant t, i.e. (Number failing per hour at instant t)/(Number at start when t = 0) = 3/100 = 0.03.
Therefore, the value of the failure probability density function f(t) = -(the rate of change of reliability), i.e. $f(t) = -dR(t)/dt = \lambda R(t)$. Therefore $dR(t)/R(t) = -\lambda\,dt$, or $\int_{1}^{R(t)} dR/R = -\int_{0}^{t} \lambda\,dt$. Therefore $\ln R(t) = -\lambda t$, or $R(t) = \exp(-\lambda t)$, and $f(t) = \lambda R(t) = \lambda \exp(-\lambda t)$.
Example: A part has a constant failure rate of 0.001 per hour. Calculate its reliability over 500 operating hours.
Solution: R(t) = exp(-λt) = exp(-0.001 × 500) = exp(-0.5) = 0.61. We can also calculate the fraction of parts originally put on test which are failing per hour at the instant when the time is 500 hours, since f(t) = λR(t) = λ exp(-λt) = 0.001 × 0.61 = 0.00061 = 6.1 × 10^-4 failures per hour.
(f) Relationship between Failure Rate λ and Failure Probability Density Function f(t)
Consider the probability of an item failing in the interval between t and t+dt. This can be described in two ways:
(a) The probability of failure in the interval t to t+dt given that it has survived until time t, which is λ(t)dt, where λ(t) is the failure rate.
(b) The probability of failure in the interval t to t+dt unconditionally, which is f(t)dt, where f(t) is the failure probability density function.
The probability of survival to time t is the reliability R(t).
The rule of conditional probability therefore dictates that: λ(t)dt = f(t)dt/R(t); therefore, λ(t) = f(t)/R(t).
However, if f(t)dt is the probability of failure in dt, then:
$\int_{0}^{t} f(t)\,dt$ = probability of failure from 0 to t = 1 - R(t), or f(t) = -dR(t)/dt; therefore, λ(t) = -(dR(t)/dt)/R(t), and
$\int_{0}^{t} \lambda(t)\,dt = -\int_{1}^{R(t)} dR(t)/R(t) = -\ln R(t)$ [NB: when t = 0, R(t) = 1, and at t the reliability is R(t)].
If the failure rate is assumed to be constant: $\ln R(t) = -\int_{0}^{t} \lambda\,dt = -\lambda t$; therefore, R(t) = exp(-λt).
Therefore, MTBF, $\theta = \int_{0}^{\infty} R(t)\,dt = \int_{0}^{\infty} \exp(-\lambda t)\,dt = 1/\lambda$.
[NB: Failure rate, λ, is the probability of failure in unit time of an item which is still working satisfactorily, i.e. 1.5/600 = 25 × 10^-4 failures per unit time; whilst the failure probability density function, f(t), is the probability of failure in unit time of an item which was working satisfactorily at time t = 0, i.e. 1.5/1000 = 15 × 10^-4 failures per unit time.]
(g) The Bathtub Curve
If a large number of a particular item or product is put on life test and the test is run until every part has failed, the graph of the observed failure rate against time since the test started is called the bathtub curve (the name comes from the resemblance of the curve to the shape of a bathtub). For the purpose of performing various reliability studies, the bathtub curve is divided into three regions as follows:
[Figure: The bathtub curve. Failure rate λ is plotted against time, with three regions: the early failure period (O to A), the useful working life or constant failure rate period (A to B), and the wear-out failure period (B to C).]
i) Early failure period O-A
At the start of the test the failure rate may be relatively high, but this usually falls progressively until A, where the failure rate is approximately constant and at its lowest level. The most common causes of early failures are:
Manufacturing faults – these are faults which are not detected before dispatch to the consumer. In each case two faults are implied: the product was wrongly made; and its fault was not detected before it left the factory.
Manufacturing faults often account for the majority of the early failures.
Design faults – when the designer completes a new design, it must be thoroughly tested as a prototype before full-scale manufacture begins. If this is not adequately done, any shortcomings in the design may then reveal themselves as early failures.
Misuse – A few failures may be due to accidental misuse by the customer before he/she is fully competent in operating the product.
Extending the period for which products are run on load prior to dispatch, making improvements in the manufacturing process, and improving quality control activities can all minimize the occurrence of early failures. Some of the reasons for failures in this period include substandard workmanship and parts, poor manufacturing methods, human error, inadequate quality control, and unsatisfactory debugging.
NB: This is the period often covered by guarantee, during which the manufacturer agrees to make good anything which goes wrong.
ii) Constant failure period: A-B
Once the early failures have been removed, the parts usually settle down to what may be a relatively long period, from A to B, when the failure rate is approximately constant. After B the failure rate begins to rise again, often quite steeply, as the parts begin to wear out. Although the failure rate in the constant period is usually low, it can be very troublesome if high reliability is required. We can avoid early failures by good design and manufacture, and by running the parts on load for a time at least equal to OA. We can avoid wear-out failures by replacement before time B, but we are still left with the constant failure rate period, right through the normal working life. Failures in this period are unlikely to be due to any single cause; it is usual for failures from a wide variety of causes to occur at random, with no obvious pattern, except that the failure rate is roughly constant.
Some of the reasons for failures in this period are undetectable defects, low safety factors, higher-than-expected random stress, abuse, and natural failures.
iii) The wear-out failure period
Everything wears out sooner or later, and so after B the failure rate rises again; here the failure rate increases with time. The failures occurring in this period are no longer random, and their causes include aging, friction, wrong overhaul practices, poor maintenance, and corrosion.
(h) System
The term system is used to denote any complete installation or equipment. The failure pattern of a system, or indeed any assembly of parts, can be regarded as the sum of the failure patterns of the individual parts, but there may be additional failures as follows:
(i) Failures may occur at what are termed the interfaces between two parts. For example, there may be failures in soldered joints or connectors, as well as in the parts themselves.
(ii) One part may affect the performance of another. Thus if one part fails, it may overload other perfectly good parts and cause them to fail as well.
Whenever a system fails we repair it, probably by replacing the faulty part with a new one. When repair is no longer economical, we buy a new system. Thus each system will be of a different age, and each of the large number of parts it contains will have its own failure pattern. The addition of so many small failure patterns is likely to produce a roughly constant failure rate for the whole system.
(i) Failure Rate and Mean Time between Failures (MTBF)
Consider a batch of n items and suppose that, at any time t, a number k have failed. The cumulative observed time, T, will be nt if it is assumed that each failed item is replaced when it fails.
(i) Failure Rate: This is the ratio of the total number of failures to the total cumulative observed time for a stated period in the life of an item. If λ is the failure rate of the n items then the observed λ is given by λ = k/T.
(ii) Mean time between failures (MTBF): This is the mean value of the length of time between consecutive failures (computed as the ratio of the total cumulative observed time to the total number of failures) for a stated period in the life of an item. If θ is the MTBF of the n items then the observed MTBF is given by θ = T/k, where T is the total cumulative observed time and k the number of failures, i.e. θ = 1/λ.
(iii) Mean Time to Fail (MTTF): This is the ratio of cumulative time to the total number of failures for a stated period in the life of an item. Again this is T/k. The only difference between MTBF and MTTF is in their usage. MTTF is applied to items that are not repaired, such as bearings and transistors, and MTBF to items which are repaired. It must be remembered that the time between failures excludes the down time.
(iv) Mean Life: This is defined as the mean of the times to failure where each item is allowed to fail. While MTBF and MTTF can be calculated over any period (for example, confined to the constant failure rate portion of the bathtub curve), mean life must include the failure of every item and therefore takes into account the wear-out end of the curve. Only for constant failure rate situations are they the same.
2.10 Redundancy
(a) Types of redundancy: However much care is taken in the design of the system and its parts, we may find that we have still not achieved the overall reliability demanded. This may be primarily because some of the units which make up the system are insufficiently reliable. Hence we may decide to duplicate them, so that if one unit fails there is another similar unit there to carry on working, and so avoid failure of the whole system. This technique is called redundancy. Redundancy is the provision of more than one means of accomplishing a given function.
Examples: an aircraft has three identical altimeters, so that if one goes wrong readings can be taken from the other two (which should give the same readings); in hospitals there is always an emergency supply (in case the mains electricity fails), often from a standby motor generator; a bicycle wheel has many spokes, several of which can break without a serious drop in performance or reliability (this is called partial redundancy); a spare wheel is provided for vehicles in case one is punctured.
There are two main types of redundancy:
(i) Active redundancy: Here all the alternative means of achieving a given function are energized whenever that section of the system is operating. Thus the altimeter is an example of active redundancy.
(ii) Standby redundancy: Here only one of the alternatives is energized at a time, and there is provision so that if one fails another can be switched on.
(b) Active Redundancy: Suppose that one unit in our system is insufficiently reliable, so we decide to incorporate three units in active redundancy. They each perform the same function, and are designed so that the system will continue to operate so long as at least one of them is working. The three units may be identical, but as this is not necessarily so, we will designate the reliability of each by R1, R2 and R3.
[Figure: Units 1, 2 and 3, with reliabilities R1, R2 and R3, connected in parallel between points A and B.]
The electrical analogy is that current will flow from A to B so long as there is an unbroken circuit through at least one unit. The probability that at least one of them is still working will give the overall reliability of the units (in parallel).
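As a minimal sketch of this "at least one unit still working" rule, the block reliability can be computed as one minus the probability that every unit fails. The function name `parallel_reliability` is an illustrative assumption, and the sketch assumes that the units fail independently of one another.

```python
from math import prod


def parallel_reliability(unit_reliabilities):
    """Reliability of a block of units in active (parallel) redundancy.

    Assumes independent failures: the block fails only if every unit
    fails, so R_block = 1 - product of (1 - R_i).
    """
    return 1.0 - prod(1.0 - r for r in unit_reliabilities)


# Three units, each 0.90 reliable, in parallel.
print(round(parallel_reliability([0.90, 0.90, 0.90]), 3))  # 0.999
```

Passing a list with a single reliability returns that reliability unchanged, which is a quick sanity check on the formula.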
We calculate the probability that each unit will fail:
Probability that unit 1 fails = F1 = (1 - R1)
Probability that unit 2 fails = F2 = (1 - R2)
Probability that unit 3 fails = F3 = (1 - R3)
Probability that all 3 units fail = Fb = F1 × F2 × F3
Probability that the block of 3 units in parallel operates satisfactorily is Rb = (1 - Fb) = 1 - (F1 × F2 × F3)
Therefore, in general, when there are k units in parallel: Rb = 1 - (F1 × F2 × … × Fk)
Example 1: Suppose R1 = R2 = R3 = 0.90.
Then the probability that unit 1 fails = F1 = (1 - R1) = (1 - 0.90) = 0.10, and likewise F2 = F3 = 0.10.
Probability that all 3 units fail = Fb = F1 × F2 × F3 = 0.1 × 0.1 × 0.1 = 0.001
Probability that the block of 3 units operates as intended = Rb = 1 - Fb = 1 - 0.001 = 0.999
Thus although one unit alone would only be 0.90 or 90% reliable, three units in parallel are 0.999 or 99.9% reliable.
For a reasonable gain in reliability there is a limit to how many units you can put in active redundancy. Thus:
Two units in parallel will in many cases give a useful improvement in reliability. If however we insert a third unit, the additional improvement is much smaller, and in general there is no advantage in putting large numbers of units in parallel.
We get the biggest increase when the unit reliability is between about 0.35 and 0.50.
If the unit reliability is already 0.85 or above, two units in parallel may be useful, but the use of three or more gives little further improvement.
However, when the unit reliability is very low, it is difficult to get acceptable block reliability from the use of redundancy alone, because if our units are unreliable the use of more of them only puts in more unreliability and this offsets the gain from redundancy.
(c) Detection of a failed unit
One practical problem with active redundancy is that of detecting when a unit fails. Thus if two units are in active redundancy and one fails, the other will carry on working and the operator may not even know that a failure has occurred. This is dangerous, because the protection provided by redundancy has now gone.
Some time later the second will also fail, and this time the system will fail with it. Therefore it is essential to have a detection device which will indicate when a unit fails, so that we can ensure that it is restored to working order.
(d) The MTBF when active redundancy is used
If we have two units in active redundancy, the overall MTBF will not be doubled, because both are energized together from the start and so it depends upon how long it is before the second fails. If there are k identical units in parallel, each with a constant failure rate λ, then the mean time between failures (MTBF) for the whole block is given by:
MTBF = θb = 1/λ + 1/(2λ) + 1/(3λ) + … + 1/(kλ)
(e) Combination of series and parallel units
In practice a system may present itself as a combination of series and parallel units. In effect, units will be in parallel whenever there is an alternative path which will enable the system to work satisfactorily, and they will be in series whenever it is essential for all the units concerned to work simultaneously. Take as an example the system shown below:
[Figure: System between A and B with two parallel limbs. One limb: R1 = 0.95 in series with R2 = 0.80 and R3 = 0.80 in parallel. The other limb: R4 = 0.90 and R5 = 0.95 in series with a parallel pair of series chains, R6-R7 and R8-R9, each unit having reliability 0.85.]
(i) Look for units in series within a parallel configuration, and calculate the reliability of a single equivalent unit.
(ii) Reduce each parallel arrangement to a single equivalent unit.
(iii) Repeat the above, always dealing with the smallest recognisable configurations first.
R6 and R7 are in series, within a parallel arrangement. Equivalent reliability = R67 = R6 × R7 = 0.85 × 0.85 = 0.72; similarly R89 = 0.72.
Next deal with the two parallel sections. Since R2 = 0.80, F2 = (1.0 - 0.80) = 0.20, and the probability that R2 and R3 will both fail = F23 = F2 × F3 = 0.20 × 0.20 = 0.04; therefore, the reliability of R2 and R3 = R23 = (1.0 - 0.04) = 0.96. Similarly for R67 and R89, we have: F67 = (1.0 – 0.72) = 0.28, and F89 = (1.0 – 0.72) = 0.28, and therefore, the probability that both will fail = 0.28 x 0.28 = 0.08.
Therefore, the reliability for R67 and R89 = R6789 = (1.0 – 0.08) = 0.92.
[Figure: Reduced system between A and B. One limb: R1 = 0.95 in series with R23 = 0.96. The other limb: R4 = 0.90, R5 = 0.95 and R6789 = 0.92 in series.]
Next take each series limb in turn:
R123 = R1 × R23 = 0.95 × 0.96 = 0.91
R456789 = R4 × R5 × R6789 = 0.90 × 0.95 × 0.92 = 0.79
We now have a straightforward parallel configuration:
F123 = (1.0 – 0.91) = 0.09
F456789 = (1.0 – 0.79) = 0.21
Therefore, the probability that the whole system will fail = Fs = 0.09 × 0.21 = 0.0189, and the reliability of the whole system = Rs = (1.0 – 0.0189) = 0.9811.
(f) Interconnections within series/parallel arrangements
[Figure: Two arrangements of units RA1, RA2, RB1 and RB2 between A and B. 1st alternative: a parallel arrangement without connections. 2nd alternative: a parallel arrangement with connections.]
Suppose every unit has a reliability of 0.90. With the 1st alternative, the whole system will work if either RA1 and RB1 both work, or RA2 and RB2 both work. However, there are two more possible combinations which the 1st alternative cannot use, namely RA1 with RB2, and RA2 with RB1. We could make both these latter combinations acceptable merely by modifying the configuration to the 2nd alternative, in which there is a cross connection in the middle of the network. The reliability of the whole system in the 1st alternative = 0.964 and the reliability of the whole system in the 2nd alternative = 0.9801. Thus the 2nd alternative is better than the 1st alternative for the same units, in the same application.
(g) Standby Redundancy
When standby redundancy is used, only one path is energized at a time and the remainder are idle, although ready to be brought into action should a failure of the first demand it. This is the case with the standby motor generator in a hospital operating theatre. At first sight it is a more attractive proposition than active redundancy, because the spare units remain new and unused until required instead of wearing out all the time the equipment is operating. The supposed advantage can be set out in terms of the MTBF.
For one path only, MTBF = 1/λ = θ.
For two paths in active redundancy, MTBF = 1/λ + 1/(2λ) = 3θ/2.
For two paths in standby redundancy, MTBF = 1/λ + 1/λ = 2θ.
However, before any conclusions about the merits of standby redundancy are made, the following facts should be considered:
(i) The standby unit is not carefully packed away in a specially designed storeroom. Usually it is out on the job, alongside the unit which is operating. Therefore it can be affected by the environment due to, say, vibration, grease, dust, etc. Thus the assumption that it will remain in perfect working order, ready for immediate use, may be quite incorrect. It may well have an appreciable failure rate while waiting to be used, and the longer it remains on standby duty the greater the risk that it will not work should it be required.
(ii) For many systems, the moment of switch-on is particularly hazardous. Stray surges are liable to wander around the circuits and may well damage any part which is already a bit doubtful. With active redundancy, however, the second circuit is switched on with the first and there is time to check that it works before it is required.
(iii) Some parts deteriorate if left unloaded for long periods. Large electric furnaces are best kept at least partially energized, since heater failures often occur if for any reason they have to be completely switched off and allowed to cool.
(iv) It may be possible for two units in active redundancy to share the work, so that each only has to work at half load. This effectively de-rates both of them, and so may lead to a longer working life.
(v) In many cases the operator may not notice the failure and start the standby unit. Usually it will be necessary to install further devices to: (a) detect that the main unit has failed; (b) switch in the standby automatically. If either (a) or (b) above does not work correctly, the standby unit will not come into operation, even though it is in perfect working order.
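The idealised MTBF figures quoted at the start of this comparison can be reproduced with a short Python sketch. The function names are illustrative assumptions. The standby formula is generalised from the two-path case in the text to k paths as an assumption, and it deliberately ignores every caveat in the list above: it assumes perfect detection and switching and no failures while a unit is idle.

```python
def mtbf_active(failure_rate, k):
    """MTBF of k identical units in active redundancy.

    Uses theta_b = 1/lambda + 1/(2*lambda) + ... + 1/(k*lambda),
    valid for a constant failure rate.
    """
    return sum(1.0 / (i * failure_rate) for i in range(1, k + 1))


def mtbf_standby_ideal(failure_rate, k):
    """Idealised MTBF of k identical paths in standby redundancy.

    Assumption (not from the text beyond k = 2): each path is used in
    turn, switching is perfect, and idle units never fail.
    """
    return k / failure_rate


lam = 0.001  # failures per hour (illustrative value)
theta = 1.0 / lam
print(mtbf_active(lam, 1) / theta)         # one path: about 1.0 theta
print(mtbf_active(lam, 2) / theta)         # two active paths: about 1.5 theta
print(mtbf_standby_ideal(lam, 2) / theta)  # two standby paths: about 2.0 theta
```

Any realistic comparison would reduce the standby figure to allow for the reliability of the detection and switchover devices and for the failure rate of the idle unit.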
The detection device will normally be energized whenever the main unit is working, and there are two obvious ways in which it can fail: (a) it can fail to detect that the main unit has ceased to function; (b) it can cause the standby unit to switch in when the main unit is still working correctly. The switchover device will not normally be energized, but it must function correctly when required. It can also fail in two ways: (a) it can fail to switch in the standby unit when required; (b) it can switch in the standby unit when it is unnecessary.
Clearly, in choosing between active and standby redundancy, we shall have to consider each individual case on its merits, but much often hinges on two factors:
(i) The reliability of the detection and switching devices for the standby unit. We use redundancy because we are not satisfied with the reliability of one working unit on its own. Unless, therefore, the detection and switching devices are appreciably more reliable than the main working unit, we may be introducing as much unreliability into the system as we are removing by having the standby unit.
(ii) It is a fundamental assumption in using a standby unit that its failure rate when shut down, but probably exposed to normal operating conditions, is appreciably lower than if it were energized under the same conditions. If this is not so, then active redundancy may well be the better proposition.
(h) Partial Redundancy
In active or standby redundancy it is assumed that the system will continue to work provided that at least one path is still in operation. There are, however, many cases where, although some failures are permitted, more than one path must continue to work if the system as a whole is to work, and this is called partial redundancy. The following are examples.
(i) The suspension bridge is hung from the chains by a considerable number of vertical members.
It might be possible for a few of these vertical members to break without endangering the bridge, but clearly this could not go on until only one was left.
(ii) An aircraft with four engines could almost certainly land safely if three were still working, but this might be very difficult if only one survived.
(iii) Usually several spokes of a bicycle wheel can break without a serious drop in performance or reliability. However, a minimum number of spokes (roughly evenly distributed around the hub) must remain.
(i) Redundancy and Cost
Although they improve reliability, redundant units require more space and add weight, capital cost is increased, and the additional units need more spares and generate more maintenance. System availability is thus improved, but both preventive and corrective maintenance costs increase with the number of units.
3.0 Maintainability
Maintainability is the action taken during the design, development, and installation of a manufactured product to include features that will increase ease of maintenance; reduce required man-hours, tools, logistic costs, skill levels and facilities; and ensure that when used in the field the product will have minimum downtime and life-cycle support costs. From this definition, the general principles of maintainability, therefore, include lowering or eliminating altogether the need for maintenance; reducing life cycle maintenance costs; lowering the number, frequency, and complexity of required maintenance tasks; establishing the extent of preventive maintenance to be performed; reducing the mean time to repair (MTTR); and providing for maximum interchangeability. Maintenance, on the other hand, refers to the measures taken by the users of a product to keep it in operable condition or repair it to operable condition.
3.1 The Importance, Purpose, and Results of Maintainability Efforts
The objectives of applying maintainability engineering principles to engineering systems and equipment include: reducing projected maintenance time and costs through design modifications directed at maintenance simplifications; determining man-hours and other related resources required to carry out the projected maintenance; and using maintainability data to estimate item availability or unavailability.
When maintainability engineering principles have been applied effectively to any product, the following results can be expected: reduced downtime for the product and consequently an increase in its operational readiness or availability; efficient restoration of the product's operating condition when random failures are the cause of downtime; and maximized operational readiness through the elimination of those failures that are caused by age or wear-out.
Because engineering should consider maintenance requirements before designing a product, maintainability design requirements can be determined by processes such as maintenance engineering analysis, the analysis of maintenance tasks and requirements, the development of maintenance concepts, and the determination of maintenance resource needs. Because equipment downtime consists of many components and sub-components, numerous engineering and analytical efforts are required to reduce downtime.
The three main components of equipment downtime are logistic time, administrative time, and active repair time.
(a) Logistic time is that portion of equipment downtime during which repair work is delayed because a replacement part or other component of the equipment is not immediately available. Logistic time, therefore, is largely a matter of management. It can be minimized by developing effective procurement policies.
(b) Active repair time is that portion of equipment downtime during which the repair staff is actively working to effect a repair. Its six elements are fault location time, preparation time, failure verification time, actual repair time, part acquisition time, and final test time. Usually, the length of active repair time reflects factors such as product complexity, diagnostic adequacy, nature of product design and installation, and the skill and training of the maintenance staff.
(c) Administrative time is that portion of equipment downtime not taken into consideration in active repair time and in logistic time. This time (which normally includes wasted time) is a function of the structure of the operational organization and is influenced by factors such as work schedules and the non-technical duties of maintenance people.
3.2 Maintainability Cost Considerations
In many cases, the cost of acquiring a product is less than the cost of ownership over the product life cycle. Cost of ownership includes operation costs (such as the cost of personnel, facilities, and utilities); maintenance costs; the cost of test and support equipment; retirement and disposal costs; technical data costs; the cost of training operations and maintenance personnel; and the cost of spares, inventory, and other support materials. Clearly, reducing the cost of ownership is critical if equipment is to be cost-effective. The opportunity for creating savings in a product's life cycle cost decreases dramatically in the progress from the concept design and advance planning phase to the production and construction phase. Between 60% and 70% of the projected life cycle cost can sometimes be locked in by the completion of the preliminary design phase. This means the greatest impact on costs comes from decisions made during the early design phases.
3.3 Maintainability Costs
Maintainability is an important factor in the total cost of equipment.
An increase in maintainability can lead to a reduction in operation and support costs. For example, a more maintainable product lowers maintenance time and operating costs. Furthermore, more efficient maintenance means a faster return to operation or service, decreasing downtime.

Ways to improve equipment maintainability are:
• Design of built-in test points
• Use of reduced-maintenance parts
• Increased use of automatic test equipment
• Increased self-checking features
• Easier access for maintenance
• Improvement in, and a greater number of, detailed troubleshooting manuals
• Discard-at-failure maintenance

The elements of investment cost incurred to increase maintainability are: prime equipment; system engineering management; repair parts; support equipment; data; training; system test and evaluation; and new operational facilities.

3.4 Maintainability Design Considerations

A cost-effective and supportable design must take into account the maintainability considerations that arise at each phase in the life cycle of the system or product. Careful planning and systematic effort are needed to bring attention to important maintainability design factors such as maintainability allocation, maintainability evaluation, maintainability design characteristics, maintainability parameters, and maintainability demonstration. Each of these factors involves various sub-factors; for example, packaging, standardization, interchangeability, human factors, safety, and testing and checkout all play a role in the maintainability design characteristics of the final product.

The maintainability design characteristics are the features that help reduce downtime and enhance availability. The goals of maintainability design include minimizing preventive and corrective maintenance tasks; increasing ease of maintenance; decreasing support costs; and reducing the logistical burden by decreasing the resources required for maintenance and support, such as spare parts, repair staff, and support equipment.

The most frequently addressed maintainability design factors, ranked in descending order, are: accessibility; test points; controls; labelling and coding; displays; manuals, checklists, charts, and aids; test equipment; tools; connectors; cases, covers, and doors; mounting and fasteners; handles; and safety factors. Other factors are standardization, modular design, interchangeability, ease of removal and replacement, indication and location of failures, illumination, lubrication, test adapters and test hook-ups, servicing equipment, adjustments and calibrations, installation, functional packaging, fuses and circuit breakers, cabling and wiring, weight, training requirements, skill requirements, required number of personnel, and work environment.

(a) Standardization
This important design feature restricts to a minimum the variety of parts and components that a product or system will need. Standardization should be a central goal of design, because the use of non-standard parts may lead to lower reliability and increased maintenance. Some of the primary goals of standardization are: maximizing the use of common parts in different products; minimizing the number of different types of parts, components, assemblies, and other items; maximizing the use of interchangeable and standard or off-the-shelf parts and components; minimizing the number of different models and makes of equipment in use; controlling and simplifying inventory and maintenance; and reducing storage problems and the effort spent on part coding and numbering.

The benefits of standardization are:
(i) Reduces manufacturing costs, design time, and maintenance time and cost.
(ii) Reduces the danger of incorrect use of parts.
(iii) Facilitates cannibalising maintenance approaches.
(iv) Reduces procurement, stocking, and training problems.
(v) Leads to greater reliability.
(vi) Reduces errors in wiring and installation caused by variations in the characteristics of similar items.
(vii) Reduces the chance of accidents that stem from wrong or unclear procedures.
(viii) Eliminates the need for special or close-tolerance parts.

(b) Interchangeability
This is an important maintainability design factor that is made possible through standardization. Interchangeability means that, as an intentional aspect of design, any component, part, or unit can be replaced within a given product or piece of equipment by any similar component, part, or unit. There are two types of interchangeability: functional and physical. In functional interchangeability, two specified items serve the same function. In physical interchangeability, two items can be mounted, connected, and used effectively in the same location and in the same manner.

(c) Modularisation
Modularisation is the division of a system or product into physically and functionally distinct units to allow removal and replacement. Each system or sub-system, from the highest to the lowest level, can be designed as a removable entity. Questions of cost, practicality, and function dictate the degree of modularisation. Modular construction is justified where it will reduce training costs or provide some other concrete benefit. Modularisation also allows the use of disposable modules, which are designed to be discarded rather than repaired once they fail (because repair is either costly or impractical).

(d) Simplification
Probably the most difficult element of maintainability to achieve, but the most important, is simplification. Simplification should be the constant goal of design. Even a complex product or piece of equipment should appear simple and straightforward to the user.
A good designer incorporates the important functions of a product into the design itself and uses as few components as sound design practice will allow.

(e) Accessibility
Accessibility is the relative ease with which a part or piece of equipment can be reached for service, replacement, or repair. Lack of accessibility is an important maintainability problem and a frequent cause of ineffective maintenance.

(f) Identification
Adequate labelling or marking of parts, controls, and test points facilitates maintenance tasks such as replacement and repair. If a repair person is unable to readily identify parts, test points, or controls, maintenance tasks become more difficult, take longer to perform, and are more likely to be performed incorrectly.

3.5 Maintainability Tools

Two methods developed to analyze both reliability and maintainability are Failure Mode and Effects Analysis (FMEA) and Fault Tree Analysis (FTA). FMEA is a structured qualitative analysis of a system, subsystem, component, or function that highlights potential failure modes, their causes, and the effects of a failure on system operation. When FMEA also evaluates the criticality of each failure, that is, the severity of its effect and the probability of its occurrence, the analysis is referred to as Failure Mode, Effects, and Criticality Analysis (FMECA), and the failure modes are assigned priorities. Criticality assessment ranks the potential failures identified during the system analysis according to the severity of their effects and the likelihood of their occurrence.

There are three distinct types of FMECA: system level, design level, and process level. Of these, the highest level of analysis is the system level FMECA, which usually consists of a collection of subsystem FMECAs. Performed in the initial design concept phase, the system level FMECA highlights potential system or subsystem failures so that they can be prevented.
The design level FMECA helps identify and prevent failures stemming from the product design. It analyzes the design that has been developed and examines how failures of individual items would affect the functioning or operation of the system. The purpose of the process level FMECA is to analyze the process by which the product or system is to be built and to assess how potential failures in the manufacturing or service process would affect the functioning or operation of the product or system.

All three types of FMECA consist of the following basic steps:
• Understanding system parts, operation, and mission.
• Identifying the hierarchical, or indenture, level at which the analysis is to be performed.
• Defining each item to be analysed, for example, component, module, or subsystem.
• Establishing associated ground rules and assumptions, for example, system mission and operational phases.
• Identifying possible failure modes for each item.
• Determining the effect of each item’s failure for every possible failure mode.
• Determining the effect of group failures (failures of more than one item) on system operation and mission.
• Identifying methods, procedures, or approaches for detecting potential failures.
• Determining any provisions or design changes that would prevent failures or mitigate their effects.

1. Failure Mode and Effects Analysis (FMEA)
This is a reliability analysis technique that also applies to maintainability analysis. The technique systematically determines the basic causes of failure and defines measures to reduce their effects. Furthermore, it can be applied at any system level. The failure mode is the specific way in which the item fails to carry out its intended mission. The failure cause is the reason the failure took place. The failure effect is the result of the failure for each failure mode.
• Failure Mode: Examples are open or short circuits, reduced output, loss of function, and loss of output.
• Failure Cause: Examples are wear, vibration, contamination, and voltage surge.
• Failure Effect: Examples are loss of communication, mission abort, reduced control, and injury or damage to personnel or equipment.

FMEA is basically a qualitative approach to determining the reliability, maintainability, and safety of a given design by taking into consideration potential failures and their resulting effects. The seven major steps in performing FMEA are:
(i) Define the boundaries and detailed requirements for the system or piece of equipment under consideration.
(ii) List all of its components.
(iii) List all possible failure modes, describe each, and identify the component that would be involved.
(iv) Assign a failure rate to each component failure mode.
(v) List the effects of each failure mode.
(vi) Enter remarks for each failure mode.
(vii) Review each critical failure mode and take appropriate action.

In using this analysis, the effects identified can be quite different depending on the objective of the analysis. For example:
• In reliability analysis, the effect considered is the effect on the system’s or equipment’s performance or ability to function.
• In maintainability analysis, the effects considered include the symptoms through which the failure can be pinpointed and the components that will require replacement as a result of the failure.
• In safety analysis, the effects to be considered are damage to other systems and equipment and possible danger to people.

Some of the advantages of the FMEA method are that it employs a systematic procedure to categorize hardware failures and identifies all possible failure modes and their effects on performance, personnel, and equipment. It is useful for comparing designs, is simple to understand, and helps identify methods of detecting the various possible failures.

3.6 Maintainability and Safety

Safety means either freedom from hazards or protection against hazards. It is one of the most important factors in designing for maintainability.
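The FMEA steps listed in Section 3.5 (list the components, list their failure modes, assign failure rates, list the effects, then review the critical modes) can be sketched as a small worksheet. Everything specific in the sketch is an assumption made for illustration: the `FailureMode` record, the components, the failure rates, the 1 to 4 severity scale, and the use of failure rate multiplied by severity as a stand-in for criticality. It is not a formal FMECA procedure.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One row of a hypothetical FMEA worksheet."""
    component: str
    mode: str            # how the item fails
    cause: str           # why the failure took place
    effect: str          # result of the failure
    failure_rate: float  # failures per 1000 operating hours (invented)
    severity: int        # 1 = minor ... 4 = catastrophic (assumed scale)

    @property
    def criticality(self) -> float:
        # Simplified index: likelihood of occurrence times severity of effect.
        return self.failure_rate * self.severity


worksheet = [
    FailureMode("power supply", "loss of output", "voltage surge", "mission abort", 0.20, 4),
    FailureMode("signal cable", "open circuit", "vibration", "loss of communication", 0.50, 3),
    FailureMode("cooling fan", "reduced output", "wear", "reduced control", 0.80, 2),
    FailureMode("relay", "short circuit", "contamination", "damage to equipment", 0.10, 4),
]


def rank_by_criticality(modes):
    """Return the failure modes with the most critical first."""
    return sorted(modes, key=lambda m: m.criticality, reverse=True)


for m in rank_by_criticality(worksheet):
    print(f"{m.component:13s} {m.mode:15s} criticality={m.criticality:.2f}")
```

The ranking only orders the rows for review; deciding what action each critical failure mode needs remains an engineering judgement.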
As individuals perform maintenance tasks, they are exposed to hazards and accidents. Many of these hazards and accidents are due to careless design or to design that does not give adequate attention to human factors and safety features. Other factors include hazardous environmental conditions and the hazards that maintenance and operating personnel create themselves when they perform their assigned tasks carelessly. The key to overcoming many of these difficulties is to “design in” safety features that will protect operators, maintenance personnel, and the equipment itself. (It should be remembered that equipment that is dangerous to people is by definition not maintainable.)

During the equipment design phase, professionals have many methods at their disposal for eradicating or minimizing hazards to people and equipment. The basic objective of all these approaches is hazard identification and control. Some of the safety analysis techniques are hazard analysis, failure mode and effects analysis, and fault tree analysis.

Hazard Analysis Method: This safety analysis tool determines the safety requirements for people, procedures, and equipment used in testing, operations, maintenance, and logistic support. It also determines the compliance of systems and equipment with specified safety requirements and criteria. (For FMEA and FTA see elsewhere.)

Comparison of FMEA and FTA

No. | FMEA | FTA
1. | It is a hardware-oriented method. | It is an event-oriented approach.
2. | It has a broader scope with restricted depth of analysis. | It has a restricted scope with in-depth analysis.
3. | It is an optimum approach for single failures. | It is an optimum approach for multiple failures.
4. | It provides documentation to ensure that each and every potential single failure has been investigated. | It does not require analysis of failures that have no effect on the operation under investigation.
5. | It does not require investigation of all external influences. | It highlights all external influences contributing to loss, for example, environment, test procedures, and human errors.

3.7 Safety and Human Behaviour

The safety of the people who operate and maintain equipment is of utmost importance. During the design phase, appropriate information on human behaviour can ultimately lead to a safer and more maintainable design. The following are important measures for reducing accidents due to human error:
• Designing error-free mating parts
• Providing correct tools and making regular adjustments to safety equipment
• Developing effective support procedures
• Properly inspecting all tasks
• Making each individual conscious of the hazards involved in his or her assignments
• Making workers safety conscious
• Providing proper training for performing tasks

3.8 General Maintainability Design Guidelines

Some of the important general maintainability design guidelines are:
(i) Design to minimise requirements for tools, maintenance skills, adjustments, and other aspects of maintenance.
(ii) Group subsystems for easy location and identification.
(iii) Provide troubleshooting techniques, test points, etc.
(iv) Use standard parts to the extent possible.
(v) Provide for visual inspection.
(vi) Avoid the use of large cable connectors.
(vii) Use plug-in modules.
(viii) Design for safety.

3.9 Maintainability of New Equipment

What to consider for the maintainability of a new system:
(a) Speed of repairs: Ensure that maintenance and repairs can be carried out easily. Ensure that spares are easily available. It may be costly to have the whole system out of service merely because a vital spare cannot be obtained. On the other hand, it is expensive to hold a running stock of essential spares. Ensure that maintenance personnel with the necessary skills are easily available.
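The trade-off under speed of repairs, between the cost of waiting for a vital spare and the cost of holding it in stock, can be made concrete with a back-of-envelope comparison. The sketch below is a simplified model under stated assumptions: the function names, the figures, and the idea of a single shelf spare with a fixed yearly holding rate are all hypothetical, and the time needed to fit the part is ignored.

```python
def annual_cost_without_stock(failures_per_year, lead_time_hours, downtime_cost_per_hour):
    """Expected yearly downtime cost if the spare is ordered only after a failure."""
    return failures_per_year * lead_time_hours * downtime_cost_per_hour


def annual_cost_with_stock(part_price, holding_rate):
    """Yearly cost of keeping one spare on the shelf (capital, storage, insurance)."""
    return part_price * holding_rate


def should_stock(failures_per_year, lead_time_hours, downtime_cost_per_hour,
                 part_price, holding_rate):
    """True when holding the spare is cheaper than waiting for delivery."""
    waiting = annual_cost_without_stock(failures_per_year, lead_time_hours,
                                        downtime_cost_per_hour)
    holding = annual_cost_with_stock(part_price, holding_rate)
    return holding < waiting


# A vital part on a production line: long lead time, expensive downtime.
vital = should_stock(failures_per_year=0.5, lead_time_hours=72,
                     downtime_cost_per_hour=200, part_price=10_000, holding_rate=0.25)

# A minor part: short lead time, cheap downtime.
minor = should_stock(failures_per_year=0.1, lead_time_hours=24,
                     downtime_cost_per_hour=20, part_price=1_000, holding_rate=0.25)

print("stock the vital part:", vital)
print("stock the minor part:", minor)
```

With these invented figures the vital part is worth stocking and the minor part is not, which is the balance the paragraph above describes.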
(b) Performance after repairs: Some delicate or complex pieces of equipment do not tolerate the disturbance caused by a major repair and never really work as well afterwards; hence it is important to purchase a reliable system in the first place.

3.10 Design for Ease of Operation

(a) The competence of the operator: In designing equipment, we must consider how skilled or otherwise the operators are likely to be. The system should be designed to be as easy as possible to operate correctly, and as difficult as possible to operate incorrectly. If two controls should never be operated simultaneously, it helps to interlock them so that operating either one puts the other out of action. Alternatively, the controls may be spaced so far apart that one operator cannot reach both at the same time. In any case, it is essential to devise the equipment so that no injury or damage is caused by any mistakes that are made.

(b) The effect of fatigue and working conditions: The more tiring equipment is to operate, the greater the risk of a mistake and the shorter the period for which an operator can work without a rest. Thus:
• Constant bending, either to pick up materials or to operate very low controls, is unnecessarily tiring.
• Heavy weights should be moved mechanically rather than by the operator’s own exertion.
• If the equipment generates excessive heat, it is in principle better to design it so that the heat is not dissipated in the direction of the operator than to leave the customer to provide protective clothing. Such clothing is usually uncomfortable, and so may not always be worn.

(c) The layout of controls and tools: Where tools must be used in a particular order, it is a great help if they are presented to the operator in that order. Controls too should be arranged logically. It helps if all the controls in a particular group are different, so that the operator knows by feel which one he is holding.
Controls should be labelled as simply as possible, but with enough information for the operator to understand at once what they do without consulting the instruction manual.

(d) Indicators: These include dials, gauges, lamps, bells, etc., which show the working state of the equipment. The majority are either visual or aural. Thus a light may come on or a bell may ring to warn the operator that a particular cycle of manufacture is complete. A light that shows whether a particular part of the system is on or off is cheaper and easier to observe than a meter. However, a meter conveys more information, and it must therefore be specified whenever an operator needs that information.

(e) The arrangement of meter dials on the display panel: Meter dials are easiest to read if they are at about eye level. When they are very high or very low they are not only tiring to read, but there will inevitably be errors due to parallax. Avoid highly reflective surfaces, such as chromed meter rims or shiny instrument panels, since these may dazzle the operator. Ensure that meters are calibrated directly in the units to be observed. A dial should be calibrated so that it can easily be read to the required accuracy. Digital meters, which display actual numbers, are preferable, as they are quicker to read and less prone to operator error.

(f) Electrical controls and responses: Frequently a control is related to a particular meter, so that if, say, the voltage shown is low, the operator turns up the knob that controls it. In such cases:
• The knob should turn the same way as the needle, so that turning it clockwise causes the needle to move clockwise.
• The knob and the dial should be placed as close together as possible, and there should be no doubt which knob controls which dial; for example, the control should not be round the back of the equipment when the dial is at the front.
• There should not be too much delay between the alteration of the control and the response of the meter.
(g) Mechanical controls and responses: The same principles apply to mechanical equipment as to electrical systems:
• Wheels and levers should move logically with respect to the function they control.
• Controls should be placed where the operator can see at once the effect of operating them. He should not have to walk round to see whether something he has started up is in fact running.
• Response to the operation of controls should be quick enough to avoid hunting (i.e. a delay that leads the operator to overshoot or undershoot).
• Try to avoid controls that require considerable physical effort to operate.
NB: Most controls tend to use a mixture of mechanical and electrical/electronic principles.

(h) Alarm signals: In both electrical and mechanical equipment we should include alarms to warn the operator when all is not well. In some cases (where delay in shutdown can be serious) an alarm as well as an automatic shutdown will be required, because once the plant has stopped, adjustments and repairs must be put in hand immediately to get it working again.

(i) Ergonomics or human engineering: Fit the job to the operator’s natural abilities rather than assuming that the operator will somehow cope with badly designed equipment.

(j) Imaginary malfunctions: Sometimes when a system does not appear to work properly, it is merely because it has not been operated correctly. The operator may not realise this, because his skill is not quite equal to the demands of the system. Designing the system so that it is as easy as possible to operate can help to prevent this.

3.11 Design for Ease of Maintenance

(a) The Importance of Availability
When a breakdown occurs, the customer’s chief interest will be to get the system back into service with a minimum of delay and expense; he is more likely to be concerned with its availability than with its reliability. Hence we must consider how to make the system quick and easy to service.
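The distinction between availability and reliability drawn under (a) can be illustrated with a small calculation. The sketch below assumes the common definition of availability as uptime divided by uptime plus downtime, and builds the downtime for each failure from the three components named in Section 3.1 (logistic, administrative, and active repair time). The two systems and all their figures are hypothetical.

```python
HOURS_PER_YEAR = 8760


def downtime_per_failure(logistic, administrative, active_repair):
    """Downtime for one failure as the sum of its three components, in hours."""
    return logistic + administrative + active_repair


def availability(failures_per_year, hours_down_per_failure, period=HOURS_PER_YEAR):
    """Fraction of the period during which the system is up."""
    downtime = failures_per_year * hours_down_per_failure
    uptime = period - downtime
    return uptime / (uptime + downtime)


# System A: reliable (few failures) but slow to restore.
a = availability(2, downtime_per_failure(logistic=40, administrative=12, active_repair=8))

# System B: fails three times as often but is quick and easy to service.
b = availability(6, downtime_per_failure(logistic=1, administrative=1, active_repair=4))

print(f"A: {a:.4f}")  # A: 0.9863
print(f"B: {b:.4f}")  # B: 0.9959
```

In this invented case the less reliable system is the more available one, because each of its failures is put right quickly.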
(b) Fault Location Routine
A set routine for fault location is essential; otherwise a lot of time can be wasted. A fault-finding routine should therefore be incorporated in the product design. It should be as simple as possible and set out on a step-by-step basis. The approach will probably be first to determine which block in the system contains the fault, then which unit within that block is faulty, and so on, so that the actual part that has failed is located in as few steps as possible.

(c) Provision for Fault Location
Having decided roughly what the fault location routine is to be, the designer must consider how to make it easy to carry out. Possibly the best solution is to arrange for lights and meters on the control panel to indicate precisely where a fault is. The next possibility is to provide check points into which the required test equipment can be plugged. It may also be desirable to provide alternative units that can be plugged in, allowing the maintenance engineer to eliminate sections of the actual system in turn.

(d) Test Equipment
The test equipment should be designed to be as reliable as possible, using techniques such as redundancy where necessary, and it should be as easy as possible to check and maintain.

(e) Fault Correction
Once a fault has been located in a system, repairs and adjustments must be carried out, and again the designer can help to make this as easy as possible. Thus:
i) Parts that have high failure rates must be easy to identify and reach.
ii) Fixing devices must be easy to release and reassemble.
iii) Nuts, bolts, and screws in awkward places should be avoided. It is difficult to insert or tighten a screw in a position that one cannot see or perhaps cannot even feel directly with the fingers.
iv) Plugs and sockets are attractive because they make dismantling and reassembly easier, although their own reliability may present a problem.
v) Make sure that visual inspection is as easy as possible, and that parts such as nuts and screws do not fall down inside the equipment when released.
vi) Where appropriate, make it possible to replace a complete unit, so that the faulty one can be taken away for repair.

(f) Working Conditions of the Maintenance Staff
The designer must try to anticipate the actual conditions under which maintenance may have to be done. Thus:
• The skill of the maintenance staff may be very limited indeed, and in remote areas there may be no one to whom they can turn for help, especially with mobile systems such as vehicles.
• The maintenance equipment available may be similarly limited.
• The repair workshop at base probably has reasonable working conditions, but out on the job conditions may be most unreasonable.
• Consider what spares are likely to be available, remembering that it is much more difficult to obtain special parts than standard ones.

(g) Instruction Manuals
A system must have a clear, concise instruction manual. However, the information in the manual should be kept to a minimum. Thus:
• The method of operation should, as far as possible, be clear without reference to the manual. Controls should be arranged and labelled so that their use is self-explanatory.
• The same applies to maintenance. It is much better for adjustment points to be readily visible and accessible than to have to consult the instruction manual to find out where they are hidden.
The objective should therefore be to make manuals as nearly unnecessary as possible, or to keep their text very short, by:
• Using clear diagrams wherever they would be useful. A simple sketch often saves a lot of writing, which the operator may not fully understand.
• Trying to put each diagram on the same page as the text to which it relates.
• Keeping the wording concise and simple.
• Avoiding technical terms that may mean little to the operating and maintenance staff.
• Making sure that it is easy to locate any piece of information that may be required. Provide a good cross-referenced index, so that users can find what they want even though their terminology may be somewhat different from that used by the manufacturer.

(h) After-Sales Service
The complexity of modern equipment makes it increasingly impracticable for customers to do more than routine maintenance, and some customers in any case prefer to rely on the services of the manufacturing company. Hence after-sales service must be provided.

(i) Stock Control of Spares
Spare parts and replacement parts should be stocked and supplied for models in current production as well as for those that have gone out of production but are still used by customers.

3.12 Design for Serviceability

Serviceability is concerned with the ease with which maintenance can be performed on a product. Many products require some form of maintenance or service to keep them functi
