Applied Mathematical Modelling 35 (2011) 506–521
doi:10.1016/j.apm.2010.07.017

Enhancing software reliability modeling and prediction through the introduction of time-variable fault reduction factor

Chao-Jung Hsu (a), Chin-Yu Huang (a,b,*), Jun-Ru Chang (b)
(a) Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
(b) Institute of Information Systems and Applications, National Tsing Hua University, Hsinchu, Taiwan

Article history: Received 8 May 2009; received in revised form 18 June 2010; accepted 6 July 2010; available online 13 July 2010.

Keywords: Fault reduction factor (FRF); Software reliability growth model (SRGM); Software testing; Non-homogeneous Poisson process (NHPP); Optimal release time

Abstract

Over the past three decades, many software reliability models with different parameters, reflecting various testing characteristics, have been proposed for estimating the reliability growth of software products. We have noticed that one of the most important parameters controlling software reliability growth is the fault reduction factor (FRF) proposed by Musa. FRF is generally defined as the ratio of net fault reduction to failures experienced. During the software testing process, FRF could be influenced by many environmental factors, such as imperfect debugging, debugging time lag, etc. Thus, in this paper, we first analyze some real data to observe the trends of FRF, and consider FRF to be a time-variable function. We further study how to integrate time-variable FRF into software reliability growth modeling. Experimental results show that the proposed models can improve the accuracy of software reliability estimation. Finally, sensitivity analyses of various optimal release times based on cost and reliability requirements are discussed. The analytic results indicate that adjusting the value of FRF may affect the release time as well as the development cost.

© 2010 Elsevier Inc. All rights reserved.

1. Introduction

Controlling and measuring software quality before shipping a product are the most difficult problems facing the software industry. Accordingly, software reliability has become an important measure that helps software managers quantify system behavior and allocate testing resources. Since the 1970s, many software reliability growth models (SRGMs) with different assumptions have been proposed for estimating the reliability growth of software products [1,2]. Built on well-developed stochastic process theory, SRGMs describe software failure as a random process, characterized either by the times to failure or by the number of failures at fixed times. Many published reports show that SRGMs have proven effective in various real software applications [4–6], and they have been recommended by a number of leading companies and research institutions [7,8], such as ANSI/AIAA, AT&T, North Telecom, Motorola, JPL, etc. It is generally accepted that SRGMs are especially useful for describing failure processes and producing reasonable projections when enough failure data are obtained. Several metrics and the optimal release time can then be derived for product improvement.

One important parameter that controls the growth of software reliability is the fault reduction factor (FRF) proposed by Musa. FRF is generally defined as the net number of faults removed in proportion to the failures experienced [10,11].
This parameter can be further expressed as the ratio between faults and failures, or as the expected number of faults removed per failure in the software testing process [2,3]. In fact, FRF should be estimated and computed from collected project data whenever possible. Previously, Musa argued that the value of FRF may be relatively stable across different projects, but more studies are required [10,13]. We have noticed that Musa's basic execution time model assumes that FRF affects the fault detection process. In practice, the value of FRF has been found to be a constant less than unity. However, some research has also shown that the value of FRF can be influenced by a number of environmental factors, such as fault dependency, the human learning process, fault spawning, debugging time lag, etc. Thus, the degree of influence of these environmental factors should be taken into consideration for FRF [14–18].

On the other hand, the software testing process can be viewed as a learning process in which debuggers improve their skills, so that environmental factors exert a decreasing influence on software products [19–23]. Therefore, we consider FRF to be a time-variable function that reflects the degree of environmental factors in the software testing process. In this paper, we examine how to integrate time-variable FRF into software reliability modeling. To this end, we first analyze several real datasets and present three different patterns of time-variable FRF. We then propose new SRGMs and evaluate the accuracy of the proposed models against other existing SRGMs. We also demonstrate the application of the proposed models and perform a sensitivity analysis between time-variable FRF and different software release policies.

The rest of this paper is organized as follows. Section 2 surveys related work on existing SRGMs and FRF. New quantitative models that integrate time-variable FRF are proposed in Section 3. In Section 4, numerical examples based on real data are used to evaluate the accuracy of the proposed models. In Section 5, sensitivity analyses between time-variable FRF and release time policies are discussed, and a tool developed for software reliability analysis is presented. Finally, some conclusions are given in Section 6.

[Footnote: * Corresponding author at: Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan. Tel.: +886 3 5742972; fax: +886 3 5723694. E-mail address: [email protected] (C.-Y. Huang).]

2. Related works

2.1. SRGMs

In the past three decades, different factors that affect the testing process have spurred much research on developing various SRGMs [1–3,10]. There are several ways to model a software failure process. One general class of SRGMs is the non-homogeneous Poisson process (NHPP) family of models, for which the failure process is described by an NHPP with mean value function (MVF) m(t) at time t. The derivative of the mean value function, λ(t), is the failure intensity of the software, which ordinarily decreases as faults are detected and removed. Using different nondecreasing functions m(t), some traditional NHPP-based SRGMs are described in the following.
2.1.1. Goel and Okumoto (GO) model

This model is the most well-known SRGM, assuming that an NHPP can describe a cumulative software failure process. Its MVF is expressed by Musa et al. as

m(t) = a(1 − e^{−rt}),  (1)

where a is the expected total number of faults to be eventually detected (i.e., initial faults) and r is the fault detection rate. The failure intensity function is

λ(t) = a r e^{−rt}.  (2)

This model was validated on Navy software failure data. Many researchers have since generalized and modified the GO model to improve its goodness-of-fit to real failure data [21,25–28].

2.1.2. Delayed S-shaped (DeS) model

It is observed that the MVF is often a characteristic S-shaped curve rather than an exponential growth curve [2,29]. The S-shapedness can be explained by test-efficiency improvement during the testing phases. The MVF of the DeS model is

m(t) = a(1 − (1 + rt)e^{−rt}),  (3)

and the failure intensity function is

λ(t) = a r² t e^{−rt}.  (4)

It is noted that the failure intensity function increases up to t = 1/r and then begins to decrease asymptotically. A case study that applied model selection to the failure logs of an ongoing project considered the DeS model one of the appropriate models.

2.1.3. Inflection S-shaped (InfS) model

Another S-shaped SRGM was proposed by Ohba, assuming that some faults are not detectable before certain other faults are removed. The MVF of the InfS model is

m(t) = a(1 − e^{−rt}) / (1 + φ e^{−rt}),  (5)

where φ is the inflection factor. The failure intensity function is

λ(t) = a r e^{−rt}(1 + φ) / (1 + φ e^{−rt})².  (6)

Some heuristic arguments about the functional form of this model were discussed by Ohba and validated on on-line test data.
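To make these classical forms concrete, the following is a minimal sketch of Eqs. (1)–(6) in Python, assuming NumPy. The parameters a = 100 and r = 0.2 are arbitrary illustrations, not values from the paper; the closing check simply confirms numerically that the DeS intensity of Eq. (4) peaks near t = 1/r.

```python
# Sketch of the classical NHPP mean value functions of Eqs. (1)-(6).
# a = expected total faults, r = fault detection rate, phi = inflection factor.
import numpy as np

def m_go(t, a, r):
    """Goel-Okumoto MVF, Eq. (1): m(t) = a(1 - e^{-rt})."""
    return a * (1.0 - np.exp(-r * t))

def m_des(t, a, r):
    """Delayed S-shaped MVF, Eq. (3): m(t) = a(1 - (1 + rt)e^{-rt})."""
    return a * (1.0 - (1.0 + r * t) * np.exp(-r * t))

def m_infs(t, a, r, phi):
    """Inflection S-shaped MVF, Eq. (5)."""
    return a * (1.0 - np.exp(-r * t)) / (1.0 + phi * np.exp(-r * t))

def intensity(m, t, *args, dt=1e-6):
    """Failure intensity lambda(t) = dm/dt, here via central difference."""
    return (m(t + dt, *args) - m(t - dt, *args)) / (2.0 * dt)

# Illustration: the DeS intensity of Eq. (4) peaks at t = 1/r.
t = np.linspace(0.01, 50.0, 200)
lam = intensity(m_des, t, 100.0, 0.2)      # a = 100 faults, r = 0.2 (arbitrary)
print(t[np.argmax(lam)], 1.0 / 0.2)        # both near 5.0
```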
2.2. FRF

The terms fault and failure have a close relationship in the testing process [3,10]. A software failure is an incorrect state with respect to the specification, or an unexpected software behavior perceived by the user at the boundary of the software system, while a software fault is an incorrect step, process, or data definition in a program that causes a software failure. Musa used the fault reduction factor (FRF) to describe the relationship between faults and failures. FRF is defined such that the net number of faults removed from the program is only a proportion of the failures experienced, where the net number of faults is the corrected faults minus the introduced faults. If no new faults are introduced while removing a fault, then FRF can be reinterpreted as the average ratio of faults to failures [2,3]. In general, FRF can be expressed by [10,11]

B = n / m,  0 < B ≤ 1,  (7)

where B is the FRF, m is the total number of observed failures, and n is the total number of faults corrected, excluding introduced faults. Clearly, if B = 1, the numbers of faults and failures are equal.

In 1993, Malaiya examined the fault exposure ratio, which represents the average detectability of the faults in software, and found that this ratio can characterize program structure (such as loopiness and branchiness). It is noted that the value of FRF can also be determined by the fault exposure ratio [10,33,34]:

B = λ₀ / (K f m),  0 < B ≤ 1,  (8)

where λ₀ is the initial failure intensity, K is the fault exposure ratio, and f is the linear execution frequency of the program. Besides, Friedman et al. proposed that FRF could be expressed by

B = D(1 + A)(1 − G),  0 < B ≤ 1,  (9)

where D is the detectability ratio, defined as the proportion of faults whose causative failures can be found; A is the associability ratio, defined as the ratio of faults discovered by code reading to faults discovered by testing; and G is the fault growth ratio, defined as the increment in failures per fault corrected. The value of D is close to 1, the value of A is close to 0, and the value of G ranges from 0 to 0.91.

In Musa's basic execution time model, the differential equation can be given as [10,12]

d m(t)/dt = B z(t) = B u [a − m(t)],  (10)

where z(t) is the hazard rate function and u is the per-fault hazard rate. From this equation we notice that FRF proportionally influences the number of detected faults (i.e., the hazard rate function). Solving Eq. (10) under the initial condition m(0) = 0, the mean value function (MVF) is

m(t) = a(1 − e^{−But}),  (11)

where m(t) represents the expected number of faults detected by time t.

3. Software reliability modeling

3.1. Time-variable FRF

Musa thought that the value of FRF could be averaged out from similar systems and treated as a constant [10,12]. In reality, the number of faults is not always equal to the number of failures. Some environmental factors, such as fault spawning or debugging time lag, may cause the FRF ratio to vary in different situations [9,11]. Therefore, the influence of these environmental factors should be incorporated into FRF. On the other hand, the software testing process can be treated as a learning process, because testers become more familiar with the software product, testing environment, and software specification as time proceeds [21,23,31]. It is known that learning from mistakes is human nature [3,12,22]. Accordingly, there is a close connection between the value of FRF and the degree of influence of environmental factors. Below we discuss two such environmental factors.

Table 1. Summary of datasets used for examining FRF (sorted by publication date).

Dataset   Time unit   Detected failures   Corrected faults   Description
DS1 (a)   60 months   146                 136                Web-based and integrated accounting ERP system (WebERP), August 2003 to July 2008
DS2 (a)   49 weeks    94                  65                 Open source project management software (OpenProj), August 2007 to July 2008
DS3       17 weeks    144                 143                A middle-size system
DS4       86 months   4538                4312               Military software system (System P1)
DS5 (a)   220 days    400                 400                Subset of an IBM project (ODC3); detailed analysis uses orthogonal defect classification (ODC)
DS6       21 weeks    136                 136                Real-time control application (System T1) of Rome Air Development Center

(a) Time-between-failures data recorded with calendar time were preprocessed into failure-count data.

One factor that may change the value of FRF is imperfect or reintroduction debugging [19,26,35,36]. Some studies have shown that fault introduction may decrease testing efficiency and further affect software reliability estimation [10,12]. From Eq. (7), if testers remove a fault and unconsciously introduce new ones, the FRF ratio is lowered because the number of introduced faults increases. Published data has also shown that FRF in multicomputer systems ranges from 0.925 to 0.993 due to spawned faults [9,14–17].

Another factor that may change the FRF ratio is the debugging time lag between the fault detection and correction processes.
We know that software testing is a labor-intensive process of identifying the cause of defective software behavior and addressing the problem at its occurrence. Each time a failure occurs, it is hard to immediately remove the causative fault [28,37,38]. Thus, several failures caused by the same fault may be identified as different ones [13,18,39].

In order to examine whether debugging time lags can cause FRF to vary with time, the failure data summarized in Table 1 are used. These data consist of the number of detected failures and corrected faults at each time unit. From Eq. (7), the value of FRF at different time units can be calculated. In general, these datasets may contain short-term noise, so we filter off 10% of the data points at the beginning of the time period; the remaining 90% of the data points are used to calculate the FRF. An exponential smoothing technique is then applied to plot the curve.

[Fig. 1. Variation of FRF versus normalized time: (a) DS1, (b) DS2, (c) DS3, (d) DS4, (e) DS5, (f) DS6. Each panel plots FRF against normalized time.]

Fig. 1 shows the resulting FRF curves for each dataset. From Fig. 1a and b, the FRF appears to be a stable curve. In Fig. 1c and d, the FRF curve has a steadily rising trend. Fig. 1e shows an FRF that decreases and then becomes stable toward the end, and Fig. 1f appears to show an FRF that first declines and then starts to rise. In summary, we conclude that FRF exhibits three patterns: constant, increasing, and decreasing. In the next section, we show how to incorporate time-variable FRF into software reliability modeling.
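The FRF curves of Fig. 1 can be reproduced from failure-count data with a short script. The sketch below follows the procedure described above (Eq. (7) at each time unit, the first 10% of points filtered off, then exponential smoothing). The smoothing constant alpha = 0.3 is our assumption, as the paper does not report the constant it used; the sample counts are the first ten months of DS1 from Table A1.

```python
# Sketch of the FRF trend analysis behind Fig. 1: B = n/m per time unit from
# cumulative corrected-fault (n) and detected-failure (m) counts, with the
# first 10% of points dropped as short-term noise, then exponential smoothing.
import numpy as np

def frf_trend(detected, corrected, trim=0.10, alpha=0.3):
    detected = np.asarray(detected, dtype=float)
    corrected = np.asarray(corrected, dtype=float)
    b = corrected / detected                   # Eq. (7): B = n / m at each time unit
    b = b[int(np.ceil(trim * len(b))):]        # filter off the first 10% of points
    smoothed = [b[0]]
    for x in b[1:]:                            # simple exponential smoothing
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return np.array(smoothed)

# First 10 months of DS1 (Table A1): a roughly stable FRF near 1.
detected  = [1, 7, 7, 9, 9, 9, 12, 18, 18, 18]
corrected = [1, 7, 7, 9, 9, 9, 11, 15, 15, 18]
print(frf_trend(detected, corrected).round(3))
```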
3.2. Proposed models

Assumptions:

(1) The fault removal process follows a non-homogeneous Poisson process (NHPP).
(2) The software system is subject to failures at random times caused by the manifestation of remaining faults in the system.
(3) All faults in a program are mutually independent.
(4) The mean number of faults detected in the time interval (t, t + Δt] is proportional to the mean number of remaining faults in the system.
(5) The proportionality is a function of the time-variable FRF.

From these assumptions, we have the differential equation

d m(t)/dt = r(t)[a − m(t)],  (12)

with

r(t) = r B(t),  (13)

where r(t) is the fault detection rate function and B(t) is the time-variable FRF. Three cases of time-variable FRF follow.

CASE 1 (constant FRF): If B(t) is constant in time t, we have

B(t) = B,  0 < B ≤ 1.  (14)

Substituting Eq. (14) into Eq. (13) and solving the differential equation gives

m(t) = a(1 − e^{−Brt}).  (15)

From Eq. (15), we can see that m(0) = 0 and m(∞) = a. The failure intensity function is

λ(t) = dm(t)/dt = a B r e^{−Brt}.  (16)

CASE 2 (increasing FRF curve): When the software testing process is characterized as a learning process, the impact of environmental factors on FRF decreases, so FRF increases with time. We have

B(t) = 1 − (1 − B₀)e^{−kt},  B₀ ≤ B(t) ≤ 1,  (17)

where B₀ is the initial FRF, B₀ > 0, and k is the increment parameter, 0 ≤ k ≤ 1. Substituting Eq. (17) into Eq. (13) and solving the differential equation, we obtain

m(t) = a(1 − e^{−r[(B₀ − 1)(1 − e^{−kt})/k + t]}).  (18)

From Eq. (18), we can see that m(0) = 0 and m(∞) = a. The failure intensity function is

λ(t) = dm(t)/dt = a(1 + (B₀ − 1)e^{−kt}) r e^{−r[(B₀ − 1)(1 − e^{−kt})/k + t]}.  (19)

CASE 3 (decreasing FRF curve): By contrast, if the impact of environmental factors on FRF continuously increases, the value of FRF decreases with time. We have

B(t) = B₀ e^{−kt},  0 ≤ B(t) ≤ B₀,  (20)

where B₀ ≤ 1 and k is the decrement parameter, 0 ≤ k ≤ 1. Substituting Eq. (20) into Eq. (13) and solving the differential equation, we obtain

m(t) = a(1 − e^{−rB₀(1 − e^{−kt})/k}),  (21)

and the failure intensity function is

λ(t) = dm(t)/dt = a B₀ r e^{−kt} e^{−rB₀(1 − e^{−kt})/k}.  (22)

From Eq. (21), it is noted that m(0) = 0, but

m(∞) = a(1 − e^{−rB₀/k}) ≤ a.  (23)

We can see that m(∞) may be less than a. In other words, when the testing efficiency of the testing process is poor (i.e., r or B₀ is small), or the influence of environmental factors is severe (i.e., k is large), the total number of inherent faults in the software system cannot all be detected even as testing time approaches infinity.

Using the MVFs of Eqs. (15), (18), and (21), some useful metrics for evaluating software quality can be derived. Two commonly accepted metrics are software reliability and mean time between failures (MTBF). Software reliability can be defined as [1,2,43]

R(Δt | t) = e^{−(m(t + Δt) − m(t))},  Δt > 0.  (24)

This equation represents the probability that the software will operate successfully in the time interval (t, t + Δt]. MTBF is generally defined as the average elapsed time before a failure occurs in a software system. For example, taking the MVF of Eq. (21), the instantaneous MTBF and cumulative MTBF can be given as [27,29]

MTBF_I(t) = 1 / (dm(t)/dt) = 1 / (a B₀ r e^{−kt} e^{−rB₀(1 − e^{−kt})/k}),  (25)

and

MTBF_C(t) = t / m(t) = t / [a(1 − e^{−rB₀(1 − e^{−kt})/k})].  (26)

Software reliability and MTBF are typically used to trace the quality of a software product [1–3]. Software projects can improve the development process by collecting these metrics. By setting an objective for software reliability or MTBF, software project managers can analyze the needs and expectations of users. During the testing process, they can also apply these metrics to determine the amount of software testing time and to investigate the relationship between quality and resources, development cost, and schedule requirements.
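As a concrete illustration of the three cases, the sketch below evaluates the proposed MVFs of Eqs. (15), (18), and (21) together with the reliability metric of Eq. (24), assuming NumPy; the closing check uses arbitrary parameters to illustrate the finite limit m(∞) of Eq. (23).

```python
# Sketch of the proposed mean value functions and the reliability of Eq. (24).
# a = initial faults, r = fault detection rate, b0 = initial FRF, k = change rate.
import numpy as np

def m_case1(t, a, r, b):
    """Constant FRF, Eq. (15): m(t) = a(1 - e^{-Brt})."""
    return a * (1.0 - np.exp(-b * r * t))

def m_case2(t, a, r, b0, k):
    """Increasing FRF, Eq. (18), with B(t) = 1 - (1 - b0)e^{-kt}."""
    return a * (1.0 - np.exp(-r * ((b0 - 1.0) * (1.0 - np.exp(-k * t)) / k + t)))

def m_case3(t, a, r, b0, k):
    """Decreasing FRF, Eq. (21), with B(t) = b0 e^{-kt}."""
    return a * (1.0 - np.exp(-r * b0 * (1.0 - np.exp(-k * t)) / k))

def reliability(dt, t, m, *args):
    """Eq. (24): R(dt | t) = exp(-(m(t + dt) - m(t)))."""
    return np.exp(-(m(t + dt, *args) - m(t, *args)))

# Case 3 never reaches a: m(inf) = a(1 - e^{-r b0 / k}) <= a, Eq. (23).
a, r, b0, k = 100.0, 0.05, 0.9, 0.1          # arbitrary illustration
print(m_case3(1e6, a, r, b0, k), a * (1.0 - np.exp(-r * b0 / k)))
```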
3.3. Parameter estimation

Parameter estimation is considerably important in software reliability prediction. In practice, parameter estimation can be achieved by applying least squares estimation (LSE) or maximum likelihood estimation (MLE) [1–3]. Both techniques determine the model parameters from the observed failure data. Without loss of generality, the MVF of Eq. (21) is used as an illustration in this section; the other proposed models follow the same procedure.

For the LSE technique, assume pairs of data (tᵢ, mᵢ), where mᵢ is the cumulative number of faults detected by time tᵢ for i = 1, 2, ..., n, and 0 < t₁ < t₂ < ... < tₙ. The error measure for the accuracy of fit to the real data is

S = Σᵢ₌₁ⁿ (m(tᵢ) − mᵢ)²,  (27)

where m(tᵢ) is the MVF at time tᵢ. The point estimates of the unknown parameters minimize Eq. (27). Differentiating Eq. (27) with respect to the model parameters and setting the derivatives to zero, we have, from ∂S/∂a = 0,

a = [Σᵢ₌₁ⁿ mᵢ(1 − e^{−rB₀(1 − e^{−ktᵢ})/k})] / [Σᵢ₌₁ⁿ (1 − e^{−rB₀(1 − e^{−ktᵢ})/k})²],  (28)

and

∂S/∂r = ∂S/∂B₀ = ∂S/∂k = 0.  (29)

The estimated parameters can be obtained by solving Eqs. (28) and (29) simultaneously.

For the MLE technique, if we let the number of faults detected by time tᵢ, N(tᵢ), follow a Poisson distribution, the likelihood function (joint probability) for NHPP-based models is given as [1,2]

L ≡ Pr(N(t₁) = m₁, N(t₂) = m₂, ..., N(tₙ) = mₙ) = Πᵢ₌₁ⁿ [(m(tᵢ) − m(tᵢ₋₁))^{(mᵢ − mᵢ₋₁)} / (mᵢ − mᵢ₋₁)!] e^{−(m(tᵢ) − m(tᵢ₋₁))}.  (30)

The point estimates of the unknown parameters maximize Eq. (30). Taking the partial derivatives of the log-likelihood function (the natural logarithm of the likelihood function) with respect to the model parameters and setting them equal to zero, we have, from ∂ log L/∂a = 0,

a = mₙ / (1 − e^{−rB₀(1 − e^{−ktₙ})/k}),  (31)

and, from ∂ log L/∂r = 0,

B₀ mₙ (e^{−ktₙ} − 1) / [k(1 − e^{−rB₀(1 − e^{−ktₙ})/k})] = Σᵢ₌₁ⁿ (mᵢ − mᵢ₋₁) B₀[uᵢ(1 − e^{−ktᵢ}) − uᵢ₋₁(1 − e^{−ktᵢ₋₁})] / [k(uᵢ₋₁ − uᵢ)],  (32)

where

uᵢ = e^{rB₀e^{−ktᵢ}/k},  (33)

and

∂ log L/∂B₀ = ∂ log L/∂k = 0.  (34)

Solving Eqs. (31)–(34) numerically, we can obtain the point estimates of the model parameters. In the following, we apply these two techniques to estimate the parameters of the proposed and selected models.
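In practice, these normal and likelihood equations are solved numerically rather than in closed form. The sketch below fits the Case 3 model of Eq. (21) both ways using SciPy: curve_fit minimizes the squared-error measure of Eq. (27), and a Nelder-Mead search maximizes the NHPP log-likelihood implied by Eq. (30). The synthetic failure counts and the starting values p0 are our assumptions for illustration.

```python
# Sketch of LSE and MLE fitting for the Case 3 model of Eq. (21), using SciPy.
import numpy as np
from scipy.optimize import curve_fit, minimize

def m_case3(t, a, r, b0, k):
    """Eq. (21): m(t) = a(1 - exp(-r*b0*(1 - e^{-kt})/k))."""
    return a * (1.0 - np.exp(-r * b0 * (1.0 - np.exp(-k * t)) / k))

# Synthetic cumulative fault counts drawn as NHPP increments (illustrative).
rng = np.random.default_rng(1)
t = np.arange(1.0, 21.0)                       # 20 observation times
true = (150.0, 0.15, 0.9, 0.05)                # assumed "true" parameters
dm = np.diff(m_case3(np.concatenate(([0.0], t)), *true))
m_obs = np.cumsum(rng.poisson(dm)).astype(float)

# LSE: curve_fit minimizes the squared-error measure of Eq. (27).
p0 = (100.0, 0.1, 0.8, 0.1)                    # assumed starting values
popt, _ = curve_fit(m_case3, t, m_obs, p0=p0, maxfev=20000)

# MLE: maximize the NHPP log-likelihood of Eq. (30) over the increments.
def nll(p):
    mt = m_case3(np.concatenate(([0.0], t)), *p)
    dmod = np.diff(mt)                         # model increments m(t_i) - m(t_{i-1})
    if np.any(dmod <= 0.0):
        return np.inf
    dobs = np.diff(np.concatenate(([0.0], m_obs)))
    return -(dobs * np.log(dmod) - dmod).sum() # constant log(d!) terms dropped

mle = minimize(nll, x0=popt, method="Nelder-Mead")
print(np.round(popt, 3), np.round(mle.x, 3))
```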
4. Numerical examples

4.1. Failure data and comparison criteria

In the following experiments, we choose DS1, DS3, and DS5 from Table 1 to compare the performance of the proposed models with the existing SRGMs, because these datasets are representative of the different FRF patterns. The comparison criteria are described in the following (a short computational sketch of criteria (1)–(3) follows the list):

(1) Mean square error (MSE): MSE is used for quantitative comparison of long-term predictions. It provides a well-understood measure of the differences between actual and predicted values [1,2,19,37]:

MSE = (1/n) Σᵢ₌₁ⁿ [m(tᵢ) − mᵢ]².  (35)

A lower MSE means a smaller fitting error and better estimation accuracy.

(2) Akaike information criterion (AIC): AIC accounts for statistical goodness-of-fit while penalizing the number of parameters in the prediction system [3,37]:

AIC = −2 log L + 2P,  (36)

where log L is the log-likelihood function at its maximum value and P is the number of free parameters. A lower AIC indicates the preferred model.

(3) Sign test: The sign test is used to confirm whether one prediction system is significantly better or worse than another. For LSE, our approach is to test whether the absolute residuals (e.g., r1 and r2) obtained by any two models are significantly different, with the null hypothesis |r1| = |r2|. For MLE, the maximum likelihood contributions (e.g., M1 and M2) of each data point under any two models are compared, with the null hypothesis M1 = M2. Since the absolute values are positively skewed and the prediction systems are naturally paired samples, a nonparametric method, the Wilcoxon signed rank sum test, is used to conduct the sign test. The significance level is set at α = 0.05 (two-sided). If the p-value is less than the significance level, we have strong evidence that the two prediction systems differ significantly in their predictions.

(4) Boxplot: The boxplot is a graphical technique for exploratory analysis and provides a simple means of comparing the predictions from alternative prediction systems. We use this plot to compare the spread of absolute residuals from different models; a boxplot with a small box length or fewer extreme values indicates better prediction capability.
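The first three criteria can be computed in a few lines; the sketch below assumes NumPy and SciPy, and the observation and fit values are illustrative, not from the case studies.

```python
# Sketch of comparison criteria (1)-(3) of Section 4.1.
import numpy as np
from scipy.stats import wilcoxon

def mse(m_fit, m_obs):
    """Eq. (35): mean square error between fitted and observed counts."""
    return np.mean((np.asarray(m_fit) - np.asarray(m_obs)) ** 2)

def aic(log_l_max, n_free_params):
    """Eq. (36): AIC = -2 log L + 2P."""
    return -2.0 * log_l_max + 2.0 * n_free_params

m_obs = np.array([10, 22, 31, 38, 44, 48, 51, 53, 54, 55], dtype=float)
fit1  = np.array([11, 21, 30, 39, 45, 47, 50, 54, 55, 56], dtype=float)
fit2  = np.array([14, 25, 27, 33, 41, 52, 56, 58, 50, 60], dtype=float)
print(mse(fit1, m_obs), mse(fit2, m_obs))

# Sign test via the Wilcoxon signed-rank test on paired absolute residuals,
# null hypothesis |r1| = |r2|, two-sided, alpha = 0.05.
stat, p = wilcoxon(np.abs(fit1 - m_obs), np.abs(fit2 - m_obs))
print(p, p < 0.05)   # True indicates a significant difference
```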
4.2. Case study 1 – DS1

The estimated parameters of all selected models are listed in Table 2, and the sign tests for the LSE and MLE estimates are shown in Tables 3 and 4, respectively. Fig. 2 displays the distribution of the absolute residuals for all models. Three existing SRGMs (GO, DeS, and InfS) are used for comparison. Notice that the GO model and Eq. (15) share the same mathematical form of MVF, so the same parameter estimates are derived. It should also be noted that the MSE criterion applies only when the SRGM parameters are estimated by LSE, and the AIC criterion only when they are estimated by MLE.

Table 2. Summary of model parameters and comparisons for DS1 (the inflection factor φ is reported in the B0/φ column for InfS only).

          LSE                                              MLE
Model     a        r        B0/φ    k        MSE       a        r        B0/φ     k       AIC
Eq. (15)  600.972  3.8E-3   0.907   -        90.421    598.162  4.9E-3   0.912    -       374.156
Eq. (18)  449.866  5.0E-3   0.902   0.032    90.979    419.972  7.3E-3   0.801    0.026   374.726
Eq. (21)  466.735  5.4E-3   0.843   1.01E-6  93.151    396.795  7.2E-3   0.952    1.2E-6  380.655
GO        600.972  3.4E-3   -       -        90.421    598.162  4.4E-3   -        -       374.156
DeS       153.231  0.043    -       -        131.135   534.070  0.017    -        -       385.615
InfS      487.883  0.013    2.959   -        81.538    462.749  0.032    11.788   -       352.875

Table 3. Sign test using LSE (DS1): p-values for pairwise comparisons (* significant at α = 0.05, two-tailed).

vs.       Eq. (15)/GO   DeS       InfS
Eq. (15)  -             0.0000*   0.4484
Eq. (18)  0.3204        0.0000*   0.5076
Eq. (21)  0.0748        0.0000*   0.3498

Table 4. Sign test using MLE (DS1): p-values for pairwise comparisons (* significant at α = 0.05, two-tailed).

vs.       Eq. (15)/GO   DeS       InfS
Eq. (15)  -             0.0496*   0.9970
Eq. (18)  0.0000*       0.0422*   0.8496
Eq. (21)  0.0000*       0.0390*   0.6120

[Fig. 2. Boxplot of absolute residuals using LSE (DS1) for Eq. (15)/GO, Eq. (18), Eq. (21), DeS, and InfS; boxes show the 1st quartile, median, and 3rd quartile, whiskers extend 1.5 interquartile ranges, and outliers and extreme values are marked.]

From Table 2, we can see that Eqs. (15) and (18) are more accurate than Eq. (21) under both the MSE and AIC criteria. All proposed models obtain smaller MSE and AIC values than the DeS model, but lower accuracy than the InfS model. This raises the question of how much significance to attach to differences in the accuracy indicators. In Table 3, the proposed models perform significantly better than the DeS model (p-value < 0.05). In Table 4, the statistical results for Eqs. (18) and (21) also show a significant difference from the GO and DeS models. However, the p-values for the InfS model in Tables 3 and 4 do not give sufficient evidence that it performs better than the proposed models. In other words, although the InfS model shows an accuracy improvement in Table 2, we cannot show a statistically significant difference between InfS and the other models. In addition, from Fig. 2 we can see that the DeS and InfS models have larger extreme values or a wider box in the distribution of absolute residuals. Given these results, all of the proposed models provide estimates comparable to the existing SRGMs. Specifically, among the proposed models in Table 2 and Fig. 2, Eqs. (15) and (18) seem to be preferred to Eq. (21) on this dataset.

4.3. Case study 2 – DS3

Similarly, the estimated parameters and comparison criteria for the selected models on DS3 are summarized in Table 5. The sign tests for the LSE and MLE techniques are shown in Tables 6 and 7, respectively, and the spread of absolute residuals for each model is plotted in Fig. 3.

Table 5. Summary of model parameters and comparisons for DS3 (φ is reported in the B0/φ column for InfS only).

          LSE                                            MLE
Model     a        r       B0/φ    k       MSE       a        r       B0/φ    k       AIC
Eq. (15)  154.205  0.141   0.993   -       48.809    166.344  0.118   0.996   -       114.752
Eq. (18)  144.308  0.189   0.501   0.543   33.674    163.390  0.127   0.800   0.748   112.663
Eq. (21)  154.390  0.153   0.914   2.5E-5  48.814    177.549  0.126   0.905   0.018   114.643
GO        154.205  0.141   -       -       48.809    166.344  0.118   -       -       114.752
DeS       144.079  0.341   -       -       55.289    148.186  0.319   -       -       121.788
InfS      144.035  0.227   0.856   -       40.735    161.832  0.139   0.199   -       116.669

Table 6. Sign test using LSE (DS3): p-values for pairwise comparisons (* significant at α = 0.05, two-tailed).

vs.       Eq. (15)/GO   DeS       InfS
Eq. (15)  -             0.5540    0.8314
Eq. (18)  0.0488*       0.0342*   0.0494*
Eq. (21)  0.9434        0.5228    0.8271

Table 7. Sign test using MLE (DS3): p-values for pairwise comparisons (* significant at α = 0.05, two-tailed).

vs.       Eq. (15)/GO   DeS       InfS
Eq. (15)  -             0.3812    0.3318
Eq. (18)  0.0528        0.0407*   0.0442*
Eq. (21)  0.4348        0.4074    0.3684

[Fig. 3. Boxplot of absolute residuals using LSE (DS3); same layout as Fig. 2.]

From Table 5, Eq. (18) has the smallest MSE and AIC values. Confirming this, the sign test in Table 6 indicates that Eq. (18) is significantly better than the GO, DeS, and InfS models. From Table 7, there is also sufficient evidence that Eq. (18) attains small p-values with respect to some of the existing models. In addition, from Fig. 3, Eq. (18) has the smallest interquartile range and no extreme values compared with the other models. Based on these results, Eq. (18) generally provides the best performance among all of the selected models on this dataset.

4.4. Case study 3 – DS5

In this case study, DS5 is used to demonstrate model performance. The parameter estimates and sign tests for all selected models are listed in Tables 8–10, and Fig. 4 summarizes the boxplot of absolute residuals based on the LSE technique.

Using LSE, Eq. (21) has the lowest MSE; likewise, under MLE, Eq. (21) provides a large improvement over the other models in the AIC criterion. From Table 8, the three proposed models are better than the existing models under both the MSE and AIC criteria. In Table 9, the sign test confirms that both Eqs. (18) and (21) show a significant improvement over the traditional models (p-value < 0.05), while in Table 10 only Eqs. (15) and (18) lack sufficient evidence of a statistically significant difference with respect to the GO and InfS models (p-value > 0.05). Moreover, Fig. 4 shows that all proposed models have less variable absolute residuals than the DeS and InfS models; in particular, Eq. (21) has fewer extreme values and a smaller box length than Eqs. (15) and (18). From the comparison results on this dataset, we conclude that Eq. (21) has the best overall performance.
Table 8. Summary of model parameters and comparisons for DS5 (φ is reported in the B0/φ column for InfS only).

          LSE                                              MLE
Model     a        r        B0/φ    k        MSE       a        r        B0/φ    k        AIC
Eq. (15)  598.777  5.4E-3   0.917   -        125.974   588.770  5.6E-3   0.916   -        806.038
Eq. (18)  598.246  5.1E-3   0.997   5.1E-9   125.975   571.039  5.5E-3   0.800   0.090    804.750
Eq. (21)  598.350  5.2E-3   0.965   2.2E-10  122.869   589.183  5.1E-3   0.997   2.8E-10  787.367
GO        598.777  5.0E-3   -       -        125.974   588.770  5.1E-3   -       -        806.038
DeS       400.141  0.021    -       -        529.591   433.474  0.019    -       -        873.898
InfS      598.777  5.0E-3   0.011   -        127.851   521.041  8.1E-3   0.501   -        808.669

Table 9. Sign test using LSE (DS5): p-values for pairwise comparisons (* significant at α = 0.05, two-tailed).

vs.       Eq. (15)/GO   DeS       InfS
Eq. (15)  -             0.0000*   0.1012
Eq. (18)  0.0128*       0.0000*   0.0396*
Eq. (21)  0.0000*       0.0000*   0.0000*

Table 10. Sign test using MLE (DS5): p-values for pairwise comparisons (* significant at α = 0.05, two-tailed).

vs.       Eq. (15)/GO   DeS       InfS
Eq. (15)  -             0.0006*   0.2748
Eq. (18)  0.1798        0.0004*   0.8696
Eq. (21)  0.0040*       0.0000*   0.0284*

[Fig. 4. Boxplot of absolute residuals using LSE (DS5); same layout as Fig. 2.]

5. Software release time management

During the software testing process, managers usually need to determine when to release software with respect to reliability or cost objectives [29,43–45]. In our models, the time-variable FRF characterizes the effect of environmental factors on the testing process, so we mainly study the sensitivity of the time-variable FRF. Notice that when the value of the time-variable FRF is changed, the other parameters are held fixed. In the following, the MVF of Eq. (18) is used as the selected model, and several software release policies are discussed.
In fact, according to Section 3.2, the parameter B0 can be reinterpreted as the initial testing efficiency in the beginning of the testing process, and the parameter k can be represented as the learning efficiency of testers. If these two parameters have a big va- lue, this implies that many faults can be removed quickly. We can then expect that the release time of software development will be shortened. Besides, we can also see that the improvement of the optimal release time TR can be greatly reduced from 34 to 30 weeks. Thus, the interaction of these two parameters can greatly affect the solution to the software release time based on the reliability requirement. In reality, we know that if an overestimation of a parameter causes an underestimation of the release time, this may seriously result in more failures experienced by customers. Therefore, more attention should be paid on the estimated accuracy of these two parameters. 5.2. Software release time based on cost requirement On the other hand, another constraint for determining software release time is cost requirement. Let C(T) be the cost func- tion of a software system at time T. This function can be formulated as [27–29,45]: CðTÞ ¼ C 1 mðTÞ þ C 2 ðmð1Þ mðTÞÞ þ C 3 T; T P 0; ð37Þ where C1 is the expected cost of removing a fault during the testing phase, C2 is the expected cost of removing a fault during the operation phase, C2 > C1 > 0, and C3 is the expected cost per unit time for testing, C3 > 0. By taking the partial derivative of Eq. (38) with respect to T and equating it to zero, an optimal release time TC that minimizes the cost function can be numer- ically obtained. Similarly, the model parameters of Eq. (18) can be estimated by the MLE from Table 5. For instance, given C1 = $3000, C2 = $9000, C3 = $3830, a = 163.390, r = 0.127, B0 = 0.800, and k = 0.748, we can obtain TC as 27.622 weeks and C(TC) as $626,032. Hence, we can change the time-variable FRF to observe the optimal release time and cost function. The graphic result is shown in Fig. 6. From Fig. 6a and b, it can be seen that when B0 and k increase, the optimal release time and minimum cost are gradually decreased. However, it should be noted that the increment of parameters does not reflect a corresponding percentage of re- lease time or cost decrement. Thus, this sensitivity analysis can somehow provide a guideline for the trade-off between cost and effective problem. Furthermore, we can find that the value of optimal release time based on cost requirement is im- proved from 34 to 30 weeks, and its minimum cost is decreased from $660,000 to $630,000. Therefore, these two parameters still play an important role in determining the release time of cost requirement. We should try not to overestimate these two parameters. 5.3. Software release time based on reliability and cost requirement Finally, the release time of a software system can be extended by satisfying both the reliability requirement and mini- mum cost requirement. With these two constraints, the optimization problem can be formulated as follows [2,29,44]: Minimize CðTÞ; ð38Þ Subject to RðDTjTÞ P R0 : Let TC be the solution of the minimum cost function C(TC), and TR be the solution of R(DT|T) = R0. It can be proven that the optimal release time Tmax is determined by [27,28,43] 34 0.1 TR 32 0.08 (weeks) 30 0.06 0.7 0.75 0.04 k 0.8 0.02 B0 0.85 0.9 0 Fig. 5. Sensitivity analysis between optimal release time (TR) and FRF. 518 C.-J. Hsu et al. 
5.3. Software release time based on reliability and cost requirements

Finally, the release time of a software system can be determined by satisfying both the reliability requirement and the minimum cost requirement. With these two constraints, the optimization problem can be formulated as follows [2,29,44]:

Minimize C(T)  subject to  R(ΔT | T) ≥ R₀.  (38)

Let TC be the solution of the minimum cost function C(TC), and TR be the solution of R(ΔT | T) = R₀. It can be proven that the optimal release time Tmax is determined by [27,28,43]

Tmax = max{TR, TC}.  (39)

Likewise, the model parameters of Eq. (18) are those estimated by MLE in Table 5. From Sections 5.1 and 5.2, we obtain TR = 27.615 weeks and TC = 27.622 weeks. Thus, using Eq. (39), Tmax is estimated as 27.622 weeks. From Eq. (39), we can also combine Figs. 5 and 6 to derive the optimal release time Tmax and the cost C(Tmax). Fig. 7 plots the values of Tmax and C(Tmax) as the parameters B₀ and k are varied simultaneously.

[Fig. 7. The optimal release time and cost based on reliability and cost requirements: (a) sensitivity analysis between optimal release time (Tmax) and FRF; (b) sensitivity analysis between cost (C(Tmax)) and FRF. Black and grey marks distinguish points dominated by TR from points dominated by TC.]

From Fig. 7, there is a trend similar to the previous results of Figs. 5 and 6: when the value of FRF increases, both the optimal release time Tmax and the cost C(Tmax) tend to decrease. Furthermore, in Fig. 7a the black marks are dominated by the release time TR, while the grey marks are occupied by the release time TC; the cost function in Fig. 7b shows the same domination of marks. Comparing the effect of the two requirements, the cost requirement takes up more marks in Fig. 7. In short, the interaction of changing both B₀ and k has a great influence on the optimal release time based on the reliability and cost requirements, so the influence and accuracy of these two parameters deserve particular attention in software reliability models.

5.4. Software reliability analysis tool CARATS

In this paper we have incorporated three patterns of time-variable FRF into software reliability modeling. However, the parameters of the proposed models are generally unknown and have to be estimated from collected failure data. Parameter estimation may require complex calculations and generally causes computational overhead as the number of parameters increases; such calculations can, however, be solved numerically. In the past, some Computer-Aided Software Engineering (CASE) tools have been proposed and widely applied to software reliability analysis [2,3,5], such as ROBUST (Reliability of Basic and Ultra-reliable Software sysTem), SMERFS (Statistical Modeling and Estimation of Reliability Functions for Systems Software Suite), CASRE (Computer-Aided Software Reliability Estimation), PISRAT (Proportional Intensity-Based Software Reliability Assessment Tool), etc.

We have developed an automatic CASE tool named Computer-Aided Reliability Assessment Tool for Software (CARATS). The features of CARATS include failure data collection, model parameter estimation, and model fitting analysis. Note that the latest version of CARATS integrates both traditional SRGMs and our proposed models to assess software reliability.
In addition, summary estimates from CARATS can facilitate the assessment of release time policies for managers and engineers, and some management metrics of the program under test can be shown in both tabular and graphical form. Fig. 8 displays some snapshots of CARATS.

[Fig. 8. Some snapshots of CARATS.]

6. Conclusions

An accurate SRGM is necessary in that it allows for the analysis of reliability requirements and resource allocation. In this paper, we integrated a time-variable FRF into software reliability modeling and compared the accuracy of the proposed models. Through a survey of different real datasets, we described the time-variable FRF with constant, increasing, and decreasing patterns corresponding to the degree of influence of environmental factors. Experimental results show that the proposed models give a better fit to the observed data and are significantly better than the traditional SRGMs. This result also demonstrates that the proposed models can successfully formulate different patterns of environmental factors in SRGMs. In addition, the sensitivity analysis of software release policies indicates that adjusting the time-variable FRF may significantly affect the release time and development cost. Thus, we suggest that the time-variable FRF be estimated carefully. These results provide further information for management in determining the trade-offs among testing efficiency, resource allocation, and release time. With the proposed models, managers can make more reasonable decisions in the software development process.

Acknowledgements

The work described in this paper was supported by the National Science Council, Taiwan, under Grants NSC 97-2221-E-469007-052-MY3, NSC 98-2221-E-007-067, and NSC 99-2220-E-007-022.

Appendix

In this appendix, two datasets used in this paper are presented. Detailed information about these two datasets can be found at the SourceForge website for further study. Originally, these two datasets were time-between-failures data: each failure was recorded with its open date and close date. We first preprocessed the two datasets into failure-count data. The preprocessing steps for each dataset were to accumulate failure times and then to count the number of failures whose cumulative times occur within a specified time interval. We thereby derived the failure data for DS1 and DS2, shown in Tables A1 and A2.

Table A1. The failure data of the WebERP project (DS1). Columns: time unit (month), detected failures, corrected faults.

Month  Det.  Corr. | Month  Det.  Corr. | Month  Det.  Corr.
1      1     1     | 21     53    50    | 41     72    65
2      7     7     | 22     53    50    | 42     74    67
3      7     7     | 23     53    50    | 43     74    67
4      9     9     | 24     53    50    | 44     80    67
5      9     9     | 25     53    53    | 45     84    69
6      9     9     | 26     53    53    | 46     84    76
7      12    11    | 27     53    53    | 47     84    79
8      18    15    | 28     53    53    | 48     84    82
9      18    15    | 29     53    53    | 49     85    83
10     18    18    | 30     53    53    | 50     86    84
11     18    18    | 31     53    53    | 51     89    87
12     21    21    | 32     54    54    | 52     90    90
13     22    22    | 33     56    54    | 53     90    90
14     22    22    | 34     58    56    | 54     92    91
15     27    26    | 35     59    56    | 55     108   106
16     30    28    | 36     60    56    | 56     120   119
17     45    39    | 37     63    56    | 57     128   126
18     47    44    | 38     70    60    | 58     129   128
19     49    47    | 39     71    61    | 59     139   136
20     51    47    | 40     71    64    | 60     146   136
Table A2. The failure data of the OpenProj project (DS2). Columns: time unit (week), detected failures, corrected faults.

Week  Det.  Corr. | Week  Det.  Corr. | Week  Det.  Corr.
1     9     0     | 18    55    31    | 35    82    41
2     15    2     | 19    57    31    | 36    83    41
3     19    11    | 20    58    31    | 37    83    44
4     24    12    | 21    61    31    | 38    84    44
5     28    12    | 22    61    31    | 39    84    44
6     29    13    | 23    64    33    | 40    85    44
7     32    14    | 24    66    33    | 41    85    45
8     36    15    | 25    67    33    | 42    87    45
9     36    20    | 26    69    33    | 43    87    46
10    40    21    | 27    70    33    | 44    87    46
11    41    22    | 28    73    33    | 45    89    47
12    41    22    | 29    74    33    | 46    89    62
13    45    24    | 30    75    36    | 47    91    62
14    47    25    | 31    75    38    | 48    91    65
15    49    30    | 32    78    38    | 49    94    65
16    52    30    | 33    79    40    |
17    52    31    | 34    80    41    |

References

[1] M. Xie, Software Reliability Modeling, World Scientific Publishing, 1991.
[2] H. Pham, Software Reliability, Springer, 2000.
[3] M.R. Lyu, Handbook of Software Reliability Engineering, McGraw-Hill, 1996.
[4] P. Carnes, Software reliability in weapon systems, in: Proceedings of the 8th International Symposium on Software Reliability Engineering (ISSRE 1997), Albuquerque, NM, USA, November 1997, pp. 114–115.
[5] N.F. Schneidewind, T.W. Keller, Application of reliability models to the space shuttle, IEEE Softw. 9 (4) (1992) 28–33.
[6] V. Almering, M. van Genuchten, G. Cloudt, P.J.M. Sonnemans, Using software reliability growth models in practice, IEEE Softw. 24 (6) (2007) 82–88.
[7] W.K. Ehrlich, K. Lee, R.H. Molisani, Applying reliability measurements: a case study, IEEE Softw. 7 (2) (1990) 56–64.
[8] W.D. Jones, Reliability models for very large software systems in industry, in: Proceedings of the Second International Symposium on Software Reliability Engineering (ISSRE 1991), Austin, TX, USA, May 1991, pp. 35–42.
[9] J.D. Musa, A theory of software reliability and its application, IEEE Trans. Softw. Eng. SE-1 (3) (1975) 312–327.
[10] J.D. Musa, A. Iannino, K. Okumoto, Software Reliability: Measurement, Prediction, Application, McGraw-Hill, 1987.
[11] J.D. Musa, The measurement and management of software reliability, Proc. IEEE 68 (9) (1980) 1131–1143.
[12] J.D. Musa, Software Reliability Engineering: More Reliable Software, Faster and Cheaper, second ed., Authorhouse, 2004.
[13] P.A. Hamilton, J.D. Musa, Measuring reliability of computation center software, in: Proceedings of the Third International Conference on Software Engineering (ICSE 1978), Atlanta, GA, USA, May 1978, pp. 29–36.
[14] M.J. Fries, Software Error Data Acquisition, Technical Report RADC-TR-77-130, Rome Air Development Center, 1977.
[15] D.M. Weiss, Evaluating Software Development by Analysis of Change Data, Computer Science Technical Report TR-1120, University of Maryland, 1981.
[16] V.R. Basili, B.T. Perricone, Software errors and complexity: an empirical investigation, Commun. ACM 27 (1) (1984) 42–52.
[17] M.A. Friedman, P.Y. Tran, P.L. Goddard, Reliability of Software Intensive Systems, Noyes Publications, 1995.
[18] K. Wu, Y.K. Malaiya, The effect of correlated faults on software reliability, in: Proceedings of the 4th International Symposium on Software Reliability Engineering (ISSRE 1993), Denver, CO, USA, November 1993, pp. 80–89.
[19] M.C. Chen, H.P. Wu, H.J. Shyur, Analyzing software reliability growth model with imperfect-debugging and change-point by genetic algorithms, in: Proceedings of the 29th International Conference on Computers and Industrial Engineering (ICC&IE 2001), Montreal, Canada, November 2001, pp. 520–526.
[20] C. Jones, Software defect-removal efficiency, IEEE Comput. 29 (4) (1996) 94–95.
[21] X. Zhang, M.Y. Shin, H. Pham, Exploratory analysis of environmental factors for enhancing the software reliability assessment, J. Syst. Softw. 57 (1) (2001) 73–78.
[22] K.C. Chiu, Y.S. Huang, T.Z. Lee, A study of software reliability growth from the perspective of learning effects, Reliab. Eng. Syst. Saf. 93 (10) (2008) 1410–1421.
[23] P.K. Kapur, D.N. Goswami, A. Bardhan, O. Singh, Flexible software reliability growth model with testing effort dependent learning process, Appl. Math. Model. 32 (7) (2008) 1298–1307.
[24] A.L. Goel, Software reliability models: assumptions, limitations, and applicability, IEEE Trans. Softw. Eng. 11 (12) (1985) 1411–1423.
[25] C.Y. Huang, M.R. Lyu, S.Y. Kuo, A unified scheme of some nonhomogenous Poisson process models for software reliability estimation, IEEE Trans. Softw. Eng. 29 (3) (2003) 261–269.
[26] M. Ohba, X.M. Chou, Does imperfect debugging affect software reliability growth?, in: Proceedings of the 11th IEEE International Conference on Software Engineering (ICSE 1989), Pittsburgh, PA, USA, May 1989, pp. 237–244.
[27] C.T. Lin, C.Y. Huang, Enhancing and measuring the predictive capabilities of testing-effort dependent software reliability models, J. Syst. Softw. 81 (6) (2008) 1025–1038.
[28] Y.P. Wu, Q.P. Hu, M. Xie, S.H. Ng, Modeling and analysis of software fault detection and correction process by considering time dependency, IEEE Trans. Reliab. 56 (4) (2007) 629–642.
[29] Y. Tamura, S. Yamada, A flexible stochastic differential equation model in distributed development environment, Eur. J. Oper. Res. 168 (1) (2006) 143–152.
[30] T.M. Khoshgoftaar, T.G. Woodcock, Software reliability model selection: a case study, in: Proceedings of the Second IEEE International Symposium on Software Reliability Engineering (ISSRE 1991), Austin, TX, USA, May 1991, pp. 183–191.
[31] M. Ohba, Software reliability analysis models, IBM J. Res. Dev. 28 (4) (1984) 428–443.
[32] Y.K. Malaiya, A. von Mayrhauser, P.K. Srimani, An examination of fault exposure ratio, IEEE Trans. Softw. Eng. 19 (11) (1993) 1087–1094.
[33] N. Li, Y.K. Malaiya, Fault exposure ratio estimation and applications, in: Proceedings of the Seventh International Symposium on Software Reliability Engineering (ISSRE 1996), White Plains, NY, USA, November 1996, pp. 372–381.
[34] J.D. Musa, Rationale for fault exposure ratio K, ACM SIGSOFT Softw. Eng. Notes 16 (3) (1991) 79.
[35] H.J. Shyur, A stochastic software reliability model with imperfect-debugging and change-point, J. Syst. Softw. 66 (2) (2003) 135–141.
[36] Y.C. Chang, C.T. Liu, A generalized JM model with applications to imperfect debugging in software reliability, Appl. Math. Model. 33 (9) (2009) 3578–3588.
[37] C.Y. Huang, C.T. Lin, Software reliability analysis by considering fault dependency and debugging time lag, IEEE Trans. Reliab. 55 (3) (2006) 436–450.
[38] M. Xie, Q.P. Hu, Y.P. Wu, S.H. Ng, A study of the modeling and analysis of software fault-detection and fault-correction processes, Qual. Reliab. Eng. Int. 23 (4) (2007) 459–470.
[39] J.H. Lo, C.Y. Huang, An integration of fault detection and correction processes in software reliability analysis, J. Syst. Softw. 79 (9) (2006) 1312–1323.
[40] G. Keller, Statistics for Management and Economics, eighth ed., South-Western College, 2008.
[41] SourceForge.net, an open source software website, 2008 (last accessed 11.12.08).
[42] K.Z. Yang, An Infinite Server Queueing Model for Software Readiness Assessment and Related Performance Measures, Ph.D. Dissertation, Department of Electrical Engineering and Computer Science, Syracuse University, NY, USA, 1996.
[43] C.Y. Huang, Cost-reliability-optimal release policy for software reliability models incorporating improvements in testing efficiency, J. Syst. Softw. 77 (2) (2005) 139–155.
[44] C.Y. Huang, C.T. Lin, Analysis of software reliability modeling considering testing compression factor and failure-to-fault relationship, IEEE Trans. Comput. 59 (2) (2010) 283–288.
[45] X. Li, M. Xie, S.H. Ng, Sensitivity analysis of release time of software reliability models incorporating testing effort with multiple change-points, Appl. Math. Model., in press. doi:10.1016/j.apm.2010.1003.1000.
[46] C.C. Chen, C.T. Lin, H.H. Huang, S.W. Huang, C.Y. Huang, CARATS: a computer-aided reliability assessment tool for software based on object-oriented design, in: Proceedings of IEEE Region 10 Conference (TENCON 2006), Hong Kong, China, November 2006, pp. 1–4.