
Graduate Program in Computer Science

"TEST CASE PRIORITIZATION BASED ON DATA REUSE FOR BLACK-BOX ENVIRONMENTS"

By Lucas Albertins de Lima

M.Sc. Dissertation

Universidade Federal de Pernambuco
posgraduacao@cin.ufpe.br
www.cin.ufpe.br/~posgraduacao

RECIFE, AUGUST 2009

Universidade Federal de Pernambuco
CENTRO DE INFORMÁTICA
GRADUATE PROGRAM IN COMPUTER SCIENCE

Lucas Albertins de Lima

"TEST CASE PRIORITIZATION BASED ON DATA REUSE FOR BLACK-BOX ENVIRONMENTS"

This work was presented to the Graduate Program in Computer Science of the Centro de Informática of the Universidade Federal de Pernambuco as a partial requirement for the degree of Master of Science in Computer Science.

ADVISOR: Prof. PhD. Augusto Cézar Alves Sampaio
CO-ADVISOR: Prof. PhD. Juliano Manabu Iyoda

RECIFE, AUGUST 2009

Lima, Lucas Albertins de. Test case prioritization based on data reuse for black-box environments / Lucas Albertins de Lima. Recife: The Author, 2009. xii, 100 pages: fig., tab. Dissertation (M.Sc.) – Universidade Federal de Pernambuco. CIn. Ciência da Computação, 2009. Includes bibliography and appendix. 1. Software engineering. 005.1. CDD (22. ed.). I. Title. MEI2009-114.


To my parents, the most supportive and loving people a man could ever desire.

ACKNOWLEDGEMENTS

First, I would like to thank God for allowing me to carry out this work and for guiding me throughout it. Next, the most important people in my life, my parents José Nivaldo and Ana Maria, for always believing that I can take one more step forward, and for always being at my side, helping me and giving me strength whenever I need it. To Pâmela, my girlfriend, for her understanding and encouragement during the development of this work. To all my family, for always rooting for me and encouraging me for as long as I can remember. To my advisors, Augusto Sampaio and Juliano Iyoda, for all the understanding and dedication that made this work possible. To professor Alexandre Mota for being the godfather of this work, pointing us to the problem to be solved. To professor Flávia Barros for the theoretical grounding needed to build the heuristics. To my colleague Dante Torres for the help with the permutation algorithm. To my colleague and professor Eduardo Aranha for his contribution to all the empirical studies. To Ricardo Sales, Taíza Montenegro and Thaysa Paiva for developing the genetic algorithm technique and running the experiments with the heuristics. To André Lacerda and all the testers involved in the execution of the controlled experiment. To CNPq for the financial support through the scholarship. To the Motorola/BTC (Brazil Test Center) program, which provided an industrial environment for the application and validation of this work. Finally, to all colleagues in the research group of the Motorola/BTC program for the support given during the development of this research, and to all colleagues in the graduate program of the Centro de Informática at UFPE.

Beware of bugs in the above code; I have only proved it correct, not tried it.
—DONALD KNUTH

RESUMO

Test case prioritization is a method that orders tests to obtain gains according to specific criteria. This work proposes prioritization techniques that aim to reduce the time spent in the manual execution of test cases by decreasing the effort of data preparation; the best test sequences are those that reuse the most data. These techniques were applied in a mobile phone testing environment, in which tests are created from requirements and executed manually. Four techniques are proposed: an exploratory approach based on the generation of test permutations, and three heuristics (Greedy, Simulated Annealing and Genetic Algorithm). The permutation technique always returns the best sequences; however, it can only be applied to small test suites due to the computational complexity of the algorithm. The heuristics can, in principle, be applied to arbitrary test suites, but there is no guarantee that the best sequences are always produced. A tool was developed to mechanize the prioritization process, helping testers to record information, run prioritizations and choose sequences from the returned results. Empirical studies were carried out comparing the permutation technique with the existing prioritization technique, in which tests are prioritized manually based on a heuristic that uses a tree structure as well as the tester's knowledge and intuition. The results show that the permutation technique reduced configuration time by approximately 25-30% and total time (configuration and procedures) by 13.5%. Experiments with our heuristics showed that they have similar effectiveness for small and large test suites. The proposed techniques bring significant results not only in the execution of the sequences, but also in their generation, which is automated by the tool.

Keywords: Software Testing, Test Case Prioritization, Data Reuse

ABSTRACT

Test case prioritization is an approach that aims to order test cases so as to obtain gains according to specific criteria. This work proposes test case prioritization techniques aiming to decrease the time spent in manual execution by reducing the effort of the data preparation needed for each test case; the best sequences of tests are those that reuse the most data. We applied these techniques in a mobile phone testing environment where tests are designed from requirements and executed manually. Four techniques are proposed, including one exploratory approach based on permutation generation and three heuristics: Greedy, Simulated Annealing and Genetic Algorithm. The permutation technique always yields the best sequences; however, it can only be applied to test suites of limited size due to the computational complexity of the algorithm. The heuristics can, in principle, be applied to arbitrary test suites, although there is no guarantee that the best sequences will be produced. We implemented a tool that mechanizes the prioritization process, helping testers to register information, execute the prioritization techniques and choose from the sequences yielded as results. Empirical studies were performed comparing the permutation approach to the existing prioritization technique, in which test cases are prioritized manually based on a heuristic that uses a tree structure together with knowledge and intuition from the testers. Results show that the permutation technique reduced configuration time by approximately 25-30% and total execution time (configuration and procedure time) by 13.5%. Experiments regarding our heuristics have shown that they have similar effectiveness when applied to small and large test suites. Our techniques yield significant results not just in the execution of the sequences but also in their generation, which is automated by our tool.

Keywords: Software Testing, Test Case Prioritization, Data Reuse

CONTENTS

Chapter 1—Introduction
  1.1 Organization

Chapter 2—Background
  2.1 Software Testing
    2.1.1 Levels of Testing
    2.1.2 Test Design
    2.1.3 Testing Approaches
  2.2 Test Case Prioritization

Chapter 3—Related Work
  3.1 White-Box Test Case Prioritization Techniques
    3.1.1 Prioritizing Test Cases For Regression Testing
    3.1.2 Other White-Box Initiatives
  3.2 Black-Box Test Case Prioritization Techniques
    3.2.1 Test Case Prioritization for Black Box Testing
    3.2.2 The Automatic Generation of Load Test Suites and the Assessment of the Resulting Software
    3.2.3 Requirements-Based Test Case Prioritization
    3.2.4 A Test Execution Sequence Study for Performance Optimization
  3.3 Discussions

Chapter 4—Permutation Technique for Black-Box Test Case Prioritization
  4.1 The Permutation Technique
    4.1.1 Optimizations

Chapter 5—Empirical Studies
  5.1 Empirical Study 1 - Case Study (ES1-CS)
    5.1.1 Experiment Planning
      5.1.1.1 Goals
      5.1.1.2 Participants
      5.1.1.3 Experimental Material
      5.1.1.4 Tasks
      5.1.1.5 Experiment Design
    5.1.2 Results and Analysis
      5.1.2.1 Case Study
  5.2 Empirical Study 2 - Controlled Experiment (ES2-CE)
    5.2.1 Experimental Planning
      5.2.1.1 Goals
      5.2.1.2 Participants
      5.2.1.3 Experimental Material
      5.2.1.4 Tasks
      5.2.1.5 Hypotheses and variables
    5.2.2 Experiment Design
      5.2.2.1 Procedure
    5.2.3 Execution
      5.2.3.1 Preparations
      5.2.3.2 Deviations
    5.2.4 Analysis
      5.2.4.1 Total configuration time
      5.2.4.2 Total procedure time
      5.2.4.3 Total execution time
    5.2.5 Interpretation
      5.2.5.1 Evaluation of results and implications
      5.2.5.2 Threats to Validity
  5.3 Concluding Remarks

Chapter 6—Prioritization Heuristics
  6.1 Greedy
  6.2 Simulated Annealing
  6.3 Genetic Algorithm
  6.4 Case Studies
    6.4.1 Genetic Algorithm Parametrization
    6.4.2 Heuristic Techniques Comparison

Chapter 7—Mechanization
  7.1 Software Structure
    7.1.1 Architecture
      7.1.1.1 Core Module
      7.1.1.2 UI Module
  7.2 Main Features
    7.2.1 Add Test Case screen
    7.2.2 Prioritization screen

Chapter 8—Concluding Remarks
  8.1 Limitations
  8.2 Future Work

Appendix A—Appendix
  A.1 Test Suites Used in the Controlled Experiment

LIST OF FIGURES

3.1 The calculation of M0.
3.2 The calculation of M1.
3.3 The weight prioritization equation [Sri04].
3.4 An example of an ordered tree using the approach [RBJ07].
4.1 An example using the reuse function.
4.2 Possible phone combinations.
4.3 An example keeping the phone state.
4.4 An example using the pruning mechanism.
4.5 Three equivalent test cases.
4.6 Reducing a test set with an equivalence class.
4.7 The transitive closure mechanism.
4.8 Example of a cluster.
4.9 Cluster as a representative test case.
5.1 An example of a Latin square used in our experiment.
5.2 Results from the ANOVA with respect to the total configuration time.
5.3 Results from the ANOVA with respect to the total procedure time.
5.4 Results from the ANOVA with respect to the total execution time.
6.1 An example of successors from a state in our simulated annealing algorithm.
7.1 The architecture of the prioritization tool.
7.2 Class diagram of the entities submodule.
7.3 The screen for data insertion.
7.4 The screen for test case insertion.
7.5 A prioritization execution with the stateful permutation.

LIST OF TABLES

3.1 Subjects used in the experiments [RUCH01].
3.2 An example using PORT [Sri04].
3.3 Test case representation and their relationship [RBJ07].
4.1 Feasibility study results.
5.1 An example of 2 lists with random orders of execution containing 4 input data generations.
5.2 Execution times of configuration data collected in ES1-CS (in seconds).
5.3 Comparison among approaches.
5.4 Dependent and independent variables of the experiment.
5.5 Results from the Latin squares with respect to the total configuration time and the total procedure time.
6.1 Best parameters for the Genetic Algorithm found in the experiments performed.
6.2 Results of experiments executed with different sets of test cases.
A.1 The test suite I.
A.2 The test suite II.
A.3 Test suites sorted according to each prioritization technique.

CHAPTER 1

INTRODUCTION

Software quality is an essential discipline of software engineering, strictly involved with product quality and customer satisfaction. As quality is a measure of confidence in a product, software customers demand trustworthy reliability through the application of such a measure. In particular, there is a critical need to reduce the cost, effort, and time-to-market of software. Hence, the challenge is to meet customers' expectations through the application of quality principles in the software development process. Software quality disciplines aim to measure whether a piece of software is well designed (quality of design) and how well the software conforms to its requirements (quality of conformance) [Pre05]. One of the most widely used methods to assure quality of conformance is software testing. In this context, a test can be defined as an action for exercising the software, where the objective is to find application faults. Testing has become a normal part of modern software engineering due to its efficiency (when applied correctly) and its relatively low cost compared to other approaches (formal validation, reviews, debugging). However, its cost can reach more than 50% of the total cost of the software development process [MSBT04]. Therefore, any effort to optimize the creation and the execution of test cases could help companies deal with the lack of resources that sometimes hinders the progress of the testing process. One approach to the creation problem is to automatically generate test cases with a mechanized process, reducing the time spent creating tests. Regarding the optimization of test case execution, there are several approaches to a large variety of problems (some of them are presented in Chapter 2). For example, test selection aims to choose the right set of tests from a larger suite, following some previously specified criteria, allowing testers to reach a goal related to the selected criteria, such as keeping a given coverage rate. However, some categories of problems require that a fixed number of tests be executed to assure quality in testing, avoiding potential losses

in the defect discovery capability. We can imagine a situation in which a set of test cases cannot be reduced any further through the application of other techniques, such as test selection, because some errors might then go undetected, or because certain functionality must be guaranteed to be tested. In these cases, an alternative is to arrange the test cases in an ordered sequence such that executing them in this specific order makes the tester reach some goal. Some examples of goals are reaching a coverage criterion earlier, finding defects earlier, and, in our case, reducing the time spent on test execution. This approach is well known as Test Case Prioritization [EMR00], and we instantiated it using the concept of data reuse. Our problem is to provide an ordered sequence of tests that can be executed as fast as possible. In order to achieve this goal, we propose four test case prioritization techniques: an exploratory strategy based on permutation generation, and three heuristic-based techniques: Greedy, Simulated Annealing and Genetic Algorithm [RN03]. They aim to generate sequences of tests such that, during execution, the inputs needed for each test may be reused from the execution of a previous test in the sequence. As data is reused, the tester (or machine) saves the time otherwise spent preparing these data, which in some testing environments is the most critical phase. Hence, the time spent on execution decreases. The time saved can be used to execute more tests or to let testers devote more time to other testing activities. In the end, the main objective is to increase test team productivity, helping companies to deal with the problem of lack of resources. However, it is important to emphasize that the defect discovery capability of the tests being prioritized is not the focus of this work. We assume that the test design phase was performed correctly and that the tests were appropriately chosen in the test plan phase by a test specialist. Our focus is on improving test execution performance through test data reusability. We applied these techniques in a mobile phone testing environment at the BTC (Brazil Test Center) program, which is a partnership between Motorola and the Informatics Center of the Federal University of Pernambuco (CIn-UFPE). The characteristics of this industrial environment fit our objectives perfectly, as test teams have daily demands of test case executions, and test case prioritization can be applied to help them meet these daily testing marks. Test cases in BTC are built from a functional testing perspective (also known as

black-box testing), which means that the source code is not available [Bei95]. The test design is based on requirements, from which the test engineers retrieve information for creating the tests. The tests are described textually: the tester should read the text, exercise the actions presented in it on the mobile phone, and check whether the result of each action corresponds to the expected behavior, also described in the test case. Testing in industry is still largely conducted on a manual basis [AR02, Ber07], which is often justified by limitations of existing automation frameworks and an unclear ROI (Return on Investment) of automation [Hof99, RW06]. The creation process of these test cases can be automated by using a tool developed by the BTC research team. TaRGeT (Test and Requirements Generation Tool) automatically generates test suites from use case scenarios written in a Controlled Natural Language (CNL). The possible scenarios are specified as use cases described in CNL in a template that supports automatic processing. Then, from the use cases, a test suite can be generated, where each test case represents a flow through the written use cases. Experiments revealed a great effort reduction by using TaRGeT in the BTC testing environment [NCT+07]. Regarding the test execution process, a test case prioritization technique was proposed by research students from BTC aiming to improve test team productivity [RBJ07]. This technique uses a heuristic to build a sequence according to the data that can be reused, keeping a tree where each path from the root to the leaves represents a sequence of test cases with some data reusability among them. In the end, the longest path is returned. However, this technique is not mechanized, and the testers must perform the methodology manually. Moreover, it does not guarantee that the best sequence is produced. This work was the main motivation for our research. The main contributions of our work are four mechanized test case prioritization techniques that use the concept of data reuse to generate sequences of tests and, in particular, the permutation technique, which guarantees the computation of the best sequences (there might be more than one) for test suites with a small number of tests, as it uses an exploratory algorithm. We also provide empirical evidence that the permutation technique can generate better sequences than those generated by the manual approach. Our experimental results show that the permutation technique reduced configuration time by approximately 25-30%, which represented a reduction of 13.5% in total execution time,

that is, the time spent on the whole test suite, both to prepare the phone for a test case and to execute the test case procedures. This gain can be even larger when our technique is applied to test suites with no prioritization. The experiments with the manual technique reported savings of around 5.56%, reducing execution time by an average of 4 minutes; the authors claim that the saved time could be used to execute 1 or 2 more test cases. Therefore, we assume that our prioritized sequences, being almost 20% more economical, could allow the execution of even more test cases, increasing the test team's productivity. In what follows, we present the organization of this work.

1.1 ORGANIZATION

The remainder of this work is organized as follows:

• Chapter 2 introduces some basic concepts used throughout the work: Software Testing, Levels of Testing, Test Design, Testing Approaches and Test Case Prioritization.

• Chapter 3 discusses the work related to our strategies. We contextualize white-box and black-box test case prioritization techniques.

• Chapter 4 describes the permutation technique developed in this work, which uses an optimized exploratory strategy to generate the best sequences regarding reusability of data.

• Chapter 5 presents two comparative empirical studies between the permutation technique and the manual approach used by testers: a case study and a controlled experiment.

• Chapter 6 presents the three techniques that use artificial intelligence concepts: Greedy, Simulated Annealing and Genetic Algorithm. In addition, a comparative study among them is also presented.

• Chapter 7 details the tool developed to support the mechanization of the presented techniques.

• Chapter 8 describes the concluding remarks, discussing our contributions, the limitations of our approach, and future work.

CHAPTER 2

BACKGROUND

2.1 SOFTWARE TESTING

In this chapter, we present a brief introduction to software testing and test case prioritization. In the software industry, testing has become an activity to assure the quality of programs by validating assumptions about the software built. As software becomes more and more complex, clients demand confidence in the product they are investing in, so software testing plays a very important role in achieving a satisfactory level of trust among the software stakeholders. The main objective of testing is to expose, through the exercise of some scenarios, situations in which the software behaves unexpectedly. Such situations originate in errors, which are incorrect results of human actions. An error is manifested as a failure, a deviation of the system from the expected result, which is caused by a defect, also called a fault: a flaw in the system. These definitions are given in the IEEE Standard Glossary of Software Engineering Terminology [IEE90]. The result of a test execution is a verdict informing whether the software worked as predicted (passed) or presented some problem (failed) in the specific situation tested. The entity with the capacity to judge whether a test has passed or failed is called the Oracle; it can be a human mechanism (the person who performs the tests) or an automatic one (some testing tool). The Oracle's judgement considers the inputs provided for the test and the outputs generated, compared to the expected results. To predict these results, the oracle must consider the status of the objects involved in the test, the State, defined as the set of all values of all objects for a given set of objects at any given moment [Bei95]. Inputs are any means by which data can be supplied to a test, for instance stored data, generated data, keystrokes or messages. Outputs are the observable state of objects at the end of a test execution, including a changed data object, a message, a new screen, the absence of a message or a blank screen. Some authors distinguish outputs and outcomes,

but for simplicity we consider the former as a general term for both. From the definitions above, we can now describe the structure of a test case, which is composed of the following parts [Jor02]:

• Initial State: the state in which all test data should be set before the beginning of the test execution.

• Procedures: inputs and actions that should be performed on the test objects during the test execution.

• Expected Results: outputs that should be expected during the execution of the test procedures.

• Final State: the output state in which all test data are set after the test execution.

In the context of mobile phones, where our work is inserted, the Initial State phase is also called Initial Conditions, and the Procedures are described as a set of Steps, each composed of the action to be performed and its expected behavior (Expected Results).
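As an illustration of this anatomy, the sketch below models a test case as a small data structure with the four parts just described, using the mobile phone terminology of Initial Conditions and Steps. The types and field names are our own invention for illustration; they do not come from any tool described in this work.

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class Step:
        action: str           # input or action performed on the test objects
        expected_result: str  # the output expected after the action

    @dataclass
    class TestCase:
        name: str
        # Initial State / Initial Conditions: how the test data must be set
        # before execution starts.
        initial_conditions: Dict[str, str]
        # Procedures: the steps performed during execution, each paired
        # with its Expected Result.
        steps: List[Step] = field(default_factory=list)
        # Final State: the values of the test data after execution.
        final_state: Dict[str, str] = field(default_factory=dict)

    tc = TestCase(
        name="TC-001",
        initial_conditions={"inbox": "contains one unread message"},
        steps=[Step("open the message inbox", "the message list is displayed")],
        final_state={"inbox": "message marked as read"},
    )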

There are many definitions and classifications that can be applied to software testing, and we know some of them are not a consensus among testing authors. However, we need to establish how the testing activity can be classified in order to use these definitions throughout this document. Broadly, tests can be divided according to categories, each with its own subdivisions. We consider the following categories: Level of Testing, Test Design and Testing Approaches.

2.1.1 Levels of Testing

Commonly, tests are divided into four basic levels, each one associated with a phase in the software development process: Unit Testing, Integration Testing, System Testing and Acceptance Testing.

• Unit Testing: The focus is on atomic units of testing. The main goal is to take the smallest piece of software being tested, isolate it from the other pieces and verify whether it behaves as expected. It is also called component testing, in which each unit is tested alone to find any errors in its source code. This level of testing is recommended to be done by programmers, since they know well how the code works and how to break it. It should guarantee that each module functions well before integration. There is a large variety of supporting tools for this level, known as the xUnit family [Ham04].

• Integration Testing: This level of testing should explore the interaction and consistency of successfully tested modules. For example, if components X and Y have both passed their unit tests and there is a new component Z created from the aggregation of X and Y, an integration test should be done to explore the self-consistency of the new module.

• System Testing: At this level of testing, the entire software should be tested as a complete system, together with the whole environment in which it is inserted. It is done to explore system behaviors that cannot be perceived during unit or integration testing. Examples of such behaviors are: installation, configuration, throughput, security, resource utilization, data integrity, storage management and reliability.

• Acceptance Testing: This is a process to obtain confirmation from the client of the software under test, through trial or review, that the system conforms to the previously defined requirements. It is usually performed by the client itself or by end-users.

2.1.2 Test Design

Another possible classification of tests is according to the design method used to create them. The two most common methods for test design are black-box testing and white-box testing. Other types of test design exist, such as gray-box testing, which is a hybrid of the previous two.

• White-Box Testing: This kind of test takes advantage of the knowledge of the internal mechanism of a system or component [IEE90]. It is also named structural testing or glass-box testing, as the tester has a clear perception of how the software works internally. Tests are derived from the structure of the

tested object, and the main concern is the proper selection of program or subprogram paths to exercise during the battery of tests. White-box tests are normally used in unit testing as, generally, the testers are the programmers who know how the code is built. Some of the benefits of this kind of test design are: easy gathering of information about test coverage and control flow; focused testing, as it can exercise atomic pieces of code; data integrity more easily checked by tracking the items in the code; internal boundaries clearly visible in the code; and algorithm testing (if you are using a traditional algorithm, there are probably tests about it in the literature) [KFN99]. White-box testing is not the focus of this work.

• Black-Box Testing: Quite the opposite, black-box testing assumes that the system is viewed as a closed box, hence the testers have no information about how the code behaves or how it is structured. While white-box design allows a view inside the system and focuses specifically on using internal knowledge of the software to guide the selection of test data, black-box design does not explicitly use knowledge of the internal structure of the software to generate tests. This design method focuses on testing functional requirements. Synonyms for black-box include behavioral, functional, opaque-box, and closed-box. The main objective is thus to test the behaviors and functionalities of the system without seeing how it works inside, relying only on the descriptions of how it should execute its tasks. The mechanism of this kind of design is to feed the program under test with inputs derived from knowledge of the specifications or from the tester's experience, and to observe the conformance of the outputs. Black-box testers do not spend time learning the source code; instead, they study the program from its requirements, which is how the customer will work with it.

These test designs can be applied together at any level of testing, although some levels are more inclined to specific designs; generally, unit tests are constructed from a white-box point of view and acceptance tests from a black-box one. What is almost a consensus is that, when used together, they increase software quality. The main reason is that they usually capture different types of failures in the system, as white-box

works deeply into the code while black-box can take a wide view over the system. For example, although structural testing approaches make it easy to run certain types of tests, black-box thinking exposes errors that will elude white-box testers [KFN99]. We focus on black-box testing, since only the requirements and the executable code are available in the context of the project in which this work is inserted.

2.1.3 Testing Approaches

There are many testing approaches for specific purposes. Some authors refer to them as further levels of testing, but we define them as a separate group from the previous one. Some examples are Stress Testing, which runs tests to determine whether the software can withstand an unreasonable load with insufficient resources or extreme usage; Performance Testing, which runs tests to determine actual performance as compared to predicted performance; and Regression Testing, which is detailed below because of its relevance to the state of the art concerning Test Case Prioritization. Regression Testing is used to confirm that a new release of a software has not regressed, i.e., lost functionality. Thus, it is a testing process used to validate modified software and detect whether new faults have been introduced into previously tested code. The process consists of rerunning test cases that previously passed, to verify whether any functional part of the code has been broken by the increments. There are different approaches in which all tests or only part of them are executed again. However, re-executing tests every time a modification takes place can be highly costly for systems with a high number of test cases; some approaches try to minimize the size of the test suites to be run, but in practice this is not always an available option for every software company [OTPS98]. Therefore, much research investment has been made to minimize this problem, and one of the suggested techniques is Test Case Prioritization, which is discussed in Section 2.2. Although testing is an effective way to discover bugs, it can only show the presence of errors in software, never their absence [Dij72]. From this point of view, testing cannot prove the correctness of programs, as it is infeasible to evaluate all their possible executions. A simple example is a program whose behavior depends on a formal parameter of type integer. As there is an infinite number of integers, we cannot evaluate every one, and thus we cannot prove the program's correctness, but we can

imagine some specific values that could drive the program into an undesirable behavior. For such values we can use testing techniques to improve the quality of our testing, which would, potentially, improve the quality of our program. Some testing techniques are described below.

• Boundary Testing: To address the problem of testing all possibilities of an input with an infinite number of values, the tester should choose the ones with a higher likelihood of causing flaws. This technique proposes tests that put the software under boundary and limit conditions. Once a limit situation is found, the tester must create a test with this limit value and with the values around it. Supposing that 10 and -10 are discovered as limit situations in the system, they should be tested together with 9, 11, -9 and -11; moreover, the central value is advised to be tested as well, so -1, 0 and 1 are also in the set of test data (a small sketch of this rule appears at the end of this subsection).

• Statement, Branch and Path Coverage: For statement coverage, every statement in the code should be executed at least once. Similarly, in branch coverage, all outcomes of the conditions along the program should be tested. In path coverage, every flow of execution is traversed; this subsumes branch coverage, as it ensures the test of every decision outcome and additionally exercises all dependencies among branch outcomes, making it a more robust technique compared to the previous ones.

• Mutation Testing: Some defects are purposely introduced into the code to verify the effectiveness of test suites. These faults are inserted as small modifications to the program, one at a time, creating a new version of it referred to as a "mutant". If the tests detect the mutant, it is discarded; if it passes undetected, the tests are not effective enough. This technique is also used to generate efficient test suites.

• Equivalence partitioning: This technique chooses some tests to be executed instead of a whole test suite with a large number of elements. The selected tests must be representative of some class of tests called an equivalence partition, i.e., the tests in a partition must have similar behavior. The selected subset should cover all partitions; doing so, all possible scenarios are covered and the cost spent on test execution

is reduced [Pre05]. An instantiation of this technique is described as Test Set Minimization [WHLM98]. According to Wong et al., a test set minimization procedure finds a minimal subset, in terms of the number of test cases, that preserves the coverage of the original test set with respect to a certain coverage criterion. An implemented example of this technique is the tool named ATACMIN [HL92], which generates minimized test sets. It uses an enumeration algorithm to find optimal solutions, and heuristics when exact solutions cannot be obtained in reasonable time.

The techniques presented above are related to improving the testing process by improving test case generation, detecting faults, guaranteeing effectiveness, and reducing the cost of the execution effort. Test Case Prioritization can also be an alternative to address this latter issue, and it is the subject of the next section.
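As a small illustration of the boundary testing rule described in the list above, the helper below derives the suggested test data from a pair of discovered limit values. It is our own sketch of the rule as stated in the text, not an algorithm taken from the literature cited here.

    def boundary_values(lower, upper):
        # Test the limits themselves and the values immediately around them.
        values = {lower - 1, lower, lower + 1, upper - 1, upper, upper + 1}
        # The text also advises testing around the central value.
        centre = (lower + upper) // 2
        values.update({centre - 1, centre, centre + 1})
        return sorted(values)

    # For the limit situations -10 and 10 discussed above:
    print(boundary_values(-10, 10))
    # prints [-11, -10, -9, -1, 0, 1, 9, 10, 11]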

2.2 TEST CASE PRIORITIZATION

In the literature, the problem of ordering test cases is commonly known by the term test case prioritization (TCP). It considers that a test set can be prioritized following some criteria to achieve a goal. Unlike test set minimization and equivalence partitioning, this technique does not reduce the number of elements in a test suite, but just arranges them in some sequence that will bring gains related to the goal to be reached. As mentioned before, reducing the size of test sets may not be possible in some cases, due to quality requirements, coverage criteria to be kept or testing targets to be met; in such cases, an alternative is to use a test case prioritization technique suited to the purpose being pursued. The test case prioritization problem is strongly tied to regression testing, as seen in various works [RUCH01, JG08, KP02, QNXZ07, Sri04, ST02, ZNXQ07]. As discussed in Section 2.1.3, regression testing is a process used to validate modified software and detect whether new faults have been introduced into previously tested code; it can become a very expensive and laborious task. Nevertheless, Elbaum et al. [EMR02] also show that test case prioritization can be employed in the initial testing of software, where it is sometimes used under different terminology [AW95]. In our approach, the prioritization technique can be applied in any phase of software testing execution; once the tests are created, they can be submitted to our technique, which will generate an ordered sequence of tests. Elbaum et al. formally defined the test case prioritization problem as follows [EMR00]:

Definition 1. Let S be a test suite, P the set of permutations of S, and f a function from P to the real numbers. The test case prioritization problem consists in finding p ∈ P such that

    ∀p′ ∈ P . (p ≠ p′) ⇒ (f(p) ≤ f(p′))

where P represents the set of all possible prioritizations of S, and f is a function that, applied to any such sequence p, assigns a real value to that ordering, allowing the comparison of two prioritizations. The definition thus refers to a search inside P for the best prioritization p. However, the main problem of test case prioritization is the generation of P: we have to produce all permutations of the test cases, which in practice is not feasible for a large number of tests. For instance, assuming that it takes 1 µsec to generate a single permutation, permuting up to 12 tests takes no more than a few minutes; for 17 tests, however, it takes 10 years, and for more than 25 elements the time required is far greater than the age of the earth [Sed77]. Despite this estimate being over 30 years old, and computers today being much faster, the complexity of the problem persists; the extra speed merely allows the computation of a few more elements. This problem can be seen as a variation of the Traveling Salesman Problem (TSP), where the test cases are the cities, and going from one test case to another in the prioritization has an associated weight, corresponding to the cost of traveling from one city to another. In our case, the main difference is that the cost of traveling from city A to city B is different from the cost of traveling from B to A. Moreover, we do not need to go back to the starting city (or the initial test case). These differences do not change the computational complexity of the problem, which is classified as NP-hard [GJ79]. Because of the complexity of the problem, all related works focus on the usage of heuristics, and we present them in the following chapter. Nevertheless, in our strategy, we still explore permutation for small test suites and heuristics in the other cases.
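To make Definition 1 and the complexity discussion concrete, the sketch below performs the exhaustive search over P for a small suite. The objective function f here is a stand-in for any sequence cost; in the spirit of this work we let it be the total data preparation cost, so that lower values mean more data reuse between consecutive tests. All the cost values are invented for illustration, and they are deliberately asymmetric, as in the TSP variation discussed above.

    from itertools import permutations

    # Invented preparation cost of running test b right after test a; the
    # more data b reuses from a, the lower the cost. Note the asymmetry:
    # the cost of (a, b) differs from the cost of (b, a).
    PREP_COST = {("t1", "t2"): 1, ("t2", "t1"): 4,
                 ("t1", "t3"): 3, ("t3", "t1"): 2,
                 ("t2", "t3"): 2, ("t3", "t2"): 5}
    FIRST_COST = {"t1": 3, "t2": 2, "t3": 3}  # full preparation for the first test

    def f(p):
        # The function f of Definition 1: total preparation cost of the
        # sequence p. The best prioritizations minimize it.
        cost = FIRST_COST[p[0]]
        for a, b in zip(p, p[1:]):
            cost += PREP_COST[(a, b)]
        return cost

    def best_prioritizations(suite):
        # Exhaustive search over all |S|! permutations: only feasible for
        # small suites, as discussed above.
        all_p = list(permutations(suite))
        best = min(f(p) for p in all_p)
        return [p for p in all_p if f(p) == best]

    print(best_prioritizations(["t1", "t2", "t3"]))
    # Three sequences tie at cost 6, showing that there may be more than
    # one best prioritization.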

CHAPTER 3

RELATED WORK

In this chapter we describe some test case prioritization techniques that are related to our work. As the methodologies can be applied to different kinds of testing, we divided the prioritization techniques according to the level of code accessibility: white-box techniques, which use source code information, and black-box techniques, which use no such information at all. Each technique also has a specific purpose and is applied to restricted testing environments, as is ours.

3.1 WHITE-BOX TEST CASE PRIORITIZATION TECHNIQUES

Although our work is not a white-box technique, most work on test case prioritization takes into consideration the source code of applications to generate orderings, including the paper that formally defined test case prioritization [EMR00]. Most of these techniques are applied to reduce the impact of regression testing, which can be very expensive to execute, as discussed in Section 2.1. As our technique does not involve the white-box context, we detail one frequently referenced work about white-box regression testing prioritization, concentrating on definitions and relevant concepts.

3.1.1 Prioritizing Test Cases For Regression Testing

The initiative reported in [RUCH01] is one of the most referenced works in the test case prioritization area. It formally defined the problem and discussed it in the regression testing context. The main motivation is the high cost of testing, especially regression testing. The work considers other approaches to regression testing, such as test selection and test suite minimization techniques. The motivation for not using them lies in difficulties like the loss of fault detection capability, which happens because they deal with a reduced set of test cases, meaning that coverage cannot always be maintained.

Test case prioritization is then proposed as a new approach that does not lose information, but arranges test cases in some way such that faults are discovered earlier. Furthermore, nine techniques are described, of which the first three are actually pseudo-techniques, used only for comparison, while the other six represent heuristics that use test coverage information, produced by prior executions of test cases, to order them for subsequent execution. The techniques are:

1. No prioritization: the application of no technique.

2. Random prioritization: randomly order the test cases in a test suite.

3. Optimal prioritization: this is not a practical technique when the suite is very large (as discussed in Chapter 2), because it might not be possible to generate all permutations for suites with a large number of tests; it is used to measure the success of the other, practical heuristics. To avoid generating all permutations, the authors developed a greedy "optimal" prioritization algorithm, even knowing that it may not always produce the best ordering.

4. Total statement coverage prioritization: prioritize test cases in terms of the total number of statements they cover, by counting the number of statements covered by each test case and then sorting the test cases in descending order of that number.

5. Additional statement coverage prioritization: initially selects the test case that covers the most statements (in the same way as total statement coverage prioritization), but the following test cases are chosen according to the number of statements covered by them that were not yet covered in the test suite, i.e., the ones that add the most new statements to the coverage (see the sketch after this list).

6. Total branch coverage prioritization: the same as total statement coverage prioritization, except that it measures test coverage in terms of program branches rather than statements.

7. Additional branch coverage prioritization: the same as additional statement coverage prioritization, except that it measures test coverage in terms of program branches rather than statements.

8. Total fault-exposing-potential (FEP) prioritization: the order is based on the ability of a test case to expose a fault, obtained from the probability that a fault in a statement will cause a failure for a given test case. This probability is an approximation of the fault-exposing potential, acquired through an approach that uses mutation analysis to produce a combined estimate of propagation-and-infection that does not incorporate independent execution probabilities.

9. Additional fault-exposing-potential (FEP) prioritization: analogous to the extensions made to total statement and branch coverage prioritization to yield additional statement and branch coverage prioritization, total FEP prioritization is extended to create additional FEP prioritization. After selecting a test case t, the other test cases that exercise statements exercised by t have their FEP values lowered; then another test is selected based on the best FEP value, and the process repeats until all test cases have been ordered.
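The following minimal sketch shows techniques 4 and 5 from the list above, assuming that the coverage of each test case is available as a set of statement identifiers. The data layout and function names are ours, for illustration only; [RUCH01] does not prescribe this representation.

    def total_statement_coverage(coverage):
        # Technique 4: sort tests by the total number of statements they
        # cover, in descending order.
        return sorted(coverage, key=lambda t: len(coverage[t]), reverse=True)

    def additional_statement_coverage(coverage):
        # Technique 5: repeatedly pick the test that adds the most
        # statements not yet covered by the tests already chosen.
        remaining = dict(coverage)
        covered, order = set(), []
        while remaining:
            t = max(remaining, key=lambda t: len(remaining[t] - covered))
            covered |= remaining.pop(t)
            order.append(t)
        return order

    cov = {"t1": {1, 2, 3}, "t2": {1, 2}, "t3": {4, 5}}
    print(total_statement_coverage(cov))       # ['t1', 't2', 't3']
    print(additional_statement_coverage(cov))  # ['t1', 't3', 't2']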

Furthermore, a series of experiments is described, using a measure to compare the effectiveness of the various test case prioritization techniques. The APFD measure (Average Percentage of Faults Detected) plays the role of the function f discussed before. It measures how quickly the prioritized suite detects the faults, ranging from 0 to 100; higher APFD numbers mean faster (better) fault detection rates. To perform the tests with the techniques, eight C programs were used, as seen in Table 3.1, with information about lines of code, number of versions, number of mutants, test pool size and average test suite size. The first seven are programs from Siemens Corporate Research and are commonly known as the Siemens programs; the last one is a program developed for the European Space Agency and is referred to as space. The space program is larger and more complex than the other subjects. These programs were used as experimental subjects in other works related to this topic [EMR02, JG08, KP02, JH03].

Table 3.1 Subjects used in the experiments [RUCH01].

The intuition acquired from the experiments performed with the described techniques shows, in general, that test case prioritization can substantially improve the rate of fault detection of test suites. The six proposed heuristics obtained such improvements, but in the case of the program schedule, no heuristic outperformed the untreated or randomly prioritized test suites. In almost every case, including the space program, additional FEP prioritization outperformed prioritization techniques based on coverage. Considering the overall results on the Siemens programs, branch-coverage-based techniques almost always performed as well as or better than their corresponding statement-coverage-based techniques. Considering differences between total and additional branch- and statement-coverage-based techniques, there was no clear winner overall. The authors conclude the paper explaining that, in this study, test case prioritization techniques improved the rate of fault detection of test suites, as shown in the empirical studies, and that the APFD measure was left as a benchmark for comparing test case prioritization techniques and other measures. As future work, since the gap between optimal test case prioritization and the heuristics used was not bridged even with the FEP-based techniques, other heuristics could be devised to reduce this gap. Finally, the test case prioritization problem has many other facets, with other objectives and goals, and not all of them were addressed in that paper.

3.1.2 Other White-Box Initiatives

A series of works has been done to address problems related to white-box test design. An extension of the work previously described can be found in [EMR02], where 16 techniques are proposed (including 4 already presented in [RUCH01]); the other 12 are not related to statements or branches, but to functions. These techniques use coverage information, probabilities of fault existence or exposure, or a combination of them, to prioritize

the tests. The experiments show that function-level techniques had results similar to statement-level prioritization techniques. Another extension was carried out later by Jeffrey and Gupta, who proposed techniques based on statement coverage that take into consideration how statements affect each other [JG08]. They conclude that their techniques have more potential than the total statement and branch coverage prioritization techniques. Jones and Harrold presented a technique that prioritizes test cases using the modified condition/decision coverage (MC/DC) criterion, a form of exhaustive testing based on branch coverage [JH03]. Their case study shows encouraging results for reducing the cost of regression testing for users of the MC/DC testing criterion. Srivastava and Thiagarajan propose a test case prioritization technique based on basic block coverage for large systems [ST02]. The work suggests that it is possible to effectively prioritize tests in large-scale software development environments by using binary matching to determine changes at a fine granularity.

3.2 BLACK-BOX TEST CASE PRIORITIZATION TECHNIQUES

In this section we describe some works on test case prioritization techniques proposed for tests with no information retrieved from the source code, relying instead on requirements or historical execution data.

3.2.1 Test Case Prioritization for Black Box Testing

The approaches discussed before gave us a good understanding of the prioritization problem, but their experiments are related to source code, i.e., white-box testing. The approach presented in [QNXZ07] is more closely related to our problem, because it deals with black-box testing as we do, although it still targets regression testing problems. That work does not use code data to build the test prioritization; it uses runtime and historical information. Its purpose is to test software in situations where the source code is not available. The approach reuses test suites and assigns each test case a selection probability according to its historical performance, and then selects ordered test cases to run until the testing resources are exhausted. Algorithms were developed to order the tests. The main idea of these algorithms is to group all reused test cases by the fault types they

have revealed, and then dynamically adjust their priorities according to the test results of related test cases. The technique uses a matrix R that relates each test case to all the others, so that once a test case reveals regression faults, related test cases can be adjusted to a higher priority to acquire better fault detection rates. Given a test suite T, a subset T′ of T, and the relation matrix R, the general process is described as follows:

1. Select T′ from T and prioritize T′ using the available test history.

2. Build the test case relation matrix R based on the available information.

3. Draw a test case from T′ and run it.

4. Reorder the remaining test cases using run-time information and the relation matrix R.

5. Repeat steps 3 and 4 until the testing resources are exhausted.

The approach presents three algorithms: one builds the relation matrix, and the other two Escalate and Deescalate test cases. The first is described based on the matrix, where each cell is a boolean value that is true if the two related test cases reveal the same fault type, and false otherwise; the matrix is traversed and filled following this rule. For simplicity, in the paper's example, a test case can reveal only one fault type. Based upon the relation matrix, the regression testing strategy selects one test case to run, and then escalates or de-escalates the priorities of its related test cases that have not yet been run. The Escalate algorithm gives higher priorities to test cases related to the previously executed test when it has failed. The Deescalate algorithm does exactly the opposite: it gives lower priorities to test cases related to those left at the last ordinal positions when a test case has run successfully.
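A rough sketch of this adjusting idea follows. We keep the relation matrix R as a set of related pairs and the priorities as plain numbers; this is our own simplification for illustration and elides details of the paper's actual algorithms.

    # R holds the pairs of test cases that revealed the same fault type.
    R = {("t1", "t2"), ("t2", "t1")}
    priorities = {"t1": 1, "t2": 1, "t3": 1}

    def escalate(failed_test):
        # Raise the priority of tests related to a test that just failed.
        for t in priorities:
            if (failed_test, t) in R:
                priorities[t] += 1

    def deescalate(passed_test):
        # Lower the priority of tests related to a test that just passed.
        for t in priorities:
            if (passed_test, t) in R:
                priorities[t] -= 1

    escalate("t1")      # t1 failed, so t2 becomes more urgent
    print(priorities)   # {'t1': 1, 't2': 2, 't3': 1}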

Further, examples and experiments using this approach are presented in [QNXZ07]. A new metric M0 for black-box testing is introduced, given that APFD is hard to apply in a black-box environment, since finding the number of faults that each test case reveals is not an easy task. Let ti be the ith test case in the ordered test suite, let timei be the execution time of ti, and let fi be a boolean value that indicates whether ti has failed. M0 for a test suite T is given by the equation in Figure 3.1; the variant M1, shown in Figure 3.2, is used when execution time is not relevant:

M_0 = \frac{\sum_{i=1}^{n} \left[ \left( 2 \sum_{j=i}^{n} time_j - time_i \right) \times f_i \right]}{2 \times \sum_{i=1}^{n} time_i \times \sum_{i=1}^{n} f_i} \times 100\%

Figure 3.1 The calculation of M0.

M_1 = \frac{\sum_{i=1}^{n} \left[ (2n - 2i + 1) \times f_i \right]}{2n \times \sum_{i=1}^{n} f_i} \times 100\%

Figure 3.2 The calculation of M1.

Note that M1 coincides with M0 when every test case takes one time unit, and that failures detected earlier yield higher values. For instance, for n = 4 equally timed test cases where only the test at position i = 2 fails, M1 = (2·4 − 2·2 + 1)/(2·4·1) × 100% = 62.5%; had the failing test been placed first, M1 would be 87.5%.

The experiments were performed using Microsoft Word and PowerPoint 2003 with Service Pack 2 (SP2) and all bug fixes applied. The metric used was M1, and the results show that the strategies using the Escalate and Deescalate algorithms are on average 10% better than the non-adjusting strategy. The authors then conclude that the dynamic adjusting methods improve the test suite's fault detection rate for black-box testing. Finally, the authors explain that, although their relation matrix is built from faulty types, it can also be built from other kinds of information, such as test objectives. As future work, since the algorithms are nonnumeric, they intend to investigate numerical algorithms to adjust the selection probabilities of test cases and to perform experiments comparing the approaches.

3.2.2 The Automatic Generation of Load Test Suites and the Assessment of the Resulting Software

The main purpose of the approach reported in [AW95] is to generate ordered test case sequences for telecommunications systems. Three automatic test case generation algorithms are introduced, and the authors claim that they can be applied to any software that can be modeled as a Markov chain [Nor98] using collected data and estimates. The approach is based on a telecommunication profile that includes the number of different types of calls, the average service demand and probability distribution for each type of call, and the average external arrival rate and probability distribution for each call type. Using this information, a Markov chain model is constructed, but the model can become very large given a high number of calls. For instance, to test a system with five different call types, each holding at most 500 calls, 268,318,178,226 Markov states would be necessary, requiring over 13,000,000,000 hours of testing, assuming a three-minute holding time.

The algorithms deal with this state-explosion problem by reducing the number of states, choosing the representative states that are the most likely to be reached. This is done using a constant ε that represents the minimum probability that a state must have in order to be used in a test case.

Three algorithms are described to solve the problem. The first generates all states whose steady-state probability is greater than ε. The second algorithm generates, in order, all state transitions such that the product of the state probability and the call arrival rate is greater than ε; it uses the first algorithm to generate all states of interest and then filters them based on this restriction. The third algorithm extends the second one by considering sub-paths rather than individual edges.

Empirical studies were performed using five industrial telecommunications software projects ranging in size from 50,000 to more than one million lines of code. Furthermore, a reliability measure was defined to evaluate the approach; this measure can be used to track the progress of the load testing algorithms and to guide test planning. The results on these projects show the detection of serious program faults that, the authors predict, would not have been detected until field release. They also report greatly facilitated regression testing and fault localization due to the use of these algorithms, and they plan to apply these notions to domains other than telecommunications systems. This paper is cited in [EMR02] as a work where test case prioritization is used not for regression testing but at initial testing.
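The first two selection criteria can be sketched as follows, assuming the steady-state probabilities and the per-state call arrival rates have already been computed elsewhere; all names are ours, for illustration, and not from [AW95].

def interesting_states(steady_prob, epsilon):
    """First algorithm (sketch): states whose steady-state probability
    exceeds the threshold epsilon."""
    return [s for s, p in steady_prob.items() if p > epsilon]

def interesting_transitions(steady_prob, arrival_rate, epsilon):
    """Second algorithm (sketch): transitions whose source-state probability
    times the call arrival rate exceeds epsilon, most likely first."""
    edges = [(state, call, steady_prob[state] * rate)
             for state in interesting_states(steady_prob, epsilon)
             for call, rate in arrival_rate.get(state, {}).items()]
    return sorted([e for e in edges if e[2] > epsilon],
                  key=lambda e: e[2], reverse=True)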

3.2.3 Requirements-Based Test Case Prioritization

As our problem deals with tests derived from requirements, the work reported in [Sri04] is one of the most closely related to ours. Besides requirements-based testing, it also uses a black-box approach at the system level. Another difference from other approaches is that its main goal is to identify the severe faults earlier and to minimize the cost of test case prioritization. An enhancement of test case prioritization is proposed by incorporating additional knowledge from requirements engineering research. The approach proposes a new strategy called Prioritization of Requirements for Testing (PORT). It uses three prioritization factors (PFs) to order test cases: (1) customer-assigned priority on requirements, (2) requirement complexity and (3) requirements volatility.

These factors are assigned values. The first, customer-assigned priority on requirements (CP), is a value from 1 to 10 assigned by customers based on how important the requirement is to them. The second, requirement complexity (RC), is also a value from 1 to 10, but it is assigned by the developers based on the perceived implementation difficulty of the requirement. The last, requirements volatility (RV), is the number of times a requirement has changed. The higher the factor values of a requirement, the higher the priority of the test cases related to that requirement. Another task is the assignment of weights to the PFs, such that the total weight (1.0) is divided amongst all of them. The assignment is done by the development team, but must reflect the customer needs. Given these values, a weighted prioritization (WP) is calculated for every requirement, as seen in Figure 3.3.

WP = \sum_{PF=1}^{n} \left( PF_{value} \times PF_{weight} \right)

Figure 3.3 The weighted prioritization equation [Sri04].

Test cases are then ordered such that the test cases for requirements with high WP are executed before the others. An example is shown in Table 3.2, where the resulting prioritization is R1, R4, R3 and R2.

Factors             R1    R2    R3    R4    Weights
Customer Priority    7     4    10     8    0.30
Req. Complexity     10     5     4     9    0.40
Req. Volatility     10     8     5     5    0.30
WP                 9.1   5.6   6.1   7.5    1.00

Table 3.2 An example using PORT [Sri04].

Further, a case study on an industrial project comprising 152 KLOC is detailed. The preliminary results show that requirements complexity and volatility impact fault density. As future work, an analysis of the industrial data should be made to determine the effectiveness of customer priority.
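Since the WP computation is simple, the following sketch reproduces Table 3.2; the factor encoding and names are ours, used only for illustration.

PF_WEIGHTS = {"CP": 0.30, "RC": 0.40, "RV": 0.30}

REQUIREMENTS = {  # values taken from Table 3.2
    "R1": {"CP": 7,  "RC": 10, "RV": 10},
    "R2": {"CP": 4,  "RC": 5,  "RV": 8},
    "R3": {"CP": 10, "RC": 4,  "RV": 5},
    "R4": {"CP": 8,  "RC": 9,  "RV": 5},
}

def wp(factors):
    """WP: sum of PF value times PF weight over all prioritization factors."""
    return sum(value * PF_WEIGHTS[pf] for pf, value in factors.items())

# Decreasing WP gives the ordering R1 (9.1), R4 (7.5), R3 (6.1), R2 (5.6).
ranking = sorted(REQUIREMENTS, key=lambda r: wp(REQUIREMENTS[r]), reverse=True)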

3.2.4 A Test Execution Sequence Study for Performance Optimization

The research reported in [RBJ07] aims at optimizing the execution of mobile phone tests inside the BTC program. As said before in Chapter 1, the tests are written in natural language; testers read the test cases, perform the actions described and verify whether the state of the phone corresponds to the expected results described in the test case. As testers face a high daily demand of test suite executions, the time spent executing these tests becomes a critical factor for the teams, and any effort to increase their efficiency helps these teams to provide a better service. Another aspect is that testers usually do not follow a predefined order, but rather their own intuition, based on practical experience, of how well they could execute a test suite. These decisions end up being random and personal. In this context, a test case structure can be described as follows:

• Pre-condition (Setup) - configurations needed in order to start the test case execution. This information is treated differently from the other data because it does not change during test execution, and the setups should be executed together at the beginning of the whole ordered test sequence. Generally, this setup information amounts to setting bits or basic configurations on the phone. Setups are not considered in the reuse strategy.

• Input Data - data needed for the execution of the test, which can change during test execution to produce outputs. Inputs are considered in the reuse.

• Output Data - data resulting from the execution of the test; elements that should be in the state of the phone at the end of the test. Outputs are considered in the reuse.

So, the main idea is to build an ordered sequence based on the reuse of this information, minimizing the time spent preparing a mobile phone for a test execution. As a test case can be represented as a set of inputs, some procedure and a set of outputs, the objective is to generate a sequence where each test case is linked to the next by combining inputs and outputs.
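One possible encoding of this structure is sketched below and reused in the ordering sketch after Figure 3.4; the type and function names are ours, not from [RBJ07].

from dataclasses import dataclass

@dataclass(frozen=True)
class TestCase:
    tc_id: str
    inputs: frozenset = frozenset()    # data that must be on the phone beforehand
    outputs: frozenset = frozenset()   # data left on the phone afterwards

def reuses(previous, candidate):
    """candidate benefits from following previous when some of its inputs are
    already available among the outputs of previous."""
    return bool(candidate.inputs & previous.outputs)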

The methodology is not meant to be mechanized, but to be followed by the testers as a guide. It is divided into a sequence of steps. The first step is a survey of the configurations, input data and output data of the test cases. As said before, the configurations (setups) are performed before the entire test sequence is executed. The input and output data discovered are tagged with unique IDs and organized in a table, which serves to identify them later. All test cases are then redefined following a standard form, which presents a test case as a unique ID, a set of inputs and a set of outputs using the tagged IDs from the previous step. To use the methodology, a root test case called INIT is created: a fictitious test case that gives the sequence an initial point to be ordered from. It has no inputs, and usually its outputs are the ones that require the minimal effort to be produced (for example, "0 contacts in the phonebook").

Once all these steps are performed, the ordering itself can start. It is done using a tree data structure, where INIT is the root node. To define the children of a node, one looks for the test cases whose inputs intersect the outputs of the current node. So, the children of the root are the test cases whose input elements are contained in the set of output elements of the root node, and so on. There are some rules to be followed: for instance, a node cannot be repeated in its own sub-tree, and whenever it is not possible to insert new nodes, that sequence is finished. For example, in Table 3.3 we have some test cases with their input and output sets; INIT is the first test case in the table, and the intersections among the test cases are highlighted in bold. We can see in Figure 3.4 that TC 097, TC 255 and TC 256 are the children of the root node (their inputs intersect the outputs of INIT). Following this rule, the whole tree is constructed, and each branch of the tree corresponds to a test case sequence.

Once the tree is constructed, the ordered sequence is chosen by taking the deepest branch of the tree. However, there are cases in which not all test cases appear in the deepest branch; a new tree with the remaining test cases is then built, and the process is repeated until no test cases are left. When there are two or more branches with the same depth, the leftmost path is chosen to make the algorithm deterministic. An example of a tree built from the elements of Table 3.3 is displayed in Figure 3.4, and a sketch of this construction is given right after the figure.

A case study was carried out comparing this approach to ordered sequences created by testers from their intuition and personal experience (what they were using up to this research). The approach showed better results in 83% of the executions, and the average gain was over 5.56%, reducing the execution time of the sequences by around 4 minutes. The authors claim that the saved time could be used to execute one or even two additional test cases.

Table 3.3 Test case representation and their relationship [RBJ07].

Figure 3.4 An example of an ordered tree using the approach [RBJ07].
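Building on the TestCase encoding above, the tree construction and the choice of the deepest branch can be sketched as follows. This is our rendering of the rules described in [RBJ07], not the authors' algorithm, and the exhaustive search for the deepest branch is itself exponential in the worst case.

def deepest_branch(current, remaining):
    """Deepest chain of test cases starting from current, where each child's
    inputs intersect the outputs of its parent. The strict '>' on the length
    comparison keeps the leftmost branch on ties."""
    best = []
    for tc in remaining:
        if reuses(current, tc):
            rest = [t for t in remaining if t is not tc]
            branch = [tc] + deepest_branch(tc, rest)
            if len(branch) > len(best):
                best = branch
    return best

def order_suite(init, tests):
    """Repeatedly build a tree rooted at INIT and append its deepest branch,
    until every test case has been placed in the sequence."""
    sequence, remaining = [], list(tests)
    while remaining:
        branch = deepest_branch(init, remaining) or remaining[:1]  # avoid stalling
        sequence.extend(branch)
        remaining = [t for t in remaining if t not in branch]
    return sequence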

Currently, this methodology is in use by some teams inside the BTC. Although the approach has brought some efficiency gains, it does not always produce the optimal ordered sequence: a generated sequence can occasionally be the optimal one, but there is no such guarantee. In the next section, we discuss the contributions and limitations of the related work and their relevance to our problem.

3.3 DISCUSSIONS

Each work has its importance in the test case prioritization area, but with respect to our problem we have some more specific considerations about them. The approach of [RUCH01] is a remarkable contribution to the area because it formally defines the problem and helped to establish a research topic under the term "test prioritization": test case prioritization schedules test cases in order to increase their ability to meet some performance goal. Although this approach gives us a background of the problem from a standard perspective, its contributions are not applicable to our problem, because the approach is designed for a white-box setting and takes into consideration only regression testing. The work reported in [QNXZ07], on the other hand, is in the context of black-box testing, taking into consideration the faulty types of each test case and using algorithms to determine the ordering sequence; but its focus is on regression testing problems.

The approach in [AW95] presents a black-box example of test case generation and prioritization that works at the initial testing phase. It requires the system to be modeled as a Markov chain, as well as an operational profile of the problem context; the example presented uses an operational profile from the telecommunications domain. We still need to investigate further whether Markov chains are a suitable mathematical model for our application domain.

The approaches reported in [Sri04] and in [RBJ07] are the most closely related to ours. The first describes a black-box, non-regression testing approach based on requirements. Its goal is to discover the severe faults earlier in the testing process. For that, it uses factors and weights, provided by customers and developers, to compute a value for each requirement. Based on these values, the tests are ordered according to the

requirements they are associated with. However, no prioritization is applied to test cases belonging to the same requirement. The second work was the main inspiration for our research and the first attempt to optimize test case execution through prioritization inside the BTC program. This research provides some good results, as shown earlier, and its methodology has already been adopted by some test execution teams. However, there are also some drawbacks. As said before, following those steps does not always guarantee the generation of the best execution sequence. In order to obtain the best sequence, we have to analyze all possible permutations, and the feasibility of generating all permutations depends on the size of the test suite. When we have a small set of test cases, it can be feasible: our experiments showed that the perfect sequence can be generated, in the worst case, for suites of up to 15 test cases. In the cases where generating the perfect sequence is infeasible, we can experiment with different heuristics. However, heuristics cannot guarantee the best alternatives; using heuristics, we are susceptible to arbitrary choices that may lead us to a solution that is not a good one. For instance, one of the choices reported in [RBJ07] is the definition of the root node, which restricts the sequences to start with test cases whose preparation effort is minimal. However, there might be cases in which we start from test cases with medium or high preparation efforts and still obtain a minimal cost at the end. Another such choice is the selection of the leftmost branch when there are two sequences with the same depth. These decisions can interfere with the quality of the generated sequences. Moreover, this approach lacks a formal definition of the measure function f, which makes it difficult to compare the results of different heuristics. With a numeric measure we can use mathematical concepts and return a value representing the results in a more concrete and clear way. Finally, except for [RBJ07], all approaches focus on ordering the test cases according to their chance of discovering faults; only our work and [RBJ07] use test case prioritization to optimize test case execution, considering the inputs and outputs involved.

CHAPTER 4

PERMUTATION TECHNIQUE FOR BLACK-BOX TEST CASE PRIORITIZATION

Test case prioritization can be used to achieve many goals, as shown in the previous chapter, and most of the techniques developed come from well-defined heuristics. The main reason to use heuristics is the difficulty of finding the best prioritization, given that it is a problem of factorial complexity. However, contrary to other initiatives that do not consider an optimal solution, we noticed that an exhaustive strategy that yields the best results is feasible for small test suites. Whenever this strategy is not feasible, heuristics must be taken into consideration, even knowing that the optimal result may not be achieved. We discuss some proposed heuristics further in Chapter 6. A similar initiative has been carried out regarding Test Set Minimization, discussed earlier. That problem is also NP-complete, so the same restrictions regarding computational cost apply. The tool ATACMIN [HL92] uses an exponential algorithm to find optimal solutions, and it also implements some heuristics to be used in the cases where these solutions cannot be found in feasible time.

We developed this work within the mobile phone testing environment of the BTC program. We deal with functional tests where the source code of the system under test is not available; the tests are specified in English and executed manually. By observing the daily work activities at the BTC, we noticed that several test suites contain a small number of test cases. A feasibility study was conducted to verify whether an exhaustive technique could be applied: we ran a permutation algorithm (without any optimizations) to check the maximum number of test cases we could handle with this technique. We had in mind that such an algorithm could be left to run for an entire day, since the testers receive a test suite one day in advance of its execution. These results are listed in Table 4.1. We notice that a test suite with 15 test cases can be handled (in the worst case) in less than 24 hours. Clearly, generating all permutations of larger suites quickly becomes infeasible.
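To illustrate the exhaustive strategy, the sketch below enumerates all n! orderings and keeps the cheapest one, reusing the TestCase encoding sketched in Section 3.2.4. The cost function is a placeholder standing in for the effort of preparing data that is not reused; it is ours, for illustration only, and is not the measure function used by this work.

from itertools import permutations

def setup_cost(previous, test):
    """Placeholder cost: number of inputs of `test` not already available
    among the outputs of `previous`, i.e., data that cannot be reused."""
    return len(test.inputs - previous.outputs)

def best_sequence(init, tests):
    """Exhaustively search the n! orderings for the one with the minimal
    total preparation cost; practical only for small suites (up to about
    15 test cases, as Table 4.1 indicates)."""
    best, best_cost = None, float("inf")
    for candidate in permutations(tests):
        cost, current = 0, init
        for tc in candidate:
            cost += setup_cost(current, tc)
            current = tc
        if cost < best_cost:
            best, best_cost = list(candidate), cost
    return best, best_cost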
