A big thank you to all my ground segment colleagues who made my experience at the European Space Operations Centre memorable. Experiments performed on the dataset of the European Space Agency's Ground Segment test scenarios demonstrate the ability of this domain-specific tool to produce results close to human reasoning and to facilitate test procedures.
Motivation
Problem Statements
Furthermore, the current test routine in the ESOC Ground Segment includes a well-established test framework, which allows the creation of test cases in a model-driven manner. Can we achieve traceability of test-execution requirements based on the natural language expression of test scenarios?
Proposed Solution
Structure
Test Scenarios
The purpose of scenario testing is to test the end-to-end functionality of a software application and ensure that business processes and flows work as required. Test scenarios are high-level descriptions of what needs to be tested; they are considered critical because they help determine the most important end-to-end transactions and the actual use of the software application.
Test Case
Tracing Requirements
Requirements tracing documents the relationships between the user requirements for the system you are building and the work products developed to implement and verify those requirements. It helps the project team understand which parts of the design and code implement user requirements and which tests are needed to verify that those requirements have been implemented correctly.
Text Mining
These work products include software requirements, design specifications, software code, test plans, and other artifacts of the system development process.
Natural Language Processing
- Syntax
- Semantics
- Discourse
- Speech
Lexical semantics, Named Entity Recognition (NER), and natural language understanding are also important tasks in this work. Other common tasks in this category are Machine Translation, Natural Language Generation, Optical Character Recognition, Question Answering, Relationship Extraction, and Word Sense Disambiguation. Natural language understanding converts text into more formal representations, such as first-order logic structures, that are easier for computer programs to manipulate.
Similarity Calculation between Documents
Vector Space Models
The first step to calculate semantic similarity is to obtain a semantic vector representation of the relevant terms. The frequency of each term in each document (tf) is calculated as the square root of the number of times the term appears in the document, while the inverse document frequency (idf) is the logarithm of the total number of documents divided by the number of documents that contain the term. After calculating the semantic vectors, the cosine similarity between them defines the similarity score.
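The weighting scheme described above can be sketched as follows; the tokenized toy corpus and query terms are illustrative, not taken from the ESOC dataset.

```python
import math

def tfidf_vectors(docs):
    """Build TFIDF vectors: tf = sqrt(raw count), idf = log(N / df)."""
    n_docs = len(docs)
    vocab = {t for d in docs for t in d}
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    vectors = []
    for d in docs:
        vec = {t: math.sqrt(d.count(t)) * math.log(n_docs / df[t])
               for t in set(d)}
        vectors.append(vec)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [["open", "telemetry", "display"],
        ["close", "telemetry", "display"],
        ["send", "telecommand"]]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # shared terms give a positive score
print(cosine(vecs[0], vecs[2]))  # → 0.0 (no shared terms)
```

Terms present in every document would receive idf = log(1) = 0 and thus contribute nothing, which is the intended behaviour of the scheme.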
Association Rules Mining
Recommender Systems
Recommender Systems for Software Engineering
A test scenario with attributes expressed in natural language acts as input to the system. The user chooses what they consider most relevant until the test scenario is complete, and the system recommends requirements relevant to the functionality under test. Intermediate outputs of appropriate recommendations are provided so that the user can interact with the system quickly.
Design Constraints
GUI
Test Blocks Tab
The Test Blocks tab allows the user to enter a test step and select one or more functionally equivalent test blocks. In the case of assigning a value to a parameter, the user must enter an equals sign before the value. In this section, the user can select the appropriate test blocks from among the recommended ones.
Requirements Tab
The first option displays several recommendations, while the second opens a dialog box where the user can search for a test block by entering its full name or part of it. The parentheses should be replaced according to the desired start and end step IDs in this table.
System Design
The logic of the system regarding the Test Blocks tab is described in detail in the component diagram of Figure 3.4. Because the software implementation behind the Requirements tab is similar, we discuss the components of the Test Blocks tab in detail and refer to the core differences where they apply.
Deep Learning Model
Most of the above parameters were chosen based on the suggestions of the creators of Word2Vec, while the size and architecture of the model were decided by the experimental evaluation detailed in Section 4.3.1. We selected a window of 5, set min_count to 1 since our data is small, and set workers to 8, the number of cores of our development machine.
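The configuration above can be sketched as a gensim call; gensim is an assumption here (the thesis does not name its library on this page), and vector_size=100 is a placeholder, since the actual size was chosen experimentally.

```python
# Word2Vec parameters as described in the text; vector_size is a placeholder.
params = dict(
    sg=0,             # CBOW architecture (sg=1 would select skip-gram)
    window=5,         # context window of 5
    min_count=1,      # keep rare terms: the training corpus is small
    workers=8,        # one worker per core of the development machine
    vector_size=100,  # ASSUMED: the thesis chose the size experimentally
)

try:
    from gensim.models import Word2Vec  # assumed library
    toy_corpus = [["open", "telemetry", "display"],
                  ["close", "telemetry", "display"]]
    model = Word2Vec(sentences=toy_corpus, **params)
except ImportError:
    model = None  # gensim not installed; params still documents the setup

print(params["window"])
```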
Presenter
Spell Checker
Parser
NLP Filter
The final retrieved information of a test block is a group of individual words that appear in both the description and name of the test block, without any parameter values.
Recommender
Score of Keywords
Score of Parameters
Association Analysis and Re-scoring
Flow Checker
We perform an analysis of the content of the test case to check whether an application must be started in order for the action in the selected test blocks to be applied. For this reason, we check the prerequisites of the selected block and open or close applications as needed. The Tear Down phase is implemented by selecting the test blocks that close all the applications residing in the application stack.
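The prerequisite check and Tear Down logic can be sketched with a simple application stack; the class and block names below are hypothetical illustrations, not the tool's actual API.

```python
class FlowChecker:
    """Track open applications: opening pushes, closing pops."""

    def __init__(self):
        self.stack = []

    def apply(self, block):
        # A selected block's prerequisites may open or close applications.
        if block.startswith("Open "):
            self.stack.append(block.removeprefix("Open "))
        elif block.startswith("Close "):
            app = block.removeprefix("Close ")
            if app in self.stack:
                self.stack.remove(app)

    def tear_down(self):
        # Tear Down: close every application still on the stack, last first.
        return ["Close " + app for app in reversed(self.stack)]

fc = FlowChecker()
for block in ["Open SCOS-2000", "Open EDDS", "Close EDDS", "Open MATIS"]:
    fc.apply(block)
print(fc.tear_down())  # → ['Close MATIS', 'Close SCOS-2000']
```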
Data Storage
In addition, we retrieved information from a database containing 5040 test scenarios, 5569 requirements, and 2160 test blocks from 21 test libraries. Regarding the testing part, the test blocks were extensively analyzed so that only those blocks that provide high-level information, close to human reasoning, were included. These comprise 685 of the total 2160 test blocks and usually consist of groups of two or more lower-level test blocks.
Evaluation Measures
Experiments on Recommender Decisions
Text Similarity between Test Steps and Test Blocks
This experiment involves all the test steps of the collected test scenarios with associated test blocks. We aim to observe whether the correctly assigned test blocks are included in this list, and therefore we calculate the Recall@K metric for each test step with corresponding test blocks in each test scenario. Recall@K tells us for how many of the test steps the correct test block has been retrieved within the first K recommendations.
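The metric can be sketched as follows; the ranked list and block identifiers are hypothetical.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items retrieved within the first k recommendations."""
    top_k = set(ranked[:k])
    return sum(1 for r in relevant if r in top_k) / len(relevant)

# Hypothetical recommendation list for one test step with one correct block.
ranked = ["TB_07", "TB_03", "TB_21", "TB_05"]
print(recall_at_k(ranked, relevant={"TB_21"}, k=3))  # → 1.0
print(recall_at_k(ranked, relevant={"TB_21"}, k=2))  # → 0.0
```

Averaging this value over all test steps yields the per-K curve reported in the evaluation.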
Text Similarity between Test Scenarios and Requirements
Experiments on User Feedback
The MRR is calculated for each test scenario as we go through each iteration of the dataset. The occurrence of similar sequences of blocks is not uncommon, especially in the setup part of the developed automated test. However, this does not mean that the sequences are identical, and this negatively affects the ranking of the correct blocks.
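The MRR computation can be sketched as follows; the rank values are hypothetical.

```python
def mean_reciprocal_rank(ranks):
    """ranks: for each query, the 1-based rank of the first correct item,
    or None if it was not retrieved at all."""
    reciprocal = [1.0 / r if r else 0.0 for r in ranks]
    return sum(reciprocal) / len(reciprocal)

# Hypothetical ranks of the correct test block over four test steps.
print(mean_reciprocal_rank([1, 3, None, 2]))  # (1 + 1/3 + 0 + 1/2) / 4
```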
Experiments on Efficiency and User Productivity
Time performance
Test Coverage
In addition, we extended the approach with the aim of improving mismatched rankings in the provided recommendations using association-analysis reordering. The proposed system is implemented as a standalone tool suitable for integration with the software components of ESOC's Ground Segment. The tool was designed in accordance with the software-testing routine guidelines and procedures.
Contributions
Furthermore, we demonstrated the developed system using appropriate data from the Ground Segment's Mission Control System and examined its strengths and weaknesses in providing recommendations of test blocks and requirements. Specifically, the proposed method appears to perform efficiently in associating test steps with test blocks that may contain free text and parameters. In addition, it offers the ability to link proposed requirements to a test scenario, a very time-consuming and laborious task for a human.
Directions for Future Extensions
Applications
Linear relationships captured from Word2Vec
Le and Mikolov [Le and Mikolov, 2014] propose a variation of the Word2Vec algorithm for computing paragraph vectors by adding an explicit paragraph vector to the input of the neural network. When we pass the user input through the NLP filter, we use this vector space model, where each term is the value of a dimension of the model. The hybrid approach has been introduced to avoid the limitations of the content-based and collaborative filtering approaches [Adomavicius and Tuzhilin, 2005].
In [Azizi and Do, 2018], the proposed recommendation system uses three data sets: code coverage, change history and user sessions, to produce a list of the riskiest components of a system for regression testing. To ensure the quality of the automated test cases generated by the tool, the user must follow the guidelines described in section 3.3. Using this benchmark, we can set the weights of the heuristic function of the recommendation system.
Depending on the case, a training corpus should be collected with caution and implementation should take into account the design of the recommendation system in advance to achieve the highest possible quality and effectiveness.
A simple Word2Vec model with CBOW architecture containing only
Word2Vec model with CBOW architecture
As we mentioned above, the similarity measure refers to semantic similarity, a metric defined over a set of documents or terms, where the idea of distance between them is based on the similarity of their meaning or semantic content. We will use an example to go through similarity calculations between a query and a list of document candidates to examine the vector space models listed below. The similarity between two text segments is calculated using the term frequency inverse document frequency (TFIDF).
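Alongside TFIDF, a set-based measure such as the Jaccard index (one of the measures compared in the evaluation tables) can be sketched for the same query-versus-candidates setting; the query and candidate terms here are illustrative.

```python
def jaccard(a, b):
    """Jaccard index: |intersection| / |union| of the two term sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical query and candidate documents, already tokenized.
query = ["open", "telemetry", "display"]
candidates = [["open", "telemetry", "display", "window"],
              ["send", "telecommand"]]
scores = [jaccard(query, c) for c in candidates]
print(scores)  # → [0.75, 0.0]
```

Ranking the candidates by this score gives a simple lexical-overlap baseline against which the vector space models can be compared.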
Word2Vec model with skip-gram architecture
LSI is based on the principle that words used in the same contexts tend to have similar meanings. The system then recommends to the target user the movies that those similar users have rated highly in the past. The combination of collaborative filtering and content-based approaches is mostly used in industry today [Adomavicius and Tuzhilin, 2005].
Graphical User Interface - Test Blocks Tab
Each step must be listed as a new statement and may contain one or more user-system interactions relevant to the feature being developed. Both the description and the expected result must be a single sentence to achieve top quality recommendations. A table of test scenario steps lists the corresponding steps sequentially (in the order they are executed) to achieve the functionality being developed.
Graphical User Interface - Requirements Tab
High-Level System Design
The "push" operation of the stack is equivalent to opening a program, while the "pop" operation is equivalent to closing it. We have chosen the Mission Control System (MICONYS) software package in the Ground Segment (Fig. 4.1) to perform the evaluation due to its complexity and its ability to provide tagged data.
Component Diagram: Assignment of Test Blocks to Test Steps
Component Diagram: Assignment of Requirements to Test Scenario
Functionality of the Spell Checker component
Spell Checker example
Information extraction from a test step
Information extraction from a test block
We define applications that are opened only once and not closed during the scenario phase as belonging to the installation phase.
Data Storage Objects
The latest version of MICONYS consists of the following software systems: DABYS, DARC, EDDS, FARC, GFTS, MATIS, NIS, SCOS-2000, SFT, SLE API, SMF [Peccia, 2005]. The recommended weights for these experiments are set to wk = 0.8 and wp = 0.2; however, further improvements in the returned values are achieved with different weights, as shown in the weights-tuning experiments. In Figure 4.5 we observe that for some test scenarios the MRR shows little or even negative improvement over 50 iterations, while in others the corresponding test blocks reach high ranks.
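The combined heuristic score with the weights quoted above (wk = 0.8, wp = 0.2) can be sketched as follows; the keyword and parameter sub-scores are illustrative values, not results from the thesis.

```python
WK, WP = 0.8, 0.2  # weights for the keyword and parameter sub-scores

def combined_score(keyword_score, parameter_score, wk=WK, wp=WP):
    """Weighted sum of the two recommender sub-scores."""
    return wk * keyword_score + wp * parameter_score

# Hypothetical sub-scores for one candidate test block.
print(round(combined_score(0.9, 0.5), 2))  # → 0.82
```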
The two recommender systems are decoupled in our system architecture, which provides the opportunity to defer the requirements-tracing task to a later point in the testing process.
A simplified spacecraft system. Orange arrows denote radio links;
Relevant Test Blocks in the first k recommendations. The model with
Relevant test blocks recommendations from different Vector Space
Relevant requirements recommendations from different Vector Space
Test Block rankings improvement from user feedback in 50 iterations
Time performance of the tool in the testing dataset
Test Coverage of the testing dataset
This thesis considered the problem of improving a testing routine using natural language processing innovations and techniques.
Example - Candidate Documents in a Recommender Engine
Example - Search Query in a Recommender Engine
TFIDF Recommendations
LSI Recommendations
Jaccard Index Recommendations
Google Word2Vec Model Average Recommendations
Google Word2Vec Model Comprehensive Recommendations
Word2Vec model: Values assigned to numerical parameters after eval-
Presenter: System responses to User actions
Performance of Word2Vec with different sets of parameters for Test
Recommender Scores - Weights Tuning
Performance of Word2Vec with different sets of parameters for Require-
Recommender Scores - Weights Tuning