FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO

Automated Requirements Analysis

using Natural Language Processing

Mariana Oliveira

Mestrado Integrado em Engenharia Informática e Computação

Supervisor: Prof. João Pascoal Faria - FEUP

Co-Supervisor: Eng. Nuno Teixeira - Bosch Car Multimedia


Automated Requirements Analysis using Natural Language Processing

Mariana Oliveira

Mestrado Integrado em Engenharia Informática e Computação

Approved in oral examination by the committee:

Chair: Prof. Ana Cristina Paiva

External Examiner: Prof. Rui Maranhão

Supervisor: Prof. João Pascoal Faria


Abstract

The automotive domain has been recognised as remarkable in aspects such as quality and safety. Behind all the technology involved in the creation of automotive software, there are several Software Engineering processes taking place.

One crucial process is the production of artefacts; some examples are Use Cases, Models, Requirements and Design Documents. They help describe the features, architecture and design of the software; therefore, their quality is directly linked to the quality of software systems as well as their development process.

Surveys have shown that the most common reasons for a project's lack of success are not technical but rather problems with requirements. For that reason, this investigation targets requirements documents and their quality.

Considering that the automotive market is as demanding as ever concerning innovations of increasing complexity, the associated shorter development periods make it paramount to improve software development processes.

In order to guarantee the quality of their software, automotive companies conduct requirements reviews, whose aim is to ensure these documents abide by a series of international quality standards. These describe various quality criteria which are then used to measure the quality of a project. Making sure these quality criteria are fulfilled during the requirements reviews is a complex task that requires effort and resources, extending the time to release new products.

It is a challenge to find a way to transform this task into a simpler and more efficient one. As such, this research centres on identifying applicable quality criteria and their respective process metrics in the interest of creating a tool that finds, interprets and alerts the user about faults or incoherences in the requirements.

As most requirements are written using Natural Language due to it being easily understandable by all stakeholders, a tool that analyses their quality should apply Natural Language Processing (NLP) techniques. Although there are several tools available in the market that support requirements analysis, none of them provide the necessary input type and integration means, and that is why a tool was created to tackle that lack of suitable solutions.

The tool that was developed during this investigation is called Requirements ANalysis Tool (RANT). RANT employs several NLP techniques such as Sentence Splitting, Tokenization and Part-of-Speech tagging, in order to assess the requirements quality by producing a final evaluation regarding the requirements structure and potential ambiguity.

Creating a tool that was integrated in the Requirements Management environment in use was an essential feature. Hence, RANT was developed as an extension of IBM's Requirements Management environment called Collaborative Lifecycle Management (CLM).

Alongside RANT, a checklist was created to be used by the analysts together with the tool, helping them perform a more complete and accurate requirements review.


As a form of validation of RANT, binary diagnostic testing was performed with the main goal of obtaining precision, recall and accuracy values. The results obtained seem promising with values of 82%, 95% and 85% regarding precision, recall and accuracy respectively.


Resumo

O setor automóvel é reconhecido como sendo notável em aspectos como a qualidade e a segurança. Por detrás de toda a tecnologia envolvida no desenvolvimento de software automóvel, ocorrem vários processos de Engenharia de Software.

Um processo crucial é a produção de artefactos, alguns exemplos são os Casos de Uso, Modelos, Requisitos e documentos de Design. Estes artefactos ajudam a descrever as características, a arquitectura e design do software. Assim, a qualidade destes documentos está directamente ligada à qualidade dos sistemas de software, assim como ao respectivo processo de desenvolvimento.

Estudos mostraram que a maioria das razões comuns para o insucesso de um projeto não são por motivos técnicos mas sim por problemas com os requisitos. Por esse motivo, durante esta investigação, o foco estará nos documentos de requisitos e na respetiva qualidade.

Considerando que o mercado da indústria automóvel está cada vez mais exigente em relação a inovações de complexidade crescente e os consequentes períodos de desenvolvimento mais curtos, levam a uma necessidade de melhorar os processos de desenvolvimento de software.

Para garantir a qualidade do seu software, empresas do ramo automóvel fazem revisões aos requisitos, cujo objectivo é assegurar-se que estes documentos cumprem uma série de standards internacionais de qualidade. Estes standards descrevem vários critérios de qualidade que são então utilizados para medir a qualidade de um projecto. Certificar-se que estes critérios de qualidade sejam seguidos durante as revisões de requisitos é uma tarefa que requer bastante esforço e recursos, levando a um prolongamento do tempo até ao lançamento de novos produtos.

É um desafio encontrar uma forma de transformar esta tarefa numa acção mais simples e mais eficiente. Como tal, o centro desta investigação é identificar critérios de qualidade aplicáveis e as respectivas métricas de processo com o objectivo de criar uma ferramenta que encontra, interpreta e alerta o utilizador acerca de falhas ou incoerências nos requisitos.

Como a maioria dos requisitos são escritos utilizando Linguagem Natural devido a ser facilmente compreensível por todos os interessados, uma ferramenta que analisa a qualidade dos requisitos deverá aplicar técnicas de Processamento de Linguagem Natural (PLN). Apesar de existirem várias ferramentas disponíveis no mercado que efectuam análises aos requisitos, nenhuma delas providencia o tipo de input necessário ou os meios de integração pretendidos e, por isso, foi criada uma ferramenta para mitigar essa falta de soluções adequadas.

A ferramenta que foi desenvolvida ao longo desta investigação denomina-se Requirements ANalysis Tool (RANT). A RANT utiliza várias técnicas de PLN como Sentence Splitting, Tokenization e Part-of-Speech tagging, para aferir a qualidade dos requisitos através da produção de uma análise que avalia a estrutura e a potencial ambiguidade que possa existir nos requisitos.

A criação de uma ferramenta que estivesse integrada no ambiente de gestão de requisitos utilizado era uma característica essencial. Assim, a RANT foi desenvolvida como uma extensão do ambiente de gestão de requisitos da IBM chamado Collaborative Lifecycle Management (CLM).

Em paralelo com a RANT, foi elaborada uma checklist com o objetivo de ser utilizada junto com a RANT e poder ajudar os analistas a realizar revisões de requisitos mais exatas e completas.


Como forma de validação da ferramenta RANT, o teste de diagnóstico binário foi o escolhido com o principal objectivo de obter os valores de precisão, revocação e exatidão. Os resultados obtidos foram promissores com valores de 82%, 95% e 85% para a precisão, revocação e exatidão, respetivamente.


Acknowledgements

First, I would like to thank my thesis supervisor Prof. João Pascoal Faria for his invaluable guidance and support.

I am grateful to my co-supervisor Nuno for allowing me to freely pursue my research while still giving me all the guidance needed.

I would like to express my appreciation to Bosch for allowing me to have access to such a great workplace and providing everything I could need to fulfil my goals for this research.

A special thank you to my coworkers with whom I have had the pleasure to work and that always made me feel part of the team.

I would like to show my greatest appreciation to the experts that were involved in the validation of this project without whom this research would not have had a happy ending.

My deepest heartfelt appreciation goes to my boyfriend João for putting up with me and my terrible mood swings during this process and for always giving me the strength, love and motivation that I need to push through.

Finally, my acknowledgements would not be complete without expressing my profound gratitude to my parents for their relentless support and encouragement throughout my years of study and for their unwavering love and patience through tough moments.

Thank you.


“We can only see a short distance ahead, but we can see plenty there that needs to be done.”


Contents

1 Introduction 1

1.1 Context . . . 2

1.2 Motivation and Goals . . . 3

1.3 Document Structure . . . 3

2 Background 5

2.1 Software Engineering . . . 5

2.2 Requirements Engineering . . . 6

2.2.1 Requirements Validation . . . 8

2.2.2 Requirements Quality . . . 9

2.3 Software Quality Management . . . 10

2.3.1 Software Quality Assurance . . . 10

2.3.2 Verification and Validation . . . 11

2.4 Natural Language Processing . . . 12

2.4.1 Statistical Parsing . . . 15

2.4.2 Phrase Structure Parsing and Probabilistic Context-Free Grammars . . . 15

2.5 Conclusion . . . 16

3 State of the Art 19

3.1 NLP Toolkits and Libraries . . . 19

3.1.1 Stanford CoreNLP Toolkit . . . 19

3.1.2 The Natural Language Tool Kit . . . 20

3.1.3 Apache OpenNLP . . . 20

3.1.4 IBM Watson . . . 21

3.1.5 spaCy . . . 21

3.2 Tools for Requirements Analysis . . . 23

3.2.1 Requirements Analysis Tool . . . 23

3.2.2 QuARS . . . 24

3.2.3 Automated Requirements Measurement . . . 24

3.2.4 Qualicen Scout . . . 25

3.2.5 QVscribe . . . 25

3.3 Conclusion . . . 27

4 Proposed Solution - RANT 29

4.1 Challenges and Approach . . . 29

4.2 Checklist . . . 31

4.3 Tool Architecture . . . 34


4.5 Requirements Analysis Features - RANT_NLP . . . 37

5 Experimentation 41

5.1 Experiment Design . . . 41

5.2 Results and Discussion . . . 42

6 Conclusions and Future Work 47

6.1 Achievements . . . 47

6.2 Future Work . . . 48

References 51


List of Figures

2.1 Reasons for project failure [Hul09] . . . 7

2.2 The Requirements Engineering Process [Som10] . . . 7

2.3 Quality Management and Software Development [Som10] . . . 11

2.4 The software review process [Som10] . . . 12

2.5 NLP Stages . . . 13

2.6 Dependency Graph [SAAA+] . . . 14

2.7 Phrase Structure Tree [SAAA+] . . . 14

3.1 CoreNLP system architecture [MSB+14] . . . 20

3.2 spaCy’s pipeline [spab] . . . 22

3.3 spaCy’s Architecture Diagram [spab] . . . 22

3.4 RAT’s Analysis Overview [VK08] . . . 23

3.5 QuARS High-level Architecture Scheme [Lam05] . . . 24

3.6 Overall Smell Detection Process of Qualicen Scout [FFJ+14] . . . 25

3.7 Example of QVscribe in Microsoft Word [QRA] . . . 26

4.1 Checklist for Requirements Review . . . 32

4.2 Description of Quality Criteria (Continuation of the checklist) . . . 33

4.3 RANT Architecture Diagram . . . 35

4.4 Example of RANT integrated in the CLM RM environment . . . 36

4.5 Ambiguous words and expressions . . . 39


List of Tables


Abbreviations

API Application Programming Interface
ARM Automated Requirements Measurement
CLI Command Line Interface
CMMI Capability Maturity Model Integration
NLP Natural Language Processing
NLTK Natural Language Tool Kit
PCFG Probabilistic Context-Free Grammar
PoS Tagging Part-of-Speech Tagging
PST Phrase Structure Tree
QuARS Quality Analyser for Requirements Specifications
RANT Requirements ANalysis Tool
RAT Requirements Analysis Tool
REMsES Requirements Engineering and Management for software-intensive Embedded Systems
SATC Software Assurance Technology Centre
SE Software Engineering
SPICE Software Process Improvement and Capability dEtermination
SQM Software Quality Management
SQuaRE Systems and software Quality Requirements and Evaluation
V&V Verification and Validation


Chapter 1

Introduction

Over the past decade there has been an increase in the complexity of software-intensive embedded systems in the automotive domain [BBH+14]. With the automotive market being as demanding as ever concerning innovations of increasing complexity, the associated shorter development periods, due to the high pressure to market a product, make it a necessity to push the current software development processes to their limits.

One efficient method of ensuring high quality software in time and within budget is by applying software engineering (SE) processes to a project [Som10]. An important part of these processes is the creation of artefacts, in particular requirements documents.

Requirements Engineering is one of the fundamental activities of SE and consists of understanding and defining what services are required from the system and identifying the constraints on the system's operation and development [Som10].

Low quality requirements can lead to misunderstandings which consequently may contribute to errors in the design flow that are either hard to detect or detected too late [SAAA+].

Therefore, the quality of these documents must be assured and is directly connected to the quality of the software systems as well as their development process. In fact, surveys have shown that the most common reasons for a project's lack of success are not technical but rather requirements related (13.11%) [Hul09].

Most of these problems arise because the requirements are written in Natural Language which can lead to issues with clarity, consistency and ambiguity. When the requirements are written, it is critical to ensure that they comply with relevant quality criteria and international standards. Unfortunately, that is not always the case and that is what the verification and validation (V&V) phase in requirements engineering is for.

During the V&V phase, processes such as reviews are used to check if the requirements documents abide by the quality criteria and standards related to those documents.

Although helpful in finding defects in the requirements, these reviews can be tedious because reading and analysing each requirement individually takes time and requires a great effort to stay alert and remember relevant information. By not performing the requirements reviews adequately, the time until new products are launched could be extended.

Existing static analysis tools mainly target code files, which are of no use when analysing requirements; that is why there is a need for specific tools that analyse requirements. There are some solutions available but, for reasons of adaptability or privacy, they could not be used.

Throughout this investigation, a viable solution was developed by means of an integrated solution in IBM's Collaborative Lifecycle Management (CLM) tool, called RANT (Requirements ANalysis Tool), that makes use of Natural Language Processing (NLP) techniques. This solution is complemented by a checklist that shall be used alongside RANT; both are thoroughly described in Chapter 4.

1.1 Context

Bosch Car Multimedia is a division of Bosch which develops smart embedded solutions for entertainment, navigation, telematics and driver assistance functions. They develop cutting-edge features tailored to modern mobility requirements, providing optimum driving convenience, safety and access to entertainment and information via smart networked architectures [Bos].

Their ambition is to enhance the quality of life with solutions that are both innovative and beneficial. They focus on their core competencies in automotive and industrial technologies as well as in products and services for professional and private use.

A huge part of creating such technology is the set of processes that take place throughout the development of a project. Software Engineering processes are usually applied by companies in order to produce quality software on time and within budget.

One of those processes is the elaboration of artefacts and their respective validation. In the scope of this experiment, the Software Specification, i.e., the Requirements Engineering process is the one on focus.

As quality is of such importance in the automotive industry, it is essential to develop software fast while still abiding by certain quality criteria defined by a series of international standards regarding this matter.

These criteria should be applied when the requirements are being written but that does not always happen. Hence, the validation of the requirements is a crucial task to ensure quality in the requirements and, consequently, of the software.

During the requirements review, which is part of the validation process, analysts manually go through each requirement trying to find mistakes and incoherences regarding the international quality standards. The problem arises because this task takes up a lot of time; adding the pressure to market a product to the necessity of having quality software, there comes a need for a partially automated solution for this review process.


1.2 Motivation and Goals

Taking into consideration that the requirements review process is such a tremendously time-consuming chore, it is paramount to find a solution that helps turn this task into a quicker and semi-automated process.

Although there are numerous static analysis tools that can be used to scan source code files for errors, there is not extensive knowledge about tools that help analyse requirements.

The reason for that lack of possible solutions is that the requirements are written in Natural Language which, on the one hand, is very good because it is accessible to everyone involved in the project; on the other hand, it can be incoherent, incomplete and ambiguous.

A lot of time is spent reading the requirements one by one and looking for possible defects, which might even result in an unsuccessful search because, as humans, always staying alert can be difficult and errors can be overlooked.

The quality of the requirements is of the utmost importance to companies as errors in the requirements can cause serious consequences in the overall project leading to a delay in product releases consequently bringing costs to the company.

With the aim of achieving quality requirements in the smallest period of time possible, the focus of this project is to develop a tool that assists the analysts during the review, turning it into a simpler and more efficient task. By identifying applicable quality criteria, the tool will analyse the requirement using NLP techniques providing an insight as to the existence of possible issues with the requirement, alerting the user to that situation.

1.3 Document Structure

From this point forward, this document is structured in five chapters.

The first one, Chapter 2, contains all the needed knowledge to understand the research done in this project.

Then, there is a chapter regarding the State of the Art, Chapter 3, with information about the existing tools that have similar goals as the one developed during this research, along with relevant details about toolkits and libraries regarding this matter.

Chapter 4 describes the work that was pursued in detail, providing information about the architecture of the tool and its features.

The validation process of this project is thoroughly specified in Chapter 5, mentioning the tests done and their respective results.

Lastly, in Chapter 6, the conclusions taken from this experiment, including some points concerning future work, are stated.


Chapter 2

Background

2.1 Software Engineering

Being in an era where technology has such an increasing impact on our lives, it seems impossible to imagine a world with no software, because it is everywhere around us: from our cars to our smart TVs, going through medicine and industry, cleaning robots and smart virtual assistants. In late 2013, a survey found that 23% of products now contain software in some form [Agi].

Software Engineering as defined in [Ref10] is the systematic application of scientific and technological knowledge, methods, and experience to the design, implementation, testing, and documentation of software; that is, the application of engineering to software.

Applying software engineering to a project is considered the most effective way of getting software of high quality within schedule and budget [Som10].

Software processes are concerned with work activities accomplished by software engineers to develop, maintain and operate software, such as requirements, design, construction, testing, configuration management and other activities [BF14].

According to [Som10], there are four fundamental activities that are common to all software processes. The one in focus during this investigation is the Software Validation activity.

During the Software Validation activity, the software is analysed to make sure it complies with its specification and meets the needs of the stakeholders.

There are no universal software engineering methods that are applicable to all systems [Som10]. One of the most important points to consider when deciding which method to implement is the type of software that is being developed: stand-alone or interactive transaction-based applications, embedded control, batch processing or entertainment systems, modelling and simulation software, and data systems.

Embedded systems are software control systems that control and manage hardware devices. Bosch Car Multimedia develops embedded solutions for entertainment, navigation, telematics and driver assistance functions, and they follow the REMsES (Requirements Engineering and Management for software-intensive Embedded Systems) approach.


2.2 Requirements Engineering

Requirements Engineering is the process of understanding and defining what services are required from the system and identifying the constraints on the system's operation and development [Som10]. Requirements Engineering is concerned with discovering, eliciting, developing, analysing, determining verification methods, validating, communicating, documenting, and managing requirements [Ref10].

Following Requirements Engineering processes ensures that the user expectations will be met and the software will be of high quality. In the automotive domain, where requirements are elaborated at an early stage in the project life cycle, errors made at this stage can be hazardous to the software design and implementation if not found in time and corrected. The later these errors are found, the higher are the costs [HDS+04].

Low quality requirements can lead to potential failures to discover the needed functionality, to understand the implications of such functionality, to properly explain the requirements to the developers and, very often, to understand what is the real problem that needs to be solved [RR]. In fact, surveys have shown that problems with the requirements are the main reason for the lack of success in a project, represented by 13.11% as shown in Fig. 2.1.

Regarding the Requirements Engineering process, there are four main sub-activities (Fig. 2.2). During the Feasibility Study, it is evaluated whether the identified needs of the user are being fulfilled by the software and hardware solutions currently being used. It is also taken into consideration whether the suggested system will be profitable from a business point of view and if it can be developed within the available budget [Som10].

With the results from this study, it must be decided if the proposed solution is the best alternative given the constraints in technology, resources, finances, etc. [BF14].

The Requirements Elicitation and Analysis process is fundamentally a human activity where the stakeholders are identified and relationships are established between the development team and the customer [BF14]. It may also involve the creation of one or more system models and prototypes in order to better understand the system that is to be specified [Som10].

After gathering the information from the previous activity, the Requirements Specification process is simply turning that information into a set of requirements that accurately define the customer needs. This document can also be used as the basis for developing effective verification and validation plans [BF14].

The Requirements Validation process is where it is checked if the requirements are real, complete and consistent. The goal of this activity is to find possible problems in the requirements document. If any are found, a report shall be elaborated showing the problems with the requirements and the necessary corrections so the requirements document can then be adjusted.


Figure 2.1: Reasons for project failure [Hul09]


2.2.1 Requirements Validation

The requirements documents may be subject to Verification and Validation (V & V) procedures [BF14].

V & V procedures are used to determine whether the development products of a given activity conform to the requirements of that activity and whether the product satisfies its intended use and user needs [10112].

The requirements may be validated to make sure that the developer has understood the requirements; it is also important to verify that a requirements document conforms to company standards and that it is correct, complete, accurate, consistent and testable [BF14].

But why is Requirements Validation so important? If errors are not discovered in the requirements document during this stage but later during project development, or even after the system has been put to use, it can lead to extensive and costly rework.

As mentioned in [Som10], this assurance is given by:

• Validity checks: The functions proposed by stakeholders should be consistent with what the system needs to do.

• Consistency checks: Requirements in the document should not conflict.

• Completeness checks: The requirements document should include all the requirements and all the constraints.

• Realism checks: Make sure the requirements can be fulfilled using current existing technology, within budget and schedule.

• Verifiability: It should be possible to write a set of tests that can prove that the delivered system meets each requirement.

In order to validate the requirements, there are various techniques that can be used individually or jointly: Requirements Reviews, Prototyping, Model Validation and Acceptance Tests [BF14].

Requirements reviews seem to be the most common means of both verifying and validating requirements documents [BF14]. During this process, a group of reviewers analyse the requirements systematically, looking for errors, mistaken assumptions, lack of clarity, verifiability issues and deviation from standard practice [Sta11].

The output of a requirements review is a list containing reported problems with the requirements document and the actions necessary to cope with the reported issues [PEM03].


2.2.2 Requirements Quality

The quality of requirements in the requirements document has a significant impact on the final system. Requirements of poor quality can lead to misunderstandings and further errors in the design that are usually difficult to detect or detected too late [SAAA+]. Therefore, it is paramount to ensure the quality of requirements by defining requirements in a way that their validation is objective, rather than a subjective opinion.

Requirements are usually written in Natural Language, meaning there is no defined format by default, just normal plain text. Hence, requirements can be vague and ambiguous which calls for the use of certain standards and guidelines to help make the requirements clearer, easy to understand, complete and consistent.

Software standards are essential regarding software quality management because they capture wisdom that is of value to the organisation, provide a framework for defining the meaning of quality and facilitate continuity when work started by one person is picked up and continued by another [Som10].

Concerning Requirements Quality, Bosch Car Multimedia follows the following international standards:

• Automotive SPICE [VDA15]

• ISO/IEC/IEEE 29148:2011 - System life cycle processes - Requirements Engineering [Sta11]

• ISO/IEC 25010:2011 - Systems and software engineering — Systems and software Quality Requirements and Evaluation (SQuaRE) — System and software quality models [ISO02a]

• Capability Maturity Model Integration (CMMI) [CMM]

Automotive SPICE (Software Process Improvement and Capability dEtermination) is a process maturity framework that assesses the capability and maturity of organisational processes to develop embedded systems in the automotive industry [VDA15]. Basically, it is a derivation from ISO 15504 created specially for the automotive industry.

ISO/IEC/IEEE 29148:2011 defines the construct of a good requirement, provides attributes and characteristics of requirements and discusses the iterative and recursive application of requirements processes throughout the life cycle [Sta11].

SQuaRE defines a product quality model that comprises eight quality characteristics: Functional Suitability, Performance Efficiency, Compatibility, Usability, Reliability, Security, Maintainability and Portability [ISO].

CMMI is a process model that provides a clear definition of what an organisation should do to promote behaviours that lead to improved performance. It has five Maturity Levels: Performed, Managed, Defined, Quantitatively Managed and Optimising [CMM].


2.3 Software Quality Management

Software quality problems were initially discovered in the 1960s with the development of the first large software systems and they are still an issue nowadays [Som10]. Trying to solve this situation and inspired by the methods used in the manufacturing industry, formal techniques of software quality management were developed and adopted.

The definition of quality software is, basically, fitness for purpose, i.e., conformance to re-quirements. It is providing something that satisfies the customer and ensures the needs of all the stakeholders are considered [Hul09].

Quality Management as defined in [Ref10] is a set of coordinated activities to direct and control an organisation with regard to quality. Every management decision is a compromise between cost, schedule and quality [Hul09].

Software Quality Management consists of four subcategories:

• Quality Planning
• Quality Assurance
• Quality Control
• Quality Improvement

Quality Planning involves deciding which quality standards will be used, defining quality goals and doing an estimation on the effort and schedule of the quality activities.

Quality Assurance incorporates various steps that define and assess if the software processes are appropriate and that they produce software products of suitable quality for their intended purposes [BF14].

As for Quality Control, the project documents are analysed and executables are examined to determine whether they follow the standards set for the project.

Regarding the Quality Improvement category, it is sought to improve process effectiveness, efficiency and other characteristics with the objective of improving software quality [BF14].

2.3.1 Software Quality Assurance

Quality Assurance as stated in [Ref10] is a set of planned and systematic activities implemented within the quality system and demonstrated as needed, to provide adequate confidence that an entity will fulfil requirements for quality.

The purpose of the Quality Assurance Process is to provide independent and objective assurance that work products and processes comply with predefined provisions and plans and that non-conformances are resolved and further prevented [VDA15].


Figure 2.3: Quality Management and Software Development [Som10]

2.3.2 Verification and Validation

The Software Quality Management (SQM) process is significant when it comes to assuring quality in software. This process checks the project deliverables to make sure that they are consistent with organisational standards and goals [Som10]. Part of this SQM process is the Verification and Validation procedure.

The purpose of Verification and Validation (V&V) is to aid an organisation to achieve quality in their system during the life cycle. V&V processes grant an unbiased evaluation of products throughout the life cycle. This evaluation shows whether the requirements are correct, complete, accurate, consistent and testable [BF14].

Verification is an attempt to ensure that specified requirements have been fulfilled and that they satisfy the standards, practices and conventions during the life cycle processes. Validation is the confirmation that the requirements for a specific intended use or application have been fulfilled and that they solve the right problem [ISO02b].

Both verification and validation are interrelated and complementary processes that use each other’s process results to establish better completion criteria and analysis [10112]. They take place early in the development or in the maintenance phase. As the output, the V & V plan documents describe the various resources and their roles and activities as well as the techniques and tools to be used [BF14].

Part of the V & V activities involves checking processes, such as reviews and inspections, at each stage of the software process, from user requirements definition to program development [Som10], as illustrated in Fig. 2.3.

2.3.2.1 Reviews and Inspections

Requirements reviews are possibly the most used method of both verifying and validating a requirements document. Reviews and inspections are used side by side with program testing as part of the V & V processes.


Figure 2.4: The software review process [Som10]

In order to arrange a requirements review, a group of reviewers is constituted with the purpose of looking for errors, mistaken assumptions, lack of clarity, verifiability issues and deviation from standard practice [Sta11]. Generally, the review process consists of three phases: pre-review activities, the review meeting and the post-review activities, as shown in Fig. 2.4.

Inspection as defined in [IEE08] is a visual examination of a software product to detect and identify software anomalies, including errors and deviations from standards and specifications. As part of the inspection process, a checklist with the most common errors is used to help the inspectors focus during the search for bugs [Som10].

Inspections and reviews are part of the so-called static V & V techniques. These techniques examine software documentation, including requirements, interface specifications, designs, models and source code, without executing the code [BF14].

Inspections and reviews have a very positive outcome when it comes to discovering software errors. Yet, they demand a lot of time to organise and often introduce delays into the development process [Som10].

2.4 Natural Language Processing

Natural Language Processing (NLP) is a theoretically motivated range of computational techniques for analysing and representing naturally occurring texts at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of tasks or applications [D.L01].

The processing of natural languages is a difficult task and requires different techniques from those used for processing artificial languages [ESW14].

Most requirements documents are written in Natural Language rather than using modelling or structured techniques because it is much easier to be understood by all the stakeholders in the project. Besides that, Natural Language allows the engineer to be as abstract or as detailed as required in a certain situation. On the other hand, Natural Language can be ambiguous, leading to misunderstandings in the requirements specification.

A simplified view of Natural Language Processing emphasises four distinct stages, as depicted in Fig. 2.5.


Figure 2.5: NLP Stages

The purpose of the Lexical Analysis is to interpret the meaning of individual words; it comprises five main techniques: Sentence Splitting, Tokenization, Part-of-Speech (PoS) tagging, Morphological Analysis and Parsing.

Sentence splitting is the process of breaking the text into separate sentences. During this process the Natural Language text is analysed to determine the sentence boundaries between the sentences.

Most languages use punctuation marks to indicate the boundaries between sentences. There are, however, some instances where punctuation marks do not indicate a sentence boundary. For instance, abbreviations and titles use punctuation marks that do not mark the end of a sentence.

Tokenization splits the sentence into its meaningful units, named tokens. Based on the structure of the text, which is partly provided by the sentence splitting, the tokens are associated to a category. The most common categories are words, numbers, punctuation marks and symbols.

The Part-of-Speech tagging process is responsible for tagging each token with its grammatical category, based on its definition and context. Each token is then identified with a tag, such as noun, verb, adjective or determiner [Are16].

The Morphological Analysis is the preliminary stage that takes place before syntactic analysis. The purpose of this stage is to identify the root of compound words. This can be accomplished by using stemming and lemmatization. Stemming is a technique that reduces an inflected word to its stem, usually by removing its suffixes, and Lemmatization is a technique that finds the root form of a word [Are16].
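As an illustration of these lexical-analysis stages, the following sketch uses the NLTK library (introduced in Section 3.1.2) to split sentences, tokenize, tag parts of speech and compare stemming with lemmatization. The example sentence and the printed outputs are merely indicative, and the required NLTK resources are assumed to have been downloaded beforehand.

import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

# assumed one-time downloads: 'punkt', 'averaged_perceptron_tagger', 'wordnet'
text = "The system shall store the data. It should respond quickly."

# Sentence Splitting
sentences = nltk.sent_tokenize(text)

# Tokenization and Part-of-Speech tagging
for sentence in sentences:
    tokens = nltk.word_tokenize(sentence)
    print(nltk.pos_tag(tokens))

# Morphological Analysis: stemming vs. lemmatization
stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
print(stemmer.stem("responding"))           # expected: 'respond'
print(lemmatizer.lemmatize("stored", "v"))  # expected: 'store'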


Figure 2.6: Dependency Graph [SAAA+]
Figure 2.7: Phrase Structure Tree [SAAA+]

Parsing is a technique that consists in analysing a sentence by taking each word and determining its structure from its constituent parts. In order to parse a piece of text, it is necessary to have two components: a parser and a grammar. The grammar for natural languages is ambiguous and typical sentences have multiple possible analyses [BSPM16].

There are two primary types of parsing: Dependency Parsing and Phrase Structure Parsing. The first one focuses on the relations between words in a sentence, as shown in Fig. 2.6, while the second one focuses on building the Parse Tree, usually using a Probabilistic Context-Free Grammar (PCFG), as can be seen in Fig. 2.7.
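For instance, a dependency analysis of a simple requirement sentence can be obtained with spaCy (described in Section 3.1.5). This is only a sketch assuming the small English model en_core_web_sm is installed; the sentence is an invented example.

import spacy

# assumes: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("The system shall send a notification to the user.")

# each token is linked to its syntactic head by a dependency relation
for token in doc:
    print(f"{token.text:12} {token.dep_:10} head: {token.head.text}")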

The output of the Lexical Analysis is the input to the Syntactic Analysis. This process performs an analysis of the words in a sentence in order to reveal the grammatical structure of the sentence. This requires both a grammar and a parser. The output of this level of processing is a representation of the sentence that reveals the structural dependency relationships between the words [D.L01].

Semantic processing determines the possible meanings of a sentence by focusing on the interactions among word-level meanings in the sentence [D.L01]. It builds up a representation of the objects and actions that a sentence is describing and includes the details provided by adjectives, adverbs and propositions [RMP13].

Categorisation aims at automatically nominating new documents to categories that are already established [YT06]. In Requirements Engineering, categorising deals with classifying requirements for a certain purpose, which can be helpful for the development of software. Classified requirements can be assigned to teams that each focus on a particular class of requirements.


2.4.1 Statistical Parsing

The statistical approach to NLP has become more and more important in recent years though its application started back in the 1980s [YT06].

Statistical parsing consists in methods for syntactic analysis that are based on statistical inference from samples of natural language text [YT06]. Statistical inference may be applied for various features of the parsing process but is mainly used for disambiguation.

A possible solution to the problem is a probabilistic parser as it computes the probability of each interpretation and chooses the most probable one [PES17].

The set of possible syntactic representations is usually defined by a particular theoretical framework but normally takes the form of a complex graph or tree structure. The most common type of representation is a phrase structure [YT06].

The idea behind a statistical parser is that it assigns sentences in natural language to their favourite syntactic representations, either by providing a ranked list of possible analyses or by selecting a single optimal analysis [YT06].

There are two types of probability models: Discriminative and Generative. In a Discriminative model, the conditional probability P(y|x) is modelled [YT06]. In a Generative model, what is modelled is the conditional probability of an input x given a certain label y, i.e., P(x|y).

The simplest Generative Statistical Parsing model is the Probabilistic Context-Free Grammar (PCFG).

2.4.2 Phrase Structure Parsing and Probabilistic Context-Free Grammars

Phrase Structure Parsing focuses on identifying phrases and their recursive structure, resulting in a Phrase Structure Tree (PST).

A PST contains structural information about a sentence where the root node represents the whole sentence and the non-terminal nodes represent the syntactic grammar structure in terms of constituents, while the terminal nodes are the atomic words of the sentence [SAAA+].

The analysis of the sentence and annotation into a PST is performed by structural parsers such as the one contained in the Stanford CoreNLP natural language processing toolkit [Sta17].

Natural Language can be ambiguous, which can lead to multiple PSTs being generated for the same sentence. How the PSTs are generated depends on the grammar employed by the parser.

The simplest augmentation of the context-free grammar is the Probabilistic Context-Free Grammar (PCFG). Besides the elements contained in a context-free grammar (the set of terminal symbols, nonterminal symbols, the start symbol and a set of rules), PCFGs add a probability function.

As shown in Algorithm 1, this probability function takes each grammar rule and associates it with a probability value [Tha17].

A PCFG can be used to estimate a number of useful probabilities concerning a sentence and its parse-tree(s) which can be useful in disambiguation [JM09].


Algorithm 1 PCFG's Formal Definition [Tha17]

G = (T, N, S, R, P)
T is a set of terminal symbols
N is a set of nonterminal symbols
S is the start symbol (S ∈ N)
R is a set of rules/productions of the form X → γ
P is the probability function, P : R → [0, 1]
∀ X ∈ N, ∑_{X → γ ∈ R} P(X → γ) = 1

The probability of an ambiguous sentence is the sum of the probabilities of all the parse trees for that sentence [JM09]. The PST with the highest PCFG score has the best probability of being the correct one.
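A small, hypothetical PCFG and a classic ambiguous sentence illustrate this idea below, using NLTK's ViterbiParser, which returns the parse tree with the highest probability. The grammar, its probability values and the sentence are invented for illustration only.

from nltk import PCFG
from nltk.parse import ViterbiParser

# toy grammar: the rule probabilities for each nonterminal sum to 1
grammar = PCFG.fromstring("""
S -> NP VP [1.0]
NP -> Det N [0.5] | Det N PP [0.3] | 'I' [0.2]
VP -> V NP [0.7] | V NP PP [0.3]
PP -> P NP [1.0]
Det -> 'the' [0.6] | 'a' [0.4]
N -> 'man' [0.5] | 'telescope' [0.5]
V -> 'saw' [1.0]
P -> 'with' [1.0]
""")

parser = ViterbiParser(grammar)
tokens = "I saw the man with the telescope".split()

# the sentence is ambiguous (who has the telescope?); the Viterbi
# algorithm keeps only the most probable Phrase Structure Tree
for tree in parser.parse(tokens):
    print(tree.prob())
    tree.pretty_print()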

2.5 Conclusion

Applying software engineering to a project is paramount in order to get high quality software within schedule and budget. The focus of this investigation is on one of the four main activities of software engineering: Requirements Engineering.

Following Requirements Engineering processes increases the chances that user expectations are met and the software is of high quality. Mistakes made during this stage can be harmful in the future if they are not found in time and corrected.

Requirements Engineering also has four fundamental sub-activities. This research is based on the Requirements Validation stage.

Requirements documents shall be subject to validation and verification processes. These are important to make sure errors are corrected so there is less probability of having to do extensive and costly rework later in the project development. One of the most used techniques to validate requirements is the Requirements Review.

As requirements are usually written in Natural Language and due to its ambiguous nature, it is not an easy task to evaluate their quality, so it is important that requirements are written in a specific way. That is where international standards and guidelines come to aid.

Software Quality Assurance is one of the sub-categories of Software Quality Management which is responsible for providing independent assurance that the products and processes comply with the predefined plans and that non-conformances are solved.

Verification and Validation (V&V) methods such as Reviews and Inspections are some of the various processes of Quality Assurance. Besides being one of the most common means of validation, reviews are a time consuming and very tedious task.

As most Requirements Documents are written using Natural Language, in order to be able to partially automate the process of reviewing requirements, it is necessary to process those requirements using Natural Language Processing (NLP) techniques.


The approach targeted during this research is the statistical approach. This approach computes the probability of each interpretation and chooses the most probable one. The most common type of representation is a phrase structure and the simplest statistical parsing model is the Probabilistic Context-Free Grammar (PCFG).


Chapter 3

State of the Art

3.1 NLP Toolkits and Libraries

In this section, some of the existing NLP toolkits and libraries will be described, namely their main functionalities as well as their structure.

3.1.1 Stanford CoreNLP Toolkit

The Stanford CoreNLP Toolkit is an NLP toolkit developed by The Stanford Natural Language Processing Group. It offers Java-based modules for the solution of a range of basic NLP tasks as well as the means to extend its functionalities with new ones.

CoreNLP is an annotation pipeline framework which provides most of the common core NLP processes. Some of those are Tokenization, Sentence Splitting, Part-of-Speech tagging, Lemmatization and Syntactic Analysis based on a probabilistic parser.

The overall system architecture of CoreNLP, as illustrated in Fig. 3.1, consists of the input of raw text into an Annotation object; then a sequence of Annotators add information in the analysis pipeline, resulting in an Annotation object containing all the analysis information added by the Annotators [MSB+14]. This result can be in XML format or plain text.

The annotators provided in Stanford CoreNLP work with any character encoding, the default being the UTF-8 encoding. While most users employ the annotators already present in the toolkit, it is also possible to add additional custom annotators to the system [MSB+14].

Stanford CoreNLP can be used directly from the command-line, via its original Java API, via the object-oriented simple API, via third party APIs for most major modern programming languages like JavaScript or Python or via a web service.
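As a sketch of the web service option, the snippet below sends raw text to a locally running CoreNLP server and prints the tokens and PoS tags. The server start command, port and example sentence are assumptions taken from typical CoreNLP setups, not from this thesis.

import requests

# assumes a server started locally, e.g.:
#   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
text = "The system shall notify the user. It should also log the event."
props = '{"annotators": "tokenize,ssplit,pos", "outputFormat": "json"}'

resp = requests.post("http://localhost:9000/",
                     params={"properties": props},
                     data=text.encode("utf-8"))
annotation = resp.json()

for sentence in annotation["sentences"]:
    print([(tok["word"], tok["pos"]) for tok in sentence["tokens"]])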


Figure 3.1: CoreNLP system architecture [MSB+14]

3.1.2 The Natural Language Tool Kit

The Natural Language Toolkit (NLTK) is a suite of program modules, data sets, tutorials and exercises, covering symbolic and statistical natural language processing [BL]. It is written in Python and is distributed under the GPL open source license.

The NLTK provides simple interfaces to over 50 corpora and lexical resources alongside a series of libraries for classification, tokenization, stemming, tagging, parsing and semantic reasoning [NLT18].

The aim of the NLTK is to assist research and teaching in NLP areas such as empirical linguistics, cognitive science, artificial intelligence, information retrieval and machine learning [BL].

3.1.3 Apache OpenNLP

The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language [Apa].

The Apache OpenNLP toolkit consists of several components making it possible to build a full natural language processing pipeline, namely a sentence detector, a tokenizer, a name finder, a document categoriser, a PoS tagger, a chunker and a parser.

Components contain parts which enable one to execute the respective natural language processing task, to train a model and often also to evaluate a model. Each of these facilities is accessible via its application program interface (API). In addition, a command line interface (CLI) is provided for convenience of experiments and training [Apa].


3.1.4 IBM Watson

IBM Watson is a powerful tool that uses Artificial Intelligence techniques applied to business. One of those techniques is the Natural Language Understanding service.

This service analyses text to extract meta-data from content such as concepts, entities, keywords, categories, sentiment, emotion, relations and semantic roles, using natural language understanding []. Natural Language Understanding uses NLP to analyse semantic features of any text.

Provided with plain text, HTML or a public URL, Natural Language Understanding returns results for the features specified. The service cleans HTML before analysis by default, which removes most advertisements and other unwanted content.

It is possible to interact with the Natural Language Understanding service by means of a REST API.
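A minimal sketch of such a REST call is shown below. The service URL, API key, version date and requested features are placeholders for an IBM Cloud Natural Language Understanding instance and are not taken from this thesis.

import requests

NLU_URL = "https://api.eu-de.natural-language-understanding.watson.cloud.ibm.com/instances/<instance-id>"
API_KEY = "<api-key>"  # placeholder credentials

payload = {
    "text": "The system shall encrypt all stored user data.",
    "features": {"keywords": {}, "entities": {}, "semantic_roles": {}},
}

# the service is versioned through a date query parameter
resp = requests.post(f"{NLU_URL}/v1/analyze",
                     params={"version": "2021-08-01"},
                     json=payload,
                     auth=("apikey", API_KEY))
print(resp.json())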

3.1.5 spaCy

spaCy is a library for advanced Natural Language Processing in Python and Cython. It comes with pre-trained statistical models and word vectors and currently supports tokenization for more than twenty languages. It features the fastest syntactic parser in the world, convolutional neural network models for tagging, parsing and named entity recognition and easy deep learning integration [spab]. It is commercial open-source software, released under the MIT license.

Unlike NLTK, which is widely used for teaching and research, spaCy focuses on providing software for production usage.

spaCy provides the means for the usual NLP techniques such as tokenization, lemmatisation, PoS tagging, entity recognition, dependency parsing, sentence recognition, word-to-vector transformations and many methods for cleaning and normalising text [spaa].

While some of spaCy’s features work independently, others require statistical models to be loaded, which enable spaCy to predict linguistic annotations – for example, whether a word is a verb or a noun. spaCy currently offers statistical models for eight languages, which can be installed as individual Python modules [spab].

When the NLP model is loaded and then called on a text, spaCy first tokenizes the text to produce a Doc object. The Doc is then processed in several different steps – this is also referred to as the processing pipeline. The pipeline used by the default models consists of a tagger, a parser and an entity recogniser. Each pipeline component returns the processed Doc, which is then passed on to the next component as it can be seen in Fig.3.2.

The architecture of spaCy has essentially two central structures: Doc and Vocab [spab]. As shown in Fig. 3.3, the Doc object owns the sequence of tokens and all their annotations and the Vocab object owns a set of look-up tables [spab]. The Doc is made by the Tokenizer and then modified by the components of the pipeline. The Language object takes raw text, sends it through the pipeline and returns an annotated document.
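The pipeline described above can be inspected directly; the short sketch below loads a pre-trained model, lists its components and reads the annotations stored on the resulting Doc. The model name and example text are assumptions.

import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the model is installed
print(nlp.pipe_names)               # e.g. tagger, parser and ner components

doc = nlp("The controller shall log every error. The analyst reviews the requirements.")

# the Doc object carries the annotations added by each pipeline component
for sent in doc.sents:                               # sentence boundaries from the parser
    print([(token.text, token.pos_) for token in sent])
print([(ent.text, ent.label_) for ent in doc.ents])  # named entities from the 'ner' component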


Figure 3.2: spaCy’s pipeline [spab]

Figure 3.3: spaCy’s Architecture Diagram [spab]


Figure 3.4: RAT’s Analysis Overview [VK08]

3.2 Tools for Requirements Analysis

In this section, a few existing tools for requirements analysis and their respective major functionalities will be described. These tools present a similar goal to the one of this project but are not a viable solution, either for adaptability or privacy issues.

3.2.1 Requirements Analysis Tool

The Requirements Analysis Tool (RAT) performs a range of analyses on requirements documents based on industry best practices while allowing the user to write the documents in a standardised syntax using Natural Language [VK08].

This tool carries out a syntactic analysis with the help of a set of glossaries that identifies syntactic components and flags problematic phrases. An experimental version of RAT uses domain ontologies and structured content extracted from the requirements documents during the syntactic analysis in order to perform a semantic analysis [VK08].

As shown in Fig. 3.4, RAT's approach starts with the syntactic analysis of the requirements document. Then, it extracts structured content from the document about each requirement, which is used for phrasal and semantic analysis [VK08].

RAT supports a set of controlled syntaxes for writing requirements, which include:

• Standard Requirements Syntax
• Conditional Requirements Syntax
• Business Rules Syntax

Three types of user glossaries are used to parse requirements documents: agent glossary, action glossary and modal word glossary [VK08]. It implements a deterministic finite automata based approach to parse the requirements, extract structured content and generate error messages.


Figure 3.5: QuARS High-level Architecture Scheme [Lam05]

RAT has been created as an extension to Microsoft Office and can be installed as a plugin for Word and Excel. Office libraries in .NET are used to access the textual requirements and analyse them [VKV14].

3.2.2 QuARS

The Quality Analyser for Requirements Specifications (QuARS) is a research tool that aids in the creation of quality requirements by allowing the user to perform an initial parsing of requirements, automatically detecting potential linguistic defects that can cause ambiguity problems at later stages of the software product development [Lam05]. QuARS also provides support for the consistency and completeness analysis of the requirements [Lam05].

The high-level architectural design of the QuARS tool can be seen in Fig. 3.5.

The input is a requirements document in plain text. This file is passed on to the syntax parser which then produces a new file containing the parsed version of its sentences. This tool relies on a set of indicator-related dictionaries which must also be in plain text [QuAa]. The outputs of the tool include log files with the indications of the sentences containing defects and the calculation of metrics about the defect rates of the analysed document [Lam05].

3.2.3 Automated Requirements Measurement

NASA’s Software Assurance Technology Centre (SATC) developed a tool around the late 90s that automatically analyses a requirements document and produces a detailed quality report [CL14]. This report is based on a statistical analysis of word frequencies at many structural levels of the document.


Figure 3.6: Overall Smell Detection Process of Qualicen Scout [FFJ+14]

The tool is called Automated Requirements Measurement (ARM). It browses a requirements document searching for exact key words and phrases that could impact the quality of the requirements [McC01].

The SATC later developed eSMART, which is an update of ARM, that included additional features, such as a better user interface, the possibility to read the input specifications from a Microsoft Word formatted document and the ability to specify custom word lists for the quality indicators used in the analysis process [CL14].

Unfortunately, the work on the ARM tool stopped in 2011, although some versions of the tool are still used by various organisations.

3.2.4 Qualicen Scout

Qualicen Scout is a real-time quality analysis and visualisation tool for requirements and tests written in natural language [quab]. This tool features several different analyses to identify quality issues in the requirements, also named requirement smells. The difference between a requirement smell and a defect is that a smell is only an indication of a possible quality defect [Are16].

Scout detects long and complicated sentences, usage of passive voice, multiple negations, vague phrases and pronouns, comparatives and superlatives, usage of a slash and duplicate requirements; if required, Qualicen Scout can also perform structural analysis [quab].

The tool provides users with warning messages and a short description whenever a smell is detected. The detection of some smells uses NLP techniques such as morphological analysis and PoS tagging. The process of detecting requirement smells consists of four steps (Fig. 3.6): Parsing, Annotation, Smell Identification and Presentation.

Qualicen Scout integrates with some current tools like Qualicen PTC Integrity LM, Microsoft TFS, Visual Studio Team Services, GIT and SVN [quab].

3.2.5 QVscribe

QVscribe is a requirements analysis tool created by QRA Corp [QRA]. It combines natural language processing (NLP) with an expert system. QVscribe analyses requirements and alerts the author to ambiguous, overly complex and essentially malformed engineering requirements [QVs].

Figure 3.7: Example of QVscribe in Microsoft Word [QRA]

QVscribe performs syntactic and semantic analysis of requirements for quality and consistency.

Once QVscribe is installed, it appears as a toolbar icon. It is used as an add-in tool for Microsoft Word, Visure Requirements and Marinvent's Synthesis [QRA].

In order to start using QVscribe, first it is necessary to specify the location of the requirements in a document, then specify the acceptable terminology to be used in the requirements and identify specific parts of the requirements that should not be analysed [QRA]. After these steps, QVscribe can auto-find the requirements present in the document.

The analysis provided by QVscribe is based on eight quality measures: Imperatives, Negative Imperatives, Options, Weaknesses, Vagueness, Subjectiveness, Continuances and Directives. The analysis can also be customised to match a company's best practices for requirements documentation.

The results of the analysis are displayed in a simple, interactive scorecard which indicates the quality of each requirement by a number of stars (one to five); an example of QVscribe in Microsoft Word can be seen in Fig. 3.7.

Clicking on a requirement listed in the scorecard takes the engineer to the requirement in the document. There, the engineer can see highlights showing the quality indicators that triggered the given score for the requirement [QRA].

3.3 Conclusion

A tool that performs static analysis on requirements is a great help in assuring the quality of requirements documents.

Libraries and toolkits such as Stanford CoreNLP, the Natural Language Toolkit (NLTK), Apache OpenNLP and spaCy are paramount when developing a tool that analyses requirements documents. Most of them perform standard NLP techniques, such as tokenization, PoS tagging and sentence splitting, which are of great help when it comes to detecting possible defects in requirements.
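As an illustration of how such a toolkit can be used from JavaScript, the sketch below posts a piece of text to a Stanford CoreNLP server assumed to be running locally on its default port (9000) and reads back sentences, tokens and PoS tags; this is only one possible setup, not necessarily the one adopted by RANT.

```javascript
// Minimal sketch: ask a locally running Stanford CoreNLP server (default
// port 9000 assumed) for sentence splitting, tokenization and PoS tagging.
const CORENLP_URL =
  "http://localhost:9000/?properties=" +
  encodeURIComponent(JSON.stringify({
    annotators: "tokenize,ssplit,pos",
    outputFormat: "json"
  }));

async function annotate(text) {
  const response = await fetch(CORENLP_URL, { method: "POST", body: text });
  const result = await response.json();
  // Each sentence carries its tokens; each token carries its PoS tag.
  return result.sentences.map(sentence =>
    sentence.tokens.map(token => token.word + "/" + token.pos)
  );
}

annotate("The system shall store the user profile. It must respond within 2 s.")
  .then(tagged => console.log(tagged));
```

This assumes an environment where the Fetch API is available (a browser or Node.js 18 or later).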

With that in mind, several companies have come up with solutions that are able to provide that needed assistance. Although the offer of requirements analysis tools on the market is scarce, there are some which can be of help to this research. Some examples of such tools are the Requirements Analysis Tool (RAT), the Quality Analyser for Requirements Specifications (QuARS), Automated Requirements Measurement (ARM), Qualicen Scout and QVscribe.

Despite being useful for this investigation, they are not a viable solution for this particular case. Adaptability is an issue in the case of RAT, Qualicen Scout and QVscribe: they are available as extensions to platforms such as Microsoft Word and Marinvent's Synthesis and cannot be integrated in the intended environment - CLM.

Accessibility is also an obstacle, specifically in the case of ARM and QuARS, because these tools cannot be used, either because they are no longer available (ARM) or because they were created as academic research tools (QuARS). In addition, some of these tools demand that the requirements be written in a predetermined structure rather than the plain Natural Language text expected in this investigation.

However, the analysis provided by these tools is very detailed and advanced, offering a deeper evaluation of the requirements documents. That leaves the reasons mentioned before as the only setbacks of using them in this specific case. Were their vendors to create an extension for IBM's CLM and make some adjustments to the input type, they could become viable solutions.


Chapter 4

Proposed Solution - RANT

Analysing requirements is a complex task that requires a considerable amount of time and resources.

For that reason, it was critical to create a solution that not only helps the reviewers during that process but is also integrated in the requirements management tool already in use.

In this chapter, the challenges faced during this investigation are carefully outlined, alongside the approach taken to address those challenges.

The tool developed for this project is an extension of IBM's Collaborative Lifecycle Management (CLM) tool that is used in the Requirements Management (RM) environment. A comprehensive description of this tool regarding its architecture and features can also be found in this chapter.

4.1 Challenges and Approach

The quality of requirements in textual specification documents has a significant impact on the final product [VAD09]. As a matter of fact, various studies have shown that fixing engineering errors in software projects can become exponentially costly over the project life cycle [HDS+04]. Furthermore, it is estimated that more than half of those errors originate in the requirements [RR]. Hence, it is imperative that errors are found and corrected during the requirements stage, exactly when they occur, and not later in the project's life cycle.

Defects in the requirements, such as ambiguity, lack of readability or inconsistency, can lead to delays in launching a product, delays which could be avoided by simply finding and correcting issues in the requirements in due time.

The task of finding errors in the requirements has been handed to software engineers and analysts, who manually go through the requirements one by one trying to detect possible mistakes. However, since this task is tedious and time-consuming, some errors can slip by.


Considering that the market is as challenging as ever, it is paramount that time is not lost, especially on analysing and correcting requirements when that time could be spent developing software and launching a product.

As nearly 90% of requirements documents are written in Natural Language, the existing static analysis tools are not fit for analysing them [SARW12].

Most of the tools available on the market are not meant for requirements analysis; they are code syntax checkers, debuggers or static analysis tools designed to find errors in the software, not in the requirements.

Nevertheless, there are some tools, as mentioned in Chapter 3, that are suitable for requirements analysis but either require them to be written in a specific form, are not integrated in the tool that is being used, are not freely available or are not ready for use in a non-academic environment.

Consequently, the need for a partial automation of the requirements review process, together with the lack of applicable solutions, made it urgent to create a tool that meets the intended criteria.

What makes this work challenging is the Natural Language Processing (NLP), since most requirements are written in Natural Language because it is easily understood by all the people involved in the project. On the other hand, Natural Language can be ambiguous, inaccurate and inconsistent.

Another challenge regarding this project is related to the fact that it needs to be integrated in the current requirements management tool being used by Bosch Car Multimedia: IBM’s Collaborative Lifecycle Management (CLM).

The aim of this investigation is to develop a tool that aids reviewers during requirements reviews and is integrated in the software being used. The intention is to partially automate the process of analysing requirements in order to decrease the time spent as well as to increase the number of defects correctly identified in those reviews.

IBM’s CLM tool offers a section dedicated to the requirements elaboration, analysis and re-view processes called Requirements Management (RM). In order to be integrated in this environ-ment and being able to access the requireenviron-ments, an extension was the most reasonable solution.

Being a task that demands the fullest attention to detail, the requirements review shall be performed with the help not only of the tool but also of a checklist that was elaborated during this research.

Even though this tool, together with the checklist, provides insight into whether or not the requirements have issues, they are not supposed to be used on their own, but rather as a complement to the analysis of the developer.

Towards partially automating this process, it was first essential to comprehend which international quality criteria should be applied to the requirements. Then, a checklist was created to serve as a guideline for the tool itself. Only at that point did the development of the Requirements ANalysis Tool (RANT) begin, as an aid tool that makes use of NLP techniques and is integrated in the CLM application.


RANT aims at helping reviewers achieve a defect-free requirements document by analysing the selected requirements, checking whether they comply with their expected structure, detecting possibly ambiguous words and expressions present in each sentence, and alerting the user to those findings.
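A simplified sketch of this kind of per-requirement verdict is shown below; the modal-verb rule and the list of ambiguous terms are illustrative placeholders, not RANT's actual configuration.

```javascript
// Illustrative sketch of a per-requirement verdict: a structural check for a
// modal verb plus a scan for ambiguous terms. Lists and rules are examples
// only, not RANT's real configuration.
const AMBIGUOUS_TERMS = ["appropriate", "user-friendly", "fast", "etc.", "and/or"];

function analyseRequirement(id, primaryText) {
  const text = primaryText.toLowerCase();
  const hasModal = /\bshall\b/.test(text);               // expected structure
  const ambiguities = AMBIGUOUS_TERMS.filter(term => text.includes(term));
  return {
    id: id,
    structureOk: hasModal,
    ambiguousTerms: ambiguities,
    verdict: hasModal && ambiguities.length === 0 ? "OK" : "Review suggested"
  };
}

console.log(analyseRequirement("REQ-42", "The HMI shall provide an appropriate warning."));
// -> { id: 'REQ-42', structureOk: true, ambiguousTerms: ['appropriate'], verdict: 'Review suggested' }
```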

4.2 Checklist

There are various guidelines that help when writing good requirements, which can either be provided globally, such as international standards, or internally, as an agreement between a company and its clients [SAAA+].

A checklist was created with the main goal of being used together with RANT, to complement the review process performed by the analyst. It was elaborated at the beginning of this investigation so that it could serve as a guideline for the development of RANT.

In order to gather the needed items for this checklist, research was conducted regarding Bosch's internal quality criteria as well as international quality standards. The structure and the explanations given in the checklist were taken from [KP15].

Good requirements should all follow the same set of criteria [Cor]. With that in mind, the checklist that was elaborated not only contains a list of criteria but also a group of simple guidelines, accompanied by practical examples.

As can be seen in Fig. 4.1, there is a list of criteria that should be fulfilled:

• Completeness
• Correctness
• Clearness
• Consistency
• Feasibility
• Verifiability
• Traceability

Each one of these criteria shall be applied when writing and reviewing a requirement. To obtain a good requirement, it is essential to check its validity against each item of this list.

Furthermore, there is a list of items that shall be checked when reviewing the requirements, which helps achieve those quality criteria. An explanation of each quality criterion can be found at the end of the checklist, as depicted in Fig. 4.2.
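For illustration only, the criteria of the checklist could be encoded as a simple data structure so that RANT or a review script can iterate over them; the review questions below are paraphrased examples, not the exact wording of the checklist in Fig. 4.1.

```javascript
// Hypothetical encoding of the review checklist; the questions are paraphrased
// examples, not the checklist's exact wording.
const CHECKLIST = [
  { criterion: "Completeness",  question: "Does the requirement describe all relevant conditions and behaviour?" },
  { criterion: "Correctness",   question: "Does the requirement reflect the actual stakeholder need?" },
  { criterion: "Clearness",     question: "Can the requirement be read in only one way?" },
  { criterion: "Consistency",   question: "Does it avoid contradicting other requirements?" },
  { criterion: "Feasibility",   question: "Can it be implemented within the known constraints?" },
  { criterion: "Verifiability", question: "Can a test or inspection show that it is satisfied?" },
  { criterion: "Traceability",  question: "Is it uniquely identified and linked to its source?" }
];

CHECKLIST.forEach(item => console.log(item.criterion + ": " + item.question));
```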

Figure 4.1: Checklist for Requirements Review


4.3 Tool Architecture

RANT has a relatively straightforward architecture, as shown in the architecture diagram in Fig. 4.3.

Given that one of the main goals of this project was to create a tool that is integrated with CLM, RANT has a structure that is based on a standard template for CLM extensions supported by IBM.

The extensions that are developed for CLM must follow a specific structure, particularly regarding RANT's front-end environment - RANT_FE:

• An XML file
• A JavaScript file
• A CSS file

The XML file is used when adding RANT to the CLM dashboard. It contains all the necessary information to run the tool, namely the JavaScript and CSS files used to build RANT.

Access to the data stored in the CLM tool, specifically in the RM environment, is only possible through the RM API.

The RM API is a service-centric mechanism for accessing and modifying data in the system and for interacting with mechanisms in the local client, such as showing user interface elements [RM].

Generally, the services are accessed via the RM object and its child objects, all of which are provided as part of the RM feature that is loaded into each extension.

The JavaScript file contains all the functions necessary to perform calls to the RM API. These functions allow access to the requirements and their respective information.

Some examples are the RM.Event.subscribe function, which subscribes RANT to an event so that, until the extension is closed or unsubscribes from the event, the given callback is invoked whenever the event occurs, and the RM.ArtifactAttributes function, which associates a set of attribute values with an artefact; other examples, along with their descriptions, are available in [RM].
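The sketch below shows how these calls can fit together in RANT_FE. Apart from RM.Event.subscribe, the names used here (RM.Event.ARTIFACT_SELECTED, RM.Data.getAttributes, RM.Data.Attributes.IDENTIFIER, RM.Data.Attributes.PRIMARY_TEXT and RM.OperationResult) are taken from the publicly documented RM extension API and should be read as assumptions rather than as a transcript of RANT's code.

```javascript
// Sketch: react to a selection and read the attributes RANT needs.
// Names other than RM.Event.subscribe are assumptions based on the public
// RM extension API, not taken from this document.
RM.Event.subscribe(RM.Event.ARTIFACT_SELECTED, function (selectedArtifacts) {
  if (!selectedArtifacts || selectedArtifacts.length === 0) {
    return; // nothing selected, nothing to analyse
  }
  RM.Data.getAttributes(
    selectedArtifacts,
    [RM.Data.Attributes.IDENTIFIER, RM.Data.Attributes.PRIMARY_TEXT],
    function (result) {
      if (result.code !== RM.OperationResult.OPERATION_OK) {
        console.error("Could not read the requirement attributes");
        return;
      }
      result.data.forEach(function (artifact) {
        var id = artifact.values[RM.Data.Attributes.IDENTIFIER];
        var text = artifact.values[RM.Data.Attributes.PRIMARY_TEXT];
        // id and text are the data forwarded to RANT_BE (see the sketch below).
      });
    }
  );
});
```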

Besides that, the requests made to RANT's back-end structure are also contained in that single JavaScript file.

In order to know when a requirement or a group of requirements is selected, the JavaScript file subscribes to the notifications issued by the RM API when that action occurs.

When a requirement is selected, it is possible to gather data about the requirement such as the ID, the Primary Text, who created it, who changed it and when, its priority, its status, among other details.

From all that information, only the data relevant to the analysis of the requirement is selected, namely the ID and the Primary Text, which are then sent to RANT's back-end structure - RANT_BE.
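A minimal sketch of that hand-over is shown below; the back-end URL and the JSON payload format are assumptions made for illustration, since neither is specified here.

```javascript
// Minimal sketch of forwarding a selected requirement to RANT_BE.
// The endpoint URL is hypothetical; the payload format is assumed.
function sendToBackend(id, primaryText) {
  var request = new XMLHttpRequest();
  request.open("POST", "https://rant-backend.example.com/analyse");
  request.setRequestHeader("Content-Type", "application/json");
  request.onload = function () {
    var analysis = JSON.parse(request.responseText);
    // e.g. render the structure/ambiguity verdict next to the requirement
    console.log("Analysis for " + id + ":", analysis);
  };
  request.send(JSON.stringify({ id: id, primaryText: primaryText }));
}
```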
