Model checking requirements written in a controlled natural language


By

SÉRGIO BARZA

M.Sc. Dissertation

Federal University of Pernambuco <www.cin.ufpe.br/~posgraduacao>

RECIFE 2016


MODEL CHECKING REQUIREMENTS WRITTEN IN A

CONTROLLED NATURAL LANGUAGE

A M.Sc. Dissertation presented to the Center for Informatics of Federal University of Pernambuco in partial fulfillment of the requirements for the degree of Master of Science in Computer Science.

Advisor: Juliano Manabu Iyoda

RECIFE 2016


Cataloguing at source

Librarian Monick Raquel Silvestre da S. Portes, CRB4-1217

B296m Barza, Sérgio

Model checking requirements written in a controlled natural language / Sérgio Barza. – 2016.

104 leaves: ill., fig., tab.

Advisor: Juliano Manabu Iyoda.

Dissertation (Master's) – Universidade Federal de Pernambuco. CIn, Computer Science, Recife, 2016.

Includes references.

1. Software engineering. 2. Formal methods. 3. Systems verification. I. Iyoda, Juliano Manabu (advisor). II. Title.

005.1 CDD (23. ed.) UFPE- MEI 2016-098


Sérgio Barza

Model Checking of Requirements Written in Controlled Natural Language

Master's dissertation presented to the Graduate Program in Computer Science of the Federal University of Pernambuco, in partial fulfillment of the requirements for the degree of Master of Science in Computer Science.

Approved on: 25/02/2016.

EXAMINING COMMITTEE

__________________________________________
Prof. Dr. Augusto Cézar Alves Sampaio
Centro de Informática / UFPE

__________________________________________
Prof. Dr. Adalberto Cajueiro de Faria
Departamento de Sistemas e Computação / UFCG

__________________________________________
Prof. Dr. Juliano Manabu Iyoda
Centro de Informática / UFPE


I would really like to thank God and some important people who supported me in this work. I have no doubt that without them this dissertation would not have been concluded, or even started.

First of all, I would like to thank my advisor, Juliano Manabu Iyoda, for all the dedication, advice, knowledge, enthusiasm and guidance that he has given me. Whenever I needed his help, he was there to guide me and show me the best way to solve problems. I am very grateful to have had him as my advisor.

I would also like to thank Gustavo Henrique Porto de Carvalho, who contributed by giving us all the necessary materials related to the NAT2TEST strategy.

Last but not least, I would like to thank all my family and friends, especially my mother, Belarmina Maria, for their friendship, patience, and assistance throughout my life.


In the context of software engineering and process models, the first activity in developing a system is requirements analysis. In this phase, Requirements Engineers identify the real needs of their customers through techniques such as interviews, questionnaires and brainstorming. One of the output artefacts of this step is the Requirements Document, in which all the features and behaviours that the system should have are written, generally in natural language; it gives developers the information needed to build the software and to elaborate its tests. Since natural languages are prone to ambiguity, the interpretation given to the requirements can vary depending on who reads them. This is undesirable because errors caused by misinterpreting requirements can spread, leading to late discovery and a costly solution. One way to combine the readability of natural language with the elimination of its ambiguity is to adopt a Controlled Natural Language (CNL). Because a CNL has a concise grammatical structure, it is possible to extract semantic information, called Case Frames, by using some theories. This semantic information can then be used to map the CNL to a formal language. Several tools implemented in academia have as their main objective the generation of test cases from software requirements described in a CNL. One in particular, called NAT2TEST, does so by generating Case Frames as one of its intermediate products.

This work aims to automatically translate Case Frames into models of the NuSMV model checker. Through model checking, it is feasible to verify whether the requirements, which are contained in the Requirements Document and described in a CNL, satisfy a set of properties written in Temporal Logic. This approach can prevent possible errors in the requirements: once it is detected that the generated model does not satisfy the properties to be verified, error propagation is avoided, which reduces software costs. In addition, this work also aims to create a grammar for a CNL for specifying properties, together with its automatic translation to the CTL Temporal Logic, which is one of the property languages accepted by NuSMV. The purpose of this CNL is to allow the user to specify the properties to be verified, freeing the user from some of the formalisation required by CTL and hiding some technical details that are internal to each NuSMV model. In short, system requirements must be written in the CNL accepted by the NAT2TEST tool in order to generate NuSMV models, and the properties to be verified in these models are written in the CNL developed in this work in order to be translated to CTL properties. NuSMV is then able to perform model checking using this information. We illustrate the strategies developed in this work through a case study.

Keywords: Controlled Natural Language. Case Frames. Model Checker. Model Checking. NuSMV. Temporal Logic. CTL.


In the context of Software Engineering and process models, the first step in developing a system is requirements elicitation and analysis. In this phase, Requirements Engineers identify their customers' needs through techniques such as interviews, questionnaires and brainstorming. One of the output artefacts of this step is the Requirements Document, which records, generally in natural language, all the functionality the system must have, giving developers the information needed to build the software and to elaborate its tests. Since natural languages are prone to ambiguity, the interpretation given to the requirements may vary with the reader, which is undesirable because errors caused by misinterpretation can propagate and lead to late and costly discovery. One way to combine the readability of natural language with the elimination of its ambiguity is to adopt a Controlled Natural Language (CNL). Moreover, because a CNL has a concise grammatical structure, it is possible to extract, using some theories, semantic information, called Case Frames, which makes it possible to map the CNL to a formal language. Several implemented tools have as their main objective the generation of test cases from software requirements described in a CNL. One in particular, NAT2TEST, performs this task through Case Frames.

This work aims to automatically translate Case Frames into NuSMV models. Using model checking, it is possible to verify whether the requirements, once described in a CNL, satisfy a set of properties written in Temporal Logic. This approach can detect possible errors in the requirements: once it is detected that the generated model does not satisfy one or more properties, error propagation is avoided, which reduces software costs. Besides model generation, this work also aims to create a grammar for specifying a CNL, together with its automatic translation to the CTL Temporal Logic, which is used as one of the property languages of NuSMV. The purpose of this CNL is to allow the user to describe the properties of the model while hiding all the formalism of CTL, as well as some technical details that are internal to each model. In short, requirements must be written according to the CNL of NAT2TEST and are then translated to NuSMV models. The model properties to be verified are described according to the CNL developed in this work and are then translated to CTL. In this way, NuSMV is able to perform model checking using these translations. We illustrate the strategies developed in this work through a case study.

Keywords: Controlled Natural Language. Case Frames. Model Checker. Model Checking. NuSMV. Temporal Logic. CTL.


1.1 Proposed solution
2.1 The Sys-Req-CNL grammar used to describe System Requirements
2.2 Kripke structure representation corresponding to the semaphore example
2.3 Kripke structure representation corresponding to the Single Process Example
2.4 NuSMV Code corresponding to the Kripke structure of Figure 2.3
2.5 Unwound Kripke structure representation where AG φ holds at initial state s0
2.6 Unwound Kripke structure representation where EG φ holds at initial state s0
2.7 Unwound Kripke structure representation where AF φ holds at initial state s0
2.8 Unwound Kripke structure representation where EF φ holds at initial state s0
2.9 Unwound Kripke structure representation where AX φ holds at initial state s0
2.10 Unwound Kripke structure representation where EX φ holds at initial state s0
2.11 Unwound Kripke structure representation where A[φ U ψ] holds at initial state s0
2.12 Unwound Kripke structure representation where E[φ U ψ] holds at initial state s0
3.1 NAT2TEST and NuSMV Model Generator
3.2 Internal representation of TRANSMAP when tables 2.5–2.6 are used as input of Algorithm 1
3.3 Code snippet generated related to ASSIGN NuSMV section when TRANSMAP showed in Figure 3.2 is used as input of Algorithm 2
3.4 Mapping from system variables to NuSMV modules
3.5 Code snippet generated related to ASSIGN NuSMV section when TRANSMAP showed in Figure 3.2 is used as input of Algorithm 3
3.6 Entire NuSMV code corresponding to the requirements written in the Semaphore example
4.1 Input and output data concerning the CTL Translator
4.2 Extended Backus–Naur Form corresponding to the Natural-CTL grammar
4.3 Regular expression for recognizing identifiers
4.4 Regular expression for recognizing constants
4.5 Extended Backus–Naur Form corresponding to the Computation Tree Logic (CTL) grammar
5.1 VARVALS configuration
5.2 TRANSMAP tuple representation where the key is thesystemmode
5.3 TRANSMAP tuple representation where the key is therequesttimer
5.4 TRANSMAP tuple representation where the key is thecoffeecounter
5.8 Counter-example generated by the NuSMV model checker
5.9 Class Diagram referring to Translator from Natural Language to NuSMV Models
5.10 The NuSMV Model Generator
5.11 The NuSMV Model Generator when the file chooser is clicked
5.12 The NuSMV Model Generator when all initial information are filled
5.13 The NuSMV Model Generator when some extra data must be filled by user
5.14 The NuSMV Model Generator showing the final status
5.15 NuSMV model checker execution


2.1 Case Frame set corresponding to the Requirement 1 of Example 1
3.1 The function formatnumber() applied to Table 2.1
3.2 The function fconcatenate() applied to Table 2.5
3.3 The function fexpandtov1() applied to Table 2.7
3.4 Univocal correspondence between indexes and variables retrieved from the Semaphore example
3.5 Case Frame set after applying auxiliary functions in Table 2.5
5.7 Case Frame set after applying pre-processing functions in Table 5.1
5.13 Univocal correspondence between indexes and variables retrieved from the Coffee Vending Machine example


SMV Symbolic Model Verifier
BDD Binary Decision Diagram
LTL Linear-time Temporal Logic
CTL Computation Tree Logic
CNL Controlled Natural Language
CF Case Frame
TR Thematic Role
gcd greatest common divisor
SDT Syntax-Directed Translation
IDE Integrated Development Environment
GUI Graphical User Interface
JavaCC Java Compiler Compiler
ACT Action
AGT Agent
PAT Patient
TOV To Value
CAC Condition Action
CPT Condition Patient
CFV Condition From Value
CTV Condition To Value
CMD Condition Modifier
CCM Causal Component Model
SCR Software Cost Reduction


1 Introduction
  1.1 Dissertation Structure
2 Background
  2.1 NAT2TEST: From Controlled Natural Language to Test Cases
  2.2 The NuSMV Model Checker
    2.2.1 Describing Models: The NuSMV Description Language
    2.2.2 Specifying Properties: CTL Specifications
3 The NuSMV Model Generator: From Natural Language to NuSMV Models
  3.1 Defining Pre-processing Functions
    3.1.1 Formatting Functions
    3.1.2 Expansive Function
  3.2 Mapping Variables
  3.3 Inferring Types of Variables
    3.3.1 Special Typifications
    3.3.2 Special Values
  3.4 Building Transitions
    3.4.1 Auxiliary Functions to Construct Transitions
  3.5 NuSMV Code Generation - First Translation
  3.6 NuSMV Code Generation - Second Translation
4 The CTL Translator: From Natural Language to CTL Specifications
  4.1 The Natural-CTL grammar
  4.2 Implementation of a Syntax-Directed Translation
5 Case Study
  5.1 Example: The Coffee Vending Machine
  5.2 Translation: From Case Frames to NuSMV
    5.2.1 Application of Pre-Processing Functions
    5.2.2 Retrieve the Variables
    5.2.3 Typifying the Variables
    5.2.4 Code Generation
  5.3 Translation: From Natural Language to CTL Specifications
  5.4 Implementation
  Checking Specifications
  6.2 A Formal verification with natural language specifications: guidelines, experiments and lessons so far
  6.3 VARED: Verification and Analysis of Requirements and Early Designs
  6.4 Model Checking Complete Requirements Specifications Using Abstraction
  6.5 Supporting Requirements Validation: the EuRailCheck tool
  6.6 Model Checking RSML−e Requirements
  6.7 Feasibility of Model Checking Software Requirements: A Case Study
  6.8 Concluding Remarks
7 Conclusion
  7.1 The NuSMV Model Generator
  7.2 The CTL Translator
  7.3 Performing Model Checking
  7.4 Future Work


1

Introduction

Software requirements are mostly written in a natural language (BERRY, 2008), and the success of software development projects depends on the quality of these requirements, as they serve as inputs to the design, coding and testing phases (ORMANDJIEVA; HUSSAIN; KOSSEIM, 2007). However, one of the biggest problems in specifying system requirements using natural language is its ambiguity, which can have a critical impact on the quality of the resulting software (BELL; THAYER, 1976). In addition, errors found in early software phases are less expensive to fix than those found in subsequent phases (BOEHM; BROWN; LIPOW, 1976).

Formal methods are techniques that use formal notations in software development. One of the most popular techniques is model checking. Symbolic Model Checkers (BURCH et al., 1992) have been used as a powerful technique to verify the correctness of reactive systems (BIERE et al., 1999), as all states of these systems are mechanically and exhaustively verified. Formal methods like model checking are free from ambiguities, but their formal notations are harder to learn and understand than natural language.

One solution to get the best of both worlds is to use a Controlled Natural Language (CNL): a subset of a natural language that obeys a formal grammar (FUCHS, 2010). A CNL can describe requirements and be subsequently translated to different formal models. Once in the formal domain, we can run several theorem-proving systems or exhaustive analysis techniques such as model checking. In this dissertation, we propose and implement a translation of requirements written in the CNL developed by Carvalho et al. (2015) to the NuSMV (CIMATTI et al., 2000) model checker. In order to check properties concerning the requirements, we also propose and implement a translator to temporal logic formulas from statements written in a restricted subset of English. We chose NuSMV as it is one of the standard and well-known model checkers (HUTH; RYAN, 2004).

Requirements must be written in the CNL developed by Carvalho et al. (2015). Properties to be checked by the model checker are written in our CNL. Both are translated to NuSMV automatically. The scope of this work is shown in Figure 1.1 inside the dashed rectangle. Note that an external tool, NAT2TEST, is used to gather semantic.


and could be used as the starting point for our translation. DFRS reduces the semantic gap between a Case Frame (CF) and any formal model, including the NuSMV description language. Such a translation remains as future work.

Figure 1.1: Proposed solution is inside the dashed box.

[Figure 1.1 diagram: requirements written in CNL are processed by the NAT2TEST tool, which yields semantic information; the NuSMV Model Translator turns this information into a NuSMV model. Properties written in CNL are processed by the CTL Translator, which yields properties written in CTL. The NuSMV model and the CTL properties together form the NuSMV file.]

Source: Made by the author.

Some related works describe how to develop translators for some formal languages. The works reported in Aceituna, Do and Srinivasan (2014) and Cavada et al. (2009) describe translations to the NuSMV languages, including its description and specification languages, while Holt (1999) and Badger, Throop and Claunch (2014) aim to translate CNLs into temporal logic formulas. The works found in Bharadwaj and Heitmeyer (1999) and Sreemani and Atlee (1996) describe a methodology to translate the Software Cost Reduction (SCR) notation to NuSMV. Bharadwaj and Heitmeyer (1999) also describe how to translate it to Spin. The work done by Choi and Heimdahl (2002) proposes and implements a tool that translates from a specific formal notation to NuSMV models.

Our work is more complete than previous works in the sense that we provide solutions to automatically translate CNLs into both the model and the properties to be verified.

The development of this work has produced the specific contributions listed below.

• Design and mechanisation of a translation to NuSMV models from the Controlled Natural Language (CNL) proposed by Carvalho et al. (2015). In particular, the translator deals with constructions of the CNL that are not conventional for a model checker, like references to past states.

• Development of algorithms to generate NuSMV code from semantic meanings.

• A CNL for specifying system properties.

• Design and mechanisation of a translation to Computation Tree Logic (CTL) formulas from properties written in our CNL for specifying system properties.

• A case study that illustrates all constructions of this work. For that, we introduce the Coffee Vending Machine example, whose functionality is to prepare either strong or weak coffee according to the inputs provided by the user. The requirements that specify the behaviour of this machine are our starting point for the NuSMV model generator.

1.1 Dissertation Structure

The remainder of this work is structured in six chapters.

• Chapter 2 briefly introduces the NAT2TEST strategy and its goals. It also explains some important intermediate elements extracted from this tool that are used as inputs to the model translator. Regarding model checking, we present the NuSMV model checker as well as its description language. To end the chapter, we introduce the Computation Tree Logic (CTL) temporal logic, which is one of the specification languages accepted by the NuSMV model checker.

• Chapter 3 explains the NuSMV Model Generator, the tool responsible for translating Case Frames (CFs) into models in the NuSMV language. To achieve this goal, some functions and sets are defined in order to pre-process data, expand information and store values. An important theorem is introduced in order to create types for variables that are considered special. The code generation is conducted by two algorithms: one generates the code corresponding to the main NuSMV section, while the other generates code for the remaining elements.

• Chapter 4 presents Natural-CTL, a grammar developed in this work that enables the user to specify temporal properties in a restricted subset of English. This chapter also describes how the CTL Translator, the tool responsible for translating sentences written in Natural-CTL into CTL formulas, can be built using the Syntax-Directed Translation (SDT) approach. To this end, the CTL grammar accepted by the NuSMV model checker is presented, since it is the target of the CTL Translator.

• Chapter 5 illustrates our translators in a case study using the Coffee Vending Machine example.

• Chapter 6 presents the main related works concerning translations to formal languages, especially to NuSMV. For each study, comparisons are made with the translators developed in this work and with their approaches, pointing out differences and limitations among them.


2

Background

2.1 NAT2TEST: From Controlled Natural Language to Test Cases

NAT2TEST (NATural language requirements to TEST cases) (CARVALHO et al., 2015) is a tool developed in academia that generates test cases from requirements written in a CNL. Generally speaking, the tool is organised in four phases: Syntactic Analysis, Semantic Analysis, Formal Representation and Test Case Generation. The input of NAT2TEST is a set of requirements, representing a reactive system, written in a CNL created for this purpose, called Sys-Req-CNL, which aims to avoid textual ambiguity and complexity. As discussed in Chapter 1, system requirements are generally written in a natural language such as English. With NAT2TEST, the user simply adopts Sys-Req-CNL when writing requirements.

Concerning the lexicon, Sys-Req-CNL words are classified into lexical classes such as determiners, nouns and adjectives. All these lexical entries are stored in a dictionary that is consumed by the NAT2TEST tool. Regarding its grammar, the CNL is defined by a Context-Free Grammar. Carvalho et al. (2015) define its production rules in order to describe requirements in Sys-Req-CNL. Figure 2.1 shows the Extended Backus-Naur Form for this grammar (Figure 2.1 does not illustrate all grammar constructors of Sys-Req-CNL, since we are only interested in an overview of it; see Carvalho et al. (2015) for the complete and self-contained grammar). We can see that each requirement comprises two main productions: ConditionClause and ActionClause. The ConditionClause is a set of sentences that reflects the current configuration of the system, joined by the conjunctions and or or. The ActionClause, in turn, is built around the keyword shall, and its sentences are separated by commas. These sentences state the next system configuration, given the sentences of the ConditionClause.

In order to illustrate the Sys-Req-CNL lexicon and grammar, let us present the Semaphore example. A semaphore is a traffic light whose time interval for changing colours from green to red is seven seconds. The yellow light happens always between the green and the red colours.


Figure 2.1: The Sys-Req-CNL grammar used to describe System Requirements.

Requirement → ConditionalClause COMMA ActionClause;
ConditionalClause → CONJ AndCondition;
AndCondition → AndCondition COMMA AND OrCondition | OrCondition;
OrCondition → OrCondition OR Condition | Condition;
Condition → NounPhrase VerbPhraseCondition;
ActionClause → NounPhrase VerbPhraseAction;
NounPhrase → DET? ADJ∗ Noun+;
Noun → NSING | NPLUR;
VerbPhraseCondition → VerbCondition NOT? ComparativeTerm? VerbComplement;
VerbCondition → (VPRE3RD | VTOBE PRE3 | VTOBE PRE | VTOBE PAST3 | VTOBE PAST);
ComparativeTerm → (COMP (OR NOT? COMP)?);
VerbPhraseAction → SHALL (VerbAction VerbComplement |
    COLON VerbAction VerbComplement (COMMA VerbAction VerbComplement)+);
VerbAction → VBASE;
VerbComplement → VariableState? PrepositionalPhrase∗;
VariableState → (NounPhrase | ADV | ADJ | NUMBER);
PrepositionalPhrase → PREP VariableState;

Based on the current colour, the requirements specify if a car must go, pay attention, or stop as we can see stated below.

Example 1. The Semaphore

• Requirement 1: When the counter is 6.0 or the counter is lower than 3.0, the system shall assign green to the semaphore.

• Requirement 2: When the counter is greater than or equal to 3.0, and the counter is lower than 5.0, the system shall assign yellow to the semaphore.

• Requirement 3: When the counter is 5.0, the system shall assign red to the semaphore.

• Requirement 4: When the semaphore is green or the semaphore is red, the system shall assign go to the car.

• Requirement 5: When the semaphore changes from green to yellow, the system shall assign pay attention to the car.

• Requirement 6: When semaphore changes from yellow to yellow, the system shall assign stop to the car.

• Requirement 7: When the counter is 6.0, the system shall reset the counter.

• Requirement 8: When the counter is greater than or equal to 0.0, the system shall add 1.0 to the counter.
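As a rough illustration of the two-part structure of each requirement, the sketch below splits a requirement of Example 1 into its condition clause and its action clause with a regular expression. This is a simplification made only for illustration: it is not the CNL parser used by NAT2TEST, it assumes every requirement has the shape "When ..., the system shall ... .", and the names REQUIREMENT_PATTERN and split_requirement are hypothetical.

```python
import re

# Assumed shape of the requirements in Example 1:
#   "When <conditions>, the system shall <action>."
# This pattern is an illustrative shortcut, not the Sys-Req-CNL grammar.
REQUIREMENT_PATTERN = re.compile(
    r"^When (?P<conditions>.+), the system shall (?P<action>.+)\.$"
)


def split_requirement(text):
    """Return (condition clause, action clause) or raise ValueError."""
    match = REQUIREMENT_PATTERN.match(text)
    if match is None:
        raise ValueError("text does not have the assumed shape: " + text)
    return match.group("conditions"), match.group("action")


if __name__ == "__main__":
    requirement_2 = (
        "When the counter is greater than or equal to 3.0, "
        "and the counter is lower than 5.0, "
        "the system shall assign yellow to the semaphore."
    )
    conditions, action = split_requirement(requirement_2)
    print(conditions)
    print(action)
```

A real parser would follow the productions of Figure 2.1 and build a syntax tree; the split above only separates the part that describes the current configuration from the part that describes the next one.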


Once all the system requirements are written according to the Sys-Req-CNL grammar, the NAT2TEST tool is able to execute its first phase, i.e., the Syntactic Analysis. For this, a CNL parser was implemented to verify whether there are ambiguities and/or syntax errors. In addition, this parser generates, as its output, syntax trees that are used as input to the subsequent phase of NAT2TEST.

Based on the Case Grammar linguistic theory, the Semantic Analysis uses the syntax trees previously generated to represent semantic information using a structure with slots to be filled in by sentence elements. These structures are called Case Frames (CFs) and the slots are Thematic Roles (TRs). For each verb in a sentence, there are specific TRs that together compose a CF. In the NAT2TEST strategy, there are seven verbs that are used to describe requirements, and they are classified in two groups:

• Verbs used only in action statements: to add, to assign, to reset and to subtract. For this group, there are four TRs named Action (ACT), Agent (AGT), Patient (PAT) and To Value (TOV). Note that the verb to reset is only used with integer variables, since its meaning is to assign the value zero to a variable.

• Verbs used only in sentences associated with conditions: to be, to become, and to change. For this group, there are five TRs named Condition Action (CAC), Condition Patient (CPT), Condition From Value (CFV), Condition To Value (CTV) and Condition Modifier (CMD).

Observation. Not all TRs are required. It depends on the verb used in the statement.

Carvalho et al. (2014) define each Thematic Role (TR) as follows:

• Action statements TRs:

– Action (ACT): the action performed if the conditions are satisfied. Required for all verbs.

– Agent (AGT): entity who performs the action. Required for all verbs.

– Patient (PAT): entity who is affected by the action. Required for all verbs.

– To Value (TOV): the Patient value after action completion. Required for all verbs, except for the verb to reset, in which it is not used.

• Condition clause TRs:

– Condition Action (CAC): the action that concerns each condition. Required for all verbs.

– Condition Patient (CPT): the entity related to each condition. Required for all verbs.

– Condition From Value (CFV): the CPT previous value. Not used for the verbs to be and to become. Optional for the verb to change.

– Condition To Value (CTV): the value satisfying the condition. Required for all verbs.

– Condition Modifier (CMD): a modifier related to the condition. Optional for all verbs.
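The Thematic Roles listed above can be pictured as fields of a small record type. The following is a minimal sketch, assuming a representation chosen only for illustration: the class names, field names and validation rules are hypothetical and are not the data structures used by NAT2TEST or by the translator of this work. The instance at the end encodes Requirement 1 of Example 1.

```python
from dataclasses import dataclass
from typing import List, Optional

# Surface forms of "to change" accepted by this sketch (an assumption).
CHANGE_FORMS = {"change", "changes", "changed"}


@dataclass
class ConditionFrame:
    cac: str                   # Condition Action: the verb (required)
    cpt: str                   # Condition Patient (required)
    ctv: str                   # Condition To Value (required)
    cfv: Optional[str] = None  # Condition From Value (only with "to change")
    cmd: Optional[str] = None  # Condition Modifier (optional)

    def __post_init__(self):
        # CFV is not used with "to be" and "to become".
        if self.cfv is not None and self.cac not in CHANGE_FORMS:
            raise ValueError("CFV is only used with the verb 'to change'")


@dataclass
class ActionFrame:
    act: str                   # Action: the verb (required)
    agt: str                   # Agent (required)
    pat: str                   # Patient (required)
    tov: Optional[str] = None  # To Value (required unless the verb is reset)

    def __post_init__(self):
        if self.act == "reset":
            if self.tov is not None:
                raise ValueError("TOV is not used with the verb 'to reset'")
        elif self.tov is None:
            raise ValueError("TOV is required for the verb " + self.act)


@dataclass
class Requirement:
    # Outer list: conditions joined by "and".
    # Inner lists: alternatives joined by "or".
    conditions: List[List[ConditionFrame]]
    actions: List[ActionFrame]


# Requirement 1 of Example 1: "When the counter is 6.0 or the counter is
# lower than 3.0, the system shall assign green to the semaphore."
REQUIREMENT_1 = Requirement(
    conditions=[[
        ConditionFrame(cac="is", cpt="the counter", ctv="3.0", cmd="lower than"),
        ConditionFrame(cac="is", cpt="the counter", ctv="6.0"),
    ]],
    actions=[ActionFrame(act="assign", agt="the system",
                         pat="the semaphore", tov="green")],
)
```

Keeping the required roles as mandatory fields and the optional ones as None by default mirrors the observation that not all TRs are required for every verb.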

In this work, the translation from requirements to NuSMV models is achieved using the Case Frames (CFs) as input. This approach saves us from knowing the syntax of Sys-Req-CNL, since the TRs do not change their categories. Furthermore, if the grammar rules change, the translation to NuSMV will continue to be correct, since the translator does not handle syntactic elements but rather their semantic interpretation through CFs and, consequently, TRs. Tables 2.1–2.8 exhibit the CFs and their TRs for each requirement of Example 1.

Table 2.1: Case Frame set corresponding to the Requirement 1 of Example 1

Condition #1 - Main Verb (CAC): is
  CPT: the counter        CFV: -
  CMD: lower than         CTV: 3.0
OR - Main Verb (CAC): is
  CPT: the counter        CFV: -
  CMD: -                  CTV: 6.0
Action - Main Verb (ACT): assign
  AGT: the system         TOV: green
  PAT: the semaphore

Table 2.1 shows the output of the NAT2TEST Semantic Analysis for the first Requirement. We can see that the verbs, which are placed in the CAC and ACT TRs, are is and assign, respectively. Note that the system acts on the semaphore, assigning the value green to it.

An important observation about the TRs of action statements is that the value to be assigned to the PAT, located in TOV, only takes effect in the next state. This means that the CFs related to conditions describe the scenario of the current state, whereas the CFs related to actions describe the value of the PAT in the next state.
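The current-state and next-state reading described above can be sketched with a small transition function. The code below hand-encodes Requirements 1 to 3 of Example 1 only; it is an illustrative sketch, not the translation developed in this work. The function name step, the dictionary representation of a state, and the choice that a variable with no applicable action keeps its current value are all assumptions made for the example, and the counter is treated as given.

```python
def step(state):
    """Compute the next state from the current one (Requirements 1-3 only).

    Conditions are evaluated on the current state; the assigned value
    becomes visible only in the returned next state.
    """
    counter = state["counter"]
    # Assumption: variables that receive no assignment keep their value.
    next_state = dict(state)
    if counter == 6.0 or counter < 3.0:      # Requirement 1
        next_state["semaphore"] = "green"
    if 3.0 <= counter < 5.0:                 # Requirement 2
        next_state["semaphore"] = "yellow"
    if counter == 5.0:                       # Requirement 3
        next_state["semaphore"] = "red"
    return next_state


if __name__ == "__main__":
    current = {"counter": 5.0, "semaphore": "yellow"}
    following = step(current)
    # The current state is untouched; the effect appears in the next state.
    print(current["semaphore"], "->", following["semaphore"])
```

In the usage example, the condition of Requirement 3 holds in the current state, where the semaphore is still yellow, and the value red is observed only in the state returned by step.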

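Table 2.2: Case Frame set corresponding to the Requirement 2 of Example 1

Condition #1 - Main Verb (CAC): is
  CPT: the counter                 CFV: -
  CMD: lower than                  CTV: 5.0
Condition #2 - Main Verb (CAC): is
  CPT: the counter                 CFV: -
  CMD: greater than or equal to    CTV: 3.0
Action - Main Verb (ACT): assign
  AGT: the system                  TOV: yellow
  PAT: the semaphore

Table 2.3: Case Frame set corresponding to the Requirement 3 of Example 1

Condition #1 - Main Verb (CAC): is
  CPT: the counter        CFV: -
  CMD: -                  CTV: 5.0
Action - Main Verb (ACT): assign
  AGT: the system         TOV: red
  PAT: the semaphore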


Table 2.2 shows the output of the NAT2TEST Semantic Analysis for the second Requirement. As we can see, the verbs are the same as in the previous table.

Table 2.3 shows that the main verbs of the third Requirement are is and assign, which are related to the semaphore and the system. The latter is regarded as the agent, while the former is the patient.

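Table 2.4: Case Frame set corresponding to the Requirement 4 of Example 1

Condition #1 - Main Verb (CAC): is
  CPT: the semaphore      CFV: -
  CMD: -                  CTV: red
OR - Main Verb (CAC): is
  CPT: the semaphore      CFV: -
  CMD: -                  CTV: green
Action - Main Verb (ACT): assign
  AGT: the system         TOV: go
  PAT: the car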

Table 2.4 shows that the main verbs of the fourth Requirement are is and assign. The same observation made for Table 2.3 applies here, except that, this time, it is the car that is affected by the action of the system.

Table 2.5: Case Frame set corresponding to the Requirement 5 of Example 1

Condition #1 - Main Verb (CAC): changes
  CPT: the semaphore      CFV: green
  CMD: -                  CTV: yellow
Action - Main Verb (ACT): assign
  AGT: the system         TOV: pay attention
  PAT: the car

Table 2.5 shows the output of the NAT2TEST Semantic Analysis for the fifth Requirement. Note that the verb changes is used to describe a condition. In this case, CFV is optional: when present, it states the previous value of the semaphore (here, green). If CFV were not given, the only information about the previous value would be that it differs from the current one, i.e., that the previous value of the semaphore is different from yellow.

Table 2.6: Case Frame set corresponding to the Requirement 6 of Example 1

Condition #1 - Main Verb (CAC): changes
  CPT: the semaphore      CFV: yellow
  CMD: -                  CTV: yellow
Action - Main Verb (ACT): assign
  AGT: the system         TOV: stop
  PAT: the car


Table 2.6 shows the output of the NAT2TEST Semantic Analysis for the sixth Requirement. The observation made for Table 2.5 also applies here, because the verbs are the same. An interesting fact is that, in the first CF, both CFV and CTV have the same value. This is because there is no verb in Sys-Req-CNL that expresses the fact that a value remains unchanged, so the verb changes was used for this purpose.

CMD: - CTV: 6.0

Action - Main Verb (ACT): reset

AGT: the system TOV: -

PAT: the counter

Table 2.7 shows the output of the NAT2TEST Semantic Analysis for the seventh Requirement. Note that TOV is not used in the CF related to the action statement, since the verb reset is used. In fact, this verb can be regarded as equivalent to the verb assign with TOV set to 0.0. Chapter 3 implements this idea in order to help with variable typing.

-CMD: greater than or equal to CTV: 0.0

Action - Main Verb (ACT): add

AGT: the system TOV: 1.0

PAT: the counter

Table 2.8 shows the output of the NAT2TEST Semantic Analysis for the last Requirement. The ACT is the verb add, which states that the AGT, the system, must add 1.0 to the current value of the PAT (in this case, the counter) and assign the result back to the PAT.

We end this section by recalling that NAT2TEST has two more phases, but its contribution to this work is limited to its first two phases. The main information we need is the set of Case Frames (CFs) generated in the Semantic Analysis, which allows us to apply model checking techniques to requirements, whereas NAT2TEST itself is intended to generate test cases.

2.2 The NuSMV Model Checker

The system requirements studied in this work describe systems that interact with an environment and whose computation often does not terminate (MANNA; PNUELI, 2012). Thus, the need arises to represent the behaviour of these systems. A way to do it with mathematical rigour is through Kripke structures. We can visualize them as a graph representing all the reachable nodes, called states in this context. Besides states, there is a transition relation that governs the movement between states. An important observation is that this relation must be total, i.e., every state must have at least one successor state (loops are permitted). Each state contains a set of atomic formulas that are true in that specific state. The formal definition of Kripke structures found in Clarke, Grumberg and Peled (1999) is stated below:

Definition 2.2.1. Let AP be a set of atomic propositions. A Kripke structure over AP is a 4-tuple M= (S, S₀, R, L) where:

• S is a finite set of states;

• S₀ ⊆ S is the set of initial states;

• R ⊆ S × S is a transition relation that must be total, i.e., for every state s ∈ S there is a state s′ ∈ S such that R(s, s′);

• L : S → 2^AP is a function that labels each state with the set of atomic propositions true in that state. The notation 2^AP denotes the power set of AP.

The Kripke structure representation corresponding to the requirements in Example 1 is illustrated in Figure 2.2. For each state, the information shown in a top-down order is: the value of the counter, the semaphore colour and the status of the car.

Figure 2.2: Kripke structure representation corresponding to the Semaphore Example.

[Figure: a cycle of seven states s0 to s6; state labels as printed in the figure: 0 go green, 1 go green, 2 go green, 3 go green, 4 go yellow, 5 pay attention yellow, 6 stop red.]

The system variables are the counter, the semaphore and the car. The possible values of the counter range from 0 to 6, the possible values of the semaphore are green, yellow and red, and the possible values of the car are go, pay attention and stop. Hence there are seven atomic formulas concerning the counter, three concerning the semaphore and three concerning the car, so |AP| = 13, where |AP| denotes the cardinality of the set AP. It follows that |2^AP| = 2^13. However, we do not need to be worried about all these subsets, since the only possible transitions between states are s0 → s1, s1 → s2, s2 → s3, s3 → s4, s4 → s5, s5 → s6 and s6 → s0. Finally, L gives us the information of which assignments are true in each state. Structuring the information, M = (S, S0, R, L) such that:

• S = {s0, s1, s2, s3, s4, s5, s6};

• S0 = {s0};


• L(s0) = {the counter = 0, the car = go, the semaphore = green};

• L(s4) = {the counter = 4, the car = go, the semaphore = yellow};

• L(s5) = {the counter = 5, the car = pay attention, the semaphore = yellow};

• L(s6) = {the counter = 6, the car = stop, the semaphore = red}.
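The Semaphore structure can also be written down as plain data, which makes the totality requirement of Definition 2.2.1 easy to check. The following Python sketch is only an illustration: the names `is_total` and `reachable` are hypothetical helpers, the transitions are the seven listed in the text, and the labels of s1 to s3 are taken from the state labels printed in Figure 2.2.

```python
# States and initial states of the Semaphore example.
S = ["s0", "s1", "s2", "s3", "s4", "s5", "s6"]
S0 = {"s0"}

# Transition relation R: the seven transitions listed in the text.
R = {("s0", "s1"), ("s1", "s2"), ("s2", "s3"), ("s3", "s4"),
     ("s4", "s5"), ("s5", "s6"), ("s6", "s0")}

# Labelling function L, stored as (counter, semaphore, car) per state.
# The entries for s1..s3 are read from the labels in Figure 2.2.
L = {
    "s0": (0, "green", "go"),
    "s1": (1, "green", "go"),
    "s2": (2, "green", "go"),
    "s3": (3, "green", "go"),
    "s4": (4, "yellow", "go"),
    "s5": (5, "yellow", "pay attention"),
    "s6": (6, "red", "stop"),
}


def is_total(states, relation):
    """Return True if every state has at least one successor."""
    sources = {src for (src, _dst) in relation}
    return all(state in sources for state in states)


def reachable(initial, relation):
    """Return the set of states reachable from the initial states."""
    seen = set(initial)
    frontier = list(initial)
    while frontier:
        current = frontier.pop()
        for (src, dst) in relation:
            if src == current and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen
```

Under these assumptions, `is_total(S, R)` holds and every state in S is reachable from s0, which is why only seven of the 2^13 possible label sets ever need to be considered.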

Model checkers are useful when one needs to check, automatically and exhaustively, whether a finite-state model of a reactive system satisfies a set of specifications written in a formal language. Termination is guaranteed by the finiteness of the model (CIMATTI et al., 2000). The use of model checking comprises three tasks: modelling, specification and verification. The verification step is automatic, i.e., once the user describes the model and the specifications in a formal language, the tool verifies whether the specifications are satisfied by the model.

One of the available model checkers is NuSMV (New Symbolic Model Verifier), a reimplementation and extension of the Symbolic Model Verifier (SMV), which was the first model checker based on Binary Decision Diagrams (BDDs). Currently, NuSMV combines BDD-based and SAT-based model checking in its algorithms. Furthermore, it is open source and aims to be usable in industrial settings as one of the tools that support formal verification.

The subsections below describe how to model reactive systems through the NuSMV description language, as well as one of the specification (property) input languages of NuSMV: the CTL.

2.2.1 Describing Models: The NuSMV Description Language

Since a state can be seen as a set of atomic formulas that are true at a given moment, it is important to have a way to declare variables that store this information. NuSMV allows us to declare variables along with their types, whose main classes are boolean, integer and enumeration. Besides variables, it supports several kinds of expressions, such as constants, arithmetic and logical expressions, set operations and conditionals (if-then-else or case).

In order to declare transitions between states, the language provides several syntactic elements, but two of them are enough for the purpose of this work: init and next. The first one is used to build the set of initial states. If no init constraint is given for a variable, its initial value is chosen non-deterministically. The second keyword defines the transition relation between states, i.e., the value of a variable in the next state. We can combine these assignments with conditionals such as case, for example.

Observation. In NuSMV, each variable may be assigned only once (once for init and once for next constraints), and assignments cannot have circular dependencies.

The entire NuSMV program is defined by one or more modules and there must be one called main. Modules provide reusability of components and a modular and hierarchical description to the user. Each module can be instantiated several times and be assigned to a variable. The access of a variable declared in a different module is made by using a dot separator between the module and the required variable.

Figure 2.3: Kripke structure representation corresponding to the Single Process Example.

[Figure: four states labelled (request, ready), (request, busy), (¬request, busy) and (¬request, ready).]

Figure 2.4: NuSMV code corresponding to the Kripke structure of Figure 2.3.

    MODULE main
    VAR
      request : boolean;
      status : {ready, busy};
    ASSIGN
      init(status) := ready;
      next(status) :=
        case
          request : busy;
          TRUE : {ready, busy};
        esac;

Let us illustrate the language by an example called the Single Process, which refers to a process whose status can be ready or busy and its next states depend on a variable called request. Figure 2.3 illustrates the Kripke structure for this example.

The NuSMV code corresponding to the Kripke structure of Figure 2.3 is shown in Figure 2.4. Let us make some important remarks about this code. First, the initial value of request is not declared, which means that, in the initial state, its value is chosen non-deterministically. Second, in the case/esac block, the next value of status depends on the current value of request: if the current value of request is true, then the next value of status must be busy. Otherwise, as indicated by TRUE, the next value of status is chosen non-deterministically between ready and busy. The full NuSMV syntax and its grammar rules are defined by Cavada et al. (2005).
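To make the reading of init and next more concrete, the transition relation described by the Single Process code can be enumerated by hand. The Python sketch below is an illustration only and is not how NuSMV works internally (NuSMV is symbolic); the function names are hypothetical, and a state is represented as a pair (request, status).

```python
from itertools import product

STATUS = ("ready", "busy")
REQUEST = (True, False)


def initial_states():
    """init(status) := ready; request has no init constraint."""
    return {(request, "ready") for request in REQUEST}


def successors(state):
    """Successor states according to the next(status) case block."""
    request, _status = state
    # request : busy;  TRUE : {ready, busy};
    next_status = ("busy",) if request else STATUS
    # next(request) is unconstrained, so both values are possible.
    return set(product(REQUEST, next_status))


def reachable_states():
    """Explore all states reachable from the initial states."""
    seen = set(initial_states())
    frontier = list(seen)
    while frontier:
        state = frontier.pop()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen
```

With this encoding, a state where request is true only has successors whose status is busy, while all four combinations of request and status are reachable, matching the four states of Figure 2.3.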

2.2.2 Specifying Properties: CTL Specifications

Computation Tree Logic (CTL) belongs to the family of Temporal Logic and is one of the formal input languages to express specifications (properties) in NuSMV (the other one is the Linear-time Temporal Logic (LTL), not covered in this work). For CTL, time is seen as a tree-like structure, starting at the initial state, where the future is not defined. Huth and Ryan (2004) define its syntax inductively, as shown below.

Definition 2.2.2. The Backus Naur form to define CTL formulas is:

φ ::= ⊥ | ⊤ | p | (¬φ) | (φ ∧ φ) | (φ ∨ φ) | (φ → φ) | AX φ | EX φ | AF φ | EF φ | AG φ | EG φ | A[φ U φ] | E[φ U φ]

where p ranges over a set of atomic formulas.

The next step after defining the syntax is to interpret CTL formulas, that is, to define their semantics. This can be stated as the following problem: given a model M and a CTL formula φ, verify whether M, s ⊨ φ, where s ∈ S is a state. The formal definition (HUTH; RYAN, 2004) is shown below (the symbol → is overloaded, i.e., it may represent either the implication or the transition relation between states):

Definition 2.2.3. The satisfaction relation ⊨ between a pair consisting of a model M and a state s ∈ S, and a CTL formula, is inductively defined as follows.

• M, s ⊨ ⊤ and M, s ⊭ ⊥.

• M, s ⊨ p iff p ∈ L(s) for an atomic proposition p ∈ AP.

• M, s ⊨ (¬φ) iff M, s ⊭ φ.

• M, s ⊨ (φ ∧ ψ) iff M, s ⊨ φ and M, s ⊨ ψ.

• M, s ⊨ (φ ∨ ψ) iff M, s ⊨ φ or M, s ⊨ ψ.

• M, s ⊨ (φ → ψ) iff M, s ⊭ φ or M, s ⊨ ψ.

• M, s ⊨ AX φ iff for all s₁ such that s → s₁ we have M, s₁ ⊨ φ.

• M, s ⊨ EX φ iff for some s₁ such that s → s₁ we have M, s₁ ⊨ φ.

• M, s ⊨ AG φ iff for all paths s₁ → s₂ → s₃ → ..., where s₁ equals s, and for all sᵢ along the path, we have M, sᵢ ⊨ φ.

• M, s ⊨ EG φ iff there is a path s₁ → s₂ → s₃ → ..., where s₁ equals s, and for all sᵢ along the path, we have M, sᵢ ⊨ φ.

• M, s ⊨ AF φ iff for all paths s₁ → s₂ → s₃ → ..., where s₁ equals s, there is some sᵢ along the path such that M, sᵢ ⊨ φ.

• M, s ⊨ EF φ iff there is a path s₁ → s₂ → s₃ → ..., where s₁ equals s, and for some sᵢ along the path, we have M, sᵢ ⊨ φ.

• M, s ⊨ A[φ U ψ] iff for all paths s₁ → s₂ → s₃ → ..., where s₁ equals s, there is some sᵢ along the path such that M, sᵢ ⊨ ψ and, for each j < i, we have M, sⱼ ⊨ φ.

• M, s ⊨ E[φ U ψ] iff there is a path s₁ → s₂ → s₃ → ..., where s₁ equals s, and there is some sᵢ along the path such that M, sᵢ ⊨ ψ and, for each j < i, we have M, sⱼ ⊨ φ.
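For a finite Kripke structure with a total transition relation, the satisfaction relation can be computed by labelling states, using fixpoint iterations for EU and EG and deriving the remaining operators from them. The Python sketch below illustrates this on the Semaphore structure, restricted to the colour of the semaphore as atomic propositions. It is a didactic explicit-state sketch with a hypothetical tuple encoding of formulas, not the BDD-based algorithm implemented by NuSMV.

```python
STATES = {"s0", "s1", "s2", "s3", "s4", "s5", "s6"}
SUCC = {"s0": {"s1"}, "s1": {"s2"}, "s2": {"s3"}, "s3": {"s4"},
        "s4": {"s5"}, "s5": {"s6"}, "s6": {"s0"}}
LABEL = {"s0": {"green"}, "s1": {"green"}, "s2": {"green"},
         "s3": {"green"}, "s4": {"yellow"}, "s5": {"yellow"},
         "s6": {"red"}}


def pre_exists(target):
    """States with at least one successor in target."""
    return {s for s in STATES if SUCC[s] & target}


def sat(f):
    """Return the set of states satisfying formula f (nested tuples)."""
    op = f[0]
    if op == "true":
        return set(STATES)
    if op == "atom":
        return {s for s in STATES if f[1] in LABEL[s]}
    if op == "not":
        return STATES - sat(f[1])
    if op == "and":
        return sat(f[1]) & sat(f[2])
    if op == "EX":
        return pre_exists(sat(f[1]))
    if op == "EU":  # least fixpoint: psi, or phi with a successor in it
        hold, result = sat(f[1]), sat(f[2])
        while True:
            new = result | (hold & pre_exists(result))
            if new == result:
                return result
            result = new
    if op == "EG":  # greatest fixpoint: phi with a successor staying in it
        result = sat(f[1])
        while True:
            new = result & pre_exists(result)
            if new == result:
                return result
            result = new
    # The remaining operators are derived from the ones above.
    if op == "AX":
        return sat(("not", ("EX", ("not", f[1]))))
    if op == "EF":
        return sat(("EU", ("true",), f[1]))
    if op == "AF":
        return sat(("not", ("EG", ("not", f[1]))))
    if op == "AG":
        return sat(("not", ("EF", ("not", f[1]))))
    raise ValueError(f"unknown operator: {op}")
```

For instance, `sat(("AF", ("atom", "red")))` returns every state, since the semaphore eventually turns red on the only path, whereas `sat(("AG", ("atom", "green")))` is empty.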

To better understand the satisfaction relation for CTL formulas in which the temporal operators AG, EG, AF, EF, AX, EX, AU and EU appear, let us depict unfolded Kripke structures and show the formula that holds in each one. Unfolding a Kripke structure amounts to building an infinite tree-like structure, starting at the initial state.

Figure 2.5: Unwound Kripke structure where AGφ holds from the initial state s0 (M, s0 ⊨ AGφ). Note it is true globally, including in the initial state.

Figure 2.6: Unwound Kripke structure where EGφ holds from the initial state s0 (M, s0 ⊨ EGφ). Note that φ does not need to be true in all computation paths, but there must be at least one path along which it holds in every state.


Figure 2.7: Unwound Kripke structure where AFφ holds from the initial state s0 (M, s0 ⊨ AFφ). Note that, for all computation paths, there must be a state where φ is true. It means that, starting at s0, a state where φ holds is always reached in the future.

Figure 2.8: Unwound Kripke structure where EFφ holds from the initial state s0 (M, s0 ⊨ EFφ). Note that there must be at least one path, starting at the initial state, where φ is true.

Figure 2.9: Unwound Kripke structure where AXφ holds from the initial state s0 (M, s0 ⊨ AXφ). Note that, for all computation paths in this structure, starting at s0, φ holds in the next states of s0.


Figure 2.10: Unwound Kripke structure where EXφ holds from the initial state s0 (M, s0 ⊨ EXφ). Note that there must be at least one computation path in this structure, starting at s0, where φ holds in one of the next states of s0.

Figure 2.11: Unwound Kripke structure representation where A[φ U ψ] holds at initial state s0 (M, s0 ⊨ A[φ U ψ]). Note that this formula holds when for all computation paths, starting at s0, it is the case that φ holds continuously until ψ holds.

Figure 2.12: Unwound Kripke structure representation where E[φ U ψ] holds at initial state s0 (M, s0 ⊨ E[φ U ψ]). Note that this formula holds when, for some computation path starting at s0, it is the case that φ holds continuously until ψ holds.


The knowledge of the NuSMV description and specification languages is important to understand how the translator from Case Frame (CF) to models is achieved as well as the translator from CNL to CTL specifications. This chapter does not cover all the syntax, grammar rules and semantic descriptions of the NuSMV model checker, but all the elements needed to build the translator were discussed here.


3

The NuSMV Model Generator: From Natural Language to NuSMV Models

As discussed in Chapter 2, the NAT2TEST strategy captures the semantics of requirements written in CNL through syntax trees generated in its first phase, called Syntax Analysis, and through CFs generated in its second phase, called Semantic Analysis. Since all the semantic information is captured in the second phase, we use exclusively this information to generate NuSMV models automatically, i.e., the subsequent phases are not used in the translator. Figure 3.1 illustrates this scenario.

Figure 3.1: The output of the semantic analysis phase is used as input of the NuSMV model generator.

[Figure: within the NAT2TEST strategy, System Requirements go through Syntax Analysis, yielding Syntax Trees, and then through Semantic Analysis, yielding Case Frames; the Case Frames feed the NuSMV Model Generator, which produces the NuSMV Model.]

In this chapter, we discuss in detail how we devised auxiliary translation functions, how we implemented some important transformation functions, what were our choices regarding data structures and what algorithms were needed to generate code from CFs to the NuSMV description language. Whenever it is necessary to exemplify and illustrate some applications of functions or algorithms, we use the Semaphore example shown in Chapter 2.


Although the Sys-Req-CNL allows us to describe requirements that use global clocks to simulate real time, the current version of the translator does not yet support this approach. Therefore, whenever variables related to timers appear in this work, they must be treated as ordinary variables instead of global clocks. The manipulation and translation of models that deal with real time, such as those with global clocks, remain as future work.

3.1 Defining Pre-processing Functions

When CFs are used to generate a NuSMV model, we are not able to use the raw data produced by the second phase of NAT2TEST, either because they are not correctly formatted or because some verbs used to describe the requirements omit important information. Because of that, this section defines some functions needed to pre-process TRs according to their particularities.

3.1.1 Formatting Functions

The first function to be declared, formatnumber(), is responsible for formatting numerical data. As NuSMV does not accept floating point numbers, the task of this function is to recognize which data in the CFs are numerical and convert them to the integer format. By arbitrary convention, we transform a number into the largest integer not greater than it, i.e., we apply the floor function, denoted by ⌊ ⌋. Let us define this function formally:

Definition 3.1.1. Let formatnumber() be the function that formats numerical data. So,

$$\mathrm{formatnumber}(x) = \begin{cases} \lfloor x \rfloor, & \text{if } x \text{ is a number} \\ x, & \text{otherwise} \end{cases}$$

Table 3.1: The function formatnumber() applied to Table 2.1

Condition #1 - Main Verb (CAC): is

-CMD: lower than CTV: 3

-CMD: - CTV: 6

AGT: the system TOV: green

PAT: the semaphore

As a design decision, all real data need to be converted to integer data, since NuSMV does not allow operations with variables whose type is real. However, there are some classes of problems where this approach fails because the decimal values matter in the verification. Therefore, the current version of the NuSMV Model Generator only handles problems where discarding the decimal part does not affect the resulting model.

The application of this function can be seen in Table 3.1. The TRs highlighted in bold are those whose values were changed. See that TOV has not changed, since its value is not numerical.
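Definition 3.1.1 can be sketched in a few lines of Python. This is only an illustration of the definition, under the assumption that TR values arrive as strings such as "3.0", "green" or "-"; it is not the translator's actual code.

```python
import math


def formatnumber(x):
    """Floor numerical data; return any other value unchanged."""
    try:
        return math.floor(float(x))
    except (TypeError, ValueError, OverflowError):
        # Not a finite number (e.g. "green" or "-"): keep it as is.
        return x
```

For example, `formatnumber("3.0")` yields the integer 3, while `formatnumber("green")` returns "green" unchanged, as in Table 3.1.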

Another data formatting function, which we name fconcatenate(), concatenates the words of compound nouns. This function basically removes the blank spaces from compound names of variables and values, such as “the semaphore”. It is not necessary to define this function formally, since it is well known in the mathematics and computer science fields. However, we declare it for reference purposes in this work. So:

Definition 3.1.2. Let fconcatenate() be the function that concatenates values from a set of compound nouns, where its behaviour is to concatenate strings separated by blank spaces.

Table 3.2: The function fconcatenate() applied to Table 2.5

Condition #1 - Main Verb (CAC): changes

CPT: thesemaphore CFV: green

CMD: - CTV: yellow

AGT: the system TOV: payattention

PAT: thecar

Table 3.2 shows the application of fconcatenate() to Table 2.5, resulting in the concatenation of the values highlighted in bold. Note that it is not necessary to apply this function to AGTs, since there is no need to map them into NuSMV elements, i.e., for the NuSMV description language, it does not matter who performs the actions.
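A minimal Python sketch of fconcatenate(), assuming that the values are plain strings whose words are separated by blank spaces:

```python
def fconcatenate(value):
    """Join the words of a compound noun by removing blank spaces."""
    return "".join(value.split())
```

Applied to the values of Table 2.5, it maps "the semaphore" to "thesemaphore" and "pay attention" to "payattention", while single-word values such as "green" are left untouched.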

3.1.2 Expansive Function

Some verbs used to describe either action or conditional statements omit information about some TRs. We need these data to be explicit, since TRs are our only source of data to build models. For this reason, we define a function that generates values for TOV. Other functions that expand TRs are described in Section 3.4, which focuses on how transitions are built in NuSMV models.

Definition 3.1.3. Let fexpandtov1() be the function responsible for adding information to TOV. When the verb “to reset” is found in a CF, the number 0 is returned. Otherwise the value remains unchanged.

$$\mathrm{fexpandtov1}(verb, val) = \begin{cases} 0, & \text{if } verb = \text{“to reset”} \\ val, & \text{otherwise} \end{cases}$$


Table 3.3: The function fexpandtov1() applied to Table 2.7

Condition #1 - Main Verb (CAC): is

CMD: - CTV: 6.0

Action - Main Verb (ACT): reset

AGT: the system TOV: 0

PAT: the counter

The function fexpandtov1() takes as input the ACT verb and the TOV value and returns the new TOV value. Table 3.3 illustrates the application of fexpandtov1() to Table 2.7. Note that the value 0 is filled in TOV, which is highlighted in bold. This information is useful to find out the types of the variables as well as to build transitions.
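The behaviour of fexpandtov1() can be sketched as follows. The sketch assumes that the verb is given as a string and accepts both the infinitive form “to reset” used in Definition 3.1.3 and the bare form “reset” shown in the tables; this normalisation is an assumption of the example, not something stated in the text.

```python
def fexpandtov1(verb, val):
    """Return 0 as the TOV of a reset action; otherwise keep val."""
    # Assumption: accept "to reset" and "reset" alike.
    normalized = verb.strip().lower()
    if normalized.startswith("to "):
        normalized = normalized[3:]
    return 0 if normalized == "reset" else val
```

For Table 2.7, `fexpandtov1("reset", "-")` returns 0, whereas for an assign action the original TOV is returned unchanged.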

3.2 Mapping Variables

After defining the pre-processing functions, let us start the task of mapping elements retrieved from CFs in order to define the NuSMV variables. Firstly, let us define sets that give support in the construction of the set of variables.

Observation. We assume that, whenever Tables 2.1–2.8 are referred to, the pre-processing functions defined in Section 3.1 have already been applied to them.

Definition 3.2.1. Let CPTSET be the set of all Condition Patients (CPTs) associated with conditions, i.e., elements that reflect the state of a specific execution.

Let us extract the CPTSET from Table 2.1 and Table 2.4: CPTSET₁ = {thecounter} and CPTSET₄ = {thesemaphore}. The CPTSET is defined as the union of all CPTSETᵢ extracted from all CF sets. In this example, we have CPTSET = CPTSET₁ ∪ CPTSET₄ = {thecounter, thesemaphore}.

Definition 3.2.2. Let PATSET be the set of all Patients (PATs) that are affected by any action.

To illustrate this set, let us extract the PATSET from Table 2.2 and Table 2.5: PATSET₂ = {thesemaphore} and PATSET₅ = {thecar}. The PATSET is defined as the union of all PATSETᵢ extracted from all CF sets. Therefore, we have PATSET = PATSET₂ ∪ PATSET₅ = {thesemaphore, thecar}.

Definition 3.2.3. Let VARSET be the set that represents the set of all variables that must be retrieved from all system requirements. VARSET is defined as the union of CPTSET and PATSET. Formally:

VARSET= CPTSET ∪ PATSET
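Definitions 3.2.1 to 3.2.3 translate directly into set operations. The Python sketch below uses a hypothetical dictionary layout for the pre-processed CFs (the keys "conditions", "action", "CPT" and "PAT" are assumptions of this example) and only two of the eight requirements, Tables 2.1 and 2.4.

```python
# Hypothetical layout: one dict per requirement, already pre-processed.
case_frames = [
    # Table 2.1: the counter is tested, the semaphore is assigned.
    {"conditions": [{"CPT": "thecounter"}],
     "action": {"PAT": "thesemaphore"}},
    # Table 2.4: the semaphore is tested, the car is assigned.
    {"conditions": [{"CPT": "thesemaphore"}],
     "action": {"PAT": "thecar"}},
]


def cptset(frames):
    """Union of the CPTs of all conditions."""
    return {cond["CPT"] for cf in frames for cond in cf["conditions"]}


def patset(frames):
    """Union of the PATs of all actions."""
    return {cf["action"]["PAT"] for cf in frames}


def varset(frames):
    """VARSET = CPTSET ∪ PATSET."""
    return cptset(frames) | patset(frames)
```

On these two requirements, `varset(case_frames)` already contains the three variables thecounter, thesemaphore and thecar.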

Concerning the CPTSET and PATSET illustrated above, we have VARSET = {thecounter, thesemaphore, thecar}.

3.3 Inferring Types of Variables

When NuSMV was presented in Chapter 2, we saw that all variables must be declared along with their types. These types cannot be generic, since a model checker expands all values that the variables might have. Our translator must be capable of performing a full scan of all CFs in order to retrieve and store the values associated with all variables.

One of our goals in this work is to take advantage of techniques for mitigating the state explosion problem. One of them is called data abstraction (CLARKE; GRUMBERG; PELED, 1999). Using this technique, we reduce the domain of each type, which makes our analysis feasible, since dealing with unbounded data types such as Integer makes the analysis intractable in a symbolic model checker like NuSMV.

There is a clear distinction between the type and the data abstraction of a variable. The former is the set of values that the variable may assume, while the latter is a subset of it that avoids dealing with infinite domains. In this dissertation, type inference and data abstraction are presented as a single procedure.

As we need to collect all values related to the variables in order to define their types later, we have to choose a data structure that stores pairs whose first element is a key and whose second element is the set of values associated with this key. A map was therefore chosen as the data structure, since it is appropriate for this task. Its definition is given in Definition 3.3.5. Since variables in NuSMV are unique, they assume the role of the keys. Let us declare some definitions of sets that help us to populate this map.

Definition 3.3.1. Let CFVSET_var be the set of all Condition From Value (CFV) values of the variable var such that var ∈ VARSET. Note that this definition produces a family of sets whose size is the cardinality of VARSET, denoted by |VARSET|, i.e., for each var belonging to VARSET, there is a set called CFVSET_var associated to it.

Definition 3.3.2. Let CTVSET_var be the set of all Condition To Value (CTV) values of the variable var such that var ∈ VARSET. This definition also generates a family of sets whose size is equal to |VARSET|, i.e., for each element var belonging to VARSET, there is a set called CTVSET_var associated to it.

Definition 3.3.3. Let TOVSET_var be the set of all TOV values that shall be assigned to the variable var such that var ∈ VARSET. So, TOVSET_var contains all TOVs associated to var, except when the verbs “to add” or “to subtract” are used to express actions about this specific variable. This means that we cannot include in TOVSET_var the values of TOV when these verbs appear. The definition of TOVSET also produces a family of sets whose cardinality is |VARSET|, i.e., for each element var belonging to VARSET, there is a set called TOVSET_var associated to it.

Definition 3.3.4. Let VALUESET_var be the set of all values related to a specific variable var such that var ∈ VARSET. It is defined as the union of CFVSET_var, CTVSET_var and TOVSET_var. Formally:

VALUESET_var = CFVSET_var ∪ CTVSET_var ∪ TOVSET_var

Like the others, this definition also produces a family of sets whose size is equal to |VARSET|. Once all possible TRs that could provide values of variables have been computed, we can define the set VALUESET_var as being the type of var.

Observation. If the elements of VALUESET_var match the string true or false, these values are replaced by the set {TRUE, FALSE}, since they are the boolean type in the NuSMV language.

Definition 3.3.5. Let VARVALSMAP be a map data structure responsible for storing variables and their types. The task of this data structure is to map each var ∈ VARSET to the set of values VALUESET_var. Note that var is the key of the map, since each variable is unique.

Let us illustrate these concepts by retrieving CFVSET, CTVSET and TOVSET of each variable captured from Tables 2.1–2.8 to then obtain their VALUESET, i.e., the type of each variable. According to Definition 3.2.3, VARSET = {thecounter, thesemaphore, thecar} for these tables. Let us adopt the following convention: a set A_{i, j} refers to the values of a variable i (see Table 3.4) inferred from a Table j.

Table 3.4: One-to-one correspondence between indexes and variables retrieved from the Semaphore example

Index   Variable
1       thecounter
2       thesemaphore
3       thecar

thecounter:

CFVSET_{1, 2.1} = CFVSET_{1, 2.2} = CFVSET_{1, 2.3} = CFVSET_{1, 2.4} = CFVSET_{1, 2.5} = CFVSET_{1, 2.6} = CFVSET_{1, 2.7} = CFVSET_{1, 2.8} = ∅;

CTVSET_{1, 2.1} = {3, 6}; CTVSET_{1, 2.2} = {5, 3}; CTVSET_{1, 2.3} = {5}; CTVSET_{1, 2.4} = CTVSET_{1, 2.5} = CTVSET_{1, 2.6} = ∅; CTVSET_{1, 2.7} = {6}; CTVSET_{1, 2.8} = {0};

TOVSET_{1, 2.1} = TOVSET_{1, 2.2} = TOVSET_{1, 2.3} = TOVSET_{1, 2.4} = TOVSET_{1, 2.5} = TOVSET_{1, 2.6} = ∅; TOVSET_{1, 2.7} = {0}; TOVSET_{1, 2.8} = ∅;

thesemaphore:

CFVSET_{2, 2.1} = CFVSET_{2, 2.2} = CFVSET_{2, 2.3} = CFVSET_{2, 2.4} = ∅; CFVSET_{2, 2.5} = {green}; CFVSET_{2, 2.6} = {yellow}; CFVSET_{2, 2.7} = CFVSET_{2, 2.8} = ∅;

CTVSET_{2, 2.1} = CTVSET_{2, 2.2} = CTVSET_{2, 2.3} = ∅; CTVSET_{2, 2.4} = {red, green}; CTVSET_{2, 2.5} = {yellow}; CTVSET_{2, 2.6} = {yellow}; CTVSET_{2, 2.7} = CTVSET_{2, 2.8} = ∅;

TOVSET_{2, 2.1} = {green}; TOVSET_{2, 2.2} = {yellow}; TOVSET_{2, 2.3} = {red}; TOVSET_{2, 2.4} = TOVSET_{2, 2.5} = TOVSET_{2, 2.6} = TOVSET_{2, 2.7} = TOVSET_{2, 2.8} = ∅;

thecar:

CFVSET_{3, 2.1} = CFVSET_{3, 2.2} = CFVSET_{3, 2.3} = CFVSET_{3, 2.4} = CFVSET_{3, 2.5} = CFVSET_{3, 2.6} = CFVSET_{3, 2.7} = CFVSET_{3, 2.8} = ∅;

CTVSET_{3, 2.1} = CTVSET_{3, 2.2} = CTVSET_{3, 2.3} = CTVSET_{3, 2.4} = CTVSET_{3, 2.5} = CTVSET_{3, 2.6} = CTVSET_{3, 2.7} = CTVSET_{3, 2.8} = ∅;

TOVSET_{3, 2.1} = TOVSET_{3, 2.2} = TOVSET_{3, 2.3} = ∅; TOVSET_{3, 2.4} = {go}; TOVSET_{3, 2.5} = {payattention}; TOVSET_{3, 2.6} = {stop}; TOVSET_{3, 2.7} = TOVSET_{3, 2.8} = ∅;

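After all these sets have been stated, we can infer the type of these variables. In this way, we have:

$$\mathrm{VALUESET}_{thecounter} = \Big(\bigcup_{i=1}^{8} \mathrm{CFVSET}_{1,\,2.i}\Big) \cup \Big(\bigcup_{i=1}^{8} \mathrm{CTVSET}_{1,\,2.i}\Big) \cup \Big(\bigcup_{i=1}^{8} \mathrm{TOVSET}_{1,\,2.i}\Big) = \{3, 6, 5, 0\};$$

$$\mathrm{VALUESET}_{thesemaphore} = \Big(\bigcup_{i=1}^{8} \mathrm{CFVSET}_{2,\,2.i}\Big) \cup \Big(\bigcup_{i=1}^{8} \mathrm{CTVSET}_{2,\,2.i}\Big) \cup \Big(\bigcup_{i=1}^{8} \mathrm{TOVSET}_{2,\,2.i}\Big) = \{green, yellow, red\};$$

$$\mathrm{VALUESET}_{thecar} = \Big(\bigcup_{i=1}^{8} \mathrm{CFVSET}_{3,\,2.i}\Big) \cup \Big(\bigcup_{i=1}^{8} \mathrm{CTVSET}_{3,\,2.i}\Big) \cup \Big(\bigcup_{i=1}^{8} \mathrm{TOVSET}_{3,\,2.i}\Big) = \{go, payattention, stop\};$$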

Note that, even though the TOV in Table 2.8 assigns 1.0 to the variable thecounter, this value was not included in TOVSET_{1, 2.8}, since the verb associated with this TR is “add”, which has been tagged as special and is treated differently in the task of finding types. This special treatment is described in the following section.
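The construction of VALUESET_var, including the exclusion of TOVs under the verbs add and subtract, can be sketched in Python as below. The dictionary layout and key names are hypothetical, "-" stands for an empty TR, and only Tables 2.5, 2.7 and 2.8 are encoded; the action verb of Table 2.5 is assumed to be assign.

```python
# Hypothetical layout of pre-processed CFs for Tables 2.5, 2.7 and 2.8.
case_frames = [
    {"conditions": [{"CPT": "thesemaphore", "CFV": "green", "CTV": "yellow"}],
     # Assumption: the action verb of Table 2.5 is "assign".
     "action": {"verb": "assign", "PAT": "thecar", "TOV": "payattention"}},
    {"conditions": [{"CPT": "thecounter", "CFV": "-", "CTV": 6}],
     "action": {"verb": "reset", "PAT": "thecounter", "TOV": 0}},
    {"conditions": [{"CPT": "thecounter", "CFV": "-", "CTV": 0}],
     "action": {"verb": "add", "PAT": "thecounter", "TOV": 1}},
]

SPECIAL_VERBS = ("add", "subtract")


def valuesets(frames):
    """Map each variable to CFVSET ∪ CTVSET ∪ TOVSET (Definition 3.3.4)."""
    values = {}
    for cf in frames:
        for cond in cf["conditions"]:
            bucket = values.setdefault(cond["CPT"], set())
            for role in ("CFV", "CTV"):
                if cond[role] != "-":
                    bucket.add(cond[role])
        action = cf["action"]
        bucket = values.setdefault(action["PAT"], set())
        # TOVs of "add" and "subtract" are not part of TOVSET.
        if action["verb"] not in SPECIAL_VERBS and action["TOV"] != "-":
            bucket.add(action["TOV"])
    return values
```

The resulting dictionary plays the role of VARVALSMAP for this fragment: the value 1 added to the counter in Table 2.8 does not appear in the set of thecounter.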

3.3.1 Special Typifications

System requirements specifications written in Controlled Natural Language (CNL) are restricted to a small number of verbs. These can be classified as verbs used in action statements or verbs used in requirement conditions, as seen in Section 2.1.


However, the verbs “to add” and “to subtract” are different in the context of typing: not all the possible values of the variables that are operands of an addition or a subtraction are explicit in the requirements. For example, in Table 2.8, the system has to add 1 to the counter, i.e., thecounter = thecounter + 1. Clearly, the number one is not part of its domain, unless thecounter has value zero at some point. Although the CNL allows the addition and subtraction of a variable to another variable, in this work we assume that addition and subtraction are restricted to a variable and a constant. Let us suppose that, at one moment, it is necessary to add a value to thecounter, that this happens again in another state and that, in the future, it is necessary to subtract another value from thecounter. This can happen many times, and the alternation between additions and subtractions seems, at first, unpredictable. There should be a way to deal with all these possible values for this kind of variable.

To be able to achieve this task, it is necessary to generate a set of integer numbers that can be used as the type of the variables where this condition applies. As the minimum and maximum values of this range are not known, it is essential that the user provides this information when the translator starts to run. The trivial solution for building this set is to generate consecutive integer numbers from the minimum value until the maximum value to be reached, both provided by the user. However, this set is not the best solution for this kind of problem, since there might be values that do not belong to the possible assignable values, thus increasing the number of states that the model checker needs to analyse. A solution must be created where, in fact, all elements are real possible values for a given variable, provided that the user gives as input concrete values of the variable and its minimum and maximum values.

Theorem 3.3.1. Let v ∈ VALUESET_var, where var is an operand of an addition or a subtraction (or both). Let a and b be the additive and subtractive constant terms, respectively, and min and max be the minimum and maximum values of an integer range, respectively (provided by the user), where v, a, b, min, max ∈ Z, and min ≤ v ≤ max. The set of all possible values of var, ADDSUBSET_var, is the union of ADDSET_var and SUBSET_var defined below. The notation gcd(a, b) denotes the greatest common divisor (gcd) of a and b.

$$\mathrm{ADDSET}_{var}(v) = \begin{cases} \emptyset, & \text{if } a = 0 \\ \displaystyle\bigcup_{k=0}^{\left\lfloor \frac{max - v}{\gcd(a,\,b)} \right\rfloor} \{v + \gcd(a, b) \cdot k\}, & \text{if } a \neq 0 \end{cases}$$

$$\mathrm{SUBSET}_{var}(v) = \begin{cases} \emptyset, & \text{if } b = 0 \\ \displaystyle\bigcup_{k=0}^{\left\lfloor \frac{v - min}{\gcd(a,\,b)} \right\rfloor} \{v - \gcd(a, b) \cdot k\}, & \text{if } b \neq 0 \end{cases}$$

$$\mathrm{ADDSUBSET}_{var}(v) = \mathrm{ADDSET}_{var}(v) \cup \mathrm{SUBSET}_{var}(v), \text{ and}$$

$$\mathrm{ADDSUBSET}_{var} = \bigcup_{v \in \mathrm{VALUESET}_{var}} \mathrm{ADDSUBSET}_{var}(v)$$
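The sets of Theorem 3.3.1 can be computed directly from their definitions. The Python sketch below is an illustration under the theorem's hypotheses (a and b are non-negative integer constants and min ≤ v ≤ max for every v); the function name and parameter names are hypothetical, and the bounds 0 and 6 used in the example are assumed user inputs chosen to match the counter of the Semaphore example.

```python
from math import gcd


def addsubset(values, a, b, lo, hi):
    """Compute ADDSUBSET_var from VALUESET_var, a, b, min (lo) and max (hi)."""
    g = gcd(a, b)
    result = set()
    for v in values:
        if a != 0:
            # ADDSET_var(v): v + g*k for k = 0 .. floor((max - v) / g)
            result.update(v + g * k for k in range((hi - v) // g + 1))
        if b != 0:
            # SUBSET_var(v): v - g*k for k = 0 .. floor((v - min) / g)
            result.update(v - g * k for k in range((v - lo) // g + 1))
    return result
```

For thecounter, with VALUESET = {0, 3, 5, 6}, a = 1 and b = 0, `addsubset({0, 3, 5, 6}, 1, 0, 0, 6)` yields the integers from 0 to 6. With a single value 1, a = 4 and b = 6, the step is gcd(4, 6) = 2, so only the odd numbers within the range are generated.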

Proof. Let v + m · a = e be the equation that provides the elements of ADDSET_var(v), where m ∈ Z and v and a are fixed values. The variable m gives the number of additions of a performed on v. Let e − n · b = f be the equation that provides the elements of SUBSET_var(v), with n ∈ Z. The variable n gives the number of subtractions of b performed on v. Note that the result of v + m · a is used to generate the elements of SUBSET_var(v), since additions and subtractions may alternate. Thus, it follows that:

Equation (1): v + m · a = e
Equation (2): e − n · b = f

Substituting Equation (1) into Equation (2), it follows that v + (m · a − n · b) = f, which can be rewritten as Equation (3): v + m · a + (−n) · b = f. This equation gives us the elements of ADDSUBSET_var(v) when integer numbers are assigned to m and n. To continue the proof, we use a corollary of the Bachet-Bézout theorem (MARTINEZ, 2010), which states that the equation a · x + b · y = c, with a, b, c ∈ Z, admits an integer solution in x and y if and only if gcd(a, b) | c, where | denotes the divides relation. In other words, a · x + b · y = gcd(a, b) · k, for some k ∈ Z.

By this corollary, the term m · a + (−n) · b of Equation (3) must be equal to gcd(a, b) · k, for some k ∈ Z, in order to admit an integer solution in m and n. Thus, f(k) = v + gcd(a, b) · k. So, to generate all possible values of var, we only need to assign positive values to k when dealing with the verb to add, or negative values to k when dealing with the verb to subtract. The next step is to define the minimum and maximum values that k may assume, given that min ≤ f(k) ≤ max.

To achieve this, let us split the problem into the two cases listed below. In both cases, k ∈ N, because the positive or negative sign is already being considered.

Addition: f(k) ≤ max ⇔ v + gcd(a, b) · k ≤ max. Solving this inequality, we have k ≤ (max − v) / gcd(a, b). As k must be a natural number, we apply the floor function to this bound. Hence, k ≤ ⌊(max − v) / gcd(a, b)⌋. Note that this case does not need to take the minimum value into consideration, since the sum is monotonic in k and f(0) = v ≥ min. In summary, we can formally define ADDSET_var(v) as follows:

$$\mathrm{ADDSET}_{var}(v) = \begin{cases} \emptyset, & \text{if } a = 0 \\ \displaystyle\bigcup_{k=0}^{\left\lfloor \frac{max - v}{\gcd(a,\,b)} \right\rfloor} \{v + \gcd(a, b) \cdot k\}, & \text{if } a \neq 0 \end{cases}$$