SYSTEMATIC KNOWLEDGE ENGINEERING: BUILDING BODIES OF KNOWLEDGE FROM PUBLISHED RESEARCH


STEFAN BIFFL¹, MARCOS KALINOWSKI², RICK RABISER³, FAJAR EKAPUTRA¹, DIETMAR WINKLER¹

¹Institute of Software Technology and Interactive Systems - CDL-Flex, Vienna University of Technology, Favoritenstr. 9/188, Vienna, Vienna A-1040, Austria
<firstname>.<lastname>@tuwien.ac.at, http://qse.ifs.tuwien.ac.at

²Institute of Computing, Fluminense Federal University, Rua Passo da Pátria 156, Bloco E, 3º andar, Niterói, Rio de Janeiro 24210-240, Brazil
kalinowski@acm.org

³Institute for Software Systems Engineering - CDL-MEVSS, Johannes Kepler University, Altenberger Str. 69, Linz, Upper Austria 4040, Austria
rick.rabiser@jku.at

Context. Software engineering researchers conduct systematic literature reviews (SLRs) to build bodies of knowledge (BoKs). Unfortunately, relevant knowledge collected in the SLR process is not publicly available, which considerably slows down building BoKs incrementally.

Objective. We present and evaluate the Systematic Knowledge Engineering (SKE) process to support efficiently building BoKs from published research.

Method. SKE is based on the SLR process and on Knowledge Engineering practices to build a Knowledge Base (KB) by reusing intermediate data extraction results from SLRs. We evaluated the feasibility of applying SKE by building a Software Inspection BoK KB from published experiments and a Software Product Line BoK KB from published experience reports. We compared the effort, benefits, and risks of building BoK KBs regarding the SKE and the traditional SLR processes.

Results. The application of SKE for incrementally collecting and organizing knowledge in the context of a BoK was feasible for different domains and different types of evidence. While the efforts for conducting the SKE and traditional SLR processes are comparable, SKE provides significant benefits for building BoKs.

Conclusions. SKE enables researchers in a scientific community to reuse and incrementally build knowledge in a BoK. SKE is ready to be evaluated in other software engineering domains.

Keywords: Empirical software engineering; systematic knowledge engineering; systematic review; body of knowledge; software inspection; software product lines.

1. Introduction

Software engineering researchers collaborate on topics, such as defect detection methods for software inspection [1] and software product lines [2], to build up a body of knowledge (BoK) in their research community. A BoK includes theory models [3], hypotheses and research questions derived from the theory models, and results from empirical studies and experience reports that investigate those hypotheses and research questions, to explain and/or predict phenomena in specific contexts [4].

A consequence of the growing number of research papers with evidence from empirical studies and experience reports is the need to adopt systematic approaches for aggregating research outcomes in order to provide an objective summary of such evidence on a particular topic [5]. In this context, systematic literature reviews (SLRs) have become a widely used and discussed research method [6][7]. However, the main public result of an SLR is, in general, a specific research synthesis report [8], while the accumulated knowledge in the SLR working material, generated from heterogeneous data sources [9], is not available to other researchers. The working material may even get lost over time, e.g., when the authors of an SLR leave academia. Therefore, each new SLR project has to overcome knowledge-sharing issues and has to rebuild large parts of the knowledge that exists in previous SLR projects but is inaccessible, which makes building a BoK considerably less efficient than necessary. Moreover, meta-analyses are limited to the presented research synthesis and do not allow other researchers to explore the underlying extracted data in different ways in order to answer questions related to their specific research goals [9].

Fig. 1 illustrates challenges of building a research BoK with key stakeholders, artifacts, and technologies. In the scope of a BoK, researchers (BoK Researcher) produce research papers on SLR results (SLR Report) and on evidence from studies (Study Report), e.g., empirical studies and experience reports, which are available to the general readership through digital libraries. However, (1) the SLR reports contain only a specific research synthesis, while data extracted during the SLR process usually stays in a local archive, making it hard for other researchers, who can only use material that is publicly available in the digital library, to reuse the gathered knowledge for applying different analysis and research synthesis methods or to extend the collected knowledge; and (2) digital libraries usually provide only text-based search mechanisms in a limited set of categories (e.g., publication metadata or full-text search related to keywords), and there are no mechanisms for searching and accessing the contained evidence based on semantic BoK theory concepts.

[Figure 1: diagram relating the BoK domain and community (BoK Researcher, Readership) to BoK data and technology (Digital Library, Study Report, SLR Report, SLR Data Extract). Challenge 1: extracted data stays in local archives. Challenge 2: no support for semantic access based on BoK theory concepts.]

Fig. 1. Stakeholders, artifacts, and challenges for building a research BoK, based on [10].


In a paper published at SEKE 2014 [10], we addressed these challenges by introducing the Systematic Knowledge Engineering (SKE) approach, which enables systematically building up BoKs from published research. The SKE process builds on the SLR process [11] and improves data management by storing knowledge on BoK domain concepts typically used for theory building [3] from empirical studies and experience reports in an extensible Knowledge Base (KB). The semantic technology [12] used in SKE helps researchers identify relevant knowledge in the BoK through semantic queries (e.g., terms including synonyms and related concepts). Thus, SKE allows building up knowledge incrementally, as researchers can access and reuse knowledge from past studies and integrate new contributions. Additionally, the KB enables researchers to explore the aggregated knowledge through an extensible set of queries.

We presented a first evaluation of SKE in [10], in which we instantiated SKE to design and build a software inspection BoK KB based on a specific type of empirical study: controlled experiments. First, we elicited a set of relevant query candidates in a survey with software inspection BoK researchers. Then, we modeled BoK topics related to software inspection theory concepts and a BoK KB data model for evidence from controlled experiments, defining the relevant information to extract. Following the SKE search process, 102 software inspection experiments were identified. An independent team extracted information from the 30 most recent papers to investigate the feasibility of and effort for data import and query resolution by a knowledge engineer (a new role required for applying the SKE process).

In this paper, we extend this previous effort [10] by: (a) further detailing the SKE process and (b) strengthening the initial indications of feasibility by applying SKE to design and build a BoK KB also in a different domain (software product lines) and with different types of evidence (published experience reports). Therefore, we build on a previously conducted SLR [13] in the domain of software product lines with evidence mainly from published experience reports. For evaluation, repeating the same steps of the software inspection experience, we elicited a set of query candidates in a survey with software product line researchers. Then, we modeled BoK topics related to software product line theory concepts and a BoK KB data model for evidence from experience reports, defining relevant information to extract. One of the authors extracted information from results (focusing on product derivation) of the SLR [13]. More specifically, he extracted detailed information from 74 papers included in the original SLR for data import into a new software product line BoK KB instance and query resolution by the knowledge engineer.

Based on these experiences we evaluate the SKE process concerning feasibility and effort as well as discuss benefits and risks, when compared to building BoKs following the traditional SLR process. Major findings were:

• Feasibility: Applying SKE enabled building BoK KBs in different domains (software inspections and software product lines) and based on different types of evidence (controlled experiments and experience reports). The resulting KBs were effective in providing answers to the required pre-defined queries elicited in the surveys.

• Effort: While the data extraction effort is comparable to SLRs, SKE requires a knowledge engineer for data modeling, mapping, and providing query facilities. This overhead is offset by the added benefits of the KB.

• Benefits: The SKE BoK KB facilitates reusing and exploring the gathered knowledge based on semantic search capabilities not available for SLR reports. Additionally, the extension of the BoK is facilitated by enabling incremental content contributions by researchers in and beyond a work group.

• Risk: The SKE process requires the role of a knowledge engineer with semantic technology skills. If this role is not available in the research domain, there is the risk of losing SKE KB benefits, or extra effort is needed to acquire the necessary skills.

The remainder of this paper is organized as follows. Section 2 presents the background of our work as well as related work. Section 3 motivates the research issues and outlines the evaluation approach. Section 4 describes the SKE process developed to address the research issues. In contrast to the recent paper [10], in this paper we separated the SKE process description from its application. Sections 5 and 6, respectively, describe the instantiation of SKE to build BoK KBs for the research areas Software Inspection and Software Product Lines. In Section 7 we discuss the benefits and risks of applying SKE, lessons learned, limitations of our research, and future perspectives. Section 8 concludes the paper.

2. Background and Related Work

The quality and speed of building up a body of knowledge (BoK) in a software engineering research area depend on the ability of researchers to discover the existing content in a BoK, e.g., empirical studies and experience reports investigating a set of hypotheses or research questions. Currently, online searching for content is supported by syntactic, text-based search on limited categories in specialized databases, such as digital libraries. However, support for semantic searching is limited and researchers may not discover all relevant content.

Although some effort has been spent on repositories for empirical studies (e.g., CeBASE [14] and ViSEK [15]), these repositories have not shown significant progress since their introduction. To our knowledge, there is no related work on using SLR-based study identification and KB integration for bottom-up BoK building to facilitate reuse and semantic search. Therefore, this section describes work related to the theoretical foundations of the main constructs in this research: Systematic Literature Reviews (Section 2.1) and Knowledge Base Design & Population (Section 2.2). Further, we provide an overview of research in the areas Software Inspections (Section 2.3) and Software Product Lines (Section 2.4), as a basis for the evaluation use cases.

2.1. Systematic Literature Reviews

Kitchenham and Charters [11] developed guidelines for performing Systematic Literature Reviews (SLRs) in the software engineering (SE) domain. Those guidelines state that the main reasons for conducting SLRs are (a) summarizing existing evidence concerning a treatment or technology; (b) identifying gaps in current research; and (c) providing background to appropriately position new research activities. The first reason is directly related to building BoKs by gathering evidence-based knowledge. In the context of this research, the main advantage of using SLRs is allowing systematically summarizing knowledge on a specific BoK scope and enabling incremental updates on top of previous SLRs. An example of such updates is available in [16], where four independent SLR trials were conducted to incrementally build evidence-based defect causal analysis guidelines.

Lessons learned from applying SLRs to the software engineering domain obtained from a broader scope of research are, for example, reported by Brereton et al. [5].

The SLR guidelines [11] summarize three main phases of a systematic review: (a) Planning the Review: identification of the need for a review, commissioning a review, specifying the research questions, and developing a review protocol; (b) Conducting the Review: identification of research, selection of primary studies, study quality assessment, data extraction and monitoring, and data synthesis; and (c) Reporting the Review: specifying dissemination mechanisms, formatting the main report, and evaluating the report. The PICO (population, intervention, comparison, outcome) strategy [17] is suggested [11] for detailing the research question elements in order to support developing the review protocol.

SLRs have become a widely used and reliable research method [6]. However, the main public result of an SLR is, in general, a specific research synthesis report by the authors. Unfortunately, reusable SLR packages that include the working material holding the accumulated knowledge (the data extracted from the selected studies, commonly stored in spreadsheets) are very seldom available. Therefore, new SLRs have to rebuild large parts of existing knowledge, making the addition of knowledge less efficient than possible. While considerable effort has been put into improving the SLR process and into conducting SLRs, there is surprisingly little work on making the knowledge in the intermediate results of SLRs accessible and reusable for the research community. Even if such knowledge were available, data integration mechanisms that make the knowledge available for further use by other BoK researchers have not been discussed in this context. In summary, the reuse value of SLR knowledge for building BoKs is limited.

2.2. Knowledge Base Design and Population

The process of designing and building a KB can be seen as a modeling activity [18]. For creating a KB, it is essential to capture domain knowledge through content-specific agreements, so both human and knowledge-based systems can access and use the information [18]. For this purpose, formal ontologies have been successfully used since the 1990s [19]. Ontologies can provide standard terminologies and rich semantics to facilitate knowledge sharing and reuse [18]. OWL DL (Web Ontology Language - Description Logic^a) is the most used language for ontologies as it has the capability of supporting semantic interoperability to exchange and share context knowledge between different systems, and keeps a balance between strong expressiveness and efficient automated processing. In addition, using ontologies enhances searching capabilities, because one may refer to precise semantic concepts rather than simple syntactic keywords, facilitating the use of the knowledge stored in the ontology [20].

Many methodologies have been proposed to design ontologies [21]. However, only a few of them consider collaborative and distributed construction of ontologies, such as the Collaborative Design Approach (CDA) [22], which addresses the issue of collaborative construction of the ontologies by identifying and involving a diverse panel of participants, such as a group of BoK researchers.

Once the ontology or the data model of the KB is defined, it is necessary to capture the extracted data from information resources in accordance with the KB data model. This process is called KB population and involves the creation, transformation, and integration of individuals (instances) into the KB. In our case, the information resources for creating the KB are published research reports. The KB population process may face integration problems if the information resources use heterogeneous representations of the same concepts [9]. The Interchange Standard Approach has been recommended as one of the best solution options for semantic integration [23]. Currently available tools to manage ontologies usually require a knowledge engineer with significant ontology expertise. Therefore, ontology non-experts need effective and efficient interfaces for both importing and exporting knowledge, as well as for querying [20].
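
The KB population step can be illustrated with a minimal Python sketch that reads extracted data in a spreadsheet-like CSV format, maps heterogeneous terms to one canonical concept, and creates one instance per row. The column names, the `Evidence` class, the term mapping, and the sample rows are hypothetical and serve only as an illustration; they are not the data model, the tooling, or the data used in this work, which relies on ontologies rather than on plain Python objects.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One KB individual (instance); the fields are illustrative assumptions."""
    study_id: str
    artifact: str
    method: str
    finding: str


# Hypothetical mapping of heterogeneous source terms to one canonical concept.
CANONICAL_TERMS = {
    "pbr": "perspective-based reading",
    "perspective based reading": "perspective-based reading",
    "cbr": "checklist-based reading",
}


def normalize(term: str) -> str:
    """Map a source term to its canonical concept name if one is known."""
    key = term.strip().lower()
    return CANONICAL_TERMS.get(key, key)


def populate(csv_text: str) -> list[Evidence]:
    """Create KB instances from rows exported from an extraction spreadsheet."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        Evidence(
            study_id=row["study_id"].strip(),
            artifact=normalize(row["artifact"]),
            method=normalize(row["method"]),
            finding=row["finding"].strip(),
        )
        for row in reader
    ]


# Placeholder rows; the findings are dummy strings, not real study results.
SAMPLE = """study_id,artifact,method,finding
S1,Requirements,PBR,example finding A
S2,requirements,Perspective based reading,example finding B
"""

kb = populate(SAMPLE)
```

In this sketch, both sample rows end up with the same canonical method name, which mirrors the integration problem of heterogeneous representations of the same concept described above.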

2.3. Software Inspections

Software inspections (SI) improve software product quality by the analysis of software artifacts, detecting defects for removal before these artifacts are delivered to later software life cycle activities [1]. The traditional SI process by Fagan [24] involves a moderator planning the inspection, inspectors reviewing the artifact, a team meeting to discuss and register defects, passing the defects to the author for rework, so they can be corrected, and a final follow-up evaluation by the moderator on the need of a new inspection.

Over the years, many contributions on SI have been proposed, including alternative SI processes, SI methods to improve the effectiveness and efficiency of inspectors in detecting defects, models and guidelines supporting tasks of the inspection process that involve decision making, as well as tool support [25]. Much knowledge on those contributions has been acquired by conducting empirical studies and can be considered part of a BoK in the SI area. However, to our knowledge, no recent SLR or evidence aggregation efforts have been conducted to organize such knowledge in the context of a BoK. The last well-known survey on software inspections was published in 2002 [26]. We also identified a recent mapping study on inspection experiments [27], which categorizes the experiments, but does not detail the experiment results and aggregations. Therefore, it still takes considerable expertise and effort to identify studies and study results relevant for a given domain concept in the scope of SI.

a Web Ontology Language - Description Logic: http://www.w3.org/TR/owl-ref/#OWLDL

To support the software inspection community in building up their BoK, specific BoK topics can help to define the scope of knowledge. The IEEE Software Engineering BoK (SWEBoK) [28] breaks down the Software Testing knowledge area into the following topics: Fundamentals, Test Levels, Test Techniques, Test-Related Measures, Test Process, and Software Testing Tools. SI relates to Test Techniques in the context of the SWEBoK. Nevertheless, they can be seen as a similar topic of interest and a hierarchical structure for them could be useful to facilitate organizing knowledge.

However, a fixed BoK topic structure, based on the SWEBoK, may be limiting, since it is possible to apply several variant options of SI. Laitenberger and DeBaud [29] provide some parameters that help define SI variant options based on an early literature review on SI experiments: SI artifacts (e.g., requirements, design, or code), SI process (e.g., SI with or without a group meeting), and SI methods (e.g., reading techniques). A combination of such a list of parameters could be used as a starting point for a flexible SI BoK topic structure.

2.4. Software Product Lines

Software Product Lines (SPLs) are a reuse-driven software development paradigm that aims at developing software products faster, at lower cost, and with better quality than when developing single systems [2]. SPLs have been successfully applied in several application domains (e.g., automotive, industry) [2][30].

The SPL BoK is growing at an increasing rate, i.e., more and more papers focusing on SPL are published both at specific venues (e.g., the International Software Product Line Conference Series (SPLC^b; www.splc.net) and the International Workshop Series on Variability Modelling of Software-Intensive Systems (VaMoS^c; www.vamos-workshop.net)) and at general, more broadly scoped software engineering venues (e.g., ASE, ESEC, ICSE, and SEKE). Several textbooks have been published, focusing on the process of SPL engineering [2][31][32] or on collecting research results [30][33]. Several institutions and researchers have made efforts to organize selected knowledge and provide it online to the community, e.g., the SEI^d and SPLOT^e. However, these collections have been created for particular goals, e.g., focusing on the SEI framework for SPL practice or on SPL online tools. None of these publications has analyzed the SPL BoK systematically. Instead, information has been added in an ad-hoc manner based on specific authors’ preferences and goals. Moreover, the existing collections cannot easily address the constant inflow of new knowledge.

Attempts have been made to systematically analyze parts of the SPL BoK, i.e., several SLRs and systematic mapping studies have been conducted since 2009, focusing on particular topics of the SPL research field, such as domain analysis [34], requirements engineering [35], variability management [36], product derivation [13], multi product lines [37], agile SPL engineering [38], and SPL testing [39][40][41].

b SPLC: http://www.splc.net

c VAMOS: http://www.vamos-workshop.net

d SEI: http://www.sei.cmu.edu/productlines/

e SPLOT: http://www.splot-research.org/

While these SLRs systematically collected relevant research on a particular SPL topic or address particular research questions, an important limitation is that only the results of the SLRs have been published, targeting a particular venue and addressing particular research questions and goals. Thus, the significant effort that went into the collection and analysis of hundreds of SPL publications and the associated information remains unavailable to the SPL community, making future extensions and additional analyses more difficult than necessary [42]. Therefore, the SLR reports only help to a limited extent to provide support for incremental knowledge building on SPL theory concepts and findings.

As for SI, to support the SPL community in building up their BoK, specific BoK topics can help define the scope of knowledge. Unfortunately, within SWEBoK [28] SPLs are only briefly mentioned in the Software Design knowledge area’s Software Structure and Architecture topic and no further tailoring on how to organize knowledge on SPLs is provided. A starting point for an SPL BoK topic structure could be obtained by analyzing the types of elements (theory concepts) used in a theoretical description of SPL engineering processes, e.g., by Pohl et al. [2].

3. Research Issues

The overall goal of the SKE approach is to enable incrementally building up BoKs by providing a process for knowledge acquisition and querying, including a method for BoK KB design and population with tool support. From this goal, we derive three research issues.

3.1. Research Issue RI-1: SKE Requirements Analysis

This research issue focuses on the stakeholder (researchers, community) needs concerning a BoK and on the most relevant stakeholder queries to knowledge on evidence in the BoK. More specifically, researchers want to synthesize BoKs in a particular research area but tend to focus on producing specific research reports and seem to spend much less effort on considering data management to provide their BoK community with suitable access to the created knowledge other than by published research reports [9]. Based on discussions with SLR researchers, the BoK user interface for data import should be based on spreadsheet technology and the user interface for querying should be based on web technologies to enable easy access by the BoK community. The queries of interest to researchers may vary according to the software engineering area (e.g., SI and SPL) to which the BoK relates. Therefore, we suggest conducting surveys with the specific BoK community to gather relevant queries. We conducted informal surveys with SI and SPL researchers. We are aware of the limitations of these surveys and see their outcomes as preliminary working results, which are still useful to drive the SKE research at this stage and can be extended by future, more formal surveys.


3.2. Research Issue RI-2: SKE Process and KB Design

This research issue investigates how the well-established traditional SLR process can be adapted to better support incrementally building BoKs focusing on the data elements that are necessary to consistently capture evidence of a specific type (e.g., types of empirical studies, experience reports) and on a specific topic (e.g., SI and SPL) to enable answering relevant stakeholder queries.

The SKE process builds on the traditional SLR process [11] and on the Collaborative Design Approach (CDA) [22] for knowledge engineering. The key idea is to loosen the tight connection between the SLR process steps data extraction and data synthesis in order to allow collecting knowledge from research papers in a KB, as input to a range of research synthesis methods in a BoK community. A second key goal is to enable incrementally building up knowledge in the context of a BoK. We evaluate the resulting SKE process regarding feasibility, effort, benefits, and risk for building BoKs in comparison to the traditional SLR process baseline.

Concerning the KB design, SKE aims at a common KB model that captures concepts both from the BoK domain (e.g., SI and SPL) and from the selected types of studies (e.g., controlled experiments and experience reports), to allow applying a variety of semantic queries according to specific goals when analyzing the evidence.

3.3. Research Issue RI-3: SKE Tool Support

Finally, this research issue concerns the functions that are necessary to automate key steps in the SKE process, i.e., efficient data integration and querying. Note that the integration of separate BoK KB instances is not yet considered in the context of this research issue and is left for future work.

For automating the SKE process a knowledge base (KB) using semantic technology [12] is a major element to provide the desired semantic capabilities. The SKE KB is based on semantic technology with ontologies. Using ontologies makes the KB model extensible and facilitates semantic search [20]. Based on the requirements coming from RI-1 the user interface for data import should be based on spreadsheet technology and the interface for querying should be based on web technologies.

3.4. Evaluation Approach

For evaluation, we conducted two proofs of concept. First, we applied the SKE process and tool support for building a Software Inspection (SI) BoK KB with knowledge acquired from published experiments identified in digital libraries [10]. Thereafter, to strengthen initial indications of feasibility, we applied SKE in another domain and to a different type of evidence to build a Software Product Line (SPL) BoK KB with knowledge acquired mainly from published experience reports identified in a previously conducted SLR. The differences between the two applications are shown in Table 1.

Table 1. SKE case studies for building BoK KBs

BoK Scope               Type of Evidence        Information Source
Software Inspection     Controlled Experiments  Digital Libraries
Software Product Lines  Experience Reports      SLR [13], including private SLR work documents

During these SKE proofs of concept, we evaluate the feasibility of applying the process and tool support and the required effort, when compared to building BoKs following the traditional SLR process.

Concerning feasibility, during the proofs of concept, we evaluate the ability to build the two BoK KBs following the process (RI-2) and tool support (RI-3) and the effectiveness in answering the identified stakeholder queries (RI-1). Concerning effort, we measured the person hours required for applying each SKE phase and compared them to the effort of conducting SLRs.

Based on the evaluations and on our previous experiences conducting SLRs we discuss benefits and risks of applying SKE to incrementally build and provide access to knowledge in a BoK as well as lessons learned.

The SKE process is described in Section 4. The evaluations of SKE when building the Software Inspection BoK KB and the Software Product Line BoK KB are detailed in Sections 5 and 6, respectively. The discussion of the evaluation results is provided in Section 7.

4. Systematic Knowledge Engineering

This section presents an extended description of the SKE process [10]. Therefore, the following subsections provide details on how each research issue (requirements analysis, SKE process and KB design, and SKE tool support) was addressed.

4.1. SKE Requirements Analysis (RI-1)

Fig. 2 shows how the SKE approach addresses the challenges posed in Fig. 1 by introducing a BoK KB and the role of a knowledge engineer. In this context, (1) BoK researchers extract data from research reports published in digital libraries and have that extracted data (SKE Extracted Data) integrated into the BoK KB by the knowledge engineer; and (2) the collected knowledge is then available for semantic querying (Semantic Queries) and export (KB Query Data) from the KB, also to the general readership, including other researchers and practitioners. Consequently, stakeholders can formulate needs that become semantic queries (e.g., “Which is the most effective inspection method for requirements inspections?”, “Which are the main findings concerning SPL product derivation?”) to the BoK KB based on well-known BoK theory concepts (e.g., effectiveness, inspection method, findings, product derivation). Therefore, an important part of such a BoK KB is a glossary to define the precise terminology and synonyms of the BoK theory concepts, including variants of term definitions that may occur in different parts of a research area.
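
To illustrate how a glossary with synonyms can turn a stakeholder need into a concept-based query, the following minimal Python sketch resolves query terms to canonical concepts before matching KB entries. The glossary entries, the record structure, and the function names are assumptions made for illustration only; the SKE KB itself relies on ontologies and semantic technology rather than on Python dictionaries.

```python
# Hypothetical glossary: canonical BoK theory concept -> known synonyms.
GLOSSARY = {
    "inspection method": {"reading technique", "defect detection method"},
    "effectiveness": {"defect detection rate"},
}

# Hypothetical KB records, each annotated with canonical concepts.
RECORDS = [
    {"id": "E1", "concepts": {"inspection method", "effectiveness"}},
    {"id": "E2", "concepts": {"product derivation"}},
]


def to_concept(term):
    """Resolve a term or one of its synonyms to the canonical concept."""
    term = term.strip().lower()
    for concept, synonyms in GLOSSARY.items():
        if term == concept or term in synonyms:
            return concept
    return term  # unknown terms are treated as their own concept


def query(records, *terms):
    """Return the ids of records annotated with all requested concepts."""
    wanted = {to_concept(t) for t in terms}
    return [r["id"] for r in records if wanted <= r["concepts"]]


hits = query(RECORDS, "Reading technique", "effectiveness")
```

A purely syntactic keyword search for "reading technique" would miss record E1 in this sketch, whereas the glossary lookup maps the term to the concept under which the record is annotated.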

Based on discussions with SLR researchers, the BoK KB user interface for data import should be based on spreadsheet technology (commonly used when conducting SLRs) and the interface for querying should be based on web technologies to enable easy access for a whole community. The stakeholder queries of interest may vary according to the software engineering area (e.g., SI and SPL). Therefore, we suggest conducting surveys with the specific BoK community to gather relevant queries. However, the set of queries should be extensible and based on semantic technologies, so the knowledge engineer can always add new queries of interest and tailor them to retrieve precise semantic results based on well-defined domain concepts from both the software engineering area (e.g., SI or SPL) and the types of evidence (e.g., experiment or experience report).

Fig. 2. The SKE process stakeholders and technology, based on [10].

4.2. SKE Process and KB Design (RI-2)

Fig. 3 compares the phases and stages of the SLR [11] and SKE processes [10]. While the first phase (Planning the Review/BoK Creation) – is similar in both processes, the phases 2 (Modeling the BoK KB) and 4 (Populating the BoK KB) are new in the SKE process and require the participation of the knowledge engineer. The key innovation of the SKE process comes (a) from decoupling the process step data extraction from data synthesis and (b) from integrating extracted data into a KB designed for BoK building.

This SKE innovation aims at making the extracted data available to the BoK research community, rather than using the data to apply a particular synthesis method for answering a specific SLR research question in the format of a SLR report. Thus, the SKE approach allows the community to extend the knowledge gathered during data extraction and to reuse it, with the KB’s semantic search facilities, as building blocks for a variety of analyses (e.g., aggregated analysis of the evidence on specific BoK theory concepts). Data synthesis and reporting follow the SKE process and build on the SKE results, i.e., the BoK KB with instances and query access.

Details on each of the four SKE process phases and how they can be applied to build a BoK KB follow (see Sections 5 and 6 for examples of the application of the SKE process to build a SI and a SPL BoK).

4.2.1. SKE Phase 1: Planning BoK Creation

Similar to conducting a SLR, the first SKE phase starts with identifying the need and commissioning the creation of the BoK. Since SKE has a predefined purpose of building a BoK, instead of specifying research questions, SKE just needs to specify the BoK topics (e.g., SI or SPL) and the types of evidence of interest (e.g., controlled experiments or experience reports). Then, the SKE protocol is built based on a specific configuration of the PICO (population, intervention, comparison, outcome) strategy [17] to derive search strings that can be applied to digital libraries in the “P and I and C and O” format. In this configuration, the population represents the specified BoK or some of its topics. The intervention represents the specified empirical study types. The comparison is blank (unless the BoK scope of interest is reduced to a specific comparison between topics) and the outcome represents mandatory elements of interest to extract from the research papers (e.g., hypotheses, research questions, findings).

As in a SLR, the protocol includes the search strategy with the definition of sources of primary studies (e.g., digital libraries), the study selection criteria and procedure, the quality assessment procedures, and the data extraction strategy.

Systematic Literature Review
  1. Planning the Review: identifying the need for a review; commissioning a review; specifying the research question(s); developing the review protocol.
  2. Conducting the Review: identification of research; selection of primary studies; study quality assessment; data extraction and monitoring; data synthesis.
  3. Reporting the Review: specifying dissemination; formatting main report; evaluating the report.

Systematic Knowledge Engineering
  1. Planning BoK Creation: identifying the need for the BoK; commissioning BoK creation; specifying the BoK topic and evidence type; developing the protocol.
  2. Designing the BoK KB: designing the KB common data model.
  3. Conducting Data Extraction: identification of research; selection of primary studies; study quality assessment; data extraction and monitoring.
  4. Populating the BoK KB: integrating data extraction into KB; providing KB query facilities.
  Following the SKE process: Data Synthesis + Reporting the Review.

Fig. 3. Side-by-side comparison of the SLR [11] and SKE processes, based on [10].

4.2.2. SKE Phase 2: Designing the BoK KB

Once planning is accomplished, all the necessary information for designing the BoK KB data model is available. This phase is not included in the SLR process, although, besides being necessary for building the KB, we believe that a data model also improves the reasoning for structuring the data to be extracted from the selected papers. If SKE is applied for extending an existing BoK KB, this phase may become just a review of the previously designed KB data model. For designing the KB data model collaboratively with the support of the knowledge engineer, the Collaborative Design Approach (CDA) [22] for knowledge engineering should be considered.

Fig. 4 shows a high-abstraction view on the context in which evidence is generated with three connected parts and elements to be considered for tailoring the data model.

Basically, researchers of different research groups provide publications containing evidence on certain topics in a BoK. The evidence is based on certain data/artifacts, which are typically not made available to the community when conducting a SLR. All the elements shown in Fig. 4 are connected, at least indirectly, which allows in-depth querying (e.g., evidence on specific BoK topics, research groups active in a BoK). For example, when considering the SPL BoK, a query on the BoK Topic “product derivation” returns all related research papers and also reveals which research groups are working on this particular topic and which artifacts (e.g., requirements, architecture, code, tests) in the SPL engineering process are addressed by which approaches.

[Figure elements: Research Group, Publication, Evidence, Data/Artifacts, BoK Topic, BoK]

Fig. 4. Major areas of the SKE data model.

Thus, three connected model parts should be designed in this phase: one part concerning the BoK and the tailoring of its topics (e.g., Software Inspection, Software Product Lines, and their topic tailoring); a second part concerning the evidence domain concepts of the type of evidence of interest (e.g., controlled experiments, case studies, experience reports); and a third part on the research groups and the publications they produce.

The reason for using three connected model parts is that, while the modeling of the BoK and its topics (white part in Fig. 4) is likely to change significantly from one SKE application to another, the modeling for similar types of evidence (dark grey part in Fig. 4) should be fairly stable, and the model part on research groups and publications (light grey part in Fig. 4) is expected to stay even more stable and can potentially be directly reused.

Note that for applying SKE the design of the KB should be done based on the BoK topics and the type of evidence of interest, not directly on query needs. Modeling the KB design in a flexible (query-agnostic) way is likely to facilitate better integration of related BoK KB instances as needed.
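The three connected model parts can be illustrated with a minimal Python sketch. It is not the KB data model itself (which the source implements as an ontology); the class names, fields, and the helper `groups_working_on` are assumptions introduced only to show how evidence links BoK topics to publications and research groups, so that a topic query can reach the groups working on it.

```python
from dataclasses import dataclass, field


# Part 1 (BoK specific): the BoK and its topics.
@dataclass(frozen=True)
class BoKTopic:
    bok: str   # e.g., "Software Product Lines"
    name: str  # e.g., "product derivation"


# Part 3 (expected to be the most stable): research groups and publications.
@dataclass(frozen=True)
class ResearchGroup:
    name: str


@dataclass
class Publication:
    title: str
    groups: list  # list of ResearchGroup


# Part 2 (fairly stable per type of evidence): evidence reported in a publication.
@dataclass
class Evidence:
    kind: str                 # e.g., "experience report"
    publication: Publication
    topics: list              # list of BoKTopic
    artifacts: list = field(default_factory=list)


def groups_working_on(evidence_items, topic_name):
    """Return the sorted names of research groups with evidence on a topic."""
    names = set()
    for item in evidence_items:
        if any(topic.name == topic_name for topic in item.topics):
            names.update(group.name for group in item.publication.groups)
    return sorted(names)
```

Because the function only follows the links between the three parts, the same traversal would work for any BoK whose topics are stored in the first part.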

4.2.3. SKE Phase 3: Conducting Data Extraction

The data extraction phase follows the search, selection, and assessment strategies in the protocol for extracting relevant data. A spreadsheet should be prepared for this phase to gather relevant data, according to the information to be loaded into the KB data model. The specific format of the spreadsheet is not important, as data integration and mapping to the domain concepts of the data model can be conducted by the knowledge engineer by using the Interchange Standard Approach [23]. During data extraction, the researcher may also add entries for relevant identified domain concepts, their synonyms and related concepts in the online glossary tool.

As data synthesis in SKE is decoupled from data extraction, the goal is integrating the data into a KB, so the BoK community can reuse the knowledge afterwards by applying semantic queries. Thus, the next SKE phase concerns creating and populating the BoK KB.

4.2.4. SKE Phase 4: Populating the BoK KB

In this phase, the knowledge engineer has to integrate the extracted data into the BoK KB. For the integration of data from heterogeneous data sources (e.g., from differently structured spreadsheets) the Interchange Standard Approach [23] can be applied. It allows mapping extracted data elements to precisely defined domain concepts included in the KB data model (e.g., research questions, hypotheses, response variables) and conducting the integration in a flexible way.
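The idea of mapping differently structured spreadsheets to shared domain concepts can be sketched as follows. This is a simplified illustration only, not the Interchange Standard Approach [23] itself; the file names, column headers, and the function `to_common_model` are hypothetical.

```python
# Hypothetical per-spreadsheet mappings: local column header -> domain concept.
MAPPINGS = {
    "team_a.xlsx": {
        "RQ": "research_question",
        "Hypo": "hypothesis",
        "Resp. var": "response_variable",
    },
    "team_b.xlsx": {
        "Research question": "research_question",
        "H0/H1": "hypothesis",
        "Metric": "response_variable",
    },
}


def to_common_model(source, rows):
    """Rename the columns of each row to the shared domain concepts."""
    mapping = MAPPINGS[source]
    unified = []
    for row in rows:
        record = {"source": source}
        for column, value in row.items():
            concept = mapping.get(column)
            if concept is not None:  # columns without a mapping are skipped
                record[concept] = value
        unified.append(record)
    return unified
```

With such a mapping per source, the spreadsheet layout can differ between extraction teams while the records loaded into the KB share one vocabulary.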

After population, the knowledge engineer is also responsible for providing query facilities. Those queries should reflect the stakeholder information needs, which could be gathered in a survey (see Section 4.1 on RI-1), so the query outcomes can be used as building blocks for further analyses by researchers in and beyond a work group. Concrete examples of such stakeholder queries on SI and SPLs can be found in Sections 5 and 6.

4.3. SKE Tool Support (RI-3)

Fig. 5 provides an overview on the architecture of the SKE tool support. The KB was implemented using the Protégé framework^f and uses semantic technology with ontologies to facilitate semantic searches. Besides the KB, the tool support comprises a spreadsheet data contribution interface and a web prototype for querying. The data contribution interface was automated in Java by using a spreadsheet reader library (Apache POI^g) and an ontology library (Apache Jena^h). The Interchange Standard Approach [23] enables the integration of heterogeneously structured data. The queries of the web prototype were implemented using the SPARQLⁱ query language.

f Protégé: http://protege.stanford.edu/
g Apache POI: http://poi.apache.org/
h Apache Jena: http://jena.apache.org/
i SPARQL: http://www.w3.org/TR/rdf-sparql-query/

Using ontology-specific features, the knowledge engineer enhanced the KB by implementing semantic search functions (e.g., searching on domain concepts, their synonyms and related concepts). Additionally, given the lack of an agreed terminology in many software engineering knowledge areas, a glossary tool was implemented, which allows defining precise terminology and synonyms for relevant BoK theory concepts. Thus, the semantic query results (e.g., searching for synonyms and related concepts), when searching for specific BoK theory concepts, can be improved by providing synonyms for theory concepts in the online glossary tool.
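How glossary entries can widen a search may be sketched in plain Python. The source implements this with ontology features and SPARQL; the dictionary layout and the function `expand_concept` below are assumptions made for illustration, and the single glossary entry mirrors an example given later in the paper (PBR as a synonym of Perspective-Based Reading, Scenario-Based Reading as a related concept).

```python
# Hypothetical glossary structure; the real glossary content is not reproduced here.
GLOSSARY = {
    "perspective-based reading": {
        "synonyms": ["PBR"],
        "related": ["scenario-based reading"],
    },
}


def expand_concept(concept, glossary, include_related=True):
    """Return the set of lower-cased terms to search for a given concept."""
    key = concept.lower()
    terms = {key}
    for name, entry in glossary.items():
        synonyms = [s.lower() for s in entry["synonyms"]]
        # The concept may be the entry name itself or one of its synonyms.
        if key == name or key in synonyms:
            terms.add(name)
            terms.update(synonyms)
            if include_related:
                terms.update(r.lower() for r in entry["related"])
    return terms
```

Adding a synonym to the glossary then changes the query results without touching the query itself, which is the tuning effect described above.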

Fig. 5. Overview on the architecture of the SKE tool support.

Having provided this overview on the SKE process and its tool support, the next sections describe the experience of applying them to build the SI and SPL BoK KBs.

5. Proof of Concept 1: Applying SKE to build a Software Inspection BoK KB

We applied the SKE process phases with tool support to build a SI BoK KB based on knowledge acquired from controlled experiments. During this experience, information was extracted from 30 typical research papers and integrated into the KB. While the authors of this paper provided the SKE process and data model design, the data extraction was conducted by an independent experimentation expert team. In the subsections hereafter we describe the identified relevant queries, how SKE was applied, the resulting SI BoK KB, and the evaluation results regarding feasibility and effort.

5.1. Relevant Queries on Software Inspection Experiments

As suggested in Section 4, the relevant BoK KB stakeholder queries were identified by conducting a survey. We focused on SI researchers as main stakeholders, who need to be aware of relevant research in their area. Based on an informal survey with SI researchers in six research groups (located in Austria, Brazil, Chile, Ecuador, and Spain), we identified a set of query candidates. The selection of the most relevant queries was based on a limited budget of value points, which each stakeholder could spend on the query candidates. Then the query candidates were sorted in descending order by the total number of points of each query. Overall, ten researchers from the six research groups contributed to this selection process. The six most relevant stakeholder queries were:

Q1 Software Inspection Methods. Which inspection methods were effective (or efficient) in finding defects in requirements artifacts?

Q2 Software Inspection Experiment Results. What are the results of experiments that report on a given BoK Topic, e.g., inspection method “Perspective-Based Reading” (PBR)?

Q3 Software Inspection Experiment Overview. Which experiments were conducted with a given response variable, e.g., number of defects?

Q4 Software Inspection Hypotheses. Which hypotheses have been investigated on a given domain concept (and its synonyms), e.g., effectiveness?

Q5 Software Inspection Synonyms. Which synonyms have been used for a given domain concept, e.g., effectiveness?

Q6 Software Inspection Research Groups. Which research groups are working on certain BoK Topics with a given response variable, e.g., efficiency?
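The value-point selection that produced this list can be sketched as follows. The size of the point budget and the individual votes are not reported in the source, so the function `rank_queries` and any input passed to it are purely illustrative assumptions.

```python
from collections import Counter


def rank_queries(votes, top_n):
    """Sum the points each stakeholder spent and return the top-ranked queries.

    votes: list of dicts, one per stakeholder, mapping a query candidate
    to the number of value points that stakeholder spent on it.
    """
    totals = Counter()
    for ballot in votes:
        totals.update(ballot)  # adds the points per query candidate
    # Sort by total points in descending order; ties are broken alphabetically.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]
```

Calling `rank_queries(votes, 6)` would correspond to keeping the six most relevant queries.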

While gathering the queries we noticed great variation in the terms used by different researchers for similar concepts, highlighting the need for semantic querying facilities based on well-defined domain concepts (e.g., allowing to search on synonyms and related concepts).

5.2. Applying the SKE process

In the following subsections we describe the experience of applying each SKE phase to build the SI BoK KB. More details on the resulting SI BoK KB and the query results are provided in the next section.

Planning BoK Creation. As mentioned in Section 4, since SKE has a predefined purpose of building a BoK, instead of specifying research questions, SKE just needs to specify the BoK (topics) and the types of evidence of interest. In the case of building the SI BoK KB, the BoK was software inspection and the empirical studies of interest were controlled experiments. Then, the SKE protocol was built based on the suggested configuration of the PICO (population, intervention, comparison, outcome) strategy, with the population as software inspection, intervention as controlled experiments and outcome as experimental study results.

The protocol also includes the search strategy with the definition of sources of primary studies (e.g., digital libraries), the study selection criteria and procedure, the quality assessment procedures, and the data extraction strategy. In our case, a single digital library was chosen: Scopus, which claims to be the largest database of abstracts [11] and seems sufficient for our study purpose. The study selection and quality assessment criterion was that the study should be an experiment published in a peer-reviewed publication medium. The search string to be applied on Scopus was derived from the synonyms used for the PICO elements, adding two specific operators available at Scopus: (i) TITLE-ABS-KEY, which restricts the search to title, abstract, and keywords and thus avoids searching in the reference metadata, which could lead to a significant amount of false positives concerning papers that only cite inspection experiments; and (ii) W/2, allowing a distance of up to two words between keywords. The resulting search string was:

TITLE-ABS-KEY ((software W/2 (inspection OR "defect detection" OR "reading technique"))
  AND ("experimental study" OR "experimental evaluation" OR "experiment" OR "empirical study" OR "empirical evaluation")
  AND ("hypothesis" OR "evidence" OR "finding" OR "result"))

Note that if there is already a BoK KB, the synonyms for the PICO elements can be derived directly from it by running a query to list the glossary synonym entries for a specific domain concept.
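Deriving such a search string from lists of PICO synonyms can be sketched as follows. The function names and parameters are assumptions; TITLE-ABS-KEY and W/n are the Scopus operators named above. Unlike the string shown above, this sketch quotes every term, which is only a cosmetic difference.

```python
def or_group(terms):
    """Join synonyms into a parenthesized group of quoted terms."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_search_string(population, intervention, outcome,
                        comparison=(), context="software", distance=2):
    """Combine the PICO synonym lists in the "P and I and C and O" format."""
    # Population: synonyms must occur within `distance` words of the context word.
    parts = [f"({context} W/{distance} {or_group(population)})"]
    parts.append(or_group(intervention))
    if comparison:  # the comparison is blank unless the scope is narrowed
        parts.append(or_group(comparison))
    parts.append(or_group(outcome))
    return "TITLE-ABS-KEY (" + " AND ".join(parts) + ")"
```

If a BoK KB already exists, the synonym lists passed to `build_search_string` could come from a glossary query instead of being typed by hand.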

Designing the Software Inspection BoK KB. SI experts and the knowledge engineer designed the data model applying CDA [22]. More details on the KB data model for hosting information on software inspection experiments follow. As suggested by SKE, we designed the related data model parts. Consistent with Fig. 4, throughout this section the part of the model concerning the BoK and its topics (in this case related to software inspections) will be shown in white, the part concerning the type of evidence (in this case controlled experiments) will be shown in dark grey, and the part on the research groups and publications will be shown in light grey.

Fig. 6 shows how experimental evidence was modeled in UML. We decided to model in UML (readable for non-ontology experts) and then have the knowledge engineer define the concepts manually in the ontology using Protégé (see also [43] for direct UML-to-OWL transformation). The UML model was based on the elements presented in Fig. 4 and on experimental concepts described by Wohlin et al. [4]. Note that this part of the model may be reused for building other BoKs based on evidence from controlled experiments.

Fig. 6. Software inspection experiment KB data model overview.

To organize the aggregated knowledge in a flexible way and to link the evidence to inspection BoK topics, each topic was designed as relating to a set of inspection parameters, extended from the list of parameters discussed by Laitenberger and DeBaud [29]. For instance, a topic may represent knowledge on requirements (artifact) inspections applying PBR (i.e., perspective-based reading, a specific inspection and reading technique [44]) (method) during individual preparation (activity) when conducting Fagan inspections [24] (process). This BoK topic configuration is shown in Fig. 7. Note that this kind of design is BoK specific and cannot be reused for different BoKs.

The resulting model allows querying evidence acquired from experimental studies. Such queries can, for instance, list hypotheses of experiments related to specific BoK topic parameters (or their synonyms), the results for each hypothesis in the available experiment runs (confirmed/rejected), and information on their statistical confidence. Moreover, measurements that led to each of those results can also be obtained.

Fig. 7. Empirical studies linked to inspection BoK topics.

For data extraction, a spreadsheet was prepared to gather relevant experiment data, according to the information to be loaded into the KB data model on inspection experiments. This information concerns the entities shown in Fig. 6 and Fig. 7 and their attributes. The complete data model (including the attributes) of the SI BoK KB is available in the online prototype, which is presented in Section 5.3.

Conducting Data Extraction. During data extraction, the search, selection, and assessment strategies in the protocol for extracting relevant data are followed. In our case, we executed the search string in Scopus in December 2013. Overall, 156 papers were retrieved. After filtering by title and abstract, a set of 102 papers containing experiments on software inspections was identified, published between 1985 and 2013.

A sample consisting of the 30 most recent experiments (published between 2006 and 2013) was chosen as a basis for data extraction. Six independent local EMSE experts extracted information from those papers into spreadsheets, with an extra expert acting as a data checker. Data extraction took on average about 2 person hours per paper. Data checking took an additional 0.5 person hours per paper.


Populating the Software Inspection BoK KB. In this phase the knowledge engineer (another author of this paper) conducted the integration of the data from the local spreadsheets into the SI BoK KB data model and implemented the requested stakeholder queries. Using the spreadsheet data contribution interface, data from 6 spreadsheets, containing data on 30 experiments (5 different experiments in each) and including over 6,000 data elements (cells), were integrated into the KB in less than 3 minutes.

Concerning the queries, the knowledge engineer formulated them in the KB language so that results could be obtained. The simple web interface prototype (see Fig. 8) allows stakeholders to easily retrieve query results within a few seconds.

5.3. The Resulting Software Inspection BoK KB

It was feasible to build the SI BoK KB by applying SKE. The resulting SI BoK KB was effective in answering the requested stakeholder queries and its prototype is available online^j. The KB queries use semantic features: when a specific domain concept is provided as parameter (e.g., “Perspective-Based Reading”), the queries also retrieve results on similar concepts by considering the synonyms (e.g., “PBR”) and related concepts (e.g., “Scenario-Based Reading”) entered for the domain concept in the glossary, which is also available online^c. Thus, query results can be tuned by adjusting entries on domain concepts, synonyms and related concepts in the glossary.

To verify the query results, an independent researcher built a set of query test cases based on the 30 papers included in the KB. Queries Q2 to Q6 could be directly formulated, listing experiment results, experiments, hypotheses, synonyms, and research groups. The related test cases (including tests on synonym search capabilities) passed successfully with the query results, which can be accessed in the online prototype.

However, query Q1: “Which inspection methods were effective (or efficient) in finding defects in requirements artifacts?”, which addresses a practitioners’ need, could not be answered directly, as this question needs expert data analysis on top of identifying appropriate experiment data and results. Thus, it had to be translated into terms of the underlying data model, to provide the information that could support such analysis. The first decision was to focus on experiments that reported on effectiveness or efficiency (or synonyms) in their hypotheses or response variables and that were related to BoK topics associated with inspection methods and with the artifact type requirements. To allow semantic search on synonyms, the additional domain concepts KB technology and the glossary were used. The test case built for this query was related to identifying the right set of experiments and passed successfully. Then, to answer the query, it was split into two separate queries that would provide an overview on the knowledge of interest out of those experiments. Q1.1 was to list the hypotheses and their results in all experiment runs. Q1.2 was to show the findings of all papers related to them.
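The translation of Q1 into a selection of experiments plus the listing query Q1.1 can be sketched in plain Python. The source realizes these queries in SPARQL on the ontology; the record layout (dictionaries with `id`, `topic`, `hypotheses`, and `response_variables`) and the function names are assumptions, and the substring matching is a deliberate simplification.

```python
def matches_q1(experiment, concept_terms):
    """Check whether an experiment belongs to the set of interest for Q1."""
    texts = list(experiment["hypotheses"]) + experiment["response_variables"]
    # The concept (or a synonym) must occur in a hypothesis or response variable.
    reports_concept = any(
        term in text.lower() for text in texts for term in concept_terms
    )
    topic = experiment["topic"]
    on_requirements = topic.get("artifact") == "requirements"
    has_method = bool(topic.get("method"))
    return reports_concept and on_requirements and has_method


def q1_1(experiments, concept_terms):
    """Q1.1: list each hypothesis with its results in all experiment runs."""
    rows = []
    for exp in experiments:
        if matches_q1(exp, concept_terms):
            for hypothesis, run_results in exp["hypotheses"].items():
                rows.append((exp["id"], hypothesis, run_results))
    return rows
```

The set `concept_terms` would hold lower-cased terms such as "effective" and "efficiency" together with their glossary synonyms.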

j Online Software Inspection BoK KB Prototype and Glossary: http://cdlflex.org/prototypes/ske

Fig. 8 shows results of Q1.1, enabling stakeholders to get an overview on experimental investigations (the hypotheses of the 6 identified experiments) on the effectiveness of requirements inspection methods and their results. Note that the screenshot shows 6 out of 34 experiments, as the KB has been growing with new contributions since the initially imported 30 experiments. Stakeholders can also see other focused and interesting information, for instance, that defects detected with PBR are more evenly distributed over the document, when compared to using checklists. Regarding Q1.2, it provided an insightful overview on findings of the papers on effectiveness of inspection methods. Thus, the verification of all required queries passed successfully in the scope of the SKE research prototype.

Fig. 8. Prototype screenshot answering query Q1.1.

5.4. Evaluation

In this section, we describe the evaluation results concerning SKE process feasibility and effort obtained during the experience of applying SKE to build the SI BoK KB. A discussion on benefits and risks of applying SKE is provided in Section 7.

SKE Process Feasibility. We evaluated the feasibility of applying each SKE phase (planning, designing the BoK KB, data extraction, and populating the BoK KB). During planning, the SKE PICO configuration effectively supported the identification of relevant experiments on software inspections. The accuracy of the derived search string was evaluated by comparing the retrieved list of papers against a well-known survey on software inspections [26] (published in 2002) and a recently published systematic mapping of experiments on the software inspection process [27]. The experiments reported in these papers and indexed in Scopus were successfully retrieved. Therefore, we see the identified papers as a representative set for inspection experiments.

Regarding the design and data extraction phases, data extraction into spreadsheets containing information on software inspection experiments based on the designed KB data model was successfully achieved. Thus, the designed data model for software inspection experiments was effective in characterizing the inspection experiments (modeling similarities and variations) and their results.

Finally, concerning populating the BoK KB, it was possible to create the software inspection KB from the published experiment reports. The knowledge engineer conducted the integration of the data from the local spreadsheets into the KB data model, by using the spreadsheet contribution interface. The resulting Software Inspection BoK KB is available for querying through the web interface prototype^j and was effective in providing accurate answers to the identified stakeholder queries.

SKE Process Effort. Based on our previous SLR experiences, we found the overall effort of applying the SKE process (around 104 person hours) comparable to the effort of conducting SLRs. However, the effort of extending the SKE results is likely to be considerably lower, especially for other researchers, who can directly reuse and extend the extracted data, which is currently not publicly available in most SLRs.

Table 2 shows the effort spent on each SKE phase to build the SI BoK KB and an informal effort comparison to the corresponding SLR phase. Planning takes slightly less effort with SKE, since SKE applies a predefined PICO configuration. Designing a KB is not considered in SLRs. Although in SKE data extraction from primary studies probably takes somewhat more effort than for the SLR (to gather relevant data based on the type of evidence for flexible querying instead of specific data for answering pre-defined research questions), the overall conduct phase takes about the same effort, since SKE does not apply data synthesis in this phase. Finally, populating the BoK KB is not considered in SLRs.

Table 2. SKE SI BoK KB process effort.

| SKE Phase | Effort (person hours) | Effort Description | SKE vs. SLR |
|---|---|---|---|
| Planning BoK Creation | 8 | Building the protocol. | ↓ (-10% to -5%) |
| Designing the BoK KB | 16 | Designing the data model. | N/A |
| Conducting Data Extraction | 80 | Filtering primary studies and data extraction and checking. | ↔ (-5% to +5%) |
| Populating the BoK KB | < 0.05 | KB data integration (automated). | N/A |
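The reported figures can be cross-checked with a short calculation. The numbers are taken from Table 2 and from the data extraction description; attributing the remaining hours of the data extraction phase to filtering primary studies is an inference, since the per-paper values are reported as averages.

```python
PAPERS = 30
EXTRACTION_HOURS_PER_PAPER = 2.0  # reported average
CHECKING_HOURS_PER_PAPER = 0.5    # reported average

# Effort for extracting and checking the 30 sampled papers.
extraction_and_checking = PAPERS * (
    EXTRACTION_HOURS_PER_PAPER + CHECKING_HOURS_PER_PAPER
)

# Effort per SKE phase as listed in Table 2 (person hours).
phase_effort = {
    "Planning BoK Creation": 8,
    "Designing the BoK KB": 16,
    "Conducting Data Extraction": 80,
    "Populating the BoK KB": 0.05,  # upper bound, reported as "< 0.05"
}

total_effort = sum(phase_effort.values())

# Inferred: hours of the data extraction phase not explained by
# extraction and checking, presumably spent on filtering primary studies.
remaining_hours = phase_effort["Conducting Data Extraction"] - extraction_and_checking
```

The total is consistent with the "around 104 person hours" stated for the overall SKE effort.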

Note that SKE required some setup effort by the knowledge engineer for creating the spreadsheet importer, providing the query facilities, and developing a suitable user interface. Table 3 shows this setup effort. However, for future SKE applications this setup effort is expected to be considerably lower, i.e., only adjustments of the spreadsheet importer and maybe some polishing of the user interface might be required.

Table 3. Knowledge engineer setup effort.

| Knowledge Engineer Task | Effort (person hours) | Effort Description |
|---|---|---|
| Creating BoK KB ontology model | 16 | Specifying the ontology model based on the data model. |
| Spreadsheet Importer Creation | 40 | Developing the spreadsheet importer using Apache Jena and POI. |
| Query creation | 32 | Translating queries into SPARQL (and testing if they retrieve the right results based on the integrated data). |
| UI Development | 40 | Developing a querying user interface. |

6. Proof of Concept 2: Applying SKE to build a Software Product Line BoK KB

Motivated by the positive results obtained from applying SKE to build a SI BoK KB based on knowledge acquired from controlled experiments, we decided to further evaluate SKE to strengthen the initial indications of feasibility, switching the domain and the type of evidence. Therefore, we built on a previously conducted SPL SLR [13] with knowledge acquired mainly from published experience reports. During this experience, one of the authors of the original SLR was provided with the SKE process and data model design and again extracted detailed information from 74 papers included in the original SLR, for data import and query resolution by the knowledge engineer. As done for the SI BoK KB, in the subsections hereafter we describe the identified relevant queries, how SKE was applied, the resulting SPL BoK KB, and the evaluation results.

6.1. Relevant Queries

This time we focused on SPL researchers as main stakeholders and informally discussed potentially useful queries with five of them at VaMoS 2014 to elicit relevant queries for accessing research evidence in a SPL BoK. The idea was not to come up with a complete set of queries but just to demonstrate the feasibility of such queries in the context of the SPL BoK KB. Other researchers can easily define and request the implementation of additional queries. As a result, we came up with the following four relevant queries:

Q1 Software Product Line Research Findings on BoK topic. Which are the research findings by a particular SPL BoK topic combination (i.e., a combination of SPL theory concepts on process, activity, artifact and/or application domain)? This query should accept a combination of SPL theory concepts, up to one for each SPL BoK topic type. The query should retrieve the list of findings related to the provided SPL theory concepts (or to their synonyms). For instance, all research findings on product derivation regarding the activity testing (see Fig. 8).

Q2 Software Product Line Research Findings on Domain Concept. Which are the research findings by a particular domain concept (e.g., tool support)? This query should run a full-text search on the reported findings, retrieving all the findings containing the provided domain concept (any concept that may appear in the findings) or its synonyms. For instance, all research findings on product derivation tool support.

Q3 Software Product Line Research Groups. Which research groups are active in a particular SPL topic combination? The query should retrieve the research groups and authors (extracted from BibTeX) of publications related to the provided SPL theory concepts (and their synonyms). For instance, research groups or authors reporting research on product derivation.

Q4 Software Product Line Publications on Domain Concept. Which are the publications concerning a particular domain concept? This query should run a full-text search on three fields: (a) the publication citation (BibTeX), (b) the experience report objective, and (c) the experience report context description. It should retrieve all publications containing the provided domain concept (or its synonyms) in any of the fields, for instance, all publications discussing product derivation issues.
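The full-text search over several fields described for Q4 can be sketched as follows. The field names `citation`, `objective`, and `context` and the function `find_publications` are assumptions; the list of terms is expected to contain the domain concept together with any synonyms taken from the glossary.

```python
SEARCH_FIELDS = ("citation", "objective", "context")


def find_publications(publications, terms):
    """Return publications mentioning any term in any of the searched fields."""
    lowered = [term.lower() for term in terms]
    hits = []
    for pub in publications:
        # Concatenate the three fields; missing fields count as empty text.
        text = " ".join(pub.get(name, "") for name in SEARCH_FIELDS).lower()
        if any(term in text for term in lowered):
            hits.append(pub)
    return hits
```

Restricting `SEARCH_FIELDS` to a single findings field would give the behavior described for Q2.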


Again, while gathering the queries in this proof of concept we also noticed great variation in the terms used by different researchers for similar concepts, highlighting the need for semantic querying facilities based on well-defined domain concepts (e.g., allowing to search on synonyms and related concepts).

6.2. Applying SKE

In the following subsections we describe the experience of applying each SKE phase to build the SPL BoK KB. More details on the resulting SPL BoK KB and the query results are provided in the next section.

Planning Software Product Line BoK Creation. This time we used the papers identified in a specific SLR [13] as basis for data extraction. Those were papers on the SPL BoK topic product derivation and the type of evidence was published experience reports. We did not need to build a protocol (e.g., based on the synonyms for the PICO elements) for identifying the research papers in digital libraries, as we used the set of papers of the original SLR. Thus, the search strategy, the sources of primary studies, the study selection criteria and procedure, and the quality assessment procedures were directly reused. The data extraction strategy was based on the KB design, which is discussed in the next subsection.

Designing the Software Product Line BoK KB. As suggested by SKE, we designed the KB data model based on the high-abstraction elements presented in Fig. 4 and on typical experience report elements (e.g., goals, research questions, context, and findings).

To model the SPL BoK topics in a flexible way, we analyzed the types of elements (theory concepts) used in the theoretical description of SPL engineering processes, e.g., by Pohl et al. [2]. These elements are shown in Fig. 9, and comprise processes (e.g., domain and application engineering, product derivation, and product line evolution/management), activities (e.g., domain requirements engineering, application requirements engineering), and artifacts (e.g., variability model, requirements, architecture, components, and tests).

Besides the processes, activities, and artifacts, we also considered the different application domains as relevant SPL BoK topics, given that research evidence may differ for distinct application domains, e.g., automotive, industry.

Fig. 10 shows an overview on the main elements in the SPL BoK KB data model, generated collaboratively by the BoK researchers and the knowledge engineer. The Evidence on the SPL BoK Topics (regarding Processes, Activities, Artifacts, and Application Domains) is gathered through Experience Reports (or feasibility studies), which may have defined Research Questions and have Findings highlighted in Publications published by different Research Groups. We decided to model evidence in this way due to the type of evidence available in the primary studies of our evaluation use case. This evidence, as reported also in other SPL SLRs [13][35][36], did not fit rigorous types of evidence (such as empirical evidence provided by experiments). The complete KB