Automatic Generation of Synthetic Website Wireframe Datasets from Source Code

Academic year: 2021

FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO

Automatic Generation of Synthetic

Website Wireframe Datasets from

Source Code

Bárbara Sofia Lopez de Carvalho Ferreira da Silva

Mestrado Integrado em Engenharia Informática e Computação

Supervisor: André Restivo

Second Supervisor: Hugo Sereno Ferreira


Automatic Generation of Synthetic Website Wireframe

Datasets from Source Code

Bárbara Sofia Lopez de Carvalho Ferreira da Silva

Mestrado Integrado em Engenharia Informática e Computação

Approved in oral examination by the committee:

Chair: Prof. João Jacob

External Examiner: Prof. Tânia Rocha

Supervisor: Prof. André Restivo

July 23, 2020


Abstract

Wireframes are visual representations of user interfaces, stripped of any visual design or branding elements. User interface (UI) and user experience (UX) designers use them in the early stages of user-facing application development to outline the page layout and communicate which items it should contain. They are a quick way of prototyping and expressing design ideas, mainly when hand-sketched. However, they do not reflect the look and feel of a webpage, so they must be converted into refined design formats, such as mockups or prototypes. Once designers reach their goal, front-end developers take over and implement their ideas as technical artifacts (i.e., code). This phase is often laborious and demands collaboration across different domains, skills, and knowledge. Despite all the involved work, the outcome may not appeal to the client, so designers must return to the wireframes, and then developers must implement those changes. It usually takes several iterations to achieve the ideal result, causing a significant waste of time and resources.

In an attempt to reduce the impact of design on the overall process of web development, new research directions have been investigated that tackle the possibility of designers delivering a more accurate and tangible product in a shorter time. One of the most promising approaches is to automate this process using machine learning. To achieve high-quality results, we need a sufficiently large dataset with many samples. Collecting such a dataset was one of the biggest challenges since there is very limited availability of curated datasets that contain annotated wireframe sketches made by humans. One way to address this shortcoming is to make use of synthetic datasets for training, followed by a (possible) subsequent fine-tuning with real-world scenarios. Recent works have approached the problem of generating hand-drawn wireframes using different techniques. Despite displaying promising results, they still exhibit limitations in terms of quality, accuracy, and variety of the wireframe samples, and are thus unable to create the dataset needed for proper training.

In this work, we propose the development of a configurable and extensible tool — WebWire — capable of generating images of hand-drawn-like wireframes from real websites by relying on the rendered DOM of an HTML document as well as on additional information provided by the CSS to extract the layout of a given webpage. It subsequently uses computer-generated imagery (CGI) techniques to draw the page mimicking free-hand sketches, thus producing a wireframe.

We then compare the results with two alternative approaches — CSS Based and MockGen — and ask people how they perceive the resemblance of the produced sketches to real websites. We conclude that WebWire is better at representing and generalizing real-world examples, with 90.63% of participants (vs. 63.23% and 9.86%) stating that the wireframes resembled real websites and 57.37% (vs. 32.57% and 57.17%) being convinced that the results were drawn by a human.

With the levels of realism attained with our approach, we believe the results of this work can be directly applied to improve the quality and accuracy of the learning phase of current machine learning algorithms, possibly by reducing the required fine-tuning phase with real sketches and thus allowing them to generalize better and produce higher quality outcomes. Therefore, our contribution will hopefully impact the future of UI design in general, and websites in particular.

Keywords: Wireframes. Reverse Engineering. Web Mining. Synthetic Datasets.


Resumo

Wireframes são representações visuais à mão levantada, com o mínimo de design gráfico e branding, e largamente utilizadas numa fase inicial de desenvolvimento de aplicações centradas no utilizador. Na concepção de páginas web, os profissionais gráficos usam-nas para capturar e comunicar de forma crua os elementos de interface e interacção constituintes de uma página, representando de facto uma técnica de ideação e prototipagem rápida. Por este motivo, os wireframes não refletem a aparência final de um website, e necessitam de ser subsequentemente convertidos em formatos mais refinados, como mockups e protótipos. Uma vez alcançados os objectivos, os programadores convertem tais representações em artefactos técnicos interpretáveis pelas máquinas; uma fase tipicamente trabalhosa que exige uma colaboração transdisciplinar que atravessa domínios, habilidades e conhecimentos. E não obstante o esforço empregue, o resultado final revela-se normalmente aquém das expectativas iniciais, o que obriga os intervenientes a repetir este ciclo, numa tentativa iterativa e incremental de alcançar o resultado ideal, à custa de tempo e recursos.

Numa tentativa de reduzir este impacto, têm sido investigadas novas abordagens que permitem aos designers concretizarem artefactos mais precisos e tangíveis em menos tempo. Uma das actuais direcções promissoras usa técnicas de aprendizagem automática. Apesar dos resultados impressionantes, esta tem necessidade de ser treinada com grandes quantidades de dados de qualidade; dados esses (wireframes) que ainda não se encontram disponíveis com as características desejáveis. Em condições similares, vários domínios têm demonstrado a eficácia da utilização de dados sintéticos para o treino, seguidos de uma aprendizagem mais fina com dados reais. Trabalhos recentes têm estudado diferentes técnicas de sintetização promissoras, mas apresentando ainda claras limitações em termos da qualidade, precisão e variedade das amostras geradas.

Neste trabalho propomos um conjunto de técnicas reunidas numa ferramenta configurável e extensível — WebWire — capaz de gerar wireframes de grande realismo a partir de websites existentes. Esta abordagem analisa o DOM e o CSS de um documento HTML, de forma a extrair layouts de alta fidelidade. Técnicas posteriores de computação gráfica permitem-nos subsequentemente produzir imagens imitando esboços à mão livre de variados estilos.

Os resultados do nosso trabalho foram comparados com duas abordagens recentes — CSS Based e MockGen — e empiricamente validados através da exposição dos mesmos a pessoas sem conhecimento prévio sobre a sua natureza sintetizada. Foi-nos possível concluir que a nossa ferramenta apresenta evidências claras da sua superioridade em representar e generalizar exemplos do mundo real, com 90.63% dos participantes (vs. 63.23% e 9.86%) a afirmar que os wireframes se assemelhavam a websites reais e 57.37% (vs. 32.57% e 57.17%) convencidos que foram feitos por humanos. Graças aos níveis de realismo alcançados, acreditamos que os resultados do nosso trabalho tenham impacto directo no futuro do design de interfaces, em particular de websites, pela redução da necessidade de um elevado treino com wireframes reais, traduzindo-se numa melhoria da qualidade e precisão obtidas em técnicas de aprendizagem automática.

Keywords: Wireframes. Reverse Engineering. Web Mining. Synthetic Datasets.


Acknowledgements

My deep gratitude goes first to my dogs, Boby and Kika, for always keeping me company since the first grade. For the past 17 years, they stood by me not only while studying for the university entrance exams but also while writing this dissertation during the COVID-19 pandemic. They made my days sunny and bright.

To my family, for always providing the best opportunities, education, and endless love. In a way, they shaped the woman I am today.

To my boyfriend, Luís, for his unconditional love and support. I am grateful for all the strawberries (and watermelons poorly sliced), conversations, adventures, and moments. Thank you for cheering me up whenever I needed and reminding me that I am “tudo”. Maybe one day, your music taste will be as good as mine.

To my best friend, Nene, for being my soulmate, always and forever. My heart goes to her, especially since we share the same passion for My Chemical Romance. Thank you for partnering up with me against the good and bad times.

To all my friends from HUMUS, for their friendship, understanding, kindness, drama, and rants. Because of them, I bring with me the best memories of the past five years at university. They taught me what it was like to be fully accepted and embraced.

To all the friends I have made through Twitter and during my journey on planet earth. No matter how small, every action counts, and I am grateful for all of them.

Lastly, I am extremely thankful for all the support, motivation, and guidance from my supervisors, André and Hugo. They played a vital part throughout this work and continuously encouraged me to go above and beyond.

Sofia Silva (Sia)


“One day your life will flash before your eyes. Make sure it’s worth watching.”

Gerard Way


Contents

1 Introduction 1
1.1 Context . . . 1
1.2 Problem . . . 3
1.3 Motivation . . . 3
1.4 Objectives . . . 4
1.5 Document Structure . . . 4

2 Background 5
2.1 Software Development . . . 5
2.2 UI/UX Design . . . 8
2.2.1 Wireframes . . . 9
2.2.2 Mockups . . . 11
2.2.3 Prototypes . . . 11
2.3 Web Development . . . 13
2.3.1 Document Object Model . . . 13
2.3.2 Styles and Frameworks . . . 13
2.4 Summary . . . 14

3 State of the Art 15
3.1 Synthetic Datasets . . . 15
3.1.1 Mockup Generator . . . 16
3.1.2 Computer Vision Based Generator . . . 18
3.1.3 CSS Based Generator . . . 20
3.2 Code Generation from Hand-Drawn Wireframes . . . 21
3.2.1 Uizard . . . 21
3.2.2 Sketch2Code: Generating a website from a paper mockup . . . 23
3.2.3 WebGen: Live Web Prototypes from Hand-Drawn Mockups . . . 25
3.3 Layout Extraction . . . 27
3.3.1 Page Segmentation Methods . . . 27
3.3.2 Wirify . . . 32
3.4 Summary . . . 32

4 Problem Statement 35
4.1 Current Issues . . . 35
4.2 Hypothesis . . . 36
4.3 Research Challenges . . . 37
4.4 Proposal . . . 38
4.5 Validation Methodology . . . 38


4.6 Summary . . . 39

5 Implementation 41
5.1 Overview . . . 41
5.2 WebWire . . . 42
5.2.1 Configuration . . . 45
5.2.2 Inspector . . . 46
5.2.3 Render . . . 48
5.3 Dataset . . . 51
5.4 Discussion . . . 51
5.5 Summary . . . 54

6 Empirical Evaluation 55
6.1 Objectives . . . 55
6.2 Questionnaire . . . 56
6.2.1 Preliminary Assessment . . . 56
6.2.2 Results . . . 57
6.3 Visual Comparison . . . 58

6.3.1 Wireframes of Different Websites . . . 59

6.3.2 Wireframes of the Same Website . . . 64

6.4 Discussion . . . 64

6.5 Validation Threats . . . 68

6.6 Summary . . . 69

7 Conclusions and Future Work 71 7.1 Main Findings . . . 71

7.2 Main Contributions . . . 72

7.3 Open Challenges and Future Work . . . 73

7.4 Conclusions . . . 74

References 77

A List of URLs for Dataset 83

B Configuration Files Examples 85
B.1 Inspector Configuration File Example . . . 85

B.2 Render Configuration File Example . . . 89

C Wireframe Generation Examples 91
C.1 Example of a JSON file generated by Inspector . . . 91

C.2 Examples of wireframes generated by Render . . . 95

D Questionnaire 101


List of Figures

1.1 Hand-drawn paper wireframes of a website’s homepage . . . 2

2.1 Waterfall software development life cycle model . . . 6

2.2 Agile software development life cycle model . . . 7

2.3 The classical workflow for building apps and websites, from design to code . . . 8

2.4 Design evolution of a mobile app . . . 9

2.5 Hand-drawn user journey wireframes of a mobile app . . . 10

2.6 The difference between low fidelity and high fidelity wireframes . . . 10

2.7 Examples of elements commonly used to represent UI elements in wireframes . . 11

2.8 The difference between wireframe and mockup . . . 12

2.9 Paper prototyping of a mobile app . . . 12

2.10 The HTML DOM tree of nodes and objects . . . 13

3.1 MockGen high-level dataset generation overview . . . 17

3.2 Example of a wireframe generated using MockGen . . . 18

3.3 Sketch2Code dataset generation overview . . . 19

3.4 Example of a hand-drawn wireframe generated from a synthetic website through CSS modifications . . . 20

3.5 Overview of Uizard . . . 21

3.6 Overview of the pix2code model architecture . . . 22

3.7 A sample from pix2code dataset . . . 23

3.8 Example of Robinson’s Sketch2Code approach using deep learning segmentation . . . 24
3.9 High-level overview of the WebGen pipeline . . . 25

3.10 Example of Ferreira’s WebGen approach of website generation from a hand-drawn wireframe image . . . 26

3.11 Top-down page segmentation . . . 28

3.12 VIPS algorithm overview . . . 29

3.13 Block-o-Matic’s webpage segmentation model . . . 31

3.14 Digital wireframe generation from the CNN website using Wirify . . . 33

5.1 Example of a Bootstrap website and its hand-drawn wireframe using WebWire . . 42

5.2 High-level overview of WebWire . . . 43

5.3 High-level overview of the Inspector module . . . 46

5.4 High-level overview of the Render module . . . 49

5.5 Point displacement process . . . 50

6.1 Highly accurate wireframe generation of a Bootstrap template using WebWire . . 59

6.2 Accurate wireframe generation of a Bootstrap pricing page using WebWire . . . . 60


6.3 Highly accurate wireframe generation of a Bootstrap template using WebWire with element overlap . . . 61
6.4 Highly accurate wireframe generation of the Bootstrap website using WebWire . . . 62
6.5 Accurate wireframe generation of the GitHub Marketplace webpage using WebWire with element overlap . . . 63
6.6 Version one of the wireframes generated by WebWire of the GitHub Education webpage . . . 65
6.7 Version two and three of the wireframes generated by WebWire of the GitHub Education webpage . . . 66
6.8 Version four and five of the wireframes generated by WebWire of the GitHub


List of Tables

5.1 A sample of URLs from the 55 websites used to create the dataset . . . 52

6.1 Summary of the questionnaire results regarding wireframe resemblance to real websites . . . 57

6.2 Summary of the questionnaire results regarding wireframe creation method . . . 58

6.3 Summary of the questionnaire results regarding how semantic elements influence wireframes to resemble real websites . . . 58

A.1 List of URLs used by WebWire to create a dataset . . . 84

E.1 Questionnaire results . . . 126

E.2 Questionnaire feedback . . . 127


Abbreviations

ANN Artificial Neural Network
APP Application
CNN Convolutional Neural Network
CSS Cascading Style Sheets
DOM Document Object Model
DSL Domain-Specific Language
GUI Graphical User Interface
HCD Human-Centered Design
HCI Human–Computer Interaction
HTML HyperText Markup Language
JSON JavaScript Object Notation
LSTM Long Short-Term Memory
MLP Multilayer Perceptron
MVP Minimum Viable Product
ReLU Rectified Linear Unit
SDLC Software Development Life Cycle
UI User Interface
URL Uniform Resource Locator
UX User Experience
W3C World Wide Web Consortium
XML Extensible Markup Language
XPath XML Path Language


Chapter 1

Introduction

1.1 Context . . . 1
1.2 Problem . . . 3
1.3 Motivation . . . 3
1.4 Objectives . . . 4
1.5 Document Structure . . . 4

This chapter introduces this dissertation, contextualizing and supporting it through its motivations and objectives. Section 1.1 presents the background in which the problem of conceiving websites and their design emerges. Section 1.2 (p. 3) defines the problem by introducing current limitations and existing techniques. Section 1.3 (p. 3) describes the motivations behind this problem. Section 1.4 (p. 4) outlines the goals. Finally, Section 1.5 (p. 4) explains the structure and organization of this document.

1.1 Context

The software development process includes several different phases. Typical top-down approaches (cf. Section 2.1, p. 5) tend to introduce some precedence to these phases. First comes the project definition, where teams and stakeholders work closely to gather all the requirements, such as its purpose, goals, target audience, and features. After analysis, the project scope plan is conceived, which outlines specific activities and deliverables, along with specific timelines. Then, the software will be designed, developed, tested, and deployed. For the scope of this dissertation, we will focus on the development of websites and the design and implementation stages.

Designers are among the most valuable and influential members of a web development team. Designing an accessible and usable website requires understanding users and how they may interact with its interface, i.e., the user experience. Human-Centered Design (HCD) is an approach particularly concerned with this concept, where designers strive to fulfill human needs when designing, testing, and iterating to achieve the ideal design [Tid10, Nor13]. In an early stage of developing user-facing applications, such as websites, they begin by sketching wireframes on paper according to the project’s requirements, as exemplified in Figure 1.1. A wireframe is a low-fidelity design document, which visually represents a user interface, stripped of any graphic design, styles, or branding elements. It is a quick way of prototyping and expressing ideas, mainly when hand-drawn [LM95, InVd].

Figure 1.1: Hand-drawn paper wireframes for Taskly, a project management app. This figure represents the homepage of its website on mobile and desktop devices. [Sam]

As wireframes do not reflect the look and feel of a webpage, only its blueprint, designers have two alternatives: (a) designing a high-fidelity prototype using sophisticated tools or (b) resorting to developers to turn their initial design into code. The latter holds many risks, as developers often do not have design skills, and most of the visual style is missing. Therefore, they usually pursue the first alternative, which guarantees that developers will clearly understand what to implement without wasting time on design decisions (cf. Section 2.2, p. 8).

Once the design is approved, the developers will convert the high-fidelity prototype into code. This phase is often laborious and redundant (since the same work is converted from one format to another), and it requires a good collaboration between the designer and the developer [Ton17]. Finally, the website goes through testing and review. Unfortunately, it may not match the demands of stakeholders, so designers must go back to the wireframes, and developers must implement those changes. This cycle repeats as many times as needed, and when everything is approved, the website is ready to be launched (cf. Section 2.1, p. 5).



1.2 Problem

As stated above, the prototype design cycle repeats as demanded, which can be resource- and time-consuming. Thus, different solutions have been conceived, attempting to reduce the impact that this cycle has on the overall process of web development (cf. Section 3.2, p. 21). The most successful approach to automate this cycle is to use machine learning. But, to achieve high-quality results, a sufficiently large dataset with many samples is needed [Jyo18]. Collecting such a dataset turns out to be one of the biggest challenges due to the limited availability of curated datasets that contain annotated wireframe sketches made by people. The answer usually is to use synthetic datasets for training, followed by a (possible) subsequent fine-tuning with real-world scenarios.

Current solutions can be divided into three different approaches: (a) mockup generators that place high-level elements randomly on a 2D plane [dSF19]; (b) tools that find real websites and automatically sketch them through computer vision techniques that extract the high-level structure from screenshots [Rob19]; and (c) tools that take an existing dataset of synthetic websites and modify each CSS stylesheet [Ash18].

These solutions face numerous challenges, mainly the need to collect a dataset that truly reflects real websites and mimics free-hand sketches of wireframes so that the trained model can achieve more reliable and accurate results. Despite showing promising results, the synthetic dataset generators (i) did not create wireframes that resembled real-world examples, (ii) did not generalize well, or (iii) used complex and unpredictable methods to create the wireframes (e.g., the use of computer vision techniques to deduce the structure of a webpage) (cf. Section 4.1, p. 35).
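To make approach (a) concrete, the following is a minimal, purely illustrative sketch of a random-placement mockup generator. The element vocabulary, canvas size, and size ranges here are our own assumptions for illustration, not taken from any of the cited tools.

```python
import random

# Hypothetical element vocabulary -- illustrative only.
ELEMENT_TYPES = ["header", "image", "paragraph", "button", "input"]

def random_mockup(width=1280, height=800, n_elements=6, seed=None):
    """Place n_elements axis-aligned boxes at random positions on a
    2D canvas, returning annotated bounding boxes (label, x, y, w, h):
    the kind of synthetic, automatically labeled sample a
    random-placement generator produces."""
    rng = random.Random(seed)
    boxes = []
    for _ in range(n_elements):
        w = rng.randint(80, width // 2)    # element width
        h = rng.randint(40, height // 4)   # element height
        x = rng.randint(0, width - w)      # keep box inside the canvas
        y = rng.randint(0, height - h)
        boxes.append((rng.choice(ELEMENT_TYPES), x, y, w, h))
    return boxes
```

Because every box is generated together with its label, such datasets come fully annotated for free, which is precisely why random placement is attractive even though the resulting layouts rarely resemble real pages.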

1.3 Motivation

Designers often make use of free-hand sketches of wireframes to convey their ideas to the team and stakeholders. These sketches allow them to communicate their thoughts in a short amount of time. However, one drawback of this approach is the representation itself, which may not reflect the look and feel of the final webpage [LM95]. Therefore, a high-fidelity prototype must be designed so that developers neither make assumptions about visual styles nor waste time on them. It usually takes several iterations to achieve the ideal result, as it may not match the client’s vision, causing a significant waste of time and resources.

Recent works attempted to shorten this cycle with the use of machine learning (cf. Section 3.2, p. 21) and created different synthetic dataset generators (cf. Section 3.1, p. 15). Their authors focused on the machine learning methodologies and architectures employed rather than on the development of a generator capable of producing a complete and accurate dataset. Although these generators had an overall satisfactory performance, we believe there is a method that can increase both the quality and variety of the generated wireframes, thus making it possible to create the dataset needed for proper training. By improving existing solutions for code generation from wireframes, we will provide faster design iterations, better accessibility, and remove the need for middlemen (such as developers). Overall, designers will be capable of delivering a more realistic and tangible product.


1.4 Objectives

This dissertation seeks to explore whether there is a better method capable of generating hand-drawn-like wireframes that resemble real-world websites and designs. Thus, it aims to improve existing machine learning techniques for shortening the prototype development cycle by contributing to the research field of prototype generation from wireframes.

By the end of this research, the tool developed should (a) rely on the overall structure of an existing webpage in a more predictable manner (which may imply initially discarding computer vision techniques), (b) be able to generalize authentic wireframes, and (c) draw wireframes in a way that mimics hand-drawn sketches. Lastly, it should outperform existing approaches and be able to easily generate a dataset. We will end this dissertation by conducting an empirical study, where participants will evaluate the wireframes generated by our tool and by alternative approaches, thus determining which performs better. Additionally, we will analyze the quality and accuracy of the wireframes when compared to current designs.
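As a concrete illustration of objective (c), one common way to mimic hand-drawn strokes is to subdivide each straight edge and randomly displace the intermediate points. The sketch below shows this jitter idea in plain Python; the function name and parameters are our own illustrative choices, not WebWire’s actual algorithm.

```python
import random

def displace_line(x0, y0, x1, y1, segments=12, jitter=1.5, seed=None):
    """Split a straight line into short segments and nudge each
    intermediate point by a small random offset, so the resulting
    polyline looks hand-drawn rather than ruler-straight."""
    rng = random.Random(seed)
    points = []
    for i in range(segments + 1):
        t = i / segments
        # Interpolate along the ideal straight line...
        x = x0 + t * (x1 - x0)
        y = y0 + t * (y1 - y0)
        # ...then jitter interior points, keeping the endpoints anchored.
        if 0 < i < segments:
            x += rng.uniform(-jitter, jitter)
            y += rng.uniform(-jitter, jitter)
        points.append((x, y))
    return points
```

Rendering each UI element’s bounding box with such displaced polylines (possibly combined with varied stroke widths and slight rotations) is one plausible route to the sketch-like output described above.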

1.5 Document Structure

This chapter presented a prelude to the research carried out, addressing its context, the problem it strives to solve, its motivation, and its objectives. The remainder of this document contains six more chapters and is structured as follows:

• Chapter 2 (p. 5), Background, introduces the background that supports this research.

• Chapter 3 (p. 15), State of the Art, analyzes the current state of the art related to the conception of synthetic datasets of hand-drawn wireframes and to approaches that automatically generate websites using them. Different proposals emerged over the years but may not employ the best strategy. Additionally, this chapter presents the existing methodologies of page segmentation to obtain the structure of a webpage.

• Chapter 4 (p. 35), Problem Statement, focuses on the issues and limitations of current wireframe generators, the hypothesis this dissertation puts forward, and the research challenges identified. It also briefly describes the proposed solution and the validation methodology to be conducted.

• Chapter 5 (p. 41), Implementation, tackles the main research challenges and describes, step by step, the implementation of the proposed solution.

• Chapter 6 (p. 55), Empirical Evaluation, presents the evaluation strategies used to validate the developed tool, analyzes the obtained results in detail, and reflects on them.

• Finally, Chapter 7 (p. 71), Conclusions and Future Work, presents the conclusions drawn by the author and the work to be developed in the future.


Chapter 2

Background

2.1 Software Development . . . 5
2.2 UI/UX Design . . . 8
2.3 Web Development . . . 13
2.4 Summary . . . 14

For the purpose of this dissertation, we regard the design and development of websites, web apps, and mobile apps as following roughly the same approach. Moreover, regarding mobile apps, although they incorporate elements of human-computer interaction (HCI) that are substantially different from a typical website, we observe that most design tools blur this distinction. Although most figures present mobile examples, we focus on the process itself, which websites and web apps also follow.

In this chapter, we briefly describe and contextualize the most relevant topics approached in this dissertation, which are necessary for a good understanding of it. Section 2.1 presents the current methodologies of software development and their drawbacks. Section 2.2 (p. 8) explains the different types of design, processes, and techniques. Section 2.3 (p. 13) describes web development: how websites are built, styled, and rendered. Finally, Section 2.4 (p. 14) summarizes this chapter.

2.1 Software Development

According to IBM Research [IBM], “software development refers to a set of computer science activities dedicated to the process of creating, designing, deploying, and supporting software.” Nowadays, we can build several types of software, but in this research, we will focus on the development of websites.

The software development process, also known as the software development life cycle (SDLC), divides the work into distinct phases to improve design and project management. Although numerous methodologies are available, large-scale projects usually require an agile approach. For smaller projects where requirements are clearly defined and very well understood, the Waterfall model is a good alternative. Waterfall is certainly the most traditional and sequential choice, but that does not mean it suits every project. The main issue with this approach is its lack of flexibility: the decisions made at the beginning must prevail, and if changes or mistakes need to be addressed toward the end stages, the Waterfall method generally requires a full restart [Mar18, Maj19].

Confronted with the above issues, the need for a flexible software development methodology was clear. For this reason, Agile was created. This method focuses on being highly iterative and incremental [WC03]. It also accommodates change and allows teams to build software faster. Scrum and Kanban are well-known methods that implement Agile.

When comparing the life cycles of two different methods, such as Waterfall (cf. Figure 2.1) and Agile (cf. Figure 2.2, p. 7), we learn that, regardless of the software development methodology used, the SDLC stages are identical. Therefore, as pointed out by Kamatchi et al. [KIS13], the same occurs in web development.

Figure 2.1: Waterfall software development life cycle model, adapted from [FAF09]. Some modified models introduce feedback loops (pictured as a dashed arrow), either connecting the end-points or between different phases.

For any software development project, a methodology should be followed to ensure project consistency and completeness. Regardless of the method adopted, the web development life cycle includes the following phases:

1. Determine the project’s requirements and scope: This step is all about information-gathering. It is crucial to understand more about the client’s business and industry, their target audience and customers, the ultimate goal for the website and its purpose. In the end, the requirements for the project should be well defined.

2. Analyze the requirements and plan: Research and planning help to clarify the goals for the website and guide the design. The work to be done in the following phases must be planned according to the software development methodology employed.



Figure 2.2: Agile software development life cycle model. This method breaks the product into small incremental builds, also known as sprints (pictured as a dashed arrow).

3. Design: Once the planning is complete, most designers start by sketching wireframes, which will eventually grow into mockups and prototypes. A wireframe is a low-fidelity design document that outlines the basic structure of the website and does not contain any visual styles or branding elements. If possible, it should be reviewed by the client and, once approved, turned into a mockup [CN07, SMMS11].

4. Implementation: When the design is finished, it is translated into actual code by developers. This stage is often the most lengthy, and it can involve multiple people. Designers should confirm the final design with the client before moving on to any development since it is easier to try out ideas before they are converted into code.

5. Test and gather feedback: Finally, the website goes through testing and review. This phase shows if it meets the user’s expectations in terms of performance, usability, accessibility, and functionality. In order to accurately identify problems, a user experiment should be conducted and feedback given by the client.

6. Maintain: In agile methodologies, the website is deployed after testing to have a minimum viable product (MVP); if the requirements are not fulfilled and testing fails, a new iteration is needed. Otherwise, it can be delivered to the client. In a sequential approach, such as Waterfall, the website is deployed and shipped only when all the requirements are fulfilled and validated. Next, the product enters its maintenance phase, which includes updating technologies and content, and correcting bugs. If significant changes need to be made, the life cycle restarts.

In light of the aforementioned process, Figure 2.3 (p. 8) illustrates steps three and four in detail. The majority of designers prefer to initially draw their ideas on paper, a whiteboard, or a graphic tablet instead of using a wireframing tool like Balsamiq1. Although these tools are enough for beginners, for proficient designers they constrain their thoughts and break their creative flow. But, regardless of the method chosen, designers then have to recreate their drawings to get the layout approved by the client. This can be done either in a wireframing tool or by directly crafting the user interface in a more advanced design tool, like Sketch2. Consequently, as Beltramelli [Ton17] claims, designers must do the same work twice by converting it from one format to another.

Figure 2.3: The classical workflow for building apps and websites, from design to code, adapted from [Ton17]. Some of its steps are redundant, requiring time that (a) designers could use to focus on creative tasks, and (b) developers could use to implement core functionalities and optimize the user experience.

Once designers finish the final design document, which could be a mockup or a prototype, they hand their work over to front-end developers to have it implemented in code. The author also states that “implementing user interfaces consist in re-creating in code what the designers created graphically in a software.” Developers would rather focus on implementing core functionalities and optimizing the user experience, but they end up spending most of their time coding user interfaces.

According to Figure 2.3 and as explained above, two steps in the design-to-code workflow serve the same purpose with respect to the “visualization” of the final artifacts. These steps add no value to the feedback loop, as their only purpose is to convert the same user interface into a different format to enable the next steps. These conversions are time-consuming and expensive, and they prevent quicker iterations toward the final goal; this is due to the mismatch between the tools and artifacts used to sketch and those used to produce the end results. This problem can be found in almost all software-development-related activities [ARC+19].

2.2 UI/UX Design

User Interface (UI) and User Experience (UX) design are terms used interchangeably in web and app design. Although they are subsets of the broader category of design (cf. Section 2.1, p. 5) and both focus on the user interface, they have different purposes. UI design consists of the design of the graphical layout of an application, i.e., the components users interact with, such as buttons, screen arrangement, content, transitions, and animations. Therefore, every visual element must be designed. On the other hand, UX design's biggest concern is the user's experience, i.e., how users interact with the application and with what UI designers created. Therefore, whereas UI designers concentrate on the aesthetics and decide how the user interface will look, UX designers determine how the user interface is structured and operates [UX 19].

Research shows that there is a misconception about the definitions of wireframes, mockups, and prototypes. Unfortunately, at the time of this writing, no document accurately explained and distinguished these concepts. For this dissertation, we followed the guidelines of InVision3, a company with a prestigious product design platform. These concepts are well documented on their website, and we believe it is a reliable source.

The design of web and app prototypes, handled by UI/UX designers, makes use of various programs and tools to achieve the intended look [Ske, Fig, Adoc, InVa, Mar, Bal, Wira, Adob, Adoa]. Designers need to consider their target audience, the purpose of the website, and the visual appeal of the design. Hence, a good design is usually considered one that fulfills the following desiderata: it is easy to use, accessible, aesthetically pleasing, and suited to the user group and brand.

The design process can be broken down into three phases: wireframes, mockups, and prototypes (cf. Figure 2.4). Despite this being the most common sequence, the design might not go through all the stages or may have slight variations, depending on the designer, team, and project [Jar18].

Figure 2.4: Design evolution of a mobile app. The design goes through three different phases: (i) wireframe, a static representation of the structure and functional requirements, (ii) mockup, essentially a wireframe but with visual design, and (iii) prototype, an interactive mockup. [InVb]

2.2.1 Wireframes

A wireframe is an outline of the layout of a webpage (or screen, when referring to applications) that demonstrates which interface elements will exist on key pages. It is a low-fidelity design document due to its simplicity and lack of visual styles and branding elements. Besides, it aims to provide a basic visual understanding of a page early in a project to get stakeholder and team approval before the creative phase starts. Wireframes can also be used to display navigation for user experience purposes and to ensure it meets expectations (cf. Figure 2.5).

Figure 2.5: Hand-drawn user journey wireframes of a mobile app. Adapted from [Cle].

Hand-drawn and digital wireframes are, respectively, low-fidelity and high-fidelity wireframes (cf. Figure 2.6). A low-fidelity wireframe is useful for early design stages and rapid iterations. It helps designers to quickly visualize rough ideas, create an initial model of the overall layout, and form a navigational structure. A high-fidelity wireframe is more detailed yet still simple. It is used for mocking up the final versions before adding any visual design [InVd].

Figure 2.6: The difference between low fidelity and high fidelity wireframes. Adapted from [Win19].

When it comes to wireframes, designers have different practices and preferences: they can (a) start with hand-drawn wireframes and then immediately craft mockups, (b) start with digital wireframes and then convert them into mockups, or (c) start with hand-drawn wireframes, convert them into a digital format, and then into mockups.


Despite the existence of digital wireframing tools [InVa, Mar, Bal, Wira], most designers will often start by sketching on paper with a pen [CN07, LM95]. According to Landay et al. [LM95], this is explained by designers usually having a background in art, feeling restricted by digital tools, or finding sketching easier, as there is no need to learn any software. Although there is no agreed-upon standard, wireframe sketches often use a similar set of symbols that have commonly understood meanings [Exp, InVd]. Figure 2.7 illustrates some of these elements.

Figure 2.7: Examples of elements commonly used to represent UI elements in wireframes: (a) image, (b) paragraph, (c) button, (d) dropdown, (e) text, (f) checkbox, (g) radio, (h) burger, (i) title, (j) link, and (k) text field.

2.2.2 Mockups

A mockup is a static high-fidelity design document. Unlike a prototype, it does not contain any interactions. It proposes the final look of the design and is usually built between wireframing and prototyping. Wireframes are designed to represent the structure and functional requirements, which are then featured in mockups. Therefore, mockups are essentially wireframes with visual design, such as images, colors, and typography (cf. Figure 2.8, p. 12) [InVb].

2.2.3 Prototypes

A prototype is an interactive and dynamic design document, i.e., it behaves and operates as the final product. Although prototypes are not full-featured, they are more straightforward and quicker to construct than code implementation, which makes them valuable for user testing. Hence, flaws can be found and alterations made before shipping the design to developers.

Figure 2.8: The difference between wireframe and mockup.

For designers, a prototype is usually perceived as a functional high-fidelity mockup and requires specific software programs and tools, such as Figma4. However, prototyping has three different levels: (i) low-fidelity, known as paper prototypes (cf. Figure 2.9), which resemble hand-drawn wireframes and center more on navigation, information architecture, and user experience, (ii) mid-fidelity, which resembles digital wireframes and is the start of mocking up the actual interface, and (iii) high-fidelity, essentially digital mockups with high-quality visuals, interactions, and contents [InVc, Sum20].

Figure 2.9: Paper prototyping of a mobile app. [Sha]


2.3 Web Development

Web development refers mostly to the implementation phase of software development focused on the web. When the design is approved, developers are responsible for converting it into code, thus creating websites. Although websites can be built with only HTML and CSS, modern designs require the additional use of up-to-date technologies and frameworks.

HTML is crucial in the development of websites because it defines the structure of a webpage. It specifies whether the content is recognized as a heading, paragraph, link, image, button, or one of many other available elements. Plus, it defines how these elements are positioned relative to one another. Unfortunately, browsers do not strictly enforce HTML standards, which leads to poor and sometimes invalid code [Mozc, Rob19].

2.3.1 Document Object Model

According to Mozilla [Mozb], “the document object model (DOM) is a programming interface for HTML and XML documents. It represents the page so that programs can change the document structure, style, and content.” The DOM is generated by browsers when loading a webpage and is constructed as a tree of nodes and objects. Figure 2.10 illustrates an example of an HTML DOM tree, where the branches represent containers and the leaves are elements that contain content. This is how programming languages like JavaScript can communicate with the page [W3S].
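The tree-of-nodes construction can be illustrated in miniature with Python's standard-library html.parser (a simplified sketch, not a browser implementation; the Node class, the handling of void tags, and the sample markup are ours):

```python
from html.parser import HTMLParser

class Node:
    """One DOM node: a tag with a parent and an ordered list of children."""
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []

class DOMBuilder(HTMLParser):
    """Builds a simplified DOM tree while the parser streams through the markup."""
    VOID = {"img", "br", "hr", "input", "meta", "link"}  # tags that never have children

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:
            self.current = node  # descend into the new branch

    def handle_endtag(self, tag):
        if self.current.tag == tag and self.current.parent is not None:
            self.current = self.current.parent  # climb back up

def dom_tags(html):
    """Parse `html` and return the tag names in depth-first (document) order."""
    builder = DOMBuilder()
    builder.feed(html)
    out, stack = [], [builder.root]
    while stack:
        node = stack.pop()
        out.append(node.tag)
        stack.extend(reversed(node.children))
    return out
```

Feeding `"<html><body><h1>Hi</h1><p>Text <a>link</a></p></body></html>"` to `dom_tags` yields the nesting a browser would build, with `a` as a child of `p` inside `body`.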

Figure 2.10: The HTML DOM tree of nodes and objects. [Sni]

2.3.2 Styles and Frameworks

Cascading Style Sheets (CSS) is a stylesheet language used to describe the presentation of a document written in HTML or XML. CSS describes how elements should be rendered on screen, on paper, or in other media. In other words, it is used to arrange and style the webpage, e.g., to alter the font, color, or size, or to add animations and other decorative features [Moza].


To facilitate the development of websites, the use of front-end frameworks and libraries is highly recommended. One of the most popular is Bootstrap5, an open-source toolkit for developing with HTML, CSS, and JavaScript, mainly known for quickly building responsive applications. Bootstrap also encourages good programming practices, W3C standards, and document cohesion.

2.4 Summary

This chapter covered the most relevant topics approached in this dissertation, which are necessary for a good understanding of it. First, we explained in detail the software development process and corresponding methodologies, such as Waterfall and Agile (cf. Section 2.1, p. 5). Regardless of the method adopted, both the design and implementation phases are vital in this process. However, they hold a few redundant steps, which prevent teams from spending more time on iteration cycles to improve the product, thus wasting time and resources.

Afterward, we described the notions of UI/UX design and the design process (cf. Section 2.2, p. 8). This process is broken down into three phases: wireframes (cf. Section 2.2.1, p. 9), mockups (cf. Section 2.2.2, p. 11), and prototypes (cf. Section 2.2.3, p. 11). For the scope of this dissertation, we focused on the first phase. A wireframe is a low-fidelity design document due to its simplicity and lack of graphical components, visual styles, and branding elements. It aims to provide a basic visual understanding of a page early in a project. Despite the existence of digital wireframing tools, most designers often start by sketching on paper with a pen.

Lastly, we outlined the foundations of web development and defined its key concepts (cf. Section 2.3, p. 13). Additionally, we explained what the document object model (DOM) represents (cf. Section 2.3.1, p. 13) and introduced the modern styles and frameworks used in current websites (cf. Section 2.3.2, p. 13).


Chapter 3

State of the Art

3.1 Synthetic Datasets . . . 15

3.2 Code Generation from Hand-Drawn Wireframes . . . 21

3.3 Layout Extraction . . . 27

3.4 Summary . . . 32

This chapter addresses the different and most relevant tools and studies about the generation of synthetic datasets of hand-drawn website wireframes. Section 3.1 presents distinct synthetic dataset generators built to train machine learning solutions that attempt to convert freehand wireframes into code. Section 3.2 (p. 21) describes the existing machine learning methodologies that automate the process of designing web and mobile apps. Section 3.3 (p. 27) introduces techniques for extracting the structure of a webpage's layout. Finally, Section 3.4 (p. 32) summarizes this chapter.

3.1 Synthetic Datasets

When testing and evaluating a solution, researchers usually resort to samples of data collected from real-life processes [ddd+18]. Common examples include variables observable during extreme weather phenomena, such as temperature, air pressure, and wind velocity, or the hematological characterization of blood from people with certain conditions. These datasets are valuable because they contain information that we know occurs in real life.

But there are also drawbacks, particularly in the control of specific features that might present themselves very rarely, or that are impractical to measure. Black holes, for example, pose a problem for physicists, as up until very recently, no direct mechanism existed to probe their associated processes. Another problem is volume: some of these datasets are extremely valuable precisely because collecting them involves non-trivial costs and effort. If we consider some of the recent advancements in machine learning (particularly in deep learning), we realize that for them to generalize correctly, they need significant quantities of training data; data that might (a) not exist, (b) lack the necessary quality and quantity, or (c) not be promptly available due to privacy claims [Tir18].

One way to solve this problem is to resort to synthetic datasets, i.e., repositories of programmatically generated data that were artificially produced rather than obtained from real-life events, such as surveys and experiments. These datasets have the benefit of providing finer control of data features, enabling researchers to manipulate certain aspects and then analyze their impact in isolation, as well as being inherently scalable (as they support property modification at will). There are numerous examples of using synthetic datasets to compensate for a shortage of data, from Ekbatani et al. [EPS17], who built a model for counting the number of pedestrians in street photos, to Bernardino et al. [BTF18], who used a 3D human-body model for simulating recovering patients with knee injuries.

Depending on the specific research needs, a model is usually first developed, from which a generator that produces the intended data can be derived. This model is typically comprised of a set of mathematical formulas, algorithms, categories, characteristics, relationships, event causalities, or any other mechanism that captures the intended phenomena. This approach also tends to promote future work: once a realistic model is achieved, researchers can make it public, subsequently enabling the generation of new datasets of the same or different sizes with a variety of characteristics.
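In this spirit, a model-driven generator can be as small as a few controllable distributions. The sketch below is entirely hypothetical (the "button" rectangle model and its parameters are ours, for illustration only); it shows the key property of synthetic data, namely exact control over feature distributions:

```python
import random

def synth_samples(n, seed=0):
    """Toy synthetic-data generator: a hypothetical model of 'button-like'
    rectangles whose width/height distributions we control exactly,
    something impossible to guarantee with scraped real-world data."""
    rng = random.Random(seed)
    return [{"label": "button",
             "w": rng.gauss(120, 15),   # controllable feature: width (px)
             "h": rng.gauss(40, 5)}     # controllable feature: height (px)
            for _ in range(n)]
```

Because every parameter is explicit, the same model can be re-run to produce datasets of any size, or with deliberately shifted distributions, to study their impact in isolation.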

For the scope of this dissertation, we will focus on the creation of synthetic datasets to be applied in machine learning solutions that attempt to convert hand-drawn wireframes into code, as sourcing a large, high-quality dataset for this domain is challenging, mainly due to the limited availability of curated and annotated sketches made by designers. The drawback is that current models are typically incomplete, unrealistic, or lack finer details, which prevents learning algorithms from generalizing when presented with real sketches. Therefore, a two-step approach of first using synthetic datasets for bulk training, followed by real datasets for fine-tuning, is a strategy that has been demonstrated to lead to better performance. The most recent approaches are presented in more detail in Section 3.1.1, Section 3.1.2 (p. 18), and Section 3.1.3 (p. 20).

3.1.1 Mockup Generator

Regarding website generation from mockups, the favored strategy is the use of machine learning methods, particularly deep learning, based on artificial neural networks (ANNs) with feature learning. These networks usually require large amounts of samples to train models, but collecting such data is a challenge because it might (a) not exist, (b) lack quality, (c) miss valuable annotations, or (d) not be suitable for the project's needs. Therefore, the solution is either to manually create the dataset or to generate a synthetic one.

Ferreira [dSF19] proposed a method to automatically build websites from hand-drawn mockups and created a synthetic dataset generator of hand-drawn wireframes. The generator, MockGen, considers two types of elements used to represent user interfaces: (i) atomic elements, such as buttons and dropdowns, and (ii) containers, which group these elements. In terms of container hierarchy, only a depth of two is achieved, i.e., there is just one container layer; however, the container generation step can be applied recursively to obtain more complex hierarchies. The tool uses HTML, CSS, JavaScript, and NodeJS to render an interactive web and command-line interface, which allows large quantities of samples to be generated. Additionally, RoughJS1 is used to mimic hand drawings and achieve a more human-like result. Figure 3.1 depicts a high-level overview of MockGen's process, which is described below.

Figure 3.1: MockGen high-level dataset generation overview: (a) mockup dimension and leftover calculation, (b) mockup translation, (c) container placement, (d) element area definition, and (e) element placement. [dSF19]

First, the wireframe boundaries are determined. Given a canvas with a predefined width and height, which holds the available drawing area, the working area's dimensions are set according to its size. The working area is where the drawing of the wireframe takes place; it is constituted by an arbitrary area made of cells, thus acting as a grid where the elements are placed. Then, the cells are assigned to a set of containers, which group UI elements together, so a container's area must be greater than a single cell. Next, the cells, now linked to containers, are also assigned to areas designating different types of UI elements. Finally, these areas are filled with the drawing of the UI element they represent.
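Under stated assumptions (the 12×8 cell grid, the element vocabulary, and the size ranges are ours, not MockGen's actual parameters), the grid-and-container placement described above can be sketched as:

```python
import random

def generate_layout(cols=12, rows=8, seed=42):
    """Sketch of a MockGen-style grid layout: split the grid into full-width
    container bands, then split each band into element areas of random width."""
    rng = random.Random(seed)
    elements = ["button", "image", "paragraph", "title", "text_field"]
    layout, row = [], 0
    while row < rows:
        # container height in cells (capped so it fits in the remaining grid)
        height = rng.randint(1, min(3, rows - row))
        container, col = [], 0
        while col < cols:
            # element width in cells (capped so it fits in the remaining row)
            width = min(rng.randint(2, 4), cols - col)
            container.append({"type": rng.choice(elements),
                              "x": col, "y": row, "w": width, "h": height})
            col += width
        layout.append(container)
        row += height
    return layout
```

Each returned container is a list of element areas that tile its band exactly, which mirrors the cell-to-container-to-element assignment steps above.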

When drawing the elements, each one undergoes a set of operations to add variation and give a hand-drawn feel to the wireframe, with the intent of making them look closer to non-synthetic ones. Each element's shape is defined by points, e.g., a square-shaped element has four points, and endures two consecutive adjustments: (i) each point is moved by a random value, and (ii) new middle points are added to each line (which is limited by the two vertex points that define the shape) so that it is no longer entirely straight. Plus, with the RoughJS library, it is possible to create multiple drawing styles, since brushes can have different weights, precision, among other features.
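The two point-level distortions described above can be sketched as follows (a minimal illustration; the jitter range and helper names are ours, and RoughJS performs far richer stroke synthesis):

```python
import random

def sketchify(points, jitter=2.0, seed=0):
    """Apply the two MockGen-style distortions to a polygon given as (x, y)
    vertices: (i) jitter every vertex by a random offset, and (ii) insert a
    perturbed midpoint on every edge so no line stays perfectly straight."""
    rng = random.Random(seed)
    wobble = lambda v: v + rng.uniform(-jitter, jitter)
    jittered = [(wobble(x), wobble(y)) for x, y in points]
    result = []
    # walk the closed polygon edge by edge
    for (x1, y1), (x2, y2) in zip(jittered, jittered[1:] + jittered[:1]):
        result.append((x1, y1))
        result.append((wobble((x1 + x2) / 2), wobble((y1 + y2) / 2)))
    return result

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
wobbly = sketchify(square)  # 4 jittered vertices + 4 perturbed midpoints
```

A square thus becomes an eight-point polyline whose every vertex stays within the jitter radius of the ideal shape, which is the "close to non-synthetic" look the generator aims for.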

Essentially, MockGen places high-level elements randomly in a 2D plane, which leads to wireframes not resembling real-world websites (cf. Figure 3.2, p. 18). This could influence the quality of the generated websites, but it has yet to be confirmed. The author concluded that the generator could be improved by removing the arbitrary placement, using distortion filters, and using annotations to define the style, alignment, and behavior of elements.


Figure 3.2: Example of a wireframe generated using MockGen: (a) input, (b) generated wireframe, (c) containers, and (d) annotation. [dSF19]

3.1.2 Computer Vision Based Generator

The development of user-facing applications, such as websites, requires many iterations and close collaboration between designers, developers, and stakeholders. After gathering all the requirements and planning the following phases, the first step is usually the making of wireframes to outline the layout of the interface. Then, wireframes are transformed into mockups and/or prototypes. Once approved, these are forwarded to developers to implement in code, which is extremely time-consuming and demands experienced developers.

Robinson [Rob19] attempts to automate the aforementioned process with two different approaches that translate wireframe sketches directly into code by using (a) classical computer vision techniques and (b) deep semantic segmentation networks. Both required a large, high-quality dataset of wireframes and corresponding code; yet, the author was unable to find such a dataset. Therefore, they decided to create their own and build it from websites, as these are more accessible and easier to process than mobile or desktop applications.

The author considered three methods to generate a synthetic dataset: (a) find websites and manually sketch them, (b) sketch websites manually and build the matching website, and (c) find websites and automatically sketch them. Since deep learning requires large datasets with many samples, both (a) and (b) would need significant resources. Besides, the quality of the dataset would be limited by human error and differing wireframing techniques. Consequently, only the latter was pursued, resulting in a tool that finds real websites (rather than randomly creating them) and automatically draws them through computer vision techniques that extract the high-level structure from screenshots (cf. Figure 3.3, p. 19). By opting for this method, they were able to collect a wide variety of website wireframes and respective structural data.

When collecting the structure from websites, Robinson found several challenges: (a) wireframes have a smaller element set than HTML, e.g., an image can be an img or svg tag, (b) websites have different styles for identical elements, like buttons, while wireframes have only one, (c) wireframes are static while websites can have animations and transitions, (d) wireframes represent structure with no content, yet websites have content that changes their structure, e.g., pages can have a p element with a different number of lines, (e) HTML code can be invalid or poorly formatted, and (f) in HTML, the same structure has multiple possible representations.

Figure 3.3: Sketch2Code dataset generation overview: (a) original website, (b) normalized version, and (c) sketched version. The original website (a) is normalized to simplify the extraction of its structure. Different colors are assigned to different elements, resulting in (b). Then, computer vision techniques extract the structure of the website. Lastly, a wireframe (c) of the website is created by replacing elements with random sketches. [Rob19]

To mitigate the above challenges, they developed a website normalization process using PhantomJS2. This process assigns HTML tags to broader categories (e.g., elements with img and svg tags are identified as images), replaces all styles, removes JavaScript and animations, and lastly, sets the width of titles and paragraphs to 100% and that of other elements to a constant. However, challenges (e) and (f) remained unsolved.
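A plausible core of such a normalization step is a tag-to-category map; the categories and mappings below are our guess at the spirit of the process, not Robinson's actual table:

```python
# Hypothetical mapping in the spirit of the normalization step: many HTML
# tags collapse into the small element vocabulary that wireframes can express.
TAG_CATEGORIES = {
    "img": "image", "svg": "image", "picture": "image",
    "h1": "title", "h2": "title", "h3": "title",
    "p": "paragraph", "span": "paragraph",
    "a": "link", "button": "button",
    "input": "text_field", "select": "dropdown",
}

def normalize_tag(tag):
    """Map an HTML tag to a wireframe element category; anything
    unrecognized is treated as a generic container."""
    return TAG_CATEGORIES.get(tag.lower(), "container")
```

Collapsing tags this way directly addresses challenge (a): the many-tags-to-one-element mismatch between HTML and the wireframe vocabulary.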

To extract the structure of a webpage, the author considered (a) directly parsing the HTML or (b) using computer vision to obtain the structure from a screenshot. They opted for the latter because parsing HTML had many corner cases and did not solve the problem of multiple representations of the same structure (which computer vision solves). Besides, learning the exact position of elements was a non-trivial task due to CSS. The use of computer vision techniques to extract the structure of the website had three phases: (i) screenshot the webpage, (ii) extract elements from the screenshot, and (iii) deduce a structure from the extracted elements. Phase (ii) returns a list of elements and their properties but no hierarchical information; thus, an algorithm was developed to build a tree based on bounding box hierarchy.
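One way such a bounding-box algorithm could work (our sketch, not the author's code; the box format and the smallest-enclosing-box heuristic are assumptions) is:

```python
def contains(outer, inner):
    """True if bounding box `outer` (x, y, w, h) fully encloses `inner`."""
    ox, oy, ow, oh = outer["box"]
    ix, iy, iw, ih = inner["box"]
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh

def build_tree(elements):
    """Deduce a hierarchy from flat detections: visit boxes largest-first and
    attach each one to the smallest already-placed box that encloses it."""
    ordered = sorted(elements, key=lambda e: e["box"][2] * e["box"][3], reverse=True)
    root = {"type": "page", "box": (0, 0, float("inf"), float("inf")), "children": []}
    placed = [root]
    for el in ordered:
        el = dict(el, children=[])
        # the smallest enclosing box among those already placed is the parent
        parent = min((p for p in placed if contains(p, el)),
                     key=lambda p: p["box"][2] * p["box"][3])
        parent["children"].append(el)
        placed.append(el)
    return root
```

Given a detected header containing a button, plus a sibling image, the button nests under the header while the header and image become direct children of the page.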

Furthermore, a total of 90 different sketches of UI elements were made to generate hand-drawn wireframes. For each website, every element is replaced with a randomly selected sketched version of the matching category; then, each element undergoes random variations of translation, scaling, and rotation to resemble real hand drawings. To maximize code quality, structure cohesion, and semantics, the author curated a list of 1750 Bootstrap website templates.


3.1.3 CSS Based Generator

The work of Kumar [Ash18] uses modern deep learning algorithms to significantly improve the design workflow and allow the quick creation and testing of websites. The author combined two existing models: pix2code3 and Screenshot to Code4. However, instead of using images of web-based user interfaces as input, they favored images of hand-drawn website wireframes. As stated for the aforementioned generators, a sufficiently large dataset of hand-drawn wireframe sketches and corresponding HTML code does not yet exist, so one had to be created.

In order to generate such a dataset, the author modified Beltramelli's open-source dataset, which consists of 1750 screenshots of synthetically generated websites and associated source code, to make it resemble hand-drawn wireframes (cf. Figure 3.4). Instead of using computer vision techniques to extract information from the screenshots, the CSS stylesheet of the original websites was directly modified by performing the following operations:

1. Change the border-radius of buttons and containers so that the corners are rounded;

2. Adjust the border-width thickness to simulate drawn sketches;

3. Add drop shadows to borders;

4. Replace the website’s font with a handwriting typeface;

5. Add skews, shifts, and rotations to simulate the variability in hand-drawn sketches.
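The operations above amount to a small CSS override sheet. The sketch below generates one such override (the property values, selectors, and the fallback font are illustrative guesses, not Kumar's actual stylesheet):

```python
import random

def handdrawn_css(seed=7):
    """Generate a CSS override in the spirit of the five operations above:
    rounded corners, thick borders, drop shadows, a handwriting typeface,
    and small per-selector skews/rotations."""
    rng = random.Random(seed)
    rules = []
    for selector in ["button", ".container", "input"]:
        rules.append(
            f"{selector} {{\n"
            f"  border-radius: {rng.randint(8, 16)}px;\n"           # 1. rounded corners
            f"  border-width: {rng.randint(2, 4)}px;\n"             # 2. sketchy thickness
            f"  box-shadow: 2px 2px 2px rgba(0, 0, 0, 0.3);\n"      # 3. drop shadow
            f"  transform: rotate({rng.uniform(-1.5, 1.5):.2f}deg)"
            f" skew({rng.uniform(-1, 1):.2f}deg);\n"                # 5. hand-drawn wobble
            f"}}"
        )
    rules.append("body { font-family: 'Comic Sans MS', cursive; }")  # 4. handwriting font
    return "\n".join(rules)
```

Because the randomness is seeded per page, each rendered screenshot gets its own slightly different "hand", mimicking the variability of real sketches.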

Figure 3.4: Example of a hand-drawn wireframe generated from a synthetic website through CSS modifications: (a) website from pix2code's dataset, (b) hand-drawn website wireframe. [Ash]

At the time of this writing, there was no formal document explaining precisely how the author generated this dataset. We believe a pipeline was developed to automatically perform the above operations, but it is unknown how the webpage's elements that undergo these operations are detected and selected. Furthermore, we noticed that although the author claims only CSS modifications are made, the textual content is also modified.

3 https://github.com/tonybeltramelli/pix2code
4 https://github.com/emilwallner/Screenshot-to-code


3.2 Code Generation from Hand-Drawn Wireframes

Recently, different solutions have been conceived that attempt to reduce the impact design has on the overall process of web development. The main approach to automating this process is to use machine learning. As discussed in the previous section, to achieve high-quality results, we need a sufficiently large dataset with many samples. Obtaining a curated dataset of hand-drawn wireframes with similar notation is a difficult obstacle to overcome due to their very limited availability. The solution is to use synthetic datasets for training, followed by a (possible) subsequent fine-tuning with real-world scenarios. In the next subsections, we describe some of the existing tools related to code generation from synthetic hand-drawn wireframes.

3.2.1 Uizard

Uizard5 is a tool that automatically transforms hand-drawn mobile wireframes into digital design files and front-end code. This tool builds on several scientific contributions, such as pix2code6 and code2pix7. Although, at the time of this writing, there was no formal document explaining their methodologies in detail, we were able to take a look at its beta version.

According to Beltramelli [Ton19], their technology “uses computer vision and machine learning to transform wireframe images to high-fidelity mockups automatically. It has a built-in style guide system to customize components and a prototype engine to build interactive multi-screens user flows. It can export to Sketch files containing ready-to-use symbols, and to front-end code such as HTML/CSS.” Figure 3.5 gives an overview of how Uizard works.

Figure 3.5: Overview of Uizard. [Ton19]

Regarding pix2code [Bel18], it is a solution that reverse engineers user interfaces from a single image of a graphical user interface (GUI). Essentially, it converts an image (a screenshot) into code by using deep learning to predict it, and there is already an adaptation that supports hand-drawings8. These images were then modified by applying operations such as skews, shifts, and rotations to produce additional irregularities [dSF19].

5 https://uizard.io
6 https://github.com/tonybeltramelli/pix2code
7 https://github.com/ngundotra/code2pix

The work from Beltramelli uses an end-to-end methodology and inspired several authors to come up with different ideas for reverse engineering user interfaces from images. The problem was divided into three sub-problems: first, a computer vision problem, since the model needs to understand a given scene and infer the identified objects and their properties; second, a language modeling problem, since it needs to understand text and generate syntactically and semantically correct code; and lastly, the use of the solutions to the two previous sub-problems to associate the detected objects with their corresponding textual description.

Figure 3.6: Overview of the pix2code model architecture: (a) training, (b) sampling. [Bel18]

Confronted with the above problems, the solution was to conceive an architecture with three neural networks (cf. Figure 3.6): one CNN, which solves the computer vision problem, and two LSTMs, which solve the language modeling problem. The CNN receives an image I as input and encodes it into a vectorial representation p. One of the LSTMs encodes the current context xt into an intermediate vectorial representation qt. Then, the vision-encoded vector p and the language-encoded vector qt are concatenated into a single vector rt, which is fed into the second LSTM, acting as a decoder. This decoder learns how to correlate the input image's objects with the tokens present in the DSL code. Finally, to obtain the probability of the next DSL token, the decoder is attached to a softmax layer [Bel18].
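The concatenation-and-softmax step can be illustrated with a stdlib-only toy (random, untrained weights; the vector sizes and sample DSL vocabulary are invented, and the real model feeds rt through the second LSTM rather than a single linear layer):

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def next_token_probs(p, q_t, vocab, seed=1):
    """Toy decoding step: concatenate the vision vector p and the language
    vector q_t into r_t, apply an untrained linear layer, and softmax over
    the DSL vocabulary to get next-token probabilities."""
    rng = random.Random(seed)
    r_t = p + q_t  # vector concatenation, as in the architecture description
    weights = [[rng.uniform(-1, 1) for _ in r_t] for _ in vocab]
    logits = [sum(w * x for w, x in zip(row, r_t)) for row in weights]
    return dict(zip(vocab, softmax(logits)))

# Hypothetical 3-dim vision encoding, 2-dim language encoding, tiny vocabulary.
vocab = ["header", "btn-active", "row", "single", "<END>"]
probs = next_token_probs([0.2, 0.9, 0.1], [0.5, 0.3], vocab)
```

The output is a proper probability distribution over DSL tokens; during sampling, the most likely token is appended to the context and the step repeats until `<END>`.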

The model was trained on a large synthetic set of GUI screenshots and associated source code (cf. Figure 3.7, p. 23) for three different platforms: iOS, Android, and web-based technologies. It has several benefits over previous techniques, mainly that it only requires example data and can learn associations for new styles. However, despite pix2code's high performance on its stochastic synthetic dataset, it is not capable of generalizing to real-world examples [Rob19, MBC+18]. To evaluate the quality of the output, a classification error was measured for each sampled DSL token and then averaged over the entire test dataset. Ultimately, the author concluded that accuracy could be drastically improved by training a bigger model on significantly more data [Bel18].


Figure 3.7: A sample of a web-based GUI written in a markup-like DSL from the pix2code dataset: (a) web-based GUI, (b) DSL code describing the GUI. [Ton]

3.2.2 Sketch2Code: Generating a website from a paper mockup

As described in Section 3.1.2 (p. 18), Robinson [Rob19] attempts to automate the workflow of designers with two different approaches that convert wireframe sketches directly into website code by using (a) classical computer vision techniques and (b) deep semantic segmentation networks. To demonstrate these approaches, they designed a framework that performs the pre- and post-processing required to translate a wireframe captured by a camera into a live-updating website that renders the generated code. The pre-processing step receives a raw photograph of a website's wireframe, adjusts the positioning and lighting of the image, and forwards it to the experimental approach, which returns an intermediary DSL carried by a JSON syntax that represents the structure of the wireframe. The post-processing step has three phases: (i) transfer the generated code from the experimental approach to clients, (ii) translate the code into HTML, and (iii) live-update the website.

Classical Computer Vision. This approach converts an image of a wireframe to code primarily through computer vision. First, it identifies and classifies UI elements and their properties (position, size, and type) from the sketch using computer vision techniques. Then, it infers how these elements are structured by building a hierarchical tree. Next, it classifies container structures as, for example, headers or footers. The classification process required a machine learning model that learns which features correspond to which container type based on training data from the dataset reported in Section 3.1.2 (p. 18). The model combines two MLPs, which learn to classify containers based on (a) their attributes (x, y, width, and height) and (b) the element types of their child elements. A final MLP takes the concatenated result of both and delivers the final classification. ReLU was used as the hidden-layer activation and softmax as the output-layer activation to yield element class probabilities. The model was trained with the Adam optimizer on 1250 container samples, with 250 reserved for testing and 250 for validation. Lastly, the approach normalizes the layout by fixing errors commonly made during sketching, namely rotations, imperfect shapes, translations, and scalings.
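The two-branch architecture can be sketched as a plain NumPy forward pass: one branch embeds the container attributes, the other the counts of child-element types, and a final layer classifies the concatenation. All layer sizes, the random weights, and the use of a child-type histogram are illustrative assumptions; only the overall structure (two MLPs, concatenation, ReLU hidden activations, softmax output) follows the description above.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(x, 0.0)


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


# Branch 1: container attributes (x, y, width, height) -> hidden features.
W_attr = rng.normal(size=(4, 8))
# Branch 2: histogram of child-element types (assumed 6 classes) -> hidden features.
W_child = rng.normal(size=(6, 8))
# Final layer: concatenated features -> container class scores (assumed 4 classes).
W_out = rng.normal(size=(16, 4))


def classify_container(attrs, child_type_counts):
    """Forward pass of the two-branch container classifier."""
    h1 = relu(attrs @ W_attr)
    h2 = relu(child_type_counts @ W_child)
    h = np.concatenate([h1, h2])
    return softmax(h @ W_out)  # class probabilities


probs = classify_container(np.array([0.1, 0.0, 0.9, 0.2]),
                           np.array([2.0, 1.0, 0.0, 0.0, 3.0, 0.0]))
```

In the actual work, the weights would of course be learned with Adam rather than drawn at random; the sketch only shows how the two feature sources combine into one class distribution.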

Deep Learning Segmentation. This approach uses deep learning to convert an image of a wireframe to code. First, it translates the image, using an ANN, into its normalized version, i.e., it creates an image that shows the structure of the wireframe (similar to the method illustrated in Figure 3.3, p. 19). The normalized image labels elements and containers and mitigates human errors. The author considered designing their own network but soon found that existing segmentation networks could be employed for element detection, classification, and normalization. However, for a network to understand a normalized image, each pixel had to be labeled with the matching element class. The model was trained on 1250 images from the dataset reported in Section 3.1.2 (p. 18), followed by subsequent fine-tuning.

Figure 3.8 presents an example of how a sketch is segmented and transformed into a website by this approach. First, the original wireframe sketch is resized to 256x256 and fed into the segmentation network. The network outputs a single-channel image containing pixel labels for each element class. These labels are then colored and overlaid on the original sketch. From the network output, a post-processing step and container classification are applied to create bounding boxes for each element. Finally, the tree of elements is fed into the post-processing phase of the framework, producing the rendered page.

Figure 3.8: Example of Robinson’s Sketch2Code approach using deep learning segmentation. Starting with the original drawing, followed by its segmented version, and, finally, the rendered HTML. [Rob19]
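The step from a per-pixel label image to element bounding boxes can be sketched as a connected-component pass over the label mask: flood-fill each region of equal, non-zero labels and record its extent. This breadth-first version is our own illustration; the post-processing in the original work is not specified at this level of detail.

```python
from collections import deque


def bounding_boxes(mask):
    """Extract one bounding box per 4-connected region of equal non-zero labels.

    `mask` is a 2D list of integer class labels (0 = background), like a
    segmentation network's per-pixel output after choosing the top class.
    Returns (label, x_min, y_min, x_max, y_max) tuples in pixel coordinates.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            label = mask[r][c]
            if label == 0 or seen[r][c]:
                continue
            # Breadth-first flood fill over same-label neighbours.
            queue = deque([(r, c)])
            seen[r][c] = True
            x0, y0, x1, y1 = c, r, c, r
            while queue:
                cr, cc = queue.popleft()
                x0, x1 = min(x0, cc), max(x1, cc)
                y0, y1 = min(y0, cr), max(y1, cr)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    if 0 <= nr < rows and 0 <= nc < cols \
                            and not seen[nr][nc] and mask[nr][nc] == label:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            boxes.append((label, x0, y0, x1, y1))
    return boxes
```

A 4x3 toy mask with a 2x2 block of class 1 and a lone class-2 pixel yields exactly two boxes, one per region.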

The main goal of Robinson's work was to compare the performance of the aforementioned approaches by measuring the micro and macro performance of each. Additionally, they reported the same statistics (one-tailed Mann-Whitney U tests) as similar research to support future studies. Regarding micro performance, they measured element detection and classification by calculating the macro-averaged precision, recall, and F1 score. For macro performance, they used three methods:

1. Visual comparison. The two approaches were tested with 250 wireframes to calculate the structural similarity (SSIM) and mean squared error (MSE), which were used as metrics to evaluate the pixel-level visual similarity between generated and original websites.

2. Structural comparison. The Wagner-Fischer implementation of Levenshtein’s edit distance was used to analyze the similarity of the hierarchical tree.


3. User study. Where websites generated from both approaches were tested with a total of 22 professionals - 9 web designers and 13 developers.

(a) For each sketch, the user had to select the webpage that best represented the wireframe. They had three choices: two generated by the two approaches and a third generated randomly from another sketch. The test included 25 wireframe sketches randomly selected from the 250-sketch evaluation set.

(b) For each website, users had to sketch a wireframe using the same symbols and materials.

The author concluded that, overall, the deep learning approach outperforms classical computer vision techniques, but neither performed well enough to be used in production environments. In addition, the user study indicated that the dataset generator requires a larger variety of UI element sketches to improve deep learning techniques when faced with unknown styles.
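The structural comparison above relies on the Wagner-Fischer dynamic-programming formulation of Levenshtein's edit distance. A minimal sketch, applied to trees serialized as token sequences (pre-order serialization is our assumption; Robinson does not state how the trees are linearized):

```python
def edit_distance(a, b):
    """Wagner-Fischer dynamic programme for the Levenshtein distance.

    Keeps only the previous row of the DP table, so memory is O(len(b)).
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution (free on match)
        prev = curr
    return prev[-1]


# Hypothetical element trees serialized by pre-order traversal.
generated = ["body", "header", "nav", "main", "p"]
original = ["body", "header", "main", "p", "footer"]
distance = edit_distance(generated, original)
```

Here the generated tree has a spurious "nav" and misses the "footer", so one deletion plus one insertion gives a distance of 2.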

At the time of writing, we found several tools and research papers named Sketch2Code by different authors, particularly one from Microsoft9. Unfortunately, we could not find any documentation regarding the latter. The shared name made research more laborious, as we sometimes could not distinguish between the works.

3.2.3 WebGen: Live Web Prototypes from Hand-Drawn Mockups

In the early stages of developing user-facing applications, designers make use of free-hand sketches, i.e., wireframes, to quickly convey their ideas to stakeholders. The wireframes are then usually converted into other high-level design forms, such as mockups. If the design is not refined before reaching developers, mistaken assumptions may be made during prototype development. Ultimately, the outcome may not meet the client's expectations, which leads to the repetition of the aforementioned phases to fine-tune the prototype until an understanding is reached. This repetition implies a high cost in time and resources.

As described in Section 3.1.1 (p. 16), the above issues motivated Ferreira [dSF19] to attempt to shorten the design workflow and propose a pipeline (cf. Figure 3.9), composed of four phases, that generates HTML code from an image of a hand-drawn wireframe. Later, this pipeline became the foundation of WebGen. The tool allows designers to focus on creativity, lets developers enhance the user experience and concentrate on the system's logic, and makes it possible to get immediate feedback from stakeholders.

The pipeline's four phases are: (i) image acquisition, which applies an adaptive threshold, morphologic close and erode operations, and rotation detection; (ii) element detection, which uses YOLO for elements and Pix2Pix combined with YOLO for containers; (iii) hierarchical reconstruction, which performs merge component association, input and text merging, and vertical and horizontal list creation; and (iv) code generation, which emits the HTML code.

Figure 3.9: High-level overview of the WebGen pipeline. Adapted from [dSF19].
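The horizontal-list creation step of the hierarchical reconstruction phase can be sketched as grouping detected element boxes into rows by vertical overlap. The overlap threshold, the box format, and the grouping heuristic are illustrative assumptions, not WebGen's documented algorithm.

```python
def group_into_rows(boxes, overlap=0.5):
    """Group detected element boxes (x, y, w, h) into horizontal lists.

    A box joins an existing row when its vertical range overlaps the row's
    last box by at least `overlap` of the smaller height; otherwise it
    starts a new row. A simplified stand-in for horizontal-list creation.
    """
    def same_row(a, b):
        top = max(a[1], b[1])
        bottom = min(a[1] + a[3], b[1] + b[3])
        return bottom - top >= overlap * min(a[3], b[3])

    rows = []
    for box in sorted(boxes, key=lambda b: (b[1], b[0])):  # top-to-bottom, left-to-right
        for row in rows:
            if same_row(row[-1], box):
                row.append(box)
                break
        else:
            rows.append([box])
    return rows
```

Two boxes at nearly the same height form one horizontal list, while a box much further down starts a second one; a vertical-list step could then stack the resulting rows.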
