
Departamento de Informática e Matemática Aplicada
Bachelor in Computer Science

Procedural Terrain Generation through Image Completion using GANs

Lucas Torres de Souza

Natal-RN, November 2019


Procedural Terrain Generation through Image Completion using GANs

Bachelor’s dissertation presented to the Departamento de Informática e Matemática Aplicada at the Centro de Ciências Exatas e da Terra at the Universidade Federal do Rio Grande do Norte in partial fulfillment of the requirements for the degree of Bachelor in Computer Science.

Advisor

Bruno Motta de Carvalho, PhD

Universidade Federal do Rio Grande do Norte – UFRN
Departamento de Informática e Matemática Aplicada – DIMAp

Natal-RN, November 2019


Bachelor’s dissertation titled Procedural Terrain Generation through Image Completion using GANs, presented by Lucas Torres de Souza and accepted by the Departamento de Informática e Matemática Aplicada of the Centro de Ciências Exatas e da Terra of the Universidade Federal do Rio Grande do Norte, approved by all the members of the thesis committee specified below:

Bruno Motta de Carvalho, PhD

Advisor

Departamento de Informática e Matemática Aplicada
Universidade Federal do Rio Grande do Norte

André Maurício Cunha Campos, PhD

Departamento de Informática e Matemática Aplicada
Universidade Federal do Rio Grande do Norte

Selan Rodrigues dos Santos, PhD

Departamento de Informática e Matemática Aplicada
Universidade Federal do Rio Grande do Norte


Procedural Terrain Generation through Image Completion using GANs

Author: Lucas Torres de Souza
Advisor: Bruno Motta de Carvalho, PhD

Abstract

Procedural terrain generation is the creation of virtual landscapes through algorithmic means. There are various well-tested methods for terrain generation, but most require manual parameter tuning to obtain the expected results. In this work, we propose a system that generates terrain height maps and color textures based on real-world examples. This generator system is constructed using Generative Adversarial Networks, a deep learning architecture that, over the last years, has shown great results in image synthesis tasks. We model the terrain generation problem as a texture completion task. That results in a system that can not only generate new terrain, but also expand and connect existing ones. While the described system has limitations, it provides a useful framework for more complete systems as geospatial data becomes more readily available.

Keywords: Procedural Terrain Generation, Generative Adversarial Networks, Image Completion.



List of Figures

2.1 The image completion process. The input 2.1a is an incomplete image, whose unknown region is marked here in black. The process creates a filled output image 2.1b.
2.2 Good and bad similarity after completion
2.3 Good and bad continuity after completion
2.4 Good and bad feature continuation after completion
2.5 Good and bad isotropy after completion
3.6 GAN architecture
3.7 Terrain generated with the midpoint method, after 8 subdivisions
3.8 Terrain generated with a Perlin noise height map
3.9 Steep cliffs with talus slopes at their feet
3.10 Terrain generated by (BECKHAM; PAL, 2017)
3.11 Terrain generated by (SPICK; COWLING; WALKER, 2019)
4.12 Our architecture
4.13 Generator architecture
4.14 Discriminator architecture
4.15 Terrain mesh topology
4.16 Visualization of real terrain sample from the Alps dataset
5.17 Crops of size 256 × 256 of each test region (color map on the left, height map on the right)
5.18 Completion of 64 × 64 square inside crops of size 128 × 128, in the Alps dataset
5.19 Visualization of inpainting result depicted in Figure 5.18a
5.20 Completion of 64 × 64 square inside crops of size 128 × 128, in various datasets
5.21 Expansion by 32 pixels of crops of size 128 × 128, in the Alps dataset
5.22 Visualization of expansion result depicted in Figure 5.21a
5.23 Expansion by 32 pixels of crops of size 128 × 128, in various datasets
5.24 Crops of size 256 × 256 of each test region, with color map on the left and height map on the right
5.25 Visualization of generation result depicted in Figure 5.24a
5.26 Original crops of size 128 × 128 of each test region
5.27 Result of network trained on the Alps dataset
5.28 Result of network trained on the Canyon dataset
5.29 Result of network trained on the Ethiopia dataset
5.30 Result of network trained on the Maze dataset
5.31 Visualization of style transfer


List of Abbreviations

PCG – Procedural Content Generation
GAN – Generative Adversarial Network
RGB – Red Green Blue
ANN – Artificial Neural Network
DCGAN – Deep Convolutional Generative Adversarial Network
SGAN – Spatial GAN
GPU – Graphics Processing Unit
DEM – Digital Elevation Model
HSV – Hue Saturation Value


Contents

1 Introduction
1.1 Objectives
1.2 Outline
2 Problem Definition
2.1 Terrain representation
2.2 Image completion
2.3 Generated terrain quality metrics
2.3.1 Similarity
2.3.2 Continuity
2.3.3 Feature continuation
2.3.4 Isotropy
3 Techniques and Related Works
3.1 Generative Adversarial Networks
3.1.1 Artificial neural networks
3.1.1.1 Training
3.1.2 Convolutional neural networks
3.2 Generative adversarial networks
3.3 Procedural terrain generation
3.3.1 Traditional methods
3.3.2 Methods based on data
3.3.3 Methods based on GANs
4 Methodology
4.1 Data Representation
4.2 Generator network architecture
4.3 Discriminant network architecture
4.4 Loss function
4.5 Training procedure
4.6 Reconstruction and visualization
5 Results
5.1 Inpainting
5.2 Expansion
5.3 Full Generation
5.4 Style Transfer
5.5 Discussion
6 Final Remarks
6.1 Future Work
6.1.1 Expanded architectures
References


1 Introduction

Procedural content generation (PCG) is the creation of digital content through algorithmic means. It is a toolset frequently used by software developers for the dynamic creation of various virtual items. With PCG, it is possible to create many variants of a kind of object without the need to design each form by hand, decreasing production cost. It also allows content to be created dynamically after deployment, as needed.

Most developments in the PCG area were motivated by game development. In games, procedural content generation increases replayability, as new content can be created for each session, and adaptability, as content can be adapted to different play styles (SMITH et al., 2011).

A major topic within procedural modeling is the automatic generation of terrains; studies on terrain elevation and vegetation have been conducted since the 1980s. The methods studied involve both purely procedural and data-based approaches (SMELIK et al., 2009).

Some more recent works attempt to use deep neural networks for generation (BECKHAM; PAL, 2017). However, many aspects of terrain generation using neural networks are still under study. There is a great range of generative tasks that modern neural networks are capable of executing; however, current works limit themselves to full image generation and colorization. Other functionalities, like harmonization, completion, super-resolution, style transfer and texture expansion, have not been explored.

In this work, we propose a procedural terrain generation system based on completion: the task of filling in incomplete images. The advantage of the completion method is its flexibility; it allows both harmonization with fixed, manually prepared terrain regions and infinite world generation through expansion.

Specifically, we train a Generative Adversarial Network (GAN) with real terrain samples and use it for terrain synthesis. GANs have been used with good results for a great variety of image synthesis tasks, such as super-resolution, translation, face synthesis and texture synthesis. Our work is a specific example of the latter.

The desired characteristics of the generated terrain are application dependent. Our system synthesizes terrains for 3D visualization purposes. Hence, we focus on two terrain aspects: height and color. These features are encoded in a rectangular texture image, which is afterwards converted into a 3D mesh.

1.1 Objectives

Our objective in this work is to explore the capabilities of a completion-oriented terrain generation system. We expect such a system to be able to

• generate new terrains (height and color information) that are similar to a reference terrain;

• expand and combine patches of given terrains, filling missing regions with generated terrain;

• harmonize terrains so that they are similar to the reference terrain.

Within the scope of this work, similarity is a visual metric. Therefore, we would achieve our goals as long as the results in the described tasks look like the reference terrain; statistical equivalence and physical coherence are not required.

1.2 Outline

The terrain generation problem is detailed and formalized in Chapter 2. In Chapter 3 we give a brief introduction to Generative Adversarial Networks and comment on related work on procedural terrain generation and texture synthesis. The full description of our system, including architecture and training method, is given in Chapter 4. The results are presented and analyzed in Chapter 5. Finally, we reflect on the achieved and missing features of our generator in Chapter 6.


2 Problem Definition

The objective of this work is to create a procedural terrain generation system that, given an incomplete description of a terrain region, fills the missing parts with terrain of a similar type, in a way such that the end result appears to be a realistic terrain.

2.1 Terrain representation

Not every terrain feature is relevant for visualization purposes. Two terrain features are important here: the topography and the surface color.

We assume that the topography can be modeled as the graph of a height function $h : A \to \mathbb{R}$ that maps each point of an area of interest $A \subset \mathbb{R}^2$ to its height above a reference level. That representation is unable to express every kind of terrain feature; for instance, caves cannot be represented. Topographic maps like these have existed for centuries, and became a natural representation method for virtual terrains.

The perceived color of a surface depends on many factors, like the position of the observer, illumination and material properties. In our simplified model we factor out illumination and assume the relative observer position is irrelevant; the visual color is represented by a function $c : A \to \mathbb{R}^3$ that maps each point of an area of interest $A \subset \mathbb{R}^2$ to its material color, represented in RGB. The use of two-dimensional information to describe the surface characteristics of three-dimensional objects is one of the oldest ideas in computer graphics (CATMULL, 1974).

We can combine both height and color information in a single function $f : A \to \mathbb{R}^4$, $f(x) = (h(x), c(x))$.

Computationally, we discretize the domain of $f$ to allow representation. We call the discrete representation a texture. For visualization purposes, we separate the terrain texture into a height map and a color map, encoding topography and perceived color respectively.


2.2 Image completion

The image completion problem is, as the name suggests, the task of filling an incomplete image. It has also been called image inpainting, region filling, void filling, texture synthesis and interpolation of missing data. The task was originally motivated by the removal of artifacts and defects from photography (KOKARAM et al., 1995; BERTALMIO et al., 2000). Those first methods used spatial and temporal interpolation, which limited the size of the filled regions. Larger gaps motivated texture synthesis algorithms (ASHIKHMIN, 2001). Later, more flexible methods combining both approaches were developed (CRIMINISI; PÉREZ; TOYAMA, 2004). The advent of deep learning allowed image completion based on the semantics of the surrounding context (YEH et al., 2017).

Formally, let $I$ be the set of all images with domain $A$, $p$ a probability distribution over $I$, $K \subset A$ and $j : K \to \mathbb{R}$. We call $K$ the set of known pixels, and $j$ the known part of the image. Let $I_j \subset I$ be the set of images that coincide with $j$ on the known pixels, that is,

$$I_j = \{\, i \in I \mid \forall k \in K,\ i(k) = j(k) \,\}.$$

The image completion problem is to find the image $i \in I_j$ that maximizes $p(i)$. In other words, we want to find the most likely image that agrees with what we already know about it. The probability distribution $p$ defines what class of images we are looking for. For example, $p$ might model the likelihood that an image is the texture of a valley.

An example of image completion can be seen in Figure 2.1.

Figure 2.1: The image completion process. The input 2.1a is an incomplete image, whose unknown region is marked here in black. The process creates a filled output image 2.1b.


Many terrain generation tasks can be posed as image completion problems. To generate a terrain texture from nothing, it is enough to set $K = \emptyset$. To extend an existing texture, set $j$ to the known texture and expand the domain. We can also link two given textures by defining the gap between them as the unknown region.

2.3

Generated terrain quality metrics

The quality of the generated terrain is subjective. An extensive study of what features of real terrain are relevant to the perceptual realism was conducted by (RAJASEKARAN et al., 2019). In this work, we skip such detailed analysis; instead, we propose four general, visually distinctive metrics to evaluate our results. These metrics were chosen after obser-vation of our various results; therefore, they are closely associated with our methodology and not all are adequate for analysis of other works. The chosen metrics are similarity, continuity, feature continuation and isotropy.

2.3.1 Similarity

The completed output texture must match the terrain features in the incomplete input; deserts should not be completed with forests, plains should not be completed with mountains. The detail level must also not change; rough terrains should not be smoothed by the completion process, and small features in the incomplete image should also appear in the filled areas. We call this characteristic similarity (Figure 2.2).

Figure 2.2: Good and bad similarity after completion: (a) incomplete image, (b) good similarity, (c) bad similarity.

2.3.2 Continuity

In the output, the border between the known and filled regions should not be detectable; in this work, we call this property continuity (Figure 2.3). Any border artifacts would induce unwanted, visible geometry, especially if the texture completion method is used repeatedly. For example, creating a larger texture by adding new tiles through repeated completion would generate a grid pattern on the terrain.

Figure 2.3: Good and bad continuity after completion: (a) incomplete image, (b) good continuity, (c) bad continuity.

2.3.3 Feature continuation

There are many large terrain features that are expected not to end abruptly. Rivers, roads and mountain ranges are examples of such features. Ideally, features in the known region are continued in the generated one, and no feature that is not continued is created. We call this property feature continuation (Figure 2.4).

Figure 2.4: Good and bad feature continuation after completion: (a) incomplete image, (b) good feature continuation, (c) bad feature continuation.

2.3.4 Isotropy

Even if the border of the filled region itself is not detectable, the system might still use the border information for feature construction. That induces the creation of features that follow the border contours. The effect is not so critical if the completion process is used only once, but repeated use will again induce unwanted geometry. We call the independence between border direction and feature direction isotropy (Figure 2.5).


Figure 2.5: Good and bad isotropy after completion: (a) incomplete image, (b) good isotropy, (c) bad isotropy.


3 Techniques and Related Works

3.1 Generative Adversarial Networks

In this chapter we introduce the fundamental concepts of Generative Adversarial Networks, a machine learning technique that plays a central role in our system.

3.1.1 Artificial neural networks

Artificial neural networks (ANNs) are computing models based on studies of biological neural networks. The basic unit of an ANN is the neuron, a component that receives multiple inputs and generates an output. The fundamental computational model of the biological neuron was the MP model, proposed by (MCCULLOCH; PITTS, 1943). The perceptron is an extension of the MP model, created specifically for artificial neural networks (ROSENBLATT, 1958). The function of a perceptron neuron is

$$f(x) = \begin{cases} 1, & \text{if } w \cdot x > \varphi \\ 0, & \text{otherwise} \end{cases}$$

where $x = (x_1, \ldots, x_n)^T$ is the input vector, $w = (w_1, \ldots, w_n)^T$ the weight vector and $\varphi$ the threshold. The modern neuron model is a generalization of the perceptron, capable of generating non-binary outputs. Its functional form is

$$f(x) = \sigma(w \cdot x)$$

where $\sigma$ is called the activation function. Typical activation functions include the Heaviside step function

$$H(z) = \begin{cases} 0, & \text{if } z < 0 \\ 1, & \text{if } z \geq 0 \end{cases}$$

which recovers the perceptron; the inverse tangent function; the logistic function

$$L(z) = \frac{1}{1 + e^{-z}};$$

the rectified linear unit (ReLU) function (HAHNLOSER et al., 2000)

$$\mathrm{ReLU}(z) = \max(z, 0);$$

and the leaky ReLU function (MAAS; HANNUN; NG, 2013)

$$\mathrm{LeakyReLU}(z) = \begin{cases} z, & z > 0 \\ \varepsilon z, & \text{otherwise} \end{cases}$$

where $\varepsilon$ is a small value.

To create a multidimensional output, we can combine the outputs of multiple neurons; for $m$ inputs and $n$ outputs, the equivalent function $f : \mathbb{R}^m \to \mathbb{R}^n$ is

$$f(x) = \sigma(Wx)$$

where $W \in \mathbb{R}^{n \times m}$ is the weight matrix. This creates a single-layer network. We can sequentially compose many such layers into a feed-forward, $k$-layer, fully-connected network

$$F(x) = (f_1 \circ \cdots \circ f_{k-1})(x),$$

where $f_i(x) = \sigma(W_i x)$.
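As a concrete illustration, here is a minimal sketch of such a feed-forward, fully-connected network in PyTorch (the library used later in this work); the layer sizes are arbitrary examples, not values taken from this thesis:

```python
import torch
import torch.nn as nn

# A fully-connected network F(x) = (f_1 o ... o f_{k-1})(x),
# where each f_i(x) = sigma(W_i x). Sizes are illustrative only.
class FeedForward(nn.Module):
    def __init__(self, sizes=(16, 32, 32, 1)):
        super().__init__()
        layers = []
        for m, n in zip(sizes[:-1], sizes[1:]):
            layers.append(nn.Linear(m, n, bias=False))  # weight matrix W_i in R^{n x m}
            layers.append(nn.ReLU())                    # activation sigma
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

x = torch.randn(8, 16)   # a batch of 8 input vectors
y = FeedForward()(x)     # output shape: (8, 1)
```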

Other network models may not connect every pair of neurons in subsequent layers, may have cycles, or may even use different types of neurons. The selection and arrangement of the neurons in an artificial neural network is called its architecture.

3.1.1.1 Training

The universal approximation theorem (CSÁJI, 2001) guarantees that any continuous function satisfying a few reasonable conditions can be approximated by a feed-forward network with 3 finite layers. However, finding the specific weights that define the wanted function is hard, especially since we generally have no explicit description of that function beforehand. Therefore, we need two things to find a neural network for a certain task: a quantitative way to evaluate the fitness of a function for that task, and an algorithm to find a function with maximal fitness.

The fitness is typically expressed through a loss function; the loss function indicates how badly the network performs a certain task. For instance, let us consider a binary classification problem. Let $A$ be a set and $B \subset A$. Define $b : A \to \{0, 1\}$,

$$b(x) = \begin{cases} 0, & x \notin B \\ 1, & x \in B. \end{cases}$$

Suppose we want a network that calculates $b$. We might use the loss function

$$\ell(F) = \sum_{x \in A} |F(x) - b(x)|.$$

Minimizing $\ell$ corresponds to finding the $F$ that best emulates $b$; more specifically, it corresponds to finding the weight parameters whose $F$ best emulates $b$.

In the machine learning approach, the evaluation of the wanted function is known only over a subset of the domain. We call that subset the training set, and use it to define the loss function.

Except for some very simple architectures and loss functions, there is no known algorithm that is guaranteed to find the global minimum of the loss function.

3.1.2 Convolutional neural networks

Convolutional neural networks are the fundamental tool of modern deep learning systems. Their success on image classification tasks (KRIZHEVSKY; SUTSKEVER; HINTON, 2012) set in motion a new era of machine learning research.

The concept of a convolutional layer was introduced by (FUKUSHIMA, 1980). In this kind of layer, each neuron acts as a convolutional filter over a signal; we will focus on 2-dimensional signals, i.e. images. However, convolutional layers are also commonly used with one- and three-dimensional sources.

A convolution filter convolves a kernel over a signal and generates a new signal. A kernel $K$ of size $(2k+1)$ is a $(2k+1) \times (2k+1)$ matrix of values. In a neural convolutional layer, the kernel is the parameter to be learned. The output $J$ of the convolution filter over an image $I$ is

$$J(i, j) = \sum_{\ell=-k}^{k} \sum_{m=-k}^{k} K_{\ell m}\, I(i - \ell, j - m).$$

Near the image borders, the convolution requires pixel values from outside the image; the method used to extend the input image is called the padding. Typical padding methods include the use of a constant value (usually 0), repeating the image and reflecting the image. Alternatively, we may forgo the use of padding, but that results in an output image smaller than the input one.

The kernel does not need to trace the input image pixel by pixel to generate the output. The step in the input image between each kernel application is called the stride. A stride of $s$ results in an image with $1/s$ times the dimensions of the input.

Another kind of layer used to reduce the image size is the pooling layer, also introduced by (FUKUSHIMA, 1980). The pooling layer uses a statistic, like the mean or the maximum, to summarize the values in a fixed-size window. For generative tasks, pooling layers have fallen out of favor; convolutional layers with non-unitary strides have similar capabilities, while also being able to learn the summary function.

Neither convolutional nor pooling layers are capable of increasing the dimensions of the image. This upsampling task is performed by the transposed convolution operation (LONG; SHELHAMER; DARRELL, 2015), also known as deconvolution, backwards convolution or fractionally-strided convolution. As suggested by the alternative nomenclature, it has a role contrary to that of the convolution, using a similar kernel but applying it in reverse. Mathematically, it is equivalent to a convolution with a fractional (less than one) stride. Transposed convolution layers are successfully used by networks for generation, auto-encoding and segmentation tasks.
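As a quick illustration of these size effects, the following sketch (channel counts are arbitrary) shows a stride-2 convolution halving the spatial resolution and a stride-2 transposed convolution doubling it back:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 64, 64)  # one 64x64 RGB image

# Stride-2 convolution: spatial size 64 -> 32.
down = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)
print(down(x).shape)  # torch.Size([1, 8, 32, 32])

# Stride-2 transposed convolution: spatial size 32 -> 64.
up = nn.ConvTranspose2d(8, 3, kernel_size=3, stride=2, padding=1, output_padding=1)
print(up(down(x)).shape)  # torch.Size([1, 3, 64, 64])
```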

A useful characteristic of convolution, pooling and transposed convolution layers is that they can be applied to images of arbitrary size, and that they are translation-invariant. We call networks composed only of those kinds of layers fully-convolutional networks. That kind of network is ideal for texture-related tasks, as it limits its effect to a fixed-distance neighborhood and cannot consider global aspects of the image.

3.2 Generative adversarial networks

As noted in Section 3.1.1.1, training a neural network requires the definition of a loss function. For more straightforward tasks, like classification, regression and segmentation, the distance between the expected output and the obtained output is easy to measure. On the other hand, image generation tasks provide no such simplicity. There is no mathematically sensible way to indicate how well an image represents, for example, a dog.


Figure 3.6: GAN architecture

The solution found was to simultaneously train two networks, a generator and a discriminator. The generator network is the one that performs the wanted image generation task. During training, the objective of the discriminator network is to distinguish between real images and generated images. The generator network fights against the discriminator, attempting to pass its generated images off as real. These two networks are trained together, adapting to each other. This framework is called a Generative Adversarial Network (GAN), and was first proposed by (GOODFELLOW et al., 2014). A diagram of this architecture can be found in Figure 3.6.
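A minimal sketch of one step of this adversarial training, using the usual binary cross-entropy formulation; `G`, `D`, the optimizers and the inputs are placeholders supplied by the caller:

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, real, z):
    # Discriminator step: push D(real) towards 1 and D(G(z)) towards 0.
    pred_real = D(real)
    pred_fake = D(G(z).detach())  # detach: do not update G here
    d_loss = F.binary_cross_entropy(pred_real, torch.ones_like(pred_real)) \
           + F.binary_cross_entropy(pred_fake, torch.zeros_like(pred_fake))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator step: try to make D classify generated images as real.
    pred_fake = D(G(z))
    g_loss = F.binary_cross_entropy(pred_fake, torch.ones_like(pred_fake))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()
```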

A Deep Convolutional Generative Adversarial Network (DCGAN) is the combination of the GAN architecture with deep convolutional generator and discriminator neural networks (RADFORD; METZ; CHINTALA, 2015). Fully-convolutional DCGANs have achieved great results on style transfer and super-resolution tasks (JOHNSON; ALAHI; FEI-FEI, 2016). Fully-convolutional DCGANs that use a noise image instead of a noise vector are called spatial GANs (SGANs) (JETCHEV; BERGMANN; VOLLGRAF, 2016); they were developed specially for texture synthesis.

GANs have been successfully used for a wide variety of tasks, like

• texture synthesis, the generation of a new texture with second-order characteristics similar to a reference one (GATYS; ECKER; BETHGE, 2015a);

• texture expansion (ZHOU et al., 2018);

• super-resolution, upsampling an image into a higher resolution (LEDIG et al., 2017);

• image completion, filling holes in an image (IIZUKA; SIMO-SERRA; ISHIKAWA, 2017);

• face image synthesis (ZHANG; SONG; QI, 2017);

• human image synthesis (ZHAO et al., 2018);

• image-to-image translation, transforming images from one given class to another (ZHU et al., 2017);

• image harmonization, modification of an image pasted over another to harmonize it with its surroundings (XIAODONG; CHI-MAN, 2019);

• style transfer, the transfer from one artwork style into another (GATYS; ECKER; BETHGE, 2015b);

• colorization, the addition of color to black and white images (CAO et al., 2017).

3.3 Procedural terrain generation

The generation of virtual landscapes is one of the oldest areas of interest in procedural content generation research. Most of the work has focused on the generation of height maps, but hydrography (KELLEY; MALIN; NIELSON, 1988), vegetation (DEUSSEN et al., 1998), roads (SUN et al., 2002) and urban environments (WATSON et al., 2008) have also been explored.

In this review, we considered only works that focus on the generation of either the height map or the color map of the terrain.

3.3.1 Traditional methods

The oldest height map generation methods are based on subdivision. The objective of those initial papers was not the generation of a terrain from scratch, but the addition of details to already existing, low-resolution terrain meshes (FOURNIER; FUSSELL; CARPENTER, 1982). The point-displacement method, described by (MILLER, 1986), is a recursive subdivision method that splits existing edges at their midpoint and displaces the new point by a random height; the range of the random height decreases with the recursion depth. While the midpoint method can work on arbitrary mesh topologies, it was popularized in conjunction with the diamond-square subdivision. The terrains output by the midpoint method generally look like hilly areas or mountainsides; furthermore, they are fractal-like and do not have different features at different scales. Figure 3.7 shows a terrain output by the midpoint method.

The most popular methods are based on fractal noise generators (VOSS, 1985). Perlin noise (PERLIN, 1985) is a smooth multidimensional function obtained by interpolation of a grid of vectors. The sum of those noise functions at various scales, related by powers of two, creates a fractal function; each of those functions is called an octave. Perlin noise has an advantage over subdivision methods: it can be sampled at any point in constant time, with only local information, as long as the vectors on the grid can also be calculated in constant time. This allows the creation of infinite terrains. A terrain with a Perlin noise height map is displayed in Figure 3.8.

Figure 3.7: Terrain generated with the midpoint method, after 8 subdivisions

Figure 3.8: Terrain generated with a Perlin noise height map

Height maps generated with a pure Perlin noise function are often too smooth, while simultaneously being too steep in many regions. Many refinement methods have been proposed. Erosion, both thermal and fluvial, is treated in (MUSGRAVE, 1993). Fluvial erosion moves matter around according to the local topography, creating valleys and drainage networks. Thermal erosion affects steep slopes, softening them and creating talus slopes at their feet. An example of a talus slope can be seen in Figure 3.9.

Figure 3.9: Steep cliffs with talus slopes at their feet. Source: commons.wikimedia.org/wiki/File:TalusConesIsfjorden.jpg

Unlike the previously cited approaches, some methods are capable of generating caves and overhangs. In (BENES; FORSBACH, 2001), many layers are used to represent the terrain, each with distinct material properties. This allows more realistic results, at a higher computational cost. Other methods, like (SANTAMARÍA-IBIRIKA et al., 2014), manipulate voxels.

In general, erosion computation is resource intensive, especially when working with voxels. Hence, more recent works implement erosion on the GPU. In (WEIß, 2016), a particle-based hydraulic erosion system is simulated on the GPU.

3.3.2 Methods based on data

Among the procedural terrain generation methods that do not rely on machine learning, few use real data. Real-world topography data is represented in Digital Elevation Models (DEMs). The Terrainosaurus system (SAUNDERS, 2007) synthesizes terrain by genetically selecting DEMs that fit a given elevation profile. In (PARBERRY, 2014), elevation statistics are collected from DEMs and used to configure a variant of Perlin noise, called value noise; that allows the creation of terrains with an elevation profile similar to a reference one.

3.3.3 Methods based on GANs

Since generative adversarial networks are still a relatively young method, only a few works have considered their application to procedural terrain generation.

Figure 3.10: Terrain generated by (BECKHAM; PAL, 2017). Source: (BECKHAM; PAL, 2017)

The system of (BECKHAM; PAL, 2017) uses two networks, one for height map generation and one for color. The height map is generated first with a DCGAN. The color map is then generated from the height map, using a conditional image-to-image translation GAN known as pix2pix (ISOLA et al., 2017). The networks were trained with a cross-entropy loss function, using 512 × 512 images of the Earth, at a resolution of about 0.58 kilometers per pixel. The results of this work suggest that the height map alone is insufficient for generating color information, as snow was added to a desert terrain. The results also display the glitchy artifacts commonly found in other GAN-based texture generation systems. A sample of the generated terrain can be seen in Figure 3.10.

Height maps of visually better quality were obtained by (SPICK; COWLING; WALKER, 2019) (Figure 3.11). Their methodology is somewhat similar to ours, as they train an SGAN with small crops from a large region to generate similar patches. They do not, however, generate color maps.


Figure 3.11: Terrain generated by (SPICK; COWLING; WALKER, 2019). Source: (SPICK; COWLING; WALKER, 2019)


4 Methodology

The main component of the terrain generation system is a generative neural network G. It receives an incomplete terrain texture, with partial color and height map, and outputs a complete texture.

To train G, we utilize a Generative Adversarial Network architecture, as defined in Section 3.1. As such, we have an associated discriminative network D that classifies textures as either real or fabricated. The network G is trained with pairs of complete and incomplete images, with two objectives: to fill the incomplete image so that it resembles the complete one, and to make D classify its output as real. The network D, on the other hand, has a competing target: to classify G's output as fabricated and real images as real. Both G and D are trained concurrently.

An overview of our GAN architecture can be found in Figure 4.12. Our neural networks are modified SGANs. Therefore, they are fully-convolutional and can work with arbitrarily sized data.

In this chapter we describe the details of our proposed architecture. Section 4.1 explains how texture data is represented and prepared for training; the architectures of G and D are described in Section 4.2 and Section 4.3, respectively; the training targets are listed in Section 4.4; the training procedure itself is detailed in Section 4.5; finally, we briefly discuss our approach to terrain mesh reconstruction and visualization in Section 4.6.

4.1 Data Representation

Our texture image has four channels: three represent color and one represents height. The color is encoded in RGB; other works with GANs often use an HSV color space encoding (SMITH, 1978), but that option would be incompatible with the style loss function. For similar reasons, no gamma correction is performed on the color texture. Height is represented linearly by

$$\hat{h} = \frac{h - h_{min}}{h_{max} - h_{min}}$$

where $\hat{h}$ is the encoded height, $h$ is the real height, $h_{min}$ the minimal height and $h_{max}$ the maximal height. The values of $h_{min}$ and $h_{max}$ must be chosen manually for each terrain class so that $0 \leq \hat{h} \leq 1$.

In our training data and result visualization software, each channel can assume at most 256 distinct values. That limitation is not the result of any network property, but a constraint imposed by the available training data. The system can be adapted to input or output higher color and height resolutions at no additional cost.

The input to an image completion problem is an image segmented in two parts, a known region and an unknown region. Such segmentation can be encoded into the input using various approaches:

1. Set the unknown pixels to a constant color/height;

2. Set each unknown pixel to a random color/height;

3. Set the unknown pixels to a constant color/height, and mark them in a new channel;

4. Set each unknown pixel to a random color/height, and mark it in a new channel.

The use of constant color/height information has an unavoidable side-effect in our architecture: it creates a small repeating pattern far from the known region. Not using a mask makes the problem ambiguous between style transfer and completion. Therefore, we chose to use both noise and an unknown-region mask channel.


In training, each texture is obtained through random 128 × 128 cropping of a single large reference image. To diversify the inputs, those crops may be flipped horizontally or vertically, with independent probability 0.5 for each. To feed the network G, we take those textures and remove a region, encoding it using one of the methods described above. The removal region is chosen at random from a fixed selection of masks. As a final step, pixel values are remapped to the [−1, 1] floating-point interval.
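A sketch of this preparation pipeline, following the encoding chosen above (noise in the unknown region plus a mask channel, values remapped to [−1, 1]); the names and the mask-handling details are illustrative, not taken from the original code:

```python
import torch

def prepare_sample(reference, mask):
    # reference: (4, H, W) tensor in [0, 1] (RGB + encoded height).
    # mask: (1, 128, 128) tensor, 1 where the pixel is unknown.
    _, H, W = reference.shape

    # Random 128 x 128 crop of the large reference texture.
    i = torch.randint(0, H - 127, (1,)).item()
    j = torch.randint(0, W - 127, (1,)).item()
    crop = reference[:, i:i + 128, j:j + 128].clone()

    # Independent horizontal/vertical flips, probability 0.5 each.
    if torch.rand(1) < 0.5:
        crop = torch.flip(crop, dims=[2])
    if torch.rand(1) < 0.5:
        crop = torch.flip(crop, dims=[1])

    # Fill the unknown region with noise, append the mask as a fifth channel.
    noise = torch.rand_like(crop)
    incomplete = crop * (1 - mask) + noise * mask
    net_input = torch.cat([incomplete, mask], dim=0)

    # Remap values from [0, 1] to [-1, 1].
    return net_input * 2 - 1, crop * 2 - 1  # (generator input, ground truth)
```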

4.2 Generator network architecture

The layers of G are organized in three sequential zones:

• The reduction zone, where the spatial resolution of the image decreases and the number of channels increases;

• The residual zone, where both the spatial resolution and the number of channels are constant;

• The expansion zone, where the spatial resolution increases and the number of channels decreases.

The reduction zone has three convolution layers. The first layer does not reduce size, but increases the number of channels to 64. Each of the two following layers halves the spatial resolution and doubles the number of channels. Each convolution layer is followed by 2D batch normalization and a ReLU activation function. If the input of the reduction zone is a k × k × d image, the output is a k/4 × k/4 × 256 one.

The residual zone is formed by six sequential residual blocks. A residual block is so called because its result is added to its input (and therefore what is calculated is the residue). Each residual block has two convolution layers, each followed by batch normalization and separated by a ReLU activation and, in training, a dropout layer.

The expansion zone is a reflection of the reduction zone, with the dimension-reducing convolution layers replaced by dimension-increasing transposed convolutions. If the input image of this zone is of size k × k × 256, the output is 4k × 4k × d.

The full generator architecture can be observed in Figure 4.13.

Figure 4.13: Generator architecture

Due to the complex nature of neural networks, it is difficult to discern exactly what function each of these zones has in the solution of the problem. However, it is possible to compare this architecture with those of similar networks. Without the residual zone, we obtain a traditional auto-encoder setup. Auto-encoders reconstruct images after reducing their spatial dimension; for that purpose, they need to learn the local patterns of the training images. Therefore, we might assume that the function of the reduction and expansion zones is similar. Hence, the purpose of the residual zone is the actual completion of the input texture.
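The following is a condensed sketch of this three-zone layout in PyTorch. It follows the textual description (channels 64 → 128 → 256, six residual blocks, mirrored expansion), but kernel sizes, padding, the final activation and the exact input channel count are assumptions, not values extracted from the original code:

```python
import torch.nn as nn

def conv_block(cin, cout, stride=1):
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class ResidualBlock(nn.Module):
    def __init__(self, ch=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True), nn.Dropout(0.5),  # dropout rate is an assumption
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        return x + self.body(x)  # the block learns the residue

class Generator(nn.Module):
    def __init__(self, d=5):  # 4 texture channels + 1 mask channel
        super().__init__()
        self.reduce = nn.Sequential(
            conv_block(d, 64), conv_block(64, 128, stride=2),
            conv_block(128, 256, stride=2))
        self.residual = nn.Sequential(*[ResidualBlock() for _ in range(6)])
        self.expand = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 3, stride=2, padding=1, output_padding=1),
            nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 4, 3, padding=1),
            nn.Tanh())  # output in [-1, 1]; the Tanh is an assumption

    def forward(self, x):
        return self.expand(self.residual(self.reduce(x)))
```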

The generator is a fully-convolutional network, i.e. it is formed only by convolutional and deconvolutional layers, with no fully-connected layers. That property implies that the same G can process images of arbitrary size, with a fixed size ratio between input and output. For our network, the input and output sizes are always the same.

Another effect of having a fully-convolutional network is that a change in a single input pixel cannot affect the output image everywhere: there is a maximum pixel influence radius determined by the network architecture. The influence radius for our network is 91 pixels. Therefore, while there is no limit to the input image size, the dimensions of the unknown region are limited; any unknown area too far away from the known region will be filled with content unrelated to its known neighborhood. If necessary for an application, we can increase the radius by adding either size-reduction layers or residual blocks.

4.3 Discriminant network architecture

The discriminant network D is formed by a sequence of six size-reducing convolution layers, the first five followed by a LeakyReLU activation function and the last followed by a Sigmoid. Its full architecture can be observed in Figure 4.14.


Figure 4.14: Discriminator architecture
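A matching sketch of the discriminator; only the count of size-reducing layers and the activations come from the description above, while kernel sizes, channel progression and the 4-channel input are assumptions:

```python
import torch.nn as nn

def d_block(cin, cout):
    # Size-reducing convolution followed by LeakyReLU.
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
    )

discriminator = nn.Sequential(
    d_block(4, 64), d_block(64, 128), d_block(128, 256),
    d_block(256, 512), d_block(512, 512),
    nn.Conv2d(512, 1, kernel_size=4, stride=2, padding=1),
    nn.Sigmoid(),  # per-patch probability that the input texture is real
)
```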

4.4 Loss function

The total generative loss function $L_G$ is composed of three terms

$$L_G = L_{adv} + \lambda_1 L_{L1} + \lambda_2 L_{style}$$

where $L_{adv}$ is the adversarial loss, $L_{L1}$ is the L1 loss and $L_{style}$ is the style loss. The generator network is trained to minimize this loss function.

The adversarial loss $L_{adv}$ is the traditional metric described by (GOODFELLOW et al., 2014) in the seminal paper on GANs. It is given by

$$L_{adv} = \mathbb{E}_z[\log(1 - D(G(z)))]$$

where

• $G(z)$ is the generator output given the incomplete texture $z$;

• $D(G(z))$ is the discriminator's estimate of the probability that the texture generated from $z$ is real;

• $\mathbb{E}_z$ is the expected value over all incomplete textures.

By minimizing $L_{adv}$, the generator becomes better at fooling the discriminator.

The L1 loss is the mean absolute error between each pixel of the real input and the generated one. Minimizing $L_{L1}$ corresponds to outputting an image similar to the input: for known pixels, it expresses our intent of not modifying their values; for unknown pixels, it stabilizes the generated values to something not too different from the original image.

The style loss $L_{style}$ represents a more subtle concept. It is low when the source and generated images have a similar visual texture. The style loss was originally described by (GATYS; ECKER; BETHGE, 2015a) in a paper about texture generation. First, we extract features of different sizes from the source image. To do so, we run it through the initial layers of an already trained, high-performing deep neural network. Each activated layer $\ell$ in a trained network corresponds to a set of $N_\ell$ non-linear feature filters, each with $M_\ell$ outputs; we can represent the features of a layer with a feature matrix $F^\ell \in \mathbb{R}^{N_\ell \times M_\ell}$. The Gram matrix $G^\ell$ depicts the correlation between the features of the layer. It is given by

$$G^\ell_{ij} = \sum_k F^\ell_{ik} F^\ell_{jk}.$$

The Gram matrices related to a fixed layer set $L$, $\{G^\ell\}_{\ell \in L}$, encode the texture of the corresponding image. Denoting by $G$ and $\hat{G}$ the Gram matrices of the source and generated images, respectively, the style loss is

$$L_{style} = \sum_{\ell \in L} \sum_{i,j} \frac{w_\ell}{4 N_\ell^2 M_\ell^2} \left( G^\ell_{ij} - \hat{G}^\ell_{ij} \right)^2$$

for fixed weights $\{w_\ell\}_{\ell \in L}$.

Our choice of feature network was the VGG-19 model trained on ImageNet, as used by (GATYS; ECKER; BETHGE, 2015a) and (ZHOU et al., 2018). However, unlike those works, our output image has four channels. As VGG-19 receives images with three channels, we create two sets of Gram matrices: the first with only the RGB channels, and another with the triplicated height channel. These two sets are combined for loss evaluation purposes.
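A sketch of the Gram matrix and style loss computation, directly following the equations above; the feature maps are assumed to come from a pretrained VGG-19, as described, and the layer weights are left to the caller:

```python
import torch

def gram_matrix(F_l):
    # F_l: (N_l, M_l) feature matrix of layer l.  G_ij = sum_k F_ik * F_jk.
    return F_l @ F_l.t()

def style_loss(feats_src, feats_gen, weights):
    # feats_*: lists of (N_l, H, W) feature maps over the layer set L.
    loss = 0.0
    for F_s, F_g, w in zip(feats_src, feats_gen, weights):
        N, H, W = F_s.shape
        M = H * W
        G_src = gram_matrix(F_s.reshape(N, M))
        G_gen = gram_matrix(F_g.reshape(N, M))
        loss = loss + w / (4 * N**2 * M**2) * ((G_src - G_gen) ** 2).sum()
    return loss
```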

4.5 Training procedure

Following the approach for texture generation in works like (ZHOU et al., 2018), we do not attempt to create a network that can generate every kind of terrain. Instead, we train a new network on a single, continuous piece of terrain that contains a repeated type of characterizing feature, like a mountain range, an archipelago or a river delta.

For each training class, the network is trained for $10^5$ iterations. Optimization is done with the Adam (KINGMA; BA, 2014) stochastic method; the momentum is set to 0.5. The learning rate is set to 0.0002 from the start until iteration $5 \cdot 10^4$, and then linearly decays to zero until the final iteration.
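A sketch of this schedule with PyTorch's Adam optimizer and a LambdaLR scheduler; the second Adam beta and the placeholder network are assumptions:

```python
import torch
import torch.nn as nn

TOTAL_ITERS = 100_000
DECAY_START = 50_000

generator = nn.Conv2d(5, 4, 3, padding=1)  # placeholder for the network G

# Adam with momentum (beta_1) 0.5; beta_2 left at an assumed default.
opt = torch.optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))

def lr_factor(it):
    # Constant until iteration 5e4, then linear decay to zero at 1e5.
    if it < DECAY_START:
        return 1.0
    return 1.0 - (it - DECAY_START) / (TOTAL_ITERS - DECAY_START)

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=lr_factor)

for it in range(TOTAL_ITERS):
    # ... forward pass, loss computation and opt.step() would go here ...
    sched.step()
```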

The software used for training and testing was adapted from the one used by (ZHOU et al., 2018). It is written in the Python programming language using the PyTorch (PASZKE et al., 2017) deep learning library.

4.6 Reconstruction and visualization

For real-time visualization of a three-dimensional surface on current hardware, we need to represent it as a triangular mesh. Our generated mesh has the simple repeated structure shown in Figure 4.15.

Figure 4.15: Terrain mesh topology

Each vertex of our mesh corresponds to a pixel in the generated texture; the position of the vertex is influenced only by the height map value at that pixel. Generally, the topography of the terrain is smoothed out before display; we do not smooth our results, so as not to hide any artifacts generated by the previous steps.
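A sketch of this reconstruction, assuming the regular two-triangles-per-cell topology of Figure 4.15; the scale factors are arbitrary:

```python
import numpy as np

def heightmap_to_mesh(height, xy_scale=1.0, z_scale=1.0):
    # height: (H, W) array of decoded heights.
    # Returns (vertices, triangles): one vertex per pixel, two triangles per cell.
    H, W = height.shape
    ys, xs = np.mgrid[0:H, 0:W]
    vertices = np.stack(
        [xs * xy_scale, ys * xy_scale, height * z_scale], axis=-1
    ).reshape(-1, 3)

    triangles = []
    for i in range(H - 1):
        for j in range(W - 1):
            v = i * W + j  # index of this cell's top-left vertex
            triangles.append([v, v + W, v + 1])          # first triangle
            triangles.append([v + 1, v + W, v + W + 1])  # second triangle
    return vertices, np.array(triangles)
```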

The rendered terrains in this work were created with the Blender 3D software. A sample of a rendered result can be seen in Figure 4.16.


5 Results

We trained networks on four distinct terrain datasets, and ran them through four experiments with distinct amounts of unknown region. That allows us to infer the efficacy of the system in various terrain generation schemes.

The Alps dataset (Figure 5.17a) contains a region of the Swiss Alps. It is characterized by a combination of grassy, fractal valleys and tall, snow-covered mountains. We consider that this dataset contains all features necessary for proper training: a single, repetitive type of feature, with a high correlation between height and color information. The Canyon dataset (Figure 5.17b) consists mostly of rocky, planar desert terrain, with a singular canyon structure crossing it. It is based on the neighborhood of the Grand Canyon, US. The Ethiopia dataset (Figure 5.17c) is formed by irregular, hilly terrain covered by a semi-arid biome, taken from southeast Ethiopia. About half of it is covered with rough terrain sculpted by rivers, while the other half is mostly plain. The Maze dataset (Figure 5.17d) is an artificial terrain with very simple, straight features, and is used to allow us to evaluate our method in a more controlled way.

The training was executed on a GeForce 940MX GPU with 4 GB of RAM. The network was trained in batches of 8 samples at a time. A batch size of 8 is low relative to the related works (BECKHAM; PAL, 2017; SPICK; COWLING; WALKER, 2019), but no higher power of two could be used due to limited memory. Training each network took about 30 hours, a rate of about 1.08 s per iteration. The generation of a 128 × 128 crop takes about 4.3 ms, assuming that both the network and the input have already been loaded into the GPU.


Figure 5.17: Crops of size 256 × 256 of each test region (color map on the left, height map on the right): (a) Alps, (b) Canyon, (c) Ethiopia, (d) Maze.

5.1 Inpainting

The inpainting test analyzes the generation of content when surrounded by known terrain. We expect this to be the easiest of the four tasks for the generator system, as most of the training data contains inpainting-style unknown regions, and a lot of information about the surrounding terrain is available.

For the Alps dataset (Figure 5.18), while the color image quality is degraded, all our other objective metrics (continuity, isotropy and feature continuation) are successfully obtained. For instance, consider the inpainted terrain in Figure 5.18a: there is no visible border between the known and filled regions, so continuity is achieved; feature continuation is clear, as the valleys to the northwest, west, south and east of the unknown region are connected; finally, no anisotropic features can be observed. However, there are visible flaws in the results, such as the rough elevation artifact in the west of the result, more clearly visible in Figure 5.19.


Figure 5.18: Completion of a 64 × 64 square inside crops of size 128 × 128, in the Alps dataset

Figure 5.19: Visualization of inpainting result depicted in Figure 5.18a


The results for the Canyon dataset (Figure 5.20a) are of low quality. The generator fails to continue the canyon feature, and there is a visible border around the generated region. We believe this low performance results from the fact that most of the Canyon dataset is filled with plains.

The generator for the Ethiopia dataset (Figure 5.20b) shows the same faults to a lesser degree. In the valley diagonally crossing the image, the generated terrain has softer features than the surrounding terrain.

The Maze dataset (Figure 5.20c) has acceptable results, with proper similarity, continuity and feature continuation. However, an artifact can be observed in the northwest corner.

Figure 5.20: Completion of a 64 × 64 square inside crops of size 128 × 128, in various datasets: (a) Canyon, (b) Ethiopia, (c) Maze.

5.2 Expansion

The expansion test tries to generate terrain around a known area. More specifically, we generated a border of 32 pixels around a 128×128 terrain patch. The terrain expansion task has medium difficulty; there is some context neighborhood, but not as much as in the inpainting experiment.

For the Alps network (Figure 5.21), the generated terrain is properly similar, continuous and feature-continuous. However, the results are not isotropic. Observe the generated valleys at the south edge in Figure 5.21a, east and southwest in Figure 5.21b, north and east in Figure 5.21c and south in Figure 5.21d; they follow the direction of the expansion edge and create unnatural features. The same elevation artifacts previously observed appear again in this sample, although in different locations (Figure 5.22).


Figure 5.21: Expansion by 32 pixels of crops of size 128 × 128, in the Alps dataset

Figure 5.22: Visualization of expansion result depicted in Figure 5.21a

The other generator networks (Figure 5.23) also produce non-ideal results. Feature continuation is mostly absent with the Canyon and Ethiopia networks, a more serious problem in the former. Maze does have good feature continuation, but isotropy is poor in the height map, with unjustified high elevation at the four expansion edges.


Figure 5.23: Expansion by 32 pixels of crops of size 128 × 128, in various datasets: (a) Canyon, (b) Ethiopia, (c) Maze.

5.3 Full Generation

If fed only noise, the generator network will still attempt to generate terrain. The results of this task are the kind of terrain expected when generating terrain beyond the influence radius of any known region. They also indicate the variability expected when distinct noise is used with the same known region. As there is no terrain information present, and very few of the training samples in each network request full generation, this is an especially hard task for the networks.


The results for all test datasets (Figure 5.24) show a lack of large-scale features, containing only an amorphous, incoherent texture that somewhat resembles the reference terrain. A rendered version can be seen in Figure 5.25.

Figure 5.24: Crops of size 256 × 256 of each test region, with color map on the left and height map on the right: (a) Alps, (b) Canyon, (c) Ethiopia, (d) Maze.

Figure 5.25: Visualization of generation result depicted in Figure 5.24a

5.4 Style Transfer

In the style transfer task, the input contains no unknown regions. This case is not directly trained, but a side-effect of the style networks. However, that does not mean that the task itself is useless. When a generator network receives as input a terrain unlike the one it was trained with, it translates the input into the features of the trained terrain.

It is interesting to note that the transfer occurs not only in the color map, but also in the height map. When translating height maps with the Canyon-based network, which is mostly plain, only blurred features are passed (Figure 5.28d); desert hills are converted into green mountains by the Alps network (Figure 5.31); the swirling valleys of an Ethiopia sample are built from the vertical and horizontal features of the Maze (Figure 5.30c). This characteristic allows another method for new terrain generation: create a base terrain with a traditional method and pass it through the generator network.

Figure 5.26: Original crops of size 128 × 128 of each test region: (a) Alps, (b) Canyon, (c) Ethiopia, (d) Maze.

Figure 5.27: Result of network trained on the Alps dataset

Figure 5.28: Result of network trained on the Canyon dataset

Figure 5.29: Result of network trained on the Ethiopia dataset

Figure 5.30: Result of network trained on the Maze dataset

Figure 5.31: Visualization of style transfer: (a) original terrain depicted in Figure 5.26c, (b) style transfer result depicted in Figure 5.27c.

5.5 Discussion

Throughout all four experiments, it is clear that two of the networks had considerably better performance: the ones based on the Alps and Maze datasets. They differ from the Canyon and Ethiopia datasets in a crucial aspect: while Canyon and Ethiopia have a diverse set of terrain features, Alps and Maze are homogeneous and contain a single kind of feature. This suggests that a careful selection of the source dataset is necessary, and that using the network to generate multiple kinds of terrain would require a more powerful network architecture.


When compared to inpainting, expansion and style transfer, the full generation task obtained very low-quality results. Hence, it is easy to generate new terrain when some neighboring terrain is already available, but hard to create brand new terrain. We suggest another method to generate new terrain using our network:

1. generate a height and color map using a traditional method (Section 3.3.1);

2. apply style transfer;

3. expand the style transfer result;

4. crop the region generated by the expansion.

Our system could be used for both real-time and offline terrain generation. The limiting factor is not execution time, but available GPU memory. Our training hardware could handle the generation of 512 × 512 textures in 21 ms; inputs sized 1024 × 1024 led to out-of-memory errors. As the influence radius is small and the network output is translation-invariant, size limitations can be overcome by splitting the input apart; in this case, the performance bottleneck is the transfer of textures between main memory and the GPU.
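A sketch of such tiled processing: each tile is padded by the 91-pixel influence radius and only the interior of the result is kept, so the stitched output matches a single full-image pass (assuming the generator preserves spatial size, as ours does; names are illustrative):

```python
import torch

RADIUS = 91  # maximum pixel influence radius of our generator

def complete_tiled(net, x, tile=512):
    # x: (1, C, H, W) prepared input; returns the stitched (1, 4, H, W) output.
    _, _, H, W = x.shape
    out = torch.zeros(1, 4, H, W)
    step = tile - 2 * RADIUS  # interior size of each tile
    for i in range(0, H, step):
        for j in range(0, W, step):
            i0, j0 = max(i - RADIUS, 0), max(j - RADIUS, 0)
            i1, j1 = min(i + step + RADIUS, H), min(j + step + RADIUS, W)
            y = net(x[:, :, i0:i1, j0:j1])
            h, w = min(step, H - i), min(step, W - j)
            # Keep only the interior, which is unaffected by the tile borders.
            out[:, :, i:i + h, j:j + w] = y[:, :, i - i0:i - i0 + h,
                                            j - j0:j - j0 + w]
    return out
```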

We obtained results of higher quality than the ones presented by (BECKHAM; PAL, 2017). However, since we were unable to generate terrains from a desert biome, a more detailed comparison is impossible. The paper also does not provide information on hardware, training hyper-parameters or performance.

The height maps generated by our method are not clearly better or worse than the results of (SPICK; COWLING; WALKER, 2019), and we cannot compare perceptual realism on humans because we executed no such study. Considering only the outputs presented by the authors in the paper, we believe that our work achieved, with the Alps network, better feature continuity but worse similarity. Our network is also leaner: ours has depth 2 while theirs has depths between 4 and 6. They also do not generate color information.


6 Final Remarks

This work provided a series of new contributions to the procedural terrain generation problem; we

• showed that texture completion is a valid way of structuring the procedural terrain generation task;

• created a system that can generate color and height information with a single network;

• described a working architecture that, unlike similar works in terrain generation with GANs, explicitly considers texture properties of the data;

• demonstrated the necessity of a homogeneous training dataset;

• highlighted the benefits of adding noise to the unknown region of the input of image completion networks;

• provided an initial analysis of the effects of style transfer on height maps, as style transfer has been studied mainly in the context of color images.

While our results have clear flaws in some aspects, those kinds of artifacts have been found in other GAN-based systems and corrected by subsequent works. Therefore, we believe that with further research a ready-for-use version of the system could be developed.

6.1 Future Work

There are some possible improvements to our system that, due to hardware limitations, were not explored.

The analysis of related works shows that the ideal batch size for training is around 64. Our batch size was 8. It has been shown that the mini-batch size influences training speed and stability.


Another parameter worth increasing is the generator network depth. Similar networks obtained better results with depths between 5 and 6. An increased network depth also increases the influence radius, which may in turn induce better feature continuation.

Since our system generates unwanted artifacts near the image borders, alternative padding settings should also be considered. As the effects of padding should increase with the influence radius, it might also be interesting to adapt the system to use no padding at all, or at least no padding outside the residual blocks. A network with no padding would inevitably output an image smaller than the input; proper preprocessing should be able to cancel out this effect.

An alternative architecture for the generator is the U-Net (RONNEBERGER; FISCHER; BROX, 2015). Like our architecture, U-Net is a fully-convolutional network, capable of processing images of arbitrary size. Unlike our architecture, U-Net connects convolutional and deconvolutional layers of the same size; we believe this characteristic would allow reconstruction of the known region with higher fidelity.

Finally, it is worth observing that the terrain description our system produces (height and color) is not enough for rendering a fully realistic environment in modern graphics pipelines. Necessary additional information includes ground reflectivity properties, water location and depth, and vegetation density. Since all of these can be encoded as textures, we believe our architecture should be able to handle the additional features with little modification.

6.1.1 Expanded architectures

The need to train a distinct generator network for each kind of terrain is a big limitation of our system. Ideally, the end user should be able to pick from a palette of terrain types and combine them as needed, in an interactive system like (ZHU et al., 2016). We believe that a network architecture similar to ours would work for such a task; however, training such a network would require an extensive, carefully labeled dataset, whose construction falls beyond the scope of this work.

Ideally, a procedural terrain generation network should simultaneously execute two tasks: texture completion and multi-resolution synthesis. Texture completion has already been explored in this work; the main advantage of generating terrains through completion is the ability to expand already existing terrains. Multi-resolution synthesis allows generating multiple levels of detail as needed; this matters because storing the full terrain texture at a high detail level is extremely memory intensive. StyleGANs are able to tackle each of those tasks individually, but we found no studies that consider them combined.

Our approach is incapable of generating caves and overhangs, and poor at generating cliffs and other high-slope structures. In traditional methods, such structures have been created over a voxel representation. Convolutional Neural Networks are capable of working with voxels, but it is not clear how our architecture as a whole could be adapted to three-dimensional information.
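
The basic building block does exist: PyTorch provides three-dimensional convolutions, as the minimal sketch below shows. The open questions are the memory cost of dense voxel grids and how to restructure the completion pipeline around them, not the layer itself.

import torch
import torch.nn as nn

# A 3x3x3 convolution over a voxel grid with a single occupancy channel
voxel_conv = nn.Conv3d(in_channels=1, out_channels=16,
                       kernel_size=3, padding=1)

grid = torch.randn(1, 1, 64, 64, 64)  # (batch, channel, depth, height, width)
features = voxel_conv(grid)           # -> (1, 16, 64, 64, 64)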


References

ASHIKHMIN, M. Synthesizing natural textures. SI3D, Citeseer, v. 1, p. 217–226, 2001.

BECKHAM, C.; PAL, C. A step towards procedural terrain generation with GANs. arXiv preprint arXiv:1707.03383, 2017.

BENES, B.; FORSBACH, R. Layered data representation for visual simulation of terrain erosion. In: IEEE. Proceedings Spring Conference on Computer Graphics. [S.l.], 2001. p. 80–86.

BERTALMIO, M. et al. Image inpainting. In: ACM PRESS/ADDISON-WESLEY PUBLISHING CO. Proceedings of the 27th annual conference on Computer graphics and interactive techniques. [S.l.], 2000. p. 417–424.

CAO, Y. et al. Unsupervised diverse colorization via generative adversarial networks. In: SPRINGER. Joint European Conference on Machine Learning and Knowledge Discovery in Databases. [S.l.], 2017. p. 151–166.

CATMULL, E. A subdivision algorithm for computer display of curved surfaces. [S.l.], 1974.

CRIMINISI, A.; PÉREZ, P.; TOYAMA, K. Region filling and object removal by exemplar-based image inpainting. IEEE Transactions on image processing, v. 13, n. 9, p. 1200–1212, 2004.

CSÁJI, B. C. Approximation with artificial neural networks. Faculty of Sciences, Eötvös Loránd University, Hungary, Citeseer, v. 24, p. 48, 2001.

DEUSSEN, O. et al. Realistic modeling and rendering of plant ecosystems. In: ACM. Proceedings of the 25th annual conference on Computer graphics and interactive techniques. [S.l.], 1998. p. 275–286.

FOURNIER, A.; FUSSELL, D.; CARPENTER, L. Computer rendering of stochastic models. Communications of the ACM, ACM, v. 25, n. 6, p. 371–384, 1982.

FUKUSHIMA, K. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological cybernetics, Springer, v. 36, n. 4, p. 193–202, 1980.

GATYS, L.; ECKER, A. S.; BETHGE, M. Texture synthesis using convolutional neural networks. In: Advances in neural information processing systems. [S.l.: s.n.], 2015. p. 262–270.

GATYS, L. A.; ECKER, A. S.; BETHGE, M. A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576, 2015.


GOODFELLOW, I. et al. Generative adversarial nets. In: Advances in neural information processing systems. [S.l.: s.n.], 2014. p. 2672–2680.

HAHNLOSER, R. H. et al. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature, Nature Publishing Group, v. 405, n. 6789, p. 947, 2000.

IIZUKA, S.; SIMO-SERRA, E.; ISHIKAWA, H. Globally and locally consistent image completion. ACM Transactions on Graphics (ToG), ACM, v. 36, n. 4, p. 107, 2017.

ISOLA, P. et al. Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. [S.l.: s.n.], 2017. p. 1125–1134.

JETCHEV, N.; BERGMANN, U.; VOLLGRAF, R. Texture synthesis with spatial generative adversarial networks. arXiv preprint arXiv:1611.08207, 2016.

JOHNSON, J.; ALAHI, A.; FEI-FEI, L. Perceptual losses for real-time style transfer and super-resolution. In: SPRINGER. European conference on computer vision. [S.l.], 2016. p. 694–711.

KELLEY, A. D.; MALIN, M. C.; NIELSON, G. M. Terrain simulation using a model of stream erosion. [S.l.]: ACM, 1988.

KINGMA, D. P.; BA, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

KOKARAM, A. C. et al. Interpolation of missing data in image sequences. IEEE Transactions on Image Processing, IEEE, v. 4, n. 11, p. 1509–1519, 1995.

KRIZHEVSKY, A.; SUTSKEVER, I.; HINTON, G. E. Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems. [S.l.: s.n.], 2012. p. 1097–1105.

LEDIG, C. et al. Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE conference on computer vision and pattern recognition. [S.l.: s.n.], 2017. p. 4681–4690.

LONG, J.; SHELHAMER, E.; DARRELL, T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. [S.l.: s.n.], 2015. p. 3431–3440.

MAAS, A. L.; HANNUN, A. Y.; NG, A. Y. Rectifier nonlinearities improve neural network acoustic models. In: Proc. icml. [S.l.: s.n.], 2013. v. 30, n. 1, p. 3.

MCCULLOCH, W. S.; PITTS, W. A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, Springer, v. 5, n. 4, p. 115–133, 1943.

MILLER, G. S. The definition and rendering of terrain maps. In: ACM. ACM SIGGRAPH Computer Graphics. [S.l.], 1986. v. 20, n. 4, p. 39–48.

MUSGRAVE, F. K. Methods for realistic landscape imaging. Yale University, New Haven, CT, p. 21, 1993.


PARBERRY, I. Designer worlds: Procedural generation of infinite terrain from real-world elevation data. Journal of Computer Graphics Techniques, v. 3, n. 1, 2014.

PASZKE, A. et al. Automatic differentiation in PyTorch. 2017.

PERLIN, K. An image synthesizer. ACM Siggraph Computer Graphics, v. 19, n. 3, p. 287–296, 1985.

RADFORD, A.; METZ, L.; CHINTALA, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.

RAJASEKARAN, S. D. et al. Ptrm: Perceived terrain realism metrics. arXiv preprint arXiv:1909.04610, 2019.

RONNEBERGER, O.; FISCHER, P.; BROX, T. U-net: Convolutional networks for biomedical image segmentation. In: SPRINGER. International Conference on Medical image computing and computer-assisted intervention. [S.l.], 2015. p. 234–241.

ROSENBLATT, F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological review, American Psychological Association, v. 65, n. 6, p. 386, 1958.

SANTAMARÍA-IBIRIKA, A. et al. Procedural approach to volumetric terrain generation. The Visual Computer, Springer, v. 30, n. 9, p. 997–1007, 2014.

SAUNDERS, R. L. Terrainosaurus: realistic terrain synthesis using genetic algorithms. Tese (Doutorado) — Texas A&M University, 2007.

SMELIK, R. M. et al. A survey of procedural methods for terrain modelling. In: Proceedings of the CASA Workshop on 3D Advanced Media In Gaming And Simulation (3AMIGAS). [S.l.: s.n.], 2009. p. 25–34.

SMITH, A. R. Color gamut transform pairs. ACM Siggraph Computer Graphics, ACM, v. 12, n. 3, p. 12–19, 1978.

SMITH, G. et al. Pcg-based game design: enabling new play experiences through procedural content generation. In: ACM. Proceedings of the 2nd international workshop on procedural content generation in games. [S.l.], 2011. p. 7.

SPICK, R. J.; COWLING, P.; WALKER, J. A. Procedural generation using spatial gans for region-specific learning of elevation data. 2019 IEEE Conference on Games (CoG), p. 1–8, 2019.

SUN, J. et al. Template-based generation of road networks for virtual city modeling. In: ACM. Proceedings of the ACM symposium on Virtual reality software and technology. [S.l.], 2002. p. 33–40.

VOSS, R. F. Random fractal forgeries. In: Fundamental algorithms for computer graphics. [S.l.]: Springer, 1985. p. 805–835.

WATSON, B. et al. Procedural urban modeling in practice. IEEE Computer Graphics and Applications, IEEE, v. 28, n. 3, p. 18–26, 2008.


WEIß, S. Fast Voxel-Based Hydraulic Erosion. Dissertação (Bachelorarbeit) — Technische Universität München, 2016.

CUN, X.; PUN, C.-M. Improving the harmony of the composite image by spatial-separated attention module. arXiv preprint arXiv:1907.06406, 2019.

YEH, R. A. et al. Semantic image inpainting with deep generative models. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. [S.l.: s.n.], 2017. p. 5485–5493.

ZHANG, Z.; SONG, Y.; QI, H. Age progression/regression by conditional adversarial autoencoder. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. [S.l.: s.n.], 2017. p. 5810–5818.

ZHAO, B. et al. Multi-view image generation from a single-view. In: ACM. 2018 ACM Multimedia Conference on Multimedia Conference. [S.l.], 2018. p. 383–391.

ZHOU, Y. et al. Non-stationary texture synthesis by adversarial expansion. ACM Transactions on Graphics (TOG), ACM, v. 37, n. 4, p. 49, 2018.

ZHU, J.-Y. et al. Generative visual manipulation on the natural image manifold. In: SPRINGER. European Conference on Computer Vision. [S.l.], 2016. p. 597–613.

ZHU, J.-Y. et al. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision. [S.l.: s.n.], 2017. p. 2223–2232.
