UNIVERSIDADE FEDERAL DO RIO GRANDE DO NORTE
PROGRAMA DE PÓS-GRADUAÇÃO EM SISTEMAS E COMPUTAÇÃO

Real-time Data Processing and Multimodal

Interactive Visualization Using GPU and Data

Structures for Geology and Geophysics

Waldson Patrício do Nascimento Leandro

Advisor: Prof. Bruno Motta de Carvalho, Ph.D.
Co-advisor: Prof. Selan Rodrigues dos Santos, Ph.D.

Doctoral thesis presented to the Systems and Computing Post-graduation Program of UFRN (concentration area: Graphics Processing and Computational Intelligence) as a mandatory requirement to earn a Ph.D. degree in Systems and Computing.

Cataloging-in-Publication. UFRN - Sectional Library Prof. Ronaldo Xavier de Arruda - CCET

Leandro, Waldson Patrício do Nascimento.

Real-time data processing and multimodal interactive visualization using GPU and data structures for geology and geophysics / Waldson Patrício do Nascimento Leandro. - 2019.

89f.: il.

Thesis (doctorate) - Universidade Federal do Rio Grande do Norte, Centro de Ciências Exatas e da Terra, Departamento de Informática e Matemática Aplicada. Natal, 2019.

Advisor: Bruno Motta de Carvalho. Co-advisor: Selan Rodrigues dos Santos.

1. Computing - Thesis. 2. Massive data rendering - Thesis. 3. GPU processing - Thesis. 4. Interaction - Thesis. 5. Acceleration structures - Thesis. I. Carvalho, Bruno Motta de. II. Santos, Selan Rodrigues dos. III. Title.

Real-time Data Processing and Multimodal Interactive Visualization Using GPU and Data Structures for Geology and Geophysics

Waldson Patrício do Nascimento Leandro

Doctoral thesis approved on August 30, 2019 by the examining committee composed of the following members:

Prof. Dr. Bruno Motta de Carvalho (orientador) . . . DIMAp/UFRN

Prof. Dr. Selan Rodrigues dos Santos (coorientador) . . . DIMAp/UFRN

Prof. Dr. Marcio Eduardo Kreutz . . . DIMAp/UFRN

Prof. Dr. Joaquim Bento Cavalcante Neto . . . UFC


To my advisor and co-advisor, professors Bruno Carvalho and Selan Rodrigues, I am grateful for the guidance and the teachings.

To Raquel Assunção, for the support, companionship, and understanding throughout this journey.

To my family, for believing in me and encouraging me. Without them, I would not have come this far.

With the advent of new sensors, the amount of data being analyzed has increased faster than the ability to process it in real time. If we consider joining data from different sensors to display simultaneously, the task becomes complicated to perform using conventional visualization algorithms and techniques. In this work, we explore methods to process data in real time and to visualize massive multivariate data within interactive time, using Geosciences as background. The use of several devices and methods is part of the daily activities of this niche of knowledge, and both faster processing and interactive visualization bring new analysis and interpretation possibilities for these massive datasets.

Keywords: Massive Data Rendering, GPU Processing, Interaction, Acceleration Structures.

With the advent of new sensors, the amount of data to be analyzed has increased faster than the capacity to process it in real time. And when we take into consideration the joining of data from different sensors into a single visualization, it becomes a complicated task to perform using conventional visualization algorithms and techniques. In this work, we explore methods to process data in real time and visualize massive multivariate data interactively, using Geosciences as background. The use of several devices and methods is part of the daily activities of this niche of knowledge, and both fast processing and interactive visualization bring new possibilities for the analysis and interpretation of these massive datasets.

Keywords: Massive Data Rendering, GPU Processing, Interaction, Acceleration Structures.

Summary

List of Figures
List of Tables
List of Algorithms

1 Introduction

2 Background
2.1 Geophysical Data Acquisition and Processing
2.1.1 Ground Penetrating Radar – GPR
2.1.2 Light Detection And Ranging – LIDAR
2.1.3 Electrical Resistivity Tomography – ERT
2.1.4 Geophones
2.2 General Programming GPU & Parallelization
2.3 Multivariate Data Fusion
2.4 Visualization Techniques for Massive Datasets
2.4.1 Octrees
2.4.2 KD-trees
2.5 Challenges
2.5.1 Analysis of Multimodal Spatial Data
2.5.2 Big Data
2.6 Motivation
2.7 Goals
2.8 Hypotheses
2.9 Methodology

3 Parallel Source Scanning Algorithm - PSSA
3.1 Introduction
3.3.1 SSA
3.3.2 Parallelization of SSA
3.4 Numerical Experiments
3.4.1 Test Cases
3.4.2 The test platforms
3.4.3 Performance results
3.4.4 Client-Server pSSA for Real-Time Processing
3.4.5 Client-Server Architecture
3.5 Discussion
3.6 Conclusion

4 Massive Multivariate Data Visualization
4.1 Introduction
4.2 Background
4.3 Methodology
4.3.1 Traditional Visualization Pipeline
4.3.2 Multivariate Visualization Pipelines
4.3.3 Big Data Interactive Multivariate Visualization Pipeline
4.3.4 Testbed Implementation
4.4 Results
4.5 Discussion
4.6 Conclusion

5 Conclusion

References

List of Figures

2.1 GPR Visualization
2.2 Point cloud visualization
2.3 ERT sample visualization
2.4 Shallow seismic data visualization
2.5 Visual representation of an octree
2.6 Visual representation of a tridimensional kd-tree
3.1 Output of the SSA location process is a coherence volume for each time step (∆t). The maximum coherence is attained at the hypocenter and the origin time of the event.
3.2 Pipeline of SSA brightness volume generation
3.3 Station log file format
3.4 Brightness voxel file format
3.5 Station location on the surface (black triangles) and synthetic events located between 200 and 300 m depth shown as black circles
3.6 (a) Noise-free synthetic seismograms for the events at 500 m depth at the center of the array shown in Figure 3.5. (b) Noise-added version of (a)
3.7 Brightness map for the event positioned at x=0.0000 km, y=0.0000 km, depth=0.5000 km and t=60.000 s (central event from Figure 3.5). This event was located by the pSSA algorithm at position x=0.0062 km, y=-0.0062 km, depth=0.5062 km and t=60.015 s. These coordinates showed the highest brightness (coherence) value for the event.
3.8 Bar graph illustrating the results from the test cases for SSA, OpenMP and pSSA, for platforms (a) and (b)
3.9 Pipeline of SSA brightness volume generation
3.10 Pipeline of SSA brightness volume generation
4.1 Data Visualization Example - Tokyo Metro
4.2 Data Visualization Example - Dow Jones
4.3 Scientific Visualization Example - Segmented Hand
4.6 Multivariate extended visualization model
4.7 Multivariate pipelines with modifications at different steps
4.8 Big Data Interactive Visualization Pipeline
4.9 View Frustum
4.10 Test case #1
4.11 Test case #2
4.12 Test case #3
4.13 Test case #4
4.14 Test case #5 and #6
4.15 Test case #7 and #8
4.16 Test case #9
4.17 Test case #10
4.18 Test case #11
4.19 Test case #12
4.20 Test case #13
4.21 Test case #14
4.22 Average FPS Chart

List of Tables

2.1 Geophysical Methods
2.2 Applications and their methods
3.1 Execution times (in seconds) for platform a. S/O and S/C represent the speedups achieved by the OpenMP and CUDA versions, respectively, when compared with the sequential version.
3.2 Execution times (in seconds) for platform b. S/O and S/C represent the speedups achieved by the OpenMP and CUDA versions, respectively, when compared with the sequential version.
4.1 Experiment's datasets
4.2 Experiment's test cases
4.3 Results of all test cases

List of Algorithms

2.1 buildOctree
2.2 updateVisibility
2.3 kdtree
3.1 Travel time between sensor and grid position
3.2 Amplitude Stacking
3.3 Brightness volumes and maximum brightness
3.4 Parallel Stacking

1 Introduction

Visualization is a computational method employed to graphically represent complex data. In scientific and engineering areas, visualization has become a tool of discovery and understanding. Through its use, it is possible to ensure the integrity of an analysis, delve into the data, and transmit information in a way that is easier to comprehend than when using only numerical data. Visualization is mainly used to simulate physical processes and phenomena that are very small, very big, very fast, very slow, or very dangerous [102], with comprehension as the ultimate goal. Ware [151] lists some advantages of using visualization:

• It provides understanding of a huge amount of data;
• It allows the perception of unseen emerging properties;
• It generally shows problems in the data quickly;
• It eases the comprehension of both big and small features in datasets.

Interaction is an essential component of visualization. It can influence the perception of the data set as well as the utility of the visualization itself. The effective integration of visualization and interaction is called Interactive Visualization. Zudilova, Adriaansen, and Liere [169] define four possibilities of interaction between users and an interactive visualization system:

• The user can adjust visualization parameters;
• The user can interact with visible objects and scenes;
• The user can remove information from the original dataset;
• The user can modify visualized data.


Visualization tools are widely used by researchers in different fields such as molecular modeling, medical imaging, cerebral structure and functioning, Mathematics, Geosciences, space exploration, Astrophysics, computational fluid dynamics, and finite element analysis [38]. Using visualization tools for data analysis with comprehension and insight as the goal is called Scientific Visualization [16, 12]. Scientific visualizations are focused on the representation of real objects with spatial dimensions, typically in three dimensions (3D) [144, 123].

A large number of applications have been using multifaceted data [63, 11, 47, 99, 71]. Multifaceted data are heterogeneous inputs used together. Kehrer and Hauser [82] classify multifaceted scientific data as:

• Spatiotemporal: data that represent spatial structures and/or dynamic processes;
• Multivariate: data with different attributes;
• Multimodal: data acquired from different modalities;
• Multirun: data gathered from different simulation runs with distinct parameters;
• Multimodel: data from the simulation of the interaction of different models.

In various areas, such as Medicine and Geosciences, the use of large amounts of data is frequent. The Visible Human Project [1], for example, is a multimodal dataset that represents average adults of both sexes and has approximately 55 gigabytes (GB) of data. Various problems may occur when using a dataset of this magnitude. One of them is allocation in main memory. Data that cannot be processed in a conventional way are called big data [101]. The set of techniques that solve the memory problems of big-data datasets are called out-of-core techniques. They work by partitioning large amounts of data that do not fit in main memory (in-core) onto secondary storage devices and controlling how these chunks of data should be processed. An overview of classical out-of-core techniques applied to visualization and computer graphics can be seen in Silva et al. [138].
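The chunked out-of-core pattern described above can be sketched as follows. This is a minimal, hypothetical illustration (the file layout, chunk size, and function names are ours, not from the thesis): a binary file of float32 samples is streamed in fixed-size chunks, so that only one chunk resides in main memory at a time while an accumulator is folded over the whole dataset.

```python
import os
import struct
import tempfile

CHUNK_SIZE = 4096  # bytes per in-core chunk (illustrative; must be a multiple of 4)

def process_out_of_core(path, reduce_fn, initial):
    """Stream a binary file of float32 samples chunk by chunk,
    folding each chunk into an accumulator so the whole dataset
    never has to fit in main memory."""
    acc = initial
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            # Unpack only the samples of the current chunk.
            samples = struct.unpack(f"<{len(chunk) // 4}f", chunk)
            acc = reduce_fn(acc, samples)
    return acc

# Usage: find the maximum sample of a small synthetic "dataset".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack("<6f", 1.0, 5.0, 2.0, 9.0, 3.0, 4.0))
    name = tmp.name
maximum = process_out_of_core(name, lambda a, s: max(a, max(s)), float("-inf"))
print(maximum)  # 9.0
os.remove(name)
```

The same loop structure carries over to real out-of-core renderers, where the accumulator would be replaced by loading the visible chunks into GPU memory.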

Interactive visualization for large datasets is still a relevant area of research [66, 8, 11, 52, 57, 135, 134, 76]. Despite the significant increase in dataset sizes, visualization techniques have not changed as fast as they should have, and large rendering clusters are still widely used [143, 32, 142]. As an alternative to this approach, there are visualization techniques that utilize Graphical Processing Units (GPUs) to reduce response time. These techniques brought interaction to visualizations where traditional CPU-based techniques were not able to [89, 154, 11, 87]. Beyer, Hadwiger, and Pfister [10] present an overview of the state of the art of GPU large-scale interactive visualization techniques.

Another important aspect when dealing with large amounts of data is how to process them efficiently. In order to extract useful information from raw data, we need to process it somehow. When dealing with large amounts of data, this processing is generally slow. By slow we mean that the time to process the data is higher than the time to acquire it. As in many other areas, the ability to have information as fast as possible is crucial in the process of decision making and could potentially save resources in Geophysics and Geology. For example, real-time interaction while a survey is being conducted in the field can reduce the duration of the survey and improve the quality and precision of the data collected, saving both time and money. The processing of large amounts of data is still relevant due to the fact that the amount of data generated is steadily increasing [162, 118].

This work is focused on techniques for the creation of accelerated large-scale multimodal visualization tools and big-data processing using GPU and data structures. We propose two methods to accelerate the processing and visualization of big data: the first uses GPUs to process seismic data in real time, and the second utilizes state-of-the-art techniques to render large point cloud data, reducing the time needed to render it and enabling multimodal interactive rendering.

The rest of this document is divided as follows: Chapter 2 focuses on definitions, literature review, problem definition, hypothesis, and goals; Chapter 3 presents PSSA, our GPU powered seismic data real time processor; Chapter 4 shows our multimodal massive data renderer; Chapter 5 is where we draw our general conclusions.

2 Background

This chapter presents an overview of some of the methods used to collect geophysical data, as well as General Programming GPU and visualization acceleration structures. At the end, we present the challenges we are tackling, our goals, and hypotheses.

2.1 Geophysical Data Acquisition and Processing

The use of multimodal data is very common in geophysical analysis. Depending on the application, one or more methods can be employed to figure out properties of the physical structure and a possible geological importance of what is being analyzed. Exploration geophysical methods can be divided into 8 categories: Seismic (S), Gravity (G), Magnetic (M), Electrical Resistivity (ER), Induced Polarization (IP), Self-potential (SP), Electromagnetic (EM), and Radar (R) [81].

Each one of the methods uses a physical property to operate. Table 2.1 presents the methods, the measured parameter, and the operative physical property. Each of the methods is used for different purposes. Table 2.2 displays examples of applications and the appropriate methods for each of them.

Each of the geophysical methods presented in Table 2.1 has specific devices for capturing data. Subsections 2.1.1, 2.1.2, 2.1.3, and 2.1.4 briefly describe some of these devices.

2.1.1 Ground Penetrating Radar – GPR

The basic principle of GPR is the transmission of a short electromagnetic pulse, with a specific frequency, directly through the ground, and the recording of reflected energy as a function of time, amplitude, and phase. The specified center frequency is controlled by a transmitter and a receiving antenna.

Method | Measured Parameter | Operative Physical Property
Seismic | Travel times of reflected and refracted seismic waves | Elastic moduli, which determine the propagation velocity of seismic waves
Gravity | Spatial variations in the magnitude of the gravitational field of the Earth | Density
Magnetic | Spatial variations in the magnitude of the geomagnetic field | Magnetic susceptibility and remanence
Electrical Resistivity | Earth resistance | Electrical conductivity
Induced Polarization | Polarization voltages or frequency-dependent ground resistance | Electrical capacitance
Self-potential | Electrical potentials | Electrical conductivity
Electromagnetic | Response to electromagnetic radiation | Electrical conductivity and inductance
Radar | Travel times of reflected radar pulses | Dielectric constant

Table 2.1: Geophysical Methods. From Kearey, Brooks, and Hill [81].

Application | Appropriate Method*
Exploration for fossil fuels (oil, gas, coal) | S, G, M, (EM)
Exploration for metalliferous mineral deposits | M, EM, E, SP, IP, R
Exploration for bulk mineral deposits (sand and gravel) | S, (E), (G)
Exploration for underground water supplies | E, S, (G), (Rd)
Engineering/construction site investigation | E, S, Rd, (G), (M)
Archaeological investigations | Rd, E, EM, M, (S)

Table 2.2: Applications and their appropriate methods. From Kearey, Brooks, and Hill [81].
* G, gravity; M, magnetic; S, seismic; E, electrical resistivity; SP, self-potential; IP, induced polarization; EM, electromagnetic; R, radiometric; Rd, ground-penetrating radar. Subsidiary methods in brackets.

The propagation of electromagnetic waves is determined by the properties of materials such as electrical permittivity, electrical conductivity, and magnetic permeability. If the electromagnetic wave velocity of the subsurface material is known, the returned energy can be converted into depth [34]. Figure 2.1 shows an example of a visualization of GPR data.

Figure 2.1: GPR Sample Visualization. From: Saarenketo and Scullion [128].

2.1.2 Light Detection And Ranging – LIDAR

This technology uses an optical-mechanical system consisting of servomotors, low-energy and low-emission LASERs (Light Amplification by Stimulated Emission of Radiation), and light sensors to measure properties of a distant object [147]. In general, measuring the distance between the equipment and the target uses two basic principles: time-of-flight (TOF) and triangulation. The acquisition process basically consists of emitting a laser pulse directed with the help of a scanning mirror commanded by the servomotor. Upon reaching the object, part of the energy is reflected back to the equipment and captured by the sensor. The product of a scan is a "point cloud", which in turn is the Surface Numerical Model (SNM) of the object of study. Figure 2.2 shows an example of a point cloud generated by a LIDAR.

Figure 2.2: Visualization of a point cloud generated by LIDAR. From: Schutz [133].

2.1.3 Electrical Resistivity Tomography – ERT

Electrical Resistivity Tomography is an electrical test method where electrical current is induced into the ground using two electrodes. The drop in electrical potential is read using two other electrodes. By changing the distance between the electrodes, different volumes of the subsurface are sensed and additional information about resistivity at different points is obtained [33]. Figure 2.3 exhibits an example of displaying ERT data.

Figure 2.3: Sample visualization of ERT data. From: Chambers et al. [22].

2.1.4 Geophones

Geophones are sensors that detect signals of elastic waves generated by seismic sources. These sources can be natural or artificial (for example, generated by vibrating devices, hammers, or explosions). The detected signals are sent to a seismograph, where they are recorded [140]. Geophones are commonly used to capture seismic data from subsurfaces near the surface (also called shallow subsurfaces). Figure 2.4 shows an example of subsurface data collected with geophones.

Figure 2.4: Shallow seismic data visualization.

2.2 General Programming GPU & Parallelization

Graphical Processing Units (GPUs) are used primarily to process graphical data and speed up render time. They are composed of a large number of small processors that are not individually powerful, but can run computations simultaneously. They are very efficient at single instruction, multiple data (SIMD) [109] operations. These features are useful not only in graphical processing, but in a large set of different problems from different areas. The usage of GPUs to solve problems not related to graphics processing is called GPGPU (General Purpose Graphical Processing Unit). In the early days of GPUs, there was no easy way to do this, and researchers had to recast their problems as graphical ones. Nowadays it is easier to use GPUs to solve general-purpose problems: GPU vendors and the scientific community created libraries to ease the task, such as CUDA [110] and OpenCL [141]. A significant reduction in processing time is the expected result of using a GPU for a computational task. Many surveys of GPGPU architectures and applications can be found in the literature [114, 39, 18, 136, 139, 145, 148].
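The data-parallel pattern that GPUs exploit can be illustrated on the CPU. The sketch below is not CUDA or OpenCL code, only an analogy (all names and numbers are ours): the same "kernel" is applied independently to every chunk of a seismic-like trace, and because the chunks share no data dependencies, they could in principle run on many small processors at once.

```python
from concurrent.futures import ThreadPoolExecutor

def gain_kernel(chunk, factor=2.0):
    """The same simple operation applied to every sample of a chunk;
    the per-worker analogue of what each GPU thread block would do."""
    return [s * factor for s in chunk]

def parallel_map(data, kernel, n_workers=4, chunk_size=3):
    # Split the trace into independent chunks (no data dependencies),
    # which is exactly what makes the problem SIMD-friendly.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(kernel, chunks)
    # Reassemble the processed chunks in their original order.
    return [s for chunk in results for s in chunk]

trace = [0.5, -1.0, 2.0, 0.0, 3.5, -2.5]
print(parallel_map(trace, gain_kernel))  # [1.0, -2.0, 4.0, 0.0, 7.0, -5.0]
```

On a real GPU, each sample (not each chunk) would typically map to one thread, and the speedup comes from thousands of such threads executing the kernel in lockstep.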

2.3 Multivariate Data Fusion

Data fusion, as a general and popular multi-discipline approach, combines data from multiple sources to improve the potential value and interpretability of the source data, and to produce a high-quality visual representation of the data [165]. As a consequence of its use, there is an improvement in accuracy and precision, and a reduction in uncertainty.

Besides these advantages, data fusion is a challenging task. There are many problems that can occur when trying to fuse data. Khaleghi et al. [83] enumerate some of these challenges:

• Data imperfection;
• Outliers and spurious data;
• Conflicting data;
• Data modality;
• Data correlation;
• Data alignment/registration;
• Data association;
• Static vs. dynamic phenomena;
• Data dimensionality.

Another problem is the visualization of these merged data. In fact, it is generally believed that no single visualization technique can completely describe all important aspects of a dataset [158].

The data fusion process can be categorized into three main levels: measurement level, feature level, and decision level. The measurement level is the actual combination of all sensors' data; the feature level combines features of the sensors to create new features that can be used as more precise descriptors of the information; the decision level uses the aforementioned levels' outputs to answer questions about the analyzed data, generally with a confidence score.
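The three levels above can be sketched with a toy example, under the simplifying assumption of two scalar sensors measuring the same quantity with known noise variances (all numbers, thresholds, and function names are illustrative, not from the fusion literature cited here):

```python
def measurement_fusion(a, b, var_a, var_b):
    """Measurement level: inverse-variance weighted average of two
    raw readings of the same quantity."""
    wa, wb = 1.0 / var_a, 1.0 / var_b
    return (wa * a + wb * b) / (wa + wb)

def feature_fusion(readings_a, readings_b):
    """Feature level: derive one feature per sensor (here, the mean)
    and combine the features into a new descriptor."""
    mean_a = sum(readings_a) / len(readings_a)
    mean_b = sum(readings_b) / len(readings_b)
    return {"mean_a": mean_a, "mean_b": mean_b, "gap": abs(mean_a - mean_b)}

def decision_fusion(feature, threshold=1.0):
    """Decision level: answer a question about the data ("do the
    sensors agree?") with an attached confidence score."""
    agree = feature["gap"] < threshold
    confidence = 1.0 / (1.0 + feature["gap"])
    return agree, confidence

fused = measurement_fusion(10.0, 12.0, var_a=1.0, var_b=3.0)
feat = feature_fusion([10.0, 10.0], [12.0, 12.0])
decision, score = decision_fusion(feat)
print(round(fused, 2))  # 10.5 — the less noisy sensor is weighted more
```

The point of the sketch is only the layering: raw values are combined first, derived features next, and the final answer carries a confidence score, exactly as in the three-level categorization above.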


In the visualization of large datasets, data fusion can increase the insightfulness of the analysis and possibly reveal information that was not visible using only one type of data.

2.4 Visualization Techniques for Massive Datasets

We can divide tridimensional scientific visualization into three types: volumetric, surface, and point cloud. Volumetric rendering is the direct visualization of samples collected in three dimensions [155], controlled through transfer functions. Transfer functions map scalar values to custom colors and opacities, making it possible to differentiate components in the analyzed data. Surface rendering works similarly to volume rendering, but assumes that each scalar value belongs to a structure and displays that structure as a single object, usually opaque. This type of rendering is useful when we want to analyze structures and delimitations of an object of interest in volumetric data. Point cloud rendering consists of displaying unstructured discrete values with an associated intensity [164]. This type of visualization has become quite common with the advent of low-cost sensors such as the Kinect [167].
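A transfer function of the kind described above can be sketched as a piecewise-linear lookup that maps a normalized scalar to an RGBA tuple. The control points below are invented for illustration (transparent blue for low values, opaque red for high values); real renderers expose them as a user-editable curve.

```python
def lerp(a, b, t):
    """Component-wise linear interpolation between two color tuples."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))

# Hypothetical control points: (scalar in [0, 1], (R, G, B, A)).
CONTROL_POINTS = [
    (0.0, (0.0, 0.0, 1.0, 0.0)),   # low values: fully transparent blue
    (0.5, (0.0, 1.0, 0.0, 0.3)),   # mid values: faint green
    (1.0, (1.0, 0.0, 0.0, 1.0)),   # high values: opaque red
]

def transfer_function(scalar):
    """Map a normalized scalar value to an RGBA color by linear
    interpolation between the two surrounding control points."""
    scalar = min(max(scalar, 0.0), 1.0)
    for (s0, c0), (s1, c1) in zip(CONTROL_POINTS, CONTROL_POINTS[1:]):
        if s0 <= scalar <= s1:
            t = (scalar - s0) / (s1 - s0)
            return lerp(c0, c1, t)

print(transfer_function(0.25))  # (0.0, 0.5, 0.5, 0.15)
```

Because opacity is part of the output, a sample can be made invisible (A = 0), which is how transfer functions isolate components of interest inside a volume.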

For each of the types, there are several visualization techniques that allow displaying large amounts of data [161, 26, 56, 5, 122, 121, 59, 130, 166]. A large percentage of these solutions use out-of-core techniques – which use secondary memory to page the data to be displayed – and an acceleration structure. These acceleration structures generally work with spatial subdivision to reduce the search complexity for the elements that must be rendered. Among the acceleration structures, the most used are octrees and kd-trees. Subsections 2.4.1 and 2.4.2 present them in more detail.

2.4.1 Octrees

An octree is a data structure that recursively subdivides a three-dimensional volume into eight disjoint cells, called octants [104]. Recursion occurs until a cell contains fewer objects than a predefined threshold or until it reaches a maximum depth. Each cell is subdivided by three planes, usually through the center of the parent node. There is also the possibility of not allocating some of the children, which allows the efficient storage of sparse data. Octrees are widely used in computer graphics to optimize collision detection, nearest-neighbor search, frustum culling, and so on. This optimization occurs because, instead of checking each object in the scene, each octant is checked, which transforms the complexity of this search from linear (O(N)) to sublinear (O(log(N))).

Algorithm 2.1 presents the algorithm for the construction of an octree. Figure 2.5 shows a visual representation of an octree.

Figure 2.5: Visual representation of an octree. From: Wikipedia [157].

For visualizing massive data, an octree is used in out-of-core algorithms where each cell (octant) is loaded or unloaded from main memory according to its visibility to the user. The visibility of each octant is tested: if the cell is fully visible to the user, it is loaded into main memory and displayed; if the cell is partially visible to the user, each child node of this octant is tested separately; if the cell is not visible and is in main memory, it is removed from memory. Algorithm 2.2 presents this strategy for viewing massive data using an octree.

Algorithm 2.1: buildOctree
Input: Objects, boundingBox, level, maxLevel, nodeMaxQtty
Output: OctreeRootNode

1 ObjectsInside ← Vector()
2 for each o ∈ Objects do
3     if boundingBox.contains(o) then
4         ObjectsInside.push(o)
5 node ← OctreeNode()
6 node.setBB(boundingBox)
7 isLeaf ← ObjectsInside.size() < nodeMaxQtty || level = maxLevel
8 if isLeaf then
9     node.setChildren(ObjectsInside)
10 else
11     BbChildren ← subdivide(boundingBox)
12     for each bb ∈ BbChildren do
13         node.addChild(buildOctree(ObjectsInside, bb, level + 1, maxLevel, nodeMaxQtty))
14 return node

Algorithm 2.2: updateVisibility
Input: OctreeNode

1 for each o ∈ children(OctreeNode) do
2     state ← getState(o)
3     if state = "fully_visible" then
4         loadIntoMainMemoryAndDisplay(o)
5     else if state = "partially_visible" then
6         updateVisibility(o)
7     else if isInMainMemory(o) then
8         removeFromMainMemory(o)

2.4.2 KD-trees

Kd-trees [6] are binary trees used for spatial partitioning, organizing points in a k-dimensional space. Each non-leaf node is a k-dimensional point in space and divides the space into two parts: the left and the right child of the node. Each level of the tree uses one coordinate of the point position (e.g., the x coordinate on level 1, then the y coordinate on level 2, and x again on level 3 when k = 2). The dividing point is generally the median point. The child on the left side has a coordinate value less than the value of the node, and the child on the right has a value greater than or equal to the node value. Kd-trees are broadly used in solutions to problems that involve nearest-neighbor searches. Unlike octrees, each partition contains at least one element. Algorithm 2.3 shows the algorithm for the construction of a kd-tree. Figure 2.6 shows a visual representation of a tridimensional kd-tree.

Algorithm 2.3: kdtree
Input: Points, depth, k
Output: kdTreeRootNode

1 if Points is empty then
2     return null
3 coord ← depth mod k
4 value ← getMedianOfCoord(Points, coord)
5 node ← Node(value)
6 node.leftChild ← kdtree(pointsBelow(Points, value), depth + 1, k)
7 node.rightChild ← kdtree(pointsAboveOrEqual(Points \ {value}, value), depth + 1, k)
8 return node


Figure 2.6: Visual representation of a tridimensional kd-tree. The red, green, and blue lines represent the three first subdivisions of this tridimensional tree, respectively. From: Wikimedia Commons [28].
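A runnable Python sketch of the kd-tree construction in Algorithm 2.3, shown here for 2D points (the dictionary layout, helper names, and sample points are ours): the splitting coordinate cycles with depth, the median point becomes the node, and the remaining points go to the subtree matching their side of the split.

```python
def build_kdtree(points, depth=0, k=2):
    """Build a kd-tree: split on coordinate (depth mod k) at the
    median point; smaller coordinates go left, the rest go right."""
    if not points:
        return None
    coord = depth % k
    points = sorted(points, key=lambda p: p[coord])
    mid = len(points) // 2
    median = points[mid]
    return {
        "point": median,
        "left": build_kdtree(points[:mid], depth + 1, k),
        "right": build_kdtree(points[mid + 1:], depth + 1, k),
    }

def depth_of(node):
    """Height of the tree, to confirm median splits keep it balanced."""
    if node is None:
        return 0
    return 1 + max(depth_of(node["left"]), depth_of(node["right"]))

pts = [(7, 2), (5, 4), (9, 6), (4, 7), (8, 1), (2, 3)]
tree = build_kdtree(pts)
print(tree["point"])   # (7, 2) — the x-median becomes the root
print(depth_of(tree))  # 3 — balanced: 6 points fit in ceil(log2(7)) levels
```

Sorting at every node makes this O(n log² n); production implementations typically use a linear-time median selection instead, but the partitioning logic is the same.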

2.5 Challenges

This section presents challenges related to the processing, analysis, and visualization of large multimodal datasets.

2.5.1 Analysis of Multimodal Spatial Data

Although multivariate data analysis is common [67, 108, 44], the same does not apply to multimodal spatial data such as those presented in the previous subsections. Because the number of possibilities to present data visually is limited, simultaneously displaying multiple pieces of information may be inefficient and confusing. Bertin [7] enumerates seven visual variables that can be used to convey information with apparent meaning. Two of them are planar – x and y – and the other five are called "retinal" – color, size, shape, orientation, and color tint. As spatial data intrinsically use the planar variables and some kind of intensity, these variables become overloaded when displaying more than one piece of information simultaneously.

When it comes to three-dimensional visualizations, the situation is a bit more problematic, since depth is mapped onto the planar variables. Three-dimensional images are rendered using only x and y; perspective is used to give the impression of depth, but it is just that, an impression. We have not found any studies that analyze three-dimensional multimodal spatial data visualization, only two-dimensional spatial multivariate data, mainly using different maps and graphs [4, 53, 13].

2.5.2 Big Data

As can be seen in Table 2.2, several methods are used for each application. With the advances in the sensors of the devices currently available, the amount of data to be analyzed is very large. Cota et al. [31] present the five "V" challenges of big-data visualization:

• Volume: refers to the huge amount of data generated each second;
• Variety: refers to the different types of generated data;
• Velocity: refers to the speed at which data are generated and managed;
• Veracity: refers to the quality and reliability of collected data;
• Value: refers to the knowledge generated from the analyzed data.

Since our goal is to generate knowledge from the visualization of massive multimodal data, the use of massive data visualization and analysis techniques compounds the previous problem and makes the development of solutions for both more challenging.

2.6 Motivation

With the advent of new technologies to capture even more accurate data, the need for new tools for creating value from big data is increasingly evident. There is also the need for new technologies and techniques to process, integrate, analyze, visualize, and consume the increasing number of datasets [8]. The creation of these technologies and techniques for big data is the next frontier for innovation, competition, and productivity [75].

Issues are still open in the processing and interactive rendering of big data, and in the fusion of multimodal data. Beyer, Hadwiger, and Pfister [10] argue that traditional rendering on GPUs has several shortcomings, especially in scalability. Lahat, Adaly, and Jutten [88] present several challenges still open in various aspects of multimodal data fusion.

There are open questions in the interactive visualization area as well. Childs [25] lists a number of research questions that visualization tools must now address. Among them, Childs asks how big data will change the interaction in these tools.

The problem we want to tackle stems from all of these open questions and can be summarized as: How to process and visualize multimodal large datasets efficiently?

It is necessary to make clear that this problem can have a large number of different solutions. We focused our efforts on Geophysics and Geology datasets. From our research, we were able to find only two related works that try to solve similar problems. In “3D data visualization: The advantages of volume graphics and big data to support geologic interpretation”, Byers and Woo [17] were able to display large multivariate data by converting them to voxels and displaying them in a sparse grid. They convert all different types of data (volumetric, surface, and point cloud) to voxels, which limits the visualization to this kind of rendering. They also reduce the size of the dataset by subsampling it, which causes loss of information. In “Potree: Rendering Large Point Clouds in Web Browsers”, Schuetz [133] presents a web-browser-based point cloud renderer capable of handling billions of points. An octree structure is used to store the points, and his work is considered state-of-the-art. The limitation of this method is that it is capable of rendering point clouds only.

2.7 Goals

From the investigation of both interactive multimodal big-data rendering and big-data processing, the goal of this work is to provide methods to process and visualize scalable big data interactively, focusing on multimodal data integration using acceleration structures and out-of-core algorithms, with Geophysics and Geology as a case study.

The scope of this work is limited to investigating data scalability, multimodal data visualization, efficient rendering, efficient processing, and interaction problems focused on Geophysics and Geology applications.


2.8 Hypotheses

H1. It is possible to improve the performance of currently available solutions for processing big data using GPUs and data structures;

H2. It is possible to provide interactive rates for the visualization of simultaneous multimodal large datasets using GPUs and data structures.

2.9 Methodology

In order to investigate our hypotheses, we worked on two different lines: one focused on processing, discussed in Chapter 3, and another focused on multimodal visualization, discussed in Chapter 4.

In Chapter 3 we present the “Parallel Source Scanning Algorithm” (PSSA), an implementation of the Source Scanning Algorithm (SSA) [78] using two different parallel approaches (CPU- and GPU-based). A server was also developed to provide real-time processing of the results.

In Chapter 4 we present a renderer capable of handling multimodal massive data. It works by modifying the traditional visualization pipeline to satisfy the needs that this kind of visualization imposes. As a proof of concept, we implemented a state-of-the-art point cloud partitioner that increases the framerate up to 250 times, which allows other datasets to be rendered along with it.

Despite their different natures, both approaches share some common properties:

• Lossless. There is no data loss of any kind in the process. This feature is important because reducing information can lead to imprecise results and can potentially hide characteristics buried in the datasets.

• Run on mid-end devices. It is common practice to process and visualize large datasets on high-end computers such as dedicated servers, clusters, and even specialized cloud services. Due to the intrinsic nature of the work, it is sometimes necessary to analyze this data in remote areas with no internet connection, where these high-end options are not feasible. The projects presented here are capable of running on any regular mid-end computer with an off-the-shelf graphics card. This means that a regular laptop can be used as the host for the applications, which enables the analysis to be made in loco.


• Quantitative results. Both are concerned with performing their respective tasks more efficiently. While the PSSA seeks to reduce the time to process information, the multimodal renderer focuses on increasing the number of frames per second.


Parallel Source Scanning Algorithm - PSSA

3.1 Introduction

Earthquake location is a crucial parameter to be estimated in seismology and hence a very active research topic, because this parameter is critical to active fault characterization. The hypocentral location problem is generally posed as finding a set of parameters (hypocentral coordinates and origin time) from a set of phase arrival times. It is interesting to note that this posing of the problem assumes that: 1) the arrivals are visible across a sufficient number of stations in the recording network; 2) the arrivals have to be correctly identified before the actual location of the seismic event [117, 50]. Traditionally, single-event hypocentral location algorithms rely on Geiger’s method, in which an initial guess for the hypocentre is iteratively perturbed towards a minimum of a fitting functional between modeled and observed arrival times [91, 84, 85, 97].

In the last two decades, an increasing number of dense microseismic monitoring networks (ranging from local to global scales) with continuous digital records have become the norm for seismological and industrial applications. This has led to a vast amount of data being processed, thus making visual inspection with consequent phase picking practically impossible. Particularly in industrial applications, it is common that the desired signal is buried under noise, which makes correct phase picking a challenging task [125].

Single-event hypocentral inversion procedures that use arrival-time picking make little use of, or simply ignore, the wealth of information contained in the waveforms recorded at the different stations [62]. Therefore, the use of methods based on waveform stacking has become increasingly popular among the microseismic monitoring community because they require neither manual nor automatic phase identification [62, 21].


Such methods are classified into two groups, according to [21]. The first group encompasses those methods using time-reversed seismograms as virtual sources, in which the wave field is backpropagated from each virtual source to the original source, corresponding to the location in space having the maximum energy [103, 50, 54].

The second group contains those methods using the notions of delay and stacking of recorded seismic waveforms [43, 55, 40, 61, 163]; they are all somewhat variations of the pioneering work of [79], named the Source Scanning Algorithm (SSA). In the SSA, the source location is estimated using a brightness function obtained from stacking the absolute amplitudes of normalized seismograms recorded at several stations. After this work, many modified approaches have been proposed (see [21] for a comprehensive list of references).

One of the advantages of the SSA is that the approach is quite easy to implement, but it has the drawback of being computationally expensive. For instance, surface microseismic monitoring used in the oil industry relies on continuous records on thousands of channels, typically sampled at 2 ms, for hours or even months, yielding several Terabytes (TBs) of data [42, 137, 95].

When using SSA-like methods, a typical scan would have 400 voxels in the x and y dimensions, 200 voxels in the z dimension, and 10^7 time steps per day [40]. When dealing with such a cumbersome amount of data, serial processing can be quite time-consuming and, in many cases, realistically unfeasible. Hence, it is of paramount importance to develop alternative strategies to handle such data processing. The reasoning applies to post-collection data processing and becomes even more critical if real-time processing comes into play, and also when using time-reversal approaches to locate the microseismic events [160, 159].

Our approach is then to parallelize the SSA using Graphics Processing Units (GPUs). GPUs are used primarily to process graphical data and speed up render time. They are composed of a large number of small processors that are individually not very powerful but can run computations simultaneously. They are very efficient at single-instruction multiple-data (SIMD) [109] operations. These features are useful not only in graphical processing, but in a large set of different problems from different areas. The usage of GPUs to solve problems not related to graphics processing is called GPGPU (General-Purpose computing on Graphics Processing Units). In the early days of GPUs, this task was complicated because researchers had to reformulate their problems as graphical ones. Nowadays it is easier to use GPUs to solve general-purpose problems: GPU vendors and the scientific community created libraries to ease the task, such as CUDA [110] and OpenCL [141]. These small graphical processors are called CUDA cores when using CUDA, and compute units when using OpenCL. A significant reduction in processing time is the expected result of using a GPU to do a computational task. A survey of GPGPU architectures and applications can be found in [114].

Graphics processing unit (GPU) acceleration has recently been widely applied to various computing-intensive geophysical tasks [126, 160, 69, 127, 93, 94, 2]. Using a GPU, many computationally intensive algorithms and applications in geophysics can be better handled because they run considerably faster. We present here a GPU parallel processing version of the SSA, which we named pSSA. In the following sections, we review the traditional SSA method, followed by our parallel implementation. Further on, we present our results, draw conclusions, and expose our future developments.

3.2 The Source Scanning Algorithm

The SSA approach is very straightforward, as it discretizes the time and region of interest in a 4D vector and, for each grid point and origin time, a brightness value (seismogram stacking) is calculated. For a given set of velocity model parameters, when the origin time and hypocenter of the seismic event are correct, a maximum value for brightness (or coherence) is achieved. The result is displayed as brightness volumes for each origin time. We use the definition of Kao & Shan (2004) for brightness:

\[
br(\vec{x}, t^*) = \frac{1}{S} \sum_{s=1}^{S} \left| u_n(t^* + t_{\vec{x}s}) \right|, \tag{3.1}
\]

where u_n is the normalized seismogram recorded at the s-th sensor and t_{x,s} is the predicted travel time from position x to the s-th sensor. If all the largest amplitudes originate from a seismic source at position x and time t*, then br(x, t*) reaches a maximum. We therefore systematically search the entire set of trial positions x and origin times t* to map the brightness distribution in space and time. For a 2D seismic location problem with sensors deployed on the surface, Figure 3.1 illustrates the output of a SSA location. For each time step (∆t), a coherence matrix is calculated (Figure 3.1) and, at the correct location and time step t0 (Figure 3.1, middle panel), the waveform stacking reaches its maximum.



Figure 3.1: Output of the SSA location process is a coherence volume for each time step (∆t). The maximum coherence is attained at the hypocenter and the origin time of the event.
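As a concrete sketch of Eq. 3.1, the function below evaluates the brightness of a single trial position and origin time. The container layout, the function name, and the assumption that travel times are already converted to sample offsets are ours, not the thesis implementation’s.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Eq. 3.1: br(x, t*) = (1/S) * sum over stations of |u_n(t* + t_{x,s})|.
// `u[s]` is the normalized seismogram of station s; `travel_time[s]` is
// the predicted travel time from the trial position to station s, in
// samples; `t_star` is the trial origin time, also in samples.
double brightness(const std::vector<std::vector<double>>& u,
                  const std::vector<std::size_t>& travel_time,
                  std::size_t t_star) {
    const std::size_t S = u.size();
    double sum = 0.0;
    for (std::size_t s = 0; s < S; ++s) {
        sum += std::fabs(u[s][t_star + travel_time[s]]);
    }
    return sum / static_cast<double>(S);  // average of stacked amplitudes
}
```

When the trial position and origin time match the true source, each delayed sample lands on the arrival of its station and the average is maximized.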

3.3 The Parallel Source Scanning Algorithm

3.3.1 SSA

In practice, we can divide the calculation of the brightness volumes into 3 steps: Setup, Stacking, and Thresholding. Figure 3.2 shows the complete pipeline for the generation of brightness volumes such as the one shown in Figure 3.1, and each step is described in this section. In our implementation, we assume that the data has already been band-pass filtered in the frequency range of interest, has been normalized by the maximum value of each recorded seismogram, and that the absolute value of each sample has been taken.


[Pipeline diagram with stages Filtering, Setup, Stacking, and Thresholding; inputs: stations’ logs, crustal model, grid limits, and brightness threshold; output: brightness maps.]

Figure 3.2: Pipeline of SSA brightness volume generation.

Setup

The setup step is responsible for receiving the input data parameters into main memory, generating the searching grid for the area of interest, and calculating the travel time of the seismic waves between the grid points and the receivers. The search volume is discretized in the x, y, and z directions with a resolution given by nx, ny, and nz, respectively. In this step we also remove the mean and the linear trend of the data, and take the absolute value of each recorded amplitude. Algorithm 3.1 shows how to generate a grid with the travel-time values for each position of the area of interest for searching the source. More sophisticated model parametrizations could be used as input. Algorithm 3.1 receives the vector of positions in the grid G, the vector of receivers S, and the vector C with the thickness and velocity of each layer. In our implementation we used a thin layer-cake parametrization based on the work of [129], in which travel times between the hypocenter and the sensors are computed using Fermat’s Principle. However, another forward modelling could be implemented and used as input for our parallel computation. Its output is a travel-time vector K with size nx × ny × nz × nS, because a vector with size nG = nx × ny × nz is generated for each station in S. For example, if one has to search for the source in a 20 × 20 × 10 grid using nS = 100 stations, K has 20 × 20 × 10 (= nG) × 100 (= nS) = 400,000 lines.

Algorithm 3.1: Travel time between sensor and grid position
Input: G, S, C
Output: K
1 for each s ∈ S do
2     for each p ∈ G do
3         K_sp ← calculate_travel_time(p, position(s), C)
4 return K
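The sketch below follows the shape of Algorithm 3.1 but, for brevity, replaces the layer-cake/Fermat’s-Principle travel-time model of [129] with straight rays in a homogeneous medium of velocity v. The Point struct and the station-major layout of K are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Point { double x, y, z; };

// Simplified Algorithm 3.1: for each station s and grid position p, store
// the travel time in K[s * grid.size() + p]. Here the forward model is a
// straight ray at constant velocity v (distance / v), standing in for the
// layer-cake model used in the thesis.
std::vector<double> travel_time_grid(const std::vector<Point>& grid,
                                     const std::vector<Point>& stations,
                                     double v) {
    std::vector<double> K;
    K.reserve(grid.size() * stations.size());
    for (const Point& s : stations) {
        for (const Point& p : grid) {
            const double dx = p.x - s.x, dy = p.y - s.y, dz = p.z - s.z;
            K.push_back(std::sqrt(dx * dx + dy * dy + dz * dz) / v);
        }
    }
    return K;
}
```

Any other forward model can be swapped in by replacing the body of the inner loop, which mirrors the `calculate_travel_time` call of the pseudocode.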

The data of the receiving sensors have the following information: name (or code) and sensor position in the first line, and time (or timestamp) and the corresponding recorded amplitude, which appears in two identical columns, in the subsequent lines. The duplicated amplitude columns allow us to perform, in a single loop, the stacking of both the ‘p_wave_amplitude’ and the ‘s_wave_amplitude’ data points. Figure 3.3 shows the format of the sensor files. The sampling interval of the recorded seismograms containing n samples is given by m (in milliseconds). The records are read from the files and placed in main memory.

station_name,x,y,z
0,p_wave_amplitude,s_wave_amplitude
m,p_wave_amplitude1,s_wave_amplitude1
2 × m,p_wave_amplitude2,s_wave_amplitude2
...
n × m,p_wave_amplituden,s_wave_amplituden

Figure 3.3: Station log file format.

The brightness threshold is not used in this step and will be described in detail in the Thresholding step.

Stacking

After calculating the travel time for each point of the grid, it is necessary to stack the amplitudes to calculate the brightness of each position p in G for each trial origin time (ot), and for every sample of the entire data length n. Algorithm 3.2 illustrates the procedure of stacking amplitudes. It receives as input G, the vector K of travel times between all grid positions and all stations, the recorded data length n, the initial time t0, and the increment of time m. The output is a vector Σ of size nΣ = nG × (n/m) with the stacking sum for each position at all instants. For example, for a 1-hour hydrofracking monitoring with a sampling rate of 2 ms, Σ would be a vector with 20 × 20 × 10 (= nG) × 3,600/0.002 (= n/m) = 7.2 × 10^9 elements containing, for all the possible origin times (ot), a brightness volume.

In Algorithm 3.2, log(s, ot + travel_time) means the data amplitude recorded at a given station s at the instant ot + travel_time, and not the logarithm of (s, ot + travel_time). Stacking is, by far, the most time-consuming step of the entire process.

Algorithm 3.2: Amplitude Stacking
Input: G, K, n, t0, m
Output: Σ
1 ot ← t0
2 while ot < n do
3     for each p ∈ G do
4         stack ← 0
5         for each s ∈ S do
6             travel_time ← K_sp
7             stack ← stack + log(s, ot + travel_time)
8         Σ_ot,p ← stack
9     ot ← ot + m
10 return Σ
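A sequential C++ sketch of this stacking loop follows. The layouts of amp, K, and Sigma are illustrative (time-major Σ, station-major K), and travel times are assumed to be expressed in samples.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential stacking (Algorithm 3.2). amp[s][i] is the pre-processed
// amplitude of station s at sample i; K[s * nG + p] is the travel time
// (in samples) from grid position p to station s. The result is indexed
// as Sigma[ot_index * nG + p].
std::vector<double> stack_amplitudes(
        const std::vector<std::vector<double>>& amp,
        const std::vector<std::size_t>& K,
        std::size_t nG, std::size_t n_ots) {
    const std::size_t nS = amp.size();
    std::vector<double> Sigma(n_ots * nG, 0.0);
    for (std::size_t ot = 0; ot < n_ots; ++ot)    // trial origin times
        for (std::size_t p = 0; p < nG; ++p) {    // grid positions
            double stack = 0.0;
            for (std::size_t s = 0; s < nS; ++s)  // stations
                stack += amp[s][ot + K[s * nG + p]];
            Sigma[ot * nG + p] = stack;
        }
    return Sigma;
}
```

The triple loop makes the cost of the step explicit: every (origin time, grid position) pair requires one delayed read per station, which is what the GPU version of Subsection 3.3.2 parallelizes.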

Thresholding

The last step of the process computes the brightness for all positions of the grid at all instants and extracts some attributes of the stacked volume for each time step. Algorithm 3.3 shows the calculation of the brightness volumes. During the process, the maximum brightness across each volume is also calculated for each time step. It receives as input Σ, G, nS, n, t0, and m. The result is a vector B of size nΣ composed of the brightness volumes, and a vector BMax with the values of the maximum brightness for each time step. The maximum brightness values will be used later for the computation of values that we associate with the quality of the result.


Algorithm 3.3: Brightness volumes and maximum brightness
Input: Σ, G, nS, n, t0, m
Output: B, BMax
1 ot ← t0
2 while ot < n do
3     max_brightness ← −∞
4     for each p ∈ G do
5         brightness ← Σ_ot,p / nS
6         B_ot,p ← brightness
7         if brightness > max_brightness then
8             max_brightness ← brightness
9     BMax_ot ← max_brightness
10    ot ← ot + m
11 return (B, BMax)

After computing the brightness volumes, we extract an attribute associated with the focusing of the energy in the brightness volume at each instant m. This attribute, which we name η, is the number of voxels at each instant above a brightness threshold c. For example, we can define c as 80% of the greatest brightness value of each time step m.
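A sketch of the η computation for one time step, assuming (as in the example above) that the threshold c is a fraction of the step’s maximum brightness; the function name and signature are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// eta for one time step: count the voxels whose brightness exceeds
// c = fraction * (maximum brightness of this step), e.g. fraction = 0.8.
std::size_t eta(const std::vector<double>& volume, double fraction) {
    const double max_b = *std::max_element(volume.begin(), volume.end());
    const double c = fraction * max_b;  // per-step brightness threshold
    return std::count_if(volume.begin(), volume.end(),
                         [c](double b) { return b > c; });
}
```

A small η indicates tightly focused energy (a likely event), while a large η indicates diffuse brightness.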

At the end of processing, brightness voxel files are generated for each time step. They contain information about η and the brightness of each position of the grid. Figure 3.4 shows the format of the brightness voxel file. We have decided to produce a very straightforward attribute volume for each time step. One can, instead of η, use any other attribute (averaging, or the median in the time domain, for instance).

η
x0, y0, z0, brightness0
x1, y1, z1, brightness1
...
xn, yn, zn, brightnessn

Figure 3.4: Brightness voxel file format.

3.3.2 Parallelization of SSA

As shown in all algorithms in Subsection 3.3.1, the brightness voxel calculation using SSA is a computationally costly task. The stacking is the most intense stage because it needs to sum the amplitudes for all stations and for every time step. Typically, hundreds (or thousands) of sensors record data at 5 or 2 ms (sometimes at an even greater sampling rate) for a few hours.

To mitigate this problem, we created a parallel version of SSA, which we name pSSA (Parallel Source Scanning Algorithm). The pSSA replaces the stacking step with a version that calculates the sum of several time steps simultaneously using a graphics card. Algorithm 3.4 illustrates the modification in the stacking step to perform the parallel calculation. It receives the same parameters as the sequential version: G, K, n, the initial time step t0, and m. The output is also equal to the original one: Σ with size nG × (n/m) containing each stacking value for each position at all time steps. The number of time steps of all voxels processed at a time is limited by the memory of the graphics card. This limitation can be dealt with by sending the data in blocks to the GPU in two ways: grouping the data temporally or spatially. Grouping (or blocking) the data temporally is the appropriate way if we want to process data in real time, sending short bursts of data to be processed as soon as they are collected, while spatial grouping (or blocking) can be done in offline processing.

Algorithm 3.4: Parallel Stacking
Input: G, K, n, t0, m
Output: Σ
1 ot_qtty ← n / m
2 simultaneous_ots ← compute_GPU_capacity(ot_qtty, G)
3 times ← ceil(ot_qtty / simultaneous_ots)
4 for i from 0 to times − 1 do
5     send_to_GPU(G, K, n, t0 + (i × simultaneous_ots), simultaneous_ots)
6     stack_on_gpu()
7     move_from_GPU_to_vector(Σ, t0 + (i × simultaneous_ots), simultaneous_ots)
8 return Σ
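The temporal blocking of Algorithm 3.4 can be sketched as below. Here the per-launch capacity is taken as a parameter rather than queried from the GPU, and the returned (offset, count) pairs stand in for the send/stack/copy-back cycle of lines 5–7; all names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Split n_ots trial origin times into chunks of at most simultaneous_ots,
// the number of time steps that fit in GPU memory at once. Each returned
// (offset, count) pair corresponds to one send/stack/copy-back iteration.
std::vector<std::pair<std::size_t, std::size_t>>
temporal_blocks(std::size_t n_ots, std::size_t simultaneous_ots) {
    std::vector<std::pair<std::size_t, std::size_t>> blocks;
    for (std::size_t start = 0; start < n_ots; start += simultaneous_ots) {
        const std::size_t count = std::min(simultaneous_ots, n_ots - start);
        blocks.emplace_back(start, count);  // one kernel launch per block
    }
    return blocks;
}
```

The same chunking logic applies to spatial blocking, with grid positions taking the place of origin times.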

In our implementation, we used CUDA to parallelize the computation of the SSA. CUDA is a general-purpose parallel computing platform and programming model that allows a programmer to define functions (kernels) in some language, such as C, that, when called, are executed in parallel by different CUDA threads. These CUDA threads are organized in blocks that are grouped in grids, and they may access data from several memory spaces during their execution. These memory spaces are organized as follows: each thread has its private local memory, and each thread block has a shared memory visible to all threads of the block, with the same lifetime as the block, while all threads have access to the same global memory. There are also the constant memory and the texture memory, which are used for specific purposes. The global, constant, and texture memory spaces are persistent across kernel launches by the same application. In our implementation, the data is stored in the global memory because of its size.

3.4 Numerical Experiments

In the following experiments, we compare the sequential and parallel implementations according only to their execution times. We do so because the results produced by the different versions are numerically equivalent.

3.4.1 Test Cases

To evaluate the performance of pSSA over sequential SSA, we ran three different tests (Cases 1, 2, and 3), with record lengths of 10, 60, and 120 minutes, sampled at 5 ms (the 10-minute case yields 120,000 samples per station), recorded with 100 sensors deployed on the surface in a star-like array, as shown in Figure 3.5. To simulate the seismic pulse, we used a Ricker pulse with a central frequency of 30 Hz. We simulated 5 synthetic events, located at the center of the model at 500 m depth. The SSA discretizes the search space, and the resolution of the grid is defined in 3 dimensions. For each dimension, the model parameters (minimum value, maximum value, and increment size) are defined. The grid spacing we used is 6.25 m, yielding a 65 × 65 × 64 grid volume (270,400 elements). In our SSA test cases, we do not expect the travel time to be greater than 1 s, and SSA needs to read records ahead of the current position; this time length can be modified by the user. In our case, the last 200 samples were not processed.
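The number of voxels along each dimension follows directly from the (minimum, maximum, increment) parameters above; a sketch, where the ±200 m range is a hypothetical example consistent with the 6.25 m spacing and 65-point axes:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Number of grid points along one axis discretized from `min` to `max`
// (inclusive) with step `increment`: floor((max - min) / increment) + 1.
std::size_t axis_points(double min, double max, double increment) {
    return static_cast<std::size_t>(std::floor((max - min) / increment)) + 1;
}
```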


Figure 3.5: Station locations on the surface (black triangles) and synthetic event locations between 200 and 300 m depth, shown as black circles.

Figure 3.6a shows the seismogram records from the surface array, where the microseismic event is a Ricker pulse with a central frequency of 30 Hz. In Figure 3.6b we show the seismograms with noise from a real hydrofracking operation added. The maximum noise amplitude is the same as the maximum signal amplitude. In Figure 3.7 we illustrate the brightness map produced by our code using this semi-synthetic data. The maximum brightness position is shown together with the ground-truth position.



Figure 3.6: (a) Noise-free synthetic seismograms for the events at 500 m depth at the center of the array shown in Figure 3.5. (b) Noise-added version of (a)


Figure 3.7: Brightness map for the event positioned at x=0.0000 km, y=0.0000 km, depth=0.5000 km, and t=60.000 s (central event from Figure 3.5). This event was located by the pSSA algorithm at position x=0.0062 km, y=−0.0062 km, depth=0.5062 km, and t=60.015 s. These coordinates showed the highest brightness (coherence) value for the event.

3.4.2 The Test Platforms

To run the experiments, we used two different machines: (a) a notebook running Ubuntu 16.04, with an Intel Core i5 processor with 2.3 GHz and 4 cores, 8 GB of RAM, and one NVIDIA GeForce 930M graphics card with 384 CUDA cores and 2 GB of memory; and (b) an HP Z840 desktop workstation running Ubuntu 14.04, with an Intel Xeon processor with 3.4 GHz and 15 cores, 512 GB of RAM, and one NVIDIA Tesla K40 graphics card with 2,880 CUDA cores and 12 GB of memory. After each test, the computer was restarted. SSA and pSSA were both implemented with C++11, the CUDA toolkit, and the Thrust library [111].

3.4.3 Performance Results

The pSSA was implemented using both OpenMP and CUDA, in order to compare the speedups achieved by different degrees of parallelization. OpenMP (Open Multi-Processing) [146] is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in several languages. It is widely used to perform parallelization on platforms ranging from personal computers to supercomputers. In all tests the number of grid elements was 270,400, corresponding to a 65 × 65 × 64 grid volume.

In Tables 3.1 and 3.2 we summarize the performance results for the three cases we considered on both platforms. In all the tests using CUDA, we actually ran the full numerical tests with the whole data set (10, 60, and 120 minutes). For the sequential and OpenMP implementations, the total execution times were estimated from the average of 5 runs using a fraction of the dataset, so that the tests could be performed in a reasonable time; for instance, platform (a) would take 104 days to execute Case 3 sequentially. For platforms (a) and (b), we achieve speedups of around 1.6× and 4.4×, respectively, when using the OpenMP implementation as compared to the sequential one. When we compare the CUDA pSSA implementation to the sequential one, the speedups are 45× for platform (a) and 125× for all the cases we considered using platform (b). The speedup of the OpenMP version compared to the sequential version is nearly 3× greater when we use platform (b).

Test      Sequential      OpenMP    S/O      CUDA    S/C
Case 1       746,808     461,922    1.6    16,625   45.0
Case 2     4,490,208   2,777,322    1.6    99,961   45.0
Case 3     8,982,288   5,555,802    1.6   199,964   45.0

Table 3.1: Execution times (in seconds) for platform (a). S/O and S/C represent the speedups achieved by the OpenMP and CUDA versions, respectively, when compared with the sequential version.

Test      Sequential      OpenMP    S/O      CUDA     S/C
Case 1       540,745     123,562    4.4     4,312   125.4
Case 2     3,243,324     742,920    4.4    25,196   125.4
Case 3     6,488,000   1,486,150    4.4    50,550   125.4

Table 3.2: Execution times (in seconds) for platform (b). S/O and S/C represent the speedups achieved by the OpenMP and CUDA versions, respectively, when compared with the sequential version.

In Figure 3.8 we present the data in graphical form, so that the impact of the parallelization on both test platforms becomes more evident.


[Bar chart: execution time (s), 0 to 9×10⁶, for SSA(a), ompSSA(a), pSSA(a), SSA(b), ompSSA(b), pSSA(b), with bars for test cases 1–3]

Figure 3.8: Bar graph illustrating the results from the test cases for SSA, OpenMP, and pSSA, for platforms (a) and (b).

3.4.4 Client-Server pSSA for Real-Time Processing

The massively parallel implementation of the SSA on GPUs allows us to quickly process huge amounts of data. For most real-life acquisition scenarios and current middle- to high-end GPUs, the pSSA GPU processing time is much smaller than the data acquisition time. In our case, we verified that we can employ the pSSA to perform real-time processing during the data acquisition if the grid is up to 14 × 14 × 14 on platform (a), and up to 32 × 32 × 32 on platform (b).

As a proof of concept, we implemented a client-server architecture to receive and process the data. Our architecture can deal with multiple simultaneous clients as well as with out-of-order data packets, allowing lost or corrupted data to be re-sent.


3.4.5 Client-Server Architecture

Our architecture has 3 main components: sessions, spans, and logs. A session represents a whole data set, and one session must be created for each data collection. Since a session can have an arbitrary duration, we divide the session time into time intervals (spans). Each span can have its own duration, but it must be the same for all stations within a given span. Finally, a log holds the data collected for a particular station and span. Figure 3.9 shows the organization of a data transfer.

Figure 3.9: Organization of a data transfer into sessions, spans, and logs.
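The three components might be modeled as in the following sketch; the field names and types are assumptions for exposition only, not the actual wire format used by the server.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One station's data for one span.
struct Log {
    std::uint32_t stationId;
    std::vector<float> samples;
};

// One time interval of a session; the duration is the same for
// every station's log within this span.
struct Span {
    std::string spanId;        // unique ID returned by the server
    double durationSeconds;
    std::vector<Log> logs;     // one log per station
};

// A whole data collection.
struct Session {
    std::string uuid;          // unique ID (UUID) returned by the server
    std::string name;
    std::vector<Span> spans;
};
```

This nesting mirrors the transfer order: a session is created first, spans are created within it, and logs are attached to each span as they arrive from the stations.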

The interactions between client and server in our architecture can be seen in Figure 3.10. First, a client has to ask for a new session by sending a session name and configuration parameters (number of stations, grid limits, brightness threshold, and crustal model). The server then returns a unique ID (UUID). With this UUID, the client can create a span and receive a unique span ID. The span data is then transferred by sending the individual logs for each station. After a span is successfully transferred, it is processed by the pSSA and the result is returned to the client. If more than one client is sending data and the server is busy processing one data span while receiving another, the new span is queued for later processing.


Figure 3.10: Interactions between the client and the server.

In order to test the viability of real-time processing of this data, we defined all spans to have a duration of 2 s. We executed the server on the less powerful platform (a). Even so, the server took on average 0.505 s to process 2 s of data, clearly demonstrating the feasibility of real-time processing with pSSA. Of course, a more powerful server would be needed to process more massive data arriving from multiple clients.

3.5 Discussion

We observe from the results obtained with the 65 × 65 × 64 grid that the pSSA version of the code reduced the total execution time from 75 days to 14 h on platform (b). For this platform and the 10-min record length, the computations are made in 1.2 h, as shown in Table 3.2. With pSSA, we obtained BVs with a processing time smaller than the acquisition time for a 32 × 32 × 32 grid. These results demonstrate that further real-time on-site techniques, such as adaptive noise canceling [70] and joint moment tensor and hypocenter solutions [100, 3, 96], are feasible possibilities for future implementations, if adequate parametrization is supplied. The GPU parallelization further allows the simulation of
