Mapping, Characterization And Acceleration Of Apache Spark Applications

Abstract

Big Data

FPGAs in the Data Center

Spark Core
Cluster Managers
Spark Properties
Raspberry Pi 3 - Model B
DragonBoard 410c
Pynq-Z1

Cluster manager: An external service for acquiring resources on the cluster (e.g. standalone manager, Mesos, YARN).
Job: A parallel computation consisting of multiple tasks that is spawned in response to a Spark action (e.g. save, collect); you will see this term used in the driver's logs.
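The link between actions and jobs in the glossary can be illustrated with a toy, single-process model of lazy evaluation. The class below (`ToyRDD`) is hypothetical and is not part of Spark's API. It only records transformations (`map`, `filter`) and runs them when an action (`collect`) is called, counting each such call as one "job":

```python
from typing import Callable, Iterable, List, Tuple


class ToyRDD:
    """Hypothetical single-machine model of Spark's lazy evaluation (not the Spark API)."""

    jobs_run = 0  # number of "jobs" triggered by actions so far

    def __init__(self, data: Iterable, ops: Tuple[Tuple[str, Callable], ...] = ()):
        self._data = list(data)
        self._ops = ops  # recorded transformations, applied only when an action runs

    def map(self, func: Callable) -> "ToyRDD":
        # Transformation: returns a new dataset description; nothing is computed yet.
        return ToyRDD(self._data, self._ops + (("map", func),))

    def filter(self, predicate: Callable) -> "ToyRDD":
        # Transformation: also lazy.
        return ToyRDD(self._data, self._ops + (("filter", predicate),))

    def collect(self) -> List:
        # Action: this is where a job is spawned and the recorded pipeline actually runs.
        ToyRDD.jobs_run += 1
        result = []
        for item in self._data:
            keep = True
            for kind, func in self._ops:
                if kind == "map":
                    item = func(item)
                elif not func(item):
                    keep = False
                    break
            if keep:
                result.append(item)
        return result


even_squares = ToyRDD(range(5)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(ToyRDD.jobs_run)         # 0: no action has run yet
print(even_squares.collect())  # [0, 4, 16]
print(ToyRDD.jobs_run)         # 1
```

In real Spark, a job is further split into stages and tasks that run on the executors. The toy model only captures the fact that nothing is computed until an action is called.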

Spark Applications

Setting Up Spark

Challenges of Building Spark on Embedded Systems
Challenges of Running Spark on Embedded Systems

PYNQ-Z1

The Spark on Pynq (SPynq) Cluster

Spark Configurations
Hadoop Configurations

Acceleration of Spark Applications

The Spark on Pynq (SPynq) Framework

Listing 4.18: Spark code for using the hardware accelerator.
More specifically, a LogisticRegression object is created in the Python environment; it supports various methods (train, test, predict, etc.).
Input data file: the name of the data file given as input to the application (sample_linear_regression_data.txt, sample_libsvm_data.txt, sample_kmeans_data.txt).
Table 5.4: Glossary of application input arguments.
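To illustrate what an object with train, test, and predict methods looks like, here is a minimal, software-only sketch. It is hypothetical. It is neither the thesis's accelerated LogisticRegression class nor Spark MLlib's, and the constructor parameters (`num_features`, `learning_rate`, `iterations`) and the tiny dataset are our own assumptions. Training uses plain batch gradient descent on the log-loss:

```python
import math
from typing import List, Sequence


class LogisticRegression:
    """Hypothetical software-only model exposing train/test/predict methods."""

    def __init__(self, num_features: int, learning_rate: float = 0.3, iterations: int = 1000):
        self.weights = [0.0] * num_features
        self.bias = 0.0
        self.learning_rate = learning_rate
        self.iterations = iterations

    @staticmethod
    def _sigmoid(z: float) -> float:
        # Numerically stable logistic function.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def _probability(self, x: Sequence[float]) -> float:
        return self._sigmoid(self.bias + sum(w * xi for w, xi in zip(self.weights, x)))

    def train(self, samples: List[Sequence[float]], labels: List[int]) -> "LogisticRegression":
        # Batch gradient descent on the log-loss. This per-iteration gradient loop is the
        # kind of compute-heavy kernel one would consider offloading to an accelerator.
        n = len(samples)
        for _ in range(self.iterations):
            grad_w = [0.0] * len(self.weights)
            grad_b = 0.0
            for x, y in zip(samples, labels):
                error = self._probability(x) - y
                for j, xj in enumerate(x):
                    grad_w[j] += error * xj
                grad_b += error
            self.weights = [w - self.learning_rate * g / n for w, g in zip(self.weights, grad_w)]
            self.bias -= self.learning_rate * grad_b / n
        return self

    def predict(self, x: Sequence[float]) -> int:
        return 1 if self._probability(x) >= 0.5 else 0

    def test(self, samples: List[Sequence[float]], labels: List[int]) -> float:
        # Fraction of correctly classified samples.
        correct = sum(1 for x, y in zip(samples, labels) if self.predict(x) == y)
        return correct / len(samples)


samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]]
labels = [0, 0, 0, 1, 1, 1]
model = LogisticRegression(num_features=2).train(samples, labels)
print(model.test(samples, labels), model.predict([3.5, 3.5]))
```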

A.1 Bash script

A.2 mllib_accel

[1] FPGA Acceleration of Spark Applications in a Pynq Cluster, Christoforos Kachris, Elias Koromilas, Ioannis Stamelos, Dimitrios Soudris.
[…] … Amant, Karthikeyan Sankaralingam and Doug Burger, http://www.cc.gatech.edu/hadi/doc/paper/2013-cacm-dark_silicon.pdf.
[43] Performance and Power Evaluation of Spark Applications on Low Power SoCs, Christoforos Kachris, Ioannis Stamelos, Dimitrios Soudris.
[44] SPynq: Acceleration of Machine Learning Applications over Spark on Pynq, Christoforos Kachris, Elias Koromilas, Ioannis Stamelos, Dimitrios Soudris.

English Version

Acknowledgments

Contents

5.10 Execution time footprint using the proposed Python API
Speedup versus the number of iterations using the proposed Python …
5.12 Runtime footprint using the proposed Python API
Power consumption of Xeon and Zynq platforms based on …

List of Tables

In 2015, the total network traffic of data centers was around 4.7 exabytes, and it is estimated that it will cross the 8.5 exabyte mark by the end of 2018. The growing demands for both performance and energy efficiency have led companies to explore new paths toward energy-efficient platforms for heterogeneous data centers. As a result, they have recently started to deploy FPGA accelerators at data-center scale and to offload part of the workload to embedded processors (e.g. ARM processors). In this thesis, we first investigate the capabilities of the embedded platforms we used by collecting performance metrics for a set of typical machine learning and graph processing algorithms. We then compare the performance and energy efficiency of each system with those of a powerful mainstream server.

On the other hand, the proposed framework for using hardware accelerators in Spark shows that a heterogeneous Zynq MPSoC with PYNQ-based accelerators can achieve up to 2x system speedup and 18x better energy efficiency compared to a Xeon system.
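The two metrics in this comparison are computed differently. Speedup is a ratio of execution times. The energy-efficiency gain used here is a ratio of the energy consumed per run, where energy is average power multiplied by time. A minimal sketch follows. The function names and all numbers are hypothetical and chosen only to show the arithmetic; they are not measurements from this thesis:

```python
def speedup(baseline_time_s: float, accelerated_time_s: float) -> float:
    """How many times faster the accelerated run is than the baseline."""
    return baseline_time_s / accelerated_time_s


def energy_efficiency_gain(baseline_time_s: float, baseline_power_w: float,
                           accelerated_time_s: float, accelerated_power_w: float) -> float:
    """Energy per run (average power x time) of the baseline divided by that of the accelerated system."""
    baseline_energy_j = baseline_power_w * baseline_time_s
    accelerated_energy_j = accelerated_power_w * accelerated_time_s
    return baseline_energy_j / accelerated_energy_j


# Hypothetical measurements: 100 s at 150 W versus 50 s at 20 W.
print(speedup(100.0, 50.0))                              # 2.0
print(energy_efficiency_gain(100.0, 150.0, 50.0, 20.0))  # 15.0
```

Because the energy-efficiency gain equals the speedup multiplied by the ratio of average power draws (here 2 x 7.5), a low-power platform can show a much larger efficiency gain than its raw speedup.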

Publications

Modern Systems and Applications

As such, there is a growing demand for embedding sensors, actuators, and other electronics into virtually any system, so that it can collect and exchange data, perform tasks, and communicate with other devices over a network. Modern systems use the Internet to communicate with other devices and to distribute the processing load among other systems in the cloud. The Internet is also used as a large database that can provide information on almost any subject.

To meet the large processing demands of new applications, new high-performance and energy-efficient processor architectures are required [1].

Big Data

Considering all of the above, emerging applications such as big data analytics and the IoT require powerful systems that can process large amounts of data without consuming much power. Big data is the term for data sets that are so large or complex that traditional data processing application software is inadequate to handle them [2]. The term is also used to refer to the use of predictive analytics, user behavior analytics, and other advanced methods that extract value from such data.

But the sheer volume of this data makes it difficult to handle and process.

Cluster Computing

Embedded Systems

FPGAs in the Data Center

Thesis Aim

For this reason, this thesis aims to create an API for the seamless use of hardware accelerators that can serve both embedded systems and high-performance applications such as cloud, edge, and fog computing. Finally, we evaluate performance by comparing the execution time when using only CPU cores against the execution time when invoking hardware accelerators. Spark is one of the most widely used frameworks in cloud computing, and it comes with a set of built-in libraries and applications (including machine learning and graph processing) on which these metrics can be collected.

The proposed framework for seamless use on the hardware accelerators is also based on the Spark framework, so in the next chapter we will take a closer look at the functionality of Apache Spark and the main features it offers.

Figure 1.6: Example of heterogeneous platform architecture [15].

Overview

Spark has quickly grown to become the largest open source big data community, with over 1,000 contributors from over 250 organizations.

Benefits

Speed
A Unified Engine
Ease of Use

Because Spark's core engine is both fast and general-purpose, it powers multiple higher-level components specialized for different workloads, such as SQL or machine learning. This tight integration has several benefits. First, all higher-level libraries and components in the stack benefit from improvements at the lower layers. For example, if an optimization is added to Spark's core engine, the SQL and machine learning libraries automatically speed up as well.

For example, you can write one application in Spark that uses machine learning to classify data in real time as it is consumed from streaming sources.

Components

Built-in Libraries
Cluster Managers
Monitoring

On the other hand, reduce is an action that aggregates all the elements of the RDD using some function and returns the final result to the driver program.

Spark is agnostic to the underlying cluster manager; all supported cluster managers can be launched locally or on …

In client deploy mode, the submitter launches the driver outside the cluster as an independent process, which is usually the same as the client process used to submit the job.
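The reduce action can be pictured as a two-step computation: each partition is reduced locally, and the partial results are then merged into the value returned to the driver. The sketch below simulates this in a single process. The function name `distributed_reduce` is ours, and it models the idea rather than Spark's actual implementation:

```python
import operator
from functools import reduce
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def distributed_reduce(partitions: Sequence[Sequence[T]], func: Callable[[T, T], T]) -> T:
    """Simulate an RDD reduce: per-partition reduction, then a final merge on the driver.

    func should be commutative and associative, because partitions may be
    processed and merged in any order.
    """
    # Step 1 (executors): each non-empty partition is reduced locally.
    partial_results: List[T] = [reduce(func, part) for part in partitions if part]
    if not partial_results:
        raise ValueError("cannot reduce an empty collection")
    # Step 2 (driver): the partial results are combined into the final value.
    return reduce(func, partial_results)


partitions = [[1, 2, 3], [4, 5], [], [6]]
print(distributed_reduce(partitions, operator.add))  # 21
print(distributed_reduce(partitions, max))           # 6
```

The requirement that the function be commutative and associative is what allows the partial results to be computed independently on different executors. As in Spark, reducing an empty collection is an error.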

Driver: The process that runs the application's main() function and creates the SparkContext.

Spark SQL allows developers to intermix SQL queries with the programmatic data manipulations supported by RDDs in Python, Java, and Scala, all within a single application, thus combining SQL with complex analytics.

Configuration

Spark Properties
Environment Variables
Logging

Each Apache Spark application has a web interface for monitoring, which is launched by the application's SparkContext. The web interface displays the scheduler's stages and tasks, a summary of RDD sizes and memory usage, and information about the running executors and the storage in use. The Standalone cluster manager also provides a web interface, which shows cluster and job statistics as well as detailed log output for each job.

Examples of the Spark user interface and History server will be presented later in the evaluation of our proposed framework.

Purpose

Embedded Systems
… by packing an increasing number of transistors on chip, leading to higher performance. Microservers have recently gained attention as low-cost, low-power servers with a reduced footprint. They are based mainly on energy-efficient, low-power SoC processors such as those used in embedded systems. Comparing their results with those of a powerful mainstream server can tell us whether embedded systems can play a key role in the data center and whether running big data applications on them offers any advantages.

To perform this evaluation, we first had to acquire a number of embedded systems and then find some representative applications for taking measurements.

Embedded Platforms

Based on the capable 64-bit Snapdragon 410E processor, the DragonBoard 410c is designed to support rapid software development, education, and prototyping. It is important to note that although the Pi 3 and the DragonBoard 410c have the same CPU configuration (a quad-core ARM Cortex-A53), the BCM2837 SoC is manufactured on a 40 nm process, while the Snapdragon 410 SoC is manufactured on a 28 nm process. Also, both boards have 1 GB of RAM, but, as we can see, the memory of the DragonBoard 410c runs at a much lower frequency.

Undeniably, the specifications of this board are inferior to those of the Pi 3 and DragonBoard 410c.

Spark Applications

Machine Learning Applications
GraphX Applications

Although we do not expect this board to outperform the previous ones, the main reason we decided to include it in our evaluation is that in Chapter 4 we will develop a framework for accelerating the algorithms and applications running on Spark.

The connected components algorithm labels each connected component of a graph with the ID of its lowest-numbered vertex.

PageRank measures the importance of each vertex in a graph. Its basic assumption is that more important sites are likely to receive more links from other sites.
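Both graph algorithms can be sketched sequentially in plain Python to show what they compute. This is not GraphX's distributed, Pregel-based implementation (GraphX exposes them as `connectedComponents()` and `pageRank(...)` on a graph). The function names, the damping factor of 0.85, and the fixed iteration count are our assumptions. We also use the normalized PageRank variant, in which ranks sum to 1, whereas GraphX may scale ranks differently:

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def connected_components(vertices: Sequence[Hashable], edges: Sequence[Edge]) -> Dict:
    """Label every vertex with the lowest vertex ID in its (undirected) component."""
    label = {v: v for v in vertices}
    changed = True
    while changed:  # propagate the smaller label across each edge until nothing changes
        changed = False
        for a, b in edges:
            low = min(label[a], label[b])
            for v in (a, b):
                if label[v] > low:
                    label[v] = low
                    changed = True
    return label


def pagerank(vertices: Sequence[Hashable], edges: Sequence[Edge],
             damping: float = 0.85, iterations: int = 100) -> Dict:
    """Normalized PageRank by power iteration; a link u -> v passes rank from u to v."""
    n = len(vertices)
    out_links: Dict[Hashable, List[Hashable]] = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices with no out-links is spread evenly over all vertices.
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in vertices}
        for src, targets in out_links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


print(connected_components([1, 2, 3, 4, 5, 6], [(1, 2), (2, 3), (4, 5)]))
# {1: 1, 2: 1, 3: 1, 4: 4, 5: 4, 6: 6}

ranks = pagerank(["a", "b", "c"], [("a", "c"), ("b", "c"), ("c", "a")])
print(max(ranks, key=ranks.get))  # c: the vertex with the most incoming links
```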

The triangle counting algorithm determines the number of triangles passing through each vertex, which provides a measure of clustering.
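Counting triangles per vertex reduces to checking which pairs of a vertex's neighbors are themselves adjacent. Dividing by the number of neighbor pairs gives the local clustering coefficient, one common way to turn the counts into a clustering measure. This sequential sketch is illustrative only. The function names are ours, and GraphX provides its own distributed `triangleCount()`:

```python
from itertools import combinations
from typing import Dict, Hashable, Sequence, Set, Tuple


def _neighbor_sets(vertices: Sequence[Hashable], edges: Sequence[Tuple]) -> Dict[Hashable, Set]:
    neighbors: Dict[Hashable, Set] = {v: set() for v in vertices}
    for a, b in edges:
        if a != b:  # ignore self-loops; edges are treated as undirected
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def triangle_counts(vertices, edges) -> Dict:
    """Number of triangles passing through each vertex."""
    neighbors = _neighbor_sets(vertices, edges)
    # A triangle through v is a pair of v's neighbors that are also adjacent to each other.
    return {v: sum(1 for u, w in combinations(neighbors[v], 2) if w in neighbors[u])
            for v in vertices}


def local_clustering(vertices, edges) -> Dict:
    """Triangles through v divided by the number of neighbor pairs, k(k-1)/2."""
    neighbors = _neighbor_sets(vertices, edges)
    triangles = triangle_counts(vertices, edges)
    coefficients = {}
    for v in vertices:
        k = len(neighbors[v])
        pairs = k * (k - 1) / 2
        coefficients[v] = triangles[v] / pairs if pairs else 0.0
    return coefficients


# A square 1-2-3-4 with the diagonal 1-3 contains two triangles: (1, 2, 3) and (1, 3, 4).
square = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
print(triangle_counts([1, 2, 3, 4], square))  # {1: 2, 2: 1, 3: 2, 4: 1}
```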

Mapping Spark

Challenges of Building Spark on Embedded Systems
Challenges of Running Spark on Embedded Systems

A user can easily check that everything works properly by running one of the examples shipped with Spark, or by typing ./bin/spark-shell in a terminal (from the Spark home directory) to open an interactive Spark shell in which Scala code can be evaluated. Spark can be configured through two files, spark-env.sh and spark-defaults.conf. The main difference between them is that spark-env.sh is sourced when Spark starts, so its contents become environment variables. The contents of spark-defaults.conf, by contrast, are passed as configuration to each submitted application and merged with any further settings made through the SparkConf object. Spark requires a minimum of 475 MB of memory for each process (e.g. master, driver, executor).
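The merge between spark-defaults.conf and SparkConf settings can be sketched as a dictionary update. According to the Spark configuration documentation, properties set directly on SparkConf take precedence over spark-submit flags, which in turn take precedence over spark-defaults.conf. The sketch below ignores the spark-submit flags and handles only the whitespace-separated "key value" form. The function names, the `master` hostname, and the memory values are hypothetical:

```python
from typing import Dict


def parse_spark_defaults(text: str) -> Dict[str, str]:
    """Parse spark-defaults.conf-style content: one 'key value' pair per line."""
    properties = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # the key, then the rest of the line as the value
        properties[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return properties


def effective_config(defaults: Dict[str, str], sparkconf_settings: Dict[str, str]) -> Dict[str, str]:
    """Values set on SparkConf override those read from spark-defaults.conf."""
    merged = dict(defaults)
    merged.update(sparkconf_settings)
    return merged


defaults_text = """
# hypothetical spark-defaults.conf
spark.master            spark://master:7077
spark.executor.memory   512m
"""
conf = effective_config(parse_spark_defaults(defaults_text), {"spark.executor.memory": "480m"})
print(conf)  # {'spark.master': 'spark://master:7077', 'spark.executor.memory': '480m'}
```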

In this way, even in the simplest case of creating a cluster consisting of a master and one …
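The per-process minimum becomes a hard constraint on 1 GB boards. The arithmetic can be sketched as follows. It takes the 475 MB figure stated in the text as given, while the amount of RAM reserved for the operating system is a hypothetical parameter:

```python
MIN_PROCESS_MEMORY_MB = 475  # per-process minimum stated in the text


def max_spark_processes(board_ram_mb: int, os_reserved_mb: int) -> int:
    """How many Spark JVM processes (master, worker, driver, executor, ...) fit in RAM."""
    usable_mb = max(board_ram_mb - os_reserved_mb, 0)
    return usable_mb // MIN_PROCESS_MEMORY_MB


# A 1 GB board such as the Raspberry Pi 3 or DragonBoard 410c.
print(max_spark_processes(1024, os_reserved_mb=0))    # 2, even with no memory left for the OS
print(max_spark_processes(1024, os_reserved_mb=200))  # 1, with a hypothetical 200 MB for the OS
```

Since a minimal cluster already involves several such processes (a master and a worker daemon, plus a driver and an executor per application), the processes cannot all fit on a single 1 GB board under this limit.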

Introduction

On the other hand, when it comes to speeding up an algorithm, typically only parts of it can actually be accelerated. Low Power: As we already saw in Chapter 3, the PYNQ-Z1 board belongs to the family of low-power SoC embedded platforms. So far, then, there are two reasons that make the PYNQ-Z1 board ideal for our purpose.
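The observation that only parts of an algorithm can be accelerated is captured by Amdahl's law. If a fraction p of the execution time is accelerated by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A minimal sketch (the function name is ours):

```python
def amdahl_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when only a fraction of the run time is accelerated (Amdahl's law)."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    serial_part = 1.0 - accelerated_fraction          # time that stays on the CPU
    parallel_part = accelerated_fraction / kernel_speedup  # time of the accelerated part
    return 1.0 / (serial_part + parallel_part)


print(amdahl_speedup(0.5, 10.0))  # about 1.82: half of the run is untouched
print(amdahl_speedup(0.9, 1e9))   # just under 10, the limit when 10% stays serial
```

Even an extremely fast accelerator cannot push the overall speedup beyond 1 / (1 - p), which is why the fraction of the workload that is offloaded matters as much as the accelerator's own speed.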

Python Libraries: PYNQ-Z1 is a platform based on a fairly new Xilinx open source project called PYNQ.

Figure 4.1 depicts the block diagram of the Zynq-7000 series.

The Spark on Pynq (SPynq) Cluster

Network Configurations
Security Configurations
Spark Configurations
Hadoop Configurations

Thus, normally, all requests are made on the master node, while the PYNQ nodes execute the tasks assigned to them. Once this file is set up, we can start or stop the cluster using the scripts Spark provides under the SPARK_HOME/sbin folder. The amount of memory allocated was determined after extensive testing, and it is this limited because of the correspondingly limited memory of the worker nodes.

Before setting up HDFS, we had to place every input file of an application submitted through Spark on each node, at the same path as on the master node (because applications are submitted from the master node). As a result, we spent time copying files to each node.

Figure 4.2: Proposed cluster scheme

Configuring the Spark Master node

Acceleration of Spark Applications

Related Work
The Spark on Pynq (SPynq) Framework

IBM also announced in 2016 the availability of the SuperVessel cloud, a development framework for the OpenPOWER Foundation. The goal of the SuperVessel cloud is to provide a virtual environment for developing, testing, and piloting applications.

The performance evaluation shows that the proposed system can achieve up to 1.44x speedup for logistic regression and almost the same throughput for K-Means, with 2.32x and 1.55x better energy efficiency, respectively.

Behind this Python API is a C API that is called for the actual communication with the hardware accelerator and therefore acts as a driver.
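The layering of a Python API over a C API can be illustrated with Python's standard `ctypes` module. This is a minimal sketch under loud assumptions. It is not the thesis's actual C API, whose functions are not given here. The C runtime's `memcpy` stands in for a driver call that moves data into an accelerator buffer, the function name `copy_through_c` is hypothetical, and loading via `ctypes.CDLL(None)` assumes a POSIX system such as the Linux running on the PYNQ boards:

```python
import ctypes
from typing import List, Sequence

# Stand-in for the accelerator's C driver library (hypothetical). A real driver would be
# loaded with ctypes.CDLL("lib<name>.so"); here we use the C runtime that is already
# linked into the Python process, which works on Linux and other POSIX systems.
_c_runtime = ctypes.CDLL(None)

# Declaring argument and return types lets ctypes convert Python values safely.
_c_runtime.memcpy.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_size_t]
_c_runtime.memcpy.restype = ctypes.c_void_p


def copy_through_c(values: Sequence[float]) -> List[float]:
    """Marshal Python floats into a C float buffer, let C code copy it, and read it back."""
    n = len(values)
    source = (ctypes.c_float * n)(*values)  # input buffer in C memory layout
    destination = (ctypes.c_float * n)()    # output buffer (the "device" side in this analogy)
    _c_runtime.memcpy(destination, source, ctypes.sizeof(source))
    return list(destination)


print(copy_through_c([1.5, 2.25, -3.0]))  # [1.5, 2.25, -3.0]
```

The values are stored as 32-bit C floats, so numbers that are not exactly representable in single precision come back rounded. A real driver wrapper would also handle buffer allocation and error codes returned by the C layer.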