
4.4 Acceleration of Spark Applications

In this section, we describe our proposed framework for the seamless utilization of hardware accelerators for Spark applications on heterogeneous FPGA-based MPSoCs, as well as the development of an efficient set of libraries that hide the accelerator's details and simplify the incorporation of hardware accelerators in Spark. We then integrate these new libraries into our already built SPynq cluster and, finally, evaluate the gains of using hardware accelerators for a machine learning use-case scenario (logistic regression).

4.4.1 Related Work

In the last few years, there have been several efforts towards the efficient deployment of hardware accelerators for cloud computing, as well as for Apache Spark applications.

In the paper entitled A survey on reconfigurable accelerators for cloud computing [41], a detailed survey of hardware accelerators for cloud computing applications is presented. The survey covers both the programming frameworks that have been developed for the efficient utilization of hardware accelerators and the accelerators that have been developed for several application domains, such as machine learning, graph computation, and databases.

IBM also announced in 2016 the availability of the SuperVessel cloud, a development framework for the OpenPOWER Foundation. SuperVessel was developed by IBM Systems Labs and IBM Research in Beijing. The goal of the SuperVessel cloud is to deliver a virtual environment for the development, testing, and piloting of applications. The SuperVessel cloud framework takes advantage of IBM POWER8 processors. Developers have access to Xilinx FPGA accelerators that use IBM's Coherent Accelerator Processor Interface (CAPI). Using CAPI, an FPGA appears to the POWER8 processor as if it were part of the processor.

Xilinx also announced, in late 2016, a new framework called the Reconfigurable Acceleration Stack. The FPGA boards can be hosted in typical servers and are utilized through application-specific libraries and framework integration for five key workloads: machine learning inference, SQL query and data analytics, video transcoding, storage compression, and network acceleration [42].

According to Xilinx, the FPGA-based acceleration stack can deliver up to 20x acceleration over traditional CPUs, offering a flexible, reprogrammable platform for rapidly evolving workloads and algorithms.

In the paper entitled FPGAs in the Cloud: Booting Virtualized Hardware Accelerators with OpenStack [43], a novel approach is presented for integrating virtualized FPGA-based hardware resources into cloud computing systems with minimal overhead. The proposed framework allows cloud users to load and utilize hardware accelerators across multiple FPGAs using the same methods as for Virtual Machines. The reconfigurable resources of the FPGA are offered to the users as generic cloud resources through OpenStack.

Finally, a related framework called Blaze [44] was presented by Jason Cong et al. for the efficient utilization of hardware accelerators under the Spark framework. Their proposed scheme is based on a cluster-wide accelerator programming model and runtime system that is portable across accelerator platforms. Blaze is mapped to the Spark cluster programming framework. The accelerators are abstracted as subroutines for Spark tasks. These subroutines are executed on local accelerators when they are available; otherwise, the subroutines are executed on the CPU to guarantee application correctness. The proposed scheme has been mapped to a cluster of 8 Xilinx Zynq boards, each hosting two ARM processors and a reconfigurable logic block. The performance evaluation shows that the proposed system can achieve up to 1.44x speedup for logistic regression and almost the same throughput for K-Means, with 2.32x and 1.55x better energy efficiency, respectively. It has also been mapped to typical FPGA devices connected to the host through the PCI interface. In this case, the performance evaluation shows that the proposed system can achieve up to 3.05x speedup for logistic regression and 1.47x speedup for K-Means, and reduces the overall energy consumption by 2.63x and 1.78x, respectively.

4.4.2 The Spark on Pynq (SPynq) Framework

Now that we have mapped Spark on top of the PYNQ-Z1 cluster, we are ready to go through all the necessary steps for adapting it to communicate with the hardware accelerators located in the programmable logic (PL) of the Zynq system.

Figure 4.7 depicts the software stack of our implemented setup. The Hadoop DFS is at the first level, while the Apache Spark framework is built on top of it, along with its APIs for machine learning, graph computing, and other applications.

Figure 4.7: The software stack of our implemented setup

In the typical case of running a machine learning application, the application invokes Spark MLlib, which utilizes the Breeze library. Breeze invokes the Netlib-Java framework, a wrapper for low-level linear algebra tools implemented in C or Fortran. Netlib-Java is executed through the Java Virtual Machine (JVM), and the actual linear algebra tools (BLAS, the Basic Linear Algebra Subprograms) are invoked through the Java Native Interface (JNI). All these layers add significant overhead to Spark applications. The main idea of the SPynq framework, therefore, is to create new packages that deliver hardware acceleration to Spark applications. In that way, the only modification needed in any Spark application is the replacement of the old MLlib function with the new one that invokes the hardware accelerator. Figure 4.8 depicts our proposed scheme for accelerating Spark applications, where we have implemented a new MLlib package, called MLlib_accel, for accelerating machine learning algorithms.

Figure 4.8: The software stack of our proposed setup for accelerating Spark applications
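
To make the required change concrete, the sketch below contrasts a standard PySpark MLlib invocation with the corresponding call through the accelerated package. Only the package name MLlib_accel comes from our framework; the module path mllib_accel.classification, the class name, and the HDFS path are illustrative assumptions.

    from pyspark import SparkContext, SparkConf
    from pyspark.mllib.util import MLUtils

    # Hypothetical accelerated package; the software-only version would
    # instead import LogisticRegressionWithSGD from pyspark.mllib.classification.
    from mllib_accel.classification import LogisticRegression

    sc = SparkContext(conf=SparkConf().setAppName("LR-accel"))

    # Load LIBSVM-formatted training data from HDFS (path is illustrative).
    trainRDD = MLUtils.loadLibSVMFile(sc, "hdfs://master:9000/data/train.libsvm")

    # Same call as the software-only version; training is dispatched
    # to the hardware accelerator inside the MLlib_accel package.
    model = LogisticRegression.train(trainRDD, iterations=100)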

On the PYNQ side, as already mentioned in Section 4.2, the PYNQ project comes with a set of Python libraries for communicating with the programmable logic. These libraries include methods for deploying the hardware accelerators on the PL, as well as structures and methods for handling the components of each accelerator. For example, Python libraries are provided for creating and destroying DMA (Direct Memory Access) objects, as well as methods for allocating contiguous memory buffers that serve as input or output buffers for the hardware accelerator. Behind this Python API, a C API is invoked for the actual communication with the hardware accelerator, and therefore serves as its driver. In other words, PYNQ provides an easy and efficient way to handle FPGA accelerators without requiring deep hardware engineering knowledge and expertise from the user. So, for every newly implemented hardware accelerator, a new Python library needs to be created to host the lower-level function calls for the communication with the PL. It is important to note that this library is independent of any given framework (e.g., Apache Spark, Hadoop, etc.), and therefore it could be integrated into a multitude of applications. Figure 4.9 shows the intervening stages when communicating with the hardware accelerator.

Figure 4.9: Flow diagram depicting the intervening stages for the communication with the hardware accelerator
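
As a concrete illustration of these stages, the sketch below uses the PYNQ Python API to load an overlay (bitstream) onto the PL, allocate contiguous memory buffers, and stream data to and from the accelerator over DMA. The bitstream file name, the DMA instance name axi_dma_0, and the buffer sizes are assumptions that depend on the particular overlay design.

    import numpy as np
    from pynq import Overlay, Xlnk

    # Deploy the accelerator bitstream onto the programmable logic
    # (the file name "accel.bit" is illustrative).
    overlay = Overlay("accel.bit")

    # Allocate physically contiguous buffers that the DMA engine can access.
    xlnk = Xlnk()
    in_buf = xlnk.cma_array(shape=(1024,), dtype=np.float32)
    out_buf = xlnk.cma_array(shape=(1024,), dtype=np.float32)
    in_buf[:] = np.random.rand(1024).astype(np.float32)

    # Stream the input to the accelerator and read back the result over DMA
    # (the instance name "axi_dma_0" depends on the overlay design).
    dma = overlay.axi_dma_0
    dma.sendchannel.transfer(in_buf)
    dma.recvchannel.transfer(out_buf)
    dma.sendchannel.wait()
    dma.recvchannel.wait()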

Once the Python API for the accelerator has been created, the corresponding library for accelerating Spark applications can be implemented. The whole stack is shown in Figure 4.10, and a sketch of such a wrapper library follows it.

Figure 4.10: Final Spark software stack including "accelerated" libraries
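
A minimal sketch of what such a wrapper library could look like is given below, under stated assumptions: the driver module lr_accel_driver and its train_chunk function are hypothetical stand-ins for the per-node PYNQ driver, and the fall-back-to-software policy mirrors the behaviour described above rather than the exact MLlib_accel implementation.

    import numpy as np
    from pyspark.mllib.classification import LogisticRegressionWithSGD

    class LogisticRegression:
        """Illustrative Spark-facing wrapper for a PYNQ-based accelerator."""

        @staticmethod
        def train(data, iterations=100):
            # Probe for the accelerator driver (hypothetical module built
            # on the PYNQ API sketched earlier).
            try:
                import lr_accel_driver  # hypothetical driver module
            except ImportError:
                # No accelerator available: fall back to the software-only
                # MLlib implementation to preserve correctness.
                return LogisticRegressionWithSGD.train(data, iterations=iterations)

            def train_partition(iterator):
                # Offload this worker's partition to the local FPGA; the
                # hypothetical driver returns a partial weight vector.
                from lr_accel_driver import train_chunk
                points = list(iterator)
                yield train_chunk(points, iterations)

            # Average the per-partition weights on the driver node.
            weights = data.mapPartitions(train_partition).collect()
            return np.mean(weights, axis=0)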

Further Configurations

Before being able to evaluate our proposed framework, a few more configurations have to be made. These configurations concern both the Spark framework and the existing PYNQ libraries.

As far as Spark is concerned, since the new libraries that invoke the hardware accelerators are written in Python, PySpark is used for any submitted application. By default, PySpark uses Python 2, whereas the libraries provided by the PYNQ project are written in Python 3, so we have to explicitly set PySpark to use Python 3. This is done by adding the following lines to every node's spark-env.sh file:
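
    # Make PySpark use Python 3 on both the executors and the driver
    # (assuming the python3 binary is on each node's PATH).
    export PYSPARK_PYTHON=python3
    export PYSPARK_DRIVER_PYTHON=python3

These two environment variables are the standard Spark mechanism for selecting the Python interpreter; the exact interpreter path may differ per installation.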
