Today’s high-performance analytic applications run on large, scaled-out clusters of computers. These clusters are managed by resource manager software. Typically, many different applications launched by many different users run on a cluster at the same time. Resource managers allocate resources among the different applications that are running on the cluster. Examples of resource manager software include open source offerings such as Apache YARN and Apache Mesos, and commercial products from companies like IBM.
Apache Spark is emerging as the platform of choice for building high-performance big data analytic applications. Companies are now looking to run Spark applications in a distributed, scaled-out architecture to gain scalability, performance, and cost savings. Apache Spark supports YARN, Mesos, and other resource managers. IBM Platform Conductor for Spark is the IBM distribution of Apache Spark and a high performance, low-latency Spark resource manager.
Performance of Spark applications depends significantly on how efficiently the resource manager allocates resources among applications running on the cluster. For example, an application that runs on 10 CPUs will, in the majority of cases, finish faster than the same or a very similar application that runs on five CPUs. To optimize the performance of Spark analytic applications, we need to be able to analyze and evaluate the resource manager's contribution to overall performance.
How the Spark Multiuser Benchmark works
The Spark Multiuser Benchmark (SMB) was developed specifically to address this aspect, using a relatively simple test scenario intended to exercise the resource manager. The figure below shows how the benchmark works.
The benchmark starts a number of user job sequences. The first job sequence starts immediately; after a fixed period of time, referred to as the offset, a second job sequence starts. Once another offset period passes, a third user job sequence starts. This process repeats until the desired number of concurrent user job sequences is reached. That number is a configurable parameter passed to the main benchmark script when the test is started.
Each user job sequence executes a number of jobs. The number of jobs that each user stream executes is configurable, and is passed in as a parameter when the tester starts the main benchmark script. Between consecutive jobs there is a fixed delay; in the current version of the benchmark (SMB-1) it is not configurable and is equal to one second.
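The stream-launching mechanics described above can be sketched in Python. This is a simplified illustration of the offset and inter-job-delay logic, not the actual SMB scripts; `run_job` is a hypothetical placeholder for submitting a real Spark job such as a Terasort run.

```python
import threading
import time

completed = []  # records (user_id, job_index) for each finished job

def run_job(user_id: int, job_index: int) -> None:
    """Hypothetical stand-in for one benchmark job; SMB submits a Spark job here."""
    time.sleep(0.01)  # placeholder for actual job execution time
    completed.append((user_id, job_index))

def run_user_stream(user_id: int, jobs_per_stream: int, inter_job_delay: float) -> None:
    """One user job sequence: run jobs back to back with a fixed delay between them."""
    for j in range(jobs_per_stream):
        run_job(user_id, j)
        if j < jobs_per_stream - 1:
            time.sleep(inter_job_delay)  # fixed at one second in SMB-1

def run_benchmark(num_users: int, jobs_per_stream: int, offset: float,
                  inter_job_delay: float = 1.0) -> None:
    """Start user streams one at a time, `offset` seconds apart (the step-up phase)."""
    threads = []
    for u in range(num_users):
        t = threading.Thread(target=run_user_stream,
                             args=(u, jobs_per_stream, inter_job_delay))
        t.start()
        threads.append(t)
        if u < num_users - 1:
            time.sleep(offset)  # wait one offset before starting the next stream
    for t in threads:
        t.join()  # streams finish at different times, producing the step-down phase
```

Because each stream starts one offset later than the previous one but runs the same number of jobs, the count of concurrently active streams ramps up, holds steady, then ramps down, which is exactly the three-phase pattern described below.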
In the current benchmark implementation (SMB-1), all jobs executed by all user job streams are identical: each is an instance of a Terasort implementation developed by the IBM Austin Research Laboratory. The source code and a description of this Terasort implementation are available in our SMB GitHub repository.
Although some would argue that using the same job for all user job sequences is not the most realistic scenario, keep in mind that the key objective of this benchmark is to study how efficiently and fairly the resource manager allocates resources. This can be much more readily done when all the jobs are the same, and hence exhibit equal performance characteristics when presented with the same allocation of CPU and memory resources.
As discussed above, the tester/analyst passes in values for the desired number of concurrent users, the total number of jobs to run in each stream, and the offset. Once the script begins to execute, the test proceeds in three phases:
- Step-up phase
- Steady-state phase
- Step-down phase
The figure below shows the expected pattern if you were to plot job duration data vs. elapsed test time.
Evaluating resource manager efficiency
To evaluate how efficiently the resource manager allocates and distributes key compute resources to jobs, we look at three metrics: throughput, job duration, and fairness.
Throughput
This metric is measured in jobs per hour, and provides an overall sense of how the resource manager impacts performance. To calculate this metric, we take the number of successfully completed jobs and divide it by the total duration of the test in hours.
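The throughput arithmetic is straightforward; as a worked example (the function name is my own, not part of SMB):

```python
def throughput_jobs_per_hour(completed_jobs: int, test_duration_seconds: float) -> float:
    """Throughput = successfully completed jobs / total test duration in hours."""
    return completed_jobs / (test_duration_seconds / 3600.0)

# e.g. 120 jobs completed over a 30-minute (1800-second) test is 240 jobs/hour
```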
Job duration (response time)
Resource allocation decisions made by the resource manager also directly affect how fast jobs run. This is particularly true when more than one job is running concurrently on the same pool of resources. Statistical analysis of job durations – the max, min, average, standard deviation, and 90th percentile – allows us to assess and compare resource manager efficiency.
Fairness
This is a measure of how consistently the resource manager allocates resources to different jobs. Since all the jobs are the same and all users have the same Quality of Service (QoS) constraints (in the current version of the benchmark), each job should be getting the same allocation of CPU and memory resources. Thus, the variance among job durations should ideally be very small. Conversely, if the resource manager allocates resources among jobs inconsistently, we will see a large variance in job durations. Fairness is calculated as the standard deviation of job durations divided by the average job duration.
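The duration statistics and the fairness ratio (a coefficient of variation, where lower means fairer) can be computed as follows. This is a minimal sketch: the choice of population standard deviation and the nearest-rank style 90th-percentile estimate are my assumptions, as SMB does not specify which conventions it uses.

```python
import statistics

def duration_stats(durations: list) -> dict:
    """Summary statistics over per-job durations (in seconds)."""
    s = sorted(durations)
    return {
        "min": s[0],
        "max": s[-1],
        "mean": statistics.mean(s),
        "stdev": statistics.pstdev(s),       # population std dev (an assumed convention)
        "p90": s[int(0.9 * (len(s) - 1))],   # simple nearest-rank 90th percentile
    }

def fairness(durations: list) -> float:
    """Fairness = std dev of job durations / mean job duration (lower is fairer)."""
    return statistics.pstdev(durations) / statistics.mean(durations)
```

For example, four identical durations give a fairness of 0.0 (perfectly consistent allocation), while widely spread durations push the ratio up, signaling inconsistent resource allocation among jobs.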
The Securities Technology Analysis Center (STAC), an independent audit organization with support from a recognized Mesos and open source community expert, recently reviewed SMB and benchmarked three resource managers – YARN, Mesos, and IBM Platform Conductor for Spark – running Spark workloads using SMB. The results below illustrate the types of insights this benchmark reveals.
The graph below shows an example SMB job duration plot taken by STAC. This is very similar to the theoretical plot shown above and shows step-up, steady state, and step-down portions of the test. The job durations are also tightly clustered, reflecting a very consistent resource allocation among jobs. These data were collected by STAC with IBM Platform Conductor for Spark v1.1.
The graph below shows another sampling of job durations. In this case, note that, during the steady-state portion of the test, the job duration data points are not nearly as tightly clustered – reflecting less fairness. These data were collected by STAC with Spark-Mesos in coarse integration mode. Download and read the full STAC report free of charge.
We hope that the SMB benchmark will serve to stimulate further research into aspects of resource manager performance. This benchmark is now open source, so if you would like to use it as part of your performance evaluation cycle, or contribute to it, feel free to download the scripts and source code and reach out to me and other members of the SMB project for any assistance. The community and users now have a tool to explore performance improvement and evaluate workloads that matter to users. This advancement in tooling will be a valuable addition to the community and can be extended and enhanced as Spark grows.