■ A Pareto-based scheduler for exploring cost-performance trade-offs for MapReduce workloads

A Pareto-based scheduler for exploring cost-performance trade-offs for MapReduce workloads, Nikos Zacheilas and Vana Kalogeraki, EURASIP Journal on Embedded Systems, Springer, July 3, 2017.
Abstract.

In recent years, we have observed an increased demand for processing large amounts of data. The MapReduce programming model has been adopted by major computing companies and integrated into novel cyber-physical systems (CPS) in order to perform large-scale data processing. However, efficiently scheduling MapReduce workloads in cluster environments, like Amazon’s EC2, can be challenging due to the observed trade-off between the need for performance and the corresponding monetary cost. The problem is exacerbated by the fact that cloud providers tend to charge users based on their I/O operations, dramatically increasing the spending budget. In this paper, we describe our approach for scheduling MapReduce workloads in cluster environments, taking into consideration the performance/budget trade-off. Our approach makes the following contributions: (i) we propose a novel Pareto-based scheduler for identifying near-optimal resource allocations for user workloads with respect to performance and monetary cost, and (ii) we develop an automatic configuration of basic task parameters that allows us to further minimize the user’s spending budget and the jobs’ execution times. Our detailed experimental evaluation using both real and synthetic datasets illustrates that our approach improves the performance of the workloads by as much as 50% compared to its competitors.

 
BibTeX Entry.
@article{zacheilas2017pareto,
  title={A Pareto-based scheduler for exploring cost-performance trade-offs for MapReduce workloads},
  author={Zacheilas, Nikos and Kalogeraki, Vana},
  journal={EURASIP Journal on Embedded Systems},
  volume={2017},
  number={1},
  pages={29},
  year={2017},
  publisher={Springer}
}