A MapReduce Skeleton for Skandium

Ioannis Assiouras

Master of Science

School of Informatics

University of Edinburgh

2011

Abstract

MapReduce is a popular programming model currently used for application development on large-scale clusters. MapReduce realizes the concept of parallel programming skeletons: the model describes the overall structure of a computation, the programmer plugs in the low-level, problem-specific code that turns the generic description of the problem into the final program, and the runtime system completely hides the task management and synchronization issues that make parallel programming complex and unreliable. The increasing scale of multi-core platforms further stresses the need for structured parallel programming models like MapReduce in the development of shared-memory applications. In this project, we implemented the MapReduce model for Skandium, a Java-based algorithmic skeleton library that targets multi-core architectures. After the skeleton was implemented, we tested and tuned its performance for a selection of typical MapReduce applications. Our objectives were twofold: to provide an abstract and easy-to-use programming model for MapReduce, and to identify the main factors that affect the skeleton's performance both on shared-memory architectures and when it is implemented on top of the Java platform.
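The division of labour described above (a fixed overall structure, with problem-specific code plugged in by the programmer) can be illustrated with a minimal sketch in plain Java. It uses only the standard library and is not the Skandium API; the names `MapReduceSketch`, `mapReduce` and `wordCount` are hypothetical and chosen only for this illustration. The sketch also assumes that the reduce function is associative, since a parallel stream may combine partial results in any grouping.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative only: hypothetical names, not part of Skandium.
final class MapReduceSketch {

    /**
     * The generic part: fixes the structure map -> group by key -> reduce.
     * The mapper and reducer are the problem-specific code supplied by the user.
     * The reducer is assumed to be associative.
     */
    static <I, K extends Comparable<? super K>, V> Map<K, V> mapReduce(
            List<I> chunks,
            Function<I, List<Map.Entry<K, V>>> mapper,
            BinaryOperator<V> reducer) {
        return chunks.parallelStream()                        // scheduling is left to the runtime
                .flatMap(chunk -> mapper.apply(chunk).stream())
                .collect(Collectors.toMap(
                        e -> e.getKey(),
                        e -> e.getValue(),
                        reducer,                              // merges values that share a key
                        TreeMap::new));                       // sorted keys give a stable print order
    }

    /** Word count expressed by plugging a mapper and a reducer into the generic structure. */
    static Map<String, Integer> wordCount(List<String> lines) {
        Function<String, List<Map.Entry<String, Integer>>> mapper = line -> {
            List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(new AbstractMap.SimpleEntry<>(word, 1)); // emit (word, 1)
                }
            }
            return pairs;
        };
        return mapReduce(lines, mapper, Integer::sum);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the quick brown fox", "the lazy dog and the fox");
        System.out.println(wordCount(lines));
    }
}
```

In this sketch the caller never creates threads or locks: only the two functions passed to `mapReduce` change from one application to the next, which is the property the skeleton approach aims for.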

Acknowledgements

First, I would like to thank my academic supervisor, Murray Cole, for his invaluable help and guidance throughout the project. I would also like to thank my family for their constant support.

Declaration

I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified.

(Ioannis Assiouras)

Table of Contents

1 Introduction
    1.1 Project Objectives
    1.2 Thesis Outline

2 Related Background
    2.1 Algorithmic Skeletons
    2.2 The MapReduce Skeleton
    2.3 The Phoenix Implementation
    2.4 The Skandium Library
        2.4.1 The programming model
        2.4.2 The runtime system
        2.4.3 The Skandium Map Skeleton
    2.5 Typical MapReduce Applications
        2.5.1 Word Count
        2.5.2 Inverted Index
        2.5.3 Matrix Multiplication
        2.5.4 Histogram
        2.5.5 KMeans
    2.6 Garbage Collection principles

3 MapReduce for Skandium: The programming model
    3.1 The MapReduce integration into Skandium
    3.2 The Programming Interface
        3.2.1 The Splitter
        3.2.2 The Mapper
        3.2.3 The Reducer
        3.2.4 The Merge muscle
        3.2.5 Creating the Skeleton's Instance

4 MapReduce for Skandium: Implementation details
    4.1 The Skeleton's Instantiation
    4.2 Implementation of the generic muscles
        4.2.1 An initial Approach
        4.2.2 Parallelizing the Store muscle
        4.2.3 Using a caching technique to resolve collisions
        4.2.4 Implementing Phoenix's storing/partitioning scheme

5 Performance Evaluation
    5.1 Experimental Method
        5.1.1 Shared Memory Systems
        5.1.2 Applications
    5.2 Evaluation of the four implementation schemes
    5.3 Evaluation of the Phoenix Scheme
    5.4 Comparison to manual Java threading
        5.4.1 Implementation using manual threading
        5.4.2 Discussion
    5.5 Evaluating the Garbage Collector's impact

6 Optimization of the MapReduce Skeleton
    6.1 Hash Table Improvements
        6.1.1 Choosing an optimal Hash Table Size
        6.1.2 Using a binary search tree for O(log n) store
        6.1.3 Discussion
    6.2 Minimizing the Garbage Collector's impact
        6.2.1 An object reuse technique
        6.2.2 Tuning the Garbage Collector
    6.3 Auto Tuning
        6.3.1 Auto-tuning for the MapReduce skeleton
        6.3.2 A simple Auto-tuning mechanism

7 Conclusions and Future Work
    7.1 Future Work

Bibliography
