Parallel Datastore
System for Parallel Java
MSCS Capstone Project
by Omonbek Salaev
January 22, 2010
Overview
• Introduction
• Parallel Java Library
• DoubleMatrixFile
• Parallel Datastore System (PDS)
• Test Programs
• Results
• Conclusions
Introduction
• Parallel Virtual File System (PVFS) [1]
  - Files split across I/O nodes
  - Manager node coordinates partitioning
• Java Expandable Parallel File System
  - I/O library for high-performance Java computing
  - Files are declustered onto several NFS servers
Introduction
• Existing Parallel File Systems
  - Complex
  - Java programming often not supported
  - Difficult to learn / use
Parallel Java (PJ)
• API and middleware for parallel programming
• 100% Java
• Designed and implemented by Professor A. Kaminsky and Luke McOmber
• Supports parallel programming for:
  - Shared Memory Processor (SMP) computers
  - Cluster computers
  - Hybrid computers
DoubleMatrixFile
• Input/output context in PJ
• Object for reading/writing a matrix of doubles from/to a file
• R×C matrix of doubles
DoubleMatrixFile (class diagram)
  DoubleMatrixFile()
  DoubleMatrixFile(int R, int C, double[][] mx)
  setMatrix(int R, int C, double[][] mx)
  getColCount() : int
  getRowCount() : int
  getMatrix() : double[][]
  prepareToRead(InputStream is) : Reader
  prepareToWrite(OutputStream os) : Writer
  Inner classes: Reader, Writer
DoubleMatrixFile
import java.io.*;
import edu.rit.io.DoubleMatrixFile;
// Write a 20×10 matrix of doubles to a file
String fn = "test.dat";
int R = 20, C = 10;
double[][] mx = new double[R][C];
// compute matrix elements…
FileOutputStream fos = new FileOutputStream(fn);
DoubleMatrixFile dmf = new DoubleMatrixFile(R, C, mx);
DoubleMatrixFile.Writer writer = dmf.prepareToWrite(new BufferedOutputStream(fos));
writer.write();
writer.close();
Class DoubleMatrixFile
// Read the matrix back from the file
String fn = "test.dat";
FileInputStream fis = new FileInputStream(fn);
DoubleMatrixFile dmf = new DoubleMatrixFile();
DoubleMatrixFile.Reader reader = dmf.prepareToRead(new BufferedInputStream(fis));
reader.read();
reader.close();
int R = dmf.getRowCount();
int C = dmf.getColCount();
double[][] mx = dmf.getMatrix();
// use matrix elements…
DoubleMatrixFile file format
R  C
RL  CL  M  N
A[RL,CL]      A[RL,CL+1]      …  A[RL,CL+N-1]
A[RL+1,CL]    A[RL+1,CL+1]    …  A[RL+1,CL+N-1]
…
A[RL+M-1,CL]  A[RL+M-1,CL+1]  …  A[RL+M-1,CL+N-1]
RL  CL  M  N
…
• R, C: number of matrix rows/columns, R≥0, C≥0 (int, 4 bytes each)
• The header is followed by matrix segments (Matrix Segment #1, Matrix Segment #2, …)
• Each segment starts with:
  - RL: segment’s lower row index (int), RL≥0
  - CL: segment’s lower column index (int), CL≥0
  - M: number of rows in segment (int), M≥0
  - N: number of columns in segment (int), N≥0
• Matrix elements in segment (double), stored in row-major order
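The layout above can be sketched with plain java.io streams. This is only an illustration of the field order described on the slide, not the PJ implementation: the class name `MatrixSegmentLayoutSketch`, the method `encode`, and the use of `DataOutputStream` (big-endian ints and doubles) are assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration of the file layout; not part of the PJ library.
final class MatrixSegmentLayoutSketch {

    // Writes the header (R, C) followed by one segment that covers
    // rows rl..rl+m-1 and columns cl..cl+n-1 of mx, in row-major order.
    static byte[] encode(double[][] mx, int rl, int cl, int m, int n) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(mx.length);                          // R
            out.writeInt(mx.length == 0 ? 0 : mx[0].length);  // C
            out.writeInt(rl);                                 // RL
            out.writeInt(cl);                                 // CL
            out.writeInt(m);                                  // M
            out.writeInt(n);                                  // N
            for (int r = rl; r < rl + m; r++) {
                for (int c = cl; c < cl + n; c++) {
                    out.writeDouble(mx[r][c]);                // element A[r,c]
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        double[][] mx = new double[20][10];
        byte[] file = encode(mx, 2, 0, 4, 10);
        System.out.println(file.length + " bytes");
    }
}
```

Under these assumptions a file with one segment occupies 8 bytes of header, 16 bytes of segment header, and 8 bytes per element.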
Class Range
• Defined by:
  - L: lower bound
  - U: upper bound
  - S: stride
  - N: length
• A range of integers:
  {L, L+S, L+2*S, . . . , L+(N-1)*S}, where U = L+(N-1)*S
• Examples:
  - Range(2, 9, 3) = {2, 5, 8}
  - Range(1, 4) = {1, 2, 3, 4}
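A minimal sketch of how such a range expands into its elements. `RangeSketch` and `elements` are hypothetical names for this example and are not the PJ `Range` class; the sketch assumes that an upper bound which does not fall on the stride is simply not reached, as in the Range(2, 9, 3) example.

```java
import java.util.Arrays;

// Hypothetical stand-in used only to illustrate the definition of a range.
final class RangeSketch {

    // Expands lower bound lb, upper bound ub and stride s into
    // {lb, lb+s, lb+2s, ...}, with no element exceeding ub.
    static int[] elements(int lb, int ub, int stride) {
        if (stride < 1) {
            throw new IllegalArgumentException("stride must be >= 1");
        }
        int n = ub < lb ? 0 : (ub - lb) / stride + 1;  // length N
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = lb + i * stride;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(elements(2, 9, 3)));  // prints [2, 5, 8]
        System.out.println(Arrays.toString(elements(1, 4, 1)));  // prints [1, 2, 3, 4]
    }
}
```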
Class DoubleMatrixFile.Writer
• Object to write DoubleMatrixFile to an output stream
• Created by the prepareToWrite() method
• When created:
  - Number of rows and number of columns are written to output stream
Class DoubleMatrixFile.Writer
• Methods to write:
  - writeColSlice(Range slice)
  - writeRowSlice(Range slice)
  - writePatch(Range rslice, Range cslice)
  - write()
  * Range objects must have stride=1
• Each write operation writes a matrix segment to output stream
WriteRowSlice
…
Writer wr = dmf.prepareToWrite(ostream);
Range r = new Range(2,5);
wr.writeRowSlice(r);
wr.close();
…
Writes rows 2 through 5 of the 11×11 example matrix as one segment:
      0   1   2   3   4   5   6   7   8   9  10
 0   97  19  42  50  91  74  94  87  80  29   6
 1   95  42  93  81  62  72  12  11  93  32  77
 2   22  89  42  10  36  77  81  25  47  77  55
 3   91  74  97  53  49  59  98  32  85  18  84
 4   56  71  66  41  69  95   5   7  70  25  94
 5   29  62  68  50  19  95  92  73  19  71  81
 6   82  93  91  72  53  49  33  18  68  59  79
 7   19  18  66  53  84  54  67  22  53  27  77
 8   95  76   6  57  83  14  56  82  52   4   1
 9   46  18  66  30  93  77  69   8  33  92  49
10   63  96  62  37  91  34  43  18  97  50  85
WriteColSlice
…
Writer wr = dmf.prepareToWrite(ostream);
Range r = new Range(1,4);
wr.writeColSlice(r);
wr.close();
…
Writes columns 1 through 4 of the example matrix shown under WriteRowSlice as one segment.
WritePatch
…
Writer wr = dmf.prepareToWrite(ostream);
Range rr = new Range(2,5);
Range cr = new Range(1,4);
wr.writePatch(rr,cr);
wr.close();
…
Writes the patch formed by rows 2 through 5 and columns 1 through 4 of the example matrix shown under WriteRowSlice as one segment.
Write
…
Writer wr = dmf.prepareToWrite(ostream);
wr.write();
wr.close();
…
Writes the entire example matrix shown under WriteRowSlice as one segment.
Class DoubleMatrixFile.Reader
• Object to read DoubleMatrixFile from an input stream
• Created by the prepareToRead() method
• When created:
  - Number of rows and number of columns are read from input stream
  - Storage is allocated for underlying matrix’s row references
Class DoubleMatrixFile.Reader
• Methods to read:
  - read()
  - readColSlice(Range slice)
  - readRowSlice(Range slice)
  - readPatch(Range rslice, Range cslice)
  - readSegment()
  - readSegmentColSlice(Range slice)
  - readSegmentRowSlice(Range slice)
  - readSegmentPatch(Range rslice, Range cslice)
  * Range objects must have stride=1
ReadRowSlice
…
Reader rd = dmf.prepareToRead(istream);
Range r = new Range(2,5);
rd.readRowSlice(r);
rd.close();
…
File contents:
11 11
0 0 4 11
…
97 19 42 50 91 74 94 87 80 29 6
95 42 93 81 62 72 12 11 93 32 77
22 89 42 10 36 77 81 25 47 77 55
91 74 97 53 49 59 98 32 85 18 84
4 0 4 9
17 33 15 10 51 70 54 44 48
35 12 43 21 42 32 13 12 23
20 15 12 13 16 17 88 16 17
51 34 92 33 40 69 38 30 65
ReadSegmentRowSlice
…
Reader rd = dmf.prepareToRead(istream);
Range r = new Range(2,5);
rd.readSegmentRowSlice(r);
rd.close();
…
File contents: same as shown under ReadRowSlice.
ReadColSlice
…
Reader rd = dmf.prepareToRead(istream);
Range r = new Range(4,6);
rd.readColSlice(r);
rd.close();
…
File contents:
11 11
0 0 4 11
…
97 19 42 50 91 74 94 87 80 29 6
95 42 93 81 62 72 12 11 93 32 77
22 89 42 10 36 77 81 25 47 77 55
91 74 97 53 49 59 98 32 85 18 84
4 0 4 9
17 9 41 54 94 24 14 27 20
25 12 96 82 32 71 22 31 91
12 69 22 14 11 97 21 24 27
61 54 17 23 39 50 18 31 45
ReadSegmentColSlice
…
Reader rd = dmf.prepareToRead(istream);
Range r = new Range(4,6);
rd.readSegmentColSlice(r);
rd.close();
…
File contents: same as shown under ReadColSlice.
ReadPatch
…
Reader rd = dmf.prepareToRead(istream);
Range rr = new Range(2,5);
Range cr = new Range(4,6);
rd.readPatch(rr,cr);
rd.close();
…
File contents: same as shown under ReadColSlice.
ReadSegmentPatch
…
Reader rd = dmf.prepareToRead(istream);
Range rr = new Range(2,5);
Range cr = new Range(4,6);
rd.readSegmentPatch(rr,cr);
rd.close();
…
File contents: same as shown under ReadColSlice.
Read
…
Reader rd = dmf.prepareToRead(istream);
rd.read();
rd.close();
…
File contents: same as shown under ReadColSlice.
ReadSegment
…
Reader rd = dmf.prepareToRead(istream);
rd.readSegment();
rd.close();
…
File contents: same as shown under ReadColSlice.
Parallel Datastore System (PDS)
• Extension of I/O context of PJ library
• Set of classes to work with matrices and arrays
• Modeled after the original DoubleMatrixFile class
[Class diagram: abstract class MatrixFile with inner classes MatrixFileReader and MatrixFileWriter. Subclasses: DoubleMatrixFile, FloatMatrixFile, LongMatrixFile, IntMatrixFile, ShortMatrixFile, CharMatrixFile, ByteMatrixFile, BooleanMatrixFile, ObjectMatrixFile, each with its own Reader and Writer.]
Class MatrixFile
• Abstract class
• Includes methods applicable to all matrix files, regardless of element type
• Inner (abstract) classes:
  - MatrixFileReader
  - MatrixFileWriter
FloatMatrixFile file format
Class name  R  C
RL  CL  M  N  RS  CS
A[RL,CL]           A[RL,CL+CS]           …  A[RL,CL+(N-1)*CS]
A[RL+RS,CL]        A[RL+RS,CL+CS]        …  A[RL+RS,CL+(N-1)*CS]
…
A[RL+(M-1)*RS,CL]  A[RL+(M-1)*RS,CL+CS]  …  A[RL+(M-1)*RS,CL+(N-1)*CS]
RL  CL  M  N  RS  CS
…
• Class name: elements’ class name (String "float")
• R, C: number of matrix rows/columns, R≥0, C≥0 (int, 4 bytes each)
• The header is followed by matrix segments (Matrix Segment #1, Matrix Segment #2, …)
• Each segment starts with:
  - RL: segment’s lower row index (int), RL≥0
  - CL: segment’s lower column index (int), CL≥0
  - M: number of rows in segment (int), M≥0
  - N: number of columns in segment (int), N≥0
  - RS: segment’s row stride (int), RS > 0
  - CS: segment’s column stride (int), CS > 0
• Matrix elements in segment (float), stored in row-major order
Matrix Segment
• Range objects defining slices may have stride > 1
• Three views of the 11×11 example matrix shown under WriteRowSlice:
  - A slice defined by Range(2,8,3), i.e. indices {2, 5, 8}
  - A slice defined by Range(0,10,5), i.e. indices {0, 5, 10}
  - A patch defined by Range(2,8,3) together with Range(0,10,5)
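To make the strided segment concrete, here is a hedged sketch that writes one strided patch in a FloatMatrixFile-like layout and reads it back. Everything named here (`StridedSegmentSketch`, `write`, `read`) is hypothetical, and several details are assumptions rather than facts about PDS: the class name is written with `writeUTF`, the six segment integers are ordered RL, CL, M, N, RS, CS, and Range(2,8,3) is taken as the row range with Range(0,10,5) as the column range purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration of a strided segment; not the PDS implementation.
final class StridedSegmentSketch {

    // Writes class name, R, C, one segment header, and the m*n selected elements.
    static byte[] write(float[][] mx, int rl, int cl, int m, int n, int rs, int cs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF("float");                            // element class name
            out.writeInt(mx.length);                          // R
            out.writeInt(mx.length == 0 ? 0 : mx[0].length);  // C
            out.writeInt(rl);
            out.writeInt(cl);
            out.writeInt(m);
            out.writeInt(n);
            out.writeInt(rs);
            out.writeInt(cs);
            for (int i = 0; i < m; i++) {
                for (int j = 0; j < n; j++) {
                    out.writeFloat(mx[rl + i * rs][cl + j * cs]);  // row-major order
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the single segment back into an R x C matrix; other elements stay 0.
    static float[][] read(byte[] file) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            in.readUTF();                                     // skip class name
            int rows = in.readInt();
            int cols = in.readInt();
            int rl = in.readInt();
            int cl = in.readInt();
            int m = in.readInt();
            int n = in.readInt();
            int rs = in.readInt();
            int cs = in.readInt();
            float[][] mx = new float[rows][cols];
            for (int i = 0; i < m; i++) {
                for (int j = 0; j < n; j++) {
                    mx[rl + i * rs][cl + j * cs] = in.readFloat();
                }
            }
            return mx;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        float[][] mx = new float[11][11];
        for (int r = 0; r < 11; r++) {
            for (int c = 0; c < 11; c++) {
                mx[r][c] = r * 100 + c;
            }
        }
        // Rows {2, 5, 8} and columns {0, 5, 10}: 3 x 3 = 9 elements.
        float[][] back = read(write(mx, 2, 0, 3, 3, 3, 5));
        System.out.println(back[5][10]);
    }
}
```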
MatrixFile subclasses
Subclass              Element size (bytes)   Class name
IntMatrixFile         4                      "int"
LongMatrixFile        8                      "long"
FloatMatrixFile       4                      "float"
DoubleMatrixFile      8                      "double"
CharMatrixFile        2                      "char"
ShortMatrixFile       2                      "short"
ByteMatrixFile        1                      "byte"
BooleanMatrixFile     1                      "boolean"
ObjectMatrixFile<T>   variable               variable
Class ObjectMatrixFile<T>
• Generic class
• T: reference type
  - Needs to implement java.io.Serializable
• Elements may be of variable size
  - E.g.: ObjectMatrixFile<String>
Class ObjectMatrixFile<T>
• When written to output stream:
  - Actual class name of T is written
  - Elements are written using method writeObject() of class ObjectOutputStream
  - Size of each element (in bytes) is written before writing the element
Class ObjectMatrixFile<T>
• When read from input stream:
  - Elements’ actual class name is read
  - Elements are read using method readObject() of class ObjectInputStream
  - Size of each element (in bytes) is read before reading the element
    - Useful when skipping elements
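The size prefix is what makes skipping cheap: a reader can jump over an element without deserializing it. The sketch below illustrates the idea with standard java.io classes only. `SizePrefixedObjectsSketch`, `writeAll`, and `readAt` are hypothetical names, and serializing each element with its own `ObjectOutputStream` is an assumption made to keep the example short; PDS may organize the stream differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical illustration of size-prefixed serialized elements.
final class SizePrefixedObjectsSketch {

    // Writes each element as: size in bytes (int), then the serialized bytes.
    static byte[] writeAll(Serializable[] elements) {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(file)) {
            for (Serializable element : elements) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
                    oos.writeObject(element);
                }
                byte[] payload = buf.toByteArray();
                out.writeInt(payload.length);
                out.write(payload);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file.toByteArray();
    }

    // Returns the element at the given index, skipping earlier ones by size.
    static Object readAt(byte[] data, int index) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < index; i++) {
                int size = in.readInt();
                if (in.skipBytes(size) != size) {             // skipped, never deserialized
                    throw new EOFException("truncated element " + i);
                }
            }
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(payload))) {
                return ois.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = writeAll(new String[] {"a", "bb", "ccc"});
        System.out.println(readAt(data, 2));  // prints ccc
    }
}
```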
Class MatrixFileManager
• Processes different types of MatrixFiles
• Automatically recognizes matrix type
• Can be used interactively or programmatically
MatrixFileManager Operations
• Join
  - Join several MatrixFiles into one
• Split
  - Split MatrixFile into several pieces
    - By rows, columns, or patches
• Dump
  - Dump MatrixFile contents in text format
[Class diagram: abstract class ArrayFile with inner classes ArrayFileReader and ArrayFileWriter. Subclasses: DoubleArrayFile, FloatArrayFile, LongArrayFile, IntArrayFile, ShortArrayFile, CharArrayFile, ByteArrayFile, BooleanArrayFile, ObjectArrayFile, each with its own Reader and Writer.]
Class ArrayFile
• Abstract class
• Includes methods applicable to all array files, regardless of element type
• Inner (abstract) classes:
  - ArrayFileReader
  - ArrayFileWriter
FloatArrayFile file format
Class name  N
L  M  S
A[L]  A[L+S]  …  A[L+S*(M-1)]
L  M  S
…
• Class name: elements’ class name (String "float")
• N: number of array elements, N≥0 (int, 4 bytes)
• The header is followed by array segments (Array Segment #1, Array Segment #2, Array Segment #3, …)
• Each segment starts with:
  - L: segment’s lower element index (int), L≥0
  - M: number of elements in segment (int), M≥0
  - S: segment’s element stride (int), S > 0
• Array elements in segment (float)
Array Segment
• Range objects defining segment may have stride > 1
• Examples on an array with indices 0 through 17:
  - Range(3,13): elements 3, 4, 5, …, 13
  - Range(3,12,3): elements 3, 6, 9, 12
ArrayFile subclasses
Subclass             Element size (bytes)   Class name
IntArrayFile         4                      "int"
LongArrayFile        8                      "long"
FloatArrayFile       4                      "float"
DoubleArrayFile      8                      "double"
CharArrayFile        2                      "char"
ShortArrayFile       2                      "short"
ByteArrayFile        1                      "byte"
BooleanArrayFile     1                      "boolean"
ObjectArrayFile<T>   variable               variable
Class ObjectArrayFile<T>
• Generic class
• T: reference type
  - Needs to implement java.io.Serializable
• Elements may be of variable size
  - E.g.: ObjectArrayFile<String>
Class ObjectArrayFile<T>
• When written to output stream:
  - Actual class name of T is written
  - Elements are written using method writeObject() of class ObjectOutputStream
  - Size of each element (in bytes) is written before writing the element
Class ObjectArrayFile<T>
• When read from input stream:
  - Elements’ actual class name is read
  - Elements are read using method readObject() of class ObjectInputStream
  - Size of each element (in bytes) is read before reading the element
    - Useful when skipping elements
Class ArrayFileManager
• Processes different types of ArrayFiles
• Automatically recognizes array type
• Can be used interactively or programmatically
ArrayFileManager Operations
• Join
  - Join several ArrayFiles into one
• Split
  - Split ArrayFile into several pieces
• Dump
  - Dump ArrayFile contents in text format
Test Programs
• Matrix Multiplication
• Floyd’s Algorithm
• K-Means Clustering
Matrix Multiplication
A [m×n] × B [n×p] = C [m×p]
$$C_{i,j} = \sum_{k=1}^{n} A_{i,k} B_{k,j}, \quad \text{where } 1 \le i \le m \text{ and } 1 \le j \le p$$
Matrix Multiplication
// c = a * b, where a is m×n, b is n×p, and c is m×p
for (int i = 0; i < m; i++) {
    for (int j = 0; j < p; j++) {
        c[i][j] = 0;
        for (int k = 0; k < n; k++) {
            c[i][j] += a[i][k] * b[k][j];
        }
    }
}
• Complexity: O(m·n·p), i.e. O(n^3) for square n×n matrices
Matrix Multiplication
• Easily parallelizable
• No sequential dependencies:
  - To calculate C[i,j], we only need:
    - i-th row of A
    - j-th column of B
Matrix Multiplication
• Each parallel process:
  - Multiplies a row slice of A by a column slice of B to produce a patch of C
• E.g.: 1 parallel process:
  [Figure: A × B = C]
Matrix Multiplication
• E.g.: 2 parallel processes:
  [Figure: slices of A, B, and C handled by Process 0 and Process 1]
Matrix Multiplication
• E.g.: 4 parallel processes:
  [Figure: slices of A, B, and C handled by Process 0, Process 1, Process 2, and Process 3]
Matrix Multiplication
Accept fileA, fileB, fileC as input parameters
Each parallel process i:
    Based on # of processes and process rank, determine which row slice of A and which column slice of B to process
    Read correct row slice from fileA (readRowSlice())
    Read correct column slice from fileB (readColSlice())
    Multiply row slice by column slice
    Write produced patch to fileC.i (writePatch())
    Exit
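The "multiply row slice by column slice" step can be sketched as follows. This is an in-memory illustration only: `PatchMultiplySketch` and `multiplyPatch` are hypothetical names, the bounds are inclusive by assumption, and the file I/O performed by readRowSlice(), readColSlice(), and writePatch() is left out.

```java
import java.util.Arrays;

// Hypothetical illustration of computing one patch of C = A * B.
final class PatchMultiplySketch {

    // Computes C[rLo..rHi][cLo..cHi] using only rows rLo..rHi of a
    // and columns cLo..cHi of b (bounds inclusive).
    static double[][] multiplyPatch(double[][] a, double[][] b,
                                    int rLo, int rHi, int cLo, int cHi) {
        int n = b.length;                                     // inner dimension
        double[][] patch = new double[rHi - rLo + 1][cHi - cLo + 1];
        for (int i = rLo; i <= rHi; i++) {
            for (int j = cLo; j <= cHi; j++) {
                double sum = 0.0;
                for (int k = 0; k < n; k++) {
                    sum += a[i][k] * b[k][j];
                }
                patch[i - rLo][j - cLo] = sum;
            }
        }
        return patch;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        // The patch made of row 1 and columns 0..1 of C.
        System.out.println(Arrays.deepToString(multiplyPatch(a, b, 1, 1, 0, 1)));
    }
}
```

Because each patch depends only on its own row slice of A and column slice of B, the processes need no communication while multiplying.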
Floyd’s Algorithm
• All-pairs shortest distance
  - Given a graph G, calculate the shortest distance from every vertex of G to every other vertex
[Figure: example graph with vertices V1 through V5]
Distance Matrix: D[N×N]
      V1   V2   V3   V4   V5
V1     0    3    ∞    6    ∞
V2     3    0    ∞    ∞    6
V3     ∞    ∞    0    ∞    8
V4     6    ∞    ∞    0    2
V5     ∞    6    8    2    0
Floyd’s Algorithm
for i=1 to N do:
    for r=1 to N do:
        for c=1 to N do:
            D[r,c] = min{D[r,c], D[r,i] + D[i,c]}
• Complexity: O(N^3)
• Parallel program:
  - Distribute the rows of D evenly among the available parallel processes
• Every process needs to access the i-th row
Floyd’s Algorithm
for i=1 to N do:
    broadcast contents of i-th row
    for r=1 to N do:
        for c=1 to N do:
            D[r,c] = min{D[r,c], D[r,i] + D[i,c]}
• The process that "has" the i-th row for the current value of i broadcasts it to the other processes
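For reference, the sequential triple loop can be run on the five-vertex distance matrix from the earlier slide. This is a minimal sketch with hypothetical names (`FloydSketch`, `example`, `floyd`); it uses 0-based indices and `Double.POSITIVE_INFINITY` for ∞, and it omits the broadcast step of the parallel version.

```java
import java.util.Arrays;

// Hypothetical sequential illustration of Floyd's algorithm.
final class FloydSketch {

    static final double INF = Double.POSITIVE_INFINITY;

    // Distance matrix of the five-vertex example graph (V1..V5 as indices 0..4).
    static double[][] example() {
        return new double[][] {
            {0,   3,   INF, 6,   INF},
            {3,   0,   INF, INF, 6},
            {INF, INF, 0,   INF, 8},
            {6,   INF, INF, 0,   2},
            {INF, 6,   8,   2,   0},
        };
    }

    // Replaces d in place with the all-pairs shortest distances.
    static void floyd(double[][] d) {
        int n = d.length;
        for (int i = 0; i < n; i++) {
            for (int r = 0; r < n; r++) {
                for (int c = 0; c < n; c++) {
                    d[r][c] = Math.min(d[r][c], d[r][i] + d[i][c]);
                }
            }
        }
    }

    public static void main(String[] args) {
        double[][] d = example();
        floyd(d);
        for (double[] row : d) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

For example, the shortest distance from V1 to V5 becomes 8 (via V4), replacing the initial ∞.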
Floyd’s Algorithm
Accept fileIN, fileOUT as input parameters
Each parallel process i:
    Based on # of processes and process rank, determine which row slice of D to process
    Read correct row slice from fileIN (readRowSlice())
    Run Floyd’s algorithm
    Write produced row slice to fileOUT.i (writeRowSlice())
    Exit
Source code for this program adapted from [4]
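The step "based on # of processes and process rank, determine which row slice to process" can be sketched as an even block partition. `RowPartitionSketch` and `rowSlice` are hypothetical names, and giving the leftover rows to the lowest ranks is an assumption for this example; the program adapted from [4] may divide the rows differently.

```java
import java.util.Arrays;

// Hypothetical illustration of splitting n rows among `size` processes.
final class RowPartitionSketch {

    // Returns {lowerRow, upperRow} (inclusive) for the process with the given rank.
    // The first (n % size) ranks receive one extra row.
    static int[] rowSlice(int n, int size, int rank) {
        int base = n / size;
        int rem = n % size;
        int lb = rank * base + Math.min(rank, rem);
        int len = base + (rank < rem ? 1 : 0);
        return new int[] {lb, lb + len - 1};
    }

    public static void main(String[] args) {
        // 4000 rows over 10 processes, as in the Floyd test dataset.
        for (int rank = 0; rank < 10; rank++) {
            System.out.println(rank + ": " + Arrays.toString(rowSlice(4000, 10, rank)));
        }
    }
}
```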
K-Means Clustering
• Partition N observations into K clusters
[Figure: example clustering with N=100, K=3. Images taken from [6]]
K-Means Clustering
• Inputs:
  - N: number of data points with D coordinates each
  - K: number of clusters
• Outputs:
  - Coordinates of K cluster centers
Data point matrix P[N×D]:
      C1    C2    C3    …    CD
P1    10    4.3   6.9   …    5.8
P2    3.9   0.8   17    …    62
P3    11    4.6   40    …    1.8
…     …     …     …     …    …
PN    7.9   68    8.2   …    0.4
K-Means Clustering
• Algorithm:
  1. Initialize the K cluster centers to be the first K data points
  2. Initially, each data point is not assigned to any cluster
  3. Repeat:
  4.     Assign each data point to the cluster whose center is closest, using the Euclidean distance
  5.     Recalculate the center point of each cluster
  6. Until none of the data points changed to a different cluster
  7. Output the coordinates of each cluster center
• Euclidean distance:
$$d = \left( \sum_{i=1}^{D} (a_i - b_i)^2 \right)^{1/2}$$
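Step 4 of the algorithm, assigning a point to its closest center by Euclidean distance, might look like the sketch below. `NearestCenterSketch`, `distance`, and `nearest` are hypothetical names; breaking ties in favor of the lowest cluster index is an assumption, since the slide does not say how ties are handled.

```java
// Hypothetical illustration of the assignment step in K-means.
final class NearestCenterSketch {

    // Euclidean distance between two points with the same number of coordinates.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Index of the center closest to the point (lowest index wins ties).
    static int nearest(double[] point, double[][] centers) {
        int best = 0;
        double bestDist = distance(point, centers[0]);
        for (int k = 1; k < centers.length; k++) {
            double d = distance(point, centers[k]);
            if (d < bestDist) {
                best = k;
                bestDist = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centers = {{0, 0}, {10, 10}};
        System.out.println(distance(new double[] {0, 0}, new double[] {3, 4}));  // prints 5.0
        System.out.println(nearest(new double[] {9, 9}, centers));               // prints 1
    }
}
```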
K-Means Clustering
• Cluster Parallel Algorithm
  - Distribute the N data points evenly among parallel processes
  - Process row slices of matrix P
K-Means Clustering
1. Accept fileIN, K as input parameters
2. Each parallel process i:
3.     Based on # of processes and process rank, determine which row slice of P to process
4.     Read correct row slice from fileIN
5.     Assign initial cluster centers to first K data points
6.     Assign each data point in slice to closest cluster
7.     Perform All-Reduce using logical 'OR'
8.     If no data points changed clusters, go to 12
9.     Perform All-Reduce using addition to propagate cluster center data to all processes
10.    Recalculate cluster centers
11.    Go to 6
12.    Print out cluster centers
13.    Exit
K-Means Clustering
• Inputs:
  - DoubleMatrixFile holding the data point matrix P[N×D]
    - N rows, D columns
  - 2 segments:
    1. First K rows
    2. Rest of the rows
K-Means Clustering
1. Accept fileIN, K as input parameters
2. Each parallel process i:
3.     Based on # of processes and process rank, determine which row slice of P to process
4.     Read K rows from the 1st segment in fileIN (readSegmentRowSlice())
5.     Read correct row slice from 2nd segment in fileIN (readSegmentRowSlice())
6.     Assign initial cluster centers to first K data points
7.     Assign each data point in slice to closest cluster
8.     Perform All-Reduce using logical 'OR'
9.     If no data points changed clusters, go to 13
10.    Perform All-Reduce using addition to propagate cluster center data to all processes
11.    Recalculate cluster centers
12.    Go to 7
13.    Print out cluster centers
14.    Exit
Test Runs
• 3 variations:
  1. Sequential program not using PDS
  2. Cluster parallel program not using PDS
  3. Cluster parallel program using PDS
Test Runs
Variation 3:
IntMatrixFile imf = new IntMatrixFile();
MatrixFileReader reader = imf.prepareToRead(
    new BufferedInputStream(new FileInputStream("matrix.dat")));
int r = imf.getRowCount();  // get # of rows
int c = imf.getColCount();  // get # of cols
reader.read();
int[][] A = imf.getMatrix();
reader.close();
// use matrix A
Variations 1 & 2:
DataInputStream dis = new DataInputStream(
    new BufferedInputStream(new FileInputStream("matrix.dat")));
dis.readUTF();              // read & skip the element class name
int r = dis.readInt();      // read # of rows
int c = dis.readInt();      // read # of cols
// skip 6 integers that provide info about the segment:
dis.skipBytes(6 * 4);
int[][] A = new int[r][c];
for (int i = 0; i < r; i++) {
    for (int j = 0; j < c; j++) {
        A[i][j] = dis.readInt();
    }
}
dis.close();
// use matrix A
Test Runs
• Hybrid SMP parallel computer [?]:
  - Frontend computer: tardis.cs.rit.edu
    - UltraSPARC-IIe CPU, 650 MHz clock, 512 MB main memory
  - 10 backend computers: dr00 through dr09
    - Each with two AMD Opteron 2218 dual-core CPUs (four processors), 2.6 GHz clock, 8 GB main memory
  - 1-Gbps switched Ethernet backend interconnection network
  - Aggregate 104 GHz clock, 80 GB main memory
Test Runs
• Run each test program at least 3 times with each configuration
  - Sequential
  - K = 1, 2, 4, 8, 10 parallel processes
• Take the smallest running time for each configuration
• Calculate and compare:
  - Running times
  - Efficiency
  - EDSF
  - Speedup
Results: Matrix Multiplication
• Input Dataset #1: [1000x800] * [800x900] = [1000x900]
• Problem Size: N = 0.7G
• K: number of parallel processes
• T: minimal running time from 3 runs (in ms)
Variation    K     T (ms)   Spdup   Effic   EDSF
Sequential   seq   8719
Non-PDS      1     8610     1.013   1.013
             2     4255     2.049   1.025   -0.012
             4     2602     3.351   0.838   0.07
             8     1468     5.939   0.742   0.052
             10    1305     6.681   0.668   0.057
PDS          seq   8719
             1     8396     1.038   1.038
             2     4266     2.044   1.022   0.016
             4     2586     3.372   0.843   0.077
             8     1505     5.793   0.724   0.062
             10    1199     7.272   0.727   0.048
Results: Matrix Multiplication
• Input Dataset #2: [4000x4000] * [4000x4000] = [4000x4000]
• Problem Size: N = 64G
• K: number of parallel processes
• T: minimal running time from 3 runs (in ms)
Variation    K     T (ms)    Spdup    Effic   EDSF
Sequential   seq   8719
Non-PDS      1     8610      1.013    1.013
             2     4255      2.049    1.025   -0.012
             4     2602      3.351    0.838   0.07
             8     1468      5.939    0.742   0.052
             10    1305      6.681    0.668   0.057
PDS          seq   1106687
             1     949546    1.165    1.165
             2     489582    2.26     1.13    0.031
             4     243092    4.553    1.138   0.008
             8     127135    8.705    1.088   0.01
             10    99322     11.142   1.114   0.005
Results: K-Means Clustering
• Input Dataset #1: D=2, N=180000, K=100
• K: number of parallel processes
• T: minimal running time from 3 runs (in ms)
Variation    K     T (ms)   Spdup    Effic   EDSF
Sequential   seq   119364
Non-PDS      1     122578   0.974    0.974
             2     71402    1.672    0.836   0.165
             4     50256    2.375    0.594   0.213
             8     23374    5.107    0.638   0.075
             10    11128    10.726   1.073   -0.01
PDS          seq   119364
             1     121650   0.981    0.981
             2     70768    1.687    0.843   0.163
             4     49815    2.396    0.599   0.213
             8     22727    5.252    0.657   0.071
             10    11603    10.287   1.029   -0.005
Results: K-Means Clustering
• Input Dataset #3: D=6, N=600000, K=55
• K: number of parallel processes
• T: minimal running time from 3 runs (in ms)
Variation    K     T (ms)   Spdup   Effic   EDSF
Sequential   seq   319327
Non-PDS      1     311595   1.025   1.025
             2     185976   1.717   0.859   0.194
             4     107607   2.968   0.742   0.127
             8     53247    5.997   0.75    0.052
             10    43419    7.355   0.735   0.044
PDS          seq   319327
             1     311950   1.024   1.024
             2     185151   1.725   0.862   0.187
             4     108308   2.948   0.737   0.13
             8     53433    5.976   0.747   0.053
             10    44218    7.222   0.722   0.046
Results: Floyd’s Algorithm
• Input Dataset: Graph with 4000 vertices
• Problem Size: N = 64G
• K: number of parallel processes
• T: minimal running time from 3 runs (in ms)
Variation    K     T (ms)   Spdup    Effic   EDSF
Sequential   seq   614362
Non-PDS      1     587473   1.046    1.046
             2     295097   2.082    1.041   0.005
             4     149873   4.099    1.025   0.007
             8     75500    8.137    1.017   0.004
             10    63645    9.653    0.965   0.009
PDS          seq   614362
             1     564362   1.089    1.089
             2     276787   2.22     1.11    -0.019
             4     143133   4.292    1.073   0.005
             8     72371    8.489    1.061   0.004
             10    61410    10.004   1       0.01
Conclusions
• New PDS developed
  - Extension of PJ library
  - Set of classes for working with matrices and arrays of different element types
    - Full Javadocs and developer’s guide
  - Object-oriented design
  - Concise, elegant
Conclusions
• Manager utilities developed
• 3 test programs developed
• Ready to be used in writing parallel programs
Future Work
• Integrate with PJ job scheduler
  - Automatically partition and distribute the input files
• Compatibility with existing parallel file systems
References
[1] Avery Ching, Alok Choudhary, Wei-keng Liao, Robert Ross, William Gropp. Noncontiguous I/O through PVFS. Proceedings of the 2002 IEEE International Conference on Cluster Computing, September 2002.
[2] Message Passing Interface Forum. MPI-2: Extensions to the Message-Passing Interface. July 18, 1997. http://www.mpi-forum.org/docs/mpi-20-html/mpi2-report.html
[3] J. Perez, L. Sanchez, F. Garcia, A. Calderon, J. Carretero. High performance Java input/output for heterogeneous distributed computing. Proceedings of the 10th IEEE Symposium on Computers and Communications, 2005.
[4] Alan Kaminsky. Building Parallel Programs: SMPs, Clusters, and Java. Cengage Course Technology, 2010.
[5] Alan Kaminsky. Parallel Java Library. http://www.cs.rit.edu/~ark/pj.shtml
[6] Alan Kaminsky. 4005-736-70 Parallel Computing II, Programming Project 1. http://www.cs.rit.edu/~ark/736/project01/project01.shtml
Questions?
Appendix
• Speedup
  - K: number of processes
  - T_seq: running time of sequential program
  - T_K: running time of parallel program on K processes
$$S_K = \frac{T_{seq}}{T_K}$$
Appendix
• Efficiency
  - K: number of processes
$$\mathit{Eff}_K = \frac{S_K}{K}$$
Appendix
• EDSF: Experimentally Determined Sequential Fraction
  - K: number of processes
  - T_seq: running time of sequential program
  - T_K: running time of parallel program on K processes
$$F = \frac{K \times T_K - T_{seq}}{K \times T_{seq} - T_{seq}}$$
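The three appendix formulas translate directly into code. The sketch below is only a convenience for recomputing table entries: `ParallelMetricsSketch` and its method names are hypothetical, and the formulas are coded exactly as written in the appendix, with T_seq as the baseline.

```java
// Hypothetical helper that evaluates the appendix formulas.
final class ParallelMetricsSketch {

    // Speedup: S_K = T_seq / T_K
    static double speedup(double tSeq, double tK) {
        return tSeq / tK;
    }

    // Efficiency: Eff_K = S_K / K
    static double efficiency(double tSeq, double tK, int k) {
        return speedup(tSeq, tK) / k;
    }

    // EDSF: F = (K * T_K - T_seq) / (K * T_seq - T_seq); undefined for K = 1.
    static double edsf(double tSeq, double tK, int k) {
        return (k * tK - tSeq) / (k * tSeq - tSeq);
    }

    public static void main(String[] args) {
        // Matrix multiplication, dataset #1, Non-PDS, K = 2 (times in ms).
        double tSeq = 8719;
        double t2 = 4255;
        System.out.printf("speedup=%.3f efficiency=%.3f%n",
                speedup(tSeq, t2), efficiency(tSeq, t2, 2));
    }
}
```

The denominator of the EDSF formula is zero when K = 1, which matches the empty EDSF cells in the K = 1 rows of the result tables.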