
Parallel Datastore System for Parallel Java

MSCS Capstone Project

by Omonbek Salaev

January 22, 2010

Overview

- Introduction
- Parallel Java Library
- DoubleMatrixFile
- Parallel Datastore System (PDS)
- Test Programs
- Results
- Conclusions

Introduction

- Parallel Virtual File System (PVFS) [1]
  - Files split across I/O nodes
  - Manager node coordinates partitioning
- Java Expandable Parallel File System
  - I/O library for high-performance Java computing
  - Files are declustered onto several NFS servers

Introduction

- Existing parallel file systems are:
  - Complex
  - Difficult to learn and use
  - Often without support for Java programming

Parallel Java (PJ)

- API and middleware for parallel programming
- 100% Java
- Designed and implemented by Professor Alan Kaminsky and Luke McOmber
- Supports parallel programming for:
  - Shared memory multiprocessor (SMP) computers
  - Cluster computers
  - Hybrid computers

DoubleMatrixFile

- Part of the input/output context in PJ
- An object for reading/writing an RxC matrix of doubles from/to a file

DoubleMatrixFile

  DoubleMatrixFile()
  DoubleMatrixFile(int R, int C, double[][] mx)
  setMatrix(int R, int C, double[][] mx)
  getRowCount() : int
  getColCount() : int
  getMatrix() : double[][]
  prepareToRead(InputStream is) : Reader
  prepareToWrite(OutputStream os) : Writer

  Inner classes: Reader, Writer

Class DoubleMatrixFile (writing example)

  String fn = "test.dat";
  int R = 20, C = 10;
  double[][] mx = new double[R][C];
  // compute matrix elements...
  FileOutputStream fos = new FileOutputStream(fn);
  DoubleMatrixFile dmf = new DoubleMatrixFile(R, C, mx);
  DoubleMatrixFile.Writer writer =
      dmf.prepareToWrite(new BufferedOutputStream(fos));
  writer.write();
  writer.close();

Class DoubleMatrixFile (reading example)

  String fn = "test.dat";
  FileInputStream fis = new FileInputStream(fn);
  DoubleMatrixFile dmf = new DoubleMatrixFile();
  DoubleMatrixFile.Reader reader =
      dmf.prepareToRead(new BufferedInputStream(fis));
  reader.read();
  reader.close();
  int R = dmf.getRowCount();
  int C = dmf.getColCount();
  double[][] mx = dmf.getMatrix();
  // use matrix elements...

DoubleMatrixFile file format

  R C
  RL CL M N
  A[RL,CL]      A[RL,CL+1]      ...  A[RL,CL+N-1]
  A[RL+1,CL]    A[RL+1,CL+1]    ...  A[RL+1,CL+N-1]
  ...
  A[RL+M-1,CL]  A[RL+M-1,CL+1]  ...  A[RL+M-1,CL+N-1]
  (further matrix segments follow, each with its own RL CL M N header)

- R, C: number of matrix rows/columns; R ≥ 0, C ≥ 0 (int, 4 bytes each)
- RL: segment's lower row index (int), RL ≥ 0
- CL: segment's lower column index (int), CL ≥ 0
- M: number of rows in segment (int), M ≥ 0
- N: number of columns in segment (int), N ≥ 0
- Matrix elements in each segment (double), stored in row-major order
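To make the layout concrete, here is a small sketch that writes and re-reads a single-segment matrix with plain java.io streams. It is only an illustration of the layout described above, not the actual PDS/PJ implementation; the real class should be consulted for the authoritative byte format.

```java
import java.io.*;

// Sketch: write and re-read a single-segment matrix file following the
// layout above (R C header, then RL CL M N and row-major doubles).
public class MatrixFormatSketch {
    public static byte[] write(double[][] m) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        int R = m.length, C = m[0].length;
        out.writeInt(R); out.writeInt(C);   // matrix dimensions
        out.writeInt(0); out.writeInt(0);   // RL, CL: segment starts at (0,0)
        out.writeInt(R); out.writeInt(C);   // M, N: whole matrix in one segment
        for (double[] row : m)
            for (double v : row) out.writeDouble(v);   // row-major elements
        out.close();
        return bos.toByteArray();
    }

    public static double[][] read(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int R = in.readInt(), C = in.readInt();
        int rl = in.readInt(), cl = in.readInt(), M = in.readInt(), N = in.readInt();
        double[][] m = new double[R][C];
        for (int r = 0; r < M; r++)
            for (int c = 0; c < N; c++) m[rl + r][cl + c] = in.readDouble();
        return m;
    }

    public static void main(String[] args) throws IOException {
        double[][] m = {{1.5, 2.5}, {3.5, 4.5}};
        double[][] back = read(write(m));
        System.out.println(back[1][0]);  // 3.5
    }
}
```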

Class Range

- Defined by:
  - L: lower bound
  - U: upper bound
  - S: stride
  - N: length
- A range of integers {L, L+S, L+2*S, ..., L+(N-1)*S}, where U = L+(N-1)*S
- Examples:
  - Range(2, 9, 3) = {2, 5, 8}
  - Range(1, 4) = {1, 2, 3, 4}
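The Range semantics above can be sketched in a few lines of Java. RangeSketch below is a hypothetical stand-in for illustration, not the actual PJ Range class.

```java
// A minimal sketch of the Range abstraction described above -- an
// illustration of its semantics, not the real PJ class.
public class RangeSketch {
    final int lb, stride, length;

    // Range(L, U): stride 1, covering L..U inclusive.
    RangeSketch(int lb, int ub) { this(lb, ub, 1); }

    // Range(L, U, S): {L, L+S, ..., L+(N-1)*S} where L+(N-1)*S <= U.
    RangeSketch(int lb, int ub, int stride) {
        this.lb = lb;
        this.stride = stride;
        this.length = (ub - lb) / stride + 1;
    }

    int[] toArray() {
        int[] a = new int[length];
        for (int i = 0; i < length; i++) a[i] = lb + i * stride;
        return a;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(new RangeSketch(2, 9, 3).toArray())); // [2, 5, 8]
        System.out.println(java.util.Arrays.toString(new RangeSketch(1, 4).toArray()));    // [1, 2, 3, 4]
    }
}
```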

Class DoubleMatrixFile.Writer

- Object that writes a DoubleMatrixFile to an output stream
- Created by the prepareToWrite() method
- When created, the number of rows and the number of columns are written to the output stream

Class DoubleMatrixFile.Writer

- Methods to write:
  - writeColSlice(Range slice)
  - writeRowSlice(Range slice)
  - writePatch(Range rslice, Range cslice)
  - write()
- Range objects must have stride = 1
- Each write operation writes one matrix segment to the output stream

WriteRowSlice

  Writer wr = dmf.prepareToWrite(ostream);
  Range r = new Range(2, 5);
  wr.writeRowSlice(r);
  wr.close();

Writes rows 2 through 5 of the 11x11 example matrix below:

      0  1  2  3  4  5  6  7  8  9 10
  0  97 19 42 50 91 74 94 87 80 29  6
  1  95 42 93 81 62 72 12 11 93 32 77
  2  22 89 42 10 36 77 81 25 47 77 55
  3  91 74 97 53 49 59 98 32 85 18 84
  4  56 71 66 41 69 95  5  7 70 25 94
  5  29 62 68 50 19 95 92 73 19 71 81
  6  82 93 91 72 53 49 33 18 68 59 79
  7  19 18 66 53 84 54 67 22 53 27 77
  8  95 76  6 57 83 14 56 82 52  4  1
  9  46 18 66 30 93 77 69  8 33 92 49
 10  63 96 62 37 91 34 43 18 97 50 85

WriteColSlice

  Writer wr = dmf.prepareToWrite(ostream);
  Range r = new Range(1, 4);
  wr.writeColSlice(r);
  wr.close();

Writes columns 1 through 4 of the same 11x11 example matrix.

WritePatch

  Writer wr = dmf.prepareToWrite(ostream);
  Range rr = new Range(2, 5);
  Range cr = new Range(1, 4);
  wr.writePatch(rr, cr);
  wr.close();

Writes the patch covering rows 2-5 and columns 1-4 of the same 11x11 example matrix.

Write

  Writer wr = dmf.prepareToWrite(ostream);
  wr.write();
  wr.close();

Writes the entire 11x11 example matrix as a single segment.

Class DoubleMatrixFile.Reader

- Object that reads a DoubleMatrixFile from an input stream
- Created by the prepareToRead() method
- When created:
  - The number of rows and the number of columns are read from the input stream
  - Storage is allocated for the underlying matrix's row references

Class DoubleMatrixFile.Reader

- Methods to read:
  - read()
  - readColSlice(Range slice)
  - readRowSlice(Range slice)
  - readPatch(Range rslice, Range cslice)
  - readSegment()
  - readSegmentColSlice(Range slice)
  - readSegmentRowSlice(Range slice)
  - readSegmentPatch(Range rslice, Range cslice)
- Range objects must have stride = 1

ReadRowSlice

  Reader rd = dmf.prepareToRead(istream);
  Range r = new Range(2, 5);
  rd.readRowSlice(r);
  rd.close();

Example input file contents (an 11x11 matrix stored as two segments):

  11 11
  0 0 4 11
  97 19 42 50 91 74 94 87 80 29 6
  95 42 93 81 62 72 12 11 93 32 77
  22 89 42 10 36 77 81 25 47 77 55
  91 74 97 53 49 59 98 32 85 18 84
  4 0 4 9
  17 33 15 10 51 70 54 44 48
  35 12 43 21 42 32 13 12 23
  20 15 12 13 16 17 88 16 17
  51 34 92 33 40 69 38 30 65

ReadSegmentRowSlice

  Reader rd = dmf.prepareToRead(istream);
  Range r = new Range(2, 5);
  rd.readSegmentRowSlice(r);
  rd.close();

(Operates on the same example file as ReadRowSlice, reading the row slice from a single segment.)

ReadColSlice

  Reader rd = dmf.prepareToRead(istream);
  Range r = new Range(4, 6);
  rd.readColSlice(r);
  rd.close();

Example input file contents (an 11x11 matrix stored as two segments):

  11 11
  0 0 4 11
  97 19 42 50 91 74 94 87 80 29 6
  95 42 93 81 62 72 12 11 93 32 77
  22 89 42 10 36 77 81 25 47 77 55
  91 74 97 53 49 59 98 32 85 18 84
  4 0 4 9
  17 9 41 54 94 24 14 27 20
  25 12 96 82 32 71 22 31 91
  12 69 22 14 11 97 21 24 27
  61 54 17 23 39 50 18 31 45

ReadSegmentColSlice

  Reader rd = dmf.prepareToRead(istream);
  Range r = new Range(4, 6);
  rd.readSegmentColSlice(r);
  rd.close();

(Operates on the same example file as ReadColSlice, reading the column slice from a single segment.)

ReadPatch

  Reader rd = dmf.prepareToRead(istream);
  Range rr = new Range(2, 5);
  Range cr = new Range(4, 6);
  rd.readPatch(rr, cr);
  rd.close();

(Reads the patch covering rows 2-5 and columns 4-6 of the same example file.)

ReadSegmentPatch

  Reader rd = dmf.prepareToRead(istream);
  Range rr = new Range(2, 5);
  Range cr = new Range(4, 6);
  rd.readSegmentPatch(rr, cr);
  rd.close();

(Reads the same patch, but from a single segment of the same example file.)

Read

  Reader rd = dmf.prepareToRead(istream);
  rd.read();
  rd.close();

(Reads the entire matrix from the same example file.)

ReadSegment

  Reader rd = dmf.prepareToRead(istream);
  rd.readSegment();
  rd.close();

(Reads a single segment from the same example file.)

Parallel Datastore System (PDS)

- Extension of the I/O context of the PJ library
- A set of classes for working with matrices and arrays
- Modeled after the original DoubleMatrixFile class

MatrixFile class hierarchy

- MatrixFile (abstract), with subclasses:
  - IntMatrixFile, LongMatrixFile, FloatMatrixFile, DoubleMatrixFile,
    CharMatrixFile, ShortMatrixFile, ByteMatrixFile, BooleanMatrixFile,
    ObjectMatrixFile
- Each subclass has inner Reader and Writer classes, derived from the
  abstract MatrixFileReader and MatrixFileWriter

Class MatrixFile

- Abstract class
- Includes methods applicable to all matrix files, regardless of element type
- Inner (abstract) classes:
  - MatrixFileReader
  - MatrixFileWriter

FloatMatrixFile file format

  Class name
  R C
  RL CL M N RS CS
  A[RL,CL]           A[RL,CL+CS]         ...  A[RL,CL+(N-1)*CS]
  A[RL+RS,CL]        A[RL+RS,CL+CS]      ...  A[RL+RS,CL+(N-1)*CS]
  ...
  A[RL+(M-1)*RS,CL]  A[RL+(M-1)*RS,CL+CS] ... A[RL+(M-1)*RS,CL+(N-1)*CS]
  (further matrix segments follow, each with its own RL CL M N RS CS header)

- Class name: the elements' class name (String "float")
- R, C: number of matrix rows/columns; R ≥ 0, C ≥ 0 (int, 4 bytes each)
- RL: segment's lower row index (int), RL ≥ 0
- CL: segment's lower column index (int), CL ≥ 0
- M: number of rows in segment (int), M ≥ 0
- N: number of columns in segment (int), N ≥ 0
- RS: segment's row stride (int), RS > 0
- CS: segment's column stride (int), CS > 0
- Matrix elements in each segment (float), stored in row-major order

Matrix Segment

- Range objects defining slices may have stride > 1
- Examples, on the 11x11 example matrix:
  - Row slice Range(2,8,3): rows {2, 5, 8}
  - Column slice Range(0,10,5): columns {0, 5, 10}
  - Patch Range(2,8,3) by Range(0,10,5): elements at rows {2, 5, 8} and columns {0, 5, 10}

MatrixFile subclasses

  Subclass             Element Size (bytes)  Class Name
  IntMatrixFile        4                     "int"
  LongMatrixFile       8                     "long"
  FloatMatrixFile      4                     "float"
  DoubleMatrixFile     8                     "double"
  CharMatrixFile       2                     "char"
  ShortMatrixFile      2                     "short"
  ByteMatrixFile       1                     "byte"
  BooleanMatrixFile    1                     "boolean"
  ObjectMatrixFile<T>  variable              variable

Class ObjectMatrixFile<T>

- Generic class
- T: a reference type that must implement java.io.Serializable
- Elements may be of variable size
  - E.g.: ObjectMatrixFile<String>

Class ObjectMatrixFile<T>

- When written to an output stream:
  - The actual class name of T is written
  - Elements are written using the writeObject() method of class ObjectOutputStream
  - The size of each element (in bytes) is written before the element itself

Class ObjectMatrixFile<T>

- When read from an input stream:
  - The elements' actual class name is read
  - Elements are read using the readObject() method of class ObjectInputStream
  - The size of each element (in bytes) is read before the element itself
    - Useful when skipping elements
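The size-before-element convention described above can be sketched with standard java.io serialization. The class and method names below are illustrative, not the actual PDS code; they only show why a byte-length prefix makes skipping elements cheap.

```java
import java.io.*;

// Sketch of the "size before element" convention: each serialized element
// is preceded by its byte length, so a reader can skip it without
// deserializing. Illustration only, not the real PDS implementation.
public class SizedElementSketch {
    public static void writeElement(DataOutputStream out, Serializable e) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(buf);
        oos.writeObject(e);
        oos.close();
        byte[] bytes = buf.toByteArray();
        out.writeInt(bytes.length);   // size written before the element
        out.write(bytes);
    }

    public static Object readElement(DataInputStream in) throws IOException, ClassNotFoundException {
        int size = in.readInt();
        byte[] bytes = new byte[size];
        in.readFully(bytes);
        return new ObjectInputStream(new ByteArrayInputStream(bytes)).readObject();
    }

    public static void skipElement(DataInputStream in) throws IOException {
        in.skipBytes(in.readInt());   // the size prefix makes skipping cheap
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        writeElement(out, "hello");
        writeElement(out, "world");
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));
        skipElement(in);                       // skip "hello" without deserializing
        System.out.println(readElement(in));   // world
    }
}
```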

Class MatrixFileManager

- Processes different types of MatrixFiles
- Automatically recognizes the matrix type
- Can be used interactively or programmatically

MatrixFileManager Operations

- Join: join several MatrixFiles into one
- Split: split a MatrixFile into several pieces (by rows, columns, or patches)
- Dump: dump MatrixFile contents in text format

ArrayFile class hierarchy

- ArrayFile (abstract), with subclasses:
  - IntArrayFile, LongArrayFile, FloatArrayFile, DoubleArrayFile,
    CharArrayFile, ShortArrayFile, ByteArrayFile, BooleanArrayFile,
    ObjectArrayFile
- Each subclass has inner Reader and Writer classes, derived from the
  abstract ArrayFileReader and ArrayFileWriter

Class ArrayFile

- Abstract class
- Includes methods applicable to all array files, regardless of element type
- Inner (abstract) classes:
  - ArrayFileReader
  - ArrayFileWriter

FloatArrayFile file format

  Class name
  N
  L M S
  A[L]  A[L+S]  ...  A[L+(M-1)*S]
  (further array segments follow, each with its own L M S header)

- Class name: the elements' class name (String "float")
- N: number of array elements, N ≥ 0 (int, 4 bytes)
- L: segment's lower element index (int), L ≥ 0
- M: number of elements in segment (int), M ≥ 0
- S: segment's element stride (int), S > 0
- Array elements in each segment (float)

Array Segment

- Range objects defining a segment may have stride > 1
- Examples, on an 18-element array (indices 0-17):
  - Range(3,13): elements 3 through 13
  - Range(3,12,3): elements {3, 6, 9, 12}

ArrayFile subclasses

  Subclass            Element Size (bytes)  Class Name
  IntArrayFile        4                     "int"
  LongArrayFile       8                     "long"
  FloatArrayFile      4                     "float"
  DoubleArrayFile     8                     "double"
  CharArrayFile       2                     "char"
  ShortArrayFile      2                     "short"
  ByteArrayFile       1                     "byte"
  BooleanArrayFile    1                     "boolean"
  ObjectArrayFile<T>  variable              variable

Class ObjectArrayFile<T>

- Generic class
- T: a reference type that must implement java.io.Serializable
- Elements may be of variable size
  - E.g.: ObjectArrayFile<String>
- When written to an output stream:
  - The actual class name of T is written
  - Elements are written using the writeObject() method of class ObjectOutputStream
  - The size of each element (in bytes) is written before the element itself
- When read from an input stream:
  - The elements' actual class name is read
  - Elements are read using the readObject() method of class ObjectInputStream
  - The size of each element (in bytes) is read before the element itself
    - Useful when skipping elements

Class ArrayFileManager

- Processes different types of ArrayFiles
- Automatically recognizes the array type
- Can be used interactively or programmatically

ArrayFileManager Operations

- Join: join several ArrayFiles into one
- Split: split an ArrayFile into several pieces
- Dump: dump ArrayFile contents in text format

Test Programs

- Matrix Multiplication
- Floyd's Algorithm
- K-Means Clustering

Matrix Multiplication

- A (m x n) multiplied by B (n x p) gives C (m x p):

  C[i,j] = Σ_{k=1..n} A[i,k] * B[k,j],   where 1 ≤ i ≤ m and 1 ≤ j ≤ p

Matrix Multiplication

  for (int i = 0; i < m; i++) {
      for (int j = 0; j < p; j++) {
          c[i][j] = 0;
          for (int k = 0; k < n; k++) {
              c[i][j] += a[i][k] * b[k][j];
          }
      }
  }

- Complexity: O(m*n*p), i.e. O(n^3) for square matrices
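The triple loop above, wrapped into a runnable demo on a small example: a 2x3 matrix times a 3x2 matrix gives a 2x2 result.

```java
// Runnable version of the sequential triple loop shown above.
public class MatMulDemo {
    public static double[][] multiply(double[][] a, double[][] b) {
        int m = a.length, n = b.length, p = b[0].length;
        double[][] c = new double[m][p];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < p; j++) {
                c[i][j] = 0;
                for (int k = 0; k < n; k++)
                    c[i][j] += a[i][k] * b[k][j];   // C[i,j] = sum of A[i,k]*B[k,j]
            }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {4, 5, 6}};
        double[][] b = {{7, 8}, {9, 10}, {11, 12}};
        double[][] c = multiply(a, b);
        System.out.println(c[0][0] + " " + c[0][1]);  // 58.0 64.0
        System.out.println(c[1][0] + " " + c[1][1]);  // 139.0 154.0
    }
}
```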

Matrix Multiplication

- Easily parallelizable; no sequential dependencies
- To calculate C[i,j], we only need:
  - the i-th row of A
  - the j-th column of B

Matrix Multiplication

- Each parallel process multiplies a row slice of A by a column slice of B to produce a patch of C
- (Figures: the partitioning of A, B, and C for 1, 2, and 4 parallel processes)

Matrix Multiplication

  Accept fileA, fileB, fileC as input parameters
  Each parallel process i:
      Based on # of processes and process rank, determine which
          row slice of A and which column slice of B to process
      Read correct row slice from fileA          <- readRowSlice()
      Read correct column slice from fileB       <- readColSlice()
      Multiply row slice by column slice
      Write produced patch to fileC.i            <- writePatch()
      Exit

Floyd’s Algorithm

- All-pairs shortest distance
- Given a graph G, calculate the shortest distance from every vertex of G to every other vertex
- Example graph with vertices V1..V5 and its distance matrix D[NxN]:

        V1  V2  V3  V4  V5
  V1     0   3   ∞   6   ∞
  V2     3   0   ∞   ∞   6
  V3     ∞   ∞   0   ∞   8
  V4     6   ∞   ∞   0   2
  V5     ∞   6   8   2   0

Floyd’s Algorithm

  for i = 1 to N do:
      for r = 1 to N do:
          for c = 1 to N do:
              D[r,c] = min{ D[r,c], D[r,i] + D[i,c] }

- Complexity: O(N^3)
- Parallel program: distribute the rows of D evenly among the available parallel processes

Floyd’s Algorithm

  for i = 1 to N do:
      for r = 1 to N do:
          for c = 1 to N do:
              D[r,c] = min{ D[r,c], D[r,i] + D[i,c] }

- Every process needs to access the i-th row

Floyd’s Algorithm

  for i = 1 to N do:
      broadcast contents of i-th row
      for r = 1 to N do:
          for c = 1 to N do:
              D[r,c] = min{ D[r,c], D[r,i] + D[i,c] }

- The process which "has" the i-th row for the current value of i broadcasts it to the other processes
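A sequential, runnable sketch of the loop above, applied to the 5-vertex example graph (INF stands for "no edge"); the parallel version additionally distributes the rows of D and broadcasts row i on each iteration.

```java
// Sequential Floyd's algorithm on the example distance matrix.
public class FloydDemo {
    static final double INF = Double.POSITIVE_INFINITY;

    public static void floyd(double[][] d) {
        int n = d.length;
        for (int i = 0; i < n; i++)          // intermediate vertex
            for (int r = 0; r < n; r++)
                for (int c = 0; c < n; c++)
                    d[r][c] = Math.min(d[r][c], d[r][i] + d[i][c]);
    }

    public static void main(String[] args) {
        double[][] d = {
            {0, 3, INF, 6, INF},
            {3, 0, INF, INF, 6},
            {INF, INF, 0, INF, 8},
            {6, INF, INF, 0, 2},
            {INF, 6, 8, 2, 0},
        };
        floyd(d);
        System.out.println(d[0][2]);  // 16.0  (V1 -> V4 -> V5 -> V3)
        System.out.println(d[0][4]);  // 8.0   (V1 -> V4 -> V5)
    }
}
```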

Floyd’s Algorithm

  Accept fileIN, fileOUT as input parameters
  Each parallel process i:
      Based on # of processes and process rank, determine
          which row slice of D to process
      Read correct row slice from fileIN         <- readRowSlice()
      Run Floyd's algorithm
      Write produced row slice to fileOUT.i      <- writeRowSlice()
      Exit

Source code for this program adapted from [4]

K-Means Clustering

- Partition N observations into K clusters
- (Figure: example clustering with N=100, K=3; images taken from [6])

K-Means Clustering

- Inputs:
  - N data points with D coordinates each, as a matrix P[NxD]
  - K: number of clusters
- Outputs:
  - Coordinates of the K cluster centers

        C1   C2   C3   ...  CD
  P1    10   4.3  6.9  ...  5.8
  P2    3.9  0.8  17   ...  62
  P3    11   4.6  40   ...  1.8
  ...   ...  ...  ...  ...  ...
  PN    7.9  68   8.2  ...  0.4

K-Means Clustering

- Algorithm:
  1. Initialize the K cluster centers to be the first K data points
  2. Initially, each data point is not assigned to any cluster
  3. Repeat:
  4.     Assign each data point to the cluster whose center is the closest, using the Euclidean distance
  5.     Recalculate the center point of each cluster
  6. Until none of the data points changed to a different cluster
  7. Output the coordinates of each cluster center
- Euclidean distance:

  d = ( Σ_{i=1..D} (a_i − b_i)^2 )^{1/2}
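The sequential algorithm above as a runnable Java sketch: centers start at the first K points and iteration stops when no point changes cluster. This is an illustration only; the PDS test program additionally handles file I/O and the all-reduce steps.

```java
import java.util.Arrays;

// Sequential sketch of the k-means loop described above.
public class KMeansDemo {
    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;  // squared Euclidean distance; same argmin as the true distance
    }

    public static int[] kmeans(double[][] p, int k) {
        int n = p.length, d = p[0].length;
        double[][] centers = new double[k][];
        for (int j = 0; j < k; j++) centers[j] = p[j].clone();  // step 1
        int[] assign = new int[n];
        Arrays.fill(assign, -1);                                // step 2
        boolean changed = true;
        while (changed) {                                       // steps 3-6
            changed = false;
            for (int i = 0; i < n; i++) {                       // step 4: nearest center
                int best = 0;
                for (int j = 1; j < k; j++)
                    if (dist2(p[i], centers[j]) < dist2(p[i], centers[best])) best = j;
                if (best != assign[i]) { assign[i] = best; changed = true; }
            }
            double[][] sum = new double[k][d];                  // step 5: recenter
            int[] cnt = new int[k];
            for (int i = 0; i < n; i++) {
                cnt[assign[i]]++;
                for (int c = 0; c < d; c++) sum[assign[i]][c] += p[i][c];
            }
            for (int j = 0; j < k; j++)
                if (cnt[j] > 0)
                    for (int c = 0; c < d; c++) centers[j][c] = sum[j][c] / cnt[j];
        }
        return assign;
    }

    public static void main(String[] args) {
        double[][] p = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        System.out.println(Arrays.toString(kmeans(p, 2)));  // [0, 0, 1, 1]
    }
}
```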

K-Means Clustering

- Cluster parallel algorithm:
  - Distribute the N data points evenly among the parallel processes
  - Each process handles a row slice of matrix P

K-Means Clustering

  1. Accept fileIN, K as input parameters
  2. Each parallel process i:
  3.     Based on # of processes and process rank, determine which row slice of P to process
  4.     Read correct row slice from fileIN
  5.     Assign initial cluster centers to first K data points
  6.     Assign each data point in slice to closest cluster
  7.     Perform All-Reduce using logical OR
  8.     If no data points changed clusters, go to 12
  9.     Perform All-Reduce using addition to propagate cluster center data to all processes
  10.    Recalculate cluster centers
  11.    Go to 6
  12.    Print out cluster centers
  13.    Exit

K-Means Clustering

- Input: a DoubleMatrixFile P[NxD] with N rows and D columns, stored as 2 segments:
  1. The first K rows
  2. The rest of the rows

K-Means Clustering

  1. Accept fileIN, K as input parameters
  2. Each parallel process i:
  3.     Based on # of processes and process rank, determine which row slice of P to process
  4.     Read K rows from the 1st segment in fileIN         <- readSegmentRowSlice()
  5.     Read correct row slice from 2nd segment in fileIN  <- readSegmentRowSlice()
  6.     Assign initial cluster centers to first K data points
  7.     Assign each data point in slice to closest cluster
  8.     Perform All-Reduce using logical OR
  9.     If no data points changed clusters, go to 13
  10.    Perform All-Reduce using addition to propagate cluster center data to all processes
  11.    Recalculate cluster centers
  12.    Go to 7
  13.    Print out cluster centers
  14.    Exit

Test Runs

- 3 variations:
  1. Sequential program not using PDS
  2. Cluster parallel program not using PDS
  3. Cluster parallel program using PDS

Test Runs

Variation 3 (using PDS):

  IntMatrixFile imf = new IntMatrixFile();
  MatrixFileReader reader = imf.prepareToRead(
      new BufferedInputStream(new FileInputStream("matrix.dat")));
  int r = imf.getRowCount();   // get # of rows
  int c = imf.getColCount();   // get # of cols
  reader.read();
  int[][] A = imf.getMatrix();
  reader.close();
  // use matrix A

Variations 1 & 2 (not using PDS):

  DataInputStream dis = new DataInputStream(
      new BufferedInputStream(new FileInputStream("matrix.dat")));
  dis.readUTF();               // read & skip the element class name
  int r = dis.readInt();       // read # of rows
  int c = dis.readInt();       // read # of cols
  dis.skipBytes(6 * 4);        // skip the 6 ints that describe the segment
  int[][] A = new int[r][c];
  for (int i = 0; i < r; i++) {
      for (int j = 0; j < c; j++) {
          A[i][j] = dis.readInt();
      }
  }
  dis.close();
  // use matrix A

Test Runs

- Hybrid SMP parallel computer [?]:
  - Frontend computer: tardis.cs.rit.edu
    - UltraSPARC-IIe CPU, 650 MHz clock, 512 MB main memory
  - 10 backend computers, dr00 through dr09, each with two AMD Opteron 2218
    dual-core CPUs (four processors), 2.6 GHz clock, 8 GB main memory
  - 1-Gbps switched Ethernet backend interconnection network
  - Aggregate 104 GHz clock, 80 GB main memory

Test Runs

- Run each test program at least 3 times with each configuration:
  - Sequential
  - K = 1, 2, 4, 8, 10 parallel processes
- Take the smallest running time for each configuration
- Calculate and compare:
  - Running times
  - Speedup
  - Efficiency
  - EDSF

Results: Matrix Multiplication

- Input Dataset #1: [1000x800] * [800x900] = [1000x900]
- Problem Size: N = 0.7G
- K: number of parallel processes
- T: minimal running time from 3 runs (in ms)

  Variation   K    T       Spdup   Effic   EDSF
  Sequential  seq  8719
  Non-PDS     1    8610    1.013   1.013
              2    4255    2.049   1.025   -0.012
              4    2602    3.351   0.838   0.070
              8    1468    5.939   0.742   0.052
              10   1305    6.681   0.668   0.057
  PDS         seq  8719
              1    8396    1.038   1.038
              2    4266    2.044   1.022   0.016
              4    2586    3.372   0.843   0.077
              8    1505    5.793   0.724   0.062
              10   1199    7.272   0.727   0.048

Results: Matrix Multiplication

- Input Dataset #2: [4000x4000] * [4000x4000] = [4000x4000]
- Problem Size: N = 64G
- K: number of parallel processes
- T: minimal running time from 3 runs (in ms)

  Variation   K    T        Spdup   Effic   EDSF
  Sequential  seq  8719
  Non-PDS     1    8610     1.013   1.013
              2    4255     2.049   1.025   -0.012
              4    2602     3.351   0.838   0.070
              8    1468     5.939   0.742   0.052
              10   1305     6.681   0.668   0.057
  PDS         seq  1106687
              1    949546   1.165   1.165
              2    489582   2.260   1.130   0.031
              4    243092   4.553   1.138   0.008
              8    127135   8.705   1.088   0.010
              10   99322    11.142  1.114   0.005

Results: Matrix Multiplication

(Result charts)

Results: K-Means Clustering

- Input Dataset #1: D=2, N=180000, K=100
- K: number of parallel processes
- T: minimal running time from 3 runs (in ms)

  Variation   K    T       Spdup   Effic   EDSF
  Sequential  seq  119364
  Non-PDS     1    122578  0.974   0.974
              2    71402   1.672   0.836   0.165
              4    50256   2.375   0.594   0.213
              8    23374   5.107   0.638   0.075
              10   11128   10.726  1.073   -0.010
  PDS         seq  119364
              1    121650  0.981   0.981
              2    70768   1.687   0.843   0.163
              4    49815   2.396   0.599   0.213
              8    22727   5.252   0.657   0.071
              10   11603   10.287  1.029   -0.005

Results: K-Means Clustering

- Input Dataset #3: D=6, N=600000, K=55
- K: number of parallel processes
- T: minimal running time from 3 runs (in ms)

  Variation   K    T       Spdup   Effic   EDSF
  Sequential  seq  319327
  Non-PDS     1    311595  1.025   1.025
              2    185976  1.717   0.859   0.194
              4    107607  2.968   0.742   0.127
              8    53247   5.997   0.750   0.052
              10   43419   7.355   0.735   0.044
  PDS         seq  319327
              1    311950  1.024   1.024
              2    185151  1.725   0.862   0.187
              4    108308  2.948   0.737   0.130
              8    53433   5.976   0.747   0.053
              10   44218   7.222   0.722   0.046

Results: K-Means Clustering

(Result charts)

Results: Floyd’s Algorithm

- Input Dataset: graph with 4000 vertices
- Problem Size: N = 64G
- K: number of parallel processes
- T: minimal running time from 3 runs (in ms)

  Variation   K    T       Spdup   Effic   EDSF
  Sequential  seq  614362
  Non-PDS     1    587473  1.046   1.046
              2    295097  2.082   1.041   0.005
              4    149873  4.099   1.025   0.007
              8    75500   8.137   1.017   0.004
              10   63645   9.653   0.965   0.009
  PDS         seq  614362
              1    564362  1.089   1.089
              2    276787  2.220   1.110   -0.019
              4    143133  4.292   1.073   0.005
              8    72371   8.489   1.061   0.004
              10   61410   10.004  1.000   0.010

Results: Floyd’s Algorithm

(Result charts)

Conclusions

- New PDS developed
  - Extension of the PJ library
  - Set of classes for working with matrices and arrays of different element types
  - Full Javadocs and developer's guide
  - Object-oriented design; concise, elegant

Conclusions

- Manager utilities developed
- 3 test programs developed
- Ready to be used in writing parallel programs

Future Work

- Integrate with the PJ job scheduler
  - Automatically partition and distribute the input files
- Compatibility with existing parallel file systems

References

[1] Avery Ching, Alok Choudhary, Wei-keng Liao, Robert Ross, William Gropp. Noncontiguous I/O through PVFS. Proceedings of the 2002 IEEE International Conference on Cluster Computing, September 2002.

[2] Message Passing Interface Forum. MPI-2: Extensions to the Message-Passing Interface. July 18, 1997. http://www.mpi-forum.org/docs/mpi-20-html/mpi2-report.html

[3] J. Perez, L. Sanchez, F. Garcia, A. Calderon, J. Carretero. High performance Java input/output for heterogeneous distributed computing. Proceedings of the 10th IEEE Symposium on Computers and Communications, 2005.

[4] Alan Kaminsky. Building Parallel Programs: SMPs, Clusters, and Java. Cengage Course Technology, 2010.

[5] Alan Kaminsky. Parallel Java Library. http://www.cs.rit.edu/~ark/pj.shtml

[6] Alan Kaminsky. 4005-736-70 Parallel Computing II, Programming Project 1. http://www.cs.rit.edu/~ark/736/project01/project01.shtml

Questions?

Appendix

- Speedup
  - K: number of processes
  - Tseq: running time of the sequential program
  - TK: running time of the parallel program on K processes

  Speedup(K) = Tseq / TK

Appendix

- Efficiency
  - K: number of processes

  Eff(K) = Speedup(K) / K

Appendix

- EDSF: Experimentally Determined Sequential Fraction
  - K: number of processes
  - Tseq: running time of the sequential program
  - TK: running time of the parallel program on K processes

  F(K) = (K * TK − Tseq) / (K * Tseq − Tseq)
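The speedup and efficiency formulas above, applied to one row of the results tables: the PDS matrix-multiplication run on dataset #2 (Tseq = 1106687 ms, T = 99322 ms on K = 10 processes).

```java
// Speedup and efficiency as defined in the appendix, checked against
// one row of the matrix-multiplication results table.
public class MetricsDemo {
    public static double speedup(double tseq, double tk) { return tseq / tk; }
    public static double efficiency(double tseq, double tk, int k) { return speedup(tseq, tk) / k; }

    public static void main(String[] args) {
        double tseq = 1106687, t10 = 99322;
        System.out.printf("speedup = %.3f%n", speedup(tseq, t10));           // speedup = 11.142
        System.out.printf("efficiency = %.3f%n", efficiency(tseq, t10, 10)); // efficiency = 1.114
    }
}
```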