
The University of Texas at Austin

Programming Dense Matrix Computations Using Distributed and Off-Chip Shared-Memory on Many-Core Architectures

Ernie Chan

MARC Symposium, November 9, 2010

How to Program SCC?

• 48 cores in 6×4 mesh with 2 cores per tile
• 4 DDR3 memory controllers

[Figure: SCC block diagram. A 6×4 mesh of tiles, each attached to a router (R) and containing Core 0 and Core 1 with their L2 caches (L2$0, L2$1) and a message passing buffer (MPB). Four DDR3 memory controllers and a system I/F sit at the edges of the mesh.]
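The programming model developed in the rest of the talk is message passing through the RCCE library. As a baseline, here is a minimal sketch of the shape of an RCCE program, using the core RCCE entry points (RCCE_init, RCCE_ue, RCCE_num_ues, RCCE_finalize); the print statement is purely illustrative.

#include <stdio.h>
#include "RCCE.h"

int main( int argc, char **argv )
{
    RCCE_init( &argc, &argv );      /* initialize the RCCE runtime         */

    int me = RCCE_ue();             /* id of this unit of execution (core) */
    int np = RCCE_num_ues();        /* number of participating cores       */
    printf( "core %d of %d\n", me, np );

    RCCE_finalize();                /* shut the runtime down               */
    return 0;
}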

Outline

• How to Program SCC?
• Elemental
• Collective Communication
• Off-Chip Shared-Memory
• Conclusion

Elemental

• New, Modern Distributed-Memory Dense Linear Algebra Library
  – Replacement for PLAPACK and ScaLAPACK
  – Object-oriented data structures for matrices
  – Coded in C++
  – Torus-wrap/elemental mapping of matrices to a two-dimensional process grid
  – Implemented entirely using bulk synchronous communication

Elemental

• Two-Dimensional Process Grid:
  – Tile the process grid over the matrix to assign each matrix element to a process

[Figure: a 2×3 process grid, ranks arranged as 0 2 4 over 1 3 5, tiled cyclically over the matrix elements.]
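The element-cyclic (torus-wrap) assignment above can be written down directly. A minimal sketch, assuming the column-major rank ordering shown in the 2×3 grid (0 2 4 over 1 3 5); the helper name is illustrative, not Elemental's API.

#include <stdio.h>

/* Owner of matrix element (i, j) on an r x c process grid under the
 * element-cyclic (torus-wrap) distribution, with ranks laid out
 * column-major within the grid: rank = grid_row + grid_col * r. */
static int owner_rank( int i, int j, int r, int c )
{
    return ( i % r ) + ( j % c ) * r;
}

int main( void )
{
    const int r = 2, c = 3;               /* the 2 x 3 grid from the slides  */
    for ( int i = 0; i < 4; i++ ) {       /* owners for a small 4 x 6 matrix */
        for ( int j = 0; j < 6; j++ )
            printf( "%d ", owner_rank( i, j, r, c ) );
        printf( "\n" );
    }
    return 0;
}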

Elemental

• Redistributing the Matrix Over a Process Grid
  – Collective communication

Outline

• How to Program SCC?
• Elemental
• Collective Communication
• Off-Chip Shared-Memory
• Conclusion

Collective Communication

• RCCE Message Passing API
  – Blocking send and receive

    int RCCE_send( char *buf, size_t num, int dest );
    int RCCE_recv( char *buf, size_t num, int src );

  – Potential for deadlock

[Figure: cores 0–5 arranged in a communication cycle.]
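To make the hazard concrete, here is a sketch (illustrative, not from the talk) of the naive exchange around a cycle: every core issues its blocking RCCE_send before its RCCE_recv, so each send can sit waiting for a receive that its neighbor never reaches.

#include <stddef.h>
#include "RCCE.h"

/* Naive exchange around a cycle of np cores: send to the right
 * neighbor, then receive from the left neighbor. Since RCCE_send and
 * RCCE_recv are blocking, every core can end up stuck in RCCE_send
 * while its neighbor is also stuck in RCCE_send, and the cycle
 * deadlocks. */
void naive_ring_exchange( char *sendbuf, char *recvbuf, size_t num,
                          int me, int np )
{
    int right = ( me + 1 ) % np;
    int left  = ( me - 1 + np ) % np;

    RCCE_send( sendbuf, num, right );   /* may block forever          */
    RCCE_recv( recvbuf, num, left );    /* never reached if all block */
}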

Collective Communication

• Avoiding Deadlock
  – Even number of cores in cycle

[Figure: two steps of the exchange among cores 0–5 arranged in a cycle.]

Collective Communication

• Avoiding Deadlock
  – Odd number of cores in cycle

[Figure: three steps of the exchange among cores 0–4 arranged in a cycle.]
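One standard way to break the cycle, matching the even and odd cases above, is to order the blocking calls by rank parity: even-ranked cores send before receiving and odd-ranked cores receive before sending. This is a minimal sketch, not the RCCE_comm source. With an even number of cores every send finds a posted receive immediately (two steps); with an odd number of cores the wrap-around edge serializes one extra step, as in the three-step figure.

#include <stddef.h>
#include "RCCE.h"

/* Deadlock-free neighbor exchange around a cycle of np cores: each
 * core sends num bytes to its right neighbor and receives num bytes
 * from its left neighbor, ordering the blocking calls by the parity
 * of its own rank so that no cyclic wait can form. */
void ring_exchange( char *sendbuf, char *recvbuf, size_t num,
                    int me, int np )
{
    int right = ( me + 1 ) % np;
    int left  = ( me - 1 + np ) % np;

    if ( me % 2 == 0 ) {                    /* even rank: send first   */
        RCCE_send( sendbuf, num, right );
        RCCE_recv( recvbuf, num, left );
    } else {                                /* odd rank: receive first */
        RCCE_recv( recvbuf, num, left );
        RCCE_send( sendbuf, num, right );
    }
}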

Collective Communication

• Scatter

    int RCCE_scatter( char *inbuf, char *outbuf, size_t num, int root, RCCE_COMM comm );

[Figures: buffer contents before and after the scatter.]
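A hedged usage sketch of RCCE_scatter with the prototype above. The assumptions, following MPI_Scatter conventions, are that num is the number of bytes delivered to each core and that RCCE_COMM_WORLD is the predefined global communicator; the example data is illustrative.

#include <stdlib.h>
#include "RCCE.h"

void scatter_example( size_t num /* bytes per core */ )
{
    int me   = RCCE_ue();
    int np   = RCCE_num_ues();
    int root = 0;

    /* The root's inbuf holds np blocks of num bytes, block i destined
     * for core i; every core allocates it here only to keep the
     * sketch simple. outbuf receives this core's own block. */
    char *inbuf  = malloc( num * np );
    char *outbuf = malloc( num );

    if ( me == root )
        for ( size_t i = 0; i < num * np; i++ )
            inbuf[i] = (char)( i / num );   /* block i filled with value i */

    RCCE_scatter( inbuf, outbuf, num, root, RCCE_COMM_WORLD );
    /* After the call, outbuf on core i holds block i of the root's inbuf. */

    free( inbuf );
    free( outbuf );
}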

Collective Communication

• Allgather

    int RCCE_allgather( char *inbuf, char *outbuf, size_t num, RCCE_COMM comm );

[Figures: buffer contents before and after the allgather.]
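Similarly for RCCE_allgather: each core contributes num bytes and every core ends up with all np contributions concatenated in rank order. As above, the MPI-like meaning of num and the RCCE_COMM_WORLD communicator are assumptions of this sketch.

#include <stdlib.h>
#include <string.h>
#include "RCCE.h"

void allgather_example( size_t num /* bytes contributed per core */ )
{
    int me = RCCE_ue();
    int np = RCCE_num_ues();

    char *inbuf  = malloc( num );        /* this core's contribution    */
    char *outbuf = malloc( num * np );   /* room for every contribution */

    memset( inbuf, me, num );            /* tag the data with the rank  */

    RCCE_allgather( inbuf, outbuf, num, RCCE_COMM_WORLD );
    /* After the call, bytes [i*num, (i+1)*num) of outbuf came from core i. */

    free( inbuf );
    free( outbuf );
}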

Collective Communication

• Minimum Spanning Tree Algorithm
  – Scatter

[Figure: step-by-step illustration of the minimum spanning tree scatter.]
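A minimal recursive sketch (illustrative, not the RCCE_comm source) of the minimum spanning tree scatter on top of blocking RCCE_send and RCCE_recv: at each level the range of cores is halved, the current sub-root forwards the part of the buffer destined for the other half to a partner core, and both halves recurse. For simplicity every core addresses the full np-block buffer, with block i at offset i * num.

#include <stddef.h>
#include "RCCE.h"

/* Scatter over cores [left, right]: `root` holds valid blocks for the
 * whole range; block i lives at buf + i * num on whichever core holds
 * it. Every core calls this with the same range and root, plus its
 * own id `me`. */
void mst_scatter( char *buf, size_t num, int root,
                  int left, int right, int me )
{
    if ( left == right ) return;            /* one core: its block is in place */

    int mid = ( left + right ) / 2;         /* split the range in half         */

    if ( root <= mid ) {                    /* root sits in the left half      */
        int partner = mid + 1;              /* sub-root of the right half      */
        if ( me == root )
            RCCE_send( buf + ( mid + 1 ) * num, ( right - mid ) * num, partner );
        else if ( me == partner )
            RCCE_recv( buf + ( mid + 1 ) * num, ( right - mid ) * num, root );
        if ( me <= mid ) mst_scatter( buf, num, root,    left,    mid,   me );
        else             mst_scatter( buf, num, partner, mid + 1, right, me );
    } else {                                /* root sits in the right half     */
        int partner = left;                 /* sub-root of the left half       */
        if ( me == root )
            RCCE_send( buf + left * num, ( mid - left + 1 ) * num, partner );
        else if ( me == partner )
            RCCE_recv( buf + left * num, ( mid - left + 1 ) * num, root );
        if ( me <= mid ) mst_scatter( buf, num, partner, left,    mid,   me );
        else             mst_scatter( buf, num, root,    mid + 1, right, me );
    }
}

A scatter over all cores is then mst_scatter( buf, num, root, 0, RCCE_num_ues() - 1, RCCE_ue() ), after which core i finds its own block at buf + i * num.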

Collective Communication

• Cyclic (Bucket) Algorithm
  – Allgather

[Figure: step-by-step illustration of the cyclic (bucket) allgather.]
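A minimal sketch (again not the RCCE_comm source) of the cyclic (bucket) allgather: np - 1 ring steps, each core passing the block it most recently received to its right neighbor while receiving a new block from its left neighbor, using the parity ordering from the ring_exchange sketch to stay deadlock-free.

#include <string.h>
#include "RCCE.h"

void bucket_allgather( char *inbuf, char *outbuf, size_t num,
                       int me, int np )
{
    int right = ( me + 1 ) % np;
    int left  = ( me - 1 + np ) % np;

    memcpy( outbuf + me * num, inbuf, num );      /* own block in place  */

    for ( int s = 0; s < np - 1; s++ ) {
        int send_idx = ( me - s + np )     % np;  /* block forwarded now */
        int recv_idx = ( me - s - 1 + np ) % np;  /* block arriving now  */
        char *sbuf = outbuf + send_idx * num;
        char *rbuf = outbuf + recv_idx * num;

        if ( me % 2 == 0 ) {                      /* even: send first    */
            RCCE_send( sbuf, num, right );
            RCCE_recv( rbuf, num, left );
        } else {                                  /* odd: receive first  */
            RCCE_recv( rbuf, num, left );
            RCCE_send( sbuf, num, right );
        }
    }
}

Each core calls bucket_allgather with its own block in inbuf; after np - 1 steps outbuf holds every core's block in rank order, which is the post-condition of RCCE_allgather above.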


Outline

• How to Program SCC?
• Elemental
• Collective Communication
• Off-Chip Shared-Memory
• Conclusion

Off-Chip Shared-Memory

• Distributed vs. Shared-Memory

[Figure: the SCC block diagram with regions labeled Distributed Memory and Shared-Memory.]

Off-Chip Shared-Memory

• SuperMatrix
  – Map dense matrix computation to a directed acyclic graph
  – No matrix distribution
  – Store DAG and matrix on off-chip shared-memory

[Figure: the task DAG for a 3×3 blocked Cholesky factorization: CHOL0; TRSM1, TRSM2; SYRK3, GEMM4, SYRK5; CHOL6; TRSM7; SYRK8; CHOL9.]
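The DAG above is what falls out of a right-looking blocked Cholesky factorization. A small self-contained sketch (not SuperMatrix code) that enumerates the tasks in the same creation order:

#include <stdio.h>

/* Enumerate the tasks of a right-looking blocked Cholesky
 * factorization of an nb x nb blocked matrix, in creation order.
 * For nb = 3 this prints exactly the ten nodes of the DAG in the
 * figure: CHOL0, TRSM1, TRSM2, SYRK3, GEMM4, SYRK5, CHOL6, TRSM7,
 * SYRK8, CHOL9. */
static void enumerate_chol_tasks( int nb )
{
    int id = 0;
    for ( int k = 0; k < nb; k++ ) {
        printf( "CHOL%d  on A[%d][%d]\n", id++, k, k );
        for ( int i = k + 1; i < nb; i++ )
            printf( "TRSM%d  on A[%d][%d]\n", id++, i, k );
        for ( int j = k + 1; j < nb; j++ ) {
            printf( "SYRK%d  on A[%d][%d]\n", id++, j, j );
            for ( int i = j + 1; i < nb; i++ )
                printf( "GEMM%d  on A[%d][%d]\n", id++, i, j );
        }
    }
}

int main( void )
{
    enumerate_chol_tasks( 3 );   /* the 3 x 3 blocked case from the figure */
    return 0;
}

SuperMatrix records which blocks each task reads and writes; those data dependences are the edges of the DAG, and the DAG plus the matrix live in off-chip shared memory so that idle cores can pick up ready tasks.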

Off-Chip Shared-Memory

• Non-cacheable vs. Cacheable Shared-Memory
  – Non-cacheable
    • Allows for a simple programming interface
    • Poor performance
  – Cacheable
    • Needs a software-managed cache coherency mechanism
    • Execute on data stored in cache

• Interleave distributed and shared-memory programming concepts
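To illustrate the cacheable option above, a heavily hedged sketch of the software-managed coherence idea: the block type and the invalidate/flush hooks below are hypothetical placeholders (RCCE does not provide them), standing in for whatever mechanism discards stale cached lines and writes results back to the off-chip DDR3.

#include <stddef.h>

/* Hypothetical descriptor for a matrix block living in off-chip
 * shared memory. */
typedef struct { double *data; size_t bytes; } block_t;

/* Hypothetical coherence hooks: discard any cached copy of a block
 * (so the next read comes from DDR3) and write a cached block back. */
static void invalidate_block( const block_t *b ) { (void)b; }
static void flush_block     ( const block_t *b ) { (void)b; }

/* Execute one DAG task coherently: refresh the inputs, compute on
 * data that is now cached, then publish the output for other cores. */
static void run_task_coherently( block_t *a, block_t *b, block_t *c,
                                 void (*kernel)( block_t *, block_t *, block_t * ) )
{
    invalidate_block( a );       /* inputs may have been written elsewhere */
    invalidate_block( b );
    kernel( a, b, c );           /* e.g., a GEMM update on cached tiles    */
    flush_block( c );            /* make the result visible to other cores */
}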


Outline

• How to Program SCC?
• Elemental
• Collective Communication
• Off-Chip Shared-Memory
• Conclusion

Conclusion

• Distributed vs. Shared-Memory
  – Elemental vs. SuperMatrix?

• A Collective Communication Library for SCC
  – RCCE_comm: released under LGPL and available on the public Intel SCC software repository
    http://marcbug.scc-dc.com/svn/repository/trunk/rcce_applications/UT/RCCE_comm/

Acknowledgments

• We thank the other members of the FLAME team for their support
  – Bryan Marker, Jack Poulson, and Robert van de Geijn

• We thank Intel for access to SCC and their help
  – Timothy G. Mattson and Rob F. Van Der Wijngaart

• Funding
  – Intel Corporation
  – National Science Foundation

Conclusion

• More Information
  http://www.cs.utexas.edu/~flame

• Questions?
  echan@cs.utexas.edu