David Sheffield, Michael Anderson, Kurt Keutzer

GSRC Annual Symposium, September 28 through October 1, 2010

Motivation

•  We suspect that the problems commonly solved using computers can be grouped into a small number of distinct computations

•  e.g. matrix multiply, sorting, convolution, etc.

•  We are interested in finding the distinguishing characteristics of these computations

•  Automatically detecting these patterns in software would allow us to suggest optimizations or libraries to the programmer (an intelligent profiler)

•  We could also predict how an arbitrary program would perform on a number of different architectures based on its composition of computations

Decision Tree Classifier

Notes on the splits in the learned decision tree:

1. Sort does not use floating point, so a very low count of floating-point multiplies is a good indicator

2. Dense Linear Algebra tends to load directly and sequentially from memory

3. This split overfits slightly: a few outliers in the Dense Linear Algebra training set (Givens rotations) had many indirect loads

4. This split reflects the fact that sparse matrix-vector multiply tends to accumulate results in registers, while structured grid computations write continuously to memory
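To make the splits concrete, here is a minimal hand-written Python sketch of rules in the spirit of the four notes above; the classify function, its feature names, and the thresholds are illustrative assumptions, not the values the classifier actually learned.

# Hand-written illustration of the kind of rules the tree encodes.
# Thresholds are made up for the example; the real ones are learned from data.
def classify(features):
    if features["fp_multiplies"] < 10:               # (1) Sort uses almost no FP multiplies
        return "Sort"
    if features["indirect_loads"] < 5:               # (2) direct, sequential loads
        return "Dense Linear Algebra"                # (3) Givens-rotation outliers violate this
    if features["stores"] < features["loads"] // 4:  # (4) results kept in registers
        return "Sparse Linear Algebra"
    return "Structured Grid"                         # (4) streams results to memory

print(classify({"fp_multiplies": 500, "indirect_loads": 300,
                "loads": 800, "stores": 20}))        # -> Sparse Linear Algebra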


Machine-Level Features

•  Feature vector:
   •  Indirect Loads
   •  Indirect Stores
   •  Loads
   •  Stores
   •  Floating-point add/sub
   •  Floating-point multiplies
   •  Floating-point divides
   •  Integer instructions

•  Assembly Code:

load  r1, mem[r2]
mulsd xmm0, xmm1
addsd xmm2, xmm0
load  r3, mem[r1]
. . .

•  The final load is an indirect load (its address register r1 was itself loaded from memory), common in sparse codes
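As an illustration only, the Python sketch below extracts a subset of these features from a basic block that has already been parsed into (opcode, destination, sources) tuples; this toy representation is an assumption for the example and is not the actual SDE-based tool chain.

# Each instruction is modeled as (opcode, destination, source registers).
# A load whose address register was itself loaded earlier counts as indirect.
def extract_features(block):
    counts = {"indirect_loads": 0, "loads": 0, "stores": 0,
              "fp_add_sub": 0, "fp_multiplies": 0, "integer": 0}
    loaded = set()                              # registers holding loaded values
    for op, dest, srcs in block:
        if op == "load":
            counts["loads"] += 1
            if any(s in loaded for s in srcs):  # address came from a prior load
                counts["indirect_loads"] += 1
            loaded.add(dest)
        elif op == "store":
            counts["stores"] += 1
        elif op in ("addsd", "subsd"):
            counts["fp_add_sub"] += 1
        elif op == "mulsd":
            counts["fp_multiplies"] += 1
        else:
            counts["integer"] += 1
    return counts

# The snippet above: r1 is loaded first, then used as an address,
# so the final load is counted as indirect.
block = [("load", "r1", ["r2"]), ("mulsd", "xmm0", ["xmm1"]),
         ("addsd", "xmm2", ["xmm0"]), ("load", "r3", ["r1"])]
print(extract_features(block))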

Computational Motifs

•  13 computational patterns were identified by a group of researchers from UC Berkeley and LLNL in 2006 [Asanovic et al.]

•  We try to classify four of them:
   •  Dense Linear Algebra
   •  Sparse Linear Algebra
   •  Structured Grid
   •  Sort

•  Feature Collection
   •  Used the Intel Software Development Emulator (SDE)
   •  Picked the top “basic blocks” of assembly instructions
   •  Extracted features by parsing and simulating the assembly
   •  Compiled 42 training examples
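For a sense of what the training step looks like, here is a minimal sketch that fits a decision tree on such feature vectors using scikit-learn; the four rows, their counts, and the labels are invented placeholders, not the actual 42 training examples.

from sklearn.tree import DecisionTreeClassifier

# Columns follow the feature vector above: indirect loads, indirect stores,
# loads, stores, FP add/sub, FP multiplies, FP divides, integer instructions.
X = [
    [  0, 0, 900, 120, 800, 850, 0,  300],   # toy "dense matrix multiply" counts
    [400, 0, 850,  30, 420, 430, 0,  500],   # toy "sparse matrix-vector" counts
    [  0, 0, 600, 550, 700, 200, 0,  400],   # toy "structured grid stencil" counts
    [  0, 0, 500, 480,   0,   0, 0, 1200],   # toy "sort" counts
]
y = ["Dense Linear Algebra", "Sparse Linear Algebra", "Structured Grid", "Sort"]

clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(clf.predict([[380, 0, 800, 40, 400, 410, 0, 520]]))  # classify an unseen block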

Next Steps

•  We plan on modifying GCC or LLVM to gather feature vectors

• The compiler would insert counters to profile interesting events (such as indirect loads)

• Similar in principle to traditional profilers such as gprof

• Consider adding data access patterns, including data structure shape

• Train classifier with known examples such as SPEC2006

•  LLNL has a computational pattern benchmark suite too.

• Use the RPM package manager to rebuild a complete Linux userland with pattern profiling code

• Detect computational patterns in the “wild”