MATLAB Under the Hood - Mallinckrodt Institute of Radiology

© 2011 The MathWorks, Inc.

MATLAB Under the Hood

Optimizing Performance and Memory with MATLAB

Tim Mathieu – Sr. Account Manager

Gerardo Hernandez – Application Engineer

Abhishek Gupta – Application Engineer

12

Agenda – MATLAB Under the Hood

Working with Large Data

– Understanding the constraints: RAM, OS, BLAS, LAPACK

– Working within MATLAB: data storage and copying

– Minimizing memory usage: precision, selective loading and plotting, stream processing

Speeding Up MATLAB Programs

– Leveraging vector and matrix operations

– Detecting and addressing bottlenecks

13

Have you ever had an “Out of Memory” error?

14

How much memory is there on 32 bit OS?

4GB addresses, numbered:

– From 0x00000000 (32 bit)

– To 0xFFFFFFFF (32 bit)

OS reserves addresses for itself

Leaves the rest for processes

Independent of system RAM

Virtualization duplicates upper addresses for each process

[Diagram: memory addresses, divided between the operating system and the process]
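The 4GB figure follows directly from the width of a 32-bit address. A minimal sketch of that arithmetic (the variable names are illustrative, not from the slides):

```matlab
% A 32-bit address can take 2^32 distinct values, one per byte.
nAddresses = 2^32;
lowest  = hex2dec('00000000');   % first address
highest = hex2dec('FFFFFFFF');   % last address

% Express the address space in GB (2^30 bytes).
addressSpaceGB = nAddresses / 2^30;
fprintf('%.0f addresses, %g GB of address space\n', nAddresses, addressSpaceGB);
```

This is the size of the address space only; as the slide notes, it is independent of how much RAM is installed.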

15

Virtualization

Reuses addresses across multiple processes

Feature of all modern operating systems

Managed automatically

Disk page file provides additional data storage

Total Data Storage = Physical RAM + Page file on disk

16

Memory is Often OS-Bound

32-bit systems have 4GB of addressable process memory

– Part of it is reserved by the OS, leaving the application < 4GB

Windows XP (default): 2GB

Windows XP with /3gb switch: 2GB + 1GB

Linux/UNIX/Mac: ~3GB

64-bit systems allow 8TB of addressable memory

In MATLAB, the data of each array must be stored in one contiguous block of memory

17

What is the largest array you can create in MATLAB on 32 bit Windows XP (bytes)?

a) 0.5 GB

b) 1.0 GB

c) 1.5 GB

d) 2.0 GB

e) 2.5 GB

[Diagram: memory addresses, divided between the operating system and the process]

18

What is the largest array you can create in MATLAB on 32 bit Windows XP (bytes)?

[Diagram: memory addresses, divided between the operating system and the process]

Operating System

• Windows 2000/XP/Vista

– Reserve 2 GB of addresses

– Configurable with /3gb switch

• Linux and Mac OSX

– Typically 1 GB of addresses

– Some customization possible

Process

• 300 MB for Overhead

– Java Virtual Machine (JVM)

– Libraries

• 1.7 GB for Arrays

– Typically 1.5 GB Contiguous

– Remaining Fragmented

19

Contiguous Memory

Why do we need contiguous memory?

How much contiguous memory is available?

How can contiguous memory be controlled?

20

MATLAB Performance Technologies

Commercial libraries

– BLAS: Basic Linear Algebra Subprograms (multithreaded)

– LAPACK: Linear Algebra Package

– etc.

JIT/Accelerator

– Improves looping

– Generates on-the-fly multithreaded code

– Continually improving

21

Memory Fragmentation

Checked available memory

– >> memory (only for Windows)

Preallocate Arrays

– Preallocate large matrices first

Controlled contiguous memory with startup switch

– C:\> matlab -shield medium
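A minimal sketch of checking available memory from code rather than at the prompt. It assumes the structure returned by the Windows-only memory function, with its MaxPossibleArrayBytes and MemUsedMATLAB fields; on other platforms the sketch only prints a message.

```matlab
if ispc
    % memory is only available on Windows
    userview = memory;
    fprintf('Largest possible array: %.2f GB\n', ...
            userview.MaxPossibleArrayBytes / 2^30);
    fprintf('Memory used by MATLAB:  %.2f GB\n', ...
            userview.MemUsedMATLAB / 2^30);
else
    disp('The memory function is only available on Windows.');
end
```

Running this before and after preallocating a large matrix gives a rough view of how much contiguous space remains.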

23

MATLAB Data Storage Model

How does MATLAB store data?

How much overhead is there for arrays,

structures, and cell arrays?

When are data copies made?

24

How does MATLAB store data? Container overhead

d = [1 2] % Double array

dcell = {d} % Cell array containing “d”

dstruct.d = d % Structure containing “d”

whos

>> overhead.m

25

How does MATLAB store data? Container overhead

(header sizes in bytes)

d = [1 2]

– d: Header (60) + Data

dcell = {d}

– dcell: Header (60) + Cell Header (60) + Data

dstruct.d = d

– dstruct: Header (60) + Element Header (60) + Fieldname (64) + Data
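The container overhead can be observed with whos. The following is a sketch only: the header sizes quoted on the slide date from a 32-bit release, so the byte counts reported by a current release will likely differ, and no specific overhead values are assumed here.

```matlab
d = [1 2];          % double array: 2 elements x 8 bytes of data
dcell = {d};        % cell array containing d
dstruct.d = d;      % structure containing d

% Query the size MATLAB reports for each variable.
wd = whos('d');
wc = whos('dcell');
ws = whos('dstruct');

fprintf('double array: %d bytes\n', wd.bytes);
fprintf('cell array:   %d bytes (overhead %d)\n', wc.bytes, wc.bytes - wd.bytes);
fprintf('structure:    %d bytes (overhead %d)\n', ws.bytes, ws.bytes - wd.bytes);
```

The same 16 bytes of data cost more inside a cell array, and more again inside a structure because of the stored field name.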

26

How does MATLAB store data? Structures

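s.A = rand(4000,3200); % s.A: 100MB

s.B = rand(4000,3200); % s.B: 100MB (memory used: 200 MB)

sNew = s; % sNew shares the data of s, nothing is copied (200 MB)

s.A(1,1) = 17; % s.A is copied before the write (300 MB)

sNew.B(1,1) = 0; % sNew.B is copied before the write (400 MB)

sNew = s; % sNew shares the data of s again, its own copies are released (200 MB)

[Diagram: fields .A and .B of s and sNew, 100MB each, against a Memory Used axis from 100 MB to 400 MB]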

>> structmem2.m

27

When is data copied? Function calls

function y = foo(x,a,b)

a(1) = a(1) + 12;

y = a * x + b;

When does MATLAB copy memory upon calling a function?

y = foo(1:3,2,4)

– i.e., x = 1:3, a = 2, b = 4

Answer: only “a” is copied, because it is modified inside the function; “x” and “b” are only read

28

When is data copied? In-Place optimizations

When does MATLAB perform calculations “in-place”?

y = 2*x + 3; % NOT in-place

x = 2*x + 3; % in-place

Conditions for in-place:

• Output variable name same as input variable name

• Element-wise computation

>> inplaceEx.m

29

In-place Optimizations

What happens during “in-place” operations?

x = rand(5000);

y = rand(5000);

%% NOT In-Place

y = sin(sqrt(2*x.^5+3*x+4));

%% In-Place

x = sin(sqrt(2*x.^5+3*x+4));

30

Summary of Memory Usage in MATLAB

When is data copied?

– “Lazy” copy: only when necessary (copy on write)

– Never, if operation can be performed “in-place”

– “in-place” is faster because memory is not copied

How does MATLAB store data?

– Every array has overhead

– Structures and cell arrays are containers which can hold multiple arrays

32

Use Only the Precision You Need

Numerical data types

– Float: double and single precision (8 and 4 bytes)

– Integer: signed and unsigned (1, 2, 4, or 8 bytes)

Floating point for math (e.g. linear algebra)

Integers where appropriate (e.g. images)

>> datatypeEx.m

>> datatypeEx2_double.m

>> datatypeEx2_single.m
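As a rough illustration of the savings, the sketch below stores the same number of elements in three classes and compares the sizes reported by whos. The variable names and the element count are illustrative.

```matlab
n = 1000;                          % number of elements (illustrative)
xDouble = zeros(n, 1);             % 8 bytes per element
xSingle = zeros(n, 1, 'single');   % 4 bytes per element
xUint8  = zeros(n, 1, 'uint8');    % 1 byte per element

wDouble = whos('xDouble');
wSingle = whos('xSingle');
wUint8  = whos('xUint8');

fprintf('double: %d bytes\n', wDouble.bytes);
fprintf('single: %d bytes\n', wSingle.bytes);
fprintf('uint8:  %d bytes\n', wUint8.bytes);
```

For image data that fits in 0 to 255, uint8 needs one eighth of the memory of double.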

33

Sparse Matrices

Why use sparse?

– Less Memory

Store only the nonzero elements of the matrix and their indices

– Faster

Reduce computation time by eliminating operations on zero elements

When to use sparse?

– < 1/2 dense on 64-bit (double precision)

– < 2/3 dense on 32-bit (double precision)

– Sparse matrices often much sparser than cutoff limit
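The cutoffs above are consistent with each stored nonzero costing 8 bytes for the value plus one row index (8 bytes on 64-bit, 4 bytes on 32-bit), against 8 bytes for every element of a full matrix. The sketch below compares the two storage schemes for one matrix; the size and fill fraction are arbitrary choices for illustration.

```matlab
n = 1000;
density = 0.01;               % hypothetical fill fraction, well below the cutoff
S = sprand(n, n, density);    % random sparse matrix
F = full(S);                  % same values stored as a full matrix

wS = whos('S');
wF = whos('F');
ratio = wS.bytes / wF.bytes;

fprintf('full:   %d bytes\n', wF.bytes);
fprintf('sparse: %d bytes (%.1f%% of full)\n', wS.bytes, 100 * ratio);
```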

34

Using Sparse Matrices

Creation

– S = sparse(i,j,s,m,n)

– A = spdiags(B,d,m,n)

Structure and Efficiency

– Different storage convention from that of full matrices

– If matrix highly rectangular, use tall and skinny, not short and fat

Functions that support sparse matrices

– >> help sparfun

Blog Post: Creating Sparse Finite Element Matrices

– http://blogs.mathworks.com/loren/2007/03/01/creating-sparse-finite-element-matrices-in-matlab/
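A short sketch of the two creation routes listed above, building the same tridiagonal matrix both ways. The matrix itself is an illustrative choice, not an example from the talk.

```matlab
n = 5;
e = ones(n, 1);

% Route 1: spdiags places the columns of B on the diagonals -1, 0 and 1.
A = spdiags([-e 2*e -e], -1:1, n, n);

% Route 2: sparse takes row indices, column indices and values as triplets.
i = [1:n, 2:n,   1:n-1];                            % rows
j = [1:n, 1:n-1, 2:n  ];                            % columns
s = [2*ones(1, n), -ones(1, n-1), -ones(1, n-1)];   % values
S = sparse(i, j, s, n, n);

disp(full(S));
```

Building the triplets first and calling sparse once avoids repeatedly inserting elements into an existing sparse matrix.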

35

Copy and Create Only What You Need

Share data between functions

– Nested Functions

– Global (but sparingly)

Understand vectorization tradeoffs

– Limit creating intermediate matrices

Use bsxfun

– Reduce size of array to scalars or smaller blocks

Process with for loops, de-vectorize

Slower runtime but less memory

>> bsxfunEx.m

>> vectorizeEx.m
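One way to limit intermediate matrices is shown in the sketch below, which removes the column means from a matrix. The repmat version first builds a temporary matrix the same size as A; bsxfun expands the row vector without creating it. The data and variable names are illustrative.

```matlab
A = rand(4, 3);
mu = mean(A, 1);                 % 1-by-3 row of column means

% With repmat: creates a temporary 4-by-3 copy of the means.
centeredRepmat = A - repmat(mu, size(A, 1), 1);

% With bsxfun: mu is expanded along the first dimension on the fly.
centeredBsx = bsxfun(@minus, A, mu);

maxDiff = max(abs(centeredRepmat(:) - centeredBsx(:)));
fprintf('Largest difference between the two results: %g\n', maxDiff);
```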

36

Plot Only What You Need

Every plot independently stores x and y data

>> x = rand(125e4,1); %10MB

>> plot(x) ; %20MB for x and y data

Integers are plotted as doubles

Strategies

– Downsample or resample your data prior to plotting

Built-in functions for resampling your data (e.g. interp1)

imresize from the Image Processing Toolbox for images

– Divide your data into regular intervals and plot values of interest

(e.g. open and close for stock prices, or min/max values)
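The last strategy can be sketched as follows: split the signal into regular intervals and keep only the minimum and maximum of each. The interval length is a hypothetical choice; samples that do not fill a complete interval are dropped in this sketch.

```matlab
x = rand(125e4, 1);                 % same 10MB signal as above
blockLen = 1000;                    % hypothetical interval length
nBlocks = floor(numel(x) / blockLen);

% One column per interval, then reduce each column to two numbers.
xb = reshape(x(1:nBlocks * blockLen), blockLen, nBlocks);
lo = min(xb, [], 1);
hi = max(xb, [], 1);

fprintf('Reduced %d samples to %d min/max pairs\n', numel(x), nBlocks);
```

Calling plot(1:nBlocks, [lo; hi]) then draws the envelope from 2,500 values instead of 1.25 million.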

37

Load Only the Data You Need

ASCII file

– textscan(…)

– Selectively choose columns to load or ignore

– Selectively choose rows to load (i.e. block processing)

Binary file

– memmapfile(…)

– Read and write directly to/from file on disk

– MATLAB can access files on disk in the same way it accesses dynamic memory, by overlaying address space directly onto the file

– MATLAB dynamically shifts address space to handle larger files

e.g. >1.5 GB files on 32 bit Windows can be accessed
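The sketch below illustrates both approaches on small temporary files that it creates itself. The file contents, block size and variable names are illustrative. With textscan, a %*f field skips a column and the repeat count limits how many rows are read per call; with memmapfile, only the indexed part of the file is read.

```matlab
% --- ASCII: read columns 1 and 3 only, four rows at a time ---
txtName = [tempname '.txt'];
fid = fopen(txtName, 'w');
fprintf(fid, '%d,%f,%f\n', [1:6; (1:6) * 0.5; (1:6) * 2]);
fclose(fid);

blockRows = 4;                      % hypothetical block size
total = 0;
fid = fopen(txtName, 'r');
while ~feof(fid)
    C = textscan(fid, '%f %*f %f', blockRows, 'Delimiter', ',');
    total = total + sum(C{2});      % C{2} holds the third file column
end
fclose(fid);
delete(txtName);

% --- Binary: map the file and index into it like an array ---
binName = [tempname '.bin'];
fid = fopen(binName, 'w');
fwrite(fid, 1:10, 'double');
fclose(fid);

m = memmapfile(binName, 'Format', 'double');
firstThree = m.Data(1:3);           % only this part is read from disk
clear m                             % release the map before deleting the file
delete(binName);
```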

38

MATLAB is best at batch processing…

Load the entire file and process it all at once

[Diagram labels: Source, Memory, Batch Processing Algorithm, Memory]

… but stream processing is better for some algorithms

Load a frame and process it before moving on to the next frame

[Diagram labels: Stream Source, MATLAB, Stream Processing]
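A minimal sketch of stream processing, assuming the data sits in a binary file of doubles: each frame is read, reduced to a running result, and discarded before the next frame is loaded. The file is created by the sketch itself, and the frame length is a hypothetical choice.

```matlab
% Create a small test file (stands in for a large data source).
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, 1:1000, 'double');
fclose(fid);

frameLen = 128;                 % hypothetical number of samples per frame
runningSum = 0;
nRead = 0;

fid = fopen(fname, 'r');
while true
    frame = fread(fid, frameLen, 'double');   % load one frame
    if isempty(frame)
        break;                                % end of file reached
    end
    runningSum = runningSum + sum(frame);     % process it
    nRead = nRead + numel(frame);             % then move on
end
fclose(fid);
delete(fname);

fprintf('Mean of %d samples: %g\n', nRead, runningSum / nRead);
```

Peak memory is set by the frame length rather than by the file size.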

43

Example: Block Processing Images

Evaluate function at grid points

Reevaluate function over larger blocks

Compare the results

Evaluate code performance

>> blockAvg.m

>> blockAvgRedo1.m

>> blockAvgRedo2.m

44

Summary of Example

Used built-in timing functions

>> tic

>> toc

Used M-Lint to find suboptimal code

Preallocated arrays

Vectorized code
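The blockAvg scripts themselves are not reproduced in the slides. As a stand-in, the sketch below times a preallocated loop against a vectorized expression with tic and toc, using an arbitrary function chosen for illustration. Timings vary by machine and release, so none are quoted.

```matlab
n = 1e5;

% Loop version, with the output preallocated.
tic
y1 = zeros(n, 1);
for k = 1:n
    y1(k) = sin(k / n)^2;
end
tLoop = toc;

% Vectorized version: one expression over the whole index vector.
tic
idx = (1:n)';
y2 = sin(idx / n).^2;
tVec = toc;

fprintf('loop: %.4f s, vectorized: %.4f s\n', tLoop, tVec);
```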

45

Effect of Not Preallocating Memory

>> x = 4

>> x(2) = 7

>> x(3) = 12

[Diagram: memory map, addresses 0x0000 to 0x0038 in 8-byte steps, shown after each statement. The contents are 4, then 4 7 after x(2) = 7, then 4 7 12 after x(3) = 12; the existing values are copied each time the array grows.]

46

Benefit of Preallocation

>> x = zeros(3,1)

>> x(1) = 4

>> x(2) = 7

>> x(3) = 12

[Diagram: memory map, addresses 0x0000 to 0x0038 in 8-byte steps, shown after each statement. The preallocated block holds 0 0 0, then 4 0 0, then 4 7 0, then 4 7 12; the values are written into the same block each time.]
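The two slides above can be compared directly with a sketch like the following. The loop body and array length are illustrative; how large the gap is depends on the release and machine, so no timings are claimed.

```matlab
n = 2e4;

% Without preallocation: the array is resized on every iteration.
tic
grown = [];
for k = 1:n
    grown(k) = k^2;
end
tGrow = toc;

% With preallocation: the block is allocated once and filled in place.
tic
pre = zeros(1, n);
for k = 1:n
    pre(k) = k^2;
end
tPre = toc;

fprintf('growing: %.4f s, preallocated: %.4f s\n', tGrow, tPre);
```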

47

Data Storage of MATLAB Arrays

>> x = magic(3)

x =

8 1 6

3 5 7

4 9 2

[Diagram: memory map, addresses 0x0000 to 0x0068 in 8-byte steps. The elements are stored column by column: 8 3 4 1 5 9 6 7 2]

See the June 2007 article in “The MathWorks News and Notes”:

http://www.mathworks.com/company/newsletters/news_notes/june07/patterns.html
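The column-by-column layout can be checked from the prompt. In this small sketch, x(:) lists the elements in storage order:

```matlab
x = magic(3);

% x(:) returns the elements in the order they are stored in memory.
storageOrder = x(:)';
disp(storageOrder);     % 8 3 4 1 5 9 6 7 2
```

One consequence is that a loop whose inner index runs down a column touches neighboring memory locations.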

48

Indexing into MATLAB Arrays

Subscripted

– Access elements by rows and columns

Linear

– Access elements with a single number

Logical

– Access elements with logical operations or mask

Linear indexing (3-by-3 array):

1 4 7

2 5 8

3 6 9

Subscripted indexing (3-by-3 array):

1,1 1,2 1,3

2,1 2,2 2,3

3,1 3,2 3,3

Convert between the two with ind2sub and sub2ind

>> logicalIndex.m
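A short sketch of the three indexing styles applied to the same matrix. It is not the logicalIndex.m demo; the matrix and threshold are illustrative.

```matlab
x = magic(3);                       % [8 1 6; 3 5 7; 4 9 2]

% Subscripted: row 2, column 3.
viaSubscript = x(2, 3);

% Linear: convert the subscripts to a single index and back.
k = sub2ind(size(x), 2, 3);
viaLinear = x(k);
[r, c] = ind2sub(size(x), k);

% Logical: a mask selects every element greater than 5,
% returned in column-major order.
mask = x > 5;
viaLogical = x(mask);

fprintf('x(2,3) = %d, x(%d) = %d, subscripts back: (%d,%d)\n', ...
        viaSubscript, k, viaLinear, r, c);
```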

50

Other Best Practices for Performance

Minimize dynamically changing path

>> addpath(…)

>> fullfile(…)

instead of cd(…)

Use the functional load syntax

>> x = load('myvars.mat')

x =

a: 5

b: 'hello'

instead of load('myvars.mat')

Minimize changing variable class, e.g. avoid:

>> x = 1;

>> x = 'hello';
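A self-contained sketch of the functional load syntax. It writes its own temporary MAT-file in place of myvars.mat, and the variable names follow the slide.

```matlab
% Create a MAT-file holding the two variables from the slide.
matName = [tempname '.mat'];
a = 5;
b = 'hello';
save(matName, 'a', 'b');
clear a b

% Functional form: the variables arrive as fields of one structure
% instead of appearing in the workspace unannounced.
vars = load(matName);
delete(matName);

fprintf('a = %d, b = %s\n', vars.a, vars.b);
```

Because the loaded names are fields of vars, they cannot silently overwrite existing workspace variables.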

51

Summary

Techniques for addressing performance

– Vectorization

– Preallocation

Consider readability and maintainability

– Looping vs. matrix operations

– Subscripted vs. linear vs. logical

– etc.

53

Example: Fitting Data

Load data from multiple files

Extract a specific test

Fit a spline to the data

Write results to Microsoft Excel

>> testFit.m

>> testFitRedo1.m

>> testFitRedo2.m

>> testFitRedo3.m

54

Summary of Example

Used profiler to analyze code

Targeted significant bottlenecks

Reduced file I/O

Reused figure

55

Interpreting Profiler Results

Focus on top bottleneck

– Total number of function calls

– Time per function call

Functions

– All function calls have overhead

– MATLAB functions often take vectors or matrices as inputs

– Find the right function – performance may vary

Search MATLAB functions (e.g., textscan vs. textread)

Write a custom function (specific/dedicated functions may be faster)

Many shipping functions have viewable source code
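The call counts and times discussed above can also be read programmatically. This is a sketch that assumes the FunctionTable field of the structure returned by profile('info'); the workload is an arbitrary stand-in for real code.

```matlab
profile on
for k = 1:50
    A = magic(200);          % stand-in workload
    B = sortrows(A);
end
profile off

stats = profile('info');

% Rank the profiled functions by total time, largest first.
[~, order] = sort([stats.FunctionTable.TotalTime], 'descend');
top = stats.FunctionTable(order(1:min(5, numel(order))));

for k = 1:numel(top)
    fprintf('%-40s calls: %6d  total: %.4f s\n', ...
            top(k).FunctionName, top(k).NumCalls, top(k).TotalTime);
end
```

A function with a small time per call can still top the list if it is called many times.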

56

Classes of Bottlenecks

File I/O

– Disk is slow compared to RAM

– When possible, use load and save commands

Displaying output

– Creating new figures is expensive

– Writing to command window is slow

Computationally intensive

– Use what you’ve learned today

– Trade-off modularization, readability and performance

– Integrate other languages or additional hardware

e.g. emlmex, MEX, GPUs, FPGAs, clusters, etc.

57

Steps for Improving Performance

First focus on getting your code working

Then speed up the code within core MATLAB

Consider additional processing power

59

MathWorks Contact Information

For pricing, licensing, trials and general questions:

Tim Mathieu

Sr. Account Manager

Education Sales Department

Email: [email protected]

Phone: 508.647.7016

Customer Service: [email protected]

508.647.7000

Technical Support: [email protected]

508.647.7000