Dr. Alexandra Fedorova August 2007 Introduction to Systems Research at SFU.

CMPT 401 Summer 2007 © A. Fedorova

Introduction

• Systems: software systems, hardware systems, and the interaction between them

• New research area at SFU: before December 2006, no faculty members at SFU were doing systems research (not counting networking)

• Research opportunities at the undergraduate and graduate levels:
  – Undergraduate honours thesis
  – CMPT 415
  – Paid research assistantships
  – Master’s and Ph.D.

What is Systems Research?

• System – a collection of software and hardware components that accomplish a certain goal

• Usually this does not include applications, but includes system software:
  – The operating system
  – System libraries

• Systems research is concerned with building these components and structuring their interaction

Systems Research at SFU

• System software design for chip multithreading processors

• Computer architecture

• Distributed systems

System Software Design for Chip Multithreading Processors

• What is chip multithreading?

• Why is this research relevant?

• What research problems are we addressing?

Chip Multithreading (CMT)

• Conventional processor: one software thread runs on a chip at a given instant

[Figure: a chip with a Level-1 cache and a Level-2 cache]

• CMT processors: multiple threads run on the same chip simultaneously

CMT: The Dominant Architecture

• Most new processors are CMT:
  – Intel: 100% of new server processors and 90% of high-performance desktop processors are CMT by the end of 2007

• All major hardware vendors are in the CMT business:
  – Sun Microsystems Niagara (32 threads on the chip)
  – IBM Power4, Power5, Power6
  – Intel Hyper-threaded Xeon (servers, desktops)
  – Intel Core Duo (desktops and laptops)
  – Dell Quad core systems (2x Intel Dual-core processors)
  – AMD Quad core (coming up in Fall 2007)

Why CMT?

• Running one thread per chip is inefficient

• Due to the nature of modern applications, computational hardware is underutilized:
  – Modern applications spend 50-60% of their CPU time accessing memory
  – While memory is accessed, the CPU pipeline is stalled: it is idle, not doing anything useful
  – But while it is stalled, the CPU is still consuming power
  – So power is wasted with no benefit

• Idea behind CMT: while one thread stalls the pipeline, let another thread use it
  – Sort of like overlapping I/O and computation, but at the micro level
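The idea can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a model of any real processor: each thread is assumed to alternate between a fixed burst of compute cycles and a fixed memory stall (the hypothetical parameters `compute` and `stall`), and the pipeline is assumed to switch to any ready thread at no cost.

```python
def pipeline_utilization(num_threads, compute=2, stall=3, cycles=10_000):
    """Fraction of cycles in which the pipeline executes an instruction.

    Toy model (an assumption, not a real CPU): every thread runs `compute`
    cycles of arithmetic, then waits `stall` cycles for memory, forever.
    In each cycle the pipeline runs the first ready thread, if any.
    """
    remaining = [compute] * num_threads  # compute cycles left in the current burst
    ready_at = [0] * num_threads         # first cycle at which the thread may run
    busy = 0
    for now in range(cycles):
        for t in range(num_threads):
            if ready_at[t] <= now:
                remaining[t] -= 1
                busy += 1
                if remaining[t] == 0:
                    # Burst finished: the thread accesses memory and stalls.
                    remaining[t] = compute
                    ready_at[t] = now + 1 + stall
                break  # one instruction per cycle
    return busy / cycles


for n in (1, 2, 3):
    print(n, "thread(s):", pipeline_utilization(n))
```

With the assumed 2 compute cycles per 3 stall cycles (60% of time spent on memory), a single thread keeps the pipeline busy 40% of the time and two threads keep it busy 80% of the time.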

CMT: More Efficient CPU Utilization

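[Figure: timeline (x-axis: time) of thread 0 and thread 1 sharing one pipeline. One thread issues 1: add, 2: subtract, 3: load data from cache, 4: load data from memory; the other issues 1: load data from memory, 2: add, 3: subtract, 4: add. Regions are labelled "stall the pipeline" and "pipeline is busy".]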

How to Enable CMT?

• How to enable running multiple threads on the same chip?
  – Hardware multithreading
  – Multicore processing
  – Combination of the two

Hardware Multithreading

• Run at least two threads on the same processing core

• Some hardware is duplicated, some is shared

• Shared hardware:
  – Pipeline: i.e., functional units, register files, queues
  – Caches: Level-1 (L1) instruction and data caches, Level-2 (L2) unified cache
  – Interconnects

• Multithreaded processors:
  – Intel Hyper-threaded Xeon
  – IBM Power5, Power6, Cell
  – Sun Microsystems Niagara

[Figure: a chip with a Level-1 cache and a Level-2 cache]

Multicore Processing

• Multiple processing cores on the same chip

• Threads share the L2 cache (and other lower-level caches), and interconnects

• Multicore processors:
  – Intel Core Duo
  – AMD Quad Core
  – IBM Power4, 5, 6
  – Sun Microsystems Niagara

[Figure: a chip with two L1 caches and one L2 cache]

Multicore + Multithreading

• A multicore processor

• Each core is multithreaded

• Multicore and multithreaded processors:
  – Sun Microsystems Niagara
  – IBM Power5, Power6

[Figure: a chip with two L1 caches and one L2 cache]
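To make the combined layout concrete, the sketch below enumerates the hardware contexts (places where a software thread can run) on one chip and reports the closest cache that two contexts share. It assumes the simplified layout from the slides, one L1 cache per core and one L2 cache per chip; the names `HardwareContext`, `hardware_contexts`, and `closest_shared_cache` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class HardwareContext:
    core: int    # index of the core on the chip
    thread: int  # index of the hardware thread within that core


def hardware_contexts(num_cores, threads_per_core):
    """All (core, thread) slots on a multicore, multithreaded chip."""
    return [HardwareContext(c, t)
            for c in range(num_cores)
            for t in range(threads_per_core)]


def closest_shared_cache(a, b):
    """Assumed layout: L1 is private to a core, L2 is shared by the chip."""
    return "L1" if a.core == b.core else "L2"


chip = hardware_contexts(num_cores=2, threads_per_core=2)
for a, b in combinations(chip, 2):
    print(a, b, "share", closest_shared_cache(a, b))
```

The total number of contexts is simply cores times threads per core: for example, 8 cores with 4 threads each gives 32 contexts, the thread count quoted earlier for Niagara.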

Research on CMT Processors

• Computer architecture research:
  – How to design a CMT processor to achieve a good combination of CPU utilization, application performance, and power efficiency

• System software research:
  – How to design system software, i.e., the operating system, that enables applications to perform well on these processors?

OS Design for CMT Processors

• Operating systems are traditionally responsible for the allocation of hardware resources

• On CMT processors, on-chip resources are shared among threads that run simultaneously

• How you allocate those resources among threads determines the performance that those threads will achieve

• Let’s look at a few examples…

Constructing Optimal Co-schedules

[Figure: a chip with two L1 caches and one L2 cache]

• Blue suffers when it does not have enough L1 cache

• Red uses lots of L1 cache

• Green does not use much L1 cache

• Yellow does not suffer when it does not have much L1 cache
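If each thread's cache behaviour were known, choosing a co-schedule for four threads on two L1 caches would reduce to comparing the three possible pairings. The sketch below does this with a made-up cost model: the `demand` and `sensitivity` numbers and the `pair_cost` formula are assumptions for illustration only, picked to match the qualitative descriptions above, not measurements.

```python
# Hypothetical per-thread numbers in [0, 1]:
#   demand      = how much L1 cache the thread uses
#   sensitivity = how much it suffers when short of L1 cache
THREADS = {
    "blue":   {"demand": 0.5, "sensitivity": 0.9},
    "red":    {"demand": 0.9, "sensitivity": 0.5},
    "green":  {"demand": 0.1, "sensitivity": 0.5},
    "yellow": {"demand": 0.5, "sensitivity": 0.1},
}


def pair_cost(a, b):
    """Assumed cost of sharing one L1: each thread suffers in proportion
    to its own sensitivity and its partner's cache demand."""
    ta, tb = THREADS[a], THREADS[b]
    return ta["sensitivity"] * tb["demand"] + tb["sensitivity"] * ta["demand"]


def pairings(names):
    """All ways to split four threads into two pairs (there are three)."""
    first, rest = names[0], names[1:]
    for partner in rest:
        others = tuple(n for n in rest if n != partner)
        yield (first, partner), others


def best_coschedule(names):
    return min(pairings(names), key=lambda p: sum(pair_cost(*pair) for pair in p))


print(best_coschedule(list(THREADS)))
```

Under these assumed numbers the cheapest pairing puts blue with green and red with yellow: the cache-sensitive thread shares with the light cache user, and the heavy cache user shares with the thread that tolerates a small cache.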

Constructing Optimal Co-schedules (cont.)

• How do we find out applications’ cache behaviour?
  – It turns out you need to consider memory access patterns, which are not trivial to measure

• How do you model interactions among applications?
  – How do you know if one application’s cache usage patterns are incompatible with another’s?

• These patterns/relationships cannot be measured directly

• Can they be modeled?
  – Simple models are inaccurate
  – Complex models are too inefficient to use inside an operating system scheduler

• My group’s approach: use learning methods and feedback-directed scheduling
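One generic way to picture feedback-directed scheduling is a loop that tries candidate co-schedules, observes how well each one performs, and increasingly favours the best one seen so far. The sketch below is an epsilon-greedy loop written for illustration; it is not the group's actual algorithm. The function `simulated_measure` and the numbers in `TRUE_THROUGHPUT` are hypothetical stand-ins for reading hardware performance counters.

```python
import random

# Hypothetical "ground truth" that the scheduler cannot see directly.
TRUE_THROUGHPUT = {
    "blue+green | red+yellow": 1.6,
    "blue+yellow | red+green": 1.3,
    "blue+red | green+yellow": 1.0,
}


def simulated_measure(schedule, rng):
    """Stand-in for a noisy performance-counter reading."""
    return TRUE_THROUGHPUT[schedule] + rng.uniform(-0.1, 0.1)


def feedback_schedule(candidates, measure, rounds=300, explore=0.1, seed=0):
    """Epsilon-greedy choice among co-schedules, driven by measurements."""
    rng = random.Random(seed)
    avg = {c: 0.0 for c in candidates}   # running mean of observations
    count = {c: 0 for c in candidates}
    for _ in range(rounds):
        untried = [c for c in candidates if count[c] == 0]
        if untried:
            choice = untried[0]               # try everything once
        elif rng.random() < explore:
            choice = rng.choice(candidates)   # occasional exploration
        else:
            choice = max(candidates, key=avg.get)  # exploit best so far
        observed = measure(choice, rng)
        count[choice] += 1
        avg[choice] += (observed - avg[choice]) / count[choice]
    return max(candidates, key=avg.get)


print(feedback_schedule(list(TRUE_THROUGHPUT), simulated_measure))
```

The appeal of this style is that the scheduler needs no explicit cache model: it only needs a cheap performance signal and enough scheduling intervals to compare candidates.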

Heterogeneous Multicore Systems

• One size does not fit all:
  – Application class A runs best on a core with feature set X
  – Application class B runs best on a core with feature set Y

• Rather than designing a homogeneous multicore system that attempts to satisfy everyone but satisfies no one, design a heterogeneous multicore system (HMC)

[Figure: a chip with two L1 caches and one L2 cache]

Scheduling On HMC Systems

[Figure: a chip with Core 1 and Core 2, two L1 caches, and one L2 cache. Thread set A wants to run on Core 1; thread set B wants to run on Core 2.]

Scheduling On HMC Systems (cont.)

• If you schedule all threads in Set A on their preferred core, those threads will suffer from:
  – Low amount of CPU time
  – High response time

• This is because there is high demand for that core, and the threads would have to share it with others

• So you might want to schedule threads on their non-preferred core once in a while

• How do you balance performance, fair CPU allocation, and good response time?
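The trade-off can be sketched with a simple greedy placement rule. Everything in it is an assumption made for illustration: a thread on its non-preferred core is taken to run `penalty` times slower, sharing a core with k other threads is taken to divide its CPU time by k + 1, and each thread is placed on whichever core gives it the lowest estimated slowdown at the time it arrives.

```python
def slowdown(core, preferred, load, penalty):
    """Estimated slowdown if one more thread is placed on `core`.

    Assumed model: CPU time is split evenly among threads on a core, and
    running on a non-preferred core costs an extra factor of `penalty`.
    """
    mismatch = 1.0 if core == preferred else penalty
    return (load[core] + 1) * mismatch


def place_threads(threads, cores=(1, 2), penalty=1.5):
    """threads: list of (name, preferred_core). Returns {name: core}."""
    load = {c: 0 for c in cores}
    placement = {}
    for name, preferred in threads:
        core = min(cores, key=lambda c: slowdown(c, preferred, load, penalty))
        load[core] += 1
        placement[name] = core
    return placement


# Three Set A threads prefer Core 1; one Set B thread prefers Core 2.
demo = [("a1", 1), ("a2", 1), ("a3", 1), ("b1", 2)]
print(place_threads(demo))
```

With these assumed numbers, one Set A thread ends up on Core 2 because a crowded preferred core is worse for it than an idle non-preferred one. This static sketch does not rotate threads over time, so it says nothing yet about fairness among the Set A threads or about response time.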

Summary

• CMT systems are new and cool, yet prevalent enough for people to care about them

• Companies are desperate to hire students with experience on CMT systems

• If you are thinking about an academic career: this is a new and hot research area
  – Many problems
  – Many opportunities to publish

• Talk to me if you are interested in research opportunities

• Tell your friends who might be interested