Fall-12: Early Adoption of NSF/TCPP PDC...



Fall-12: Early Adoption of NSF/TCPP PDC Curriculum

Exploring Computer Vision and Image Processing Algorithms in Teaching Parallel Programming

Professor Dan Connors - [email protected]

Department of Electrical Engineering, University of Colorado Denver

Introduction

•  Multicore processors and GPUs (Graphics Processing Units) are universally available to student programmers

•  Need to prepare students for the future of parallel programming

•  Substantial programmer burden in developing optimized implementations for current parallel programming languages and architecture models

•  Effective parallel programming requires knowledge of parallel computing principles and advanced architecture concepts

•  Requires a staged approach

Curriculum Design


Computer Vision Algorithms

•  Computer vision includes methods for acquiring, processing, analyzing, and understanding images and, in general, high-dimensional data from the real world in order to produce numerical or symbolic information.

•  There is definite interest in ways that computer vision will impact society such as assisted driving, augmented reality, biometric search, and medical image analysis.

•  Some of the core foundational algorithms related to parallel computing concepts that have OpenCL/CUDA assignments:

•  Scale-invariant feature transform (or SIFT) is an algorithm in computer vision to detect and describe local features in images. The algorithm was published by David Lowe in 1999.

•  Applications include object recognition, robotic mapping and navigation, image stitching, 3D modeling, gesture recognition, and video tracking.

•  Object Detection: Histogram generation and similarity matching

•  Student project extensions of histogram generation: GPU-accelerated thumbnail-based image and video mosaics

•  Edge Detection: Sobel Filter

•  Parallel K-Means Clustering and KNN Classification

•  Glyph Recognition and Matching
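To make the data-parallel character of these algorithms concrete, here is a minimal sequential sketch of the Sobel filter from the list above. It is written in plain Python rather than OpenCL/CUDA and is not the course's assignment code; the names `sobel_pixel`, `sobel`, and the tiny `step_edge` image are hypothetical. The point is that each output pixel depends only on its own 3x3 input neighborhood, so a GPU version could, under that assumption, assign one thread per pixel.

```python
import math

# Standard 3x3 Sobel kernels for the horizontal and vertical gradients.
GX = ((-1, 0, 1),
      (-2, 0, 2),
      (-1, 0, 1))
GY = ((-1, -2, -1),
      (0, 0, 0),
      (1, 2, 1))


def sobel_pixel(image, row, col):
    """Gradient magnitude at one interior pixel (the per-thread work)."""
    gx = gy = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            value = image[row + dr][col + dc]
            gx += GX[dr + 1][dc + 1] * value
            gy += GY[dr + 1][dc + 1] * value
    return math.hypot(gx, gy)


def sobel(image):
    """Apply the filter to all interior pixels; border pixels stay 0."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    # No iteration reads a value written by another iteration.
    for row in range(1, height - 1):
        for col in range(1, width - 1):
            out[row][col] = sobel_pixel(image, row, col)
    return out


# Hypothetical 4x5 image with a vertical step edge between columns 1 and 2.
step_edge = [[0, 0, 10, 10, 10] for _ in range(4)]
edges = sobel(step_edge)
```

In this toy image the response is nonzero only at the interior pixels next to the step and zero in the flat region, which is the behavior an edge detector should show.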

NVIDIA CUDA GPU SIFT Assignment

CUDA programming of SIFT matching explores both parallel programming models and architecture concepts

•  Architecture: DRAM latency, shared memory, bank conflicts

•  Programming: data-parallel thread execution model, synchronization
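The structure of the matching step can be sketched as follows. This is a sequential Python illustration, not the assignment's CUDA code: it assumes brute-force nearest-neighbor matching with a distance-ratio test, a ratio threshold of 0.8, and toy 4-element descriptors (real SIFT descriptors have 128 elements). The function names are hypothetical. The work done for each query keypoint is independent of every other query keypoint, which is what makes the data-parallel thread model a natural fit.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_one(query, references, ratio=0.8):
    """Index of the best reference for one query descriptor, or None.

    A match is accepted only if the nearest neighbor is clearly closer
    than the second nearest (ratio test on Euclidean distances).
    """
    best_idx, best, second = None, float("inf"), float("inf")
    for idx, ref in enumerate(references):
        d = squared_distance(query, ref)
        if d < best:
            # Old best becomes the new second best.
            best_idx, second, best = idx, best, d
        elif d < second:
            second = d
    # Distances are squared, so the ratio is squared as well.
    if best < (ratio ** 2) * second:
        return best_idx
    return None


def match_all(queries, references, ratio=0.8):
    """Sequential loop over queries; iterations share no state."""
    return [match_one(q, references, ratio) for q in queries]


# Toy keypoint descriptors standing in for the KP1 and KP2 files.
kp1 = [[0, 0, 0, 0], [10, 10, 10, 10], [5, 5, 5, 5]]
kp2 = [[0, 0, 0, 1], [10, 10, 9, 10], [0, 0, 10, 10], [10, 10, 0, 0]]
matches = match_all(kp1, kp2)
```

Here the first two queries have one clearly closest reference and are matched, while the third is equally close to two references and is rejected by the ratio test. In a GPU version, `match_one` would roughly correspond to the work of one thread, and the repeated reads of the reference descriptors are where DRAM latency and shared memory come into play.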

Summary and Future Work

Digital Foundation (8 hours)

•  ELEC 1201-1 Intro to Electrical Engineering

•  ELEC 1510-3 Logic Design

•  ELEC 1520-3 Embedded Systems I

•  ELEC 2531-1 Logic Laboratory

Digital Core (6 hours)

•  ELEC 2520-3 Embedded Systems II

•  ELEC 3651-3 Digital Hardware Design

Digital Specialty (7 hours)

•  ELEC 4501-3 Microprocessor-Based Design (MBD)

•  ELEC 4521-1 MBD Lab

•  ELEC 4511-3 Hardware-Software Interface Design (HW-SW)

•  ELEC 4561-1 HW-SW Lab

•  ELEC 4723-3 Computer Architecture

•  ELEC 4727-3 Computer Vision Acceleration

NSF/TCPP PDC Curriculum Early Adopter Initiative

•  GPUs represent a highlight of the research work being done at the University of Colorado Denver in computer engineering and embedded systems

•  Seek ways to integrate emerging research tools into courses

•  Provide demonstration of research concepts in GPGPU computing that advance the state of the art in multiple disciplines:

•  Mobile and unmanned aerial vehicle computer vision

•  Acceleration of neurobiology modeling and neuron simulation

ELEC 4727 Computer Vision Acceleration with MC/GPU

Here is an example of the expected results for comparing keypoints. This measures the execution time when the keypoint file KP1 and KP2 is compared to

[Figure: Performance Comparison of SIFT Matching Algorithms. Y-axis: Seconds (0 to 90). X-axis: Total Keypoints (KP1 + KP2), 2000 to 20000. Series: Sequential, CUDA.]

•  The NVIDIA Corporation funded our program with a CUDA Center of Teaching Excellence grant

•  The grant funded an additional Teaching Assistant (TA) for one semester

•  The NSF/TCPP Curriculum Initiative granted an Early Adopter Award (2012) for developing core curriculum for CS/CE undergraduates related to parallel and distributed computing (PDC) topics

•  Performance for evaluating full SIFT matches with varying keypoint size on two GPUs: NVIDIA GTX 280 and NVIDIA GTX 480 (with 240 and 480 CUDA cores, respectively).

•  The performance difference between the GTX 280 and GTX 480 is clearly visible, which opens a discussion of how doubling the number of transistors (Moore’s law) relates to doubling performance in the architecture model.

•  A new special topics (elective) course with focus on hands-on experience and applications of GPU architectures

•  GPU architectures, CUDA programming, and computer vision algorithms with OpenCV

•  Enrollment: 23 undergraduates, 13 graduates in spring 2013

•  NSF/TCPP PDC curriculum being adopted at University of Colorado Denver

•  Computer vision serves to promote interest in parallel/distributed computing education and computer engineering

•  Critical to provide real-world applications of parallel programming and ways for students to explore new concepts on their own

•  OpenCV provides advantage

Integrated Research

•  The primary focus of our approach is to motivate the area of computer engineering by exploring foundational algorithms and their implementation on parallel computing systems in four strategic courses:

•  ELEC 1520 - Embedded Systems I: Intro to C Programming

•  ELEC 2520 - Embedded Systems II: Microcontroller Systems

•  ELEC 4723 - Advanced Computer Architecture

•  ELEC 4727 - Computer Vision Acceleration with GPU & Multicore Processors

•  Goal is to integrate parallel concepts early and consistently throughout curriculum

•  Course modifications

•  Demonstration of new technologies and architectures

•  CUDA/OpenCL Application Programming Interfaces (APIs)

•  Parallel programming examples of core algorithms

•  Performance comparison for evaluating SIFT matching for a range of keypoint file sizes for both the CPU and GPU models

•  This example helps demonstrate the massively parallel capabilities of the GPU as students gain insight into natural differences in performance models.

General Purpose GPU (GPGPU) Computing

•  Arithmetic intensity and application domain scaling
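As a rough worked illustration of arithmetic intensity, consider brute-force descriptor matching. The symbols below are assumptions introduced for this sketch and are not figures from the course: n_1 and n_2 are the keypoint counts of the two files, d is the descriptor length, and descriptors are assumed to be stored as 4-byte floats.

```latex
% Arithmetic intensity: operations performed per byte moved from memory.
I = \frac{\text{floating-point operations}}{\text{bytes transferred}}

% Each of the n_1 n_2 descriptor pairs costs about d subtractions,
% d multiplications, and d additions.
W \approx 3\, d\, n_1 n_2

% If every pair re-reads both descriptors from DRAM (2d floats, 4 bytes each):
I_{\text{naive}} \approx \frac{3\, d\, n_1 n_2}{8\, d\, n_1 n_2}
                 = \frac{3}{8}\ \text{FLOP/byte}
```

Under these assumptions the naive version is limited by memory traffic rather than arithmetic. Reusing each loaded descriptor across many threads, for example by staging it in shared memory, raises I without changing W.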

Acknowledgements

•  Students Kyle Dunn and Jeff Wiencrot helped develop OpenCV-based computer vision examples and helped build course support for the CUDA computing environment

Approach

Goals

•  Early Stage

•  Enable students to detect and describe coding and computation cases with inherent parallelism

•  Later Stage

•  Observe students selecting GPU parallel solutions for semester-end projects and senior projects in computationally-intensive scenarios
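A small example of what detecting inherent parallelism can look like in an early course is sketched below. It is an illustrative Python sketch with hypothetical function names, not course material. The first loop has independent iterations, so visiting them in any order gives the same result; the second has a loop-carried dependency and cannot be split across threads without switching to a different algorithm such as a parallel scan.

```python
def scale_pixels(pixels, gain):
    """Independent iterations: out[i] depends only on pixels[i]."""
    out = [0] * len(pixels)
    for i in range(len(pixels)):
        out[i] = min(255, pixels[i] * gain)
    return out


def scale_pixels_any_order(pixels, gain, order):
    """Same computation with iterations visited in an arbitrary order."""
    out = [0] * len(pixels)
    for i in order:
        out[i] = min(255, pixels[i] * gain)
    return out


def running_sum(values):
    """Loop-carried dependency: each result needs the previous total."""
    out = []
    total = 0
    for v in values:
        total += v
        out.append(total)
    return out


pixels = [10, 200, 30, 90]
forward = scale_pixels(pixels, 2)
shuffled = scale_pixels_any_order(pixels, 2, order=[3, 1, 0, 2])
prefix = running_sum([1, 2, 3, 4])
```

Because `forward` and `shuffled` agree, the scaling loop is a candidate for one-thread-per-element execution, while `running_sum` is not in its current form.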

•  The focus of our approach is to integrate GPU-related programming concepts in distinct phases within multiple courses in the curriculum

•  Adopt real applications/real systems as the learning motivation and use them in teaching related topics

•  Provide project-based experiences and opportunities

1st Year: Motivate Parallelism

•  Expose students to high-level concepts of GPU parallelism

•  Relate parallelism to performance

•  Real-world application domains & demonstrations

2nd Year: Code Parallelism

•  Students deploy APIs and GPU templates

•  Understand the scenarios for deploying GPGPU codes

3rd Year: Optimize Parallelism

•  Students advance their understanding of the GPU model

•  Overcome performance bottlenecks with knowledge of computer organization concepts

4th Year: Explore Parallelism

•  Students independently leverage GPU systems

•  Investigate open-ended projects with GPU acceleration