
Where you are the controller

Krishna Kumar, Sr. Developer Evangelist

Started as a $30,000 prototype

Vision: Shift the world from thinking “We need to understand technology” to “Technology needs to understand us”

Option A:

Why Kinect?


Option You:

What is Kinect?


An extraordinary new way to play, where you are the controller

Voice Recognition

Face Recognition

You Recognition

Gesture Recognition

“Xbox?!”

Kinect knows what to do!

“Let’s Play!”

“What are those things?”

① ③ 3D Depth Sensors
• Projected invisible IR pattern
• Depth computation
• Depth map

② RGB Camera

Multi-array Microphone

Motorized Tilt

• Combination of RGB camera, depth sensor and multi-array microphone
– RGB camera delivers three basic color components
– Depth sensor “sees” the room in 3-D
– Microphone locates voices by sound and separates them from ambient noise
• Software makes all the magic possible
– Skeletal Tracking
– Face, Gesture Recognition
– Audio Echo Cancellation
– Audio Beam Forming
– Speech Recognition

© 2010 Microsoft Corporation. All rights reserved.

Scope of Microsoft Research

• Significant Investment
– Investing > $9B in R&D (MSR & product dev)
– Staff of over 850 in 55 research areas
• International Research lab locations:
– Redmond, Washington (Sept, 1991)
– San Francisco, California (1995)
– Cambridge, United Kingdom (July, 1997)
– Beijing, People’s Republic of China (Nov, 1998)
– Mountain View, California (July, 2001)
– Bangalore, India (January, 2005)
– Cambridge, Massachusetts (February, 2008)

Turning ideas into reality.

research.microsoft.com


Scope of Microsoft Research: Research Areas


“Xbox?!” “Let’s Play!”

How does Kinect know what I do?

J. Shotton, J. Winn, C. Rother, A. Criminisi. TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation. European Conference on Computer Vision, 2006

Microsoft Research: Object Recognition

Microsoft Research: Human Body Tracking
• Wide range of motion
• But limited agility
• And not real-time
• Infinite number of movements

R. Navaratnam, A. Fitzgibbon, R. Cipolla. The Joint Manifold Model for Semi-supervised Multi-valued Regression. IEEE Intl Conf on Computer Vision, 2007

Xbox calls MSR: September 2008
“We need a body tracker with
• All body motions…
• All agilities…
• 10x Real-time…
• For multiple players…
• … and it has to be 3D”

MSR’s response?

Teach the Computer / Machine Learning
Step 1: Collect A LOT of Data

Teams visit households across the globe, filming real users

Hollywood motion capture studio generates billions of CG images

Training Data

Training

• Millions of training images -> millions of classifier parameters
• Very far from “embarrassingly parallel”
• New algorithm for distributed decision-tree training
• Major use of DryadLINQ (available for download)

M. Isard, Y. Yu. Distributed Data-Parallel Computing Using a High-Level Programming Language. International Conference on Management of Data (SIGMOD), July 2009


Recognize Joint Angles
• Classify each pixel’s probability of being each of 32 body parts
• Determine probabilistic cluster of body configurations consistent with those parts
• Present the most probable to the user
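The three steps above can be illustrated with a deliberately simplified sketch. It assumes each depth pixel already carries a probability per body part (the classifier itself is not shown), and it replaces the probabilistic clustering step with a plain probability-weighted centroid per part. The function name `propose_joints` and the input layout are hypothetical, chosen only for illustration; this is not the algorithm Kinect ships with.

```python
from collections import defaultdict


def propose_joints(pixels):
    """Turn per-pixel body-part probabilities into one position per part.

    pixels: iterable of (x, y, depth_mm, probs), where probs maps a
    body-part label to the probability that this pixel belongs to it.
    Returns {part: (x, y, depth_mm)} as a probability-weighted centroid.
    This is a stand-in for the clustering step, not the real method.
    """
    # Per part: weighted sum of x, y, depth, plus the total weight.
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for x, y, depth_mm, probs in pixels:
        # Step 1: pick the most probable body part for this pixel.
        part, p = max(probs.items(), key=lambda kv: kv[1])
        acc = sums[part]
        acc[0] += p * x
        acc[1] += p * y
        acc[2] += p * depth_mm
        acc[3] += p
    # Steps 2 and 3 (simplified): one weighted centroid per part.
    return {
        part: (sx / w, sy / w, sz / w)
        for part, (sx, sy, sz, w) in sums.items()
        if w > 0
    }
```

A real tracker would keep several candidate body configurations and score them against each other; the centroid here only shows how per-pixel labels can be reduced to joint positions.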

Programmers View

A Platform is Born

Consumer Technologies Push The Envelope


Price: $6000

Price: $150

Play Space

Field of View and Operational Area

• Play Space: Ideally need 12ft x 12ft of play space though you can make do with 10ft x 10ft

• Player Position: Ideally is 6-10 feet away from camera

Lighting and Environment

• Fluorescent or LED lighting is recommended
• No direct light on player
• No direct light into sensor lens
• In a stage environment, all lights need to be infrared-filtered
• To avoid lighting noise, do not intersect sensor lens fields of view
• Avoid playing in/next to reflective surfaces

Clothing Considerations

• Avoid anything that conceals your arms or legs
• Avoid wearing flowing clothing such as scarves or long dresses and skirts
– Long skirts hide the legs and scarves are often mistaken for arms
• Avoid baggy jackets or overly baggy clothing
• Generally, anything that hides the human form should be removed for optimal game play
• If players with long hair are having difficulty playing, encourage them to pull their hair back and try playing again

Kinect with more than just games
Use your voice or a wave of your hand to:
• Video Kinect with others*
• Manage your media gallery
• Music with Last.fm*
• HD movies with Zune
• Get in the game with ESPN*

* with Xbox LIVE Gold membership

XBOX LIVE: More Ways to Connect with Family and Friends

VIDEO KINECT
• Connect with family and far away friends, all from the comfort of your living room with Xbox LIVE Video Chat
• Experience the ease and convenience of chat on the big screen with Kinect-enabled auto camera zoom and pan.

FAMILY CENTER
• Family Center makes it easy to manage multiple user accounts and edit privacy settings from a single location
• Ensure safe, secure fun for the whole family

SOCIAL NETWORKS
• Connect with friends, share photos and updates through Facebook and Twitter

ESPN: Home-field advantage in your living room
• Access over 3,500 live global events from ESPN3.com, including out-of-market programming plus fresh video clips from ESPN.com
• Enjoy features like HD programming and on-demand viewing, participate in polls, predictions and trivia.
• See what the Xbox LIVE community is watching and declare what team you’re rooting for
• With Kinect™ control the action right from your couch with just your voice or the wave of your hand
• Featured Content: NCAA Football, NCAA Basketball, College Bowl Games, NBA, MLB, Soccer, Golf and Tennis majors

Where can Kinect go?

• Air Guitar Hero?
• Shopping in 3D?
• Remote Replacement?
• Dance Instructor?
• Education?
• Personal Trainer?
• Physical Therapy?

“Xbox?”

The Kinect SDK

• Provides both Unmanaged and Managed API
– Unmanaged API – Concepts work in C++
– Managed API – Concepts work in both VB/C#
• Samples & documentation to get you started
• Assumes some programming experience
• http://research.microsoft.com/kinectsdk/

The Kinect Sensor

A hybrid device containing the following input devices:
• A color (RGB) camera
• A depth sensor
• A microphone array
• A tilt sensor

Play space control is done through a tilt motor
• Pitch +/- 27 degrees
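Because the tilt motor only covers +/- 27 degrees of pitch, application code usually wants to clamp a requested angle before handing it to the sensor. A minimal sketch follows; `clamp_tilt` is a hypothetical helper for illustration, not an SDK call.

```python
# Pitch limits taken from the slide: +/- 27 degrees.
MIN_TILT_DEGREES = -27
MAX_TILT_DEGREES = 27


def clamp_tilt(requested_degrees):
    """Return the requested pitch, limited to the motor's stated range."""
    return max(MIN_TILT_DEGREES, min(MAX_TILT_DEGREES, requested_degrees))
```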

RGB CAMERA

MULTI-ARRAY MIC

MOTORIZED TILT

3D DEPTH SENSORS

Kinect USB cable

The Innards


The Vision System

IR laser projector

IR camera

RGB camera

Kinect video output
• 30 Hz frame rate; 57° field of view
• 8-bit VGA RGB, 640 x 480
• 12-bit monochrome, 320 x 240


The Audio System

Input Stream (what the mic array hears)

Post-MEC (what APIs present)

MEC

Demo: Multichannel Echo Cancellation

The Kinect SDK

Provides access to:
• RGB feed
• Depth feed
• Skeletal Tracking capabilities
• Audio Beam data
• Speech Recognition

Data Streams
• Color stream at 640 x 480 resolution; 32 BPP
• Depth stream at 320 x 240 resolution; 16 BPP
• Skeletal joint positions
• Frame #s, timestamps, tilt sensor data
• Echo-canceled audio
• Higher level systems
– Speech recognition
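The stream formats above translate directly into buffer sizes. The sketch below works through the arithmetic; the 30 frames per second figure comes from the video output slide, and applying it to both streams is an assumption. Rows are assumed to be tightly packed with no padding.

```python
def frame_bytes(width, height, bits_per_pixel):
    """Size of one tightly packed frame in bytes."""
    return width * height * bits_per_pixel // 8


FRAMES_PER_SECOND = 30  # assumption: both streams run at the 30 Hz quoted for video output

COLOR_FRAME_BYTES = frame_bytes(640, 480, 32)  # 640 * 480 * 4 bytes
DEPTH_FRAME_BYTES = frame_bytes(320, 240, 16)  # 320 * 240 * 2 bytes


def bytes_per_second(frame_size, fps=FRAMES_PER_SECOND):
    """Raw data rate for a stream delivering fps frames of frame_size bytes."""
    return frame_size * fps
```

Under these assumptions a color frame is 1,228,800 bytes and a depth frame is 153,600 bytes, so the color stream dominates the raw data rate by a factor of eight.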

RGB Camera Fundamentals

Camera Data

RGB Stream Format
• Up to 640 x 480 resolution
• Up to 32 bits per pixel
• Data contained in ImageFrame.Image.Bits
• Array of bytes: public byte[] Bits;
• Array
– Starts at top left of image
– Moves left to right, then top to bottom

Stride

Stride - # of bytes from one row of pixels in memory to the next
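Stride matters because a row in memory can be longer than width times bytes per pixel when rows are padded. A minimal sketch of locating a pixel in a row-major byte array follows; `pixel_offset` and `read_pixel` are hypothetical helper names, and the order of the color channels inside each pixel is deliberately not assumed.

```python
def pixel_offset(x, y, stride, bytes_per_pixel):
    """Byte index of pixel (x, y) in a top-left-origin, row-major buffer.

    stride is the number of bytes from the start of one row to the
    start of the next, which may exceed width * bytes_per_pixel.
    """
    return y * stride + x * bytes_per_pixel


def read_pixel(bits, x, y, stride, bytes_per_pixel=4):
    """Return the raw bytes of pixel (x, y) without interpreting channels."""
    start = pixel_offset(x, y, stride, bytes_per_pixel)
    return bytes(bits[start:start + bytes_per_pixel])
```

For an unpadded 640 x 480 frame at 32 bits per pixel the stride is 2560 bytes, so pixel (3, 2) starts at byte 2 * 2560 + 3 * 4 = 5132.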

Demos::RGB Camera

Depth Camera Fundamentals

Camera Data

Depth Map Format
• 320 x 240 resolution
• 16 bits per pixel
– Upper 13 bits: depth in mm (800 mm to 4000 mm range)
– Lower 3 bits: segmentation mask
• Depth value 0 means unknown
– Shadows, low reflectivity, and high reflectivity are among the reasons
• Segmentation index
– 0 – no player
– 1 – skeleton 0
– 2 – skeleton 1
– …
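The 16-bit layout above can be unpacked with one shift and one mask. The sketch below follows the slide's description (upper 13 bits depth, lower 3 bits segmentation index, depth 0 meaning unknown); the function names are hypothetical.

```python
SEGMENTATION_BITS = 3
SEGMENTATION_MASK = 0b111


def unpack_depth_pixel(value):
    """Split a 16-bit depth pixel into (depth_mm, segmentation_index)."""
    depth_mm = value >> SEGMENTATION_BITS
    segmentation_index = value & SEGMENTATION_MASK
    return depth_mm, segmentation_index


def interpret_depth_pixel(value):
    """Return (depth_mm or None, skeleton number or None).

    A depth of 0 means unknown, and segmentation index 0 means no
    player; index n (n >= 1) corresponds to skeleton n - 1.
    """
    depth_mm, segmentation_index = unpack_depth_pixel(value)
    depth = depth_mm if depth_mm != 0 else None
    skeleton = segmentation_index - 1 if segmentation_index != 0 else None
    return depth, skeleton
```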

Depth Byte Buffer

• ImageFrame.Image.Bits
• Array of bytes: public byte[] Bits;
• Array
– Starts at top left of image
– Moves left to right, then top to bottom
– Represents distance for pixel

Calculating Distance
• 2 bytes per pixel (16 bits)
• Depth – Distance per pixel
– Bitshift second byte left by 8
– Distance (0,0) = (int)(Bits[0] | Bits[1] << 8);
• DepthAndPlayerIndex – Includes player index
– Bitshift first byte right by 3 (drops the player index), second byte left by 5
– Distance (0,0) = (int)(Bits[0] >> 3 | Bits[1] << 5);
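The two expressions above read only pixel (0,0). A sketch that generalizes them to any pixel is shown below; it assumes two bytes per pixel stored low byte first (as the slide's shifts imply) and tightly packed rows. `depth_at` is a hypothetical helper, not an SDK method.

```python
def depth_at(bits, x, y, width=320, has_player_index=False):
    """Read the depth in mm at pixel (x, y) from a depth byte buffer.

    Returns (depth_mm, player_index); player_index is None for the
    plain Depth format. Assumes rows are tightly packed (no padding).
    """
    i = (y * width + x) * 2            # 2 bytes per pixel
    raw = bits[i] | (bits[i + 1] << 8)  # low byte first
    if has_player_index:
        # Same result as the slide's (Bits[0] >> 3 | Bits[1] << 5).
        return raw >> 3, raw & 0x07
    return raw, None
```

Combining the bytes first and shifting afterwards is equivalent to the slide's form, and it keeps the player index available from the same 16-bit value.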

Demos::Depth Camera

Skeletal Tracking Fundamentals

Human Depth Sensing
• Object pattern similarity determines disparity

Kinect Depth Sensing
• IR pattern similarity determines disparity

IR Projector

IR Camera

Provided Data

Pipeline Architecture

Title Space

Skeleton API

Joints
• Maximum two players tracked at once
– Six player proposals
• Each player with set of <x, y, z> joints in meters
• Each joint has associated state
– Tracked, Not Tracked, or Inferred
• Inferred - Occluded, clipped, or low confidence joints
• Not Tracked - Rare, but your code must check for this state
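Since every joint carries a state, consuming code should filter on it before using the coordinates. The sketch below models that check with hypothetical `Joint` and `JointState` types; they mirror the three states named above but are not the SDK's own classes.

```python
from dataclasses import dataclass
from enum import Enum


class JointState(Enum):
    TRACKED = "tracked"
    INFERRED = "inferred"        # occluded, clipped, or low confidence
    NOT_TRACKED = "not_tracked"  # rare, but must be handled


@dataclass
class Joint:
    name: str
    x: float  # meters
    y: float  # meters
    z: float  # meters
    state: JointState


def usable_joints(joints, allow_inferred=True):
    """Keep joints whose state makes their position worth using."""
    accepted = {JointState.TRACKED}
    if allow_inferred:
        accepted.add(JointState.INFERRED)
    return [j for j in joints if j.state in accepted]
```

Whether inferred joints are acceptable depends on the application: a cursor driven by a hand may want tracked joints only, while an avatar can often tolerate inferred ones.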

Provided Data: Depth and segmentation map


Demos::Skeletal Tracking

Audio Fundamentals

Going Inside the Kinect
• Four-microphone array with hardware-based audio processing
– Multichannel echo cancellation (MEC)
– Sound position tracking
– Other digital signal processing (noise suppression and reduction)

Audio Data

Speech Recognition

• Grammar – What we are listening for
– Code – GrammarBuilder, Choices
– Speech Recognition Grammar Specification (SRGS)
– C:\Program Files (x86)\Microsoft Speech Platform SDK\Samples\Sample Grammars\

Note: Set AutomaticGainControl = false

Grammar

<rule id="Confirmation_YesNo" scope="public">
  <example> yes </example>
  <example> no </example>
  <one-of>
    <item> <ruleref uri="#Confirmation_Yes" /> </item>
    <item> <ruleref uri="#Confirmation_No" /> </item>
  </one-of>
  <tag> out = rules.latest() </tag>
</rule>

<rule id="Confirmation_Yes" scope="public">
  <example> yes </example>
  <example> yes please </example>
  <one-of>
    <item> yes </item>
    <item> yeah </item>
    <item> yep </item>
    <item> ok </item>
  </one-of>
  <item repeat="0-1"> please </item>
  <tag> out._value = "Yes";</tag>
</rule>
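To make the Confirmation_Yes rule concrete, the sketch below mimics what it accepts on already-recognized text: one of "yes", "yeah", "yep" or "ok", optionally followed by a single "please", producing the value "Yes". It is a plain string matcher written for illustration, not a speech engine or an SRGS parser, and `match_confirmation_yes` is a hypothetical name.

```python
# Alternatives from the rule's <one-of> block.
YES_WORDS = ("yes", "yeah", "yep", "ok")


def match_confirmation_yes(utterance):
    """Return "Yes" if the text fits the Confirmation_Yes rule, else None."""
    words = utterance.lower().split()
    if not words or words[0] not in YES_WORDS:
        return None
    rest = words[1:]
    # <item repeat="0-1"> please </item>: zero or one trailing "please".
    if rest == [] or rest == ["please"]:
        return "Yes"
    return None
```

Keeping the grammar this small is the point of the slide: the recognizer only has to decide among a handful of phrases, which is far more reliable than open dictation.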

Demos::Audio
