Where  you  are the controller
Krishna Kumar,  Sr. Developer Evangelist - Academic [email_address]
Started as a $30,000 prototype. Vision: Shift the world from thinking “We need to understand technology” to “Technology needs to understand us”
Option A: Why Kinect?
Why Kinect? Option You:
What is Kinect?
What is Kinect? An extraordinary new way to play, where you are the controller: Voice Recognition, Face Recognition, You Recognition, Gesture Recognition. “Xbox”
Kinect knows what to do! “Xbox?!” “Let’s Play!”
“ What are those things?” ① ③ ②
“ What are those things?” 3D Depth Sensors ① ③
Projected Invisible IR pattern
Depth Computation
Depth Map
“ What are those things?” RGB Camera ②
“ What are those things?” Multi-array Microphone
“ What are those things?” Motorized Tilt
Combination of RGB camera, depth sensor and multi-array microphone. RGB camera delivers three basic color components. Depth sensor “sees” the room in 3-D. Microphone locates voices by sound and extracts ambient noise. Software makes all the magic possible: Skeletal Tracking; Face, Gesture Recognition; Audio Echo Cancellation; Audio Beam Forming; Speech Recognition
 
Scope of Microsoft Research: Significant Investment. Investing > $9B in R&D (MSR & product dev). Staff of over 850 in 55 research areas. International research lab locations: Redmond, Washington (Sept, 1991); San Francisco, California (1995); Cambridge, United Kingdom (July, 1997); Beijing, People’s Republic of China (Nov, 1998); Mountain View, California (July, 2001); Bangalore, India (January, 2005); Cambridge, Massachusetts (February, 2008). Turning ideas into reality. research.microsoft.com
Scope of Microsoft Research Research Areas research.microsoft.com
How does  Kinect  know what I do? “ Xbox?!” “ Let’s Play!”
Microsoft Research: Object Recognition J. Shotton, J. Winn, C. Rother, A. Criminisi,  TextonBoost : Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation.  European Conference on Computer Vision, 2006
Microsoft Research: Human Body Tracking. Wide range of motion, but limited agility, and not real-time. Infinite number of movements. R. Navaratnam, A. Fitzgibbon, R. Cipolla, The Joint Manifold Model for Semi-supervised Multi-valued Regression, IEEE Intl Conf on Computer Vision, 2007
Xbox calls MSR: September 2008. “We need a body tracker with all body motions… all agilities… 10x real-time… for multiple players… and it has to be 3D.” MSR’s response?
Teach the Computer/Machine Learning. Step 1: Collect A LOT of data. Teams visit households across the globe, filming real users. Hollywood motion capture studio generates billions of CG images.
Training Data
Training. Millions of training images -> millions of classifier parameters. Very far from “embarrassingly parallel”. New algorithm for distributed decision-tree training. Major use of DryadLINQ (available for download). M. Isard, Y. Yu, Distributed Data-Parallel Computing Using a High-Level Programming Language, International Conference on Management of Data (SIGMOD), July 2009
Recognize Joint Angles. Classify each pixel’s probability of being each of 32 body parts. Determine a probabilistic cluster of body configurations consistent with those parts. Present the most probable to the user. (Frames t=1, t=2, t=3)
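The per-pixel classification step on this slide can be pictured with a small C++ sketch. It is an illustration only, not the Kinect pipeline: it assumes some classifier has already produced 32 probabilities for a pixel (one per body part) and simply picks the most probable part. The names `kBodyParts`, `PartProbabilities` and `mostLikelyPart` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Number of body-part classes mentioned on the slide.
constexpr std::size_t kBodyParts = 32;

// Hypothetical per-pixel classifier output: one probability per body part.
using PartProbabilities = std::array<float, kBodyParts>;

// Returns the index (0..31) of the most probable body part for one pixel.
// Ties resolve to the lowest index.
std::size_t mostLikelyPart(const PartProbabilities& p) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < kBodyParts; ++i) {
        if (p[i] > p[best]) {
            best = i;
        }
    }
    return best;
}
```

Picking the single best label per pixel is the simplest possible reading; the slide describes keeping the probabilities and clustering them into candidate body configurations, which this sketch does not attempt.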
Programmers View
Programmers View
A Platform is Born
Consumer Technologies  Push The Envelope Price: $6000 Price: $150
Play Space: Field of View and Operational Area. Play space: ideally 12 ft x 12 ft, though you can make do with 10 ft x 10 ft. Player position: ideally 6-10 feet away from the camera.
Lighting and Environment. Fluorescent or LED lighting is recommended. No direct light on the player. No direct light into the sensor lens. In a stage environment, all lights need to be infrared-filtered. To avoid lighting noise, do not intersect sensor lens fields of view. Avoid playing in or next to reflective surfaces.
Clothing Considerations. Avoid anything that conceals your arms or legs. Avoid wearing flowing clothing such as scarves or long dresses and skirts: long skirts hide the legs, and scarves are often mistaken for arms. Avoid baggy jackets or overly baggy clothing. Generally, anything that hides the human form should be removed for optimal game play. If players with long hair are having difficulty playing, encourage them to pull their hair back and try playing again.
Kinect with more than just games. Use your voice or a wave of your hand to: Video Kinect with others*; manage your media gallery; Music with Last.fm*; HD movies with Zune; get in the game with ESPN*. (* with Xbox LIVE Gold membership)
XBOX LIVE: More Ways to Connect with Family and Friends. VIDEO KINECT: Connect with family and faraway friends, all from the comfort of your living room with Xbox LIVE Video Chat. Experience the ease and convenience of chat on the big screen with Kinect-enabled auto camera zoom and pan. FAMILY CENTER: Family Center makes it easy to manage multiple user accounts and edit privacy settings from a single location. Ensure safe, secure fun for the whole family. SOCIAL NETWORKS: Connect with friends, share photos and updates through Facebook and Twitter.
 
 
 
 
 
ESPN Home-field advantage in your living room Access over 3,500 live global events from ESPN3.com, including out-of-market programming plus fresh video clips from ESPN.com  Enjoy features like HD programming and on-demand viewing, participate in polls, predictions and trivia. See what the Xbox LIVE community is watching and declare what team you’re rooting for With Kinect™ control the action right from your couch with just your voice or the wave of your hand Featured Content: NCAA Football, NCAA Basketball, College Bowl Games, NBA, MLB, Soccer, Golf and Tennis majors
 
 
 
 
Where can  Kinect  go? Air Guitar Hero? Shopping in 3D? Remote Replacement? Dance Instructor? Education? Personal Trainer? Physical Therapy? “ Xbox?”
 
 
The Kinect SDK. Provides both unmanaged and managed APIs. Unmanaged API: concepts work in C++. Managed API: concepts work in both VB and C#. Samples & documentation to get you started. Assumes some programming experience. http://research.microsoft.com/kinectsdk/
The Kinect Sensor. A hybrid device containing the following input devices: a color (RGB) camera, a depth sensor, a microphone array, a tilt sensor. Play space control is done through a tilt motor: pitch +/- 27 degrees.
RGB CAMERA MULTI-ARRAY MIC MOTORIZED TILT 3D DEPTH SENSORS
Kinect USB cable
The Innards
The Vision System IR laser  projector IR  camera RGB  camera
Kinect video output: 30 Hz frame rate; 57-degree field of view. 8-bit VGA RGB, 640 x 480. 12-bit monochrome, 320 x 240.
The Audio System
Demo: Multichannel Echo Cancellation Input Stream (What the mic array hears) Post-MEC (What APIs present) MEC
The Kinect SDK provides access to: RGB feed, depth feed, skeletal tracking capabilities, audio beam data, speech recognition.
Data Streams. Color stream at 640 x 480 resolution; 32 BPP. Depth stream at 320 x 240 resolution; 16 BPP. Skeletal joint positions. Frame #s, timestamps, tilt sensor data. Echo-canceled audio. Higher-level systems: speech recognition.
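As a rough sense of scale for these streams, the arithmetic can be sketched in C++. This assumes uncompressed frames and the 30 Hz frame rate quoted on the video output slide; the helper names `frameBytes` and `bytesPerSecond` are hypothetical and not part of the SDK.

```cpp
#include <cassert>
#include <cstddef>

// Size in bytes of one uncompressed frame (bitsPerPixel must be a multiple of 8).
constexpr std::size_t frameBytes(std::size_t width, std::size_t height,
                                 std::size_t bitsPerPixel) {
    return width * height * (bitsPerPixel / 8);
}

// Raw data rate in bytes per second at a given frame rate.
constexpr std::size_t bytesPerSecond(std::size_t frameSize, std::size_t fps) {
    return frameSize * fps;
}

// Color stream: 640 x 480 at 32 BPP -> 1,228,800 bytes per frame.
constexpr std::size_t kColorFrame = frameBytes(640, 480, 32);

// Depth stream: 320 x 240 at 16 BPP -> 153,600 bytes per frame.
constexpr std::size_t kDepthFrame = frameBytes(320, 240, 16);
```

At 30 frames per second that works out to about 36.9 MB/s of color data and about 4.6 MB/s of depth data, before any overhead.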
RGB Camera Fundamentals
Camera Data
RGB Stream Format. Up to 640 x 480 resolution. Up to 32 bits per pixel. Data contained in ImageFrame.Image.Bits, an array of bytes: public byte[] Bits; The array starts at the top left of the image and moves left to right, then top to bottom.
Stride: the number of bytes from one row of pixels in memory to the next.
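Stride is what turns an (x, y) coordinate into a position in the byte array. A minimal C++ sketch, assuming a top-left-origin, row-major buffer as described on the RGB stream slide; `pixelOffset` is a hypothetical helper, not an SDK function.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of pixel (x, y) in a row-major buffer whose first byte
// is the top-left pixel.
//   stride        - bytes from the start of one row to the start of the next
//   bytesPerPixel - 4 for a 32 BPP color frame, 2 for a 16 BPP depth frame
std::size_t pixelOffset(std::size_t x, std::size_t y,
                        std::size_t stride, std::size_t bytesPerPixel) {
    return y * stride + x * bytesPerPixel;
}
```

For a tightly packed 640 x 480 frame at 32 BPP the stride is 640 * 4 = 2560 bytes, so the pixel at (0, 1) starts at byte 2560. Using the stride rather than `width * bytesPerPixel` keeps the calculation correct if rows are padded.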
Demos::RGB Camera
Depth Camera Fundamentals
Camera Data
Depth Map Format. 320 x 240 resolution, 16 bits per pixel. Upper 13 bits: depth in mm (800 mm to 4000 mm range). Lower 3 bits: segmentation mask. Depth value 0 means unknown; shadows, low reflectivity, and high reflectivity are among the reasons. Segmentation index: 0 = no player, 1 = skeleton 0, 2 = skeleton 1, …
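The bit layout on this slide can be sketched in C++ as follows. The struct and function names (`DepthPixel`, `unpackDepth`, `isDepthKnown`, `isInRange`) are hypothetical; only the 13/3 bit split, the "0 means unknown" rule and the 800-4000 mm range come from the slide.

```cpp
#include <cassert>
#include <cstdint>

// One decoded depth-map pixel.
struct DepthPixel {
    std::uint16_t depthMm;      // upper 13 bits: distance in millimetres
    std::uint8_t  playerIndex;  // lower 3 bits: 0 = no player, 1 = skeleton 0, ...
};

// Splits a packed 16-bit depth-map value into its two fields.
DepthPixel unpackDepth(std::uint16_t packed) {
    return DepthPixel{static_cast<std::uint16_t>(packed >> 3),
                      static_cast<std::uint8_t>(packed & 0x07)};
}

// A depth of 0 means the sensor could not measure this pixel.
bool isDepthKnown(const DepthPixel& p) {
    return p.depthMm != 0;
}

// True when the depth lies inside the 800 mm to 4000 mm range from the slide.
bool isInRange(const DepthPixel& p) {
    return p.depthMm >= 800 && p.depthMm <= 4000;
}
```

Code that consumes the depth map would typically check `isDepthKnown` before using a distance, since shadowed or highly reflective surfaces produce zeros.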
Depth Byte Buffer. ImageFrame.Image.Bits is an array of bytes: public byte[] Bits; The array starts at the top left of the image and moves left to right, then top to bottom. Each pixel's value represents the distance for that pixel.
Calculating Distance. 2 bytes per pixel (16 bits). Depth (distance per pixel only): bit-shift the second byte left by 8: Distance(0,0) = (int)(Bits[0] | Bits[1] << 8); DepthAndPlayerIndex (includes the player index): bit-shift the first byte right by 3 (dropping the player index) and the second byte left by 5: Distance(0,0) = (int)(Bits[0] >> 3 | Bits[1] << 5);
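The two formulas above only cover pixel (0,0). A C++ sketch that generalizes them to any pixel, assuming a tightly packed buffer (2 bytes per pixel, low byte first, no row padding); the function names are hypothetical and the buffer stands in for `ImageFrame.Image.Bits`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Index of the first of the two bytes belonging to pixel (x, y).
std::size_t depthByteIndex(std::size_t x, std::size_t y, std::size_t width) {
    return (y * width + x) * 2;
}

// Depth format: the 16-bit value is the distance in mm.
int depthAt(const std::vector<std::uint8_t>& bits,
            std::size_t x, std::size_t y, std::size_t width) {
    const std::size_t i = depthByteIndex(x, y, width);
    return static_cast<int>(bits[i] | (bits[i + 1] << 8));
}

// DepthAndPlayerIndex format: drop the 3 player-index bits from the low byte.
int depthWithPlayerAt(const std::vector<std::uint8_t>& bits,
                      std::size_t x, std::size_t y, std::size_t width) {
    const std::size_t i = depthByteIndex(x, y, width);
    return static_cast<int>((bits[i] >> 3) | (bits[i + 1] << 5));
}

// DepthAndPlayerIndex format: the player index is the low 3 bits of the first byte.
int playerIndexAt(const std::vector<std::uint8_t>& bits,
                  std::size_t x, std::size_t y, std::size_t width) {
    return bits[depthByteIndex(x, y, width)] & 0x07;
}
```

For example, a pixel 1500 mm away belonging to player 2 is stored as the bytes 0xE2, 0x2E in the DepthAndPlayerIndex format; `depthWithPlayerAt` recovers 1500 and `playerIndexAt` recovers 2.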
Demos::Depth Camera
Skeletal Tracking Fundamentals
Human Depth Sensing Object pattern similarity determines disparity
Kinect Depth Sensing IR pattern similarity determines disparity IR Projector IR Camera
Provided Data
Pipeline Architecture
Skeleton API
Joints. Maximum of two players tracked at once; six player proposals. Each player has a set of <x, y, z> joints in meters. Each joint has an associated state: Tracked, Not Tracked, or Inferred. Inferred: occluded, clipped, or low-confidence joints. Not Tracked: rare, but your code must check for this state.
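The advice to check each joint's state can be sketched in C++ like this. The types below (`JointState`, `Joint`, `usableJoints`) are hypothetical stand-ins, not the SDK's own classes; only the three states and the meter-valued <x, y, z> positions come from the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The three joint states named on the slide.
enum class JointState { NotTracked, Inferred, Tracked };

// A joint position in meters plus its tracking state.
struct Joint {
    float x;
    float y;
    float z;
    JointState state;
};

// Counts joints whose position is safe to use.
// Tracked joints always count; Inferred joints (occluded, clipped, or
// low confidence) count only if the caller opts in; NotTracked never counts.
std::size_t usableJoints(const std::vector<Joint>& joints, bool acceptInferred) {
    std::size_t count = 0;
    for (const Joint& j : joints) {
        if (j.state == JointState::Tracked ||
            (acceptInferred && j.state == JointState::Inferred)) {
            ++count;
        }
    }
    return count;
}
```

Whether to accept inferred joints is an application decision: a game avatar may tolerate them, while a gesture that depends on precise hand position may not.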
Provided Data Depth and segmentation map
Depth Map Format. 320 x 240 resolution, 16 bits per pixel. Upper 13 bits: depth in mm (800 mm to 4000 mm range). Lower 3 bits: segmentation mask. Depth value 0 means unknown; shadows, low reflectivity, and high reflectivity are among the reasons. Segmentation index: 0 = no player, 1 = skeleton 0, 2 = skeleton 1, …
Demos::Skeletal Tracking
Audio Fundamentals
Going Inside the Kinect. Four-microphone array with hardware-based audio processing: multichannel echo cancellation (MEC), sound position tracking, other digital signal processing (noise suppression and reduction).
Audio Data
Speech Recognition. Grammar: what we are listening for. In code: GrammarBuilder, Choices. Speech Recognition Grammar Specification (SRGS): sample grammars in C:\Program Files (x86)\Microsoft Speech Platform SDK\Samples\Sample Grammars\ Note: set AutomaticGainControl = false.
Grammar
<!-- Confirmation_YesNo._value: string ["Yes", "No"] -->
<rule id="Confirmation_YesNo" scope="public">
  <example> yes </example>
  <example> no </example>
  <one-of>
    <item> <ruleref uri="#Confirmation_Yes" /> </item>
    <item> <ruleref uri="#Confirmation_No" /> </item>
  </one-of>
  <tag> out = rules.latest() </tag>
</rule>
<!-- Confirmation_Yes._value: string ["Yes"] -->
<rule id="Confirmation_Yes" scope="public">
  <example> yes </example>
  <example> yes please </example>
  <one-of>
    <item> yes </item>
    <item> yeah </item>
    <item> yep </item>
    <item> ok </item>
  </one-of>
  <item repeat="0-1"> please </item>
  <tag> out._value = "Yes"; </tag>
</rule>
Demos::Audio
[email_address]


Editor's Notes

  • #2: I’d like to introduce to you Kinect for Xbox 360, where YOU are the controller. No gadgets, no gizmos, just you! Kinect brings games and entertainment to life in extraordinary new ways without using a controller. Imagine controlling movies and music with the wave of a hand or the sound of your voice. With Kinect, technology evaporates, letting the natural magic in all of us shine. http://www.xbox.com/en-US/kinect
  • #4: A few inspiration points from the creators of Kinect.
  • #5: So who likes playing video games? Who thinks gaming controllers are really easy to use? How long do you think it would take for you to become an expert at all of these buttons and win games? If you could just turn on the game and play and be pretty good at the game, do you think you’d probably play more video games? The purpose of Kinect is to make Xbox more accessible to a broader audience. The Kinect team focused on making Xbox so easy to use that anyone could jump in and play and not have to worry about reading any instructions or learning all the different controller buttons and permutations to be great at the game. They wanted to make beginners feel like experts. Kinect is designed so anyone can play, whether they are a kid or an adult; no matter how much gaming experience you have or how old you are, you can jump in and play right away. Imagine your little brother or sister, or your grandparents, trying to play an Xbox game without having to learn which button does what.
  • #6: So, as we said in the last slide, instead of learning all the right buttons to click on the console, make the game understand YOU. That’s Kinect! Make gaming more accessible. Open up gaming to others. Use what you know. Don’t need to learn. But there’s also another unique element to Kinect and that is making gaming more social. Traditionally you would have your hard core gamers sitting alone in front of their game with their console firing away at the next alien, racing away in their own world for hours, etc. With Kinect, gaming is actually bringing people together in a fun, collaborative way, where watching your friends and family play is actually really entertaining. And playing with others using Xbox Live is a very social gaming experience. People are laughing and joining in even if they aren’t playing, so much that they want to get up and play themselves.
  • #7: What is Kinect? Let’s start with the name… Where did the name Kinect come from? “Kinetic”, which means to be in motion, and “connect”, meaning it “connects you to the friends and entertainment you love”! Kinect has Voice Recognition: Kinect uses four strategically placed microphones within the sensor to recognize and separate your voice from the other noises in the room, so you can control movies and more with your voice. Kinect has Gesture Recognition, through a Motion Sensor: Kinect uses a motion sensor that tracks your entire body. So when you play, it’s not only about your hands and wrists. It’s about all of you. Arms, legs, knees, waist, hips and so on. It also includes Skeletal Tracking: as you play, Kinect creates a digital skeleton of you based on depth data. So when you move left or right or jump around, the sensor will capture it and put you in the game. Kinect has Facial Recognition: Kinect ID remembers who you are by collecting physical data that’s stored in your profile. So when you want to play again, Kinect will know it’s you, making it easy to jump in whenever you want. In a nutshell – YOU Recognition!
  • #9: Build out this slide – - Kinect knows what to do - The camera captures you and your movements, voice, etc. - It’s programmed to analyze images, look for basic human form and identify about 32 essential body parts such as your head, torso, hips, knees, elbows and thighs. - Create your Avatar - You’re ready to play!
  • #10: Let’s have a look at the Kinect Sensor. What are those things on the sensor? There’s an RGB camera, a depth sensor and a multi-array microphone. When you first start up Kinect, it reads the layout of your room and configures the play space you’ll be moving in. Then, Kinect detects and tracks 32 points on each player’s body, mapping them to a digital reproduction of that player’s body shape and skeletal structure, including facial detail. Let’s take a look at each component separately to help you understand how it all works together… [next few slides go into more detail]
  • #11: An infrared projector combined with a monochrome CMOS sensor allows Kinect to see the room in 3-D (as opposed to inferring the room from a 2-D image) under any lighting conditions. Depth is determined by projecting invisible infrared (IR) dots into a room. Let’s see how that might look…(next slide)
  • #12: Source: www.ros.org. Depth is recovered by projecting invisible infrared (IR) dots into a room. On a hardware level, the optical system is fairly basic: a class 1 laser is projected into the room, and the sensor detects what’s going on based on what’s reflected back at it. Together, the projector and sensor create a depth map. You can see in this picture that the couch is farther from the Kinect sensor than the player’s hand, so the infrared dots on the couch aren’t as bright white as those on the person. This is also very helpful when there are others in the room watching the game: the Kinect sensor uses the depth data to determine that the person sitting on the couch in the distance isn’t playing, so their movements won’t interfere with the player’s. The depth stream is 320×240.
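The idea of separating the player from a spectator on the couch can be sketched as a simple distance threshold on a depth map. This is an illustrative Python sketch, not Kinect’s actual algorithm; the function name `segment_player`, the 2500 mm cut-off, and the toy depth values are all assumptions made for the example.

```python
def segment_player(depth_mm, max_player_distance_mm=2500):
    """Return a mask that is True where a pixel is close enough to be the player.

    depth_mm is a 2-D list of distances in millimetres; 0 means "no reading".
    The cut-off distance is a made-up value for illustration only.
    """
    return [
        [0 < d <= max_player_distance_mm for d in row]
        for row in depth_mm
    ]


# Toy 3x4 depth map: the couch is about 3.5 m away, the player about 1.8 m away.
depth = [
    [3500, 3500, 3500, 3500],
    [3500, 1800, 1900, 3500],
    [0,    1850, 1800, 3500],
]

mask = segment_player(depth)
player_pixels = sum(cell for row in mask for cell in row)
print(player_pixels)  # number of pixels attributed to the player
```

Pixels with no reading and pixels beyond the cut-off are both treated as background, so movement on the couch never reaches the tracking stage in this sketch.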
  • #13: Source: http://nuit-blanche.blogspot.com/2010/11/unsing-kinect-for-compressive-sensing.html
  • #15: There’s also an RGB Camera. Does anyone know what RGB means? This video camera aids in facial recognition and other detection features by detecting three color components: Red, Green and Blue. The name “RGB camera” refers to the color components it detects. It’s similar to the webcam you see on computers and laptops today, and it’s used for the sharing-memories feature of Kinect, which captures pictures while you’re playing! It is also used for Video Kinect, which we’ll talk about a little later. What else do you think is part of the Kinect sensor?
  • #16: The sensor also has EARS!! The multi-array microphone is an array of four microphones that can isolate the voices of the players from the noise in the room. This allows the player to be a few feet away from the microphone and still use voice controls. These microphones focus on the sound we care about and throw away the noise. When you first plug in Kinect, it steps through an acoustic setup: Kinect plays sounds and listens to how they come back in order to acoustically map your room. There is also a voice recognition component of Kinect. Most voice recognition available today is push-to-talk. No buttons with Kinect – you can talk to the sensor and it recognizes speech!
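One common way a microphone array “focuses” on a voice is delay-and-sum beamforming: each channel is shifted by the time the sound took to reach that microphone, and the aligned channels are averaged. The Python sketch below is a minimal illustration of that general technique, not Kinect’s actual audio pipeline; the four delays and the toy signal are hypothetical values.

```python
def delay_and_sum(channels, delays):
    """Advance each channel by its delay (in samples) and average the results."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(length)
    ]


# A toy "voice" signal.
voice = [0, 1, 0, -1, 0, 1, 0, -1]

# Hypothetical arrival delays (in samples) at each of the four microphones.
delays = [0, 1, 2, 3]

# Each microphone hears the same voice, shifted by its own delay.
channels = [[0] * d + voice + [0] * (3 - d) for d in delays]

steered = delay_and_sum(channels, delays)           # aimed at the speaker
unsteered = delay_and_sum(channels, [0, 0, 0, 0])   # no steering at all

print(max(abs(x) for x in steered), max(abs(x) for x in unsteered))
```

When the delays match the speaker’s position the four copies add up coherently; with the wrong delays they partly cancel, which is why sound from other directions is attenuated.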
  • #17: There’s also a motorized tilt. The Kinect sensor will adjust using this motorized tilt so it can recognize all shapes and sizes of players. When you first turn on Kinect, you’ll see the sensor move up and down to find the players.
  • #18: Color VGA video camera - This video camera aids in facial recognition and other detection features by detecting three color components: red, green and blue. Microsoft calls this an “RGB camera”, referring to the color components it detects. Depth sensor - An infrared projector and a monochrome CMOS (complementary metal-oxide semiconductor) sensor work together to “see” the room in 3-D regardless of the lighting conditions. Complementary metal–oxide–semiconductor (CMOS) (pronounced /ˈsiːmɒs/) is a technology for constructing integrated circuits. CMOS technology is used in microprocessors, microcontrollers, static RAM, and other digital logic circuits. CMOS technology is also used for several analog circuits such as image sensors, data converters, and highly integrated transceivers for many types of communication. Multi-array microphone - This is an array of four microphones that can isolate the voices of the players from the noise in the room. This allows the player to be a few feet away from the microphone and still use voice controls. What comes in the box: Kinect sensor for Xbox 360, power supply cable, user’s manual, Wi-Fi extension cable, Kinect Adventures game. Specifications: color VGA motion camera, 640 x 480 pixel resolution at 30 FPS; depth camera, 640 x 480 pixel resolution at 30 FPS; array of 4 microphones supporting single-speaker voice recognition. Put it all together with a VERY IMPORTANT piece that makes it all possible – SOFTWARE!! Kinect’s software layer is the essential component to add meaning to what the hardware detects. When you first start up Kinect, it reads the layout of your room and configures the play space you’ll be moving in. Then, Kinect detects and tracks 32 points on each player’s body, mapping them to a digital reproduction of that player’s body shape and skeletal structure, including facial details.
http://electronics.howstuffworks.com/microsoft-kinect3.htm http://www.popsci.com/gadgets/article/2010-01/exclusive-inside-microsofts-project-natal Kinect Software Learns from “Experience”: Kinect’s software layer is the essential component to add meaning to what the hardware detects. When you first start up Kinect, it reads the layout of your room and configures the play space you’ll be moving in. Then, Kinect detects and tracks 48 points on each player’s body, mapping them to a digital reproduction of that player’s body shape and skeletal structure, including facial details [source: Rule]. In an interview with Scientific American, Alex Kipman, Microsoft’s Director of Incubation for Xbox 360, explains Project Natal’s approach to developing the Kinect software. Kipman explains, “Every single motion of the body is an input,” which creates seemingly endless combinations of actions [source: Kuchinskas]. Knowing this, developers decided not to program that seemingly endless combination into pre-established actions and reactions in the software. Instead, they would “teach” the system how to react based on how humans learn: by classifying the gestures of people in the real world. To start the teaching process, Kinect developers gathered massive amounts of data from motion capture in real-life scenarios. Then, they processed that data using a machine-learning algorithm by Jamie Shotton, a researcher at Microsoft Research Cambridge in England. Ultimately, the developers were able to map the data to models representing people of different ages, body types, genders and clothing. With select data, developers were able to teach the system to classify the skeletal movements of each model, emphasizing the joints and distances between those joints.
An article in Popular Science describes the four steps Kinect’s “brain” goes through 30 times per second to read and respond to your movements [source: Duffy]. The Kinect software goes a step further than just detecting and reacting to what it can “see.” Kinect can also distinguish players and their movements even if they’re partially hidden. Kinect extrapolates what the rest of your body is doing as long as it can detect some parts of it. This allows players to jump in front of each other during a game or to stand behind pieces of furniture in the room.
  • #22: http://research.microsoft.com/apps/video/default.aspx?id=139295
  • #23: http://research.microsoft.com/apps/video/default.aspx?id=139295 © 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION
  • #24: So, where did this idea of a system where you are the controller come from? Where did the technology behind the system get its start? Let me share a bit of background about the technology behind Kinect. Microsoft Research (MSR) did a lot of research back in 2007 on Human Body Tracking. They spent a lot of time and effort and ended up producing this video that you see here. While it seems pretty accurate, it really was quite limited in the range of motion it could track, it wasn’t real time, and it couldn’t work with multiple people/players. It was a start, and then some gamers from Xbox gave MSR a call…
  • #25: In 2008 someone from Xbox called Microsoft Research. They had seen the published human body tracking work highlighted on the previous slide, and they said they needed a computer body tracker for one of their new Xbox games. They talked about all of the other things they wanted this tracker to be able to do: it needed to track all body motions, it needed to be 10 times faster than real-time, it had to support multiple players and it had to be 3D. They asked if MSR could help them build it. Well, Microsoft Research said it couldn’t be done. But the Xbox team had some game programmers who had already been trying to develop a system that could do human body tracking. They sent a video to Microsoft Research of what they had developed, and the research team was truly inspired by what they saw. So they teamed up and decided to make this work! Imagine those teams getting together – PhDs from Microsoft Research meeting Xbox game developers… those must have been some awkward first meetings!!
  • #26: The first thing they did was collect a lot of data. Xbox sent a team of people to households in about 10 countries, where they went into people’s living rooms and asked them to pretend they were playing while being recorded on video. They captured terabytes of information. That gave them data on different sizes of living rooms, different backgrounds, and different sizes of people. They then went to a Hollywood motion capture studio and asked them to generate billions of computer-generated images of humans based on the many different hairstyles, clothing, poses, lighting, shapes and sizes the team had collected across the globe. They took all of this data and used it to teach the computer. See examples of the training data in the next slide. (details highlighted in this article) http://www.popsci.com/gadgets/article/2010-01/exclusive-inside-microsofts-project-natal
  • #27: Here are some examples of the training data (images of different human poses). The idea was this – if they can feed the computer enough data—in this case, millions of images of people—it can learn for itself how to understand it. That saves programmers the near-impossible task of coding rules that describe all the zillions of possible movements a body can make.
  • #28: http://research.microsoft.com/en-us/projects/DryadLINQ/ DryadLINQ is a simple, powerful, and elegant programming environment for writing large-scale data parallel applications running on large PC clusters. So, the painstaking task of the Xbox team (the gathering of pictures of people in many different poses) generated the massive amounts of training data. They ran this data through huge clusters of computers (shown here) where the learning “brain” of Kinect resides to “learn” the many different human body movements.
  • #29: The part of Kinect that the player sees looks like a webcam, but it’s the software inside, which Microsoft casually refers to as “the Brain”, that makes sense of the images captured by the camera. It’s programmed to analyze images, look for the basic human form and identify about 30 essential body parts such as your head, torso, hips, knees, elbows and thighs. What’s the brain thinking as it watches you jump around, swinging imaginary bats or head-butting imaginary soccer balls? As you stand in front of the camera, it judges the distance to different points on your body. Then the brain guesses which parts of your body are which. So you can see here in this image that the bold colored boxes are the probable guesses: the green square is the player’s head, the pink and light blue squares are the player’s hands, etc.
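The “guessing” step can be pictured as choosing the most probable label for each region of the image. The Python sketch below only illustrates that idea; the probabilities, the labels and the helper name `most_probable_part` are hypothetical, and Kinect’s real classifier is far more elaborate.

```python
def most_probable_part(probabilities):
    """Return (label, probability) for the highest-scoring body part."""
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


# Hypothetical classifier output for one patch of the depth image.
patch = {"head": 0.72, "left hand": 0.15, "right hand": 0.08, "torso": 0.05}

label, confidence = most_probable_part(patch)
print(label, confidence)
```

A colored box like the ones in the slide would then be drawn for each patch, using the winning label to pick the color.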
  • #30: Once Kinect has determined it has enough certainty about enough body parts to pick the most probable skeletal structure, it outputs that shape to a simplified 3D avatar (you can see the avatar images on the bottom right). Then it does this all over again, 30 times a second! As you move, the Kinect “brain” generates all possible skeletal structures at each frame, eventually deciding on, and outputting, the one that is most probable. This thought process takes just a few milliseconds, so there’s plenty of time for the Xbox to take the info and use it to control the game. Here’s the programmer’s view of the different images and probabilistic matching going on to eventually give you your Kinect Avatar!
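To make the per-frame decision concrete, here is a small Python sketch that works out the time available per frame at 30 frames per second and picks the most probable of several candidate skeletons. The candidate poses and their probabilities are invented for illustration, and `pick_skeleton` is a hypothetical helper, not part of any Kinect API.

```python
FRAMES_PER_SECOND = 30

# Time available to process one frame, in milliseconds (about 33.3 ms).
frame_budget_ms = 1000 / FRAMES_PER_SECOND


def pick_skeleton(candidates):
    """Return the candidate skeleton with the highest probability."""
    return max(candidates, key=lambda c: c["probability"])


# Hypothetical candidate skeletons generated for a single frame.
candidates = [
    {"pose": "arms raised", "probability": 0.61},
    {"pose": "arms at sides", "probability": 0.27},
    {"pose": "crouching", "probability": 0.12},
]

best = pick_skeleton(candidates)
print(best["pose"], round(frame_budget_ms, 1))
```

If the decision takes only a few milliseconds, most of the roughly 33 ms frame budget is left for the game itself.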
  • #32: The end result = the game platform is born!
  • #34: Before we start playing, let’s see what type of Play Space is recommended for Kinect. Kinect needs to be able to see your entire body. - Clear the area between the sensor and the players. - If there is only one player: Stand back 6 feet (1.8 m). - If there are two players: Stand back 8 feet (2.4 m). - Make sure that the play space is at least 6 feet (1.8 m) wide, and not wider or longer than 12 feet (3.6 m).
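The play-space guidelines above can be written down as a quick check. This Python sketch simply encodes the numbers from the slide; the function names are made up for the example, and it only covers one or two players because those are the only cases the slide describes.

```python
def min_distance_ft(players):
    """Minimum distance from the sensor, in feet, per the slide's guidance."""
    if players == 1:
        return 6
    if players == 2:
        return 8
    raise ValueError("the slide only gives guidance for one or two players")


def play_space_ok(players, distance_ft, width_ft):
    """Check distance and width against the recommended play space."""
    far_enough = distance_ft >= min_distance_ft(players)
    not_too_long = distance_ft <= 12      # not longer than 12 feet
    width_in_range = 6 <= width_ft <= 12  # at least 6 feet wide, at most 12
    return far_enough and not_too_long and width_in_range


print(play_space_ok(1, 6, 8))   # one player, 6 ft back, 8 ft wide
print(play_space_ok(2, 6, 8))   # two players standing too close
```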
  • #35: You’ll also need to be sure that the lighting in the room is good enough to be able to detect the players. Good lighting - Make sure your room has enough light so that your face is clearly visible and evenly lit. Try to minimize side or back lighting, especially from a window. - Illuminate players from the front, as if you were taking a picture of them. - Make sure the room is brightly lit. Poor lighting - Some lighting conditions can make it difficult for Kinect to identify you or track your movements. - For best results, avoid positioning either the players or the sensor in direct sunlight.
  • #36: There are also some clothing considerations to keep in mind. As we learned earlier, the sensor is detecting points on each player’s body. If clothing is hiding any points on the body (for example, a skirt may be hiding your knees), then the player may have difficulty playing. [review other bullets above]
  • #37: Kinect with more than just games: With Xbox LIVE, a whole world of extraordinary entertainment experiences awaits, including streaming music, HD movies, live sporting events, Facebook, Twitter, Video chat and more. Use your voice or a wave of your hand to: - Video Kinect with others* - Manage your media gallery - Music with Last.fm* - HD movies with Zune - Get in the game with ESPN*
  • #39: Here’s an example of Video Kinect. Two families: one in LA, one in Dallas talking over Kinect using Video.
  • #40: The families watching a video together.
  • #42: You can also navigate through HD movies with Kinect and Zune.
  • #49: Can you think of other great uses for Kinect?
  • #56: Source: iFixit