
Design Document: LuteKinduct

Sachi A. Williamson
Bachelor of Science, Computer Science

Laurie Murphy, Faculty Mentor

Pacific Lutheran University

CSCE 499 - Fall 2012


Contents

1 Introduction

2 Research Review
2.1 Functionality of the Kinect Hardware
2.2 Mathematical Theories
2.3 Modifications from Previous Documents

3 Design Methodology
3.1 UML Class Diagram
3.2 Detailed Use Cases
3.2.1 Use Case #1: Set up the Kinect sensor
3.2.2 Use Case #2: Load a WAV file
3.2.3 Use Case #3: Begin conducting a musical piece
3.2.4 Use Case #4: Change the speed of the musical piece
3.2.5 Use Case #5: Change the volume of the musical piece during playback
3.2.6 Use Case #6: Conduct a musical piece
3.2.7 Use Case #7: Stop the musical piece at the finish
3.3 User Interface Design

4 Work Completed

5 Future Work

6 Updated Timetable

7 References

8 Glossary

List of Figures

1 The Kinect sensor
2 The coordinate system of the Kinect sensor
3 The first version of the UML Diagram
4 The first GUI that the user will interact with
5 A prototype of loading SoundPlayer and browsing files
6 A Gantt Chart updated as of 12.10.12


1 Introduction

In a video posted to YouTube just after the first release of the Kinect SDK, Microsoft stated: “We started with a sensor that took voice and movement. We thought, this would be fun to play with. And it was. But something amazing is happening: the world is starting to imagine things we hadn’t even thought of. Unexpected things. Helpful things. Beautiful things. Inspired things. Which is why, even though the world keeps asking us what we’ll do with Kinect next, we’re just as excited to ask the world the same thing.” This sentiment is the primary motivation behind the project: the Kinect has significant potential to enhance diverse fields in ways not yet imagined.

This project is designed to allow the user to input a WAV audio file and then “conduct” the musical piece by gesturing with the right and left hands to control tempo (the speed of the piece) and volume, respectively. The project, tentatively named “LuteKinduct,” will be implemented in C# with the Kinect SDK, the .NET Framework, and the SoundTouch library (through a C# wrapper class, SoundTouchSharp). The SoundPlayer class in C# will be used for audio playback, and a Windows Form will be used to build the GUI for the application.

The user will load the WAV file from a GUI, choose the time signature of the piece from a drop-down menu, and then conduct the first few beats of the piece so the application can set the initial tempo, calculated in beats per minute. After the first cycle of the beat pattern, the audio file will begin playback at that initial tempo and a default volume. To control the tempo of the music, the user moves the right hand faster or slower; to control the volume, the user reaches the left arm out from the shoulder and raises or lowers it. Finally, if the user makes a circular motion with both hands (the sensor watches only the right hand), the audio file will stop.

The overarching goal of this project is to assist music conducting courses at PLU by providing immediate feedback to both students and professors on conducting fundamentals, while also supplementing the courses with hands-on experience conducting a “larger” ensemble via the audio file.

Special thanks to Dr. Michelle Dolan and Mr. Joshua Blake for their assistance with the mathematical theories behind gesture-based interaction with the Kinect.


2 Research Review

2.1 Functionality of the Kinect Hardware

The Kinect sensor has been considered relatively ground-breaking technology because of its technical complexity and innovation at a very reasonable price. There are two sensor types available: the Xbox 360 version and the Windows version. The former is used mainly for games on the Xbox 360, while the latter is aimed at software developers. Both sensors have the same hardware, although there are rumors that the Windows sensor ships with more advanced software that allows more functionality. The version used in this project is the Xbox 360 sensor; while Microsoft does not encourage developing against the Xbox 360 version, the SDK still allows it to be used. Ideally, a Windows sensor would be purchased for future uses of the application (since the current sensor is also owned for personal use).

The Kinect sensor has an infrared projector and a monochrome CMOS (complementary metal-oxide semiconductor) sensor. Between the two sits a color VGA video camera that detects the three color components and is referred to in documentation as the “RGB camera.” The infrared projector and CMOS sensor work together to allow the sensor to capture an image of a room in 3-D, while the RGB camera is used for facial recognition. The infrared projector and monochrome CMOS sensor are usually grouped together as a depth sensor, and will be referred to as such throughout this paper.

The video and depth sensors have a 640 x 480-pixel resolution and run at 30 frames per second. The recommended distance between the sensor and the user is around 6 feet, although a recent SDK release supports a seated position. An image of the sensor can be found below [1].

Figure 1: The Kinect sensor

[1] http://channel9.msdn.com/Series/KinectSDKQuickstarts/Understanding-Kinect-Hardware


When the Kinect sensor is first initialized, it scans the environment and analyzes the play space. It then detects and tracks 48 points on the user’s body, mapping them to a digital reproduction of that user’s body shape and skeletal structure. Joints are represented as “Joint” structures (vectors), and the distances between them are also analyzed. The position of each joint (if active tracking is enabled) can be retrieved efficiently through the Position property [2].
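To make this concrete, the following is a minimal sketch of reading the hand positions, assuming the event-driven skeleton stream of the Kinect for Windows SDK 1.x; the exact wiring in LuteKinduct may differ.

```csharp
using Microsoft.Kinect;

// Sketch: enable skeleton tracking and read the hand joint positions
// each frame (Kinect for Windows SDK 1.x event model assumed).
KinectSensor sensor = KinectSensor.KinectSensors[0];
sensor.SkeletonStream.Enable();
sensor.SkeletonFrameReady += (s, e) =>
{
    using (SkeletonFrame frame = e.OpenSkeletonFrame())
    {
        if (frame == null) return;
        Skeleton[] skeletons = new Skeleton[frame.SkeletonArrayLength];
        frame.CopySkeletonDataTo(skeletons);
        foreach (Skeleton skeleton in skeletons)
        {
            if (skeleton.TrackingState != SkeletonTrackingState.Tracked) continue;
            // Positions are in skeleton space: (x, y, z) in meters.
            SkeletonPoint rightHand = skeleton.Joints[JointType.HandRight].Position;
            SkeletonPoint leftHand = skeleton.Joints[JointType.HandLeft].Position;
        }
    }
};
sensor.Start();
```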

Although the color space and depth space are important to the Kinect, the focus of this project is the skeleton coordinate space. Microsoft’s documentation states, “Each frame, the depth image captured is processed by the Kinect runtime into skeleton data. Skeleton data contains 3D position data for human skeletons for up to two people who are visible in the depth sensor. The position of a skeleton and each of the skeleton joints (if active tracking is enabled) are stored as (x, y, z) coordinates. Unlike depth space, skeleton space coordinates are expressed in meters.” The x, y, and z-axes are the body axes of the depth sensor, as shown below [3].

Figure 2: The coordinate system of the Kinect sensor

The Kinect uses a right-handed coordinate system with the sensor at the origin: the positive z-axis extends in the direction the sensor is facing (a user looking at the sensor faces the -z direction), the positive x-axis extends to the left, and the positive y-axis extends upward.

2.2 Mathematical Theories

Nearly all of the mathematical theory behind the project has been fleshed out. As stated in the previous section, the Kinect’s coordinate system makes it straightforward to retrieve the x, y, and z positions of the hand joints. More complicated is the algorithm for turning those coordinates into tempo adjustments based on the speed of the gestures.

In the GUI, the user will pick the beat pattern that they are conducting for the piece from options of 4/4, 3/4, and 2/4 (6/8 and 9/8 are tentative). The individual options will be

[2] http://electronics.howstuffworks.com/microsoft-kinect3.htm
[3] http://msdn.microsoft.com/en-us/library/hh973078.aspx


discussed later. The general concept for gesture detection is to monitor the x or y position of the right hand’s joint over time. This will be achieved with a timer and an analysis of the respective coordinates within a frames-per-second time span. The last ten x or y values will be kept in a data structure such as a list or queue, and in each frame the average velocity (distance divided by time) of the first five and of the last five will be calculated. A minimum threshold will be set at the start of the application so that changes in x and y too small to exceed it are discarded, avoiding jitter. The application will then watch for the average velocity changing from negative to positive or vice versa, and will raise a flag if the average velocity has held its new sign for a few frames. Positive-to-negative average velocity of the y position indicates a downward motion of the hand and negative-to-positive an upward motion; positive-to-negative velocity of the x position indicates a rightward motion, and negative-to-positive a leftward motion. To begin playback of an audio file, the code will wait for an average velocity change from negative to positive in the y position coordinates. In the description of each beat pattern, an initial upward direction is assumed. The moment the hand changes from one direction to another is referred to in music as an “ictus.”
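A minimal sketch of this sliding-window velocity check follows; the class, method names, and jitter threshold are this paper’s own illustration, not part of any library, and are placeholders to be tuned in testing.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the ictus detector described above: keep the ten most recent
// samples of one coordinate, compare the average velocity of the first
// five against the last five, and signal when the sign flips.
public class IctusDetector
{
    private struct Sample { public float Pos; public double Time; }

    private const float JitterThreshold = 0.005f; // meters; tune experimentally
    private readonly Queue<Sample> samples = new Queue<Sample>();

    // Feed one coordinate (x or y) per frame; returns true on an ictus.
    public bool Update(float position, double timeSeconds)
    {
        samples.Enqueue(new Sample { Pos = position, Time = timeSeconds });
        if (samples.Count > 10) samples.Dequeue();
        if (samples.Count < 10) return false;

        Sample[] s = samples.ToArray();
        float vFirst = AverageVelocity(s, 0, 5);
        float vLast = AverageVelocity(s, 5, 5);

        // Ictus: both sub-windows moved, but in opposite directions.
        return vFirst != 0 && vLast != 0 && Math.Sign(vFirst) != Math.Sign(vLast);
    }

    // Average velocity = distance divided by elapsed time over a sub-window.
    private static float AverageVelocity(Sample[] s, int start, int count)
    {
        double dt = s[start + count - 1].Time - s[start].Time;
        double dx = s[start + count - 1].Pos - s[start].Pos;
        if (dt <= 0 || Math.Abs(dx) < JitterThreshold) return 0; // discard jitter
        return (float)(dx / dt);
    }
}
```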

When the flag for an ictus event is raised, the code will track the time between successive events to calculate an average beats-per-minute rate, which will control how the tempo of the audio file is adjusted. This is compared against the initial beats-per-minute rate that the user specifies by conducting the first four beats before the audio file plays through. Testing will determine whether any modifications are needed to the way the beats-per-minute rate is calculated from ictus events; a small sketch of the arithmetic follows.
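The conversion itself is simple averaging; the method below is a sketch with illustrative names.

```csharp
using System.Collections.Generic;

// Sketch: average the intervals between recent ictus timestamps (seconds)
// and convert to beats per minute.
public static double BeatsPerMinute(IList<double> ictusTimes)
{
    if (ictusTimes.Count < 2) return 0;
    double totalSeconds = ictusTimes[ictusTimes.Count - 1] - ictusTimes[0];
    double averageBeatSeconds = totalSeconds / (ictusTimes.Count - 1);
    return 60.0 / averageBeatSeconds; // e.g., beats 0.5 s apart -> 120 BPM
}
```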

4/4: Four beats in a measure. The ictus detector looks for positive-to-negative average velocity in the y position (beat one), then negative-to-positive in the x position (beat two), positive-to-negative again in the x position (beat three), and finally negative-to-positive velocity in the y position (beat four).

3/4: Three beats in a measure. The ictus detector looks for positive-to-negative average velocity in the y position (beat one), positive-to-negative in the x position (beat two), and negative-to-positive velocity in the y position (beat three).

2/4: Two beats in a measure. The ictus detector looks for negative-to-positive velocity in the x position (beat one) and positive-to-negative velocity in the x position (beat two).
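These three patterns can be summarized as a lookup table of expected axis and sign-change pairs; the encoding below is this paper’s own sketch, not a fixed design.

```csharp
using System.Collections.Generic;

// Sketch: encode each beat pattern as the ordered (axis, sign-change)
// tokens the ictus detector should expect. "Y-" means positive-to-negative
// average velocity on the y axis, "X+" means negative-to-positive on x, etc.
static readonly Dictionary<string, string[]> BeatPatterns =
    new Dictionary<string, string[]>
{
    { "4/4", new[] { "Y-", "X+", "X-", "Y+" } },
    { "3/4", new[] { "Y-", "X-", "Y+" } },
    { "2/4", new[] { "X+", "X-" } },
};
```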

2.3 Modifications from Previous Documents

In the Requirements Document, consideration was given to implementing the project using MIDI audio files for simplicity’s sake. However, after further research, WAV files appear to be the most convenient type of audio file to process. C# has a SoundPlayer class in the .NET Framework that can play an audio file using only a few lines of code. A prototype has been created with a GUI made in Windows Form that allows the user to browse the computer for a file (OpenFileDialog); once the file is selected, SoundPlayer plays back the piece (provided it has been encoded as WAV).
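The prototype logic amounts to only a few lines; the sketch below assumes a Windows Forms context and is not the prototype’s exact code.

```csharp
using System.Media;
using System.Windows.Forms;

// Sketch of the browse-and-play prototype: pick a WAV file with
// OpenFileDialog, hand its path to SoundPlayer, and play it.
OpenFileDialog dialog = new OpenFileDialog();
dialog.Filter = "WAV files (*.wav)|*.wav";
if (dialog.ShowDialog() == DialogResult.OK)
{
    SoundPlayer player = new SoundPlayer(dialog.FileName);
    player.Load();  // load the file synchronously before playback
    player.Play();  // plays on a new thread; PlaySync() would block instead
}
```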

The SoundTouch Audio Processing Library is an open-source library that implements a time-stretching algorithm known as Synchronous Overlap-Add. The author describes this method as “cutting the sound data into shortish sequences between tens to hundreds milliseconds each, and then joining these sequences back together with a suitable amount of sound data either skipped or partially repeated in between these sequences to achieve a shorter or longer playback time than originally” [4].

This is the most suitable library for the time-stretching aspect of the project because of its portability, simplicity, and broad capabilities. Since the SoundTouch library is written in C++, an appropriate wrapper class, SoundTouchSharp, will be used to access its methods from the project [5].
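The exact SoundTouchSharp surface will be confirmed during prototyping, but assuming the wrapper mirrors the native SoundTouch calls (setSampleRate, setChannels, setTempoChange, putSamples/receiveSamples), the tempo-adjustment loop might look roughly like the sketch below; ReadWavSamples and WriteToOutput are hypothetical placeholders for the WAV reader and audio sink.

```csharp
// Rough sketch, assuming SoundTouchSharp mirrors the native SoundTouch
// API; the exact C# signatures will be confirmed in prototyping.
var soundTouch = new SoundTouchSharp();
soundTouch.SetSampleRate(44100);
soundTouch.SetChannels(2);
soundTouch.SetTempoChange(10.0f); // percent change relative to original tempo

float[] buffer = new float[4096]; // interleaved stereo samples
int samplesRead;
while ((samplesRead = ReadWavSamples(buffer)) > 0) // hypothetical WAV reader
{
    soundTouch.PutSamples(buffer, samplesRead / 2); // sample pairs in
    int received;
    while ((received = soundTouch.ReceiveSamples(buffer, buffer.Length / 2)) > 0)
        WriteToOutput(buffer, received); // hypothetical audio sink
}
```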

3 Design Methodology

3.1 UML Class Diagram

A tentative UML Class Diagram may be found below in Figure 3. The Conductor class depends on the TempoAdjuster, VolumeAdjuster, and AudioFile classes.

The Conductor class sets up the Kinect and connects the TempoAdjuster, VolumeAdjuster, and AudioFile classes together. The Conductor class also initializes the GUI with file browsing.

[4] O. Parviainen. “Time and pitch scaling in audio processing.” http://www.surina.net/article/time-and-pitch-scaling.html
[5] http://code.google.com/p/practicesharp/source/browse/trunk/PracticeSharpApp/Core/SoundTouchSharp.cs?r=157

Figure 3: The first version of the UML Diagram

3.2 Detailed Use Cases

Most of these use cases are replicas of the ones found in the Requirements Document. However, the implementations are now better realized and organized.

3.2.1 Use Case #1: Set up the Kinect sensor

When the GUI starts, the user will be given the option to adjust the tilt angle of the Kinect sensor. Upon pressing either the up or down button in the GUI, the sensor will adjust its tilt angle. There will also be an option for the user to see a preview of the view from the RGB camera. Lastly, a label will indicate whether there is a connection or loading issue with the sensor; if not, it will display “Ready.”
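The tilt buttons can be wired directly to the sensor’s elevation angle; a minimal sketch follows (the button handler names are illustrative, and `sensor` is the KinectSensor field).

```csharp
// Sketch: adjust the Kinect tilt from the GUI's up/down buttons.
// ElevationAngle is in degrees; the SDK clamps it to roughly -27..+27.
private void upButton_Click(object sender, EventArgs e)
{
    if (sensor.ElevationAngle < sensor.MaxElevationAngle)
        sensor.ElevationAngle += 2;
}

private void downButton_Click(object sender, EventArgs e)
{
    if (sensor.ElevationAngle > sensor.MinElevationAngle)
        sensor.ElevationAngle -= 2;
}
```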

3.2.2 Use Case #2: Load a WAV file

The user will open the GUI and click the “Browse” button to find the desired file on the computer using OpenFileDialog. Once the file is double-clicked and the “Import” button is then clicked, the file path will be captured to link the SoundPlayer in C# to the file and initialize the audio file. (A prototype is available.)

3.2.3 Use Case #3: Begin conducting a musical piece

Once the file has been loaded and the user clicks the “Ready” button, a window will open that displays both the RGB viewer and the Skeletal viewer from the Kinect Toolkit. The system will record the initial beats-per-minute rate as the user gestures the first beats of the piece. The application will then wait until the y position has a negative-to-positive average velocity a second time before beginning playback of the piece at the speed the user initially specified.

3.2.4 Use Case #4: Change the speed of the musical piece

More details of the mathematics behind this process may be found earlier in this paper, but essentially the application will look for changes in average velocity from negative to positive or vice versa, calculate the times between each of those events, and average those times to find the beats-per-minute rate. Once found, that value is compared to the initial beats-per-minute rate (from the first set of beats given by the user) and the tempo is adjusted accordingly.
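In arithmetic terms, the comparison reduces to a percentage change, sketched below with illustrative variable names and the SoundTouch-style call assumed earlier.

```csharp
// Sketch: convert the measured conducting rate into a percentage tempo
// change relative to the user's initial tempo.
float tempoChangePercent = (measuredBpm / initialBpm - 1.0f) * 100.0f;
soundTouch.SetTempoChange(tempoChangePercent); // e.g., 132 BPM vs 120 -> +10%
```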


3.2.5 Use Case #5: Change the volume of the musical piece during playback

During playback, if the user raises his/her left arm over a certain threshold or the y position reaches the level of the shoulder joint, the application will look for a positive or negative average velocity to adjust the volume, similar to the approach used for time-stretching.
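One way to realize this mapping, sketched below with illustrative names and a placeholder range, is to normalize the left hand’s height against the shoulder joint. Note that SoundPlayer itself exposes no volume control, so applying the value will depend on the audio back end chosen (e.g., NAudio, listed in the references).

```csharp
// Sketch: map the left hand's height relative to the shoulder to a
// 0..1 volume scalar; `skeleton` is the tracked Skeleton from the SDK.
SkeletonPoint hand = skeleton.Joints[JointType.HandLeft].Position;
SkeletonPoint shoulder = skeleton.Joints[JointType.ShoulderLeft].Position;

const float Range = 0.5f; // meters above/below the shoulder; tune in testing
float volume = 0.5f + (hand.Y - shoulder.Y) / (2 * Range);
volume = Math.Max(0f, Math.Min(1f, volume)); // clamp to [0, 1]
```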

3.2.6 Use Case #6: Conduct a musical piece

The user will wave his/her arms in the particular motion identified with the chosen time signature at the beginning of the piece. The sensor will record the positions of the right and left hands and will also track changes in the position and time of the mapped coordinates at a maximum of 30 frames per second.

3.2.7 Use Case #7: Stop the musical piece at the finish

The user will finish the piece by moving both arms in a circular motion and briefly holding the hands up. The application will watch the right hand for the circular gesture and stop the music. With this implementation, the piece can also be stopped at any time during playback.

3.3 User Interface Design

The revised user interface design for the first GUI the user sees when starting the application is shown below. In this simplified design, the user can adjust the tilt angle of the sensor and preview the RGB view from the camera. A label in the upper-right corner indicates whether there is a problem with the Kinect’s connection to the computer or whether the system is ready to start audio playback. The user can browse files on the computer and pick a WAV file to load into the application. Finally, the user can specify whether tempo and volume adjustment are enabled (both are disabled by default), and can pick the time signature of the piece via a drop-down menu (not shown).

Although not yet available, a window will open after the user presses the “Ready” button, showing the RGB view from the camera, the skeletal viewer, and a few playback indicators (potentially time left in the piece, time signature, etc.).


Figure 4: The first GUI that the user will interact with

4 Work Completed

Reasonable progress has been made since the last major project document, the Requirements Document. Prototyping of various aspects of the Kinect SDK has taken place, although it has been somewhat discouraging that the resources available online and in print are often inconsistent with one another due to the fast turnaround of SDK updates. In other words, there was certainly some confusion at first between the syntax of older versions of the SDK and the version released in October.

Prototypes have been made of the general GUI, which is straightforward to build with a Windows Form. An important consideration in the next stages of the project is whether it will be feasible to combine a Windows Form (GUI) with the main parts of the project, or whether doing so will complicate the internal functionality of the project and the user output.

There was uncertainty about the exact implementation of an audio player in C#, but after some research, a rudimentary prototype was built that takes a file (using OpenFileDialog) from the GUI, loads it into the SoundPlayer, and plays the audio file. This assumes that the audio file is a WAV file. A screenshot can be found below.


Figure 5: A prototype of loading SoundPlayer and browsing files

A prototype has also been built with the Kinect SDK using books from Safari Online at the Pacific Lutheran library. The greatest challenge will be connecting these very distinct pieces of the project into one cohesive application.

5 Future Work

There has been considerable progress on the development of the project throughout the semester, and it is now much more apparent that the project is feasible to complete in the spring semester. More work will be done in J-Term (and potentially some over winter break).

Task List:

• Continued research with the Kinect SDK and Kinect Toolkit (and Explorer)

• Research and prototyping of time-stretching libraries (i.e., SoundTouch and SoundTouchSharp)

• Coordinate system with joint position prototyping

• Gestures/positions with average velocity

• Main menu and GUIs

• Volume adjustment with SoundPlayer in C#

• Additional items, time permitting (video recording, machine learning)

6 Updated Timetable

Figure 6: A Gantt Chart updated as of 12.10.12

7 References

J. Ashley and J. Webb, Beginning Kinect Programming with the Microsoft Kinect SDK, 1st Edition, New York: Apress, 2012.

S. Crawford. (2012 December 02). “How Microsoft Kinect Works.” [Online]. Available: http://electronics.howstuffworks.com/microsoft-kinect2.htm

D. Catuhe, Programming with the Kinect for Windows Software Development Kit,Redmond: Microsoft Press, 2012.

S. Kean, J. Hall, and P. Perry, Meet the Kinect: an Introduction to Programming Natural User Interfaces, New York: Apress, 2011.

J. Liberty, Programming C#, Sebastopol: O’Reilly and Associates, Inc., 2001.

M. Heath. (2012 October 12). “NAudio.” [Online]. Available: http://naudio.codeplex.com/

Microsoft Corporation. (2012 October 10). Kinect for Windows SDK Documentation. [Online]. Available: http://msdn.microsoft.com/en-us/library/hh855347.aspx

Microsoft Corporation. (2012 October 12). Human Interface Guidelines. [Online]. Available: http://www.microsoft.com/en-us/kinectforwindows/develop/learn.aspx

Microsoft Corporation. (2012 December 07). “SoundPlayer Class (System.Media).” [Online]. Available: http://msdn.microsoft.com/en-us/library/system.media.soundplayer.aspx

Microsoft Corporation. (2012 December 06). “Coordinate Spaces.” [Online]. Available: http://msdn.microsoft.com/en-us/library/hh973078.aspx

O. Parviainen. (2012 October 10). “SoundTouch Audio Processing Library: SoundStretch Audio Processing Utility.” [Online]. Available: http://www.surina.net/soundtouch/soundstretch.html


Y. Naveh. (2012 October 11). “PracticeSharp.” [Online]. Available: http://code.google.com/p/practicesharp/

Additional thanks to the contributors on StackOverflow.

8 Glossary

ictus: the recurring stress or beat in a rhythmic or metrical series of sounds (Merriam-Webster Dictionary)

tempo: the rate of speed of a musical piece or passage indicated by one of a series of directions (as largo, presto, or allegro) and often by an exact metronome marking; rate of motion or activity: pace (Merriam-Webster Dictionary)

time signature: a sign used in music to indicate meter, usually written as a fraction with the bottom number indicating the kind of note used as a unit of time and the top number indicating the number of units in each measure (Merriam-Webster Dictionary)
