
Lamello: Passive Acoustic Sensing for Tangible Input Components

Valkyrie Savage*†, Andrew Head†, Björn Hartmann†, Dan B Goldman*, Gautham Mysore*, Wilmot Li*

* Adobe Research, † UC Berkeley EECS
{valkyrie,andrewhead,bjoern}@eecs.berkeley.edu, {dgoldman,gmysore,wilmotli}@adobe.com

ABSTRACT
We describe Lamello, an approach for creating tangible input components that recognize user interaction via passive acoustic sensing. Lamello employs comb-like structures with varying-length tines at interaction points (e.g., along slider paths). Moving a component generates tine strikes; a real-time audio processing pipeline analyzes the resultant sounds and emits high-level interaction events. Our main contributions are in the co-design of the tine structures, information encoding schemes, and audio analysis. We demonstrate 3D printed Lamello-powered buttons, sliders, and dials.

Author Keywords
3D Printing; Sound; Tangible Interaction

ACM Classification Keywords
H.5.2 User Interfaces: Prototyping

INTRODUCTION
Tangible input components have advantages over “flat” interfaces like touch screens. They are critical for eyes-free interaction and muscle memory, and can improve task speed and precision [9]. Such devices typically comprise integrated assemblies of electronics, enclosures, and microcontroller code.

Recently, researchers have begun to explore acoustically sensing interactions – such as scratching – with digitally fabricated objects that encode information in surface textures [7, 11]. In this paper, we extend this line of work from sensing surface features to sensing interactions with tangible components that users can push, slide, and turn.

Our technique, Lamello, integrates algorithmically generated tine structures into movable components to create passive tangible inputs (Figure 1). Manipulating these inputs creates sounds, which can be captured using an inexpensive contact microphone and interpreted using real-time audio signal processing. Lamello predicts the fundamental frequency of each tine based on its geometry; thus, recognition does not require training examples. The decoded high-level events can then be forwarded to interactive applications. The name “Lamello” is derived from the lamellophone family of instruments, which create sound through vibrating tongues of varying lengths.

Figure 1. Passive tangible inputs that can be sensed acoustically. (Labeled parts: button, slider, dial, and their tines.)

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). Copyright is held by the author/owner(s).
CHI 2015, April 18–23, 2015, Seoul, Republic of Korea.
ACM 978-1-4503-3145-6/15/04.
http://dx.doi.org/10.1145/2702123.2702207

Recognizing mechanically generated sound for input has important limitations – only movement generates sound, so steady state cannot be sensed – but also appealing characteristics: components can be fabricated from a single material (e.g., 3D printed ABS plastic), and “wiring” only requires attaching a microphone. In this paper, we provide design and fabrication guidelines, and demonstrate several components that use the Lamello approach. Our evaluation shows that training-free recognition is possible, though our recognizer only has useful accuracy for a subset of tested tines.

RELATED WORK
Lamello builds on prior work in interactive device design and acoustic sensing. Recent work proposes a variety of approaches for creating interactive objects via digital fabrication: Printed Optics [16] uses printed lightpipes in optically sensed input components; Sauron [13] modifies hollow 3D models to enable an embedded camera to sense interactions. These systems generate geometry designed to work with specific recognition techniques. We adopt a similar strategy but focus on acoustic sensing for enabling interactivity.

Passive acoustic sensing has been used for detecting mid-air gestures (SoundWave [3]), enabling touch sensing on the forearm and palm (Skinput [6]), distinguishing between different objects striking a touch surface (TapSense [5]), and recognizing stroke gestures on surfaces [4]. Active swept-frequency ultrasound sensing, which uses controlled audio output to infer interactions, can detect touch and grasp gestures on rigid objects [12]. The most relevant previous systems are Acoustic Barcodes [7] and Stane [11], which encode information in fabricated objects that emit specific, detectable sounds during interaction. We extend their contributions: our 3D tines encode additional information in their vibration frequencies, and we enable a wider variety of interactions than stroking or scratching. Designing specific vibration frequencies for objects has also been explored in procedurally generated metallophones [14]. Other approaches to providing passive tangible input include IR markers [15], capacitive signatures [1], and magnetic fields [8]. Audio sensing has the benefit that it does not require direct contact or line-of-sight.

Tangible Interactions CHI 2015, Crossings, Seoul, Korea

Figure 2. Lamello reuses information between the design, fabrication, and audio processing steps to allow training-free passive acoustic sensing. (Diagram labels: Design; Fabricate; Process Live Audio; tine model; input component; predict fundamental frequencies; 3D print, attach mic; detect onset; classify (multiply and sum FFT); reconstruct; example event “user slid down to position 17”.)

LAMELLO: AUDIO GENERATION AND PROCESSING
There are two main challenges in developing a passive acoustic sensing technique that supports a variety of input controls: designing the physical mechanisms for generating sounds and developing recognition algorithms that can interpret those sounds in the intended manner (Figure 2).

Generating sounds with tines
To generate sounds, we embed tine structures in input components (Figure 1). Our tines are rectangular beams, attached at their base to the component and free to deflect at their top. Interacting with a component causes tine plucks; these vibrate the body of the component and are captured by a contact microphone. Tines can be arranged in configurations supporting different interactions (e.g., sliding, rotating, pressing).

Acoustic uniqueness: Different tines should generate unique, distinguishable sounds: we achieve this by systematically varying tine geometry. We model a vibrating tine as an ideal cantilevered beam of uniform density in free vibration [10]:

$$ f_0 = \frac{1.875^2}{2\pi} \sqrt{\frac{E \, \frac{b h^3}{12}}{\rho \, (b h) \, L^4}} \ \text{Hz} $$

Fundamental frequency (f0) is governed by several variables: tine breadth (b), height (h), and length (L), as well as material properties (density ρ, Young’s modulus E). Our designs keep b and h constant, varying L to achieve different frequencies.
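As a minimal sketch of this model (not the authors' code), the Python function below evaluates the cantilever formula for a rectangular tine. The function name `tine_f0` and the constants `ASSUMED_E_PA` and `ASSUMED_DENSITY` are illustrative assumptions: the material values are typical handbook figures for ABS in SI units, not the fitted parameters reported in this paper.

```python
import math

BETA1 = 1.875  # first-mode constant of an ideal cantilever, as used in the model

# Assumed, typical handbook values for ABS (NOT the paper's fitted parameters).
ASSUMED_E_PA = 2.2e9       # Young's modulus in pascals
ASSUMED_DENSITY = 1040.0   # density in kg/m^3


def tine_f0(length_m, height_m, youngs_modulus_pa, density_kg_m3, breadth_m=1.0):
    """Fundamental frequency (Hz) of an ideal rectangular cantilevered beam."""
    area = breadth_m * height_m                     # cross-section b*h
    second_moment = breadth_m * height_m ** 3 / 12  # I = b*h^3 / 12
    return (BETA1 ** 2 / (2 * math.pi)) * math.sqrt(
        youngs_modulus_pa * second_moment / (density_kg_m3 * area * length_m ** 4)
    )


if __name__ == "__main__":
    # Keep h fixed and vary L, as the designs in the text do.
    for length_mm in (10, 15, 20, 30):
        f0 = tine_f0(length_mm / 1000, 0.002, ASSUMED_E_PA, ASSUMED_DENSITY)
        print(f"L = {length_mm} mm -> f0 = {f0:.0f} Hz")
```

Because b appears in both the second moment of area and the cross-section, it cancels: under this model only h and L change the predicted pitch, and halving L quadruples f0.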

Figure 3. We experimented with two different encoding mechanisms for sliders: linearly increasing sequences (left) and de Bruijn (right). De Bruijn sequences allow classification of fewer tine lengths, but require more consecutive tine recognitions to determine position and direction.

Our prototypes are 3D printed, resulting in non-uniform material deposition. To test the applicability of our model, we compared predicted and observed f0 for several tines printed on two uPrint SE Plus FDM printers using Stratasys ABSplus-P430 thermoplastic. We find an appropriate material parameter by minimizing the error between predicted and observed frequencies. Fitted E values ranged from 9500 to 15500 based on print orientation and particular printer. The remaining error (µ = 69.0 Hz, σ = 112.5 Hz) shows that our model usefully applies to printed tines. Estimation of f0 can be further improved by measuring tines with calipers after printing.
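The text does not say how the material parameter is fitted. One simple option, sketched here as an assumption, is a least-squares fit: under the cantilever model f0 is proportional to the square root of E, so the best-fitting value has a closed form. All names (`geometry_factor`, `fit_youngs_modulus`, `residuals_hz`) are hypothetical, the demo "measurements" are synthetic, and the code uses SI units, whereas the paper does not state the units of its fitted E values.

```python
import math


def geometry_factor(length_m, height_m, density_kg_m3):
    """Return f0 / sqrt(E) for an ideal rectangular cantilever (breadth cancels)."""
    return (1.875 ** 2 / (2 * math.pi)) * height_m / (
        length_m ** 2 * math.sqrt(12 * density_kg_m3)
    )


def fit_youngs_modulus(tines, observed_hz, density_kg_m3):
    """Least-squares estimate of E from (length_m, height_m) pairs and measured f0."""
    g = [geometry_factor(length, height, density_kg_m3) for length, height in tines]
    # f0 = sqrt(E) * g is linear in sqrt(E), so the optimum has a closed form.
    root_e = sum(f * gi for f, gi in zip(observed_hz, g)) / sum(gi * gi for gi in g)
    return root_e ** 2


def residuals_hz(tines, observed_hz, youngs_modulus, density_kg_m3):
    """Observed minus predicted f0 (Hz) for each tine."""
    return [
        f - math.sqrt(youngs_modulus) * geometry_factor(length, height, density_kg_m3)
        for f, (length, height) in zip(observed_hz, tines)
    ]


# Hypothetical tines and synthetic "measurements" generated from a known E.
DENSITY = 1040.0
TRUE_E = 2.0e9
demo_tines = [(0.020, 0.002), (0.025, 0.002), (0.030, 0.002)]
demo_observed = [
    math.sqrt(TRUE_E) * geometry_factor(length, height, DENSITY)
    for length, height in demo_tines
]
fitted_e = fit_youngs_modulus(demo_tines, demo_observed, DENSITY)
demo_residuals = residuals_hz(demo_tines, demo_observed, fitted_e, DENSITY)
```

With real measurements the residuals would not vanish; their mean and standard deviation correspond to the kind of remaining error the paragraph reports.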

Physical and fabrication constraints: A tine that is too thin can break or hit adjacent tines when struck, reducing recognition accuracy. A tine that is too thick requires greater striking force, interfering with a user’s experience. In our experience, ABS tines need 1:25 < thickness : length < 1:2 to have appropriate stiffness for classification. Tine performance also depends on print orientation: on fused-deposition modeling machines, a tine is most reliable with its length laid in the printer’s XY plane, as it is thus filled by a continuous extrusion. When printed in Z, tines can break as layers separate. Filleting can mitigate this problem, but may reduce the accuracy of the vibration model in predicting f0. We fabricated tines with f0 between 400 Hz (4 mm x 50 mm x 6 mm) and 4000 Hz (7.25 mm x 6.0 mm x 1.2 mm). Minimum tine size is determined by printer limitations: our printers have Z resolution 0.254 mm and minimum XY feature size 1.194 mm.
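These guidelines can be turned into a quick design-time check. The sketch below is illustrative only: `check_tine` is a hypothetical helper, and it assumes that "thickness" is the tine dimension in the bending direction and that the tine is printed with its length in the XY plane, so the XY feature size limits its thickness.

```python
# Limits quoted in the text for the authors' ABS tines and printers.
MIN_RATIO = 1 / 25          # thickness : length must exceed 1:25
MAX_RATIO = 1 / 2           # thickness : length must stay below 1:2
MIN_XY_FEATURE_MM = 1.194   # minimum XY feature size of the printers


def check_tine(length_mm, thickness_mm):
    """Return a list of guideline violations (an empty list means none found)."""
    problems = []
    ratio = thickness_mm / length_mm
    if ratio <= MIN_RATIO:
        problems.append("too thin for its length: may break or hit adjacent tines")
    if ratio >= MAX_RATIO:
        problems.append("too thick for its length: needs excessive striking force")
    if thickness_mm < MIN_XY_FEATURE_MM:
        problems.append("thinner than the printer's minimum XY feature size")
    return problems


if __name__ == "__main__":
    for length_mm, thickness_mm in [(50.0, 4.0), (50.0, 1.5), (2.0, 1.0)]:
        issues = check_tine(length_mm, thickness_mm) or ["ok"]
        print(f"{thickness_mm} mm x {length_mm} mm:", "; ".join(issues))
```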

Alternative fabrication techniques: Other printing or fabrication processes may not be orientation dependent. We have laser cut tines from polyoxymethylene (Delrin) sheets, integrating these tine strips into 3D printed components. Tine sizes are similar, as laser cutting caused heat deflection in smaller feature sizes. Smaller tine sizes and higher frequencies may be achievable using different fabrication processes, e.g., injection molding or MEMS micromachining.

Encoding information: We use unique f0s to differentiate buttons and directions on a D-pad. For position sensing, f0 can increase across the range of motion (Figure 3 left). If more distinctions are needed than can be reliably recognized by varying f0, we create de Bruijn patterns [2] (Figure 3 right). A de Bruijn sequence D(k, n) is one which, given an alphabet size k and a subsequence length n, contains each subsequence exactly once: we can uniquely infer sequence position from n recognitions. This requires fewer f0s, but more contiguous tine recognitions to determine user input.
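To make the encoding concrete, the sketch below generates a de Bruijn sequence with the standard recursive Lyndon-word construction; this is a textbook algorithm, not necessarily the one the authors used. Symbols 0..k-1 stand for k distinct tine frequencies. The helper `linear_layout` is an assumption of this sketch: a de Bruijn sequence is cyclic, which suits a dial, so for a slider it repeats the first n-1 symbols at the end so that every window also occurs without wrapping around.

```python
def de_bruijn(k, n):
    """Return a cyclic de Bruijn sequence D(k, n) as a list of symbols 0..k-1."""
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


def linear_layout(k, n):
    """Unroll the cyclic sequence for a slider by repeating its first n-1 symbols."""
    seq = de_bruijn(k, n)
    return seq + seq[:n - 1]


if __name__ == "__main__":
    layout = linear_layout(2, 3)
    print("tine symbols along the slider:", layout)
    print("distinct positions:", len(layout) - 3 + 1)
```

With k = 2 frequencies and windows of n = 3 strikes, this yields k^n = 8 distinguishable positions from only two tine lengths, at the cost of needing three consecutive recognitions.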


Figure 4. The Lamello technique can be used to sense a variety of physical motions, including up/down on a button, back/forward on a slider, and rotation on a dial or scroll wheel. Four tines used together can sense up, down, left, and right on a direction pad. (Panels A–D label the striker(s), tine(s), and direction of motion; the views shown are side, bottom, side, and top/side.)

Figure 5. Two typical tine strikes (100 ms): high-frequency (left) and low-frequency (right). We mark transients and resonance. Note the higher frequency has less energy (darker colors) and faster decay (shorter).

Integration of tines into larger components
We augmented several traditional input components: buttons, sliders, dials, and joysticks. Each has a “striker” attached to the user-facing “handle” (Figure 4). These strikers overlap with tine ends by 0.25–1 mm, balancing clear signal generation with easy interaction. Through testing, we determined that a triangular striker profile works best. The button has a rib around its shaft that strikes a tine when a user depresses it. The slider has a wiper that overlaps with the tops of tines (tines have different lengths, but are top-aligned). The dial works similarly, arranged radially rather than linearly. The D-pad derives from the button: a striker strikes a tine on the base as the user moves the handle up, down, left, or right.

Audio processing pipeline
The audio signal of a tine strike is characterized by an initial transient (a short, high-energy sound across a wide range of frequencies) followed by free vibration with a local long-decay energy peak at the tine’s resonant frequency (Figure 5). Conceptually, our recognizer detects a transient, finds the dominant resonant frequency after the transient passes, and compares it to predicted tine frequencies.

Our audio processing pipeline, written in Python, uses basic frequency-domain features for classification. We sample our contact microphone at 16000 Hz. Our frames are 2048 samples (128 ms), and our hop length (offset between successive, overlapping frames) is 800 samples (50 ms), for a frame overlap of 61%. Analyzing a frame takes 5 ms, plus additional latency incurred by sound hardware. In addition to the real-time audio stream, our recognition algorithm also takes as input an ordered list of (id_i, f0_i) tuples describing the tine ordering and fundamental frequencies of a component.

For each frame, we first determine if a tine strike is present using a standard onset detector (with an empirically determined amplitude threshold). Once an onset is detected, we wait 2 frame hops for the transient response to pass. We classify the subsequent frame (computation time: 5 ms). Our best-case onset-to-classification latency is therefore 2 × 50 ms + 5 ms = 105 ms. In practice, we have seen latency of 107.3 ms (σ = 9.67 ms). Our sound card and driver introduce latency as they collect and report blocks: one could reduce overhead with optimized sound drivers and sample block sizes.
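The framing and onset logic can be sketched offline on a recorded signal. The detector below is a deliberately simple stand-in, not the authors' onset detector: it flags an onset when a frame's peak amplitude first rises above a threshold, then returns the frame that starts two hops later for classification. The function name `detect_strikes`, the threshold value, and the synthetic test signal are all assumptions of this sketch.

```python
import numpy as np

SAMPLE_RATE = 16000   # Hz, as in the text
FRAME_LEN = 2048      # samples (128 ms)
HOP_LEN = 800         # samples (50 ms)
WAIT_HOPS = 2         # hops to wait after an onset before classifying
ANALYSIS_MS = 5       # per-frame computation time reported in the text

HOP_MS = 1000 * HOP_LEN / SAMPLE_RATE
BEST_CASE_LATENCY_MS = WAIT_HOPS * HOP_MS + ANALYSIS_MS
FRAME_OVERLAP = (FRAME_LEN - HOP_LEN) / FRAME_LEN  # about 0.61


def detect_strikes(signal, threshold):
    """Return (onset_frame_start, frame_to_classify) pairs for each detected strike."""
    strikes = []
    previous_loud = False
    last_start = len(signal) - FRAME_LEN
    for start in range(0, last_start + 1, HOP_LEN):
        frame = signal[start:start + FRAME_LEN]
        loud = np.max(np.abs(frame)) >= threshold
        if loud and not previous_loud:  # rising edge: quiet frame, then loud frame
            classify_start = start + WAIT_HOPS * HOP_LEN
            if classify_start <= last_start:
                strikes.append(
                    (start, signal[classify_start:classify_start + FRAME_LEN])
                )
        previous_loud = loud
    return strikes


# Synthetic one-second recording: silence plus one decaying 1 kHz burst.
_n = np.arange(3000)
demo_signal = np.zeros(SAMPLE_RATE)
demo_signal[4000:7000] = np.sin(2 * np.pi * 1000 * _n / SAMPLE_RATE) * np.exp(-_n / 400)
demo_strikes = detect_strikes(demo_signal, threshold=0.1)
```

A live implementation would read blocks from the sound card rather than slice an array, which is where the additional driver latency mentioned above comes from.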

To classify, we compute a Fast Fourier Transform on the window, then normalize FFT bin values to represent fractions of overall audio energy. For each tine id_i, we generate a new metric: the dot product of the scaled FFT and a Gaussian centered at the bin for f0_i, representing the fraction of audio energy e_i in the neighborhood of f0_i. To account for lower energy at higher frequencies, we use a scaling factor proportional to the frequency and a σ for the Gaussians empirically determined per component, giving an adjusted list of e_i^adj.
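A minimal NumPy sketch of this scoring step follows. Several details are assumptions rather than the authors' choices: "energy" is taken to be squared FFT magnitude, the Gaussian width `sigma_hz` defaults to an arbitrary 40 Hz (the text says σ is tuned per component), and the frequency scaling is a plain multiplication by f0. The name `classify_frame` is hypothetical; the four demo frequencies are the predicted values of the four-tine slider in the evaluation.

```python
import numpy as np

SAMPLE_RATE = 16000  # Hz, as in the text


def classify_frame(frame, tines, sigma_hz=40.0):
    """Score each (tine_id, f0_hz) pair; return (best_id, scores) or (None, {})."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2  # assumed: energy = squared magnitude
    total = spectrum.sum()
    if total == 0:
        return None, {}                          # silent frame: nothing to classify
    spectrum = spectrum / total                  # fractions of overall energy
    bin_hz = np.fft.rfftfreq(len(frame), d=1.0 / SAMPLE_RATE)

    scores = {}
    for tine_id, f0 in tines:
        # Gaussian weighting centered on the predicted fundamental.
        weights = np.exp(-0.5 * ((bin_hz - f0) / sigma_hz) ** 2)
        energy_near_f0 = float(np.dot(spectrum, weights))
        scores[tine_id] = energy_near_f0 * f0    # scaling proportional to frequency
    best_id = max(scores, key=scores.get)
    return best_id, scores


# Synthetic frame: a decaying 1340 Hz resonance, scored against the
# four-tine slider frequencies quoted in the evaluation.
_t = np.arange(2048) / SAMPLE_RATE
demo_frame = np.sin(2 * np.pi * 1340 * _t) * np.exp(-_t / 0.03)
demo_tines = [(0, 924.0), (1, 1103.0), (2, 1340.0), (3, 1662.0)]
demo_id, demo_scores = classify_frame(demo_frame, demo_tines)
```

Because the scores are built from normalized energy fractions, they do not depend on how hard the tine was struck; only the shape of the spectrum matters.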

Mapping a recognized tine identity id_R = argmax_i(e_i^adj) to user actions is straightforward. For buttons and joysticks, id_R maps directly to a discrete input (press, up, down, left, or right). Similarly, for dials and sliders that encode position with linearly increasing tine lengths, id_R maps to a unique position. For dials and sliders that use a de Bruijn sequence D(k, n), we use each sequence of n recognized tines to determine the corresponding position within the sequence.
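The de Bruijn lookup can be sketched as a table from each window of n symbols to its position. Everything here is illustrative: `LAYOUT` is a hard-coded D(2, 3) sequence unrolled for a slider, the symbols stand for the two recognized frequency classes, and the function names are hypothetical. Since a full de Bruijn sequence contains every possible window, a window read in reverse is also a valid window somewhere else, so this sketch takes the travel direction as an argument instead of inferring it.

```python
# D(2, 3) = 00010111, followed by its first n-1 symbols so that every
# window occurs without wrapping (an assumption for a linear slider).
LAYOUT = [0, 0, 0, 1, 0, 1, 1, 1, 0, 0]
N = 3


def build_position_table(layout, n):
    """Map each window of n consecutive symbols to its starting index."""
    return {tuple(layout[i:i + n]): i for i in range(len(layout) - n + 1)}


def locate(recognized, table, n, direction=+1):
    """Index of the most recently struck tine, or None if it cannot be decided.

    recognized: symbols in the order they were heard.
    direction: +1 if the striker moves toward higher indices, -1 otherwise.
    """
    if len(recognized) < n:
        return None                      # not enough strikes yet
    window = tuple(recognized[-n:])
    if direction < 0:
        window = window[::-1]            # heard in reverse layout order
    start = table.get(window)
    if start is None:
        return None                      # a misclassification broke the window
    return start + n - 1 if direction > 0 else start


table = build_position_table(LAYOUT, N)
forward_position = locate([0, 0, 1], table, N, direction=+1)
backward_position = locate([1, 1, 0], table, N, direction=-1)
```

A single misclassified strike corrupts up to n consecutive windows, which is one reason this encoding depends on reliable per-tine recognition.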

EVALUATION
To determine whether Lamello tines can be classified reliably and in real time, we performed an accuracy test and built a real-time demonstration using Lamello controls.

We recorded 10 strikes for each tine on a printed dial and slider, and a laser-cut Delrin slider. Printed components were actuated with their strikers; Delrin tines were hand-plucked to determine effects of different striking methods. We classified each strike and report classification accuracy in Table 1.

We achieve promising accuracy (precision = 93%, recall = 90%) with a four-tine slider (predicted frequencies 924, 1103, 1340, and 1662 Hz). This suggests that useful interaction with Lamello is within reach. However, precision and recall rates are much lower on a set of 7 tines with frequencies above 2 kHz: the recognizer fails to classify tines with higher f0, which have lower energies and shorter decays (Figure 6).

To explore the utility of our technique, we used a Lamello slider and mouse emulation to control a Pong game. The achieved latency was sufficient for simple gameplay; however, Lamello may be better suited to provide input for applications such as volume or lighting control, where some latency and occasional misclassified events are acceptable.

Control          P (4 tines)   R (4 tines)   P (7 tines)   R (7 tines)
FDM Slider       93%           90%           49%           56%
FDM Dial         90%           85%           63%           54%
Plucked Delrin   98%           97%           72%           73%

Table 1. Recognition precision (P) and recall (R) of our printed input components using model geometry-predicted frequencies.


Figure 6. Per-tine recall for striking Lamello slider and dial tines. We observe high recall for low-frequency subsets of tines. (Two bar charts of recall, 0–100%, against tine frequency in Hz. FDM Slider tines: 924, 1103, 1340, 1662, 2116, 2784, 3824 Hz. FDM Dial tines: 840, 1003, 1218, 1511, 1923, 2530, 3478 Hz.)

DISCUSSION AND FUTURE WORK
Initial experiments with Lamello are encouraging: components augmented with tines are easy to print and use, and tines produce unique, predictable frequencies. However, classification accuracy still needs improvement, and may require a new approach for f0 > 2 kHz. We identified several sources of errors to address in future work:

Striker mechanism: Finger-plucked tines have higher recognition rates than striker-actuated tines, which suggests that striker-created noise contributes to misclassifications.

Signal attenuation: While microphones placed at opposite ends of a printed slider produce similar overall accuracy, tines are more correctly classified by the closer microphone. Though we could not directly correlate microphone distance and signal RMS energy, this suggests minimizing distance between microphone and tines may improve accuracy.

Resonance and harmonics: Struck tines exhibit an energy peak at the predicted f0, but their frequency spectrum is considerably more complex due to harmonics, component resonance, and other unmodeled material effects (e.g., the layered construction of 3D prints). Competing with non-fundamental vibrations is most problematic for short tines, which have lower energy. Future work can also probe optimal frequency distributions to avoid overlap between tine harmonics.

Beyond recognition rates, the Lamello approach can only detect position changes: it cannot sense static configurations, nor can we currently distinguish between the two directions in which a tine can be struck. In the future, we envision a library of parametric models added to larger tangible devices: we plan to leverage existing microphones in smartphones and tablets for sensing. This would open applications beyond prototyping (e.g., custom controllers for tablet games), but will require more sophisticated signal processing to filter out environmental noise and to deinterlace multiple input components manipulated simultaneously.

In conclusion, we have demonstrated the use of mechanically generated sound to support interactions beyond strokes [7] and scratches [11]. We hope our work opens a larger space of Lamellophonic input devices.

ACKNOWLEDGEMENTS
This work is supported in part by the NSF under grant DGE1106400 and a Sloan Fellowship.

REFERENCES
1. Chan, L., Muller, S., Roudaut, A., and Baudisch, P. CapStones and ZebraWidgets: Sensing stacks of building blocks, dials and sliders on capacitive touch screens. In CHI ’12, 2189–2192.

2. de Bruijn, N. G. A combinatorial problem. In Koninklijke Nederlandse Akademie v. Wetenschappen 49, Indagationes Mathematicae (1946), 758–764.

3. Gupta, S., Morris, D., Patel, S., and Tan, D. SoundWave: Using the Doppler effect to sense gestures. In CHI ’12, 1911–1914.

4. Harrison, C., and Hudson, S. E. Scratch input: Creating large, inexpensive, unpowered and mobile finger input surfaces. In UIST ’08, 205–208.

5. Harrison, C., Schwarz, J., and Hudson, S. E. TapSense: Enhancing finger interaction on touch surfaces. In UIST ’11, 627–636.

6. Harrison, C., Tan, D., and Morris, D. Skinput: Appropriating the body as an input surface. In CHI ’10, 453–462.

7. Harrison, C., Xiao, R., and Hudson, S. Acoustic barcodes: Passive, durable and inexpensive notched identification tags. In UIST ’12, 563–568.

8. Hwang, S., Ahn, M., and Wohn, K.-y. MagGetz: Customizable passive tangible controllers on and around conventional mobile devices. In UIST ’13, 411–416.

9. Klemmer, S. R., Hartmann, B., and Takayama, L. How bodies matter: Five themes for interaction design. In DIS ’06, 140–149.

10. Meirovitch, L. Analytical Methods in Vibration. The MacMillan Company, 1967.

11. Murray-Smith, R., Williamson, J., Hughes, S., and Quaade, T. Stane: Synthesized surfaces for tactile input. In CHI ’08, 1299–1302.

12. Ono, M., Shizuki, B., and Tanaka, J. Touch & Activate: Adding interactivity to existing objects using active acoustic sensing. In UIST ’13, 31–40.

13. Savage, V., Chang, C., and Hartmann, B. Sauron: Embedded single-camera sensing of printed physical user interfaces. In UIST ’13, 447–456.

14. Umetani, N., Mitani, J., and Igarashi, T. Designing custom-made metallophone with concurrent eigenanalysis. In NIME ’10, 26–30.

15. Weiss, M., Wagner, J., Jansen, Y., Jennings, R., Khoshabeh, R., Hollan, J. D., and Borchers, J. SLAP Widgets: Bridging the gap between virtual and physical controls on tabletops. In CHI ’09, 481–490.

16. Willis, K. D. D., Brockmeyer, E., Hudson, S., and Poupyrev, I. Printed optics: 3D printing of embedded optical elements for interactive devices. In UIST ’12, 589–598.
