
Feasibility of the Living Canvas – Restricting Projection to a Performer on Stage

Martin Naef

Glasgow School of Art Glasgow, UK

+44-141-353-4424

[email protected]

Cathie Boyd

Cryptic Glasgow, UK

+44-141-354-0544

[email protected]

ABSTRACT The Living Canvas initiative aims to use a performer on stage as a dynamic projection surface. Using machine vision in the near-infrared spectrum enables the system to follow and adapt to the performer, restricting projection to the silhouette. Ultimately, the system aims to create the illusion of a completely dynamic costume. This paper introduces the concept and presents an implementation and analysis of the performance-critical stages of the projection pipeline, proving the feasibility of the idea as well as analysing the limitations introduced by current digital projection technology.

Bringing together the research from computer graphics and machine vision with the artistic vision and guidance from Cryptic, the initiative aims to create and explore a new expressive medium by taking projection systems on stage to a highly interactive level and providing a powerful new tool for live video artists.

Categories and Subject Descriptors

I.3.6 [Computer Graphics]: Methodology and Techniques – Interaction Techniques

General Terms Algorithms, Measurement, Performance, Experimentation.

Keywords Projection systems, art & design, performance, augmented reality.

1. Introduction and Motivation The Living Canvas initiative aims to explore the novel artistic possibilities of using the performer’s body and clothes as a projection surface in the context of a stage performance. Ultimately, a new projection system will enable a dynamic or even improvised performance by detecting the posture and silhouette of the performer and projecting imagery precisely onto the selected parts of the body. This will enable the performer to “wear virtual costumes” that adapt to the body, or even receive a different face. The dynamic nature of the system should give full control to the performer, who can freely move around on the stage, with the projection always “following” the performer. Figure 1 shows a simulation of the proposed system.

Figure 1: Simulated result of the Living Canvas system. A virtual costume is projected onto a white garment.

The Living Canvas initiative is split into two parts. The feasibility of the key technical issues was first established during a small-scale study. The second stage will then bring together director, performers, choreographers, designers and computer scientists to further develop the technology based on the artistic requirements and fully explore the new artistic medium. This paper presents the results of the first technical feasibility phase of the Living Canvas initiative. The main objectives were to develop a prototype of the key technology to prove that:

• the silhouette of a performer on stage can be acquired precisely using standard computer vision methods using cameras working in the near-infrared spectrum;


• current cameras, computers and projectors are fast enough to achieve a total system latency that is small enough to enable the performers to move naturally;

• existing standard calibration and image warping methods are sufficient to enable precise masking of the projected image to cover the performer fully without light-spill to the background.

The Living Canvas prototype system receives texture data from a standard real-time video processing environment and successfully restricts its projection to the performer on stage. The system is fast and robust enough for a theatrical performance, proving the feasibility of the Living Canvas concept.

The project builds upon previous work on acquiring images of people standing inside a spatially immersive display used for virtual reality applications [1, 2]. It reverses the problem in that it does not aim to acquire the geometry or texture of the performer, but instead to project onto them.

The remainder of the paper is structured as follows: First, the Living Canvas idea is put into the context of previous multimedia performances and the existing research base for the required technology components is outlined. The following sections present the prototype system, including the physical experimentation stages, hardware components and the software pipeline including latency measurements. The final sections present an analysis of the technology including lighting requirements, precision trade-offs and their impact on the artistic concept, followed by an outlook into future work required for the final deployment in a theatre production.

2. Related Work The use of video projection in contemporary multi-media performance has become a standard means of expression. This ranges from projected static backdrops or moving scenery [3], dynamic lighting design [4] to performances where performers directly interact with the video [5] or video artists working with real-time captures of the performers on stage [6]. Projection onto the performers has been explored previously as part of the overall lighting design, or with the artists acting as projection surfaces [7, 8].

Most previous performances projecting directly onto the performers required the performers to follow a strict choreography. The performer had to assume a defined posture and position to receive the intended projection. The performer could be given more freedom by allowing the projection to spill to the background or floor, but that again restricts the freedom for the stage designer and also limits the projected content to general textures or light effects. The projection system used for the “Jew of Malta” production [9], developed by Art+Com Media and ZKM, proved the feasibility of tracking performers on a large stage and projecting onto them in an opera context. While it distinguished between performers, it was not able to distinguish individual body parts. The cameras used for this (at 25 FPS) also put constraints on the speed of motion. The Living Canvas project aims to push technology significantly further by exploiting the latest camera technology, while at the same time reducing the amount of hardware required.

Computer vision has been used for stage performances in projects such as Sensuous Geographies [10], where cameras track the performers, who can then interact with the projection or sound through their motions. Tracking precision and speed are less of an issue in these kinds of installations, as they show the visuals interacting with or reacting to the performer. For the Living Canvas, however, the projection and the performer have to become one; any visible delay between the visuals and the performer detracts from that unity.

None of these previous projects go as far as detecting the exact posture and silhouette of the performer in a real-time context to adapt the visuals accordingly. Full 3D motion capturing has been used for dance performances, where the exact posture is acquired through a set of markers or sensors attached to the body [11] to control live visuals. These systems, however, do not capture the silhouette of the performer as required for the precise projection onto the performer; they only use motion capture data to drive visualisation systems for the background projection.

2.1 Vision and Acquisition The Living Canvas project builds upon a wealth of computer vision research to solve the problems of image segmentation (to distinguish between the performer and the background), calibration (for the precise correspondence between the camera image and the projector) and, in future work, image analysis (to detect the posture).

In particular, the system builds and improves upon the lessons learned during the implementation of the blue-c tele-presence system [1] and a successive low-cost version based on two pairs of colour/IR cameras for stereoscopic user acquisition [2]. The acquisition problem has also been solved in a similar fashion for the construction of the Lightstage [12], where the background is lit uniformly to reveal the silhouette in the near IR spectrum, while a colour camera acquires the foreground image.

The idea of projection onto people has been popularized by the “Invisibility Cloak” [13], originally developed for tele-presence applications. The system does not restrict projection to the actual silhouette and relies on retro-reflective clothes.

The technology used is closely related to projector-based augmented reality efforts, particularly on issues such as camera and projector calibration. A range of projects have explored projecting onto 3D objects to change their appearance (e.g. [14], [15]), or avoiding shadows caused by people blocking projectors [16]. This project differs from these in its focus on artistic exploitation and particularly in the strict low-latency and lighting aspects required for an actual stage performance.

The main technical contribution of this work lies in the selection and optimisation of the low-level vision algorithms to enable sufficiently high-quality image processing and analysis while maintaining a frame rate of 200 Hz, and in reducing the total latency to the inherent limits of standard hardware. The system development, however, is governed by the requirements of the novel artistic vision and the need to enable its deployment in an actual stage performance.

3. System Overview The Living Canvas system consists of two major components: the machine vision system to track the performer, and the projection system. Additional components include the video feed from a separate workstation and calibration tools as shown in Figure 2.

3.1 Physical Environment The test environment consists of a regular projection screen used as stage background. The performer stands in front of the screen, wearing white clothes to receive the dynamic projection. A digital projector with a camera mounted as close to the projector as possible is installed in front of the stage.

The prototype stage for experimentation is set up in a lab environment. A portable, 3 m × 2.25 m front projection screen is used for both stage background projection and as the calibration target surface.

[Figure 2 diagram. Vision system: detect silhouette, detect posture, calibration. Image generator: receive video, warp silhouette, mask video. Video workstation: Max/MSP/Jitter. Stage: front projector (actor), background projector. Acquisition hardware: 200 Hz IR camera, 75 W IR illuminator, LED marker.]

Figure 2: System overview, including acquisition, video processing, image generation and projection. An additional workstation provides video streams for projection onto the performers and the stage background.

The initial development and additional experiments were conducted in a miniature stage environment on a desk (Figure 3). Experiments were conducted with backgrounds consisting of a dark, fabric-covered office separator as well as a back-projection screen to test concurrent projection to both the performer and the background.

3.2 System Hardware and Environment The prototype system is built completely using off-the-shelf hardware. Components were selected to meet the performance criteria at a price point that makes the system affordable for small to medium scale productions.

Figure 3: Test environments: a) Small-scale stage used for experiments. b) Projector/camera pair with IR illuminator. c) Full-scale setup with calibration pattern on back screen.

The acquisition system is based on a Point Grey Dragonfly Express Firewire digital camera with a greyscale sensor. An inexpensive Computar 3.5-10 mm F1.0 vari-focal lens optimized for IR was selected for the camera. The camera system provides a good image quality at 640×480 pixel resolution and a fast frame rate of 200 Hz with relatively low noise levels. An IR filter is attached to the lens to restrict vision to the near-IR spectrum. The machine vision system is therefore undisturbed by other projection systems (e.g. background visuals) or visible lighting.

The scene is lit using a CCTV 75W near-infrared illuminator. Its peak output is centred on a 750 nm wavelength where the camera sensor still offers good sensitivity. Longer wavelengths typical for LED-based light sources would reduce the influence from visible stage lighting at the cost of increased camera noise due to the sensitivity fall-off of the sensor.

A portable Dell 5100MP digital presentation projector is used for projecting onto the performer, while a larger Panasonic projector with wide angle lens illuminates the stage background.

Computing power is provided by a PC with two Intel Xeon 3.6 GHz processors and an nVidia GeForce 8800GTX graphics accelerator board.

The system receives the video stream to be projected from a separate system on the network. This is typically a high-powered laptop computer.

4. Processing Pipeline The image processing for the Living Canvas is implemented in a parallel pipeline with two threads running all machine vision tasks, and a third thread feeding the graphics system for rendering. Additional threads are used to decode video files and for the user interface. Figure 4 and Figure 5 depict the vision pipeline structure and the intermediate images.
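As a rough illustration of this decoupling (not the authors' implementation; the stage functions grab_frame, segment and present are hypothetical stand-ins), a minimal sketch using Python threads and a bounded queue that drops stale masks instead of buffering them:

```python
# Minimal sketch of the decoupled pipeline: the vision side publishes only the
# most recent mask to the render thread, dropping stale data rather than
# buffering it (buffering would add latency). grab_frame/segment/present are
# hypothetical stand-ins for the native, SSE2-optimised stages of the paper.
import queue
import threading

mask_queue = queue.Queue(maxsize=1)

def publish_latest(q, item):
    """Replace any stale item so the consumer always sees the newest mask."""
    while True:
        try:
            q.put_nowait(item)
            return
        except queue.Full:
            try:
                q.get_nowait()      # discard the stale mask
            except queue.Empty:
                pass                # render thread consumed it meanwhile; retry

def vision_worker(grab_frame, segment):
    while True:
        frame = grab_frame()        # 200 Hz camera delivery
        publish_latest(mask_queue, segment(frame))

def render_worker(present):
    while True:
        present(mask_queue.get())   # warp, mask video, display

# Usage (with suitable callables):
# threading.Thread(target=vision_worker, args=(grab, seg), daemon=True).start()
# threading.Thread(target=render_worker, args=(present,), daemon=True).start()
```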

[Figure 4 diagram nodes: Camera → Smoothing; Marker Extraction: Smoothing, Thresholding, Laplace Operator, Find local minima; Segmentation: Bkg. Subtraction, n Erosion Steps, m Dilation Steps (optional); Calibration: Edge detection, Correspondence lookup table (texture coords.).]

Figure 4: The vision processing pipeline.

4.1 Vision Pipeline The vision pipeline processes the camera image to extract the silhouette of the performer on stage. Performance was the main consideration when implementing the vision pipeline. Several of the image processing modules are optimized to great effect using the single-instruction-multiple-data (SIMD) extensions available in modern processors (SSE2 in our case), as they lend themselves naturally to vectorisation. Additional image processing operations are provided by the Intel OpenCV image processing library [17].

4.1.1 Image Acquisition The source images are captured from the camera using Point Grey’s proprietary camera SDK. The camera is configured to deliver 200 images per second in an 8-bit greyscale format at 640×480 pixel resolution. This leaves a 5 ms time slot to complete the image processing. Input buffers are kept to a minimum, and images are dropped if the pipeline cannot process them fast enough, so that no extra latency is introduced.

4.1.2 Smoothing The input image is smoothed using a 3×3 Gauss kernel. This is an effective and computationally inexpensive method to reduce noise and other acquisition artefacts.

Additional experiments were conducted using median filters with different spans. While such filters provide higher image quality, they are significantly more computationally expensive. A 3×3 median filter (OpenCV) took 6.5 ms for a single frame, whereas a 3×3 Gauss was measured at 1.18 ms for the OpenCV version and 0.19 ms for the SSE2-optimized implementation. As the Gauss filter proved to be good enough, no further experiments were conducted with the more advanced filters.
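As an illustration of this stage (a sketch using OpenCV's Python bindings rather than the prototype's SSE2-optimised native code, so absolute timings will differ), the 3×3 Gaussian smoothing and the median alternative can be compared as follows:

```python
# Sketch of the smoothing stage and the median comparison described above.
import time
import cv2
import numpy as np

frame = np.random.randint(0, 256, (480, 640), dtype=np.uint8)  # stand-in camera image

t0 = time.perf_counter()
smoothed = cv2.GaussianBlur(frame, (3, 3), 0)   # 3x3 Gauss kernel, sigma derived from size
t1 = time.perf_counter()
median = cv2.medianBlur(frame, 3)               # 3x3 median filter for comparison
t2 = time.perf_counter()

print(f"Gauss:  {(t1 - t0) * 1000:.2f} ms")
print(f"Median: {(t2 - t1) * 1000:.2f} ms")
```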

4.1.3 Segmentation Background segmentation is based on a basic background subtraction method (as summarised in [18]). During a learning phase over 100 frames, the mean μ and standard deviation σ are calculated for each pixel. A threshold t is calculated for each pixel, defined as the maximum of a fixed threshold L and the standard deviation multiplied by a factor K. Both L and K can be set through the user interface to tweak the segmentation for best performance under given lighting conditions.

Besides the traditional distance-threshold method to identify changed pixels, the segmentation module offers two special modes that take into account the specific lighting setup.

Frontal light setup: Exploiting the fact that the performers wear garments that reflect IR well, segmentation only accepts pixels P brighter than the mean image as foreground:

Eq. 1. Foreground = P > μ + max(L, K · σ)

as opposed to the traditional methods that compare the absolute distance against the threshold. This optional mode very effectively reduces artefacts caused by shadows.

Back light setup: Only the stage background is lit, and the performer shows up as a black silhouette. In this case, only pixels darker than the background model are accepted as foreground. See Section 5.1 for a discussion of the individual lighting options.

The image segmentation code greatly profits from an SSE2 optimized implementation. A standard portable version was measured at 1.18 ms per frame, whereas the SSE2 version takes only 0.07 ms.
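The segmentation logic described above can be sketched in a few lines of NumPy (illustrative only: the 100-frame learning phase and the parameters L and K follow the text, while the default values and function names are assumptions, and the prototype itself uses an SSE2-optimised native implementation):

```python
# Sketch of the background-subtraction segmentation (Eq. 1 for the frontal-light
# mode, and the darker-than-mean variant for the back-lit stage).
import numpy as np

def learn_background(frames):
    """Per-pixel mean and standard deviation over ~100 background frames."""
    stack = np.stack(frames).astype(np.float32)
    return stack.mean(axis=0), stack.std(axis=0)

def segment_frontal(frame, mean, std, L=10.0, K=2.5):
    """Foreground where the pixel is brighter than the mean by max(L, K*sigma)."""
    threshold = np.maximum(L, K * std)
    return (frame.astype(np.float32) > mean + threshold).astype(np.uint8) * 255

def segment_backlight(frame, mean, std, L=10.0, K=2.5):
    """Back-lit stage: the performer appears as a dark silhouette (darker than mean)."""
    threshold = np.maximum(L, K * std)
    return (frame.astype(np.float32) < mean - threshold).astype(np.uint8) * 255
```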

4.1.4 De-noising An optional de-noising step may be configured to further reduce pixel noise in the image mask generated by the segmentation stage. It is based on the morphological operators erode followed by dilate, as provided by the OpenCV toolbox. The number of erosion and dilation steps can be configured to decrease or increase the silhouette extent.

At only 0.6 ms processing time per frame for a pair of erode/dilate operations, no further performance optimization was required.

After the final de-noising stage, the resulting mask is copied into a separate buffer to be used by the rendering process (0.2 ms).
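The de-noising stage maps directly onto standard OpenCV morphology; a minimal sketch, with the erosion and dilation counts n and m exposed as parameters as described above (function name and 3×3 kernel are illustrative choices):

```python
# Sketch of the optional de-noising stage: n erosions followed by m dilations
# applied to the binary segmentation mask.
import cv2
import numpy as np

def denoise_mask(mask, n_erode=1, m_dilate=1):
    """Remove speckle noise; unequal n/m shrinks or grows the silhouette extent."""
    kernel = np.ones((3, 3), np.uint8)
    cleaned = cv2.erode(mask, kernel, iterations=n_erode)
    return cv2.dilate(cleaned, kernel, iterations=m_dilate)
```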

4.1.5 Marker Extraction The marker extraction step follows the initial image smoothing in a separate thread, operating on its own copy of the image buffer. It detects bright spots in the image, corresponding to the IR LEDs attached to the costume. Although marker extraction is not used or tested in the current implementation, we include the performance analysis below as an indication of its feasibility within the framework’s time constraints.

Bright spots are first detected by applying a pixel threshold to the image, thus removing all unwanted background information. The image is then smoothed further with two passes of the optimized 3×3 Gauss filter, turning the bright spots into smooth blobs. The blob centre is detected by applying a Laplace operator (OpenCV) to the image, which highlights the centre of a blob with a strong negative pulse that can be identified efficiently as a local minimum. No attempts were made at extracting sub-pixel location as the higher accuracy was not considered necessary. The total point extraction time is dominated by the Laplace operator and amounts to 3.8 ms. Further speed optimization would be feasible, but was not required as the total pipeline time remained comfortably within the available 5 ms slot.
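As an indicative sketch of this marker-extraction scheme (thresholding, additional smoothing, Laplace operator, local-minima search), again using OpenCV's Python bindings rather than the optimised native code; the two threshold values are illustrative assumptions:

```python
# Sketch of the marker-extraction branch: threshold bright spots, smooth them
# into blobs, apply a Laplace operator and keep strong local minima as marker
# centres (pixel accuracy only, matching the description above).
import cv2
import numpy as np

def extract_markers(image, bright_thresh=200, response_thresh=-100.0):
    _, spots = cv2.threshold(image, bright_thresh, 255, cv2.THRESH_TOZERO)
    blobs = cv2.GaussianBlur(spots, (3, 3), 0)
    blobs = cv2.GaussianBlur(blobs, (3, 3), 0)          # two smoothing passes
    lap = cv2.Laplacian(blobs.astype(np.float32), cv2.CV_32F)
    # A blob centre shows up as a strong negative pulse; keep local minima only.
    local_min = (lap == cv2.erode(lap, np.ones((3, 3), np.uint8)))
    ys, xs = np.where(local_min & (lap < response_thresh))
    return list(zip(xs.tolist(), ys.tolist()))          # (x, y) marker positions
```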

4.2 Video Feed The video texture to be projected onto the performers is streamed through a network link from Cycling74 Max/MSP/Jitter, a real-time video processing environment that has become a standard among live video and sound artists. This enables the Living Canvas system to tap into the vast amount of creative tools provided by Max/MSP/Jitter and lets the artists use the environment they are familiar with. In a typical production scenario, the same video workstation will also drive the projection system for the stage background. Alternatively, a video file can be decoded on the local machine to minimise system complexity for testing purposes.

4.3 Rendering Process The rendering process receives the projection mask from the vision pipeline and the video image from a separate video feed thread. Rendering the projector output image is handled in a separate thread to avoid blocking the vision processing pipeline.

The output image is a composition of the video image and the mask. All pixels classified as background are left black to avoid spill onto the stage set. The image is rendered as a regular 2D grid, which corresponds to the checkerboard pattern projected during the calibration stage (see Section 4.4). Texture unit 0 provides the video image covering the whole projection area. Texture unit 1 provides the mask derived from the camera image, which is warped into the projection space by displacing the texture coordinates at each grid point according to the calibration. All warping and compositing operations are therefore handled by the graphics hardware in a single rendering pass, placing no computational burden onto the main CPU. Rendering requires no frame-buffer clearing or front/back buffer swapping, hence operating at the smallest latency possible at the cost of some potential minor image-tearing artefacts.

Rendering time is dominated by the texture upload operation to keep the mask and video image current and takes approximately 1.5 ms per frame on average using nVidia GeForce 8800 GTX graphics hardware.
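The same composition can be sketched on the CPU as follows (the prototype performs this on the GPU in a single rendering pass; map_x and map_y stand for per-pixel camera coordinates derived from the calibration lookup and are an assumption of this sketch):

```python
# CPU sketch of the GPU compositing pass: warp the camera-space mask into
# projector space using the calibration lookup, then black out all background
# pixels of the video frame. map_x/map_y are float32 arrays at projector
# resolution giving the camera coordinate sampled for each projector pixel.
import cv2
import numpy as np

def compose_output(video_frame, mask_cam, map_x, map_y):
    """video_frame: HxWx3 projector-resolution image; mask_cam: camera-space mask."""
    mask_proj = cv2.remap(mask_cam, map_x, map_y, cv2.INTER_LINEAR)   # warp mask
    mask_proj = (mask_proj > 127).astype(np.uint8)[:, :, None]        # binarise
    return video_frame * mask_proj                                    # background stays black
```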

Figure 5: Stages of the processing pipeline – screenshots from the small-scale system. a) Near-infrared camera image. b) Segmented image. c) Mask merged with video texture for projection.

4.4 Calibration Calibration is integrated into the vision processing pipeline. It establishes a 2D correspondence between the projector and camera pixels.

Figure 6: Calibration: a) Projected pattern. b) Extracted feature points in camera image used to establish 2D correspondence.

First, the camera is manually configured to roughly correspond with the projector. A checkerboard pattern is projected onto a plane (Figure 6a) that preferably corresponds approximately to the performer’s position to calibrate the system. The vision software detects the corners (Figure 6b) and stores the corresponding camera coordinates into the texture coordinates used for rendering. Obviously, the IR filter on the camera must be removed for calibration.
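A sketch of this calibration step using OpenCV's checkerboard detector (the prototype's own corner detection may differ; the 9×7 inner-corner grid and the sub-pixel refinement here are illustrative assumptions):

```python
# Sketch of the projector-camera calibration: project a checkerboard from the
# projector, detect its corners in the (unfiltered) camera image, and store the
# camera coordinates as texture coordinates for the rendering grid.
import cv2

def calibrate(camera_image, pattern_size=(9, 7)):
    found, corners = cv2.findChessboardCorners(camera_image, pattern_size)
    if not found:
        raise RuntimeError("checkerboard not fully visible in camera image")
    corners = cv2.cornerSubPix(
        camera_image, corners, (5, 5), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.01))
    # corners[i] holds the camera-space (x, y) of grid point i; these become the
    # texture coordinates used to warp the mask into projector space.
    return corners.reshape(pattern_size[1], pattern_size[0], 2)
```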

While this calibration method might seem overly simplistic, it works well in practice as long as the camera is mounted close to the projector lens. It is simple, robust, and it takes a minimal amount of time and expertise to recalibrate the system. A more sophisticated calibration system (such as [19]) will be required once additional cameras are added to retrieve 3D information (see Section 7 – Future Work).

4.5 Latency Analysis All software processing has been optimized to fit into the time slots defined by the maximum frame rates of the hardware in use: all machine vision tasks run at the full camera rate of 200 Hz, and graphics rendering greatly exceeds the projector frame rate of 60 Hz at approximately 1.5 ms rendering time per frame. The resulting total system latency from the camera to the projected frame is therefore dominated by the inherent transmission delays and buffering required by the hardware, which are mostly outside software control.

4.5.1 Camera Acquisition Delay The camera runs at 200 Hz and fully uses the frame time for sensor exposure (approx. 4.5 ms). The camera saturates the S800 Firewire bus. Using a packet size of 8160 bytes, it can be assumed that neither the camera nor the Firewire transmission card adds any significant amount of buffering. The best case from the beginning of the frame until the software receives the buffer is therefore estimated at 10 ms, which is also a typical delay if further processing keeps up at full rate and no further buffering is conducted.

4.5.2 Processing and Rendering Delay Vision processing adds a further 5 ms before data is copied into the rendering buffer. Rendering starts almost immediately after the buffer is ready on a multiprocessor system. With 1.5 ms average update time, rendering runs significantly faster than the actual output frame rate. In the best case, rendering finishes just before the next frame-buffer readout, keeping the latency minimal. The worst case adds one camera frame (5 ms) on top of the rendering time, as older frame data is replaced with the most recent data. As the frame buffer is updated while it is read out, at least part of the output frame always reflects the latest data.

Finally, digital projectors typically add a processing delay of one frame (16.6 ms) internally before the image is actually projected.

The total input-to-output latency therefore includes 10 ms camera acquisition delay, 5 ms vision processing, a minimum of 1.5 ms rendering, and the internal projector delay of 16.6 ms; the best-case total delay therefore adds up to 33 ms, or 16.5 ms when using a projector without internal buffering (e.g. an analogue CRT). Although digital projectors with very high frame rates exist [20], they do not fundamentally improve on the latency, as they download a sequence of images encoded into a single block (e.g. a video frame over DVI), although somewhat higher base frame rates (e.g. 200 Hz) have been achieved.
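For reference, the latency budget above can be tallied directly from the quoted figures (a trivial sketch):

```python
# Best-case latency budget from the figures above (all values in milliseconds).
budget = {
    "camera acquisition (exposure + transfer)": 10.0,
    "vision processing": 5.0,
    "rendering (minimum)": 1.5,
    "projector internal buffering": 16.6,
}
total = sum(budget.values())
print(f"best-case total latency: {total:.1f} ms")             # ~33 ms
print(f"without projector buffering: {total - 16.6:.1f} ms")  # 16.5 ms (e.g. CRT)
```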

5. Testing the System The system was tested using two environments. A small-scale stage with dolls as “performers” was used for the initial system development and performance benchmarking. A second full-size stage was used to test the system with performers wearing a range of garments.


5.1 Lighting Optimization Careful lighting of the environment is crucial to achieve robust image segmentation. The system provides several options to cater for a wider range of stage setups and backgrounds.

Initial experiments were conducted successfully with frontal IR lighting against both a dark background and a back-projection screen on the small-scale stage, similar in concept to [2]. With both backgrounds, the garments are significantly more reflective than the stage set. Segmentation is also supported by the distance fall-off of the illumination, which lights objects closer to the illuminator (e.g. the performer) more brightly than the background. Accepting only brighter pixels as foreground (see Section 4.1.3) therefore results in a robust segmentation.

The same lighting setup proved to be difficult in the large-scale testing environment with a front-projection screen as the stage background. Due to space limitations and a narrow projection angle, the performer has to stand relatively close to the projection screen, which is IR-reflective. The resulting contrast between performer and background proved to be insufficient. Instead, aiming the IR illuminator at the background at an oblique angle results in a dark performer in front of a bright background. While the narrow beam of the IR illuminator, in combination with the space restrictions, makes an even illumination of the background difficult, it nonetheless results in a significantly more robust segmentation. Using several illuminators would easily solve the remaining issues.

5.2 Garments The selection of the costume influences both segmentation and projection quality. In the case of frontal illumination, the garment should reflect near-IR light well, preferably with a diffuse pattern and no strong highlights to clearly reveal the silhouette. In the case where the background is illuminated, a garment with little IR reflectivity is preferred to avoid interference, but this is not critical as long as the performer does not enter the light cone.

Similarly, a matt white garment is required to achieve a high quality projection. Specular highlights should be kept to a minimum.

Our tests with a range of costumes revealed that many white garments have similar properties in both the visible spectrum and near infrared, as opposed to colored fabrics where visible and IR spectra are often totally unrelated.

Figure 7 shows a range of garments as seen by the near-IR camera and a regular photographic camera. Black cotton is included as an example of a fabric where visible and IR spectra are highly uncorrelated. We found velvet to be a particularly interesting fabric: its complex reflection properties result in good segmentation even at oblique angles, and it receives projection beautifully despite the irregular highlights. We also found that projecting a texture (Figure 7, top row) reveals few visible differences between fabrics. White costumes can therefore be selected mostly for their visual appearance under projection, with little regard to their segmentation characteristics.

Figure 7: A range of different garments and their reflection properties in the visible and the near-IR range. Top row: projecting texture. Middle row: projecting pure white. Bottom row: IR camera image. a) Lycra matt, b) Lycra shiny, c) Crushed velvet, d) Black cotton.

5.3 Precision and Calibration The precision of the mask depends on a range of factors. The camera choice defines the theoretical resolution limits. With a single Dragonfly Express camera covering a stage area 4 m wide, the resulting pixel size equals 6.25 mm. This is enough to cover limbs with reasonable precision, but will not render details such as individual fingers clearly (see background in Figure 8a). The projector resolution is typically more than twice that of the camera (in our experiments 1400×1050 for the projector vs. 640×480 for the camera), which provides sufficient texture information. Higher-resolution cameras are available, but typically result in significantly increased latency and cost.
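The quoted pixel footprints follow directly from the stage width and device resolutions (a minimal sketch using the figures above):

```python
# Pixel footprint on stage for the figures quoted above (4 m wide active area).
stage_width_mm = 4000.0
camera_px = stage_width_mm / 640        # ~6.25 mm per camera pixel
projector_px = stage_width_mm / 1400    # ~2.9 mm per projector pixel
print(f"camera pixel:    {camera_px:.2f} mm")
print(f"projector pixel: {projector_px:.2f} mm")
```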

The director may choose any size of active projection area, trading the active space for resolution depending on the artistic needs for a given performance. Several independent systems can be run in parallel, although there are no special provisions currently to handle overlapping projections.

Segmentation errors further decrease the effective mask resolution. Depending on lighting conditions and garment, pixels corresponding to oblique regions around the performer’s silhouette tend to be misclassified as background. Segmentation and post-filtering parameters can be adjusted to bias the resulting error towards or against favouring full body coverage at the cost of background spill, depending on the director’s preferences.

Calibration errors between the camera and projector can arise from several sources. First, the camera resolution is lower than the projector resolution, leading to small pixel offsets from the inherent limitations of the calibration process. Second, the image warping process uses piecewise linear interpolation between the calibration support points (checkerboard pattern), therefore ignoring the non-linear effects such as lens distortion inside the interpolated regions. Higher order interpolation could easily be implemented, though, if this was identified as an issue.


Figure 8: Error analysis using the full-scale stage system. a) Calibrated system with nearly no spill onto background. Effects of latency when moving an arm: spill onto background (b) or missing projection on body (c).

A fundamental error results from ignoring the distance between the performer and the camera/projector pair. Calibration is correct only at the original calibration plane. To minimize the error when moving out of the calibrated plane, the distance between the projector and camera origins has to be minimized in relation to the distance between the camera and the performer: the further away the camera/projector pair is from the stage, the smaller the error. However, placing the camera too far away increases the demands on the camera lens quality; otherwise the image might appear too dark. Vibrations of the camera or projector are also amplified with distance. Calibration errors due to the changing distance will be handled once additional cameras are introduced to track body parts in 3D (see Section 7 – Future Work), at the cost of a more time-consuming calibration procedure.
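Under a simple pinhole approximation (an assumption of this sketch, not a formula given in the paper), the projection offset caused by leaving the calibration plane grows with the camera-projector baseline and the depth offset, which is why minimising the baseline relative to the stage distance helps:

```python
# Rough parallax estimate (pinhole assumption): camera and projector rays
# coincide only at the calibration plane; at a depth offset delta they diverge
# by roughly baseline * delta / calibration_distance.
def projection_offset_mm(baseline_mm, calibration_distance_mm, depth_offset_mm):
    return baseline_mm * depth_offset_mm / calibration_distance_mm

# e.g. a 150 mm camera-projector baseline, calibrated at 6 m, performer 0.5 m
# in front of the calibration plane (all values illustrative):
print(f"{projection_offset_mm(150, 6000, 500):.1f} mm")   # ~12.5 mm offset
```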

5.4 Speed The system latency, defined as the time from the start of camera exposure until projection of the masked video, is analytically determined to be approximately 33 ms (see Section 4.5).

A delay of 33 ms means that projection lags by 3.3 cm if a performer is moving at 1 m/s. Limb motion easily exceeds such speeds, hence an accurate projection mask is not obtainable for fast dance performances unless intelligent motion prediction schemes are introduced. Given the limitations of current projector technology, performers have to adapt their playing to the system latency to avoid excessive background spill. Figure 8b and c show the effect of the latency.

6. Suitability for Stage Performance While the technology has proven to work well in a lab environment, the ultimate test lies in its deployment within an actual production.

6.1 Artist’s Reflections The technology has been developed in close collaboration with Cryptic. We asked their artistic director to reflect upon the technology and comment on its suitability for a stage performance, based on the prototype and its limitations regarding latency, precision, and stage and lighting setup. The following paragraphs reflect the comments of the artist.

“This technology opens up many exciting new areas for live performance, exploring further the areas of what is real and an illusion on stage. It creates a visual tool which can follow the performer on stage and has enormous potential not just for theatre

work, but for opera and children’s theatre too. However as with all technologies there are limits as to speed, distance, angles, time delay which need to be explored. Test bedding the prototype is crucial for understanding the limits of this tool.

As with any good multi-media performance, it is imperative that the technology is an integral part of the performance rather than an add-on within the design. And only once the limits are tested in the right conditions can we decide how best to use Living Canvas in performance.

The question of the impact of speed and latency is a difficult one to answer until we work with a performer and test the variables. As we are not planning to choreograph a dance piece, we will explore what speed is comfortable and still believable. Stylising the performance so it is graceful and slow can always enhance the theatricality of the work. It also depends on when exactly one is using this technology within a performance. It could be too much to have it in use all the time – instead, we would use it at certain points and certain areas, where the performance could become dreamlike.

We imagine that in order for this technology to look its best, it is imperative that there is significant depth on stage. The audience cannot be too close to the stage. Quite often with dance, depth is required to appreciate the choreography and lighting. With Living Canvas it would be most affective in a longer space which would mean that the venues where we tour to would need to ensure depth so that the audience is looking at a distance and preferably straight on like most traditional studio theatre spaces. For larger auditoriums, one solution for sight lines would be to block off the side seats; however until we test bed this tool we might discover that 5 metres is enough depth.

There are many exciting applications for Living Canvas and until we test it live in performance we can only guess how it will respond to the artists on stage. One thing we can be sure is that Living Canvas narrows the gap of what is real and virtual on stage which will no doubt enhance the whole theatrical experience - the possibilities are endless.”

6.2 Workshop Results The technology was extensively tested in a week-long workshop at the Centre for Contemporary Arts in Glasgow. The workshop brought together the artistic director, a stage manager, a multimedia technician and a lighting expert to explore the potential and issues in a stage-like environment. The technology proved to be robust during the workshop. While the aforementioned limitations regarding precision and latency did show up during testing, the artistic director actually felt inspired to exploit them as an effect, asking for controls to introduce further “errors” into the system.

A special focus was put on the lighting conditions. As expected, the incandescent stage lights generated a lot of energy in the near-infrared spectrum – easily overpowering the dedicated IR illuminators – which will require further attention in the follow-up project. However, it was found that this can be used to the advantage of the system, lowering actual costs by doing away with the dedicated illuminators and instead controlling visible light intensities with dark gels. Figure 9 shows selected images from the workshop; an in-depth analysis of the results has been published in [21].

Figure 9: Images from the exploratory workshop.

7. Discussion and Future Work This first study has shown the technical feasibility of the Living Canvas idea. An initial prototype including the major machine vision and projection components has been built and tested. Measurements of the total system latency and tests indicate that the system is fast enough for a theatrical stage performance, although it will not be able to cope with fast movements typical for a dance performance unless faster projectors without internal delay become available. A range of fabrics and costumes have been tested successfully with the system, showing that there are few restrictions to the costume design.

Future development work will focus on the integration of pose detection of the performer to enable the vision of virtual costumes based on algorithms as described in [22-24]. Adding more cameras to the system will support 3D tracking of limbs and the head, identified by small infrared LED clusters. The same tracking system will also provide the necessary information for a full 3D warping step to reduce the mismatch between the silhouette acquired through the camera and the projection. The main challenge will be to make the system fast and robust while keeping calibration complexity low enough for deployment in a typical theatre production.

In addition to the feature extensions mentioned above, bringing the system onto a theatrical stage requires further experimentation with the near-infrared lighting system. Currently, the segmentation system assumes a static background. However, traditional stage spotlights emit a significant amount of energy in the near-IR band and therefore influence the segmentation system. The Living Canvas system must interact with the lighting system to select background statistics according to the current stage lighting conditions. Digital projection systems, on the other hand, emit no energy in the near infrared spectrum and can therefore be ignored safely.

Finally, Cryptic plans to incorporate the dynamic projection system into a new work scheduled to premiere in 2010. The guidance from Cryptic’s artistic director and the collaborating artists will drive the further development of the technology, making it robust enough for deployment in an actual production.

8. Acknowledgements This feasibility study was funded by the UK Arts and Humanities Research Council (AHRC) under grant AH/E006698/1, with additional support from Cryptic.

9. References

[1] Gross, M., Wuermlin, S., Naef, M., Lamboray, E., Spagno, C., Kunz, A., Koller-Meier, E., Svoboda, T., Van Gool, L., Lang, S., Strehlke, K., Vande Moere, A. and Staadt, O. 2003. blue-c: a spatially immersive display and 3D video portal for telepresence. In Proceedings of ACM SIGGRAPH 2003 (San Diego, California, 2003). ACM Press.

[2] Rhee, S.-M., Ziegler, R., Park, J., Naef, M., Gross, M. H. and Myoung-Hee, K., 2007. Low-Cost Telepresence for Collaborative Virtual Environments. IEEE Transactions on Visualization and Computer Graphics, 13, 1, pp. 156-166.

[3] The Builders Association and dbox, 2005. Super Vision. New York.

[4] Zehnder, D., 1999. Un-do. Zurich.

[5] Jones, B. T., Kaiser, P., Eshkar, S. and Downie, M., 2005. 22. Arizona State University.

[6] Woodworth, Z., 2005. One way the wrong way.

[7] Stein, G., Katz, L. and Faver, C., 2002. Making of Americans: The Silent Scream of Martha Hersland.

[8] Seligmann, D. D., 2001. Toward a digital stage architecture: a long-term research agenda in digitally enabled theater. IEEE Multimedia, 8, 4, pp. 6-9.

[9] Werner, A., 2002. Marlowe: Der Jude von Malta (The Jew of Malta). München.

[10] Rubidge, S., MacDonald, A., Moffatt, M. and Besse, S., 2003. Sensuous Geographies. Glasgow.

[11] Medlin, M. and Wishart, S., 2007. Quartet. London.

[12] Debevec, P., Wenger, A., Tchou, C., Gardner, A., Waese, J. and Hawkins, T. 2002. A lighting reproduction approach to live-action compositing. In Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH 2002) (San Antonio, Texas, 2002). ACM Press.

[13] Tachi, S. 2003. Telexistence and Retro-reflective Projection Technology (RPT). In Proceedings of the 5th Virtual Reality International Conference (VRIC 2003) (Laval Virtual, France, May 13-18, 2003).

[14] Raskar, R., Welch, G., Low, K.-L. and Bandyopadhyay, D. 2001. Shader Lamps: Animating Real Objects With Image-Based Illumination. In Proceedings of the 12th Eurographics Workshop on Rendering (London, UK, June 25-27, 2001). Eurographics.

[15] Morishima, S., Yotsukura, T., Binsted, K., Nielsen, F. and Pinhanez, C. 2000. HyperMask: Talking Head Projected Onto Real Objects. In Proceedings of the International Conference on Multimedia Modeling (MMM'00) (Nagano, Japan, November 13-15, 2000).


[16] Jaynes, C., Webb, S., Steele, R. M., Brown, M. and Seales, W. B. 2001. Dynamic Shadow Removal from Front Projection Displays. In Proceedings of the 12th IEEE Visualization 2001 (VIS'01) (San Diego, CA, USA, October 21-26, 2001). IEEE

[17] Bradski, G., 2000. Programmer’s tool chest: The OpenCV Library. Doctor Dobbs Journal, M and T Publishing Inc., 25, 11, pp. 120-126.

[18] Piccardi, M. 2004. Background subtraction techniques: a review. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 2004 (2004)

[19] Svoboda, T., Martinec, D. and Pajdla, T., 2005. A Convenient Multicamera Self-Calibration for Virtual Environments. Presence: Teleoperators & Virtual Environments, 14, 4, pp. 407-422.

[20] Jones, A., McDowall, I., Yamada, H., Bolas, M. and Debevec, P. 2007. An interactive 360° light field display. In Proceedings of the ACM SIGGRAPH 2007 emerging technologies (San Diego, California, 2007). ACM Press

[21] Naef, M. and Boyd, C. 2008. Testing the Living Canvas on Stage. In Proceedings of the Electronic Information, the Visual Arts and Beyond (EVA London 2008) (London, UK, July 22-24, 2008). The British Computer Society

[22] Forsyth, D. A., Arikan, O., Ikemoto, L., O'Brien, J. and Ramanan, D., 2006. Computational Studies of Human Motion: Part 1, Tracking and Motion Synthesis. Now Publishers.

[23] Wren, C. R., Azarbayejani, A., Darrell, T. and Pentland, A. P., 1997. Pfinder: Real-Time Tracking of the Human Body. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 7, pp. 780-785.

[24] Caillette, F., Galata, A. and Howard, T., 2008. Real-Time 3-D Human Body Tracking using Learnt Models of Behaviour. Computer Vision and Image Understanding (CVIU), 109, 2, pp. 112-125.