
EYE SOCIETY

Jacky Mallett and V. Michael Bove, Jr.

MIT Media Laboratory
20 Ames Street, Room E15-368B

Cambridge MA 02139 USA
[email protected], [email protected]

ABSTRACT

We describe a system of small wireless autonomous mobile cameras that operate as a group to solve machine vision tasks. Given sufficient on-board computing to do real-time scene analysis without external processing resources, these cameras enable us to investigate how scene understanding can be improved when each camera is independently capable of analyzing its own sensor data, and sharing information on what it sees with its fellow robots.

INTRODUCTION

The availability of inexpensive, low-power embedded processors that match the processing capacity of the workstations that many of us used for machine vision in the recent past suggests that we combine these chips with the increasingly widespread wireless networking technologies to create the hardware substrate for a range of smart wireless devices, in particular small, intelligent mobile cameras. An intelligent mobile camera is often thought of as a robot, but interest is growing in applications where such devices can perform visual problem-solving as a cooperating group.

Applications for coordinating multiple cameras face a number of logistical and technological challenges. Logistically, the cameras must be correctly positioned at known locations, and this calibration information must be associated with the data from each camera. Processing of the visual data is usually done in a centralized location, in which case limitations on communications bandwidth and processor speed can restrict system scalability, while the existence of a single point of failure (the processing system) limits robustness.

Despite these challenges, the rapidly dropping cost and size of cameras and the increased ease of acquiring images from many of them simultaneously have enabled a growing number of multi-camera applications. The film industry has recently made extensive use of multiple-camera-array shots, typically for specialized effects as exemplified by the film The Matrix. Similarly, machine vision researchers have shown increased interest in applying camera arrays to a number of practical scene analysis, object analysis, and virtual reality problems.

BACKGROUND

The problem of multi-camera synthesis has received increasing attention from several different fields of research, and in several different forms. Virtual reality researchers such as Moezzi [1] and Kanade [2] have examined the field from the perspective of integrating real and virtual environments, with statically placed sets of cameras for three-dimensional data capture. Hilton [3] discusses the use of multiview color images to tackle the problem of building animated, realistic, 3D models of people. Mittal [4] presents a system that uses 16 CCD cameras to track people in a cluttered scene, and has also examined the related problem of tracking people through the views from widely dispersed cameras [5]. An overview of the requirements for a multi-agent approach specifically for surveillance camera control is described by Orwell [6].

The use of multiple cameras along with other sensors is also being explored for Human Computer Interaction, in particular in the creation of “intelligent rooms” [7]. Hoover [8] demonstrates the use of a network of fixed cameras to track and steer a mobile robot.

The related problem of coordinating multiple independent robots to perform tasks collaboratively is also receiving more attention as the available computational power increases. There is still a noticeable reluctance to move to complete local autonomy; most systems, such as that described by Hajjawi [9], prefer an approach that includes partial local autonomy but still involves some kind of remote supervisory control system.

There has been some exploration of specifically camera-based problems using robots; Jennings [10] in particular presents an approach to robot navigation in which two stereo-camera-equipped robots cooperatively build a mutual map of their landscape.

EYE SOCIETY

In this project we are creating a “society” of small, cheap, autonomous wireless mobile cameras, each controlled by an embedded processor board running Linux. Each camera can independently pan and tilt and move along an overhead lighting track, and is in constant communication with its fellow cameras as it does so. With this as a basis we are investigating how scene analysis can be improved when each camera is independently capable of analyzing its own data, and sharing information on what it sees with its fellow robots.

Figure 1: Eye Society robots on an overhead track.

Networks of independent sensors with limited processing (e.g. [11]) are becoming common, but the long-term goal of Eye Society is to put as much programmable processing as possible in a small and mobile package, while providing higher-level self-organizing abilities so that the robots can form arbitrary task-oriented groups.

The fundamental principles behind the Eye Society project are:

• It is possible to put sufficient processing into an information-capture device such that a central processing server is not needed for many sophisticated tasks.

• A user or programmer does not need to think in terms of an identifiable individual device, but of the overall system.

The absence of centralized external processing or control – or even a master-slave hierarchy among the devices – eliminates the possibility of a single point of failure, while moving tasks to the “ecosystem” of cameras improves system scalability, as each sensor brings along enough processing power to handle the data it acquires.

Each camera in the project operates as a fully autonomous entity, with wireless (multicast) communication to other robots and access to “offshore” resources such as databases and file space as required. The current version of the robot controller centers on a commercial processor board containing a StrongARM SA-1110 processor running at 206 MHz, 32 MB of flash memory, an SA-1111 companion chip (which allows interfacing to USB devices), and additional analog and digital inputs and outputs used to control locomotion and camera servos and to accept input from sensors such as microphones. An add-on board of our design contains motor drivers. Communication is through IEEE 802.11b.
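
The paper states only that the robots communicate over 802.11b using multicast; the group address, port, and message format in the following sketch are illustrative assumptions, not the project's actual protocol.

```python
# Minimal sketch of a multicast pose announcement between robots.
# GROUP, PORT, and the JSON message fields are hypothetical.
import json
import socket

GROUP = "239.0.0.1"   # assumed multicast group address
PORT = 5007           # assumed port

def announce(robot_id, track_position_m, pan_deg, tilt_deg):
    """Send this robot's identity and current pose to the whole group."""
    msg = json.dumps({
        "robot": robot_id,
        "track_pos_m": track_position_m,  # position along the rail, metres
        "pan_deg": pan_deg,
        "tilt_deg": tilt_deg,
    }).encode("utf-8")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    sock.sendto(msg, (GROUP, PORT))
    sock.close()

def listen():
    """Receive announcements from the other robots in the group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    mreq = socket.inet_aton(GROUP) + socket.inet_aton("0.0.0.0")
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    while True:
        data, addr = sock.recvfrom(4096)
        print(addr, json.loads(data.decode("utf-8")))
```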

Because of the limited memory on-board, each robot can mount an external NFS file system if needed to provide access to additional storage space or external data.

Although the processor is fairly modest, we are able to acquire color video at 5 frames/second at a resolution of 320 by 240 pixels; running a simple color-histogram object tracking algorithm on the incoming video slows down the throughput rate by less than one percent, suggesting that the acquisition speed is limited by bus bandwidth rather than processor performance. Complex algorithms that perform more operations per pixel do of course execute more slowly.
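
The paper does not detail the color-histogram tracker itself; the sketch below shows one common formulation (hue-histogram back-projection with mean shift) as a point of reference. The OpenCV usage, camera index, initial window, and bin count are assumptions rather than details of the authors' implementation.

```python
# Illustrative color-histogram tracker: build a hue histogram of the target,
# back-project it onto each new frame, and mean-shift the search window
# toward the likelihood peak.
import cv2

cap = cv2.VideoCapture(0)              # hypothetical video source
track_window = (120, 80, 80, 80)       # assumed initial (x, y, w, h) of the target

ok, frame = cap.read()
assert ok, "could not read an initial frame"
x, y, w, h = track_window
roi = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
hist = cv2.calcHist([roi], [0], None, [16], [0, 180])   # 16-bin hue histogram
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Per-pixel likelihood of belonging to the target, then window update.
    prob = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
    _, track_window = cv2.meanShift(prob, track_window, term_crit)
```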

The cameras operate in a relatively normal laboratory area, with varying lighting and movable furniture. Since none of the cameras has a fixed position (although each can run only in one dimension along a linear railing), visual servoing is being developed that allows them to determine their relative positions using fixed points in the environment. We are currently exploring environmentally adaptive ways to do this as a collaborative problem among the robot group, rather than using pre-positioned calibration points.
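
As a deliberately simplified illustration of how a rail-constrained camera could localize itself against a fixed point, assume the rail lies along the x-axis and that the bearing to a landmark of known position can be recovered from the pan angle; the geometry then gives the rail position directly. This is a sketch of the idea, not the project's calibration method.

```python
import math

def rail_position_from_landmark(landmark_xy, bearing_rad):
    """
    Estimate the camera's position x on a rail assumed to lie on the x-axis
    (y = 0), given a landmark at known (lx, ly) and the measured bearing to
    it (angle from the rail direction). Real servoing must also handle
    pan-axis offsets, noise, and multiple landmarks.
    """
    lx, ly = landmark_xy
    # tan(bearing) = ly / (lx - x)  =>  x = lx - ly / tan(bearing)
    return lx - ly / math.tan(bearing_rad)

# Example: landmark 2 m to the side of the rail, seen at a 45-degree bearing.
print(rail_position_from_landmark((3.0, 2.0), math.radians(45)))  # -> 1.0
```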

Allowing the cameras to run autonomously permits us to develop control and visual algorithms with elementary feedback loops. This applies to a single camera, which can adjust its target using both the pan and tilt controls and its lateral movement on the track. In effect this allows the camera to capture and operate on a set of multiple views as it moves under its own control along the track. While an obvious use of the movement is for traditional stereopsis and egomotion algorithms, an additional possibility is that of self-adjusting position and heading to find optimal locations for acquisition and calibration (e.g. locating good three-point perspective), or to compare predicted views generated by rendering a model with actual views and then updating the model to account for the differences.
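
A minimal sketch of the kind of elementary feedback loop mentioned above is proportional control that nudges pan and tilt to keep a tracked target centred in the image. The gain value and the return-the-new-angles interface are placeholders, not the robots' actual servo API.

```python
# Proportional pan/tilt centring loop (illustrative only).
FRAME_W, FRAME_H = 320, 240   # image size reported in the text
GAIN = 0.05                   # assumed degrees of servo motion per pixel of error

def centre_target(target_xy, pan_deg, tilt_deg):
    """Return updated (pan, tilt) that move the target toward the image centre."""
    tx, ty = target_xy
    err_x = tx - FRAME_W / 2
    err_y = ty - FRAME_H / 2
    return pan_deg + GAIN * err_x, tilt_deg - GAIN * err_y

pan, tilt = 0.0, 0.0
pan, tilt = centre_target((200, 100), pan, tilt)  # target right of and above centre
```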

When this principle is applied to a group of cameras working together, things get even more interesting. The ability of the cameras to move and collectively self-adjust allows us to take a more dynamic approach to problems of visual analysis such as occlusion. Similarly, when the cameras are operating as an organized group, they can use the pre-computed knowledge of their relative positions and views to provide each other with additional perspective information about the scene from their individual points of view. Besides rather elementary advantages such as compensating for individual failure, or for lighting effects (such as specular reflections) that disproportionately affect one camera position, this also allows hypothesis-testing results to be computed for objects in the scene from other cameras’ perspectives.
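
To make the hypothesis-testing idea concrete, the sketch below shows one camera receiving another's 3D hypothesis about an object and checking whether that hypothesis falls inside its own field of view, so it could attempt to confirm or refute it. The pose representation, world frame, and field-of-view value are assumptions for illustration only.

```python
# Checking whether a shared object hypothesis is visible from this camera.
import math
from dataclasses import dataclass

@dataclass
class CameraPose:
    x: float            # camera position in an assumed planar world frame (m)
    y: float
    heading_deg: float  # direction the camera is pointing in that frame
    fov_deg: float = 60.0

def can_verify(pose: CameraPose, hypothesis_xy) -> bool:
    """True if the hypothesised object position lies inside this camera's FOV."""
    hx, hy = hypothesis_xy
    bearing = math.degrees(math.atan2(hy - pose.y, hx - pose.x))
    diff = (bearing - pose.heading_deg + 180) % 360 - 180  # wrap to [-180, 180)
    return abs(diff) <= pose.fov_deg / 2

cam_b = CameraPose(x=2.0, y=0.0, heading_deg=90.0)
print(can_verify(cam_b, (2.2, 3.0)))  # hypothesis nearly straight ahead -> True
```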

Current work with the robots is aimed at resolving the group calibration problem, and exploring mechanisms that allow the robots to maximize the information provided by their shared view of the space in which they are positioned.

Longer-term work includes increasing the processor power of the robots to allow more sophisticated real-time processing, incorporating additional sensors such as microphones, and enabling other sorts of devices (both stationary sensing devices and unconstrained-motion floor robots) to become part of the acquisition-and-understanding “ecosystem.”

REFERENCES

[1] S. Moezzi, L.-C. Tai, and P. Gerard, “Virtual View Generation for 3D Digital Video,” IEEE Multimedia, Jan.-Mar. 1997.

[2] T. Kanade, P. Rander, and P. J. Narayanan, “Virtualized Reality: Constructing Virtual Worlds from Real Scenes,” IEEE Multimedia, Jan.-Mar. 1997.

[3] A. Hilton, D. Beresford, T. Gentils, R. Smith, W. Sun, and J. Illingworth, “Whole-body modeling of people from multiview images to populate virtual worlds,” The Visual Computer, vol. 16, no. 7, pp. 411-436.

[4] A. Mittal and L. S. Davis, “M2Tracker: A Multi-View Approach to Segmenting and Tracking People in a Cluttered Scene Using Region-Based Stereo,” Proceedings, Part I, ECCV 2002: 7th European Conference on Computer Vision, Copenhagen, Denmark, May 28-31, 2002.

[5] A. Mittal and L. S. Davis, “Unified multi-camera detection and tracking using region-matching,” Proceedings 2001 IEEE Workshop on Multi-Object Tracking, pp. 3-10.

[6] J. Orwell, S. Massey, P. Remagnino, D. Greenhill, and G. Jones, “A multi-agent framework for visual surveillance,” Proc. International Conference on Image Analysis and Processing, 1999, pp. 1104-1107.

[7] M. H. Coen and K. W. Wilson, “Learning Spatial Event Models from Multiple-Camera Perspectives,” IEEE IECON ’99 Proceedings, vol. 1, 1999, pp. 149-156.

[8] A. Hoover and B. D. Olsen, “Sensor network perception for mobile robotics,” Proceedings IEEE ICRA ’00, vol. 1, 2000, pp. 342-347.

[9] M. K. Hajjawi and A. Shirkhodaie, “Cooperative visual team working and target tracking of mobile robots,” Proceedings of the Thirty-Fourth Southeastern Symposium on System Theory, 2002, pp. 376-380.

[10] C. Jennings, D. Murray, and J. J. Little, “Cooperative robot localization with vision-based mapping,” Proceedings 1999 IEEE International Conference on Robotics and Automation, vol. 4, 1999, pp. 2659-2665.

[11] J. D. McLurkin, Algorithms for Distributed Sensor Networks, S.M. thesis, University of California at Berkeley, 1999.