Electronic Acknowledgement Receipt - Peopleyang/paper/UCB-BSN... · 2011-03-30 · This...
Electronic Acknowledgement Receipt
EFS ID: 6577791
Application Number: 12631714
Confirmation Number: 2361
International Application Number:
Title of Invention: SYSTEM FOR DETECTION OF BODY MOTION
First Named Inventor/Applicant Name: Ruzena Bajcsy
Customer Number: 37490
Application Type: Utility under 35 USC 111(a)
Time Stamp: 19:39:59
Filing Date:
Receipt Date: 04-DEC-2009
Attorney Docket Number: 010030-002710US
Filer Authorized By: Charles J. Kulas
Filer: Charles J. Kulas/Megan Godsey
Payment information: Submitted with Payment: no
File Listing:
Document Number: 1
Document Description: Application Data Sheet
File Name: 010030-002710US-B08-082-2-ADS.pdf
File Size (Bytes): 1185052
Message Digest: ab707445f83afef538b39db5d6d1697966f3e7ea
Multi Part /.zip: no
Pages (if appl.): 5
Warnings:
Information:

Document Number: 2
Document Description: Drawings-only black and white line drawings
File Name: 010030-002710US-B08-082-2-Figures.pdf
File Size (Bytes): 1324283
Message Digest: 230ab0f9ee1c7c9b9807d6a9a0d89e666464296b
Multi Part /.zip: no
Pages (if appl.): 15
Warnings:
Information:

Document Number: 3
Document Description: Information Disclosure Statement (IDS) Filed (SB/08)
File Name: 010030-002710US-B08-082-2-IDS.pdf
File Size (Bytes): 777024
Message Digest: cb363f71b2ddb0e4c6450e87c1ec40a84ce2b79b
Multi Part /.zip: no
Pages (if appl.): 4
Warnings:
Information:
A U.S. Patent Number Citation or a U.S. Publication Number Citation is required in the Information Disclosure Statement (IDS) form for autoloading of data into USPTO systems. You may remove the form to add the required data in order to correct the Informational Message if you are citing U.S. References. If you chose not to include U.S. References, the image of the form will be processed and be made available within the Image File Wrapper (IFW) system. However, no data will be extracted from this form. Any additional data such as Foreign Patent Documents or Non Patent Literature will be manually reviewed and keyed into USPTO systems.

Document Number: 4
Document Description: NPL Documents
File Name: 010030-002710US-Reference1-CVPR4HB-AllenYang.pdf
File Size (Bytes): 616677
Message Digest: c5f378852f60f98a3ba3fc46ba564393e48fed20
Multi Part /.zip: no
Pages (if appl.): 8
Warnings:
Information:

Document Number: 5
Document Description: NPL Documents
File Name: 010030-002710US-Reference2-JAISE08-AllenYang.pdf
File Size (Bytes): 435447
Message Digest: a32457a386377ea074f444c1abe62baf22c0da56
Multi Part /.zip: no
Pages (if appl.): 13
Warnings:
Information:

Document Number: 6
Document Description: NPL Documents
File Name: 010030-002710US-Reference3-EECS-2007-143.pdf
File Size (Bytes): 1220691
Message Digest: 85cebc28099ff5e89f36e84b1f5091ba21050a19
Multi Part /.zip: no
Pages (if appl.): 19
Warnings:
Information:

Document Number: 7
Document Description: NPL Documents
File Name: 010030-002710US-Reference4-spots09-paper6.pdf
File Size (Bytes): 2340714
Message Digest: 7afd37e789b4c26477ec6aec5dd79d7857b83a9d
Multi Part /.zip: no
Pages (if appl.): 11
Warnings:
Information:

Document Number: 8
Document Description:
File Name: 010030-002710US-B08-082-2-SystemforDetectionofBodyMotion-v6.pdf
File Size (Bytes): 198715
Message Digest: e6475477f29832165778b2ee1913d976948a0ce6
Multi Part /.zip: yes
Pages (if appl.): 36
Multipart Description/PDF files in .zip description
Document Description: Specification; Start: 1; End: 33
Document Description: Claims; Start: 34; End: 35
Document Description: Abstract; Start: 36; End: 36
Warnings:
Information:
Total Files Size (in bytes): 8098603
This Acknowledgement Receipt evidences receipt on the noted date by the USPTO of the indicated documents, characterized by the applicant, and including page counts, where applicable. It serves as evidence of receipt similar to a Post Card, as described in MPEP 503.

New Applications Under 35 U.S.C. 111
If a new application is being filed and the application includes the necessary components for a filing date (see 37 CFR 1.53(b)-(d) and MPEP 506), a Filing Receipt (37 CFR 1.54) will be issued in due course and the date shown on this Acknowledgement Receipt will establish the filing date of the application.

National Stage of an International Application under 35 U.S.C. 371
If a timely submission to enter the national stage of an international application is compliant with the conditions of 35 U.S.C. 371 and other applicable requirements a Form PCT/DO/EO/903 indicating acceptance of the application as a national stage submission under 35 U.S.C. 371 will be issued in addition to the Filing Receipt, in due course.

New International Application Filed with the USPTO as a Receiving Office
If a new international application is being filed and the international application includes the necessary components for an international filing date (see PCT Article 11 and MPEP 1810), a Notification of the International Application Number and of the International Filing Date (Form PCT/RO/105) will be issued in due course, subject to prescriptions concerning national security, and the date shown on this Acknowledgement Receipt will establish the international filing date of the application.
EFS Web 2.2.2
PTO/SB/14 (07-07)
Approved for use through 06/30/2010. OMB 0651-0032
U.S. Patent and Trademark Office; U.S. DEPARTMENT OF COMMERCE
Under the Paperwork Reduction Act of 1995, no persons are required to respond to a collection of information unless it contains a valid OMB control number.
Application Data Sheet 37 CFR 1.76
Attorney Docket Number: 010030-002710US
Application Number:
Title of Invention: SYSTEM FOR DETECTION OF BODY MOTION
The application data sheet is part of the provisional or nonprovisional application for which it is being submitted. The following form contains the
bibliographic data arranged in a format specified by the United States Patent and Trademark Office as outlined in 37 CFR 1.76.
This document may be completed electronically and submitted to the Office in electronic format using the Electronic Filing System (EFS) or the
document may be printed and included in a paper filed application.
Secrecy Order 37 CFR 5.2
Portions or all of the application associated with this Application Data Sheet may fall under a Secrecy Order pursuant to
37 CFR 5.2 (Paper filers only. Applications that fall under Secrecy Order may not be filed electronically.)
Applicant Information:

Applicant 1
Applicant Authority: Inventor | Legal Representative under 35 U.S.C. 117 | Party of Interest under 35 U.S.C. 118
Name (Prefix, Given Name, Middle Name, Family Name, Suffix): Ruzena Bajcsy
Residence Information (Select One): US Residency | Non US Residency | Active US Military Service
City: Berkeley; State/Province: CA; Country of Residence: US
Citizenship under 37 CFR 1.41(b): US
Mailing Address of Applicant:
Address 1: 665 Soda Hall
Address 2:
City: Berkeley; State/Province: CA
Postal Code: 94720; Country: US

Applicant 2
Applicant Authority: Inventor | Legal Representative under 35 U.S.C. 117 | Party of Interest under 35 U.S.C. 118
Name (Prefix, Given Name, Middle Name, Family Name, Suffix): Allen Y. Yang
Residence Information (Select One): US Residency | Non US Residency | Active US Military Service
City: Berkeley; State/Province: CA; Country of Residence: US
Citizenship under 37 CFR 1.41(b): CN
Mailing Address of Applicant:
Address 1: 307 Cory Hall
Address 2:
City: Berkeley; State/Province: CA
Postal Code: 94720; Country: US

Applicant 3
Applicant Authority: Inventor | Legal Representative under 35 U.S.C. 117 | Party of Interest under 35 U.S.C. 118
Name (Prefix, Given Name, Middle Name, Family Name, Suffix): S. Shankar Sastry
Residence Information (Select One): US Residency | Non US Residency | Active US Military Service
City: Berkeley; State/Province: CA; Country of Residence: US
Citizenship under 37 CFR 1.41(b): US
Mailing Address of Applicant:
Address 1: 514 Cory Hall
Address 2:
City: Berkeley; State/Province: CA
Postal Code: 94720; Country: US

Applicant 4
Applicant Authority: Inventor | Legal Representative under 35 U.S.C. 117 | Party of Interest under 35 U.S.C. 118
Name (Prefix, Given Name, Middle Name, Family Name, Suffix): Roozbeh Jafari
Residence Information (Select One): US Residency | Non US Residency | Active US Military Service
City: Richardson; State/Province: TX; Country of Residence: US
Citizenship under 37 CFR 1.41(b): IR
Mailing Address of Applicant:
Address 1: 800 W. Campbell Road, EC33
Address 2:
City: Richardson; State/Province: TX
Postal Code: 75080; Country: US

All Inventors Must Be Listed - Additional Inventor Information blocks may be
generated within this form by selecting the Add button.

Correspondence Information:
Enter either Customer Number or complete the Correspondence Information section below.
For further information see 37 CFR 1.33(a).
An Address is being provided for the correspondence Information of this application.
Customer Number: 37490
Email Address:

Application Information:
Title of the Invention: SYSTEM FOR DETECTION OF BODY MOTION
Attorney Docket Number: 010030-002710US
Small Entity Status Claimed:
Application Type: Nonprovisional
Subject Matter: Utility
Suggested Class (if any):
Sub Class (if any):
Suggested Technology Center (if any):
Total Number of Drawing Sheets (if any): 15
Suggested Figure for Publication (if any): 19
Publication Information:
Request Early Publication (Fee required at time of Request 37 CFR 1.219)
Request Not to Publish. I hereby request that the attached application not be published under 35 U.S.C. 122(b) and certify that the invention disclosed in the attached application has not and will not be the subject of
an application filed in another country, or under a multilateral international agreement, that requires publication at
eighteen months after filing.

Representative Information:
Representative information should be provided for all practitioners having a power of attorney in the application. Providing
this information in the Application Data Sheet does not constitute a power of attorney in the application (see 37 CFR 1.32).
Enter either Customer Number or complete the Representative Name section below. If both sections
are completed the Customer Number will be used for the Representative Information during processing.
Please Select One: Customer Number | US Patent Practitioner | Limited Recognition (37 CFR 11.9)
Customer Number: 37490

Domestic Benefit/National Stage Information:
This section allows for the applicant to either claim benefit under 35 U.S.C. 119(e), 120, 121, or 365(c) or indicate National Stage
entry from a PCT application. Providing this information in the application data sheet constitutes the specific reference required by
35 U.S.C. 119(e) or 120, and 37 CFR 1.78(a)(2) or CFR 1.78(a)(4), and need not otherwise be made part of the specification.
Prior Application Status: Pending
Application Number:
Continuity Type: non provisional of
Prior Application Number: 61119861
Filing Date (YYYY-MM-DD): 2008-12-04
Additional Domestic Benefit/National Stage Data may be generated within this form
by selecting the Add button.

Foreign Priority Information:
This section allows for the applicant to claim benefit of foreign priority and to identify any prior foreign application for which priority is
not claimed. Providing this information in the application data sheet constitutes the claim for priority as required by 35 U.S.C. 119(b)
and 37 CFR 1.55(a).
Application Number:
Country:
Parent Filing Date (YYYY-MM-DD):
Priority Claimed: Yes | No
Additional Foreign Priority Data may be generated within this form by selecting the
Add button.

Assignee Information:
Providing this information in the application data sheet does not substitute for compliance with any requirement of part 3 of Title 37
of the CFR to have an assignment recorded in the Office.
Assignee 1
If the Assignee is an Organization check here.
Organization Name: The Regents of the University of California
Mailing Address Information:
Address 1: 1111 Franklin Street, 12th Floor
Address 2:
City: Oakland; State/Province: CA
Country: US; Postal Code: 94607-5200
Phone Number:
Fax Number:
Email Address:
Additional Assignee Data may be generated within this form by selecting the Add
button.

Signature:
A signature of the applicant or representative is required in accordance with 37 CFR 1.33 and 10.18. Please see 37
CFR 1.4(d) for the form of the signature.
Signature: /Charles J. Kulas/
Date (YYYY-MM-DD): 2009-12-03
First Name: Charles
Last Name: Kulas
Registration Number: 35809

This collection of information is required by 37 CFR 1.76. The information is required to obtain or retain a benefit by the public which
is to file (and by the USPTO to process) an application. Confidentiality is governed by 35 U.S.C. 122 and 37 CFR 1.14. This
collection is estimated to take 23 minutes to complete, including gathering, preparing, and submitting the completed application data
sheet form to the USPTO. Time will vary depending upon the individual case. Any comments on the amount of time you require to
complete this form and/or suggestions for reducing this burden, should be sent to the Chief Information Officer, U.S. Patent and
Trademark Office, U.S. Department of Commerce, P.O. Box 1450, Alexandria, VA 22313-1450. DO NOT SEND FEES OR
COMPLETED FORMS TO THIS ADDRESS. SEND TO: Commissioner for Patents, P.O. Box 1450, Alexandria, VA 22313-1450.
Privacy Act Statement
The Privacy Act of 1974 (P.L. 93-579) requires that you be given certain information in connection with your submission of the attached form related to
a patent application or patent. Accordingly, pursuant to the requirements of the Act, please be advised that: (1) the general authority for the collection
of this information is 35 U.S.C. 2(b)(2); (2) furnishing of the information solicited is voluntary; and (3) the principal purpose for which the information is
used by the U.S. Patent and Trademark Office is to process and/or examine your submission related to a patent application or patent. If you do not
furnish the requested information, the U.S. Patent and Trademark Office may not be able to process and/or examine your submission, which may
result in termination of proceedings or abandonment of the application or expiration of the patent.
The information provided by you in this form will be subject to the following routine uses:
1. The information on this form will be treated confidentially to the extent allowed under the Freedom of Information Act (5 U.S.C. 552)
and the Privacy Act (5 U.S.C. 552a). Records from this system of records may be disclosed to the Department of Justice to determine
whether the Freedom of Information Act requires disclosure of these records.
2. A record from this system of records may be disclosed, as a routine use, in the course of presenting evidence to a court, magistrate, or
administrative tribunal, including disclosures to opposing counsel in the course of settlement negotiations.
3. A record in this system of records may be disclosed, as a routine use, to a Member of Congress submitting a request involving an
individual, to whom the record pertains, when the individual has requested assistance from the Member with respect to the subject matter of
the record.
4. A record in this system of records may be disclosed, as a routine use, to a contractor of the Agency having need for the information in
order to perform a contract. Recipients of information shall be required to comply with the requirements of the Privacy Act of 1974, as
amended, pursuant to 5 U.S.C. 552a(m).
5. A record related to an International Application filed under the Patent Cooperation Treaty in this system of records may be disclosed,
as a routine use, to the International Bureau of the World Intellectual Property Organization, pursuant to the Patent Cooperation Treaty.
6. A record in this system of records may be disclosed, as a routine use, to another federal agency for purposes of National Security
review (35 U.S.C. 181) and for review pursuant to the Atomic Energy Act (42 U.S.C. 218(c)).
7. A record from this system of records may be disclosed, as a routine use, to the Administrator, General Services, or his/her designee,
during an inspection of records conducted by GSA as part of that agency's responsibility to recommend improvements in records
management practices and programs, under authority of 44 U.S.C. 2904 and 2906. Such disclosure shall be made in accordance with the
GSA regulations governing inspection of records for this purpose, and any other relevant (i.e., GSA or Commerce) directive. Such
disclosure shall not be used to make determinations about individuals.
8. A record from this system of records may be disclosed, as a routine use, to the public after either publication of the application pursuant
to 35 U.S.C. 122(b) or issuance of a patent pursuant to 35 U.S.C. 151. Further, a record may be disclosed, subject to the limitations of 37
CFR 1.14, as a routine use, to the public if the record was filed in an application which became abandoned or in which the proceedings were
terminated and which application is referenced by either a published application, an application open to public inspections or an issued
patent.
9. A record from this system of records may be disclosed, as a routine use, to a Federal, State, or local law enforcement agency, if the
USPTO becomes aware of a violation or potential violation of law or regulation.
Attorney Docket No.: 010030-002710US
Client Reference No.: B08-082-2
PATENT APPLICATION
SYSTEM FOR DETECTION OF BODY MOTION
INVENTORS:
Ruzena Bajcsy, a citizen of the USA, residing at: 665 Soda Hall, Berkeley, CA 94720
Allen Y. Yang, a citizen of China, residing at: 307 Cory Hall, Berkeley, CA 94720
S. Shankar Sastry, a citizen of the USA, residing at: 514 Cory Hall, Berkeley, CA 94720
Roozbeh Jafari, a citizen of Iran, residing at: 800 W. Campbell Rd., EC33, Richardson, TX 75080
Please direct communications to:
Trellis Intellectual Property Law Group, PC
1900 Embarcadero Rd. Suite 109
Palo Alto, CA 94303
Phone: 650-842-0300
ASSIGNEE: The Regents of the University of California
ENTITY: Small
SYSTEM FOR DETECTION OF BODY MOTION
Acknowledgement of Government Support
[01] This invention was made with Government support under Office of US Army
Research Laboratory Grant No. MURIW911NF06-1-0076. The Government may have
certain rights to this invention.
Claim of Priority
[02] This application claims priority from U.S. Provisional Patent Application Serial
No. 61/119,861, entitled SYSTEM FOR DETECTION OF BODY MOTION, filed on
December 4, 2008, which is hereby incorporated by reference as if set forth in full in this
application for all purposes.
Copyright Disclaimer
[03] A portion of the disclosure recited in the specification contains material which
may be subject to copyright protection. Specifically, a functional language such as
computer source code, pseudo-code or similar executable or design language may be
provided. The copyright owner has no objection to the facsimile reproduction of the
specification as filed in the Patent and Trademark Office. Otherwise all copyright rights
are reserved.
Background
[04] Motion sensors and specialized processing can be used to measure and classify
the actions of persons or objects. For example, multiple sensors can be placed at body
locations such as wrists, ankles, midsection, etc. By analyzing the motion measured by
each sensor the subject’s overall body movement or action can be determined.
[05] Some sensor-based action recognition approaches utilize a single sensor while
others use multiple sensors mounted in different locations to improve the accuracy of
overall action recognition. Action recognition systems typically include feature extraction
and classification processing that can be either distributed or centralized. However,
conventional approaches may not have sufficient accuracy in recognizing the actions of a
body or object for many modern applications.
[06] Human action detection is useful in many applications such as medical-care
monitoring, athlete training, tele-immersion, human-computer interaction, virtual reality,
motion capture, etc. In some applications, such as medical-care monitoring that takes
place in a user’s home, it may be desirable to maintain a low-cost system with a minimal
number of sensors, and to reduce resource use such as processing power, bandwidth, cost,
etc., while still maintaining desired accuracy and performance.
Brief Description of the Drawings
Figure 1 illustrates an example wireless body sensor arrangement.
Figure 2 illustrates an example motion sensor system.
Figure 3 illustrates an example of action duration variations.
Figure 4 illustrates example waveforms of accelerometer and gyroscope readings for two
repetitive stand-kneel-stand actions.
Figure 5 illustrates an example one-dimensional manifold model using a two-dimensional
subspace.
Figure 6 illustrates an example sparse l1 solution for a stand-to-sit action on a waist
sensor with corresponding residuals.
Figure 7 illustrates example waveforms for a multiple segmentation hypothesis on a wrist
sensor node.
Figure 8 illustrates an example invalid representation waveform.
Figure 9 illustrates a flow diagram of an example method of determining a motion using
distributed sensors.
Figure 10 illustrates example waveforms for an x-axis accelerometer reading for a stand-
sit-stand action.
Figure 11 illustrates example waveforms for an x-axis accelerometer reading for a sit-lie-
sit action.
Figure 12 illustrates example waveforms for an x-axis accelerometer reading for a bend
down action.
Figure 13 illustrates example waveforms for an x-axis accelerometer reading for a kneel-
stand-kneel action.
Figure 14 illustrates example waveforms for an x-axis accelerometer reading for a turn
clockwise then counterclockwise action.
Figure 15 illustrates example waveforms for an x-axis accelerometer reading for a turn
clockwise 360° action.
Figure 16 illustrates example waveforms for an x-axis accelerometer reading for a turn
counterclockwise 360° action.
Figure 17 illustrates example waveforms for an x-axis accelerometer reading for a jump
action.
Figure 18 illustrates example waveforms for an x-axis accelerometer reading for a go
upstairs action.
Figure 19 illustrates basic components and subsystems in a basic description of a system
suitable for practicing the invention.
Detailed Description of Embodiments of the Invention
[07] In particular embodiments, body motions are determined by using one or more
distributed sensors or sensor nodes. Although a preferred embodiment of the invention
uses accelerometer and global positioning system (GPS) position sensors, features
described herein may be adaptable for use with any other suitable types of sensors or
position sensing systems such as triangulation or point-of-reference sensors (e.g.,
infrared, ultrasound, radio-frequency, etc.), imaging (e.g., video image recognition),
mechanical sensing (e.g., joint extension, shaft encoders, linear displacement, etc.),
magnetometers or any other type of suitable sensors or sensing apparatus. Sensors may be
included with other functionality such as in a cell phone or other electronic device. In
general, various modifications, substitutions or other variations from the particular
embodiments described herein will be acceptable and are included within the scope of the
claims.
[08] In one embodiment of the present invention, a body action recognition system
functions on a single cell phone device, such as Apple iPhone®, GOOGLE gPhone™,
Nokia N800/900™, etc. Other mobile devices that include motion sensors can also be
employed, such as PDAs, personal navigation systems, etc. While some functionality is
possible with standard cell phones, smart phones typically include integrated motion
sensors, a processor and memory. These components can be sufficient to support an
implementation of the body action recognition algorithm. Particular embodiments may
be able to employ whatever capacity a particular cell phone device happens to be
equipped with to make an evaluation of the owner's physical movements. For example,
ranging location can be provided by cell towers, WiFi locations, GPS sensors, etc.
Magnetometers providing magnetic North indication can be employed by the present
invention, as well as a gyroscope and other position sensors. Wireless links which
provide useful input to the present invention can include Bluetooth, ZigBee™, etc.
[09] The cell phone can be placed at a specified body position such as around the
waist or the neck. The software reads motion sensor data directly from the onboard
sensor, and executes a classification algorithm via the processor. Information on the
statistics or nature of the action classification can be visually or audibly presented on the
device to indicate to the user activity level, warnings of dangerous situations (e.g.,
unsteady gait), health status or other information that can be derived from the action
recognition.
[10] In addition, a wireless connection between the cell phone and a base station
computer can be established to transmit and store classification results. When potentially
harmful human actions are detected (e.g., a fall), alert information (e.g., nature of alert,
location, verbal requests) can be transmitted to the monitoring station that can be
subsequently forwarded to emergency responders, preferred personal contacts, health
care personnel or other preferred contacts. During an alert event, continuous sensor data
can be transmitted to a base station that has a more powerful processing capability (more
powerful processor and access to a larger human actions database) to validate the human
action classification and reduce false alerts.
[11] In one embodiment, sensors are worn on a user’s body. Each sensor’s data is
relayed to a local, in-home, base station and then to a remote managing facility.
Preliminary processing can be performed at one or more points in the transfer or relay of
sensor data. For example, sensor data can be subjected to sensor-level processing by
associating one or more of the sensors with resources such as a digital processor,
memory, etc. In one approach, one or more sensors are included in a sensor node
assembly (when highly miniaturized may be referred to as a “mote”) that can include
processing resources and data communication resources such as a wireless transceiver. A
body-level controller that includes functions such as a wireless transceiver, cell phone,
personal digital assistant (PDA), GPS unit or other customized unit or device worn on the
body can also perform preliminary processing in addition to, or in place of, the sensor-
level processing. A local base station can receive the sensor data and perform additional
processing. The local sensor data is transferred to the remote managing facility where
further processing and analysis can be performed to make a final determination of body
motion or actions based on the sensor data.
[12] In one embodiment, the preliminary processing at the sensor, body or local
levels acts to analyze and filter data that is not deemed to be of substantial significance in
ultimate determination of a body motion or action. Other actions can be performed by
the preliminary processing such as optimizing, calibrating (e.g., normalizing) or
otherwise adjusting the raw sensor data in order to aid in efficient motion analysis.
Sensor data may be combined or transferred at the sensor, body, local or other lower
levels in order to facilitate analysis.
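The preliminary processing described above can be illustrated with a minimal sketch. It is not taken from the specification: the function name `preprocess_window`, the `min_energy` threshold and the choice of per-channel zero-mean, unit-variance normalization are all assumptions made for illustration of one possible way a sensor node might normalize a window of readings and discard data of little significance.

```python
import numpy as np

def preprocess_window(window, min_energy=1e-3):
    """Normalize one window of raw readings, or return None to discard it.

    window: array-like of shape (samples, channels).
    min_energy: hypothetical threshold below which the window is treated
    as insignificant and filtered at the node instead of being relayed.
    """
    window = np.asarray(window, dtype=float)
    centered = window - window.mean(axis=0)      # remove per-channel offset
    energy = float(np.mean(centered ** 2))       # mean signal power
    if energy < min_energy:
        return None                              # nearly constant: do not relay
    scale = centered.std(axis=0)
    scale[scale == 0.0] = 1.0                    # flat channels stay unscaled
    return centered / scale
```

A node following this sketch would relay only the windows for which the function returns an array, which is one way to reduce the data sent to higher levels.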
[13] In a preferred embodiment, feature extraction and classification functions can be
performed at various levels (e.g., local or global) in the system in order to reduce
communication bandwidth requirements and sensor node power consumption. In the
preferred embodiment, a common classification approach can be used at the local and
global level, simplifying system design while also improving classification accuracy. By
modeling the distribution of multiple classes as a mixture subspace model, i.e., one
subspace for each action class, we seek the sparsest linear representation of the sample
with respect to all the training examples. In this model the dominant coefficients in the
sparse representation correspond to the class of the test sample. If the action cannot be
classified locally the action can be transmitted to a global classifier that uses the same
structure but incorporates additional sensor samples to improve classification accuracy.
This method is scalable, i.e., multiple classification levels can be used, and robust from the
viewpoint that the processing structure does not change when sensors are added or removed,
while at the same time it minimizes communication requirements.
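The sparse-representation idea in this paragraph can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of the invention: the l1-regularized least-squares problem is solved here with a simple iterative soft-thresholding loop (a swapped-in solver), the class is chosen by the smallest class-wise residual, and the names `ista_l1`, `classify_sparse`, `lam` and `iters` are hypothetical.

```python
import numpy as np

def ista_l1(A, y, lam=0.01, iters=500):
    """Approximately minimize 0.5*||A x - y||^2 + lam*||x||_1 (soft thresholding)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2       # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - step * (A.T @ (A @ x - y))       # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # shrink toward zero
    return x

def classify_sparse(A, labels, y, lam=0.01):
    """A: columns are training samples; labels: class of each column; y: test sample."""
    x = ista_l1(A, y, lam)
    classes = np.unique(labels)
    residuals = []
    for c in classes:
        x_c = np.where(labels == c, x, 0.0)      # keep only coefficients of class c
        residuals.append(np.linalg.norm(y - A @ x_c))
    return classes[int(np.argmin(residuals))], x

# Toy data (assumed): two training samples per class, classes in orthogonal subspaces.
A = np.eye(4)
labels = np.array([0, 0, 1, 1])
y = np.array([0.6, 0.8, 0.0, 0.0])
label, x = classify_sparse(A, labels, y)
```

In this toy case the nonzero coefficients fall only on the class-0 training samples, so the class-0 residual is the smallest. The same routine could be applied at a local node with that node's samples and at a global level with samples stacked from several sensors.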
[14] Figure 19 illustrates basic parts of a system for performing body motion
identification according to embodiments of the invention. In Figure 19, system 1900
includes body sensors 1920 worn by user 1910. In this example, there are five sensors
located on the right and left wrists as 1920-1 and 1920-2, respectively; right and left
ankles as 1920-3 and 1920-4, respectively, and midsection 1920-5. The sensors are
coupled to a central body controller 1930. Data relay between the sensors and central
body controller can be by wired or wireless communication. If desired, one or more
sensors can be coupled to additional sensor resources such as by using node assemblies to
provide data processing or other functions. Similarly, data transfers between the central
body controller and the local base station can be by wireless communications such as
Bluetooth, Zigbee, Wi-Fi, or the like. Note that, in general, any suitable type of
communication link may be used among any one or more components in the system.
[15] Central controller 1930 relays the information to an access point or local base
station 1940 that is located in or near the house or other enclosure 1950. A data link
between the local base station and managing entity 1970 is provided via Internet 1960.
Managing entity 1970 uses the received sensor data and database 1980 in order to make a
determination of an action performed by user 1910. Managing entity 1970 can also send
data to the local subsystems such as the local base station, central body controller,
sensors and motes discussed above, in order to control, interrogate or perform other
functions with the subsystems.
[16] Managing entity 1970 can interface with many different local subsystems such
as outdoor user 1990 having a similar set of sensors as those described above for user
1910. Outdoor user 1990’s sensor data is provided to an outdoor base station 1992.
Outdoor base station 1992 can be, for example, a cellular network site, satellite in a satellite
telephone network, radio-frequency communication, etc. Many such indoor and outdoor
users can be managed by a single managing entity as illustrated by multiple users at 1994.
Although specific numbers and types of components are illustrated, it should be apparent
that many variations in type and number are possible to achieve the functionality
described herein.
[17] In one embodiment, managing entity 1970 can provide communications to user
1910 via a display and user input device (a “user interface”). For example, the central
body controller can be a cell phone having a display, pointing device, touch screen,
numeric keypad, QWERTY keyboard, etc. Other devices can be used to provide a user
interface such as a desktop personal computer, laptop, sub-notebook, ultra-portable
computing device, etc. (not shown).
[18] As described herein (i.e., in this specification and in the included documents),
preliminary processing may be performed at any of the local subsystems or at other
locations prior to the data reaching managing entity 1970 in order to reduce the amount
of data that is relayed downstream. For example, data filtering (i.e., discarding) can occur
at low levels in the system such as at the sensor, sensor node, central body controller or
other local level of operation. In a preferred embodiment, the computational approach
described herein allows functions such as feature extraction and classification to be used
to identify false data and “outlier” data, where false data is data that is not desirable for
action classification and outlier data is data that results in a motion classification
that is not a motion of interest to the system. For example, if the only
classifications that are desired to be detected are standing, sitting and walking, then data
that results in a determination of a running motion is outlier data. The false data and
outlier data can be prevented from further propagation and use in the system by
performing local preliminary processing. This improves the efficiency of data processing
and can reduce data communication bandwidth requirements and power consumption.
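By way of illustration only, the local filtering described above may be sketched as follows. The label names, the `(label, features)` tuple layout and the function name are assumptions made for this sketch and are not taken from the specification.

```python
# Hypothetical label set: the only motions this deployment wants to detect.
CLASSES_OF_INTEREST = {"standing", "sitting", "walking"}


def filter_for_relay(classified_segments, classes_of_interest=CLASSES_OF_INTEREST):
    """Keep only segments whose local classification is of interest.

    Each segment is a (label, features) pair. In this sketch a label of None
    marks false data that could not be classified at all.
    """
    return [
        (label, features)
        for label, features in classified_segments
        if label is not None and label in classes_of_interest
    ]


segments = [
    ("walking", [0.1, 0.4]),
    ("running", [0.9, 0.8]),  # outlier data: a real motion, but not of interest
    (None, [0.0, 0.0]),       # false data: no usable classification
    ("sitting", [0.2, 0.1]),
]
relayed = filter_for_relay(segments)
```

In this sketch only the "walking" and "sitting" segments would be relayed, so the running segment and the unclassifiable segment consume no further bandwidth.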
[19] The use of some or all of the functionality described herein allows the system to
adapt to the deletion, addition or modification of sensors and sensor data. For
example, the Adaptive Global Recognition that uses Distributed Sparsity Classifier
(DSC) functions (portions of which may operate at any point in the system) allows the
system to continue to perform effectively when a sensor is turned off, removed,
malfunctioning, broken or otherwise is halted or impaired in its operation. In some cases,
as described herein, performance may actually improve after a sensor is removed or shut
off.
[20] Thus, one benefit of the system is that a user can modify the sensor arrangement
and the system can automatically adjust to perform with the new arrangement. For
example, if a user develops a skin irritation where a sensor is mounted and removes the
sensor, the system can adapt to the new configuration and still maintain motion
identification. In this case, once the managing entity determines that a sensor is missing it
can send a message to the user to inform the user of the missing sensor data. The user can
then reply to indicate that the user intended that the system be modified by removing a
sensor or the user may be alerted that a sensor has stopped without the user’s intent.
[21] This type of sensor placement or operation modification in the field is a benefit
to ongoing work with the system. “Hot plugging/unplugging” of sensors can be
performed where a sensor is added or removed without having to power down other parts
of the system or require user modification of system software. Since the managing entity
can detect such changes (while still maintaining operations in view of the changes) the
managing entity can then communicate with the user to make sure that the changes were
intended and the results after the changes are acceptable. Another feature allows the
managing entity to turn each sensor on or off remotely in order to test system adaptability
before a change is made.
[22] In one embodiment, one or more sensors are coupled to a body, where each
sensor is positioned at about a designated location on the body, and where each sensor is
configured to acquire motion data related to movement of the designated location on the
body and at which the sensor is positioned, and to reduce the motion data into
compressed and transmittable motion data; and a base station configured to receive the
compressed motion data via wireless communication from at least one of the plurality of
sensors, the base station being further configured to remove outlier information from the
received motion data, and to match the received motion data to a predetermined action,
where the predetermined action indicates a movement of the body.
[23] In one embodiment, a method can include: acquiring motion data related to
movement of a designated location on a body using a sensor; reducing the motion data
into compressed motion data; transmitting the compressed motion data to a base station
using a wireless connection (the base station may be integrated into the sensor as a
single module); removing outlier information from the transmitted motion data to create
outlier rejected motion data; and matching the outlier rejected motion data to a
predetermined action for indicating a movement of the body.
[24] A distributed recognition approach for classification of movement actions using
an attachable motion sensing and processing network is described herein. For example,
the motion sensor network can be attached to, or wearable by, a human being, and the
sensor network can be relatively low-bandwidth (e.g., in a range of about 250 kbit per
second at 2.4 GHz for the IEEE 802.15.4 protocol). A set of pre-segmented motion
sequences may be utilized as training examples, and an algorithm can substantially
simultaneously segment and classify such human actions. Further, particular
embodiments can also reject outlying actions that may not be found in the training set.
[25] The classification in particular embodiments can be operated in a distributed
fashion using individual sensors on a body, and a base station processing system that is
located on the body or remote from the body under detection. In the particular
embodiment, the distribution of multiple action classes satisfies a mixture subspace
model, with one subspace for each action class. Given a new test sample, a relatively
sparse linear representation of the sample with respect to the training examples can be
acquired. In this approach, dominant coefficients in the linear representation may
correspond to an action class of the test sample. Thus, membership of the dominant
coefficients may be encoded in the linear representation. Further, convex optimization
solvers are used to compute such representation via l1-minimization, and have been
known to be very efficient in processing high-dimensional data in linear or quadratic
algorithm complexity. For example, by using up to 8 body sensors, an algorithm in
particular embodiments can achieve state-of-the-art accuracy of about 98.8% on a set of
12 action categories (or, with one body sensor, the algorithm can achieve accuracy of
approximately 60 to 90%). However, particular embodiments can support a relatively
large number of different actions (e.g., 10, 20, etc.). In addition, the recognition precision
may decrease gracefully using smaller subsets of sensors, validating distributed
framework robustness.
[26] Particular embodiments can also utilize wired or wireless communication
between each sensor node and the base station. Further, different subsets of sensors that
are available (e.g., due to dropped wireless signals) can be accommodated. Software can
be used on each sensor node for local computations, and on a central computer or base
station. Feature selection or compression of data obtained from each sensor node can be
performed to reduce information. Overall performance in particular embodiments is
gained from a combination of sensor accuracy and outlier rejection.
[27] Applications of particular embodiments include: (i) monitoring activities in
elderly people, the disabled and the chronically ill (e.g., remote over a network) for
nursing homes and hospitals; (ii) hospital emergency room monitoring for nursing
coverage; (iii) diagnosis of diseases (e.g., Parkinson’s, etc.); (iv) monitoring of prisoners
or soldiers (e.g., where sensors are embedded in uniforms), such as via base station
monitoring; (v) athletic training; (vi) monitoring patients in clinical drug studies; (vii)
monitoring of animal activities; and (viii) machine monitoring. Of course, particular
embodiments are also amenable to many more applications.
Human Action Recognition Introduction
[28] Human action recognition can be achieved using a distributed wearable motion
sensor network. One approach to action recognition is computer vision. As compared
with a model-based or appearance-based vision system, various aspects distinguish the
body sensor network approach of particular embodiments. In one aspect, the system does
not require adding cameras or other sensor instrumentation to the environment. In
another aspect, the system has the necessary mobility to support continuous monitoring
of a subject during the daily activities of the subject. In another aspect, and with the
continuing miniaturization of mobile processors and sensors, it has become possible to
manufacture wearable sensor networks that densely cover the human body to record and
analyze relatively small movements of the human body (e.g., breathing, spine
movements, heart beats, etc.). Such sensor networks can be used in applications, such as
medical-care monitoring, athlete training, tele-immersion, and human-computer
interaction (e.g., integration of accelerometers in Wii game controllers, smart phones,
etc.).
[29] Figure 1 illustrates an example wireless body sensor arrangement system 100.
For example, sensors 102 can be positioned at designated locations on a body, such as
sensor 102-1 at a waist location, sensor 102-2 at a left wrist location, sensor 102-3 at a
left upper arm location, sensor 102-4 at a right wrist location, sensor 102-5 at a right
ankle location, sensor 102-6 at a right knee location, sensor 102-7 at a left ankle
location, and sensor 102-8 at a left knee location.
[30] In some sensor networks, the computation performed by the sensor node is
fairly simple: (i) extract and filter sensor data and (ii) transmit the data to a
microprocessor-based server over a network for processing. In particular embodiments,
distributed pattern recognition is employed, whereby each sensor node can classify local
information. When the local classification detects a possible object or event, the sensor
node can become fully active and transmit the measurement to a centralized server. If
wireless interconnection is employed, it is desirable to reduce power consumption
because, e.g., the power consumption required to successfully send one byte over a
wireless channel is equivalent to executing between $1 \times 10^3$ and $1 \times 10^6$ instructions on an
onboard processor. Thus, such sensor networks should reduce communication, while
preserving recognition performance. On the server side, a global classifier can receive
data from the sensor nodes and further optimize the classification. The global classifier
can be more computationally involved than the distributed classifiers, but the global
classifier may also adapt to changes of available network sensors due to local
measurement error, sensor failure, and communication congestion.
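As a rough worked example of the trade-off stated above, the following sketch compares the instruction cost of local processing against the transmission it avoids. Only the range of 1,000 to 1,000,000 instructions per byte comes from the text; the packet size, the instruction count and the function name are hypothetical.

```python
# Range quoted in the text: sending one byte costs about as much energy as
# executing between 1e3 and 1e6 instructions on the onboard processor.
INSTR_PER_BYTE_LOW = 1_000
INSTR_PER_BYTE_HIGH = 1_000_000


def local_processing_pays_off(bytes_saved, extra_instructions,
                              instr_per_byte=INSTR_PER_BYTE_LOW):
    """Return True if the extra instructions cost less than the bytes avoided."""
    return extra_instructions < bytes_saved * instr_per_byte


# Hypothetical numbers: a local classifier that costs 50,000 instructions and
# suppresses one 100-byte packet, judged at the conservative end of the range.
worth_it = local_processing_pays_off(bytes_saved=100, extra_instructions=50_000)
```

Even at the low end of the quoted range, suppressing the 100-byte packet is worth up to 100,000 instructions, so the hypothetical 50,000-instruction classifier would pay for itself.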
[31] Feature extraction in wearable sensor networks can include three major types of
features. The first such feature can involve relatively simple statistics of a signal
sequence, such as the max, mean, variance, and energy. The second such feature may be
computed using fixed filter banks (e.g., fast Fourier transform (FFT), finite impulse
response filters, wavelets, etc.). The third such feature may be based on classical
dimensionality reduction techniques (e.g., principal component analysis (PCA), linear
discriminant analysis (LDA), etc.).
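The first feature type may be illustrated with the following minimal sketch, which computes the four statistics named above for one window of a single sensor channel. Taking "energy" to mean the sum of squared samples and using the population variance are assumptions of this sketch.

```python
from statistics import fmean, pvariance


def simple_statistics(signal):
    """Return the max, mean, variance, and energy of a 1-D signal window."""
    return {
        "max": max(signal),
        "mean": fmean(signal),
        "variance": pvariance(signal),          # population variance (assumed)
        "energy": sum(v * v for v in signal),   # sum of squares (assumed)
    }


window = [0.0, 1.0, 2.0, 1.0, 0.0]
features = simple_statistics(window)
```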
[32] In terms of classification on the action features, some approaches have used,
e.g., thresholding or k-nearest-neighbor (kNN), due to the simplicity of the algorithms for
mobile devices. Other more sophisticated techniques have also been used, such as
decision trees and hidden Markov models.
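For reference, a k-nearest-neighbor classifier of the kind mentioned above may be sketched in a few lines. The two-dimensional feature vectors, the labels and the choice of k = 3 are illustrative assumptions only.

```python
from collections import Counter
from math import dist


def knn_classify(sample, training, k=3):
    """Label `sample` by majority vote among its k nearest training points."""
    nearest = sorted(training, key=lambda item: dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors with their action labels.
training = [
    ([0.0, 0.1], "sit"), ([0.1, 0.0], "sit"), ([0.2, 0.1], "sit"),
    ([1.0, 1.1], "walk"), ([0.9, 1.0], "walk"), ([1.1, 0.9], "walk"),
]
label = knn_classify([0.95, 1.0], training)
```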
[33] For distributed pattern recognition, distributed speech recognition and
distributed expert systems have been used, but for most distributed sensor systems, each
local observation from the distributed sensors is biased and insufficient to classify all
classes of actions. For example, the sensors placed on a lower-body may not perform
well in classification of those actions that mainly involve upper body motions (and vice
versa). Consequently, traditional majority-voting type classifiers may not achieve the best
performance globally.
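The limitation of majority voting noted above can be seen in a small sketch. The per-node decisions and the "wave" action label are hypothetical and chosen only to illustrate the effect.

```python
from collections import Counter


def majority_vote(local_labels):
    """Return the label reported by the most sensor nodes."""
    return Counter(local_labels).most_common(1)[0][0]


# Hypothetical local decisions for an upper-body action: two wrist nodes
# observe it correctly, while three lower-body nodes see almost no motion.
local_labels = ["wave", "wave", "stand", "stand", "stand"]
fused = majority_vote(local_labels)
```

Here the three lower-body nodes outvote the two nodes that actually observed the motion, so the fused decision is "stand" even though the informative sensors reported otherwise.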
[34] Figure 2 illustrates an example motion sensor system 200. Any suitable number
of sensors 102 (e.g., 102-1, 102-2, … 102-N) can be located at designated positions on a
body for motion sensing. The sensors 102 that are active may then transmit information
to base station 202. For example, each sensor 102 can include accelerometer 210 and
gyroscope 204. Controller 206 can receive motion data from accelerometer 210 and
gyroscope 204, and may provide information to transmitter 208 for transmission to base
station 202.
[35] Thus, design of a wearable sensor network in particular embodiments can
include: (i) sensors placed at various body locations, which communicate with (ii) a base
station that can communicate with a computer server. For example, the base station and
computer server can be connected through a universal serial bus (USB) port, or any other
suitable connection (e.g., a wireless connection). Further, the sensors and base station
may be built using commercially available products such as Tmote Sky™ boards from
companies such as Sentilla Corporation of Redwood City, California. Such products can
run software such as TinyOS on an 8 MHz microcontroller with 10 KB of random-access
memory (RAM) and communicate using the 802.15.4 wireless protocol. Each sensor
node can include a triaxial accelerometer and a biaxial gyroscope, which may be attached
to the Tmote Sky™ board. In this example, each axis is reported as a 12-bit value to the
sensor, thus indicating values in the range of ±2 g for the accelerometer, and ±500°/s
for the gyroscope.
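As an illustration of how a 12-bit reading relates to the stated ranges, the following sketch converts a raw code to physical units. The text does not specify the encoding; a linear, offset-binary mapping (code 0 at negative full scale, code 4095 at positive full scale) is assumed here, and the function name is hypothetical.

```python
def raw_to_physical(raw, full_scale, bits=12):
    """Map an unsigned `bits`-wide reading onto [-full_scale, +full_scale].

    Assumes a linear, offset-binary encoding (0 -> -full_scale,
    2**bits - 1 -> +full_scale). Real parts need per-axis calibration.
    """
    max_code = (1 << bits) - 1
    if not 0 <= raw <= max_code:
        raise ValueError("reading outside the %d-bit range" % bits)
    return (2.0 * raw / max_code - 1.0) * full_scale


accel_g = raw_to_physical(4095, full_scale=2.0)   # accelerometer, +/- 2 g
gyro_dps = raw_to_physical(0, full_scale=500.0)   # gyroscope, +/- 500 deg/s
```

Under this assumption one count corresponds to roughly 1 mg on the accelerometer (4 g spread over 4095 steps) and roughly 0.24°/s on the gyroscope.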
[36] To avoid packet collision in the wireless channel, a time division multiple
access (TDMA) protocol can be used to allocate each sensor node a specific time slot
during which to transmit data. This allows transmission of sensor data at about 20 Hz
with minimal packet loss. To avoid drift in the network, the base station can periodically
broadcast a packet to resynchronize individual timers for each sensor node. The code to
interface with the sensors and transmit data may be implemented directly on a mote using
nesC, a variant of the C programming language. Any other suitable hardware and
software approach can be used.
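The slot allocation described above may be sketched as follows. This is a simplified illustration written in Python rather than nesC: it assumes one frame per 20 Hz sample period, equal slots, no guard intervals and no separate slot for the resynchronization broadcast, none of which is specified in the text.

```python
def tdma_schedule(node_ids, sample_rate_hz=20.0):
    """Give each node one equal slot per frame, with one frame per sample period.

    Returns {node_id: (slot_start_s, slot_end_s)} relative to the frame start.
    """
    node_ids = list(node_ids)
    frame_s = 1.0 / sample_rate_hz
    slot_s = frame_s / len(node_ids)
    return {
        node: (index * slot_s, (index + 1) * slot_s)
        for index, node in enumerate(node_ids)
    }


schedule = tdma_schedule(range(1, 9))  # eight nodes, as in Figure 1
```

With eight nodes at 20 Hz, each node would own a 6.25 ms slot inside every 50 ms frame.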
[37] In one example, a set of L wearable sensor nodes with triaxial accelerometers
and biaxial gyroscopes are attached to the human body. For example, denote
$a_l(t) = (x_l(t), y_l(t), z_l(t), \theta_l(t), \rho_l(t))^T \in \mathbb{R}^5$ as the measurement of the five sensors on
node l at time t, and $a(t) = (a_1^T(t), a_2^T(t), \ldots, a_L^T(t))^T \in \mathbb{R}^{5L}$ collects all sensor
measurements. Further, denote $s = (a(1), a(2), \ldots, a(l)) \in \mathbb{R}^{5L \times l}$ as an action sequence of
length l. Given K different classes of human actions, a set of $n_i$ training examples
$\{s_{i,1}, \ldots, s_{i,n_i}\}$ can be collected for each ith class. The durations of the sequences
naturally may be different. Given a new test sequence s that may contain multiple actions
and possible other outlying actions, a distributed algorithm can be used to substantially
simultaneously segment the sequence and classify the actions.
[38] Solving this problem mainly involves challenges of simultaneous segmentation
and classification, variation of action durations, identity independence, and distributed
recognition. Simultaneous segmentation and recognition from a long motion sequence
can be achieved, where the test sequence may contain other unknown actions that are not
from the K classes. An algorithm in particular embodiments can be robust as to these
outliers.
[39] Figure 3 illustrates an example of action duration variations 300. For variation
of action durations, where the durations of different actions can vary dramatically in
practice, a difficulty in segmentation of actions may exist in determining the duration of a
proper action. For identity independence, a further difficulty is that, in addition to the
variation of action durations, different people perform the same actions differently.
[40] Figure 4 illustrates example waveforms 400 of accelerometer and gyroscope
readings for two repetitive stand-kneel-stand actions. For a test sequence, identity-
independent performance can be seen by excluding the training samples of the same
subject. Figure 4 shows readings of x-axis accelerometers (first and third diagrams) and
x-axis gyroscopes (second and fourth diagrams) from eight distributed sensors on two
repetitive stand-kneel-stand actions or sequences from two subjects.
[41] A distributed recognition system may also consider: (i) how to extract compact
and accurate low-dimensional action features for local classification and transmission
over a band-limited network; (ii) how to classify the local measurement in real time using
low-power processors; and (iii) how to design a classifier to globally optimize the
recognition and be adaptive to the change of the network.
[42] In particular embodiments, a distributed action recognition algorithm can
simultaneously segment and classify 12 human actions using up to 8 wearable motion
sensors. This approach utilizes an emerging theory of compressed sensing and sparse
representation, where each action class can satisfy a low-dimensional subspace model.
For example, a 10-D linear discriminant analysis (LDA) feature space may suffice to
locally represent 12 action subspaces on each node. If a linear representation is sought to
represent a valid test sample with respect to all training samples, the dominant
coefficients in the sparsest representation correspond to the training samples from the
same action class, and hence they encode the membership of the test sample.
[43] In one example system, three integrated components can be employed: (i) a
multi-resolution action feature extractor; (ii) fast distributed classifiers via l1-
minimization; and (iii) an adaptive global classifier. Particular embodiments can include
a method to accurately segment and classify human actions from a continuous motion
sequence. The local classifiers that reject potential outliers can reduce the sensor-to-
server communication requirements by approximately 50%. Further, any subset of the
sensors can be activated or deactivated on-the-fly, due to user control, sensor failure,
and/or network congestion. The global classifier may adaptively update the optimization
process and improve the overall classification upon available local decision.
[44] Particular embodiments can also support a public database and/or benchmark in
order to judge the performance and safeguard the reproducibility of extant algorithms for
action recognition using wearable sensors in pattern recognition. For example, a public
benchmark system may be referred to as a “Wearable Action Recognition Database”
(WARD). Such a database may contain many human or other suitable subjects across
multiple age groups, and be made available via the Internet.
Classification via Sparse Representation
[45] Classification via sparse representation can include an efficient action
classification method on each sensor node, where action sequences are pre-segmented.
Given an action segment of length l from node j, $s_j = (a_j(1), a_j(2), \ldots, a_j(l)) \in \mathbb{R}^{5 \times l}$, a
new vector can be defined:
Equation 1: $s_j^S \doteq (a_j(1)^T, a_j(2)^T, \ldots, a_j(l)^T)^T \in \mathbb{R}^{5l}$, as the stacking of the l
columns of $s_j$ (where $s_j$ can be interchangeably used to denote stacked vector $s_j^S$).
[46] Since the length l can vary among different subjects and actions, l can be
normalized to be substantially the same for all the training and test samples. For
example, this can be achieved by oversampling filtering such as by linear interpolation,
FFT interpolation, etc., or by other suitable techniques. After normalization, the
dimension of samples $s_j$ can be denoted as $D_j = 5l$. Subsequently, a full-body action
vector v can be defined that stacks the measurement from all L nodes:
Equation 2: $v = (s_1^T, s_2^T, \ldots, s_L^T)^T \in \mathbb{R}^D$, where $D = D_1 + \cdots + D_L = 5lL$.
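The length normalization and stacking steps may be illustrated with the following sketch for a single node. Linear interpolation is one of the techniques named above; the function names, the per-channel list layout and the toy data are assumptions of this sketch.

```python
def resample_linear(values, target_len):
    """Resample a 1-D sequence to `target_len` points by linear interpolation."""
    if target_len < 2 or len(values) < 2:
        raise ValueError("need at least two source and two target points")
    scale = (len(values) - 1) / (target_len - 1)
    out = []
    for k in range(target_len):
        pos = k * scale
        lo = min(int(pos), len(values) - 2)
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[lo + 1] * frac)
    return out


def stack_segment(channels, target_len):
    """Normalize each of a node's channels to `target_len` and stack by time.

    `channels` holds 5 equal-length sequences (x, y, z, theta, rho). The result
    has 5 * target_len entries, ordered a(1), a(2), ..., a(l) as in Equation 1.
    """
    resampled = [resample_linear(ch, target_len) for ch in channels]
    return [ch[t] for t in range(target_len) for ch in resampled]


segment = [[0.0, 2.0, 4.0]] * 5  # toy 3-sample segment on 5 channels
s_j = stack_segment(segment, target_len=5)
```

The stacked vector has 5l entries per node, so concatenating the vectors of all L nodes would give a full-body vector of dimension 5lL.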
[47] In particular embodiments, the samples v in an action class may satisfy a
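subspace model, called an action subspace. If the training samples $\{v_1, \ldots, v_{n_i}\}$ of the ith
class sufficiently span the ith action subspace, given a test sample $y = (y_1^T, \ldots, y_L^T)^T \in \mathbb{R}^D$
in the same class i, y can be linearly represented using the training examples of the same
class:
Equation 3: $y = \alpha_1 v_1 + \cdots + \alpha_{n_i} v_{n_i} \;\Leftrightarrow\; \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_L \end{pmatrix} = \left( \begin{pmatrix} s_1 \\ s_2 \\ \vdots \\ s_L \end{pmatrix}_1 \cdots \begin{pmatrix} s_1 \\ s_2 \\ \vdots \\ s_L \end{pmatrix}_{n_i} \right) \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_{n_i} \end{pmatrix}$.
Such linear constraints may also hold on each node j:
$y_j = \alpha_1 s_{j,1} + \cdots + \alpha_{n_i} s_{j,n_i} \in \mathbb{R}^{D_j}$.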
[48] Complex data, such as human actions, typically includes complex nonlinear
models. The linear models may be used to approximate such nonlinear structures in a
higher-dimensional subspace, as shown in Figure 5 (e.g., a one-dimensional manifold
model 500 using a two-dimensional subspace). Such linear approximation may not
produce good estimation of the distance/similarity metric for the samples on the
manifold. However, as shown in the example below, given sufficient samples on the
manifold as training examples, a new test sample can be accurately represented on the
subspace, provided that any two classes do not have similar subspace models.
[49] To recover label(y), one way is to reformulate the recognition using a global
sparse representation. Since label(y) = i is unknown, y can be represented using all the
training samples from all K classes:
Equation 4: $y = (A_1 \; A_2 \; \cdots \; A_K) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_K \end{pmatrix} = Ax$,
where $A_i = (v_{i,1}, v_{i,2}, \ldots, v_{i,n_i}) \in \mathbb{R}^{D \times n_i}$ collects the training samples of class i,
$x_i = (\alpha_{i,1}, \alpha_{i,2}, \ldots, \alpha_{i,n_i})^T \in \mathbb{R}^{n_i}$ collects the corresponding coefficients in Equation 3
above, and $A \in \mathbb{R}^{D \times n}$, where $n = n_1 + n_2 + \cdots + n_K$. Since y satisfies both Equations 3 and
4, one solution of x in Equation 4 can be $x^* = (0, \ldots, 0, x_i^T, 0, \ldots, 0)^T$. The solution is
naturally relatively sparse, where on average only 1/K terms in $x^*$ are nonzero values.
[50] On each sensor j, solution $x^*$ in Equation 4 is also a solution for the
representation:
Equation 5: $y_j = (A_1^{(j)} \; A_2^{(j)} \; \cdots \; A_K^{(j)}) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_K \end{pmatrix} = A^{(j)} x$,
where $A_i^{(j)} \in \mathbb{R}^{D_j \times n_i}$ includes row vectors in $A_i$ that correspond to the jth node. Hence,
$x^*$ can be solved either globally using Equation 4, or locally using Equation 5, provided
that the action data measured on each sensor node are sufficiently discriminant. Local
classification versus global classification will be discussed in more detail below.
[51] As to local classification in each sensor node, one major difficulty in solving
Equation 5 is the high dimensionality of the action data. In compressed sensing, one
reduces the dimension of a linear system by choosing a linear projection $R_j \in \mathbb{R}^{d \times D_j}$:
Equation 6: $\tilde{y}_j \doteq R_j y_j = R_j A^{(j)} x \doteq \tilde{A}^{(j)} x \in \mathbb{R}^d$.
For example, these matrices may be computed offline and simply stored on each sensor
node, and $R_j$ may not be computed on the sensor node.
[52] After projection $R_j$, the feature dimension d may be much smaller than the
number n of all training samples. Therefore, the new linear system of Equation 6 may be
underdetermined. Numerically stable solutions exist to uniquely recover sparse solutions
$x^*$ via l1-minimization:
Equation 7: $x^* = \arg\min \|x\|_1$ subject to $\tilde{y}_j = \tilde{A}^{(j)} x$.
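One standard way to pass the l1-minimization of Equation 7 to a general-purpose solver, offered here as an illustration and not as the solver used in the described experiments, is to rewrite it as a linear program. The auxiliary vectors u and w are notation introduced for this sketch: the coefficient vector is split as x = u − w with both parts nonnegative, so that the l1 norm of x equals the sum of the entries of u and w at the optimum.

```latex
\begin{aligned}
\min_{u,\,w \in \mathbb{R}^n} \quad & \mathbf{1}^T u + \mathbf{1}^T w \\
\text{subject to} \quad & \tilde{A}^{(j)} (u - w) = \tilde{y}_j, \\
& u \ge 0, \quad w \ge 0, \\
\text{with} \quad & x^* = u^* - w^* .
\end{aligned}
```

Because the objective and constraints are linear, any linear programming method can be applied to this form.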
[53] In one experiment, multiple projection operators were tested, including PCA,
LDA, and a random projection. This experiment resulted in the finding that 10-D feature
spaces using LDA lead to the best recognition in a very low-dimensional space. After the
(sparsest) representation x is recovered, the coefficients can be projected onto each action
subspace:
Equation 8: $\delta_i(x) = (0, \ldots, 0, x_i^T, 0, \ldots, 0)^T \in \mathbb{R}^n$, $i = 1, \ldots, K$.
Finally, the membership of the test sample $y_j$ may be assigned to the class with the
smallest residual:
Equation 9: $\mathrm{label}(y_j) = \arg\min_i \|\tilde{y}_j - \tilde{A}^{(j)} \delta_i(x)\|_2$.
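The projection of Equation 8 and the residual test of Equation 9 may be sketched as follows, assuming the sparse coefficient vector x has already been recovered. The function name, the list-of-rows matrix layout and the toy numbers are assumptions of this sketch.

```python
from math import sqrt


def classify_by_residual(A, y, x, class_of_column):
    """Return (best_class, residuals) following Equations 8 and 9.

    A: d x n matrix as a list of rows; y: length-d feature vector;
    x: length-n coefficient vector; class_of_column[k]: class of column k.
    """
    residuals = {}
    for cls in sorted(set(class_of_column)):
        # delta_i(x): keep the coefficients of class `cls`, zero the rest.
        delta = [xk if c == cls else 0.0 for xk, c in zip(x, class_of_column)]
        recon = [sum(a * dk for a, dk in zip(row, delta)) for row in A]
        residuals[cls] = sqrt(sum((yi - ri) ** 2 for yi, ri in zip(y, recon)))
    return min(residuals, key=residuals.get), residuals


# Toy system: columns 0-1 belong to class 1, columns 2-3 to class 2.
A = [[1.0, 0.0, 0.0, 1.0],
     [0.0, 1.0, 1.0, 0.0]]
x = [0.9, 0.0, 0.1, 0.0]
y = [0.9, 0.1]
label, residuals = classify_by_residual(A, y, x, [1, 1, 2, 2])
```

In this toy case the class 1 coefficients reconstruct y with a residual of about 0.1, against about 0.9 for class 2, so the sample is assigned to class 1.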
[54] In one experiment, 12 action categories were designed: (1) stand-to-sit, (2) sit-
to-stand, (3) sit-to-lie, (4) lie-to-sit, (5) stand-to-kneel, (6) kneel-to-stand, (7) rotate-right,
(8) rotate-left, (9) bend, (10) jump, (11) upstairs, and (12) downstairs. More details on an
example experiment setup are shown below.
[55] To implement l1-minimization on a sensor node, suitable fast sparse solvers can
be used. In testing a variety of methods, such as (orthogonal) matching pursuit (MP),
basis pursuit (BP), LASSO, and a quadratic log-barrier solver, it was found that BP gives
a favorable trade-off between speed, noise tolerance, and recognition accuracy.
[56] To demonstrate the accuracy of the BP-based algorithm on each sensor node
(see, e.g., Figure 1 for example sensor node locations on a body), the actions can be
manually segmented from a set of long motion sequences from three subjects. In total,
there are 626 samples in the data set in this particular example. The 10-D feature
selection is via LDA, and the classification may be substantially identity-independent.
The accuracy of this example classification on each node over 12 action classes is shown
below in Table 1.
Table 1: Recognition accuracy on each node over 12 action classes
Sensor number   1     2     3     4     5     6     7     8
Accuracy (%)    99.9  99.4  99.9  100   95.3  99.5  93    100
[57] Figure 6 illustrates an example sparse l1 solution 600 for a stand-to-sit action on
a waist sensor node (top diagram), with corresponding residuals (bottom diagram). This
represents an example of the estimated sparse coefficients x and its residuals. As an
example of the speed involved, a simulation in MATLAB takes an average of 0.03 s to
process one test sample on a typical 3 GHz personal computer (PC). This example shows
that if the segmentation of the actions is known, and with no other invalid samples, the
sensors can recognize the 12 actions individually with relatively high accuracy. Thus, the
mixture subspace model is a good approximation of the action data. The sparse
representation framework can provide a unified solution for recognizing and segmenting
valid actions, while rejecting invalid actions. Further, this approach is adaptive to the
change of available sensors on the fly.
Distributed Segmentation and Recognition
[58] A multi-resolution action segmentation can be introduced on each sensor node,
and an estimate of a range of possible lengths for all actions of interest can be obtained
from the training examples. This estimated range can be evenly divided into multiple
length hypotheses: (h1, …, hs). At each time t in a motion sequence, the node tests a set
of s possible segmentations: y(1) = (a(t-h1), …, a(t)), … y(s) = (a(t-hs), …, a(t)), as
shown in Figure 7. Figure 7 illustrates example waveforms 700 for multiple
segmentation hypotheses on a wrist sensor node at a given time (or number of samples in
the time domain) t = 150 of a “downstairs” sequence. A good segment is h1, while others
are false segments, and the movement between about t = 250 and about t = 350 represents
an outlying action that the subject performed.
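The generation of length hypotheses and candidate segments may be sketched as follows. The text states only that the estimated range is evenly divided; including both endpoints of the range, rounding to whole samples and skipping hypotheses that reach before the start of the sequence are assumptions of this sketch, as are the function names.

```python
def length_hypotheses(min_len, max_len, s):
    """Evenly divide [min_len, max_len] into s integer length hypotheses."""
    if s == 1:
        return [max_len]
    step = (max_len - min_len) / (s - 1)
    return [round(min_len + k * step) for k in range(s)]


def candidate_segments(sequence, t, hypotheses):
    """Return the windows a(t-h), ..., a(t) for every h that fits before t."""
    return {h: sequence[t - h:t + 1] for h in hypotheses if t - h >= 0}


hyps = length_hypotheses(min_len=20, max_len=100, s=5)
sequence = list(range(200))  # stand-in for the samples a(0), a(1), ...
candidates = candidate_segments(sequence, t=50, hypotheses=hyps)
```

Each candidate window would then be normalized to the common length and tested for validity on the node.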
[59] With each candidate, y may again be normalized to length l, and a sparse
representation x may be estimated using l1-minimization, as discussed above. Thus,
based on this sparsity assumption, if y is not a valid segmentation with respect to the
training examples due to either incorrect t or h, or the real action performed is not in the
training classes, the dominant coefficients of its sparsest representation x may not
correspond to any single class. As shown below in Equation 10, a sparsity concentration
index (SCI) can be used:
Equation 10: $\mathrm{SCI}(x) \doteq \dfrac{K \cdot \max_{j=1,\ldots,K} \|\delta_j(x)\|_1 / \|x\|_1 - 1}{K - 1} \in [0, 1]$.
[60] If the nonzero coefficients of x are evenly distributed among K classes, then
SCI(x) = 0, while if all the nonzero coefficients are associated with a single class, then
SCI(x) = 1. Therefore, a sparsity threshold τ1 may be introduced and applied to all sensor
nodes, where if SCI(x) > τ1, the segment is a valid local measurement, and its 10-D LDA
features $\tilde{y}$ can be sent to the base station. Figure 8 illustrates an example invalid
representation waveform 800, where SCI(x) = 0.13. In Figure 6 above, the action is
correctly classified as "Class 1," where SCI(x) = 0.7.
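The sparsity concentration index of Equation 10 may be computed as in the following sketch, which takes the share of the l1 norm of x held by the single best class and rescales it to the interval [0, 1]. The function name and the per-column class list are assumptions of this sketch.

```python
def sci(x, class_of_column, num_classes):
    """Sparsity concentration index of Equation 10, in [0, 1]."""
    total = sum(abs(v) for v in x)
    if total == 0.0 or num_classes < 2:
        raise ValueError("SCI needs a nonzero x and at least two classes")
    per_class = {}
    for v, cls in zip(x, class_of_column):
        per_class[cls] = per_class.get(cls, 0.0) + abs(v)
    top = max(per_class.values())
    return (num_classes * top / total - 1.0) / (num_classes - 1.0)


classes = [1, 1, 2, 2, 3, 3]
concentrated = sci([0.5, 0.5, 0.0, 0.0, 0.0, 0.0], classes, 3)  # one class only
spread = sci([1.0, 0.0, 1.0, 0.0, 1.0, 0.0], classes, 3)        # even split
```

A node would then forward its features only when the returned value exceeds the local threshold τ1.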
[61] A global classifier that adaptively optimizes the overall segmentation and
classification can also be introduced. For example, suppose at time t, and with a length
hypothesis h, the base station receives L' action features from the active sensors ($L' \le L$).
For example, these features may be from the first L' sensors: $\tilde{y}' = (\tilde{y}_1^T, \ldots, \tilde{y}_{L'}^T)^T \in \mathbb{R}^{10L'}$.
Then the global sparse representation x satisfies the following linear system:
Equation 11: $\tilde{y}' = \begin{pmatrix} R_1 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \cdots & \vdots \\ 0 & \cdots & R_{L'} & \cdots & 0 \end{pmatrix} A x = R' A x = \tilde{A}' x$,
where $R' \in \mathbb{R}^{dL' \times D}$ may be a new projection matrix that only extracts action features from
the first L' nodes. Consequently, an effect of changing active sensors for the global
classification may be formulated via global projection matrix R'. During this
transformation, data matrix A and sparse representation x may remain unchanged. The
linear system of Equation 6 then becomes a special case of Equation 11 when L'=1.
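The construction of the global projection matrix R' may be sketched as follows. The sketch generalizes slightly from the "first L' nodes" of the example to any list of active nodes; the function name, the list-of-rows layout and the toy sizes are assumptions made for illustration.

```python
def global_projection(R_blocks, active):
    """Build R' by stacking the row blocks of the active nodes only.

    R_blocks[j] is node j's d x D_j projection (a list of rows); `active`
    lists the node indices that reported. Columns belonging to inactive
    nodes stay zero, so their measurements are never used.
    """
    widths = [len(block[0]) for block in R_blocks]
    offsets = [sum(widths[:j]) for j in range(len(widths))]
    total = sum(widths)
    rows = []
    for j in active:
        for r in R_blocks[j]:
            row = [0.0] * total
            row[offsets[j]:offsets[j] + widths[j]] = r
            rows.append(row)
    return rows


# Toy sizes: L = 3 nodes, d = 1 feature per node, D_j = 2 samples per node.
R_blocks = [[[1.0, 1.0]], [[2.0, 2.0]], [[3.0, 3.0]]]
R_prime = global_projection(R_blocks, active=[0, 1])  # L' = 2
```

Only R' changes when the set of active sensors changes; the training matrix A and the unknown x keep their sizes, which matches the statement that they remain unchanged.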
[62] Similar to the outlier rejection criterion on each sensor node in particular
embodiments, a global rejection threshold τ2 can be introduced. If SCI(x) > τ2 in
Equation 11, this is an indication that the most significant coefficients in x are
concentrated in a single training class. Hence $\tilde{y}$ may be assigned to that class, and a
corresponding length hypothesis h may provide segmentation of the action from the
motion sequence.
[63] Thus in particular embodiments, an overall algorithm on the sensor nodes and
on the network server may provide a substantially unified solution to segment and
classify action segments from a motion sequence using two simple parameters τ1 and τ2.
Typically, τ1 may be selected to be less restrictive than τ2 in order to increase a recall rate,
because a certain amount of false signal passed to the global classifier may be rejected by
τ2 when the action features from multiple nodes are jointly considered. The formulation
of adaptive classification Equation 11 via a global projection matrix R' and two sparsity
constraints τ1 and τ2 provides a relatively simple means of rejecting outliers from a
network of multiple sensor nodes. This approach compares favorably to other classical
methods, such as kNN and decision trees, because these methods need to train multiple
thresholds and decision rules when the number L' and the set of available sensors vary in
the full-body action vector $\tilde{y}' = (\tilde{y}_1^T, \cdots, \tilde{y}_{L'}^T)^T$.
[64] Further, a change of active sensor nodes can affect l1-minimization and the
classification of the actions. In compressed sensing, the efficacy of l1-minimization in
solving for the sparsest solution x in Equation 11 is characterized by an l0/l1 equivalence
relation. An example condition for the equivalence to hold is the k-neighborliness of $\tilde{A}'$.
As a special case, it can be shown that if x is the sparsest solution in Equation 11 for L' =
L, x may also be a solution for L' < L. Thus, the decrease of L' may lead to sparser
solutions of x.
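The special case can be checked in one line. As a sketch under assumed notation (the row-selection matrix $S$ is introduced here only for illustration), write $R$ for the global projection matrix when all $L$ sensors are active and $\tilde{y} = R A x$ for the corresponding feature vector. Keeping the first $L'$ sensors amounts to $R' = S R$ with $S = [\, I_{dL'} \;\; 0 \,]$, so

$$\tilde{y} = R A x \;\;\Longrightarrow\;\; \tilde{y}' = S \tilde{y} = S R A x = R' A x = \tilde{A}' x.$$

Every x that is feasible for L' = L is therefore also feasible for any L' < L. The feasible set can only grow as sensors are dropped, so the minimal $\ell_1$ (or $\ell_0$) norm over that set cannot increase.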
[65] On the other hand, a decrease in available action features may also make $\tilde{y}'$ less
discriminant. For example, if a reduction is made to L' = 1, and only a wrist sensor node
is activated, then the l1-solution x may have nonzero coefficients associated to multiple
actions with similar wrist motions, albeit sparser. This is an inherent problem in methods
of classifying human actions using a limited number of motion sensors. In theory, if two
action subspaces in a low-dimensional feature space have a small subspace distance after
the projection, the corresponding sparse representation cannot distinguish the test
samples from the two classes. As will be shown below, reducing the available sensors
can reduce the discriminant power of the sparse representation in a lower-dimensional
space.
[66] Figure 9 illustrates a flow diagram of basic steps in an example method 900 of
determining a motion using distributed sensors. The flow diagram is entered at (902).
Motion data can be acquired in one or more sensors positioned on a body (904). The
motion data can then be reduced to form compressed motion data (906). This
compressed motion data can then be subjected to LSC classification (908). The steps of
904-906 can be repeated, such as by utilizing TDMA, to receive compressed data from
multiple working sensor nodes, or to iteratively receive data from one or more sensor
nodes. At step 910 a check is made whether the LSC classification has resulted in a valid
motion. If so, step 912 is performed to call the DSC procedure to classify the motion
data. If not, sensor data acquisition is resumed.
[67] Outlier information can be removed from the received motion data. Motion
data with outlier information removed can then be compared to predetermined actions to
indicate movement of the body. At step 914 a check is made as to whether the motion
data can be verified as a valid motion. If so, output classification (918) occurs. Otherwise
the data is rejected (916), thus completing the flow (916).
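The flow of Figure 9 can be sketched as a simple loop. This is a simplified, single-node illustration, not the implementation of the embodiments: the callables `compress`, `lsc_is_valid`, and `dsc_classify` are hypothetical stand-ins, and the toy rules passed in below exist only to make the sketch runnable.

```python
def run_flow(segments, compress, lsc_is_valid, dsc_classify):
    """Walk motion segments through the steps of Figure 9 and collect outcomes."""
    results = []
    for raw in segments:                  # 904: acquire motion data
        feature = compress(raw)           # 906: reduce to compressed motion data
        if not lsc_is_valid(feature):     # 908/910: local check fails,
            continue                      #          resume acquisition
        label = dsc_classify(feature)     # 912: global classification
        if label is None:                 # 914: data not verified
            results.append(("rejected", None))     # 916
        else:
            results.append(("classified", label))  # 918
    return results

# Toy stand-ins (assumptions for illustration only).
def mean(values):
    return sum(values) / len(values)

segments = [[0.0, 0.1], [0.5, 0.7], [2.0, 2.2]]
outcome = run_flow(
    segments,
    compress=mean,
    lsc_is_valid=lambda f: abs(f) > 0.2,
    dsc_classify=lambda f: "Jump" if f > 1.0 else None,
)
```

The first segment never leaves the node, the second is transmitted but rejected globally, and the third is classified.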
[68] Note that the processing of the various steps illustrated in Figure 9 can be
performed at any suitable point in the system of Figure 19. In one embodiment,
(discussed in association with Figure 19), a managing entity can perform final
classification and other functions.
[69] In other embodiments different components in the system can be used to
perform various portions of the processing. For example, in a particular embodiment
described in Reference 4, no managing entity need be present. The managing software
can be located within 1930 of Figure 19, the mobile station per subject. Central body
controller 1930 can be used to perform management functions including classification
functions such as LSC classification 908 and DSC classification 912. An action database
or portions thereof, and other code and data can be stored on different components, e.g.,
at nodes such as body sensors 1920-1 to 5 and/or 1930 of Figure 19.
[70] Performance of the system may be validated using a data set collected from,
e.g., three male subjects at ages of 28, 30, and 32. In this particular experiment, 8
wearable sensor nodes were placed at different body locations, such as shown in Figure 1.
A set of 12 action classes was designed: (1) Stand-to-Sit (StSi); (2) Sit-to-Stand (SiSt);
(3) Sit-to-Lie (SiLi); (4) Lie-to-Sit (LiSi); (5) Stand-to-Kneel (StKn); (6) Kneel-to-Stand
(KnSt); (7) Rotate-Right (RoR); (8) Rotate-Left (RoL); (9) Bend; (10) Jump; (11)
Upstairs (Up); and (12) Downstairs (Down). This system was tested under various action
durations. Toward this end, the subjects were asked to perform StSi, SiSt, SiLi, and LiSi
with two different speeds (slow and fast), and to perform RoR and RoL with two
different rotation angles (90° and 180°). The subjects were asked to perform a sequence
of related actions in each recording session based on their own interpretations of the
actions. There are 626 actions performed in the data set (see, e.g., Table 3 below for the
numbers in individual classes).
[71] Table 2 below shows precision versus recall of the algorithm with different
active sensor nodes. For these particular experiments, τ1 = 0.2 and τ2 = 0.4. When all
sensor nodes are activated, the algorithm can achieve about 98.8% accuracy among the
extracted actions, and 94.2% detection of the true actions. The performance may
decrease when more sensor nodes become unavailable to the global classifier.
Experimental results show that if one sensor node is maintained on the upper body (e.g.,
sensor 102-2 at position 2 in Figure 1) and one motion sensor node is maintained on the
lower body (e.g., sensor 102-7 at position 7 in Figure 1), the algorithm can still achieve
about 94.4% precision and 82.5% recall. Further, on average the 8 distributed classifiers
that reject invalid local measurements reduce the node-to-station communication by
about 50%.
Table 2: Precision versus recall with different sets of activated sensors
Sensors         2      7      2,7    1,2,7  1-3,7,8  1-8
Precision [%]   89.8   94.6   94.4   92.8   94.6     98.8
Recall [%]      65     61.5   82.5   80.6   89.5     94.2
[72] As to the relatively low recall on single sensor nodes (e.g., 102-2 and 102-7),
this is due to the relatively large number of potential outlying segments presented in a
long motion sequence (see, e.g., Figure 7). Also, the difference may be compared using
two “confusion” tables (see, e.g., Tables 3 and 4 below). As shown in these examples, a
single node (e.g., 102-2) that is positioned on a left wrist performed poorly mainly on two
action categories: Stand-to-Kneel and Upstairs-Downstairs, both of which involve
significant movements of the lower body, but not the upper body. This is one reason for
the low recall shown in Table 2 above. On the other hand, for the actions that are
detected using sensor node 102-2, the system can still achieve about 90% accuracy, thus
demonstrating the robustness of the distributed recognition framework.
Table 3: Confusion table using sensors 102-1 through 102-8
Class (total) 1 2 3 4 5 6 7 8 9 10 11 12
1: StSi (60) 60 0 0 0 0 0 0 0 0 0 0 0
2: SiSt (60) 0 52 0 0 0 0 0 0 0 0 0 0
3: SiLi (62) 1 0 58 0 0 0 0 0 0 0 0 0
4: LiSi (62) 0 0 0 60 0 0 0 0 0 0 0 0
5: Bend (30) 1 0 0 0 29 0 0 0 0 0 0 0
6: StKn (33) 0 0 0 0 0 31 0 0 0 0 0 0
7: KnSt (30) 0 0 0 0 0 0 30 0 0 0 1 0
8: RoR (95) 0 0 0 0 0 0 0 93 0 0 0 1
9: RoL (96) 0 0 0 0 0 0 0 0 96 0 0 0
10: Jump (34) 0 0 0 0 0 0 0 0 0 31 0 0
11: Up (33) 0 0 0 0 0 0 0 0 0 0 24 0
12: Down (31) 0 0 0 0 0 0 0 0 0 0 3 26
[73] Examples of classification results are shown to demonstrate algorithm accuracy
using all 8 sensor nodes (e.g., 102-1 through 102-8). Each of Figures 10-18 plots the
readings from x-axis accelerometers on the 8 sensor nodes. The segmentation results are
then superimposed. Indications of correctly classified action segment locations, as well
as false classification locations are shown, and some valid actions may not be detected by
the algorithm.
[74] Figure 10 illustrates example waveforms 1000 for an x-axis accelerometer
reading for a stand-sit-stand action. Figure 11 illustrates example waveforms 1100 for an
x-axis accelerometer reading for a sit-lie-sit action. Figure 12 illustrates example
waveforms 1200 for an x-axis accelerometer reading for a bend down action. Figure 13
illustrates example waveforms 1300 for an x-axis accelerometer reading for a kneel-
stand-kneel action. Figure 14 illustrates example waveforms 1400 for an x-axis
accelerometer reading for a turn clockwise then counterclockwise action. Figure 15 illustrates
example waveforms 1500 for an x-axis accelerometer reading for a turn clockwise 360°
action. Figure 16 illustrates example waveforms 1600 for an x-axis accelerometer
reading for a turn counterclockwise 360° action. Figure 17 illustrates example
waveforms 1700 for an x-axis accelerometer reading for a jump action. Figure 18
illustrates example waveforms 1800 for an x-axis accelerometer reading for a go upstairs
action.
Table 4: Confusion table using sensor 102-2
Class (total) 1 2 3 4 5 6 7 8 9 10 11 12
1: StSi (60) 37 0 2 0 0 0 0 4 0 0 0 0
2: SiSt (60) 0 50 0 0 0 0 0 0 2 0 0 0
3: SiLi (62) 1 0 38 0 0 0 0 0 0 0 0 0
4: LiSi (62) 0 7 0 32 0 0 0 0 0 0 0 0
5: Bend (30) 0 1 0 0 26 0 0 0 0 0 0 0
6: StKn (33) 0 1 0 1 0 7 0 2 3 0 0 0
7: KnSt (30) 0 1 0 0 1 0 6 3 3 0 0 0
8: RoR (95) 0 0 0 0 0 0 0 92 0 0 0 0
9: RoL (96) 0 0 0 0 0 0 0 0 95 0 0 0
10: Jump (34) 0 0 0 0 0 0 0 0 1 24 0 0
11: Up (33) 0 0 0 0 0 0 0 1 8 0 0 0
12: Down (31) 0 0 0 0 0 0 1 0 3 0 0 0
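The precision and recall figures of Table 2 can be related to the confusion tables with a short calculation. The sketch below assumes that precision is the number of correctly classified actions divided by all detected actions (the sum of all table entries) and that recall is the number of correctly classified actions divided by the total number of actions performed. This reading is an assumption, not a definition given in the description; under it, the computed values round to the Table 2 entries for sensors 1-8 and for sensor 2.

```python
def precision_recall(confusion, class_totals):
    """Return (precision, recall) from a confusion table and per-class totals."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    detected = sum(sum(row) for row in confusion)
    return correct / detected, correct / sum(class_totals)

# Per-class totals as listed in Tables 3 and 4.
TOTALS = [60, 60, 62, 62, 30, 33, 30, 95, 96, 34, 33, 31]

# Table 3: sensors 102-1 through 102-8.
TABLE_3 = [
    [60, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 52, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 58, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 60, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 29, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 31, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 30, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 93, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 96, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 31, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 26],
]

# Table 4: sensor 102-2 only.
TABLE_4 = [
    [37, 0, 2, 0, 0, 0, 0, 4, 0, 0, 0, 0],
    [0, 50, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0],
    [1, 0, 38, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 7, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 26, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 7, 0, 2, 3, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 6, 3, 3, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 92, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 95, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 24, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 8, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 3, 0, 0, 0],
]

p3, r3 = precision_recall(TABLE_3, TOTALS)
p4, r4 = precision_recall(TABLE_4, TOTALS)
```

Because recall is measured against all 626 performed actions, actions that a single wrist node never detects (for example Upstairs and Downstairs in Table 4) lower recall without affecting precision.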
Cell Phone Example
[75] In one experiment, a cellular phone that incorporated a three-axis accelerometer
(Apple iPhone®) was utilized as a sensor node for human action classification. A
software application was loaded on the iPhone that enabled streaming of three-axis
accelerometer data via a Wi-Fi (IEEE 802.11 protocol) data link to a PC. Five (5)
subjects were studied. Wearing the iPhone attached to a lanyard worn around the neck,
subjects performed a series of six action categories: (1) stand-to-sit, (2) sit-to-stand, (3)
walk, (4) upstairs, (5) downstairs and (6) stand still. A subset of the accelerometer data
collected was hand segmented to create training examples for each human action class.
Continuous, non-segmented accelerometer data was processed using LSC, and the
predicted action class was compared to the known actions recorded during the test.
Human action classification accuracy of 86% was achieved using LDA projection of two-
second data segments.
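As an illustration of how a continuous three-axis stream might be cut into two-second segments before projection and classification, consider the sketch below. The sampling rate, the one-second step between windows, and the function name `two_second_windows` are assumptions made for this example; the description does not specify them.

```python
def two_second_windows(samples, rate_hz, window_s=2.0, step_s=1.0):
    """Split a list of (x, y, z) samples into fixed-length, possibly overlapping windows."""
    size = int(round(window_s * rate_hz))
    step = int(round(step_s * rate_hz))
    return [samples[start:start + size]
            for start in range(0, len(samples) - size + 1, step)]

# Ten seconds of synthetic data at an assumed 10 Hz sampling rate.
stream = [(i, 0, 0) for i in range(100)]
windows = two_second_windows(stream, rate_hz=10)
```

Each window would then be reduced (for example by an LDA projection) and passed to the local classifier.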
Conclusion
[76] Building on emerging compressed sensing theory, particular embodiments
include a distributed algorithm approach for segmenting and classifying human actions
using a wearable motion sensor network. For example, a framework provides a unified
solution based on l1-minimization to classify valid action segments and reject outlying
actions on the sensors and the base station. The example experiments show that a set of
12 action classes can be accurately represented and classified using a set of 10-D LDA
features measured at multiple body locations. Further, the proposed global classifier can
adaptively adjust the global optimization to boost the recognition upon available local
measurements.
[77] Further details are shown in the papers included with this application. For
example, a design description of an example implementation of an LSC procedure is
illustrated at page 6 of Reference 1. A design description of a DSC procedure is shown at
page 8 of Reference 1.
[78] Although the description has been described with respect to particular
embodiments thereof, these particular embodiments are merely illustrative, and not
restrictive. For example, particular embodiments may also be applied to classify a broad
body of biological signals, such as electrocardiogram signals, respiratory patterns, and
waveforms of brain and/or other organ activities, via respective biological sensors. Such
actions or motions may be by humans or other biological animals and functionality
described herein may even be applied for motion by mechanical entities such as robots or
other machines.
[79] Other applications may benefit from aspects or embodiments of the invention.
For example, multiple sensors could be placed at designated positions on manufacturing
equipment and by analyzing the data captured by each sensor, the “health” of the
production line could be determined. Another example is a power management
application where local conditions of power generation, consumption, loss and other
factors are monitored and used to adjust the performance of the power system. In such
applications, performing local classification and/or data filtering and aggregation may reduce the
amount of data traffic, reduce complexity of analysis, improve speed or efficiency of the
system, or provide other benefits. In general, any system that uses distributed
data-sensing and data relay may benefit from features described herein.
[80] Higher classification accuracy may be realized using features described herein.
For example, as shown in the accompanying References, a higher human action
classification accuracy may be realized such as in the range 95% to 99% versus 85% to
90% for other approaches. False alarms, which have been a significant
problem plaguing other systems, may be reduced.
[81] Power consumption may be reduced by requiring less data transmission
between nodes and base station since only outlier classifications require transmission and
global consideration. Thus, continuous streaming of data need not be required.
[82] The system accuracy can degrade gracefully with the loss of sensor nodes. The
system design is more easily scalable as additional nodes can be added to improve
performance or add functionality.
[83] Embodiments described herein can support the expansion or introduction of
new training data to help the system to adapt or learn as it is used. This can be important
since most previous systems have been "one-size-fits-all" and as a consequence accuracy
and false alarms have been an issue.
[84] In one embodiment, the approach requires only two parameters to optimize
system performance - outlier threshold at the local classifier and at the global level. Other
approaches can require significant tuning to obtain good performance. Embodiments can
use the same or similar classifier algorithms at the mote (i.e., node, sensor or body) level
and at the global level, simplifying the system development.
[85] Any suitable programming language can be used to implement the routines of
particular embodiments including C, C++, Java, assembly language, etc. Different
programming techniques can be employed such as procedural or object oriented. The
routines can execute on a single processing device or multiple processors. Although the
steps, operations, or computations may be presented in a specific order, this order may be
changed in different particular embodiments. In some particular embodiments, multiple
steps shown as sequential in this specification can be performed at the same time.
[86] Particular embodiments may be implemented in a computer-readable storage
medium for use by or in connection with the instruction execution system, apparatus,
system, or device. Particular embodiments can be implemented in the form of control
logic in software or hardware or a combination of both. The control logic, when
executed by one or more processors, may be operable to perform that which is described
in particular embodiments.
[87] Particular embodiments may be implemented by using a programmed general
purpose digital computer, or by using application specific integrated circuits, programmable
logic devices, or field programmable gate arrays; optical, chemical, biological, quantum or
nanoengineered systems, components and mechanisms may also be used. In general, the
functions of particular embodiments can be achieved by any means as is known in the art.
Distributed, networked systems, components, and/or circuits can be used.
Communication, or transfer, of data may be wired, wireless, or by any other means.
[88] It will also be appreciated that one or more of the elements depicted in the
drawings/figures can also be implemented in a more separated or integrated manner, or
even removed or rendered as inoperable in certain cases, as is useful in accordance with a
particular application. It is also within the spirit and scope to implement a program or
code that can be stored in a machine-readable medium to permit a computer to perform
any of the methods described above.
[89] As used in the description herein and throughout the claims that follow, “a”,
“an”, and “the” include plural references unless the context clearly dictates otherwise.
Also, as used in the description herein and throughout the claims that follow, the meaning
of “in” includes “in” and “on” unless the context clearly dictates otherwise.
[90] Thus, while particular embodiments have been described herein, latitudes of
modification, various changes, and substitutions are intended in the foregoing
disclosures, and it will be appreciated that in some instances some features of particular
embodiments will be employed without a corresponding use of other features without
departing from the scope and spirit as set forth. Therefore, many modifications may be
made to adapt a particular situation or material to the essential scope and spirit.
We Claim:
1. A method for obtaining data in a distributed sensor system, wherein
aggregated sensor data is used to achieve a result, the method comprising:
obtaining sensor data from first and second sensors at a local site;
using local processing to perform at least a portion of first classification of
the sensor data.
2. The method of claim 1, wherein the first classification includes a
Distributed Sparsity Classifier (DSC) function.
3. The method of claim 2, wherein the first classification includes a Local
Sparsity Classifier (LSC) function.
4. The method of claim 3, wherein the distributed sensor system includes
nodes, wherein each node comprises a particular sensor and resources associated with the
particular sensor, the method further comprising:
performing the LSC function at one or more nodes.
5. The method of claim 1, wherein the distributed sensor system is used in
human body action detection.
6. The method of claim 1, further comprising:
using additional processing to perform at least a portion of second
classification of the sensor data.
7. The method of claim 6, wherein the additional processing is performed
at least in part by a managing entity at a location remote from where the local processing
is performed.
8. The method of claim 1, further comprising:
determining that a performance of a sensor has changed; and
adapting a classification of data in response to the determining.
9. An apparatus for obtaining data in a distributed sensor system, wherein
aggregated sensor data is used to achieve a result, the apparatus comprising:
a processor;
a processor-readable medium including one or more instructions for:
obtaining sensor data from first and second sensors at a local site; and
using local processing to perform at least a portion of first classification of
the sensor data.
10. A processor-readable medium including instructions executable by a
processor for obtaining data in a distributed sensor system, wherein aggregated sensor
data is used to achieve a result, the processor-readable medium comprising one or more
instructions for:
obtaining sensor data from first and second sensors at a local site; and
using local processing to perform at least a portion of first classification of
the sensor data.
SYSTEM FOR DETECTION OF BODY MOTION
Abstract
An approach for determining motions of a body using distributed sensors is
disclosed. In one embodiment, an apparatus can include: a plurality of sensors coupled to
a body, where each sensor is positioned at about a designated location on the body, and
where each sensor is configured to acquire motion data related to movement of the
designated location on the body and at which the sensor is positioned, and to reduce the
motion data into compressed and transmittable motion data; and a base station configured
to receive the compressed motion data via wireless communication from at least one of
the plurality of sensors, the base station being further configured to remove outlier
information from the received motion data, and to match the received motion data to a
predetermined action, where the predetermined action indicates a movement of the body.
Figure 1 (100): sensor nodes 102-1, 102-2, 102-3, 102-4, 102-5, 102-6, 102-7, 102-8
Figure 2 (200): labels: Accelerometer 210; Gyroscope 204; Controller 206; Transmitter 208; Sensor Node 102; Sensor Nodes 102-1, 102-2, 102-N (located on a body for motion sensing); Base Station 202
Figures 3-8 (300, 400, 500, 600, 700, 800)
Figure 9 (900): Start (902); Acquire motion data in a sensor node positioned on a body (904); Reduce the motion data to form compressed motion data (906); Call LSC to Classify Motion Data (908); Is Motion Valid? (910; Y/N); Call DSC to Classify Motion Data (912); Is Data Valid? (914; Y/N); Output Classification (918); Reject Data (916); End (916)
Figures 10-18 (1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800)
Figure 19: labels: INTERNET; MANAGING ENTITY; reference numerals 1910, 1920-2, 1920-3, 1920-4, 1920-5, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 1992, 1994
INFORMATION DISCLOSURE STATEMENT BY APPLICANT (Form PTO/SB/08a; not for submission under 37 CFR 1.99)
First Named Inventor: BAJCSY et al.
Attorney Docket Number: 010030-002710US
NON-PATENT LITERATURE DOCUMENTS
1. YANG et al., "DISTRIBUTED SEGMENTATION AND CLASSIFICATION OF HUMAN ACTION USING A WEARABLE MOTION SENSOR NETWORK", pages 1-8
2. YANG et al., "DISTRIBUTED RECOGNITION OF HUMAN ACTIONS USING WEARABLE MOTION SENSOR NETWORKS", Journal of Ambient Intelligence and Smart Environments 1 (2009) 1-5, IOS Press, pages 1-13
3. YANG et al., "DISTRIBUTED SEGMENTATION AND CLASSIFICATION OF HUMAN ACTIONS USING A WEARABLE MOTION SENSOR NETWORK", Electrical Engineering & Computer Sciences, University of California at Berkeley, Dec. 6, 2007, pages 1-19
4. KURYLOSKI et al., "DEXTERNET: AN OPEN PLATFORM FOR HETEROGENEOUS BODY SENSOR NETWORKS AND ITS APPLICATIONS", IPSN'09, San Francisco, CA, USA, pages 11-11
CERTIFICATION STATEMENT
Please see 37 CFR 1.97 and 1.98 to make the appropriate selection(s):
That each item of information contained in the information disclosure statement was first cited in any communication
from a foreign patent office in a counterpart foreign application not more than three months prior to the filing of the
information disclosure statement. See 37 CFR 1.97(e)(1).
OR
That no item of information contained in the information disclosure statement was cited in a communication from a
foreign patent office in a counterpart foreign application, and, to the knowledge of the person signing the certification
after making reasonable inquiry, no item of information contained in the information disclosure statement was known to
any individual designated in 37 CFR 1.56(c) more than three months prior to the filing of the information disclosure
statement. See 37 CFR 1.97(e)(2).
See attached certification statement.
Fee set forth in 37 CFR 1.17 (p) has been submitted herewith.
None
SIGNATURE
Signature: /Charles J. Kulas/    Date (YYYY-MM-DD): 2009-12-03
Name/Print: Charles J. Kulas    Registration Number: 35809
Distributed Segmentation and Classification of Human
Actions Using a Wearable Motion Sensor Network
Allen Yang, Roozbeh Jafari, Philip Kuryloski, Sameer Iyengar, S. Shankar Sastry, Ruzena Bajcsy
Electrical Engineering and Computer Sciences, University of California at Berkeley
Technical Report No. UCB/EECS-2007-143
http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-143.html
December 6, 2007
Copyright © 2007, by the author(s). All rights reserved.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.
Acknowledgement
Yang and Sastry are partially supported by ARO MURI W911NF-06-1-0076. Jafari is partially supported by the startup fund from the University of Texas and Texas Instruments. Bajcsy is partially supported by NSF IIS 0724682. Kuryloski, Iyengar, Sastry, and Bajcsy are partially supported by TRUST (Team for Research in Ubiquitous Secure Technology), which receives support from NSF CCF-0424422, AFOSR FA9550-06-1-0244, and the following organizations: Cisco, British Telecom, ESCHER, HP, IBM, iCAST, Intel, Microsoft, ORNL, Pirelli, Qualcomm, Sun, Symantec, Telecom Italia, and United Technologies.
Distributed Segmentation and Classification of
Human Actions
Using a Wearable Motion Sensor Network
Allen Y. Yang, Roozbeh Jafari, Philip J. Kuryloski, Sameer Iyengar, S. Shankar Sastry, and Ruzena Bajcsy
Abstract
We propose a distributed recognition framework to classify human actions using a wearable motion sensor network. Each sensor node consists of an integrated triaxial accelerometer and biaxial gyroscope. Given a set of pre-segmented actions as training examples, the algorithm simultaneously segments and classifies human actions from a motion sequence, and it also rejects unknown actions that are not in the training set. The classification is distributed across the individual sensor nodes and a base station computer. Due to rapid advances in the integration of mobile processors and heterogeneous sensors, a distributed recognition system is likely to outperform traditional centralized recognition methods. In this paper, we assume the distribution of multiple action classes satisfies a mixture subspace model, with one subspace for each action class. Given a new test sample, we seek the sparsest linear representation of the sample w.r.t. all training examples. We show that the dominant coefficients in the representation correspond only to the action class of the test sample, and hence its membership is encoded in the representation. We provide fast linear solvers to compute such a representation via $\ell_1$-minimization.
I. INTRODUCTION
In this paper, we consider human action recognition on a distributed wearable motion sensor network. Each sensor node is
integrated with a triaxial accelerometer and biaxial gyroscope. The locations of the sensors are roughly defined to be the waist,
two wrists, left arm, two knees, and two ankles, as shown in Fig 1. Action recognition has been studied to a great extent in
computer vision in the past. Compared to a model-based or appearance-based vision system, the body sensor network approach
has the following advantages: 1. The system does not require instrumenting the environment with cameras or other sensors.
2. The system has the necessary mobility to support continuous monitoring of a subject during her daily activities. 3. With
the continuing integration of mobile processors, sensors, and batteries, it has become possible to manufacture wearable sensor
networks that densely cover the human body to record and analyze very small movements of the human body (e.g., breathing
and spine movements). Such sensor networks can be used in applications such as medical-care oriented surveillance, athletic
training, tele-immersion, and human-computer interaction.
Fig. 1. A distributed wearable sensor network. The sensor on the right arm malfunctioned during the experiment.
Yang, Iyengar, Sastry, and Bajcsy are with the Department of Electrical Engineering and Computer Science, University of California, Berkeley. Jafari is with the Department of Electrical Engineering, University of Texas at Dallas. Kuryloski is with the Department of Electrical Engineering and Computer Science, University of California, Berkeley, and the Department of Electrical and Computer Engineering, Cornell University. Corresponding author: Allen Y. Yang, Rm 307 Cory Hall, UC Berkeley, Berkeley, CA 94720. Email: [email protected]. Tel: 510-643-5798. Fax: 510-643-2356.
In traditional sensor networks, the computation carried by the sensor board is fairly simple: Extract certain local information
and transmit the data to a computer server over the network for processing. With recent advances in power-efficient mobile
processors for sensor networks (e.g., FPGA and Intel XScale series), we are interested in studying new frameworks for
distributed pattern recognition. In such systems, each sensor node will be able to classify local, albeit biased, information.
Only when the local classification detects a possible object/event does the sensor node become active and transmit the
measurement to the server. On the server side, a global classifier receives data from the sensor nodes and further optimizes the
classification. The global classifier can be more computationally involved than the distributed classifiers, but it has to adapt to
the change of available active sensors due to local measurement error, sensor failure, and communication congestion.
Distributed pattern recognition on sensor networks has several advantages: 1. Good decisions about the validity of the local
information can reduce the communication between the nodes and the server, and therefore reduce power consumption. Previous
studies have shown the power consumption required to send one byte over a wireless network is equivalent to executing between
1e3 and 1e6 instructions on an onboard processor [21]. 2. The framework increases the robustness of action recognition on the
network. Particularly, as we will show later, one can choose to activate some or all of the sensor nodes on the fly, and the global
classifier is able to adaptively adjust the optimization process and improve the recognition upon local decisions. 3. The ability
for the sensor nodes to make biased local decisions also makes the design of the global classifier more flexible. For example,
a system that only monitors abnormal movements (e.g., falling or no movement) can make fairly good estimation using local
decisions and discard the global optimization, and in cases that the central system fails, the network can still support limited
recognition tasks using the distributed classifiers. 4. Finally, in a more general perspective beyond action recognition, the ability
for individual sensor nodes to make local decisions can be used as feedback to support certain autonomous actions/reactions
without relying on the intervention of a central system.
We define distributed action recognition as follows:
Problem 1 (Distributed segmentation and classification): Assume a set of $L$ wearable sensor nodes with integrated triaxial accelerometers and biaxial gyroscopes are attached to multiple locations of the human body. Denote
$$a_l(t) = (x_l(t), y_l(t), z_l(t), \theta_l(t), \rho_l(t))^T \in \mathbb{R}^{5}$$
as the measurement of the five sensors on node $l$ at time $t$, and
$$a(t) = (a_1^T(t), a_2^T(t), \cdots, a_L^T(t))^T \in \mathbb{R}^{5L}$$
collects all sensor measurements. Denote
$$s = (a(1), a(2), \cdots, a(l)) \in \mathbb{R}^{5L \times l}$$
as an action sequence of length $l$.
Given $K$ different classes of human actions, a set of $n_i$ training examples $\{s_{i,1}, \cdots, s_{i,n_i}\}$ is collected for each $i$th class.
The durations of the sequences naturally may be different. Given a new test sequence s that may contain multiple actions
and possible other outlying actions, we seek a distributed algorithm to simultaneously segment the sequence and classify the
actions.
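As a rough illustration of the notation in Problem 1, the following Python sketch assembles the stacked measurement a(t) and the sequence s from per-node readings. The function names and the toy readings are assumptions made for this example; they are not taken from the authors' implementation.

```python
from typing import List, Sequence

Reading = Sequence[float]  # (x, y, z, theta, rho) from one node at one time step


def stack_nodes(readings: List[Reading]) -> List[float]:
    """Concatenate the 5-D readings of L nodes into one 5L-D vector a(t)."""
    a_t: List[float] = []
    for r in readings:
        if len(r) != 5:
            raise ValueError("each node must report 5 values")
        a_t.extend(r)
    return a_t


def build_sequence(frames: List[List[Reading]]) -> List[List[float]]:
    """Return s as a list of columns a(1), ..., a(l), each of dimension 5L."""
    return [stack_nodes(frame) for frame in frames]


# Toy example (hypothetical numbers): L = 2 nodes observed for l = 3 time steps.
frames = [
    [(0.0, 0.1, 1.0, 5.0, -2.0), (0.2, 0.0, 0.9, 1.0, 0.0)],
    [(0.1, 0.1, 1.0, 4.0, -1.0), (0.2, 0.1, 0.9, 0.5, 0.5)],
    [(0.1, 0.2, 0.9, 3.0, 0.0), (0.3, 0.1, 0.8, 0.0, 1.0)],
]
s = build_sequence(frames)
```

Here s holds l = 3 columns of dimension 5L = 10, matching the shape 5L × l in the problem statement.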
Solving this problem mainly involves the following difficulties.
1) Simultaneous segmentation and classification. If the test sequence is pre-segmented, classification becomes straightforward
with many classical algorithms to choose from. In this paper, we seek simultaneous segmentation and recognition from
a long motion sequence. Furthermore, we also assume that the test sequence may contain other unknown actions that
are not from the K classes. The algorithm needs to be robust to these outliers.
2) Variation of action durations. One major difficulty in action recognition is to determine the duration of an action. Good
classification depends on correct estimation of both the starting time and the duration of an action. But in practice, the
durations of different actions may vary dramatically (see Fig 2).
Fig. 2. Population of different action durations in our data set.
3) Identity independence. In addition to the variation of action durations, different people act differently for the same actions
(see Fig 3). If both the training samples and the test samples are from the same subject, typically the classification could
be greatly simplified. However, it is well known that collecting large numbers of training samples in human biometrics is
expensive, particularly in medical-care oriented applications. Therefore it is desirable for an action recognition algorithm
to be identity independent. For a test sequence in the experiment, we examine the identity-independent performance by
excluding the training samples of the same subject.
Fig. 3. Readings of the x-axis accelerometers (top) and x-axis gyroscopes (bottom) from 8 distributed sensors (shown in different colors) on two repetitive “stand-kneel-stand” sequences from two subjects (left and right columns).
4) Distributed recognition. A distributed recognition system needs to further consider the following issues: 1. How to
extract compact and accurate low-dimensional action features for local classification and transmission over a band-
limited network? 2. How to classify the local measurement in real time using low-power processors? 3. How to design
a classifier to globally optimize the recognition and be adaptive to the change of the network?
a) Literature Overview.: Action (or activity) recognition using wearable motion sensors has been a prominent topic in
the last five years. Initial studies were primarily focused on single accelerometers [9], [11] or other motion sensors [12], [19].
More recent systems prefer using multiple motion sensors [1], [2], [10], [13], [16], [17], [20]. Depending on the type of sensor
used, an action recognition system is typically composed of two parts: a feature extraction module and a classification module.
There are three major directions for feature extraction in wearable sensor networks. The first direction uses simple statistics
of a signal sequence such as the max, mean, variance, and energy [2], [10], [11], [13], [20]. The second type of feature is
computed using fixed filter banks such as FFT and wavelets [11], [19]. The third type is based on classical dimensionality
reduction techniques such as principal component analysis (PCA) and linear discriminant analysis (LDA) [16], [17]. In terms
of classification on the action features, a large body of previous work favored thresholding or k-nearest-neighbor (kNN) due
to the simplicity of the algorithms implemented on mobile devices [11], [19], [20]. Other more sophisticated techniques have
also been used, such as decision trees [2], [3] and hidden Markov models [16].
For distributed pattern recognition, there exist initial studies on distributed speech recognition [23] and distributed expert
systems [18]. In [23], the authors summarized three major categories of distributed recognition (see footnote 1): 1. All data are relayed to a
computer server for processing, e.g., on a closed-circuit camera system [14]. 2. All data are locally processed, e.g., [15]. One
may further choose to implement a global classifier by a majority-voting scheme on local decisions. 3. A full-fledged distributed
recognition system consists of both front-end processing for feature extraction and global processing for classification [6], [13],
[16], [17], [20]. Our distributed action recognition system falls into the last category. One particular problem associated with
this category is that each local observation from the distributed sensors is biased and may be insufficient to classify all classes.
For example in our system, the sensors placed on the lower-body would not perform well to classify those actions that mainly
involve upper body motions. Consequently, one cannot expect majority-voting type classifiers to perform well globally.
b) Contributions of the paper.: We propose a distributed action recognition algorithm that simultaneously segments and
classifies 12 human actions using 1 to 8 wearable motion sensor nodes. We assume the wearable sensor network is a typical one-hop wireless network and all the sensor nodes communicate with a central computer. The work is inspired by a recent study on face recognition using sparse representation and $\ell_1$-minimization [22]. We assume each action class satisfies a low-dimensional
subspace model. We show that a 10-D LDA feature space suffices to locally represent the 12 action subspaces on each node.
If a linear representation is sought to represent a valid test sample w.r.t. all training samples, the dominant coefficients in the
sparsest representation correspond to the training samples from the same action class, and hence they encode the membership
of the test sample. We further study fast linear programming routines to solve for such sparse representation.
We investigate a distributed framework for simultaneous segmentation and classification of individual actions from a motion
sequence. On each sensor node, a classifier searches for good segmentation on multiple temporal resolutions. We propose an
effective method to reject action segments that do not correspond to any training class as outliers. Hence an inlying action
segment simultaneously provides the localization of the action and its membership.
Footnote 1: In certain situations it is desirable to consider a completely distributed recognition system where there is no central system and the recognition on the nodes converges over time via node-to-node communications. In this paper, having a base station is still a practical and efficient solution.
When a sensor node detects a valid action segment, it transmits its 10-D feature to the server. The global classifier receives
the distributed feature vectors, and then seeks a global sparse representation of the action features against the corresponding
feature vectors of all the training samples. The global optimization is adaptive to the change of available active nodes.
This paper focuses on the distributed action recognition framework. The algorithm is simulated in software using
MATLAB. Currently our data set is mainly designed for transient actions (e.g., jumping, kneeling, and stand-to-sit), but it
also contains a limited number of nontransient actions (i.e., turning, going upstairs and downstairs). We are in the process of
gradually expanding the number of subjects and action classes in the database.
II. DESIGN OF THE WEARABLE SENSOR NETWORK
The wearable sensor network consists of sensor nodes placed at various body locations, which communicate with a base
station attached to a computer server through a USB port. The sensor nodes and base station are built using the commercially
available Tmote Sky boards. Tmote Sky runs TinyOS on an 8 MHz microcontroller with 10 KB of RAM and communicates using
the 802.15.4 wireless protocol. Each custom-built sensor board has a triaxial accelerometer and a biaxial gyroscope and is
attached to a Tmote Sky (shown in Fig 4). Each axis is reported as a 12-bit value to the node, indicating values in the range of
±2g and ±500°/s for the accelerometer and gyroscope, respectively. Each node is currently powered by two AA batteries.
Fig. 4. The sensor board with the accelerometer and gyroscope. The mother board at the back is Tmote Sky.
The current hardware design of the sensor contributes certain amounts of measurement error. The accelerometers typically
require some calibration in the form of a linear correction, as sensor output under 1g may be shifted up to 15% in some
sensors. It is also worth noting that the gyroscopes produce an indication of rotation under straight line motions. Fortunately
these systematic errors appear to be consistent across experiments for a given sensor board. However, without calibration to
correct them, the errors may affect the action recognition if different sets of sensors are used interchangeably in the experiment.
To avoid packet collision in the network, we use a TDMA protocol that allocates each node a specific time slot during
which to transmit data. This allows us to receive sensor data at 20 Hz with minimal packet loss. To avoid clock drift in the network,
the base station periodically broadcasts a packet to resynchronize the nodes’ individual timers. The code to interface with the
sensors and transmit data is implemented directly on the mote using nesC, a variant of C.
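The timing budget implied by such a TDMA scheme can be sketched as follows. The equal-width slot layout, the function name, and the choice of eight nodes are assumptions for illustration; the text above states only that each node is given its own slot and that data arrive at 20 Hz.

```python
def tdma_schedule(num_nodes: int, rate_hz: float):
    """Split one sampling period into equal slots, one per node.

    Returns a list of (node_id, slot_start_ms, slot_end_ms) tuples.
    Equal slot widths are an assumption of this sketch.
    """
    if num_nodes <= 0 or rate_hz <= 0:
        raise ValueError("num_nodes and rate_hz must be positive")
    frame_ms = 1000.0 / rate_hz      # length of one TDMA frame
    slot_ms = frame_ms / num_nodes   # time available to each node
    return [(n, n * slot_ms, (n + 1) * slot_ms) for n in range(num_nodes)]


# Hypothetical configuration: 8 nodes reporting at 20 Hz.
schedule = tdma_schedule(num_nodes=8, rate_hz=20.0)
```

Under these assumptions each frame lasts 50 ms, so every node has 6.25 ms in which to transmit its packet.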
III. CLASSIFICATION VIA SPARSE REPRESENTATION
In this section, we present an efficient action classification method to recognize pre-segmented action sequences on each sensor node via $\ell_1$-minimization. We first discuss the representation of action samples in vector form. Given an action segment of length $l$ from node $j$, $s_j = (a_j(1), a_j(2), \cdots, a_j(l)) \in \mathbb{R}^{5 \times l}$, define a new vector $s_j^S$ as the stacking of the $l$ columns of $s_j$:
$$s_j^S \doteq (a_j(1)^T, a_j(2)^T, \cdots, a_j(l)^T)^T \in \mathbb{R}^{5l}. \tag{1}$$
We will interchangeably use $s_j$ and $s_j^S$ to denote the stacked vector without causing ambiguity.
Since the length $l$ varies among different subjects and actions, we need to normalize $l$ to be the same for all the training and test samples, which can be achieved by linear interpolation or FFT interpolation. After normalization, we denote the dimension of samples $s_j$ as $D_j = 5l$. Subsequently, we define a new vector $v$ that stacks the measurements from all $L$ nodes:
$$v = (s_1^T, s_2^T, \cdots, s_L^T)^T \in \mathbb{R}^{D}, \tag{2}$$
where $D = D_1 + \cdots + D_L = 5lL$.
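The length normalization and stacking described above can be sketched in a few lines of Python. This is a minimal sketch assuming linear interpolation (the text also allows FFT interpolation); the function names and the channel-major input layout are choices made for this example.

```python
from typing import List


def resample_linear(signal: List[float], target_len: int) -> List[float]:
    """Resample a 1-D signal to target_len points by linear interpolation."""
    n = len(signal)
    if n == 0 or target_len <= 0:
        raise ValueError("signal must be non-empty and target_len positive")
    if n == 1 or target_len == 1:
        return [signal[0]] * target_len
    out = []
    for k in range(target_len):
        pos = k * (n - 1) / (target_len - 1)  # fractional index into signal
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1.0 - frac) * signal[lo] + frac * signal[hi])
    return out


def normalize_and_stack(segment: List[List[float]], target_len: int) -> List[float]:
    """segment[c] is channel c of one node over time (5 channels).

    Returns the stacked vector (a(1)^T, ..., a(l)^T)^T with l = target_len,
    i.e. a vector of dimension 5 * target_len.
    """
    channels = [resample_linear(ch, target_len) for ch in segment]
    stacked: List[float] = []
    for t in range(target_len):
        stacked.extend(ch[t] for ch in channels)
    return stacked
```

For a node segment with 5 channels and any original duration, `normalize_and_stack(segment, 64)` returns a vector of dimension 5 · 64 = 320.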
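In this paper, we assume the samples $v$ in an action class satisfy a subspace model, called an action subspace. If the training samples $\{v_1, \cdots, v_{n_i}\}$ of the $i$th class sufficiently span the $i$th action subspace, given a test sample $y = (y_1^T, \cdots, y_L^T)^T \in \mathbb{R}^{D}$ in the same class $i$, $y$ can be linearly represented using the training examples of the same class:
$$y = \alpha_1 v_1 + \cdots + \alpha_{n_i} v_{n_i} \;\Leftrightarrow\; \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_L \end{pmatrix} = \left( \begin{pmatrix} s_1 \\ s_2 \\ \vdots \\ s_L \end{pmatrix}_{1} \cdots \begin{pmatrix} s_1 \\ s_2 \\ \vdots \\ s_L \end{pmatrix}_{n_i} \right) \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_{n_i} \end{pmatrix}. \tag{3}$$
It is important to note that such a linear constraint also holds on each node $j$: $y_j = \alpha_1 s_{j,1} + \cdots + \alpha_{n_i} s_{j,n_i} \in \mathbb{R}^{D_j}$.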
In theory, complex data such as human actions typically constitute complex nonlinear models. The linear models are used
to approximate such nonlinear structures in a higher-dimensional subspace (see Fig 5). Notice that such linear approximation
may not produce good estimation of the distance/similarity metric for the samples on the manifold. However, as we will show
in Example 1, given sufficient samples on the manifold as training examples, a new test sample can be accurately represented
on the subspace, provided that any two classes do not have similar subspace models.
Fig. 5. Modeling a 1-D manifold M using a 2-D subspace V .
In this paper, we are interested in recovering $\mathrm{label}(y)$. A previous study [22] proposed to reformulate the recognition using a global sparse representation: Since $\mathrm{label}(y) = i$ is unknown, we can represent $y$ using all the training samples from the $K$ classes:
$$y = (A_1, A_2, \cdots, A_K) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_K \end{pmatrix} = Ax, \tag{4}$$
where $A_i = (v_{i,1}, v_{i,2}, \cdots, v_{i,n_i}) \in \mathbb{R}^{D \times n_i}$ collects all the training samples of class $i$, $x_i = (\alpha_{i,1}, \alpha_{i,2}, \cdots, \alpha_{i,n_i})^T \in \mathbb{R}^{n_i}$ collects the corresponding coefficients in (3), and $A \in \mathbb{R}^{D \times n}$ where $n = n_1 + n_2 + \cdots + n_K$.
Since $y$ satisfies both (3) and (4), one solution of $x$ in (4) should be
$$x^* = (0, \cdots, 0, x_i^T, 0, \cdots, 0)^T. \tag{5}$$
The solution is naturally sparse: on average only a fraction $\frac{1}{K}$ of the terms in $x^*$ are nonzero. Furthermore, $x^*$ is also a solution for the representation on each node $j$:
$$y_j = (A_1^{(j)}, A_2^{(j)}, \cdots, A_K^{(j)}) \, x = A^{(j)} x, \tag{6}$$
where $A_i^{(j)} \in \mathbb{R}^{D_j \times n_i}$ consists of the row vectors in $A_i$ that correspond to the $j$th node. Hence, $x^*$ can be solved either globally
using (4) or locally using (6), provided that the action data measured on each node are sufficiently discriminant. We will come
back to the discussion about local classification versus global classification in Section IV. In the rest of this section however,
our focus will be on each node.
One major difficulty in solving (6) is the high dimensionality of the action data. For example, in this paper, we normalize $l = 64$ for all action segments (see Fig 2 for the distribution of original lengths). Then $D_j = 64 \times 5 = 320$ for $y_j$ on each node. The high dimensionality makes it difficult to either directly solve for $x$ on the node or transmit the action data over a band-limited wireless channel. In compressed sensing [4], [5], one reduces the dimension of a linear system by choosing a linear projection $R_j \in \mathbb{R}^{d \times D_j}$ (see footnote 2):
$$\tilde{y}_j \doteq R_j y_j = R_j A^{(j)} x \doteq \tilde{A}^{(j)} x \in \mathbb{R}^{d}. \tag{7}$$
Footnote 2: Notice that $R_j$ is not computed on the sensor node. These matrices are computed offline and simply stored on each sensor node.
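As a result, the projected action feature $\tilde{y}_j = R_j y_j$ is more efficient to transmit than $y_j$ in the original data space of dimension $D_j$. On the network server, the global action vector is of the following form:
$$\tilde{y} = \begin{pmatrix} \tilde{y}_1 \\ \tilde{y}_2 \\ \vdots \\ \tilde{y}_L \end{pmatrix} = \begin{pmatrix} R_1 & 0 & \cdots & 0 \\ 0 & R_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & R_L \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_L \end{pmatrix} \doteq R y \in \mathbb{R}^{dL}, \tag{8}$$
where $R \in \mathbb{R}^{dL \times D}$ is equivalent to a global projection matrix.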
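The block-diagonal structure in (8) means that each node can apply its own projection locally and the server only has to concatenate the results. The sketch below checks this equivalence on toy numbers; the helper names and the tiny matrices are hypothetical and chosen only to keep the example readable.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def project_on_nodes(R_list: List[Matrix], y_list: List[Vector]) -> Vector:
    """Each node j computes R_j y_j locally; the server concatenates the results."""
    features: Vector = []
    for R_j, y_j in zip(R_list, y_list):
        features.extend(matvec(R_j, y_j))
    return features


def block_diag(R_list: List[Matrix]) -> Matrix:
    """Build the global block-diagonal matrix R explicitly (for checking only)."""
    total_cols = sum(len(R_j[0]) for R_j in R_list)
    R: Matrix = []
    offset = 0
    for R_j in R_list:
        width = len(R_j[0])
        for row in R_j:
            R.append([0.0] * offset + list(row) + [0.0] * (total_cols - offset - width))
        offset += width
    return R


# Toy setting: two nodes, d = 1 feature per node, D_j = 3 raw values per node.
R_list = [[[1.0, 0.0, 1.0]], [[0.0, 2.0, 0.0]]]
y_list = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
local = project_on_nodes(R_list, y_list)
stacked_y = [v for y_j in y_list for v in y_j]
global_ = matvec(block_diag(R_list), stacked_y)
```

In the setting of the paper each node would send d = 10 numbers instead of D_j = 320, which is where the transmission saving comes from.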
After the projection $R_j$, typically the feature dimension $d$ is much smaller than the number $n$ of all training samples. Therefore, the new linear system (7) is underdetermined. Numerically stable solutions exist to uniquely recover sparse solutions $x^*$ via $\ell_1$-minimization [7]:
$$x^* = \arg\min \|x\|_1 \quad \text{subject to} \quad \tilde{y}_j = \tilde{A}^{(j)} x. \tag{9}$$
In our experiment, we have tested multiple projection operators including PCA, LDA, and the random projection advocated in [22]. We found that 10-D feature spaces using LDA lead to the best recognition in a very low-dimensional space.
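One standard way to cast (9) as a linear program, which is the form that basis pursuit solvers operate on, is to split $x$ into its positive and negative parts. Here $\tilde{y}_j$ and $\tilde{A}^{(j)}$ denote the projected feature and the projected training matrix of (7); the auxiliary variables $u$ and $v$ are introduced only for this illustration.

```latex
\[
\begin{aligned}
x &= u - v, \qquad u \ge 0,\; v \ge 0, \\
\min_{u,\,v}\ & \mathbf{1}^T (u + v)
\quad \text{subject to} \quad
\begin{pmatrix} \tilde{A}^{(j)} & -\tilde{A}^{(j)} \end{pmatrix}
\begin{pmatrix} u \\ v \end{pmatrix} = \tilde{y}_j .
\end{aligned}
\]
```

At an optimum, $u_k$ and $v_k$ are never both positive for the same index $k$, so $\mathbf{1}^T(u+v) = \|x\|_1$ and the linear program returns a minimizer of (9).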
After the (sparsest) representation $x$ is recovered, we project the coefficients onto each action subspace:
$$\delta_i(x) = (0, \cdots, 0, x_i^T, 0, \cdots, 0)^T \in \mathbb{R}^{n}, \quad i = 1, \cdots, K. \tag{10}$$
Finally, the membership of the test sample $\tilde{y}_j$ is assigned to the class with the smallest residual:
$$\mathrm{label}(\tilde{y}_j) = \arg\min_i \|\tilde{y}_j - \tilde{A}^{(j)} \delta_i(x)\|_2. \tag{11}$$
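The residual rule in (10) and (11) can be sketched as follows, assuming a coefficient vector x has already been recovered by some sparse solver. The function names, the 0-based class indices, and the toy matrix are assumptions of this example.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def delta(x: Vector, class_sizes: List[int], i: int) -> Vector:
    """Keep only the coefficients of class i (0-based) and zero out the rest."""
    start = sum(class_sizes[:i])
    end = start + class_sizes[i]
    return [xk if start <= k < end else 0.0 for k, xk in enumerate(x)]


def residuals(y: Vector, A: Matrix, x: Vector, class_sizes: List[int]) -> List[float]:
    """Residual ||y - A delta_i(x)||_2 for every class i."""
    out = []
    for i in range(len(class_sizes)):
        xi = delta(x, class_sizes, i)
        approx = [sum(a * c for a, c in zip(row, xi)) for row in A]
        out.append(math.sqrt(sum((yk - ak) ** 2 for yk, ak in zip(y, approx))))
    return out


def classify(y: Vector, A: Matrix, x: Vector, class_sizes: List[int]) -> int:
    """Return the 0-based index of the class with the smallest residual."""
    r = residuals(y, A, x, class_sizes)
    return min(range(len(r)), key=r.__getitem__)


# Toy data: two classes with two training columns each (columns 0-1 and 2-3).
A = [[1.0, 0.0, 0.0, 1.0],
     [0.0, 1.0, 1.0, 0.0]]
y = [2.0, 3.0]
x = [2.0, 3.0, 0.0, 0.0]  # all weight on the first class
label = classify(y, A, x, class_sizes=[2, 2])
```

Because all nonzero coefficients of x sit in the first block, the first class reproduces y exactly and is selected.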
Example 1 (Classification on Nodes): We designed 12 action categories in the experiment: Stand-to-Sit, Sit-to-Stand, Sit-to-
Lie, Lie-to-Sit, Stand-to-Kneel, Kneel-to-Stand, Rotate-Right, Rotate-Left, Bend, Jump, Upstairs, and Downstairs. The detailed
experiment setup is given in Section V.
To implement $\ell_1$-minimization on the sensor node, we look for fast sparse solvers in the literature. We have tested a variety of methods including (orthogonal) matching pursuit (MP), basis pursuit (BP), LASSO, and a quadratic log-barrier solver (see footnote 3). We found that BP [8] gives the best trade-off between speed, noise tolerance, and recognition accuracy.
Here we demonstrate the accuracy of the BP-based algorithm on each sensor node (see Fig 1 for their locations). The actions
are manually segmented from a set of long motion sequences from three subjects. In total there are 626 samples in the data set.
The 10-D feature selection is via LDA. We require the classification to be identity-independent. Therefore, for each test sample
from a subject, we use all samples from the other two subjects to form the training set. The accuracy of the classification is
shown in Table I. Fig 6 shows an example of the estimated sparse coefficients x and its residuals. In terms of speed, our
simulation in MATLAB takes on average 0.03 s to process one test sample on a typical 3G PC.
TABLE I
RECOGNITION ON EACH NODE ON 12 ACTION CLASSES.
Sensor #    1      2      3      4      5      6      7      8
Acc [%]     99.9   99.4   99.9   100    95.3   99.5   93     100
Fig. 6. A BP-based $\ell_1$ solution and its corresponding residuals of a Stand-to-Sit action on the waist node. The action is correctly classified as class 1. SCI(x) = 0.7 (see (13)).
Example 1 shows that if the segmentation of the actions is known and there are no invalid samples, all sensor nodes can
recognize the 12 actions individually with very high accuracy, which also supports the mixture subspace model as a good
approximation of the action data. Nevertheless, one may argue that in such low-dimensional feature spaces other classical
methods (e.g., kNN and decision tree methods) should also perform well. In the next section, we will show that the major
advantage of adopting the sparse representation framework is a unified solution to recognize and segment valid actions and
reject invalid ones. We will also show that the method is adaptive to the change of available sensor nodes on the fly.
Footnote 3: The implementation of these routines in MATLAB is available in SparseLab: http://sparselab.stanford.edu
IV. DISTRIBUTED SEGMENTATION AND RECOGNITION
There have been two major approaches in the past to provide partial solutions to simultaneous segmentation and recognition
of human actions on wearable sensors. The first solution assumes different actions are separated by a “rest” state, and such
states can be detected by energy thresholding or a special classifier to distinguish between rest and non-rest. The second
solution assumes all sensors in the network are available at all times, and rejects invalid samples based on the sample distance
between the test and training examples. These two approaches have several drawbacks: 1. For the first approach, the validity
of the rest state between actions is not physically guaranteed. For example, nontransient actions such as walking and running
may last for a long period. 2. The second approach is not robust when the number of active sensors changes over time. In
this case, tuning a list of different distance thresholds to reject outliers when the number of sensors changes can be difficult,
which still highly depends on the condition on the training samples.
We propose a novel framework to simultaneously segment and recognize human actions using the (10-D LDA) action features
extracted from a network of distributed sensors. The unified outlier rejection method applies to both individual nodes and the
global classifier. The outlying action segments may be caused by unknown actions performed by the subjects or by incorrect
segmentation. As a result, the extracted inlying action segments simultaneously provide the segmentation of the actions and
their labels. The framework is also robust w.r.t. different action durations and the change of available sensor nodes.
We first introduce multi-resolution action detection on each sensor node. From the training examples, we can estimate a range of possible lengths for all actions of interest. We then evenly divide the range into multiple length hypotheses: $(h_1, \cdots, h_s)$. At each time $t$ in a motion sequence, the node tests a set of $s$ possible segmentations (see footnote 4):
$$y^{(1)} = (a(t - h_1), \cdots, a(t)), \;\cdots,\; y^{(s)} = (a(t - h_s), \cdots, a(t)), \tag{12}$$
as shown in Fig 7. With each candidate $y$ normalized to length $l$, a sparse representation $x$ is estimated using the $\ell_1$-minimization of Section III.
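The generation of candidate windows in (12) can be sketched as below. The function names are hypothetical, and including both endpoints of the length range and skipping windows that would start before the beginning of the sequence are assumptions of this sketch, not details given in the text.

```python
from typing import List, Tuple


def length_hypotheses(min_len: int, max_len: int, s: int) -> List[int]:
    """Evenly divide [min_len, max_len] into s candidate lengths h_1, ..., h_s."""
    if s < 1 or min_len > max_len:
        raise ValueError("need s >= 1 and min_len <= max_len")
    if s == 1:
        return [max_len]
    step = (max_len - min_len) / (s - 1)
    return [round(min_len + k * step) for k in range(s)]


def candidate_segments(t: int, hyps: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices for the windows a(t - h), ..., a(t).

    Windows that would start before index 0 are skipped.
    """
    return [(t - h, t) for h in hyps if t - h >= 0]


# Hypothetical range of action lengths, in samples.
hyps = length_hypotheses(min_len=20, max_len=100, s=5)
windows = candidate_segments(t=50, hyps=hyps)
```

Each returned window would then be normalized to length l and passed to the sparse solver, one hypothesis at a time.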
Fig. 7. Multiple segmentation hypotheses on a wrist sensor at time t = 150 of a “go downstairs” sequence. $h_1$ is a good segment while the others are false segments. Notice that the movement between 250 and 350 is an outlying action the subject performed.
Based on the previous sparsity assumption, if $y$ is not a valid segmentation w.r.t. the training examples due to either incorrect $t$ or $h$, or the real action performed is not in the training classes, the dominant coefficients of its sparsest representation $x$ should not correspond to any single class (as shown in Fig 8). We use a sparsity concentration index (SCI) [22]:
$$\mathrm{SCI}(x) \doteq \frac{K \cdot \max_{j=1,\cdots,K} \|\delta_j(x)\|_1 / \|x\|_1 - 1}{K - 1} \in [0, 1]. \tag{13}$$
If the nonzero coefficients of $x$ are evenly distributed among $K$ classes, then $\mathrm{SCI}(x) = 0$; if all the nonzero coefficients are associated with a single class, then $\mathrm{SCI}(x) = 1$. Therefore, we introduce a sparsity threshold $\tau_1$ applied to all sensor nodes: If $\mathrm{SCI}(x) > \tau_1$, the segment is a valid local measurement, and its 10-D LDA features $\tilde{y}$ will be sent to the base station.
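A direct transcription of (13) into Python, together with the threshold test, might look as follows. The function names are hypothetical, and returning 0 for an all-zero coefficient vector is a convention chosen for this sketch because (13) is undefined in that case.

```python
from typing import List


def sci(x: List[float], class_sizes: List[int]) -> float:
    """Sparsity concentration index of x over K = len(class_sizes) classes."""
    K = len(class_sizes)
    if K < 2:
        raise ValueError("SCI needs at least two classes")
    total = sum(abs(v) for v in x)
    if total == 0.0:
        return 0.0  # convention of this sketch: an all-zero x is not concentrated
    best, start = 0.0, 0
    for n_i in class_sizes:
        # l1 mass of the coefficients that belong to the current class
        best = max(best, sum(abs(v) for v in x[start:start + n_i]))
        start += n_i
    return (K * best / total - 1.0) / (K - 1)


def is_valid_segment(x: List[float], class_sizes: List[int], tau: float) -> bool:
    """Accept the segment only if its coefficients are concentrated enough."""
    return sci(x, class_sizes) > tau
```

With two classes of two training samples each, a vector whose weight lies in one class gives an SCI of 1, while equal weight in both classes gives 0, matching the two extreme cases described above.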
Fig. 8. The $\ell_1$ solution and corresponding residuals of an outlying sample on the waist node. SCI(x) = 0.13.
Footnote 4: A segmentation candidate should be ignored if it overlaps with a previously detected result.
Next, we introduce a global classifier that adaptively optimizes the overall segmentation and classification. Suppose at time $t$ and with a length hypothesis $h$, the base station receives $L'$ action features from the active sensors ($L' \le L$). Without loss of generality, assume these features are from the first $L'$ sensors: $\tilde{y}_1, \tilde{y}_2, \cdots, \tilde{y}_{L'}$. Let
$$\tilde{y}' = (\tilde{y}_1^T, \cdots, \tilde{y}_{L'}^T)^T \in \mathbb{R}^{10L'}. \tag{14}$$
Then the global sparse representation $x$ of $\tilde{y}'$ satisfies the following linear system
$$\tilde{y}' = \begin{pmatrix} R_1 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & R_{L'} & \cdots & 0 \end{pmatrix} A x = R' A x = \tilde{A}' x, \tag{15}$$
where $R' \in \mathbb{R}^{dL' \times D}$ is a new projection matrix that only extracts the action features from the first $L'$ nodes. Consequently, the effect of changing active sensor nodes for the global classification is formulated via the global projection matrix $R'$. During the transformation, the data matrix $A$ and the sparse representation $x$ remain unchanged. The two linear systems (7) and (8) then become special cases of (15), where $L' = 1$ and $L' = L$, respectively.
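In practice the adaptation in (15) amounts to keeping only the block rows of the projected system that belong to the nodes which actually reported a feature. The sketch below assumes the projected training blocks for each node have been precomputed; the function name, the dictionary layout, and the toy numbers are hypothetical.

```python
def active_system(features_by_node, A_proj_by_node, active):
    """Assemble the stacked feature y' and the matrix A' from the active nodes.

    features_by_node[j] : d-dimensional feature reported by node j
    A_proj_by_node[j]   : d x n projected training matrix of node j (list of rows)
    active              : node indices that reported a feature at this time
    """
    y_prime, A_prime = [], []
    for j in active:
        y_prime.extend(features_by_node[j])
        A_prime.extend(A_proj_by_node[j])
    return y_prime, A_prime


# Toy setting: 3 nodes, d = 2 features per node, n = 3 training samples.
features_by_node = {0: [0.1, 0.2], 1: [0.3, 0.4], 2: [0.5, 0.6]}
A_proj_by_node = {
    0: [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    1: [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]],
    2: [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0]],
}
y_prime, A_prime = active_system(features_by_node, A_proj_by_node, active=[0, 2])
```

The number of rows changes with the set of active nodes, while the number of columns n, and hence the dimension of x, stays fixed.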
Similar to the outlier rejection criterion we previously proposed on each node, we introduce a global rejection threshold $\tau_2$. If $\mathrm{SCI}(x) > \tau_2$ in (15), the most significant coefficients in $x$ are concentrated in a single training class. Hence $\tilde{y}'$ is assigned to that class, and its length hypothesis $h$ provides the segmentation of the action from the motion sequence (see footnote 5).
The overall algorithm on the nodes and on the network server provides a unified solution to segment and classify action segments from a motion sequence using only two simple parameters $\tau_1$ and $\tau_2$. Typically $\tau_1$ is selected to be less restrictive than $\tau_2$ in order to increase the recall rate: passing a certain amount of false signal to the global classifier is not necessarily disastrous, because such a signal would be rejected by $\tau_2$ when the action features from multiple nodes are jointly considered.
Finally, we consider how the change of active nodes affects the estimation of $x$ and the classification of the actions. In compressed sensing, the efficacy of $\ell_1$-minimization in solving for the sparsest solution $x$ in (15) is characterized by the $\ell_0/\ell_1$ equivalence relation [7], [8]. A necessary and sufficient condition for the equivalence to hold is the $k$-neighborliness of $\tilde{A}'$. As a special case, one can show that if $x$ is the sparsest solution in (15) for $L' = L$, $x$ is also a solution for $L' < L$. Hence, the decrease of $L'$ possibly leads to sparser solutions of $x$.
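The special case mentioned above follows in one line from the block structure of the projection. The selection matrix $S$ below is introduced only for this illustration; it keeps the first $dL'$ entries of a $dL$-dimensional vector, and $\tilde{y} = R A x$ denotes the full system with all $L$ nodes active.

```latex
\[
\begin{aligned}
S &= \begin{pmatrix} I_{dL'} & 0 \end{pmatrix} \in \mathbb{R}^{dL' \times dL},
\qquad R' = S R, \qquad \tilde{y}' = S \tilde{y}, \\
\tilde{y} = R A x
\;&\Longrightarrow\;
\tilde{y}' = S \tilde{y} = S R A x = R' A x .
\end{aligned}
\]
```

Every $x$ that is feasible for $L' = L$ is therefore feasible for $L' < L$, so removing nodes can only enlarge the feasible set and the minimum $\ell_1$ norm can only decrease or stay the same.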
On the other hand, the decrease in available action features also makes $\tilde{y}'$ less discriminant. For example, if we reduce
$L'$ to 1 and only activate a wrist sensor, then the $\ell_1$ solution $x$ may have nonzero coefficients associated with multiple actions
with similar wrist motions, albeit sparser. This is an inherent problem for any method that classifies human actions using a limited
number of motion sensors. In theory, if two action subspaces in a low-dimensional feature space have a small subspace distance
after the projection, the corresponding sparse representation cannot distinguish the test samples from the two classes. We will
demonstrate in Section V that indeed reducing the available motion sensors will reduce the discriminant power of the action
features in a lower-dimensional space.
In summary, the formulation of adaptive global classification (15) via a global projection matrix R′ compares favorably
to other classical methods such as kNN and decision trees mainly for the following two reasons: 1. The framework provides
a simple means to reject outliers via two sparsity constraints τ1 and τ2. 2. The effects of changing action features can be
quantitatively studied via R′ and its ℓ0/ℓ1 equivalence.
V. EXPERIMENT
We test the performance of the system using a data set we collected from three male subjects aged 28, 30, and 32.
Eight wearable sensors were placed at different body locations (see Fig. 1). We designed a set of 12 action classes:
Stand-to-Sit (StSi), Sit-to-Stand (SiSt), Sit-to-Lie (SiLi), Lie-to-Sit (LiSi), Stand-to-Kneel (StKn), Kneel-to-Stand (KnSt), Rotate-
Right (RoR), Rotate-Left (RoL), Bend, Jump, Upstairs (Up), and Downstairs (Down). We are particularly interested in testing
the system under various action durations. For this purpose, we have asked the subjects to perform StSi, SiSt, SiLi, and LiSi
with two different speeds (slow and fast), and perform RoR and RoL with two different rotation angles (90° and 180°). All
subjects were asked to perform a sequence of related actions in each recording session based on their own interpretation of
the actions (e.g., Fig 3). In total there are 626 actions performed in the data set (see Table III for the numbers in individual
classes).
We demonstrate the distributed recognition algorithm against three criteria: 1. What is the accuracy of the algorithm with
all 8 sensors activated, and how well can the global classifier adjust when a certain number of nodes are dropped from the
network. 2. Whether a set of heuristically selected parameters {τ1, τ2} can effectively segment valid actions with different
available nodes. 3. How much communication can be reduced via each node rejecting local measurement compared to simply
streaming all action features to the base station.
Table II shows the accuracy of the algorithm in terms of Precision versus Recall and with different sets of sensor nodes.
For all experiments, τ1 = 0.2 and τ2 = 0.4. If all nodes are activated, the algorithm can achieve 98.8% accuracy among the
actions it extracted, and 94.2% of the true actions are detected. The performance decreases gracefully when more nodes become
unavailable to the global classifier. Our results show that if we can maintain one motion sensor for the upper body (e.g., at
position 2) and one for the lower body (e.g., at position 7), the algorithm can still achieve 94.4% precision and 82.5% recall.
Finally, on average the 8 distributed classifiers that reject invalid local measurements reduce the node-to-station communication
by more than 50%. Please refer to the Appendix for the rendering of the segmentation results on the motion sequences.
5 At time t, if multiple hypotheses pass the rejection threshold τ2, one may heuristically select one based on his/her preference for longer or shorter segments, or other heuristics such as the number of active sensors.
TABLE II
PRECISION VS. RECALL WITH DIFFERENT SETS OF ACTIVATED SENSORS.
Sensors     2      7      2,7    1,2,7   1-3,7,8   1-8
Prec [%]    89.8   94.6   94.4   92.8    94.6      98.8
Rec [%]     65     61.5   82.5   80.6    89.5      94.2
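For readers who want to reproduce figures of this kind, the following is a minimal sketch of how precision and recall are conventionally computed from detection counts. It assumes the standard definitions (precision over the actions the system extracted, recall over the true actions); the counts in the example are hypothetical and are not taken from the data set.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from raw detection counts."""
    precision = true_pos / (true_pos + false_pos)  # correct among extracted actions
    recall = true_pos / (true_pos + false_neg)     # detected among true actions
    return precision, recall


# Hypothetical counts: 90 correct detections, 10 false detections, 30 missed actions.
precision, recall = precision_recall(true_pos=90, false_pos=10, false_neg=30)
print(f"precision = {precision:.1%}, recall = {recall:.1%}")
```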
One may be curious about the relatively low recall on single sensors such as 2 and 7, particularly compared to the results
in Table I. This performance difference is due to the large number of potential outlying segments present in a long motion
sequence (e.g., see Fig. 7). We can further compare the difference using the two confusion tables, Tables III and IV. We see that a single
node 2 that is positioned on the right wrist performed poorly mainly on two action categories: Stand-Kneel and
Upstairs-Downstairs, both of which involve significant movements of the lower body but not the upper one. This is the main reason
for the low recall in Table II. On the other hand, for the actions that are detected using node 2, our system can still achieve
about 90% accuracy, which demonstrates the robustness of the distributed recognition framework. Similar arguments
also apply to node 7 and other sensor combinations.
TABLE III
CONFUSION TABLE USING SENSORS 1-8.
TABLE IV
CONFUSION TABLE USING SENSOR 2.
VI. CONCLUSION AND DISCUSSION
Inspired by the emerging compressed sensing theory, we have proposed a distributed recognition framework to segment
and classify human actions on a wearable motion sensor network. The framework provides a unified solution based on
ℓ1-minimization to classify valid action segments and reject outlying actions on the sensor nodes and the base station. We have
shown through our experiment that a set of 12 action classes can be accurately represented and classified using a set of 10-D
LDA features measured at multiple body locations. The proposed global classifier can adaptively adjust the global optimization
to boost the recognition upon available local measurements.
One limitation in the current system is that the wearable sensors need to be firmly fastened at the designated locations.
However, a more practical system/algorithm should tolerate certain degrees of offsets without sacrificing the accuracy. In this
case, the variation of the measurement for different action classes would increase substantially. One open question is what
low-dimensional linear/nonlinear models one may use to model such more complex data, and whether the sparse representation
framework can still apply to approximate such structures with limited numbers of training examples. A potential solution to
this question will be a meaningful step forward both in theory and in practice.
REFERENCES
[1] R. Aylward and J. Paradiso. A compact, high-speed, wearable sensor network for biomotion capture and interactive media. In Proceedings of the International Conference on Information Processing in Sensor Networks, 2007.
[2] L. Bao and S. Intille. Activity recognition from user-annotated acceleration data. In Proceedings of the International Conference on Pervasive Computing, 2004.
[3] A. Benbasat and J. Paradiso. Groggy wakeup - automated generation of power-efficient detection hierarchies for wearable sensors. In Proceedings of International Workshop on Wearable and Implantable Body Sensor Networks, 2007.
[4] E. Candes. Compressive sampling. In Proceedings of the International Congress of Mathematicians, 2006.
[5] E. Candes and T. Tao. Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Transactions on Information Theory, 52(12):5406–5425, 2006.
[6] C. Chang and H. Aghajan. Collaborative face orientation detection in wireless image sensor networks. In Proceedings of Distributed Smart Cameras Workshop, 2006.
[7] D. Donoho. Neighborly polytopes and sparse solution of underdetermined linear equations. preprint, 2005.
[8] D. Donoho and M. Elad. On the stability of the basis pursuit in the presence of noise. Signal Processing, 86:511–532, 2006.
[9] J. Farringdon, A. Moore, N. Tilbury, J. Church, and P. Biemond. Wearable sensor badge & sensor jacket for context awareness. In Proceedings of the International Symposium on Wearable Computers, pages 107–113, 1999.
[10] E. Heinz, K. Kunze, and S. Sulistyo. Experimental evaluation of variations in primary features used for accelerometric context recognition. In Proceedings of the European Symposium on Ambient Intelligence, 2003.
[11] T. Huynh and B. Schiele. Analyzing features for activity recognition. In Proceedings of the Joint Conference on Smart Objects and Ambient Intelligence, 2005.
[12] H. Kemper and R. Verschuur. Validity and reliability of pedometers in habitual activity research. European Journal of Applied Physiology, 37(1):71–82, 1977.
[13] N. Kern, B. Schiele, and A. Schmidt. Multi-sensor activity context detection for wearable computing. In Proceedings of the European Symposium on Ambient Intelligence, 2003.
[14] I. Kim, J. Shim, J. Schlessman, and W. Wolf. Remote wireless face recognition employing ZigBee. In Proceedings of the Distributed Smart Cameras Workshop, 2006.
[15] A. Klausner, A. Tengg, and B. Rinner. Vehicle classification on multi-sensor smart cameras using feature- and decision-fusion. In Proceedings of the ACM/IEEE International Conference on Distributed Smart Cameras, 2007.
[16] P. Lukowicz, J. Ward, H. Junker, M. Stager, G. Troster, A. Atrash, and T. Starner. Recognizing workshop activity using body worn microphones and accelerometers. In Proceedings of the International Conference on Pervasive Computing, 2004.
[17] J. Mantyjarvi, J. Himberg, and T. Seppanen. Recognizing human motion with multiple acceleration sensors. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 2001.
[18] J. Morrill. Distributed recognition of patterns in time series data. Communications of the ACM, 41(5):45–51, 1998.
[19] B. Najafi, K. Aminian, A. Parschiv-Ionescu, F. Loew, C. Bula, and P. Robert. Ambulatory system for human motion analysis using a kinematic sensor: Monitoring of daily physical activity in the elderly. IEEE Transactions on Biomedical Engineering, 50(6):711–723, 2003.
[20] S. Pirttikangas, K. Fujinami, and T. Nakajima. Feature selection and activity recognition from wearable sensors. In Proceedings of the International Symposium on Ubiquitous Computing Systems, 2006.
[21] C. Sadler and M. Martonosi. Data compression algorithms for energy-constrained devices in delay tolerant networks. In Proceedings of the ACM Conference on Embedded Networked Sensor Systems, pages 265–278, 2006.
[22] A. Yang, J. Wright, Y. Ma, and S. Sastry. Feature selection in face recognition: A sparse representation perspective. Technical Report UCB/EECS-2007-99, University of California, Berkeley, 2007.
[23] W. Zhang, L. He, Y. Chow, R. Yang, and Y. Su. The study on distributed speech recognition system. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 1431–1434, 2000.
APPENDIX
In this appendix, we provide detailed classification results to demonstrate the accuracy of the proposed algorithm using all
8 sensor nodes (1-8). For clarity, each of Figs. 9-21 only plots the readings from the x-axis accelerometers on the 8 nodes
for three motion sequences performed by the three subjects, respectively. The segmentation results are then superimposed. The
black solid boxes indicate the locations of the correctly classified action segments. The red boxes (e.g., in Figs. 12 and 13)
indicate the locations of false classifications. One can also observe from the figures that some valid actions are not detected
by the algorithm, e.g., in Fig. 20.
The results demonstrate that the proposed algorithm can accurately segment and classify the 12 action classes with
widely different durations. The overall statistics on Precision versus Recall are summarized in Table II.
Fig. 9. Segmentation of the slow Stand-Sit-Stand sequences from the three subjects: (a) Subject 1, (b) Subject 2, (c) Subject 3.
Fig. 10. Segmentation of the fast Stand-Sit-Stand sequences from the three subjects: (a) Subject 1, (b) Subject 2, (c) Subject 3.
Fig. 11. Segmentation of the slow Sit-Lie-Sit sequences from the three subjects: (a) Subject 1, (b) Subject 2, (c) Subject 3.
Fig. 12. Segmentation of the fast Sit-Lie-Sit sequences from the three subjects: (a) Subject 1, (b) Subject 2, (c) Subject 3.
Fig. 13. Segmentation of the Bend sequences from the three subjects: (a) Subject 1, (b) Subject 2, (c) Subject 3.
Fig. 14. Segmentation of the Stand-Kneel-Stand sequences from the three subjects: (a) Subject 1, (b) Subject 2, (c) Subject 3.
Fig. 15. Segmentation of the 90° Rotate-Right-Left sequences from the three subjects: (a) Subject 1, (b) Subject 2, (c) Subject 3.
Fig. 16. Segmentation of the 90° Rotate-Left-Right sequences from the three subjects: (a) Subject 1, (b) Subject 2, (c) Subject 3.
Fig. 17. Segmentation of the 180° Rotate-Right sequences from the three subjects: (a) Subject 1, (b) Subject 2, (c) Subject 3.
Fig. 18. Segmentation of the 180° Rotate-Left sequences from the three subjects: (a) Subject 1, (b) Subject 2, (c) Subject 3.
Fig. 19. Segmentation of the Jump sequences from the three subjects: (a) Subject 1, (b) Subject 2, (c) Subject 3.
Fig. 20. Segmentation of the Go-Upstairs sequences from the three subjects: (a) Subject 1, (b) Subject 2, (c) Subject 3.
Fig. 21. Segmentation of the Go-Downstairs sequences from the three subjects: (a) Subject 1, (b) Subject 2, (c) Subject 3.
Journal of Ambient Intelligence and Smart Environments 1 (2009) 1–5
IOS Press
Distributed Recognition of Human Actions
Using Wearable Motion Sensor Networks 1
Allen Y. Yang a,∗, Roozbeh Jafari b, S. Shankar Sastry a, and Ruzena Bajcsy a
a Department of EECS, University of California, Berkeley
Berkeley, CA 94705, USA
E-mail: {yang,sastry,bajcsy}@eecs.berkeley.edu
b Department of EE, University of Texas at Dallas
Richardson, TX 75083, USA
E-mail: [email protected]
Abstract. We propose a distributed recognition framework to classify continuous human actions using a low-bandwidth wearable
motion sensor network, called distributed sparsity classifier (DSC). The algorithm classifies human actions using a set of training
motion sequences as prior examples. It is also capable of rejecting outlying actions that are not in the training categories.
The classification is operated in a distributed fashion on individual sensor nodes and a base station computer. We model the
distribution of multiple action classes as a mixture subspace model, one subspace for each action class. Given a new test sample,
we seek the sparsest linear representation of the sample w.r.t. all training examples. We show that the dominant coefficients in
the representation only correspond to the action class of the test sample, and hence its membership is encoded in the sparse
representation. Fast linear solvers are provided to compute such representation via ℓ1-minimization. To validate the accuracy
of the framework, a public wearable action recognition database is constructed, called wearable action recognition database
(WARD). The database is comprised of 20 human subjects in 13 action categories. Using up to five motion sensors in the WARD
database, DSC achieves state-of-the-art performance. We further show that the recognition precision only decreases gracefully
using smaller subsets of active sensors. It validates the robustness of the distributed recognition framework on an unreliable
wireless network. It also demonstrates the ability of DSC to conserve sensor energy for communication while preserving accurate
global classification.
Keywords: action recognition, wearable sensor network, distributed perception, sparse representation, compressive sensing
1. Introduction
Action/activity recognition has been extensively studied
in the computer vision literature. Compared with either
model-based or appearance-based vision systems, the body
sensor networks that we study in this paper have several
distinct advantages: 1. Body sensor systems do not require
instrumenting the environment with cameras or other
sensors. 2. Such systems also have the necessary mobility to
support persistent monitoring of a subject during her daily
activities in both indoor and outdoor environments. 3. With
the continuing miniaturization and integration of mobile
processors and wireless sensors, it has become possible to
manufacture wearable sensor networks that densely cover
the human body to record and analyze very small movements
of the human body (e.g., breathing and spine movements)
with higher accuracy than most extant vision systems. Such
sensor networks can be used in applications such as
medical-care monitoring, athlete training, tele-immersion,
and human-computer interaction (e.g., integration of
accelerometers in Wii game controllers and smart phones).
1 This work was partially supported by ARO MURI W911NF-06-1-0076, NSF TRUST Center, and the startup funding from the University of Texas and Texas Instruments.
* Corresponding author. E-mail: [email protected].
1876-1364/09/$17.00 © 2009 – IOS Press and the authors. All rights reserved
2 A. Yang et al. / Distributed recognition of human activities using wearable motion sensor networks
Fig. 1. A subject wearing a body sensor network with the numbering
of the sensors superimposed in the image. The sensor system con-
sists of five wireless motion sensors, two on the wrists, one on the
waist, and two on the ankles, respectively.
In traditional sensor networks, the computation car-
ried by the sensor board is fairly simple: Extract cer-
tain local information and transmit the data to a com-
puter server over the network for processing. In this
paper, we propose a new method for distributed pat-
tern recognition. In this system, each sensor node will
be able to classify local, albeit biased, information.
Only when the local classification detects a possible
object/event does the sensor node become active and
transmit the measurement to a network server. 1 On the
server side, a global classifier receives data from the
sensor nodes and further optimizes the classification
upon local sensor decisions. The global classifier can
be more computationally involved than the distributed
classifiers, but it has to adapt to the change of avail-
able network sensors due to local measurement error,
sensor failure, and communication congestion.
1.1. Literature Overview
Past studies on sensor-based action recognition were
primarily focused on single accelerometers [12,15] or
other motion sensors [16,23]. More recent systems
prefer using multiple motion sensors [19,17,14,2,18,25,1].
Depending on the type of sensor used, an action
recognition system is typically comprised of two parts:
a feature extraction module at the sensor level and a
classification module at the server level.
1 Studies have shown that the power consumption required to
successfully send one byte over a wireless channel is equivalent
to executing between 1e3 and 1e6 instructions on an onboard
processor [26]. Hence it is paramount in sensor networks to reduce
the communication cost while preserving the recognition performance.
There are three major directions for feature extrac-
tion in wearable sensor networks. The first direction
uses simple statistics in a motion sequence such as the
max, mean, variance, and energy. The second type of
feature is computed using fixed filter banks such as
FFT and wavelets [23,15]. The third type is based on
classical dimensionality reduction techniques such as
principal component analysis (PCA) and linear dis-
criminant analysis (LDA) [19,18].
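As an illustration of the first direction, the simple statistics of a single sensor channel over one window can be computed as sketched below. The definitions are assumptions: the variance is taken as the population variance and the energy as the sum of squared samples, whereas individual systems in the cited work may normalize differently. The function name and the sample window are hypothetical.

```python
import statistics


def simple_features(window):
    """Max, mean, variance, and energy of one channel over a window of samples."""
    return {
        "max": max(window),
        "mean": statistics.mean(window),
        "variance": statistics.pvariance(window),  # population variance (assumed)
        "energy": sum(v * v for v in window),      # sum of squares (assumed)
    }


# Hypothetical accelerometer window of four samples.
features = simple_features([1.0, 2.0, 3.0, 4.0])
print(features)
```

Such features are cheap enough to compute on a low-power node, which is one reason they appear frequently in the literature surveyed here.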
In terms of classification on the action features, a
large body of previous work favored thresholding or
k-nearest-neighbor (kNN) due to the simplicity of the
algorithms for mobile devices [23,15,25]. Other more
sophisticated techniques have also been used, such as
decision trees [2,4] and hidden Markov models [18].
For distributed pattern recognition, there exist stud-
ies on distributed speech recognition [31] and dis-
tributed expert systems [22]. One particular problem
associated with most distributed sensor systems is that
each local observation from the distributed sensors is
biased and insufficient to classify all classes. For
example, in our system the sensors placed on the lower
body would not perform well in classifying actions
that mainly involve upper body motions, and vice
versa. Consequently, traditional majority-voting type
classifiers may not achieve the best performance
globally.
Due to the unique mobility of wearable sensor net-
works, such systems have been applied to a vari-
ety of applications, especially in the area of human-
computer interaction. One dominant application in the
past has been single action detection for elderly peo-
ple, such as falling [29,9,27,7] and walking [24,3].
There have been other systems that tackle more gen-
eral problems of recognizing multiple different human
actions/activities that would be commonplace in peo-
ple’s daily lives [21,20,18,8]. The algorithm proposed
in this paper falls in the latter category.
1.2. Design of the Wearable Sensor Network
Our wearable sensor network consists of five sensor
nodes placed at different body locations (see Figure 1),
which communicate with a base station attached to a
computer server through a USB port. The sensor nodes
and base station are built using the commercially avail-
able Tmote Sky boards. Tmote Sky runs TinyOS on
an 8 MHz microcontroller with 10K RAM and
communicates using the 802.15.4 wireless protocol. Each
custom-built sensor board has a triaxial accelerometer
and a biaxial gyroscope, and is attached to a Tmote
Sky (shown in Figure 2). Each axis is reported as a
12-bit value to the node, indicating values in the range
of ±2g and ±500°/s for the accelerometer and
gyroscope, respectively.
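A minimal sketch of turning such a 12-bit reading into physical units follows. The text only gives the word length and the full-scale ranges; the encoding assumed here (an unsigned code 0..4095 mapped linearly onto the symmetric range) is a guess for illustration and may not match the actual sensor board. The function name is hypothetical.

```python
def raw_to_physical(raw, full_scale, bits=12):
    """Map an unsigned code in [0, 2**bits - 1] linearly onto [-full_scale, +full_scale]."""
    max_code = (1 << bits) - 1
    if not 0 <= raw <= max_code:
        raise ValueError(f"raw code {raw} outside 0..{max_code}")
    return (raw / max_code) * 2.0 * full_scale - full_scale


ACCEL_FULL_SCALE_G = 2.0     # accelerometer range of +/-2 g stated in the text
GYRO_FULL_SCALE_DPS = 500.0  # gyroscope range of +/-500 deg/s stated in the text

print(raw_to_physical(0, ACCEL_FULL_SCALE_G))      # lowest code
print(raw_to_physical(4095, ACCEL_FULL_SCALE_G))   # highest code
print(raw_to_physical(4095, GYRO_FULL_SCALE_DPS))
```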
Fig. 2. Illustration of a motion sensor node. The sensor board on the
top is a custom-built motion sensor with a triaxial accelerometer and
a biaxial gyroscope. The middle layer contains a Li-ion battery. The
sensor board on the bottom is a standard Tmote Sky network node.
The current hardware design of the sensor contributes
a certain amount of measurement error. The
accelerometers typically require some calibration in the
form of a linear correction, as sensor output under 1g
may be shifted up to 15% in some sensors. It is also
worth noting that the gyroscopes produce an indication
of rotation under straight line motions. Fortunately
these systematic errors appear to be consistent across
experiments for a given sensor board. However, without
calibration to correct them, the errors may affect
the action recognition if different sets of sensors are
used interchangeably in the experiment. 2
To avoid packet collision in the wireless channel,
we use a time division multiple access (TDMA) proto-
col that allocates each node a specific time slot during
which to transmit data. This allows us to receive sensor
data at 30Hz with minimal packet loss. To avoid drift
in the network, the base station periodically broadcasts
a packet to resynchronize the nodes’ individual timers.
The code to interface with the sensors and transmit
data is implemented directly on the motes using nesC,
a variant of C.
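The slot allocation can be pictured with a small sketch. It assumes equal-length slots and one TDMA frame per sample period, which the text does not state; the actual nesC implementation on the motes may differ. The function name is hypothetical.

```python
def tdma_slots(num_nodes, sample_rate_hz):
    """Return (node, slot_start_s, slot_end_s) tuples within one TDMA frame."""
    frame = 1.0 / sample_rate_hz  # assumed: one frame per sample period
    slot = frame / num_nodes      # assumed: equal share of the frame per node
    return [(node, node * slot, (node + 1) * slot) for node in range(num_nodes)]


# Five sensor nodes reporting at 30 Hz, as in the network described above.
slots = tdma_slots(num_nodes=5, sample_rate_hz=30.0)
for node, start, end in slots:
    print(f"node {node}: {start * 1e3:.2f} ms to {end * 1e3:.2f} ms")
```

Under these assumptions each node gets roughly 6.7 ms per frame, and the periodic resynchronization broadcast keeps the node timers aligned to the slot boundaries.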
2 More sophisticated motion sensors do exist in the industry,
which can utilize heterogeneous sensor fusion techniques to
self-calibrate the accelerometer and gyroscope. One example is the
Microstrain Gyro Enhanced Orientation Sensor at:
http://www.microstrain.com/.
1.3. Wearable Action Recognition Database
We have constructed a benchmark database for hu-
man action recognition using the above wearable mo-
tion sensor network, called Wearable Action Recogni-
tion Database (WARD). The purpose of WARD is to
offer a public and relatively stable data set as a plat-
form for quantitative comparison of existing and future
algorithms for human action recognition using wear-
able motion sensors. The database has been carefully
constructed under the following conditions:
1. The database contains sufficient numbers of hu-
man subjects with a large range of age differ-
ences.
2. The designed action classes are general enough
to cover most typical activities that a human sub-
ject is expected to perform in her daily life.
3. The locations of the wearable sensors are se-
lected to be practical for full-fledged commercial
systems.
4. The sampled action data contain sufficient varia-
tion, measurement noise, and outliers in order for
existing and future algorithms to meaningfully
examine and compare their performance.
The WARD database is available for download at:
http://www.eecs.berkeley.edu/~yang/software/WAR/.
The data are sampled from 7 female and 13
male human subjects (in total 20 subjects) with age
ranging from 19 to 75. The current version, version
1.0, includes the following 13 action categories: 1.
Stand (ST). 2. Sit (SI). 3. Lie down (LI). 4. Walk for-
ward (WF). 5. Walk left-circle (WL). 6. Walk right-
circle (WR). 7. Turn left (TL). 8. Turn right (TR). 9. Go
upstairs (UP). 10. Go downstairs (DO). 11. Jog (JO).
12. Jump (JU). 13. Push wheelchair (PU). For more
details about the data collection, please refer to the hu-
man subject protocol included in the WARD database.
The sensor data have been converted and saved in the
MATLAB environment. The database also includes a
MATLAB program to visualize the action data from
the five motion sensors.
1.4. Contribution
We propose a distributed action recognition algo-
rithm using up to five wearable motion sensors. The
work is inspired by an emerging theory of compressive
sensing [5,6]. We assume each action class satisfies a
low-dimensional subspace model. If a linear represen-
tation is sought to represent a valid test sample w.r.t.
all training samples, the dominant coefficients in the
sparsest representation correspond to the training sam-
ples from the same action class, and hence they encode
the membership of the test sample.
A distributed recognition system on wireless sensor
networks needs to further consider the following is-
sues:
1. How to extract compact and accurate low-dimensional
action features for local classification and trans-
mission over a band-limited network?
2. How to classify the local measurement efficiently
using low-power processors?
3. How to design a classifier to globally optimize
the recognition and adapt to the change of the
network?
4. Whether the accuracy of an action recognition
system is identity independent? That is, a good
classifier should only be sensitive to different ac-
tion classes, but neutral to the subject who per-
forms the actions.
We tackle these problems by proposing a novel
recognition framework consisting of the following
three integrated components: 1. Low-dimensional
action feature extraction. 2. Fast distributed classifiers
via ℓ1-minimization. 3. An adaptive global classifier
on the base computer. The method can accurately
classify human actions from a continuous motion se-
quence. The local classifiers that reject potential out-
liers can reduce the sensor-to-server communication
to about 50%. One can also choose to activate only
a subset of the sensors on the fly due to sensor fail-
ure or network congestion. The global classifier is able
to adaptively update the optimization process and im-
prove the global classification upon available local de-
cisions. Finally, in the experiment, we examine the
identity-independence performance on a test sequence
by excluding the training samples of the same subject.
Note that a similar algorithm was previously pub-
lished in a manuscript [28]. In comparison, [28] mainly
discusses simultaneous segmentation and classifica-
tion of transient actions, such as from standing to sit-
ting, from sitting to lying down, and bending. In this
paper, we discuss classification of continuous actions.
The preliminary results shown in [28] only contain
recognition results from three human subjects with age
ranging from 28 to 32. In this paper, the system uti-
lizes the much larger WARD benchmark to validate its
performance.
The rest of the paper is organized as follows.
Section 2 proposes a unified classification algorithm via
a novel sparse representation framework on individual
motion sensors to classify human actions with local
bias. Section 3 further proposes a global classification
algorithm on a base computer that receives action
features from active sensors in the network and adaptively
boosts the recognition upon individual sensor decisions.
Finally, we demonstrate the performance of the overall
algorithm based on the WARD benchmark in Section 4.
2. Classification via Sparse Representation
We first define the problem of distributed action
recognition.
Problem 1 (Distributed Action Recognition) Assume
a set of $L$ wearable sensor nodes with triaxial
accelerometers $(x, y, z)$ and biaxial gyroscopes $(\theta, \rho)$
are attached to the human body. Denote

$$a_j(t) \doteq (x_j(t), y_j(t), z_j(t), \theta_j(t), \rho_j(t))^T \in \mathbb{R}^{5} \qquad (1)$$

as the measurement of the five readings on node $j$ at
time $t$, and

$$a(t) \doteq (a_1^T(t), a_2^T(t), \cdots, a_L^T(t))^T \in \mathbb{R}^{5L} \qquad (2)$$

collects all $L$ sensors at time $t$. Further denote

$$s = (a(1), a(2), \cdots, a(l)) \in \mathbb{R}^{5L \times l} \qquad (3)$$

as an action segment of length $l$ in time.
Given $K$ different classes of human actions, a set
of $n_i$ training examples $\{s_{i,1}, \cdots, s_{i,n_i}\}$ are collected
for each $i$th class, all of which have the same
duration $l$. Given a new test sequence $s$, we seek a
distributed algorithm to classify the action into one of the
$K$ categories, or reject the action as an invalid
measurement. Finally, given continuous measurements of
different human activities, determine an optimal
duration parameter $l$ to extract training samples and test
samples $s$.
In this section, our focus is on an action
classification method on each sensor node assuming an
action segment of a fixed duration $l$. Given
$s_j = (a_j(1), a_j(2), \cdots, a_j(l)) \in \mathbb{R}^{5 \times l}$ on node $j$, define a
new vector $s_j^S$ as the stacking of the $l$ columns of $s_j$:

$$s_j^S \doteq (a_j(1)^T, a_j(2)^T, \cdots, a_j(l)^T)^T \in \mathbb{R}^{5l}. \qquad (4)$$

We will interchangeably use $s_j$ to denote the stacked
vector $s_j^S$ without causing ambiguity.
Subsequently, we define a full-body action vector $v$
that stacks the measurement from all $L$ nodes:

$$v \doteq (s_1^T, s_2^T, \cdots, s_L^T)^T \in \mathbb{R}^{D}, \qquad (5)$$

where $D = D_1 + \cdots + D_L = 5lL$.
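The stacking in (4) and (5) can be sketched as follows. The representation is an assumption made for illustration: each node segment is a list of l samples, and each sample is a list of the five readings (x, y, z, θ, ρ). The function names and the toy numbers are hypothetical.

```python
def stack_node_segment(segment):
    """Eq. (4): stack the l samples (5 readings each) of one node into a 5l vector."""
    return [value for sample in segment for value in sample]


def full_body_vector(node_segments):
    """Eq. (5): concatenate the stacked vectors of all L nodes into v of length 5lL."""
    v = []
    for segment in node_segments:
        v.extend(stack_node_segment(segment))
    return v


# Hypothetical toy data: L = 2 nodes, l = 2 time samples, 5 readings per sample.
node_1 = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
node_2 = [[11, 12, 13, 14, 15], [16, 17, 18, 19, 20]]
v = full_body_vector([node_1, node_2])
print(len(v))  # D = 5 * l * L
```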
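In this paper, we assume the samples $v$ in an action
class satisfy a subspace model, called an action
subspace. If the training samples $\{v_1, \cdots, v_{n_i}\}$ of the $i$th
class sufficiently span the $i$th action subspace, given a
test sample $y = (y_1^T, \cdots, y_L^T)^T \in \mathbb{R}^{D}$ in the same
class $i$, $y$ can be linearly represented using the training
examples of the same class:

$$y = \alpha_1 v_1 + \cdots + \alpha_{n_i} v_{n_i} \;\Leftrightarrow\; \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_L \end{pmatrix} = \left( \begin{pmatrix} s_1 \\ s_2 \\ \vdots \\ s_L \end{pmatrix}_{1} \cdots \begin{pmatrix} s_1 \\ s_2 \\ \vdots \\ s_L \end{pmatrix}_{n_i} \right) \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_{n_i} \end{pmatrix}. \qquad (6)$$

It is important to note that such linear constraint also
holds for each node $j$ in (6):

$$y_j = \alpha_1 s_{j,1} + \cdots + \alpha_{n_i} s_{j,n_i} \in \mathbb{R}^{D_j}. \qquad (7)$$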
In theory, complex data such as human actions typ-
ically constitute more complex nonlinear models. The
linear models are used to approximate such nonlin-
ear structures in a higher-dimensional subspace (see
Figure 3). Notice that such linear approximation may
not produce good estimation of the distance/similarity
metric for the samples on the manifold. However, as
we will show in Example 1, given sufficient samples
on the manifold as training examples, a new test sam-
ple can be accurately represented on the subspace, pro-
vided that any two classes do not have similar subspace
models.
Fig. 3. Modeling a 1-D manifold M using a 2-D subspace V .
To recover label(y), a previous study [30] proposes
to reformulate the recognition using a sparse represen-
tation: Since label(y) = i is unknown, we can rep-
resent y using all the training samples from all Kclasses:
y = (A1, A2, · · · , AK)
x1
x2
...xK
= Ax, (8)
where
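$$A_i = (v_{i,1}, v_{i,2}, \cdots, v_{i,n_i}) \in \mathbb{R}^{D \times n_i} \tag{9}$$
collects all the training samples of class i,
$$x_i = (\alpha_{i,1}, \alpha_{i,2}, \cdots, \alpha_{i,n_i})^T \in \mathbb{R}^{n_i} \tag{10}$$
collects the corresponding coefficients in (6), and $A \in \mathbb{R}^{D \times n}$ where $n = n_1 + n_2 + \cdots + n_K$. Since y satisfies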
both (6) and (8), one solution of x in (8) should be
$$x^* = (0, \cdots, 0, x_i^T, 0, \cdots, 0)^T. \tag{11}$$
The solution is naturally sparse: on average, only a fraction 1/K of the terms in x∗ are nonzero.
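The block structure of (8) through (11) can be checked numerically. The following is a minimal sketch on synthetic data: the dimensions, the class sizes, and the choice of class index are illustrative assumptions, and the training blocks are random rather than real action data.

```python
import numpy as np

rng = np.random.default_rng(1)
D, K = 12, 4
n_per_class = [3, 3, 3, 3]   # n_1, ..., n_K (illustrative, equal class sizes)

# A = (A_1, ..., A_K): one block of training columns per class, as in (8)-(9).
blocks = [rng.standard_normal((D, n_k)) for n_k in n_per_class]
A = np.hstack(blocks)

i = 2                        # true class of the test sample (0-based)
alpha = rng.standard_normal(n_per_class[i])
y = blocks[i] @ alpha        # y lies in the class-i subspace, as in (6)

# x* from (11): zero everywhere except the class-i block.
x_star = np.zeros(A.shape[1])
start = sum(n_per_class[:i])
x_star[start:start + n_per_class[i]] = alpha

residual = np.linalg.norm(A @ x_star - y)
nonzero_fraction = np.count_nonzero(x_star) / x_star.size
print(residual, nonzero_fraction)
```

With equal class sizes the nonzero fraction is exactly 1/K, which is the sense in which the solution is sparse.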
It is important to note that, on each sensor j in this
section, solution x∗ of (8) is also a solution for the
representation:
$$y_j = (A_1^{(j)}, A_2^{(j)}, \cdots, A_K^{(j)}) x = A^{(j)} x, \tag{12}$$
where $A_i^{(j)} \in \mathbb{R}^{D_j \times n_i}$ consists of row vectors in $A_i$
that correspond to the jth node. Hence, x∗ can be
solved either globally using (8) or locally using (12),
provided that the action data measured on each node
are sufficiently discriminant. We will come back to the
discussion about local classification versus global clas-
sification in Section 3. In the rest of this section how-
ever, our focus will be on each node.
One major difficulty in solving (12) is the high di-
mensionality of the action data. In compressive sens-
ing [5,6], one reduces the dimension of a linear system
by choosing a linear projection $R_j \in \mathbb{R}^{d \times D_j}$ (see footnote 3):
$$\tilde{y}_j \doteq R_j y_j = R_j A^{(j)} x \doteq \tilde{A}^{(j)} x \in \mathbb{R}^d. \tag{13}$$
After projection $R_j$, typically the feature dimension d is much smaller than the number n of all training samples. Therefore, the new linear system (13) is underdetermined. Numerically stable solutions exist to uniquely recover sparse solutions x∗ via ℓ1-minimization [10]:
$$x^* = \arg\min \|x\|_1 \quad \text{subject to} \quad \tilde{y}_j = \tilde{A}^{(j)} x. \tag{14}$$
These routines include (orthogonal) matching pursuit (MP), basis pursuit (BP), and the LASSO (see footnote 4).
Footnote 3: Notice that $R_j$ is not computed on the sensor node. These matrices are computed offline and simply stored on each sensor node.
In our experiment, we have tested multiple projection operators including PCA, LDA, locality preserving projection (LPP) [13], and the random projection studied in [30]. We found that 40-D feature spaces using LPP produce the best recognition in a very low-dimensional space. Throughout this paper, we will use 40-D LPP features to represent local motions measured on sensor nodes (see footnote 5).
After the (sparsest) representation x is recovered, we project the coefficients onto each action subspace:
$$\delta_i(x) = (0, \cdots, 0, x_i^T, 0, \cdots, 0)^T \in \mathbb{R}^n, \quad i = 1, \cdots, K. \tag{15}$$
Subsequently, the membership of the test sample $y_j$ is assigned to the class with the smallest residual:
$$\mathrm{label}(y_j) = \arg\min_i \|\tilde{y}_j - \tilde{A}^{(j)} \delta_i(x)\|_2. \tag{16}$$
The overall algorithm deployed on each sensor node
is summarized in Algorithm 1, which is called local
sparsity classifier (LSC).
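The three steps of the local classifier (projection, sparse recovery, and residual comparison) can be sketched as follows. This is a minimal sketch, not the authors' implementation: it swaps in orthogonal matching pursuit as the sparse solver instead of the basis pursuit routine used in the experiments, and the function names `omp` and `lsc`, the toy training matrix, and the pair-averaging projection are all illustrative assumptions.

```python
import numpy as np


def omp(A, y, max_atoms=10, tol=1e-8):
    """Orthogonal matching pursuit, a greedy stand-in for the l1 solver."""
    norms = np.maximum(np.linalg.norm(A, axis=0), 1e-12)
    support, coef = [], np.zeros(0)
    residual = y.astype(float).copy()
    for _ in range(max_atoms):
        if np.linalg.norm(residual) <= tol:
            break
        # Pick the column most correlated with the current residual.
        k = int(np.argmax(np.abs(A.T @ residual) / norms))
        if k in support:  # no new column helps; stop
            break
        support.append(k)
        # Least-squares refit on all selected columns.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    if support:
        x[support] = coef
    return x


def lsc(A_j, labels, y_j, R_j, max_atoms=10):
    """Local classification on one node: project, solve, compare residuals."""
    y_feat = R_j @ y_j                   # step 1: project the test sample
    A_feat = R_j @ A_j                   # projected training matrix
    x = omp(A_feat, y_feat, max_atoms)   # step 2: sparse representation
    classes = np.unique(labels)
    # Step 3: residual when only the coefficients of class c are kept.
    residuals = [
        np.linalg.norm(y_feat - A_feat @ np.where(labels == c, x, 0.0))
        for c in classes
    ]
    return classes[int(np.argmin(residuals))], y_feat, x


# Toy data: class c occupies rows 4c..4c+3 of a 12-D space.
rng = np.random.default_rng(0)
K, n_per_class, D = 3, 4, 12
labels = np.repeat(np.arange(K), n_per_class)
A_j = np.zeros((D, K * n_per_class))
for c in range(K):
    A_j[4 * c:4 * c + 4, labels == c] = rng.standard_normal((4, n_per_class))

# Hypothetical projection: average adjacent pairs of coordinates (12-D to 6-D).
R_j = np.kron(np.eye(6), np.array([[0.5, 0.5]]))

y_j = A_j[:, labels == 1] @ np.array([0.5, -1.0, 2.0, 0.3])
label, y_feat, x = lsc(A_j, labels, y_j, R_j)
print(label, np.count_nonzero(x))
```

In this toy setup the projected class subspaces are mutually orthogonal, so the greedy solver only ever selects columns of the true class. Real action data do not have this property, which is why a solver with recovery guarantees such as basis pursuit is preferred in practice.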
Algorithm 1: Local Sparsity Classifier (LSC).
Input: A set of training samples $A^{(j)} = (s_{j,1}, \cdots, s_{j,n})$, a test sample $y_j$ on a sensor node j, and a linear projection matrix $R_j$.
1: Projection: $\tilde{y}_j = R_j y_j$.
2: $x^* = \arg\min \|x\|_1$ subject to $\tilde{y}_j = R_j A^{(j)} x$.
3: $\mathrm{label}(y_j) = \arg\min_{i=1,\cdots,K} \|\tilde{y}_j - R_j A^{(j)} \delta_i(x)\|_2$.
Output: label($y_j$), action feature $\tilde{y}_j$, and x∗.
Footnote 4: The implementation of these routines is available in a MATLAB toolbox called SparseLab: http://sparselab.stanford.edu.
Footnote 5: The choice of an “optimal” low-dimensional feature space is not the emphasis of this paper. On one hand, a practitioner may easily replace LPP with other feature spaces without modification of the algorithm. On the other hand, a previous result in [30] has shown that the accuracy of sparse representation via ℓ1-minimization converges among different linear projections, as long as the dimension of the feature space is sufficiently high. The result renders the choice of a particular feature space not very significant in solving for a sparse representation.
Example 1 (Classification on Nodes) We demonstrate the recognition accuracy of LSC on individual nodes based on the WARD database. First, we look for fast sparse solvers in the literature. We found that BP [11] gives the best trade-off between speed, noise tolerance, and recognition accuracy.
We design the training set and the test set as follows. For each motion sequence in the WARD database, we randomly sample 10 segments of length l for the training set. In total, there are 20 × 13 × 5 × 10 = 13000 training examples. During the testing, LSC attempts
to classify all continuous segments of length l in the
WARD database. With respect to each subject, the cor-
responding training examples will be excluded from
the training set before classification. Therefore, no test subject is present in the training set, and the recognition is subject independent. In the experiment,
we found that l = 45 is a short action duration that
yields satisfactory performance, which corresponds to
1.5 seconds given the 30 Hz sampling rate.
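The segment bookkeeping described above can be sketched as follows. The helper names are hypothetical, the toy sequence is a list of integers standing in for sensor readings, and reading the factors of 13000 as 20 subjects, 13 actions, 5 trials, and 10 segments is an interpretation, since the factors are not labeled in this passage.

```python
import random

SAMPLING_RATE_HZ = 30
SEGMENT_LENGTH = 45  # l, in samples


def segment_duration_s(length=SEGMENT_LENGTH, rate_hz=SAMPLING_RATE_HZ):
    """Duration in seconds of a segment of `length` samples."""
    return length / rate_hz


def random_training_segments(sequence, count=10, length=SEGMENT_LENGTH, seed=0):
    """Randomly sample `count` segments of `length` samples from one sequence."""
    rng = random.Random(seed)
    last_start = len(sequence) - length
    starts = [rng.randint(0, last_start) for _ in range(count)]
    return [sequence[s:s + length] for s in starts]


def all_test_segments(sequence, length=SEGMENT_LENGTH):
    """All continuous segments of `length` samples, as used during testing."""
    return [sequence[s:s + length] for s in range(len(sequence) - length + 1)]


toy_sequence = list(range(100))  # stand-in for one recorded motion sequence
train = random_training_segments(toy_sequence)
test = all_test_segments(toy_sequence)
total_training_examples = 20 * 13 * 5 * 10
print(segment_duration_s(), len(train), len(test), total_training_examples)
```

A sequence of T samples yields T − l + 1 overlapping test segments, which is why the number of test samples is far larger than the number of training examples.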
Figure 4 illustrates an example of sparse represen-
tation x and its corresponding residuals estimated on
the first node (left wrist) of a jumping sequence (Action
12).
Fig. 4. Top: Sparse ℓ1 solution by BP of a jumping motion on the left wrist node. Bottom: Reconstruction residuals with respect to the 13 action categories. The test sample is correctly classified as Class 12. SCI(x) = 0.335 (see (17)).
Table 1 shows the recognition accuracy of LSC.
It is no surprise that LSC based on single-node measurements alone does not produce good classification, as many human activities engage movements at multiple body parts. For example, nodes at the two ankle positions cannot differentiate walking forward from pushing a wheelchair because the feet engage in similar movements for both categories.
In Table 1, we also show the performance of a simple
global classifier: majority voting. If all the local deci-
sions are collected and a majority vote is chosen as the
overall classification of the test action, LSC achieves
90.2% accuracy. This will become a baseline bench-
mark to compare with an adaptive classifier we will
introduce in the next section.
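The majority-voting baseline can be sketched in a few lines. The tie-breaking rule (smallest label wins) is an assumption made here for determinism; the paper does not state how ties are resolved.

```python
from collections import Counter


def majority_vote(local_labels):
    """Return (winning label, tie flag) for a list of per-node decisions."""
    counts = Counter(local_labels)
    best = max(counts.values())
    winners = sorted(label for label, c in counts.items() if c == best)
    # Assumed tie-break: the smallest label among the tied winners.
    return winners[0], len(winners) > 1


# Three of five nodes agree on class 4.
print(majority_vote([4, 4, 13, 4, 9]))

# Five different local labels, as reported later for the PU motion in Figure 5.
print(majority_vote([13, 4, 9, 8, 6]))
```

The second call returns a tie among all five labels, so the reported winner is arbitrary. This is the situation in which a voting scheme has no principled way to pick the correct class.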
Table 1
Recognition accuracy via LSC on each node. The last column (1–5) shows the recognition accuracy using majority voting.
Sen #      1       2       3       4       5       1–5
Acc [%]    65.08   61.26   63.9    78.56   77      90.2
Nearest neighbor (NN) is one of the popular meth-
ods used in sensor networks for classification. Table 2
shows the recognition accuracy of NN on the WARD
database. Comparing Table 1 and Table 2, because the inherent correlation between the distributed motion sensors is not considered beyond the majority-voting process, the two algorithms generate very similar global recognition accuracy. Using majority voting, nearest neighbor achieves 90.5%.
Table 2
Recognition accuracy via nearest neighbor on each node. The last column (1–5) shows the recognition accuracy using majority voting.
Sen #      1       2       3       4       5       1–5
Acc [%]    64.9    59.3    67.4    80.3    76.0    90.5
3. Adaptive Global Recognition
In this section, we introduce an adaptive frame-
work to optimize a global classification based on all
the available distributed sensor data. First, we discuss
an outlier rejection criterion to identify invalid mo-
tion samples measured on the individual sensor nodes.
The invalid samples would not be sent to the global
classifier that we will introduce later. The ability to
locally reject invalid measurement reduces the power
consumption on the sensor nodes to communicate with
the network station, as we will show in Section 4.
Based on the previous sparsity assumption, if yj is
not a valid segment on node j w.r.t. the training ex-
amples A(j), the dominant coefficients of its sparsest
representation x should not correspond to any single
class. We utilize a sparsity concentration index (SCI)
[30]:
$$\mathrm{SCI}(x) \doteq \frac{K \cdot \max_{i=1,\cdots,K} \|\delta_i(x)\|_1 / \|x\|_1 - 1}{K - 1} \in [0, 1]. \tag{17}$$
If the nonzero coefficients of x are evenly distributed
among K classes, then SCI(x) = 0; if all the nonzero
coefficients are associated with a single class, then
SCI(x) = 1. Therefore, we introduce a sparsity
threshold τ1 applied on individual sensor nodes: If
SCI(x) > τ1, the motion sample is a valid local mea-
surement, and its 40-D LPP features y will be sent to
the base station; otherwise, the sample will be ignored.
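The index in (17) and the local thresholding rule can be sketched as below. The function names and the 0-based class labels are assumptions made for illustration, and the input vector is any coefficient vector with one class label per entry.

```python
def sci(x, labels, num_classes):
    """Sparsity concentration index of coefficients x, following (17)."""
    total = sum(abs(v) for v in x)
    if total == 0:
        raise ValueError("SCI is undefined for the all-zero vector")
    per_class = [0.0] * num_classes
    for v, c in zip(x, labels):
        per_class[c] += abs(v)  # l1 norm of the class-c coefficients
    return (num_classes * max(per_class) / total - 1) / (num_classes - 1)


def is_valid_local_sample(x, labels, num_classes, tau1):
    """A node transmits its features only if SCI(x) exceeds tau1."""
    return sci(x, labels, num_classes) > tau1


# Coefficients spread evenly over three classes.
print(sci([1.0, 1.0, 1.0], [0, 1, 2], 3))

# All nonzero coefficients belong to one class.
print(sci([0.5, -2.0, 0.0], [1, 1, 0], 2))
```

The two calls reproduce the extreme cases described in the text: an even spread gives 0 and full concentration in one class gives 1.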
It is important to note that a local measurement that
is labeled as a valid sample w.r.t. τ1 may not truly correspond to a valid human action when multiple sensor data are jointly considered based on the training
actions defined in the WARD database. For example,
WF, UP, and DO all involve similar upper body move-
ments; on the other hand, if a subject only tries to
mimic a WF motion by moving the upper body but not
the lower body, the movement becomes an invalid ac-
tion when both the upper body data and the lower body
data are jointly considered. Therefore, a global con-
straint is needed to reject such invalid samples, which
will be discussed next.
Suppose at time t, the base station receives L′ action
features from the active sensors (L′ ≤ L). Without loss
of generality, assume these features are from the first
L′ sensors: $\tilde{y}_1, \tilde{y}_2, \cdots, \tilde{y}_{L'}$.
Denote
$$y' = (\tilde{y}_1^T, \cdots, \tilde{y}_{L'}^T)^T \in \mathbb{R}^{dL'}. \tag{18}$$
Then the global sparse representation x of y′ satisfies
the following linear system
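$$y' = \begin{pmatrix} R_1 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & R_{L'} & \cdots & 0 \end{pmatrix} A x = R' A x = A' x, \tag{19}$$
where $R' \in \mathbb{R}^{dL' \times D}$ is a new projection matrix that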
only extracts the action features from the first L′ nodes.
Consequently, the effect of changing active sensor
nodes for the global classification is formulated via the
global projection matrix R′. During the transforma-
tion, the data matrix A and the sparse representation
x remain unchanged. The linear system (13) then be-
comes a special case of (19) where L′ = 1.
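The construction of the global projection matrix from the per-node projections can be sketched as follows. The function name `global_projection` and the toy dimensions are assumptions. The sketch also accepts an arbitrary set of active nodes rather than only the first L′, which is a small generalization of the "without loss of generality" ordering used in the text.

```python
import numpy as np


def global_projection(R_list, dims, active):
    """Build R' of (19) by placing each active node's R_j in its column block."""
    D = sum(dims)
    offsets = np.concatenate(([0], np.cumsum(dims)))
    rows = []
    for j in active:
        block = np.zeros((R_list[j].shape[0], D))
        block[:, offsets[j]:offsets[j + 1]] = R_list[j]
        rows.append(block)
    return np.vstack(rows)


rng = np.random.default_rng(0)
dims = [4, 4, 4]   # D_1, D_2, D_3 (illustrative)
d = 2              # feature dimension per node (illustrative)
R_list = [rng.standard_normal((d, D_j)) for D_j in dims]
v = rng.standard_normal(sum(dims))   # one stacked full-body vector

# Only nodes 0 and 2 are active.
R_prime = global_projection(R_list, dims, active=[0, 2])
expected = np.concatenate([R_list[0] @ v[0:4], R_list[2] @ v[8:12]])
print(R_prime.shape, np.allclose(R_prime @ v, expected))
```

Applying the same matrix to every column of A gives A′, so changing the active set only changes R′ while A and x stay fixed.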
Similar to the outlier rejection criterion on each
node, we introduce a global rejection threshold τ2. If
SCI(x) > τ2 in (19), the most significant coefficients
in x are concentrated in a single training class. Hence
y′ is assigned to that class. Otherwise, the sample will
be rejected as an outlier. The overall algorithm on the
network station is summarized in Algorithm 2, which
is called distributed sparsity classifier (DSC). DSC
provides a unified solution to detect and classify action
segments in a network of body sensors using only two
simple parameters τ1 and τ2.
Algorithm 2: Distributed Sparsity Classifier (DSC).
Input: A set of stacked training samples $A = \{v_1, \cdots, v_n\}$ from sensors $1, \cdots, L$, test sample y of action features measured from L active sensors, and sparsity parameters τ1, τ2.
1: for each sensor 1 ≤ j ≤ L do
2:   Solve for sparse representation x∗ using Algorithm 1 with parameters $A^{(j)}$ and $y_j$.
3:   If SCI(x∗) > τ1, send feature vector $\tilde{y}_j$ to the network station.
4: end for
5: Collect all valid features y′, construct corresponding training matrix A′.
6: Solve $x^* = \arg\min \|x\|_1$ subject to $y' = R'A'x$.
7: if SCI(x∗) > τ2 then
8:   $\mathrm{label}(y) = \arg\min_{i=1,\cdots,K} \|y' - R'A'\delta_i(x)\|_2$.
9: else
10:   label(y) = −1 (outlier).
11: end if
Output: label(y).
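The flow of Algorithm 2 can be sketched end to end on toy data. This is a sketch under stated assumptions, not the authors' implementation: orthogonal matching pursuit stands in for the ℓ1 solver, the inputs are already-projected per-node training matrices and features, classes are labeled from 0, and the names `omp`, `sci`, and `dsc` as well as the toy matrices are hypothetical.

```python
import numpy as np


def omp(A, y, max_atoms=10, tol=1e-8):
    """Orthogonal matching pursuit, a greedy stand-in for the l1 solver."""
    norms = np.maximum(np.linalg.norm(A, axis=0), 1e-12)
    support, coef = [], np.zeros(0)
    residual = y.astype(float).copy()
    for _ in range(max_atoms):
        if np.linalg.norm(residual) <= tol:
            break
        k = int(np.argmax(np.abs(A.T @ residual) / norms))
        if k in support:
            break
        support.append(k)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    if support:
        x[support] = coef
    return x


def sci(x, labels, K):
    """Sparsity concentration index, following (17)."""
    total = np.abs(x).sum()
    if total == 0.0:
        return 0.0
    per_class = [np.abs(x[labels == c]).sum() for c in range(K)]
    return (K * max(per_class) / total - 1.0) / (K - 1.0)


def dsc(A_feats, y_feats, labels, K, tau1, tau2):
    """A_feats[j], y_feats[j]: projected training matrix and feature of node j."""
    # Steps 1-4: a node transmits only if its local SCI exceeds tau1.
    active = [j for j, (A, y) in enumerate(zip(A_feats, y_feats))
              if sci(omp(A, y), labels, K) > tau1]
    if not active:
        return -1, active
    # Step 5: stack the valid features and the matching training rows.
    y_prime = np.concatenate([y_feats[j] for j in active])
    A_prime = np.vstack([A_feats[j] for j in active])
    # Step 6: global sparse representation.
    x = omp(A_prime, y_prime)
    # Steps 7-11: reject as outlier, or classify by smallest residual.
    if sci(x, labels, K) <= tau2:
        return -1, active
    residuals = [
        np.linalg.norm(y_prime - A_prime @ np.where(labels == c, x, 0.0))
        for c in range(K)
    ]
    return int(np.argmin(residuals)), active


K = 3
labels = np.repeat(np.arange(K), 2)                 # two samples per class
A_feats = [np.eye(6), np.eye(6), 2.0 * np.eye(6)]   # toy per-node matrices
x_true = np.array([0.0, 0.0, 1.0, 0.5, 0.0, 0.0])   # a class-1 sample
y_feats = [np.ones(6),                # node 0: energy spread over all classes
           A_feats[1] @ x_true,
           A_feats[2] @ x_true]
label, active = dsc(A_feats, y_feats, labels, K, tau1=0.1, tau2=0.08)
print(label, active)
```

Node 0 is rejected locally because its coefficients are spread evenly across the three classes, and the remaining two nodes agree on a single consistent representation, so the global step recovers class 1.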
Example 2 (Distributed Sparsity Classifier) Consider
Action 13 in the WARD database, i.e., PU (pushing a
wheelchair). While the upper body motion of this ac-
tion is quite distinct, the lower body motion often re-
sembles several other actions in the database, such as
WF and UP. Figure 5 illustrates the ℓ1 solutions on the five individual sensor nodes.
First, we observe that the local sparsity classifier (LSC) returns five different labels w.r.t. the local measurements on the five sensors. This shows that majority-voting type solutions would most likely fail to correctly classify this motion. Second, using a threshold τ1 against the SCI values of the representations, we can reject a certain number of the local motions as invalid measurements.
If τ1 = 0.1 is selected for all five sensors, then measurements from Sensors 1 and 2 will be rejected and DSC solves for a sparse representation using the three 40-D action features from Sensors 3, 4, and 5. Figure 6 shows the global ℓ1 solution of (19), and the full-body motion is correctly classified as Action 13.
Fig. 6. Top: DSC sparse representation of a sample from action 13
in Figure 5. Assume τ1 = 0.1 and τ2 = 0.08, and Sensors 1 and
2 are rejected. Bottom: Reconstruction residuals with respect to the
13 action categories. The test sample is correctly classified as Class
13.
Notice that at the node level, none of Sensors 3, 4,
and 5 correctly classifies the action based on the avail-
able local observations, because they are also simi-
lar to other actions such as UP, TR, and WR. How-
ever, when the measurements from multiple sensors are
combined in (19) to represent the full-body motion, the
incorrect local decisions are rectified. Such ability is
the main reason that the proposed DSC framework can
outperform other majority-voting type algorithms. We
will examine the performance of DSC in more detail in
Section 4.
(a) Sparse representation of the left wrist motion. Local classification label is 13 (PU).
(b) Sparse representation of the right wrist motion. Local classification label is 4 (WF).
(c) Sparse representation of the waist motion. Local classification label is 9 (UP).
(d) Sparse representation of the left ankle motion. Local classification label is 8 (TR).
(e) Sparse representation of the right ankle motion. Local classification label is 6 (WR).
Fig. 5. Illustration of a PU motion (action 13) classified on individual sensor nodes. Each LSC estimates a different action category that correlates to the true action. Compared to Figure 4, these solutions have much lower SCI values.
The DSC method compares favorably to other classical methods such as NN and decision trees, because these methods need to train multiple thresholds and outlier rejection rules when the number L′ and the set of available sensors vary in the full-body action vector $y' = (\tilde{y}_1^T, \cdots, \tilde{y}_{L'}^T)^T$. Particularly, a global nearest-neighbor (GNN) algorithm can be modeled as a special case of sparse representation. Suppose in (19) there are L′ active sensors and denote $A' = (v'_1, v'_2, \cdots, v'_n)$. Then GNN solves for the following sparse representation of y′:
$$x^* = (0, \cdots, 0, 1_i, 0, \cdots, 0)^T \quad \text{subject to} \quad i = \arg\min_j \|y' - v'_j\|_2. \tag{20}$$
The optimal solution x∗ for GNN is clearly sparse with only one nonzero coefficient corresponding to the closest neighbor of y′ in the training set A′. The formulation also generalizes to k-nearest-neighbors (kNN) and other similar variations.
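The reading of GNN as a one-hot sparse representation in (20) can be sketched as below. The function name and the tiny training matrix are illustrative assumptions.

```python
import numpy as np


def gnn_representation(A_prime, y_prime):
    """Return the one-hot x* of (20) and the index of the nearest column."""
    distances = np.linalg.norm(A_prime - y_prime[:, None], axis=0)
    i = int(np.argmin(distances))
    x = np.zeros(A_prime.shape[1])
    x[i] = 1.0
    return x, i


# Three training columns with class labels; the test point is closest to column 1.
A_prime = np.array([[0.0, 1.0, 0.0],
                    [0.0, 0.0, 3.0]])
train_labels = [0, 1, 1]
y_prime = np.array([0.9, 0.2])

x_star, nearest = gnn_representation(A_prime, y_prime)
print(x_star, train_labels[nearest])
```

Unlike the ℓ1 solution, this representation does not reconstruct y′ from the training samples; it only records which single sample is closest.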
Finally, we consider how the change of active nodes affects ℓ1-minimization and the classification of the actions. In compressive sensing, the efficacy of ℓ1-minimization in solving for the sparsest solution x in (19) is characterized by the ℓ0/ℓ1 equivalence relation [10,11]. A necessary and sufficient condition for the equivalence to hold is the k-neighborliness of A′. As a special case, one can show that if x is the sparsest solution in (19) for L′ = L, x is also a solution for L′ < L. Hence, the decrease of L′ leads to possibly sparser solutions of x.
On the other hand, the decrease in available action
features also makes y′ less discriminant. For example,
if we reduce L′ to 1 and only activate a wrist sensor, then the ℓ1-solution x may have nonzero coefficients associated with multiple actions with similar wrist
motions, albeit sparser. This is an inherent problem
for any method to classify human actions using a lim-
ited number of motion sensors. In theory, if two action
subspaces in a low-dimensional feature space have a
small subspace distance after the projection, the corre-
sponding sparse representation cannot distinguish the
test samples from the two classes. We will demonstrate
in Section 4 that reducing the available motion sensors will indeed reduce the discriminant power of the
sparse representation in a lower-dimensional space.
4. Experiment
In this section, we conduct extensive experiments to
examine the performance of the DSC framework us-
ing the WARD database. Two different scenarios are
considered. First, we calculate the classification accu-
racy with different subsets of motion sensors available
in the network. This experiment is intended to verify
that DSC is adaptive to the change of network config-
uration on-the-fly due to real-world conditions such as
sensor failure, battery failure, and network congestion.
Second, we consider the effect of the local outlier rejection threshold τ1 on the accuracy of the global classification: Higher rejection thresholds reduce power consumption in communication at the expense of less local information available to the global classifier, and
vice versa. It is important to note that to measure the
performance under the identity-independence assump-
tion, all training examples of a test subject should be
excluded from the training set during the experiment.
For each motion sequence in the WARD database, we
randomly sample 10 segments of length l = 45 as
training examples.
4.1. Classification with Different Network Configurations
We first test the performance of DSC by manually eliminating a certain number of available sensors in the network. Based on the total number L′ of LPP feature vectors received, DSC is able to update the classification criterion (19) on-the-fly and adapt to potentially adverse conditions. Table 3 shows the performance of the algorithm, which is quantified by false positive rate (FPR), verification rate (VR), and active sensor rate (ASR) (see footnote 6). For all the trials, the outlier rejection thresholds τ1 and τ2 are both set to 0.08. The duration l of the test action length is set
to be 45, which corresponds to 1.5 seconds given the
30 Hz sampling rate. When all continuous action seg-
ments of length 45 are classified in the experiment, the
total number of test samples amounts to 500828.
Footnote 6: FPR is the percentage of samples that are either true outliers falsely classified as inliers or true inliers assigned to the wrong classes. VR is the percentage of samples that are correctly classified as inliers. Note that the WARD database does not purposely contain outlying actions, hence FPR is equal to one minus the accuracy percentage.
Table 3
Performance of DSC measured by false positive rate (FPR), verification rate (VR), and active sensor rate (ASR).
Sen #      1-5     1,3,4   1,4     1,3     3,4
FPR [%]    7.14    8       11.49   17.97   14.63
VR [%]     94.59   96.84   98.19   95.57   97.28
ASR [%]    91.85   54.82   37.66   35.58   36.76
We compare the performance of DSC to the conventional solution of GNN (20). Since the WARD database does not purposely contain outliers, we did not use any outlier rejection rule in searching for nearest neighbors, which could be difficult to tune when the available action features change on-the-fly. Table 4 shows the performance of the algorithm. Compared with Table 2, we observe that there is no improvement w.r.t. classification using all five sensors. In fact, the accuracy in Table 4 is lower than the accuracy of 90.5% in Table 2 using majority voting. This result demonstrates the dependency of NN-type algorithms on the (dense) distribution of training examples in a high-dimensional data space. Compared with Table 3, GNN also underperforms DSC. For example, DSC outperforms GNN by about 6% using Sensors (1, 3, 4), and about 9% using Sensors (1, 3).
Table 4
Performance of GNN measured by false positive rate (FPR), verification rate (VR), and active sensor rate (ASR).
Sen #      1-5     1,3,4   1,4     1,3     3,4
FPR [%]    10.64   14.54   13.93   26.88   18.27
ASR [%]    100     60      40      40      40
We further analyze the classification between different action categories. Table 5 shows a confusion table of the DSC results using all the five sensors in accuracy percentage. The confusion table clearly indicates several action categories that mostly contribute to the false positive rate.
1. We observe that three action categories, i.e., ST, SI, and LI, have the highest misclassification. Particularly, it is difficult to differentiate between standing and sitting in the WARD database using both DSC and NN (whose confusion table is not shown in this paper). We argue that the problem is mainly caused by the choice of the locations of the two lower-body sensors at the ankles, because human subjects do not have to move the ankles to perform either standing or sitting actions, and the change in the orientation of the waist sensor between standing and sitting is also inherently small. To improve the classification of the three action categories, one solution could be to introduce new sensor locations around the knees and the thighs (see footnote 7).
2. Between the actions WF, WL, and WR, the algorithm in fact performs better than one would have expected, because the difference of the three
actions is small. For example, 2.5% of the WF
action is misclassified as WL, 1.6% misclassi-
fied as WR, and furthermore 2.3% misclassified
as PU. These are all actions that are similar in
nature.
3. Despite the similarity of local motions between
PU and several other motions, the recognition
of PU is quite accurate. The last row of Table
5 shows that about 0.1% to 0.3% test samples
are misclassified as 10 of the other 12 categories.
Nevertheless, the true positive rate of PU is above
98%.
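The three observations above can be read directly off the confusion table. The sketch below transcribes the values of Table 5 (rows are true classes, columns are assigned classes, in percent) and lists the largest off-diagonal entries; the helper names are hypothetical.

```python
NAMES = ["ST", "SI", "LI", "WF", "WL", "WR", "TL", "TR", "UP", "DO", "JO", "JU", "PU"]

# Rows: true class. Columns: assigned class. Values in percent (Table 5).
CONFUSION = [
    [87.2, 10.2, 0.7, 0, 0, 0, 0.1, 1.8, 0, 0, 0, 0, 0],
    [25.2, 66.8, 6.8, 0, 0, 0, 0.1, 0.1, 0, 0.1, 0, 0.1, 0.7],
    [2.6, 5.1, 91.8, 0, 0, 0, 0, 0, 0, 0, 0, 0.1, 0.3],
    [0, 0, 0, 92, 2.5, 1.6, 0.2, 0.2, 0.4, 0.7, 0, 0.2, 2.3],
    [0.1, 0, 0, 0.2, 97.3, 0, 0.6, 0.3, 0.3, 0.1, 0.1, 0.2, 1],
    [0, 0, 0, 0.1, 0.1, 95.7, 0.2, 0.4, 0.4, 0.4, 0.5, 0.2, 2],
    [0, 0, 0, 0, 0.6, 0, 97, 2.3, 0, 0, 0, 0, 0.1],
    [0, 0, 0, 0, 0, 1.6, 3.1, 95.2, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 98, 0.1, 1.6, 0.1, 0.2],
    [0, 0, 0, 0.2, 0.1, 0, 0, 0, 0.1, 98.3, 0, 0.5, 0.8],
    [0, 0, 0, 0, 0, 0, 0, 0, 0.5, 0, 99.3, 0.1, 0.1],
    [0.1, 0, 0, 0, 0, 0, 0, 0, 0.3, 0.6, 0.5, 97.9, 0.5],
    [0.3, 0.1, 0, 0.1, 0.2, 0.1, 0.1, 0.1, 0, 0.2, 0.2, 0.1, 98.6],
]


def top_confusions(matrix, names, k=3):
    """The k largest off-diagonal entries as (percent, true, assigned)."""
    pairs = [(matrix[i][j], names[i], names[j])
             for i in range(len(names))
             for j in range(len(names)) if i != j]
    return sorted(pairs, reverse=True)[:k]


def confused_targets(matrix, names, cls):
    """Number of other classes that receive a nonzero share of class `cls`."""
    i = names.index(cls)
    return sum(1 for j, v in enumerate(matrix[i]) if j != i and v > 0)


print(top_confusions(CONFUSION, NAMES))
print(confused_targets(CONFUSION, NAMES, "PU"))
```

The largest entries all involve ST, SI, and LI, and the PU row spreads small errors over 10 of the other 12 classes, matching the discussion in the list.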
4.2. Classification with Different Rejection Thresholds
In this experiment, we test the effect of different local rejection thresholds τ1 on the global classification. During the experiment, the global rejection threshold τ2 is fixed at 0.08. Table 6 shows the performance of the DSC algorithm. First, ASR naturally decreases as τ1 increases. Particularly, compared to ASR = 91.85% when τ1 = 0.08, the rate is reduced to 45.58% when τ1 = 0.18, which means that on average less than half of the sensors transmit action features during the experiment. With more than half of the sensors inactive in the network to conserve power, the experiment shows that DSC still achieves below 8% FPR globally, and VR is above 88%. The result corroborates the design principle of the DSC algorithm: the distributed classification framework via sparse representation is capable of effectively reducing the power consumption on communication while preserving high recognition accuracy.
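The trade-off can be made concrete with a little arithmetic on the values of Table 6. The sketch below converts ASR into an average number of transmitting sensors, assuming that ASR is the fraction of the five sensors that transmit; the dictionary layout and function name are illustrative.

```python
NUM_SENSORS = 5

# Values transcribed from Table 6, keyed by the local threshold tau1.
TABLE_6 = {
    0.08: {"ASR": 91.85, "FPR": 7.14, "VR": 94.59},
    0.12: {"ASR": 72.19, "FPR": 7.58, "VR": 91.03},
    0.18: {"ASR": 45.58, "FPR": 7.96, "VR": 88.33},
}


def mean_active_sensors(asr_percent, num_sensors=NUM_SENSORS):
    """Average number of sensors transmitting per classified segment."""
    return asr_percent / 100.0 * num_sensors


for tau1, row in TABLE_6.items():
    print(tau1, round(mean_active_sensors(row["ASR"]), 3), row["FPR"])

fpr_increase = round(TABLE_6[0.18]["FPR"] - TABLE_6[0.08]["FPR"], 2)
print(fpr_increase)
```

Raising τ1 from 0.08 to 0.18 roughly halves the number of transmitting sensors while the false positive rate rises by less than one percentage point.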
Footnote 7: In a previous study [28], we also suggested that the sensors placed at the ankle locations tend to provide less action information than the other conventional locations such as the knees and the waist.
Table 6
Recognition accuracy of DSC with different local rejection thresholds.
τ1         0.08    0.12    0.18
ASR [%]    91.85   72.19   45.58
FPR [%]    7.14    7.58    7.96
VR [%]     94.59   91.03   88.33
5. Conclusion and Discussion
Inspired by the emerging compressive sensing the-
ory, we have proposed a distributed algorithm, i.e., dis-
tributed sparsity classifier (DSC), to classify human
actions/activities on a wearable motion sensor network. The framework provides a unified solution based on ℓ1-minimization to classify valid action segments
and reject outlying actions on the sensor nodes and the
base station. We have shown through our experiment
that a set of 13 action classes can be accurately repre-
sented and classified using a set of 40-D LPP features
measured at multiple body locations. The proposed
global classifier can adaptively adjust the global opti-
mization to boost the recognition upon available local
measurements. To corroborate the validity of the algo-
rithm, and to safeguard the reproducibility of the sys-
tem performance, we have published an open bench-
mark database called WARD with this paper. The high
recognition accuracy on the WARD database indicates
that DSC should be able to classify other action categories such as falling, bicycling, and hand motions with similarly high accuracy.
One important observation w.r.t. the choice of sensor locations on the human body is that the motion measurements from the ankle locations may not discriminate certain categories of upper-body motions and even lower-body motions. We have suggested replacing the ankle locations with other locations around the knees and thighs in order to improve the classification. Another limitation in the current system and
most other body sensor systems is that the wearable
sensors need to be firmly positioned at the designated
locations. However, a more practical system/algorithm
should tolerate certain degrees of shift without sacri-
ficing the accuracy. In this case, the variation of the
measurement for different action classes would in-
crease substantially. One open question is what low-
dimensional linear/nonlinear models one may use to
model such more complex data, and whether the sparse
representation framework can still apply to approxi-
mate such structures with limited numbers of training
examples. A potential solution to this question will be a meaningful step forward both in theory and in practice.
Table 5
Confusion table of the 13 action classes for DSC using sensors 1–5 (in percentage).
Class        1     2     3     4     5     6     7     8     9    10    11    12    13
1 (ST)    87.2  10.2   0.7     0     0     0   0.1   1.8     0     0     0     0     0
2 (SI)    25.2  66.8   6.8     0     0     0   0.1   0.1     0   0.1     0   0.1   0.7
3 (LI)     2.6   5.1  91.8     0     0     0     0     0     0     0     0   0.1   0.3
4 (WF)       0     0     0    92   2.5   1.6   0.2   0.2   0.4   0.7     0   0.2   2.3
5 (WL)     0.1     0     0   0.2  97.3     0   0.6   0.3   0.3   0.1   0.1   0.2     1
6 (WR)       0     0     0   0.1   0.1  95.7   0.2   0.4   0.4   0.4   0.5   0.2     2
7 (TL)       0     0     0     0   0.6     0    97   2.3     0     0     0     0   0.1
8 (TR)       0     0     0     0     0   1.6   3.1  95.2     0     0     0     0     0
9 (UP)       0     0     0     0     0     0     0     0    98   0.1   1.6   0.1   0.2
10 (DO)      0     0     0   0.2   0.1     0     0     0   0.1  98.3     0   0.5   0.8
11 (JO)      0     0     0     0     0     0     0     0   0.5     0  99.3   0.1   0.1
12 (JU)    0.1     0     0     0     0     0     0     0   0.3   0.6   0.5  97.9   0.5
13 (PU)    0.3   0.1     0   0.1   0.2   0.1   0.1   0.1     0   0.2   0.2   0.1  98.6
Acknowledgments
We would like to thank Sameer Iyengar, Victor Shia, and Posu Yan at the University of California, Berkeley, Dr. Philip Kuryloski at Cornell University, Katherine Gilani at the University of Texas at Dallas, Ville-Pekka Seppa at the Tampere University of Technology, Finland, and Dr. Marco Sgroi and Roberta Giannantonio at Telecom Italia for their kind help in designing the wearable motion sensor system and the WARD database.
References
[1] R. Aylward and J. Paradiso, A compact, high-speed, wearable
sensor network for biomotion capture and interactive media,
Proceedings of the International Conference on Information
Processing in Sensor Networks, 380–389, 2007.
[2] L. Bao and S. Intille, Activity recognition from user-annotated
acceleration data, Proceedings of the International Conference
on Pervasive Computing, 1–17, 2004.
[3] P. Barralon, N. Vuillerme, and N. Noury, Walk detection with a
kinematic sensor: Frequency and wavelet comparison, Proceed-
ings of the 28th IEEE EMBS Annual International Conference,
1711–1714
[4] A. Benbasat and J. Paradiso, Groggy wakeup - automated gener-
ation of power-efficient detection hierarchies for wearable sen-
sors, Proceedings of International Workshop on Wearable and
Implantable Body Sensor Networks, 2007.
[5] E. Candès, Compressive sampling, Proceedings of the Interna-
tional Congress of Mathematicians, 1–20, 2006.
[6] E. Candès and T. Tao, Near-optimal signal recovery from ran-
dom projections: Universal encoding strategies?, IEEE Trans-
actions on Information Theory, vol. 52, No. 12, 5406–5425,
2006.
[7] J. Chen, K. Kwong, D. Chang, J. Luk, and R. Bajcsy, Wear-
able sensors for reliable fall detection, Proceedings of the IEEE
Engineering in Medicine and Biology Conference, 3551–3554,
2005.
[8] T. Choudhury, S. Consolvo, B. Harrison, J. Hightower,
A. LaMarca, L. LeGrand, A. Rahimi, A. Rea, G. Borriello,
B. Hemingway, P. Klasnja, K. Koscher, J. Landay, J. Lester, and
D. Wyatt, The mobile sensing platform: An embedded activity
recognition system, Pervasive Computing, 32–41, 2008.
[9] T. Degen, H. Jaeckel, M. Rufer, and S. Wyss, SPEEDY: A fall
detector in a wrist watch, Proceedings of the IEEE International
Symposium on Wearable Computers, 184–187, 2003.
[10] D. Donoho, Neighborly polytopes and sparse solution of un-
derdetermined linear equations, (preprint) 2005.
[11] D. Donoho and M. Elad, On the stability of the basis pursuit
in the presence of noise, Signal Processing, vol. 86, 511–532,
2006.
[12] J. Farringdon, A. Moore, N. Tilbury, J. Church, and
P. Biemond, Wearable sensor badge & sensor jacket for con-
text awareness, Proceedings of the International Symposium on
Wearable Computers, 107–113, 1999.
[13] X. He, S. Yan, Y. Hu, P. Niyogi, and H. Zhang, Face recogni-
tion using Laplacianfaces, IEEE Trans. on Pattern Analysis and
Machine Intelligence, vol. 27, no. 3, 328–340, 2005.
[14] E. Heinz, K. Kunze, and S. Sulistyo, Experimental evaluation
of variations in primary features used for accelerometric con-
text recognition, Proceedings of the European Symposium on
Ambient Intelligence, 252–263, 2003.
[15] T. Huynh and B. Schiele, Analyzing features for activity recog-
nition, Proceedings of the Joint Conference on Smart Objects
and Ambient Intelligence, 159–163, 2005.
[16] H. Kemper and R. Verschuur, Validity and reliability of pe-
dometers in habitual activity research, European Journal of Ap-
plied Physiology, vol. 37, No. 1, 71–82, 1977.
[17] N. Kern, B. Schiele, and A. Schmidt, Multi-sensor activity con-
text detection for wearable computing, Proceedings of the Euro-
pean Symposium on Ambient Intelligence, 220–232, 2003.
[18] P. Lukowicz, J. Ward, H. Junker, M. Stäger, G. Tröster,
A. Atrash, and T. Starner, Recognizing workshop activity using
body worn microphones and accelerometers, Proceedings of the
International Conference on Pervasive Computing, 18–32, 2004.
[19] J. Mantyjarvi, J. Himberg, and T. Seppanen, Recognizing hu-
man motion with multiple acceleration sensors, Proceedings of
the IEEE International Conference on Systems, Man and Cyber-
netics, 747–752, 2001.
[20] T. Martin, B. Majeed, B. Lee, and N. Clarke, Fuzzy ambient in-
telligence for next generation telecare, Proceedings of the IEEE
International Conference on Fuzzy Systems, 894–901, 2006.
[21] M. Mathie, A. Coster, N. Lovell, and B. Celler, Accelerometry:
Providing an integrated, practical method for long-term, ambu-
latory monitoring of human movement, Physiological Measure-
ment, vol. 25, R1–R20, 2004.
[22] J. Morrill, Distributed recognition of patterns in time series
data, Communications of the ACM, vol. 41, No. 5, 45–51, 1998.
[23] B. Najafi, K. Aminian, A. Parschiv-Ionescu, F. Loew, C. Büla,
and P. Robert, Ambulatory system for human motion analysis
using a kinematic sensor: Monitoring of daily physical activity
in the elderly, IEEE Transactions on Biomedical Engineering,
vol. 50, No. 6, 711-723, 2003.
[24] I. Pappas, T. Keller, S. Mangold, M. Popovic, V. Dietz, and
M. Morari, A reliable gyroscope-based gait-phase detection
sensor embedded in a shoe insole, IEEE Sensors Journal, vol. 4,
No. 2, 268–274, 2004.
[25] S. Pirttikangas, K. Fujinami, and T. Nakajima, Feature selec-
tion and activity recognition from wearable sensors, Proceed-
ings of the International Symposium on Ubiquitous Computing
Systems, 2006.
[26] C. Sadler and M. Martonosi, Data compression algorithms for
energy-constrained devices in delay tolerant networks, Proceed-
ings of the ACM Conference on Embedded Networked Sensor
Systems, 265–278, 2006.
[27] A. Sixsmith and N. Johnson, A smart sensor to detect the falls
of the elderly, Pervasive Computing, 42–47, 2004.
[28] A. Yang, R. Jafari, P. Kuryloski, S. Iyengar, S. Sastry, and
R. Bajcsy, Distributed segmentation and classification of human
actions using a wearable sensor network, Proceedings of the
CVPR Workshop on Human Communicative Behavior Analy-
sis, 2008.
[29] G. Williams, K. Doughty, K. Cameron, and D. Bradley, A smart
fall and activity monitor for telecare applications, Proceedings
of the IEEE International Conference in Medicine and Biology
Society, 1998.
[30] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, Ro-
bust face recognition via sparse representation, (in press) IEEE
Transactions on Pattern Analysis and Machine Intelligence,
2008.
[31] W. Zhang, L. He, Y. Chow, R. Yang, and Y. Su, The study on
distributed speech recognition system, Proceedings of the IEEE
International Conference on Acoustics, Speech, and Signal Pro-
cessing, 1431–1434, 2000.
Distributed Segmentation and Classification of Human Actions
Using a Wearable Motion Sensor Network∗
Allen Y. Yang, Sameer Iyengar,
Shankar Sastry, Ruzena Bajcsy
Department of EECS
University of California, Berkeley
Philip Kuryloski
Department of ECE
Cornell University
Roozbeh Jafari
Department of EE
University of Texas, Dallas
Abstract
We propose a distributed recognition method to classify
human actions using a low-bandwidth wearable motion sen-
sor network. Given a set of pre-segmented motion sequences
as training examples, the algorithm simultaneously segments
and classifies human actions, and it also rejects outlying
actions that are not in the training set. The classification is
carried out in a distributed fashion on individual sensor nodes
and a base station computer. We show that the distribution of
multiple action classes satisfies a mixture subspace model, one
subspace for each action class. Given a new test sample, we
seek the sparsest linear representation of the sample w.r.t. all
training examples. We show that the dominant coefficients in
the representation correspond only to the action class of the
test sample, and hence its membership is encoded in the
representation. We further provide fast linear solvers to compute
such a representation via $\ell_1$-minimization. Using up to eight
body sensors, the algorithm achieves state-of-the-art 98.8%
accuracy on a set of 12 action categories. We further demon-
strate that the recognition precision only decreases grace-
fully using smaller subsets of sensors, which validates the
robustness of the distributed framework.
1. Introduction
We study human action recognition using a distributed
wearable motion sensor network. Action recognition has
been studied to a great extent in computer vision in the past.
Compared with a model-based or appearance-based vision
system, the body sensor network approach has the following
advantages: 1. The system does not require instrumenting the
environment with cameras or other sensors. 2. The system
has the necessary mobility to support continuous monitoring
of a subject during her daily activities. 3. With the continuing
miniaturization of mobile processors and sensors, it has
become possible to manufacture wearable sensor networks that
densely cover the human body to record and analyze very
small movements of the human body (e.g., breathing and
spine movements). Such sensor networks can be used in
applications such as medical-care monitoring, athlete training,
tele-immersion, and human-computer interaction (e.g.,
integration of accelerometers in Wii game controllers and smart
phones).
∗Corresponding author: [email protected]. This work
was partially supported by ARO MURI W911NF-06-1-0076, NSF TRUST
Center, and the startup funding from the University of Texas and
Texas Instruments.
Figure 1. A wireless body sensor system.
In traditional sensor networks, the computation carried out by
the sensor board is fairly simple: Extract certain local
information and transmit the data to a computer server over
the network for processing. In this paper, we propose a new
method for distributed pattern recognition. In such a system,
each sensor node will be able to classify local, albeit biased,
information. Only when the local classification detects a
possible object/event does the sensor node become active and
transmit the measurement to the server.¹ On the server side,
a global classifier receives data from the sensor nodes and
further optimizes the classification. The global classifier can
be more computationally involved than the distributed
classifiers, but it has to adapt to the change of available network
sensors due to local measurement error, sensor failure, and
communication congestion.
¹ Studies have shown that the power consumption required to
successfully send one byte over a wireless channel is equivalent to
executing between 1e3 and 1e6 instructions on an onboard processor [18].
Hence it is paramount in sensor networks to reduce the communication
cost while preserving the recognition performance.
Past studies on sensor-based action recognition were pri-
marily focused on single accelerometers [8, 10] or other mo-
tion sensors [11, 16]. More recent systems prefer using mul-
tiple motion sensors [1, 2, 9, 12–14, 17]. Depending on the
type of sensor used, an action recognition system is typically
composed of two parts: a feature extraction module and a
classification module.
There are three major directions for feature extraction in
wearable sensor networks. The first direction uses simple
statistics of a signal sequence such as the max, mean, vari-
ance, and energy. The second type of feature is computed
using fixed filter banks such as FFT and wavelets [10, 16].
The third type is based on classical dimensionality reduc-
tion techniques such as principal component analysis (PCA)
and linear discriminant analysis (LDA) [13, 14]. In terms of
classification on the action features, a large body of previ-
ous work favored thresholding or k-nearest-neighbor (kNN)
due to the simplicity of the algorithms for mobile devices
[10, 16, 17]. Other more sophisticated techniques have also
been used, such as decision trees [2, 3] and hidden Markov
models [13].
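As a rough illustration of the first feature type (simple statistics of a signal window), the sketch below computes the max, mean, variance, and energy of one sensor-axis window. The function name `simple_features` is hypothetical, and defining energy as the mean squared value is an assumption; the cited works may normalize differently.

```python
from statistics import fmean, pvariance


def simple_features(signal):
    """Return (max, mean, variance, energy) of one sensor-axis window.

    Assumption: "energy" is taken as the mean squared sample value.
    """
    if not signal:
        raise ValueError("signal must contain at least one sample")
    energy = sum(v * v for v in signal) / len(signal)
    return max(signal), fmean(signal), pvariance(signal), energy


# A short, made-up window of accelerometer readings (illustrative only).
window = [0.0, 1.0, 2.0, 1.0]
print(simple_features(window))
```

Such features are cheap enough for a low-power node, which is one reason they are paired with thresholding or kNN classifiers in the literature surveyed above.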
For distributed pattern recognition, there exist studies on
distributed speech recognition [20] and distributed expert
systems [15]. One particular problem associated with most
distributed sensor systems is that each local observation from
the distributed sensors is biased and insufficient to classify
all classes. For example in our system, the sensors placed
on the lower-body would not perform well to classify those
actions that mainly involve upper body motions, and vice
versa. Consequently, traditional majority-voting type clas-
sifiers may not achieve the best performance globally.
Design of the wearable sensor network. Our wearable
sensor network consists of sensor nodes placed at various
body locations, which communicate with a base station at-
tached to a computer server through a USB port. The sen-
sor nodes and base station are built using the commercially
available Tmote Sky boards. Tmote Sky runs TinyOS on an
8MHz microcontroller with 10K RAM and communicates
using the 802.15.4 wireless protocol. Each custom-built sen-
sor board has a triaxial accelerometer and a biaxial gyro-
scope, which is attached to Tmote Sky (shown in Fig 2).
Each axis is reported as a 12-bit value to the node, indicating
values in the range of ±2g and ±500°/s for the accelerometer
and gyroscope, respectively.
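To make the 12-bit ranges concrete, here is a minimal sketch of converting a raw reading to physical units. It assumes an unsigned, linear, offset-binary encoding (code 0 is the negative full scale, code 4095 the positive full scale); the text does not state the actual encoding of the board, so the mapping and the name `raw_to_physical` are illustrative only.

```python
ACCEL_FULL_SCALE_G = 2.0      # accelerometer range stated in the text: +/-2 g
GYRO_FULL_SCALE_DPS = 500.0   # gyroscope range stated in the text: +/-500 deg/s


def raw_to_physical(raw, full_scale, bits=12):
    """Map an unsigned raw code to a signed physical value.

    Assumption: linear offset-binary encoding, which may differ from
    the encoding used by the real sensor board.
    """
    top = (1 << bits) - 1
    if not 0 <= raw <= top:
        raise ValueError("raw code out of range for the given bit width")
    return (2.0 * raw / top - 1.0) * full_scale


print(raw_to_physical(4095, ACCEL_FULL_SCALE_G))   # positive full scale
print(raw_to_physical(0, GYRO_FULL_SCALE_DPS))     # negative full scale
```

Under this assumed mapping, one code step corresponds to 4/4095 g for the accelerometer and 1000/4095 °/s for the gyroscope.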
To avoid packet collision in the wireless channel, we use a
TDMA protocol that allocates each node a specific time slot
during which to transmit data. This allows us to receive
sensor data at 20Hz with minimal packet loss. To avoid drift in
the network, the base station periodically broadcasts a packet
to resynchronize the nodes’ individual timers. The code to
interface with the sensors and transmit data is implemented
directly on the mote using nesC, a variant of C.
Figure 2. The sensor board with the accelerometer and gyroscope.
The mother board at the back is Tmote Sky.
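The slot allocation can be pictured with a small sketch. It assumes one TDMA frame per sampling period, equal-width slots, and no guard time, none of which is specified in the text; `tdma_schedule` is a hypothetical helper, not the nesC code running on the motes.

```python
def tdma_schedule(num_nodes, sample_rate_hz):
    """Return {node_id: (slot_start_ms, slot_end_ms)} within one frame.

    Assumptions: one frame per sample period, equal slots, no guard time.
    """
    if num_nodes < 1 or sample_rate_hz <= 0:
        raise ValueError("need at least one node and a positive rate")
    frame_ms = 1000.0 / sample_rate_hz
    slot_ms = frame_ms / num_nodes
    return {n: (n * slot_ms, (n + 1) * slot_ms) for n in range(num_nodes)}


# Eight nodes reporting at 20 Hz share a 50 ms frame.
for node, (start, end) in tdma_schedule(8, 20).items():
    print(f"node {node}: {start:.2f} ms to {end:.2f} ms")
```

With eight nodes at 20Hz, each node would get a 6.25 ms window under these assumptions; the periodic resynchronization broadcast keeps the nodes' notion of the frame start aligned.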
Problem definition. Assume a set of L wearable sensor
nodes with triaxial accelerometers and biaxial gyroscopes
are attached to the human body. Denote
$\mathbf{a}_l(t) = (x_l(t), y_l(t), z_l(t), \theta_l(t), \rho_l(t))^T \in \mathbb{R}^5$
as the measurement of the five sensors on node l at time t, and
$\mathbf{a}(t) = (\mathbf{a}_1^T(t), \mathbf{a}_2^T(t), \cdots, \mathbf{a}_L^T(t))^T \in \mathbb{R}^{5L}$
collects all sensor measurements. Denote
$\mathbf{s} = (\mathbf{a}(1), \mathbf{a}(2), \cdots, \mathbf{a}(l)) \in \mathbb{R}^{5L \times l}$
as an action sequence of length l.
Given K different classes of human actions, a set of $n_i$
training examples $\{\mathbf{s}_{i,1}, \cdots, \mathbf{s}_{i,n_i}\}$ are collected for each ith
class. The durations of the sequences naturally may be different.
Given a new test sequence s that may contain multiple
actions and possibly other outlying actions, we seek a
distributed algorithm to simultaneously segment the sequence
and classify the actions.
Solving this problem mainly involves the following chal-
lenges:
1. Simultaneous segmentation and classification. We seek
simultaneous segmentation and recognition from a long
motion sequence. Furthermore, we also assume that the
test sequence may contain other unknown actions that
are not from the K classes. The algorithm needs to be
robust to these outliers.
2. Variation of action durations. One major difficulty in
segmentation of actions is to determine the duration of
a proper action. In practice, the durations of different
actions vary dramatically (see Fig 3).
Figure 3. Population of different action durations in our data set.
Figure 4. Readings of the x-axis accelerometers (top) and x-axis gyroscopes (bottom) from 8 distributed sensors (shown in different colors)
on two repetitive “stand-kneel-stand” sequences from two subjects as the left and right columns.
3. Identity independence. In addition to the variation of
action durations, different people act differently for the
same actions (see Fig 4). For a test sequence in the ex-
periment, we examine the identity-independent perfor-
mance by excluding the training samples of the same
subject.
4. Distributed recognition. A distributed recognition sys-
tem needs to further consider the following issues: 1.
How to extract compact and accurate low-dimensional
action features for local classification and transmission
over a band-limited network? 2. How to classify the lo-
cal measurement in real time using low-power proces-
sors? 3. How to design a classifier to globally optimize
the recognition and be adaptive to the change of the net-
work?
Contributions of the paper. We propose a distributed ac-
tion recognition algorithm that simultaneously segments and
classifies 12 human actions using up to 8 wearable motion
sensors. The work is inspired by an emerging theory of
compressed sensing and sparse representation [4, 5]. We as-
sume each action class satisfies a low-dimensional subspace
model. We show that a 10-D LDA feature space suffices to
locally represent the 12 action subspaces on each node. If a
linear representation is sought to represent a valid test sam-
ple w.r.t. all training samples, the dominant coefficients in
the sparsest representation correspond to the training sam-
ples from the same action class, and hence they encode the
membership of the test sample. The implementation of the
system consists of three integrated components: 1. Multi-resolution
action feature extraction. 2. Fast distributed classifiers
via $\ell_1$-minimization. 3. An adaptive global classifier.
The method can accurately segment and classify human ac-
tions from a continuous motion sequence. The local classi-
fiers that reject potential outliers reduce the sensor-to-server
communication to about 50%. One can also choose to ac-
tivate only a subset of the sensors on the fly due to sensor
failure or network congestion. The global classifier is able to
adaptively update the optimization process and improve the
overall classification upon available local decisions.
Finally, research on action recognition using wearable
sensors has been hindered to an extent by the lack of a
rigorous public database/benchmark with which to judge the
performance and safeguard the reproducibility of existing
algorithms. We intend to address this issue
by constructing and maintaining a public benchmark system
called “Wearable Action Recognition Database” (WARD).
The database will contain more human subjects across multi-
ple age groups, and it will be made available on our website.
2. Classification via Sparse Representation
We first present an efficient action classification method
on each sensor node assuming action sequences are
pre-segmented. Given an action segment of length l from node j,
$\mathbf{s}_j = (\mathbf{a}_j(1), \mathbf{a}_j(2), \cdots, \mathbf{a}_j(l)) \in \mathbb{R}^{5 \times l}$, define a new vector
$\mathbf{s}_j^S$ as the stacking of the l columns of $\mathbf{s}_j$:
$$\mathbf{s}_j^S \doteq (\mathbf{a}_j(1)^T, \mathbf{a}_j(2)^T, \cdots, \mathbf{a}_j(l)^T)^T \in \mathbb{R}^{5l}. \qquad (1)$$
We will interchangeably use $\mathbf{s}_j$ to denote the stacked vector
$\mathbf{s}_j^S$ without causing ambiguity.
Since the length l varies among different subjects and
actions, we need to normalize l to be the same for all the
training and test samples, which can be achieved by linear
interpolation or FFT interpolation. After normalization, we
denote the dimension of samples $\mathbf{s}_j$ as $D_j = 5l$. Subsequently,
we define a full-body action vector v that stacks the
measurements from all L nodes:
$$\mathbf{v} = (\mathbf{s}_1^T, \mathbf{s}_2^T, \cdots, \mathbf{s}_L^T)^T \in \mathbb{R}^D, \qquad (2)$$
where $D = D_1 + \cdots + D_L = 5lL$.
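The normalization and stacking steps can be sketched as follows, using the linear-interpolation option mentioned above. The helper names `resample_linear` and `stack` are hypothetical, and the segment is represented as a list of 5-element sample lists; this is a minimal sketch, not the authors' implementation.

```python
def resample_linear(seq, target_len):
    """Linearly interpolate a sequence of sample vectors to target_len samples."""
    if len(seq) < 2 or target_len < 2:
        raise ValueError("need at least two input and two output samples")
    out = []
    for k in range(target_len):
        pos = k * (len(seq) - 1) / (target_len - 1)   # fractional source index
        lo = min(int(pos), len(seq) - 2)
        frac = pos - lo
        out.append([(1 - frac) * a + frac * b
                    for a, b in zip(seq[lo], seq[lo + 1])])
    return out


def stack(seq):
    """Concatenate a(1), ..., a(l) into one vector of length 5l, as in (1)."""
    return [v for sample in seq for v in sample]


# Three made-up 5-channel samples, normalized to length l = 5.
segment = [[0.0] * 5, [2.0] * 5, [4.0] * 5]
normalized = resample_linear(segment, 5)
s_j = stack(normalized)
print(len(s_j))  # D_j = 5 * l
```

The full-body vector v in (2) is then the concatenation of the stacked vectors from all L nodes.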
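In this paper, we assume the samples v in an action
class satisfy a subspace model, called an action subspace.
If the training samples $\{\mathbf{v}_1, \cdots, \mathbf{v}_{n_i}\}$ of the ith class
sufficiently span the ith action subspace, given a test sample
$\mathbf{y} = (\mathbf{y}_1^T, \cdots, \mathbf{y}_L^T)^T \in \mathbb{R}^D$ in the same class i, y can be
linearly represented using the training examples of the same
class:
$$\mathbf{y} = \alpha_1 \mathbf{v}_1 + \cdots + \alpha_{n_i} \mathbf{v}_{n_i}
\;\Leftrightarrow\;
\begin{pmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_L \end{pmatrix}
=
\left(
\begin{pmatrix} \mathbf{s}_1 \\ \mathbf{s}_2 \\ \vdots \\ \mathbf{s}_L \end{pmatrix}_{1}
\cdots
\begin{pmatrix} \mathbf{s}_1 \\ \mathbf{s}_2 \\ \vdots \\ \mathbf{s}_L \end{pmatrix}_{n_i}
\right)
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_{n_i} \end{pmatrix}. \qquad (3)$$
It is important to note that such a linear constraint also holds
on each node j: $\mathbf{y}_j = \alpha_1 \mathbf{s}_{j,1} + \cdots + \alpha_{n_i} \mathbf{s}_{j,n_i} \in \mathbb{R}^{D_j}$.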
In theory, complex data such as human actions typically
constitute complex nonlinear models. The linear models are
used to approximate such nonlinear structures in a higher-
dimensional subspace (see Fig 5). Notice that such lin-
ear approximation may not produce good estimation of the
distance/similarity metric for the samples on the manifold.
However, as we will show in Example 1, given sufficient
samples on the manifold as training examples, a new test
sample can be accurately represented on the subspace, pro-
vided that any two classes do not have similar subspace mod-
els.
Figure 5. Modeling a 1-D manifold M using a 2-D subspace V .
To recover label(y), a previous study [19] proposed to
reformulate the recognition using a global sparse
representation: Since label(y) = i is unknown, we can represent y
using all the training samples from all K classes:
$$\mathbf{y} = (A_1, A_2, \cdots, A_K)
\begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \vdots \\ \mathbf{x}_K \end{pmatrix}
= A\mathbf{x}, \qquad (4)$$
where $A_i = (\mathbf{v}_{i,1}, \mathbf{v}_{i,2}, \cdots, \mathbf{v}_{i,n_i}) \in \mathbb{R}^{D \times n_i}$ collects all the
training samples of class i, $\mathbf{x}_i = (\alpha_{i,1}, \alpha_{i,2}, \cdots, \alpha_{i,n_i})^T \in \mathbb{R}^{n_i}$
collects the corresponding coefficients in (3), and
$A \in \mathbb{R}^{D \times n}$ where $n = n_1 + n_2 + \cdots + n_K$. Since y satisfies
both (3) and (4), one solution of x in (4) should be
$\mathbf{x}^* = (0, \cdots, 0, \mathbf{x}_i^T, 0, \cdots, 0)^T$. The solution is naturally sparse:
on average only a fraction $1/K$ of the terms in $\mathbf{x}^*$ are nonzero.
On each sensor j, the solution $\mathbf{x}^*$ of (4) is also a solution for
the representation:
$$\mathbf{y}_j = (A_1^{(j)}, A_2^{(j)}, \cdots, A_K^{(j)})\,\mathbf{x} = A^{(j)}\mathbf{x}, \qquad (5)$$
where $A_i^{(j)} \in \mathbb{R}^{D_j \times n_i}$ consists of the rows of $A_i$ that
correspond to the jth node. Hence, $\mathbf{x}^*$ can be solved
either globally using (4) or locally using (5), provided that the
action data measured on each node are sufficiently
discriminant. We will come back to the discussion about local
classification versus global classification in Section 3. In the rest
of this section, however, our focus will be on each node.
One major difficulty in solving (5) is the high dimensionality
of the action data. In compressed sensing [4, 5], one
reduces the dimension of a linear system by choosing a
linear projection $R_j \in \mathbb{R}^{d \times D_j}$:²
$$\tilde{\mathbf{y}}_j \doteq R_j \mathbf{y}_j = R_j A^{(j)} \mathbf{x} \doteq \tilde{A}^{(j)} \mathbf{x} \in \mathbb{R}^d. \qquad (6)$$
After projection $R_j$, typically the feature dimension d is
much smaller than the number n of all training samples.
Therefore, the new linear system (6) is underdetermined.
Numerically stable solutions exist to uniquely recover sparse
solutions $\mathbf{x}^*$ via $\ell_1$-minimization [6]:
$$\mathbf{x}^* = \arg\min \|\mathbf{x}\|_1 \quad \text{subject to} \quad \tilde{\mathbf{y}}_j = \tilde{A}^{(j)} \mathbf{x}. \qquad (7)$$
In our experiment, we have tested multiple projection
operators including PCA, LDA, and the random projection studied
in [19]. We found that 10-D feature spaces using LDA lead
to the best recognition in a very low-dimensional space.
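To give a feel for how a sparse solution of an underdetermined system can be computed, the sketch below uses iterative shrinkage-thresholding (ISTA) on the penalized form min 0.5‖Ax − y‖² + λ‖x‖1. This is a swapped-in technique chosen because it fits in a few lines of plain Python; it is not the equality-constrained program (7) itself, and it is not the solver the authors used. The name `ista_l1`, the value of `lam`, and the iteration count are all assumptions.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (the proximal operator of t*|v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista_l1(A, y, lam=0.01, iters=5000):
    """Approximate argmin 0.5*||Ax - y||^2 + lam*||x||_1 by ISTA.

    A is a list of d rows with n entries each; y has d entries.
    """
    d, n = len(A), len(A[0])
    # Step 1/||A||_F^2 is a safe (conservative) choice since ||A||_F >= ||A||_2.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(d)]
        g = [sum(A[i][j] * r[i] for i in range(d)) for j in range(n)]
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(n)]
    return x


# Toy underdetermined system: 2 equations, 3 unknowns (made-up numbers).
h = 2 ** -0.5
A = [[1.0, 0.0, h],
     [0.0, 1.0, h]]
y = [1.0, 1.0]
print(ista_l1(A, y))
```

In this toy system y can be written either with the first two columns (ℓ1-norm 2) or with the third column alone (ℓ1-norm √2), and the penalized solution concentrates on the third column, which is the behavior the classification scheme relies on.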
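After the (sparsest) representation x is recovered, we
project the coefficients onto each action subspace:
$$\delta_i(\mathbf{x}) = (0, \cdots, 0, \mathbf{x}_i^T, 0, \cdots, 0)^T \in \mathbb{R}^n, \quad i = 1, \cdots, K. \qquad (8)$$
Finally, the membership of the test sample $\mathbf{y}_j$ is assigned to
the class with the smallest residual, computed in the projected
feature space of (6):
$$\text{label}(\mathbf{y}_j) = \arg\min_i \|\tilde{\mathbf{y}}_j - \tilde{A}^{(j)} \delta_i(\mathbf{x})\|_2. \qquad (9)$$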
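The residual rule in (8) and (9) can be sketched directly. The function below takes an already recovered coefficient vector x, a dictionary A (list of rows), and a per-column class label list; `classify_by_residual` and `class_of_column` are hypothetical names introduced for illustration.

```python
def classify_by_residual(A, y, x, class_of_column):
    """Return (label, residuals): the class whose coefficients best explain y.

    For each class c, delta_c(x) keeps the coefficients of class c and
    zeroes the rest; the residual is ||y - A delta_c(x)||_2.
    """
    residuals = {}
    for c in sorted(set(class_of_column)):
        xc = [xj if cj == c else 0.0 for xj, cj in zip(x, class_of_column)]
        approx = [sum(a * v for a, v in zip(row, xc)) for row in A]
        residuals[c] = sum((yi - ai) ** 2 for yi, ai in zip(y, approx)) ** 0.5
    label = min(residuals, key=residuals.get)
    return label, residuals


# Tiny made-up example: two training columns, one per class.
A = [[1.0, 0.0],
     [0.0, 1.0]]
y = [0.9, 0.1]
x = [0.9, 0.1]
label, residuals = classify_by_residual(A, y, x, class_of_column=[1, 2])
print(label, residuals)
```

Because only the coefficients of one class are kept at a time, a test sample whose dominant coefficients fall in a single class yields one clearly smallest residual.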
Example 1 (Classification on Nodes) We designed 12 ac-
tion categories in the experiment: Stand-to-Sit, Sit-to-
Stand, Sit-to-Lie, Lie-to-Sit, Stand-to-Kneel, Kneel-to-Stand,
Rotate-Right, Rotate-Left, Bend, Jump, Upstairs, and Down-
stairs. The detailed experiment setup is given in Section 4.
To implement $\ell_1$-minimization on the sensor node, we
look for fast sparse solvers in the literature. We have tested
a variety of methods including (orthogonal) matching
pursuit (MP), basis pursuit (BP), LASSO, and a quadratic
log-barrier solver.³ We found that BP [7] gives the best trade-off
between speed, noise tolerance, and recognition accuracy.
Here we demonstrate the accuracy of the BP-based
algorithm on each sensor node (see Fig 1 for their locations). The
actions are manually segmented from a set of long motion
sequences from three subjects. In total there are 626 samples
in the data set. The 10-D feature selection is via LDA. We
require the classification to be identity-independent. The
accuracy of the classification is shown in Table 1. Fig 6 shows an
example of the estimated sparse coefficients x and its
residuals. In terms of speed, our simulation in MATLAB takes
on average 0.03s to process one test sample on a typical 3 GHz
PC.
² Notice that $R_j$ is not computed on the sensor node. These matrices are
computed offline and simply stored on each sensor node.
³ The implementation of these routines in MATLAB is available via
SparseLab: http://sparselab.stanford.edu
Table 1. Recognition accuracy on each node over 12 action classes.
Sensor #        1     2     3     4     5     6     7     8
Accuracy [%]    99.9  99.4  99.9  100   95.3  99.5  93    100
Figure 6. Left: Sparse $\ell_1$ solution by BP for a Stand-to-Sit action
on the waist node. Right: Corresponding residuals. The action is
correctly classified as Class 1. SCI(x) = 0.7 (see (10)).
Example 1 shows that if the segmentation of the actions
is known and there are no invalid samples, all sensor
nodes can recognize the 12 actions individually with very
high accuracy, which also verifies that the mixture subspace
model is a good approximation of the action data.
Nevertheless, one may argue that in such low-dimensional feature
spaces other classical methods (e.g., kNN and decision tree
methods) should also perform well. In the next section, we
will show that the major advantage of adopting the sparse
representation framework is a unified solution to recognize
and segment valid actions and reject invalid ones. We will
also show that the method is adaptive to the change of avail-
able sensor nodes on the fly.
3. Distributed Segmentation and Recognition
We start by introducing multi-resolution action
segmentation on each sensor node. From the training examples,
we can estimate a range of possible lengths for all actions
of interest. We then evenly divide the range into multiple
length hypotheses: $(h_1, \cdots, h_s)$. At each time t in a
motion sequence, the node tests a set of s possible
segmentations: $\mathbf{y}(1) = (\mathbf{a}(t - h_1), \cdots, \mathbf{a}(t)), \cdots, \mathbf{y}(s) = (\mathbf{a}(t - h_s), \cdots, \mathbf{a}(t))$,
as shown in Fig 7.⁴ With each candidate
y again normalized to length l, a sparse representation
x is estimated using the $\ell_1$-minimization of Section 2.
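The hypothesis generation can be sketched as follows. The range endpoints, the number of hypotheses, and the helper names `length_hypotheses` and `candidate_segments` are hypothetical; the sketch only shows how windows ending at time t are cut out for each length hypothesis.

```python
def length_hypotheses(min_len, max_len, s):
    """Evenly divide [min_len, max_len] into s integer length hypotheses."""
    if s < 1 or min_len > max_len:
        raise ValueError("need s >= 1 and min_len <= max_len")
    if s == 1:
        return [max_len]
    step = (max_len - min_len) / (s - 1)
    return [round(min_len + k * step) for k in range(s)]


def candidate_segments(samples, t, hypotheses):
    """Return {h: (a(t-h), ..., a(t))} for every h that fits in the sequence."""
    return {h: samples[t - h:t + 1] for h in hypotheses if t - h >= 0}


# Made-up numbers: lengths from 20 to 100 samples, 5 hypotheses.
hyps = length_hypotheses(20, 100, 5)
sequence = list(range(200))          # stand-in for a stream of samples a(t)
segments = candidate_segments(sequence, t=50, hypotheses=hyps)
print(hyps, sorted(segments))
```

Each returned window would then be normalized to length l and passed to the sparse solver; hypotheses that reach back before the start of the sequence are simply skipped here.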
Figure 7. Multiple segmentation hypotheses on a wrist sensor at
time t = 150 of a “go downstairs” sequence. h1 is a good segment
while others are false segments. Notice that the movement between
250 and 350 is an outlying action that the subject performed.
⁴ Those segmentation hypotheses that overlap with previously detected
actions will be ignored to avoid temporal ambiguity.
Based on the previous sparsity assumption, if y is not a
valid segmentation w.r.t. the training examples due to either
incorrect t or h, or the real action performed is not in the
training classes, the dominant coefficients of its sparsest
representation x should not correspond to any single class. We
use a sparsity concentration index (SCI) [19]:
$$\text{SCI}(\mathbf{x}) \doteq \frac{K \cdot \max_{j=1,\cdots,K} \|\delta_j(\mathbf{x})\|_1 / \|\mathbf{x}\|_1 - 1}{K - 1} \in [0, 1]. \qquad (10)$$
If the nonzero coefficients of x are evenly distributed among
K classes, then SCI(x) = 0; if all the nonzero coefficients
are associated with a single class, then SCI(x) = 1.
Therefore, we introduce a sparsity threshold $\tau_1$ applied to all
sensor nodes: If SCI(x) > $\tau_1$, the segment is a valid local
measurement, and its 10-D LDA features $\tilde{\mathbf{y}}$ will be sent to
the base station.
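The index in (10) is straightforward to compute once the coefficients are grouped by class. In the sketch below, `sci` and `class_of_column` are hypothetical names, and returning 0 for an all-zero x is a convention assumed here, since (10) is undefined in that case.

```python
def sci(x, class_of_column, num_classes):
    """Sparsity concentration index of x, following (10)."""
    if num_classes < 2:
        raise ValueError("SCI needs at least two classes")
    total = sum(abs(v) for v in x)
    if total == 0.0:
        return 0.0  # assumed convention: no coefficients, no concentration
    per_class = {}
    for v, c in zip(x, class_of_column):
        per_class[c] = per_class.get(c, 0.0) + abs(v)
    return (num_classes * max(per_class.values()) / total - 1.0) / (num_classes - 1.0)


labels = [0, 0, 1, 1]                      # two training columns per class
concentrated = sci([1.0, 0.0, 0.0, 0.0], labels, num_classes=2)
spread = sci([0.5, 0.0, 0.5, 0.0], labels, num_classes=2)
print(concentrated, spread)

TAU_1 = 0.2                                # local threshold used in Section 4
print(concentrated > TAU_1, spread > TAU_1)
```

Only segments whose index exceeds the threshold would be transmitted, which is how the local classifiers cut down node-to-station traffic.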
Figure 8. An invalid representation (SCI=0.13).
Next, we introduce a global classifier that adaptively
optimizes the overall segmentation and classification.
Suppose at time t and with a length hypothesis h, the base
station receives L′ action features from the active sensors
(L′ ≤ L). Without loss of generality, assume these
features are from the first L′ sensors: $\tilde{\mathbf{y}}_1, \tilde{\mathbf{y}}_2, \cdots, \tilde{\mathbf{y}}_{L'}$. Let
$\mathbf{y}' = (\tilde{\mathbf{y}}_1^T, \cdots, \tilde{\mathbf{y}}_{L'}^T)^T \in \mathbb{R}^{10L'}$. Then the global sparse
representation x of y′ satisfies the following linear system
$$\mathbf{y}' =
\begin{pmatrix}
R_1 & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & R_{L'} & \cdots & 0
\end{pmatrix}
A\mathbf{x} = R'A\mathbf{x} = A'\mathbf{x}, \qquad (11)$$
where $R' \in \mathbb{R}^{dL' \times D}$ is a new projection matrix that only
extracts the action features from the first L′ nodes.
Consequently, the effect of changing active sensor nodes for the
global classification is formulated via the global projection
matrix R′. During the transformation, the data matrix A and
the sparse representation x remain unchanged. The linear
system (6) then becomes a special case of (11) when L′ = 1.
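Because R′ is block-structured, forming y′ and A′ in (11) amounts to stacking the features and the per-node projected dictionaries of whichever nodes reported. The sketch below illustrates this bookkeeping; `assemble_global_system` and its dictionary-based inputs are hypothetical, and each per-node dictionary is assumed to have been projected offline.

```python
def assemble_global_system(node_features, node_dictionaries, active):
    """Stack features and projected dictionaries of the active nodes.

    node_features[j]     : projected feature vector of node j (length d)
    node_dictionaries[j] : d x n projected dictionary of node j (list of rows)
    active               : ordered list of node ids that transmitted
    Returns (y_prime, A_prime) with d * len(active) rows, as in (11).
    """
    if not active:
        raise ValueError("at least one active node is required")
    y_prime, A_prime = [], []
    for j in active:
        y_prime.extend(node_features[j])
        A_prime.extend(list(row) for row in node_dictionaries[j])
    return y_prime, A_prime


# Made-up data: 3 nodes, d = 2 features per node, n = 3 training samples.
features = {1: [1.0, 2.0], 2: [3.0, 4.0], 3: [5.0, 6.0]}
dictionaries = {j: [[float(j), 0.0, 1.0], [0.0, float(j), 1.0]] for j in (1, 2, 3)}
y_prime, A_prime = assemble_global_system(features, dictionaries, active=[1, 3])
print(y_prime, len(A_prime), len(A_prime[0]))
```

The number of columns n, and hence the unknown x, stays the same whatever subset of nodes is active; only the number of rows changes, which is what lets the same solver be reused when sensors drop out.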
Similar to the outlier rejection criterion we previously
proposed on each node, we introduce a global rejection
threshold τ2. If SCI(x) > τ2 in (11), the most significant
coefficients in x are concentrated in a single training class.
Hence y′ is assigned to that class, and its length hypothesis
h provides the segmentation of the action from the motion
sequence.
The overall algorithm on the nodes and on the network
server provides a unified solution to segment and classify
action segments from a motion sequence using only two simple
parameters $\tau_1$ and $\tau_2$. Typically $\tau_1$ is selected to be less
restrictive than $\tau_2$ in order to increase the recall rate, because
passing a certain amount of false signal to the global
classifier is not necessarily disastrous: the signal would be
rejected by $\tau_2$ when the action features from multiple nodes
are jointly considered. The formulation of adaptive
classification (11) via a global projection matrix R′ and two
sparsity constraints $\tau_1$ and $\tau_2$ provides a simple means of
rejecting outliers from a network of multiple sensors. The method
compares favorably to other classical methods such as kNN
and decision trees, because these methods need to train
multiple thresholds and decision rules when the number L′ and
the set of available sensors vary in the full-body action vector
$\mathbf{y}' = (\tilde{\mathbf{y}}_1^T, \cdots, \tilde{\mathbf{y}}_{L'}^T)^T$.
Finally, we consider how the change of active nodes
affects $\ell_1$-minimization and the classification of the actions. In
compressed sensing, the efficacy of $\ell_1$-minimization in
solving for the sparsest solution x in (11) is characterized by the
$\ell_0/\ell_1$ equivalence relation [6, 7]. A necessary and sufficient
condition for the equivalence to hold is the k-neighborliness
of A′. As a special case, one can show that if x is the sparsest
solution in (11) for L′ = L, x is also a solution for L′ < L.
Hence, a decrease of L′ may lead to sparser solutions
of x.
On the other hand, the decrease in available action
features also makes y′ less discriminant. For example, if we
reduce L′ to 1 and only activate a wrist sensor, then the
$\ell_1$-solution x may have nonzero coefficients associated with
multiple actions with similar wrist motions, albeit sparser. This
is an inherent problem for any method that classifies human
actions using a limited number of motion sensors. In theory,
if two action subspaces in a low-dimensional feature space
have a small subspace distance after the projection, the cor-
responding sparse representation cannot distinguish the test
samples from the two classes. We will demonstrate in Sec-
tion 4 that indeed reducing the available motion sensors will
reduce the discriminant power of the sparse representation in
a lower-dimensional space.
4. Experiment
We validate the performance of the system using a data
set we collected from three male subjects aged 28, 30,
and 32. Eight wearable sensors were placed at
different body locations (see Fig 1). We designed a set of 12
action classes: Stand-to-Sit (StSi), Sit-to-Stand (SiSt),
Sit-to-Lie (SiLi), Lie-to-Sit (LiSi), Stand-to-Kneel (StKn),
Kneel-to-Stand (KnSt), Rotate-Right (RoR), Rotate-Left (RoL),
Bend, Jump, Upstairs (Up), and Downstairs (Down). We
are particularly interested in testing the system under various
action durations. For this purpose, we asked the
subjects to perform StSi, SiSt, SiLi, and LiSi at two
different speeds (slow and fast), and to perform RoR and RoL with
two different rotation angles (90° and 180°). All subjects
were asked to perform a sequence of related actions in each
recording session based on their own interpretation of the
actions. In total there are 626 actions performed in the data set
(see Table 3 for the numbers in individual classes).
Table 2 shows Precision versus Recall of the algorithm
with different active sensor nodes. For all experiments,
$\tau_1 = 0.2$ and $\tau_2 = 0.4$. When all nodes are activated, the
algorithm achieves 98.8% accuracy among the actions it
extracted, and 94.2% of the true actions are detected. The
performance decreases gracefully as more nodes become
unavailable to the global classifier. Our results show that if
we can maintain one motion sensor on the upper body (e.g.,
at position 2) and one on the lower body (e.g., at position 7),
the algorithm can still achieve 94.4% precision and 82.5%
recall. Finally, on average the 8 distributed classifiers that
reject invalid local measurements reduce the node-to-station
communication by more than 50%.
Table 2. Precision vs. recall with different sets of activated sensors.
Sensors     2     7     2,7   1,2,7  1-3,7,8  1-8
Prec [%]    89.8  94.6  94.4  92.8   94.6     98.8
Rec [%]     65    61.5  82.5  80.6   89.5     94.2
One may be curious about the relatively low recall on sin-
gle sensors such as 2 and 7. This performance difference
is due to the large number of potential outlying segments
present in a long motion sequence (e.g., see Fig 7). We
further compare the difference using the two confusion tables,
Tables 3 and 4. We see that the single node 2, positioned on
the right wrist performed poorly mainly on two action cate-
gories: Stand-Kneel and Upstairs-Downstairs, both of which
involve significant movements of the lower body but not the
upper one. This is the main reason for the low recall in Ta-
ble 2. On the other hand, for the actions that are detected
using node 2, our system can still achieve about 90% accu-
racy, which clearly demonstrates the robustness of the dis-
tributed recognition framework. Similar arguments also ap-
ply to node 7 and other sensor combinations.
Table 3. Confusion table using sensors 1-8.
Table 4. Confusion table using sensor 2.
Finally, we provide examples of the classification results
on Subject 1 to demonstrate the accuracy of the proposed
algorithm using all sensor nodes 1-8. For clarity, each figure
in Fig 9-21 only plots the readings from the x-axis
accelerometers on the 8 nodes. The segmentation results are then
superimposed. The black solid boxes indicate the locations of the
correctly classified action segments. The red boxes (e.g., in
Fig 14) indicate the locations of false classification. One can
also observe from the figures that some valid actions are not
detected by the algorithm, e.g., in Fig 13.
Figure 9. Segmentation of a slow Stand-Sit-Stand sequence.
Figure 10. Segmentation of a fast Stand-Sit-Stand sequence.
Figure 11. Segmentation of a slow Sit-Lie-Sit sequence.
Figure 12. Segmentation of a fast Sit-Lie-Sit sequence.
Figure 13. Segmentation of a Bend sequence.
Figure 14. Segmentation of a Stand-Kneel-Stand sequence.
Figure 15. Segmentation of a 90° Rotate-Right-Left sequence.
Figure 16. Segmentation of a 90° Rotate-Left-Right sequence.
Figure 17. Segmentation of a 180° Rotate-Right sequence.
Figure 18. Segmentation of a 180° Rotate-Left sequence.
Figure 19. Segmentation of a Jump sequence.
Figure 20. Segmentation of a Go-Upstairs sequence.
Figure 21. Segmentation of a Go-Downstairs sequence.
5. Conclusion and Discussion
Inspired by the emerging compressed sensing theory, we
have proposed a distributed algorithm to segment and
classify human actions on a wearable motion sensor network.
The framework provides a unified solution based on
$\ell_1$-minimization to classify valid action segments and reject
outlying actions on the sensor nodes and the base station. We
have shown through our experiment that a set of 12 action
classes can be accurately represented and classified using a
set of 10-D LDA features measured at multiple body
locations. The proposed global classifier can adaptively adjust
the global optimization to boost the recognition upon
available local measurements.
One limitation in the current system and most other body
sensor systems is that the wearable sensors need to be firmly
positioned at the designated locations. However, a more
practical system/algorithm should tolerate certain degrees
of shift without sacrificing the accuracy. In this case, the
variation of the measurement for different action classes
would increase substantially. One open question is what low-
dimensional linear/nonlinear models one may use to model
such more complex data, and whether the sparse representa-
tion framework can still apply to approximate such structures
with limited numbers of training examples. A potential solu-
tion to this question will be a meaningful step forward both
in theory and in practice.
References
[1] R. Aylward and J. Paradiso. A compact, high-speed, wearable
sensor network for biomotion capture and interactive media.
In IPSN, 2007.
[2] L. Bao and S. Intille. Activity recognition from user-annotated
acceleration data. In Pervasive, 2004.
[3] A. Benbasat and J. Paradiso. Groggy wakeup - automated
generation of power-efficient detection hierarchies for wear-
able sensors. In Int. Work. on Wearable and Implantable Body
Sensor Networks, 2007.
[4] E. Candes. Compressive sampling. In Proceedings of the In-
ternational Congress of Mathematicians, 2006.
[5] E. Candes and T. Tao. Near-optimal signal recovery from ran-
dom projections: Universal encoding strategies? IEEE Trans.
Information Theory, 52(12):5406–5425, 2006.
[6] D. Donoho. Neighborly polytopes and sparse solution of un-
derdetermined linear equations. preprint, 2005.
[7] D. Donoho and M. Elad. On the stability of the basis pursuit
in the presence of noise. Sig. Proc., 86:511–532, 2006.
[8] J. Farringdon, A. Moore, N. Tilbury, J. Church, and
P. Biemond. Wearable sensor badge & sensor jacket for con-
text awareness. In Int. Symp. on Wear. Comp., 1999.
[9] E. Heinz, K. Kunze, and S. Sulistyo. Experimental evalua-
tion of variations in primary features used for accelerometric
context recognition. In Euro. Symp. on Amb. Intel., 2003.
[10] T. Huynh and B. Schiele. Analyzing features for activity
recognition. In J. Conf. on Smart Objects and Ambient In-
telligence, 2005.
[11] H. Kemper and R. Verschuur. Validity and reliability of pe-
dometers in habitual activity research. European Journal of
Applied Physiology, 37(1):71–82, 1977.
[12] N. Kern, B. Schiele, and A. Schmidt. Multi-sensor activity
context detection for wearable computing. In European Sym-
posium on Ambient Intelligence, 2003.
[13] P. Lukowicz, J. Ward, H. Junker, M. Stager, G. Troster,
A. Atrash, and T. Starner. Recognizing workshop activity us-
ing body worn microphones and accelerometers. In Pervasive,
2004.
[14] J. Mantyjarvi, J. Himberg, and T. Seppanen. Recognizing hu-
man motion with multiple acceleration sensors. In Int. Conf.
on Sys., Man and Cyb., 2001.
[15] J. Morrill. Distributed recognition of patterns in time series
data. Communications of the ACM, 41(5):45–51, 1998.
[16] B. Najafi, K. Aminian, A. Parschiv-Ionescu, F. Loew, C. Bula,
and P. Robert. Ambulatory system for human motion anal-
ysis using a kinematic sensor: Monitoring of daily physical
activity in the elderly. IEEE Transactions on Biomedical En-
gineering, 50(6):711–723, 2003.
[17] S. Pirttikangas, K. Fujinami, and T. Nakajima. Feature selec-
tion and activity recognition from wearable sensors. In Int.
Symp. on Ubi. Comp. Sys., 2006.
[18] C. Sadler and M. Martonosi. Data compression algorithms
for energy-constrained devices in delay tolerant networks. In
ACM Conf. on Emb. Net. Sen. Sys., pages 265–278, 2006.
[19] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma. Robust
face recognition via sparse representation. (in press) PAMI,
2008.
[20] W. Zhang, L. He, Y. Chow, R. Yang, and Y. Su. The study on
distributed speech recognition system. In Int. Conf. on Acou.,
Speech, and Sig. Proc., pages 1431–1434, 2000.
DexterNet: An Open Platform for Heterogeneous Body Sensor Networks and Its Applications∗
Philip Kuryloski ‡,†,◦, Annarita Giani †, Roberta Giannantonio ∆, Katherine Gilani ⋆, Raffaele Gravina ◦, Ville-Pekka Seppä □, Edmund Seto ∇, Victor Shia †, Curtis Wang †,
Posu Yan †, Allen Y. Yang †, Jari Hyttinen □, Shankar Sastry †, Stephen Wicker ‡, Ruzena Bajcsy †
† Department of EECS, University of California, Berkeley, CA 94720
‡ Department of ECE, Cornell University, Ithaca, NY 14853
∇ School of Public Health, University of California, Berkeley, CA 94720
∆ Telecom Italia, Turin, Italy
◦ WSN Lab sponsored by Pirelli and Telecom Italia, Berkeley, CA 94704
□ Department of Biomedical Engineering, Tampere University of Technology, Tampere, Finland
⋆ Department of EE, University of Texas at Dallas, TX 75080
ABSTRACT
We design and implement a novel platform, called DexterNet, for heterogeneous body sensor networks. The system is motivated by shifting research paradigms to support real-time, persistent human monitoring in both indoor and outdoor environments. The platform adopts a three-layer, hierarchical architecture to control heterogeneous body sensors. The first layer, called the body sensor layer (BSL), deals with design of different wireless body sensors and their instrumentation on the body. We detail two custom-built body sensors: one measuring body motions and the other measuring the ECG and respiratory patterns. At the second layer, called the personal network layer (PNL), the wireless body sensors on a single subject communicate with a mobile computer station. The mobile station can be either a computer or a smart phone that supports Linux OS and the IEEE 802.15.4 protocol. It issues control commands to the body sensors and receives and processes sensor data measured by the body sensors. These functions are abstracted and implemented as an open-source software library, called Signal Processing In Node Environment (SPINE). A DexterNet network is scalable, and can be reconfigured on-the-fly via SPINE. At the third layer, called the global network layer (GNL), multiple PNLs communicate with a remote Internet server to permanently log the sensor data and support higher-level applications. We demonstrate the versatility of the DexterNet platform via three applications: avatar visualization, human activity recognition, and integration of DexterNet with global positioning sensors and air pollution sensors for asthma studies.
∗ This work was supported in part by TRUST (The Team for Research in Ubiquitous Secure Technology), which receives support from the National Science Foundation (NSF award number CCF-0424422) and the following organizations: AFOSR (#FA9550-06-1-0244), Cisco, British Telecom, ESCHER, HP, IBM, iCAST, Intel, Microsoft, ORNL, Pirelli, Qualcomm, Sun, Symantec, Telecom Italia and United Technologies. This work was also supported in part by ARO MURI W911NF-06-1-0076, the Center for Information Technology Research in the Interest of Society (CITRIS), and Finnish Funding Agency for Technology and Innovation (Tekes). Corresponding author: P. Kuryloski ([email protected]).
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
IPSN ’09 San Francisco, CA USA
Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$5.00.
Categories and Subject Descriptors
I.2.9 [Artificial Intelligence]: Robotics—Sensors; D.2.11 [Software Engineering]: Software Architectures—Domain-specific architectures
General Terms
Design, Experimentation, Physiological Sensing
Keywords
Sensor Networks, Body Sensing, Wearable Action Recognition, DexterNet, SPINE
1. INTRODUCTION
Wireless body sensor networks (BSNs) have emerged as a research area in the field of sensor networks over the past five years. The rapid development is mainly due to two reasons: 1. Continuing progress in the integration and miniaturization of sensors, processors, and radio devices. 2. Rising demand for advanced body sensor systems, from pivotal areas of elderly protection and clinical patient monitoring to much broader applications in military, preventive healthcare, and consumer electronics. Traditional BSNs mainly involve single wearable sensors for tasks such as fall detection [20, 8, 18, 6], walk and gait-phase detection [14, 3], and pulse-oximetry monitoring [12, 13]. More sophisticated systems may consist of multiple heterogeneous sensors, adopt a hierarchical architecture for real-time sensor management, and even integrate body sensors with other environmental sensors. Some examples include CodeBlue [10], MobiCare [5], and ALARM-NET [21]. These systems instrument the human body as an active mobile platform, and have the necessary mobility to support persistent monitoring in people’s normal living environments.
In this paper, we present a novel platform for heterogeneous body sensor networks called DexterNet. The design principles of DexterNet are manifold:
1. DexterNet supports an open-source on-node signal processing library, namely, SPINE (Signal Processing In Node Environment) [19]. To the best of our knowledge, SPINE is the only open-source library that is versatile enough to support heterogeneous body sensors. Consequently, higher-level applications using DexterNet can seamlessly control other types of body sensors in the future via the SPINE library.
2. Harnessing the rich functionality of SPINE, DexterNet supports real-time signal collection and sensor management on a network of heterogeneous body sensors. The configuration of a DexterNet network can also be modified on the fly. We have designed and manufactured two different body sensors: one measuring body motions and the other measuring the ECG and respiratory patterns. The system can also conveniently integrate other commercially available sensor nodes via SPINE, such as SHIMMER and MICAz.
3. To support long-term monitoring of multiple human subjects in both indoor and outdoor environments, DexterNet adopts a flexible three-layer BSN architecture. A body sensor layer (BSL) deals with the design of different sensors and their instrumentation on the body. A personal network layer (PNL) manages communication between the wireless body sensors and a mobile computer station. The mobile station can be either a computer or a smart phone that supports Linux OS and the IEEE 802.15.4 protocol. Finally, a global network layer (GNL) via the Internet permanently logs the sensor data and supports other higher-level applications on one or more secured network servers.
Figure 1 shows the three-layer architecture of DexterNet. At the BSL, the system supports two types of custom-built wireless wearable sensors. The first is a motion sensor board that consists of a triaxial accelerometer and a biaxial gyroscope. The second is a biological sensor (biosensor) called Wisepla [16], which integrates an electrical impedance pneumography (EIP) sensor, an electrocardiogram (ECG) sensor, and a triaxial accelerometer. Each sensor board connects to a sensor network mote to form a wearable sensor mote; here we choose the commercially available TelosB board. At the PNL, the body sensors communicate with a Nokia N800 series Internet tablet via a TelosB base-station board. The SPINE functions installed on the body sensors and the N800 manage the collection, processing, and transmission of the data, and can be controlled via commands issued from the N800. Finally, at the GNL, multiple subjects instrumented with body sensors and N800s can remotely connect to a computer server via the Internet, and permanently log the motion and biological information for higher-level applications.
Figure 1: The three-layer architecture of the DexterNet
system. The first layer is a body sensor layer (BSL),
the second is a personal network layer (PNL), and the
third layer is a global network layer (GNL). The pivotal
component of the system is a Nokia N800 series Internet
tablet at the PNL that communicates both to the BSL
via IEEE 802.15.4 and the GNL via other broadband
wireless channels.
Equipped with the versatile three-layer architecture and the open-source on-node library SPINE, DexterNet presents a competitive framework to support a variety of applications in healthcare, military, and consumer electronics. For example, the fall detection function has been implemented at the BSL level using SPINE on-node functions, and each motion sensor is capable of outputting a binary decision of a falling event. Such functions reduce the amount of data that must be transmitted between the nodes and the base station.1 More sophisticated applications such as human activity recognition and reconstruction of a graphical avatar for 3-D visualization can be implemented at the PNL level; these rely on the full-body motion data measured by multiple motion sensors at different key locations of the body (as shown in Figure 2).
Figure 2: Illustration of the DexterNet system instru-
mented on a wearer. The deployment includes five mo-
tion sensor motes, a Nokia N800 tablet, and a GPS po-
sitioning sensor.
1.1 Related Work
Similar to DexterNet, many existing BSN platforms embrace a hierarchical architecture for real-time sensor control and data management. Some representative platforms are shown in Table 1. A more comprehensive literature overview can be found in [21, 11, 24].
HealthGear [13] is a single-modality sensor network that integrates a low-power pulse oximeter with a smart phone via Bluetooth. CodeBlue [10] is a wireless sensor platform intended for deployment in emergency medical care. It integrates a pulse oximeter and an ECG sensor with PDAs and PCs to enhance seamless transfer of data among caregivers. The platform uses IEEE 802.15.4 as the wireless protocol, and is intended to scale in dense networks with volatile network conditions.
1 Studies have shown that the power consumption required to successfully send one byte over a wireless channel is equivalent to executing between 1e3 and 1e6 instructions on an onboard processor [15]. Hence it is paramount in sensor networks to reduce the communication cost while preserving the recognition performance.
WWBAN [11] adopts a three-layer multi-sensor platform that is similar to DexterNet. Multiple motion sensors and ECG sensors are placed on the human body. They communicate with either a PDA or a PC to provide a transparent interface to the user, and an interface to the (remote) medical server using the Internet. However, the system consists mainly of proprietary software, and it does not provide an open-source library such as SPINE to support on-node computation and decision-making.
Finally, ALARM-NET [21] belongs to a group of wireless sensor networks for assisted living. The focus of the system is the integration of body sensors with environmental sensor networks in a scalable and heterogeneous architecture. ALARM-NET uses MICAz sensors and STARGATE to relay the information from body sensors and environmental sensors to PDAs and PCs in a large and complex indoor setting using either Bluetooth or the 802.11 protocol.
The rest of the paper is organized as follows: Section 2 presents the overall architecture of DexterNet and explains the relationship among different components of the three-layer hierarchy. Based on the hierarchy, Section 3 first discusses the design and specification of the body sensors in the bottom layer BSL. Section 4 then discusses the open-source SPINE framework that provides software services and control of both BSL and PNL. Section 5 showcases three high-level applications: 1. Avatar visualization. 2. Human activity recognition. 3. Integration of DexterNet with portable air pollution sensors for the study of asthma attacks. Finally, Section 6 discusses some limitations of the current implementation and future directions.
2. SYSTEM ARCHITECTURE
DexterNet is comprehensive in that it covers sensing, distributed processing of sensor data, wireless communication and fusion of data, and serves as a foundation for higher-level applications. Although subsets of these functionalities exist in other systems, often the same functionalities must be re-implemented in each system in order to produce a complete path for data. We hope that in providing an open system with DexterNet, a common platform may arise that reverses this situation, so that variations in functionality are achieved through the use of a common base. Furthermore, the diverse nature of our team has driven the design requirement that DexterNet provide maximum flexibility and extensibility, with maximum potential for reusability of its components.
The structure of DexterNet is shown in Figure 3. The open-source SPINE framework provides the flexibility in constructing physical components of the system at the BSL and PNL. Particularly, SPINE has been developed such that there is separation in code of its sensing, processing and data transport features. As a result, SPINE is portable across TinyOS mote platforms, and easily extends to support new sensors through the use of sensor drivers. Support for both the motion sensor and biosensor in DexterNet is provided in this manner on the TelosB.
Table 1: Comparison of existing body sensor networks with the DexterNet platform.
Platform        | Sensor devices                        | Base devices      | Node protocols    | Open source | Environmental sensors
HealthGear [13] | pulse oximeter                        | smart phone       | Bluetooth         | No          | No
CodeBlue [10]   | pulse oximeter, ECG, SHIMMER          | PC, PDA           | 802.15.4          | No          | No
WWBAN [11]      | motion, ECG                           | PC, PDA           | 802.15.4          | No          | No
ALARM-NET [21]  | pulse oximetry, motion, ECG           | STARGATE, PDA, PC | Bluetooth, 802.11 | No          | Yes (temperature, light, PIR)
DexterNet       | motion, ECG, EIP, GPS, MICAz, SHIMMER | PDA, PC           | 802.15.4          | Yes         | Possible via SPINE (e.g., air pollution sensor)
Figure 3: Architecture of the DexterNet system. The
Body Sensing Layer (BSL) includes motes and attached
sensors. The Personal Network Layer (PNL) includes the
N800 portable base station and associated sensors. The
BSL and PNL are driven by SPINE. The Global Net-
work Layer (GNL) includes our applications built with
the DexterNet system.
The distributed processing facilities of DexterNet are also provided by the SPINE framework. Currently, it provides two modules for data processing: one that periodically performs feature extraction on sensor data and reports the results, and a second that reports chosen features conditionally based upon thresholds. New data processing components can also be added to SPINE without affecting sensor or network communication code.
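The two processing modes described above can be sketched as follows. This is only an illustration in Python, not SPINE's node-side code or API: the function names `periodic_features` and `threshold_report`, the window and shift parameters, and the choice of mean and range as features are all assumptions made for the example.

```python
def periodic_features(samples, window, shift):
    """Return (mean, range) for each window of `window` samples.

    Windows advance by `shift` samples. Both parameters are hypothetical
    names; the source does not specify how SPINE windows its data.
    """
    features = []
    for start in range(0, len(samples) - window + 1, shift):
        chunk = samples[start:start + window]
        features.append((sum(chunk) / len(chunk), max(chunk) - min(chunk)))
    return features


def threshold_report(values, low=None, high=None):
    """Keep only the feature values that fall below `low` or above `high`."""
    reported = []
    for value in values:
        below = low is not None and value < low
        above = high is not None and value > high
        if below or above:
            reported.append(value)
    return reported


# Made-up accelerometer samples: a quiet window followed by an active one.
samples = [0, 1, 0, 1, 8, 9, 8, 9]
means = [m for m, _ in periodic_features(samples, window=4, shift=4)]
alerts = threshold_report(means, high=5)
```

In the periodic mode every window produces a report; in the conditional mode only windows whose feature crosses a threshold are reported, so a quiet sensor generates no traffic.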
All SPINE functionalities are dynamically configured over the air, helping to achieve the runtime flexibility and reconfigurability we desire at the BSL and PNL. SPINE includes a base station component, and allows the use of a Nokia N800 or PC at the PNL. The N800 allows the wearable system to be portable, and allows for the integration of GPS and other environmental sensors. Our experience has shown that integration of various commercial devices, such as the N800, requires considerable effort. DexterNet aims to maximize the utility of such efforts.
Each of our applications, including avatar visualization, action recognition and asthma studies, is built on top of the SPINE base station API. They each configure the appropriate sensors and on-node signal processing according to their specific goals and requirements. These applications need not depend on any specific sensor mote code or directly interact with the BSL. This ensures that all applications benefit from enhancements made at the BSL and PNL of the system. These enhancements can include, and have included, improvements in robustness, capacity, and energy consumption. Furthermore, developers can work simultaneously on application-level software and on SPINE software without the tight coupling required in traditional application-specific sensor network systems. In addition, it is worth noting that the use of a mobile base station such as the N800 reduces the burden on each wearable node when implementing privacy- and security-preserving features. One can use the higher capacity of the base station to manage the privacy requirements across geographic locations and to authenticate various individuals, a scenario not possible with a multi-person BSL or a hybrid body and environmental sensor environment.
3. DESIGN OF BODY SENSORS
3.1 Motion Sensors
DexterNet supports the deployment of multiple motion sensor nodes placed at different body locations (see Figure 2), which communicate with a base station. The sensor nodes and the base station are built using TelosB boards. TelosB runs TinyOS on an 8 MHz microcontroller with 10 KB of RAM and communicates using the 802.15.4 wireless protocol. Each custom-built sensor board, which is attached to a TelosB (shown in Figure 4), has a triaxial accelerometer and a biaxial gyroscope. Each axis is reported as a 12-bit value to the node, indicating values in the range of ±2g and ±500°/s for the accelerometer and gyroscope, respectively. The battery life under continuous measurement and wireless raw data output is approximately 20 hours.
Figure 4: Illustration of the motion sensor node. The
sensor board on the top is a custom-built motion sensor
with a triaxial accelerometer and a biaxial gyroscope.
The middle layer is a Li-ion battery. The sensor board
on the bottom is a standard TelosB network node.
The current hardware design of the sensor contributes a certain amount of measurement error. The accelerometers typically require some calibration in the form of a linear correction, as sensor output under 1g may be shifted by up to 15% in some sensors. It is also worth noting that the gyroscopes produce an indication of rotation under straight-line motions. Fortunately, these systematic errors appear to be consistent across experiments for a given sensor board. However, without calibration to correct them, the errors may affect the action recognition if different sets of sensors are used interchangeably in the experiment.2
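A linear correction of this kind can be sketched in a few lines. The example below is a hedged illustration, not the calibration procedure used by the authors: it assumes that each 12-bit value is an unsigned count mapped linearly onto the full-scale range, and that the correction is fitted from two reference readings taken with the axis pointing along and against gravity. The function names are hypothetical.

```python
ADC_MAX = 4095  # largest unsigned 12-bit count


def counts_to_units(count, full_scale):
    """Map a 12-bit count linearly onto [-full_scale, +full_scale].

    Assumption: count 0 is -full_scale and count 4095 is +full_scale.
    Use full_scale=2.0 for the accelerometer (g) and 500.0 for the
    gyroscope (degrees per second).
    """
    if not 0 <= count <= ADC_MAX:
        raise ValueError("count outside the 12-bit range")
    return (2.0 * count / ADC_MAX - 1.0) * full_scale


def fit_linear_correction(reading_at_plus_1g, reading_at_minus_1g):
    """Return (gain, offset) so that gain * reading + offset hits +1 g and -1 g."""
    gain = 2.0 / (reading_at_plus_1g - reading_at_minus_1g)
    offset = 1.0 - gain * reading_at_plus_1g
    return gain, offset


def calibrate(reading, gain, offset):
    """Apply the fitted linear correction to one reading (in g)."""
    return gain * reading + offset


# Hypothetical board whose axis reads 1.15 g upright and -0.85 g inverted.
gain, offset = fit_linear_correction(1.15, -0.85)
corrected = calibrate(1.15, gain, offset)
```

Because the errors are reported to be consistent for a given board, one (gain, offset) pair per axis per board would in principle be enough to make boards interchangeable.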
3.2 Biosensors
The biosensor is capable of measuring acceleration, electrocardiogram (ECG), and electrical impedance pneumography (EIP) through four small electrodes connected to the side of the ribcage of the subject, as shown in Figures 5 and 6. The ECG signal is used to derive heart rate and heart rate variability (HRV). The EIP signal is produced by respiration and can be used to derive a variety of breathing-related parameters such as respiration rate, minute ventilation volume, flow/volume curve, and inspiration/expiration times.
2 More sophisticated motion sensors do exist in the industry, which can utilize heterogeneous sensor fusion techniques to self-calibrate the accelerometer and gyroscope. One example is the Microstrain Gyro Enhanced Orientation Sensor at: http://www.microstrain.com/.
Figure 5: Illustration of the biosensor board (top) con-
nected to the TelosB network node (bottom). The middle
layer is a Li-ion battery. The white connector on the left
side is for the four skin electrodes.
Figure 6: The electrode locations are chosen to obtain a
good signal-to-artefact ratio (SAR) in both EIP and ECG
signals. Electrodes in both pairs are placed vertically
next to each other. The front-end pair is placed vertically
right below the pectoralis major and horizontally in the
middle between the side of the body and mid axillary
line. The back-end pair is placed on the same vertical and
horizontal location to create a sensitivity field through
and around the left lung.
Single-channel ECG measurement is quite straightforward. The main challenge is breathing measurement with the EIP technique, especially for volumetric parameters. So far we have tested the accuracy of the system during ergometer and running exercises [16, 17]. The accuracy of breathing minute volume assessment was degraded by intense motion of the body during running, but results with an average relative error of 11% were still obtained. The effect of different electrode placements on susceptibility to movement error has also been studied [9].
The biosensor runs real-time signal processing algorithms that detect events in heart and respiration signals and calculate physiological parameters from them. The requested parameters are sent to the base station through the SPINE framework. This reduces the amount of network traffic compared with sending raw data and enables the higher sampling rate needed for accurate HRV analysis. The ability to assess breathing-related parameters distinguishes this sensor from most similar projects. Together, cardiac and pulmonary measurements provide data that can be used to derive high-level physiological parameters related to the physical and mental state of the subject.
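To illustrate the kind of parameter that can be sent in place of raw ECG samples, the sketch below derives heart rate and one common HRV statistic from R-peak timestamps. It is an assumption-laden example rather than the biosensor's firmware: the source does not say which HRV measure is computed, so RMSSD is used here as a stand-in, and R-peak detection itself is taken as given.

```python
from math import sqrt


def rr_intervals(r_peak_times):
    """Successive differences between R-peak timestamps, in seconds."""
    return [b - a for a, b in zip(r_peak_times, r_peak_times[1:])]


def heart_rate_bpm(rr):
    """Mean heart rate in beats per minute from R-R intervals in seconds."""
    return 60.0 * len(rr) / sum(rr)


def rmssd_ms(rr):
    """Root mean square of successive R-R differences, in milliseconds.

    RMSSD is a standard short-term HRV statistic; using it here is an
    assumption, not something stated in the source.
    """
    diffs = [(b - a) * 1000.0 for a, b in zip(rr, rr[1:])]
    return sqrt(sum(d * d for d in diffs) / len(diffs))


# Made-up R-peak times (seconds) for a short recording.
peaks = [0.0, 0.8, 1.8, 2.6, 3.6]
rr = rr_intervals(peaks)
summary = (heart_rate_bpm(rr), rmssd_ms(rr))
```

A node that transmits only `summary` sends two numbers per analysis window, however high the ECG sampling rate is, which is the traffic reduction the paragraph above describes.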
The biosensor is connected to a TelosB and has the same form factor as the motion sensor. The battery life under continuous measurement and raw data output is approximately 20 hours, similar to the motion sensor.
3.3 Other Compliant Sensors
The heterogeneity of DexterNet allows a wide variety of sensors and motes to be integrated into the system. The SPINE framework provides support for the Intel SHIMMER and the MICAz motes as well as any sensors that can be attached to them. The SHIMMER has an onboard accelerometer, a MicroSD slot, and analog-to-digital converters for attaching external sensors. The MICAz has many sensors available as add-ons, including GPS, humidity, barometric pressure, ambient light, sound, and magnetometer sensors.
Our current system has a Bluetooth GPS sensor that directly interfaces with SPINE. Since the GPS unit itself is not a SPINE node, the data integration is done at a different layer than for SPINE nodes such as the motion sensor and biosensor. The GPS provides longitude and latitude coordinates, primarily in outdoor environments, at a rate of 1 Hz.
4. THE SPINE FRAMEWORK
SPINE (Signal Processing In Node Environment)3 is an open-source framework for distributed signal processing algorithms in wireless sensor networks (WSNs). The functional architecture of SPINE is shown in Figure 7. It provides a set of on-node services that can be tuned and activated by the user depending on different application needs. The open-source framework speeds up the design of WSN applications through high-level abstractions and provides support to quickly explore implementation tradeoffs through fast prototyping. SPINE also provides an efficient wireless communication protocol for dynamic network configuration and management. Most importantly, SPINE allows all applications to immediately benefit from changes in the SPINE framework that may improve robustness, security, and energy efficiency.
3 The SPINE software is available for download at http://spine.tilab.com
Figure 7: The SPINE functional architecture.
The SPINE framework has two main modules, one for the sensor node side, and the other for the server/base-station side. The node module is developed in the TinyOS 2.x environment. It provides the following three on-node service components: 1. Communication. 2. Sensing. 3. Signal processing. Accordingly, the source code of the module is organized in a similar manner. In the communication component, SPINE utilizes a time division multiple access (TDMA) protocol to avoid packet collisions, allocating to each node a specific time slot during which to transmit data. All sensor drivers implemented in the sensing component appear uniform to the signal processing and communication components. As a result, any new sensor driver will be immediately available for all processing components. The signal processing component is similarly modular. These design features make it easy to extend the SPINE framework, and allow various team members to develop different parts of the framework simultaneously. SPINE sensing and processing functionalities are dynamically configured through over-the-air messaging. This allows each application supported by the system to reconfigure the SPINE network as desired, quickly and easily.
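The slot allocation idea behind TDMA can be illustrated with a small sketch. The details below are assumptions for the example only: the source does not give SPINE's frame length, slot length, or how slots are assigned, so the sketch simply divides a fixed frame into equal slots, one per node.

```python
def slot_start(node_index, num_nodes, frame_ms, frame_number=0):
    """Start time (ms) of a node's transmit slot within the given frame.

    Assumes equal-length slots and a fixed frame of `frame_ms`
    milliseconds; both are hypothetical simplifications.
    """
    if not 0 <= node_index < num_nodes:
        raise ValueError("node_index out of range")
    slot_ms = frame_ms / num_nodes
    return frame_number * frame_ms + node_index * slot_ms


def owner_of(time_ms, num_nodes, frame_ms):
    """Index of the node that is allowed to transmit at `time_ms`."""
    slot_ms = frame_ms / num_nodes
    return int((time_ms % frame_ms) // slot_ms)


# Five body-worn nodes sharing a hypothetical 100 ms frame.
starts = [slot_start(i, num_nodes=5, frame_ms=100) for i in range(5)]
```

Since every instant belongs to exactly one node, two nodes never transmit at the same time, which is how the protocol avoids collisions without carrier sensing or retransmission.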
The server module is implemented in Java SE and acts as the coordinator of a sensor network. It consists of functionalities that activate and control on-node services depending on the application requirements. The implementation does not use any TinyOS-specific APIs and can run independently of the underlying protocol stack (e.g., the ZigBee network). This has allowed the use of a Nokia N800 tablet as a handheld base station for the wearable sensor network. The N800 provides a platform for GPS sensing through Bluetooth, and Wi-Fi connectivity to allow forwarding of data to the GNL. The use of a handheld base station allows the realization of a body sensor network that can operate both inside the home and outdoors, a key feature for supporting a wide variety of human monitoring applications.
5. APPLICATIONS AND EVALUATION
5.1 Avatar
We first demonstrate an application called Avatar, which uses a network of motion sensors on the human body to reconstruct and visualize the wearer’s full-body motion in real time. The application can be used to remotely monitor and assess the well-being of elderly people living alone. It can also be used in tele-healthcare for physicians to remotely record and visualize the movements of patients. Avatar provides much of the same information about activity that could be captured by video, but does so while providing a considerably higher level of privacy for the monitored person. This is quite important because it is unlikely that the average person would be willing to accept continuous video surveillance of their home. Additionally, Avatar has the benefit of being derived from wearable sensors, and so is portable. For the purpose of visualizing motion, a configuration of five nodes (one on each leg, one on each arm and one on the torso) is the minimum required, and it is the configuration used in this section. To provide finer measurement of the full-body movement, more sensor nodes can be worn by a person.
Avatar makes use of the Java Monkey Engine (jME) [1] and a physics plug-in [2] to render and animate a graphical human avatar. jME allows us to create an underlying skeleton with joints, then use sensor data to continuously change the orientation of this skeleton.
Through SPINE, each node estimates the pitch and roll of its orientation in space and reports this pair of values to the base station. The orientation in space of a single sensor node is computed based on the apparent direction of gravity as seen by the sensor board’s accelerometer. When considered as a vector, the accelerometer reading is the vector sum of gravity and the acceleration resulting from movement of the sensor board. Under relaxed motion, the motion component of the vector is less than 10% of the magnitude of the gravity vector. As a result, this motion component is neglected and we continuously interpret the direction of the accelerometer vector as the direction of gravity. Although the sensor board’s gyroscope would presumably be beneficial in separating the gravity and motion vector components, in practice the gyroscopes indicate rotation even while the sensor is moved in a strictly translational (and non-rotational) path. As a result, the gyroscopes are currently ignored. This method is stateless and does not accumulate error, as would occur if accelerometer or gyroscope data were integrated to estimate velocity and displacement.
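The pitch and roll estimate described above follows from treating one accelerometer sample as the gravity vector. The sketch below shows one common formulation; the axis convention (x forward, y left, z up, so that a board lying flat reads (0, 0, +1 g)) and the function name are assumptions, since the source does not specify how the board's axes are oriented.

```python
from math import atan2, degrees, sqrt


def pitch_roll_degrees(ax, ay, az):
    """Estimate pitch and roll from one accelerometer sample.

    The sample is interpreted as the direction of gravity, so any motion
    component is ignored. Assumed convention (not from the source):
    x forward, y left, z up; a flat board reads (0, 0, +1 g).
    """
    pitch = degrees(atan2(-ax, sqrt(ay * ay + az * az)))
    roll = degrees(atan2(ay, az))
    return pitch, roll


# A board lying flat, and the same board rolled onto its side.
flat = pitch_roll_degrees(0.0, 0.0, 1.0)
on_side = pitch_roll_degrees(0.0, 1.0, 0.0)
```

Only the direction of the vector matters, so the function works equally on raw counts with the offset removed or on values in g. Each call uses a single sample and keeps no history, which is why the estimate does not drift.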
A snapshot of the output from Avatar is shown in Figure 8 with the physics skeleton in view. The yellow bars indicate the axes of the various skeleton joints, and gravity vectors are shown in red. At every frame, the orientation of the skeleton is compared to the data from the sensor nodes. A simulated force is then applied to each sensed body part to push it toward the orientation reported by the sensor. The force is such that the physics skeleton tracks the motion of the wearer, but is limited by the joints of the skeleton.
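The per-frame tracking step can be approximated by a damped spring acting on each joint angle. The sketch below is a one-dimensional stand-in written for illustration; it does not use jME or its physics plug-in, and the stiffness, damping, unit inertia, and joint limits are made-up values.

```python
def clamp(value, low, high):
    """Restrict `value` to the closed interval [low, high]."""
    return max(low, min(high, value))


def step_joint(angle, velocity, target, dt,
               stiffness=40.0, damping=12.0, limits=(-90.0, 90.0)):
    """Advance one joint angle (degrees) by one frame toward `target`.

    A spring-like torque pulls the joint toward the orientation reported
    by the sensor, and a damping term keeps it from oscillating. Unit
    inertia is assumed; all constants are hypothetical.
    """
    torque = stiffness * (target - angle) - damping * velocity
    velocity += torque * dt
    angle = clamp(angle + velocity * dt, *limits)
    if angle in limits:
        velocity = 0.0  # the joint has hit its limit and stops there
    return angle, velocity


# Track a sensor that reports 45 degrees, at 60 frames per second.
angle, velocity = 0.0, 0.0
for _ in range(600):
    angle, velocity = step_joint(angle, velocity, target=45.0, dt=1 / 60)
```

Driving the skeleton with a force rather than setting the angle directly means an implausible sensor reading can only push a limb as far as its joint limit allows.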
Figure 8: A screen shot for the Avatar with overlaid
image of the wearer.
5.2 Action Recognition
In addition to using graphical avatars to visualize and analyze human poses and movements, another application of DexterNet is human action/activity recognition. Traditionally, human action recognition has been extensively studied in computer vision using camera sensors placed in an (indoor) environment where human users reside. Compared with these high-power, high-bandwidth camera systems, body sensor networks such as DexterNet have several distinct advantages: 1. Body sensor systems do not require instrumenting the environment with cameras or other sensors. 2. Body sensor systems also have the necessary mobility to support persistent monitoring of a subject during her daily activities in both indoor and outdoor environments. 3. With the continuing miniaturization and integration of mobile processors and wireless sensors, it has become possible to manufacture body sensor systems that can densely cover the human body to record and analyze very small movements (e.g., breathing and spine movements) with higher accuracy than most extant vision systems. Such action recognition systems have been used in applications such as medical-care monitoring, athlete training, tele-immersion, and human-computer interaction. For a detailed survey of the literature, the reader is referred to [23].
We have constructed an open-source benchmark database for human action recognition using the DexterNet system, called the Wearable Action Recognition Database (WARD). The purpose of WARD is to offer a public and relatively stable data set for quantitative comparison of existing and future algorithms for human action recognition using body motion sensors. The database has been carefully constructed under the following conditions:
1. The database contains sufficient numbers of human subjects with a large range of age differences.
2. The designed action classes are general enough to cover most typical activities that a human subject is expected to perform in her daily life.
3. The locations of the wearable sensors are selected to be practical for full-fledged commercial systems.
4. The sampled action data contain sufficient variation, measurement noise, and outliers in order for existing and future algorithms to meaningfully examine and compare their performance.
The data are sampled from 7 female and 13 male subjects (20 subjects in total) with ages ranging from 19 to 75. For more details about the data collection, please refer to the human subject protocol included in the WARD database. The database also includes a MATLAB program to visualize the action data measured from the five motion sensors (Figure 9).4
Figure 9: A MATLAB program that interfaces with the TelosB base station via the serial port. The program can receive, record, and replay accelerometer and gyroscope data from a network of motion sensors.
Footnote 4: The WARD database is available for download at: http://www.eecs.berkeley.edu/~yang/software/WAR/.
We have proposed a distributed recognition algorithm to classify human actions using the low-bandwidth motion sensors [22, 23]. These actions include transient actions, e.g., bending, lying down, and standing up; and continuous actions, e.g., walking, running, turning, and going upstairs. The algorithm classifies human actions using a set of training motion sequences as prior examples. It is also capable of rejecting outlying actions that are not in the training categories. The classification operates in a distributed fashion on individual sensor nodes and a base station computer. More importantly, the algorithm is robust and adapts on the fly to changes in the set of active sensors in a body network caused by either sensor failure or network congestion. The recognition precision degrades gracefully as smaller subsets of active sensors are used. The accuracy of the framework is validated using the WARD database.
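The two properties described above, rejecting actions outside the training categories and tolerating a changing set of active sensors, can be illustrated with a small sketch. This is not the algorithm of [22, 23]; it is a plain nearest-neighbor classifier used as a stand-in, and the function name, the per-sensor feature vectors, and the rejection threshold `reject_dist` are all hypothetical.

```python
import math

def classify(sample, training, active, reject_dist):
    """Nearest-neighbor stand-in (NOT the method of [22, 23]).

    sample:      dict sensor_id -> feature list for the action to classify
    training:    list of (label, dict sensor_id -> feature list)
    active:      sensor ids currently reporting (may shrink at run time)
    reject_dist: hypothetical threshold; farther matches are rejected
    Returns the best label, or None when the sample is treated as an outlier.
    """
    usable = [s for s in active if s in sample]
    if not usable:
        return None
    best_label, best_dist = None, math.inf
    for label, example in training:
        sq, n = 0.0, 0
        # Compare only the sensors that are active right now.
        for s in usable:
            for a, b in zip(sample[s], example[s]):
                sq += (a - b) ** 2
                n += 1
        # Normalize by feature count so thresholds stay comparable
        # when the number of active sensors changes.
        dist = math.sqrt(sq / n)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= reject_dist else None

training = [
    ("walk",  {"waist": [1.0, 0.0], "ankle": [2.0, 0.0]}),
    ("stand", {"waist": [0.0, 0.0], "ankle": [0.0, 0.0]}),
]
sample = {"waist": [0.9, 0.1], "ankle": [1.8, 0.0]}
full = classify(sample, training, ["waist", "ankle"], reject_dist=0.5)
degraded = classify(sample, training, ["waist"], reject_dist=0.5)
outlier = classify({"waist": [5.0, 5.0], "ankle": [5.0, 5.0]},
                   training, ["waist", "ankle"], reject_dist=0.5)
```

Normalizing the distance by the number of features in use is one simple way to keep a single rejection threshold meaningful when sensors drop out; the published method may handle this differently.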
5.3 Public Health
DexterNet has many applications within the field of public health, where the ability to objectively monitor the activity patterns of users may improve understanding of exposures to environmental hazards such as air pollution that are associated with asthma attacks, chronic obstructive pulmonary disease (COPD), cardiovascular disease, as well as premature mortality. The addition of the biosensor data provides a mechanism to monitor physiological responses to such exposures in real time that may be predictive of severe disease events (e.g., an asthma attack).
The inclusion of geographic location data from the GPS is also important for such applications. In the past, spatial epidemiologic studies have relied upon rather crude measures of location when describing a person’s exposure to environmental hazards such as air pollution. For example, some studies have simply used residential locations as a proxy for a person’s location. But in reality, individuals are mobile and have activity patterns that may include time away from home, at work, running errands, and exercising and playing. A system that allows for continual monitoring of an individual’s location may greatly improve the assessment of exposures for such epidemiologic studies.
To evaluate the DexterNet system for such applications, we conducted a field experiment in which the system was used to collect and process an integrated set of data related to an individual’s outdoor experience. The experiment consisted of a series of prescribed walks. A convenience sample of six adults (five male and one female) was asked to walk a 2.4 km route. The walk included sections that were uphill, downhill, and flat, as well as sections along a busy roadway, through a downtown commercial/retail area, and along a calmer path through a university campus. Over the course of the walk, various sensor data were logged, including triaxial accelerometry and biaxial gyroscopy (at the left wrist, waist, and left ankle positions), GPS location, and air pollution (airborne particulate matter ≤ 2.5 µm in size, PM2.5). The motion data were logged at 30 Hz. GPS was logged at 1 Hz. The air pollution data were logged separately using a Met One Aerocet 531, a handheld particle counter that takes 2-minute samples continuously during the walk. These data were combined and processed to ascertain specific information on the individual’s experience (e.g., assessing the magnitude of physical activity in certain geographic locations, or the air pollution in each location where heavy activity occurred).
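Combining streams logged at 30 Hz, 1 Hz, and once per 2 minutes requires putting them on a common time base. The sketch below shows one possible approach, assuming all three logs carry timestamps in seconds on a shared clock; the function name, record layouts, and the choice of the 1 Hz GPS fixes as the reference are assumptions for illustration, not the processing actually used in the experiment.

```python
def align_to_gps(gps, motion, pm, pm_window_s=120.0):
    """Attach motion and PM2.5 readings to each 1 Hz GPS fix.

    gps:    list of (t, lat, lon), sorted by time, roughly 1 Hz
    motion: list of (t, magnitude), sorted by time, roughly 30 Hz
    pm:     list of (t_start, concentration), one per sampling window
    All layouts are hypothetical; times are seconds on a shared clock.
    """
    rows = []
    i = 0
    for t, lat, lon in gps:
        # Skip motion samples recorded before this GPS fix.
        while i < len(motion) and motion[i][0] < t:
            i += 1
        # Average the motion samples that fall in [t, t + 1 s).
        j = i
        total = 0.0
        while j < len(motion) and motion[j][0] < t + 1.0:
            total += motion[j][1]
            j += 1
        activity = total / (j - i) if j > i else None
        # Pick the PM window that covers this fix, if any.
        conc = next((c for t0, c in pm if t0 <= t < t0 + pm_window_s), None)
        rows.append({"t": t, "lat": lat, "lon": lon,
                     "activity": activity, "pm25": conc})
        i = j
    return rows

gps = [(0.0, 37.87, -122.26), (1.0, 37.87, -122.26), (120.0, 37.88, -122.26)]
motion = [(k / 30, 1.0) for k in range(30)] + [(1 + k / 30, 3.0) for k in range(30)]
pm = [(0.0, 12.0), (120.0, 20.0)]
rows = align_to_gps(gps, motion, pm)
```

Each output row then pairs a location with the activity level and pollutant reading at that moment, which is the form needed to ask where heavy activity coincided with high PM2.5.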
As an example, Figure 10 illustrates the GPS trace of the walking route. The application determines the changes in elevation during the walk from the GPS data. A motion sensor at the waist was used to derive energy expenditure using the Generalized Linear Model [7]. The breathing minute ventilation is derived from the biosensor EIP signal [17]. Heart rate is obtained from the biosensor ECG signal using a simple R-peak detection algorithm.
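The text does not specify which R-peak detector is used. As a rough illustration of what a simple one can look like, the sketch below marks local maxima above an amplitude threshold, enforces a refractory period between beats, and converts the mean R-R interval to beats per minute. The threshold ratio, the refractory period, and the ECG sampling rate `fs` are assumed values, not parameters taken from DexterNet.

```python
def detect_r_peaks(ecg, fs, threshold_ratio=0.6, refractory_s=0.25):
    """Return sample indices of R peaks in an ECG trace.

    ecg: list of amplitudes; fs: sampling rate in Hz (assumed known).
    threshold_ratio and refractory_s are illustrative defaults.
    """
    if not ecg:
        return []
    threshold = threshold_ratio * max(ecg)
    refractory = int(refractory_s * fs)
    peaks = []
    for i in range(1, len(ecg) - 1):
        is_local_max = ecg[i] > ecg[i - 1] and ecg[i] >= ecg[i + 1]
        if ecg[i] >= threshold and is_local_max:
            # Ignore candidates too close to the previous beat.
            if not peaks or i - peaks[-1] > refractory:
                peaks.append(i)
    return peaks

def heart_rate_bpm(peaks, fs):
    """Mean heart rate from the average R-R interval, or None."""
    if len(peaks) < 2:
        return None
    mean_rr_s = (peaks[-1] - peaks[0]) / (len(peaks) - 1) / fs
    return 60.0 / mean_rr_s

# Synthetic trace: one unit spike every 0.8 s at fs = 100 Hz.
fs = 100
ecg = [0.0] * 1000
for idx in range(50, 1000, 80):
    ecg[idx] = 1.0
peaks = detect_r_peaks(ecg, fs)
bpm = heart_rate_bpm(peaks, fs)
```

A fixed global threshold is fragile under baseline drift and motion artifacts, which are common in wearable recordings; a practical detector would usually band-pass filter the signal and adapt the threshold over time.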
The GPS data were also used to map PM2.5 concentrations from the Aerocet monitor during the walk (Figure 11). Data from three participants illustrate less spatial variability in pollutant concentrations than interday variability. We note that one of the days (the right panel of Figure 11) corresponds to a “Spare the Air Day”, a day when an elevated air pollution warning (typically for ozone rather than PM2.5) was issued by the Bay Area Air Quality Management District to the general public. From these data it is possible to derive an individual’s average and cumulative air pollution exposures and physiologic response for use in long-term epidemiologic studies.
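Average and cumulative exposure can be computed directly from the sequence of fixed-duration samples. The sketch below assumes each sample is reduced to a (duration in minutes, concentration) pair; the function name and the concentration unit are assumptions, and the cumulative value is simply concentration integrated over time.

```python
def exposure_summary(samples):
    """Time-weighted average and cumulative exposure.

    samples: list of (duration_min, concentration) pairs, e.g. one
    pair per 2-minute monitor sample. The concentration unit is an
    assumption; cumulative exposure is in (unit x minutes).
    """
    total_min = sum(d for d, _ in samples)
    cumulative = sum(d * c for d, c in samples)
    average = cumulative / total_min if total_min > 0 else None
    return {"minutes": total_min, "cumulative": cumulative, "average": average}

summary = exposure_summary([(2, 10.0), (2, 20.0), (2, 30.0)])
```

Weighting by duration matters once samples of unequal length are mixed, for example when walks from several days are pooled.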
6. DISCUSSION AND FUTURE DIRECTIONS
In this paper, we have discussed DexterNet, a novel platform for heterogeneous body sensor networks. The key tenets of DexterNet are twofold: 1. It promotes an open-source sensor environment that supports limited on-node computation, robust sensor communication, and online reconfigurable network management. 2. The platform is versatile enough to support a variety of existing body sensors and other future sensors that comply with the SPINE specifications. Through a hierarchy of three network layers, it removes the dependency of higher-level applications on the implementation of wireless body sensors and communication protocols.
One advantage of the DexterNet system is its low cost compared to existing commercial systems, which are more expensive and do not necessarily support open-source development. Currently, our system is limited by the choice of off-the-shelf components (e.g., the TelosB mote and the N800), which in their current stages of development may not offer the most convenient form factor and attractive packaging to make large-scale and long-term use practical. However, this limitation can be easily addressed by migrating to other commercial components, at the expense of increased manufacturing cost. We are currently exploring new solutions to improve our system for future use.
There are numerous potential services that may be implemented through DexterNet, especially in the area of preventive healthcare. For example, it is possible to communicate the data to an electronic medical records or personal health records system, such that a history of a person’s activity and exposures may be used to improve diagnosis of health conditions. It is also possible through the classification algorithms described to identify conditions that are predictive of asthma attacks and warn users to reduce physical activity and/or move indoors. Such systems can create maps of microscale air pollution when they are deployed in sufficiently large numbers. Currently, only regional air pollution maps are available from the sparsely located fixed-site monitoring that regulatory agencies implement.
The hierarchical design of DexterNet also provides attractive solutions to protect the wearer’s privacy, which is mandated by the 1996 Health Insurance Portability and Accountability Act (HIPAA). Work on private-key and public-key cryptography schemes for sensor networks is applicable, but must be integrated into an appropriate authentication and authorization framework. In addition to using cryptography to protect the privacy of data, it is important to consider other security attacks, such as injection of anomalous data and illegal data exfiltration (e.g., covert channel communications [4]). Authentication, key establishment, robustness to denial-of-service attacks, secure routing, and node capture are some of the security challenges in wireless sensor networks. In the case of BSN, these issues appear even more serious given the limited bandwidth, power supply, storage, and computational resources of the platform. When implementing privacy- and security-preserving features becomes critical to certain high-level applications, the use of a mobile computer station such as the N800 at the personal network layer reduces the burden on each wearable node.
Acknowledgments
The authors would like to thank Dr. Marco Sgroi at the WSN Lab Berkeley, Dr. Yuan Xue at Vanderbilt University, and Dr. Roozbeh Jafari at the University of Texas, Dallas, for their valuable suggestions and literature references.
7. REFERENCES
[1] Java monkey engine (http://www.jmonkeyengine.com/), September 2008.
[2] jmephysics (https://jmephysics.dev.java.net/), September 2008.
[3] P. Barralon, N. Vuillerme, and N. Noury. Walk detection with a kinematic sensor: Frequency and wavelet comparison. In Proceedings of the 28th IEEE EMBS Annual International Conference, pages 1711–1714, 2006.
Figure 10: GPS trace of campus walk with derived information from GPS, motion sensor and biosensor. Circles on the map indicate elapsed time in minutes. Panels, each plotted against elapsed time (0 to 30 minutes): Elevation (meters), Energy Expenditure (KJ per minute), Heart Rate (beats per minute), Breathing Minute Ventilation (liters/minute).
[4] National Computer Security Center. A guide to understanding covert channel analysis of trusted systems. In NCSC-TG-030, Covert Channel Analysis of Trusted Systems (Light Pink Book), United States Department of Defense (DoD) Rainbow Series, 1993.
[5] R. Chakravorty. A programmable service architecture for mobile medical care. In Proceedings of the IEEE International Conference on Pervasive Computing and Communications Workshop, 2006.
[6] J. Chen, K. Kwong, D. Chang, J. Luk, and R. Bajcsy. Wearable sensors for reliable fall detection. In Proceedings of the IEEE Engineering in Medicine and Biology Conference, pages 3551–3554, 2005.
[7] K. Chen and M. Sun. Improving energy expenditure estimation by using a triaxial accelerometer. Journal of Applied Physiology, 83:2112–2122, 1997.
[8] T. Degen, H. Jaeckel, M. Rufer, and S. Wyss. SPEEDY: A fall detector in a wrist watch. In Proceedings of the IEEE International Symposium on Wearable Computers, pages 184–187, 2003.
[9] O. Lahtinen, V-P. Seppa, J. Vaisanen, and J. Hyttinen. Optimal electrode configurations for impedance pneumography during sports activities. In Proceedings of the 4th European Congress for Medical and Biomedical Engineering, 2008.
[10] D. Malan, T. Fulford-Jones, M. Welsh, and S. Moulton. CodeBlue: An ad hoc sensor network infrastructure for emergency medical care. In Proceedings of the International Workshop on Wearable and Implantable Body Sensor Networks, 2004.
Figure 11: GPS traces of campus walks for 3 participants, with geocoded PM2.5 measurements (circles), suggesting less spatial variability than interday variability. Note the right panel illustrates the data for a “Spare the Air Day”, a day when an elevated air pollution warning was issued by the Bay Area Air Quality Management District to the general public.
[11] A. Milenkovic, C. Otto, and E. Jovanov. Wireless sensor networks for personal health monitoring: Issues and an implementation. Computer Communications (in press), 2006.
[12] M. Moron, E. Casilari, R. Luque, and J. Gazquez. A wireless monitoring system for pulse-oximetry sensors. In Proceedings of the 2005 Systems Communications, 2005.
[13] N. Oliver and F. Flores-Mangas. HealthGear: A real-time wearable system for monitoring and analyzing physiological signals. In Proceedings of the International Workshop on Wearable and Implantable Body Sensor Networks, pages 61–64, 2006.
[14] I. Pappas, T. Keller, S. Mangold, M. Popovic, V. Dietz, and M. Morari. A reliable gyroscope-based gait-phase detection sensor embedded in a shoe insole. IEEE Sensors Journal, 4(2):268–274, 2004.
[15] C. Sadler and M. Martonosi. Data compression algorithms for energy-constrained devices in delay tolerant networks. In Proceedings of the ACM Conference on Embedded Networked Sensor Systems, pages 265–278, 2006.
[16] V-P. Seppa, J. Vaisanen, P. Kauppinen, J. Malmivuo, and J. Hyttinen. Measuring respirational parameters with a wearable bioimpedance device. In Proceedings of the 13th International Conference on Electrical Bioimpedance, 2007.
[17] V-P. Seppa, J. Vaisanen, O. Lahtinen, and J. Hyttinen. Assessment of breathing parameters during running with a wearable bioimpedance device. In Proceedings of the 4th European Congress for Medical and Biomedical Engineering, 2008.
[18] A. Sixsmith and N. Johnson. A smart sensor to detect the falls of the elderly. Pervasive Computing, pages 42–47, 2004.
[19] The SPINE Team. The spine manual version 1.2. Technical report, Telecom Italia Lab, 2008.
[20] G. Williams, K. Doughty, K. Cameron, and D. Bradley. A smart fall and activity monitor for telecare applications. In Proceedings of the IEEE International Conference in Medicine and Biology Society, 1998.
[21] A. Wood, G. Virone, T. Doan, Q. Cao, L. Selavo, Y. Wu, L. Fang, Z. He, S. Lin, and J. Stankovic. ALARM-NET: Wireless sensor networks for assisted-living and residential monitoring. Technical report, Department of Computer Science, University of Virginia, 2006.
[22] A. Yang, R. Jafari, P. Kuryloski, S. Iyengar, S. Sastry, and R. Bajcsy. Distributed segmentation and classification of human actions using a wearable sensor network. In Proceedings of the CVPR Workshop on Human Communicative Behavior Analysis, 2008.
[23] A. Yang, R. Jafari, S. Sastry, and R. Bajcsy. Distributed recognition of human actions using wearable motion sensor networks. Submitted to Journal of Ambient Intelligence and Smart Environments, 2008.
[24] J. Yick, B. Mukherjee, and D. Ghosal. Wireless sensor network survey. Computer Networks, 52(12):2292–2330, 2008.