People Tracker Report
7/28/2019
TRACKING PEOPLE FROM A STATIONARY CAMERA
Marc Kelly Robins, Arvind Antonio de Menezes Pereira and Abhay V. Nadkarni
Department of Electrical Engineering, USC
ABSTRACT
An algorithm for tracking people based on the color statistics of their attire is described. People are identified as moving objects using a combination of background subtraction and edge detection. Tracking boxes are fitted around people using histograms of large blob masses. The tracking itself relies on the unique color statistics of each individual. Two major methods of using color statistics are discussed. Method I showed excellent results with two people in a frame; Method II showed better results in terms of speed, robustness and consistency.
INTRODUCTION
Applications in tracking people via real-time systems are prevalent in modern society, ranging
anywhere from surveillance to event recognition. Following the research of McKenna et al.
[2], we developed an algorithm for real-time people tracking using the C6416 DSK by Texas
Instruments. First, adaptive background subtraction is performed using both changes in
intensity and edge detection. Then we aggregate neighboring foreground pixels to constitute
moving objects. Finally, individual people are identified by their unique chromatic features
and followed from one frame to the next. During the design of the system certain assumptions
were made about the environment. Since the algorithm begins with foreground segmentation,
the camera must remain stationary. Due to the low frame rate, motion must be moderately
slow and continuous. Furthermore, the background should remain effectively static. Implementing
this system on the C6416 involved certain limitations. For example, the C6416 uses fixed
point processing, which requires careful attention to numerical precision in parts of the
algorithm. Also, the video equipment available for this project incurred random frames of
impulsive noise, which we found insurmountable. The following report presents our
algorithm, code optimizations, and final results.
GUIDING PRINCIPLES USED IN ALGORITHM DESIGN
1. People moving in the video sequence: In a video sequence, our algorithm identifies people
by motion detection as opposed to feature detection (Corner detection, SIFT) [4,5] or template
matching [6]. These methods are quite computationally intensive when compared to our
simplistic method. In our method, we detect a moving person by noting changes in the mean
values of pixels corresponding to the entire frame. Once the values cross a particular threshold,
we consider that a person has moved.
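The frame-mean test described above can be sketched as follows. This is a hypothetical illustration, not the fixed-point C6416 implementation; frames are assumed to be 2-D lists of luminance values, and the threshold value is ours, not the report's.

```python
MOTION_THRESHOLD = 2.0  # illustrative threshold on the change in the frame mean

def frame_mean(frame):
    """Mean pixel value over the whole frame."""
    total = sum(sum(row) for row in frame)
    count = len(frame) * len(frame[0])
    return total / count

def motion_detected(prev_frame, curr_frame, threshold=MOTION_THRESHOLD):
    """Flag motion when the frame mean shifts by more than the threshold."""
    return abs(frame_mean(curr_frame) - frame_mean(prev_frame)) > threshold
```

A real system would tune the threshold to the camera noise floor, as the guiding principles below suggest.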
2. People occupy a significant portion of the frame: Size is used as one criterion to reject non-
human motion and ambient noise. The algorithm should reject any small movement as noise.
Only substantial motion is considered as a person moving. For example, leaves moving in a
video sequence are not considered as motion.
3. People may stop moving for a while: Our algorithm should not rely completely on motion
detection. Temporal tracking of people is a desired trait. Tracking in our algorithm is done
using a score based mechanism wherein a person who stays in a frame has a fixed score or
incremented score. People exiting the frame gradually get a reduced score and are slowly
phased out of the screen.
4. Collected color statistics can be used for further filtering and tracking: A person is uniquely
identified by his or her attire. Each person has a unique mean chrominance value based
on what he or she is wearing. This value is used for distinguishing people from one another in the
frame.
APPROACH
We have worked on two methods to perform people tracking. Method I uses information
available in the entire video sequence. The framework and algorithm are discussed in the next
few sections. Method II is used on a sub-sampled video sequence, with the aim of
improving the tracking response. The initial stages of both methods are similar; they differ only in the
methodology of finding the color statistics. Method I uses a simplistic approach of finding the
color statistics once a person is encompassed within a box. Method II has a more sophisticated
approach and works outside the box, i.e. it searches for similar chromatic content outside the
box encompassing a person in a lateral fashion within consecutive frames.
DESCRIPTION OF THE ALGORITHM
Shown below is a flowchart of Method I with its corresponding stages. The algorithm is
explained in detail in a stage based approach in the following sections.
Fig 1. Flowchart for Method I.
Stage 1: Adaptive Background Subtraction
Adaptive background subtraction is done to isolate the foreground from the background. Since
we are tracking people in this project, we consider them to be a part of the foreground. The
mean and variance of each pixel are recursively estimated using the equations from McKenna
et al. [2, p. 4]:
μ(t+1) = α·μ(t) + (1 − α)·z(t+1)                                       (1)
σ²(t+1) = α·(σ²(t) + (μ(t+1) − μ(t))²) + (1 − α)·(z(t+1) − μ(t+1))²    (2)
where μ(t+1) refers to the weighted mean of the pixel at time t+1, μ(t) refers to the mean value of
the pixel in the previous frame, and z(t+1) gives the current value of the pixel. Equation 2 gives a
similar recursive update for the variance. The coefficient α is chosen to be 0.8
for our project. The background subtraction in our case is done based on the values of
luminance, i.e. in the Y channel. If the difference between the current value of the pixel and its
temporal mean value, i.e. |Y(t+1) − μ(t+1)|, is greater than 3σ(t+1), then the pixel is
considered to be a part of the foreground. Pixels that do not show changes larger than three
standard deviations are considered background pixels [2, p. 4]. The adaptive background
subtraction also includes edge detection based on a Sobel mask. The edge detection gives us the
silhouette of a person. A similar recursive update of first and second order statistics is
applied to determine moving edges within the sequence of frames. Sobel edge
detection was used since we expected it to be the fastest kind of edge detection. The
background subtraction in [2] is done using RGB channels. In our method, we had YCbCr as
the luminance and chrominance components. We used only the Y channel for background
subtraction because we found that the use of chroma channels was computationally intensive
while not providing an information gain that would justify their requirements. The results that
we achieved were quite good with just the Y channel.
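The per-pixel recursive update of equations (1) and (2), with the 3σ foreground test, can be sketched as below. This is a minimal floating-point illustration with α = 0.8 as in the report; the actual DSP implementation uses fixed-point arithmetic.

```python
ALPHA = 0.8  # update coefficient chosen in the report

def update_pixel_stats(mean, var, z):
    """Recursively update one pixel's mean and variance with a new sample z,
    following equations (1) and (2)."""
    new_mean = ALPHA * mean + (1.0 - ALPHA) * z
    new_var = ALPHA * (var + (new_mean - mean) ** 2) \
        + (1.0 - ALPHA) * (z - new_mean) ** 2
    return new_mean, new_var

def is_foreground(y, mean, var):
    """Pixel is foreground if its luminance deviates from the temporal mean
    by more than three standard deviations."""
    return abs(y - mean) > 3.0 * var ** 0.5
```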
Stage 2: Finding the Region of Interest
Our background subtraction routine is not expected to produce a complete silhouette of a
person. Instead it must combine segments of objects in the foreground, such as a head and
lower torso, to infer the presence of a person. McKenna et al. [2, p. 47] use connected
components to aggregate foreground pixels in the image. According to Jain et al. [1, p. 44],
connected component algorithms usually form a bottleneck in a binary vision system.
Therefore in order to maintain low processing overhead we chose to implement a simpler
method using histograms.
First, the foreground pixels are projected onto the x-axis to create a simple histogram of
horizontal motion. Small gaps in the foreground silhouette of a person are common; therefore
the histogram is smoothed using a window of length 8. Scanning from left to right, the system
looks for separable hills in the histogram to distinguish as different people. This yields a
bound for the left and right sides of each region of interest. Within the horizontal ROI, the
system creates a vertical histogram of the foreground pixels. In a similar fashion, this
histogram allows us to compute a lower and upper bound on the ROI.
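The projection-and-scan procedure above can be sketched as follows. The projection and the length-8 smoothing window come from the text; treating a "separable hill" as a maximal run of non-zero bins is our assumption about the hill-finding logic.

```python
def horizontal_histogram(mask):
    """Count foreground pixels in each column of a binary mask (list of rows)."""
    return [sum(col) for col in zip(*mask)]

def smooth(hist, window=8):
    """Sliding-window average; the window shrinks at the image edges."""
    n, out = len(hist), []
    for i in range(n):
        lo, hi = max(0, i - window // 2), min(n, i + window // 2)
        out.append(sum(hist[lo:hi]) / (hi - lo))
    return out

def find_hills(hist):
    """Return (left, right) column bounds of each maximal non-zero run."""
    hills, start = [], None
    for i, v in enumerate(hist):
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            hills.append((start, i - 1))
            start = None
    if start is not None:
        hills.append((start, len(hist) - 1))
    return hills
```

The same routines, applied to rows within a horizontal ROI, yield the vertical bounds.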
The mass of an object, or the total area of foreground pixels in a ROI, serves as the chief
discriminant between people and other moving objects. In other words, if the mass of an
object is too small the ROI is discarded. The remaining objects are considered candidates for
moving people. Many authors choose to recognize skin pigment or skeletal forms of the
human body for further distinction between people and other objects; however, in our
environment the only moving objects are people.
Stage 3: Matching People based on Nearest Neighbors
After we have a set of candidate target regions (boxes) which we would like to consider as
people, we can use the color information from the chroma channels (cb and cr) to help
uniquely identify each person, so that we can track them more accurately. In order to do this,
we compute two 16-tuple vectors, containing the averaged cb and cr values for 16 smaller
boxes within the region of interest for each target box.
Our algorithm requires that we initialize a similar set of vectors for each person we have
correctly identified in the previous stages. To ensure that we do this reliably, we begin
computing color-statistics only after at least 20 frames have elapsed. Such a time-frame allows
the background subtraction results to settle down and more accurately detect human forms in
the image sequences. We store the location of each person in a structure which associates the
locations as well as the color vectors for the ROI for that person we would like to track.
Given a set of people we are interested in tracking, and a set of candidate targets, we can then
perform a nearest neighbors classification by which we attempt to match each candidate target
to the nearest tracked person in the color-statistics sense. We do this by using an absolute-
difference distance measure between the color statistics of each candidate target and those
of each person we have been tracking.
For each candidate box, we compute its distance to each of the tracked boxes using this
distance measure.
If the minimum distances d_cb and d_cr between the box and all the tracked boxes are
smaller than a minimum threshold distance, we assign this box the same id as that of the
tracked person which it matched. We use this threshold mechanism to ensure that we are
reasonably confident that this is the person we have been tracking up to that
point in time.
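The match can be sketched as below, assuming each person and each candidate box is summarized by 16-tuple mean-cb and mean-cr vectors as described above. The threshold value is illustrative, not the one used in the project.

```python
MATCH_THRESHOLD = 40  # illustrative maximum total distance for a confident match

def sad(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def match_candidate(cand_cb, cand_cr, tracked):
    """Return the id of the nearest tracked person, or None if no one is
    within the threshold. `tracked` maps person id -> (cb_vec, cr_vec)."""
    best_id, best_d = None, None
    for pid, (cb, cr) in tracked.items():
        d = sad(cand_cb, cb) + sad(cand_cr, cr)
        if best_d is None or d < best_d:
            best_id, best_d = pid, d
    if best_d is not None and best_d <= MATCH_THRESHOLD:
        return best_id
    return None
```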
A further improvement attempts to reduce the variance in the color statistics of the
persons we are trying to track. Here we look at the center of the person rather than the
overall region.
Stage 4: Updating the Feature Vectors of Known Individuals
To accommodate the arrival of new people in the frame, and to forget people who are absent for
more than a few seconds, the system must update the feature vectors of known people.
When a new person arrives he or she is given an initial score. For every successive frame
containing this person, his or her score is incremented until the maximum score is reached. For
every frame that does not contain this person, his or her score is decreased. When a score
reaches zero the feature vector for that person is removed from memory, and that person is
forgotten. Similarly, when a person arrives who does not match the known feature vectors he
or she is added to memory. In this way new people are recognized by the system, and absent
people are forgotten.
Due to the non-uniform distribution of light across the room, our system must learn to adapt
when the chromatic values of a known person slightly change. Assuming the chromatic values
of a person are still recognizable, a weighted average is taken with the latest chromatic feature.
This operation weights the old chromatic features more heavily than the new values. In our
design of the system we kept this weight parameter on a GEL slider, usually at 0.1 or 0.2
depending on the lighting conditions.
People having scores above a particular threshold are tracked and have boxes drawn around
them in the color assigned to that person.
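The score bookkeeping and the weighted chroma update can be sketched together. WEIGHT plays the role of the GEL-slider parameter (0.1 or 0.2 in the report); the score limits are our assumptions, since the report does not give the exact values.

```python
MAX_SCORE = 10       # assumed ceiling for a person's score
INITIAL_SCORE = 3    # assumed score given to a newly seen person
WEIGHT = 0.1         # weight given to the NEW chroma values (GEL slider)

def update_score(score, seen_this_frame):
    """Increment toward MAX_SCORE when the person is seen, decrement toward
    zero otherwise; at zero the person's feature vector is forgotten."""
    if seen_this_frame:
        return min(score + 1, MAX_SCORE)
    return max(score - 1, 0)

def update_chroma(old_vec, new_vec, weight=WEIGHT):
    """Weighted average that favours the old chromatic features."""
    return [(1.0 - weight) * o + weight * n for o, n in zip(old_vec, new_vec)]
```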
RESULTS FOR METHOD I
The results we obtained for Method I are shown below. There were two main demos that were
displayed. Demo 0 shows the output for the adaptive background subtraction, so that one gets
an idea of what the output looks like after this stage. Demo 1 shows the actual output that we
obtained from Method I. If there are at most two people in a frame, the algorithm is able to
track people quite well. In our final demo, when two people entered the frame, boxes
were drawn around them. The boxes had a unique color assigned to them, by which
we identified a particular person. When people crossed each other's paths, the algorithm was
still able to track people well and maintained the same box color assigned to each person. Our
results were not good for three people, since the foreground and background become almost
indistinguishable, which prevented reliable tracking compared to the two-person
case. The frame rate is very slow in this algorithm (1 fps).
Fig 2. Output of Demo 0.
Fig 3. Output of Demo 1.
METHOD II: A FORWARD-BACKWARD APPROACH
A fusion of color statistics and adaptive background subtraction
MOTIVATION
Our previous method relied heavily upon adaptive background subtraction and upon the
assumption that people move within each frame to be detected. While this is a practical method
for detecting people, it tends to fail when people stop moving. Often people enter the frame
and stop moving, resulting in a gradual fading out of the outline of each person
being tracked, until we lose that person. This is obviously a problem, and we would like
a way to avoid relying solely upon motion detection to locate people.
Using the colors from people's clothing to track them is certainly not a new idea;
we used it to track people from frame to frame even in our preceding tracking method.
The interesting change which we felt would help us track people involved computing
color-statistics within the region which we believe contains a person (based on change-
detection as we have done earlier), and then to use this information for each tracked
person to look around their current locations to determine their exact locations in the
frame. This can be done by performing a coarse scan around the last known locations of
each tracked person in the frame, and then finding the location that produces the
minimum distance between the color-statistics from the scan and those for that person.
This will naturally speed up the process of searching for the person and produce a much
needed increase in our output frame-rate.
Another feature that we might want to add to our algorithm is the ability to update the
color statistics for each tracked person. As a person moves from a dimly lit area to one
which is brighter, the color spectrum that is reflected from him/her results in a change in
both the chrominance and luminance values. This results in a variation in the underlying
color statistics that we have gathered for that person initially. Merging the changes
gradually into that model will help us adjust to varying intensities more robustly.
This brings us to the question: when should we update the color statistics for a person?
We think a good solution would be to collect statistics from the frame by looking in a
region which should ideally show smaller variance both spatially and temporally, as well as
one which is easy to compare with statistics we might gather in the future. The easiest way
to know we have found a person is to look for a large moving blob. We have already
discussed how to develop such a detector and we can employ this to locate a person who is
moving. By looking in the center of this region and averaging the values over a decent area,
we can form a feature vector with color statistics from the two chroma channels, cb and
cr. However, we would certainly want to avoid updating these statistics initially, because we
would like our background subtraction algorithm to settle down to a stable state, at which
point of time only people (modulo camera measurement/acquisition noise) cause changes
in our frames.
Is there another situation which we might want to be wary of? We believe the answer is
yes! Although a large blob is a good indicator of a well localized person in the image
(assuming it really is a person), things could get complicated if there are two people who
are moving in the view and cross each other. This results in occlusion, and if we attempt to
update color-statistics at this point of time, we will be introducing errors in the values for
both people, which is something we certainly want to avoid.
Background subtraction is susceptible to trail effects when a person moves rapidly
between subsequent frames. This results in poorly localized boxes, producing erroneous
color averages, because we might pick up cb and cr values which are part of the background
through which the person just passed. A solution would involve modeling the speed
of the person's movement, which would allow us to know when the movement might be
producing a trail effect, during which color updates should be avoided. At the time of
writing this report we do not attempt to solve this problem.
DESCRIPTION OF THE ALGORITHM
Figure 4 contains a flowchart of the algorithm which we employ in this second method of
tracking people. The motion detection stage based on adaptive background subtraction
has been discussed in the previous algorithm. We reuse the next stage from the previous
algorithm, which locates regions of the image containing movement, to detect people
who are moving.
These locations are used to update color statistics for each person as well as to update the
locations of each tracked person during the fusion stage.
Fig 4: Flowchart for the fused people tracker.
Since many of the initial stages from the new algorithm are essentially the same as those
used earlier, we will only cover those sections which are functionally different here.
Stage 3: Color Updates
In this stage, we compute the color statistics for each location that has been detected by
the motion-detection stage. We compare these statistics with those of each tracked
person whose statistics we have already computed. If we find that these are very similar
and that the location of this person is very close to the previous known location of the
tracked person, we merge the statistics with those of that person. As a special case, we
ensure that we do not merge statistics for boxes that are very far apart
in the image, so that two people wearing similar clothing are
tracked individually, as they should be. If no match is found between the people
we have been tracking and the present one, we add this person to the list of
people we have been tracking and store the person's color statistics and location in the
structure.
Stage 4: Finding the location of people based on their color statistics
We scan around and at the last known location of each tracked person, and compute a
SAD (sum of absolute differences) score between the averaged values of each scanned
location and those from the tracked person. This gives us a set of scores, and we assume
that the location with the lowest score is the most likely to be the location of the person
based on a similar color search in the image. Another advantage of this approach is that it
constrains the search space to a smaller area, which results in faster algorithm execution.
Otherwise, we would have to scan a much larger area of the image, which would increase the
computational requirements. To ensure that we are not dealing with occlusions, we make
sure that we do not merge the color statistics of the present tracked person with its previous
statistics if another tracked person with very similar color statistics is closer
than a certain threshold. Occluded color statistics are not very helpful in reliably estimating the
location of the person in the frame.
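The coarse scan can be sketched as below: evaluate the SAD between a person's stored color statistics and those computed at a grid of offsets around the last known location, keeping the minimizing position. The radius and step are illustrative, and `stats_at` stands in for whatever routine extracts the averaged chroma values at a given box position.

```python
def coarse_scan(last_xy, person_vec, stats_at, radius=16, step=4):
    """Return the grid position around `last_xy` whose color statistics
    best match `person_vec` under a SAD score."""
    best_xy, best_d = last_xy, None
    x0, y0 = last_xy
    for dy in range(-radius, radius + 1, step):
        for dx in range(-radius, radius + 1, step):
            vec = stats_at(x0 + dx, y0 + dy)
            d = sum(abs(a - b) for a, b in zip(vec, person_vec))
            if best_d is None or d < best_d:
                best_xy, best_d = (x0 + dx, y0 + dy), d
    return best_xy
```

Because only (2·radius/step + 1)² positions are evaluated, the search stays cheap, at the cost of the jitter noted in the results.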
Stage 5: Fusion of color statistics and background subtraction
The tracked people are maintained in an array that contains the color statistics vectors
associated with each person, an id for the person, a score, and a flag that is used to indicate
if the person is being tracked. We employ a scoring mechanism such as that described
earlier to allow new people to be added to those already being tracked, as well as to forget
people who have left the camera's view. The algorithm implemented in the code which we
are submitting at the time of writing this report is assumed to implicitly fuse the
information from the outputs of our motion detection and color statistics matcher
modules through the updating of the locations of tracked people. We would like to use a
more intelligent method of performing this fusion in the future.
RESULTS FOR METHOD II
The results we obtained in Method II are shown below. The algorithm performed
better in terms of speed and accuracy. The sub-sampling of the frames sped up the entire
process quite significantly. The tracking is more consistent, but the output is jittery, since we
perform only a coarse sampling around the tracked location. However, we see very few cases of
sporadic boxes appearing on the screen that do not belong to a person being tracked. In
general, this method seemed more robust and consistent than Method I. This algorithm is very
sensitive to occlusions, which we have not handled well. When people pass each other, their
color statistics may get updated with values belonging to the other person, and this prevents the
algorithm from subsequently tracking at least one of these people accurately. Also, noisy
frames sometimes cause the background itself to be tracked. The biggest advantage of this
algorithm is that it can track people who are stationary after they have been captured by the
tracker.
Figure 5: Display for Method II
OPTIMIZATIONS
THE SOBEL-EDGE DETECTION ROUTINE
The Sobel edge transform is a pixel-wise operation that consumes many cycles. Memory
latencies and matrix operations are two areas we optimized. The benefits of restructuring the
Sobel routine were noticeable immediately, despite the nondeterministic behavior of memory
access.
In external memory, images are stored one row after another. Therefore fetching pixels in row
order will reduce access time. To improve efficiency, the outer loop iterates across each row
while the inner loop iterates across each column.
The Sobel transform operates on each pixel and its eight adjacent pixels. Instead of fetching
nine array elements on every iteration we want to store the reusable pixels on the stack and
only fetch three new pixels. The following example shows how we restructured the routine
and unrolled the Sobel transform.
Original Code:

    for y = 0 to ImageHeight
        for x = 0 to ImageWidth
            /* Sobel transform */
            edge1 = 0
            edge2 = 0
            for i in {0, 1, 2}
                for j in {0, 1, 2}
                    pixel = Image[x+i][y+j]
                    edge1 += pixel * sobel1[i][j]
                    edge2 += pixel * sobel2[i][j]
                end
            end
            /* More processing */
        end
    end

Better Memory Access and Loop Unrolling:

    for y = 0 to ImageHeight
        /* Store first nine pixels in local variables */
        a11 = Image[0][y]
        a12 = Image[1][y]
        a13 = Image[2][y]
        ...
        a33 = Image[2][y+2]
        for x = 0 to ImageWidth
            /* Sobel transform */
            edge1 = a11 + 2*a12 + a13 - ... - a33
            edge2 = a11 + 2*a21 + a31 - ... - a33
            /* Shift pixel values, fetch new pixels */
            a11 = a12
            a12 = a13
            a13 = Image[x+3][y]
            ...
            a33 = Image[x+3][y+2]
            /* More processing */
        end
    end

Table 1: This table demonstrates the sort of optimizations made to improve the memory access and operational cost of the Sobel edge transform.
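The same access pattern can be rendered as runnable code. The sketch below is our illustration, not the DSP routine: it traverses row-major, keeps a 3x3 window that shifts right, and fetches only one new column of three pixels per step. Border pixels are skipped.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel(image):
    """Return (edge_x, edge_y) response maps for the interior pixels."""
    h, w = len(image), len(image[0])
    ex = [[0] * w for _ in range(h)]
    ey = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        # seed the 3x3 window with the first three columns of this row band
        win = [[image[y + i - 1][j] for j in range(3)] for i in range(3)]
        for x in range(1, w - 1):
            gx = sum(win[i][j] * SOBEL_X[i][j] for i in range(3) for j in range(3))
            gy = sum(win[i][j] * SOBEL_Y[i][j] for i in range(3) for j in range(3))
            ex[y][x], ey[y][x] = gx, gy
            if x + 2 < w:
                # shift the window right: drop the left column, fetch one new column
                for i in range(3):
                    win[i][0], win[i][1] = win[i][1], win[i][2]
                    win[i][2] = image[y + i - 1][x + 2]
    return ex, ey
```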
SUBSAMPLING OF THE INPUT VIDEO SEQUENCE
The overall efficiency and speed of the output increased greatly when we used a sub-sampled
version of the input video sequence in the Y channel. Without sub-sampling we were able to
obtain a frame rate of just about 1 frame per second. With sub-sampling, our frame rate went up
to almost 4 Hz, thereby giving us much better display feedback. Down-sampling to about half
the number of pixels required for the Y channel improved the speed of Method II more than
that of Method I.
FRAME REJECTION
There was a problem of stray boxes showing up during the display of the demo. This happened
because a shift in the frame sequence occurred every 4 frames. This shift in the frame
sequence created the illusion that something was moving in the frame
sequence. We tried to reject the bad frames by finding the difference between the values
of the edges in each frame. Using a sum of absolute differences (SAD) measure and thresholding,
we rejected frames that had a SAD value of more than 6500 for a given set of pixels.
Unfortunately the method did not work well, and sometimes even good frames were
rejected.
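The rejection test can be sketched as below. The 6500 threshold is the value quoted above; the edge maps are represented here as flat lists standing in for the Sobel output over the chosen set of pixels.

```python
SAD_REJECT_THRESHOLD = 6500  # threshold quoted in the report

def reject_frame(prev_edges, curr_edges, threshold=SAD_REJECT_THRESHOLD):
    """True if the frame-to-frame edge difference is implausibly large,
    suggesting a shifted (bad) frame rather than real motion."""
    sad = sum(abs(a - b) for a, b in zip(prev_edges, curr_edges))
    return sad > threshold
```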
CONCLUSIONS
From results I and II, we can conclude that each method has its own set of pros and cons.
Although the output from Method I produces a few stray boxes due to noisy frames, it localizes
a moving person very accurately, and can track people even when they pass each other
in the frame. We feel this algorithm would give very promising results if we could get it to
run faster than the 1 fps we currently get from the system. Execution of such an
algorithm at higher speeds will allow us to constrain the location of tracked boxes (as a person
cannot be expected to move very far from his previous location if the next frame was captured
a short interval later). This would enable us to filter out many stray boxes, while also
producing a more useful and robust tracking output.
If we had to implement this in a practical setting on a constrained platform such as one
based on the TMS320C6416 DSK, Method II would seem to show better performance,
especially in video surveillance and other security applications where the speed and accuracy
of the tracking are quite important. The idea of searching laterally outside a box assigned to a
particular person in method II worked well and gave the added stability required for a tracking
algorithm. Coarse scanning to localize each tracked person also helps speed up algorithm
execution. These improvements in speed help us apply temporal filtering to constrain the
movements of each person, which in turn helps us improve our tracking consistency.
To be practically feasible however, this algorithm needs to be strengthened with the capability
to handle occlusions. We have not been able to implement a reliable method of handling these
problems due to occlusion at this time, and would like to develop a strategy that overcomes this
crippling problem.
To sum up, the first algorithm is accurate and quite good at tracking people even when they
cross over. However, it is susceptible to noisy camera images, has a very slow frame rate and
cannot track people who have stopped moving. The second algorithm overcomes these
disadvantages of the first algorithm, but fails to track people who occlude each other. In the
future we would like to address this issue, which will help us realize a more practical people
tracking system which can be made to work on relatively constrained real-time systems.
References
[1] Jain, R., Kasturi, R., Schunck, B. G., Machine Vision, McGraw-Hill Inc., 1995.
[2] McKenna, S. J., Jabri, S., Duric, Z., Wechsler, H., Rosenfeld, A., "Tracking Groups of
People", Computer Vision and Image Understanding: CVIU, Vol. 80, No. 1, pp. 42-56, 2000.
[3] Moeslund, T. B., Granum, E., "A Survey of Computer Vision-Based Human Motion
Capture", Computer Vision and Image Understanding: CVIU, Vol. 81, No. 3, pp. 231-268, 2001.
[4] Bouchrika, I., Nixon, M. S., "People Detection and Recognition using Gait for Automated
Visual Surveillance", Crime and Security, The Institution of Engineering and
Technology Conference (ICDP-2006), 2006.
[5] Negre, A., Tran, H., Gourier, N., Hall, D., Lux, A., Crowley, J. L., "Comparative Study of
People Detection in Surveillance Scenes", Lecture Notes in Computer Science, No. 4109,
pp. 100-108, 2006.
[6] Gavrila, D., "Pedestrian detection from a moving vehicle", ECCV, Vol. II, pp. 37-49, 2000.