
Event-Driven Dynamic Platform Selection for Power-Aware Real-Time Anomaly Detection in Video

Calum G. Blair 1 and Neil M. Robertson 2

1 Institute for Digital Communications, University of Edinburgh, Edinburgh, UK
2 Visionlab, Heriot-Watt University, Edinburgh, UK

[email protected]

Keywords: FPGA, GPU, Anomaly Detection, Object Detection, Algorithm Mapping.

Abstract: In surveillance and scene awareness applications using power-constrained or battery-powered equipment, performance characteristics of processing hardware must be considered. We describe a novel framework for moving processing platform selection from a single design-time choice to a continuous run-time one, greatly increasing flexibility and responsiveness. Using Histogram of Oriented Gradients (HOG) object detectors and Mixture of Gaussians (MoG) motion detectors running on three platforms (FPGA, GPU, CPU), we characterise the processing time, power consumption and accuracy of each task. Using a dynamic anomaly measure based on contextual object behaviour, we reallocate these tasks between processors to provide faster, more accurate detections when an increased anomaly level is seen, and reduced power consumption in routine or static scenes. We compare power- and speed-optimised processing arrangements with automatic event-driven platform selection, showing the power and accuracy tradeoffs between each. Real-time performance is evaluated on a parked vehicle detection scenario using the i-LIDS dataset. Automatic selection is 10% more accurate than power-optimised selection, at the cost of 12 W higher average power consumption in a desktop system.

1 INTRODUCTION

Opportunities for the use of advanced computer vision algorithms have increased to include surveillance and monitoring applications. We consider the deployment of such algorithms in a scenario with limited electrical power available for processing, such as surveillance from a mobile vehicle which may be running on battery power. Timely and accurate detections of objects and events are still important in such circumstances. Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) offer large improvements in speed and power compared to a reference implementation on a standard CPU. This is increasingly relevant given the growing trend for smart cameras with processing capability, and the development of mobile general-purpose GPUs. Each algorithm implementation on one of these architectures will have its own performance characteristics (processing time, power consumption and algorithm accuracy) and will exist at a certain point in a multi-dimensional design space.

Using a heterogeneous system which contains both FPGAs and GPUs offers increased flexibility, allows decisions about the optimal platform to use to be moved from design time to run-time, and allows this decision to be updated in response to changing characteristics within the image or environment. In this paper, we use detection of parked vehicles in surveillance video as an anomaly detection application. First, we define an anomaly measure based on the motion of objects within a video. Using a heterogeneous system to process video, we select processing platforms (and hence alter system power, speed and accuracy characteristics) based on this anomaly level. We show how platform selection changes in response to changing anomaly levels, and compare the performance of this dynamically-mapped system against a static one optimised for power or speed. To summarise, the contributions of this work are: we describe our system, consisting of multiple implementations of various object detection algorithms running across various accelerators in a heterogeneous hardware platform; demonstrate dynamic selection between these implementations in response to events within a scene; and describe the associated tradeoffs between power consumption and accuracy in a real-time system. This paper is laid out as follows: after describing relevant work in this section, §2 describes each algorithm used in the system and their performance characteristics.


Once an overall anomaly level is obtained, the mapping procedure described in §3 selects the next set of implementations. We discuss methodology in §4; results are presented in §5, followed by analysis of the findings in §6. We conclude with pointers to future work in §7.

We consider first the architectural background of this work, and then place it in the context of other parked vehicle detection work evaluated on the same dataset. FPGAs and GPUs are two of the most common ways of improving the performance of compute-intensive signal processing algorithms. In general, FPGAs offer a large speedup over reference implementations and draw low power because they instantiate logic which closely matches the application. However, their use of arbitrary-precision fixed-point arithmetic can impact accuracy slightly, and they require specialised knowledge to program, often incurring longer development times (Bacon et al., 2013). In contrast, GPUs have a software programming model, using large arrays of floating-point units to process data quickly, at the expense of power consumption. Their ubiquitous presence in PCs and laptops has made uptake very common in recent years, and the availability of Nvidia's CUDA language for both desktop and (in the near future) mobile and tablet devices greatly increases the potential for pervasive vision computing.

When building a system to perform a processing task, we consider each of the characteristics of FPGA and GPU and select the system which best suits our application, for example prioritising fast run-time over low power consumption. This has been done for various smaller algorithms, such as colour correction and 2-D convolution, as in (Cope et al., 2010). This design-time selection can also be done at a low level using an optimisation algorithm, as shown by (Bouganis et al., 2009), although this is usually more concerned with FPGA area vs. latency tradeoffs. As an example, Figure 1 shows design space exploration for the power and time characteristics of the implementation combinations used in this paper, with a well-defined Pareto curve at the left-hand side. To the authors' knowledge, little work has been done in the area of power-aware algorithm scheduling at runtime; the most appropriate is arguably (Yu and Prasanna, 2002), although this does not consider more than two performance characteristics or deal with changing optimisation priorities over time.

Figure 1: Design space exploration for power vs. time tradeoffs for all possible combinations of car, pedestrian and motion detector implementations (system power consumption, 180 to 230 W, against processing time, 0 to 650 ms). Pedestrian implementations are labelled by colour and car implementations by shape; the combinations plotted are ped-{ggg, cff, gff, gfg, cfc, ccc} and car-{ggg, cfc, gfg, ccc}.

The UK Home Office supplies the i-LIDS dataset (Home Office Centre for Applied Science and Technology, 2011) used in §5, and this has been used for anomaly detection by other researchers. (Albiol et al., 2011) identify parked vehicles in this dataset's PV3 scenario with precision and recall of 0.98 and 0.96 respectively. However, their approach is considerably different from ours, in that they: (a) require all restricted-parking lanes in the image to be manually labelled first; (b) only evaluate whether an object in a lane is part of the background or not; and (c) do not work in real-time or provide performance information. In addition, their P and R results do not denote events, but rather the total fraction of time during which the lane is obstructed. Due mainly to limitations within our detectors, our accuracy results alone do not improve upon this state-of-the-art, but given the points noted above, we are arguably trying to approach a different problem (anomaly detection under power and time constraints) than Albiol et al.

2 SYSTEM COMPONENTS

A flow diagram for the high-level frame processing framework is shown in Figure 2. It works on offline videos and dynamically calculates the number of frames to drop to obtain real-time performance at 25 FPS. (We do not count video decode time, or time for output image markup and display, as part of processing time.) The 'detection algorithms' block generates bounding boxes with type information, which are used by the remaining algorithm stages. (An expanded version of this step showing all combinations is shown further on in Figure 4.) This section gives details of each algorithm in Table 1, i.e. the detection algorithms run on the image, and the method for calculating the image anomaly level. The object detectors and background subtractor were by far the most computationally expensive algorithm stages, so each of these had at least one accelerated version available. We use a platform containing a 2.4 GHz dual-core Xeon CPU, an Nvidia GeForce 560Ti GPU, and a Xilinx ML605 board with a XC6VLX240T FPGA, as shown in Figure 3. Implementation performance characteristics are shown in Table 2 where appropriate.

Figure 2: Frame processing and algorithm mapping loop in the anomaly detection system. [Diagram: a mapping generation stage (performance data; algorithm-to-platform mapping; implementation search; priority selection) configures the detection algorithms applied to each video frame; detections flow through detection merging, object tracking, trajectory clusters, cluster anomalousness, object anomalousness and location context update; anomaly thresholding drives the system output (log event and save image).]

Figure 3: Arrangement of accelerated processors within the heterogeneous system (host x86 CPU, FPGA and GPU). Each processor can access host main memory over PCI Express, and the two accelerators have private access to on-card memory.

Table 1: Algorithms and implementations used in the system.

  Algorithm                         Implementation(s)
  Ped. Detection: HOG (§2.1)        FPGA, GPU, CPU
  Car Detection: HOG (§2.2)         FPGA, GPU, CPU
  Motion Segment.: MoG (§2.3)       GPU
  Tracking: Kalman Filter (§2.5)    CPU
  Trajectory Clustering (§2.6)      CPU
  Bayesian Motion Context (§2.7)    CPU

2.1 Pedestrian Detection with HOG

We use various accelerated versions of the Histograms of Oriented Gradients pedestrian detector described in (Dalal and Triggs, 2005); this algorithm is well-understood and has been characterised on various platforms. Each version is split into three computationally expensive parts (image resizing, histogram generation and support vector machine (SVM) classification), and these are implemented across different processing platforms (FPGA, GPU and CPU) as described by (Blair et al., 2013). These mappings are shown in the left-hand side of Figure 4. The following mnemonics are used in that figure and elsewhere: a single implementation is referred to by the architecture each part runs on, e.g. running HOG with resizing on CPU, histogram generation on FPGA and classification on CPU is cfc. Each implementation has different power consumption, speed and accuracy characteristics, summarised in Table 2. These could be switched between dynamically at runtime with no performance penalty. We show a False Positives Per Image (FPPI) curve for each implementation in Figure 5. Detector errors (e.g. in Figure 6) affect overall accuracy dramatically; errors in the training phase also affect the clusters and context heatmaps in §2.6 and §2.7.

Figure 5: DET curve (false positives per image against miss rate) for the HOG pedestrian detector. Log-average miss rates: 62% cff, 61% gff, 59% gfg, 59% cfc, 53% ccc, 52% ggg, 48% ggg-kernel, 46% Orig-HOG.
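As a concrete illustration of the scale/histogram/classify structure, the sketch below runs a single CPU-only HOG pass using OpenCV's stock pedestrian model. It is a minimal stand-in under stated assumptions: the paper's cfc/gfg/ggg variants split these stages across CPU, GPU and FPGA and use their own trained SVMs, not OpenCV's default detector.

```python
# Minimal CPU-only HOG pedestrian pass using OpenCV's stock people model.
# Illustrative sketch only: the paper's implementations split resizing,
# histogram generation and SVM classification across CPU/GPU/FPGA and use
# custom-trained SVMs, not this default detector.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("frame.png")  # hypothetical input frame
# detectMultiScale performs the scale pyramid, block histograms and
# per-window linear SVM classification internally.
rects, weights = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)
for (x, y, w, h) in rects:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
```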


Figure 4: All possible mappings of image processing algorithms (pedestrian detection (HOG), car detection (HOG) and motion detection (MOG2)) to hardware, with per-stage operations (scale, histograms, classify, group, mask update, image opening, bounding-box extraction) assigned to CPU, GPU or FPGA. Multiple possible implementation steps within one algorithm are referred to by the platform they run on: running HOG with a resize step on GPU, histograms on FPGA and classification on GPU is referred to throughout as gfg. Data transfer steps are not shown.

Figure 6: False positives (blue rectangle representing a pedestrian) from the object detectors affected training and testing performance, both directly and through learned context measures.

Figure 7: DET curve (false positives per image against miss rate) for the HOG car detector. Log-average miss rates: 94% cfc, 93% gfg, 89% ggg, 89% ccc.


2.2 Car Detection with HOG

The HOG algorithm was again selected for car detection due to the presence of existing implementations running in near-realtime across multiple architectures. The CPU and GPU implementations of HOG in OpenCV were modified to use the parameters in (Dalal, 2006) for car detection. An FPGA version was also implemented by modifying the cfc and gfg versions in (Blair et al., 2013).

These implementations were trained on data from the 2012 Pascal Visual Object Classes Challenge (Everingham et al., 2009). The SVM was learned as described in (Dalal and Triggs, 2005). An FPPI curve is shown in Figure 7, generated from all positive images in Pascal-VOC at scale factor 1.05. This shows that HOG-CAR is not as accurate as the pedestrian version; this is probably due to the smaller dataset and wider variation in training data. The detector is only trained on cars, although during testing it was also possible to detect vans and trucks. All designs (pedestrian HOG with histogram and window outputs, and car HOG with histogram outputs) were implemented on the same FPGA, with detectors running at 160 MHz. Overall resource use was 52%.

2.3 Background Subtraction with MOG

The Mixture of Gaussians (MOG) algorithm was used to perform background subtraction and leave foreground objects. The OpenCV GPU version uses Zivkovic's implementation (Zivkovic, 2004). Contour detection was performed to generate bounding boxes as shown in Figure 8(a). As every bounding box was passed to one or more computationally expensive algorithms, early identification and removal of overlaps led to significant reductions in processing time. Bounding boxes with ≥ 90% intersection were compared and the smaller one was discarded; i.e. we discard $B_i$ if:

$$\frac{\mathrm{area}(B_i \cap B_j)}{\mathrm{area}(B_j)} \ge 0.9 \quad\text{and}\quad \mathrm{area}(B_i) < \mathrm{area}(B_j). \tag{1}$$
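A direct transcription of Eq. (1) as a sketch; boxes are (x0, y0, x1, y1) tuples and the helper names are ours:

```python
# Sketch of the overlap-discard rule in Eq. (1). Boxes are (x0, y0, x1, y1).
def area(b):
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])

def intersection(bi, bj):
    return (max(bi[0], bj[0]), max(bi[1], bj[1]),
            min(bi[2], bj[2]), min(bi[3], bj[3]))

def discard_bi(bi, bj):
    # Discard B_i when >= 90% of B_j's area lies in the intersection
    # and B_i is the smaller box.
    return (area(bj) > 0
            and area(intersection(bi, bj)) / area(bj) >= 0.9
            and area(bi) < area(bj))
```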

Figure 8: Bounding box extraction from the Mixture-of-Gaussians GPU implementation: (a) motion detection by background subtraction in video; (b) false motion regions from camera shake and illumination changes.

Occasionally, heavy camera shake, fast adjustment of the camera gain, or fast changes in lighting conditions would cause large portions of the frame to be falsely indicated as containing motion, as shown in Figure 8(b). When this occurred, all bounding boxes for that frame were treated as unreliable and discarded.
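The motion-segmentation stage can be sketched with OpenCV's MOG2 subtractor as below. The mask-opening step follows Figure 4; the 50% foreground-area cutoff for rejecting shaken frames is our assumption, not a value stated in the paper.

```python
# Sketch of motion segmentation: MOG2 background subtraction, mask opening,
# contour extraction to bounding boxes, and wholesale rejection of frames
# where the foreground mask explodes (camera shake / gain change).
import cv2

mog = cv2.createBackgroundSubtractorMOG2()

def motion_boxes(frame, max_fg_fraction=0.5):  # 0.5 cutoff is an assumption
    mask = mog.apply(frame)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    if cv2.countNonZero(mask) > max_fg_fraction * mask.size:
        return []                              # mask unreliable: discard all
    # OpenCV 4.x findContours signature (contours, hierarchy)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours]
```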

2.4 Detection Merging

Object detections were generated from two sources: a direct pass over the frame by HOG, or detections on a magnified region of interest triggered by the background subtractor. Motion regions were extracted, magnified by 1.5× and then passed to both HOG versions. This allowed detection of objects smaller than the minimum window size. Candidate detections from global and motion-cued sources were filtered using the 'overlap' criterion from the Pascal VOC challenge (Everingham et al., 2009): a0 = area(Bi ∩ Bj)/area(Bi ∪ Bj). Duplicates were removed if the two object classes were compatible and a0(Bi, Bj) > 0.45. Regions with unclassified motion were still passed to the tracker's matching step to allow identification of new tracks or updates of previously-classified detections.
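This merging rule is the familiar intersection-over-union test; a sketch follows, with class 'compatibility' reduced to equality, which is our simplification:

```python
# Sketch of detection merging with the Pascal VOC overlap criterion:
# a0 = area(Bi ∩ Bj) / area(Bi ∪ Bj); duplicates dropped when a0 > 0.45.
def voc_overlap(bi, bj):
    ix0, iy0 = max(bi[0], bj[0]), max(bi[1], bj[1])
    ix1, iy1 = min(bi[2], bj[2]), min(bi[3], bj[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    a = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = a(bi) + a(bj) - inter
    return inter / union if union else 0.0

def merge_detections(dets):
    # dets: [(box, cls)]; class compatibility simplified to equality here.
    kept = []
    for box, cls in dets:
        if not any(c == cls and voc_overlap(box, b) > 0.45 for b, c in kept):
            kept.append((box, cls))
    return kept
```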

2.5 Object Tracking

A constant-velocity Kalman filter was used to smooth all detections. These were projected onto a ground plane before smoothing, as shown in Figure 9.

Figure 9: Object tracking on an image transformed onto the base plane. Green, blue and orange circles represent car, pedestrian and undetermined (motion-only) clusters respectively.

As trajectory smoothing and detection matching operate at the level of abstraction of objects rather than pixels or features, the number of elements to process is low enough, and the computations for each one are simple enough, that this step is not considered a candidate for parallelisation.
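A constant-velocity filter of this kind can be set up with OpenCV's KalmanFilter as in the sketch below; the noise covariances are illustrative assumptions rather than the paper's tuning.

```python
# Sketch of a constant-velocity Kalman filter for one tracked object on the
# ground plane. State is (x, y, vx, vy); only (x, y) is measured. The noise
# covariance values are assumptions.
import numpy as np
import cv2

kf = cv2.KalmanFilter(4, 2)                      # 4 state dims, 2 measured
dt = 1.0                                         # one frame per step
kf.transitionMatrix = np.array([[1, 0, dt, 0],
                                [0, 1, 0, dt],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

def smooth(ground_plane_xy):
    kf.predict()
    est = kf.correct(np.array(ground_plane_xy, np.float32).reshape(2, 1))
    return float(est[0, 0]), float(est[1, 0])    # smoothed (x, y)
```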

2.6 Trajectory Clustering

The trajectory clustering algorithm used is a reimplementation of that described by (Piciarelli and Foresti, 2006), used for detection of anomalies in traffic flow. The authors apply the algorithm to fast-moving motorway traffic, whereas the i-LIDS scenes have more object classes (pedestrians and vehicles), several scene entrances and exits, greater opportunities for occlusion, and long periods where objects stop moving and start to be considered part of the background. Starting with a trajectory Ti = (t0, t1, ..., tn) consisting of smoothed detections over several frames, we match these to and subsequently update a set of clusters. Each cluster Ci contains a vector of elements cj, each with a point (xj, yj) and a variance σj. Clusters are arranged in trees, with each having zero or more children. One tree (starting with a root cluster) thus describes a single point of entry to the scene and all observed paths taken through the scene from that point. For a new (unmatched) trajectory Tu, all root clusters and their children are searched to a given depth, and Tu is assigned to the closest C if a Euclidean distance measure d is below a threshold. If Tu does not match anywhere, a new root cluster is created. For a T previously matched to a cluster, clusters can be extended, points updated or new child clusters split off as new points are added to T. As with §2.4, clustering operates on a relatively small number of objects and is not computationally expensive, so was not considered for acceleration. Learned class-specific object clusters are shown in Figure 10, projected back onto the camera plane.

Figure 10: Trajectory clustering via learned object clusters, re-projected onto the camera plane. Green, blue and orange tracks represent cars, pedestrians and undetermined (motion-only) objects respectively.

Figure 11: Ground-plane motion intensity maps built using movement from different object classes in PV3: (a) vx,ped; (b) vy,ped; (c) vx,car; (d) vy,car. In (d), on-road vertical motion away from (blue) and toward the camera (red) is distinct and clearly defined. Velocity scale is in pixels per frame.
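A sketch of the match-or-create step described above: search root clusters and their descendants to a fixed depth, assign the trajectory to the nearest cluster under the distance threshold, and otherwise start a new root. The Cluster type, search depth and threshold values here are our assumptions, not the paper's.

```python
# Sketch of trajectory-to-cluster matching. The trajectory's latest point is
# compared against cluster elements; no match within the threshold means a
# new root cluster is created. Types and constants are assumptions.
import math
from dataclasses import dataclass, field

@dataclass
class Cluster:
    elements: list                      # [(x, y, sigma), ...]
    children: list = field(default_factory=list)
    transits: int = 0

def match_cluster(point, roots, max_depth=2, threshold=30.0):
    best, best_d = None, threshold
    stack = [(r, 0) for r in roots]
    while stack:
        cluster, depth = stack.pop()
        d = min(math.hypot(point[0] - x, point[1] - y)
                for x, y, _sigma in cluster.elements)
        if d < best_d:
            best, best_d = cluster, d
        if depth < max_depth:
            stack.extend((c, depth + 1) for c in cluster.children)
    if best is None:                    # unmatched: create a new root
        best = Cluster(elements=[(point[0], point[1], 1.0)])
        roots.append(best)
    return best
```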

2.7 Contextual Knowledge

Contextual knowledge relies on known information about the normal or most common actions within the scene (Robertson and Letham, 2012). Position and motion information can capture various examples of anomalous behaviour: for example, stationary objects in an unusual location, or vehicles moving the wrong way down a street. Unsupervised learning based on the output of the object classifiers during training sequences was used to learn scene context.

Type-specific information about object presence at different locations in the base plane was captured by recording the per-pixel location of each base-transformed bounding box. The average per-pixel velocity v̄ in the x- and y-directions was obtained from the object trackers. For an object at base plane location (x ... x', y ... y') with x-velocity vx and update rate α = 0.0002,

$$\bar{v}_{(x \ldots x',\, y \ldots y')} = (1-\alpha)\,\bar{v}_{(x \ldots x',\, y \ldots y')} + \alpha v. \tag{2}$$
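Eq. (2) is a per-pixel exponential moving average over the footprint of the box; a minimal numpy sketch, with the box layout and map shapes as assumptions:

```python
# Sketch of the Eq. (2) update: exponential moving average of per-pixel
# velocity over the base-plane footprint of a detection's bounding box.
import numpy as np

ALPHA = 0.0002                          # update rate from the paper

def update_velocity_map(vmap, box, v):
    """vmap: HxW array of mean velocities; box: (x, x2, y, y2); v: scalar."""
    x, x2, y, y2 = box
    vmap[y:y2, x:x2] = (1.0 - ALPHA) * vmap[y:y2, x:x2] + ALPHA * v

# One map per object class and axis (vx,ped / vy,ped / vx,car / vy,car),
# as in Figure 11; the 770x578 shape matches the video resolution.
vx_car = np.zeros((578, 770), dtype=np.float32)
update_velocity_map(vx_car, (100, 140, 200, 230), v=2.5)
```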

This is shown in Figure 11. For most conceivable traffic actions, presence and motion information is appropriate; however, this fails to capture more complex interactions between people. We therefore focus on behaviour involving vehicles.

2.8 Anomaly Detection

Based on §2.7 and §2.6 we can define an anomalous object as one which is present in an unexpected area, or one which is present in an expected area but moves in an unexpected direction or at an unexpected speed. A Bayesian framework is used to determine if an object's velocity in the x and y directions should be considered anomalous, based on the difference between it and the known average velocity v̄ in that region. We define the probability of an anomaly at a given pixel, p(A|D), given detection of an object at that pixel:

$$p(A|D) = \frac{p(D|A)\,p(A)}{p(D|A)\,p(A) + p(D|\bar{A})\,p(\bar{A})}, \tag{3}$$

where the prior probability of an anomaly anywhere in the image, p(A), is set to a constant value. p(D|A), the likelihood of detecting an event at any pixel in the presence of an anomaly, is constant (i.e. we assume that an anomaly can occur with equal probability anywhere within the image), and p(Ā) = 1 − p(A).

p(D|Ā) is a measure based on the learned values for v̄x or v̄y. It expresses the similarity of the observed motion to the mean motion at that pixel, and returns values between (0.01, 0.99). These are based on the distance between, and relative magnitudes of, v and v̄:

$$d_v = \begin{cases} \operatorname{sgn}(\bar{v})\max\left(C|\bar{v}|,\ |\bar{v}|+C\right), & \text{if } \operatorname{sgn}(v) = \operatorname{sgn}(\bar{v}) \text{ and } |v| > |\bar{v}|, \\ \operatorname{sgn}(\bar{v})\min\left(-|\bar{v}|/2,\ |\bar{v}|-c\right), & \text{otherwise.} \end{cases} \tag{4}$$

Here, C and c are forward- and reverse-directional constants. dv is then used to obtain a linear equation for v, with a gradient of a = (0.01 − 0.99)/(dv − v̄). Finally, b is obtained in a similar manner, and v is projected onto this line to obtain a per-pixel likelihood that v is not anomalous:

$$y = av + b, \tag{5}$$

$$p(D|\bar{A}) = \max\left(0.01,\ \min(0.99,\ y)\right). \tag{6}$$
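Putting Eqs. (3) to (6) together for a single pixel gives the sketch below. The prior p(A), the constant likelihood p(D|A), the C/c constants, and the choice to anchor the likelihood line at (v̄, 0.99) are all our assumptions; the paper does not state these values.

```python
# Sketch of the per-pixel anomaly posterior, Eqs. (3)-(6). p_a, p_d_given_a,
# C and c are assumed values; the anchor point (vbar, 0.99) is assumed too.
import math

def p_anomaly(v, vbar, p_a=0.05, p_d_given_a=0.5, C=3.0, c=0.5):
    sgn = lambda x: math.copysign(1.0, x)
    # Eq. (4): velocity d_v at which the "not anomalous" likelihood bottoms out
    if sgn(v) == sgn(vbar) and abs(v) > abs(vbar):
        d_v = sgn(vbar) * max(C * abs(vbar), abs(vbar) + C)
    else:
        d_v = sgn(vbar) * min(-abs(vbar) / 2.0, abs(vbar) - c)
    # Eqs. (5)-(6): line through (vbar, 0.99) and (d_v, 0.01), clamped
    a = (0.01 - 0.99) / (d_v - vbar)
    b = 0.99 - a * vbar
    p_d_given_not_a = max(0.01, min(0.99, a * v + b))
    # Eq. (3): Bayes' rule with constant p(D|A) and prior p(A)
    num = p_d_given_a * p_a
    return num / (num + p_d_given_not_a * (1.0 - p_a))
```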


Equation (3) now gives an object anomaly measure UOx. This is repeated for the y-velocity data to obtain UOy.

This is combined with UC, an abnormality measure for the cluster linked to that object. When a trajectory moves from one cluster to one of its children, leaves the field of view, or is lost, the number of transits through that cluster is incremented. For any trajectory T matched to a cluster Cp with children Cc1 and Cc2, the number of trajectory transitions between Cp and all Cc is also logged, updating a frequency distribution between Cc1 and Cc2. These two metrics (cluster transits and frequency distributions of parent-to-child trajectory transitions) allow anomalous trajectories to be identified. If Ci is a root node, UC(Ci) = (1 + transits(Ci))^{-1}. Otherwise, if Ci is one of n child nodes of Cp,

$$U_C(C_i) = 1 - \frac{\mathrm{transitions}(C_p \to C_i)}{\sum_{j=1}^{n} \mathrm{transitions}(C_p \to C_j)}. \tag{7}$$

An overall anomaly measure Ui for an object i with age τi is then obtained:

$$U_i = w_o \frac{\sum_{1}^{\tau_i} U_{Ox}}{\tau_i} + w_o \frac{\sum_{1}^{\tau_i} U_{Oy}}{\tau_i} + w_c U_{Ci}, \tag{8}$$

and Umax is updated via Umax = max(Umax, Ui). The UO measures are running averages over τ. An object has to appear as anomalous for a time threshold tA before affecting Umax. All w are set to 10, with the anomaly detection threshold set at 15; two detectors are thus required to register an object as anomalous before it is logged.
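The cluster and object measures combine as in Eqs. (7) and (8); a sketch follows, where the transits counter and the parent's per-child transition counts (held in an assumed dict attribute) are our representation:

```python
# Sketch of cluster abnormality (Eq. 7) and the combined object anomaly
# score (Eq. 8). The 'transits' field and 'transitions' dict are assumed
# representations of the counters described in the text.
W_O, W_C = 10.0, 10.0                   # all weights set to 10 in the paper

def cluster_abnormality(cluster, parent=None):
    if parent is None:                  # root: U_C = (1 + transits)^-1
        return 1.0 / (1 + cluster.transits)
    total = sum(parent.transitions.values())
    return 1.0 - parent.transitions[cluster] / total     # Eq. (7)

def object_anomaly(uox_history, uoy_history, u_c):
    # Running averages of the per-frame velocity measures over the object's
    # age tau, plus the weighted cluster term (Eq. 8).
    tau = len(uox_history)
    return (W_O * sum(uox_history) / tau
            + W_O * sum(uoy_history) / tau
            + W_C * u_c)
```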

3 DYNAMIC MAPPING

The mapping mechanism selects the algorithm implementations used to process the next frame. It runs every time a frame is processed, allowing selection of a new set of implementations M in response to changing scene characteristics. M can be any combination of paths through the algorithms in Figure 4; Figure 1 shows system power vs. time for all such combinations. Ten credits are allocated between the three priorities P used to influence the selection of M: power consumption wp, processing time wt and detection accuracy wε. Here we assume that a higher level of anomaly in the scene should be responded to with increased processing resources, to obtain more information about the scene in a timely fashion and with fewer dropped frames, at the expense of increased power consumption. Conversely, frames with low or zero anomaly levels cause power consumption to be scaled back, at the expense of accuracy and processing time; real-time processing is then maintained by dropping a greater number of frames. Prioritisation could be either manual or automatic. When automatic prioritisation was used (i.e. the system was allowed to respond to scene events by changing its mapping), the speed priority was increased to maximum when Umax ≥ 15. A level of hysteresis was built in by maximising the power priority when Umax < 12. After every processed frame, if tprocess > FPS^{-1}, then ⌈tprocess × FPS⌉ frames are skipped to regain a real-time rate.

Table 2: Performance characteristics (processing time (ms), system power (W) and detection accuracy (log-average miss rate, %)) for the various algorithm implementations on 770×578 video. Baseline power consumption was 147 W.

  Algo.       Impl.   Time (ms)   Power (W)   Accuracy (%)
  HOG (PED)   ggg     17.6        229         52
              cff     23.0        190         62
              gff     27.5        186         61
              gfg     39.0        200         59
              cfc     117.3       187         59
              ccc     282.0       191         53
  HOG (CAR)   ggg     34.3        229         89
              cfc     175.6       189         94
              gfg     60.0        200         92
              ccc     318.0       194         89
  MOG         GPU     8.1         202         N/A
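The switching and frame-skip rules fit in a few lines, as sketched below. Only the 15/12 thresholds and the skip formula come from the paper; how the ten credits are redistributed between priorities is not specified, so the 10/0 splits are an assumption.

```python
# Sketch of the anomaly-driven priority switch (with hysteresis) and the
# frame-skip rule. The 10/0 credit splits are assumptions.
import math

FPS = 25.0

def update_priorities(u_max, priorities):
    # priorities: dict of wp (power), wt (time), we (accuracy) credits.
    if u_max >= 15:                     # anomaly seen: maximise speed
        priorities.update(wp=0, wt=10, we=0)
    elif u_max < 12:                    # quiet scene: maximise power saving
        priorities.update(wp=10, wt=0, we=0)
    return priorities                   # 12 <= u_max < 15: hysteresis band

def frames_to_skip(t_process):
    # If a frame overran one frame period, skip ceil(t_process * FPS) frames.
    return math.ceil(t_process * FPS) if t_process > 1.0 / FPS else 0
```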

Once a list of candidate algorithms (HOG-PED, HOG-CAR and MOG) and a set of performance priorities is generated, implementation mapping is run. This uses performance data for each algorithm implementation, shown individually for all algorithms in Table 2. Exhaustive search is used to generate and evaluate all possible mappings, with a cost function

$$C_i = w_p P_i + w_t t_i + w_\varepsilon \varepsilon_i, \tag{9}$$

where all w are controlled by P. Estimated power, runtime and accuracy are also generated at this point. The lowest-cost mapped algorithms are then run sequentially to process the next frame and generate detections.
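Exhaustive search over Figure 4's combinations with the Eq. (9) cost is cheap at this scale (at most 6 × 4 × 1 = 24 mappings). A sketch using the Table 2 figures; note that Eq. (9) as written mixes units, and any normalisation the authors apply is not specified, so raw values are used here:

```python
# Sketch of exhaustive mapping search with the Eq. (9) cost function.
# Per-implementation (power W, time ms, miss rate %) values from Table 2;
# MOG accuracy is N/A, so 0 is used as a placeholder.
from itertools import product

IMPLS = {
    "HOG-PED": [("ggg", 229, 17.6, 52), ("cff", 190, 23.0, 62),
                ("gff", 186, 27.5, 61), ("gfg", 200, 39.0, 59),
                ("cfc", 187, 117.3, 59), ("ccc", 191, 282.0, 53)],
    "HOG-CAR": [("ggg", 229, 34.3, 89), ("cfc", 189, 175.6, 94),
                ("gfg", 200, 60.0, 92), ("ccc", 194, 318.0, 89)],
    "MOG":     [("gpu", 202, 8.1, 0)],
}

def best_mapping(wp, wt, we):
    def cost(combo):
        return sum(wp * p + wt * t + we * e for _, p, t, e in combo)
    best = min(product(*IMPLS.values()), key=cost)
    return {algo: impl[0] for algo, impl in zip(IMPLS, best)}

print(best_mapping(wp=10, wt=0, we=0))   # power-optimised selection
```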

4 EVALUATION METHODS

Clusters and heatmaps were initialised by training on two 20-minute daylight clips from the i-LIDS training set. Each test sequence was then processed in approximately real time using this data. Clusters and heatmaps were not updated between test videos. The homography matrices used to register each video onto the base plane were obtained manually. Tests were run three times and are referred to as follows, using the prioritisation settings described in §3:

speed: speed prioritised, automatic prioritisation off;
power: power prioritised, automatic prioritisation off;
auto: anomaly-controlled automatic prioritisation on.

For each run, all anomalous events (defined as objects having U > 15 for more than a fixed time limit tA = 10 or 15 seconds) were logged and compared with ground-truth data. Power consumption for one frame was estimated by averaging the energy used to process each implementation over the runtime for that frame. Average power consumption over a sequence of frames was obtained in the same manner.

A modified version of the i-LIDS criteria was used to evaluate detection performance. Events count as true positives if they are registered within 10 seconds of the start of an event. For parked vehicles, i-LIDS considers the start of the event to be one full minute after the vehicle is parked, whereas we consider the start of the event to be when it parks. We may thus flag events before the timestamp given in the ground-truth data. The time window for matching detections is thus 70 or 75 seconds long. The i-LIDS criteria only require a binary alarm signal in the presence of an anomaly; however, we require anomalous tracks to be localised to the object causing the anomaly. The detection in Figure 12(d) is thus incorrect: the white van is stopped and should be registered as an anomaly, but the car on the left is flagged instead. This counts as one false-negative and one false-positive event.

5 RESULTS

i-LIDS uses the F1 score, F1 = (α + 1)rp/(r + αp), for comparing detection performance. α is set at 0.55 for real-time operational awareness (to reduce the false-alarm rate), or 60 for event logging (to allow logging of almost everything, with low precision).
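For reference, the weighted F-score in code, checked against Table 3 (the speed row at tA = 10 s: p = 0.105, r = 0.194 gives F1,OA ≈ 0.1254):

```python
# i-LIDS weighted F-score: F1 = (alpha + 1) * r * p / (r + alpha * p).
def f_score(precision, recall, alpha=0.55):
    denom = recall + alpha * precision
    return (alpha + 1.0) * recall * precision / denom if denom else 0.0

assert abs(f_score(0.105, 0.194) - 0.1254) < 1e-3   # speed row, Table 3
```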

Table 3 shows precision, recall and F1 scores for all "parked vehicle" events in the PV3 scenario for tA = 10 and 15 seconds. Night-time sequences have no events but still generate a large proportion of false positives, so the table also shows results for daylight-only ("day" and "dusk" clips) separately.

Table 3: Detection performance (precision and recall) for parked vehicle events on i-LIDS PV3. F1-scores are shown for operational awareness (OA) and event logging (EL).

  Priority   p       r       F1,OA    F1,EL
  tA = 10 seconds:
  power      0.083   0.133   0.0957   0.1317
  speed      0.105   0.194   0.1254   0.1913
  auto       0.102   0.194   0.1226   0.1912
  tA = 10 seconds, daylight only:
  power      0.121   0.148   0.1294   0.1475
  speed      0.130   0.214   0.1510   0.2118
  auto       0.125   0.214   0.1466   0.2115
  tA = 15 seconds:
  power      0.118   0.065   0.0910   0.0652
  speed      0.381   0.258   0.3259   0.2594
  auto       0.222   0.133   0.1797   0.1342
  tA = 15 seconds, daylight only:
  power      0.167   0.065   0.1067   0.0652
  speed      0.500   0.258   0.3752   0.2601
  auto       0.286   0.133   0.2030   0.1345

Table 4 shows performance details for all three priority modes. The tframe column shows overall execution time (including overheads) relative to total source video length. The processing time column (twork) does not include overheads: it assumes that a raw video frame is already present in memory and that frame-by-frame video output is not required (i.e. only events are logged). In this case, the system runs faster than realtime, with the percentage of skipped frames shown in the fskip column. The slower power-optimised priority setting causes more frames to be skipped.

Table 4: Processing performance for all prioritisation modes, showing the percentage of frames skipped (fskip), processing time including (tframe) and excluding (twork) overheads compared to source frame time (tsrc), and mean estimated power above baseline (P*work). Idle/baseline power is 147 W.

  Priority   fskip (%)   tframe/tsrc (%)   twork/tsrc (%)   P*work (W)
  power      75.8        125.4             87.9             49.1
  speed      59.5        124.3             81.7             72.8
  auto       66.5        127.8             83.4             61.9
  daylight only:
  power      75.5        125.4             87.9             49.1
  speed      60.1        124.2             82.0             78.4
  auto       66.0        122.0             83.4             61.8

Figure 12 shows example detections logged after tA seconds. While some true positives are detected (up to 50% in the best case), many false negatives and false positives are present, with various causes. False positives are often caused by the background subtractor erroneously identifying patches of road as foreground, a side-effect of the need to acquire slow-moving or waiting traffic in the same region. False negatives are, in general, caused by poor performance of the object classifiers or background subtractor. Directly failing to detect partially occluded objects (as in Figure 12(c)), or stationary, repeated false detections generated during training in regions overlapping the roadside, can both cause anomalies to be missed. Poorer video quality also reduces detection accuracy, as shown in Figure 13.

Figure 12: True detections and failure modes of the anomaly detector on i-LIDS PV3. (a), (b): true positives in varying locations. (c): false negative caused by occlusion. (d): treated as both a false negative and a false positive, as the detector identifies the car on the left instead of the van parked beside it.

Figure 13: Dataset quality impacts the quality of detections: in a separate video (a), video quality allows classification of most objects as either pedestrian (blue circle) or car (green circle); in i-LIDS (b), detections often remain as uncategorised motion (orange circle).

6 DISCUSSION

Given that the system must run in real-time (as we drop frames to ensure a close-to-realtime rate), the main tradeoffs available here are accuracy and power; optimising for time allows more frames to be processed, which, in combination with the naturally higher detection accuracy of the ggg detectors, increases p and r. This is borne out by the data in Table 3.

Figure 14: F1-scores for operational awareness (α = 0.55) against system power consumption above baseline (W), for the power, speed and auto priority modes at tA = 10 and 15 seconds, over all clips and over daylight-only clips.

The key figures in Table 4 are the mean estimated power consumption above baseline; there is a 29 W range in average power consumption between the highest-power and lowest-power priority settings. When the system was run with speed prioritised and the FPGA turned off, power consumption was 208 W, or 62 W above baseline. The average power for auto-prioritised mode with the FPGA on was 61.9 W above baseline, but this is dependent on the dataset used; datasets with fewer moving objects would have a lower average power consumption. As Figure 14 shows, running the system in automatic prioritisation mode allows an increase in accuracy of 10% over the lowest-power option for a cost of 12 W in average power consumption. A further 17% gain in F1-score (from auto to speed) costs an extra 17 W above baseline. These results show a clear relationship between power consumption and overall detection accuracy.

6.1 Comparison to State-of-the-Art

Some previous work has considered four clips made publicly available from i-LIDS, known as AVSS and containing four parked-vehicle events, classed as easy, medium, hard and night. The only work we are aware of which evaluates the complete i-LIDS dataset is (Albiol et al., 2011). Using spatiotemporal maps and manually-applied per-clip lane masks to denote areas in which detections are allowed, Albiol et al. are able to improve significantly on the precision and recall figures given above, reaching p- and r-values of 1.0 for several clips in PV3. They do not provide performance information, but note that they downscale the images to 240×320 to decrease evaluation time. In their work they discuss the applicability and limitations of background subtractors for detecting slow-moving and stopped objects, as well as the same difficulties with the i-LIDS data previously seen in Figure 13. As in §1, we also consider real-time implementation and power consumption, with an end goal of a fully automatic system. Unlike Albiol et al.'s requirement for manual operator intervention, we only need to re-register videos between clips to overcome camera movement, which could be done automatically. If we only consider our detector accuracy compared to those of other researchers working on this data, we cannot improve upon existing results. However, we are tackling the novel problem of power-aware anomaly detection rather than offline lane masking, and any accuracy measurement must be traded against other characteristics, as Figure 14 shows. The most obvious way to improve these results would be to use the current best-performing object classifiers, but implementing multiple heterogeneous versions of these would take prohibitively long to develop.

7 CONCLUSION

We have described a real-time system for performing power-aware anomaly detection in video, and applied it to the problem of parked vehicle detection. We are able to select between various algorithm implementations based on the overall object anomaly level seen in the image, and update this selection every frame. Doing this allows us to dynamically trade off power consumption against detection accuracy, and shows benefits when compared to fixed power- and speed-optimised versions. Although the accuracy of our detection algorithms (HOG) alone is no longer state-of-the-art, future improvements include the introduction of more advanced object classifiers (such as (Benenson and Mathias, 2012)) and a move to a mobile chipset, which would reduce idle power consumption while maintaining both speed and accuracy, with the tradeoff of increased development time.

ACKNOWLEDGEMENTS

We gratefully acknowledge the work done by Scott Robson at Thales Optronics for his FPGA implementation of the histogram generator for the CAR-HOG detector. This work was supported by the Institute for System Level Integration, Thales Optronics, the Engineering and Physical Sciences Research Council (EPSRC) Grant number EP/J015180/1 and the MOD University Defence Research Collaboration in Signal Processing.

REFERENCES

Albiol, A., Sanchis, L., Albiol, A., and Mossi, J. M. (2011). Detection of Parked Vehicles Using Spatiotemporal Maps. IEEE Transactions on Intelligent Transportation Systems, 12(4):1277–1291.

Bacon, D., Rabbah, R., and Shukla, S. (2013). FPGA Programming for the Masses. Queue, 11(2):40.

Benenson, R. and Mathias, M. (2012). Pedestrian detection at 100 frames per second. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2903–2910.

Blair, C., Robertson, N. M., and Hume, D. (2013). Characterising a Heterogeneous System for Person Detection in Video using Histograms of Oriented Gradients: Power vs. Speed vs. Accuracy. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 3(2):236–247.

Bouganis, C.-S., Park, S.-B., Constantinides, G. A., and Cheung, P. Y. K. (2009). Synthesis and Optimization of 2D Filter Designs for Heterogeneous FPGAs. ACM Transactions on Reconfigurable Technology and Systems, 1(4):1–28.

Cope, B., Cheung, P. Y., Luk, W., and Howes, L. (2010). Performance Comparison of Graphics Processors to Reconfigurable Logic: A Case Study. IEEE Transactions on Computers, 59(4):433–448.

Dalal, N. (2006). Finding People in Images and Videos. PhD thesis, Institut National Polytechnique de Grenoble / INRIA Grenoble.

Dalal, N. and Triggs, B. (2005). Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005., pages 886–893. IEEE Computer Society.

Everingham, M., Gool, L., Williams, C. K. I., Winn, J., and Zisserman, A. (2009). The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision, 88(2):303–338.

Home Office Centre for Applied Science and Technology (2011). Imagery Library for Intelligent Detection Systems (i-LIDS) User Guide. UK Home Office, 4.9 edition.

Piciarelli, C. and Foresti, G. (2006). On-line trajectory clustering for anomalous events detection. Pattern Recognition Letters, 27(15):1835–1842.

Robertson, N. and Letham, J. (2012). Contextual Person Detection in Outdoor Scenes. In Proc. European Conference on Signal Processing (EUSIPCO 2012).

Yu, Y. and Prasanna, V. (2002). Power-aware resource allocation for independent tasks in heterogeneous real-time systems. In Ninth International Conference on Parallel and Distributed Systems, 2002. Proceedings., pages 341–348. IEEE Computer Society.

Zivkovic, Z. (2004). Improved adaptive Gaussian mixture model for background subtraction. In Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., volume 2, pages 28–31. IEEE.