Star Challenge – multimedia search competition 2008
NUS.SIGIR group
Luong Minh Thang & Zhao Jin
WING group meeting – 12 Sep, 2008
04/21/23 1
Agenda
• About StarChallenge
• Approaches
  – Audio system
  – Video system
• Results
Let’s start with a clip on Tai Chi!
The Star Challenge
• International Competition organized by Singapore A*STAR
• Focus on Multimedia Search by Voice and Video
• Prize:
  – Free Trip to Singapore (blah!)
  – USD 100,000 (!!!)
The Tasks
• Voice Search
  – AT1: Search by IPA (International Phonetic Alphabet)
  – AT2: Search by Example
  – AT3: Search for recurrent voice segments
• Video Search
  – VT1: Search by (single) Query Image
  – VT2: Search by Video Shot
  – VT3: Scene/Event Categorization
AT3 and VT3 replaced by integrated search in the end
Timeline
• Mar 31: Registration Deadline
  – Registered as adMIRer
  – 5 members from NUS-SIGIR
  – 56 teams registered in total
• June 18: 1st Knockout Round
  – AT1+AT2
  – 8 Teams qualified
Timeline
• July 18: 2nd Knockout Round
  – VT1+VT2
  – 7 Teams qualified
• September 4: Qualifying Race
  – All four tasks with Integrated Search
  – Only 5 Teams would qualify
• October 23: Grand Final
  – On-site evaluation
Audio system – general approach
• Use MFCC features, which reflect speech well
• Use local alignment to align the two sequences (audio & query)
• Using the spectrogram, we cut long audio into small segments for better matching
Short demo
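The slide does not say which local alignment algorithm was used. The sketch below assumes a Smith-Waterman-style recurrence over two sequences of feature vectors (e.g. MFCC frames), scoring frame pairs by cosine similarity. The names `local_alignment_score`, `match_threshold` and `gap_penalty` are illustrative, not taken from the system.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def local_alignment_score(query, audio, match_threshold=0.8, gap_penalty=0.5):
    """Best local alignment score between two sequences of feature vectors.

    Assumed scoring: a frame pair contributes (cosine - match_threshold), so
    only very similar frames add to the score; skipping a frame in either
    sequence costs gap_penalty. Scores are clamped at 0 (local alignment).
    """
    cols = len(audio) + 1
    prev = [0.0] * cols
    best = 0.0
    for i in range(1, len(query) + 1):
        curr = [0.0] * cols
        for j in range(1, cols):
            sim = cosine(query[i - 1], audio[j - 1]) - match_threshold
            curr[j] = max(
                0.0,                      # start a new alignment here
                prev[j - 1] + sim,        # align the two frames
                prev[j] - gap_penalty,    # skip a query frame
                curr[j - 1] - gap_penalty # skip an audio frame
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

With this kind of score, each audio segment can be ranked against a query by its best local alignment, which is one way to fill a query-test similarity matrix.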
Audio system – system overview
[Diagram: audio system pipeline]
• Audio path: test audio files and query audio files → audio feature extractor → test MFCC vectors and query MFCC vectors → alignment & matching → query-test similarity matrix
• Text path: test audio files → speech recognizer → test text → Lucene indexing → index data; query text → Lucene matching
• Both paths → heuristic fusion → results
Audio system – Handle IPA
• IPA query: "i n t r ^ s t r ei t"
• Translate to CMU phonemes: IH N T R AH S T R EY T
  – INTEREST: IH N T R AH S T
  – RATE: R EY T
• Query text: input to the text module directly, and synthesized to an audio file for the audio module
IPA: i   n   t   r   ^   s   ei  a:  @   o:
CMU: IH  N   T   R   AH  S   EY  AA  AE  AO

IPA: au  ai  b   tS  d   TH  e   e:  ei  au
CMU: AW  AY  B   CH  D   DH  EH  ER  EY  AW
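A minimal sketch of the IPA handling described above, assuming the query arrives as space-separated ASCII IPA symbols. The mapping is copied from the slide's table. The two-entry `LEXICON` and the function names are hypothetical stand-ins; the slides do not say which pronunciation dictionary or segmentation method was used.

```python
# ASCII IPA symbol -> CMU phoneme, copied from the table on the slide.
IPA_TO_CMU = {
    "i": "IH", "n": "N", "t": "T", "r": "R", "^": "AH", "s": "S",
    "ei": "EY", "a:": "AA", "@": "AE", "o:": "AO", "au": "AW", "ai": "AY",
    "b": "B", "tS": "CH", "d": "D", "TH": "DH", "e": "EH", "e:": "ER",
}

# Hypothetical toy lexicon (word -> CMU pronunciation); a real system would
# load a full pronunciation dictionary instead.
LEXICON = {
    "INTEREST": ["IH", "N", "T", "R", "AH", "S", "T"],
    "RATE": ["R", "EY", "T"],
}

def ipa_to_cmu(ipa_query):
    """Translate a space-separated IPA query into a list of CMU phonemes."""
    return [IPA_TO_CMU[symbol] for symbol in ipa_query.split()]

def segment(phonemes, lexicon=LEXICON):
    """Return a list of words whose pronunciations concatenate exactly to
    `phonemes`, or None if the lexicon cannot cover the sequence."""
    if not phonemes:
        return []
    for word, pron in lexicon.items():
        if phonemes[:len(pron)] == pron:
            rest = segment(phonemes[len(pron):], lexicon)
            if rest is not None:
                return [word] + rest
    return None
```

For the slide's example, `segment(ipa_to_cmu("i n t r ^ s t r ei t"))` recovers the two words, which could then be passed to the text module as query text.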
Audio system – overall performance
• No complete statistics yet, but AT2 (query by example) reaches roughly 30-40% MAP and AT1 roughly 10%
• Let’s listen to a few queries …
SQ017.wav SQ029.wav SQ023.wav SQ036.wav
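For reference, MAP (mean average precision) can be computed as in the sketch below. This is the standard definition; the competition's exact evaluation script (e.g. any rank cutoff) is not given in the slides, so treat the details as an assumption. Each ranked list is assumed to contain no duplicates.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked result list.

    Precision is taken at the rank of every relevant item that was retrieved,
    then divided by the total number of relevant items.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```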
Video system – VT1 categories
• 1. Crowd (>10 people)
• 2. Building with sky as backdrop, clearly visible
• 3. Mobile devices including handphone/PDA
• 4. Flag
• 5. Electronic chart, e.g. stock charts, airport departure chart
• 6. TV chart overlay, including graphs, text, PowerPoint style
• 7. Person using computer, both visible
• 8. Track and field, sports
• 9. Company trademark, including billboard, logo
• 10. Badminton court
• 11. Swimming pool, sports
• 12. Closeup of hand, e.g. using mouse, writing, etc.
• 13. Business meeting (> 2 people), mostly seated down, table visible
• 14. Natural scene, e.g. mountain, trees, sea, no people
• 15. Food on dishes, plates
• 16. Face closeup, occupying about 3/4 of screen, frontal or side
• 17. Traffic scene, many cars, trucks, road visible
• 18. Boat/ship, over sea, lake
• 19. PC webpages, screen of PC visible
• 20. Airplane
Video system - examples
16. Face closeup
2. Building with sky backdrop
9. Company trademark
3. Mobile devices
Video system – VT2 categories
• 1. People entering/exiting door/car
• 2. Talking face with introductory caption
• 3. Fingers typing on a keyboard
• 4. Inside a moving vehicle, looking outside
• 5. Large camera movement, tracking an object, person, car, etc.
• 6. Static or minute camera movement, people walking, legs visible
• 7. Large camera movement, panning left/right, top/down of a scene
• 8. Movie ending credits
• 9. Woman monologue
• 10. Sports celebratory hug
Video system – general approach
[Diagram: video system pipeline]
• Test files → classifiers → classified category
• Classified category + query category → category filtering → filtered test files
• Filtered test files + query file → matching → matched test files
Video system - Training data size
Category  Size    Category  Size
1         15      11        21
2         47      12        130
3         33      13        11
4         6       14        21
5         5       15        3
6         18      16        42
7         30      17        6
8         3       18        10
9         23      19        43
10        3       20        3
• Dev = 10% labelled data, Train = 90% labelled data
• Size varies significantly across different categories
Video system – classifier training
[Diagram: classifier training pipeline]
• Train key frames + categories → color extractor, edge extractor, face detector, layout extractor
• Extracted features: color histogram (HSV, RGB), edge histogram, num faces/size/positions, segmentation info
• Multi-class SVM training → color classifier, edge classifier, face classifier, layout classifier
• Dev key frames → development data statistics: color, edge, face and layout recall per category
• Recall values are used as weights
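Columns: face, edge, layout, hsv, rgb, lab. Blank cells were dropped, so the values in each row are listed in their original order without column positions; "-" marks a row with no values.
1   0.02
2   0.21 0.61
3   1 0.15
4   -
5   -
6   0.77 0.55 0.41 0.33
7   0.26 0.83
8   -
9   0.26 0.17 0.4 0.5
10  0.33 1
11  0.76 0.71 0.81 0.35
12  0.3 0.57
13  0.27 0.33
14  0.14 0.52 0.25 0.18
15  -
16  0.23 0.16 0.11 0.14
17  -
18  0.18
19  0.34 0.48 0.34 0.28
20  -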
Classifier recall/categories
• Used as weights when fusing the different classifiers
• No error analysis & n-fold testing yet
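The slides say only that per-category recall on the dev data is used as weights when fusing the classifiers. One plausible reading is a recall-weighted sum of per-category scores, sketched below. The function names, the dictionary layout and the assumption that each classifier outputs one score per category are illustrative, not confirmed by the source.

```python
def fuse(classifier_scores, recall_weights):
    """Recall-weighted sum of per-category scores (assumed fusion rule).

    classifier_scores: {classifier_name: {category: score}}
    recall_weights:    {classifier_name: {category: dev-set recall}}
    A classifier with no recorded recall for a category contributes nothing.
    """
    fused = {}
    for name, scores in classifier_scores.items():
        weights = recall_weights.get(name, {})
        for category, score in scores.items():
            weight = weights.get(category, 0.0)
            fused[category] = fused.get(category, 0.0) + weight * score
    return fused

def best_category(fused):
    """Category with the highest fused score."""
    return max(fused, key=fused.get)
```

Under this rule, a classifier that never recalled a category on the dev data is ignored for that category, which matches the blank cells in the recall table.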
Video system – Category filtering & Matching
[Diagram: category filtering and matching pipeline]
• Test video → test key frames → color extractor, edge extractor, face detector, layout extractor, motion extractor
• Extracted features: color histogram (HSV, RGB), edge histogram, num faces/size/positions, segmentation info, motion histogram, camera & object motion
• Color, edge, face and layout classifiers → classifier merger (weights from dev data)
• Key frames: merged category + query category → category filtering → filtered key frames
• Video: heuristic category filtering → filtered video
• Filtered key frames/video + query video/frames → matching → results
Video system – motion 1
Camera: panning left Camera: panning up
Object motion: moving Object motion: static
Video system – motion 2
• Check if most motion vectors are ~ 0 → static motion
• Otherwise, filter out all small motion vectors
• Categorize the motion vectors into circle bins → histogram + main motion vector
• If the main motion vector dominates → camera motion: panning left, right, up, down
• To detect zooming, find a focus block/point
• Object motion is derived after removing camera motion
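The first four steps above can be sketched roughly as follows. The thresholds (`min_magnitude`, `static_ratio`, `dominance`) and the choice of four 90-degree bins are assumptions made for illustration; the slides give no values. The function reports the dominant direction of the motion vectors themselves; whether that corresponds to the camera panning the same or the opposite way depends on the motion-vector convention of the extractor.

```python
import math

# Direction of the motion vectors in the image plane, with dy pointing up.
DIRECTIONS = ["right", "up", "left", "down"]

def classify_motion(vectors, min_magnitude=1.0, static_ratio=0.8, dominance=0.6):
    """Classify a list of (dx, dy) block motion vectors.

    Returns "static", one of DIRECTIONS when a single direction dominates,
    or "mixed" otherwise.
    """
    if not vectors:
        return "static"
    # Keep only vectors that are not approximately zero.
    moving = [(dx, dy) for dx, dy in vectors if math.hypot(dx, dy) >= min_magnitude]
    # If most vectors are near zero, treat the shot as static.
    if len(vectors) - len(moving) >= static_ratio * len(vectors):
        return "static"
    # Histogram of the remaining vectors over four 90-degree angular bins,
    # each centred on right (0), up (90), left (180) and down (270).
    counts = [0, 0, 0, 0]
    for dx, dy in moving:
        angle = math.degrees(math.atan2(dy, dx)) % 360
        counts[int(((angle + 45) % 360) // 90)] += 1
    top = max(range(4), key=counts.__getitem__)
    # The main direction must account for most of the moving vectors.
    if counts[top] >= dominance * len(moving):
        return DIRECTIONS[top]
    return "mixed"
```

Zoom detection and object motion (the last two bullets) are not covered by this sketch.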
Conclusion
• We built a fully functional system within a short time, in an ad-hoc manner
• There is plenty of room for performance improvement and detailed analysis.
Q & A?
• Thank you!!!