HYP Progress Update By Zhao Jin. Outline Background Progress Update.

HYP Progress Update

By Zhao Jin

Outline

• Background

• Progress Update

Background

• Query (text-based)

– The set of keywords entered into the system to retrieve the desired information or resources

– Main categories

• Traditional IR

• Web (e.g. Google)

• OPAC (e.g. LINC)

• Video (e.g. TRECVID)

Background

• Query analysis

– To analyze the patterns and hidden information in queries

– To classify and support such queries efficiently

Progress update

• Mid-May to early June

– Background reading

– Around 30 to 40 papers on various topics

– Summarizing the key points of each paper

Progress update

• Mid-June to late June

– Log analysis

• BBC video queries

• NUS OPAC queries

– Background reading on OPAC and TRECVID

Progress update

• July to now

– Follow-up on two main topics

• Query classification and division into content-based and feature-based keywords (OPAC)

• Identifying ASR-oriented keywords in a video query (TRECVID)

– Background reading on MARC, WordNet and LOC (Library of Congress) Subject Headings

Progress update

• Plan for the near future

– Refine and experiment with the current ideas

– Log analysis

– Background reading (textbooks and related papers)

– Preparation for implementation

Q&A?

End of progress update

• Thank you for your attention!

Two types of keywords

• Content-based keyword (CBK)

– Keywords that concern what the item is about

– E.g. title, subject heading

• Feature-based keyword (FBK)

– Keywords that concern the features of the item

– E.g. author, publisher, genre, medium

Benefits

• Benefits:

– Faster retrieval

– More precise retrieval

– Better relevance ranking

Possible implementation

• Possible implementation:

– Term co-occurrence for concept division

– A list of special words plus machine learning for dividing keywords into FBKs and CBKs

– WordNet for classification among CBKs
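Term co-occurrence for concept division can be sketched as follows, assuming that a "concept" is a run of adjacent query terms that often appear next to each other in the query log. The association measure (bigram count divided by the rarer term's count), the 0.5 threshold, the function names and the toy log are all hypothetical choices for illustration; a measure such as pointwise mutual information could be swapped in.

```python
from collections import Counter

# Toy query log, invented purely for illustration (not taken from the BBC or NUS logs).
TOY_LOG = [
    "data mining",
    "data mining tutorial",
    "machine learning",
    "machine learning",
    "machine learning data mining",
    "tutorial",
]


def build_counts(query_log):
    """Count single terms and adjacent term pairs across a query log."""
    unigrams = Counter()
    bigrams = Counter()
    for query in query_log:
        tokens = query.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def association(a, b, unigrams, bigrams):
    """Share of the rarer term's occurrences in which `a` is directly followed by `b`."""
    rarer = min(unigrams[a], unigrams[b])
    if rarer == 0:
        return 0.0
    return bigrams[(a, b)] / rarer


def divide_concepts(query, unigrams, bigrams, threshold=0.5):
    """Split a query into concepts, cutting between weakly associated neighbours."""
    tokens = query.lower().split()
    if not tokens:
        return []
    concepts = [[tokens[0]]]
    for prev, cur in zip(tokens, tokens[1:]):
        if association(prev, cur, unigrams, bigrams) >= threshold:
            concepts[-1].append(cur)  # strong link: same concept
        else:
            concepts.append([cur])    # weak link: start a new concept
    return [" ".join(concept) for concept in concepts]


unigrams, bigrams = build_counts(TOY_LOG)
print(divide_concepts("machine learning data mining", unigrams, bigrams))
# -> ['machine learning', 'data mining']
```

Cutting only between neighbours keeps each concept a contiguous phrase, which matches how multi-word subjects are usually typed into a search box.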

Possible implementation

• Possible implementation:

– CL and IL search algorithms for the actual searching with CBKs

– A list of special words plus machine learning for classification among FBKs

– MARC record search algorithms for the actual searching with FBKs
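The special-word idea for both the FBK/CBK division and the classification among FBKs can be sketched as follows. It covers only the word-list part; the machine-learning component for terms not on any list is left out. The word lists, the "by" author cue and the function name are hypothetical placeholders, not lists drawn from the NUS OPAC log.

```python
# Hypothetical special-word lists; a real system would curate them from the
# OPAC log and MARC fields and back them with a learned classifier.
SPECIAL_WORDS = {
    "medium": {"dvd", "cd", "video", "ebook", "microfilm"},
    "genre": {"fiction", "biography", "poetry", "thesis"},
    "publisher": {"springer", "wiley", "elsevier"},
}
AUTHOR_CUE = "by"  # hypothetical rule: the term right after "by" is an author


def split_keywords(query):
    """Divide a query into CBKs and FBKs, grouping FBKs by feature type."""
    tokens = query.lower().split()
    cbks = []
    fbks = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == AUTHOR_CUE and i + 1 < len(tokens):
            fbks.setdefault("author", []).append(tokens[i + 1])
            i += 2  # skip both the cue word and the author term
            continue
        for feature_type, words in SPECIAL_WORDS.items():
            if token in words:
                fbks.setdefault(feature_type, []).append(token)
                break
        else:
            cbks.append(token)  # no special word matched: treat as content
        i += 1
    return cbks, fbks


print(split_keywords("data mining dvd by smith"))
# -> (['data', 'mining'], {'medium': ['dvd'], 'author': ['smith']})
```

The CBKs returned here would go to the content search, and each FBK group maps naturally onto a MARC field for the feature search.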


Means to retrieve shots

• Example

– To find shots of “Bill Clinton”

• Face recognition

• Closed captions

• Automatic Speech Recognition (ASR)
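Of the three cues above, the ASR route can be sketched as matching the query phrase against a time-stamped transcript and returning the shots whose time span contains the match. The data shapes (`Shot` with start/end seconds, `AsrWord` with a single time stamp) and function names are assumptions for illustration, not the TRECVID data format. Face recognition and closed captions are not covered.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    shot_id: str
    start: float  # shot start time in seconds
    end: float    # shot end time in seconds (exclusive)


@dataclass
class AsrWord:
    word: str
    time: float   # time in seconds at which the recognizer placed the word


def phrase_times(phrase, transcript):
    """Times at which the words of `phrase` occur consecutively in the transcript."""
    target = phrase.lower().split()
    if not target:
        return []
    words = [w.word.lower() for w in transcript]
    n = len(target)
    return [transcript[i].time for i in range(len(words) - n + 1)
            if words[i:i + n] == target]


def shots_mentioning(phrase, shots, transcript):
    """IDs of shots whose time span contains a spoken occurrence of `phrase`."""
    times = phrase_times(phrase, transcript)
    return [s.shot_id for s in shots if any(s.start <= t < s.end for t in times)]


shots = [Shot("s1", 0.0, 5.0), Shot("s2", 5.0, 10.0)]
transcript = [AsrWord("today", 1.0), AsrWord("president", 6.0),
              AsrWord("bill", 6.5), AsrWord("clinton", 6.9)]
print(shots_mentioning("Bill Clinton", shots, transcript))  # -> ['s2']
```

A spoken mention does not guarantee that the person is on screen in that shot, so ASR evidence is best combined with the visual and caption cues rather than used alone.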

Metrics

• Common vs. special (in reality)

– How common the concept represented by the keyword is in reality

• Generic vs. specific

– How generic the concept represented by the keyword is

Metrics

• Concrete vs. abstract

– Whether the concept represented by the keyword is concrete or abstract

• Topic frequency (low vs. high)

– How often the keyword is, or is closely related to, a topic

Metrics

• Formal vs. informal

– Whether the keyword is in formal or informal language

• Written vs. spoken

– Whether the keyword is in written or spoken language

Metrics

• Feature-level vs. content-level

– Whether the keyword is about a feature of the video (e.g. camera motion) or about its content
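One possible way to make the metrics above usable by a program is to record each keyword's position on every metric as a 0/1 feature vector, which could then feed a classifier for ASR-oriented keywords. The slides do not say how the metrics are measured. In this sketch the field names, the boolean encoding, and the choice to estimate topic frequency as the fraction of topic descriptions containing the keyword are all assumptions.

```python
from dataclasses import dataclass, fields


@dataclass
class KeywordMetrics:
    """A keyword's position on each metric; True selects the second pole."""
    special: bool          # common (False) vs special in reality (True)
    specific: bool         # generic (False) vs specific (True)
    abstract: bool         # concrete (False) vs abstract (True)
    high_topic_freq: bool  # low (False) vs high (True) topic frequency
    informal: bool         # formal (False) vs informal (True)
    spoken: bool           # written (False) vs spoken (True)
    content_level: bool    # feature-level (False) vs content-level (True)

    def to_vector(self):
        """Encode the metrics as a 0/1 feature vector in field order."""
        return [int(getattr(self, f.name)) for f in fields(self)]


def topic_frequency(keyword, topics):
    """Fraction of topics containing `keyword` as a whole word or phrase.

    Matching is space-delimited, so punctuation next to a word is not handled.
    """
    if not topics:
        return 0.0
    padded = f" {keyword.lower()} "
    return sum(padded in f" {t.lower()} " for t in topics) / len(topics)


# Illustrative inputs only; the labels below are not measured values.
topics = ["Bill Clinton speaking", "a flooded street", "Clinton and Gore"]
print(topic_frequency("clinton", topics))  # 2 of the 3 topics match
print(KeywordMetrics(False, True, False, True, False, False, True).to_vector())
# -> [0, 1, 0, 1, 0, 0, 1]
```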
