Taking up the Gaokao Challenge: An Information Retrieval Approach

Gong Cheng, Weixi Zhu, Ziwei Wang, Jianghui Chen, Yuzhong Qu

National Key Laboratory for Novel Software Technology, Nanjing University, China

Websoft



What is Gaokao?
• National Higher Education Entrance Examination, a.k.a. China’s SAT, but
  • more difficult
  • testing more subjects:
    • Chinese literature, Mathematics, English language, and
    • Humanities (History, Geography, Political Education) or Natural Sciences (Physics, Chemistry, Biology)


Why is Gaokao difficult for a computer?
• Gaokao questions challenge QA systems with:
  • multiple sentences to understand
  • domain-specific expressions to parse (e.g., quotes, formulas, maps)
  • unspecified, seemingly unbounded resources to search


Overview of the Approach

[Diagram: the steps a human takes to answer a question (recollecting relevant knowledge, drawing evidence, reasoning), paired with the computer's counterpart]


Stage 1: Retrieving Pages
• Retrieving Concept Pages (Step 1/2: leftmost longest title matching)

Example: The "Effectiveness of Confucianism" chapter of the Xunzi said:

(The King of Zhou) ruled the land and founded seventy-one (feudal) states, of which fifty-three (governors) were from the Ji family.

It shows that in the Zhou Dynasty, feoffments were mainly granted to relatives. The King of Zhou would invest those relatives with
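Leftmost longest title matching can be read as a greedy scan: at each position in the question, take the longest page title that starts there, then continue right after it. The sketch below is only an illustration under stated assumptions: the question is already split into lowercase words, and page titles are available as a set of word tuples (a hypothetical title index). Gaokao questions are written in Chinese, where matching would more likely run over characters. The function name and the tiny title set are hypothetical.

```python
def leftmost_longest_matches(tokens, titles):
    """Scan left to right; at each position, keep the longest page title that starts there.

    tokens: list of lowercase words from the question.
    titles: set of page titles, each stored as a tuple of lowercase words
            (a hypothetical title index built from the encyclopedia).
    Returns non-overlapping (start, end, title) spans.
    """
    max_len = max((len(t) for t in titles), default=0)
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so "zhou dynasty" beats "zhou".
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + length])
            if candidate in titles:
                matches.append((i, i + length, " ".join(candidate)))
                i += length  # resume right after the matched span
                break
        else:
            i += 1  # no title starts here; shift one token right
    return matches


titles = {("zhou",), ("zhou", "dynasty"), ("xunzi",), ("ji", "family")}
question = "it shows that in the zhou dynasty feoffments were granted to relatives"
print(leftmost_longest_matches(question.split(), titles))
# [(5, 7, 'zhou dynasty')]
```

Preferring the longest match keeps a multi-word concept such as "zhou dynasty" from being split into the shorter, less specific title "zhou".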


Stage 1: Retrieving Pages
• Retrieving Concept Pages (Step 2/2: context-based disambiguation)

Feudalism may refer to:
• Feudalism (in China), existed during the Shang and Zhou dynasties, and was replaced by centralization of authority during and after the Qin dynasty…
• Feudalism (in Europe), prevailed in the Middle Ages from the 5th to the 15th century…
• Feudalism (in Japan), originated from ritsuryo in the Heian period from the 8th to the 12th century…

In the example question, a term in the stem (such as "feudal") matches this disambiguation page. Context helps disambiguation: the rest of the question (the Zhou Dynasty, the King of Zhou) points to Feudalism (in China).
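One simple way to use context is to compare the question with the short description of each candidate sense and keep the most similar one. The sketch below assumes bag-of-words cosine similarity and a tiny hand-made stopword list; the slides only state that context helps disambiguation, not which similarity measure is used. The names `bag_of_words`, `cosine`, `disambiguate`, and `STOPWORDS` are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); without it, words like "the"
# dominate the overlap between short descriptions.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "from", "by",
             "was", "were", "is", "during", "after", "with"}


def bag_of_words(text):
    """Lowercased word counts with stopwords removed."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def disambiguate(context, senses):
    """Return the sense title whose description is most similar to the question context."""
    ctx = bag_of_words(context)
    return max(senses, key=lambda title: cosine(ctx, bag_of_words(senses[title])))


senses = {
    "Feudalism (in China)": "existed during the Shang and Zhou dynasties, and was replaced "
                            "by centralization of authority during and after the Qin dynasty",
    "Feudalism (in Europe)": "prevailed in the Middle Ages from the 5th to the 15th century",
    "Feudalism (in Japan)": "originated from ritsuryo in the Heian period from the 8th to the 12th century",
}
context = "It shows that in the Zhou Dynasty, feoffments were mainly granted to relatives."
print(disambiguate(context, senses))  # Feudalism (in China)
```

Here "zhou" and "dynasty" are shared only with the Chinese sense, so it wins; the other two descriptions share no content words with the question.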


Stage 1: Retrieving Pages
• Retrieving Quote Pages (exact content match)

In the example question, the quote from the Xunzi, "(The King of Zhou) ruled the land and founded seventy-one (feudal) states, of which fifty-three (governors) were from the Ji family.", matches the content of a page exactly, so that page is retrieved.
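Exact content matching amounts to a substring test: a page is retrieved if its text contains a quote from the question verbatim. The sketch below assumes quotes are delimited by straight double quotes and normalizes whitespace before comparing; how the actual system detects quotes is not stated in the slides. The function names and sample pages are hypothetical.

```python
import re


def normalize(text):
    """Collapse runs of whitespace so line breaks do not block an exact match."""
    return re.sub(r"\s+", " ", text).strip()


def extract_quotes(question):
    """Hypothetical quote detector: text between straight double quotes."""
    return [normalize(q) for q in re.findall(r'"([^"]+)"', question)]


def find_quote_pages(question, pages):
    """Titles of pages whose content contains one of the question's quotes verbatim."""
    quotes = extract_quotes(question)
    return [title for title, content in pages.items()
            if any(q in normalize(content) for q in quotes)]


pages = {
    "Xunzi": "The King of Zhou ruled the land\nand founded seventy-one states.",
    "Zhou dynasty": "The Zhou dynasty followed the Shang dynasty.",
}
question = ('The Xunzi said: "ruled the land and founded seventy-one states". '
            'It shows that in the Zhou Dynasty, feoffments were mainly granted to relatives.')
print(find_quote_pages(question, pages))  # ['Xunzi']
```

Unlike concept matching, this step needs no disambiguation: a verbatim quote usually identifies its source page on its own.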


Stage 2: Ranking and Filtering Pages
• Centrality-based Ranking (centrality = cosine similarity between a page and the center of the retrieved pages)
  • Three kinds of vector space:
    • words in a page
    • links in a page
    • categories of a page
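Centrality can be sketched as follows: represent every retrieved page as a sparse vector, average them into a center, and rank pages by cosine similarity to that center. The slides name three vector spaces (words, links, categories) but not how they are combined, so this sketch handles one space at a time; repeating it per space and merging the scores would be an additional assumption. The category names below are hypothetical sample data.

```python
import math
from collections import Counter


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Average of the sparse vectors of all retrieved pages."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds the weights feature by feature
    return {k: c / len(vectors) for k, c in total.items()}


def rank_by_centrality(pages):
    """pages: title -> sparse vector (feature -> weight) in one vector space."""
    center = centroid(list(pages.values()))
    scores = {t: cosine(v, center) for t, v in pages.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Category space: each page is described by the categories it belongs to.
pages = {
    "Zhou dynasty": {"Chinese history": 1, "Zhou dynasty": 1},
    "Fengjian": {"Chinese history": 1, "Zhou dynasty": 1, "Feudalism": 1},
    "Xunzi": {"Chinese philosophy": 1, "Confucian texts": 1},
}
print(rank_by_centrality(pages))  # ['Fengjian', 'Zhou dynasty', 'Xunzi']
```

The intuition is that pages agreeing with most of the other retrieved pages are more likely on-topic, while outliers (here "Xunzi" in category space) sink to the bottom.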


Stage 2: Ranking and Filtering Pages
• Domain-based Filtering (keep pages within historical categories, i.e., categories whose names contain the word "history", and their descendant categories)
• Relevance-based Ranking (relevance to the stem and options)
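The domain filter can be sketched as a graph walk: seed with every category whose name contains "history", collect all descendants through the subcategory links, and keep pages that belong to at least one collected category. The category graph, page-to-category map, and function names below are hypothetical; on a Chinese encyclopedia the seed test would look for the Chinese word for history rather than the English string.

```python
from collections import deque


def historical_categories(subcategories, all_categories):
    """Categories whose names contain "history", plus all their descendants.

    subcategories: parent category -> list of child categories (hypothetical graph).
    """
    seeds = [c for c in all_categories if "history" in c.lower()]
    seen = set(seeds)
    queue = deque(seeds)
    while queue:  # breadth-first walk down the category hierarchy
        for child in subcategories.get(queue.popleft(), []):
            if child not in seen:  # the seen set also guards against cycles
                seen.add(child)
                queue.append(child)
    return seen


def filter_pages(page_categories, allowed):
    """Keep pages that belong to at least one allowed category."""
    return [p for p, cats in page_categories.items() if allowed & set(cats)]


subcategories = {
    "History of China": ["Zhou dynasty", "Qin dynasty"],
    "Zhou dynasty": ["Zhou dynasty people"],
    "Philosophy": ["Confucianism"],
}
all_categories = {"History of China", "Zhou dynasty", "Qin dynasty",
                  "Zhou dynasty people", "Philosophy", "Confucianism"}
allowed = historical_categories(subcategories, all_categories)
page_categories = {"Duke of Zhou": ["Zhou dynasty people"], "Analects": ["Confucianism"]}
print(filter_pages(page_categories, allowed))  # ['Duke of Zhou']
```

Walking to descendants matters because many relevant pages sit in subcategories such as "Zhou dynasty people", whose own names do not mention history.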


Stage 3: Assessing Options
• Truth of an option
  = the extent to which the question and its pages can entail the option
  = the extent to which pages from the stem can entail the option + the extent to which pages from the option can entail the stem
  (entailment is measured by relevance)
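The decomposition above translates directly into a score with two directions: stem pages checked against the option, and option pages checked against the stem. The sketch assumes relevance is the best bag-of-words cosine similarity between a text and any of the given pages, and that the highest-scoring option is chosen; neither detail is specified in the slides. `pages_of` is a hypothetical callable standing in for the output of Stages 1 and 2, and the sample texts are made up.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def relevance(pages, text):
    """Assumed entailment proxy: best cosine similarity between the text and any page."""
    target = bag_of_words(text)
    return max((cosine(bag_of_words(p), target) for p in pages), default=0.0)


def truth(stem, option, pages_of):
    """Pages from the stem entailing the option + pages from the option entailing the stem."""
    return relevance(pages_of(stem), option) + relevance(pages_of(option), stem)


def answer(stem, options, pages_of):
    """Assumed decision rule: pick the option label with the highest truth score."""
    return max(options, key=lambda label: truth(stem, options[label], pages_of))


stem = "feoffments in the zhou dynasty were granted to relatives"
retrieved = {  # hypothetical output of Stages 1 and 2
    stem: ["fengjian granted land to relatives of the zhou king by clan"],
    "clan relations": ["clan relations decided who received feoffments"],
    "imperial examination": ["imperial examination selected officials"],
}
options = {"A": "clan relations", "B": "imperial examination"}
print(answer(stem, options, lambda text: retrieved.get(text, [])))  # A
```

Scoring both directions lets an option gain support even when the stem's pages never mention it, as long as the option's own pages relate back to the stem.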


Experiments
• Real-life questions from Gaokao or mock Gaokao exams
• QS-A: 123 questions, answerable based on Wikipedia
• QS-B: 454 questions, outside the scope of Wikipedia

Correctly answered questions:
• QS-A: 43.09%
• QS-B: 31.28%
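As a sanity check on the reported figures, the percentages can be converted back into question counts. This assumes each percentage is computed over the full question set; the pooled accuracy over both sets is our own arithmetic, not a figure reported in the slides.

```python
SETS = {"QS-A": (123, 43.09), "QS-B": (454, 31.28)}  # (questions, % correct)


def implied_correct(total, percent):
    """Nearest whole number of questions consistent with the reported percentage."""
    return round(total * percent / 100)


counts = {name: implied_correct(n, p) for name, (n, p) in SETS.items()}
overall = sum(counts.values()) / sum(n for n, _ in SETS.values())
print(counts)             # {'QS-A': 53, 'QS-B': 142}
print(f"{overall:.2%}")   # 33.80%
```

Both counts reproduce the reported percentages exactly (53/123 and 142/454 round to 43.09% and 31.28%), so the figures are internally consistent.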


Details can be found in our poster!