The LINDI Project Linking Information for New Discoveries UIs for building and reusing hypothesis...

The LINDI Project
Linking Information for New Discoveries

Two Main Thrusts:
– UIs for building and reusing hypothesis-seeking strategies
– Statistical language analysis techniques for extracting propositions

LINDI: Target Components

1. Special UI for retrieving appropriate docs

2. Language analysis on docs to detect causal relationships between concepts

3. Probabilistic representation of concepts and relationships

4. UI + User: Hypothesis creation

Design Goals of LINDI UI

Support for the development of extended search strategies
1. Text filtering and manipulation tool to help the development of strategies
2. Text visualization and analysis tool to help the formulation of hypotheses

The User Interface

A general search interface should support
– History
– Context
– Comparison
– Operators: Intersection, Union, Slicing
– Operator Reuse
– Visualization (where appropriate)

We have an initial implementation. It needs lots of work.

Scenario: Explore Functions of a Gene

Objective
– Determine the functions of a newly sequenced Gene X.

Known facts
– Gene X co-expresses (is activated in the same cell) with Genes A, B, C
– The relationship of Genes A, B, C with certain types of diseases (from medical literature)

Question
– What types of diseases is Gene X related to?

Explore Functions of New Gene X

[Figure: diagram with the labels Medical Literature, Query, Gene-A, Gene-B, Gene-C, Keywords, Projection, Intersection, Mapping, Slicing, and Possible Function for Gene-X]

Slide adapted from K. Patel

Architecture of LINDI UI

– Data Layer
– Annotation Layer
– User Interface Layer

Data Layer

Purpose
– Hide different formats of text collections

Components
– Data: Abstractions representing records of a text collection
– Operations: performed on the data

Data
– A set of records
– Each record is a set of tuples with types

Operations
– union, intersection, projection, mapping
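As a rough illustration of the data model described on this slide, the sketch below treats a record as a set of (type, value) tuples and a collection as a set of records. The function names, the `gene` and `mesh` field types, and the sample records are all hypothetical; the slides do not show LINDI's actual API.

```python
def record(*pairs):
    """A record is an immutable set of (type, value) tuples."""
    return frozenset(pairs)

def union(a, b):
    """All records that appear in either collection."""
    return a | b

def intersection(a, b):
    """Only the records that appear in both collections."""
    return a & b

def project(records, keep_types):
    """Keep only the tuples whose type is listed in keep_types."""
    keep = set(keep_types)
    return {frozenset(t for t in rec if t[0] in keep) for rec in records}

def map_records(records, fn):
    """Apply fn to every record; fn must return a hashable value."""
    return {fn(rec) for rec in records}

# Hypothetical sample data, invented for this sketch.
docs_a = {
    record(("gene", "A"), ("mesh", "Neoplasms")),
    record(("gene", "A"), ("mesh", "Apoptosis")),
}
docs_b = {
    record(("gene", "B"), ("mesh", "Neoplasms")),
}

# Project both collections onto the "mesh" type, then intersect them.
shared = intersection(project(docs_a, ["mesh"]), project(docs_b, ["mesh"]))
headings = map_records(shared, lambda rec: dict(rec)["mesh"])
```

Because records are plain sets of typed tuples, the same four operations work regardless of which text collection the records came from, which is the point of hiding collection formats behind this layer.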

Annotation Layer

Purpose
– Associate data sets with the operations that produced them (history)
– History is a first-class object

Advantage
– Streamline a sequence of operations
– Reuse operations
– Parameterize operations
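One way to picture history as a first-class object is a small recorder that stores each operation with its parameters and can replay the whole sequence with different values. This is only a sketch under assumptions: the `History` class, the `query` and `project` helpers, and the sample collection are hypothetical and do not come from the LINDI code.

```python
class History:
    """An ordered list of (operation, parameters) steps that can be replayed."""

    def __init__(self):
        self.steps = []

    def record(self, fn, **params):
        """Remember an operation and the parameter values it was run with."""
        self.steps.append((fn, params))
        return self

    def replay(self, data, **overrides):
        """Re-run every step in order, substituting any overridden parameters."""
        for fn, params in self.steps:
            merged = {k: overrides.get(k, v) for k, v in params.items()}
            data = fn(data, **merged)
        return data

def query(records, gene):
    return [r for r in records if r["gene"] == gene]

def project(records, field):
    return [r[field] for r in records]

# Hypothetical sample data, invented for this sketch.
collection = [
    {"gene": "A", "mesh": "Neoplasms"},
    {"gene": "B", "mesh": "Apoptosis"},
    {"gene": "B", "mesh": "Neoplasms"},
]

history = History().record(query, gene="A").record(project, field="mesh")
result_a = history.replay(collection)            # the steps as recorded
result_b = history.replay(collection, gene="B")  # same steps, new parameter
```

Replaying the stored steps as one call corresponds to streamlining, and overriding `gene` corresponds to parameterizing an operation.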

User Interface

This version completed Aug 10, 2000
– Designed by Marti Hearst and Hao Chen
– Code written by Hao Chen

Direct manipulation of information objects and access operations
– Query
– Intersection
– Union
– Mapping
– Slicing

Record and reuse of past operations

Parameterization of operations

Streamlining of operations

Initial Palette

Query Structure Determined by Collection Type

Query Operation Results

Projection Operation and Subsequent Results

Parameterized Query: Repeat operations with different values

[Figure labels: GC, GB, GA]

Intersection over Projected Attribute

Example Interaction with UI Prototype

1. Query on gene names
2. Project out only MeSH headings
3. Intersect the results
4. Map to create a ranking
5. Slice out the top-ranked
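The five steps above can be sketched end to end on a toy collection. Everything here is an assumption made for illustration: the article data is invented, and ranking shared headings by their total frequency is just one plausible reading of "map to create a ranking".

```python
from collections import Counter

# Hypothetical toy collection: (gene name, MeSH headings of one article).
ARTICLES = [
    ("A", ["Neoplasms", "Apoptosis"]),
    ("A", ["Neoplasms", "Cell Cycle"]),
    ("B", ["Neoplasms", "Apoptosis"]),
    ("B", ["Neoplasms"]),
    ("C", ["Neoplasms", "Apoptosis", "Hypertension"]),
]

def query(gene):
    """Step 1: retrieve the articles that mention the gene."""
    return [a for a in ARTICLES if a[0] == gene]

def project_mesh(articles):
    """Step 2: keep only the MeSH headings, counting how often each occurs."""
    counts = Counter()
    for _, headings in articles:
        counts.update(headings)
    return counts

def intersect(counters):
    """Step 3: headings that occur for every gene."""
    shared = set(counters[0])
    for c in counters[1:]:
        shared &= set(c)
    return shared

def rank(shared, counters):
    """Step 4: map each shared heading to a score (assumed: total frequency)."""
    totals = {h: sum(c[h] for c in counters) for h in shared}
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))

def slice_top(ranking, n):
    """Step 5: keep the n top-ranked headings."""
    return ranking[:n]

projected = [project_mesh(query(g)) for g in ("A", "B", "C")]
ranking = rank(intersect(projected), projected)
top = slice_top(ranking, 1)
```

In the Gene X scenario, the top-ranked headings shared by Genes A, B, and C would be the candidate disease associations to examine for Gene X.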

Second Version of UI

LINDI Miner

Circa May 2002
– Designed by Marti Hearst
– Implemented by Melody Ivory

Emphasize reusing results of prior text analysis

See lindi-miner.ppt

The Language Analysis Component

Goal: Extract Propositions from Text and Make Inferences

Why Extract Propositions from Text?
– Text is how knowledge at the propositional level is communicated
– Text is continually being created and updated by the outside world

Example: Etiology

Given
– medical titles and abstracts
– a problem (incurable rare disease)
– some medical expertise

find causal links among titles
– symptoms
– drugs
– results

Traditional Semantic Grammars

Example (Burton & Brown 79)

– Interpreting “What is the current thru the CC when the VC is 1.0?”

<request> := <simple/request> when <setting/change>
<simple/request> := what is <measurement>
<measurement> := <meas/quant> <prep> <part>
<setting/change> := <control> is <control/value>
<control> := VC

– Resulting semantic form is:
(RESETCONTROL (STQ VC 1.0) (MEASURE CURRENT CC))
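A hand-written semantic grammar of this kind can be approximated with a few small parsing functions, one per rule. The sketch below is not Burton and Brown's implementation. Only `VC` appears as a terminal on the slide, so the word lists for `<meas/quant>`, `<prep>`, and `<part>`, the treatment of "the" as a noise word, and all function names are assumptions chosen so that the one example sentence parses.

```python
import re

# Assumed terminal vocabularies (only VC is given on the slide).
MEAS_QUANT = {"current"}
PREP = {"thru", "through"}
PART = {"cc"}
CONTROL = {"vc"}
NOISE = {"the"}  # assumed: articles are skipped

def tokenize(sentence):
    tokens = re.findall(r"\d+(?:\.\d+)?|[a-z]+", sentence.lower())
    return [t for t in tokens if t not in NOISE]

def parse_measurement(tokens):
    # <measurement> := <meas/quant> <prep> <part>
    if len(tokens) != 3:
        raise ValueError("not a measurement")
    quant, prep, part = tokens
    if quant not in MEAS_QUANT or prep not in PREP or part not in PART:
        raise ValueError("not a measurement")
    return ("MEASURE", quant.upper(), part.upper())

def parse_simple_request(tokens):
    # <simple/request> := what is <measurement>
    if tokens[:2] != ["what", "is"]:
        raise ValueError("not a simple request")
    return parse_measurement(tokens[2:])

def parse_setting_change(tokens):
    # <setting/change> := <control> is <control/value>
    if len(tokens) != 3 or tokens[0] not in CONTROL or tokens[1] != "is":
        raise ValueError("not a setting change")
    return ("STQ", tokens[0].upper(), float(tokens[2]))

def parse_request(sentence):
    # <request> := <simple/request> when <setting/change>
    tokens = tokenize(sentence)
    if "when" not in tokens:
        raise ValueError("not a request")
    split = tokens.index("when")
    request = parse_simple_request(tokens[:split])
    setting = parse_setting_change(tokens[split + 1:])
    return ("RESETCONTROL", setting, request)

def to_sexpr(form):
    """Render a nested tuple as a parenthesized semantic form."""
    if isinstance(form, tuple):
        return "(" + " ".join(to_sexpr(f) for f in form) + ")"
    return str(form)

form = to_sexpr(parse_request("What is the current thru the CC when the VC is 1.0?"))
```

Any paraphrase the rules do not anticipate, such as "How much current flows in the CC?", raises `ValueError`, which illustrates how narrowly such a grammar is tied to its expected phrasings.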

Example: Statistical Semantic Grammar

To detect causal relationships between medical concepts
– Title: Magnesium deficiency implicated in increased stress levels.
– Interpretation: <nutrient><reduction> related-to <increase><symptom>
– Inference:
  » Increase(stress, decrease(mg))
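To make the title-to-inference step concrete, here is a deterministic toy that tags phrases with semantic classes, checks them against the interpretation pattern, and emits the inference. It contains no statistics and is not the LINDI method; the lexicon, the normalized values, and the function names are hypothetical, built only to reproduce the single example on this slide.

```python
import re

# Hypothetical lexicon: phrase -> (semantic class, normalized value).
LEXICON = {
    "magnesium": ("nutrient", "mg"),
    "deficiency": ("reduction", "decrease"),
    "implicated in": ("related-to", None),
    "increased": ("increase", "Increase"),
    "stress": ("symptom", "stress"),
}

# <nutrient><reduction> related-to <increase><symptom>
PATTERN = ["nutrient", "reduction", "related-to", "increase", "symptom"]

def tag(title):
    """Return (class, value) pairs for lexicon phrases, in order of appearance."""
    text = title.lower()
    hits = []
    for phrase, (cls, value) in LEXICON.items():
        match = re.search(r"\b" + re.escape(phrase) + r"\b", text)
        if match:
            hits.append((match.start(), cls, value))
    hits.sort(key=lambda h: h[0])
    return [(cls, value) for _, cls, value in hits]

def infer(title):
    """Build the inference string if the title matches PATTERN, else None."""
    tagged = tag(title)
    if [cls for cls, _ in tagged] != PATTERN:
        return None
    v = dict(tagged)
    return f"{v['increase']}({v['symptom']}, {v['reduction']}({v['nutrient']}))"

inference = infer("Magnesium deficiency implicated in increased stress levels.")
```

A title whose phrases do not line up with the pattern simply returns `None`.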

Statistical Semantic Grammars

Empirical NLP has made great strides
– But mainly applied to syntactic structure

Semantic grammars are powerful, but
– Brittle
– Time-consuming to construct

Idea:
– Use what we now know about statistical NLP to build up a probabilistic grammar
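The slides do not say how the probabilistic grammar would be built. One standard option, shown here only as an assumption, is relative-frequency estimation as used for probabilistic context-free grammars: count how often each rule is observed in labelled data and divide by the count of its left-hand side. The rule names and the observations below are invented for illustration.

```python
from collections import Counter

# Hypothetical hand-labelled rule observations: (left-hand side, right-hand side).
OBSERVED_RULES = [
    ("claim", ("nutrient", "reduction", "related-to", "increase", "symptom")),
    ("claim", ("nutrient", "reduction", "related-to", "increase", "symptom")),
    ("claim", ("drug", "related-to", "reduction", "symptom")),
    ("nutrient", ("magnesium",)),
    ("nutrient", ("magnesium",)),
    ("nutrient", ("zinc",)),
    ("nutrient", ("iron",)),
]

def estimate_rule_probabilities(observed):
    """P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(observed)
    lhs_totals = Counter(lhs for lhs, _ in observed)
    return {
        (lhs, rhs): n / lhs_totals[lhs]
        for (lhs, rhs), n in rule_counts.items()
    }

probs = estimate_rule_probabilities(OBSERVED_RULES)
p_magnesium = probs[("nutrient", ("magnesium",))]
```

With probabilities attached, an unfamiliar phrasing can still receive a low-probability interpretation instead of failing outright, which is the brittleness the idea is meant to address.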
