Automatic Annotation of text for Bibliometrics Use
M.Bertin, J.P. Descles, B. Djioua and Y. Krushkov
University of Paris-Sorbonne
LaLICC Laboratory
FRANCE
May 12, 2006
Introduction (Problematic, Definition, Laws of Bibliometrics, Bibliometric, Protocol)
Part I: Linguistic approach
Part II: Technical approach
Part III: Informatic implementation
Conclusion
Problematic
What is the basic problem we are trying to solve?
Bibliometrics literally means "book measurement". How can we qualify the relations between authors?
What was the technical solution strategy?
We use Contextual Exploration, developed by the LaLICC laboratory.
What were the basic elements of the research and program approach?
Automatically identify and categorize semantic relations using the EXCOM platform (Multilingual COntextual EXploration).
Definition
Pritchard, 1969
The definition and purpose of bibliometrics is to shed light on the process of written communications ... by means of counting and analyzing the various facets of written communication.
Laws of Bibliometrics
Lotka’s Law
Lotka’s Law describes the frequency of publication by authors in a given field.
Bradford’s Law
Bradford’s Law helps librarians determine the number of core journals in a given field.
Zipf’s Law
Zipf’s Law is often used to predict the frequency of words within a text.
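As a rough illustration of the kind of counting these laws rely on, the sketch below builds a rank-frequency table for the words of a text, which is the quantity Zipf’s Law describes (frequency roughly proportional to 1/rank). The function name `rank_frequency` and the sample sentence are assumptions made for this example; they do not come from the talk.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Sort by decreasing count, then alphabetically so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]


sample = "the cat and the dog and the bird"
table = rank_frequency(sample)
for rank, word, count in table:
    # On large corpora, Zipf's Law predicts that count * rank stays roughly constant.
    print(rank, word, count, count * rank)
```

A sentence this short only shows the mechanics; the regularity itself appears only on corpora of realistic size.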
but ...
However, these treatments ignore the qualitative meaning of bibliographic references. For example, authors are "ranked" by how many of their articles have been published.
Bibliographic coupling, co-citation, data mining, Impact Factor...
Specific search engines such as CiteSeer
Protocol
Use finite automata (FA) to find bibliographic references
Each reference acts as an indicator that lets us locate a textual segment
EXCOM automatically and semantically annotates this textual segment
We automatically locate bibliographic references in order to find textual segments containing interesting information
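The first two steps of the protocol can be sketched as follows. This is a minimal illustration, not the EXCOM implementation: the `CITATION` pattern covers only two marker shapes, the sentence splitter is deliberately naive, and the function name and sample text are assumptions introduced for the example.

```python
import re

# Hypothetical indicator pattern: numeric markers such as [2] and
# author-year markers such as (Smith et al., 1997).
CITATION = re.compile(r"\[\d+\]|\([A-Z][A-Za-z]+(?: et al\.)?,? \d{4}\)")


def segments_with_citations(text):
    """Return (sentence, markers) pairs for sentences that contain a citation."""
    # Naive split: after ., ! or ? followed by whitespace and an uppercase letter.
    sentences = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    result = []
    for sentence in sentences:
        markers = CITATION.findall(sentence)
        if markers:
            result.append((sentence, markers))
    return result


text = (
    "Contextual exploration was introduced earlier. "
    "We follow the method of (Smith et al., 1997). "
    "This approach differs from [2]. "
    "Nothing is cited here."
)
segments = segments_with_citations(text)
for sentence, markers in segments:
    print(markers, "->", sentence)
```

Only the sentences that carry an indicator are kept; these are the segments that the semantic annotation step would then examine.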
Linguistic approach
Now we have to determine which types of relations can be found in scientific articles. This work was done by J.P. Descles and Y. Krushkov in 2005.
Categorization
Point of view: The first category is the point of view. It is used extensively in the corpus. Using the point of view, we can assert our opinion on the question.
Comparison: The second category we are interested in is comparison. We often compare the works of researchers. If the comparison is neutral, we have to use contextual exploration to determine whether it is a case of resemblance or disparity.
Information: The information category is large. It includes subcategories such as hypothesis, analysis and result.
Definition: Sentences that contain definitions are important.
Appreciation: In this category, the author gives a positive or negative judgement about another author.
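One possible way to hold these categories in a program is a small tree, sketched below. The category and subcategory names are those given on the slides; the nesting as a dictionary, the identifier spellings and the `labels` helper are assumptions made for illustration and are not EXCOM's internal format. The subcategory lists are not claimed to be exhaustive.

```python
# Categories and subcategories as named on the slides.
CATEGORIES = {
    "point_of_view": [],
    "comparison": ["resemblance", "disparity"],
    "information": ["hypothesis", "analysis", "result"],  # "such as": may be incomplete
    "definition": [],
    "appreciation": ["positive", "negative"],
}


def labels(tree):
    """Flatten the tree into 'category' and 'category/subcategory' labels."""
    out = []
    for category, subcategories in tree.items():
        out.append(category)
        out.extend(f"{category}/{sub}" for sub in subcategories)
    return out


for label in labels(CATEGORIES):
    print(label)
```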
Tree of categorization
International Standards
ISO 690:1987 Documentation – Bibliographic references – Content, form and structure
Standardized:
Numerical type: [1]
Alpha-numerical type: [AUT90], [AUT 97], [AUT 94, AUT96a], [AUT 96b], [AUT’99]
Other type: (Author et al., 1997, Author et al., 1999)
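Since regular expressions describe the same languages as finite automata, the three standardized shapes above can be sketched as patterns. The expressions below are assumptions written to fit exactly the examples on this slide; they are not taken from ISO 690 or from the authors' automata, and a real corpus would need broader patterns.

```python
import re

# Building blocks (hypothetical): a key such as AUT90, AUT 97, AUT96a or AUT’99,
# and an author-year item such as "Author et al., 1997".
KEY = r"[A-Z]{2,}[ '’]?\d{2}[a-z]?"
ITEM = r"[A-Z][A-Za-z]+(?: et al\.)?, \d{4}"

PATTERNS = {
    "numerical": re.compile(r"\[\d+\]"),
    "alpha-numerical": re.compile(rf"\[{KEY}(?:, ?{KEY})*\]"),
    "author-year": re.compile(rf"\({ITEM}(?:, {ITEM})*\)"),
}


def classify(marker):
    """Return the name of the first pattern matching the whole marker, or None."""
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(marker):
            return name
    return None


examples = ["[1]", "[AUT90]", "[AUT 94, AUT96a]", "[AUT’99]",
            "(Author et al., 1997, Author et al., 1999)"]
for marker in examples:
    print(marker, "->", classify(marker))
```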
Unstandardized:
Author (1968,1976,1982)
Author (1958: 274-275)
(Author1, 1944, p.311 ; Author2,1956, p.316)
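Unstandardized forms need their own handling because one marker can carry several years, page ranges, or several authors at once. The sketch below parses only the three shapes listed above into (author, year, pages) tuples; the pattern names and the `parse` function are hypothetical, and other unstandardized forms would simply return an empty list.

```python
import re

MULTI_YEAR = re.compile(r"(\w+) \((\d{4}(?:, ?\d{4})*)\)")
YEAR_PAGES = re.compile(r"(\w+) \((\d{4}): ?(\d+(?:-\d+)?)\)")
GROUP_ITEM = re.compile(r"(\w+), ?(\d{4}), ?p\. ?(\d+)")


def parse(marker):
    """Return a list of (author, year, pages); pages is None when absent."""
    match = MULTI_YEAR.fullmatch(marker)
    if match:
        author, years = match.groups()
        return [(author, int(year), None) for year in years.split(",")]
    match = YEAR_PAGES.fullmatch(marker)
    if match:
        author, year, pages = match.groups()
        return [(author, int(year), pages)]
    if marker.startswith("(") and marker.endswith(")"):
        items = []
        # Several references grouped in one pair of parentheses, separated by ";".
        for part in marker[1:-1].split(";"):
            match = GROUP_ITEM.fullmatch(part.strip())
            if match:
                author, year, page = match.groups()
                items.append((author, int(year), page))
        return items
    return []


print(parse("Author (1968,1976,1982)"))
print(parse("Author (1958: 274-275)"))
print(parse("(Author1, 1944, p.311 ; Author2,1956, p.316)"))
```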
FA and bibliographic references
EXCOM Platform
A major objective of the EXCOM system, developed by the LaLICC Laboratory, is to explore the semantics of text in order to enhance information extraction and retrieval through the automatic annotation of semantic relations.
1 Regex: Low-level annotation that recognizes complex textual expressions which can be described by a finite state machine.
2 Structure: Use of annotations as linguistic markers.
3 Contextual Exploration: This module shows how our proposal extends XML technology and how it can be used in a system for the semantic annotation of texts.
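The interplay of the three layers can be sketched in a few lines. This is a toy illustration only, not EXCOM's rule language: the indicator is reduced to numeric markers, the clue lists in `CLUES` are invented for the example, and the `<segment>` element with its `category` attribute is a hypothetical output format.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Layer 1 (regex): the indicator is a bibliographic reference marker.
INDICATOR = re.compile(r"\[\d+\]")

# Layer 3 (contextual exploration): hypothetical clue words searched in the
# context of the indicator; real rules rely on curated linguistic markers.
CLUES = {
    "comparison": ("similar to", "in contrast to", "unlike"),
    "definition": ("is defined as", "defines"),
    "appreciation": ("elegant", "remarkable", "fails to"),
}


def categories(segment):
    """Return the categories whose clues co-occur with an indicator."""
    if not INDICATOR.search(segment):
        return []  # no indicator, so no rule is triggered
    lowered = segment.lower()
    return [name for name, clues in CLUES.items()
            if any(clue in lowered for clue in clues)]


def annotate(segment):
    """Wrap the segment in an XML element carrying its categories, if any."""
    found = categories(segment)
    if not found:
        return escape(segment)
    return f"<segment category={quoteattr(' '.join(found))}>{escape(segment)}</segment>"


print(annotate("Unlike [3], we use a linguistic approach."))
print(annotate("Unlike before, nothing is cited."))
```

The second call returns its input unchanged: a clue word without an indicator in the same segment does not produce an annotation.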
EXCOM Architecture
EXCOM Rules
Developed and implemented rules for automatic annotation of categories
EXCOM Rules : Method subcategory annotation
Result
New concept: bibliosemantic
            Bibliometric    Bibliosemantic
Approach    Statistical     Linguistic
Method      Frequency       Semantic
            Word            Textual segment
Result      Quantitative    Qualitative
Summary
1 Use FA to find bibliographic references
2 Bibliographic references, called indicators, are used to locate textual segments
3 Semantically and automatically categorize the textual segments using the EXCOM platform
And next?
1 Enlarge the corpus
2 Extend the corpus to more languages
3 Establish a Semantic Authors Citation Network
4 Establish a categorization of scientific publications
Bibliography
Descles, J.P. Exploration contextuelle et semantique : un systeme expert qui trouve les valeurs semantiques des temps de l’indicatif dans un texte. Knowledge modeling and expertise transfert, p. 371-400, 1991.
Descles, J.P. Systeme d’exploration contextuelle. Co-texte et calcul du sens, p. 215-232, 1997.
Krushkov, Y. L’exploration contextuelle des appariements entre les references bibliographiques et les passages textuels dans un corpus de textes linguistiques. Master, University of Paris IV Sorbonne, 2005.
I am hungry, aren’t you? So, short and quick questions. It’s time for lunch!