2nd "NameGame" APE-INV workshop 1
Mr. JOTL: A User Friendly Matching Software
Stéphane Lhuillery, Julio Raffo & Fernando Lladós
December 2010
Outline
• Background
• Objectives & Rationale
• Results
• User Friendly Software
  – Concept
  – Alpha test
• Further steps
Background
• Automatic patent retrieval is becoming indispensable given the size of the data sets.
• A growing literature looks at this NameGame:
  – On firms’ names: Derwent, 2002; Magerman et al., 2006; Hall, 2006; Thoma et al., 2007.
  – On inventors’ names: Trajtenberg et al., 2006; Hoisl, 2006; Lissoni et al., 2006; Mariani et al., 2007; Raffo & Lhuillery, 2009; etc.
• Our ESF project outcomes:
  – New matching best practices
  – APE-INV database
Objectives of the NameGame
• Minimize false positives (= higher precision)
• Minimize false negatives (= higher recall)
• Maximize true positives
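The two objectives can be made concrete with the usual definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below is only an illustration of those definitions, not code from Mr. JOTL; the function name `precision_recall` and the record identifiers are hypothetical.

```python
def precision_recall(predicted_pairs, true_pairs):
    """Return (precision, recall) for a set of predicted matches.

    Both arguments are iterables of (record_a, record_b) identifier pairs.
    """
    predicted = set(predicted_pairs)
    truth = set(true_pairs)
    true_pos = len(predicted & truth)    # matches that are correct
    false_pos = len(predicted - truth)   # matches that should not exist
    false_neg = len(truth - predicted)   # correct matches that were missed
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if truth else 0.0
    return precision, recall


# Hypothetical record identifiers, for illustration only.
predicted = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
truth = {("a1", "b1"), ("a2", "b2"), ("a4", "b4")}
precision, recall = precision_recall(predicted, truth)
```

Here two of the three predicted pairs are correct and one true pair is missed, so both measures equal 2/3. Loosening a matching threshold typically raises recall at the expense of precision, which is the trade-off the slide points to.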
Rationale: a three-step game (cleaning, matching, filtering)
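A minimal sketch of the three steps, which later slides call cleaning, matching and filtering, might look as follows. It is an assumption-laden illustration rather than the algorithms used in Mr. JOTL: the similarity measure (Python's `difflib.SequenceMatcher`), the 0.85 threshold, the `city` field and the sample records are all hypothetical choices.

```python
import unicodedata
from difflib import SequenceMatcher


def clean(name):
    """Cleaning step: strip accents and punctuation, lowercase, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c if c.isalnum() or c.isspace() else " " for c in no_accents)
    return " ".join(kept.lower().split())


def match(records_a, records_b, threshold=0.85):
    """Matching step: keep pairs whose cleaned names are similar enough."""
    pairs = []
    for a in records_a:
        for b in records_b:
            score = SequenceMatcher(None, clean(a["name"]), clean(b["name"])).ratio()
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs


def filter_pairs(pairs, key="city"):
    """Filtering step: keep pairs that also agree on a supplementary field."""
    return [(a, b, s) for a, b, s in pairs if a.get(key) and a.get(key) == b.get(key)]


# Invented records, for illustration only.
inventors = [{"id": "p1", "name": "Müller, José", "city": "Lausanne"}]
authors = [
    {"id": "s1", "name": "MULLER JOSE", "city": "Lausanne"},
    {"id": "s2", "name": "Muller Josef", "city": "Paris"},
]
candidates = match(inventors, authors)   # both authors pass the name threshold
kept = filter_pairs(candidates)          # only the author in the same city remains
```

The matching step is deliberately permissive (it favours recall), and the filtering step uses extra information to remove false positives (it restores precision).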
Examples of matching (EPFL)
Examples of filtering (EPFL)
What have we learned so far?
• General
  – Matching algorithms are not perfect, but they considerably improve the results.
• Cleaning step
  – The origin of the data substantially changes the data preparation process.
• Matching step
  – There is a hierarchy pattern across algorithms, although it is specific to each particular case.
• Filtering step
  – The availability of supplementary data enhances or constrains the disambiguation process.
Why create user-friendly software?
[Figure: data sources: PATSTAT / APE-INV database, survey, PATVAL, EU Framework Programme, SCOPUS, ISI Thomson]
Concept behind Mr. JOTL
• Intuitive for beginner users
• Flexible on inputs and their preparation
• Fair variety of standard matching processes
• Adaptable on the disambiguation filters
• But soundly customizable for advanced users
• Conceived and coded to be expanded in the future by multiple developers
From concept to reality
• (OK, for the moment just an alpha!)
[Screenshots of the alpha version: Inputs, Parsing, Matching, Disambiguation, SSM]
IPTS, Sevilla, May 2010.
LET’S TEST IT!
Technical notes
• OS supported (so far):
  – Windows XP, Vista, 7 (Server & x64)
• Coded in C#
  – Pros:
    • Free development environment
    • Low cost of entry
    • Large developer community
  – Cons:
    • Proprietary language and libraries
    • Less efficient memory management
• Libraries needed: Scintilla (open source lexer and syntax highlighter)
• Customizable code:
  – C# & VBA
• Suggested environment for future development:
  – Visual Studio (Express version is free to use)
  – Mono on Linux
Further developments
• Fully coding the existing algorithms.
• Testing performance against large data sets (> 1 million records).
• Pre-setting standard routines (as XML).
• Drafting documentation (+ video).
• Proof-testing with first-time users (at EPFL).
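To give an idea of what a preset routine stored as XML could look like, here is a purely hypothetical sketch. The element and attribute names (`routine`, `step`, `type`, `algorithm`, `threshold`, `field`) are invented for illustration and do not describe Mr. JOTL's actual file format, which the slides do not specify. The example is written in Python rather than C# only to keep it short.

```python
import xml.etree.ElementTree as ET

# Hypothetical preset: one routine made of the three steps of the NameGame.
PRESET_XML = """
<routine name="inventor-default">
  <step type="cleaning" lowercase="true" strip_accents="true"/>
  <step type="matching" algorithm="token" threshold="0.85"/>
  <step type="filtering" field="city"/>
</routine>
"""


def load_routine(xml_text):
    """Parse a preset and return its name and the ordered list of step settings."""
    root = ET.fromstring(xml_text)
    steps = [dict(step.attrib) for step in root.findall("step")]
    return root.get("name"), steps


name, steps = load_routine(PRESET_XML)
for step in steps:
    print(step["type"], {k: v for k, v in step.items() if k != "type"})
```

Keeping routines in a declarative file of this kind would let beginners reuse standard settings while advanced users edit or share them, which fits the design goals listed earlier in the talk.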
Openness and its governance
• How to share it?
  – GitHub?
  – Forums
• How to develop a dynamic sharing community?
Thank you!