Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley...
-
date post
21-Dec-2015 -
Category
Documents
-
view
216 -
download
0
Transcript of Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley...
![Page 1: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/1.jpg)
Recognizing Recordsfrom the Extracted Cells
of Microfilm Tables
Kenneth M. TubbsDavid W. Embley
Brigham Young University
Supported by NSFSupported by NSF
![Page 2: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/2.jpg)
MotivationMotivation
![Page 3: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/3.jpg)
MotivationMotivation
• Millions want microfilm informationMillions want microfilm information– 1880 census on-line, end of October1880 census on-line, end of October– 3 million hits per hour on familysearch.org3 million hits per hour on familysearch.org
• Acquiring information from microfilmAcquiring information from microfilm– Expensive and time consumingExpensive and time consuming– 2.5 million rolls, 20,000 extractors, 100 hours per 2.5 million rolls, 20,000 extractors, 100 hours per
year: requires 104 yearsyear: requires 104 years• Finding a way to automate: big win!Finding a way to automate: big win!
![Page 4: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/4.jpg)
DifficultiesDifficulties
• DDifferent layouts and styles ifferent layouts and styles
• Different types of dataDifferent types of data
• Sometimes ambiguousSometimes ambiguous
• Type-written labels (OCR)Type-written labels (OCR)
• Hand-written data (?)Hand-written data (?)
![Page 5: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/5.jpg)
Objective: Identify RecordsObjective: Identify Records
• Ontological as well as geometric constraintsOntological as well as geometric constraints• Layout of handwritten valuesLayout of handwritten values• Layout of empty cellsLayout of empty cells
Given a zoned image of a microfilm table, exploit:Given a zoned image of a microfilm table, exploit:
Output field coordinates (labeled with respect to Output field coordinates (labeled with respect to the ontology) and organized into recordsthe ontology) and organized into records
![Page 6: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/6.jpg)
AlgorithmAlgorithm
SQL Insert Statements
SQL Insert Statements
XML Input File(Preprocessed Microfilm Image)
Genealogical Ontology
InputInput OutputOutputMethodMethod
Generate ConfidenceGenerate
Confidence
EnforceConstraints
EnforceConstraints
VerifyResultsVerifyResults
![Page 7: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/7.jpg)
““Training” SetTraining” Set
• 25 Tables from 5 different microfilm rolls25 Tables from 5 different microfilm rolls• Used to:Used to:
– Identify relationships between table cells Identify relationships between table cells
– Create genealogical ontologyCreate genealogical ontology
– Define features to extractDefine features to extract
– Generate rules (constraints)Generate rules (constraints)
![Page 8: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/8.jpg)
Input: Microfilm TableInput: Microfilm Table
![Page 9: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/9.jpg)
Input: Microfilm TableInput: Microfilm Table
![Page 10: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/10.jpg)
Input: Microfilm TableInput: Microfilm Table
Input FeaturesInput Features
1.1. Coordinates of each cellCoordinates of each cell
2.2. Printed text for label cellsPrinted text for label cells
3.3. Cell empty or notCell empty or not
![Page 11: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/11.jpg)
Input: Microfilm TableInput: Microfilm Table
<<index index sourcesource="="0444770/0444770_2.gif0444770/0444770_2.gif"" ontologyontology="="ontology.xmlontology.xml">">
<<cellcell rectrect="="7,131,62,2617,131,62,261"" printed_textprinted_text="="Dwelling-houses number in the order Dwelling-houses number in the order of visitation.of visitation."" emptyempty="="00" />" />
<<cellcell rectrect="="61,132,118,26061,132,118,260"" printed_textprinted_text="="Families number in order of Families number in order of visitation.visitation."" emptyempty="="00" />" />
<<cellcell rectrect="="119,132,436,261119,132,436,261"" printed_textprinted_text="="The Name of every Person whose The Name of every Person whose usual place of abode on the first day of June, 1840, was in this usual place of abode on the first day of June, 1840, was in this family.family."" emptyempty="="00" />" />
<<cellcell rectrect="="62,260,120,29562,260,120,295"" printed_textprinted_text="="22"" emptyempty="="00" />" />
<<cellcell rectrect="="118,260,436,298118,260,436,298"" printed_textprinted_text="="33"" emptyempty="="00" />" />
<<cellcell rectrect="="7,458,62,4977,458,62,497"" printed_textprinted_text=""="" emptyempty="="11" />" />
. . .. . .
![Page 12: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/12.jpg)
Genealogical OntologyGenealogical Ontology
![Page 13: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/13.jpg)
Genealogical OntologyGenealogical Ontology
![Page 14: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/14.jpg)
Genealogical OntologyGenealogical Ontology <<OntologyOntology>>
<<ObjectSetObjectSet id id="="00"" name name="="PersonPerson"" syn syn=""="" lex lex="="00"/>"/>
<<ObjectSetObjectSet id id="="11"" name name="="FamilyFamily"" syn syn="="familiesfamilies"" lex lex="="00"/>"/>
<<ObjectSetObjectSet id id="="22"" name name="="EventEvent"" syn syn=""="" lex lex="="00"/>"/>
<<ObjectSetObjectSet id id="="33"" name name="="AgeAge"" syn syn="="age birthdayage birthday"" lex lex="="11"/>"/>
<<ObjectSetObjectSet id id="="44"" name name="="RelationshipRelationship"" syn syn="="relationship relationrelationship relation"" lex lex="="11"/>"/>
<<ObjectSetObjectSet id id="="55"" name name="="Full NameFull Name"" syn syn="="full name whom whofull name whom who"" lex lex="="11"/>"/>
<<ObjectSetObjectSet id id="="66"" name name="="First NameFirst Name"" syn syn="="first given christianfirst given christian"" lex lex="="11"/>"/>
<<ObjectSetObjectSet id id="="77"" name name="="Middle Name(s)Middle Name(s)"" syn syn="="middle initialmiddle initial"" lex lex="="11"/>"/>
<<ObjectSetObjectSet id id="="88"" name name="="Last NameLast Name"" syn syn="="last surnamelast surname"" lex lex="="11"/>"/>
<<ObjectSetObjectSet id id="="99"" name name="="Title(s)Title(s)"" syn syn="="titletitle"" lex lex="="11"/>"/>
. . .. . .
![Page 15: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/15.jpg)
Generate Confidence Generate Confidence MatricesMatrices
• Relationships between pairs of cellsRelationships between pairs of cells
• Confidence values between 0 and 1Confidence values between 0 and 1
Generate Confidence
Generate Confidence
![Page 16: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/16.jpg)
RelationshipsRelationshipsGenerate Confidence
Generate Confidence
• Label cell describes value cellsLabel cell describes value cells
• Value cells in same row or columnValue cells in same row or column
• Label cells form a multi-level label Label cells form a multi-level label
• Label cells correspond to object setsLabel cells correspond to object sets
• Value factoring and nested valuesValue factoring and nested values
![Page 17: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/17.jpg)
Label Cell and Value CellLabel Cell and Value Cell
A continuous path between a label A continuous path between a label cell and a value cellcell and a value cell
Generate Confidence
Generate Confidence
Label Label
Confidence =Confidence =
1 If a path exists1 If a path exists
0 If no path exists0 If no path exists
![Page 18: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/18.jpg)
Label Cell and Value CellLabel Cell and Value Cell
Preferences for label – value Preferences for label – value orientationsorientations
Generate Confidence
Generate Confidence
Label Orientation Confidence
Above 1
Left .75
Right .5
Below .25
Label
![Page 19: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/19.jpg)
Label Cell and Value CellLabel Cell and Value Cell
Compare the height or width of each Compare the height or width of each label cell with each value celllabel cell with each value cell
Generate Confidence
Generate Confidence
LabelLabelOROR
1100Not SimilarNot Similar SimilarSimilar
![Page 20: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/20.jpg)
Value Cell and Value CellValue Cell and Value Cell(Same Row)(Same Row)
A continuous, A continuous, horizontalhorizontal path exists path exists between a pair of value cellsbetween a pair of value cells
Generate Confidence
Generate Confidence
Confidence =Confidence =
1 If a path exists1 If a path exists
0 If no path exists0 If no path exists
![Page 21: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/21.jpg)
Value Cell and Value Cell Value Cell and Value Cell (Same Column)(Same Column)
A continuous, A continuous, verticalvertical path exists path exists between a label cell and a value cellbetween a label cell and a value cell
Generate Confidence
Generate Confidence
Confidence =Confidence =
1 If a path exists1 If a path exists
0 If no path exists0 If no path exists
![Page 22: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/22.jpg)
Value Cell and Value CellValue Cell and Value Cell(Geometrically Similar )(Geometrically Similar )
Compare height and widthCompare height and width
Generate Confidence
Generate Confidence
1100Not SimilarNot Similar SimilarSimilar
![Page 23: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/23.jpg)
Multi-level LabelsMulti-level Labels
• Distance between the midpoints Distance between the midpoints
• A line through the midpointsA line through the midpoints
• Share a common borderShare a common border
Generate Confidence
Generate Confidence
![Page 24: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/24.jpg)
Match Label Cells to Object SetsMatch Label Cells to Object Sets
• Location of matched wordsLocation of matched words
• Order of matched wordsOrder of matched words
Generate Confidence
Generate Confidence
Full NameFull Name
LocationLocation
DayDay
FamilyFamily
Object SetsObject Sets
![Page 25: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/25.jpg)
Enforce ConstraintsEnforce Constraints
• Rules for geometric and ontological constraintsRules for geometric and ontological constraints
• Examples:Examples:– Same-type value cells have the same dimensions.Same-type value cells have the same dimensions.
– A family can’t have 100 members.A family can’t have 100 members.
• Iterate over the rules, seeking convergenceIterate over the rules, seeking convergence
Generate Confidence
Generate Confidence
EnforceConstraints
EnforceConstraints
![Page 26: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/26.jpg)
Similar Value CellsSimilar Value CellsGenerate Confidence
Generate Confidence
EnforceConstraints
EnforceConstraints
![Page 27: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/27.jpg)
Similar Value CellsSimilar Value CellsGenerate Confidence
Generate Confidence
EnforceConstraints
EnforceConstraints
LowerLowerConfidenceConfidence
![Page 28: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/28.jpg)
Similar Value CellsSimilar Value CellsGenerate Confidence
Generate Confidence
EnforceConstraints
EnforceConstraints
![Page 29: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/29.jpg)
Combine AggregationsCombine AggregationsGenerate Confidence
Generate Confidence
EnforceConstraints
EnforceConstraints
![Page 30: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/30.jpg)
Multi-level LabelsMulti-level LabelsGenerate Confidence
Generate Confidence
EnforceConstraints
EnforceConstraints
![Page 31: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/31.jpg)
FactoringFactoring
• Observed cardinality in microfilm tableObserved cardinality in microfilm table
• Expected cardinality in genealogy ontologyExpected cardinality in genealogy ontology
Generate Confidence
Generate Confidence
EnforceConstraints
EnforceConstraints
Check Cardinality ConstraintsCheck Cardinality Constraints
![Page 32: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/32.jpg)
Observed CardinalityObserved CardinalityGenerate Confidence
Generate Confidence
EnforceConstraints
EnforceConstraints [First Name] per [Family] = [First Name] per [Family] = 4545 / / 99 = = 4.674.67
. . .. . .
![Page 33: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/33.jpg)
Expected CardinalityExpected Cardinality
[First Name] per [Family] = 4.8 * 1 * 1 = [First Name] per [Family] = 4.8 * 1 * 1 = 4.84.8
Generate Confidence
Generate Confidence
EnforceConstraints
EnforceConstraints
![Page 34: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/34.jpg)
Ontological SimilarityOntological SimilarityGenerate Confidence
Generate Confidence
EnforceConstraints
EnforceConstraints Increase Confidence of Label Increase Confidence of Label
to Object Set Mappingsto Object Set Mappings
![Page 35: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/35.jpg)
Same Microfilm RollSame Microfilm RollGenerate
Confidence
Generate Confidence
EnforceConstraints
EnforceConstraints
Average Confidence Values Across TablesAverage Confidence Values Across Tables
![Page 36: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/36.jpg)
Verify ResultsVerify ResultsGenerate Confidence
Generate Confidence
EnforceConstraints
EnforceConstraints
VerifyResults
VerifyResults
![Page 37: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/37.jpg)
DatabaseDatabase
Full NameFull Name …
Generate Confidence
Generate Confidence
ApplyRules
ApplyRules
VerifyResults
VerifyResults
…
INSERT INTO Person (Full Name) VALUES INSERT INTO Person (Full Name) VALUES
('('335,114,521,172335,114,521,172')') INSERT INTO Person (Full Name) VALUES INSERT INTO Person (Full Name) VALUES
('('335,173,521,231335,173,521,231')') …
SQL Statements Insert Value Cell CoordinatesSQL Statements Insert Value Cell Coordinates
![Page 38: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/38.jpg)
““Training” Set ResultsTraining” Set Results
RelationshipRelationship PrecisionPrecision RecallRecall AccuracyAccuracy
Label Cell Describes Label Cell Describes
Value CellValue Cell100%100% 100%100% 100%100%
Value Cells in Same Value Cells in Same Row or ColumnRow or Column
100%100% 100%100% 100%100%
Multilevel LabelsMultilevel Labels 100%100% 100%100% 100%100%
Label Cells – Object Label Cells – Object Set MatchesSet Matches
100%100% 100%100% 100%100%
FactoringFactoring 74.45%74.45% 100%100% 84.65%84.65%
SQL FieldsSQL Fields 99.42%99.42% 100%100% 99.71%99.71%
![Page 39: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/39.jpg)
Ambiguous FactoringAmbiguous Factoring
![Page 40: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/40.jpg)
ExperimentsExperiments
• 75 tables from 15 different microfilm rolls75 tables from 15 different microfilm rolls
• Precision, recall, and accuracyPrecision, recall, and accuracy– Populated SQL fieldsPopulated SQL fields– Each relationshipEach relationship
![Page 41: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/41.jpg)
Test Set ResultsTest Set Results
RelationshipRelationship PrecisionPrecision RecallRecall AccuracyAccuracy
Label Cell Describes Label Cell Describes
Value CellValue Cell100%100% 98.12 %98.12 % 98.12 %98.12 %
Value Cells in Same Value Cells in Same Row or ColumnRow or Column
100%100% 100%100% 100%100%
Multilevel LabelsMultilevel Labels 100%100% 99.67%99.67% 99.82%99.82%
Label Cells – Object Label Cells – Object Set MatchesSet Matches
84.98%84.98% 92.76%92.76% 88.1888.18%%
FactoringFactoring 100%100% 93.40%93.40% 93.47%93.47%
SQL FieldsSQL Fields 93.20%93.20% 92.41%92.41% 92.15%92.15%
![Page 42: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/42.jpg)
Factoring over Several Factoring over Several Tables Improved ResultsTables Improved Results
![Page 43: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/43.jpg)
Some Long Label NamesSome Long Label NamesCaused ConfusionCaused Confusion
State here the particular ReligionState here the particular Religionor Religious Denomination,or Religious Denomination,
to which each persons belongs.to which each persons belongs.[Members of Protestant Denomina-[Members of Protestant Denomina-tions are requested not to describetions are requested not to describe
themselves by the vague termthemselves by the vague term‘‘Protestant,’ but to enter theProtestant,’ but to enter the
name of the Particular Church,name of the Particular Church,Denomination, or Body, to whichDenomination, or Body, to whichthey belong.] they belong.]
![Page 44: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/44.jpg)
Ambiguous ColumnsAmbiguous ColumnsCaused ConfusionCaused Confusion
Full NameFull Name
![Page 45: Recognizing Records from the Extracted Cells of Microfilm Tables Kenneth M. Tubbs David W. Embley Brigham Young University Supported by NSF.](https://reader036.fdocuments.us/reader036/viewer/2022081516/56649d6d5503460f94a4d8c0/html5/thumbnails/45.jpg)
Conclusions
• Identified records in microfilm tables– Geometric and ontological properties– Evidence matrices & corroboration rules
• Accuracy: ~92%
http://www.rdhd.byu.eduhttp://www.fht.byu.edu