Post on 09-Jan-2016
Problems with Non-roman Character (Korean) Searching
Prepared by Young Ki Lee, Senior Cataloging Specialist, Korean/Chinese Team, RCCD, Library of Congress
Topics to be covered
1. Non-roman script (Korean) searching under CJK data fields without spacing
2. No unified index (normalization) between Hangul (Korean) and Hancha (Chinese characters)
3. Microsoft Korean IME
4. Display of search results
5. CJK Compatibility Database
Title Word Search for
Search ( : the border):
- The number of hits on this ti: search is 360.
- Only 13% of the hits are relevant (13 out of 99) in the first group (Books 1970-1993).
- Records that have the word in any position in the title fields (including across subfield boundaries) are retrieved, such as = / = / = / = //, = /, etc.
- In the LC Online Catalog (currently with spacing), a title word search retrieves only 9 hits.
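The low relevance ratio above follows from matching the search term anywhere in a title once spacing is ignored. The following is a minimal Python sketch of that difference, not a description of how any catalog system is implemented. The two titles, the term 경계 ("border"), and both function names are assumptions chosen for illustration.

```python
def spaceless_match(term: str, title: str) -> bool:
    """Match the term anywhere in the title after removing all spaces."""
    return term.replace(" ", "") in title.replace(" ", "")


def word_match(term: str, title: str) -> bool:
    """Match the term only against whole space-delimited words."""
    return term in title.split()


# Hypothetical titles; the second one means "environmental planning".
titles = ["경계", "환경 계획"]
term = "경계"  # "border"

spaceless_hits = [t for t in titles if spaceless_match(term, t)]
word_hits = [t for t in titles if word_match(term, t)]

# Without spacing, "환경 계획" becomes "환경계획", which contains "경계"
# across the word boundary, so it is retrieved as an irrelevant hit.
print(spaceless_hits)
print(word_hits)
```

In this sketch the spaceless comparison returns both titles, while the word-based comparison returns only the title that actually contains the word.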
Title Word Search for
Search ( : philology):
- In OCLC, the number of hits on the ti: search is 308.
- Only 37% of the hits are relevant (36 out of 95) in the first group (Books 1900-1991).
- Includes = = / = = / , = /, etc.
- In Voyager (currently with spacing), the same search (tkey ) retrieves 32 hits.
Title Word Search for
Search ( : name of an ancient Korean country) retrieves irrelevant records, such as
=/////CD-ROM = CD-ROM///// = // = ////// = // 5= ///5//////// = ///, etc.
Title Word Search for
( : Korean Economy): ti: search
- Search : 295 hits
- Search : 652 hits
- Search : 3 hits
- Search : 0 hits
- Search Hanguk kyongje: 1,490 hits
- Search # : 461 hits (ti: AND ti: )
Title Phrase search for : ti= search
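The differing hit counts above suggest that each written form of the same title (spaced, unspaced, Hangul, Hancha) is matched separately. The following is a minimal Python sketch of what a unified search key could look like, assuming spaces are dropped and Hancha are mapped to a Hangul reading. The four-entry table and the function name `index_key` are hypothetical; a real table would cover thousands of characters, and some Hancha have more than one reading.

```python
# Tiny illustrative Hancha-to-Hangul reading table (an assumption for this
# sketch, not data from any catalog system).
HANCHA_TO_HANGUL = {"韓": "한", "國": "국", "經": "경", "濟": "제"}


def index_key(text: str) -> str:
    """Build a search key: drop spaces and replace known Hancha with Hangul."""
    no_spaces = "".join(text.split())
    return "".join(HANCHA_TO_HANGUL.get(ch, ch) for ch in no_spaces)


# Five ways of writing "Korean economy" (Hanguk kyongje).
variants = ["한국 경제", "한국경제", "韓國 經濟", "韓國經濟", "韓國 경제"]
keys = {index_key(v) for v in variants}

# The set holds one key when every variant normalizes to the same form.
print(keys)
```

With a key like this, a single search would reach records entered in any of the variant forms, at the cost of the precision problem that unspaced matching introduces.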
Search ti: nodongja or or or
Korean IME Problems
1. Personal name search with an invalid character from the Korean IME
- Search in pn: : 0 hits. (F9E1) is an invalid character from the Korean IME.
- Search in pn: : 157 hits. (674E) is a valid MARC21 character.
2. Title search with an invalid character from the Korean IME
- Search in ti: : 0 hits. (F941) is an invalid character from the Korean IME.
- Search in ti: : 21,393 hits. (8AD6) is a valid MARC21 character.
3. Korean family name : no MARC21 equivalent
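The code points quoted above are pairs of a CJK Compatibility Ideograph and its unified counterpart: in the Unicode character database, U+F9E1 maps to U+674E and U+F941 maps to U+8AD6. The following is a minimal Python sketch, assuming that Unicode normalization is an acceptable step before searching; the slides do not say how the problem should be fixed, and the function names are hypothetical. It uses only the standard `unicodedata` module.

```python
import unicodedata


def to_unified(text: str) -> str:
    """Replace CJK compatibility ideographs with their unified equivalents.

    NFC normalization applies the canonical mappings defined by Unicode,
    which include the singleton mappings of compatibility ideographs.
    """
    return unicodedata.normalize("NFC", text)


def find_compatibility_block_chars(text: str) -> list[str]:
    """List code points that fall in the block U+F900 to U+FAFF."""
    return [f"{ord(ch):04X}" for ch in text if 0xF900 <= ord(ch) <= 0xFAFF]


ime_name = "\uF9E1"   # code point quoted in the personal name example
ime_title = "\uF941"  # code point quoted in the title example

print(find_compatibility_block_chars(ime_name + ime_title))
print(f"{ord(to_unified(ime_name)):04X}", f"{ord(to_unified(ime_title)):04X}")
```

Normalization only helps where Unicode defines a mapping. A character with no MARC21 equivalent, as in item 3, would still need separate handling.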
Display Order
1. Browse search: sorted by Unicode value, in the order number, roman, Japanese, Hancha, Hangul
2. Keyword search: sorted by alphabetical order of the Romanization form, in the order number, Romanization
3. Display order: character by character on designated value, NOT word by word

Sort example, with each entry giving the Unicode value, total strokes, and radical (#: strokes):
- 銀 (9280): 14 total strokes; radical 167 (gold): 8
- 門 (9580): 8 total strokes; radical 169 (gate): 8
- 養 (990A): 15 total strokes; radical 184 (eat): 6
- 魂 (9B42): 14 total strokes; radical 194 (ghost): 10
- 가 (AC00)

Hangul example: C9C4, CE68, C911, C778
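The browse order described above (number, roman, Japanese, Hancha, Hangul) is what a plain sort by Unicode value produces, because those scripts occupy ascending ranges of code points. The following is a minimal Python sketch; the five headings are assumptions chosen for illustration, while the hexadecimal values are the ones quoted in the slides.

```python
# Illustrative headings, one per script (assumptions for this sketch).
headings = ["가", "銀", "あ", "Korea", "1999"]

# Python compares strings code point by code point, so a plain sort
# reproduces an ordering by Unicode value.
by_unicode = sorted(headings)
print(by_unicode)

# The code points quoted in the slides, shuffled, then sorted the same way.
quoted = ["AC00", "9B42", "9280", "990A", "9580"]
chars = [chr(int(code, 16)) for code in quoted]
sorted_codes = [f"{ord(ch):04X}" for ch in sorted(chars)]
print(sorted_codes)
```

Under this ordering, stroke count and radical play no part: the Hancha appear in code point order, and every Hangul syllable follows every Hancha.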
CJK Compatibility Database
The CJK Compatibility Database includes more than 450 non-MARC21 Chinese, Japanese and Korean characters, Hangul syllables and diacritic marks, matched with their MARC21 equivalents.

The database is intended to enable catalogers to quickly and conveniently replace a non-MARC21 character with its MARC21 equivalent.

The list of characters in the database was initially identified by LC staff, and was supplemented by entries in a similar database at Yale University.

The database is a cooperative undertaking, and is intended for the use of all CJK catalogers. If you encounter a non-MARC21 character in the course of your work, please report it to us so that it can be added to the database. Notify Young Ki Lee, Senior Cataloging Specialist, Korean/Chinese Team, Library of Congress, at ylee@loc.gov.
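The replacement step the database supports can be pictured as a lookup table from a non-MARC21 character to its MARC21 equivalent. The following is a minimal Python sketch under that assumption; the slides do not describe the database's actual format, the names `NON_MARC21_TO_MARC21` and `replace_non_marc21` are hypothetical, and the two entries are only the pairs whose code points are quoted earlier in the slides.

```python
# Two pairs taken from the code points quoted in the slides; the real
# database is described as holding more than 450 entries.
NON_MARC21_TO_MARC21 = {
    0xF9E1: 0x674E,
    0xF941: 0x8AD6,
}


def replace_non_marc21(text: str) -> str:
    """Replace each listed non-MARC21 character with its MARC21 equivalent."""
    return text.translate(NON_MARC21_TO_MARC21)


def listed_characters(text: str) -> list[str]:
    """Report which characters in the text have an entry in the table."""
    return [f"{ord(ch):04X}" for ch in text if ord(ch) in NON_MARC21_TO_MARC21]


field = "\uF9E1 \uF941"
print(listed_characters(field))
print([f"{ord(ch):04X}" for ch in replace_non_marc21(field) if ch != " "])
```

An explicit table like this can also hold pairs for which Unicode defines no mapping, which is one reason a curated list may be preferred over automatic normalization.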
Thank you