Japanese Records and Whether or not to Switch from MARC 8 to Unicode Storage (with an Innovative...

Japanese Records and Whether or not to Switch from MARC 8 to Unicode Storage (with an Innovative Interfaces Millennium local system): The University of Washington Law Library’s Decision-making Process

Transcript

Page 1: Japanese Records and Whether or not to Switch from MARC 8 to Unicode Storage (with an Innovative Interfaces Millennium local system) The University of.

Japanese Records and Whether or not to Switch from MARC 8 to Unicode Storage (with an Innovative Interfaces Millennium local system)

The University of Washington Law Library’s Decision-making Process

Page 2:

Differences in Storage and/or Export Settings With Different Local Systems

Your Mileage May Vary
It’s important to note that different local systems vary widely in whether and how data is stored, imported and exported. These differences will have a huge impact on the experience of librarians making decisions on whether or not to export records in Unicode from OCLC to the local system.

Innovative Interfaces Millennium Local Systems
Do not allow import of records encoded differently than the encoding for storage. In other words, if III storage is set to Unicode, records must be imported from OCLC in Unicode. If storage is set to MARC 8, records must be imported in MARC 8.

Voyager Local Systems (CJK version)
Can be set to convert imported MARC 8 records to Unicode on-the-fly for storage. This makes the decision about exporting from OCLC Connexion in Unicode vs MARC 8 less important (almost irrelevant).

Other Local Systems?
Local systems that store data in MARC 8 cannot import and display Unicode records unless they convert the records to MARC 8. Conversely, local systems storing data in Unicode cannot import MARC 8 records unless the data is converted to Unicode.

Ask these questions about your local system:
• What encoding is used for storage?
• Is there a required encoding for imported records?
• If not, are imported records automatically converted to the appropriate encoding for storage?
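The three questions above can be written down as a small decision rule. The following is only a sketch: `LocalSystem`, `can_import`, and the three example configurations are hypothetical names made up for illustration, not part of any vendor's software, and the settings simply restate what this slide says about Millennium and Voyager.

```python
from dataclasses import dataclass

MARC8 = "MARC-8"
UNICODE = "Unicode"


@dataclass(frozen=True)
class LocalSystem:
    """Hypothetical model of a local system's encoding settings."""
    name: str
    storage_encoding: str      # question 1: what encoding is used for storage?
    converts_on_import: bool   # question 3: are imports converted automatically?


def can_import(system: LocalSystem, record_encoding: str) -> bool:
    """Return True if a record exported in record_encoding can be loaded."""
    if record_encoding == system.storage_encoding:
        return True
    # A mismatched record is only usable if the system converts it on import.
    return system.converts_on_import


# Example configurations, assumed from the descriptions on this slide.
millennium_marc8 = LocalSystem("Millennium (MARC 8 storage)", MARC8, False)
millennium_unicode = LocalSystem("Millennium (Unicode storage)", UNICODE, False)
voyager_cjk = LocalSystem("Voyager CJK (converting)", UNICODE, True)

for system in (millennium_marc8, millennium_unicode, voyager_cjk):
    for encoding in (MARC8, UNICODE):
        verdict = "ok" if can_import(system, encoding) else "rejected"
        print(f"{system.name}: OCLC export as {encoding} -> {verdict}")
```

Under these assumptions, only the converting system accepts both export settings, which is why the export decision matters so much more for a Millennium library.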

Page 3:

Our Library is trying to decide…

To switch, or not to switch…

Innovative Interfaces Millennium System

OCLC Connexion Japanese Records

Marian Gould Gallagher Law Library

MARC 8 OR Unicode Storage??

Page 4:

Unicode vs MARC 8 Basics

Computers store text as numeric codes. Unicode has become the standard for text storage worldwide. Its use facilitates the storage, transfer, and display of text in a wide range of computer software environments (the internet, databases, browsers, word processors, etc.).

What is MARC 8?
MARC 8 has been the North American library community’s text storage standard.
(“The group of 7/8-bit and 24-bit character sets used to encode MARC 21 records. These sets are specified in MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media, Character Sets, Part 1.”¹)

What is Unicode?
Unicode has become the international standard for text storage.
“The Universal Character Set (UCS) which is ISO 10646 and its industry counterpart Unicode.”¹

¹ Source: LC’s “MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media: CHARACTER SETS” http://www.loc.gov/marc/specifications/speccharintro.html
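To make "text as numeric codes" concrete, here is a minimal sketch that prints the Unicode code point and the UTF-8 bytes for each character of 日本語. The helper name `describe` is made up for this example. It shows only the Unicode side, because the Python standard library does not ship a MARC 8 codec.

```python
def describe(text):
    """Return (character, code point, UTF-8 bytes in hex) for each character."""
    rows = []
    for ch in text:
        code_point = f"U+{ord(ch):04X}"
        utf8_bytes = ch.encode("utf-8").hex(" ")  # the separator argument needs Python 3.8+
        rows.append((ch, code_point, utf8_bytes))
    return rows


for ch, code_point, utf8_bytes in describe("日本語"):
    print(f"{ch}  {code_point}  UTF-8 bytes: {utf8_bytes}")
```

The code point is the abstract number Unicode assigns to the character; the UTF-8 bytes are one of several ways of writing that number to disk or sending it over a network.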

Page 5:

What Problems are Specific to Japanese?

Q: Do some problems associated with Unicode vs MARC 8 storage affect one language (such as Japanese) more than others?

A: Not really. Problems with character display for specific languages are more often an issue of font availability. Each application must have access to a font that will display the proper characters. Arial Unicode MS can display most Unicode characters. In library records, an additional issue is converting between MARC 8 and Unicode.

But these issues can affect many languages and scripts, not just Japanese.

Page 6:

What Problems are Specific to Japanese?

Q: So are there any Japanese-specific problems?
A: Not when it comes to Unicode storage itself. But there are common problems with display of Kanji and Japanese romanization in library catalogs. These are mainly font-availability issues, not Unicode storage issues.

Examples of Font-based Problems Specific to Japanese

Romanization (Diacritic Problem)
• “Alif” as in kon’in 婚姻

Kanji
Examples of Japanese Kanji not in EACC (Different Unicode Code Point Required for Verified Catalog Record in OCLC)
• MARC 8/EACC: 說 (U+8AAA) instead of 説 (U+8AAC)
• MARC 8/EACC: 虛 (U+865B) instead of 虚 (U+865A)
• MARC 8/EACC: 卷 (U+5377) instead of 巻 (U+5DFB)
• MARC 8/EACC: 錄 (U+9304) instead of 録 (U+9332)
• MARC 8/EACC: 查 (U+67E5) instead of 査 (U+67FB)
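The five pairs above can be expressed as a lookup table. This is only a sketch: the table contains just the examples from this slide, not the full set of characters missing from EACC, and `to_eacc_form` and `needs_substitution` are hypothetical helper names, not OCLC or Innovative functions.

```python
# (EACC / MARC 8 form, Japanese form), code points copied from the slide.
SLIDE_PAIRS = [
    (0x8AAA, 0x8AAC),
    (0x865B, 0x865A),
    (0x5377, 0x5DFB),
    (0x9304, 0x9332),
    (0x67E5, 0x67FB),
]

# str.translate takes a dict keyed by code point (an int).
JAPANESE_TO_EACC = {japanese: chr(eacc) for eacc, japanese in SLIDE_PAIRS}


def needs_substitution(text):
    """List the characters in text that the slide says are not in EACC."""
    return [ch for ch in text if ord(ch) in JAPANESE_TO_EACC]


def to_eacc_form(text):
    """Replace each listed Japanese form with its EACC counterpart."""
    return text.translate(JAPANESE_TO_EACC)


for eacc, japanese in SLIDE_PAIRS:
    print(f"{chr(japanese)} (U+{japanese:04X}) -> {chr(eacc)} (U+{eacc:04X})")
```

A check like `needs_substitution` could flag headings that will look different after a round trip through MARC 8, since the substituted character has a different code point even though readers may treat the two forms as the same character.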

Page 7:

Why Switch to Unicode Storage?

Q: If there are no problems with MARC 8 storage specific to Japanese, then why should our library switch to Unicode storage?

A: Consider this quote from Microsoft:
“Deciding whether to store non-DBCS [double-byte character set] data as Unicode is generally determined by an awareness of the effects on storage, and about how much sorting, conversion, and possible data corruption might happen during client interactions with the data. … However, for most applications the effect is negligible. Databases with well-designed indexes are especially unlikely to be affected…

Page 8:

Why Switch to Unicode Storage?

A: (continued) Most of the time, the decision to store character data, even non-DBCS data, in Unicode should be based more on business needs instead of performance. In a global economy that is encouraged by rapid growth in Internet traffic, it is becoming more important than ever to support client computers that are running different locales. Additionally, it is becoming increasingly difficult to pick a single code page that supports all the characters required by a worldwide audience.”²

² See the Microsoft article “Storage and Performance Effects of Unicode”: http://msdn2.microsoft.com/en-us/library/ms189617.aspx
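The quote's two points, storage effects and the limits of a single code page, can be tried out directly. In this sketch the choice of Shift_JIS as the example code page and the helper name `byte_size` are assumptions made for illustration; the quote is about database storage in general and says nothing about which encodings Millennium uses internally.

```python
def byte_size(text, encoding):
    """Return the encoded length in bytes, or None if the encoding lacks a character."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


samples = {"Japanese": "日本語", "Korean": "한국어"}
encodings = ["shift_jis", "utf-8", "utf-16-le"]

for label, text in samples.items():
    for encoding in encodings:
        size = byte_size(text, encoding)
        result = "cannot be encoded" if size is None else f"{size} bytes"
        print(f"{label} in {encoding}: {result}")
```

A Japanese code page stores the Japanese sample compactly but has no way to represent the Korean one, while both Unicode encodings handle both samples at a somewhat different storage cost. That is the trade-off the quote describes.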

Page 9:

What are the Pros and Cons to Converting our Local System to “Unicode Storage”?

Advantages of Staying with MARC 8
• May not be possible to “back out” of switch to Unicode if problems crop up
• Your records have no risk of being damaged
• Could be faster than Unicode (but probably is not)
• In a phrase: “If it ain’t broke, don’t fix it!”

Advantages of Switching to Unicode
• Could enhance data exchange capabilities
  • Export/Import
  • Copy/Paste between Applications
  • Network printing
• Allows for display of your records in a wide variety of world-wide computing environments
• May improve some long-standing problems with local system software (such as printing, display)
• Supporting the international Unicode standard is one way of presenting your library catalog as a global resource
• “Nothing ventured, nothing gained!”

Page 10:

In our library:

The Head of Technical Services
• Main contact with Innovative
• Requests information about successes/problems at other libraries

East Asian Law Department
• Responsible for Chinese, Japanese, and Korean records
• Works together with Tech Services

Diagram labels: OCLC Connexion; Gallagher Law Library Local System, an Innovative Interfaces, Inc. Millennium local system; MARC 8 Storage; Unicode Storage

Who decides whether to flip the switch to Unicode Storage?

Page 11:

What will our library do?

Undetermined! Our library is still in the decision process.

We’re considering all of the information noted in this presentation.

We will probably decide soon!

University of Washington Marian Gould Gallagher Law Library

Page 12:

What sources of information are there?

• Your Local System Guides
• Library of Congress Guides
  Such as: LC’s “MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media: CHARACTER SETS” http://www.loc.gov/marc/specifications/speccharintro.html
• OCLC CJK Help
• Microsoft Guides
  Such as: “Storage and Performance Effects of Unicode”: http://msdn2.microsoft.com/en-us/library/ms189617.aspx
• Unicode Consortium http://www.unicode.org/
• OCLC CJK listserv
• Eastlib listserv

Page 13:

Flipping the switch…

Is up to you and Your Library…

MARC 8 Storage

Unicode Storage