SMART-GS: A Tool for Studying Digitized Historical Manuscripts
Yuta Hashimoto
PhD student, Department of Humanistic Informatics, Kyoto University
March 15, 2015 @ University of Michigan
Introduction
• Who am I
  • A PhD student studying DH at Kyoto University
  • Research interest: Digital History
  • Background: History of Science
  • Also an iOS/Android developer
    • Kin Digi Reader (近デジリーダー): a mobile reader for the Kindai Digital Library
• In this talk, I will…
  • Introduce an application named SMART-GS
  • And its possible contributions to Japanese studies
What is SMART-GS?
• A transcription/annotation suite for digitized historical manuscripts
• Developed at Kyoto University since 2007
• An open source project
• SMART-GS is NOT
  • An OCR application for handwritten texts
  • A language-dependent application
A Screenshot
Project Background: The Increase of Large-Scale Digital Archives
How Should Historians Handle Digital Images?
David Hilbert (1862-1943)
Problems with Paper-based Research
1. Papers are heavy and require space
2. Difficult to share the “metadata” added to the manuscripts with co-workers
3. Organizing information is also difficult
  • Searching, grouping, indexing, etc.
Main Features of SMART-GS
Introducing SMART-GS
Markup Functions for Texts and Images
• Various ways of marking up image regions:
  • Rectangle or polygon shape
  • Drawing an arrow from one region to another
  • Putting a comment on it
  • etc.
• HTML markup for texts:
  • Highlighting a certain word or phrase
  • Adding a link to an external website
Linking Markups
• Any two markups can be linked to each other
• These links are one-to-many and bidirectional
• A link itself can be annotated
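The linking model described above can be pictured with a small data structure. This is only a sketch under stated assumptions: the names `Markup`, `Link`, and `LinkStore` are hypothetical and are not taken from the SMART-GS code base. It shows one way to keep links bidirectional and one-to-many while letting each link carry its own annotation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Markup:
    """A marked-up image region or text span (hypothetical model)."""
    markup_id: str
    kind: str  # e.g. "image-region" or "text-span"


@dataclass
class Link:
    """A link between two markups; `note` annotates the link itself."""
    source: Markup
    target: Markup
    note: str = ""


class LinkStore:
    """Hypothetical store that indexes every link under both endpoints."""

    def __init__(self):
        self._by_markup = defaultdict(list)

    def connect(self, source, target, note=""):
        link = Link(source, target, note)
        # Register the link on both sides so it can be followed either way.
        self._by_markup[source.markup_id].append(link)
        self._by_markup[target.markup_id].append(link)
        return link

    def neighbours(self, markup):
        """Return every markup linked to `markup`, regardless of direction."""
        result = []
        for link in self._by_markup.get(markup.markup_id, []):
            result.append(link.target if link.source == markup else link.source)
        return result
```

For example, one image region could be connected to several text spans with `connect`, and `neighbours` would then return the region when called on any of those spans.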
Word Spotting for Handwritten Text (DSC Search)
Search results for query “Scheler” (a German philosopher’s name)
How DSC Search Indexes Images
1. Separate the image into lines
2. Divide each line into thin slits
3. Compute a gradient vector for each pixel in each slit
4. Accumulate these gradient vectors (which will be used as “feature vectors”)
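Steps 2 to 4 can be sketched in a few lines of code. This is an illustration, not the actual DSC Search implementation: the function name `slit_features`, the slit width, the use of forward differences, and the accumulation into orientation bins are all assumptions made here. The input is taken to be a single text line that has already been separated from the page (step 1).

```python
import math


def slit_features(line_image, slit_width=4, bins=4):
    """Turn one text-line image into a sequence of feature vectors.

    line_image: list of rows, each a list of grayscale values.
    Returns one orientation histogram (length `bins`) per slit.
    """
    height = len(line_image)
    width = len(line_image[0])
    features = []
    for left in range(0, width, slit_width):
        hist = [0.0] * bins
        for y in range(height):
            for x in range(left, min(left + slit_width, width)):
                # Forward differences, clamped at the image border.
                gx = line_image[y][min(x + 1, width - 1)] - line_image[y][x]
                gy = line_image[min(y + 1, height - 1)][x] - line_image[y][x]
                magnitude = math.hypot(gx, gy)
                if magnitude == 0:
                    continue
                # Map the gradient angle in [0, 2*pi) to an orientation bin.
                angle = math.atan2(gy, gx) % (2 * math.pi)
                index = min(int(angle / (2 * math.pi) * bins), bins - 1)
                # Accumulate the gradient strength for this slit.
                hist[index] += magnitude
        features.append(hist)
    return features
```

Each line image thus becomes a left-to-right sequence of small vectors, one per slit, which is the form a later comparison step can work on.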
How DSC Search Finds Similar Images
Query image
Candidate Image
• DSC Search calculates the “distance” between the query image and each candidate image by comparing their feature vector sequences
• The smaller the distance, the more likely the two images are to have similar shapes
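The slides do not say which distance measure DSC Search uses. As one possible illustration, the sketch below compares two feature vector sequences with dynamic time warping (DTW), a standard technique for sequences of different lengths. DTW is swapped in here as an assumption, and the names `sequence_distance` and `rank_candidates` are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def sequence_distance(query, candidate):
    """DTW distance between two sequences of feature vectors."""
    n, m = len(query), len(candidate)
    inf = float("inf")
    # cost[i][j] = best alignment cost of the first i and first j vectors.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = euclidean(query[i - 1], candidate[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # query vector repeated
                cost[i][j - 1],      # candidate vector repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def rank_candidates(query, candidates):
    """Sort (name, features) pairs by ascending distance to the query."""
    return sorted(candidates, key=lambda item: sequence_distance(query, item[1]))
```

With a measure of this kind, a word written slightly wider or narrower than the query can still come out close, because the alignment may stretch one sequence against the other.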
Pros and Cons of DSC Search
• Pros
  • Can be applied to any type of document, regardless of language or text direction
  • No machine learning is needed
• Cons
  • Requires users to separate lines in a preprocessing step
  • Not accurate for manuscripts written by multiple authors
Applications of SMART-GS to Historical Research Projects
Transcription Project of Kuratomi’s Diary
• Baron Yuzaburo Kuratomi (1853-1948)
  • An elite bureaucrat-politician of the Meiji, Taisho, and early Showa eras
• Project goal
  • To publish a complete transcription of Kuratomi’s diary
  • Which consists of more than 300 notebooks
Team-based Transcription with SMART-GS
WebDAV Server
gsx file
1. Create draft transcriptions
2. Add annotations
3. Revise and finalize transcription texts
Transcription of Hajime Tanabe’s Lecture Notebooks
• Hajime Tanabe (1885-1962)
  • One of the prominent philosophers of the Kyoto School
• Tanabe’s lecture notebooks
  • Written in Japanese, German, Latin, Greek, and English
  • And written in extremely bad handwriting
Group Reading of Tanabe’s Notebooks
Transcription of Earthquake Records
◀ Teibi Shinsai Roku (丁未震災録): A record of a large earthquake that took place in 1847
▲ Reading Group of Earthquake Records (古地震研究会)
How SMART-GS can Contribute to Japanese Studies
As a Group Learning Tool
Creating a Shared Dictionary with SMART-GS
As a Platform for International Collaboration
• NIJL’s large-scale project
  • Titled “Construction of the International Collaborative Network on Japanese Classical Books”
  • 300,000 books will be digitized and published on the web by 2024
Our Current Attempts
• To have NIJL use SMART-GS as their official transcription tool
• And to make SMART-GS a global platform for Japanese studies
• So that scholars all over the world can cooperate through the network on the same platform
Ongoing Development: the Web Version
Conclusion
• More and more digital images of historical manuscripts have become available on the web
• SMART-GS provides a set of features to handle these digital images effectively
• And it offers ways to collaborate with other scholars through the network
• Our next attempt is to make SMART-GS a global platform where scholars can collaborate with each other
Thank you for listening!
Any questions or comments?