Post on 27-Jun-2015
Undergraduates Collaborating in Digital Humanities Research
NITLE Digital Scholarship Seminar
April 27, 2012
Panelists
• Moderator: Janet Simons, Associate Director of Instructional Technology, Co-Director, Digital Humanities Initiative (DHi), Hamilton College
• John Burnett, Wheaton College
• Sarah Schultz, Hamilton College
• Amanda Kleintop, University of Richmond
• Gabrielle Kirilloff, University of Pittsburgh
• Janis Chinn, University of Pittsburgh
John Burnett, Wheaton College
• Wheaton College Digital History Project
• http://wheatoncollege.edu/digital-history-project/
Sarah Schultz, Hamilton College
• Agha Shahid Ali Poetry Project
• http://asa.dhinitiative.org/demo/index.html
Amanda Kleintop, University of Richmond
• History Honors Thesis, “Networks of Resistance: Black Virginians Remember Civil War Loyalties”
How Digital Tools Impact Research Questions and Methodologies in Literary Studies
http://ft.obdurodon.org
http://gk.obdurodon.org
Gabrielle Kirilloff, University of Pittsburgh
Research overview
• Previous research
  – Speech as agency
  – Speech hierarchies
• Research question
  – What are the correlations among speech, gender, and moral alignment?
Research methodologies
• Why XML?
  – Unique, descriptive tags
  – Processing a large number of tales
  – Creating multiple views of the same data
• Learning XML and related technologies
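The point about descriptive tags and multiple views of the same data can be illustrated with a minimal sketch. The element and attribute names below (`tale`, `speech`, `gender`, `alignment`) and the sample text are hypothetical, invented for illustration; they are not the project's actual schema or data.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup: tag names, attributes, and text are illustrative only.
SAMPLE = """
<tale title="Example tale">
  <speech speaker="witch" gender="female" alignment="evil">Come closer, child.</speech>
  <speech speaker="prince" gender="male" alignment="good">I will find her.</speech>
  <speech speaker="princess" gender="female" alignment="good">Who is there?</speech>
  <speech speaker="princess" gender="female" alignment="good">I am not afraid.</speech>
</tale>
"""


def speech_counts(xml_text):
    """View 1: number of speech acts per (gender, alignment) pair."""
    root = ET.fromstring(xml_text.strip())
    counts = Counter()
    for speech in root.iter("speech"):
        counts[(speech.get("gender"), speech.get("alignment"))] += 1
    return counts


def words_by_gender(xml_text):
    """View 2: number of words spoken per gender, from the same markup."""
    root = ET.fromstring(xml_text.strip())
    totals = Counter()
    for speech in root.iter("speech"):
        totals[speech.get("gender")] += len((speech.text or "").split())
    return totals


if __name__ == "__main__":
    print(speech_counts(SAMPLE))
    print(words_by_gender(SAMPLE))
```

Both functions read the same encoded tale, so a new view only needs a new query, not a new encoding.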
The website
The impact of DH
• Examining a large number of texts
• The power of visualizations
• Creating a research tool
• Looking at texts differently
Studying Register Variation Using Computational Methods
Janis Chinn
University of Pittsburgh
April 27, 2012
http://twitter.obdurodon.org/
Research Question
To what extent do Twitter users exercise register shifting when communicating with Twitter users at large, non-verified users, and verified users?
Linguistic Register
• Situation-specific variety of language
• Spoken Register
  – Unconscious effort
  – Acquired naturally
• Written Register
  – Conscious effort
  – Acquired through study
Corpus Building
• Python script collects tweets from the public timeline
• Shell scripting and Perl filter down the corpus
• XML encoded, accessed via XQuery
• English tweets
• US and Canada
• Currently 519,018 tweets
• 98% accurate filtering to English-only text
• Current Statistics:
  – Total words: 5,375,767
  – Unique words: 654,755
  – Type-token ratio: 0.12
  – Average tweet length (words): 10.36
  – Average tweet length (characters): 60.39
  – Total tweets: 519,018
  – Total authors: 483,940
  – Total verified authors: 687
  – Total non-verified authors: 483,253
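The statistics above follow from simple counts: type-token ratio is unique words divided by total words, and average tweet length is total words (or characters) divided by total tweets. A minimal sketch, assuming lowercased whitespace tokenization (the project's actual tokenization is not stated, and `corpus_stats` is a hypothetical helper name):

```python
def corpus_stats(tweets):
    """Compute basic corpus statistics for a list of tweet strings.

    Assumption: words are lowercased, whitespace-separated tokens.
    """
    tokens = [word.lower() for tweet in tweets for word in tweet.split()]
    total_words = len(tokens)
    unique_words = len(set(tokens))
    total_tweets = len(tweets)
    return {
        "total_words": total_words,
        "unique_words": unique_words,
        "type_token_ratio": unique_words / total_words if total_words else 0.0,
        "avg_words_per_tweet": total_words / total_tweets if total_tweets else 0.0,
        "avg_chars_per_tweet": (
            sum(len(tweet) for tweet in tweets) / total_tweets if total_tweets else 0.0
        ),
    }


if __name__ == "__main__":
    # Tiny made-up sample, only to show the shape of the output.
    print(corpus_stats(["the cat sat", "The dog sat down"]))
    # The reported ratios can be rechecked from the reported totals:
    print(round(654_755 / 5_375_767, 2))   # type-token ratio
    print(round(5_375_767 / 519_018, 2))   # average tweet length in words
```

Rechecking from the totals on the slide gives 0.12 and 10.36, matching the reported figures.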
What’s next?
• Judge tweets on relative register based on:
  – Expletives and profanity
  – Rate of non-dictionary word usage
  – Average word length of dictionary words
  – Appropriate capitalization
  – Standard punctuation
  – Leetspeak
  – Chatspeak
  – Ratio of function words within a tweet
• Potential additions:
  – Analysis of word n-grams and character bigrams
  – Prescriptive use of ‘whom’ over ‘who’
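A few of the planned register cues could be computed per tweet along the following lines. This is only a sketch under stated assumptions: the function-word list is a small illustrative set rather than a real lexicon, `register_features` is a hypothetical name, and word length is measured over all alphabetic tokens instead of dictionary words only.

```python
import re

# Assumption: a tiny illustrative function-word set, not a complete lexicon.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "to", "in", "on", "and", "is", "it", "that", "for", "with",
}


def register_features(tweet):
    """Return a few simple register cues for one tweet."""
    words = re.findall(r"[A-Za-z']+", tweet)  # alphabetic tokens only
    count = len(words)
    function_hits = sum(word.lower() in FUNCTION_WORDS for word in words)
    return {
        "function_word_ratio": function_hits / count if count else 0.0,
        "avg_word_length": sum(len(word) for word in words) / count if count else 0.0,
        # Rough stand-ins for "appropriate capitalization" and "standard punctuation".
        "starts_capitalized": tweet[:1].isupper(),
        "ends_with_punctuation": tweet.rstrip().endswith((".", "!", "?")),
    }


if __name__ == "__main__":
    print(register_features("The cat is on the mat."))
    print(register_features("lol u r gr8"))
```

Scoring each tweet on features like these would allow the three audience groups in the research question to be compared numerically.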
Motivations
• “…The Telegraph quoted an actor and a television producer emitting typically brainless "Kids Today" plaints about how modern modes of communication, especially Twitter, are degrading the English language, so that "the sentence with more than one clause is a problem for us", and "words are getting shortened".” –Mark Liberman, Language Log, 2011
Motivations
• Impossible without DH
• Quantifiable and repeatable results
• Empowering to build and manipulate tools to work with data