LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall 2016



Challenges of Using Twitter Data

Twitter Restrictions (Tweet IDs only)

• Twitter's Developer Policy only allows you to distribute, or allow download of, Tweet IDs and/or User IDs
• You may provide export via non-automated means of up to 50,000 public Tweets and/or User Objects per user of your Service, per day
• Can “hydrate” Tweet IDs from previous datasets

Skills Needed

• Understanding of technology

Limitations

• Social media collections within web archives tend to be event-driven and limited to selected platforms, pages or user accounts
• Social media platforms protect the algorithms used to generate the allowed sample
• Without knowing the algorithm used, researchers cannot verify that the sample is free of misrepresentation
• Only a certain amount of data can be requested, to prevent excessive data access

Context

• Individual tweets are limited in length and contain very little information
• This complicates the intelligibility of the content at a later time

Storage

• Sufficient storage space is needed
• Can’t store on third-party cloud services due to Twitter restrictions

No established standards and best practices
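
“Hydrating” means taking a shared list of Tweet IDs and requesting the full tweets back from Twitter. The bookkeeping behind that step can be sketched roughly as below. This is only an illustration: the function name `batch_ids`, the batch size of 100 and the sample IDs are assumptions, and no real API call is made.

```python
from typing import Iterator, List


def batch_ids(tweet_ids: List[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield successive batches of Tweet IDs for later hydration requests.

    batch_size=100 is an assumed per-request limit, not a documented value.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Drop duplicate IDs while keeping first-seen order.
    unique_ids = list(dict.fromkeys(tweet_ids))
    for start in range(0, len(unique_ids), batch_size):
        yield unique_ids[start:start + batch_size]


# Hypothetical dataset of 250 Tweet IDs taken from a previously shared file.
ids = [str(n) for n in range(1, 251)]
batches = list(batch_ids(ids))
print(len(batches), "batches; last batch holds", len(batches[-1]), "IDs")
```

Each batch would then be sent to whichever hydration tool or endpoint the researcher is permitted to use. Tweets deleted since the dataset was shared cannot be recovered this way.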

Library of Congress Update 2013

First Objective (was to be completed in 2013)

•Acquire and preserve the 2006-10 archive

•Establish a secure and sustainable process for receiving and preserving the daily, ongoing stream of tweets

•Create a structure for organizing the entire archive by date

Second Objective

•Confronting and working around the technology challenges to make the archive accessible to researchers in a useful way

Progress

•Archive is at approximately 170 billion tweets!

•Has not yet provided researchers access to the archive

•A single search of just the fixed 2006-2010 archive on the Library’s systems could take 24 hours

»Limits the number of possible searches

»Requires an extensive infrastructure of hundreds if not thousands of servers

•Working to develop a basic level of access that can be implemented while archival access technologies catch up
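
The idea of “a structure for organizing the entire archive by date” can be pictured with a small sketch that groups tweet records by calendar day. The record layout (an `id` plus an ISO 8601 `created_at` string) and the function name `group_by_day` are hypothetical; nothing here describes the Library's actual system.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List


def group_by_day(tweets: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each calendar day (YYYY-MM-DD) to the IDs of tweets posted that day."""
    by_day: Dict[str, List[str]] = defaultdict(list)
    for tweet in tweets:
        # Parse the assumed ISO 8601 timestamp and keep only the date part.
        day = datetime.fromisoformat(tweet["created_at"]).date().isoformat()
        by_day[day].append(tweet["id"])
    return dict(by_day)


# Made-up sample records, for illustration only.
sample = [
    {"id": "1", "created_at": "2006-03-21T20:50:14"},
    {"id": "2", "created_at": "2010-04-14T09:00:00"},
    {"id": "3", "created_at": "2010-04-14T17:30:00"},
]
index = group_by_day(sample)
print(sorted(index))
```

With a layout like this, a query limited to a date range only needs to read the matching days instead of scanning the whole archive, which is one reason date-based organization is a natural first step.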

Archiving Social Media: Twitter

Stefanie Hew, Shazia Naderi, Kristiana Wesloh

Fall 2016 • LIS 653-01 • Professor Cristina Pattuelli

Library of Congress + Twitter

•In April 2010, the Library signed an agreement with Twitter providing the Library with all public tweets from the company’s inception through the date of the agreement (2006-10)

•Library and Twitter agreed that Twitter would provide all public tweets on an ongoing basis on the same terms

Value of Archiving Twitter

•Primary method of communication and creative expression

•Supplementing and supplanting traditional print media

•Provide future researchers access to a fuller picture of today’s cultural norms, dialogue, trends and events to inform scholarship, the legislative process, new works of authorship, education and other purposes

Figure 1.

Figure 2. Examples of influential hashtags on Twitter.

Software to Use Twitter API

•TwapperKeeper (before 2011)

•Twap

•Social Feed Manager

•twarc

•TAGS: Twitter Archiving Google Sheet

•Twitter Capture and Analysis Toolset

•Netlytic

References

Developer Agreement and Policy. (2016). Retrieved October 22, 2016, from Twitter: https://dev.twitter.com/overview/terms/agreement-and-policy

Felt, M. (2016, January-June). Social Media and the Social Sciences: How Researchers Employ Big Data Analytics. Big Data & Society, 1-15. Retrieved from DOI: 10.1177/2053951716645828

Firehose. (2016). Retrieved October 22, 2016, from Twitter: https://dev.twitter.com/streaming/firehose

Library of Congress. (2013, January). Update on the Twitter Archive at the Library of Congress. Retrieved from https://www.loc.gov/today/pr/2013/files/twitter_report_2013jan.pdf

Risse, T., Peters, W., Senellart, P., and Maynard, D. (2014). Documenting Contemporary Society by Preserving Relevant Information from Twitter. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.337.9643

Thomson, S. D. (2016). Preserving Social Media. Digital Preservation Coalition Technology Watch Report 16-01, February 2016. Retrieved from http://dx.doi.org/10.7207/twr16-01

Images: Figure 1: https://about.twitter.com/en-gb/company/brand-assets; https://upload.wikimedia.org/wikipedia/commons/4/41/US-LibraryOfCongress-Logo.svg; Figure 2: https://twitter.com


How to Access Twitter Data

•Twitter provides a streaming API that gives raw access to Twitter’s global stream of data

»Creates a “back door” into the current activity on Twitter

»Only goes back about a week and does not allow for searches of historical tweets

•Different volumes of streaming data

»From 1% of tweets to the Firehose that opens access to 100% of tweets

»Firehose requires special permission to access

•Twitter does not reveal how the 1% sample is selected


TANGIBLE, INTANGIBLE, DIGITAL, EPHEMERAL

Towards a Unified Heritage Classification Scheme

BY MARC CASTELLANI, RACHEL EGAN, CORMAC FITZGERALD, AND DANA LACHENMAYER | PRATT INSTITUTE, LIS 653-02 FALL 2016

Fig. 1 Overlapping Fields of Cultural Heritage

Cultural Heritage Metadata Structures

Metadata standards often start as schemas developed by a specialized community in order to enable more accurate item description. Is there a structure that unites all heritage fields?

1. United Nations Educational, Scientific, and Cultural Organization. Convention for the Safeguarding of the Intangible Cultural Heritage (29 September-17 October 2003). Retrieved from http://www.unesco.org/culture/ich/en/convention

2. Zeng, M., & Qin, J. (2016). Metadata (2nd ed.). Chicago: ALA Neal-Schuman, p. 42.

3. Baca, M., Harpring, P., Ward, J., & Beecroft, A. (Eds.). (2014). Metadata standards crosswalk. The Getty Research Institute. Retrieved from http://www.getty.edu/research/publications/electronic_publications/intrometadata/crosswalks.html


LIS 653 Knowledge Organization

Fall 2016 Dr. Cristina Pattuelli

Pratt Institute School of Information

Main References:

Gilman, I. (2006). From marginalization to accessibility: Classification of indigenous materials. Faculty Scholarship (PUL), 6.

Swanson, R. (2015). Adapting the Brian Deer Classification System for Aanischaaukamikw Cree Cultural Institute. Cataloging & Classification Quarterly, 53(5/6), 568-579.

Knowledge Organization Practices of Indigenous People

Anna Holbert & Leslie To

Maori Subject Headings

In 1998, the Maori Subject Headings Working Party formed and went on to develop subject headings in the Maori language. The first group of headings was published in 2005. Maori Subject Headings (MSH) uses controlled vocabularies within the Library of Congress structure. These implementations focus on relationships, which are central constructs of Maori culture, rather than on rigid hierarchies. By introducing a bilingual thesaurus, MSH provides narrower and more specific search results, improving and increasing access for users and researchers. More than 500 subject headings are now in use.

Brian Deer Classification

Created in 1974 by Brian Deer, the Brian Deer Classification System (BDCS) reflected a First Nations epistemological framework and used appropriate language. Rather than working within an existing framework, Deer developed the scheme from scratch. BDCS provided a foundation for institutions to create tailored classification schemes: libraries could have a First Nations/Inuit/Métis focus without everything being classified under one subject call number. As a First Nations member, Deer better understood the subtleties and worldview involved, which made the resulting classification more accessible.

Focus of Research

Although the Library of Congress Classification system is an incredible resource for organizing information, it lacks specificity and is inherently biased against non-Western cultures and information. The Brian Deer Classification and Maori Subject Headings are two examples of indigenous librarians' innovation, and they show the need for flexibility and openness in knowledge organization.


Crowdsourcing in Libraries, Archives, and Museums

What is crowdsourcing?

+ Outsourcing work to a crowd

+ Often involves “microtasks,” or tasks not easily accomplished by a computer

+ Began as a money-making business tool, but quickly expanded to volunteer work

Selected References

+ Brabham, D. (2013). Concepts, Theories, and Cases of Crowdsourcing. In Crowdsourcing (pp. 1-40). MIT Press. Retrieved from http://www.jstor.org/stable/j.ctt5hhk3m.7

+ Ellis, S. (2014). A History of Collaboration, a Future in Crowdsourcing: Positive Impacts of Cooperation on British Librarianship. International Journal of Libraries & Information Services, 1-10. Retrieved from http://www.crowdconsortium.org/wp-content/uploads/A-History-of-Collaboration-a-Future-in-Crowdsourcing-Positive-Impacts-of-Cooperation-on-British-Librarianship.pdf

+ Oomen, J., & Aroyo, L. (2011). Crowdsourcing in the Cultural Heritage Domain: Opportunities and Challenges. Proceedings of the 5th International Conference on Communities and Technologies, 138-149. Retrieved from http://dl.acm.org/citation.cfm?id=2103373

LIS 653-01 + Knowledge Organization + Professor Cristina Pattuelli + Fall 2016

Meg Edison, Karalyn Mark, Katrina Rink, Clair Rock

Current Projects

Smithsonian Transcription Center

+ An ongoing crowdsourcing project launched in June 2013 by the Smithsonian Institution

+ Invites anyone with internet access to contribute transcriptions of a variety of documents provided by 14 of the Smithsonian’s libraries, archives, and museums

+ Contributions enable the materials to be text-searchable.

NYPL Labs

+ Began to initiate crowdsourcing projects in 2011

+ “What’s on the Menu” enlists volunteers in the transcription of historical menus.

+ “Map Rectifier” prompts amateur cartographers to digitally align ("rectify") historical maps from the NYPL's collections to match today's precise maps.

+ Contributions culminated in the open access release of NYPL’s entire public domain Digital Collection in 2016.

Scribe

+ A crowdsourcing software framework used to gather information from large databases of handwritten documents.

+ Uses 3 simple steps to gather information:

+ Mark

+ Transcribe

+ Verify

+ Created and sponsored by NYPL Labs and Zooniverse.
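
The Verify step in a Mark, Transcribe, Verify workflow can be illustrated with a simple agreement check across volunteers. The sketch below is not Scribe's actual algorithm: the majority-vote rule, the `min_agreement` threshold and the function name `verify` are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Optional


def verify(transcriptions: List[str], min_agreement: int = 2) -> Optional[str]:
    """Return the most common reading if enough volunteers agree, else None.

    Readings are compared after collapsing whitespace and lowercasing,
    an assumed normalization chosen for this example.
    """
    normalized = [" ".join(t.split()).lower() for t in transcriptions]
    if not normalized:
        return None
    # most_common(1) returns the top reading; ties go to the one seen first.
    text, votes = Counter(normalized).most_common(1)[0]
    return text if votes >= min_agreement else None


# Three hypothetical volunteer transcriptions of the same marked region.
accepted = verify(["Roast Beef", "roast  beef", "Roast Beet"])
print(accepted)
```

A region whose transcriptions never reach the threshold would simply stay open for more volunteers, which is one plausible way a project could trade speed for accuracy.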