JCDL 2015 Doctoral Consortium - A Framework for Aggregating Private and Public Web Archives

A Framework for Aggregating Private and Public Web Archives Mat Kelly Old Dominion University, Norfolk, VA Advisor: Michele C. Weigle JCDL 2015 Doctoral Consortium June 21, 2015

Transcript of JCDL 2015 Doctoral Consortium - A Framework for Aggregating Private and Public Web Archives

Page 1: JCDL 2015 Doctoral Consortium - A Framework for Aggregating Private and Public Web Archives

A Framework for Aggregating Private and Public Web Archives

Mat Kelly
Old Dominion University, Norfolk, VA

Advisor: Michele C. Weigle

JCDL 2015 Doctoral Consortium
June 21, 2015

Page 2

The Problem

[Figure labels: private archive; other private archive]

Page 3

Not All Archives Can Be Aggregated

[Figure labels: private archive; other private archive]

Page 11

[Figure: t = k ≠ t = k-1]

Page 13

90 days at a time

Only back to one year!

Page 14

[Figure labels: 1 year ago; 2 years ago; 10 years ago; 180 years ago]

Page 15

[Figure label: private archive]

Page 16

Proactive Preservation

• Just-in-time WARC creation
• Personal and potentially private web archiving
• Mitigates deferral problem

Page 17

Public vs. Private Web Archiving

• Public Web Archiving
  – Relies on deferred capture
  – Uses WARC, Memento, etc.
  – Integrates with other public web archives

• Private Web Archiving
  – Same tools, less overhead, less bureaucracy
  – Uses WARC, Memento, etc.
  – Does not integrate

Page 18

Typical Web Archive Access

1. Web User Interface
2. Memento
   – TimeGate, TimeMap
   – Accept-Datetime (content negotiation)
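A minimal sketch of the datetime negotiation named on this slide. The TimeGate address is a hypothetical placeholder and the function name is illustrative; the code only builds the request URI and the Accept-Datetime header and does not contact any archive.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical TimeGate endpoint, used only for illustration.
TIMEGATE = "http://timegate.example.org/timegate/"


def timegate_request(uri_r: str, when: datetime) -> tuple[str, dict[str, str]]:
    """Return the TimeGate URI and headers for a datetime-negotiated lookup."""
    # Accept-Datetime carries an RFC 1123 date expressed in GMT.
    stamp = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return TIMEGATE + uri_r, {"Accept-Datetime": stamp}


url, headers = timegate_request(
    "https://www.facebook.com/",
    datetime(2015, 3, 5, 0, 1, 0, tzinfo=timezone.utc),
)
```

A client would send `headers` with a GET to `url`; a Memento-compliant TimeGate is expected to redirect to the capture closest to the requested datetime.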

Page 19

Aggregating Multiple Web Archives

• Memento Aggregator
  – Temporally sorted TimeMap combined from multiple archives
  – Allows temporal gaps in one archive to be filled in by another
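The merge described on this slide can be sketched as follows. This is an illustration only: each archive's TimeMap is modeled as a pre-sorted list of (datetime, URI-M) pairs, and the archive hosts are made up.

```python
import heapq
from datetime import datetime

# Toy TimeMaps, each already sorted by capture datetime (hosts are made up).
archive_a = [
    (datetime(2015, 2, 28, 15, 57, 3), "http://a.example/20150228155703/"),
    (datetime(2015, 3, 10, 14, 7, 21), "http://a.example/20150310140721/"),
]
archive_b = [
    (datetime(2015, 3, 5, 21, 59, 22), "http://b.example/20150305215922/"),
]


def aggregate(*timemaps):
    """Merge sorted TimeMaps into one temporally sorted TimeMap."""
    return list(heapq.merge(*timemaps))


combined = aggregate(archive_a, archive_b)
```

Here the single capture from the second archive lands between the two captures of the first, which is the gap-filling effect the slide refers to.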

Page 20

Archive Supplementation

• More captures → greater temporal coverage
• Content on Deep Web
• A large chunk of the Web is not preserved
  – Tools’ inability
  – Inconsistency over time due to personalization

Page 21

Concerns in Aggregating Private Web Archives

• Privacy
• Inconsistency of page representation
  – URI is insufficient key for access
• Archival integrity
  – Has a private archive’s content been manipulated?

Page 22

Why Individuals Might Want Personalized Aggregations

• Show my private web archive captures
• Concerned about exposing sensitive info to public
  – But still want to view temporally inline
• Private/Restricted Archives are becoming ever more common

Page 23

Temporal Supplementation

Page 24

My Archives Have What They May Have Missed

Page 25

The Concerns Distilled

• Access Control
  – And indicators for PWA
• Preservation of Private Content
• Interoperability without privacy compromise

Page 26

Web Archive Usage Pattern 1: Direct Access

[Figure label: OR]

Page 27

Web Archive Usage Pattern 2: Web Archive Aggregation

• Better results for a URI due to more sources for capture

Page 28

Previous Patterns: Status Quo

• Patterns 1 and 2 are status quo
  – Provided by framework
• Querying web archives currently only considers public web content
  – URI for lookup
• Framework introduces 2 new entities
  – Memento Meta Aggregator (MMA)
  – Private Web Archive Adapter (PWAA)

Page 29

Memento Meta Aggregator (MMA)

• Functional superset of a Memento Aggregator (MA)
• Can act as intermediary client to relay MA results to ultimate user
• Allows just-in-time (JIT) inclusion of archives
  – As specified at query time
• Set of archives aggregated can be dynamic
  – e.g., results must not include IA (Internet Archive)
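A rough sketch of how a query-time archive set might behave, under stated assumptions: the class name, the `include`/`exclude` parameters, and the archive names and timestamps are all hypothetical and are not taken from any existing aggregator.

```python
from dataclasses import dataclass, field


@dataclass
class MetaAggregator:
    """Toy MMA: holds a default archive set and adjusts it per query."""

    # Maps an archive name to its capture timestamps for one URI (toy data).
    archives: dict[str, list[str]] = field(default_factory=dict)

    def query(self, include=None, exclude=()):
        """Aggregate the default set plus `include`, minus `exclude`."""
        selected = dict(self.archives)
        selected.update(include or {})  # JIT inclusion, given at query time
        for name in exclude:            # e.g., results must not include IA
            selected.pop(name, None)
        merged = [(ts, name) for name, stamps in selected.items() for ts in stamps]
        return sorted(merged)


mma = MetaAggregator({
    "IA": ["20150228155703", "20150310140721"],
    "Archive-It": ["20150305215922"],
})
result = mma.query(include={"mine": ["20150305000101"]}, exclude=["IA"])
```

The default set is left untouched by a query, so the same instance can serve requests with different inclusions and exclusions.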

Page 30

Aggregating My Captures

[Figure labels: MY CNN CAPTURES; MY BANK CAPTURES; Various public web archives; My web archives]

Page 31

The Current Memento Aggregator

[Figure labels: MY CNN CAPTURES; MY BANK CAPTURES. Capture counts shown: 100, 30, 10]

Page 32

Accessing the Aggregator

[Figure labels: MY CNN CAPTURES; MY BANK CAPTURES. Capture counts shown: 100, 30, 10]

Page 33

Accessing the Aggregator
…does not include our archives

[Figure labels: MY CNN CAPTURES (NOT AGGREGATED); MY BANK CAPTURES (NOT AGGREGATED). Capture counts shown: 100, 30, 10; total: 140]

Page 34

Pattern 3: Aggregator Relay

Access via the Meta Aggregator

[Figure labels: MY CNN CAPTURES; MY BANK CAPTURES. Capture counts shown: 100, 30, 10; totals: 140, 140]

Page 35

Web Archive Usage Pattern 4: Including Additional Archives in Aggregation

Access via the Meta Aggregator
…allows our archives to be included

[Figure labels: MY CNN CAPTURES; MY BANK CAPTURES. Capture counts shown: 100, 30, 10, 15; totals: 140, 155]
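The totals in the figure can be checked with a short calculation. Reading the 15 as the capture count of the additionally included archive is an assumption drawn from the layout of the slide.

```python
# Capture counts shown for the archives behind the existing aggregator.
aggregated = [100, 30, 10]
# Assumed: the count for the additional archive included via the meta aggregator.
additional_archive = 15

relay_total = sum(aggregated)                       # Pattern 3: relayed aggregator result
pattern4_total = relay_total + additional_archive   # Pattern 4: plus the additional archive
```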

Page 36

MMAs Allow Our Public Captures to be Shared

[Figure labels: MY CNN CAPTURES; MY BANK CAPTURES. Capture counts shown: 100, 30, 10, 15; totals: 140, 155, 155, 155]

Page 37

Web Archive Usage Pattern 5: Recursive MMA Access

[Figure labels: MY CNN CAPTURES; MY BANK CAPTURES; Bob’s public captures; The organization’s public captures 1; The organization’s public captures 2. Archives A, B, C, D; nested aggregations contain "C D", "B C D", and "A B C D". Capture counts shown: 10, 5, 15, 15, 20, 35, 35, 15, 50, 50]
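A small sketch of recursive aggregation, where a meta aggregator's sources may themselves be meta aggregators. The class names are illustrative, and the assignment of counts to archives A through D is an assumption chosen so that the nested totals match the 15, 35, and 50 shown in the figure.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    name: str
    captures: int


@dataclass
class MMA:
    """Toy meta aggregator whose sources may be archives or other MMAs."""

    sources: list = field(default_factory=list)

    def names(self) -> list[str]:
        """List every archive reachable through this aggregator."""
        out = []
        for s in self.sources:
            out.extend(s.names() if isinstance(s, MMA) else [s.name])
        return out

    def total(self) -> int:
        """Count captures across all reachable archives."""
        return sum(s.total() if isinstance(s, MMA) else s.captures
                   for s in self.sources)


# Assumed mapping of the figure's counts to archives A-D.
a, b, c, d = Archive("A", 15), Archive("B", 20), Archive("C", 10), Archive("D", 5)
inner = MMA([c, d])        # contains C D
middle = MMA([b, inner])   # contains B C D
outer = MMA([a, middle])   # contains A B C D
```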

Page 38

New Framework Entity 1: Memento Meta Aggregator

• Allow dynamic and JIT set of archives
• Superset can be recursively constructed
• Sets can be shared

My public captures can be integrated with public web archives’ captures.

Page 39

Private Web Archive Adapter (PWAA)

• Regulates access to Private Web Archives (PWAs)
• Acts as token authorizer
• With credentials OK, relays results as if querying the PWA directly

Page 40

User Establishes Access with PWA

GET TOKEN for PWA
Key: abcd1234

[Figure labels: MY CNN CAPTURES; MY BANK CAPTURES. Capture counts shown: 100, 30, 10; 3 captures; 10,000 captures]

Page 41

MMA Relays Request

GET TOKEN for PWA
Key: abcd1234

[Figure labels: MY CNN CAPTURES; MY BANK CAPTURES. Capture counts shown: 100, 30, 10; 3 captures; 10,000 captures]

Page 42

PWAA Accepts Request, Generates Reusable Token

ACCESS OK
Token: 4f33c64

[Figure labels: MY CNN CAPTURES; MY BANK CAPTURES. Capture counts shown: 100, 30, 10; 3 captures; 10,000 captures]

Page 43

User Submits Request for URI-R with Token

GET mementos for URI
Token: 4f33c64

[Figure labels: MY CNN CAPTURES; MY BANK CAPTURES. Capture counts shown: 100, 30, 10; 3 captures; 10,000 captures]

Page 44

MMA Relays Request (again)

GET mementos for URI
Token: 4f33c64

[Figure labels: MY CNN CAPTURES; MY BANK CAPTURES. Capture counts shown: 100, 30, 10; 3 captures; 10,000 captures]

Page 45

PWAA Verifies & Relays Request
MA Gets Mementos, per usual

Token: 4f33c64 OK
GET mementos for URI
GET mementos for URI

[Figure labels: MY CNN CAPTURES; MY BANK CAPTURES. Capture counts shown: 100, 30, 10; 3 captures; 10,000 captures]

Page 46

Archives Return Mementos

Token: 4f33c64 OK
Returning mementos
Return mementos for URI

[Figure labels: MY CNN CAPTURES; MY BANK CAPTURES. Capture counts shown: 100, 30, 10; 3 captures; 10,000 captures]

Page 47

PWAA Relays TimeMap

[Figure labels: MY CNN CAPTURES; MY BANK CAPTURES. Capture counts shown: 100, 30, 10; 3 captures; 10,000 captures. Relayed: 140 captures, 3, 10,000; total: 10,143]

Page 48

MMA Annotates and Aggregates

[Figure labels: MY CNN CAPTURES; MY BANK CAPTURES. Capture counts shown: 100, 30, 10; 3 captures; 10,000 captures. Aggregate of 10,143 annotated as 140 captures, 3 captures, 10,000 captures]

Page 49

Web Archive Usage Pattern 6: Aggregating Public & Private Archives

[Figure labels: MY CNN CAPTURES; MY BANK CAPTURES. Capture counts shown: 100, 30, 10; 3 captures; 10,000 captures; total: 10,143 captures]

Page 50

Regulated Access Can Be Shared

GET mementos for URI
Token: 4f33c64

GET mementos for URI
Token: c5463b4

GET TOKEN for PWA
Key: 2265eef3

No/invalid token returned

Access denied or 0 mementos

[Figure labels: MY CNN CAPTURES; MY BANK CAPTURES; 3 captures; 10,000 captures]
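The key-for-token exchange walked through on the preceding slides might be sketched as below. This is an illustration under assumptions, not the framework's implementation: the class and method names are hypothetical, the URI and memento identifiers are made up, and treating key 2265eef3 and token c5463b4 as the rejected ones is a reading of the figure.

```python
import secrets


class PrivateWebArchiveAdapter:
    """Toy PWAA: exchanges a known key for a reusable token, then gates lookups."""

    def __init__(self, valid_keys, captures):
        self._valid_keys = set(valid_keys)
        self._captures = captures  # URI -> list of memento identifiers
        self._tokens = set()

    def get_token(self, key):
        """Return a new reusable token for a recognized key, else None."""
        if key not in self._valid_keys:
            return None
        token = secrets.token_hex(8)
        self._tokens.add(token)
        return token

    def get_mementos(self, uri, token):
        """Relay the archive's mementos for `uri` if the token is valid."""
        if token not in self._tokens:
            raise PermissionError("no or invalid token")
        return list(self._captures.get(uri, []))


pwaa = PrivateWebArchiveAdapter(
    valid_keys={"abcd1234"},
    captures={"https://bank.example/": ["m1", "m2", "m3"]},
)
token = pwaa.get_token("abcd1234")
mementos = pwaa.get_mementos("https://bank.example/", token)
```

Because the token is reusable, its holder can pass it to someone else, which is one way the shared regulated access on this slide could work.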

Page 51

Aggregating Multiple PWAs

GET TOKENs for PWAs
Key: abcd1234, Archive: My
Key: cab45cbf, Archive: Linda
Key: b0b01b, Archive: Bob

[Figure labels: MY BANK CAPTURES (3 captures); Linda’s Private Captures (5 captures); Bob’s Private Captures (10 captures). Counts shown: 5, 3, 10]

Page 52

Aggregating Multiple PWAs

Access OK
Token: 7790ca

Access OK
Token: b0b01b

ACCESS DENIED

[Figure labels: MY BANK CAPTURES (3 captures); Linda’s Private Captures (5 captures); Bob’s Private Captures (10 captures). Counts shown: 5, 3, 10]

Page 53

PWAs Can Then be Aggregated

GET mementos for URI
Token: 7790ca, Archive: My
Token: null, Archive: Linda
Token: b0b01b, Archive: Bob

[Figure labels: MY BANK CAPTURES (3 captures); Linda’s Private Captures (5 captures); Bob’s Private Captures (10 captures). Returned: 3, ∅, 10; aggregated total: 13]
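The outcome on this slide, where an archive with a null token contributes nothing, can be sketched as follows. The function name is hypothetical; the counts and tokens follow the slide.

```python
# Capture counts per private archive and the tokens held by the user
# (None where access was denied), following the slide.
captures = {"My": 3, "Linda": 5, "Bob": 10}
tokens = {"My": "7790ca", "Linda": None, "Bob": "b0b01b"}


def aggregate_pwas(captures, tokens):
    """Count captures only from archives for which a token was granted."""
    per_archive = {
        name: (count if tokens.get(name) else 0)
        for name, count in captures.items()
    }
    return per_archive, sum(per_archive.values())


per_archive, total = aggregate_pwas(captures, tokens)
```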

Page 54

Sample TimeMap

...,
<http://web.archive.org/web/20150228155703/https://facebook.com/>;rel="memento"; datetime="Sat, 28 Feb 2015 15:57:03 GMT",
<http://web.archive.org/web/20150228163939/http://www.facebook.com/>;rel="memento"; datetime="Sat, 28 Feb 2015 16:39:39 GMT",
<http://web.archive.org/web/20150303162841/https://www.facebook.com/>;rel="memento"; datetime="Tue, 03 Mar 2015 16:28:41 GMT",
<http://users2machine.local/web/20150305000101/https://www.facebook.com/>;rel="memento"; datetime="Thu, 05 Mar 2015 00:01:00 GMT"; key="e395935019ee467c797034ee410cc91e",
<//wayback.archive-it.org/all/20150305215922/https://facebook.com/>;rel="memento"; datetime="Thu, 05 Mar 2015 21:59:22 GMT",
<http://previouslyUnaggregated.org/web/20150306123457/https://www.facebook.com/>;rel="memento"; datetime="Fri, 06 Mar 2015 12:34:57 GMT",
<http://web.archive.org/web/20150310140721/https://www.facebook.com/>;rel="memento"; datetime="Tue, 10 Mar 2015 14:07:21 GMT"
...
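A minimal sketch of reading entries like those in the sample TimeMap and picking out the ones that carry the `key` attribute. The two entries are copied from the sample; the regular expressions handle only this simple shape and are not a full link-format parser, and the function name is illustrative.

```python
import re

# Two entries copied from the sample TimeMap; the second carries the
# key attribute shown on the slide.
SAMPLE = (
    '<http://web.archive.org/web/20150303162841/https://www.facebook.com/>;'
    'rel="memento"; datetime="Tue, 03 Mar 2015 16:28:41 GMT", '
    '<http://users2machine.local/web/20150305000101/https://www.facebook.com/>;'
    'rel="memento"; datetime="Thu, 05 Mar 2015 00:01:00 GMT"; '
    'key="e395935019ee467c797034ee410cc91e"'
)

# One entry: <URI> followed by any number of ;name="value" attributes.
ENTRY = re.compile(r'<([^>]*)>((?:\s*;\s*\w+="[^"]*")*)')
ATTR = re.compile(r'(\w+)="([^"]*)"')


def parse_timemap(text):
    """Return one dict per link entry, holding its URI and attributes."""
    entries = []
    for uri, attrs in ENTRY.findall(text):
        entry = {"uri": uri}
        entry.update(ATTR.findall(attrs))
        entries.append(entry)
    return entries


entries = parse_timemap(SAMPLE)
private = [e["uri"] for e in entries if "key" in e]
```

Quoted values are matched as a unit, so the comma inside each datetime does not split an entry.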

Page 55

Access Token Included in TimeMap

...,
<http://users2machine.local/web/20150305000101/https://www.facebook.com/>;rel="memento"; datetime="Thu, 05 Mar 2015 00:01:00 GMT"; key="e395935019ee467c797034ee410cc91e",
...

[Figure label: MY PRIVATE FACEBOOK CAPTURES]

Page 56

My Public Web Archive, Now Aggregated

...,
<http://previouslyUnaggregated.org/web/20150306123457/https://www.facebook.com/>;rel="memento"; datetime="Fri, 06 Mar 2015 12:34:57 GMT",
...

[Figure label: MY PUBLIC FACEBOOK CAPTURES]

Page 57

Evaluation Plan

• How effective is the framework?
• What are the scalability ramifications of the additional infrastructure?
• Is public-private tokenization the most suitable method for persistent access?
• How can a single archive be subdivided between private/public and access controlled?

Page 58

Previous Work

PUBLICATIONS

Preservation and Replay
• PDA 2013 - Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving
• JCDL 2012 - WARCreate - Create Wayback-Consumable WARC Files from Any Webpage

Evaluating Capture
• IJDL 2015 - Not All Mementos Are Created Equal: Measuring The Impact Of Missing Resources
• IJDL 2015 - The Impact of JavaScript on Archivability
• JCDL 2014 - Not All Mementos Are Created Equal: Measuring The Impact Of Missing Resources
• JCDL 2014 - The Archival Acid Test: Evaluating Archive Performance on Advanced HTML and JavaScript
• D-Lib 2013 - A Method for Identifying Personalized Representations in the Archives
• TPDL 2013 - On the Change in Archivability of Websites Over Time

Archival Integration
• JCDL 2015 - Mobile Mink: Merging Mobile and Desktop Archived Webs
• JCDL 2014 - Mink: Integrating the Live and Archived Web Viewing Experience Using Web Browsers and Memento

SOFTWARE PRODUCTS

• WARCreate – preserve from the browser
• WAIL – private web archiving all-in-one suite
• Mink – integrate the live and archived web

Page 59

Current Work

• Other approaches to archival lookup beyond URI
• Appropriate metadata to indicate private web content in WARC files
• Existing integration attempts by private web archives & individuals

Page 60

Dissertation Plan

• Background Research
• PhD Requirements (Coursework, Qualifying Exam, etc.)
• Build preliminary framework model
• JCDL Doctoral Consortium

EXTENDED RESEARCH
• Research prevalence of private web archives
• Research access control methods in web archiving and other domains
• Investigate other access patterns and expound on those defined
• PhD Candidacy Exam describing merit of research plan
• Implement feedback received from candidacy exam committee
• Programmatically implement MMA and PWAA

CASE STUDIES (real-world application)
• Publicly Available Non-Aggregated Archives (e.g., Rhizome)
• Deep web preservation/access (bank account/Facebook feeds)

DISSERTATION DEFENSE

Page 61

Preliminary Publication Plan

• JCDL 2016: Evaluation of User Access Patterns for Private Web Archives
• TPDL 2016: Methods for Adding JIT Inclusion of Private Web Archives in Memento
• ACM SACMAT*: Research exploring tokenization and similar methods for archival access establishment
• iPres 2016: Research investigating URI clash and other identifiers needed to distinguish archived content from the “deep web” from archived content from the public live web

* Symposium on Access Control Models and Technologies

Page 62

Future Research Questions

• Can a PWAA perform content negotiation [1] on the private-public spectrum?
• What level of security is needed?
  – e.g., reporting UNAUTHORIZED vs. 0 mementos

[1] RFC 2295, https://www.ietf.org/rfc/rfc2295.txt
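The two reporting options in the second question can be contrasted in a short sketch. The policy names, the function, and the HTTP-style status codes are assumptions made for illustration; the slides do not specify how either response would be encoded.

```python
from enum import Enum


class DenialPolicy(Enum):
    REPORT_UNAUTHORIZED = "unauthorized"  # the denial is visible to the client
    REPORT_EMPTY = "empty"                # looks the same as having no captures


def respond(token_ok: bool, mementos: list[str], policy: DenialPolicy):
    """Return an (HTTP-style status, mementos) pair under the chosen policy."""
    if token_ok:
        return 200, mementos
    if policy is DenialPolicy.REPORT_UNAUTHORIZED:
        return 401, []
    return 200, []


denied_loud = respond(False, ["m1"], DenialPolicy.REPORT_UNAUTHORIZED)
denied_quiet = respond(False, ["m1"], DenialPolicy.REPORT_EMPTY)
```

Under the second policy a client without a valid token cannot tell a protected archive apart from one that holds nothing for the URI, which is the trade-off the question raises.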

Page 63

Summation

• Why?
  – No means exists to integrate private and public web archives.
• How to Evaluate?
  – Does this framework fit real world needs? Scalable?
• When will I know I am done?
  – Any public/private web archive* can be integrated.

* -compliant
