JCDL 2013 DOCTORAL CONSORTIUM

Digital Preservation: a New Approach from Computational Intelligence. Jose Antonio Olvera Cañizares, TECNIO EASY Innovation Center, University of Girona. Advisor: Dr. Josep Lluís de la Rosa. JCDL ’13 Doctoral Consortium.


Transcript of JCDL 2013 DOCTORAL CONSORTIUM

Digital Preservation: a New Approach from Computational Intelligence
Jose Antonio Olvera Cañizares
TECNIO EASY Innovation Center, University of Girona
Advisor: Dr. Josep Lluís de la Rosa
JCDL ’13 Doctoral Consortium

CONTENTS
• Context and situation
• Statement of thesis
• Background and related work
• Research questions
• Research goals
• Dissertation status
• Expected achievements

Context and situation
Digital information generated in all areas of our society is growing at an exponential pace:
• In 2011 it exceeded 1.8 zettabytes (1.8 trillion gigabytes) [1]
• It grows by a factor of 9 every 5 years

[1] Gantz, J. and Reinsel, D. (2011). Extracting Value from Chaos. DOI=http://idcdocserv.com/1142

Problems to solve in digital preservation (DP)
• A recent review of DP research noted that, despite twenty years of active research, much work remains to solve the core problems [2]:
    • The level of automation in DP solutions is low
    • The scalability of existing preservation solutions has been shown to be poor
    • Solutions have often not been properly tested against diverse digital resources or in heterogeneous environments
• As stated in Objective ICT-2011.4.3 Digital Preservation (EU FP7 Call 6 of 2012), self-preserving objects are seen by many as the Holy Grail of preservation, but no individual research team has the capacity to address this problem

[2] Quisbert, H. On long-term digital preservation information systems: a framework and characteristics for development. PhD thesis, Luleå University of Technology. DOI=http://epubl.ltu.se/1402-1544/2008/77/LTU-DT-0877-SE.pdf
Statement of thesis
• The theoretical basis of this thesis lies in the framework of Computational Intelligence (CI), dealing especially with intelligent agents
• Computational Intelligence is a branch of Artificial Intelligence focused on the study of adaptive mechanisms and intelligent behavior of complex and dynamic systems

Computational Intelligence meets the following requirements of DP:
• Scalability: the exponential growth of born-digital objects requires solutions that scale from the technological point of view
• Cost: tied to the exponential growth, because there are limited resources to cope with DP
• Uncertain future: DP relies on heuristics about what results we will get in the future
The result of the PhD thesis will be a proof of concept of this claim.

[Diagram] The current perspective is top-down: from institutions, persons and companies (resources, capacity, knowledge) to digital objects. Our perspective is bottom-up: from digital objects to institutions, persons and companies (resources, capacity, knowledge).

My thesis will study which self-preservation behaviors self-preserving digital objects (SPDOs) need, based on computational intelligence (CI), together with related methods of cost management under their own budget, powered by a social network as an environment that enables their behavior under the policy that preservation is to share.
In this concept, digital objects become active actors in their own long-term digital preservation (LTDP). Each object has a DP budget devoted to funding its replication and other operations, such as format migration or moving through a social network of users; in all, a controlled environment where the objects will live.

The thesis is divided into three confluent areas:
• (A1) Behavior model
• (A2) SPDO architecture from the agents
• (A3) Social environment
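The budget idea in the statement above, where each object pays for its own replication and format migration, can be illustrated with a small Python sketch. This is only a toy model under stated assumptions, not the architecture studied in the thesis: the class name `SPDO`, the methods `replicate` and `migrate`, and both cost constants are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field

# Hypothetical operation costs in arbitrary budget units (not from the slides).
REPLICATION_COST = 1.0
MIGRATION_COST = 2.0


@dataclass
class SPDO:
    """Toy self-preserving digital object that pays for its own operations."""
    name: str
    budget: float
    # Maps each user holding a copy to the format number of that copy.
    copies: dict = field(default_factory=dict)

    def _pay(self, cost: float) -> bool:
        """Deduct cost from the budget; refuse if the object cannot afford it."""
        if cost > self.budget:
            return False
        self.budget -= cost
        return True

    def replicate(self, source_user: str, target_user: str) -> bool:
        """Copy the object from one user to another, keeping the same format."""
        if source_user not in self.copies or target_user in self.copies:
            return False
        if not self._pay(REPLICATION_COST):
            return False
        self.copies[target_user] = self.copies[source_user]
        return True

    def migrate(self, user: str, new_format: int) -> bool:
        """Convert the copy held by a user to another format."""
        if user not in self.copies or self.copies[user] == new_format:
            return False
        if not self._pay(MIGRATION_COST):
            return False
        self.copies[user] = new_format
        return True


obj = SPDO(name="report", budget=3.5, copies={"alice": 1})
print(obj.replicate("alice", "bob"))    # paid from the budget: 3.5 -> 2.5
print(obj.migrate("bob", 2))            # paid from the budget: 2.5 -> 0.5
print(obj.replicate("alice", "carol"))  # refused: 0.5 is below the replication cost
print(obj.copies, obj.budget)
```

Once the budget runs out the object can no longer act, which is why budget management appears among the research questions for area A2.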
Background and related work

Object-centric paradigm; preservation is to share
• Buckets: the very first evidence of the object-centric paradigm. Buckets were designed to imbue information objects with certain responsibilities, such as the display, dissemination, protection, and maintenance of their contents
• LOCKSS (Lots Of Copies Keep Stuff Safe): open-source, library-led digital preservation system built on the principle that lots of copies keep stuff safe
• MUSE (Memories USing Email): provides four novel types of cues to help spot interesting trends and messages in a large-scale email archive

Cost management
• BRTF (Blue Ribbon Task Force on Sustainable Digital Preservation and Access): the first systematic attempt to focus not just on the cost of managing information over time, but on the economic framework required to allow that to happen
• KRDS/I2S2: a toolkit for establishing the value chain and benefits analysis of digital preservation

Computational ecologies
• Synergic works on computational ecologies [3][4][5]: this research continues the first series of studies applying computational ecologies to DP, as noted in previous work [5], in which Swarm Intelligence was used for DP

[3] de la Rosa, J. L., Hormazábal, N., Aciar, S., Lopardo, G., Trias, A., and Montaner, M. (2011). A Negotiation-Style Recommender Based on Computational Ecology in Open Negotiation Environments. IEEE Trans. on Industrial Electronics 58 (6), 2073-2085, June 2011. ISSN: 0278-0046
[4] Hogg, T. and Huberman, B. A. (1991). Controlling chaos in distributed systems. IEEE Trans. Syst. Man Cybernetics 21 (6), 1325, 1991
[5] de la Rosa, J. L., Trias, A., del Acebo, E., Aciar, S., and Quisbert, H. (2009). Crew Intelligence Systems for Digital Objects Preservation. SIAAS-09, 2nd Swarm Intelligence Algorithms and Applications Symposium, April 6-9, 2009

Research questions

(A1) Behavior model
• Evolutionary Computation techniques that back the preservation of SPDOs
• Swarm Intelligence techniques that back the preservation of SPDOs

(A2) SPDO architecture from the agents
• The social skills that determine how SPDOs interact with each other
• How they must manage their budget (devoted to funding the replication of files and other operations, such as format migration or copying themselves through the social network of users)
• What rules must be applied to determine their mission
• How they will manage their copies, which are distributed in the social network

(A3) Social environment
• What the social network must look like technically
• Which social behaviors support the task of preserving the digital objects (rules of collaboration)
• Which social network topologies best support the digital objects, and how to promote these topologies
• The management of users signing in to and out of the network
• The definition of compensation mechanisms to promote user engagement
Research goals
• Compile a state of the art from the articles published on the topic
• Work on the three confluent areas explained before (A1, A2 and A3) separately. The work is being done in a simulated environment, where results will be obtained and analyzed, and partial results will be published in journals
    • I have some preliminary results in A1 [6]
• As a result of this research, the state of the art will be extended
• Finally, a prototype will be implemented in a real environment to ensure that the DP requirements explained before are met

[6] de la Rosa, J. L. and Olvera, J. A. (2012). First Studies on Self-Preserving Digital Objects. Artificial Intelligence Research & Development, Procs. 15th Intl. Conf. of the Catalan Assoc. for Artificial Intelligence (CCIA 2012), Vol. 248, pp. 213-222, Alacant, Spain.

Dissertation status
• Read literature
• Analyze CI techniques
• Implement preliminary simulator
• Carry out the state-of-the-art review (including this conference)
• Implement a robust simulation platform
• Experimentation in areas A1 and A3
• Publication of work in conferences and journals
• Experimentation in area A2
• Publication of work in journals
• Implement a real prototype
• Exploit results
• Finish writing the dissertation
• PhD defense
[Marker on the slide's timeline: Current state]

Dissertation status
[Video]
Expected achievements
• With our simulation platform, we expect to obtain 99% readability of the digital objects at rates of 10%, 33% and 50% of new software adoption waves, along cycles of 3 times 5 years (that is, 20 years of simulation)
• With the real prototype, we intend to meet the requirements set out above as necessary for DP:
    • Scalability
    • Cost
    • Heuristics for an uncertain future

Statement of thesis
• We make a useful abstraction: we work at a level that is not detailed but will be useful for a more specific level
• We will investigate:
    • Behavior of digital objects: copies, migration (primitives!)
    • Budget of digital objects
    • Environment: no matter the system. It will be a social network of people sharing digital objects, where objects and computers change
Even though self-preserving objects are a promising approach, there is little evidence or experimental work [Nelson 2001, 2010; de la Rosa et al. 2009, 2010; Olvera et al. 2012] showing their contribution to digital preservation with better scalability and accurate cost management. That is why we propose a proof of concept of self-preserving objects.

Statement of thesis
[Diagram] Our perspective is bottom-up: from digital objects to institutions, persons and companies (resources, capacity, knowledge).

SPDO paradigm
Measure of resilience
Shannon entropy: the higher the entropy value, the better the preservability we predict.

$$H(x) = -\sum_{i=1}^{n} p_i \log_2 p_i$$

Adapted to the simulated models:

$$H(x) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{5} p_{i,j} \log_2 p_{i,j}, \qquad p_{i,j} = \frac{f_{i,j}}{\sum_{k} f_{i,k}}$$

where:
• n is the total number of original digital objects
• j indexes the different formats, of which there are 5
• p_{i,j} is the share of copies in format j among all the copies that original digital object i has
• k indexes the different formats
• f_{i,j} is presumably the number of copies of object i in format j
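The Shannon-entropy resilience measure described above can be sketched in Python. This is a minimal illustration rather than the simulator's code: the function names are hypothetical, the input is assumed to be a list of copy counts per format for each original object, and averaging the per-object entropies over all objects is an assumption about how the measure is aggregated.

```python
import math


def format_entropy(copies_per_format):
    """Shannon entropy (base 2) of one object's copies across formats.

    copies_per_format: list of copy counts, one entry per format.
    """
    total = sum(copies_per_format)
    if total == 0:
        return 0.0  # no copies left: treat format diversity as zero
    entropy = 0.0
    for count in copies_per_format:
        if count > 0:  # a format with no copies contributes nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def mean_entropy(objects):
    """Average per-object entropy over all original objects (assumed aggregation)."""
    return sum(format_entropy(c) for c in objects) / len(objects)


# Three hypothetical objects, five formats each.
objects = [
    [4, 0, 0, 0, 0],  # all copies in one format: entropy 0
    [2, 2, 0, 0, 0],  # two formats, evenly split: entropy 1
    [1, 1, 1, 1, 0],  # four formats, evenly split: entropy 2
]
print([format_entropy(c) for c in objects])
print(mean_entropy(objects))
```

With five formats the per-object entropy is bounded by log2(5), roughly 2.32, which is reached only when the copies are spread evenly over all five formats.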
Context and situation

SIMULATION
[Figure: simulation of a software change. Labels: SOFTWARE CHANGE; formats 1 to 5; user agent; digital objects and their possible formats; users affected by software changes.]

Dissertation status
[Plot: vertical axis from 0 to 2; horizontal axis from 1 to 221; series 501X to 514X.]