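Considerations for Data Preservation across the Information Life Cycle
National Digital Archiving Strategy Research TF, Internal Seminar
2010. 4. 1.
정영임
Korea Institute of Science and Technology Information (KISTI), Information Distribution Division, Knowledge Base Office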
Table of Contents
  - Digital Archiving in the Framework of Information Life Cycle Management
  - Creation
  - Acquisition
  - Cataloging/Identification
  - Storage
  - Preservation
  - Access
Digital Archiving in the Framework of Information Life Cycle Management
  - Digital archiving framework
    - Considered at all stages of information life cycle management
  - Information life cycle
    - Creation
    - Acquisition
    - Cataloging/Identification
    - Storage
    - Preservation
    - Access
Creation
  - Creation
    - Defined as the act of producing the information product, in the broadest sense
    - Should be regarded as the starting point of long-term preservation
  - Suggested provision of a preservation indicator for creators
    - U.S. Department of Agriculture’s Digital Publications Preservation Steering Committee
  - Establishment of guidelines for creators
    - Oak Ridge National Laboratory, USA: A Guide To Record Series Supporting Epidemiological Studies Conducted for the Department of Energy
    - Limits on software
    - Format and layout of the documents
Creation
  - Adoption of standard descriptive languages
    - Standards groups incorporate XML and RDF architectures
  - Attachment of metadata to digital contents
Acquisition and Collection Development
  - Three main aspects of acquiring digital objects
    - Collection policies
    - Gathering methods
    - Intellectual property concerns
Establishment of Collection Policies
  - Collection policies
    - Selecting what to archive
      - Purpose
        - For dark archiving: back issues
        - For light archiving: current issues
      - Criteria
        - Ease of content acquisition
        - Quality of contents
        - Utilization
        - Ongoing access fee
      - Content type coverage: e-journals, R&D reports, patents, scientific data
    - Determining extent
    - Archiving links
    - Refreshing the archived contents
Considerations on Gathering Method
  - Gathering methods
    - Hand selection
      - Value judgment and retention scheduling (Edinburgh University Library)
        - Not preserved
        - Preserved for a defined period
        - Preserved indefinitely
    - Automatic selection
      - National Library of Sweden: automatic acquisition without value judgments (priority: periodicals, static documents, HTML pages >> conferences, usenet groups, ftp archives)
      - EVA project: time limits established to avoid overloading
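The three retention outcomes used in hand selection can be written down as a small data model. The following is a minimal sketch only: the class names, the `review_date` field and the `is_retained` helper are illustrative assumptions and are not taken from the Edinburgh University Library scheme.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Retention(Enum):
    # The three outcomes named on the slide
    NOT_PRESERVED = "not preserved"
    DEFINED_PERIOD = "preserved for a defined period"
    INDEFINITE = "preserved indefinitely"


@dataclass
class Appraisal:
    # Hypothetical record of one hand-selection decision
    title: str
    retention: Retention
    review_date: Optional[date] = None  # only meaningful for DEFINED_PERIOD


def is_retained(item: Appraisal, today: date) -> bool:
    """Return True if the item should still be held on the given day."""
    if item.retention is Retention.NOT_PRESERVED:
        return False
    if item.retention is Retention.INDEFINITE:
        return True
    # DEFINED_PERIOD: keep the item until its review date has passed
    return item.review_date is not None and today <= item.review_date
```

Recording the decision as data rather than as free text makes it possible to re-run the schedule automatically when the review date arrives.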
Considerations on Intellectual Property Concerns
  - Reliance on legislation
    - Freedom of Information Act 2001
      - The public may have unrestricted access to certain records. (Consider which categories of information may need to be viewed by the public; these records need to remain accessible at all times.)
    - In general, practice differs by project because there is no international digital deposit legislation
      - PANDORA project seeks permission from the copyright owner
      - Swedish and Finnish national library projects do not contact the owners
  - Making agreements with content providers
    - E-journals: publishers or academic associations
      - CLIR/DLF draft model license, NESLi2 standard license model
      - Agreement of Cornell University with publishers
    - Government documents: open to the public
    - Scientific data: individual creators or data centers
      - Arts and Humanities Data Service provides information on what is needed for a digital archive and what creators are likely to be willing to deposit
Agreement of Cornell University with Publishers
  - Topics identified in the agreement (Thomas and Kroch, 2000)
    - The general responsibilities of the publishers and Cornell
    - Characteristics of the data, accompanying metadata, and any additional documentation that are to be deposited
    - Guidelines on transmission methods and media for deposit
    - Procedures for the deposit
    - Procedures and protocols Cornell will use to verify the arrival and completeness of the data
    - Rights of the depositing organizations to audit the repository
    - The respective roles, responsibilities, and rights of Cornell and the data producers with regard to the data
    - Articulation of Cornell's responsibilities and capabilities with regard to the accessioning, description, management, and even transformation of the deposited data
    - Access policies for users of the repository, and how they may vary over time
    - Conditions on the use of the data, and again how they may vary over time
    - Fees (if any) associated with the deposit
    - Cornell's ability to share the data with partners to create an agreed-upon level of redundancy
    - Clarification of issues surrounding copyright retained by authors
Identification and Cataloging
  - Identification
    - Provides a unique key for finding the digital object and linking it to other related objects
  - Cataloging in the form of metadata
    - Supports organization, access and curation
Persistent Identification
  - Problems in using a URL as an identifier
    - Using the server as a location identifier can result in a lack of persistence over time, both for the source object and for any linked objects
    - Continued use of URLs
  - New approaches to persistent identification
    - OCLC: PURLs
    - ACS: Digital Object Identifier (DOI), MN (Manuscript Number)
    - DTIC: Handle® system
    - AAS: Bibcode, PubRef numbers
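What the persistent-identifier approaches share is a level of indirection: citations carry a stable name, and a resolver maps that name to the current location. The sketch below illustrates the idea with an in-memory table. It does not use the API of PURL, DOI or the Handle system, and the identifier and URLs are hypothetical.

```python
from typing import Dict


class IdentifierResolver:
    """In-memory stand-in for a resolver service (illustrative only)."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def register(self, identifier: str, url: str) -> None:
        # Registering the same identifier again replaces the old location
        self._locations[identifier] = url

    def resolve(self, identifier: str) -> str:
        return self._locations[identifier]


resolver = IdentifierResolver()
resolver.register("example:report-0001",
                  "http://old-server.example.org/reports/0001.pdf")

# The object moves to a new server. Only the resolver entry changes;
# every document that cites "example:report-0001" remains valid.
resolver.register("example:report-0001",
                  "http://new-server.example.org/archive/0001.pdf")
```

A bare URL offers no such indirection, which is why a server change breaks both the source object and every link that points to it.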
Creation of Metadata at Cataloging Stage (1/3)
  - Creation method of metadata
    - Manual creation of metadata
    - Automatic generation of metadata
      - A project by the US Environmental Protection Agency
      - Defense Information Technology Testbed project
Creation of Metadata at Cataloging Stage (2/3)
  - Formats of descriptive metadata
    - E-journals
      - Full MARC cataloging
        - Traditional library cataloging standards
        - NLA’s PANDORA Archive
      - Current development of descriptive metadata standards
        - MARCXML, MODS (Metadata Object Description Schema)
    - Web-based resources
      - Dublin Core-like format
        - EVA project
    - Non-textual data
      - Identification of metadata elements needed for non-textual data types such as images, video, multimedia and others
        - Z39.87 NISO/AIIM technical metadata for digital still images
        - AES X089 core audio metadata
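As an illustration of a Dublin Core-like description for a web-based resource, the sketch below builds a small XML record with the Python standard library. The `record` wrapper element and the sample field values are assumptions made for this example; only the Dublin Core element namespace is the published one.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Namespace of the Dublin Core Metadata Element Set, version 1.1
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: Dict[str, str]) -> str:
    """Serialize simple name/value pairs as dc:* elements.

    The enclosing <record> element is a hypothetical wrapper,
    not part of Dublin Core itself.
    """
    root = ET.Element("record")
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = dublin_core_record({
    "title": "Sample archived web page",   # hypothetical values
    "creator": "Example Agency",
    "date": "2010-04-01",
    "format": "text/html",
})
```

A flat record like this is cheap to generate automatically, which suits large web harvests where full MARC cataloging would be impractical.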
Creation of Metadata at Cataloging Stage (3/3)
  - Management of heterogeneous metadata formats
    - Translation between various metadata formats
      - Key to the development of networked, heterogeneous archives
    - Adoption of packaging metadata standards
      - Open Archival Information System (OAIS) Reference Model
        - Developed by the Consultative Committee for Space Data Systems (CCSDS) and adopted as an ISO standard
        - Encapsulates specific metadata as needed for each object type in a consistent data model
      - Metadata Encoding and Transmission Standard (METS)
        - Produced by the Library of Congress Standards Office and the Digital Library Federation
        - Provides a framework for holding all types of metadata for a digital object
      - Others
        - MPEG-21 Digital Item Declaration Language
        - IMS Global Learning Consortium Content Packaging Standards
        - Sharable Content Object Reference Model (SCORM)
        - CCSDS XML Packaging scheme
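Translation between metadata formats is usually driven by a crosswalk table. The sketch below maps a few MARC field tags to Dublin Core element names. It is a deliberately simplified assumption: real MARC fields carry indicators and subfields, records are modeled here as plain tag-to-string dictionaries, and the four mappings shown are a small illustrative subset.

```python
from typing import Dict, List, Tuple

# Simplified crosswalk (illustrative subset, subfields ignored)
MARC_TO_DC: Dict[str, str] = {
    "245": "title",      # MARC 245: title statement
    "100": "creator",    # MARC 100: main entry, personal name
    "260": "publisher",  # MARC 260: publication, distribution (imprint)
    "650": "subject",    # MARC 650: subject added entry, topical term
}


def marc_to_dublin_core(
    marc_fields: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Translate known tags and report the tags that had no mapping."""
    record: Dict[str, str] = {}
    unmapped: List[str] = []
    for tag, value in marc_fields.items():
        element = MARC_TO_DC.get(tag)
        if element is None:
            unmapped.append(tag)  # keep track of what the crosswalk loses
        else:
            record[element] = value
    return record, unmapped
```

Returning the unmapped tags makes the information loss of a translation visible, which matters when heterogeneous archives exchange records.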
Development of Technical Model for Storage
  - Recommendations for developing a technical model for the repository (Cornell University)
    - Establishing a baseline of e-journal software and file format needs
    - Specifying the archival repository
    - Specifying monitoring tools that will flag documents within the repository that require migration
    - Specifying a baseline hardware and software infrastructure to house the repository
    - Exploring the need and implementation models for redundancy in the repository
Issues on Changing Storage Media
  - Problem of changing storage media
    - Block size, tape size and tape drive mechanisms have changed over time
  - Common solution
    - Data migration to new storage systems
      - High cost and imperfect transfer remain issues
      - Check/validation algorithms are extremely important
      - Manual checking is still necessary
    - Atmospheric Radiation Monitoring Center plans to migrate to new storage systems every 4-5 years
      - Each data migration will take 6-12 months
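One common form of check/validation algorithm during media migration is a fixity check: compute a digest of the file before the copy and compare it with the digest of the copy. The sketch below uses SHA-256 from the Python standard library. The function names are hypothetical, and the choice of SHA-256 is an assumption rather than a practice reported by any of the centers named above.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_and_verify(source: Path, target: Path) -> bool:
    """Copy a file to new storage and confirm the bits are unchanged."""
    expected = sha256_of(source)
    shutil.copyfile(source, target)
    return sha256_of(target) == expected
```

A matching digest shows only that the bitstream survived the transfer. It says nothing about whether the content is still interpretable, which is one reason manual checks remain necessary.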
Issues on Terabytes of Data Storage
  - Problem of dealing with large-scale data
    - Extensive validation routines are needed to ensure the quality of the information as it is migrated
      - NCBI has 30 Ph.D.s reviewing the information manually, even after it has passed a variety of validation algorithms
    - Similar costs are incurred for
      - Corrections and additions to particular records
      - Maintenance of a history of changes
      - Approval by the owner of all changes controlled by NCBI
  - Common solution
    - Large-scale data can be stored in different file formats
      - Biological sequence data is held in simple ASCII files for preservation purposes
      - Data in a structured database is provided for searching, reporting and maintenance
    - Extensive tasks can be transitioned to a non-profit consortium
      - Protein Data Bank: Research Collaboratory for Structural Bioinformatics
Preservation
  - Long-term preservation
    - No common agreement on the definition of long-term preservation
  - Main aspects of preservation
    - Selection of digital preservation strategies/technologies
    - Cycle for hardware/software migration
      - No specific investigation of the hardware/software migration cycle has been done
      - Depending on the particular technologies and subject disciplines, it can vary from 2 to 10 years
    - Preservation of the “look and feel” of digital contents
Digital Preservation Strategies
  - Bitstream copying
  - Refreshing
  - Durable/persistent media
  - Technology preservation
  - Digital archaeology
  - Analog backups
  - Migration (software and hardware migration)
  - Replication
  - Reliance on standards
  - Normalization
  - Canonicalization
  - Emulation
  - Encapsulation
  - Universal Virtual Computer
Hardware and Software Migration
  - Problems with migration
    - Migration is not guaranteed to work for all data types
    - Migration of information products that use sophisticated software features is unreliable
    - Generally there is no backward compatibility; even where it is possible, the result certainly loses some integrity
  - Emulation as an alternative to migration
    - Encapsulates the behavior of the hardware/software with the objects
      - E.g., an MS Word 2000 document with metadata indicating how to reconstruct the document at the engineering level
    - Creates an emulation registry identifying the hardware/software environment and providing information on how to recreate it
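An emulation registry of the kind described above can be pictured as a lookup from a format identifier to a description of the environment needed to render it. The sketch below is illustrative only: the field names and the registry entry are hypothetical placeholders, and only the MIME type `application/msword` is a real identifier.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Environment:
    # Hypothetical description of one rendering environment
    application: str
    operating_system: str
    hardware: str
    rebuild_notes: str


# Placeholder registry content, invented for this example
REGISTRY: Dict[str, Environment] = {
    "application/msword": Environment(
        application="MS Word 2000",
        operating_system="(placeholder) an OS release that runs Word 2000",
        hardware="(placeholder) x86 PC",
        rebuild_notes="(placeholder) where to find media and emulator settings",
    ),
}


def environment_for(format_id: str) -> Optional[Environment]:
    """Look up the environment needed to render a given format."""
    return REGISTRY.get(format_id)
```

Keeping the environment description in a shared registry, rather than inside every object, lets one entry serve all objects of the same format.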
Advantages and Disadvantages of Preservation Strategies
Selection of Preservation Strategies
  - A schematic diagram for selecting preservation techniques for digital information (Lee et al., 2002)
Preservation of the Look and Feel
  - Formats used to save the “look and feel” of materials
    - TIFF
      - The most prevalent format for organizations converting paper backfiles
      - E.g., JSTOR
      - Does not allow embedded references to be active hyperlinks
    - SGML/HTML
      - Used by many large publishers after years of converting publication systems from proprietary formats to SGML
      - American Astronomical Society (AAS)
    - PDF
      - The most prevalent format for purely electronic documents, used for both formal publications and grey literature
      - National Library of Sweden
      - Concerns remain for long-term preservation
      - It may not be accepted as a legal deposit format because of its proprietary nature
Normalization vs. Native Formats
  - Normalization
    - Process of converting the native format to a standard format
    - AAS and ACS transform the incoming file into SGML-tagged ASCII format
      - The electronic master copy can serve as the robust electronic archival copy
      - A well-tagged copy can be updated periodically, at very little cost
      - It takes advantage of advances in both technology and standards
      - Content remains unchanged, but the public electronic version can be updated to remain compatible with browsers and other access technology
  - Examples of data normalization from the data community
    - NASA Distributed Active Archive Centers
      - Transform incoming satellite and ground monitoring information into the standard Common Data Format
    - U.K.’s National Digital Archive of Datasets
      - Transforms the native format into one of its own devising
  - Normalized formats are considered to be the archival versions
    - Intellectual property question
Reliance on Standards
  - Emphasis on standards
    - DOE OSTI limited the number of acceptable input formats
      - Text in SGML (and its relatives HTML and XML), PDF, WordPerfect and Word
      - Images in TIFF Group 4 and PDF Image
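A restricted list of input formats can be enforced at ingest time. The sketch below screens submissions by file extension against the formats listed above. The extension-to-format mapping and the function name are assumptions made for this example and do not describe the procedure actually used by DOE OSTI.

```python
from pathlib import Path

# Assumed file extensions for the accepted formats (illustrative mapping)
ACCEPTED_TEXT = {".sgml", ".sgm", ".html", ".htm", ".xml", ".pdf", ".wpd", ".doc"}
ACCEPTED_IMAGE = {".tif", ".tiff"}


def classify_submission(filename: str) -> str:
    """Return 'text', 'image' or 'rejected' based on the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in ACCEPTED_TEXT:
        return "text"
    if suffix in ACCEPTED_IMAGE:
        return "image"
    return "rejected"
```

An extension check is only a first filter. It cannot tell a PDF Image file from a text PDF, and it cannot confirm that a TIFF uses Group 4 compression, so both cases would need inspection of the file contents.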
Preservation Strategies Used in Major Projects
  - Project abbreviations
    - CSI: CISTI Csi
    - ECO: OCLC Electronic Collections Online
    - EJO: OhioLINK Electronic Journal Center
    - KB: KB e-Depot
    - KOP: Kopal DDB
    - LA: LOCKSS Alliance
    - LANL: Los Alamos National Laboratory Research Library
    - NLA: National Library of Australia PANDORA
    - OSP: Ontario Scholars Portal
    - PMC: PubMed Central
    - PORT: Portico
Issues on Access
  - Access mechanisms
    - Access and display mechanisms
    - Providing access
    - Restricting access
  - Rights management and security requirements
    - Security and version control
    - Creation of metadata to manage encryption, watermarks and digital signatures
Access Mechanisms
  - Providing access
    - NLM’s Profiles in Science
      - Creates an electronic archive of the photographs, text, video, etc.
      - The electronic archive is used to create new access versions as access mechanisms change
    - Access-providing technologies
      - Superdistribution
      - Value-chain support
  - Restricting access
    - Usage rules
    - Persistent protection
Access
  - Rights management and security requirements
    - Among the most difficult access issues for digital archiving
    - Security and version control affect digital archiving
    - Rights management includes providing or restricting access as appropriate
  - Content protection technologies
    - Content encryption
    - Trusted environment
  - Metadata for managing encryption, watermarks and digital signatures needs to be created
References
  - CLIR, 2002. The State of Digital Preservation: An International Perspective [online] [cited 2009-07-23]
  - Hodge, 2000. Best Practices for Digital Archiving: An Information Life Cycle Approach, D-Lib Magazine: 6(1) [online] [cited 2009-07-23] <http://www.dlib.org/dlib/january00/01hodge.html>
  - Hodge et al., 2004. Digital Preservation and Permanent Access to Scientific Information [online] [cited 2009-07-23]
  - ICPSR, 2009. Digital Preservation Management: Implementing Short-term Strategies for Long-term Problems [online] [cited 2009-12-03] <http://www.icpsr.umich.edu/dpm/index.html>
  - Kenney, A. R., Entlich, R., Hirtle, P. B., McGovern, N. Y. and Buckley, E. L., 2006. E-Journal Archiving Metes and Bounds: A Survey of the Landscape [online] [cited 2009-12-03]
  - Lee, K., Slattery, O., Lu, R., Tang, X. and McCrary, V., 2002. The State of the Art and Practice in Digital Preservation, Journal of Research of the National Institute of Standards and Technology: 107(1), 93-106.
  - Thomas, S. E. and Kroch, C. A., 2000. Project Harvest: The Cornell University Library's Proposal to The Andrew W. Mellon Foundation To Develop a Repository for E-Journals [online] [cited 2010-03-26] <http://www.diglib.org/preserve/cornellprop.htm>
  - Edinburgh University Library Digital Archives Research Project. A report and recommendations
