Digital Preservation Technical Paper



The PUID scheme has been developed in accordance with the following criteria:

  • Persistence: Once assigned, PUIDs must be persistent. As such, they must be immune from changes in technology or scheme administration. The scheme must be sufficiently flexible to be adaptable to future developments without any need for post facto changes to existing identifiers.

  • Ubiquity: The PUID must be technology independent, and capable of describing any class or granularity of representation information.

  • Focus: The PUID should not be used to convey any information beyond that required for identification.

The PUID scheme is designed to be applicable to any class of representation information capable of being described within the PRONOM registry. However, at present, its implementation has been limited to a single class: file formats. Within the context of this scheme, a file format is defined as follows:

The internal structure and encoding of a digital object, which allows it to be processed, or to be rendered in human-accessible form. A digital object may be a file, or a bitstream embedded within a file.
This structure and encoding will usually be formally expressed as a technical specification, although de facto standards also exist without formal specifications, such as Comma Separated Variable (CSV) format. File formats may be software-independent, or developed in tandem with specific software products. Format specifications are subject to regular revision, resulting in new format versions.
The granularity at which separate formats are identified is a crucial feature of the scheme. The PUID identifies formats at the most specific possible level of granularity. For example, the eXtensible Markup Language (XML) is a format which exists in a number of different versions (currently 1.0 and the forthcoming 1.1). Each version is regarded as a distinct format within the scheme. The Scalable Vector Graphics (SVG) format is both a separate format in its own right, with three versions (1.0, 1.1 and the forthcoming 1.2), and an XML format. Each SVG version has its own specification, which makes reference to, but is distinct from, the XML specification. Thus, each SVG version is also distinguished by a separate PUID.
However, the granularity of PUIDs extends only to features which separate one format from another, and not to those which are inherent to a format. For example, the TIFF 6.0 image format supports a number of different image compression algorithms (RLE, CCITT Group 3 and 4, JPEG etc.), but these all relate to a single format.
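
The granularity rule described above can be illustrated with a minimal Python sketch. It is not part of the PRONOM implementation: the `FormatKey` class, the `identify` function and the `placeholder-N` strings are all hypothetical, and the strings deliberately do not follow real PUID syntax. The sketch assumes that a format is keyed by name and version only, so that each version receives its own identifier while an inherent feature such as TIFF compression has no effect on the result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FormatKey:
    """What separates one format from another: name and version only."""
    name: str
    version: str


# Placeholder identifiers for illustration, NOT real PUID values.
REGISTRY = {
    FormatKey("XML", "1.0"): "placeholder-1",
    FormatKey("XML", "1.1"): "placeholder-2",
    FormatKey("SVG", "1.0"): "placeholder-3",
    FormatKey("SVG", "1.1"): "placeholder-4",
    FormatKey("TIFF", "6.0"): "placeholder-5",
}


def identify(name: str, version: str, compression: Optional[str] = None) -> str:
    """Return the identifier for a format version.

    `compression` is accepted only to show that a feature inherent to
    the format plays no part in the lookup.
    """
    return REGISTRY[FormatKey(name, version)]
```

Under these assumptions, `identify("XML", "1.0")` and `identify("XML", "1.1")` return different identifiers, whereas a TIFF 6.0 image resolves to the same identifier whether it uses JPEG or CCITT Group 4 compression.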
In particular, the PUID does not distinguish on the basis of the following:

  • Character encoding schemes: File formats may use a variety of different character encoding schemes, such as Unicode UTF-8 or US-ASCII. In most cases, the allowable encoding schemes are defined as part of the specification; as such, they are not elaborated within the format PUID. However, an additional class of PUIDs for identifying character encoding schemes will be implemented at a future date.

  • Byte orders: Most formats use a specific byte order, either defined within the specification or as a consequence of the operating system within which they are created. Some formats, such as TIFF, support multiple byte orders. However, differences in byte order are not distinguished within the PUID.

  • Encapsulated formats: Some formats support the encapsulation of other formats (e.g. TIFF images embedded within a PDF file). The PUID does not itself distinguish this. However, the PUIDs for both container and encapsulated components can be cited to support the modelling of such relationships within a metadata management system.

  • Classifications: The PUID does not incorporate any form of classification system – such schemes are largely subjective and many formats do not lend themselves to simple categorisation.
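
The treatment of byte orders and encapsulated formats in the list above might be modelled in a metadata management system along the following lines. This is a sketch under stated assumptions rather than a prescribed data model: `ObjectRecord`, `cited_format_ids` and the placeholder identifier strings are hypothetical names introduced for illustration. Properties of a particular object, such as its byte order or character encoding, are stored beside the format identifier instead of being encoded in it, and a container simply cites the identifiers of its embedded components.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ObjectRecord:
    """Metadata for one digital object (a file or an embedded bitstream)."""
    label: str
    format_id: str                       # identifies the format only
    byte_order: Optional[str] = None     # instance property, kept outside the identifier
    char_encoding: Optional[str] = None  # instance property, kept outside the identifier
    components: List["ObjectRecord"] = field(default_factory=list)


def cited_format_ids(record: ObjectRecord) -> List[str]:
    """Return the container's identifier followed by those of its components."""
    ids = [record.format_id]
    for part in record.components:
        ids.extend(cited_format_ids(part))
    return ids


# Placeholder identifiers for illustration, NOT real PUID values.
PDF_ID = "placeholder-pdf"
TIFF_ID = "placeholder-tiff"

report = ObjectRecord(
    label="report.pdf",
    format_id=PDF_ID,
    components=[
        ObjectRecord("page-1 image", TIFF_ID, byte_order="little-endian"),
        ObjectRecord("page-2 image", TIFF_ID, byte_order="big-endian"),
    ],
)
```

Here the two embedded images differ in byte order yet share one format identifier, and `cited_format_ids(report)` lists the container's identifier followed by one entry per embedded component.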

It must be recognised that the function of the PUID is not to describe the features of a particular instance of an electronic object in a given format, nor to provide the information required to perform any particular action on that object; the PUID is simply required to provide a persistent and unambiguous binding to a definitive description of that format (provided, for example, by PRONOM or another technical registry).
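
The idea of a persistent and unambiguous binding can be sketched as an append-only mapping from identifier to description. The `AppendOnlyRegistry` class below is a hypothetical illustration and makes no claim about how PRONOM or any other registry is built; it assumes only that an identifier, once assigned, is never rebound, and that the identifier itself is an opaque string carrying no further information.

```python
class AppendOnlyRegistry:
    """Maps identifiers to format descriptions; bindings never change."""

    def __init__(self) -> None:
        self._descriptions = {}

    def assign(self, identifier: str, description: str) -> None:
        """Bind an identifier once; refuse any later rebinding."""
        if identifier in self._descriptions:
            raise ValueError(f"{identifier!r} is already assigned")
        self._descriptions[identifier] = description

    def resolve(self, identifier: str) -> str:
        """Return the description bound to an identifier."""
        return self._descriptions[identifier]
```

A caller would assign a placeholder identifier to a description once and resolve it as often as needed; a second attempt to assign the same identifier raises `ValueError`, which is one simple way of enforcing persistence.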
