Metadata Extraction Tool - Introduction PDF
Metadata Extraction Tool
Introduction
The Metadata Extraction Tool was developed by the National Library of New Zealand to programmatically extract preservation metadata from a range of file formats, such as PDF documents, image files, sound files, Microsoft Office documents, and many others.
The tool was initially developed in 2003 and released as open source software in 2007. The current version can be downloaded from the SourceForge download page.
Purpose of the Metadata Extraction Tool
The Tool builds on the Library's work on digital preservation and its logical preservation metadata schema. It is designed to:
automatically extract preservation-related metadata from digital files
output that metadata in a standard format (XML) for use in preservation activities.
The Tool was designed for preservation processes and activities, but it can also be used for other tasks, such as extracting metadata for resource discovery.
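The second design goal, writing extracted metadata out as XML, can be illustrated with a short Java sketch that uses the standard javax.xml DOM and transformer APIs. The class name MetadataXmlWriter and the element names file and property are hypothetical; the tool's real output follows the Library's preservation metadata schema, which this page does not describe.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical writer: turns a flat map of extracted properties into XML.
// Element and attribute names are illustrative, not the tool's schema.
class MetadataXmlWriter {

    static String toXml(String fileName, Map<String, String> properties) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();

        Element root = doc.createElement("file");
        root.setAttribute("name", fileName);
        doc.appendChild(root);

        // One <property key="...">value</property> element per extracted value;
        // the serializer escapes special characters such as '<' in the values.
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            Element property = doc.createElement("property");
            property.setAttribute("key", entry.getKey());
            property.setTextContent(entry.getValue());
            root.appendChild(property);
        }

        // Serialize the DOM tree to a string, omitting the XML declaration.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("size", "1024");
        props.put("format", "PDF");
        System.out.println(toXml("report.pdf", props));
    }
}
```

A LinkedHashMap is used in the example so that properties appear in the XML in the order they were extracted.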
Supported File Formats
The Metadata Extraction Tool includes a number of 'adapters' that extract metadata from specific file types. Adapters are currently provided for:
Images: BMP, GIF, JPEG and TIFF.
Office documents: MS Word (versions 2 and 6), WordPerfect, OpenOffice (version 1), MS Works, MS Excel, MS PowerPoint, and PDF.
Audio and Video: WAV, MP3 (normal and with ID3 tags), BWF, FLAC.
Markup languages: HTML and XML.
Internet files: ARC.
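One way to picture the adapter design is a registry that picks an adapter for each file and falls back to a generic adapter when nothing matches. The Java sketch below assumes selection by file extension and invents the names FormatAdapter, SimpleAdapter and AdapterRegistry for illustration; this page does not say how the tool actually chooses an adapter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical adapter contract; the tool's real adapter API is not shown here.
interface FormatAdapter {
    String name();
    Set<String> extensions(); // lower-case extensions this adapter claims
}

// Minimal adapter that only records a name and the extensions it handles.
class SimpleAdapter implements FormatAdapter {
    private final String name;
    private final Set<String> extensions;

    SimpleAdapter(String name, String... extensions) {
        this.name = name;
        this.extensions = Set.of(extensions);
    }

    public String name() { return name; }
    public Set<String> extensions() { return extensions; }
}

// Chooses an adapter for a file name, falling back to a generic adapter.
class AdapterRegistry {
    private final List<FormatAdapter> adapters = new ArrayList<>();
    private final FormatAdapter fallback;

    AdapterRegistry(FormatAdapter fallback) {
        this.fallback = fallback;
    }

    void register(FormatAdapter adapter) {
        adapters.add(adapter);
    }

    FormatAdapter select(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        for (FormatAdapter adapter : adapters) {
            if (adapter.extensions().contains(ext)) {
                return adapter; // first registered match wins
            }
        }
        return fallback; // unknown type: use the generic adapter
    }

    public static void main(String[] args) {
        AdapterRegistry registry = new AdapterRegistry(new SimpleAdapter("Generic"));
        registry.register(new SimpleAdapter("JPEG", "jpg", "jpeg"));
        registry.register(new SimpleAdapter("PDF", "pdf"));
        System.out.println(registry.select("scan.JPG").name());
        System.out.println(registry.select("notes.xyz").name());
    }
}
```

Extension matching keeps the sketch short; checking a file's leading signature bytes is more robust when extensions are missing or wrong.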
If a file type is unknown the tool applies a generic adapter, which extracts data that the host system 'knows' about any given file (such as
size, filename, and date created).
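A generic fallback of this kind can be sketched with Java's java.nio.file API, which exposes the attributes the host file system keeps for every file. The class GenericAdapter and its property names are hypothetical. Some file systems do not record a creation time; in that case BasicFileAttributes.creationTime() returns an implementation-specific default.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical fallback adapter: reports only what the host file system
// knows about a file and never parses the file's contents.
class GenericAdapter {

    static Map<String, String> extract(Path file) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("filename", file.getFileName().toString());
        metadata.put("size", Long.toString(attrs.size()));
        // creationTime() may return an implementation-specific value
        // (often the last-modified time) where creation time is not recorded.
        metadata.put("created", attrs.creationTime().toString());
        metadata.put("modified", attrs.lastModifiedTime().toString());
        return metadata;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(extract(Paths.get(arg)));
        }
    }
}
```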
Capabilities
The tool has both a Microsoft Windows interface and a UNIX command-line interface, so files can be processed automatically in batches or individually, as required.
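Batch processing amounts to applying the same extraction step to every file in a collection. The Java sketch below walks a directory tree and collects one result per file. BatchRunner and its Extractor interface are hypothetical, and the sketch does not reproduce the tool's actual command-line options.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical batch driver: applies one extraction step to every regular
// file under a directory. It does not mirror the tool's real CLI options.
class BatchRunner {

    // Any per-file extraction step; may fail with an I/O error.
    interface Extractor {
        String extract(Path file) throws IOException;
    }

    static Map<Path, String> run(Path root, Extractor extractor) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) { // closes the directory stream
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        Map<Path, String> results = new TreeMap<>();
        for (Path file : files) {
            Path key = root.relativize(file);
            try {
                results.put(key, extractor.extract(file));
            } catch (IOException e) {
                // Record the failure and continue, so one bad file
                // does not abort the whole batch.
                results.put(key, "ERROR: " + e.getMessage());
            }
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // Example step: report each file's size in bytes.
        run(root, file -> Long.toString(Files.size(file)))
                .forEach((path, result) -> System.out.println(path + "\t" + result));
    }
}
```

Catching errors per file is a design choice for unattended runs: a single unreadable file is reported in the results instead of stopping the batch.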
The application opens all files as read-only, which preserves the integrity of the original files. Because the tool reads only header information, the extraction process is quick.
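Read-only access and header-only reading can be illustrated together. The Java sketch below opens a file with StandardOpenOption.READ only and inspects at most its first 12 bytes, recognising a few of the formats listed above by their well-known signatures (for example %PDF for PDF and FF D8 FF for JPEG). The class HeaderSniffer and its signature table are illustrative assumptions, not the tool's own detection logic; InputStream.readNBytes(int) requires Java 11 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical header sniffer: opens the file read-only and looks only at
// its leading bytes. The signature list is a small illustrative subset.
class HeaderSniffer {

    static String identify(Path file) throws IOException {
        byte[] head;
        try (InputStream in = Files.newInputStream(file, StandardOpenOption.READ)) {
            head = in.readNBytes(12); // never reads past the header
        }
        if (startsWith(head, "%PDF")) return "PDF";
        if (startsWith(head, "GIF8")) return "GIF";
        if (startsWith(head, "fLaC")) return "FLAC";
        if (startsWith(head, "ID3")) return "MP3 (ID3)";
        if (startsWith(head, "II*\0") || startsWith(head, "MM\0*")) return "TIFF";
        if (startsWith(head, "RIFF") && matchesAt(head, 8, "WAVE")) return "WAV";
        if (head.length >= 3 && (head[0] & 0xFF) == 0xFF
                && (head[1] & 0xFF) == 0xD8 && (head[2] & 0xFF) == 0xFF) return "JPEG";
        if (startsWith(head, "BM")) return "BMP";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, String magic) {
        return matchesAt(data, 0, magic);
    }

    // Compares the ASCII characters of 'magic' with data starting at 'offset'.
    private static boolean matchesAt(byte[] data, int offset, String magic) {
        if (data.length < offset + magic.length()) return false;
        for (int i = 0; i < magic.length(); i++) {
            if (data[offset + i] != (byte) magic.charAt(i)) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + ": " + identify(Paths.get(arg)));
        }
    }
}
```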
Open Source Development
The Tool is written in Java and XML and is distributed under the Apache License (version 2.0).
Developers may be interested in extending key components of the Metadata Extraction Tool, for example by extending existing adapters, developing new adapters to process other file types, or creating new XSLT files to generate different XML output formats.
Please refer to the Developers Guide for more information on these components.
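As an illustration of the XSLT extension point, the Java sketch below applies a stylesheet to an XML document with the standard javax.xml.transform API. The input layout (file and property elements), the class XsltRunner and the key=value text output are hypothetical stand-ins; a real stylesheet for the tool would target its own XML output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Applies an XSLT stylesheet to an XML document; both are passed as strings.
class XsltRunner {

    // Hypothetical stylesheet: flattens <property key="k">v</property>
    // elements into "k=v" lines of plain text.
    static final String KEY_VALUE_STYLESHEET =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/file\">"
            + "<xsl:for-each select=\"property\">"
            + "<xsl:value-of select=\"@key\"/>=<xsl:value-of select=\".\"/>"
            + "<xsl:text>&#10;</xsl:text>"
            + "</xsl:for-each>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    static String transform(String xml, String xslt) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<file name=\"report.pdf\">"
                + "<property key=\"size\">1024</property>"
                + "<property key=\"format\">PDF</property>"
                + "</file>";
        System.out.print(transform(xml, KEY_VALUE_STYLESHEET));
    }
}
```

Because the stylesheet is loaded at run time, a different output format only needs a new XSLT file, not a change to the extraction code.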
http://meta-extractor.sourceforge.net/