Catalo (07/02/09)

- [blog] Three Catalogers Walk Into a Blog
(source: Cataloging Futures, 30/01/09)
- Future Directions in Metadata Remediation for Metadata Aggregators

(source: Digital Library Federation Aquifer, Feb. 09 / via ResourceShelf)
DLF Aquifer, a Digital Library Federation initiative, focuses on making digital content, especially cultural heritage materials pertinent to American culture and life, easier for scholars to find and use. One avenue to providing better access to digital collections is by including the collections in aggregations that are promoted and exposed through commonly used channels such as commercial search services. Successful aggregation depends on robust, consistent metadata. While data providers may strive to include all applicable fields for their chosen metadata format in newly created records, records that have been mapped from legacy data in other formats will seldom be optimized in their new home, and the creators of these records may not have the resources to augment these records in any more than the simplest ways. Remediation tools to improve the quality of metadata for improved services are therefore highly desirable. With support from The Gladys Krieble Delmas Foundation, the Digital Library Federation embarked on a project to inventory existing tools and services for metadata mapping, remediation, and enhancement. Once identified, tools were evaluated for general applicability across digital library and other cultural heritage environments. The results of the research show that a handful of tools are usable as-is, but many tools need more work to be generally applicable in a variety of environments and significant development would be required to create a robust and well-defined set of metadata remediation services. Key points of note:
- Relatively few tools are available that can work directly on metadata records rather than full text, and those that are available need to be customized for each aggregator.
- Workable tools are available for date normalization, and also for normalizing and matching coordinates to U.S. geographic names.
- A statistical topic model program for subject clustering has been developed.
- Both named entity and topical keyword extraction remain problematic, with a fairly high percentage of errors.
- Authority files may be used to break up pre-coordinated Library of Congress subject strings into topical, name, geographic, temporal, and genre facets to improve searching.
- Mappings between different thesauri, which should allow for better search processing in aggregations containing multiple subject vocabularies, are still under development.
- Infrastructure for work collocation, appropriate to aggregators with significant published materials, is still underdeveloped and will probably need to wait for the widespread adoption of the new standard for resource description, Resource Description and Access (RDA).
- Unambiguous identifiers for entities such as names and works would be useful when the community infrastructure is developed, but are not yet supported by most metadata formats.
- Unambiguous, machine-actionable rights statements are also an area where the community infrastructure is still under development.
- Biblios.net
Collaborative cataloging (online documentation)
- Understanding PREMIS

(source: LoC / via Benoît, 03/02/09)
An introduction to the PREMIS standard for preservation metadata (see also the PREMIS Data Dictionary)