CULTURAL HERITAGE CONTENT RE-USE: AN AGGREGATOR’S POINT OF VIEW

This paper introduces a use case of re-using aggregated and enriched metadata for the tourism creative industry. The MORe aggregation and enrichment framework is presented, along with an example of enriching cultural heritage objects harvested from a number of Omeka repositories. The enriched content is then published both to the EU digital library Europeana (www.europeana.eu) and to an Elasticsearch component that feeds a portal providing tourists with relevant information.


INTRODUCTION

Related work
The work presented in this paper focuses on a use case of re-using aggregated cultural heritage content for the tourism industry. The key component is the aggregation infrastructure, and two of its aspects are covered: a) its micro-services architecture and b) its enrichment services. Both have been partially explored in the literature, with most papers focusing on curation micro-services. Enrichment micro-services can be considered a category of curation micro-services.
In [4], news items are automatically enriched with information from Linked Open Data (LOD) sets, and an ontology-based browser is used to demonstrate the advantages of LOD-enabled navigation. In (Rainer Simon et al., 2011), the authors use an annotation tool to help users annotate records with information drawn from LOD thesauri. In (Stephen Abrams et al., 2010), the authors propose a curation micro-services infrastructure to demonstrate the strengths and flexibility of such an approach. In another work, a micro-services architecture focusing on digital curation and preservation is presented. As presented in (Kevin Clair et al., 2011), curation micro-services are also used in a thematic aggregator to enrich information and improve the quality of content.

MORe metadata aggregator
The Metadata & Object Repository (MORe) (Christos Papatheodorou et al., 2012) was established as early as 2011 in the context of the EU CARARE project. Since then, it has been used to aggregate over two million metadata records and deliver them to Europeana in the EDM (Europeana Data Model) format (Martin Doerr et al., 2011). At the heart of the aggregator lies a repository that stores metadata and maintains versions, identity relations between objects, etc. Initially, MORe was based on Fedora Commons and provided enrichment services such as de-duplication, geo-spatial enrichment, etc. The cloud-based version of MORe, introduced in the LoCloud project in 2013, replaces the Fedora Commons backend of the earlier version with one based on Apache Cassandra, and replaces the standalone services with a pluggable and scalable services layer. This provides higher processing capacity and thus significantly reduces aggregation time.

The general architecture of the MORe aggregator is shown in Fig. 1. Two of its primary components have a central role: a) the storage layer, which handles metadata storage, and b) the services layer, which glues together the various services involved. The main aggregation workflow used in this paper is presented in Fig. 2, which depicts the main steps that users can perform. Each step is handled by a specific service, and certain steps, such as transformation and enrichment, require additional work such as validation and indexing of content.
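The role of the services layer can be illustrated with a minimal Python sketch. It is not MORe's implementation: the class `ServicesLayer`, the helper `log_step` and the step names `harvest` and `publish` are hypothetical, and each service is a stand-in that only records that it ran. The sketch shows the two ideas from the text: services are plugged in by name, and the transformation and enrichment steps are automatically followed by validation and indexing.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Service = Callable[[Record], Record]

# Steps that, per the text, require extra validation and indexing work afterwards.
FOLLOW_UPS: Dict[str, List[str]] = {
    "transform": ["validate", "index"],
    "enrich": ["validate", "index"],
}


class ServicesLayer:
    """Hypothetical registry that glues pluggable services into a workflow."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def register(self, name: str, service: Service) -> None:
        # Plugging in a service is just registering it under a step name.
        self._services[name] = service

    def run(self, workflow: List[str], record: Record) -> Record:
        for step in workflow:
            # Run the step itself, then any follow-up services it requires.
            for name in [step] + FOLLOW_UPS.get(step, []):
                record = self._services[name](record)
        return record


def log_step(name: str) -> Service:
    """Hypothetical stand-in service that only appends its name to a trail."""
    def service(record: Record) -> Record:
        trail = list(record.get("trail", []))
        return {**record, "trail": trail + [name]}
    return service


layer = ServicesLayer()
for name in ["harvest", "transform", "enrich", "validate", "index", "publish"]:
    layer.register(name, log_step(name))

result = layer.run(["harvest", "transform", "enrich", "publish"], {"id": "rec-1"})
print(result["trail"])
# ['harvest', 'transform', 'validate', 'index', 'enrich', 'validate', 'index', 'publish']
```

Keeping the follow-up rules in a table rather than inside each service means a new service can be added without touching the others.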

Motivation
The main motivation behind building a flexible aggregation framework that includes an enrichment layer is to ensure metadata quality and interoperability. This rests on the premise that automatic enrichment produces metadata records that are richer and more comprehensive, and that link to related resources such as:
- other related records
- thesauri terms
- Wikipedia lemmas

This increases the visibility of the records, improves search results and, ultimately, the user experience. It also makes the records machine-readable, so that additional services can use them automatically as needed.
Furthermore, content re-use helps build more sustainable models because, as will be made clear, little extra work is required once the appropriate technologies are in place.

The Omeka Repository & Content model
The work reported in this paper relies on content harvested from Omeka (http://omeka.org/). Omeka is an open-source repository that provides out-of-the-box functionality for individuals and institutions wishing to publish collections on the web using international standards. Omeka exposes two schemas through OAI-PMH and REST: a) a Dublin Core based metadata schema and b) a custom schema named Omeka-XML. Both use an internal Dublin Core based representation, which administrators can extend with new elements. Content can also be obtained in other formats (through metadata crosswalks) such as MODS, CDWA-Lite and METS; METS is used to provide the structure of each record. As we are interested in metadata aggregation, we focus here on metadata, and more specifically on OAI_DC. The OAI_DC metadata contains 15 elements. These elements were provided in a completely unqualified form, meaning that even language information (the xml:lang attribute) was absent. This is typical of the scenario addressed here: harvesting from diverse cultural heritage collections.
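To make the problem of unqualified metadata concrete, the following Python sketch parses an OAI_DC record with the standard library's `xml.etree.ElementTree` and lists the Dublin Core elements that carry no `xml:lang` attribute. The sample record and the function name `unqualified_elements` are invented for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# ElementTree expands the predefined xml: prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical unqualified OAI_DC record, as harvested.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Stamp depicting a fresco</dc:title>
  <dc:description>Fresco from a painted church</dc:description>
  <dc:coverage>35.17, 33.36</dc:coverage>
</oai_dc:dc>"""


def unqualified_elements(xml_text: str) -> list:
    """Return the local names of dc:* children that lack an xml:lang attribute."""
    root = ET.fromstring(xml_text)
    missing = []
    for element in root:
        if element.tag.startswith("{" + DC_NS + "}") and XML_LANG not in element.attrib:
            missing.append(element.tag.split("}", 1)[1])
    return missing


print(unqualified_elements(SAMPLE))  # ['title', 'description', 'coverage']
```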
Available metadata is used to present records through the Omeka web portal (Fig. 3). Although the MODS, Omeka-XML and CDWA-Lite schemas are more expressive, Omeka provides them without qualifiers, so they present the same problems as OAI_DC. Hence, the approach illustrated here for OAI_DC (selected for its popularity) is applicable to these other schemas as well.
The primary elements harvested are depicted in the following

Overview
The enrichment services framework presented in this paper consists of a generic enrichment service that orchestrates enrichment micro-services into simple workflows, referred to as enrichment plans: an enrichment plan executes one or more micro-services in a specific sequence. Each micro-service enriches each record in a specific way (e.g. by inferring coordinates from a place name or by adding language identifiers). Each enrichment plan applies to one metadata schema (e.g. OAI_DC), and each enrichment micro-service supports specific schemas.
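An enrichment plan can be sketched as an ordered list of micro-services bound to one schema. This is a minimal Python sketch assuming records are dicts that map element names to lists of values; `MicroService`, `EnrichmentPlan` and the placeholder language service are hypothetical, not the framework's API. The plan rejects, at construction time, any micro-service that does not support the plan's schema.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List

Record = Dict[str, List[str]]


@dataclass
class MicroService:
    name: str
    schemas: FrozenSet[str]             # schemas this micro-service supports
    apply: Callable[[Record], Record]   # the enrichment itself


@dataclass
class EnrichmentPlan:
    schema: str
    steps: List[MicroService] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Every step must support the single schema the plan applies to.
        for step in self.steps:
            if self.schema not in step.schemas:
                raise ValueError(f"{step.name} does not support {self.schema}")

    def run(self, record: Record) -> Record:
        # Steps run strictly in the configured sequence.
        for step in self.steps:
            record = step.apply(record)
        return record


# Placeholder: a real service would detect the language, not assume "en".
tag_language = MicroService(
    "language", frozenset({"oai_dc", "edm"}),
    lambda r: {**r, "language": r.get("language") or ["en"]},
)
edm_only = MicroService("edm-only", frozenset({"edm"}), lambda r: r)

plan = EnrichmentPlan("oai_dc", [tag_language])
print(plan.run({"title": ["Postage stamp"]}))
# {'title': ['Postage stamp'], 'language': ['en']}
```

Constructing `EnrichmentPlan("oai_dc", [edm_only])` raises `ValueError`, so a misconfigured plan fails before any record is touched.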

Available enrichment micro-services
At the moment, a number of enrichment micro-services supporting the OAI_DC and EDM (Europeana Data Model) schemas are available:
- Language identification: identifies the language of text and adds the proper qualifiers to the corresponding element. The Apache Tika language tools are used.
- Spatial identification & normalization: identifies spatial information provided through the Coverage element and normalizes it.
- Temporal identification: identifies temporal information provided through the Coverage element. This involves date normalization.
- Reverse geo-coding: derives an address and place name description from the coordinates provided.
- Spatial translation: provides a textual description of a place in various languages. The GeoNames web service is used.
- Spatial coordinate transformation: based on open-geo libraries, allows transformation between coordinate systems.
- Thesauri enrichment: allows the ingestion manager to associate one or more thesaurus concepts with each item. Concepts describing the collection in general are drawn from standard thesauri or authorities, such as the Library of Congress Subject Headings, GEMET, etc.

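The text does not specify which date formats the temporal identification service recognizes. The following Python sketch assumes three illustrative patterns (a single year with an optional "c." or "ca." prefix, a year range, and an ordinal century) and normalizes each into an inclusive (start, end) year interval; the function `normalize_period` and its patterns are hypothetical.

```python
import re
from typing import Optional, Tuple

# Illustrative patterns only; real coverage values are far more varied.
CENTURY = re.compile(r"^(\d{1,2})(?:st|nd|rd|th) century$", re.IGNORECASE)
RANGE = re.compile(r"^(\d{3,4})\s*[-/]\s*(\d{3,4})$")
YEAR = re.compile(r"^(?:c\.\s*|ca\.\s*)?(\d{3,4})$", re.IGNORECASE)


def normalize_period(text: str) -> Optional[Tuple[int, int]]:
    """Map a free-text dc:coverage value to an inclusive (start, end) year pair."""
    value = text.strip()
    if m := CENTURY.match(value):
        century = int(m.group(1))
        # The 12th century spans the years 1101 to 1200.
        return (100 * (century - 1) + 1, 100 * century)
    if m := RANGE.match(value):
        start, end = int(m.group(1)), int(m.group(2))
        return (min(start, end), max(start, end))
    if m := YEAR.match(value):
        year = int(m.group(1))
        return (year, year)
    return None  # not recognized as temporal information


for value in ["12th century", "1850-1860", "c. 1900", "unknown"]:
    print(value, normalize_period(value))
```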
Enrichment plans
The architecture shown in Fig. 4 combines a number of enrichment micro-services within an overall framework in order to provide sophisticated enrichment for a variety of metadata schemas. This architecture allows an enrichment plan to be configured on the fly (with no coding at all) for each content provider and collection. The micro-services are applied to each metadata record in a predefined sequence, according to rules specified by the aggregation manager. Not all micro-services are applied to all harvested packages: each content provider's primary repository or collection has its own characteristics and may require only a subset of the micro-services. Enrichment micro-services need to be applied in a specific order (for instance, as in Fig. 5) so that:
- micro-services that provide information useful to other micro-services are executed first;
- micro-services with a higher degree of confidence are executed earlier.

In this example:
- The language identification step takes precedence, in order to provide qualifiers to textual elements.
- The thesauri enrichment adds dc:subject terms from standard thesauri.
- The spatial identification & normalization extracts the coordinate information from a string (in this case dc:coverage), splits it and identifies the Lat/Lng coordinates.
- The (reverse) geo-coding service takes the coordinates and provides a place name and a GeoNames ID, which it then adds to the record.
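The last two steps can be sketched in Python under strong simplifying assumptions: the coordinates are taken to be the only two numbers in dc:coverage (latitude first), and reverse geo-coding picks the nearest entry, by great-circle (haversine) distance, in a small in-memory gazetteer. The service described above returns GeoNames IDs; the gazetteer here is hypothetical, with approximate coordinates and placeholder IDs that are not real GeoNames identifiers.

```python
import math
import re
from typing import Optional, Tuple

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

# Hypothetical mini-gazetteer: (name, lat, lng, id). Coordinates are approximate;
# the IDs are placeholders, not real GeoNames identifiers.
GAZETTEER = [
    ("Nicosia", 35.1856, 33.3823, "placeholder-1"),
    ("Limassol", 34.6786, 33.0413, "placeholder-2"),
    ("Paphos", 34.7720, 32.4297, "placeholder-3"),
]


def parse_coverage(text: str) -> Optional[Tuple[float, float]]:
    """Extract (lat, lng) from a coverage string such as '35.17, 33.36'."""
    numbers = [float(n) for n in NUMBER.findall(text)]
    if len(numbers) != 2:
        return None
    lat, lng = numbers
    if abs(lat) > 90 or abs(lng) > 180:
        return None  # out of range: not a valid WGS84 pair
    return lat, lng


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lng) points."""
    lat1, lng1, lat2, lng2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def reverse_geocode(point: Tuple[float, float]):
    """Return the gazetteer entry closest to the given point."""
    return min(GAZETTEER, key=lambda entry: haversine_km(point, (entry[1], entry[2])))


point = parse_coverage("35.17, 33.36")
print(point, reverse_geocode(point)[0])  # (35.17, 33.36) Nicosia
```

Running the parser before the geo-coder mirrors the ordering rule above: the first step produces the input the second one needs.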

Example of an enriched record
This section shows an example Dublin Core (OAI_DC) record before (Fig. 6) and after (Fig. 7) an enrichment plan is applied. In this record, the xml:lang attributes are filled in, a subject term is added from the Library of Congress Subject Headings, and the coordinates, which were non-parsable due to a cataloguing error, are fixed and augmented with textual descriptions and a GeoNames URI.
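The kind of change described for this record can be reproduced with a short `xml.etree.ElementTree` sketch. The function `enrich` and its inputs are hypothetical: the language code, the subject heading and the place URI are placeholders that a real plan would obtain from the language, thesauri and geo-coding micro-services, and storing the place URI in an extra dc:coverage element is an assumption.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Keep the familiar prefixes when serializing (otherwise ns0, ns1, ...).
ET.register_namespace("dc", DC)
ET.register_namespace("oai_dc", "http://www.openarchives.org/OAI/2.0/oai_dc/")

SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Stamp depicting a fresco</dc:title>
  <dc:description>Fresco from a painted church</dc:description>
</oai_dc:dc>"""


def enrich(xml_text: str, lang: str, subject: str, place_uri: str) -> str:
    """Add xml:lang qualifiers, a dc:subject and a place URI to an OAI_DC record."""
    root = ET.fromstring(xml_text)
    # Qualify the textual elements with the detected language.
    for element in root.findall(f"{{{DC}}}title") + root.findall(f"{{{DC}}}description"):
        element.set(XML_LANG, lang)
    ET.SubElement(root, f"{{{DC}}}subject").text = subject
    ET.SubElement(root, f"{{{DC}}}coverage").text = place_uri
    return ET.tostring(root, encoding="unicode")


print(enrich(SAMPLE, "en", "Postage stamps", "https://example.org/place/1"))
```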
Figure 6. Enrichment services framework

In this example, the record is augmented with useful information such as a place name, languages and thematic information. As the next section makes clear, this is the critical step that makes the content usable in other domains.

CONTENT RE-USE FOR TOURISM
As can be seen in Fig. 8, the MORe aggregator is typically used to aggregate content from multiple sources, transform it to a common schema (in our case EDM) and publish it to a single destination (in our case Europeana).
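For the tourism use case, the enriched records are also published to an Elasticsearch component. One common way to load documents into Elasticsearch is the `_bulk` API, whose body is newline-delimited JSON that alternates action lines and document lines and must end with a newline. The Python sketch below builds such a body; the index name, the record fields and the document layout are assumptions, and `location` must be mapped as `geo_point` (as in the hypothetical `MAPPING`) for map-based queries to work.

```python
import json
from typing import Dict, Iterable, List

# Hypothetical index mapping: keyword fields support exact filtering and facets,
# geo_point supports map queries.
MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "language": {"type": "keyword"},
            "subjects": {"type": "keyword"},
            "place": {"type": "keyword"},
            "location": {"type": "geo_point"},
        }
    }
}


def to_bulk_ndjson(records: Iterable[Dict], index: str) -> str:
    """Build a _bulk request body: one action line, then one document line."""
    lines: List[str] = []
    for rec in records:
        lines.append(json.dumps({"index": {"_index": index, "_id": rec["id"]}}))
        lines.append(json.dumps({
            "title": rec["title"],
            "language": rec["language"],
            "subjects": rec["subjects"],
            "place": rec["place"],
            "location": {"lat": rec["lat"], "lon": rec["lng"]},
        }))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


# Hypothetical enriched record.
SAMPLE_RECORD = {
    "id": "rec-1", "title": "Stamp depicting a fresco", "language": "en",
    "subjects": ["Postage stamps"], "place": "Nicosia", "lat": 35.17, "lng": 33.36,
}
print(to_bulk_ndjson([SAMPLE_RECORD], "tourism"))
```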

Figure 2. Aggregation workflow

Figure 3. Sample view of a record in the Omeka repository of one of the selected stakeholders, the Postal Services in Cyprus (a stamp depicting a fresco in one of the ten painted churches in Cyprus).

Figure 8. Use case setup

A localized portal for tourism must provide specific functionalities and meet specific quality constraints in terms of content. The functionalities include:
- thematic hierarchical browsing of the content
- placement of the content on a map
- the ability to search for content
- browsing of content by language
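A rough sketch of how three of these functionalities could map onto the Elasticsearch query DSL, assuming documents with a `title` text field, `language` and `subjects` keyword fields and a `location` geo_point field. The function `portal_query` and the field names are hypothetical. The terms aggregation on `subjects` only yields a flat, first-level facet; true hierarchical browsing would need, for example, a path-encoded subject field, which is not shown.

```python
from typing import Any, Dict, List, Optional


def portal_query(text: Optional[str], language: Optional[str],
                 bbox: Optional[Dict[str, float]]) -> Dict[str, Any]:
    """Build a search request: free text, language filter, map area, facets."""
    filters: List[Dict[str, Any]] = []
    if language:
        # Browsing by language: exact match on a keyword field.
        filters.append({"term": {"language": language}})
    if bbox:
        # Placement on a map: restrict results to the visible map area.
        filters.append({"geo_bounding_box": {"location": {
            "top_left": {"lat": bbox["north"], "lon": bbox["west"]},
            "bottom_right": {"lat": bbox["south"], "lon": bbox["east"]},
        }}})
    must = [{"match": {"title": text}}] if text else [{"match_all": {}}]
    return {
        "query": {"bool": {"must": must, "filter": filters}},
        "aggs": {
            "by_language": {"terms": {"field": "language"}},
            "by_subject": {"terms": {"field": "subjects"}},
        },
    }


# Hypothetical map area roughly covering Cyprus.
print(portal_query("church", "en", {"north": 35.7, "south": 34.5, "west": 32.2, "east": 34.6}))
```

Filters are placed in the `filter` clause so they restrict results without affecting relevance scoring of the free-text match.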