2.4.1 Model outline - Moving from a hierarchy to a graph
The CIDOC-CRM model was created by CIDOC, the documentation committee of the International Council of Museums. It was developed to replace the E-R model, a modelling system previously used in the design of relational databases for cultural heritage domains.
The E-R model suffered from several issues, mostly related to a lack of flexibility. As the needs of cultural heritage databases grew, so did the model, which became increasingly complex and difficult to maintain, until its support could no longer be justified [OL14].
The CIDOC-CRM model was developed to answer these problems through a semantically richer form of representation, based on an object-oriented approach, which allowed it to eliminate the redundant representations that the previous model had accumulated over time.
The CIDOC-CRM model is formalized as an ontology, a form of knowledge representation that captures categorical knowledge within a domain and provides a framework under which different organizations can collaborate and interpret their information in an interoperable manner. An ontology can also be defined as an "explicit specification of a conceptualization" [Gru95]. This means that "definitions associate the names of entities in the universe of discourse (e.g., classes, relations, functions, or other objects) with human-readable text describing what the names mean, and formal axioms that constrain the interpretation and well-formed use of these terms" [Gru95]. CIDOC-CRM was developed with several object-oriented programming values in mind, which increased its modularity and its capacity for generalization or specialization in each implementation [OL14].
1http://apex-project.eu/index.php/en/outcomes/standards/apeead
2http://www.loc.gov/rr/ead/lcp/
Archival standards and data models
Another goal of CIDOC-CRM is data harmonization: data joined together from different sources can still be integrated into consistent technological frameworks and worked on without losing that consistency.
One of the biggest strengths of the CIDOC-CRM model lies in its flexibility, which shows in several areas. Technologically, the model is entirely independent of any implementation framework, which leaves every organization or developer free to choose one. This can also create issues, however: because implementations are not standardized, there are few references on how to implement the model at a technical level, as opposed to the conceptual level.
Regarding implementations, the model keeps its flexibility by not mandating any fields or properties: an implementation needs to use only the entities and properties it requires and does not carry the load of the rest of the model, which keeps every use of the model as light as possible. Finally, the model is poly-hierarchical, which allows it to be as general or as specific as the dataset it is applied to requires. Additionally, the CIDOC-CRM model was designed with rich computer-based reasoning in mind, and as such the object-oriented ontology still conforms to the needs of logic-based systems [OL14].
Due to its structure, the CIDOC-CRM can be considered a graph model; in contrast, the ISAD(G) is a hierarchical model. The graph model improves on several aspects of the hierarchical model: it allows for faster searches of data and makes data more harmonized and, more importantly, interlinked, which also promotes the interoperability of the information contained in these models.
2.4.2 Entities and Properties
The model itself is built on two core concepts from which the entire model derives: entities and properties. The model consists of sets of entities, which represent real-world things, related through properties, which establish the relationships between entities. The entity and property types added to the model were drafted from the analysis of numerous cultural heritage data models and from direct interviews with cultural heritage experts in meetings and workshops. As a naming convention, all the entities of the model use the prefix “E” and capitalize the first letter of each word, while the properties use the prefix “P” and are written in lowercase. As an example, the most general element is "E1 CRM Entity"; it can have a relationship "P2 has type" that can only target the entity "E55 Type" and entities that inherit from it, because the property "P2 has type" has "E55 Type" as its range and "E1 CRM Entity" as its domain. Thus, an instance of "P2 has type" can have any entity that is E1, or that inherits from it, as its subject. Since CIDOC-CRM is formalized as an ontology, entity types and properties are formalized as Classes and Object Properties.
The CIDOC-CRM ontology is poly-hierarchical, which means that both entities and relationships have a hierarchy of meanings that allows for different degrees of specialization or generalization, depending on the needs of each specific use of the model. Entity types can have subtypes which are more specific than themselves. Subtypes can have properties that apply only to them and not to the higher levels, while they can still use any of the properties applicable to their super-entity types. As an example, one type of entity that exists in the CIDOC-CRM model is the Man-Made Thing, which comprises "identifiable man-made items that are documented as single units". Two entities that belong to this class are a Beethoven piece and the Statue of Liberty. If higher specificity were intended, it would be possible to make the Beethoven piece a Conceptual Object instance and the Statue of Liberty a Physical Man-Made Thing instance, making them both more specific than just Man-Made Thing. This means that they can use all the properties a Man-Made Thing can use, and also the properties associated with their more specific classes.
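The domain and range rule and the inheritance of properties described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the CIDOC-CRM specification: the class codes (E71, E28, E24, E18) and the property "P62 depicts" are taken from the published model as examples, the hierarchy is a simplified excerpt that omits intermediate classes, and the helper names (SUPERCLASSES, PROPERTIES, is_a, can_link, applicable_properties) are hypothetical.

```python
# Simplified excerpt of the class hierarchy: each class lists its direct
# superclasses. Intermediate classes of the published model are omitted
# (an assumption made to keep the sketch short).
SUPERCLASSES = {
    "E1 CRM Entity": [],
    "E55 Type": ["E1 CRM Entity"],
    "E18 Physical Thing": ["E1 CRM Entity"],
    "E71 Man-Made Thing": ["E1 CRM Entity"],
    "E28 Conceptual Object": ["E71 Man-Made Thing"],
    # Poly-hierarchy: a class may have more than one direct superclass.
    "E24 Physical Man-Made Thing": ["E18 Physical Thing", "E71 Man-Made Thing"],
}

# Each property declares the class of its subject (domain) and object (range).
PROPERTIES = {
    "P2 has type": {"domain": "E1 CRM Entity", "range": "E55 Type"},
    "P62 depicts": {"domain": "E24 Physical Man-Made Thing", "range": "E1 CRM Entity"},
}


def is_a(cls, ancestor):
    """Return True if cls is ancestor or inherits from it, directly or not."""
    if cls == ancestor:
        return True
    return any(is_a(parent, ancestor) for parent in SUPERCLASSES[cls])


def can_link(subject_cls, prop, object_cls):
    """Check a statement against the domain and range of its property."""
    spec = PROPERTIES[prop]
    return is_a(subject_cls, spec["domain"]) and is_a(object_cls, spec["range"])


def applicable_properties(cls):
    """List the properties an instance of cls may use as a subject."""
    return sorted(p for p, spec in PROPERTIES.items() if is_a(cls, spec["domain"]))


# The two instances used as examples in the text.
INSTANCES = {
    "Beethoven piece": "E28 Conceptual Object",
    "Statue of Liberty": "E24 Physical Man-Made Thing",
}

if __name__ == "__main__":
    for name, cls in INSTANCES.items():
        print(name, "->", applicable_properties(cls))
```

In this sketch both instances can use "P2 has type" because they inherit from "E1 CRM Entity", while only the Statue of Liberty can use the property declared on its more specific class.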
2.4.3 Applications and Past Implementations
The CIDOC-CRM boasts a high degree of flexibility in several ways. Because it doesn’t prescribe any technological framework, developers who use the ontology are free to choose the most appropriate alternative. Another interesting aspect is that the model does not mandate any entities or properties and allows any level of detail or specificity, making the CIDOC-CRM ontology applicable to many different problems.
The most common use of CIDOC-CRM, and the one it was created for, is the modelling of cultural heritage information, that is, information about a group or society that pertains to either physical things or abstract characteristics [OL14]. This kind of information is normally held in archival repositories or museums.
The CIDOC-CRM model can be used to represent the content of each archival record in a more atomic manner. This improves the record, since part of that content was previously stored in lengthy textual descriptions.
One mapping to CIDOC-CRM of a model that has similarities with ISAD(G) was that of the Dublin Core Metadata Element Set, a vocabulary used to represent both physical and digital information resources [Doe00]. This mapping provided a better understanding of how to model descriptions and their contents in CIDOC-CRM. The Dublin Core Metadata Element Set has served only as a starting point in the effort of mapping ISAD(G) to CIDOC-CRM: its limited scope of 15 generic descriptors makes it too simple to fully encompass the needs of the ANTT, even though this simplicity has made it one of the most well-known and widely used metadata schemas.
One example of an implementation of the CIDOC-CRM model for representing cultural heritage information is the AMA (Archive Mapper for Archaeology) project, whose aim is to develop tools for the semi-automatic mapping of cultural heritage data to a CIDOC-CRM compliant model. This is done by creating electronic text from the original documents, supplying a TEI header with bibliographical information about the text, and marking all archaeological information through a predefined XML grammar that is already mapped to CIDOC-CRM subsets [EFO+08].
2.4.4 Case study: The Construction of FEUP in CIDOC-CRM terms
In order to better understand the CIDOC-CRM model, a draft case study was developed. Its objective is to represent the construction of FEUP using entities and properties defined in the CIDOC-CRM model.
This case study focuses on recording the beginning of the construction of FEUP in space and time, and as such relies more heavily on the entities that represent them.
The model starts with the "E5 Event" nodes "FEUP Construction" and "Laying of the First Stone in FEUP", which are connected via an instance of the property "P9 consists of".
In CIDOC-CRM, Events consist of changes of state in physical, social or cultural systems. These nodes connect to others of the type "E52 Time-Span" through the property "P4 has time-span". The Time-Span nodes represent intervals of time of any kind; in this case they are the nodes "From 27-09-1996 to 22-03-2001", "22-03-2001" and "27-09-1996". These time spans can be contained within one another, a relationship expressed through the "P86 falls within" property.
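As a minimal sketch, the containment between the three time spans of the case study can be derived mechanically from their dates. The sketch assumes day-level granularity and inclusive bounds, and the names parse_day, TIME_SPANS and falls_within are hypothetical helpers introduced only for illustration.

```python
from datetime import datetime


def parse_day(text):
    """Parse a DD-MM-YYYY label, as used by the nodes of the case study."""
    return datetime.strptime(text, "%d-%m-%Y").date()


# Node label -> (first day, last day). Single days are spans of length one.
TIME_SPANS = {
    "From 27-09-1996 to 22-03-2001": (parse_day("27-09-1996"), parse_day("22-03-2001")),
    "27-09-1996": (parse_day("27-09-1996"), parse_day("27-09-1996")),
    "22-03-2001": (parse_day("22-03-2001"), parse_day("22-03-2001")),
}


def falls_within(inner, outer):
    """True if the inner span lies inside the outer one (bounds inclusive)."""
    inner_start, inner_end = TIME_SPANS[inner]
    outer_start, outer_end = TIME_SPANS[outer]
    return outer_start <= inner_start and inner_end <= outer_end


# Derive every "P86 falls within" statement between distinct time spans.
P86_STATEMENTS = [
    (inner, "P86 falls within", outer)
    for inner in TIME_SPANS
    for outer in TIME_SPANS
    if inner != outer and falls_within(inner, outer)
]

if __name__ == "__main__":
    for statement in P86_STATEMENTS:
        print(statement)
```

Under these assumptions the two single-day nodes fall within the longer interval, and no other containment holds.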
Coming back to the Event nodes, these can be linked to "E53 Place" nodes through the "P7 took place at" property. The Place nodes represent non-physical denominations of space, and in this model there is only one: "The entirety of the Campus of FEUP". To actually locate the premises of FEUP, "E41 Appellation" entities are used. These serve to identify elements in a specific context, in this case to make the location of FEUP understandable through the address, coordinates and name of the location. Place connects to Appellations through "P87 is identified by", and Appellations connect to each other through "P139 has alternative form".
It is important to note that the published version of the CIDOC-CRM contains entities for specific kinds of appellations, for example coordinate appellations. We are not using them for two reasons. First, to show that entities can be used at more general levels while remaining model compliant. Second, because the current version of the model, as opposed to the published one, has deprecated those kinds of appellations; as such, once the new version is published, a case study that relied on them would no longer be CIDOC-CRM compliant.
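The case study graph described in this section can be written down as a list of subject, property, object statements. The sketch below is an illustration only. The event, time-span and place labels come from the text, but several details are assumptions: which time span belongs to which event, that both events took place at the same Place, that the Place is identified by all three appellations, and the appellation labels themselves, which are placeholders because the text does not give the actual address, coordinates or name. The helper names objects and appellations_of_event are hypothetical.

```python
CONSTRUCTION = "FEUP Construction"
FIRST_STONE = "Laying of the First Stone in FEUP"
WHOLE_PERIOD = "From 27-09-1996 to 22-03-2001"
CAMPUS = "The entirety of the Campus of FEUP"
# Placeholder labels: the actual values are not given in the text.
NAME, ADDRESS, COORDINATES = "<name>", "<address>", "<coordinates>"

# Class of every node in the graph.
NODE_TYPES = {
    CONSTRUCTION: "E5 Event",
    FIRST_STONE: "E5 Event",
    WHOLE_PERIOD: "E52 Time-Span",
    "27-09-1996": "E52 Time-Span",
    "22-03-2001": "E52 Time-Span",
    CAMPUS: "E53 Place",
    NAME: "E41 Appellation",
    ADDRESS: "E41 Appellation",
    COORDINATES: "E41 Appellation",
}

TRIPLES = [
    (CONSTRUCTION, "P9 consists of", FIRST_STONE),
    (CONSTRUCTION, "P4 has time-span", WHOLE_PERIOD),   # assumed pairing
    (FIRST_STONE, "P4 has time-span", "27-09-1996"),    # assumed pairing
    ("27-09-1996", "P86 falls within", WHOLE_PERIOD),
    ("22-03-2001", "P86 falls within", WHOLE_PERIOD),
    (CONSTRUCTION, "P7 took place at", CAMPUS),         # assumed
    (FIRST_STONE, "P7 took place at", CAMPUS),          # assumed
    (CAMPUS, "P87 is identified by", NAME),
    (CAMPUS, "P87 is identified by", ADDRESS),
    (CAMPUS, "P87 is identified by", COORDINATES),
    (NAME, "P139 has alternative form", ADDRESS),       # assumed direction
    (NAME, "P139 has alternative form", COORDINATES),   # assumed direction
]


def objects(subject, prop):
    """Return the objects of every statement with this subject and property."""
    return [o for s, p, o in TRIPLES if s == subject and p == prop]


def appellations_of_event(event):
    """Follow Event -> Place -> Appellation to find how an event is located."""
    found = []
    for place in objects(event, "P7 took place at"):
        found.extend(objects(place, "P87 is identified by"))
    return found


if __name__ == "__main__":
    print(appellations_of_event(FIRST_STONE))
```

Storing the statements as plain tuples keeps the sketch independent of any particular technology, in line with the framework independence of the model discussed earlier.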