A Survey of Top-level Ontologies
Appendix A - Pathway requirements for a Foundation Data Model
Extract from: The Pathway towards an Information Management Framework: A ‘Commons’ for Digital Built Britain (Hetherington, 2020)
3.5. A Foundation Data Model: clearing up the concepts.
- Our Foundation Data Model (FDM) will need to address the questions proper to a top-level ontology, which can describe general concepts independently of any particular problem domain. Our FDM should be able to provide answers to the following:
- Time, space and place: How does the ontology deal with time and space-time? How does the ontology deal with places, locations, shape, holes and a vacuum?
- Actuality and possibility: How does the ontology deal with what could happen or what could be the case, such as where multiple data sets give conflicting stories on the behaviour of a network?
- Classes and types: How does the ontology deal with issues of classification?
- Time and change: How does the ontology deal with time and change?
- Parts, wholes, unity and boundaries: How does the ontology deal with relations of parthood?
- Scale and granularity: How does the ontology deal with scale, resolution and granularity?
- Qualities and other attributes: How does the ontology deal with qualities and other qualitative attributes?
- Quantities and mathematical entities: How does the ontology deal with quantitative data and with mathematical data and theories?
- Processes and events: How does the ontology deal with processes?
- Constitution: How does the ontology deal with the relation – sometimes referred to as a relation of “constitution” – between material entities and the material of which, at any given time, they are made?
- Causality: How does the ontology deal with causality?
- Information and reference: How does the ontology deal with information entities?
- Artefacts and socially constructed entities: How does the ontology deal with artefacts (e.g. engineered items) and socially constructed items like money and laws?
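Two of the questions above, classification ("Classes and types") and parthood ("Parts, wholes, unity and boundaries"), can be made concrete with a small sketch. The following is purely illustrative and does not represent any specific top-level ontology (such as BFO or ISO 15926); all class and individual names are invented:

```python
# Minimal illustrative sketch of classification (instance-of, subclass-of)
# and transitive parthood (part-of). Names are invented for illustration.

class Ontology:
    def __init__(self):
        self.instance_of = {}   # individual -> class
        self.subclass_of = {}   # class -> parent class
        self.part_of = {}       # part individual -> whole individual

    def is_a(self, individual, cls):
        """True if the individual is classified under cls, directly or via a subclass chain."""
        c = self.instance_of.get(individual)
        while c is not None:
            if c == cls:
                return True
            c = self.subclass_of.get(c)
        return False

    def parts(self, whole):
        """All parts of a whole; parthood is typically treated as transitive."""
        direct = [p for p, w in self.part_of.items() if w == whole]
        result = list(direct)
        for p in direct:
            result.extend(self.parts(p))
        return result

onto = Ontology()
onto.subclass_of["Pump"] = "EngineeredItem"
onto.instance_of["pump_17"] = "Pump"
onto.part_of["impeller_3"] = "pump_17"
onto.part_of["blade_9"] = "impeller_3"

print(onto.is_a("pump_17", "EngineeredItem"))  # True: inherited via subclass chain
print(onto.parts("pump_17"))                   # ['impeller_3', 'blade_9']
```

Even this toy model forces the choices the questions point at: whether parthood is transitive, whether classes can themselves be instances, and how classification interacts with time.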
Our FDM will need to select from the available ontological approaches to provide answers to these questions that are as comprehensive and rigorous as possible. These choices will need to be guided by the general requirements of the engineering domain and of the domains that engineered systems support. This document does not attempt to make those choices now.
- A digital twin is more than just a collection of pieces of data that describe the world. How do we describe the relationship between a digital twin and the corresponding elementary pieces of data?
- How do we describe the domain of validity of a digital twin, including any assumptions or simplifications of real-world behaviour? This may include describing assumptions about the underpinning physics, engineering, biology or sociology that influence the way that the asset operates. How do we define the boundaries of validity of models to ensure that models are used and composed only in ways that are meaningful? Especially for “black box” models, how do we ensure the validity of inputs and outputs and mitigate the risk and damage of erroneous outputs, especially when users will not be close to the development decisions made in the creation of the models?
- How will the data models used by an existing digital twin be validated and interpreted so that mappings and transformations can be established prior to seeking to integrate the twins?
- What is the relationship between a twin and the physical, mathematical or computational model(s) that underpins it? How do we define the non-physical parameters describing the mathematical model used, such as a grid resolution?
- What is the relationship between a digital twin and the kind of things it describes? Is this a twin of the make and model of my car, or of my specific car? If the latter, what do we call the kind of thing that is a potential twin of all such cars, before I connect it to the telemetry from my specific car? To what extent does such telemetry actually need to be real-time, or would it be sufficient to collect it periodically, to analyse performance and arrange maintenance? How do we aggregate models that operate at very different tempos – from seconds, to hours, to days?
- How do we make statements about time? How do I talk about discrete time periods like “on Thursdays” or “in Summer” or “FY 20/21”? How do we describe and capture change over time? How do we model future operations to assess the impact of planned changes, without disturbing the current operating model?
- How do we break down the physical world into parts? What is the relationship between a twin of a component of a city, such as a building, and a twin of a city? How do we reference the “coarse-graining” that takes place when we have different models, at different resolutions, that overlap in the aspects of the physical world that they describe? How do we aggregate models, particularly in circumstances where there may be modelling gaps? How do we deal with missing information, or unconnected assets?
- How do we handle uncertainty? How should we best manage the difference between “measurement uncertainty”, “variability within a class”, “variation over time”, “environmental noise” and so on?
- Some models will be mechanistic, based on known understanding of the physical world. Others will be purely empirical: based on maximising the goodness of fit of models to data, without incorporating domain knowledge. Many will be a mix of these paradigms. How will this aspect of the use of digital twins be reflected in the ontology? How reliable is each paradigm, and how can a lack of reliability be taken into account?
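The question above about statements concerning time ("on Thursdays", "FY 20/21") can be sketched as membership tests against recurring or named periods. This is illustrative only; the fiscal-year boundaries below (UK-style, 6 April to 5 April) are an assumption for the sake of the example:

```python
from datetime import date

# Illustrative sketch: testing whether a date falls within a recurring
# period ("on Thursdays") or a named discrete period ("FY 20/21").
# The UK-style fiscal-year dates are assumed for illustration.

def on_thursdays(d: date) -> bool:
    return d.weekday() == 3  # Monday is 0, so Thursday is 3

def in_fy_20_21(d: date) -> bool:
    return date(2020, 4, 6) <= d <= date(2021, 4, 5)

print(on_thursdays(date(2021, 1, 7)))   # True: 7 January 2021 was a Thursday
print(in_fy_20_21(date(2020, 12, 25)))  # True
print(in_fy_20_21(date(2021, 6, 1)))    # False
```

An ontology has to decide whether such periods are first-class entities that statements can reference, or merely predicates evaluated at query time; the sketch above corresponds to the latter, simpler choice.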
It is also worth noting that the scope of the data to be described covers more than just the digital twins themselves. Our data requirements also include the following:
- How do we handle versioning? Should version histories be curated indefinitely? How do we handle archiving and ultimately removal of out-of-date data?
- How is invalid data corrected? What audit trail is needed and how will data ownership be managed and maintained? Are statements recorded together with the identity of the person or organisation making the claim, so that the provenance of information is tracked and unreliable information can be managed? How do we correctly handle missing, invalid or inaccurate data? Outlying or erroneous data can sometimes still be useful when analysed in a different way.
- What relationships do twins have to their authors? Who owns them? Who can know what about them? What kind of roles and actors are there?
- How do we describe not just the models themselves, but the methods that are used to derive insight from them? Models live alongside visualisations, interfaces, deployments and so on, which also need to be described and exchanged.
- How do we describe the uses to which models have been put? Will we need to log each question asked of a digital twin? This would facilitate audit and meta-analysis and save the computational time lost in re-running old studies, but may have information governance and privacy implications.
- How do we model social concepts related to ownership, rights, legislation and regulation? What are the permitted uses of the model and any licence or usage constraints?
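The versioning, audit-trail and provenance questions above share a common pattern: record each statement together with who asserted it and when, and let corrections supersede earlier statements rather than overwrite them. The sketch below is a minimal illustration of that pattern; the field names and example values are invented:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative sketch: statements carry asserting party and timestamp,
# and corrections supersede (not overwrite) earlier statements, so the
# full audit trail is preserved. All names here are invented.

@dataclass
class Statement:
    subject: str
    predicate: str
    value: str
    asserted_by: str
    asserted_at: str                  # ISO 8601 timestamp
    supersedes: Optional[int] = None  # index of the statement this corrects

log: list[Statement] = []

def assert_statement(stmt: Statement) -> int:
    log.append(stmt)
    return len(log) - 1

def current_view() -> list[Statement]:
    """Latest statements only: drop anything a later statement supersedes."""
    superseded = {s.supersedes for s in log if s.supersedes is not None}
    return [s for i, s in enumerate(log) if i not in superseded]

i = assert_statement(Statement("pump_17", "flow_rate", "12.5 l/s",
                               "sensor_net", "2021-03-01T10:00:00Z"))
assert_statement(Statement("pump_17", "flow_rate", "12.1 l/s",
                           "j.smith@example.org", "2021-03-02T09:00:00Z",
                           supersedes=i))

print(len(log))                  # 2: both statements retained for audit
print(len(current_view()))       # 1: only the correction is current
print(current_view()[0].value)   # '12.1 l/s'
```

Because nothing is ever deleted, the log supports the audit, provenance and archiving questions directly: out-of-date data can later be archived or removed as a separate, policy-driven step over the superseded entries.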