An update from the Information Management Framework Team of the National Digital Twin programme
The mission of the National Digital Twin programme (NDTp) is to enable the National Digital Twin (NDT), an ecosystem of connected digital twins, where high quality data is shared securely and effectively between organisations and across sectors. By connecting digital twins, we can reap the additional value that comes from shared data as opposed to isolated data: better information leads to better decisions through a systems thinking approach, which in turn enable better outcomes for society, the economy and our environment.
The NDTp’s approach to data sharing is ambitious: we are aiming for a step change in data integration, one where meaning is captured accurately enough that data can be shared unambiguously. Conscious that “data integration” may justifiably mean different things to different people, we would like to shed some light on our current thinking and present one of the tools we are currently developing to help us articulate the need for this step change. It is a scheme for assessing the level of digitalisation of data items based upon four classifiers: the extent of what is known, media, form, and semantics. The scheme entails the 8 levels below, which are likely to be fine-tuned as we continue to apply the scheme to real data sets:
We trust that the first levels will resonate with your own experience of the subject:
- Extent: as it is not possible to represent what is unknown, the scheme starts by differentiating the known from the unknown. Looking into an organisation’s information requirements may uncover “uncharted territories”, which will need to be mapped as part of the digitalisation journey.
- Media: information stored on paper (or only in people’s heads) must be documented and stored in computer systems.
- Form: information held in electronic documents such as PDFs, Word documents, and most spreadsheets needs to be made computer-readable, i.e. held as data, for example in databases and knowledge graphs.
- Semantics: the progression towards “grounded semantics” and in particular the step from the “explicit” level to the “grounded” level is where, we believe, the fundamental change of paradigm must occur. To set the context for this step, it is worth going back to some fundamental considerations about the foundational model for the Integration Architecture of the NDT.
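To make the four classifiers concrete, here is a minimal sketch of how an assessment of a single data item might be recorded. The class and value names are our illustrative assumptions, drawn only from the descriptions above; the full 8-level scheme is not reproduced here, and only the two semantics levels named in this article (“explicit” and “grounded”) are modelled.

```python
# Illustrative sketch (not NDTp code): the four digitalisation classifiers
# as simple enumerations, and a record assessing one data item.
from dataclasses import dataclass
from enum import Enum

class Extent(Enum):
    UNKNOWN = "unknown"    # "uncharted territories" not yet mapped
    KNOWN = "known"

class Media(Enum):
    PAPER = "paper"        # on paper, or only in people's heads
    COMPUTER = "computer"

class Form(Enum):
    DOCUMENT = "document"  # PDFs, Word documents, most spreadsheets
    DATA = "data"          # databases, knowledge graphs

class Semantics(Enum):
    EXPLICIT = "explicit"  # semantics stated in a data model
    GROUNDED = "grounded"  # anchored in an ontological foundation

@dataclass
class DataItemAssessment:
    extent: Extent
    media: Media
    form: Form
    semantics: Semantics

# A data item that is known, digital, held as data, but whose semantics
# are explicit rather than grounded -- one step still to go.
item = DataItemAssessment(Extent.KNOWN, Media.COMPUTER,
                          Form.DATA, Semantics.EXPLICIT)
print(item.semantics is Semantics.GROUNDED)  # False
```

The step this article focuses on is precisely the last one: moving the semantics classifier from “explicit” to “grounded”.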
From a Point-to-Point model to a Hub and Spoke model empowered by grounded semantics
A key challenge at present is how to share data effectively and efficiently. What tends to happen organically is that point-to-point interfaces are developed as requirements are identified between systems with different data models and perhaps different reference data. The problem is that this does not scale well: as more systems need to be connected, new interfaces are developed which share the same data with different systems, using different data models and reference data. There are also maintenance problems, because when a system is updated, its interfaces are likely to need updating as well. This burden is known to limit the effective sharing of data, as well as imposing high costs.
The alternative is a hub and spoke architecture. In this approach, each system has just one interface, to the hub, which is defined by a single data model and reference data that all systems translate into and out of. It is important to note that although the hub could be a central system, it does not need to be: the hub can be virtual, with data shared over a messaging system according to the hub data model and reference data. This reduces costs significantly and means that data sharing can be achieved more efficiently and effectively. Nor is this novel: the existing industry standard data models were developed to achieve exactly this model. The new piece is that the requirement now is to be able to share data across sectors, not just within a single sector, and to meet more demanding requirements.
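The scaling argument above can be sketched in a few lines. This is an illustrative toy, not NDTp code: the `VirtualHub` class, the system names and the temperature fields are all our assumptions, chosen only to show why each system needing a single translator pair into and out of one hub model beats maintaining a translator for every pair of systems.

```python
# Point-to-point: every ordered pair of systems needs its own translator,
# so n systems require n * (n - 1) interfaces.
def point_to_point_interfaces(n: int) -> int:
    return n * (n - 1)

# Hub and spoke: each system only translates into and out of the hub's
# single data model, so n systems require 2 * n interfaces.
def hub_and_spoke_interfaces(n: int) -> int:
    return 2 * n

# A virtual hub: each system registers one encoder (local -> hub model)
# and one decoder (hub model -> local); any pair can then exchange data
# without a dedicated interface between them.
class VirtualHub:
    def __init__(self):
        self.encoders = {}  # system name -> local-to-hub translator
        self.decoders = {}  # system name -> hub-to-local translator

    def register(self, name, encoder, decoder):
        self.encoders[name] = encoder
        self.decoders[name] = decoder

    def share(self, source, target, record):
        hub_record = self.encoders[source](record)  # into the hub model
        return self.decoders[target](hub_record)    # out to the target

hub = VirtualHub()
# System A stores temperatures in Fahrenheit; the hub model uses Celsius.
hub.register("A", lambda r: {"temp_c": (r["temp_f"] - 32) * 5 / 9},
                  lambda h: {"temp_f": h["temp_c"] * 9 / 5 + 32})
# System B already works in Celsius but uses a different field name.
hub.register("B", lambda r: {"temp_c": r["celsius"]},
                  lambda h: {"celsius": h["temp_c"]})

print(point_to_point_interfaces(10))         # 90 interfaces without a hub
print(hub_and_spoke_interfaces(10))          # 20 interfaces with a hub
print(hub.share("A", "B", {"temp_f": 212}))  # {'celsius': 100.0}
</imports>

Note that the hub here is just an agreed data model plus per-system translators; no central system holds the data, mirroring the “virtual hub over a messaging system” point above.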
Thus, the National Digital Twin programme is developing a Foundation Data Model (a pan-industry, extensible data model), enabling information to be taken from any source and amendments to be made on a single node basis.
But what would differentiate the NDT's common language - the Foundation Data Model - from existing industry data models?
Our claim is that the missing piece in most existing industry data models, which have “explicit semantics”, is an ontological foundation, i.e. “grounded semantics”.
Experience has shown us that although there is just one real world to model, there is more than one way to look at it, which gives rise to a variety of data models representing the same “things” differently and, eventually, to challenges for data integration. To tackle these challenges, we recommend clarifying ontological commitments (see our first conclusions on the choice of a Top Level Ontology for the NDT’s Foundation Data Model) so that a clear, accurate and consistent view of “the things that exist and the rules that govern them” can be established. We believe that analysing datasets through this lens and semantically enriching them is a key step towards better data integration.
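A small sketch may help illustrate what clarifying ontological commitments buys. All names here are hypothetical: two systems describe the same real-world pump with different data models, and asking “which individual thing in the world does this record denote?” lets both be mapped onto one consistent representation.

```python
# Illustrative sketch (all identifiers hypothetical): reconciling two
# views of the same real-world thing via a shared ontological commitment.

# System 1: a maintenance register keyed by plant tag.
maintenance_record = {"tag": "P-101", "desc": "feed pump",
                      "state": "In Service"}

# System 2: a procurement system keyed by serial number.
procurement_record = {"serial": "SN-884", "item_type": "PUMP",
                      "active": True}

# The shared commitment: both records denote the same individual pump.
SAME_INDIVIDUAL = {"P-101": "pump-0042", "SN-884": "pump-0042"}

def to_common_model(record):
    """Translate either local record into the common, grounded model."""
    if "tag" in record:
        return {"individual": SAME_INDIVIDUAL[record["tag"]],
                "type": "pump",
                "operational": record["state"] == "In Service"}
    return {"individual": SAME_INDIVIDUAL[record["serial"]],
            "type": record["item_type"].lower(),
            "operational": record["active"]}

a = to_common_model(maintenance_record)
b = to_common_model(procurement_record)
print(a == b)  # True: the two views now agree about the same thing
```

Without the shared commitment about which individual each record denotes, the two records would simply be two different rows in two different shapes; with it, integration becomes a translation problem rather than a guessing game.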
As we begin to accompany organisations on their journey towards “grounded semantics”, we look forward to sharing more details of the lessons learned and emerging methodologies on the DT Hub. We hope this window into our current thinking, which is by no means definitive, has given you a good sense of where the positive disruption will come from. We are happy to see our claims challenged, so please do share your thoughts and ask questions.