IMF Pathway Consultation Questions
In the responses to the consultation on the Pathway towards an Information Management Framework, a number of direct questions were raised about the approach. We have attempted to answer them here.
How does the IMF pathway address the inclusion of the human level of the flourishing system and how people will interact and behave with an NDT?
The NDT programme is initially focused on critical national infrastructure, but it will extend to the services provided by that infrastructure, and so it does include the human level of flourishing, since in the end that is where the benefits arise.
Are all digital twins geospatial?
No. Plenty of Digital Twins are geospatial of course, but it is not a defining characteristic. The NDT will support multiple ways for a geolocation to be specified and conversions between them. Whether one becomes a preferred method is beyond the competence of the NDT programme itself.
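As a purely illustrative sketch (no technology choice is implied; the coordinate systems and the open-source pyproj library are simply assumptions for the example), converting between two common ways of specifying a geolocation might look like this:

    from pyproj import Transformer

    # Convert WGS84 latitude/longitude (EPSG:4326) into
    # British National Grid eastings/northings (EPSG:27700).
    to_bng = Transformer.from_crs("EPSG:4326", "EPSG:27700", always_xy=True)

    lon, lat = -0.1276, 51.5072  # illustrative coordinates in central London
    easting, northing = to_bng.transform(lon, lat)
    print(f"easting={easting:.0f}, northing={northing:.0f}")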
Is the intention to adopt existing data library structures?
No. In the IMF, Reference Data Libraries are an extension of the Foundation Data Model, not something separate, so they will need to be tightly integrated with the FDM in order to extend it with the same quality. It may be that some existing RDLs already conform to the as yet undefined FDM, but that seems unlikely.
Some options are to improve the quality of existing RDLs that are relevant to the NDT so that they do conform to the standards we develop, or to treat them in the same way as any other data source and build an interface to them so that they can be accessed (as far as possible) seamlessly.

How will the NDTp address the lack of direct benefits, or that the benefit will be to 3rd parties at significant cost to asset owners?
The question here should really be about the lack of benefits to some of those contributing to the NDT, where there is a cost in doing so.
Much use of the NDT will be data sharing as part of collaboration between organizations, or in fulfilment of contractual obligations (handing over design data). Other data handover will be to meet regulatory requirements. As an example, consider planning applications. At present, planning application data has to be handed over to councils, but this is usually done as an "electronic paper" exercise. You do not, however, get your planning application considered without handing the data over. The NDT will support this being handed over as data, along with other data such as building control data. Whilst there will be a cost in changing how this is done, in the longer term it will be cheaper and will provide data that can help with problems like finding which other tower blocks had the same cladding as Grenfell Tower.

How can we be sure data can be trusted?
The NDT programme will be establishing quality standards for data that is part of the NDT. As a minimum it will be that the provenance is known, and the claimed quality is stated. The aim is for something that is like the mains water system, where water is treated before it enters the system, so that it can be drunk by anyone who turns on a tap, rather than allowing anything to enter the mains system and leaving each consumer to treat the water at the tap.
Your data being accepted as part of the NDT should be seen as a badge of quality that regulators, business partners and customers will insist on.

How do we avoid reasoning ourselves into a prohibitive level of risk aversion?
When managing risk, you should consider the risk of not making data available to those who need it as much as the risk of it falling into the hands of those who could use it to do harm. As a result, the NDT programme will take a balanced approach to security that reflects the need for access by those with legitimate uses for data, but also respects the right of the owners of data, which may have commercial or national sensitivity, to control access to it (which may include paying for access to data owned by organizations such as OS).
If the FDM is distinct from the ontology then what is it, and what form does it take?
This is related to the difference between ontology and epistemology. Ontology is about how the world is, whilst epistemology is about what we know. It is good for our FDM to be based on a well-defined ontology, because that helps to make our data consistent. However, our data model only holds what we know. For example, our ontology might say that everyone has a mother, but just because everyone has a mother does not mean we know who everyone's mother is; if it did, our database would have to go back to Adam and Eve. So our FDM needs to support people having mothers, but not insist that we know who they are.
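A minimal sketch of the point (the class below is hypothetical, not actual FDM content): the data model allows a mother to be recorded without insisting that one is known.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Person:
        name: str
        # The ontology says every person has a mother, but the data model
        # only records a mother where one is actually known.
        mother: Optional["Person"] = None

    eve = Person("Eve")
    cain = Person("Cain", mother=eve)
    stranger = Person("Visitor")  # a mother exists, but is not recorded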
What are the potential negative consequences of different resources refreshing at different rates when we are trying to make in-the-moment decisions?
What is important to understand about the timeliness of data is that it is related to the time constant of the system the data relates to, where the time constant is the time it takes for a perturbation to the system to manifest as a change in the system's state. If you are looking at the control surfaces of an aircraft, that might be fractions of a second; if it is the UK economy's response to tax changes, it might be years.

What matters is that data is kept up to date relative to the time constant of the system in question. Something like one tenth of the time constant is plenty good enough.
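As a rough, purely illustrative sketch (the systems and figures are hypothetical), checking whether a data feed is fresh enough relative to the time constant of the system it describes might look like this:

    # Hypothetical time constants, in seconds, for systems at very different scales.
    TIME_CONSTANTS = {
        "aircraft control surfaces": 0.2,
        "urban traffic flows": 15 * 60,
        "economy's response to tax changes": 2 * 365 * 24 * 3600,
    }

    def fresh_enough(data_age: float, time_constant: float, fraction: float = 0.1) -> bool:
        """Data is treated as fresh if it is younger than a small fraction
        (here one tenth) of the system's time constant."""
        return data_age <= fraction * time_constant

    print(fresh_enough(1.0, TIME_CONSTANTS["aircraft control surfaces"]))  # False: far too stale
    print(fresh_enough(60.0, TIME_CONSTANTS["urban traffic flows"]))       # True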
Will the NDT provide us with the ability to stop our data being used in ways that are unethical or against the interests of our organisation or customers?
There are limits to what technology can do. However, what we will be providing is that the owners of information authorize who can use their data. This gives the opportunity for authorization to be subject to a contract that can specify the uses to which the data can be put. Whilst this does not of itself prevent misuse of data, it does at least provide a remedy which puts data owners in much the same position as they currently are when they share data.
It reads as if the DD is a platform for data sharing where I thought the intention was to enable simple and more challenging questions to be asked.
It is the sharing of data that is a key step towards enabling challenging questions to be asked.
It appears to be starting afresh, with a delivery window of +20 years by which time many organisations will have created their own digital twins.
We will not have to wait 20 years for the NDT; it will be delivered incrementally in thin slices over a period of time. Hopefully, as people develop their Digital Twins they will choose to make them part of the NDT, partly because that adds value for them, and partly because using the standards we develop will reduce the cost of developing a Digital Twin.
The IMF Pathway does not mention APIs, why?
The Pathway document does not mention any specific technology. Those choices are yet to be made. However, it is worth pointing out one way that APIs are used that we will not be doing. A popular use of APIs is for organizations to provide an API for access to some of their data by third parties, who can then access that data. This creates a point-to-point network of interfaces. The IMF specifies a hub and spoke approach to data integration where all providers of data supply their data according to the FDM/RDL, so that users of data only have one data model/RDL they need to be concerned with. This probably can be done using APIs, but it will require a new approach to how APIs are developed and used.
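A loose sketch of the difference (all field names are invented, and nothing here is actual FDM/RDL content): with point-to-point APIs every consumer writes an adapter per provider, whereas with hub and spoke each provider maps its data once onto the shared FDM/RDL and consumers only ever handle one model.

    # Point-to-point: each consumer adapts each provider's native model,
    # so the number of interfaces grows with consumers x providers.
    def adapt_provider_a(raw: dict) -> dict:
        return {"asset_id": raw["AssetRef"], "condition": raw["CondCode"]}

    def adapt_provider_b(raw: dict) -> dict:
        return {"asset_id": raw["ID"], "condition": raw["State"]}

    # Hub and spoke: each provider publishes data already expressed against the
    # shared model, so consumers see a single, common shape.
    FDM_ASSET_FIELDS = {"asset_id", "condition"}  # stand-in for the agreed FDM/RDL

    def provider_a_publish(raw: dict) -> dict:
        record = adapt_provider_a(raw)
        assert set(record) == FDM_ASSET_FIELDS  # conformance is the provider's job
        return record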
We would like to share our recommended practice with CDBB - please advise how we share this with you.
You can speak to the IMF Technical Development team through the IMF Community on the DT Hub.
a) Data is exchanged on an enduring basis;
b) Data is exchanged on request for a disclosed purpose, on the trust that it will be deleted afterwards;
c) The result of a query / calculation is exchanged, but the data remains distributed.
In these three options there are different intrinsic levels of protection; to what extent will these be supported by the IA and contractual framework?
The Integration Architecture supports providing access to authorized users. The authorization process provides the opportunity to put in place a contract that governs how the data is used. This does not necessarily prevent misuse, but it does provide a remedy in much the same way as would happen with data sharing by other means. Therefore a) and b) are just ordinary uses of the IA, and c) is a case where a calculation result is shared rather than the input data to that calculation.
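As a loose illustration of option c) (the data and the query are invented for the example), the calculation runs where the data is held and only the result is exchanged:

    # Held by the data owner; these records are never exported.
    bridge_inspections = [
        {"bridge": "B101", "defect_score": 2},
        {"bridge": "B102", "defect_score": 7},
        {"bridge": "B103", "defect_score": 4},
    ]

    def count_bridges_above(threshold: int) -> int:
        """Run the calculation locally and return only the result."""
        return sum(1 for r in bridge_inspections if r["defect_score"] > threshold)

    # The consumer receives the number 1, not the underlying inspection records.
    print(count_bridges_above(5))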
How, and at what level, will individuals and organizations be identified and linked to data? For example: who owns an object, who is using an object, who is responsible for maintenance, who certified that X was true? Is there a single registry of people and organizations? How is this managed?
Data provenance and data quality certification are part of the data that will be held about the payload data. That this is available will be part of the quality requirements for data to be part of the NDT.
Will data lineage be managed by the IA (how did you get to this and where did your input data come from), commercial elements (who is using my data for what, can I deduce its value from this?) and the data equivalent of 'recalls' (how do I let people who have used my data know that the model / sensor etc. that generated it is/was faulty?)
The IA will provide QA of the data you use, but not of what you do with it; that is your responsibility. However, if you add the resulting data back into the NDT, it will be your responsibility to provide that assurance for it.
The authorization process will mean that you know who has used your data (even if it is made publicly available, the minimum security is that it is only made available to registered users), so it will be possible to follow up with people who are using your data if necessary.

The IA definition includes support for two data exchange mechanisms (query and messaging); have other mechanisms such as file sharing been considered?
Yes. The Integration Architecture sets an ambition for what we wish to achieve. It is not a product you can currently purchase, and there may be problems that mean we have to lower our ambition, or reach it in stages.
There is nothing in the IMF that prevents file exchange methods being used to share data between parties using just the FDM/RDL. Clearly though, this will not be on the IMF catalogue, and will therefore not be discoverable.

What about measurement – how does the FDM deal with various definitions of measurement, or is this handled within the RDL?
Metrology is a surprisingly difficult area to pin down, given how widely used measurements are in the world of engineering. The FDM will have a data model for measurement (and other ways of ascribing properties), but the details of quantity types, scales, and units of measure will be handled at the RDL level.
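A minimal sketch of that split (the class and the reference data below are hypothetical, not actual FDM or RDL content): a generic measurement structure sits in the FDM, while the particular quantity kinds, scales and units it refers to are RDL entries.

    from dataclasses import dataclass

    # Hypothetical RDL-level reference data: units with their quantity kinds and scales.
    RDL_UNITS = {
        "metre":  {"quantity_kind": "length",      "scale": "ratio"},
        "kelvin": {"quantity_kind": "temperature", "scale": "ratio"},
        "degC":   {"quantity_kind": "temperature", "scale": "interval"},
    }

    # Hypothetical FDM-level structure: how a measured property is ascribed to a thing,
    # independent of which particular unit or quantity kind is used.
    @dataclass
    class Measurement:
        of_asset: str   # the thing the property is ascribed to
        value: float
        unit: str       # reference to an RDL unit-of-measure entry

    span = Measurement(of_asset="Bridge B101 main span", value=32.5, unit="metre")
    assert span.unit in RDL_UNITS  # units must resolve to RDL reference data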
Does the RDL presume a need to handle natural language interaction?
The RDL will have names that are human readable, but interaction with the RDL will be via applications, and natural language interaction will be part of those applications. The NDT programme is likely to develop a reference implementation of an RDL management system, and if it does, that system would need to support natural language (search engine type) queries among other requirements.
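As a small, purely illustrative sketch (the entry names are invented), a search-engine style query against human-readable RDL names could be as simple as fuzzy matching:

    import difflib

    # Hypothetical human-readable RDL entry names.
    rdl_names = ["centrifugal pump", "circulation pump", "heat exchanger",
                 "isolation valve", "pressure relief valve"]

    # A loose, search-engine style lookup over the names.
    print(difflib.get_close_matches("circulating pump", rdl_names, n=2, cutoff=0.5))
    # e.g. ['circulation pump']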
Is the maintenance of the RDL the sole responsibility of "the information management commons"?
No. The ambition is that the RDL content is provided by authoritative sources in the field that data originates from, but that it conforms to the standards defined by the IMF.
Is there an assumption that data owners publicly publish digital twins, as the authorisation layer is not explicitly linked in the same way as consumer actors?
No data from the National Digital Twin will be available to anonymous users. Even data that is openly available will only be available to registered users, and access to the data will be logged.
Does the authorisation layer need any specific consideration of law enforcement use cases at this stage?
We have already considered what we think is the minimum data we need to hold in order to trace bad actors using the NDT. Doubtless this will need refining as things develop.
In Task 2, is the set of digital twins for use in developing further deliverables to be made open?
The Digital Twins that are part of the corpus used to check the FDM/RDL are unlikely to be made open simply because they have been used for that purpose. Some of them will already be open, of course; whether others become open will be a matter for the data owners.
Considering existing data standardisation processes, would the quality of the data facilitate the development of the IMF?
Data, existing data standards and ontologies are all examples of requirements the FDM/RDL will seek to support. The higher the quality of the data, the easier it is to establish the requirements.
Can you please provide examples of some ontologies you would use? How would the FDM use national and international standards?
We will look at many ontologies/data models as sources of requirements (see the TLO review and the IDM review). However, since in many cases these sources are inconsistent with each other, it is unlikely we will adopt any of them directly. Rather, we will create an FDM that can consistently meet the requirements across the different sources and, where there is a standard with a body of data using it, provide a mapping to that standard.
Do you have any plans or measures in place to encourage organisations to use the “Commons”?
The Commons is just one stream on the NDT Roadmap. Others, in particular the Change Stream, are more focused on adoption.
How would you track progress and changes?
We produce regular reports on the work we are doing for those outside the programme, and obviously monitor our progress against deliverables within the programme. You can also find updates through the weekly Gemini Call. However, this is not a conventional project, and it has more in common with Christopher Columbus setting off to discover the (hoped-for) New World than the average IT project.
What are the requirements for a digital twin to be compliant?
These are not fully worked out yet, but will include:
1. Compliance to the FDM/RDL.
2. Provenance of the data being provided.
3. A statement of quality for the data provided (what quality it has).
Essentially, what it takes for the data to be trusted.
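As a loose illustration of the kind of record these requirements point towards (every field name below is hypothetical; the actual criteria are still being worked out), a dataset offered to the NDT might carry metadata along these lines:

    # Hypothetical metadata accompanying a dataset offered to the NDT.
    dataset_metadata = {
        "conforms_to": "FDM/RDL",                 # 1. compliance with the FDM/RDL
        "provenance": {                           # 2. provenance of the data
            "provider": "Example Water Co.",
            "source_system": "asset register",
            "extracted": "2021-03-01",
        },
        "quality_statement": {                    # 3. stated quality of the data
            "completeness": "all assets installed since 1990 included",
            "positional_accuracy_m": 0.5,
        },
    }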