The BiCIKL project was born from the vision that biodiversity data are most useful when they are presented as a network that can be integrated and viewed from different starting points. BiCIKL’s goal is to realise that vision by linking biodiversity data infrastructures, particularly those for literature, molecular sequences, specimens, nomenclature and analytics. We have formulated five high-level recommendations for infrastructures to improve their interoperability. These guidelines are a shortened version of the more detailed paper Recommendations for interoperability among infrastructures.
Use of data brokers
Although data infrastructures can be linked directly to each other, the incentive for bi-directional linking is often lacking, because it requires substantial trust and coordination from both parties. Third-party infrastructures that act as data brokers, such as Wikidata, are a good alternative. Data brokers add particular value when multiple identifier systems exist for the same entities.
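As an illustration of brokered linking, the sketch below asks the Wikidata Query Service for the item that carries a given GBIF taxon ID (Wikidata property P846) and returns its NCBI Taxonomy ID (property P685) if one is recorded. The choice of these two identifier systems is an assumption made for the example, and the function names are hypothetical. Neither is prescribed by the recommendations.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def build_query(gbif_id: str) -> str:
    """Build SPARQL that finds the item with this GBIF ID and its NCBI ID."""
    # Assumption: GBIF taxon IDs are numeric; this also blocks query injection.
    if not gbif_id.isdigit():
        raise ValueError("GBIF taxon IDs are expected to be numeric")
    return (
        "SELECT ?item ?ncbi WHERE { "
        f'?item wdt:P846 "{gbif_id}" . '
        "OPTIONAL { ?item wdt:P685 ?ncbi . } "
        "}"
    )


def parse_bindings(result: dict) -> list[dict]:
    """Flatten SPARQL JSON results into plain dicts of variable -> value."""
    rows = []
    for binding in result["results"]["bindings"]:
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


def lookup(gbif_id: str) -> list[dict]:
    """Run the query against Wikidata (requires network access)."""
    url = WIKIDATA_SPARQL + "?" + urllib.parse.urlencode(
        {"query": build_query(gbif_id), "format": "json"}
    )
    request = urllib.request.Request(
        url, headers={"User-Agent": "broker-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as response:
        return parse_bindings(json.load(response))
```

Neither infrastructure needs to know about the other here. Each one only has to keep its own identifier up to date on the broker.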
Building communities and trust
Building communities and trust is an essential aspect of interoperability that goes beyond purely technical issues. Prioritising the possibility to report issues is vital for building trust in a community of users. Since users not only use the infrastructure but also actively contribute to it, it should be easy for them to report and discuss issues. A possible solution is an open-source code base, which allows users to resolve their issues within the community. This builds engagement, avoids infrastructures being reinvented, supports both technical and social innovation, and is inclusive. GitHub or similar platforms are often used for this purpose.
Cloud computing as a collaborative tool
Cloud computing makes it possible to purchase computation and storage resources for a set period of time without having to manage physical hardware. An important aspect of cloud computing is the ability to collaborate. A further asset is that cloud infrastructures often offer services built on massive-scale machine-learning implementations. Infrastructures can use these state-of-the-art services to enrich the data they serve and to make links to other infrastructures. In this way they benefit from a scale they could not achieve on their own.
Community standards
Applying and complying with community standards greatly benefits interoperability. This can be done by using common terms, controlled vocabularies, and data models. However, users often run into the limitations of standards when working with real data. Standards therefore need a way to receive feedback and evolve. To be useful to the whole community, standards should be developed by a broad community.
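One way to picture the use of common terms and controlled vocabularies is a small mapping step that renames local fields to the terms of a shared standard and checks values against a vocabulary. The sketch below uses a handful of Darwin Core terms as the example standard. The local field names, the mapping and the function name are hypothetical, and the vocabulary shown is only a subset.

```python
# Hypothetical mapping from one infrastructure's local field names
# to Darwin Core terms (used here only as an example standard).
LOCAL_TO_STANDARD = {
    "species": "scientificName",
    "collected_on": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "record_type": "basisOfRecord",
}

# A subset of the values Darwin Core recommends for basisOfRecord.
BASIS_OF_RECORD = {
    "PreservedSpecimen",
    "FossilSpecimen",
    "LivingSpecimen",
    "HumanObservation",
    "MachineObservation",
    "MaterialSample",
}


def standardise(record: dict) -> tuple[dict, list[str]]:
    """Rename fields to standard terms and collect any problems found."""
    mapped, problems = {}, []
    for field, value in record.items():
        term = LOCAL_TO_STANDARD.get(field)
        if term is None:
            # Keep track of data the standard cannot express yet.
            problems.append(f"no standard term for field '{field}'")
            continue
        mapped[term] = value
    basis = mapped.get("basisOfRecord")
    if basis is not None and basis not in BASIS_OF_RECORD:
        problems.append(f"'{basis}' is not in the basisOfRecord vocabulary")
    return mapped, problems
```

The function reports unmapped fields and out-of-vocabulary values instead of silently dropping them. Those are the cases where real data hit the limits of a standard, so they are the natural input for feedback to its maintainers.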
Modalities of access
Biodiversity research infrastructures should aim to provide open data that can be used in any way the user wants. Providing as many modalities of access as possible ensures that the use of the data is not limited. The following key modalities of access were identified in BiCIKL:
- Browsing data via a web portal, which allows users to evaluate what data are available in an infrastructure, in what format, and with what quality and structure.
- Programmatic access via an API, which gives software a simple way to query and retrieve data.
- Downloading data to be used locally, which allows large amounts of data to be analysed.
- Personal requests for unique sets of data, which is sometimes necessary but is an inefficient way to provide data.
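For the programmatic modality, access usually means paging through results. The following is a minimal sketch of a client for a hypothetical JSON API that accepts `offset` and `limit` parameters and returns `{"results": [...], "endOfRecords": true|false}`. The endpoint URL, the parameter names and the response shape are all assumptions for illustration and do not describe any particular infrastructure.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# Hypothetical endpoint; replace with a real infrastructure's API.
BASE_URL = "https://api.example.org/v1/occurrences"


def fetch_page(params: dict) -> dict:
    """Fetch one page of JSON from the (hypothetical) API."""
    url = BASE_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_records(
    query: dict,
    page_size: int = 100,
    fetch: Callable[[dict], dict] = fetch_page,
) -> Iterator[dict]:
    """Yield every record matching `query`, one page at a time."""
    offset = 0
    while True:
        page = fetch({**query, "offset": offset, "limit": page_size})
        results = page.get("results", [])
        yield from results
        # Stop when the server reports the end or a page comes back empty.
        # A missing flag is treated as the end, to avoid looping forever.
        if page.get("endOfRecords", True) or not results:
            break
        offset += len(results)
```

The `fetch` argument can be swapped for a stub, so the paging logic can be tested without network access. For very large result sets, a bulk download is likely to be the better modality than paging through an API.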