General Interoperability Recommendations

April 7, 2023

The BiCIKL project was born from the vision that biodiversity data are most useful when they are presented as a network that can be integrated and viewed from different starting points. BiCIKL’s goal is to realise that vision by linking biodiversity data infrastructures, particularly those for literature, molecular sequences, specimens, nomenclature and analytics. We have formulated five high-level recommendations for infrastructures to improve their interoperability. These guidelines are a shortened version of the more detailed paper "Recommendations for interoperability among infrastructures".

Use of data brokers

Building communities and trust

Cloud computing as a collaborative tool

Standards

Modalities of access

Use of data brokers

Although data infrastructures can be linked directly to each other, the incentive for bi-directional linking is often lacking because it requires substantial trust and coordination on both sides. Third-party infrastructures that act as data brokers, such as Wikidata, are a good alternative. Data brokers add particular value when multiple identification systems exist.

Recommendations

If direct linking cannot be supported between infrastructures, explore using data brokers to store links.
Cooperate with open linkage brokers to enable two-way links between infrastructures without having to coordinate these activities across many different organisations.
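
To illustrate how a broker can provide two-way links without the infrastructures coordinating directly, here is a minimal in-memory sketch. The `LinkBroker` class, the infrastructure names and the identifiers are all hypothetical and chosen for illustration only; a real broker such as Wikidata has its own data model and interfaces.

```python
from collections import defaultdict


class LinkBroker:
    """Toy third-party broker (hypothetical): stores identifiers per shared entity."""

    def __init__(self):
        # entity_id -> {infrastructure name -> identifier used by that infrastructure}
        self._links = defaultdict(dict)
        # (infrastructure name, identifier) -> entity_id, for reverse lookup
        self._index = {}

    def register(self, entity_id, infrastructure, identifier):
        """Each infrastructure registers its own identifier independently."""
        self._links[entity_id][infrastructure] = identifier
        self._index[(infrastructure, identifier)] = entity_id

    def resolve(self, infrastructure, identifier, target):
        """Return the identifier `target` uses for the same entity, or None."""
        entity_id = self._index.get((infrastructure, identifier))
        if entity_id is None:
            return None
        return self._links[entity_id].get(target)


broker = LinkBroker()
# Two infrastructures contribute links without talking to each other.
broker.register("entity-1", "specimens", "SPEC:0001")
broker.register("entity-1", "sequences", "SEQ:A123")

print(broker.resolve("specimens", "SPEC:0001", "sequences"))  # SEQ:A123
print(broker.resolve("sequences", "SEQ:A123", "specimens"))   # SPEC:0001
```

Because each infrastructure only writes its own identifier to the broker, a link becomes navigable in both directions as soon as both sides have registered, and neither needs to store or maintain the other's identifiers.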

Building communities and trust

This is an essential aspect of interoperability that goes beyond purely technical issues. Making it possible to report issues is vital for building trust in a community of users. Since users not only use the infrastructure but also actively contribute to it, reporting and discussing issues should be easy. One possible solution is an open-source code base, which allows users to resolve their issues within the community. This builds engagement, avoids infrastructures being reinvented, supports both technical and social innovation, and is inclusive. GitHub or similar platforms are often used for this purpose.

Recommendations

Provide cloud-based environments to allow external participants to contribute and test changes to features.

Consider the opportunities that cloud computing brings as a means to enable shared management of the infrastructure.

Promote the sharing of knowledge around big data technologies amongst partners, using cloud computing as a training environment.

Cloud computing as a collaborative tool

Cloud computing makes it possible to purchase computation and storage resources for a certain period of time without having to deal with physical hardware. An important aspect of cloud computing is the ability to collaborate. A further asset is that cloud infrastructures often offer services built on massive-scale machine-learning implementations. Infrastructures can use these state-of-the-art services to enrich the data they serve and to make links to other infrastructures. In this way they benefit from economies of scale that they could not achieve on their own.

Recommendations

Provide cloud-based environments to allow external participants to contribute and test changes to features.

Consider the opportunities that cloud computing brings as a means to enable shared management of the infrastructure.

Promote the sharing of knowledge around big data technologies amongst partners, using cloud computing as a training environment.

Standards

Applying and complying with community standards greatly benefits interoperability. This can be done by using common terms, controlled vocabularies, and data models. However, users often run into the limitations of standards when working with real data. Therefore, there should be a way for standards to receive feedback and evolve. To be useful to the whole community, standards should be developed by a broad community.

Recommendations

Invest in standards compliance and work with standards organisations to develop new standards and extend existing ones.

Report on and review standards compliance within an infrastructure with metrics that give credit for work on standards compliance and development.
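
As one possible form such a metric could take, the sketch below computes the share of records that contain a set of required terms and use only allowed values from a controlled vocabulary. The term names, the vocabulary and the sample records are hypothetical and stand in for whatever community standard an infrastructure actually follows.

```python
# Hypothetical required terms and controlled vocabulary, for illustration only.
REQUIRED_TERMS = {"identifier", "scientific_name", "record_type"}
CONTROLLED_VOCABULARY = {"record_type": {"specimen", "observation", "sequence"}}


def is_compliant(record):
    """A record complies if all required terms exist and vocabulary values are allowed."""
    if not REQUIRED_TERMS.issubset(record):
        return False
    return all(
        record[term] in allowed for term, allowed in CONTROLLED_VOCABULARY.items()
    )


def compliance_rate(records):
    """Fraction of compliant records; 0.0 for an empty collection."""
    if not records:
        return 0.0
    return sum(is_compliant(r) for r in records) / len(records)


records = [
    {"identifier": "a1", "scientific_name": "Apis mellifera", "record_type": "specimen"},
    {"identifier": "a2", "scientific_name": "Apis mellifera", "record_type": "photo"},
    {"identifier": "a3", "record_type": "observation"},
    {"identifier": "a4", "scientific_name": "Bombus terrestris", "record_type": "sequence"},
]

print(compliance_rate(records))  # 0.5
```

Here the second record fails because "photo" is not in the vocabulary and the third because a required term is missing. Tracking a rate like this over time is one simple way to make compliance work visible.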

Modalities of access

Biodiversity research infrastructures should aim to provide open data that can be used in any way the user wants. Providing as many different modalities of access as possible ensures that the use of data is not limited. The following key modalities of access were identified in BiCIKL:

Browsing data via a web portal, which allows users to evaluate what data are available in an infrastructure, in what format, and what their quality and structure are like.
Programmatic access via an API, which gives software simple access to data.
Downloading data to be used locally, which allows large amounts of data to be analysed.
Personal requests for unique sets of data, which is sometimes necessary but is an inefficient way to provide data.

Recommendations

Provide as many different modalities of access as possible.

Avoid requiring personal contacts to download data.

Provide a full description of an API and the data it serves.
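
A description of an API is most useful when it is machine-readable, so that responses can be checked against it. The following is a minimal sketch under that assumption. The endpoint path, the field names and the `check_response` helper are hypothetical and do not describe any particular infrastructure's API.

```python
# Hypothetical machine-readable description of one endpoint and its data.
API_DESCRIPTION = {
    "endpoint": "/records/{identifier}",
    "method": "GET",
    "response_fields": {
        "identifier": str,
        "scientific_name": str,
        "year_collected": int,
    },
}


def check_response(response, description=API_DESCRIPTION):
    """Return a list of mismatches between a response and its description."""
    fields = description["response_fields"]
    problems = []
    # Documented fields must be present and have the documented type.
    for name, expected_type in fields.items():
        if name not in response:
            problems.append(f"missing field: {name}")
        elif not isinstance(response[name], expected_type):
            problems.append(f"wrong type for {name}")
    # Anything served but not documented is also reported.
    for name in response:
        if name not in fields:
            problems.append(f"undocumented field: {name}")
    return problems


response = {
    "identifier": "a1",
    "scientific_name": "Apis mellifera",
    "year_collected": "1998",
    "collector": "unknown",
}

print(check_response(response))
# ['wrong type for year_collected', 'undocumented field: collector']
```

A check like this flags both data that deviate from the description and data that are served but never described, which are the two ways a description can fail to be "full".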
