In the last decade, high-throughput approaches have generated comprehensive and accurate datasets on the properties and structures of known and hypothetical materials. This information is typically curated, aggregated and stored in ever-growing databases, containing both computed and experimentally measured properties, allowing either public or commercial access. The availability of this data has opened new opportunities for the data-driven design of new materials to meet technological and environmental challenges, using machine learning and other informatics techniques.
Most databases can be queried online through an application programming interface (API), which allows the user to filter the data that is returned. Since each database may cover different material families and properties, it is beneficial to have access to information from multiple databases. This can be difficult, however, because each database has a different API and returns a different representation, or format, of the underlying materials data. To retrieve information about materials and filter a database, the user typically submits a query to that database via a URL. From one database to another, these queries vary in both format and strategy, and so do the response formats. Users who wish to fully exploit the available resources must therefore become expert in many different APIs, each with its own subtle conventions.
For all these reasons, the developers of various databases gathered on several occasions for a series of workshops entitled “Open Databases Integration for Materials Design”, hence the acronym OPTIMADE. These were held:
- at the Lorentz Center in Leiden, the Netherlands, in October 2016
- at CECAM in Lausanne, Switzerland, in June 2018, 2019, 2020 (online), and 2021 (online)
The participants developed the open-source OPTIMADE API specification, of which version 1.0 was released in July 2020. The specification provides guidelines for hosting materials data online, such that queries and response formats are transferable across multiple databases. The OPTIMADE consortium also supports and creates tools for both querying (clients) and providing data (servers).
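To illustrate what a transferable query means in practice, the following minimal sketch builds the same structures query for two providers. The `/v1/structures` endpoint and the `filter`, `response_fields` and `page_limit` query parameters come from the OPTIMADE specification; the provider names and base URLs are hypothetical placeholders, and the helper `build_structures_url` is an illustrative name rather than part of any official client.

```python
from urllib.parse import quote, urlencode


def build_structures_url(base_url, filter_expr, response_fields=(), page_limit=None):
    """Return an OPTIMADE /v1/structures query URL for a single provider."""
    params = {"filter": filter_expr}
    if response_fields:
        # Ask the server to return only the listed properties.
        params["response_fields"] = ",".join(response_fields)
    if page_limit is not None:
        params["page_limit"] = str(page_limit)
    # Percent-encode spaces and quotes so the filter survives as one parameter.
    query = urlencode(params, quote_via=quote)
    return f"{base_url.rstrip('/')}/v1/structures?{query}"


# Hypothetical base URLs, for illustration only (assumption, not real providers).
PROVIDERS = {
    "provider_a": "https://provider-a.example.org/optimade",
    "provider_b": "https://provider-b.example.org/",
}

# One filter string, written once: binary compounds containing both Si and O.
FILTER = 'elements HAS ALL "Si","O" AND nelements=2'

urls = {
    name: build_structures_url(base, FILTER, ["chemical_formula_reduced"], page_limit=5)
    for name, base in PROVIDERS.items()
}

for name, url in urls.items():
    print(name, url)
```

Only the base URL changes between providers; the filter and the requested fields stay the same, which is the point of a shared specification.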
The latest OPTIMADE API specification, v1.0 (Scientific Data: https://doi.org/10.1038/s41597-021-00974-z, Zenodo: https://zenodo.org/record/4195051), has been implemented by many leading crystal structure databases, which together form a web of linked OPTIMADE providers, namely: AFLOW, COD, TCOD, Materials Cloud, Materials Project, NOMAD, odbx, Open Materials Database (omdb), and OQMD. The API gives researchers standardized access to over 10,000,000 results for different materials, enabling benchmarking whilst opening up substantial opportunities for screening and machine learning studies.
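Because the response format is shared as well, results from several providers can be combined with a single piece of parsing code. The sketch below assumes the JSON:API-style layout used by OPTIMADE responses (a top-level `data` list whose entries carry `id` and `attributes`); the two sample responses are made-up placeholders rather than real database output, and `collect_formulas` is a hypothetical helper name.

```python
def collect_formulas(responses):
    """Merge structure entries from several provider responses.

    `responses` maps a provider name to an already-decoded JSON response.
    Returns a list of (provider, entry id, reduced formula) tuples.
    """
    merged = []
    for provider, response in responses.items():
        for entry in response.get("data", []):
            attributes = entry.get("attributes", {})
            # A provider may omit a field; None marks it as missing.
            formula = attributes.get("chemical_formula_reduced")
            merged.append((provider, entry["id"], formula))
    return merged


# Made-up sample responses standing in for two providers (assumption).
sample_responses = {
    "provider_a": {
        "data": [
            {"type": "structures", "id": "a-001",
             "attributes": {"chemical_formula_reduced": "O2Si"}},
        ],
        "meta": {"data_returned": 1},
    },
    "provider_b": {
        "data": [
            {"type": "structures", "id": "b-17",
             "attributes": {"chemical_formula_reduced": "O2Si"}},
            {"type": "structures", "id": "b-42", "attributes": {}},
        ],
        "meta": {"data_returned": 2},
    },
}

merged = collect_formulas(sample_responses)
for row in merged:
    print(row)
```

Keeping the provider name alongside each entry identifier matters because identifiers are only guaranteed to be meaningful within the database that issued them.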
The OPTIMADE API is flexible and extensible by design, so that it can cover more use cases going forward. The development and adoption of the OPTIMADE API rely on the involvement of a large number of scientists, so contributions from the community are strongly encouraged. Questions on development, registration of a provider, or usage can be directed to the project’s homepage and GitHub repositories. The future development of scientific APIs with well-defined standards, including OPTIMADE, should herald an era of effective use of big, open data in materials science and beyond.