Databases often have overlapping content, or hold different data about the same entities. Each database provider also coins its own public identifiers, so a single entity may have multiple identifiers across databases. Consequently, anyone analyzing data from different databases has to perform a crosswalk, or mapping, between them. The mapping process itself, however, is full of challenges.
For example, a given pair of databases may have several mappings created by different providers, and what a mapping means may be interpreted differently from one provider to the next. This is particularly challenging when building a knowledge graph from multiple sources, where information about an entity must be merged.
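To make the merging problem concrete, here is a minimal sketch of how identifiers linked by mappings could be grouped into a single entity when building a knowledge graph. It uses a simple union-find over mapping pairs. The identifiers and the function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical mapping pairs; these identifiers are illustrative only.
mappings = [
    ("DB_A:001", "DB_B:x17"),
    ("DB_B:x17", "DB_C:9"),
    ("DB_A:002", "DB_C:4"),
]

def find(parent, x):
    """Return the representative of x's group, shortening the path as we go."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def merge_identifiers(pairs):
    """Group identifiers that are connected by any chain of mappings."""
    parent = {}
    for a, b in pairs:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a
    groups = defaultdict(set)
    for ident in list(parent):
        groups[find(parent, ident)].add(ident)
    return sorted(sorted(group) for group in groups.values())

clusters = merge_identifiers(mappings)
print(clusters)
```

Note that this sketch treats every mapping as a strict equivalence. If providers interpret their mappings differently, chaining them this way can merge entities that should stay separate, which is exactly the difficulty described above.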
Mappings are critical for ontologies too, as shown by the existence of an entire field devoted to ontology alignment and matching. In theory, ontologies should be able to make the meaning of mappings explicit. In practice, however, there are multiple alternative ways to state the same thing (OWL logical expressions, SKOS, and classic loose bioinformatics ‘dbxrefs’).
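As a rough illustration of those alternatives, the sketch below writes one hypothetical cross-reference three ways, as plain (subject, predicate, object) triples. The predicates `owl:equivalentClass`, `skos:exactMatch`, and `oboInOwl:hasDbXref` are real vocabulary terms; the subject and object identifiers are made up, and the helper function is an assumption for the example.

```python
# Hypothetical identifiers, used only to show the three styles side by side.
SUBJECT = "EX_A:001"
OBJECT = "EX_B:17"

statements = [
    (SUBJECT, "owl:equivalentClass", OBJECT),   # OWL logical axiom
    (SUBJECT, "skos:exactMatch", OBJECT),       # SKOS mapping property
    (SUBJECT, "oboInOwl:hasDbXref", OBJECT),    # loose dbxref annotation
]

def predicates_for(subject, obj, triples):
    """List every predicate used to link the same pair of identifiers."""
    return sorted(p for s, p, o in triples if s == subject and o == obj)

print(predicates_for(SUBJECT, OBJECT, statements))
```

A consumer that sees only one of these forms cannot tell whether the other two were intended, which is why the same pair of identifiers can end up carrying three differently interpreted links.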
One way of avoiding the mapping issue is to reuse existing ontologies and their concepts, including their identifiers/URIs. This practice is promoted by the Open Bio Ontologies (OBO) project.
What is needed is a standard exchange format for mappings. The Simple Standard for Sharing Ontological Mappings (SSSOM), developed by OBO, provides a standard way to describe rich information about mappings, but it is still at a nascent stage.
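To give a feel for what such an exchange format looks like, here is a minimal sketch that reads a small tab-separated mapping table and filters it. The column names (`subject_id`, `predicate_id`, `object_id`, `mapping_justification`, `confidence`) follow the SSSOM specification as I understand it; the rows, identifiers, and confidence values are invented for the example, and the commented metadata header that real SSSOM files usually carry is omitted.

```python
import csv
import io

# Illustrative rows only: identifiers and confidence values are made up.
SSSOM_TSV = (
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\tconfidence\n"
    "EX_A:001\tskos:exactMatch\tEX_B:17\tsemapv:ManualMappingCuration\t0.95\n"
    "EX_A:002\tskos:broadMatch\tEX_B:20\tsemapv:LexicalMatching\t0.60\n"
)

def read_mappings(text):
    """Parse a tab-separated mapping table into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        row["confidence"] = float(row["confidence"])
        rows.append(row)
    return rows

def exact_matches(rows, min_confidence=0.8):
    """Keep only exact matches at or above the given confidence."""
    return [
        r for r in rows
        if r["predicate_id"] == "skos:exactMatch"
        and r["confidence"] >= min_confidence
    ]

rows = read_mappings(SSSOM_TSV)
for r in exact_matches(rows):
    print(r["subject_id"], r["predicate_id"], r["object_id"])
```

Because the predicate and the justification travel with each row, a consumer can decide which mappings are strong enough to use for merging rather than treating them all alike.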
Mappings between databases are necessary. What looks like a straightforward matter of standardizing their representation becomes complicated because of differences in how IDs/URIs are written and in the degree of semantic strength a mapping carries. Once there is a standard for sharing mappings, redundant mapping efforts can be reduced and the relationship each mapping asserts will be clear.
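The point about identifiers being written differently can be sketched as a small normalization step. The example below converts OBO-style PURLs and loosely written prefixed identifiers to a single `PREFIX:LOCAL` form. The PURL base is the standard OBO one; the prefix table and the function name `to_curie` are assumptions for illustration, and a real pipeline would need a much fuller prefix registry.

```python
# Standard OBO PURL base, in both http and https spellings.
OBO_PURL_BASES = (
    "http://purl.obolibrary.org/obo/",
    "https://purl.obolibrary.org/obo/",
)

# Hypothetical, deliberately tiny table of canonical prefix spellings.
CANONICAL_PREFIXES = {"uberon": "UBERON", "go": "GO", "fbbt": "FBbt"}

def to_curie(identifier):
    """Normalize an OBO PURL or a loosely written CURIE to PREFIX:LOCAL."""
    ident = identifier.strip()
    for base in OBO_PURL_BASES:
        if ident.startswith(base):
            # OBO PURLs separate prefix and local ID with an underscore.
            prefix, sep, local = ident[len(base):].partition("_")
            break
    else:
        prefix, sep, local = ident.partition(":")
    if not sep or not prefix or not local:
        raise ValueError(f"cannot parse identifier: {identifier!r}")
    # Unknown prefixes are passed through unchanged.
    canonical = CANONICAL_PREFIXES.get(prefix.lower(), prefix)
    return f"{canonical}:{local}"

for raw in ["http://purl.obolibrary.org/obo/UBERON_0002107",
            "uberon:0002107",
            " FBBT:00000001 "]:
    print(to_curie(raw))
```

Normalizing the written form only addresses the first complication. The semantic strength of a mapping still has to be stated explicitly, for instance through the predicate recorded alongside each pair.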