Unifying Disparate Data with Ontologies


Data classification can be broadly defined as the process of organizing data points into a systematic, hierarchical structure called a taxonomy. A well-defined taxonomy is both mutually exclusive (each data point belongs to exactly one category) and collectively exhaustive (every data point belongs to some category). So how do we traverse two datasets, each with its own taxonomy, using a single unified system of categorization?

There are two obvious approaches. One is to append the taxonomies of the two datasets into one super taxonomy; the other is to map the weaker taxonomy into the stronger one, category by category. The first leaves you with duplicate categories, and the second is time-consuming and begins to break down the hierarchical integrity of your taxonomies.
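To make the trade-off concrete, here is a minimal sketch of both approaches. The category names and the `unmapped` helper are hypothetical, chosen only for illustration; they do not come from the original article.

```python
# Hypothetical category lists for two datasets (illustrative only).
taxonomy_a = ["Laptops", "Desktop Computers", "Monitors"]
taxonomy_b = ["Notebook Computers", "Desktop PCs", "Printers"]

# Approach 1: append both taxonomies into one "super taxonomy".
# "Laptops" and "Notebook Computers" name the same concept, yet both
# survive as separate categories, so the result contains duplicates.
super_taxonomy = taxonomy_a + taxonomy_b

# Approach 2: hand-map each category of B onto a category of A.
# None marks a category of B that has no counterpart in A.
b_to_a = {
    "Notebook Computers": "Laptops",
    "Desktop PCs": "Desktop Computers",
    "Printers": None,
}


def unmapped(mapping):
    """Return the source categories that have no target category."""
    return sorted(src for src, dst in mapping.items() if dst is None)
```

In this toy case the appended list holds six categories for only four distinct concepts, while the hand-built mapping needs one entry per category and still leaves "Printers" without a home in taxonomy A.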

A better solution is to find a universal source-of-truth taxonomy expressed in commonly understood industry vocabulary, such as an ontology, into which all other taxonomies can be mapped.

Ontologies are frameworks for common industry vocabulary. They include entity types along with both vertical relationships (hierarchical taxonomies) and horizontal relationships (cross-entity). As an industry-standard hierarchy of entities, an ontology can serve as common ground into which each category of each taxonomy can be mapped, which makes it a better way to traverse two datasets under a unified system of categorization.
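As a rough sketch of the idea, the snippet below maps the categories of two datasets onto a small shared ontology and then queries both datasets through it. Everything here is an assumption made for illustration: the concept names, the dataset labels, and the helper functions `ancestors`, `unify`, and `items_under` are hypothetical, and only vertical (parent-child) relationships are modeled.

```python
from collections import defaultdict

# Hypothetical ontology: concept -> parent concept (None marks the root).
ONTOLOGY_PARENT = {
    "Product": None,
    "Computer": "Product",
    "Laptop": "Computer",
    "Desktop": "Computer",
    "Peripheral": "Product",
    "Printer": "Peripheral",
    "Monitor": "Peripheral",
}

# Each dataset maps its own category labels onto ontology concepts.
MAPPINGS = {
    "dataset_a": {
        "Laptops": "Laptop",
        "Desktop Computers": "Desktop",
        "Monitors": "Monitor",
    },
    "dataset_b": {
        "Notebook Computers": "Laptop",
        "Desktop PCs": "Desktop",
        "Printers": "Printer",
    },
}


def ancestors(concept):
    """Return the path from a concept up to the root, inclusive."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = ONTOLOGY_PARENT[concept]
    return path


def unify(records):
    """Group (dataset, category, item) records by ontology concept."""
    grouped = defaultdict(list)
    for dataset, category, item in records:
        grouped[MAPPINGS[dataset][category]].append(item)
    return dict(grouped)


def items_under(records, concept):
    """Return items mapped to `concept` or to any of its descendants."""
    return [
        item
        for dataset, category, item in records
        if concept in ancestors(MAPPINGS[dataset][category])
    ]
```

For example, with records such as `("dataset_a", "Laptops", "A-1")` and `("dataset_b", "Notebook Computers", "B-7")`, both items land under the single concept "Laptop", and a query for "Computer" returns them together. Neither source taxonomy has to be altered; each one only needs its own mapping into the shared ontology.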

Click here to read the original article published by Tamr.
