Language riddled with double meanings, implications, misspellings, and slang makes word sense disambiguation extremely difficult. Under such circumstances, what can be done to address the problems associated with word sense disambiguation, especially homographs? Ahren Lehnert, Senior Manager, Text Analytics Solutions with Synaptica, LLC, an enterprise taxonomy and ontology management software and solutions provider, suggests that these challenges can be addressed by using a taxonomy as a source for disambiguation.
One of the simplest methods for disambiguating terms is to use a taxonomy to control multiple versions of a term. Within an organization, an enterprise-specific taxonomy can serve as a targeted source for term disambiguation. Modifying an existing taxonomy or building one from the ground up can be time-consuming and carries significant governance and ongoing maintenance overhead. However, that specificity can improve accuracy in both auto-categorization and search retrieval. In addition, a hierarchical, navigable taxonomy is a simple way to create and maintain homographs as distinct concepts within a model of organizational knowledge.
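To make the idea concrete, here is a minimal Python sketch, assuming a toy in-memory taxonomy in which each concept carries a preferred label, its position in the hierarchy, its variant labels, and a set of context keywords. The `Concept` class, the sample "bank" entries, and the keyword-overlap scoring are illustrative assumptions, not Synaptica's implementation or any product's API.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A taxonomy node: a preferred label, its variant labels, and context keywords."""
    preferred_label: str
    path: tuple[str, ...]  # position in the hierarchy, root first
    variants: set[str] = field(default_factory=set)
    context_keywords: set[str] = field(default_factory=set)


# Hypothetical taxonomy: the variant "bank" is a homograph filed under two branches.
TAXONOMY = [
    Concept(
        preferred_label="Bank (financial institution)",
        path=("Organizations", "Financial Institutions"),
        variants={"bank", "banks", "lender"},
        context_keywords={"loan", "deposit", "account", "interest", "mortgage"},
    ),
    Concept(
        preferred_label="Bank (river)",
        path=("Geography", "Landforms"),
        variants={"bank", "riverbank"},
        context_keywords={"river", "water", "erosion", "flood", "fishing"},
    ),
]


def tokenize(text: str) -> set[str]:
    """Lowercased alphabetic word tokens; deliberately simple."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(term: str, text: str) -> Concept | None:
    """Map `term` to a single taxonomy concept using the surrounding `text`.

    Returns None if no concept lists `term` as a variant, or if the
    context does not favor exactly one of several candidate senses.
    """
    candidates = [c for c in TAXONOMY if term.lower() in c.variants]
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]  # not a homograph: the variant maps to one concept
    words = tokenize(text)
    # Score each candidate sense by how many of its context keywords appear.
    scored = sorted(
        ((len(c.context_keywords & words), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    best_score, best = scored[0]
    if best_score == 0 or scored[1][0] == best_score:
        return None  # no supporting context, or a tie between senses
    return best


concept = disambiguate("bank", "She opened an account at the bank to apply for a loan.")
print(concept.preferred_label, "|", " > ".join(concept.path))
# Bank (financial institution) | Organizations > Financial Institutions
```

Keyword overlap is about the crudest scorer available; it is used here only because it keeps the taxonomy's role visible. Returning `None` on a tie, rather than guessing, leaves genuinely ambiguous cases for a human reviewer.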
Furthermore, when combined with a taxonomy of preferred terms, their relationships, and a network of context keywords, a training document set helps build a corpus of terms annotated in context. Auto-categorizing a small set of selected topical documents against one or more terms validates the accuracy of the tagging. Human involvement in the disambiguation process also lets users mark successful and unsuccessful instances of term categorization, which builds a database of sample contexts at the sentence, paragraph, or document level. This database can then serve as training data for future document sets. Human review matters here because, unlike public web pages, organizational content requires human tagging and reviewing to build a profile of positive and negative contexts around a concept.
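The review loop described above can be sketched as a small store of reviewer decisions. This is a minimal illustration, assuming an in-memory list; the `Judgment` and `ContextStore` names are hypothetical, and weighting words by their presence in accepted versus rejected contexts is one simple way to build a "profile of positive and negative contexts", not the method used by any particular product.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass, field

LEVELS = {"sentence", "paragraph", "document"}


@dataclass(frozen=True)
class Judgment:
    """One reviewer decision on an auto-categorization result."""
    concept: str    # preferred label the passage was tagged with
    context: str    # the tagged sentence, paragraph, or document text
    level: str      # "sentence", "paragraph", or "document"
    accepted: bool  # True if the reviewer confirmed the tag, False if rejected


@dataclass
class ContextStore:
    """Hypothetical database of human-reviewed tagging decisions."""
    judgments: list[Judgment] = field(default_factory=list)

    def record(self, concept: str, context: str, level: str, accepted: bool) -> None:
        if level not in LEVELS:
            raise ValueError(f"unsupported context level: {level!r}")
        self.judgments.append(Judgment(concept, context, level, accepted))

    def precision(self, concept: str) -> float | None:
        """Share of auto-assigned tags for `concept` that reviewers accepted."""
        relevant = [j for j in self.judgments if j.concept == concept]
        if not relevant:
            return None
        return sum(j.accepted for j in relevant) / len(relevant)

    def context_profile(self, concept: str) -> Counter:
        """Word weights: +1 per accepted context containing the word, -1 per rejected one.

        No stopword filtering is applied, so common words also receive weights.
        """
        profile: Counter = Counter()
        for j in self.judgments:
            if j.concept != concept:
                continue
            sign = 1 if j.accepted else -1
            for word in set(re.findall(r"[a-z]+", j.context.lower())):
                profile[word] += sign
        return profile


store = ContextStore()
label = "Bank (financial institution)"
store.record(label, "Apply for a mortgage at your bank.", "sentence", True)
store.record(label, "Open a savings account with the bank today.", "sentence", True)
store.record(label, "Erosion wore away the bank of the river.", "sentence", False)

print(store.precision(label))  # 2 of 3 tags accepted, so about 0.667
profile = store.context_profile(label)
print(profile["mortgage"], profile["river"])  # 1 -1
```

In this sketch, `precision` plays the role of checking a small auto-categorized sample, while `context_profile` accumulates the positive and negative evidence that reviewers supply over time.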