Large Language Models (LLMs) and Knowledge Graphs (KGs) are different ways of giving more people access to data. KGs use semantics to connect datasets via their meaning, i.e., the entities they represent. LLMs use vectors and deep neural networks to predict natural language. Both are often aimed at ‘unlocking’ data. For enterprises implementing KGs, the end goal is usually something like a data marketplace, a semantic layer, FAIR-ified data, or a more data-centric enterprise.
These are all different solutions with the same end goal: making more data available to the right people faster. For enterprises implementing an LLM or similar GenAI solution, the goal is often much the same: to provide employees or customers with a ‘digital assistant’ that can get the right information to the right people faster. The potential symbiosis is clear: two of the main weaknesses of LLMs, their black-box nature and their struggles with factual knowledge, are among KGs’ greatest strengths. KGs are, essentially, collections of facts, and they are fully interpretable.
There are two ways KGs and LLMs are interacting right now: LLMs as tools to build KGs and KGs as inputs into LLM or GenAI applications. People working in the knowledge graph space are in the difficult position of building things that are expected to improve AI applications, while AI simultaneously changes the way we build those things.
One way to leverage LLM technology in the KG curation process is by vectorizing (or embedding) your KG in a vector database. A vector database (or vector store) is a database built to store vectors, i.e., lists of numbers. Vectorization is one of, if not the, core technological components driving language models. These models, through incredible amounts of training data, learn to associate words with vectors. The vectors capture semantic and syntactic information about each word based on its context in the training data.
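As a concrete illustration, here is a minimal sketch of that embedding step using the sentence-transformers library. The entity IDs and descriptions are hypothetical stand-ins for real KG data, and a real deployment would load the resulting vectors into a vector database rather than keeping them in memory.

```python
# A sketch of embedding KG entities for vector search.
# Entity IDs and descriptions are hypothetical stand-ins for real KG data.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Represent each entity by its label plus a short description, so the
# embedding captures what the entity means rather than just its name.
entities = {
    "Q1": "Aspirin: a medication used to reduce pain, fever, or inflammation.",
    "Q2": "Ibuprofen: a nonsteroidal anti-inflammatory drug (NSAID).",
    "Q3": "Paris: the capital and most populous city of France.",
}

ids = list(entities.keys())
# normalize_embeddings=True lets a plain dot product act as cosine similarity.
vectors = model.encode(list(entities.values()), normalize_embeddings=True)
# `vectors` is an (n_entities, dim) NumPy array; in production it would be
# loaded into a vector database (FAISS, Weaviate, etc.) instead of held here.
```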
A benefit of using the vector-based retrieval method is that if you have already embedded your KG into a vector database for tagging and entity resolution, the hard part is done. Finding the entities most relevant to a prompt is no different from tagging a chunk of unstructured text with entities from a KG.
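To make that equivalence concrete, the sketch below reuses the model and index from the snippet above. The `top_entities` helper is a hypothetical name, and the scoring is plain cosine similarity rather than any particular vector database’s API.

```python
# Continues from the embedding sketch above (`model`, `ids`, `vectors`).
import numpy as np

def top_entities(text: str, ids: list[str], vectors: np.ndarray, k: int = 2) -> list[str]:
    """Return the k entity IDs whose embeddings are closest to `text`."""
    query = model.encode([text], normalize_embeddings=True)[0]
    scores = vectors @ query  # dot product = cosine similarity on unit vectors
    return [ids[i] for i in np.argsort(scores)[::-1][:k]]

# The same call resolves a user prompt or tags a document chunk:
top_entities("Which painkiller helps with fever?", ids, vectors)    # prompt
top_entities("The patient was given aspirin for pain.", ids, vectors)  # text chunk
```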
AI is affecting the way we build KGs while we are expected to build KGs that facilitate AI. The prompt-to-query approach is a perfect example of this. The schema of the KG will affect how well an LLM can query it. If the purpose of the KG is to feed an AI application, then the ‘best’ ontology is no longer a reflection of reality but a reflection of the way AI sees reality.
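The sketch below illustrates the prompt-to-query pattern under some loud assumptions: the ontology snippet, the question, and the model name are all illustrative, and the generated SPARQL would still need validation before being run against a real graph store.

```python
# A sketch of the prompt-to-query pattern: hand the LLM the KG's schema and
# ask it to translate a natural-language question into SPARQL. The ontology
# snippet, question, and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SCHEMA = """\
:Drug rdfs:subClassOf :Substance .
:Drug :treats :Condition .
:Condition rdfs:label "..." .
"""

question = "Which drugs treat inflammation?"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "Translate the user's question into a SPARQL query for "
                    f"a graph with this ontology:\n{SCHEMA}\n"
                    "Return only the query."},
        {"role": "user", "content": question},
    ],
)
sparql_query = response.choices[0].message.content
# The query is then executed against the graph store; its quality depends
# directly on how legible the schema above is to the model.
```

Note that everything the LLM knows about the graph comes from the schema in the prompt: a schema written with this use in mind will tend to yield better queries than one written purely to mirror the domain.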
We should use AI as much as possible to build, maintain, and extend knowledge graphs, and KGs are necessary for enterprises looking to adopt GenAI technologies. The latter holds for several reasons: data governance, access control, and regulatory compliance; accuracy and contextual understanding; and efficiency and scalability.
This article was originally published by Towards Data Science.