Researchers at the National Institutes of Health (NIH) have developed an AI agent, GeneAgent, that improves the accuracy of gene set analysis by verifying claims against expert-curated databases.
GeneAgent is powered by a large language model (LLM) that generates descriptions of biological processes for given gene sets and then applies an independent self-verification module to cross-check its initial claims.
The verification module evaluates each claim against established biological databases and produces a report labeling each claim as supported, partially supported, or refuted.
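The verification step described above can be sketched as a simple lookup against a curated annotation database. Everything below is illustrative: the gene annotations, function names, and verdict logic are assumptions for the sketch, not GeneAgent's actual implementation.

```python
# Hypothetical sketch of GeneAgent-style self-verification: each claim links
# a gene set to a biological process. We check which genes a curated database
# actually annotates with that process and label the claim "supported",
# "partially supported", or "refuted". The database here is a toy stand-in.

CURATED_DB = {  # gene -> set of annotated processes (illustrative values)
    "TP53": {"apoptosis", "cell cycle arrest"},
    "BAX": {"apoptosis"},
    "MYC": {"cell proliferation"},
}

def verify_claim(genes, process):
    """Return a verdict plus the genes whose annotations support the claim."""
    supported = [g for g in genes if process in CURATED_DB.get(g, set())]
    if len(supported) == len(genes):
        verdict = "supported"           # every gene carries the annotation
    elif supported:
        verdict = "partially supported" # only some genes carry it
    else:
        verdict = "refuted"             # no database evidence at all
    return verdict, supported

def verification_report(claims):
    """Build a report entry for each (gene set, claimed process) pair."""
    report = []
    for genes, process in claims:
        verdict, evidence = verify_claim(genes, process)
        report.append({"process": process, "genes": genes,
                       "verdict": verdict, "evidence": evidence})
    return report

report = verification_report([
    (["TP53", "BAX"], "apoptosis"),   # both annotated -> supported
    (["TP53", "MYC"], "apoptosis"),   # only TP53 -> partially supported
    (["MYC"], "DNA repair"),          # no annotation -> refuted
])
for entry in report:
    print(entry["process"], "->", entry["verdict"])
```

In the real system, the claims come from the LLM's own generated narrative and the lookups hit expert-curated resources rather than an in-memory dictionary, but the three-way verdict structure mirrors the report described above.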
Unlike traditional LLMs, which are prone to producing inaccurate or misleading outputs (a phenomenon known as AI hallucination), GeneAgent addresses these limitations with an added layer of fact-checking. The tool was tested on 1,106 gene sets with known functions.
For a sample of 10 randomly selected gene sets, GeneAgent generated 132 claims, which experts manually reviewed. The review found that 92% of GeneAgent's self-verification decisions were correct, surpassing previous LLM-based tools such as GPT-4 in reliability.
The researchers also applied GeneAgent to novel gene sets derived from mouse melanoma cell lines, revealing potential new functional roles for genes that could support drug discovery and disease understanding.
While GeneAgent remains limited by the scope of existing databases and lacks human-like reasoning, the addition of self-verification represents progress in reducing AI hallucinations and enhancing the interpretability of molecular data analysis.
The project was led by teams at the National Library of Medicine (NLM), a leading center for biomedical informatics and data science under the NIH.