Wolters Kluwer Health has introduced a clinical AI validation framework designed to reflect how clinicians use AI at the point of care. The framework aims to support governance committees by moving beyond basic output evaluation to determine whether AI‑generated responses are clinically useful, grounded in trusted knowledge, and reliable in complex care scenarios.
Traditional evaluation methods such as benchmarks, test questions, or user ratings often fail to capture whether an AI response aligns with clinical intent, omits critical information, or handles uncertainty appropriately. Wolters Kluwer’s new approach addresses these gaps through a multi‑method framework that evaluates the answers clinicians rely on when making care decisions.
The framework, outlined in the report “A Measured Approach to Evaluating Clinical AI at the Point of Care,” assesses AI performance across three dimensions: clinical intent, knowledge integrity, and clinical impact. Together, these provide governance committees with a more meaningful measure of clinical reliability than generic benchmarks.
The evaluation model applied to UpToDate Expert AI combines automated testing with structured human review by physician editors and clinical AI specialists. It incorporates rubric‑based assessment, stress testing, adversarial “red teaming,” and ongoing monitoring to detect omissions, unsupported claims, and contextual errors that generic evaluations may overlook.
In recent testing, UpToDate Expert AI was evaluated on 1,669 clinical queries covering more than 15,000 criteria. Results showed 99.9% clinical alignment and significantly fewer omissions than two general‑purpose LLMs, whose omission rates were 15% higher.
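For readers unfamiliar with rubric‑based assessment, the sketch below shows one way per‑criterion results could be rolled up into a pass rate and an omission rate. It is an illustration only, not Wolters Kluwer's method: the `CriterionResult` record, the `essential` flag, both function names, and the sample criteria are hypothetical, and the report's own definitions of "clinical alignment" and "omission" may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CriterionResult:
    """One reviewer judgment: did the AI answer satisfy this rubric criterion?"""
    query_id: str
    criterion: str
    essential: bool  # hypothetical flag: must this information appear in the answer?
    met: bool


def pass_rate(results: List[CriterionResult]) -> float:
    """Share of all rubric criteria that were met."""
    if not results:
        raise ValueError("no criteria to score")
    return sum(r.met for r in results) / len(results)


def omission_rate(results: List[CriterionResult]) -> float:
    """Share of essential criteria that the answer failed to cover."""
    essential = [r for r in results if r.essential]
    if not essential:
        raise ValueError("no essential criteria to score")
    return sum(not r.met for r in essential) / len(essential)


# Made-up sample data for two queries; not drawn from the report.
sample = [
    CriterionResult("q1", "names first-line therapy", True, True),
    CriterionResult("q1", "states renal dose adjustment", True, False),
    CriterionResult("q1", "links to supporting topic", False, True),
    CriterionResult("q2", "flags key contraindication", True, True),
]

print(f"pass rate: {pass_rate(sample):.2f}")          # 3 of 4 criteria met
print(f"omission rate: {omission_rate(sample):.2f}")  # 1 of 3 essential criteria missed
```

Scoring at the criterion level rather than per answer is what lets an evaluation of this kind surface omissions: an answer can be accurate in everything it says and still leave out an essential point.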
Key elements of the governance‑ready approach include:
• Clinically meaningful evaluation using point‑of‑care criteria developed by physician experts.
• Performance relevance measured by whether responses are clinically useful and include essential information.
• Grounded answers evaluated for integrity and traceability to trusted databases such as UpToDate.
• Risk‑aware design incorporating red teaming, bias testing, and regression monitoring.
• Clinical reasoning support to preserve transparency and avoid overreliance on AI tools.
UpToDate Expert AI is powered by Wolters Kluwer Expert AI, which leverages deep domain expertise and more than a decade of AI experience. As of April 30, over half of U.S. UpToDate Enterprise Edition customers—representing approximately 2,000 hospitals—had adopted UpToDate Expert AI, with adoption projected to reach 70% by mid‑year.