OpenHealthGuard
OpenHealthGuard (OHG) is a neurosymbolic Retrieval-Augmented Generation (RAG) system designed to deliver safe, transparent, and trustworthy AI assistance for healthcare professionals. Rather than relying solely on opaque black-box models, OHG combines Semantic Knowledge Graphs (SKGs) with modern Large Language Models (LLMs) to ensure that every answer is grounded in verified data and traceable reasoning.
The core of OHG is an open-source neurosymbolic framework that provides reusable components for ingestion, vectorization, semantic linking, and evidence-based answer generation. This framework is vendor-flexible: it can be deployed with either open-source or commercial technologies for graph databases, vector stores, or language models depending on the needs, resources, and regulatory constraints of a local setting. This makes OHG practical for a wide range of clinical environments, including those with limited infrastructure.
By integrating structured medical knowledge with controlled document corpora, OHG sharply reduces hallucinations and ensures that every recommendation is backed by trusted sources. Clinicians always know where an answer came from, why it was selected, and which documents support it. This transparency is essential for responsible AI use in healthcare.
In prototype form, OHG has already demonstrated how responsible AI can support clinicians with material selection, safety information, and context-specific guidance—especially in regions where access to high-quality reference materials is limited. Because the underlying framework is open and extensible, healthcare organizations, researchers, and government agencies can adapt OHG to new medical domains, local languages, and region-specific datasets without depending on proprietary platforms.
OpenHealthGuard’s goal is not merely to provide a tool, but to enable sustainable, community-driven innovation in responsible healthcare AI. Its open, modular design supports transparency, empowerment, and long-term impact.


A Retrieval-Augmented Generation (RAG) system begins with a corpus of curated, domain-specific documents. These documents form the core of the RAG knowledge base. The ingestion pipeline consists of the following steps:
- Identify corpus documents. Our oral health expert, Dr. Dutta, defines search criteria for journal articles, documentation from manufacturers such as 3M, announcements from regulatory agencies regarding potential fraud and defective products, and similar sources. The metadata for each document is saved in a CSV file: each row is a new document, and each column holds one type of data, such as the URL of the complete document, the document type, and the authors. Each row in the CSV file is parsed, and the metadata becomes data properties in the knowledge graph. The URL allows us to pull in the full text of the document.
- Post-processing of the string values allows us to transform "strings into things" and define new objects in the knowledge graph. This is the benefit of the neurosymbolic approach: the knowledge graph and the logical ontology it is built on provide additional context for users, going beyond traditional RAG systems, where the corpus is stored in a relational database.
- The text of each document is divided into chunks of one to five paragraphs. Each chunk is passed to the LLM's text-embedding model, in this case the Ada-003 model from OpenAI. These vectors model the meaning of the text. The vectors and the knowledge graph are stored in the same neurosymbolic knowledge base; in our case we use the AllegroGraph graph database from Franz Inc.
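The ingestion steps above can be sketched roughly as follows. This is a minimal illustration, not OHG's actual code: the CSV column names, the helper names, and the paragraph chunker are assumptions, and the embedding call is shown only as a comment since it requires an OpenAI client and network access.

```python
import csv
import io

def parse_metadata(csv_text):
    """Parse the metadata CSV: one row per document, one column per property."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def chunk_paragraphs(text, max_paragraphs=5):
    """Split document text into chunks of at most `max_paragraphs` paragraphs."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return ["\n\n".join(paragraphs[i:i + max_paragraphs])
            for i in range(0, len(paragraphs), max_paragraphs)]

# Hypothetical metadata file with two documents.
metadata_csv = """url,doc_type,authors
https://example.org/sealants.pdf,journal_article,Dutta
https://example.org/3m-sds.pdf,manufacturer_doc,3M
"""

docs = parse_metadata(metadata_csv)

# Each row becomes data properties on a document node in the knowledge graph,
# e.g. (doc, hasType, "journal_article") and (doc, hasAuthor, "Dutta").
triples = [(row["url"], prop, row[col])
           for row in docs
           for prop, col in [("hasType", "doc_type"), ("hasAuthor", "authors")]]

# The full text (fetched via the document's URL) is chunked before embedding.
sample_text = "Etch the enamel.\n\nRinse and dry.\n\nApply the sealant."
chunks = chunk_paragraphs(sample_text, max_paragraphs=2)

# Each chunk would then be embedded and stored alongside the graph, e.g.:
#   vec = client.embeddings.create(model=..., input=chunk).data[0].embedding
```

In the real system the triples land in AllegroGraph and the chunk vectors go into its vector store; here they are just plain Python values to show the shape of the data.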
Once the corpus and vectors are defined, the system can process user questions:
- The user (e.g., a dentist or other oral health clinician) asks a new question.
- The system uses the same embedding model to create a vector for the new question.
- A cosine distance function finds corpus vectors within a defined maximum semantic distance of the question vector. If no such vectors are found, the system simply responds that it does not have enough information to answer the question. This is one of the most important differences between a RAG system and a bare LLM: an LLM will always generate an answer, even when its knowledge is sparse, biased, or inadequate for the question, whereas the RAG system can measure whether it has sufficient knowledge. This sharply reduces the risk of hallucinations. If one or more chunks in the corpus fall within the semantic distance, their text and vectors are passed to the LLM along with the relevant knowledge graph objects.
- The LLM uses the retrieved text to generate an answer. The answer, the supporting text chunks, and the matching knowledge graph objects are returned to the user. The knowledge graph provides a wealth of additional context that the user can use to evaluate the answer and to explore the knowledge graph for additional information with various visualization and NLP tools.
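The question-answering flow above, including the refusal behavior, can be sketched as follows. The toy three-dimensional vectors, the 0.75 similarity threshold, and the function names are illustrative assumptions; the real system uses OpenAI embeddings and AllegroGraph's vector store.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors (higher means semantically closer)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(question_vec, corpus, threshold=0.75):
    """Return (chunk_text, score) pairs whose similarity meets the threshold."""
    hits = [(text, cosine_similarity(question_vec, vec)) for text, vec in corpus]
    return sorted((h for h in hits if h[1] >= threshold),
                  key=lambda h: h[1], reverse=True)

def answer(question_vec, corpus):
    hits = retrieve(question_vec, corpus)
    if not hits:
        # The key safety behavior: no grounding evidence, no answer.
        return "I don't have enough information to answer that question."
    # In the real system the matched chunks, their vectors, and the linked
    # knowledge-graph objects are passed to the LLM as grounding context.
    return f"Answer grounded in {len(hits)} source chunk(s): {hits[0][0]}"

# Toy corpus: (chunk text, precomputed embedding vector).
corpus = [
    ("Resin-based sealants require etching before placement.", [0.9, 0.1, 0.0]),
    ("Glass ionomer releases fluoride over time.", [0.1, 0.9, 0.1]),
]

print(answer([0.88, 0.15, 0.02], corpus))  # close to the first chunk: grounded answer
print(answer([0.0, 0.05, 0.99], corpus))   # far from every chunk: refusal
```

The design choice worth noting is that the threshold check happens before the LLM is ever called, so an out-of-scope question never reaches the generator with weak evidence.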