HMT: HERMeS (HMT's Excerpt Retrieval Messaging System)
A Q&A chatbot which allows HMT staff to search HMT documents from the knowledge base, reducing time to find, read and access information.
Tier 1 Information
1 - Name
HERMeS (HMT's Excerpt Retrieval Messaging System)
2 - Description
HERMeS is a Retrieval Augmented Generation (RAG) AI chatbot.
The Q&A chatbot allows HMT staff to search HMT documents from the knowledge base, reducing time to find, read and access information.
3 - Website URL
N/A
4 - Contact email
Tier 2 - Owner and Responsibility
1.1 - Organisation or department
HM Treasury
1.2 - Team
Data Hub
1.3 - Senior responsible owner
Chief Data Officer
1.4 - External supplier involvement
No
1.4.1 - External supplier
N/A
1.4.2 - Companies House Number
N/A
1.4.3 - External supplier role
N/A
1.4.4 - Procurement procedure type
N/A
1.4.5 - Data access terms
N/A
Tier 2 - Description and Rationale
2.1 - Detailed description
HERMeS is a generative AI chatbot which bases its answers only on HMT documents stored in SharePoint, rather than on the model's built-in training knowledge, which comes from public information.
The chatbot processes staff questions related to HMT documents stored in SharePoint, using Retrieval Augmented Generation. The tool compares the user prompt with stored text and retrieves the most contextually similar extracts, which it then uses to generate an answer. The tool also lists the source documents from which the extracts were retrieved. Users can continue refining their questions or ask new ones through the chatbot interface to get more specific answers.
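The retrieval step described above can be illustrated with a minimal sketch. It is not the HERMeS implementation: the word-count "embedding", the function names (`embed`, `cosine`, `retrieve`, `build_prompt`), the prompt wording and the sample extracts are all hypothetical, chosen only to show how a question is compared with stored text, the most similar extracts are selected, and the sources are returned alongside the prompt.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lower-cased word counts (a stand-in for a learned vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, extracts, k=2):
    """Return the k extracts most similar to the question, best first."""
    q = embed(question)
    ranked = sorted(extracts, key=lambda e: cosine(q, embed(e["text"])), reverse=True)
    return ranked[:k]


def build_prompt(question, retrieved):
    """Assemble the text sent to the language model, plus the list of sources."""
    context = "\n".join(f"[{i}] {e['text']}" for i, e in enumerate(retrieved, 1))
    sources = [e["source"] for e in retrieved]
    prompt = f"Answer using only the extracts below.\n{context}\nQuestion: {question}"
    return prompt, sources


# Hypothetical extracts; real extracts would come from documents in SharePoint.
EXTRACTS = [
    {"source": "travel_policy.docx",
     "text": "Staff must book rail travel through the approved booking portal."},
    {"source": "leave_guidance.pdf",
     "text": "Annual leave requests are approved by the line manager."},
    {"source": "it_newsletter.msg",
     "text": "The laptop refresh programme starts in spring."},
]

question = "How do I book rail travel?"
top = retrieve(question, EXTRACTS, k=2)
prompt, sources = build_prompt(question, top)
```

In this sketch the sources are returned separately from the prompt, so the interface can list them whatever the model generates.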
2.2 - Scope
This tool is designed for:
- Quickly finding relevant information within a large set of documents
- Generating answers based on HMT specific, non-public documents and information
- Querying HMT guidance documents
Not designed for:
- Performing analysis on data within said documents
- Accessing publicly available information
- Drafting entire documents or publications
- Making the final decision as to whether this is the information or document that officials need
2.3 - Benefit
This tool saves HMT staff significant time in finding and reading through a large number of documents, many of which may not contain relevant information. The tool provides direct links and references to documents.
2.4 - Previous process
HMT staff were required to find the documents they sought by manually searching SharePoint drives using keywords. Staff then likely needed to read and sift through multiple documents before they found the document or information they were looking for.
2.5 - Alternatives considered
N/A - No further relevant tools available to undertake this task to the same or better quality.
Tier 2 - Decision making Process
3.1 - Process integration
This tool serves as an interactive information gathering tool. It is down to the human user to decide whether the tool has retrieved the correct information. There is no fixed process that this tool is a part of - users can use it whenever they are looking for information in HMT records stored on Microsoft SharePoint.
3.2 - Provided information
The tool outputs a generated text response to the user, based on the most relevant text extracts from linked documents stored in a SharePoint folder. The output is plain text in English, with markdown formatting where there are tables that need formatting.
The user also receives a list of reference links showing where the information was found.
3.3 - Frequency and scale of usage
Circa 20-30 staff use HERMeS on at least a weekly basis - exact numbers can fluctuate. The tool is available to all HM Treasury staff upon request.
3.4 - Human decisions and review
The generated text does not provide an opinion as to whether it is the correct information. Generated text must be validated against the source documents by a human. The tool bases its answers only on information available in SharePoint, not on publicly available information.
The user can ask further clarifying questions to refine the answers that the chatbot provides, or undertake a new search until they get the answer they were looking for.
The user can also review the documents that the answers have been pulled from, as the chatbot response provides a direct reference to where the information has been pulled from.
3.5 - Required training
Users: Specific instructions on how to use the tool are presented to users on the landing page when they access the app. Guidance is available to users via the intranet, and HMT provides face-to-face workshops on prompt engineering and responsible AI use.
Developers: The tool was designed by an experienced team of HMT data scientists. To build and maintain a GenAI tool, developers need a strong foundation in machine learning, specifically in NLP and LLMs, in order to understand model integration and prompt engineering. Proficiency in cloud services, particularly Azure, is crucial, including Azure Storage for data storage.
The Data Hub has produced AI guidance, available to all staff, which details best practices and references CDDO AI usage guidance.
3.6 - Appeals and review
For users within HMT, a form to give general feedback and report issues is linked from within the tool.
Tier 2 - Tool Specification
4.1.1 - System architecture
A Logic App monitors changes in a SharePoint directory and mirrors these changes in Azure Blob Storage, triggering the Indexer Container App which then uploads the texts in vector format to HERMeS. The HERMeS Container App handles user interactions, generates outputs, and uploads files to Blob Storage while notifying the Indexer. Azure Storage serves as data storage. An Indexer Container App reads from Blob Storage and processes the data by embedding, uploading, and deleting information, sending embedded vectors to Azure AI Search. Azure AI Search uses these vectors to perform searches based on the user prompt received from the HERMeS Container App. Finally, Azure OpenAI processes user messages and chat history, using context from AI Search to generate responses.
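The indexing flow described above (changes mirrored from SharePoint, then embedded, uploaded or deleted) can be sketched as follows. This is an illustration under stated assumptions, not the HERMeS code: the event dictionaries, the `VectorIndex` class, the chunk size and the hash-based `toy_vector` are all hypothetical, and an in-memory dictionary stands in for Azure AI Search.

```python
import hashlib


def chunk(text, size=40):
    """Split a document into pieces of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def toy_vector(text, dims=8):
    """Deterministic stand-in for an embedding model (hash bytes scaled to 0..1)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:dims]]


class VectorIndex:
    """In-memory stand-in for the search index, keyed by (path, chunk number)."""

    def __init__(self):
        self.entries = {}

    def delete_document(self, path):
        # Remove every chunk that belongs to the given document.
        for key in [k for k in self.entries if k[0] == path]:
            del self.entries[key]

    def upsert_document(self, path, text):
        # Drop stale chunks first so a shorter new version leaves no leftovers.
        self.delete_document(path)
        for n, piece in enumerate(chunk(text)):
            self.entries[(path, n)] = {"vector": toy_vector(piece), "text": piece}


def handle_event(index, event):
    """Apply one mirrored storage change to the index."""
    if event["type"] == "deleted":
        index.delete_document(event["path"])
    else:  # "created" or "modified"
        index.upsert_document(event["path"], event["text"])


index = VectorIndex()
handle_event(index, {"type": "created", "path": "team-a/guide.docx", "text": "word " * 100})
handle_event(index, {"type": "modified", "path": "team-a/guide.docx", "text": "short update"})
handle_event(index, {"type": "created", "path": "team-a/old.pdf", "text": "to be removed"})
handle_event(index, {"type": "deleted", "path": "team-a/old.pdf"})
```

Deleting a document's existing chunks before re-inserting is one simple way to keep the index consistent with the mirrored files. The source does not state how the Indexer Container App handles updates.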
4.1.2 - Phase
Production
4.1.3 - Maintenance
The tool undergoes ad-hoc maintenance when a bug is reported. Users can report bugs using a form linked within the tool or via email.
The tool management team review the level of tool usage as well as development requirements on a weekly basis. Guidance is updated on an ad-hoc basis.
4.1.4 - Models
OpenAI GPT-4o Large Language Model
Tier 2 - Model Specification
4.2.1 - Model name
OpenAI GPT
4.2.2 - Model version
4o
4.2.3 - Model task
Image to text/Text to text generation
4.2.4 - Model input
Text or image file
4.2.5 - Model output
Text
4.2.6 - Model architecture
GPT-4o, the latest in OpenAI's Generative Pre-trained Transformer (GPT) family, is built on a transformer architecture designed for multimodal tasks. The model is autoregressive and can handle both text and image inputs; in HERMeS its output is text. OpenAI has not published a parameter count, although unofficial estimates suggest up to 1.76 trillion parameters. The model has a larger context window than earlier GPT models, allowing up to 128,000 tokens to be processed at once. These enhancements are aimed at improving language understanding, reasoning, and factual accuracy.
GPT-4o was trained with Reinforcement Learning from Human Feedback (RLHF), which integrates human feedback into its optimisation cycle. OpenAI reported that GPT-4, the model family to which GPT-4o belongs, was 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than its predecessor, GPT-3.5. Further detail is available in OpenAI's official documentation and technical report.
4.2.7 - Model performance
From OpenAI: MMLU (Massive Multitask Language Understanding): 88.7% accuracy - tests knowledge across 57 subjects.
GPQA (Graduate-Level Google-Proof Q&A): 53.6% accuracy - measures question answering on graduate-level science questions.
MATH: 60.1% - tests complex problem-solving in maths.
HumanEval: 90.2% - evaluates code generation quality.
MGSM (Multilingual Grade School Math): 90.5% - tests grade-school maths problems in multiple languages.
DROP (Discrete Reasoning Over Paragraphs): 83.4% - assesses reading comprehension.
Metrics found here:
4.2.8 - Datasets
GPT-4o is trained on publicly available data on the web.
HERMeS, which is built on GPT-4o, accesses data stored on HMT SharePoint sites which are managed by individual teams.
4.2.9 - Dataset purposes
HMT SharePoint data is used as an additional input when generating answers in HERMeS. It is also used for testing and validation, by checking retrieved text against the references produced in the generated text.
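One way such a validation check could look is sketched below. The source does not describe how the check is carried out, so the function name `unsupported_references` and the data shapes are purely illustrative: the sketch flags any source cited in a generated answer that was not among the extracts actually retrieved.

```python
def unsupported_references(cited_sources, retrieved_extracts):
    """Return cited sources that do not appear among the retrieved extracts."""
    retrieved = {extract["source"] for extract in retrieved_extracts}
    return sorted(set(cited_sources) - retrieved)


# Hypothetical test case: two extracts were retrieved, three sources were cited.
retrieved_extracts = [
    {"source": "travel_policy.docx", "text": "Staff must book rail travel ..."},
    {"source": "leave_guidance.pdf", "text": "Annual leave requests ..."},
]
cited = ["travel_policy.docx", "expenses_faq.pdf", "leave_guidance.pdf"]

problems = unsupported_references(cited, retrieved_extracts)
```

An empty result means every reference in the answer can be traced to retrieved text. A non-empty result points to a reference that a reviewer should investigate.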
Tier 2 - Data Specification
4.3.1 - Source data name
HMT SharePoint data
4.3.2 - Data modality
Text
4.3.3 - Data description
HMT documents from various teams and functions across the department. The content is highly varied and depends on the uploading team's responsibility and remit. The data is made up of guidance documents and policy information, as well as newsletters and emails.
4.3.4 - Data quantities
At the time of writing, 341 documents are stored across 13 accounts, and total storage is currently 150 MB. This will increase as HERMeS takes on more users.
Files consist mostly of pdf, docx, msg, xlsx, pptx files.
When new users request an account, a new SharePoint folder is linked and made accessible from the new account.
4.3.5 - Sensitive attributes
HERMeS accounts are safeguarded through password access. HERMeS accounts can only be linked to SharePoint folders that the users already have permission to access. The tool does not retain data for use across different instances or accounts, so users cannot use it to access data they would not otherwise have access to.
4.3.6 - Data completeness and representativeness
The information used by this tool has been created, reviewed and published by its owners, so it is not expected to contain missing data. The team have not identified any missing data. Each documentation or guidance owner was content with their information at the time of publishing. The owner continues to control the documentation uploaded onto SharePoint, which is where the tool takes its information from.
4.3.7 - Source data URL
N/A
4.3.8 - Data collection
All relevant documents which the user would need to query are first uploaded to SharePoint for review and access by staff. These documents are then mirrored into the HERMeS knowledge base in vector format for retrieval.
4.3.9 - Data cleaning
N/A
4.3.10 - Data sharing agreements
None
4.3.11 - Data access and storage
All data is stored within Azure Blob Storage and is accessible only to designated maintainers within the Data Hub.
Account access is password controlled. Each team has its own HERMeS login details, which are linked to a specific location in SharePoint and Blob Storage.
No one account can generate answers based on information provided by any other account.
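The account isolation described above can be pictured with a small sketch. It is an assumption-laden illustration rather than the actual access-control mechanism: the `Extract` record, the `account` field and the `search` function are hypothetical, and a simple substring match stands in for vector search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extract:
    account: str  # hypothetical label for the account whose folder holds the document
    source: str
    text: str


def search(store, account, term):
    """Return matching extracts, restricted to the requesting account's own data."""
    term = term.lower()
    return [e for e in store if e.account == account and term in e.text.lower()]


STORE = [
    Extract("team-a", "budget_guide.docx", "Budget submissions are due in March."),
    Extract("team-b", "budget_notes.pdf", "Budget notes for the spring review."),
]

team_a_hits = search(STORE, "team-a", "budget")
team_c_hits = search(STORE, "team-c", "budget")
```

Applying the account filter inside the search function, rather than after results are returned, means another account's extracts never reach answer generation in this sketch.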
Tier 2 - Risks, Mitigations and Impact Assessments
5.1 - Impact assessment
The quality of the generated outputs has been assessed regularly to ensure that they are relevant and representative of the information provided to the tool.
Focus groups have also been arranged to test the tool and its user experience during the development phase.
From some of these evaluations, the system prompt (which guides the format of the output), the number of retrieved texts, and the search method have been adjusted.
5.2 - Risks and mitigations
The main risk identified is that generated text may not be accurate, or that a prompt could induce hallucinations in the output.
To mitigate this risk, colleagues are advised to check outputs from ALL generative AI tools. HERMeS references the source documents with every answer to make this easy.