Multi Agent based Medical Assistant for Edge Devices

2503.05397

Sakharam Gawade et el.

🤗 2025-03-13

TL;DR
#

Large Action Models face hurdles in healthcare due to privacy, latency, and internet reliance. This study introduces an on-device, multi-agent healthcare assistant to overcome these limitations. By using smaller, specialized agents, the system optimizes resource use, scalability, and performance, offering appointment scheduling, health monitoring, medication reminders, and health reporting. This solution addresses sensitive data handling, real-time responsiveness, and offline functionality, all crucial for healthcare applications.

Powered by the Qwen Code Instruct 2.5 7B model, the Planner and Caller Agents achieve impressive RougeL scores of 85.5 and 96.5, respectively, in planning and calling tasks, while remaining lightweight enough for edge deployment. This innovative approach merges the benefits of on-device systems with multi-agent architectures, paving the way for user-centric healthcare solutions with enhanced privacy and efficiency. A synthetic data pipeline is also created to finetune the models for planning and caller tasks.

Key Takeaways
#

Why does it matter?
#

This paper pioneers an edge-device healthcare assistant, sidestepping privacy and latency issues of cloud-based systems. It highlights the potential of multi-agent systems in personalized, secure healthcare, opening new avenues for on-device AI research and applications that prioritize user data protection and real-time responsiveness. The data creation pipeline can also be useful for other applications as well.

Visual Insights
#

🔼 This figure illustrates the communication pathways between the three main components of the multi-agent system: the Action Manager, the Health Manager, and the Memory Unit. The Action Manager oversees the planning and execution of tasks, coordinating the interactions between different agents. The Health Manager focuses on managing health data, generating reports, and handling medication reminders. The Memory Unit, crucial for personalized experiences, stores and retrieves both short-term (STM) and long-term (LTM) information. Arrows depict the flow of information between these components, showing how data and instructions are passed and processed for the overall functioning of the healthcare assistant.
read the caption
(a) Communication in the multi-agent system

Category	BLEU	Rouge1	Rouge2	RougeL
General	94.79	0.97	0.95	0.97
Counter	78.97	0.84	0.76	0.83
Negative	78.05	0.86	0.76	0.84
Dietician	74.44	0.80	0.71	0.78

🔼 This table presents the statistics of synthetic data generated for training the Planner and Caller agents in a multi-agent healthcare assistant system. The data includes various use cases categorized as Combined (a mix of general appointments, counter interactions, negative responses, and dietician referrals), Hard SOS (start and end scenarios), and Soft SOS. Each category represents different user interactions and system responses within the healthcare assistant’s workflow. The details provided allow for a comprehensive understanding of the dataset’s composition and balance across different interaction scenarios.
read the caption
Table 1: Statistics of Synthetic Data Created for Planner and Caller agents. Combined indicates mixture of appointment usecases (general, counter, negative, dietician), hard SOS (start and end) and soft SOS. General: Appointment booked after finding specialist for the symptoms, Counter: Follow-up questions asked to understand symptoms better, Negative: User declines the appointment, Dietician: Use is referred to dietician based on the symptoms

In-depth insights
#

Edge AI Healthcare
#

Edge AI in healthcare presents a compelling paradigm shift, addressing critical limitations of cloud-centric approaches. Privacy is paramount, as sensitive patient data remains on-device, mitigating risks associated with external data breaches. Reduced latency ensures real-time responsiveness, crucial for time-sensitive applications like emergency alerts and continuous health monitoring. Offline functionality guarantees uninterrupted operation, even without internet connectivity, vital in remote areas or during network outages. Resource optimization is achieved through task-specific agents, enabling efficient deployment on resource-constrained edge devices. Scalability is enhanced by modular agent architectures, allowing incremental functionality additions. Customization with various forms is possible, tailored to individual patient needs and preferences for personalized care.

Multi-Agent LAM
#

While the title ‘Multi-Agent LAM’ is not present, a multi-agent architecture incorporating Large Action Models (LAMs) offers interesting opportunities. By combining the reasoning capabilities of LAMs with a modular agent-based system, one could achieve task decomposition and parallel processing. Each agent, leveraging a smaller, specialized LAM, can focus on a specific sub-task, optimizing resource allocation and improving overall system efficiency. The use of multiple agents further ensures scalability and robustness, allowing easy adaption. Effective communication protocols and memory management strategies are important for a system to handle complex workflows.

Modular Scalability
#

While ‘Modular Scalability’ isn’t explicitly mentioned, the paper implicitly addresses it through a multi-agent architecture. Task-specific agents enable distributing workloads, optimizing resource allocation on edge devices. This approach fosters scalability by allowing incremental addition of agents for new functionalities, contrasting with monolithic models’ limitations. Each agent operates independently yet collaborates, realizing a ‘modular collaboration’. Scaling is achieved not by expanding individual agent size, but by integrating new, specialized agents. The architecture is well-suited to growing user needs and supports functional expansion without extensive retraining. The focus on lightweight, task-specific LoRA modules also facilitates modularity, enabling rapid adaptation and deployment of specialized healthcare solutions at scale.

Qwen-Coder Agent
#

While the provided text doesn’t directly discuss a “Qwen-Coder Agent”, its architecture relies on the Qwen 2.5-Coder-7B-Instruct model. This highlights the importance of specialized coding models in multi-agent systems, particularly for healthcare applications. The choice of Qwen, due to its compact size suitable for edge deployment and its performance, underscores a trade-off between model complexity and resource constraints. Further, the fine-tuning process using LoRA for planning and calling emphasizes that domain-specific adaptation is crucial for enhancing the abilities and aligning the model with the user’s specific needs. This finetuning is essential in order to incorporate planning and coding capabilities, further adapting the agent to be very useful in the field.

Synthetic Data
#

The research emphasizes synthetic data generation for healthcare AI agents, especially to address privacy and data scarcity. A pipeline is designed for creating this data, including steps for data format standardization, scenario trajectory generation using models, data enhancement via randomization and tool shuffling, verification against model failures, and finally interleaving for planner/caller agents. Data quality and diversity are crucial; therefore, the process involves checks, augmentations, and tool shufflings to make models robust. The focus is on creating a dataset that simulates user interactions, allowing AI agents to learn from diverse scenarios effectively and ensure their reasoning and decision-making capabilities.

More visual insights
#

More on figures

🔼 The figure illustrates the communication flow between the planner and caller agents within the Action Manager module. The planner agent, responsible for planning and generating action trajectories, sends the relevant information to the caller agent. The caller agent then executes the planned actions using various tools or APIs. This interaction is a key part of the multi-agent system’s workflow, showcasing how these agents collaborate to achieve complex healthcare tasks.
read the caption
(b) Communication between the planner and caller in Action Manager

🔼 This figure illustrates the multi-agent architecture of the healthcare assistant system. It showcases the three main components: the Action Manager, the Health Manager, and the Memory Unit (both short-term and long-term). The Action Manager coordinates the overall workflow by handling user prompts and managing agent interactions. The Health Manager houses several specialized agents for tasks such as health monitoring, medication reminders, and report generation. The Memory Unit acts as the system’s knowledge base, storing both short-term contextual information from ongoing interactions and long-term data such as patient history and preferences. The figure also displays how these components communicate and interact to provide a comprehensive healthcare assistance experience.
read the caption
Figure 1: Multi-Agent Design for Healthcare Assistant

🔼 This figure illustrates the three-tier architecture of the end-to-end (E2E) application. The frontend (user interface) handles user interaction, sending requests to the backend for processing. The backend manages agent models, data storage (SQLite database), and interacts with external services like Twilio (for notifications) and potentially a smart-watch simulator. The data layer (SQLite database) stores user data, interaction history, and other relevant information. The communication flow shows how user actions trigger processes at the backend, leading to responses generated by agent models and stored in the database. The system utilizes several components such as: Frontend (React.js based application), Backend (Django framework), SQLite database, Celery (for task queue management), Twilio (for SMS and notification), and Qwen-7b Coder Instruct model for the agent models.
read the caption
Figure 2: System flow diagram of the E2E application

🔼 The figure illustrates the data creation pipeline employed in the paper. It starts with collecting data, then standardizes the format for consistency across various use cases. Data generation involves creating synthetic trajectories that mimic realistic user interactions. This is followed by data enhancement techniques, including transformations to improve data quality and prevent overfitting. The data is then thoroughly verified to ensure accuracy and completeness, with any necessary updates being made to the generation pipeline for iterative improvement. Finally, data interleaving is performed to create separate datasets for the planner and caller agents, optimizing the training process of the models.
read the caption
Figure 3: Data Creation Process

🔼 This figure demonstrates the appointment booking process within the multi-agent healthcare assistant. It shows a multi-turn conversation between the user and the AI, where the user describes their symptoms (abdominal pain and nausea). The AI, through its planner and caller agents, interacts with the user, clarifying symptoms and suggesting a suitable specialist (Gastroenterologist) based on the user’s availability. The system then confirms the appointment time and date.
read the caption
Figure 4: Appointment Booking

🔼 This figure shows the user interface of the application, specifically focusing on the ‘Reminders’ section. The image shows how uploaded medication schedules are processed and displayed as reminders in the application. The left panel (a) shows how a prescription is uploaded, with the medication details extracted and automatically converted into reminders presented in a list-view in the right panel (b). The system uses the prescription data to create and schedule reminders for medication intake.
read the caption
(a) Medication Uploaded schedule Reminders

🔼 This figure shows the user interface after adding reminders from a prescription. The user has uploaded a prescription, and the system has automatically extracted relevant details such as medicine names, dosage timing, and frequency to create personalized reminders for the user. These reminders are displayed in a user-friendly format, making health management more convenient and efficient.
read the caption
(b) Reminder Added

🔼 This figure shows the process of adding medication reminders from a prescription. Panel (a) displays the interface for uploading a prescription image. Panel (b) shows the resulting added reminders in the application calendar, demonstrating how the system automatically extracts medication details from the uploaded prescription and creates reminders for the user.
read the caption
Figure 5: Adding Reminder from Prescription

🔼 This figure demonstrates a soft SOS alert triggered by the system due to abnormal vital signs detected from a simulated smartwatch. The abnormal vitals are displayed to the user. This soft SOS is a non-emergency alert designed to inform the user of potential health issues and does not automatically trigger emergency services.
read the caption
(a) Soft SOS triggered due to abnormal vitals (simulated)

🔼 This figure shows a snapshot of the application’s interface after the system has analyzed the user’s vital signs. The system continuously monitors vitals like heart rate, blood oxygen levels, etc. via a connected smartwatch or other device. If it detects irregularities, it triggers an alert. This display shows the processed vital sign data to the user and indicates whether an emergency response (SOS) was triggered. The system’s health monitoring functionality aids users in proactive health management.
read the caption
(b) User’s vitals analyzed

🔼 This figure shows a snapshot of the application’s interface when a soft SOS is triggered due to abnormal vital signs. The left panel displays the user’s vitals being monitored (heart rate, oxygen level, sleep stages etc.), showing values outside the normal range which triggers the soft SOS. The right panel shows a summary of the user’s vitals data with anomalies clearly indicated. The system doesn’t automatically trigger emergency services, but alerts the user about the unusual readings, prompting them to take necessary actions such as seeking medical advice.
read the caption
Figure 6: Soft SOS triggered and vitals analyzed

🔼 The figure shows the user interface for triggering a Hard SOS (hard emergency). It is part of a multi-agent healthcare assistant system designed for edge devices. The interface likely allows the user to initiate an emergency alert that would automatically notify emergency services and relevant contacts, potentially including location data via GPS.
read the caption
(a) Hard SOS-Interface

🔼 The figure shows a screenshot of an SMS message sent as part of the Hard SOS (Hard Software Override System) emergency response. The message is sent to the user’s designated emergency contacts, providing vital information such as the user’s name, location coordinates, and the urgent need for assistance. This SMS is triggered when a user activates the Hard SOS feature in the application, indicating a critical emergency situation requiring immediate attention. The inclusion of GPS coordinates facilitates rapid response and location of the user by emergency services.
read the caption
(b) Hard SOS-SMS

TL;DR#

Key Takeaways#

Why does it matter?#

Visual Insights#

In-depth insights#

Edge AI Healthcare#

Multi-Agent LAM#

Modular Scalability#

Qwen-Coder Agent#

Synthetic Data#

More visual insights#

Full paper#