CLEA: Closed-Loop Embodied Agent for Enhancing Task Execution in Dynamic Environments

2503.00729

Mingcong Lei et al.

🤗 2025-03-04

TL;DR

Large Language Models (LLMs) have shown promise in task planning, but they struggle with long-horizon tasks and dynamic environments. Existing systems lack adaptability and real-time environmental awareness, which limits their robustness in real-world scenarios. Approaches that do integrate environmental feedback are still limited by feedback quality and struggle with continuous adaptation and deployment complexity.

This paper introduces CLEA, a novel architecture using four specialized LLMs for closed-loop task management. It features an interactive task planner for dynamic subtask generation and a multimodal execution critic for assessing action feasibility. CLEA enhances robustness in object search and manipulation tasks, outperforming baselines with significant improvements in success and completion rates.

Key Takeaways

Why does it matter?

This paper introduces CLEA, which advances embodied AI by improving task planning and execution in dynamic environments. Its closed-loop architecture and use of open-source LLMs give researchers a robust, reproducible platform for complex robotic tasks and move real-world AI applications forward.

Visual Insights

🔼 CLEA’s architecture is depicted in this figure. Environmental data is received by the Observer (VLM), which then sends it to the Summarizer (LLM). This module creates a memory representation that is used by the Planner (LLM) to create an initial sequence of actions based on the robot’s skill set and memory. The Critic (VLM) examines the actions’ feasibility and proposes alternative plans as necessary in response to changing conditions in the environment. This closed-loop system enables continuous adaptation to real-time situations.
Figure 2: Overview of CLEA. The observer (VLM) provides environmental data, which the summarizer (LLM) processes into memory. The planner (LLM) generates an initial action sequence based on the robot’s skill pool and memory, while the critic (VLM) evaluates action feasibility and offers re-plan recommendations in response to environmental dynamics.

Skill pool	Description
open(robot, openable_object)	robot open object
close(robot, openable_object)	robot close object
pick_from(robot, object, space)	robot pick object from space
release_to(robot, space)	robot release the object on its hand to space
go_to(robot, navi_point)	robot navigate to navigation point

🔼 This table lists the predefined skills available to the robots within the CLEA system’s environment. Each skill is a function call that the planner can use to direct the robots’ actions. These skills encompass basic functionalities such as opening and closing containers, picking up and releasing objects, and navigating to specific points in the environment. The descriptions provide concise explanations of each skill’s function.
TABLE I: Predefined skill pool in the environment
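
Because every skill in Table I is a function call with a fixed set of arguments, a planner's text output can be checked against the pool before it reaches the robot. The sketch below is one possible way to do that, not the paper's implementation: the skill names and argument lists come from Table I, while `SKILL_POOL`, `parse_action`, and the regular expression are hypothetical names and choices made for illustration.

```python
import re

# Skill names and argument lists copied from Table I.
# The dictionary layout itself is an assumption for this sketch.
SKILL_POOL = {
    "open": ("robot", "openable_object"),
    "close": ("robot", "openable_object"),
    "pick_from": ("robot", "object", "space"),
    "release_to": ("robot", "space"),
    "go_to": ("robot", "navi_point"),
}

# Matches a single call of the form: name(arg1, arg2, ...)
_CALL_RE = re.compile(r"^\s*(\w+)\s*\(([^()]*)\)\s*$")


def parse_action(text):
    """Parse 'skill(arg1, arg2)' into (skill, args); raise ValueError if invalid."""
    match = _CALL_RE.match(text)
    if match is None:
        raise ValueError(f"not a function call: {text!r}")
    skill, raw_args = match.group(1), match.group(2)
    args = [a.strip() for a in raw_args.split(",")] if raw_args.strip() else []
    if skill not in SKILL_POOL:
        raise ValueError(f"unknown skill: {skill}")
    expected = SKILL_POOL[skill]
    # Reject a wrong argument count and empty arguments such as "open(robot_1, )".
    if len(args) != len(expected) or any(not a for a in args):
        raise ValueError(f"{skill} expects {expected}, got {args}")
    return skill, args
```

For example, `parse_action("pick_from(robot_1, apple, fridge)")` returns the skill name with its three arguments, while a call to a skill outside the pool raises `ValueError`. The object and robot names here are made up.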

In-depth insights

CLEA: Overview

The CLEA framework has three main components: an observer, a memory module, and a planner-critic agent. The observer converts visual data into a format the language models can use, bridging the gap between what the robot sees and what the language model can understand. The memory module maintains a structured belief about the environment, using a history buffer of interactions and a summarizer that derives beliefs from this history. The planner-critic agent handles dynamic planning and is split into two parts: the planner, which sets sub-goals and action sequences, and the critic, which evaluates the plan at each step and adjusts it as needed. Together, these components let the robot interpret the outcome of its own actions, recognize when its current strategy is suboptimal, and correct course in real time.
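
The memory module described above can be pictured as a bounded history of interactions plus a function that condenses that history into a belief. The following is a minimal sketch under stated assumptions: the class name `Memory`, the `max_history` cap, and the toy `latest_facts` summarizer are all hypothetical, and in CLEA the summarizer role is played by an LLM rather than a hand-written function.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Interaction = Tuple[str, str]  # (action text, observation text)


class Memory:
    """History buffer plus a summarizer that turns the history into a belief."""

    def __init__(self, summarize: Callable[[List[Interaction]], str],
                 max_history: int = 20):
        # `summarize` stands in for the summarizer LLM (assumption).
        self.summarize = summarize
        # Oldest interactions are dropped once the cap is reached (assumption).
        self.history: Deque[Interaction] = deque(maxlen=max_history)
        self.belief = ""

    def record(self, action: str, observation: str) -> str:
        """Store one interaction and refresh the belief from the full history."""
        self.history.append((action, observation))
        self.belief = self.summarize(list(self.history))
        return self.belief


def latest_facts(history: List[Interaction]) -> str:
    """Toy summarizer: keep only the most recent observation for each action."""
    facts = {}
    for action, observation in history:
        facts[action] = observation
    return "; ".join(f"{a} -> {o}" for a, o in facts.items())
```

Keeping the summarizer as an injected callable means the same buffer can be tested with a deterministic stub and later wired to a language model.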

Planner-Critic

The Planner-Critic module drives closed-loop decision-making in the embodied agent. It consists of two sub-modules: the Planner, which generates action sequences based on current beliefs and environmental information, and the Critic, which evaluates the feasibility and effectiveness of those actions in real time. The Planner works hierarchically, generating sub-goals and then the action sequences that achieve them, while the Critic uses sensory input and contextual understanding to assess whether each action is still valid. The Critic's feedback triggers re-planning, which is what lets the agent adjust its plan in response to unexpected events or environmental changes, complete the task, and recover from errors.
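
The interaction between the two sub-modules can be sketched as a loop in which the critic is consulted before every action and a rejection sends the agent back to the planner. This is a toy illustration rather than CLEA's code: `run_closed_loop`, `demo`, the dictionary world state, and the rule-based planner and critic are hypothetical stand-ins for the LLM planner and VLM critic, and the critic's feedback is simplified to the object's observed location.

```python
def run_closed_loop(plan_fn, critic_fn, execute_fn, observe_fn, max_steps=20):
    """Run plan steps one at a time, consulting the critic before each action."""
    executed = []
    plan = list(plan_fn(observe_fn(), None))
    for _ in range(max_steps):
        if not plan:
            break  # every planned action has been carried out
        observation = observe_fn()
        feasible, feedback = critic_fn(plan[0], observation)
        if feasible:
            action = plan.pop(0)
            execute_fn(action)
            executed.append(action)
        else:
            # Discard the stale plan and ask the planner for a new one.
            plan = list(plan_fn(observation, feedback))
    return executed


def demo():
    # Toy world: the apple is on the table, not in the fridge as the planner assumes.
    world = {"robot_at": "start", "apple_at": "table", "holding": None}

    def observe():
        return dict(world)

    def planner(observation, feedback):
        # Without feedback the planner relies on a (wrong) prior belief.
        place = feedback if feedback is not None else "fridge"
        return [("go_to", place), ("pick_from", "apple", place)]

    def critic(action, observation):
        # Reject a pick if the apple is not where the action expects it.
        if action[0] == "pick_from" and observation["apple_at"] != action[2]:
            return False, observation["apple_at"]
        return True, None

    def execute(action):
        if action[0] == "go_to":
            world["robot_at"] = action[1]
        elif action[0] == "pick_from":
            world["holding"] = action[1]
            world["apple_at"] = "robot"

    return run_closed_loop(planner, critic, execute, observe), world
```

In `demo()`, the robot first walks to the fridge, the critic rejects the pick because the apple is not there, and the re-planned sequence sends the robot to the table instead. An open-loop agent would have executed the original pick and failed.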

Env. Dynamics

In embodied AI, environmental dynamics pose significant challenges. Traditional task planning often struggles with unpredictable changes in object states and spatial relationships, leading to failures in long-horizon tasks. Robustness necessitates continuous adaptation through closed-loop feedback mechanisms, where agents perceive the environment, reason about actions, and execute accordingly. Key considerations include handling partial observability, as robots only have limited sensory input, and maintaining consistent task state tracking despite environmental perturbations. Incorporating memory and predictive models can aid in anticipating changes and refining plans in real-time. Addressing these complexities is crucial for developing truly adaptable and reliable robotic systems operating in dynamic real-world scenarios where disturbances frequently occur.

Robustness study

Robustness in embodied AI systems, like CLEA, is paramount for real-world deployment. It entails the ability to maintain performance despite environmental changes, object misplacements, and unexpected robot behaviors. A rigorous robustness study involves testing the agent across diverse scenarios and tasks. Key metrics include success rate and average score, reflecting both task completion and action efficiency. Ablation studies are crucial for identifying the contribution of specific modules, such as the critic, in ensuring robustness. Analyzing failure modes, like outdated actions or critic errors, pinpoints areas for improvement. Benchmarking against simpler open-loop agents highlights the advantages of closed-loop planning in handling dynamic environments. Understanding robustness in embodied AI systems is essential for trustworthy application.
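
As a rough illustration of how the two metrics named above could be aggregated, the sketch below assumes each trial is recorded as a success flag plus a numeric score. The exact scoring rule used in the paper is not given in this summary, so the record format and the function name `summarize_trials` are assumptions, and any numbers passed in are placeholders rather than reported results.

```python
from statistics import mean


def summarize_trials(trials):
    """Aggregate (success: bool, score: float) pairs for one agent."""
    if not trials:
        raise ValueError("no trials to summarize")
    return {
        # Fraction of trials in which the task was completed.
        "success_rate": sum(1 for ok, _ in trials if ok) / len(trials),
        # Mean per-trial score; how a score is assigned is an assumption here.
        "average_score": mean(score for _, score in trials),
    }
```

Running the same function on trials from the full agent and from an ablated variant (for example, one without the critic) gives directly comparable numbers.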

Failure analysis

The failure analysis section of the paper offers valuable insights into the limitations and potential areas for improvement in the CLEA framework. The identification of “Invalid actions” as the most frequent failure mode highlights a crucial area where the LLM struggles with adhering to the predefined action format. This suggests a need for refining the interface between the LLM planner and the robotic platform, potentially through improved prompt engineering or a more flexible action representation. The “Critic failures”, where the critic fails to identify improper actions, underscores the limitations in the VLM’s perceptual capabilities. This calls for exploring more advanced visual reasoning techniques or incorporating additional sensory input to enhance the critic’s ability to accurately assess the environment and action feasibility. The “Multi-robot collaboration issues” point to a challenge in coordinating multiple agents, indicating that the LLMs are not particularly adept at understanding and managing complex inter-robot relationships. This suggests a direction for future work involving incorporating more sophisticated multi-agent reasoning capabilities into the CLEA framework.

Type	Reason	Count	Share of total
Critic	Outdated actions	8	44.4%
	Redundant actions	6	33.3%
	Invalid actions	2	11.1%
	Wrong planning	2	11.1%
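
The last column of the critic table above is each count divided by the sum of the counts. A short sketch of that arithmetic, using the counts from the table (the helper name `shares` is hypothetical):

```python
def shares(counts):
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {reason: round(100 * n / total, 1) for reason, n in counts.items()}


# Counts copied from the critic table.
critic_counts = {
    "Outdated actions": 8,
    "Redundant actions": 6,
    "Invalid actions": 2,
    "Wrong planning": 2,
}
critic_shares = shares(critic_counts)
```

With 18 critic interventions in total, this gives 44.4, 33.3, 11.1, and 11.1 percent. The rounded values sum to 99.9 rather than 100.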

Type	Reason	Count	Share of total
Failure	Invalid actions	9	45%
	Critic failures	8	26.7%
	Multi-robot collaboration	3	15%
