
Embodied AI

Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
·3498 words·17 mins
AI Generated 🤗 Daily Papers Multimodal Learning Embodied AI 🏢 Zhejiang University
Embodied-Reasoner: Integrates visual search, reasoning, and action for interactive tasks, outperforming existing models in embodied environments.
AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning
·327 words·2 mins
AI Generated 🤗 Daily Papers Multimodal Learning Embodied AI 🏢 Menlo Research
AlphaSpace enables robotic actions via semantic tokenization and symbolic reasoning, enhancing spatial intelligence in LLMs.
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
·4040 words·19 mins
AI Generated 🤗 Daily Papers Multimodal Learning Embodied AI 🏢 NVIDIA
Cosmos-Reason1: Physical AI models that reason and act in the real world, bridging the gap between perception and embodied decision-making.
Free-form language-based robotic reasoning and grasping
·1651 words·8 mins
AI Generated 🤗 Daily Papers Multimodal Learning Embodied AI 🏢 Fondazione Bruno Kessler
FreeGrasp: enabling robots to grasp objects by interpreting free-form instructions and reasoning about their spatial relationships.
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills
·4598 words·22 mins
AI Generated 🤗 Daily Papers Multimodal Learning Embodied AI 🏢 Peking University
Being-0: A humanoid robotic agent that completes complex tasks by integrating a vision-language model with modular skills, improving efficiency and real-time performance.
World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning
·3847 words·19 mins
AI Generated 🤗 Daily Papers Multimodal Learning Embodied AI 🏢 Fudan University
D2PO: World modeling enhances embodied task planning by jointly optimizing state prediction and action selection, leading to more efficient execution.
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
·2233 words·11 mins
AI Generated 🤗 Daily Papers Multimodal Learning Embodied AI 🏢 Tsinghua University
UniGoal: A novel framework for universal zero-shot goal-oriented navigation, outperforming task-specific methods with a unified approach.
CLEA: Closed-Loop Embodied Agent for Enhancing Task Execution in Dynamic Environments
·1626 words·8 mins
AI Generated 🤗 Daily Papers Multimodal Learning Embodied AI 🏢 Shenzhen Future Network of Intelligence Institute
CLEA: Enhancing task execution in dynamic environments with a closed-loop embodied agent.
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
·2325 words·11 mins
AI Generated 🤗 Daily Papers Multimodal Learning Embodied AI 🏢 MAIS, Institute of Automation, Chinese Academy of Sciences, China
PC-Agent: A hierarchical multi-agent framework that improves complex task automation on PCs by 32%.
Magma: A Foundation Model for Multimodal AI Agents
·5533 words·26 mins
AI Generated 🤗 Daily Papers Multimodal Learning Embodied AI 🏢 Microsoft Research
Magma: A new foundation model for multimodal AI agents that bridges verbal and spatial intelligence, achieving state-of-the-art performance across tasks including UI navigation and robotic manipulation.
GenEx: Generating an Explorable World
·2719 words·13 mins
AI Generated 🤗 Daily Papers Multimodal Learning Embodied AI 🏢 Johns Hopkins University
GenEx generates explorable 3D worlds from a single image, enabling embodied AI agents to explore and learn.