
Embodied AI

Trajectory Diffusion for ObjectGoal Navigation
·2125 words·10 mins
Multimodal Learning Embodied AI 🏢 University of Chinese Academy of Sciences
Trajectory Diffusion (T-Diff) improves ObjectGoal navigation by learning sequential planning with a trajectory diffusion model, producing more accurate and efficient navigation.
SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation
·2121 words·10 mins
Multimodal Learning Embodied AI 🏢 Tsinghua University
SG-Nav achieves state-of-the-art zero-shot object navigation by leveraging a novel 3D scene graph to provide rich context for LLM-based reasoning.
MO-DDN: A Coarse-to-Fine Attribute-based Exploration Agent for Multi-Object Demand-driven Navigation
·4206 words·20 mins
AI Generated Multimodal Learning Embodied AI 🏢 Peking University
MO-DDN: A new benchmark and a coarse-to-fine exploration agent boost embodied AI’s ability to handle multi-object, preference-based task planning.
Grounding Multimodal Large Language Models in Actions
·3629 words·18 mins
AI Generated Multimodal Learning Embodied AI 🏢 Apple
Researchers unveil a unified architecture for grounding multimodal large language models in actions, showing superior performance with learned tokenization for continuous actions and semantic alignment …
GenRL: Multimodal-foundation world models for generalization in embodied agents
·2793 words·14 mins
Multimodal Learning Embodied AI 🏢 Ghent University
GenRL: Learn diverse embodied tasks from vision & language, without reward design, using multimodal imagination!
Exploratory Retrieval-Augmented Planning For Continual Embodied Instruction Following
·2661 words·13 mins
Multimodal Learning Embodied AI 🏢 Sungkyunkwan University
ExRAP: A novel framework boosts embodied AI’s continual instruction following by cleverly combining environment exploration with LLM-based planning, leading to significantly improved task success and …
Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting
·2150 words·11 mins
Multimodal Learning Embodied AI 🏢 MIT
ARCHITECT: Generating realistic 3D scenes using hierarchical 2D inpainting!
Any2Policy: Learning Visuomotor Policy with Any-Modality
·1938 words·10 mins
AI Generated Multimodal Learning Embodied AI 🏢 Midea Group
Any2Policy: a unified multimodal system enabling robots to perform tasks using diverse instruction and observation modalities (text, image, audio, video, point cloud).