
Embodied AI

Trajectory Diffusion for ObjectGoal Navigation
·2125 words·10 mins
Multimodal Learning Embodied AI 🏢 University of Chinese Academy of Sciences
Trajectory Diffusion (T-Diff) improves ObjectGoal navigation by learning sequential planning with a trajectory diffusion model, producing more accurate and efficient navigation.
SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation
·2121 words·10 mins
Multimodal Learning Embodied AI 🏢 Tsinghua University
SG-Nav achieves state-of-the-art zero-shot object navigation by leveraging a novel 3D scene graph to provide rich context for LLM-based reasoning.
MO-DDN: A Coarse-to-Fine Attribute-based Exploration Agent for Multi-Object Demand-driven Navigation
·4206 words·20 mins
AI Generated Multimodal Learning Embodied AI 🏢 Peking University
MO-DDN: A new benchmark and a coarse-to-fine exploration agent boost embodied AI’s ability to handle multi-object, preference-based task planning.
Grounding Multimodal Large Language Models in Actions
·3629 words·18 mins
AI Generated Multimodal Learning Embodied AI 🏢 Apple
Researchers unveil a unified architecture for grounding multimodal large language models in actions, showing superior performance with learned tokenization for continuous actions and semantic alignment …
GenRL: Multimodal-foundation world models for generalization in embodied agents
·2793 words·14 mins
Multimodal Learning Embodied AI 🏢 Ghent University
GenRL: Learn diverse embodied tasks from vision & language, without reward design, using multimodal imagination!
Exploratory Retrieval-Augmented Planning For Continual Embodied Instruction Following
·2661 words·13 mins
Multimodal Learning Embodied AI 🏢 Sungkyunkwan University
ExRAP: A novel framework boosts embodied AI’s continual instruction following by cleverly combining environment exploration with LLM-based planning, leading to significantly improved task success and …
Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting
·2150 words·11 mins
Multimodal Learning Embodied AI 🏢 MIT
ARCHITECT: Generating realistic 3D scenes using hierarchical 2D inpainting!
Any2Policy: Learning Visuomotor Policy with Any-Modality
·1938 words·10 mins
AI Generated Multimodal Learning Embodied AI 🏢 Midea Group
Any2Policy: a unified multimodal system enabling robots to perform tasks using diverse instruction and observation modalities (text, image, audio, video, point cloud).