
AI Applications

FAST: Efficient Action Tokenization for Vision-Language-Action Models
·4290 words·21 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 UC Berkeley
FAST: A novel action tokenization method using the discrete cosine transform drastically improves autoregressive vision-language-action models’ training and performance, enabling dexterous and high-freque…
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation
·3400 words·16 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 AgiBot
EnerVerse: A novel framework that seamlessly integrates convolutional and attention mechanisms to generate embodied future spaces for enhanced robotic manipulation, mitigating data scarcity with a generati…
A3: Android Agent Arena for Mobile GUI Agents
·2276 words·11 mins
AI Generated 🤗 Daily Papers AI Applications Human-AI Interaction 🏢 Hong Kong University of Science and Technology
Android Agent Arena (A3): A novel evaluation platform for mobile GUI agents offering diverse tasks, flexible action space, and automated LLM-based evaluation, advancing real-world AI agent research.
Training Software Engineering Agents and Verifiers with SWE-Gym
·3604 words·17 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 UC Berkeley
SWE-Gym, a novel environment for training software engineering agents on 2,438 real-world Python task instances, achieves new state-of-the-art performance and is publicly available.
LearnLM: Improving Gemini for Learning
·4335 words·21 mins
AI Generated 🤗 Daily Papers AI Applications Education 🏢 Google DeepMind
LearnLM enhances Gemini for education by training it to follow pedagogical instructions, leading to significant preference improvements over GPT-4o, Claude 3.5, and Gemini 1.5 Pro in diverse learning …
Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning
·5162 words·25 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 MIT
MoDE, a diffusion transformer policy with mixture-of-expert denoisers, makes multitask robot control faster and more efficient.
Large Action Models: From Inception to Implementation
·2938 words·14 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 Microsoft
From language models to action models: building AI that does things.
TidyBot++: An Open-Source Holonomic Mobile Manipulator for Robot Learning
·1675 words·8 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 Princeton University
TidyBot++: Low-cost, open-source holonomic mobile base makes robot learning easier.
CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction
·3880 words·19 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 Westlake University
CARP: A novel visuomotor policy learning paradigm achieves high accuracy and 10x faster inference than state-of-the-art methods by combining autoregressive efficiency and diffusion model precision through a c…
Moto: Latent Motion Token as the Bridging Language for Robot Manipulation
·3555 words·17 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 University of Hong Kong
Moto uses latent motion tokens as a bridging language for robot manipulation, achieving superior performance with limited data.
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection
·6193 words·30 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 Peking University
Code-as-Monitor (CaM) uses vision-language models and constraint-aware visual programming to achieve both reactive and proactive robotic failure detection in real-time, improving success rates and red…
WildLMa: Long Horizon Loco-Manipulation in the Wild
·2396 words·12 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 UC San Diego
WildLMa enables robots to perform complex, long-horizon manipulation tasks in unstructured environments by combining language-conditioned imitation learning, a whole-body controller for efficient tele…
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving
·2924 words·14 mins
AI Generated 🤗 Daily Papers AI Applications Autonomous Vehicles 🏢 Institute of Artificial Intelligence, Huazhong University of Science and Technology
DiffusionDrive: a novel truncated diffusion model achieves real-time, high-quality end-to-end autonomous driving by leveraging multi-mode action distributions and significantly reducing computational …
Soft Robotic Dynamic In-Hand Pen Spinning
·2419 words·12 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 Carnegie Mellon University
SWIFT, a new system, enables a soft robotic hand to learn dynamic pen spinning via real-world trial-and-error, achieving 100% success across diverse pen properties without explicit object modeling.
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use
·614 words·3 mins
AI Generated 🤗 Daily Papers AI Applications Human-AI Interaction 🏢 Show Lab, National University of Singapore
Claude 3.5 Computer Use, a groundbreaking AI model that offers a public-beta graphical user interface (GUI) agent for computer use, is comprehensively analyzed in this research. This study provides an out-o…
Hermes: A Large Language Model Framework on the Journey to Autonomous Networks
·1636 words·8 mins
AI Generated 🤗 Daily Papers AI Applications Autonomous Vehicles 🏢 Paris Research Center, Huawei Technologies
Hermes, a novel LLM-based framework, automates cellular network modeling by generating explainable ‘blueprints’ for constructing Network Digital Twins (NDTs), paving the way for fully autonomous netwo…
DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation
·2203 words·11 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 New York University
DynaMem empowers robots with online dynamic spatio-semantic memory, achieving a 2x improvement in pick-and-drop success rate on non-stationary objects compared to static systems.
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
·3111 words·15 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 Tsinghua University
DeeR-VLA dynamically adjusts the size of a multimodal large language model based on task difficulty, significantly reducing computational cost and memory usage in robotic control without compromising …
Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks
·6756 words·32 mins
AI Generated 🤗 Daily Papers AI Applications Human-AI Interaction 🏢 Southeast University
Collaborative Assistant for Personalized Exploration (CARE) enhances LLM chatbots for exploratory tasks by combining a multi-agent framework with a structured interface, delivering tailored solutions …