🏢 UC Berkeley

FAST: Efficient Action Tokenization for Vision-Language-Action Models
·4290 words·21 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 UC Berkeley
FAST: A novel action tokenization method using discrete cosine transform drastically improves autoregressive vision-language-action models’ training and performance, enabling dexterous and high-freque…
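The summary above only names the core technique; as a rough illustration of the general idea, the hedged sketch below shows DCT-based action tokenization in Python. It is not FAST's actual pipeline (the paper's chunk length, quantization scale, and follow-up compression step are not reproduced here), and the function names and the `scale` parameter are illustrative assumptions.

```python
# Hypothetical sketch of DCT-based action tokenization: compress a chunk of
# continuous robot actions with a discrete cosine transform, quantize the
# coefficients, and emit integer tokens. Not FAST's exact method.
import numpy as np
from scipy.fft import dct, idct

def tokenize_actions(chunk: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """chunk: (timesteps, action_dims) continuous actions -> flat integer tokens."""
    coeffs = dct(chunk, axis=0, norm="ortho")       # per-dimension DCT over time
    return np.round(coeffs * scale).astype(np.int64).ravel()

def detokenize_actions(tokens: np.ndarray, timesteps: int, dims: int,
                       scale: float = 10.0) -> np.ndarray:
    """Invert the tokenization back to an (approximate) action chunk."""
    coeffs = tokens.reshape(timesteps, dims).astype(np.float64) / scale
    return idct(coeffs, axis=0, norm="ortho")

# Round-trip check: 16 timesteps of a 7-DoF action vector (assumed shapes).
chunk = np.random.uniform(-1, 1, size=(16, 7))
recon = detokenize_actions(tokenize_actions(chunk), 16, 7)
print(np.abs(chunk - recon).max())                  # small quantization error
```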
An Empirical Study of Autoregressive Pre-training from Videos
·5733 words·27 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 UC Berkeley
Toto, a new autoregressive video model, achieves competitive performance across various benchmarks by pre-training on over 1 trillion visual tokens, demonstrating the effectiveness of scaling video mo…
Training Software Engineering Agents and Verifiers with SWE-Gym
·3604 words·17 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 UC Berkeley
SWE-Gym, a novel environment for training software engineering agents on 2,438 real-world Python task instances, achieves new state-of-the-art performance and is publicly available.
Maximizing Alignment with Minimal Feedback: Efficiently Learning Rewards for Visuomotor Robot Policy Alignment
·2984 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Robotics 🏢 UC Berkeley
RAPL efficiently aligns robots with human preferences using minimal feedback by aligning visual representations before reward learning.
Predicting Emergent Capabilities by Finetuning
·6002 words·29 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 UC Berkeley
Predicting emergent LLM capabilities is now possible by finetuning smaller models; this approach shifts the emergence point, enabling accurate predictions of future model performance, even with up to …