🏢 UC Berkeley
Segment Any Motion in Videos
·2413 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Segmentation
🏢 UC Berkeley
A new method for moving object segmentation that combines long-range motion cues, semantic features, and SAM2, achieving state-of-the-art performance in challenging scenarios.
Scaling Vision Pre-Training to 4K Resolution
·6421 words·31 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 UC Berkeley
PS3 scales CLIP vision pre-training to 4K resolution with near-constant cost, achieving state-of-the-art performance in multi-modal LLMs.
TULIP: Towards Unified Language-Image Pretraining
·3271 words·16 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 UC Berkeley
TULIP enhances image-text pretraining by unifying generative data augmentation with contrastive learning, achieving state-of-the-art performance in visual understanding.
Why Do Multi-Agent LLM Systems Fail?
·2168 words·11 mins·
AI Generated
🤗 Daily Papers
AI Theory
Robustness
🏢 UC Berkeley
Multi-Agent Systems (MAS) often underperform despite enthusiasm. This paper analyzes 5 popular frameworks across 150+ tasks, identifying 14 failure modes categorized into specification/design, inter-a…
Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids
·2441 words·12 mins·
AI Generated
🤗 Daily Papers
AI Applications
Robotics
🏢 UC Berkeley
Sim-to-real RL recipe achieves robust vision-based dexterous humanoid manipulation without human demos!
S*: Test Time Scaling for Code Generation
·2539 words·12 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 UC Berkeley
S*: Hybrid test-time scaling for code generation, boosting both coverage and selection accuracy.
Autellix: An Efficient Serving Engine for LLM Agents as General Programs
·4705 words·23 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 UC Berkeley
Autellix: Efficient LLM Serving for Agents
Pre-training Auto-regressive Robotic Models with 4D Representations
·2752 words·13 mins·
AI Generated
🤗 Daily Papers
AI Applications
Robotics
🏢 UC Berkeley
ARM4R pre-trains autoregressive robotic models using low-level 4D representations from human videos, achieving efficient transfer learning and improved task performance across various environments.
LLMs Can Easily Learn to Reason from Demonstrations. Structure, not content, is what matters!
·3137 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 UC Berkeley
LLMs can be effectively taught complex reasoning through efficient fine-tuning on demonstration data that emphasizes the structure, not the content, of the reasoning process.
Lifelong Sequential Knowledge Editing without Model Degradation
·13067 words·62 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 UC Berkeley
ENCORE enables lifelong sequential knowledge editing in LLMs without performance loss, achieving 10,000 edits while maintaining downstream accuracy.
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
·3663 words·18 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 UC Berkeley
Reinforcement learning (RL) surpasses supervised fine-tuning (SFT) in fostering generalization in foundation models, while SFT aids RL’s stability; a comparative study across text and visual domains r…
FAST: Efficient Action Tokenization for Vision-Language-Action Models
·4290 words·21 mins·
AI Generated
🤗 Daily Papers
AI Applications
Robotics
🏢 UC Berkeley
FAST: A novel action tokenization method using discrete cosine transform drastically improves autoregressive vision-language-action models’ training and performance, enabling dexterous and high-freque…
An Empirical Study of Autoregressive Pre-training from Videos
·5733 words·27 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 UC Berkeley
Toto, a new autoregressive video model, achieves competitive performance across various benchmarks by pre-training on over 1 trillion visual tokens, demonstrating the effectiveness of scaling video mo…
Training Software Engineering Agents and Verifiers with SWE-Gym
·3604 words·17 mins·
AI Generated
🤗 Daily Papers
AI Applications
Software Engineering
🏢 UC Berkeley
SWE-Gym, a novel environment for training software engineering agents on 2,438 real-world Python task instances, yields agents with new state-of-the-art performance and is publicly available.
Maximizing Alignment with Minimal Feedback: Efficiently Learning Rewards for Visuomotor Robot Policy Alignment
·2984 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Robotics
🏢 UC Berkeley
RAPL efficiently aligns robots with human preferences using minimal feedback by aligning visual representations before reward learning.
Predicting Emergent Capabilities by Finetuning
·6002 words·29 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 UC Berkeley
Predicting emergent LLM capabilities is now possible by finetuning smaller models; this approach shifts the emergence point, enabling accurate predictions of future model performance, even with up to …