🏢 UC Berkeley
Segment Any Motion in Videos
·2413 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Segmentation
🏢 UC Berkeley
A new method for moving object segmentation that combines long-range motion cues, semantic features, and SAM2, achieving state-of-the-art performance in challenging scenarios.
Scaling Vision Pre-Training to 4K Resolution
·6421 words·31 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 UC Berkeley
PS3 scales CLIP vision pre-training to 4K resolution with near-constant cost, achieving state-of-the-art performance in multi-modal LLMs.
TULIP: Towards Unified Language-Image Pretraining
·3271 words·16 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 UC Berkeley
TULIP enhances image-text pretraining by unifying generative data augmentation with contrastive learning, achieving state-of-the-art performance in visual understanding.
Why Do Multi-Agent LLM Systems Fail?
·2168 words·11 mins·
AI Generated
🤗 Daily Papers
AI Theory
Robustness
🏢 UC Berkeley
Multi-Agent Systems (MAS) often underperform despite enthusiasm. This paper analyzes 5 popular frameworks across 150+ tasks, identifying 14 failure modes categorized into specification/design, inter-a…
Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids
·2441 words·12 mins·
AI Generated
🤗 Daily Papers
AI Applications
Robotics
🏢 UC Berkeley
Sim-to-real RL recipe achieves robust vision-based dexterous humanoid manipulation without human demos!
S*: Test Time Scaling for Code Generation
·2539 words·12 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 UC Berkeley
S*: Hybrid test-time scaling for code generation, boosting both coverage and selection accuracy.
Autellix: An Efficient Serving Engine for LLM Agents as General Programs
·4705 words·23 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 UC Berkeley
Autellix: Efficient LLM Serving for Agents
Pre-training Auto-regressive Robotic Models with 4D Representations
·2752 words·13 mins·
AI Generated
🤗 Daily Papers
AI Applications
Robotics
🏢 UC Berkeley
ARM4R pre-trains autoregressive robotic models using low-level 4D representations from human videos, achieving efficient transfer learning and improved task performance across various environments.
LLMs Can Easily Learn to Reason from Demonstrations. Structure, not content, is what matters!
·3137 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 UC Berkeley
LLMs can be effectively taught complex reasoning through efficient fine-tuning on demonstration data that emphasizes the structure, not the content, of the reasoning process.
Lifelong Sequential Knowledge Editing without Model Degradation
·13067 words·62 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 UC Berkeley
ENCORE enables lifelong sequential knowledge editing in LLMs without performance loss, achieving 10,000 edits while maintaining downstream accuracy.
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
·3663 words·18 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 UC Berkeley
Reinforcement learning (RL) surpasses supervised fine-tuning (SFT) in fostering generalization in foundation models, while SFT aids RL’s stability; a comparative study across text and visual domains r…
FAST: Efficient Action Tokenization for Vision-Language-Action Models
·4290 words·21 mins·
AI Generated
🤗 Daily Papers
AI Applications
Robotics
🏢 UC Berkeley
FAST: A novel action tokenization method using discrete cosine transform drastically improves autoregressive vision-language-action models’ training and performance, enabling dexterous and high-freque…
An Empirical Study of Autoregressive Pre-training from Videos
·5733 words·27 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 UC Berkeley
Toto, a new autoregressive video model, achieves competitive performance across various benchmarks by pre-training on over 1 trillion visual tokens, demonstrating the effectiveness of scaling video mo…
Training Software Engineering Agents and Verifiers with SWE-Gym
·3604 words·17 mins·
AI Generated
🤗 Daily Papers
AI Applications
Software Engineering
🏢 UC Berkeley
SWE-Gym, a novel environment for training software engineering agents on 2,438 real-world Python task instances, yields agents with new state-of-the-art performance and is publicly available.
Maximizing Alignment with Minimal Feedback: Efficiently Learning Rewards for Visuomotor Robot Policy Alignment
·2984 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Robotics
🏢 UC Berkeley
RAPL efficiently aligns robots with human preferences using minimal feedback by aligning visual representations before reward learning.
Predicting Emergent Capabilities by Finetuning
·6002 words·29 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 UC Berkeley
Predicting emergent LLM capabilities is now possible by finetuning smaller models; this approach shifts the emergence point, enabling accurate predictions of future model performance, even with up to …