🏢 Stanford University

Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals

25 March 2025·4505 words·22 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Stanford University

Opt-CWM: Self-supervised motion learning via counterfactual optimization, achieving state-of-the-art results without labels.

MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research

17 March 2025·4473 words·21 mins

AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Reasoning 🏢 Stanford University

MicroVQA: A new benchmark for testing visual question answering in microscopy-based research.

BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities

7 March 2025·5279 words·25 mins

AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 Stanford University

BRS: Streamlining real-world whole-body manipulation for household activities. It introduces a robot suite that addresses three capabilities these tasks require: bimanual coordination, navigation, and end-effector reach.

CrossOver: 3D Scene Cross-Modal Alignment

20 February 2025·5760 words·28 mins

AI Generated 🤗 Daily Papers Computer Vision Scene Understanding 🏢 Stanford University

CrossOver: Flexible scene-level cross-modal alignment via modality-agnostic embeddings, unlocking robust 3D scene understanding.

Auditing Prompt Caching in Language Model APIs

11 February 2025·5759 words·28 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Stanford University

Researchers use novel timing attacks to expose widespread prompt caching in LLM APIs, highlighting significant privacy risks and the leakage of model architecture information.

Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning

9 February 2025·507 words·3 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Stanford University

Language models learn effective social deduction strategies in a virtual game by using goal-relevant information prediction as a dense reward signal. This approach doubles win rates compared to standard RL.

Temporal Preference Optimization for Long-Form Video Understanding

23 January 2025·2626 words·13 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Stanford University

Temporal Preference Optimization (TPO) boosts long-form video understanding in video-LLMs by leveraging preference learning. It achieves this through a self-training method using preference …

BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery

2 January 2025·4247 words·20 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Stanford University

BoxingGym: A new benchmark that rigorously evaluates AI agents’ ability to design experiments and discover scientific models, revealing current LLMs’ limitations and highlighting promising research directions.

Whisper-GPT: A Hybrid Representation Audio Large Language Model

16 December 2024·1640 words·8 mins

AI Generated 🤗 Daily Papers Speech and Audio Audio Generation 🏢 Stanford University

Whisper-GPT, a hybrid audio LLM, improves music and speech generation by combining continuous audio representations with discrete tokens in a single architecture.

FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models

10 December 2024·3186 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Stanford University

The FiVA dataset and its adaptation framework enable fine-grained control over visual attributes in text-to-image generation, letting users craft highly customized images.

SegBook: A Simple Baseline and Cookbook for Volumetric Medical Image Segmentation

21 November 2024·2952 words·14 mins

AI Generated 🤗 Daily Papers Computer Vision Image Segmentation 🏢 Stanford University

SegBook, a large-scale benchmark, reveals that fine-tuning models pre-trained on full-body CT significantly improves performance on various downstream medical image segmentation tasks, particularly for s…

RedPajama: an Open Dataset for Training Large Language Models

19 November 2024·7625 words·36 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Stanford University

RedPajama: two massive open-source datasets released for training LLMs, improving transparency and facilitating the development of high-performing open-source models.