🏢 University of Washington
Byte Latent Transformer: Patches Scale Better Than Tokens
·4848 words·23 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Washington
BLT: a tokenizer-free LLM that groups raw bytes into dynamically sized patches, improving inference efficiency and robustness.
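To make the patching idea concrete, here is a minimal, self-contained sketch of entropy-based byte patching. The real method uses a small byte-level language model to estimate next-byte entropy; the `toy_next_byte_entropy` helper, window size, and threshold below are stand-in assumptions for illustration only.

```python
import math
from collections import Counter

def toy_next_byte_entropy(prefix: bytes) -> float:
    """Stand-in for BLT's small byte-level LM: entropy of the byte
    distribution in a short sliding window (illustration only)."""
    window = prefix[-16:] or b" "
    total = len(window)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(window).values())

def segment_into_patches(data: bytes, threshold: float = 1.5) -> list[bytes]:
    """Open a new patch wherever predicted next-byte entropy spikes,
    so hard-to-predict regions get more transformer compute."""
    patches, start = [], 0
    for i in range(1, len(data)):
        if toy_next_byte_entropy(data[:i]) > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches

# Long predictable runs stay in one big patch; the surprising tail splits off.
print(segment_into_patches(b"a" * 32 + b"XyZ17!"))
```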
Perception Tokens Enhance Visual Reasoning in Multimodal Language Models
·3120 words·15 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 University of Washington
Boosting visual reasoning in multimodal language models, AURORA leverages novel ‘Perception Tokens’ for improved depth estimation and object counting.
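A hedged sketch of the perception-token protocol: the model emits special tokens encoding an intermediate perception result (e.g., a tokenized depth map) before its final answer. The token markers, the mock generator, and the helper below are illustrative assumptions, not the paper's actual interface.

```python
# Illustrative names only; AURORA's real tokens come from a VQVAE over depth maps.
DEPTH_START, DEPTH_END = "<depth_start>", "<depth_end>"

def mock_generate(prompt: str) -> str:
    """Stand-in for a multimodal LM that interleaves perception tokens
    (e.g. depth-map codes) with its textual answer."""
    return f"{DEPTH_START} 12 7 93 41 {DEPTH_END} The cat is closer to the camera."

def extract_depth_codes(output: str) -> list[int]:
    """Recover the intermediate depth codes the model reasoned with."""
    inner = output.split(DEPTH_START, 1)[1].split(DEPTH_END, 1)[0]
    return [int(tok) for tok in inner.split()]

output = mock_generate("Which object is closer to the camera?")
print("depth codes:", extract_depth_codes(output))
print("answer:", output.split(DEPTH_END, 1)[1].strip())
```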
Negative Token Merging: Image-based Adversarial Feature Guidance
·2311 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Washington
NegToMe: Image-based adversarial guidance improves image generation diversity and reduces similarity to copyrighted content without training, simply by using images instead of negative text prompts.
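A minimal sketch of the core token operation, assuming NegToMe matches each source token to its most similar reference token by cosine similarity and linearly extrapolates away from it in feature space; `alpha` and the similarity threshold are illustrative values, not the paper's settings.

```python
import numpy as np

def neg_tome(src: np.ndarray, ref: np.ndarray,
             alpha: float = 0.9, thresh: float = 0.5) -> np.ndarray:
    """src: (n, d) source tokens; ref: (m, d) tokens from the reference
    (e.g. copyrighted) image. Pushes matched source tokens away from
    their closest reference feature."""
    s = src / np.linalg.norm(src, axis=1, keepdims=True)
    r = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    sim = s @ r.T                              # cosine similarity, (n, m)
    best = sim.argmax(axis=1)                  # closest reference token per source
    matched = sim[np.arange(len(src)), best] > thresh
    out = src.copy()
    # Linear extrapolation away from the matched reference feature.
    out[matched] += alpha * (src[matched] - ref[best[matched]])
    return out

src = np.random.randn(8, 16).astype(np.float32)
ref = np.random.randn(4, 16).astype(np.float32)
print(neg_tome(src, ref).shape)  # (8, 16)
```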
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI
·4473 words·21 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 University of Washington
GMAI-VL-5.5M & GMAI-VL: A new multimodal medical dataset and vision-language model achieve state-of-the-art results in various medical tasks.
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
·2784 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 University of Washington
SAMURAI enhances the Segment Anything Model 2 for real-time, zero-shot visual object tracking by incorporating motion-aware memory and motion modeling, significantly improving accuracy and robustness.
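A simplified sketch of motion-aware mask selection: blend each candidate's mask confidence with its agreement against a motion-predicted box. The paper uses a Kalman filter for motion modeling; the constant-velocity predictor and blend weight below are simplifying assumptions.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def select_mask(candidates, prev_box, velocity, weight=0.7):
    """candidates: list of (box, mask_score). Predict where the object
    should be from its motion, then pick the candidate balancing model
    confidence against agreement with that prediction."""
    vx, vy = velocity
    predicted = (prev_box[0] + vx, prev_box[1] + vy,
                 prev_box[2] + vx, prev_box[3] + vy)
    return max(candidates,
               key=lambda c: weight * c[1] + (1 - weight) * iou(c[0], predicted))

# A distractor with higher mask confidence loses to the motion-consistent box.
cands = [((10, 10, 50, 50), 0.9), ((40, 12, 80, 52), 0.8)]
print(select_mask(cands, prev_box=(35, 10, 75, 50), velocity=(5, 1)))
```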
Stronger Models are NOT Stronger Teachers for Instruction Tuning
·3212 words·16 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Washington
Larger language models aren’t always better teachers for instruction tuning; a new metric, Compatibility-Adjusted Reward (CAR), predicts a teacher model’s effectiveness better than existing metrics.
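A hedged sketch of a CAR-style score, under the assumption that the metric discounts a teacher's average response reward by the student base model's loss on those responses as a compatibility proxy; the paper's exact formulation and normalization may differ.

```python
def car_score(rewards: list[float], base_model_losses: list[float]) -> float:
    """Illustrative Compatibility-Adjusted Reward: average reward of a
    teacher's responses, discounted by how incompatible they are with the
    student base model (proxied here by the base model's loss on them)."""
    avg_reward = sum(rewards) / len(rewards)
    avg_loss = sum(base_model_losses) / len(base_model_losses)
    return avg_reward / avg_loss  # higher reward, lower loss => better teacher

print(car_score(rewards=[0.72, 0.65, 0.80], base_model_losses=[2.1, 2.4, 1.9]))
```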