
🏢 University of Washington

Open Deep Search: Democratizing Search with Open-source Reasoning Agents
·1746 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 University of Washington
Open Deep Search (ODS) pairs open-source reasoning agents with web search, aiming to democratize capabilities of proprietary search AI.
MusicInfuser: Making Video Diffusion Listen and Dance
·4650 words·22 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Generation 🏢 University of Washington
Sync your moves! MusicInfuser adapts video diffusion to make models listen and dance to music, preserving style and aligning movement.
Large-Scale Data Selection for Instruction Tuning
·2665 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Washington
RDS+ proves to be the unsung hero for scaling instruction-tuning data selection.
Small Models Struggle to Learn from Strong Reasoners
·4149 words·20 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 University of Washington
Small language models struggle to learn complex reasoning from large models, but a novel ‘Mix Distillation’ method balances complexity for effective capability transfer.
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
·2452 words·12 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Washington
LLMs struggle with complex logical reasoning; ZebraLogic benchmark reveals a ‘curse of complexity’, highlighting inherent limitations and guiding future research.
Byte Latent Transformer: Patches Scale Better Than Tokens
·4848 words·23 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Washington
BLT: a tokenizer-free, byte-level LLM architecture whose dynamic patches improve efficiency and robustness.
Perception Tokens Enhance Visual Reasoning in Multimodal Language Models
·3120 words·15 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 University of Washington
Boosting visual reasoning in multimodal language models, AURORA leverages novel ‘Perception Tokens’ for improved depth estimation and object counting.
Negative Token Merging: Image-based Adversarial Feature Guidance
·2311 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Washington
NegToMe: image-based adversarial guidance that improves generation diversity and reduces similarity to copyrighted content, with no training, by using reference images instead of negative text prompts.
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI
·4473 words·21 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 University of Washington
GMAI-VL-5.5M & GMAI-VL: A new multimodal medical dataset and vision-language model achieve state-of-the-art results in various medical tasks.
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
·2784 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 University of Washington
SAMURAI enhances the Segment Anything Model 2 for real-time, zero-shot visual object tracking by incorporating motion-aware memory and motion modeling, significantly improving accuracy and robustness.
Stronger Models are NOT Stronger Teachers for Instruction Tuning
·3212 words·16 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Washington
Larger language models aren’t always better teachers for instruction tuning; a new metric, CAR, predicts teacher model effectiveness better than existing methods.