🏢 University of Washington

Byte Latent Transformer: Patches Scale Better Than Tokens
· 4848 words · 23 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 University of Washington
BLT: a tokenizer-free LLM that replaces tokens with byte-level patches for improved efficiency and robustness.
Perception Tokens Enhance Visual Reasoning in Multimodal Language Models
· 3120 words · 15 mins
AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models · 🏢 University of Washington
AURORA boosts visual reasoning in multimodal language models by introducing novel ‘Perception Tokens’ that improve depth estimation and object counting.
Negative Token Merging: Image-based Adversarial Feature Guidance
· 2311 words · 11 mins
AI Generated · 🤗 Daily Papers · Computer Vision · Image Generation · 🏢 University of Washington
NegToMe: Image-based adversarial guidance that uses images instead of negative text prompts, improving image generation diversity and reducing similarity to copyrighted content without any training.
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI
· 4473 words · 21 mins
AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models · 🏢 University of Washington
GMAI-VL-5.5M & GMAI-VL: A new multimodal medical dataset and vision-language model achieve state-of-the-art results in various medical tasks.
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
· 2784 words · 14 mins
AI Generated · 🤗 Daily Papers · Computer Vision · Video Understanding · 🏢 University of Washington
SAMURAI enhances the Segment Anything Model 2 for real-time, zero-shot visual object tracking by incorporating motion-aware memory and motion modeling, significantly improving accuracy and robustness.
Stronger Models are NOT Stronger Teachers for Instruction Tuning
· 3212 words · 16 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 University of Washington
Larger language models aren’t always better teachers for instruction tuning; a new metric, CAR, predicts teacher model effectiveness better than existing methods.