🏢 University of Washington
Byte Latent Transformer: Patches Scale Better Than Tokens
·4848 words·23 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Washington
BLT: a tokenizer-free LLM that groups raw bytes into dynamically sized patches, improving inference efficiency and robustness.
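To make the patching idea concrete, here is a minimal, self-contained sketch of entropy-based byte patching. The real method uses a small byte-level language model to estimate next-byte entropy; the `toy_next_byte_entropy` helper, window size, and threshold below are stand-in assumptions for illustration only.

```python
import math
from collections import Counter

def toy_next_byte_entropy(prefix: bytes) -> float:
    """Stand-in for BLT's small byte-level LM: entropy of the byte
    distribution in a short sliding window (illustration only)."""
    window = prefix[-16:] or b" "
    total = len(window)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(window).values())

def segment_into_patches(data: bytes, threshold: float = 1.5) -> list[bytes]:
    """Open a new patch wherever predicted next-byte entropy spikes,
    so hard-to-predict regions get more transformer compute."""
    patches, start = [], 0
    for i in range(1, len(data)):
        if toy_next_byte_entropy(data[:i]) > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches

# Long predictable runs stay in one big patch; the surprising tail splits off.
print(segment_into_patches(b"a" * 32 + b"XyZ17!"))
```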
Perception Tokens Enhance Visual Reasoning in Multimodal Language Models
·3120 words·15 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 University of Washington
Boosting visual reasoning in multimodal language models, AURORA leverages novel ‘Perception Tokens’ for improved depth estimation and object counting.
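A hedged sketch of the perception-token protocol: the model emits special tokens encoding an intermediate perception result (e.g., a tokenized depth map) before its final answer. The token markers, the mock generator, and the helper below are illustrative assumptions, not the paper's actual interface.

```python
# Illustrative names only; AURORA's real tokens come from a VQVAE over depth maps.
DEPTH_START, DEPTH_END = "<depth_start>", "<depth_end>"

def mock_generate(prompt: str) -> str:
    """Stand-in for a multimodal LM that interleaves perception tokens
    (e.g. depth-map codes) with its textual answer."""
    return f"{DEPTH_START} 12 7 93 41 {DEPTH_END} The cat is closer to the camera."

def extract_depth_codes(output: str) -> list[int]:
    """Recover the intermediate depth codes the model reasoned with."""
    inner = output.split(DEPTH_START, 1)[1].split(DEPTH_END, 1)[0]
    return [int(tok) for tok in inner.split()]

output = mock_generate("Which object is closer to the camera?")
print("depth codes:", extract_depth_codes(output))
print("answer:", output.split(DEPTH_END, 1)[1].strip())
```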
Negative Token Merging: Image-based Adversarial Feature Guidance
·2311 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Washington
NegToMe: Image-based adversarial guidance improves image generation diversity and reduces similarity to copyrighted content without training, simply by using images instead of negative text prompts.
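A minimal sketch of the core token operation, assuming NegToMe matches each source token to its most similar reference token by cosine similarity and linearly extrapolates away from it in feature space; `alpha` and the similarity threshold are illustrative values, not the paper's settings.

```python
import numpy as np

def neg_tome(src: np.ndarray, ref: np.ndarray,
             alpha: float = 0.9, thresh: float = 0.5) -> np.ndarray:
    """src: (n, d) source tokens; ref: (m, d) tokens from the reference
    (e.g. copyrighted) image. Pushes matched source tokens away from
    their closest reference feature."""
    s = src / np.linalg.norm(src, axis=1, keepdims=True)
    r = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    sim = s @ r.T                              # cosine similarity, (n, m)
    best = sim.argmax(axis=1)                  # closest reference token per source
    matched = sim[np.arange(len(src)), best] > thresh
    out = src.copy()
    # Linear extrapolation away from the matched reference feature.
    out[matched] += alpha * (src[matched] - ref[best[matched]])
    return out

src = np.random.randn(8, 16).astype(np.float32)
ref = np.random.randn(4, 16).astype(np.float32)
print(neg_tome(src, ref).shape)  # (8, 16)
```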
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI
·4473 words·21 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 University of Washington
GMAI-VL-5.5M & GMAI-VL: A new multimodal medical dataset and vision-language model achieve state-of-the-art results in various medical tasks.
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
·2784 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 University of Washington
SAMURAI enhances the Segment Anything Model 2 for real-time, zero-shot visual object tracking by incorporating motion-aware memory and motion modeling, significantly improving accuracy and robustness.
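A simplified sketch of motion-aware mask selection: blend each candidate's mask confidence with its agreement against a motion-predicted box. The paper uses a Kalman filter for motion modeling; the constant-velocity predictor and blend weight below are simplifying assumptions.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def select_mask(candidates, prev_box, velocity, weight=0.7):
    """candidates: list of (box, mask_score). Predict where the object
    should be from its motion, then pick the candidate balancing model
    confidence against agreement with that prediction."""
    vx, vy = velocity
    predicted = (prev_box[0] + vx, prev_box[1] + vy,
                 prev_box[2] + vx, prev_box[3] + vy)
    return max(candidates,
               key=lambda c: weight * c[1] + (1 - weight) * iou(c[0], predicted))

# A distractor with higher mask confidence loses to the motion-consistent box.
cands = [((10, 10, 50, 50), 0.9), ((40, 12, 80, 52), 0.8)]
print(select_mask(cands, prev_box=(35, 10, 75, 50), velocity=(5, 1)))
```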
Stronger Models are NOT Stronger Teachers for Instruction Tuning
·3212 words·16 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Washington
Larger language models aren’t always better teachers for instruction tuning; a new metric, Compatibility-Adjusted Reward (CAR), predicts a teacher model’s effectiveness better than existing metrics.
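A hedged sketch of a CAR-style score, under the assumption that the metric discounts a teacher's average response reward by the student base model's loss on those responses as a compatibility proxy; the paper's exact formulation and normalization may differ.

```python
def car_score(rewards: list[float], base_model_losses: list[float]) -> float:
    """Illustrative Compatibility-Adjusted Reward: average reward of a
    teacher's responses, discounted by how incompatible they are with the
    student base model (proxied here by the base model's loss on them)."""
    avg_reward = sum(rewards) / len(rewards)
    avg_loss = sum(base_model_losses) / len(base_model_losses)
    return avg_reward / avg_loss  # higher reward, lower loss => better teacher

print(car_score(rewards=[0.72, 0.65, 0.80], base_model_losses=[2.1, 2.4, 1.9]))
```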