🏢 Zhejiang University, China
Neighboring Autoregressive Modeling for Efficient Visual Generation
3102 words · 15 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Zhejiang University, China
NAR: Neighboring Autoregressive Modeling enables efficient visual generation through locality-preserving, parallel decoding.
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
2632 words · 13 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Zhejiang University, China
SegAgent: Improves MLLMs’ pixel understanding by mimicking human annotators, enabling mask refinement without altering the model's output space.