
🏢 Zhejiang University, China

Neighboring Autoregressive Modeling for Efficient Visual Generation
3102 words · 15 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Zhejiang University, China
NAR: Neighboring Autoregressive Modeling enables efficient visual generation through locality-preserving, parallel decoding.
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
2632 words · 13 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Zhejiang University, China
SegAgent improves MLLMs’ pixel understanding by mimicking human annotation, enabling mask refinement without altering the model's output space.