🏢 CUHK
STEVE: A Step Verification Pipeline for Computer-use Agent Training
3895 words · 19 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 CUHK
STEVE trains computer-use agents with a step verification pipeline.
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement
2686 words · 13 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Segmentation
🏢 CUHK
Seg-Zero uses cognitive reinforcement to guide segmentation with reasoning chains.
VisionZip: Longer is Better but Not Necessary in Vision Language Models
7032 words · 34 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 CUHK
VisionZip improves vision-language model efficiency by selecting a small set of key visual tokens, achieving near-state-of-the-art performance at a much lower computational cost.