🏢 CUHK

STEVE: A Step Verification Pipeline for Computer-use Agent Training
·3895 words·19 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 CUHK
STEVE: Step-verifying computer-use agent training.
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement
·2686 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision Image Segmentation 🏢 CUHK
Seg-Zero: Cognitive Reinforcement for Reasoning-Chain Guided Segmentation.
VisionZip: Longer is Better but Not Necessary in Vision Language Models
·7032 words·34 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 CUHK
VisionZip improves vision-language model efficiency by selecting a small set of key visual tokens, achieving near-state-of-the-art performance at much lower computational cost.