VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering
·2597 words·13 mins·
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Zhongguancun Laboratory
VisualSimpleQA: A new benchmark for fine-grained evaluation of visual and linguistic modules in fact-seeking LVLMs.