🏢 National Key Laboratory for Novel Software Technology, Nanjing University

CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era
4997 words · 24 mins
AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models
CapArena is a benchmark for detailed image captioning in the LLM era; it reveals biases in existing metrics and advances automated evaluation.