🏢 ECE & 2IPAI, Seoul National University

Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding
2062 words · 10 mins
Multimodal Learning · Vision-Language Models
This paper introduces HVFA (Hierarchical Visual Feature Aggregation), an OCR-free document understanding framework that supplies multi-scale visual features to a multimodal large language model (MLLM) and reports improved performance across a range of document understanding tasks.