🏢 ECE & 2IPAI, Seoul National University

Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding

26 September 2024·2062 words·10 mins

Multimodal Learning · Vision-Language Models

This paper introduces Hierarchical Visual Feature Aggregation (HVFA), an OCR-free document understanding framework built on multimodal large language models (MLLMs) and multi-scale visual features, and reports strong performance across a range of document understanding tasks.