🏢 ECE & IPAI, Seoul National University
Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding
Multimodal Learning
Vision-Language Models
This paper introduces HVFA, an OCR-free document understanding framework that supplies multi-scale visual features to multimodal large language models (MLLMs). The authors report superior performance across a range of document understanding tasks.