
🏢 National Key Laboratory of Human-Machine Hybrid Augmented Intelligence

Referencing Where to Focus: Improving Visual Grounding with Referential Query

26 September 2024·2958 words·14 mins

Multimodal Learning · Vision-Language Models · 🏢 National Key Laboratory of Human-Machine Hybrid Augmented Intelligence

RefFormer improves visual grounding accuracy by adapting its queries with multi-level image features, which guides the decoder toward the target object.

Grounded Answers for Multi-agent Decision-making Problem through Generative World Model

26 September 2024·2428 words·12 mins

Machine Learning · Reinforcement Learning · 🏢 National Key Laboratory of Human-Machine Hybrid Augmented Intelligence

Generative world models enhance multi-agent decision-making by simulating trial-and-error learning, improving answer accuracy and explainability.