🏢 National Key Laboratory of Human-Machine Hybrid Augmented Intelligence
Referencing Where to Focus: Improving Visual Grounding with Referential Query
·2958 words·14 mins·
Multimodal Learning
Vision-Language Models
🏢 National Key Laboratory of Human-Machine Hybrid Augmented Intelligence
RefFormer improves visual grounding accuracy by adapting its queries with multi-level image features, guiding the decoder toward the target object.
Grounded Answers for Multi-agent Decision-making Problem through Generative World Model
·2428 words·12 mins·
Machine Learning
Reinforcement Learning
🏢 National Key Laboratory of Human-Machine Hybrid Augmented Intelligence
Generative world models enhance multi-agent decision-making by simulating trial-and-error learning, improving answer accuracy and explainability.