🏢 School of Artificial Intelligence, University of Chinese Academy of Sciences
MemVLT: Vision-Language Tracking with Adaptive Memory-based Prompts
2803 words · 14 mins
Multimodal Learning
Vision-Language Models
🏢 School of Artificial Intelligence, University of Chinese Academy of Sciences
MemVLT: a vision-language tracker that uses memory to generate adaptive prompts, surpassing existing methods by adapting to changes in target appearance.
Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning
2454 words · 12 mins
Natural Language Processing
Large Language Models
🏢 School of Artificial Intelligence, University of Chinese Academy of Sciences
CORY: a sequential cooperative multi-agent RL framework for fine-tuning LLMs.
Beyond Accuracy: Tracking more like Human via Visual Search
2966 words · 14 mins
Computer Vision
Video Understanding
🏢 School of Artificial Intelligence, University of Chinese Academy of Sciences
CPDTrack: human-like visual search improves object tracking.