🏢 Shanghai Jiao Tong University
LEGION: Learning to Ground and Explain for Synthetic Image Detection
·3727 words·18 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Shanghai Jiao Tong University
LEGION: Grounding and explaining synthetic image detection and refinement via multimodal learning.
Make Your Training Flexible: Towards Deployment-Efficient Video Models
·5609 words·27 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Shanghai Jiao Tong University
FluxViT: Flexible video models via adaptive token selection for efficient deployment!
Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning
·2655 words·13 mins·
AI Generated
🤗 Daily Papers
AI Applications
Robotics
🏢 Shanghai Jiao Tong University
ADC: Human-robot collaboration revolutionizes data collection, slashing data needs and boosting robot learning!
Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content
·3985 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Shanghai Jiao Tong University
Q-Eval-100K: A new, large dataset for evaluating visual quality and text alignment in AI-generated content.
Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs
·2943 words·14 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Shanghai Jiao Tong University
Mask-DPO: Fine-grained factuality alignment that improves LLMs’ factuality by masking sentence-level errors during DPO training.
SIFT: Grounding LLM Reasoning in Contexts via Stickers
·3144 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Shanghai Jiao Tong University
SIFT: Grounding LLM reasoning with ‘Stickers’ that highlight key context, improving accuracy without extra training.
Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation
·3420 words·17 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Shanghai Jiao Tong University
Show-o Turbo dramatically speeds up multimodal understanding and generation by leveraging parallel decoding and consistency distillation, achieving significant performance gains with fewer sampling st…
iFormer: Integrating ConvNet and Transformer for Mobile Application
·7046 words·34 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Classification
🏢 Shanghai Jiao Tong University
iFormer: A new family of mobile hybrid vision networks that expertly blends ConvNeXt’s fast local feature extraction with the efficient global modeling of self-attention, achieving top-tier accuracy a…
PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World
·3633 words·18 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Human-AI Interaction
🏢 Shanghai Jiao Tong University
PC Agent: While you sleep, AI works! This AI system uses human cognition transfer to perform complex digital tasks, exceeding the capabilities of existing digital agents by efficiently learning from h…
Towards Universal Soccer Video Understanding
·2836 words·14 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Shanghai Jiao Tong University
Soccer video understanding gets a major boost with SoccerReplay-1988, the largest multi-modal dataset, and MatchVision, a new visual-language model achieving state-of-the-art performance on event clas…