🏢 Shanghai Jiao Tong University

LEGION: Learning to Ground and Explain for Synthetic Image Detection

19 March 2025·3727 words·18 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Shanghai Jiao Tong University

LEGION: Grounding and explaining synthetic image detection and refinement via multimodal learning.

Make Your Training Flexible: Towards Deployment-Efficient Video Models

18 March 2025·5609 words·27 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Shanghai Jiao Tong University

FluxViT: Flexible video models via adaptive token selection for efficient deployment!

Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning

14 March 2025·2655 words·13 mins

AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 Shanghai Jiao Tong University

ADC: Human-robot collaboration revolutionizes data collection, slashing data needs and boosting robot learning!

Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content

4 March 2025·3985 words·19 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Shanghai Jiao Tong University

Q-Eval-100K: A new, large dataset for evaluating visual quality and text alignment in AI-generated content.

Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs

4 March 2025·2943 words·14 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai Jiao Tong University

Mask-DPO: Fine-grained factuality alignment that improves LLMs’ factuality by masking sentence-level errors during DPO training.

SIFT: Grounding LLM Reasoning in Contexts via Stickers

19 February 2025·3144 words·15 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai Jiao Tong University

SIFT: Grounds LLM reasoning in context with ‘Stickers’ that highlight key information, improving accuracy without extra training.

Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation

8 February 2025·3420 words·17 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Shanghai Jiao Tong University

Show-o Turbo dramatically speeds up multimodal understanding and generation by leveraging parallel decoding and consistency distillation, achieving significant performance gains with fewer sampling st…

iFormer: Integrating ConvNet and Transformer for Mobile Application

26 January 2025·7046 words·34 mins

AI Generated 🤗 Daily Papers Computer Vision Image Classification 🏢 Shanghai Jiao Tong University

iFormer: A new family of mobile hybrid vision networks that expertly blends ConvNeXt’s fast local feature extraction with the efficient global modeling of self-attention, achieving top-tier accuracy a…

PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World

23 December 2024·3633 words·18 mins

AI Generated 🤗 Daily Papers Multimodal Learning Human-AI Interaction 🏢 Shanghai Jiao Tong University

PC Agent: While you sleep, AI works! This AI system uses human cognition transfer to perform complex digital tasks, exceeding the capabilities of existing digital agents by efficiently learning from h…

Towards Universal Soccer Video Understanding

2 December 2024·2836 words·14 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Shanghai Jiao Tong University

Soccer video understanding gets a major boost with SoccerReplay-1988, the largest multi-modal soccer dataset, and MatchVision, a new visual-language model achieving state-of-the-art performance on event clas…