🏢 Beihang University

AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset

25 March 2025·2413 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Beihang University

AccVideo accelerates video diffusion by 8.5x with a synthetic dataset and trajectory-based distillation, maintaining quality and enabling higher resolution video generation.

Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models

24 March 2025·3661 words·18 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Beihang University

Diffusion-4K: Ultra-high-resolution image synthesis with a new 4K benchmark dataset and wavelet-based fine-tuning, making 4K image generation more detailed and accessible.

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

18 February 2025·2399 words·12 mins

AI Generated 🤗 Daily Papers Speech and Audio Music Generation 🏢 Beihang University

SongGen: A single-stage autoregressive transformer for controllable text-to-song generation that simplifies the generation pipeline and improves control over the output.

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

22 November 2024·4108 words·20 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Beihang University

VideoEspresso: A new dataset and a hybrid LVLM framework that together boost fine-grained video reasoning.