🏢 Beihang University
AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset
2413 words · 12 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Beihang University
AccVideo accelerates video diffusion by 8.5x using a synthetic dataset and trajectory-based distillation, maintaining quality while enabling higher-resolution video generation.
Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models
3661 words · 18 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Beihang University
Diffusion-4K: Synthesizing ultra-high-resolution images with a new benchmark dataset and wavelet-based fine-tuning that makes 4K image creation more detailed and accessible!
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
2399 words · 12 mins
AI Generated
🤗 Daily Papers
Speech and Audio
Music Generation
🏢 Beihang University
SongGen: Single-stage autoregressive transformer for controllable text-to-song generation, simplifying the process and improving control.
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
4108 words · 20 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Beihang University
VideoEspresso: A new dataset and Hybrid LVLMs framework boost fine-grained video reasoning!