🏢 Beihang University

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
2399 words · 12 mins
AI Generated 🤗 Daily Papers Speech and Audio Music Generation 🏢 Beihang University
SongGen: a single-stage autoregressive transformer for controllable text-to-song generation that simplifies the generation pipeline and gives finer control over the output.
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
4108 words · 20 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Beihang University
VideoEspresso: a new dataset and a hybrid LVLM framework that improve fine-grained video reasoning.