Skip to main content

🏢 Nanyang Technological University

RepVideo: Rethinking Cross-Layer Representation for Video Generation
·2785 words·14 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Nanyang Technological University
RepVideo enhances text-to-video generation by enriching feature representations, resulting in significantly improved temporal coherence and spatial detail.
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
·1895 words·9 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Nanyang Technological University
SeedVR: A novel diffusion transformer revolutionizes generic video restoration by efficiently handling arbitrary video lengths and resolutions, achieving state-of-the-art performance.
AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge
·2611 words·13 mins· loading · loading
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Nanyang Technological University
Auto-built benchmark with up-to-date knowledge ensures contamination-free LLM evaluation.
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
·2401 words·12 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Nanyang Technological University
FreeScale generates stunning 8K images and high-fidelity videos without retraining.
Arbitrary-steps Image Super-resolution via Diffusion Inversion
·3889 words·19 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Nanyang Technological University
InvSR: a novel image super-resolution technique using diffusion inversion, enabling flexible sampling steps for efficient and high-fidelity results.
ObjCtrl-2.5D: Training-free Object Control with Camera Poses
·3506 words·17 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Nanyang Technological University
ObjCtrl-2.5D: Training-free, precise image-to-video object control using 3D trajectories and camera poses.
Trajectory Attention for Fine-grained Video Motion Control
·4421 words·21 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Nanyang Technological University
Trajectory Attention enhances video motion control by injecting trajectory information, improving precision and long-range consistency in video generation.
Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis
·3637 words·18 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Nanyang Technological University
Omegance: One parameter precisely controls image detail in diffusion models, enabling flexible granularity adjustments without model changes or retraining.
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
·2778 words·14 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Nanyang Technological University
SAR3D: Blazing-fast autoregressive 3D object generation and understanding using a multi-scale VQVAE, achieving sub-second generation and detailed multimodal comprehension.
Novel View Extrapolation with Video Diffusion Priors
·2381 words·12 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Nanyang Technological University
ViewExtrapolator leverages Stable Video Diffusion to realistically extrapolate novel views far beyond training data, dramatically improving the quality of 3D scene generation.
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models
·3966 words·19 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Nanyang Technological University
VBench++: A new benchmark suite meticulously evaluates video generative models across 16 diverse dimensions, aligning with human perception for improved model development and fairer comparisons.