Nanyang Technological University
MagicArticulate: Make Your 3D Models Articulation-Ready
·4321 words·21 mins·
AI Generated
Daily Papers
Computer Vision
3D Vision
Nanyang Technological University
MagicArticulate automates 3D model animation preparation by generating skeletons and skinning weights, overcoming prior manual methods’ limitations, and introducing Articulation-XL, a large-scale benc…
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos
·4575 words·22 mins·
AI Generated
Daily Papers
Computer Vision
Video Understanding
Nanyang Technological University
The Video-MMMU benchmark systematically evaluates Large Multimodal Models' knowledge acquisition from videos across multiple disciplines and cognitive stages, revealing significant gaps between human and …
RepVideo: Rethinking Cross-Layer Representation for Video Generation
·2785 words·14 mins·
AI Generated
Daily Papers
Computer Vision
Video Understanding
Nanyang Technological University
RepVideo enhances text-to-video generation by enriching feature representations, resulting in significantly improved temporal coherence and spatial detail.
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
·1895 words·9 mins·
AI Generated
Daily Papers
Computer Vision
Video Understanding
Nanyang Technological University
SeedVR: A diffusion transformer for generic video restoration that efficiently handles arbitrary video lengths and resolutions, achieving state-of-the-art performance.
AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge
·2611 words·13 mins·
AI Generated
Daily Papers
Natural Language Processing
Large Language Models
Nanyang Technological University
Auto-built benchmark with up-to-date knowledge ensures contamination-free LLM evaluation.
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
·2401 words·12 mins·
AI Generated
Daily Papers
Computer Vision
Image Generation
Nanyang Technological University
FreeScale generates stunning 8K images and high-fidelity videos without retraining.
Arbitrary-steps Image Super-resolution via Diffusion Inversion
·3889 words·19 mins·
AI Generated
Daily Papers
Computer Vision
Image Generation
Nanyang Technological University
InvSR: a novel image super-resolution technique using diffusion inversion, enabling flexible sampling steps for efficient and high-fidelity results.
ObjCtrl-2.5D: Training-free Object Control with Camera Poses
·3506 words·17 mins·
AI Generated
Daily Papers
Computer Vision
Video Understanding
Nanyang Technological University
ObjCtrl-2.5D: Training-free, precise image-to-video object control using 3D trajectories and camera poses.
Trajectory Attention for Fine-grained Video Motion Control
·4421 words·21 mins·
AI Generated
Daily Papers
Computer Vision
Video Understanding
Nanyang Technological University
Trajectory Attention enhances video motion control by injecting trajectory information, improving precision and long-range consistency in video generation.
Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis
·3637 words·18 mins·
AI Generated
Daily Papers
Computer Vision
Image Generation
Nanyang Technological University
Omegance: One parameter precisely controls image detail in diffusion models, enabling flexible granularity adjustments without model changes or retraining.
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
·2778 words·14 mins·
AI Generated
Daily Papers
Computer Vision
3D Vision
Nanyang Technological University
SAR3D: Blazing-fast autoregressive 3D object generation and understanding using a multi-scale VQVAE, achieving sub-second generation and detailed multimodal comprehension.
Novel View Extrapolation with Video Diffusion Priors
·2381 words·12 mins·
AI Generated
Daily Papers
Computer Vision
3D Vision
Nanyang Technological University
ViewExtrapolator leverages Stable Video Diffusion to realistically extrapolate novel views far beyond training data, dramatically improving the quality of 3D scene generation.
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models
·3966 words·19 mins·
AI Generated
Daily Papers
Computer Vision
Image Generation
Nanyang Technological University
VBench++: A new benchmark suite meticulously evaluates video generative models across 16 diverse dimensions, aligning with human perception for improved model development and fairer comparisons.