Computer Vision
Adaptive Blind All-in-One Image Restoration
·3727 words·18 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Restoration
🏢 Computer Vision Center
Adaptive Blind All-in-One Image Restoration (ABAIR) efficiently handles diverse image degradations, generalizes well to unseen distortions, and easily incorporates new ones via efficient fine-tuning.
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers
·2596 words·13 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 University of Toronto
AC3D achieves precise 3D camera control in video diffusion transformers by analyzing camera motion’s spectral properties, optimizing pose conditioning, and using a curated dataset of dynamic videos.
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
·4024 words·19 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Peking University
WF-VAE boosts video VAE performance with wavelet-driven energy flow and causal caching, enabling 2x higher throughput and 4x lower memory usage in latent video diffusion models.
Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis
·3637 words·18 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Nanyang Technological University
Omegance: One parameter precisely controls image detail in diffusion models, enabling flexible granularity adjustments without model changes or retraining.
MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation
·4827 words·23 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 DFKI
MARVEL-40M+ & MARVEL-FX3D: 40M+ high-quality 3D annotations & a fast two-stage text-to-3D pipeline enabling high-fidelity 3D model generation within 15 seconds.
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
·2775 words·14 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
ConsisID achieves high-quality, identity-preserving text-to-video generation using a tuning-free diffusion transformer model that leverages frequency decomposition for effective identity control.
DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting
·2489 words·12 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Dalian University of Technology
DreamMix enhances image inpainting by disentangling object attributes for precise editing, enabling both identity preservation and flexible text-driven modifications.
DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching
·3048 words·15 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Samsung R&D Institute UK
DreamCache enables efficient, high-quality personalized image generation without finetuning by caching reference image features and using lightweight conditioning adapters.
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
·3600 words·17 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 National University of Singapore
Collaborative Decoding (CoDe) dramatically boosts visual auto-regressive model efficiency.
ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting
·2950 words·14 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Xi'an Jiaotong University
ChatGen-Evo automates text-to-image generation from freestyle chatting, simplifying the process and significantly improving performance over existing methods.
AnchorCrafter: Animate CyberAnchors Saling Your Products via Human-Object Interacting Video Generation
·2812 words·14 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tencent AI Lab
AnchorCrafter animates cyber-anchors selling products via human-object interacting video generation, achieving high visual fidelity and controllable interactions.
VisualLens: Personalization through Visual History
·2160 words·11 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Visual Question Answering
🏢 Meta
VisualLens leverages user visual history for personalized recommendations, improving state-of-the-art by 5-10% and exceeding GPT-4’s performance.
SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis
·3638 words·18 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Twelve Labs
SplatFlow: A novel multi-view rectified flow model enabling direct 3D Gaussian splatting generation & training-free editing for diverse 3D tasks.
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
·2778 words·14 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Nanyang Technological University
SAR3D: Blazing-fast autoregressive 3D object generation and understanding using a multi-scale VQVAE, achieving sub-second generation and detailed multimodal comprehension.
Pathways on the Image Manifold: Image Editing via Video Generation
·3449 words·17 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Technion - Israel Institute of Technology
Image editing is revolutionized by Frame2Frame, which uses video generation to produce seamless and accurate edits, preserving image fidelity.
One Diffusion to Generate Them All
·4521 words·22 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 UC Irvine
OneDiffusion: A single diffusion model masters image synthesis & understanding across diverse tasks, from text-to-image to depth estimation, pushing the boundaries of AI.
Learning 3D Representations from Procedural 3D Programs
·4094 words·20 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 University of Virginia
Self-supervised learning of 3D representations from procedurally generated synthetic shapes achieves comparable performance to models trained on real-world datasets, highlighting the potential of synt…
Factorized Visual Tokenization and Generation
·2519 words·12 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Amazon
FQGAN revitalizes image generation by introducing Factorized Quantization, enabling scalable and stable visual tokenization with state-of-the-art performance.
DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation
·3751 words·18 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 UNC Chapel Hill
DREAMRUNNER generates high-quality storytelling videos by using LLMs for hierarchical planning, motion retrieval, and a novel spatial-temporal region-based diffusion model for fine-grained control.
Controllable Human Image Generation with Personalized Multi-Garments
·4062 words·20 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 KAIST
BootComp: generate realistic human images wearing multiple garments using a novel synthetic data pipeline & diffusion model, enabling diverse applications like virtual try-on.