Computer Vision
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
·2778 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Nanyang Technological University
SAR3D: Blazing-fast autoregressive 3D object generation and understanding using a multi-scale VQVAE, achieving sub-second generation and detailed multimodal comprehension.
Pathways on the Image Manifold: Image Editing via Video Generation
·3449 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Technion - Israel Institute of Technology
Frame2Frame recasts image editing as video generation, producing seamless, accurate edits while preserving image fidelity.
One Diffusion to Generate Them All
·4521 words·22 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 UC Irvine
OneDiffusion: a single diffusion model handles both image synthesis and understanding across diverse tasks, from text-to-image generation to depth estimation.
Learning 3D Representations from Procedural 3D Programs
·4094 words·20 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 University of Virginia
Self-supervised learning of 3D representations from procedurally generated synthetic shapes achieves comparable performance to models trained on real-world datasets, highlighting the potential of synt…
Factorized Visual Tokenization and Generation
·2519 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Amazon
FQGAN revitalizes image generation by introducing Factorized Quantization, enabling scalable and stable visual tokenization with state-of-the-art performance.
DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation
·3751 words·18 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 UNC Chapel Hill
DREAMRUNNER generates high-quality storytelling videos by using LLMs for hierarchical planning, motion retrieval, and a novel spatial-temporal region-based diffusion model for fine-grained control.
Controllable Human Image Generation with Personalized Multi-Garments
·4062 words·20 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 KAIST
BootComp: generate realistic human images wearing multiple garments using a novel synthetic data pipeline & diffusion model, enabling diverse applications like virtual try-on.
Visual Counter Turing Test (VCT^2): Discovering the Challenges for AI-Generated Image Detection and Introducing Visual AI Index (V_AI)
·195 words·1 min·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of South Carolina
New benchmark VCT² reveals limitations of AI-generated image detectors; Visual AI Index (VAI) provides a robust evaluation framework.
Optimizing Brain Tumor Segmentation with MedNeXt: BraTS 2024 SSA and Pediatrics
·1682 words·8 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Segmentation
🏢 Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI)
An ensemble of MedNeXt models optimizes brain tumor segmentation across diverse populations, achieving state-of-the-art results on the BraTS 2024 SSA and pediatric datasets.
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
·3474 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Seoul National University
Diptych Prompting: a novel zero-shot subject-driven image generator leveraging large-scale text-to-image models and inpainting for precise subject alignment and high-quality image synthesis.
TEXGen: a Generative Diffusion Model for Mesh Textures
·3720 words·18 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Hong Kong
TEXGen: A groundbreaking generative diffusion model creates high-resolution 3D mesh textures directly from text and image prompts, exceeding prior methods in quality and efficiency.
Style-Friendly SNR Sampler for Style-Driven Generation
·4866 words·23 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Seoul National University
The style-friendly SNR sampler biases diffusion model training toward higher noise levels, enabling models to learn and generate images with higher style fidelity.
OminiControl: Minimal and Universal Control for Diffusion Transformer
·3446 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 National University of Singapore
OminiControl: A minimal, universal framework efficiently integrates image conditions into diffusion transformers, enabling diverse and precise control over image generation.
Morph: A Motion-free Physics Optimization Framework for Human Motion Generation
·2160 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tencent AI Lab
Morph: a novel motion-free physics optimization framework drastically enhances human motion generation’s physical plausibility using synthetic data, achieving state-of-the-art quality.
Material Anything: Generating Materials for Any 3D Object via Diffusion
·4056 words·20 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Northwestern Polytechnical University
Material Anything: Generate realistic materials for ANY 3D object via diffusion!
Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction
·2991 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 KAIST
CoordTok: a novel video tokenizer drastically reduces token count for long videos, enabling memory-efficient training of diffusion models for high-quality, long video generation.
Stable Flow: Vital Layers for Training-Free Image Editing
·2773 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Snap Research
Stable Flow achieves diverse, consistent image editing without training by strategically injecting source image features into specific ‘vital’ layers of a diffusion transformer model. This training-f…
SegBook: A Simple Baseline and Cookbook for Volumetric Medical Image Segmentation
·2952 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Segmentation
🏢 Stanford University
SegBook: a large-scale benchmark, reveals that fine-tuning full-body CT pre-trained models significantly improves performance on various downstream medical image segmentation tasks, particularly for s…
Novel View Extrapolation with Video Diffusion Priors
·2381 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Nanyang Technological University
ViewExtrapolator leverages Stable Video Diffusion to realistically extrapolate novel views far beyond training data, dramatically improving the quality of 3D scene generation.
MyTimeMachine: Personalized Facial Age Transformation
·3186 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of North Carolina at Chapel Hill
MyTimeMachine personalizes facial age transformation using just 50 personal photos, outperforming existing methods by generating re-aged faces that closely match a person’s actual appearance at variou…