Computer Vision
Fast and Memory-Efficient Video Diffusion Using Streamlined Inference
·3474 words·17 mins·
AI Generated
Computer Vision
Video Understanding
🏢 Northeastern University
Streamlined Inference, a novel training-free framework, dramatically reduces the computation and memory costs of video diffusion models without sacrificing quality, enabling high-resolution video gene…
FashionR2R: Texture-preserving Rendered-to-Real Image Translation with Diffusion Models
·2387 words·12 mins·
Computer Vision
Image Generation
🏢 Zhejiang University
FashionR2R leverages diffusion models to translate rendered fashion images into photorealistic counterparts while preserving fine-grained clothing textures.
FairQueue: Rethinking Prompt Learning for Fair Text-to-Image Generation
·4446 words·21 mins·
Computer Vision
Image Generation
🏢 Singapore University of Technology and Design
FairQueue improves fair text-to-image generation by addressing prompt learning’s quality issues through prompt queuing and attention amplification.
FactorizePhys: Matrix Factorization for Multidimensional Attention in Remote Physiological Sensing
·3096 words·15 mins·
AI Generated
Computer Vision
Video Understanding
🏢 Department of Computer Science, University College London
FactorizePhys leverages Non-negative Matrix Factorization for a novel multidimensional attention mechanism (FSAM) to improve remote PPG signal extraction from videos.
Factorized Diffusion Architectures for Unsupervised Image Generation and Segmentation
·4968 words·24 mins·
AI Generated
Computer Vision
Image Generation
🏢 Google
This paper presents a novel neural network architecture that simultaneously learns to generate and segment images in an unsupervised manner, achieving accurate results across multiple datasets without…
Face2QR: A Unified Framework for Aesthetic, Face-Preserving, and Scannable QR Code Generation
·2451 words·12 mins·
Computer Vision
Image Generation
🏢 Shanghai Jiao Tong University
Face2QR is a unified framework that generates aesthetically pleasing, scannable QR codes that faithfully preserve facial features, resolving the conflict between aesthetics, identity, and scannability.
F-OAL: Forward-only Online Analytic Learning with Fast Training and Low Memory Footprint in Class Incremental Learning
·1519 words·8 mins·
Computer Vision
Image Classification
🏢 South China University of Technology
F-OAL: Forward-only Online Analytic Learning achieves high accuracy and low memory usage in online class incremental learning by using a frozen encoder and recursive least squares to update a linear …
Extending Video Masked Autoencoders to 128 frames
·2466 words·12 mins·
Computer Vision
Video Understanding
🏢 Google Research
Long-video masked autoencoders (LVMAE) achieve state-of-the-art performance by using an adaptive masking strategy that prioritizes important video tokens, enabling efficient training on 128 frames.
Expressive Gaussian Human Avatars from Monocular RGB Video
·1431 words·7 mins·
Computer Vision
3D Vision
🏢 University of Texas at Austin
EVA: a novel method generates expressive 3D Gaussian human avatars from monocular RGB videos, excelling in detailed hand and facial expressions via context-aware density control and improved SMPL-X al…
Exploring Token Pruning in Vision State Space Models
·1749 words·9 mins·
Computer Vision
Image Classification
🏢 Northeastern University
This paper introduces a novel token pruning method for vision state space models, achieving significant computational reduction with minimal performance impact, addressing the limitations of directly …
Exploring Structured Semantic Priors Underlying Diffusion Score for Test-time Adaptation
·2718 words·13 mins·
AI Generated
Computer Vision
Image Segmentation
🏢 Beihang University
DUSA: Unlocking Diffusion Models’ Discriminative Power for Efficient Test-Time Adaptation
Exploring Low-Dimensional Subspace in Diffusion Models for Controllable Image Editing
·2111 words·10 mins·
Computer Vision
Image Generation
🏢 University of Michigan
LOCO Edit achieves precise, localized image editing in diffusion models via a single-step, training-free method leveraging low-dimensional semantic subspaces.
Exploring Fixed Point in Image Editing: Theoretical Support and Convergence Optimization
·2322 words·11 mins·
Computer Vision
Image Generation
🏢 East China Normal University
This paper theoretically proves the existence and uniqueness of fixed points in DDIM inversion, optimizing the loss function for improved image editing and extending this approach to unsupervised imag…
Exploring DCN-like architecture for fast image generation with arbitrary resolution
·1909 words·9 mins·
Computer Vision
Image Generation
🏢 Nanjing University
FlowDCN: A purely convolutional generative model achieves state-of-the-art image generation speed and quality at arbitrary resolutions, surpassing transformer-based models.
Expanding Sparse Tuning for Low Memory Usage
·2517 words·12 mins·
Computer Vision
Transfer Learning
🏢 Tsinghua University
SNELL: Sparse tuning with kerNElized LoRA achieves state-of-the-art parameter-efficient fine-tuning performance with drastically reduced memory usage.
Exocentric-to-Egocentric Video Generation
·2698 words·13 mins·
AI Generated
Computer Vision
Video Understanding
🏢 National University of Singapore
Exo2Ego-V generates realistic egocentric videos from sparse exocentric views, significantly outperforming state-of-the-art methods on a challenging benchmark.
Event-3DGS: Event-based 3D Reconstruction Using 3D Gaussian Splatting
·2242 words·11 mins·
AI Generated
Computer Vision
3D Vision
🏢 Tsinghua University
Event-3DGS is the first event-based 3D reconstruction method using 3D Gaussian splatting, enabling high-quality, efficient, and robust 3D scene reconstruction in challenging real-world conditions.
Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data
·3089 words·15 mins·
AI Generated
Computer Vision
3D Vision
🏢 Purdue University
DSPoser: A novel two-stage approach accurately estimates full-body pose from doubly sparse egocentric video data using masked autoencoders for temporal completion and conditional diffusion models for …
Erasing Undesirable Concepts in Diffusion Models with Adversarial Preservation
·4177 words·20 mins·
AI Generated
Computer Vision
Image Generation
🏢 Monash University
This research introduces adversarial concept preservation, a novel method for safely erasing undesirable concepts from diffusion models, outperforming existing techniques by preserving related sensiti…
Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention
·2478 words·12 mins·
Computer Vision
3D Vision
🏢 Hong Kong University of Science and Technology
Era3D uses efficient row-wise attention for high-resolution multiview diffusion, generating high-quality multiview images from single views and overcoming prior limitations.