Computer Vision

Fast and Memory-Efficient Video Diffusion Using Streamlined Inference

26 September 2024·3474 words·17 mins

AI Generated Computer Vision Video Understanding 🏢 Northeastern University

Streamlined Inference, a novel training-free framework, dramatically reduces the computation and memory costs of video diffusion models without sacrificing quality, enabling high-resolution video gene…

FashionR2R: Texture-preserving Rendered-to-Real Image Translation with Diffusion Models

26 September 2024·2387 words·12 mins

Computer Vision Image Generation 🏢 Zhejiang University

FashionR2R leverages diffusion models to realistically translate rendered fashion images into photorealistic counterparts, enhancing realism and preserving fine-grained clothing textures.

FairQueue: Rethinking Prompt Learning for Fair Text-to-Image Generation

26 September 2024·4446 words·21 mins

Computer Vision Image Generation 🏢 Singapore University of Technology and Design

FairQueue improves fair text-to-image generation by addressing prompt learning’s quality issues through prompt queuing and attention amplification.

FactorizePhys: Matrix Factorization for Multidimensional Attention in Remote Physiological Sensing

26 September 2024·3096 words·15 mins

AI Generated Computer Vision Video Understanding 🏢 Department of Computer Science, University College London

FactorizePhys leverages Non-negative Matrix Factorization for a novel multidimensional attention mechanism (FSAM) to improve remote PPG signal extraction from videos.

Factorized Diffusion Architectures for Unsupervised Image Generation and Segmentation

26 September 2024·4968 words·24 mins

AI Generated Computer Vision Image Generation 🏢 Google

This paper presents a novel neural network architecture that simultaneously learns to generate and segment images in an unsupervised manner, achieving accurate results across multiple datasets without…

Face2QR: A Unified Framework for Aesthetic, Face-Preserving, and Scannable QR Code Generation

26 September 2024·2451 words·12 mins

Computer Vision Image Generation 🏢 Shanghai Jiao Tong University

Face2QR: A unified framework generates aesthetically pleasing, scannable QR codes that faithfully preserve facial features, solving the conflict between aesthetics, identity, and scannability.

F-OAL: Forward-only Online Analytic Learning with Fast Training and Low Memory Footprint in Class Incremental Learning

26 September 2024·1519 words·8 mins

Computer Vision Image Classification 🏢 South China University of Technology

F-OAL: Forward-only Online Analytic Learning achieves high accuracy and low memory usage in online class incremental learning by using a frozen encoder and recursive least squares to update a linear …

Extending Video Masked Autoencoders to 128 frames

26 September 2024·2466 words·12 mins

Computer Vision Video Understanding 🏢 Google Research

Long-video masked autoencoders (LVMAE) achieve state-of-the-art performance by using an adaptive masking strategy that prioritizes important video tokens, enabling efficient training on 128 frames.

Expressive Gaussian Human Avatars from Monocular RGB Video

26 September 2024·1431 words·7 mins

Computer Vision 3D Vision 🏢 University of Texas at Austin

EVA: a novel method generates expressive 3D Gaussian human avatars from monocular RGB videos, excelling in detailed hand and facial expressions via context-aware density control and improved SMPL-X al…

Exploring Token Pruning in Vision State Space Models

26 September 2024·1749 words·9 mins

Computer Vision Image Classification 🏢 Northeastern University

This paper introduces a novel token pruning method for vision state space models, achieving significant computational reduction with minimal performance impact, addressing the limitations of directly …

Exploring Structured Semantic Priors Underlying Diffusion Score for Test-time Adaptation

26 September 2024·2718 words·13 mins

AI Generated Computer Vision Image Segmentation 🏢 Beihang University

DUSA: Unlocking Diffusion Models’ Discriminative Power for Efficient Test-Time Adaptation

Exploring Low-Dimensional Subspace in Diffusion Models for Controllable Image Editing

26 September 2024·2111 words·10 mins

Computer Vision Image Generation 🏢 University of Michigan

LOCO Edit achieves precise, localized image editing in diffusion models via a single-step, training-free method leveraging low-dimensional semantic subspaces.

Exploring Fixed Point in Image Editing: Theoretical Support and Convergence Optimization

26 September 2024·2322 words·11 mins

Computer Vision Image Generation 🏢 East China Normal University

This paper theoretically proves the existence and uniqueness of fixed points in DDIM inversion, optimizing the loss function for improved image editing and extending this approach to unsupervised imag…

Exploring DCN-like architecture for fast image generation with arbitrary resolution

26 September 2024·1909 words·9 mins

Computer Vision Image Generation 🏢 Nanjing University

FlowDCN: A purely convolutional generative model achieves state-of-the-art image generation speed and quality at arbitrary resolutions, surpassing transformer-based models.

Expanding Sparse Tuning for Low Memory Usage

26 September 2024·2517 words·12 mins

Computer Vision Transfer Learning 🏢 Tsinghua University

SNELL: Sparse tuning with kerNElized LoRA achieves state-of-the-art parameter-efficient fine-tuning performance with drastically reduced memory usage.

Exocentric-to-Egocentric Video Generation

26 September 2024·2698 words·13 mins

AI Generated Computer Vision Video Understanding 🏢 National University of Singapore

Exo2Ego-V generates realistic egocentric videos from sparse exocentric views, significantly outperforming state-of-the-art methods on a challenging benchmark.

Event-3DGS: Event-based 3D Reconstruction Using 3D Gaussian Splatting

26 September 2024·2242 words·11 mins

AI Generated Computer Vision 3D Vision 🏢 Tsinghua University

Event-3DGS: First event-based 3D reconstruction using 3D Gaussian splatting, enabling high-quality, efficient, and robust 3D scene reconstruction in challenging real-world conditions.

Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data

26 September 2024·3089 words·15 mins

AI Generated Computer Vision 3D Vision 🏢 Purdue University

DSPoser: A novel two-stage approach accurately estimates full-body pose from doubly sparse egocentric video data using masked autoencoders for temporal completion and conditional diffusion models for …

Erasing Undesirable Concepts in Diffusion Models with Adversarial Preservation

26 September 2024·4177 words·20 mins

AI Generated Computer Vision Image Generation 🏢 Monash University

This research introduces adversarial concept preservation, a novel method for safely erasing undesirable concepts from diffusion models, outperforming existing techniques by preserving related sensiti…

Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention

26 September 2024·2478 words·12 mins

Computer Vision 3D Vision 🏢 Hong Kong University of Science and Technology

Era3D: High-resolution multiview diffusion with efficient row-wise attention generates high-quality multiview images from single views, overcoming prior limitations.