Computer Vision

AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario

26 September 2024·3145 words·15 mins

AI Generated Computer Vision Image Generation 🏢 Shanghai Jiao Tong University

AnyFit: Controllable virtual try-on for any attire combination across any scenario, exceeding existing methods in accuracy and scalability.

Animate3D: Animating Any 3D Model with Multi-view Video Diffusion

26 September 2024·2384 words·12 mins

Computer Vision 3D Vision 🏢 DAMO Academy, Alibaba Group

Animate3D animates any 3D model using multi-view video diffusion, achieving superior spatiotemporal consistency and straightforward mesh animation.

An Image is Worth 32 Tokens for Reconstruction and Generation

26 September 2024·2076 words·10 mins

Computer Vision Image Generation 🏢 ByteDance

Image generation gets a speed boost with TiTok, a novel 1D image tokenizer that uses just 32 tokens for high-quality image reconstruction and generation, achieving up to 410x faster processing than st…

An Expectation-Maximization Algorithm for Training Clean Diffusion Models from Corrupted Observations

26 September 2024·3657 words·18 mins

AI Generated Computer Vision Image Generation 🏢 Peking University

EMDiffusion trains clean diffusion models from corrupted data using an expectation-maximization algorithm, achieving state-of-the-art results on diverse imaging tasks.

Amnesia as a Catalyst for Enhancing Black Box Pixel Attacks in Image Classification and Object Detection

26 September 2024·2866 words·14 mins

AI Generated Computer Vision Object Detection 🏢 Korea Aerospace University

RFPAR: A novel reinforcement learning-based attack enhances black-box pixel attacks by minimizing randomness and patch dependency, achieving state-of-the-art results in both image classification and o…

AlphaTablets: A Generic Plane Representation for 3D Planar Reconstruction from Monocular Videos

26 September 2024·2409 words·12 mins

AI Generated Computer Vision 3D Vision 🏢 Tsinghua University

AlphaTablets advances 3D planar reconstruction from monocular videos with a novel rectangle-based representation featuring continuous surfaces and precise boundaries, achieving state-of-the-ar…

Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models and Time-Dependent Layer Normalization

26 September 2024·2692 words·13 mins

Computer Vision Image Generation 🏢 ByteDance

DiMR improves image generation fidelity by combining multi-resolution networks with time-dependent layer normalization in diffusion models, achieving state-of-the-art results on ImageNet.

All-in-One Image Coding for Joint Human-Machine Vision with Multi-Path Aggregation

26 September 2024·5531 words·26 mins

AI Generated Computer Vision Image Coding 🏢 Nanjing University

Multi-Path Aggregation (MPA) achieves performance comparable to state-of-the-art methods in multi-task image coding by unifying feature representations with a novel all-in-one architecture and a two-…

Aligning Diffusion Models by Optimizing Human Utility

26 September 2024·3826 words·18 mins

AI Generated Computer Vision Image Generation 🏢 UC Los Angeles

Diffusion-KTO: Aligning text-to-image models with human preferences using simple likes/dislikes, maximizing expected human utility.

AirSketch: Generative Motion to Sketch

26 September 2024·2122 words·10 mins

Computer Vision Image Generation 🏢 University of Central Florida

AirSketch generates aesthetically pleasing sketches directly from noisy hand-motion tracking data using a self-supervised controllable diffusion model, eliminating the need for expensive AR/VR equipme…

AID: Attention Interpolation of Text-to-Image Diffusion

26 September 2024·3646 words·18 mins

AI Generated Computer Vision Image Generation 🏢 National University of Singapore

AID, a novel training-free method, significantly improves image interpolation by fusing inner/outer interpolated attention layers and using a beta distribution for coefficient selection, enhancing consi…

Adversarial Schrödinger Bridge Matching

26 September 2024·3165 words·15 mins

Computer Vision Image Generation 🏢 Skoltech

Accelerates Schrödinger Bridge Matching with Discrete-time IMF using only a few steps, achieving results comparable to existing hundred-step methods via a D-GAN implementation.

Advancing Fine-Grained Classification by Structure and Subject Preserving Augmentation

26 September 2024·4124 words·20 mins

AI Generated Computer Vision Image Classification 🏢 Reichman University

SaSPA, a novel data augmentation method, boosts fine-grained visual classification accuracy by generating diverse, class-consistent synthetic images using structural and subject-preserving techniques.

AdvAD: Exploring Non-Parametric Diffusion for Imperceptible Adversarial Attacks

26 September 2024·2221 words·11 mins

Computer Vision Adversarial Attacks 🏢 Guangdong Key Lab of Information Security

AdvAD: A non-parametric diffusion process crafts imperceptible adversarial examples by subtly guiding an initial noise towards a target distribution, achieving high attack success rates with minimal p…

AdjointDEIS: Efficient Gradients for Diffusion Models

26 September 2024·1797 words·9 mins

Computer Vision Face Recognition 🏢 Clarkson University

AdjointDEIS: Efficient gradients for diffusion models via bespoke ODE solvers, simplifying backpropagation and improving guided generation.

AdaptiveISP: Learning an Adaptive Image Signal Processor for Object Detection

26 September 2024·3853 words·19 mins

AI Generated Computer Vision Object Detection 🏢 Shanghai AI Laboratory

AdaptiveISP uses reinforcement learning to create a scene-adaptive ISP pipeline that dynamically optimizes for object detection, surpassing existing methods in accuracy and efficiency.

Adaptive Important Region Selection with Reinforced Hierarchical Search for Dense Object Detection

26 September 2024·2760 words·13 mins

Computer Vision Object Detection 🏢 Rochester Institute of Technology

The AIRS framework, guided by Evidential Q-learning, dynamically balances exploration and exploitation to achieve superior dense object detection accuracy by adaptively selecting important regions.

Adaptive Domain Learning for Cross-domain Image Denoising

26 September 2024·2302 words·11 mins

Computer Vision Image Denoising 🏢 Hong Kong University of Science and Technology

Adaptive Domain Learning (ADL) efficiently trains a cross-domain RAW image denoising model using limited target data and existing source data by intelligently discarding harmful source data and levera…

Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis

26 September 2024·3442 words·17 mins

Computer Vision Image Generation 🏢 UC San Diego

FG-DMs improve image synthesis by jointly modeling image and condition distributions, achieving higher object recall and enabling flexible editing.

AdaPKC: PeakConv with Adaptive Peak Receptive Field for Radar Semantic Segmentation

26 September 2024·2841 words·14 mins

Computer Vision Image Segmentation 🏢 Tsinghua University

AdaPKC upgrades PeakConv for superior radar semantic segmentation by dynamically adjusting its receptive field, outperforming current state-of-the-art methods.