Computer Vision

PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator

26 September 2024·2348 words·12 mins

AI Generated Computer Vision Image Generation 🏢 ByteDance

PeRFlow accelerates diffusion models by straightening their sampling trajectories using a piecewise reflow operation, enabling fast and high-quality image generation with minimal computational cost.

Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers

26 September 2024·3763 words·18 mins

AI Generated Computer Vision Image Classification 🏢 University of Melbourne

BiXT, a novel bi-directional cross-attention Transformer, scales linearly with input size, achieving competitive performance across various tasks by efficiently processing longer sequences.

Pedestrian-Centric 3D Pre-collision Pose and Shape Estimation from Dashcam Perspective

26 September 2024·2531 words·12 mins

Computer Vision 3D Vision 🏢 University of Science and Technology Beijing

A new Pedestrian-Vehicle Collision Pose dataset (PVCP) and a pose estimation network (PPSENet) improve pedestrian pre-collision pose estimation from dashcam video.

PCoTTA: Continual Test-Time Adaptation for Multi-Task Point Cloud Understanding

26 September 2024·2469 words·12 mins

AI Generated Computer Vision 3D Vision 🏢 Bournemouth University

PCoTTA: a novel framework that enables multi-task point cloud models to adapt seamlessly to continuously changing target domains during testing, overcoming catastrophic forgetting and error accumulation.

Parameter Efficient Adaptation for Image Restoration with Heterogeneous Mixture-of-Experts

26 September 2024·3390 words·16 mins

Computer Vision Image Restoration 🏢 Tsinghua University

AdaptIR: A novel parameter-efficient method for generalized image restoration using a heterogeneous Mixture-of-Experts (MoE) architecture that achieves superior performance and generalization.

PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher

26 September 2024·2966 words·14 mins

Computer Vision Image Generation 🏢 Stanford University

PaGoDA: Train high-resolution image generators efficiently by progressively growing a one-step generator from a low-resolution diffusion model. This innovative pipeline drastically cuts training cost…

OPUS: Occupancy Prediction Using a Sparse Set

26 September 2024·2458 words·12 mins

Computer Vision 3D Vision 🏢 Nankai University

OPUS, a novel real-time occupancy prediction framework built on a sparse set prediction paradigm, outperforms state-of-the-art methods on Occ3D-nuScenes.

Optimal-state Dynamics Estimation for Physics-based Human Motion Capture from Videos

26 September 2024·2037 words·10 mins

Computer Vision 3D Vision 🏢 Department of Electrical Engineering, Linköping University

OSDCap: online optimal-state dynamics estimation selectively incorporates physics models with kinematic observations to achieve highly accurate, physically plausible human motion capture from videos.

Optimal Transport-based Labor-free Text Prompt Modeling for Sketch Re-identification

26 September 2024·3342 words·16 mins

AI Generated Computer Vision Image Re-Identification 🏢 Harbin Institute of Technology

Optimal Transport-based Labor-free Text Prompt Modeling (OLTM) leverages VQA and optimal transport for highly accurate sketch-based person re-identification without manual labeling.

Optical Diffusion Models for Image Generation

26 September 2024·1966 words·10 mins

Computer Vision Image Generation 🏢 Google Research

Researchers created an energy-efficient optical system for generating images using light propagation, drastically reducing the latency and energy consumption of diffusion models.

OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding

26 September 2024·2396 words·12 mins

Computer Vision 3D Vision 🏢 Peking University

OpenGaussian achieves 3D point-level open vocabulary understanding using 3D Gaussian Splatting by training 3D instance features with high 3D consistency, employing a two-level codebook for feature dis…

OpenDlign: Open-World Point Cloud Understanding with Depth-Aligned Images

26 September 2024·2441 words·12 mins

Computer Vision 3D Vision 🏢 Imperial College London

OpenDlign uses novel depth-aligned images from a diffusion model to boost open-world 3D understanding, achieving significant performance gains on diverse benchmarks.

Open-Vocabulary Object Detection via Language Hierarchy

26 September 2024·2960 words·14 mins

Computer Vision Object Detection 🏢 Nanyang Technological University

Language Hierarchical Self-training (LHST) enhances weakly-supervised object detection by integrating language hierarchy, mitigating label mismatch, and improving generalization across diverse dataset…

OPEL: Optimal Transport Guided ProcedurE Learning

26 September 2024·2652 words·13 mins

Computer Vision Video Understanding 🏢 Purdue University

OPEL, a novel optimal transport framework for procedure learning, significantly outperforms SOTA methods by aligning similar video frames and relaxing strict temporal assumptions.

OnlineTAS: An Online Baseline for Temporal Action Segmentation

26 September 2024·2736 words·13 mins

AI Generated Computer Vision Video Understanding 🏢 National University of Singapore

OnlineTAS, a novel framework, achieves state-of-the-art performance in online temporal action segmentation by using an adaptive memory and a post-processing method to mitigate over-segmentation.

OneActor: Consistent Subject Generation via Cluster-Conditioned Guidance

26 September 2024·3168 words·15 mins

Computer Vision Image Generation 🏢 Xi'an Jiaotong University

OneActor: one-shot tuning for consistent subject image generation that bypasses laborious backbone tuning through semantic guidance, achieving a 4x speedup.

One-to-Normal: Anomaly Personalization for Few-shot Anomaly Detection

26 September 2024·2365 words·12 mins

Computer Vision Anomaly Detection 🏢 West China Biomedical Big Data Center, West China Hospital, Sichuan University

One-to-Normal: Anomaly personalization boosts few-shot anomaly detection accuracy by transforming query images to match normal data, enabling precise, robust comparisons and flexible integration with …

One-to-Multiple: A Progressive Style Transfer Unsupervised Domain-Adaptive Framework for Kidney Tumor Segmentation

26 September 2024·2746 words·13 mins

AI Generated Computer Vision Image Segmentation 🏢 Xiangtan University

PSTUDA, a novel progressive style transfer framework, efficiently segments kidney tumors across multiple MRI sequences using unsupervised domain adaptation, achieving higher accuracy and efficiency th…

One-Step Effective Diffusion Network for Real-World Image Super-Resolution

26 September 2024·2247 words·11 mins

Computer Vision Image Generation 🏢 Hong Kong Polytechnic University

OSEDiff: One-step diffusion network for real-world image super-resolution, achieving comparable or better results than multi-step methods with significantly reduced computational cost and improved ima…

One-Step Diffusion Distillation through Score Implicit Matching

26 September 2024·2065 words·10 mins

Computer Vision Image Generation 🏢 Peking University

Score Implicit Matching (SIM) revolutionizes diffusion model distillation by creating high-quality, single-step generators from complex, multi-step models, achieving comparable performance and enablin…