Computer Vision

PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator
·2348 words·12 mins
AI Generated Computer Vision Image Generation 🏒 ByteDance
PeRFlow accelerates diffusion models by straightening their sampling trajectories using a piecewise reflow operation, enabling fast and high-quality image generation with minimal computational cost.
Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers
·3763 words·18 mins
AI Generated Computer Vision Image Classification 🏒 University of Melbourne
BiXT, a novel bi-directional cross-attention Transformer, scales linearly with input size, achieving competitive performance across various tasks by efficiently processing longer sequences.
Pedestrian-Centric 3D Pre-collision Pose and Shape Estimation from Dashcam Perspective
·2531 words·12 mins
Computer Vision 3D Vision 🏒 University of Science and Technology Beijing
New Pedestrian-Vehicle Collision Pose dataset (PVCP) and Pose Estimation Network (PPSENet) improve pedestrian pre-collision pose estimation from dashcam video.
PCoTTA: Continual Test-Time Adaptation for Multi-Task Point Cloud Understanding
·2469 words·12 mins
AI Generated Computer Vision 3D Vision 🏒 Bournemouth University
PCoTTA: A novel framework enables multi-task point cloud models to seamlessly adapt to continuously changing target domains during testing, overcoming catastrophic forgetting and error accumulation.
Parameter Efficient Adaptation for Image Restoration with Heterogeneous Mixture-of-Experts
·3390 words·16 mins
Computer Vision Image Restoration 🏒 Tsinghua University
AdaptIR: A novel parameter-efficient method for generalized image restoration using a heterogeneous Mixture-of-Experts (MoE) architecture that achieves superior performance and generalization.
PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher
·2966 words·14 mins
Computer Vision Image Generation 🏒 Stanford University
PaGoDA: Train high-resolution image generators efficiently by progressively growing a one-step generator from a low-resolution diffusion model. This innovative pipeline drastically cuts training cost…
OPUS: Occupancy Prediction Using a Sparse Set
·2458 words·12 mins
Computer Vision 3D Vision 🏒 Nankai University
OPUS: a novel, real-time occupancy prediction framework using a sparse set prediction paradigm, outperforms state-of-the-art methods on Occ3D-nuScenes.
Optimal-state Dynamics Estimation for Physics-based Human Motion Capture from Videos
·2037 words·10 mins
Computer Vision 3D Vision 🏒 Department of Electrical Engineering, Linköping University
OSDCap: Online optimal-state dynamics estimation selectively incorporates physics models with kinematic observations to achieve highly accurate, physically-plausible human motion capture from videos.
Optimal Transport-based Labor-free Text Prompt Modeling for Sketch Re-identification
·3342 words·16 mins
AI Generated Computer Vision Image Re-Identification 🏒 Harbin Institute of Technology
Optimal Transport-based Labor-free Text Prompt Modeling (OLTM) leverages VQA and optimal transport for highly accurate sketch-based person re-identification without manual labeling.
Optical Diffusion Models for Image Generation
·1966 words·10 mins
Computer Vision Image Generation 🏒 Google Research
Researchers created an energy-efficient optical system for generating images using light propagation, drastically reducing the latency and energy consumption of diffusion models.
OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding
·2396 words·12 mins
Computer Vision 3D Vision 🏒 Peking University
OpenGaussian achieves 3D point-level open vocabulary understanding using 3D Gaussian Splatting by training 3D instance features with high 3D consistency, employing a two-level codebook for feature dis…
OpenDlign: Open-World Point Cloud Understanding with Depth-Aligned Images
·2441 words·12 mins
Computer Vision 3D Vision 🏒 Imperial College London
OpenDlign uses novel depth-aligned images from a diffusion model to boost open-world 3D understanding, achieving significant performance gains on diverse benchmarks.
Open-Vocabulary Object Detection via Language Hierarchy
·2960 words·14 mins
Computer Vision Object Detection 🏒 Nanyang Technological University
Language Hierarchical Self-training (LHST) enhances weakly-supervised object detection by integrating language hierarchy, mitigating label mismatch, and improving generalization across diverse dataset…
OPEL: Optimal Transport Guided ProcedurE Learning
·2652 words·13 mins
Computer Vision Video Understanding 🏒 Purdue University
OPEL: a novel optimal transport framework for procedure learning, significantly outperforms SOTA methods by aligning similar video frames and relaxing strict temporal assumptions.
OnlineTAS: An Online Baseline for Temporal Action Segmentation
·2736 words·13 mins
AI Generated Computer Vision Video Understanding 🏒 National University of Singapore
OnlineTAS, a novel framework, achieves state-of-the-art performance in online temporal action segmentation by using an adaptive memory and a post-processing method to mitigate over-segmentation.
OneActor: Consistent Subject Generation via Cluster-Conditioned Guidance
·3168 words·15 mins
Computer Vision Image Generation 🏒 Xi'an Jiaotong University
OneActor: One-shot tuning for consistent subject image generation, bypassing laborious backbone tuning via semantic guidance, achieving 4x faster speed.
One-to-Normal: Anomaly Personalization for Few-shot Anomaly Detection
·2365 words·12 mins
Computer Vision Anomaly Detection 🏒 West China Biomedical Big Data Center, West China Hospital, Sichuan University
One-to-Normal: Anomaly personalization boosts few-shot anomaly detection accuracy by transforming query images to match normal data, enabling precise, robust comparisons and flexible integration with …
One-to-Multiple: A Progressive Style Transfer Unsupervised Domain-Adaptive Framework for Kidney Tumor Segmentation
·2746 words·13 mins
AI Generated Computer Vision Image Segmentation 🏒 Xiangtan University
PSTUDA, a novel progressive style transfer framework, efficiently segments kidney tumors across multiple MRI sequences using unsupervised domain adaptation, achieving higher accuracy and efficiency th…
One-Step Effective Diffusion Network for Real-World Image Super-Resolution
·2247 words·11 mins
Computer Vision Image Generation 🏒 Hong Kong Polytechnic University
OSEDiff: One-step diffusion network for real-world image super-resolution, achieving comparable or better results than multi-step methods with significantly reduced computational cost and improved ima…
One-Step Diffusion Distillation through Score Implicit Matching
·2065 words·10 mins
Computer Vision Image Generation 🏒 Peking University
Score Implicit Matching (SIM) distills complex, multi-step diffusion models into high-quality, single-step generators, achieving comparable performance and enablin…