Computer Vision

Localize, Understand, Collaborate: Semantic-Aware Dragging via Intention Reasoner
·2916 words·14 mins
AI Generated Computer Vision Image Generation 🏢 Beijing University of Posts and Telecommunications
LucidDrag: Semantic-aware dragging transforms image editing with an intention reasoner and collaborative guidance, achieving superior accuracy, image fidelity, and semantic diversity.
LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models
·3223 words·16 mins
Computer Vision Image Generation 🏢 ETH Zurich
LiteVAE: A new autoencoder design for latent diffusion models boosts efficiency sixfold without sacrificing image quality, achieving faster training and lower memory needs via the 2D discrete wavelet …
LION: Linear Group RNN for 3D Object Detection in Point Clouds
·3911 words·19 mins
AI Generated Computer Vision Object Detection 🏢 Huazhong University of Science and Technology
LION: Linear Group RNNs conquer 3D object detection in sparse point clouds by enabling efficient long-range feature interaction, significantly outperforming transformer-based methods.
LinNet: Linear Network for Efficient Point Cloud Representation Learning
·2362 words·12 mins
Computer Vision 3D Vision 🏢 Northwest University
LinNet: A linear-time point cloud network achieving 10x speedup over PointNeXt, with state-of-the-art accuracy on various benchmarks.
Linearly Decomposing and Recomposing Vision Transformers for Diverse-Scale Models
·2125 words·10 mins
Computer Vision Image Classification 🏢 School of Computer Science and Engineering, Southeast University
Linearly decompose and recompose Vision Transformers to create diverse-scale models efficiently, reducing computational costs and improving flexibility for various applications.
Lightweight Frequency Masker for Cross-Domain Few-Shot Semantic Segmentation
·3232 words·16 mins
AI Generated Computer Vision Image Segmentation 🏢 Huazhong University of Science and Technology
Lightweight Frequency Masker significantly improves cross-domain few-shot semantic segmentation by cleverly filtering frequency components of images, thereby reducing inter-channel correlation and enh…
Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis
·3953 words·19 mins
AI Generated Computer Vision 3D Vision 🏢 Nankai University
LE3D: Real-time HDR view synthesis from noisy RAW images is achieved using 3DGS, significantly reducing training time and improving rendering speed.
LG-CAV: Train Any Concept Activation Vector with Language Guidance
·3860 words·19 mins
AI Generated Computer Vision Vision-Language Models 🏢 Zhejiang University
LG-CAV leverages vision-language models to train Concept Activation Vectors (CAVs) without labeled data, achieving superior accuracy and enabling state-of-the-art model…
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
·2610 words·13 mins
Computer Vision Scene Understanding 🏢 University of Illinois Urbana-Champaign
Lexicon3D: a first comprehensive study probing diverse visual foundation models for superior 3D scene understanding, revealing that unsupervised image models outperform others across various tasks.
Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation
·3134 words·15 mins
AI Generated Computer Vision Image Segmentation 🏢 School of Electronic Engineering and Computer Science, Queen Mary University of London
ProMaC leverages MLLM hallucinations in an iterative framework to generate precise prompts for accurate object segmentation, minimizing manual prompt dependency.
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
·2463 words·12 mins
Computer Vision Image Generation 🏢 National University of Singapore
Learning-to-Cache (L2C) dramatically accelerates diffusion transformers by intelligently caching layer computations, achieving significant speedups with minimal performance loss.
Learning Where to Edit Vision Transformers
·3346 words·16 mins
AI Generated Computer Vision Image Classification 🏢 City University of Hong Kong
Meta-learning a hypernetwork on CutMix-augmented data enables data-efficient and precise correction of vision transformer errors by identifying optimal parameters for fine-tuning.
Learning Truncated Causal History Model for Video Restoration
·2473 words·12 mins
Computer Vision Video Understanding 🏢 University of Alberta
TURTLE: a novel video restoration framework that learns a truncated causal history model for efficient and high-performing video restoration, achieving state-of-the-art results on various benchmark ta…
Learning Transferable Features for Implicit Neural Representations
·4038 words·19 mins
AI Generated Computer Vision Image Generation 🏢 Rice University
STRAINER: A new framework enabling faster, higher-quality INR fitting by leveraging transferable features across similar signals, significantly boosting INR performance.
Learning to Merge Tokens via Decoupled Embedding for Efficient Vision Transformers
·3286 words·16 mins
AI Generated Computer Vision Image Classification 🏢 KAIST
Decoupled Token Embedding for Merging (DTEM) significantly improves Vision Transformer efficiency by using a decoupled embedding module for relaxed token merging, achieving consistent performance gain…
Learning to Edit Visual Programs with Self-Supervision
·2121 words·10 mins
Computer Vision Visual Question Answering 🏢 Brown University
AI learns to edit visual programs more accurately using a self-supervised method that combines one-shot program generation with iterative local edits, significantly boosting performance, especially wi…
Learning to Decouple the Lights for 3D Face Texture Modeling
·3463 words·17 mins
Computer Vision Face Recognition 🏢 School of Computing, National University of Singapore
Researchers developed Light Decoupling, a novel approach to model 3D facial textures under complex illumination, achieving more realistic and accurate results by decoupling unnatural lighting into mul…
Learning to be Smooth: An End-to-End Differentiable Particle Smoother
·2507 words·12 mins
Computer Vision 3D Vision 🏢 UC Irvine
The learned Mixture Density Particle Smoother (MDPS) surpasses state-of-the-art methods for accurate, differentiable city-scale vehicle localization.
Learning Structured Representations with Hyperbolic Embeddings
·3560 words·17 mins
Computer Vision Representation Learning 🏢 University of Illinois, Urbana-Champaign
HypStructure boosts representation learning by embedding label hierarchies into hyperbolic space, improving accuracy and interpretability.
Learning Optimal Lattice Vector Quantizers for End-to-end Neural Image Compression
·1532 words·8 mins
Computer Vision Image Compression 🏢 Department of Electronic Engineering, Shanghai Jiao Tong University
Learned optimal lattice vector quantization (OLVQ) drastically boosts neural image compression efficiency by adapting quantizer structures to latent feature distributions, achieving significant rate-d…