Computer Vision

Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis

26 September 2024·2012 words·10 mins

Computer Vision 3D Vision 🏢 Zhejiang University

eFreeSplat: a novel, epipolar-free 3D Gaussian splatting model for generalizable novel view synthesis, surpassing state-of-the-art methods by achieving superior geometry reconstruction and novel view …

EnsIR: An Ensemble Algorithm for Image Restoration via Gaussian Mixture Models

26 September 2024·2906 words·14 mins

Computer Vision Image Restoration 🏢 Samsung Research

EnsIR: Training-free image restoration ensemble via Gaussian mixture models, boosting accuracy efficiently.

Enhancing Feature Diversity Boosts Channel-Adaptive Vision Transformers

26 September 2024·3757 words·18 mins

AI Generated Computer Vision Image Classification 🏢 Boston University

DiChaViT boosts channel-adaptive vision transformers by enhancing feature diversity, yielding a 1.5-5% accuracy gain over state-of-the-art MCI models on diverse datasets.

Enhancing Consistency-Based Image Generation via Adversarialy-Trained Classification and Energy-Based Discrimination

26 September 2024·2128 words·10 mins

Computer Vision Image Generation 🏢 Technion

This paper introduces a novel post-processing technique that significantly boosts the perceptual quality of images generated by consistency models using a joint classifier-discriminator adversarially …

End-to-End Video Semantic Segmentation in Adverse Weather using Fusion Blocks and Temporal-Spatial Teacher-Student Learning

26 September 2024·2581 words·13 mins

AI Generated Computer Vision Video Understanding 🏢 National University of Singapore

Optical-flow-free video semantic segmentation excels in adverse weather by merging adjacent frame information via a fusion block and a novel temporal-spatial teacher-student learning strategy.

ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis

26 September 2024·2466 words·12 mins

Computer Vision Image Generation 🏢 Tsinghua University

EfficientNAT: a novel approach to token-based image synthesis boosts performance and slashes computational costs by cleverly disentangling and optimizing spatial-temporal interactions between image to…

EMVP: Embracing Visual Foundation Model for Visual Place Recognition with Centroid-Free Probing

26 September 2024·3286 words·16 mins

AI Generated Computer Vision Visual Question Answering 🏢 State Key Lab of CAD&CG, Zhejiang University

EMVP: A novel PEFT pipeline for Visual Place Recognition that reaches up to 97.6% Recall@1 using Centroid-Free Probing and Dynamic Power Normalization, while saving 64.3% of trainable parameters.

EM Distillation for One-step Diffusion Models

26 September 2024·3404 words·16 mins

Computer Vision Image Generation 🏢 Google DeepMind

EM Distillation (EMD) efficiently trains one-step diffusion models by using an Expectation-Maximization approach, achieving state-of-the-art image generation quality and outperforming existing methods…

Elucidating the Design Space of Dataset Condensation

26 September 2024·4063 words·20 mins

Computer Vision Image Classification 🏢 Tsinghua University

Elucidating Dataset Condensation (EDC) achieves state-of-the-art accuracy in dataset condensation by implementing soft category-aware matching and a smoothing learning rate schedule, improving model t…

EGSST: Event-based Graph Spatiotemporal Sensitive Transformer for Object Detection

26 September 2024·2236 words·11 mins

AI Generated Computer Vision Object Detection 🏢 School of Information Science and Technology, Fudan University

EGSST: a novel framework for event-based object detection, uses graph structures and transformers to efficiently process event data, achieving high accuracy and speed in dynamic scenes.

EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views

26 September 2024·2364 words·12 mins

Computer Vision 3D Vision 🏢 University of Science and Technology of China

EgoChoir: a novel framework harmonizes visual appearance, head motion, and 3D objects to accurately estimate 3D human contact and object affordance from egocentric videos, surpassing existing methods.

EfficientCAPER: An End-to-End Framework for Fast and Robust Category-Level Articulated Object Pose Estimation

26 September 2024·2239 words·11 mins

Computer Vision 3D Vision 🏢 Zhejiang University of Technology

EfficientCAPER: A novel end-to-end framework achieves fast & robust category-level articulated object pose estimation by using a joint-centric approach, eliminating post-processing optimization and en…

Efficient Temporal Action Segmentation via Boundary-aware Query Voting

26 September 2024·3348 words·16 mins

AI Generated Computer Vision Video Understanding 🏢 Stony Brook University

BaFormer: a novel boundary-aware Transformer network achieves efficient and accurate temporal action segmentation by using instance and global queries for segment classification and boundary predictio…

Efficient Lifelong Model Evaluation in an Era of Rapid Progress

26 September 2024·2830 words·14 mins

AI Generated Computer Vision Image Classification 🏢 University of Cambridge

Sort & Search: 1000x faster lifelong model evaluation!

Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation

26 September 2024·1907 words·9 mins

Computer Vision Image Classification 🏢 College of Information and Control Engineering, Xi'an University of Architecture and Technology

Boosting Vision Transformer adaptation! Householder Transformation-based Adaptor (HTA) outperforms existing methods by dynamically adjusting adaptation matrix ranks across layers, improving efficiency…

Effective Rank Analysis and Regularization for Enhanced 3D Gaussian Splatting

26 September 2024·2803 words·14 mins

AI Generated Computer Vision 3D Vision 🏢 KAIST

Effective rank regularization enhances 3D Gaussian splatting, resolving needle-like artifacts and improving 3D model quality.

EEG2Video: Towards Decoding Dynamic Visual Perception from EEG Signals

26 September 2024·2189 words·11 mins

Computer Vision Video Understanding 🏢 Microsoft Research

EEG2Video reconstructs dynamic videos from EEG signals, achieving 79.8% accuracy in semantic classification and 0.256 SSIM in video reconstruction.

EDT: An Efficient Diffusion Transformer Framework Inspired by Human-like Sketching

26 September 2024·3842 words·19 mins

AI Generated Computer Vision Image Generation 🏢 Midea Group

The Efficient Diffusion Transformer (EDT) framework significantly speeds up and improves image generation by leveraging a lightweight architecture, human-like sketching-inspired Attention Modulation M…

ECMamba: Consolidating Selective State Space Model with Retinex Guidance for Efficient Multiple Exposure Correction

26 September 2024·1754 words·9 mins

Computer Vision Image Generation 🏢 McMaster University

ECMamba: A novel dual-branch framework efficiently corrects multiple exposure images by integrating Retinex theory and an innovative 2D selective state-space layer, achieving state-of-the-art performa…

EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding

26 September 2024·2574 words·13 mins

AI Generated Computer Vision Image Segmentation 🏢 University of Arkansas

EAGLE: A novel unsupervised cross-view adaptation method for semantic segmentation achieves state-of-the-art performance by efficiently modeling geometric structural changes across different camera vi…