🏢 NVIDIA Research

Where's Waldo: Diffusion Features For Personalized Segmentation and Retrieval

26 September 2024·2035 words·10 mins

Computer Vision Image Segmentation 🏢 NVIDIA Research

A novel approach uses pre-trained text-to-image diffusion models for personalized image retrieval and segmentation, surpassing supervised methods and addressing limitations of existing self-supervi…

Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

26 September 2024·2366 words·12 mins

Natural Language Processing Speech Recognition 🏢 NVIDIA Research

STAR, a novel unsupervised adaptation framework, drastically improves automatic speech recognition (ASR) robustness across diverse domains using only unlabeled data and outperforms existing self-train…

Large Spatial Model: End-to-end Unposed Images to Semantic 3D

26 September 2024·1766 words·9 mins

Computer Vision 3D Vision 🏢 NVIDIA Research

Large Spatial Model (LSM) achieves real-time semantic 3D reconstruction from just two unposed images, unifying multiple 3D vision tasks in a single feed-forward pass.

GaussianMarker: Uncertainty-Aware Copyright Protection of 3D Gaussian Splatting

26 September 2024·2093 words·10 mins

Computer Vision 3D Vision 🏢 NVIDIA Research

GaussianMarker: A novel uncertainty-aware watermarking method ensures robust copyright protection for 3D Gaussian Splatting assets, invisibly embedding messages into model parameters and extractable …

Fast Encoder-Based 3D from Casual Videos via Point Track Processing

26 September 2024·2766 words·13 mins

Computer Vision 3D Vision 🏢 NVIDIA Research

TRACKSTO4D: Fast and accurate 3D reconstruction from casual videos using 2D point tracks, reducing runtime by up to 95% while matching state-of-the-art accuracy.

ESPACE: Dimensionality Reduction of Activations for Model Compression

26 September 2024·2254 words·11 mins

AI Generated Natural Language Processing Large Language Models 🏢 NVIDIA Research

ESPACE: A novel LLM compression technique that achieves a 50% model size reduction with minimal accuracy loss by projecting activations onto principal components.

DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features

26 September 2024·2827 words·14 mins

AI Generated Computer Vision 3D Vision 🏢 NVIDIA Research

DistillNeRF: A self-supervised learning framework enabling accurate 3D scene reconstruction from sparse, single-frame images by distilling features from offline NeRFs and 2D foundation models…

CosAE: Learnable Fourier Series for Image Restoration

26 September 2024·2867 words·14 mins

Computer Vision Image Restoration 🏢 NVIDIA Research

CosAE: a novel autoencoder using learnable Fourier series achieves state-of-the-art image restoration by encoding frequency coefficients in its narrow bottleneck, preserving fine details even with ext…