🏢 NVIDIA Research

Where's Waldo: Diffusion Features For Personalized Segmentation and Retrieval
·2035 words·10 mins
Computer Vision Image Segmentation 🏢 NVIDIA Research
A novel approach unlocks personalized image retrieval and segmentation by using pre-trained text-to-image diffusion models to surpass supervised methods, addressing limitations of existing self-supervi…
Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models
·2366 words·12 mins
Natural Language Processing Speech Recognition 🏢 NVIDIA Research
STAR, a novel unsupervised adaptation framework, drastically improves automatic speech recognition (ASR) robustness across diverse domains using only unlabeled data and outperforms existing self-train…
Large Spatial Model: End-to-end Unposed Images to Semantic 3D
·1766 words·9 mins
Computer Vision 3D Vision 🏢 NVIDIA Research
Large Spatial Model (LSM) achieves real-time semantic 3D reconstruction from just two unposed images, unifying multiple 3D vision tasks in a single feed-forward pass.
GaussianMarker: Uncertainty-Aware Copyright Protection of 3D Gaussian Splatting
·2093 words·10 mins
Computer Vision 3D Vision 🏢 NVIDIA Research
GaussianMarker: A novel uncertainty-aware watermarking method ensures robust copyright protection for 3D Gaussian Splatting assets, invisibly embedding messages into model parameters and extractable …
Fast Encoder-Based 3D from Casual Videos via Point Track Processing
·2766 words·13 mins
Computer Vision 3D Vision 🏢 NVIDIA Research
TRACKSTO4D: Fast and accurate 3D reconstruction from casual videos using 2D point tracks, reducing runtime by up to 95% while matching state-of-the-art accuracy.
ESPACE: Dimensionality Reduction of Activations for Model Compression
·2254 words·11 mins
AI Generated Natural Language Processing Large Language Models 🏢 NVIDIA Research
ESPACE: A novel LLM compression technique that achieves a 50% model size reduction with minimal accuracy loss by projecting activations onto principal components.
DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features
·2827 words·14 mins
AI Generated Computer Vision 3D Vision 🏢 NVIDIA Research
DistillNeRF: A self-supervised learning framework enabling accurate 3D scene reconstruction from sparse, single-frame images by distilling features from offline NeRFs and 2D foundation models…
CosAE: Learnable Fourier Series for Image Restoration
·2867 words·14 mins
Computer Vision Image Restoration 🏢 NVIDIA Research
CosAE: A novel autoencoder using learnable Fourier series achieves state-of-the-art image restoration by encoding frequency coefficients in its narrow bottleneck, preserving fine details even with ext…