Spotlight Others

2024

Moving Off-the-Grid: Scene-Grounded Video Representations
·2151 words·11 mins
Video Understanding 🏒 Google DeepMind
MooG: A self-supervised video model that learns off-the-grid representations, enabling consistent tracking of scene elements under motion and outperforming grid-based baselines on various vision tasks.
MotionBooth: Motion-Aware Customized Text-to-Video Generation
·2883 words·14 mins
🏒 Peking University
MotionBooth: A new framework enabling precise control over both object and camera movements in customized text-to-video generation, achieving high-quality video while maintaining training efficiency.
Motion Forecasting in Continuous Driving
·1965 words·10 mins
AI Applications Autonomous Vehicles 🏒 Fudan University
RealMotion: a novel motion forecasting framework for continuous driving that outperforms existing methods by accumulating historical scene information and sequentially refining predictions, achieving …
Monte Carlo Tree Search based Space Transfer for Black Box Optimization
·2970 words·14 mins
Transfer Learning 🏒 Nanjing University
MCTS-transfer: Iteratively refining Bayesian optimization via Monte Carlo tree search for efficient black-box optimization using transfer learning.
Molecule Design by Latent Prompt Transformer
·2788 words·14 mins
🏒 University of California, Los Angeles
Latent Prompt Transformer (LPT) revolutionizes molecule design by unifying generation and optimization, achieving high efficiency in discovering novel molecules with desired properties.
Memorize What Matters: Emergent Scene Decomposition from Multitraverse
·3662 words·18 mins
3D Vision 🏒 NVIDIA
3D Gaussian Mapping (3DGM) achieves self-supervised camera-only 3D scene decomposition by leveraging multi-traverse driving data, memorizing permanent structures while filtering out transient objects.
MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning
·2752 words·13 mins
Video Understanding 🏒 Shanghai Jiao Tong University
MECD: A new task and dataset unlocks multi-event causal discovery in videos, enabling a novel framework that outperforms existing models by efficiently identifying causal relationships between chronol…
MambaTree: Tree Topology is All You Need in State Space Model
·1962 words·10 mins
Image Classification 🏒 Tsinghua Shenzhen International Graduate School
MambaTree: A novel tree-topology-based state space model surpasses existing methods by dynamically generating input-aware topologies for enhanced long-range dependencies in vision and language.
LLM-ESR: Large Language Models Enhancement for Long-tailed Sequential Recommendation
·1957 words·10 mins
AI Applications Recommendation Systems 🏒 City University of Hong Kong
LLM-ESR enhances sequential recommendation by integrating semantic information from LLMs, significantly improving performance on long-tail users and items.
Linear Regression using Heterogeneous Data Batches
·1554 words·8 mins
Meta Learning 🏒 Google Research
New algorithm efficiently solves linear regression with heterogeneous data batches, handling diverse input distributions and achieving high accuracy with fewer samples.
LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS
·2198 words·11 mins
3D Vision 🏒 University of Texas at Austin
LightGaussian achieves 15x compression of 3D Gaussian scene representations, boosting rendering speed to 200+ FPS while maintaining visual quality, solving storage and efficiency issues in real-time n…
Leveraging Catastrophic Forgetting to Develop Safe Diffusion Models against Malicious Finetuning
·2397 words·12 mins
🏒 Institute of Computing Technology, Chinese Academy of Sciences
This paper introduces a novel training policy that leverages catastrophic forgetting to make diffusion models resilient against malicious fine-tuning, effectively preventing the generation of harmful …
Learning Segmentation from Point Trajectories
·2329 words·11 mins
Image Segmentation 🏒 University of Oxford
This paper introduces a novel unsupervised video object segmentation method using long-term point trajectories and optical flow, outperforming prior art by effectively combining sparse, long-term moti…
Learning Noisy Halfspaces with a Margin: Massart is No Harder than Random
·289 words·2 mins
Active Learning 🏒 University of Texas at Austin
Proper learning of noisy halfspaces with margins is achievable with sample complexity matching random classification noise, defying prior expectations.
Latent Intrinsics Emerge from Training to Relight
·1743 words·9 mins
Image Generation 🏒 University of Chicago
A novel data-driven relighting model achieves state-of-the-art accuracy by learning latent intrinsic and extrinsic scene properties, even recovering albedo without explicit supervision.
Latent Diffusion for Neural Spiking Data
·3092 words·15 mins
🏒 University of Tübingen
LDNS: a new generative model for neural spiking data, enabling high-fidelity sampling and low-dimensional latent inference, paving the way for simulating realistic brain activity.
Language Generation in the Limit
·244 words·2 mins
🏒 Cornell University
This paper proves that language generation in the limit is always possible, even in an adversarial setting, in contrast to the impossibility of language identification in the limit.
Kermut: Composite kernel regression for protein variant effects
·4659 words·22 mins
🏒 University of Copenhagen
Kermut: A novel Gaussian process regression model achieves state-of-the-art accuracy in predicting protein variant effects and provides reliable uncertainty estimates, crucial for protein engineering …
Is Your LiDAR Placement Optimized for 3D Scene Understanding?
·3150 words·15 mins
AI Applications Autonomous Vehicles 🏒 University of Michigan
Place3D optimizes LiDAR placement for superior 3D scene understanding.
Interpret Your Decision: Logical Reasoning Regularization for Generalization in Visual Classification
·3714 words·18 mins
Image Classification 🏒 Xi'an Jiaotong-Liverpool University
This paper introduces L-Reg, a novel logical regularization technique, to improve generalization in visual classification. L-Reg effectively reduces model complexity and improves interpretability by f…