Spotlight Others

Moving Off-the-Grid: Scene-Grounded Video Representations

26 September 2024·2151 words·11 mins

Video Understanding 🏢 Google DeepMind

MooG: A self-supervised video model that learns off-the-grid representations, enabling consistent tracking of scene elements even under motion and outperforming grid-based baselines on various vision tasks.

MotionBooth: Motion-Aware Customized Text-to-Video Generation

26 September 2024·2883 words·14 mins

🏢 Peking University

MotionBooth: A new framework enabling precise control over both object and camera movements in customized text-to-video generation, achieving high-quality video while maintaining training efficiency.

Motion Forecasting in Continuous Driving

26 September 2024·1965 words·10 mins

AI Applications Autonomous Vehicles 🏢 Fudan University

RealMotion: a novel motion forecasting framework for continuous driving that outperforms existing methods by accumulating historical scene information and sequentially refining predictions, achieving …

Monte Carlo Tree Search based Space Transfer for Black Box Optimization

26 September 2024·2970 words·14 mins

Transfer Learning 🏢 Nanjing University

MCTS-transfer: Iteratively refining Bayesian optimization via Monte Carlo tree search for efficient black-box optimization using transfer learning.

Molecule Design by Latent Prompt Transformer

26 September 2024·2788 words·14 mins

🏢 University of California, Los Angeles

Latent Prompt Transformer (LPT) revolutionizes molecule design by unifying generation and optimization, achieving high efficiency in discovering novel molecules with desired properties.

Memorize What Matters: Emergent Scene Decomposition from Multitraverse

26 September 2024·3662 words·18 mins

3D Vision 🏢 NVIDIA

3D Gaussian Mapping (3DGM) achieves self-supervised camera-only 3D scene decomposition by leveraging multi-traverse driving data, memorizing permanent structures while filtering out transient objects.

MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning

26 September 2024·2752 words·13 mins

Video Understanding 🏢 Shanghai Jiao Tong University

MECD: A new task and dataset unlocks multi-event causal discovery in videos, enabling a novel framework that outperforms existing models by efficiently identifying causal relationships between chronol…

MambaTree: Tree Topology is All You Need in State Space Model

26 September 2024·1962 words·10 mins

Image Classification 🏢 Tsinghua Shenzhen International Graduate School

MambaTree: A novel tree-topology-based state space model surpasses existing methods by dynamically generating input-aware topologies for enhanced long-range dependencies in vision and language.

LLM-ESR: Large Language Models Enhancement for Long-tailed Sequential Recommendation

26 September 2024·1957 words·10 mins

AI Applications Recommendation Systems 🏢 City University of Hong Kong

LLM-ESR enhances sequential recommendation by integrating semantic information from LLMs, significantly improving performance on long-tail users and items.

Linear Regression using Heterogeneous Data Batches

26 September 2024·1554 words·8 mins

Meta Learning 🏢 Google Research

A new algorithm efficiently solves linear regression with heterogeneous data batches, handling diverse input distributions and achieving high accuracy with fewer samples.

LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS

26 September 2024·2198 words·11 mins

3D Vision 🏢 University of Texas at Austin

LightGaussian achieves 15x compression of 3D Gaussian scene representations, boosting rendering speed to 200+ FPS while maintaining visual quality, solving storage and efficiency issues in real-time n…

Leveraging Catastrophic Forgetting to Develop Safe Diffusion Models against Malicious Finetuning

26 September 2024·2397 words·12 mins

🏢 Institute of Computing Technology, Chinese Academy of Sciences

This paper introduces a novel training policy that leverages catastrophic forgetting to make diffusion models resilient against malicious fine-tuning, effectively preventing the generation of harmful …

Learning Segmentation from Point Trajectories

26 September 2024·2329 words·11 mins

Image Segmentation 🏢 University of Oxford

This paper introduces a novel unsupervised video object segmentation method using long-term point trajectories and optical flow, outperforming prior art by effectively combining sparse, long-term moti…

Learning Noisy Halfspaces with a Margin: Massart is No Harder than Random

26 September 2024·289 words·2 mins

Active Learning 🏢 University of Texas at Austin

Proper learning of noisy halfspaces with margins is achievable with sample complexity matching random classification noise, defying prior expectations.

Latent Intrinsics Emerge from Training to Relight

26 September 2024·1743 words·9 mins

Image Generation 🏢 University of Chicago

A novel data-driven relighting model achieves state-of-the-art accuracy by learning latent intrinsic and extrinsic scene properties, even recovering albedo without explicit supervision.

Latent Diffusion for Neural Spiking Data

26 September 2024·3092 words·15 mins

🏢 University of Tübingen

LDNS: a new generative model for neural spiking data, enabling high-fidelity sampling and low-dimensional latent inference, paving the way for simulating realistic brain activity.

Language Generation in the Limit

26 September 2024·244 words·2 mins

🏢 Cornell University

This paper proves that language generation in the limit is always possible, even in an adversarial setting, in contrast to the impossibility of language identification in the limit.

Kermut: Composite kernel regression for protein variant effects

26 September 2024·4659 words·22 mins

🏢 University of Copenhagen

Kermut: A novel Gaussian process regression model achieves state-of-the-art accuracy in predicting protein variant effects and provides reliable uncertainty estimates, crucial for protein engineering …

Is Your LiDAR Placement Optimized for 3D Scene Understanding?

26 September 2024·3150 words·15 mins

AI Applications Autonomous Vehicles 🏢 University of Michigan

Place3D optimizes LiDAR placement for superior 3D scene understanding.

Interpret Your Decision: Logical Reasoning Regularization for Generalization in Visual Classification

26 September 2024·3714 words·18 mins

Image Classification 🏢 Xi'an Jiaotong-Liverpool University

This paper introduces L-Reg, a novel logical regularization technique, to improve generalization in visual classification. L-Reg effectively reduces model complexity and improves interpretability by f…