Spotlight Others
2024
Input-to-State Stable Coupled Oscillator Networks for Closed-form Model-based Control in Latent Space
·4386 words·21 mins
AI Applications
Robotics
🏢 Delft University of Technology
Stable closed-loop control in latent space is achieved using a novel Coupled Oscillator Network, offering efficient model-based control for complex nonlinear systems directly from image data.
In-Context Learning with Transformers: Softmax Attention Adapts to Function Lipschitzness
·1771 words·9 mins
Meta Learning
🏢 University of Texas at Austin
Softmax attention in transformers adapts its attention window to function Lipschitzness and noise, enabling efficient in-context learning.
Improving the Worst-Case Bidirectional Communication Complexity for Nonconvex Distributed Optimization under Function Similarity
·1943 words·10 mins
Federated Learning
🏢 KAUST
MARINA-P and M3 algorithms drastically cut downlink and overall communication costs in nonconvex distributed optimization, scaling efficiently with the number of worker nodes.
Improving robustness to corruptions with multiplicative weight perturbations
·1713 words·9 mins
Image Classification
🏢 Aalto University
Boost DNN robustness to corruptions without sacrificing clean image accuracy using Data Augmentation via Multiplicative Perturbations (DAMP)!
Humanoid Locomotion as Next Token Prediction
·1485 words·7 mins
AI Applications
Robotics
🏢 University of California, Berkeley
Humanoid robots now walk in San Francisco zero-shot, thanks to a novel 'next token prediction' approach trained on diverse sensorimotor data, enabling real-world generalization and data efficiency.
Hardness of Learning Neural Networks under the Manifold Hypothesis
·2154 words·11 mins
🏢 Harvard University
Neural network learnability under the manifold hypothesis is hard except for efficiently sampleable manifolds.
Gradients of Functions of Large Matrices
·2389 words·12 mins
🏢 Technical University of Denmark
This research presents novel adjoint methods for efficiently differentiating Lanczos and Arnoldi iterations, unlocking accurate gradients for large-matrix functions in machine learning.
Geodesic Optimization for Predictive Shift Adaptation on EEG data
·2001 words·10 mins
Transfer Learning
🏢 Inria
GOPSA: a novel geodesic optimization method significantly improves cross-site age prediction from EEG data by jointly handling shifts in data and predictive variables.
Generative Retrieval Meets Multi-Graded Relevance
·2003 words·10 mins
Information Retrieval
🏢 University of Chinese Academy of Sciences
GR2, a novel framework, extends generative retrieval to handle multi-graded relevance, addressing limitations of existing binary-relevance approaches by enhancing docid distinctness and implementing m…
Generalized Protein Pocket Generation with Prior-Informed Flow Matching
·1970 words·10 mins
🏢 Harvard University
PocketFlow: a novel generative model designs high-affinity protein pockets using prior-informed flow matching, outperforming existing methods.
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing
·2413 words·12 mins
Multimodal Learning
Vision-Language Models
🏢 Tsinghua University
GenArtist uses a multimodal large language model as an AI agent to unify image generation and editing, achieving state-of-the-art performance by decomposing complex tasks and leveraging a comprehensiv…
FuseFL: One-Shot Federated Learning through the Lens of Causality with Progressive Model Fusion
·2415 words·12 mins
Federated Learning
🏢 Hong Kong Baptist University
FuseFL achieves superior one-shot federated learning performance by leveraging a causal view of data heterogeneity and progressively fusing model blocks, significantly outperforming existing methods w…
FuseAnyPart: Diffusion-Driven Facial Parts Swapping via Multiple Reference Images
·1953 words·10 mins
Image Generation
🏢 Shanghai Jiao Tong University
FuseAnyPart: Swap facial parts seamlessly using multiple reference images via diffusion, achieving high-fidelity results.
Flexible task abstractions emerge in linear networks with fast and bounded units
·3629 words·18 mins
🏢 Massachusetts Institute of Technology
Linear gated neural networks with fast, bounded units self-organize into modular weight structures and unique gating representations, enabling flexible task switching and compositional generalization.
Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts
·1871 words·9 mins
Multimodal Learning
Vision-Language Models
🏢 University of North Carolina at Chapel Hill
Flex-MoE: A novel framework flexibly handles arbitrary modality combinations in multimodal learning, even with missing data, achieving robust performance.
Fine Tuning Out-of-Vocabulary Item Recommendation with User Sequence Imagination
·2088 words·10 mins
Recommendation Systems
🏢 Central South University
User Sequence Imagination (USIM) revolutionizes out-of-vocabulary item recommendation by leveraging user sequence imagination and RL fine-tuning, achieving superior performance in real-world e-commerc…
Finding Transformer Circuits With Edge Pruning
·2284 words·11 mins
Interpretability
🏢 Princeton University
Edge Pruning efficiently discovers sparse, yet accurate, computational subgraphs (circuits) in large language models via gradient-based optimization, advancing mechanistic interpretability research.
Fearless Stochasticity in Expectation Propagation
·1730 words·9 mins
🏢 University of Cambridge
This paper introduces EP-η and EP-μ, novel EP variants that are remarkably robust to Monte Carlo noise, achieving improved speed and accuracy.
Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting
·1899 words·9 mins
Multimodal Learning
Vision-Language Models
🏢 University of Science and Technology of China
Frolic: A label-free framework boosts zero-shot vision model accuracy by learning prompt distributions and correcting label bias, achieving state-of-the-art performance across multiple datasets.
Enhancing LLM Reasoning via Vision-Augmented Prompting
·2157 words·11 mins
Multimodal Learning
Multimodal Reasoning
🏢 Zhejiang University
Vision-Augmented Prompting (VAP) boosts LLM reasoning by automatically generating images from textual problem descriptions, incorporating visual-spatial clues to significantly improve accuracy across …