🏢 IBM Research
Worst-Case Offline Reinforcement Learning with Arbitrary Data Support
·450 words·3 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 IBM Research
Worst-case offline RL guarantees near-optimal policy performance without data support assumptions, achieving a sample complexity bound of O(ε⁻²).
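For a sense of what a guarantee of this shape says, here is a sketch with assumed notation (not the paper's exact statement):

```latex
% Sketch with assumed notation: \hat{\pi} is the learned policy, \pi^{*} an
% optimal comparator, and J the worst-case return over all MDPs consistent
% with the offline dataset.
\[
  J(\pi^{*}) - J(\hat{\pi}) \;\le\; \epsilon
  \quad\text{with probability at least } 1 - \delta,
  \quad\text{given } n = O\!\bigl(\epsilon^{-2}\,\log(1/\delta)\bigr) \text{ samples.}
\]
```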
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
·2439 words·12 mins·
Natural Language Processing
Large Language Models
🏢 IBM Research
WAGLE: A novel weight attribution-guided LLM unlearning framework boosts unlearning performance by strategically identifying and manipulating influential model weights, achieving a better balance between unlearning efficacy and model utility.
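As a rough illustration of weight attribution for unlearning, a minimal sketch using a simple |gradient × weight| proxy score (WAGLE's actual attribution scheme is derived differently):

```python
import torch

def attribution_mask(model, forget_loss, keep_ratio=0.1):
    # score each weight by a first-order proxy, |grad * weight|, then keep
    # only the top fraction as "influential" for unlearning updates
    forget_loss.backward()
    scores = torch.cat([(p.grad * p).abs().flatten()
                        for p in model.parameters() if p.grad is not None])
    threshold = torch.topk(scores, int(keep_ratio * scores.numel())).values[-1]
    return [(p.grad * p).abs() >= threshold
            for p in model.parameters() if p.grad is not None]

# usage idea: multiply each unlearning gradient update by its mask so only
# the attributed weights move, preserving the rest of the model's utility
```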
Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series
·3445 words·17 mins·
Machine Learning
Few-Shot Learning
🏢 IBM Research
Tiny Time Mixers (TTMs) achieve state-of-the-art zero/few-shot multivariate time series forecasting, outperforming existing benchmarks while drastically reducing computational requirements.
Thought of Search: Planning with Language Models Through The Lens of Efficiency
·282 words·2 mins·
Natural Language Processing
Large Language Models
🏢 IBM Research
This paper introduces ‘Thought of Search,’ a novel, efficient planning approach using LLMs that prioritizes soundness and completeness. It leverages LLMs to generate Python code for search components, such as successor functions and goal tests, which are then run inside classical search algorithms.
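The pattern is easy to sketch: the LLM writes small search components as Python, and a classical, sound and complete search consumes them. Below, hand-written stand-ins for the LLM-generated pieces drive a BFS on a toy water-jug task (illustrative, not the paper's code):

```python
from collections import deque

def successors(state):                     # would be LLM-generated
    a, b = state                           # two jugs: 3L and 5L
    return {(3, b), (a, 5), (0, b), (a, 0),
            (max(0, a - (5 - b)), min(5, b + a)),   # pour A into B
            (min(3, a + b), max(0, b - (3 - a)))}   # pour B into A

def is_goal(state):                        # would be LLM-generated
    return 4 in state

def bfs(start):                            # classical search, fixed by hand
    frontier, seen = deque([(start, [start])]), {start}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        for nxt in successors(state) - seen:
            seen.add(nxt)
            frontier.append((nxt, path + [nxt]))
    return None

print(bfs((0, 0)))   # e.g. a 6-move plan ending in a state containing 4
```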
Shuffling Gradient-Based Methods for Nonconvex-Concave Minimax Optimization
·337 words·2 mins·
AI Generated
AI Theory
Optimization
🏢 IBM Research
New shuffling gradient methods achieve state-of-the-art oracle complexity for nonconvex-concave minimax optimization problems, offering improved performance and efficiency.
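A minimal sketch of a shuffling (random-reshuffling) gradient descent-ascent loop, on an illustrative convex-concave toy rather than the paper's exact algorithm or problem class:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 64, 5
A = rng.normal(size=(n, d))            # f_i(x, y) = y * (a_i @ x) - 0.5 * y**2
x, y = rng.normal(size=d), 0.0
eta_x, eta_y = 0.01, 0.05

for epoch in range(200):
    for i in rng.permutation(n):       # fresh shuffle each epoch, no replacement
        x = x - eta_x * (y * A[i])     # descent step on x using f_i only
        y = y + eta_y * (A[i] @ x - y) # ascent step on y (f_i is concave in y)

print(abs(A.mean(0) @ x), abs(y))      # both drift toward 0 near the saddle point
```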
NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes
·4651 words·22 mins·
Machine Learning
Deep Learning
🏢 IBM Research
NeuralFuse: A novel add-on module learns input transformations to maintain accuracy in low-voltage DNN inference, achieving up to 57% accuracy recovery and 24% energy savings without retraining.
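A minimal sketch of the add-on pattern, assuming a crude fault model (random weight zeroing stands in for the paper's low-voltage SRAM bit-error simulation):

```python
import torch
import torch.nn as nn

class FuseModule(nn.Module):
    """Trainable input transformation prepended to a frozen base model."""
    def __init__(self):
        super().__init__()
        self.transform = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())

    def forward(self, x):
        # perturb the input while keeping it in the valid image range
        return torch.clamp(x + self.transform(x), 0.0, 1.0)

def simulate_low_voltage(model, p=0.01):
    # stand-in fault model: randomly zero out a fraction p of the weights
    with torch.no_grad():
        for w in model.parameters():
            w.mul_((torch.rand_like(w) > p).float())

# training idea: freeze the base model, sample faults each step, and optimize
# only FuseModule so base(fuse(x)) stays accurate under the perturbations
```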
Neural Network Reparametrization for Accelerated Optimization in Molecular Simulations
·2783 words·14 mins·
AI Generated
AI Theory
Optimization
🏢 IBM Research
Accelerate molecular simulations using neural network reparametrization! This flexible method adjusts system complexity, enhances optimization, and maintains continuous access to fine-grained modes, outperforming standard optimization baselines.
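A minimal sketch of the reparametrization trick on a toy energy: coordinates become the output of a network, x = f_θ(z), and the optimizer updates θ instead of x (the spring energy and architecture here are illustrative assumptions):

```python
import torch
import torch.nn as nn

n_atoms = 10
z = torch.randn(n_atoms, 4)               # fixed latent inputs, one per atom
net = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 3))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

def energy(x):
    # toy energy: springs of rest length 1 between every pair of atoms
    diff = x[:, None, :] - x[None, :, :]
    i, j = torch.triu_indices(n_atoms, n_atoms, offset=1)
    dist = (diff ** 2).sum(-1)[i, j].sqrt()
    return ((dist - 1.0) ** 2).sum()

for step in range(500):
    opt.zero_grad()
    e = energy(net(z))                    # gradients flow into theta, not x
    e.backward()
    opt.step()
```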
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
·2386 words·12 mins·
Multimodal Learning
Vision-Language Models
🏢 IBM Research
Large Multimodal Models (LMMs) are limited by their context length during many-shot in-context learning. This paper introduces Multimodal Task Vectors (MTV), a method to compress numerous in-context examples into compact vectors in the model's activations, enabling many-shot multimodal learning without exceeding the context window.
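A minimal sketch of the task-vector pattern this builds on (the layer choice and injection site are assumptions, and the hooked layer is assumed to return a plain (batch, seq, hidden) tensor):

```python
import torch

def extract_task_vector(model, layer, shot_batches):
    acts = []
    hook = layer.register_forward_hook(
        lambda mod, inp, out: acts.append(out[:, -1, :].detach()))
    with torch.no_grad():
        for batch in shot_batches:        # many in-context example prompts
            model(batch)
    hook.remove()
    return torch.cat(acts).mean(0)        # one compact vector replaces them all

def inject_task_vector(layer, vec):
    # add the stored vector to the layer's output at inference time
    return layer.register_forward_hook(lambda mod, inp, out: out + vec)
```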
Limits of Transformer Language Models on Learning to Compose Algorithms
·2755 words·13 mins·
Natural Language Processing
Large Language Models
🏢 IBM Research
Large Language Models struggle with compositional tasks, requiring exponentially more data than expected compared to learning the sub-tasks individually. This paper reveals surprising sample inefficiency in compositional learning.
Invariant subspaces and PCA in nearly matrix multiplication time
·336 words·2 mins·
AI Theory
Optimization
🏢 IBM Research
Generalized eigenvalue problems get solved in nearly matrix multiplication time, providing new, faster PCA algorithms!
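For reference, the object in question is the symmetric generalized eigenproblem A v = λ B v, which recovers PCA when B = I; a small illustration with SciPy's dense solver (the paper's contribution is a faster algorithm, not this routine):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
A = np.cov(X.T)                       # covariance matrix of the data
B = np.eye(8)                         # identity -> plain PCA directions
w, V = eigh(A, B)                     # generalized eigendecomposition
top = V[:, np.argsort(w)[::-1][:2]]   # top-2 principal directions
```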
Geometry of naturalistic object representations in recurrent neural network models of working memory
·4025 words·19 mins·
AI Generated
Machine Learning
Deep Learning
🏢 IBM Research
RNNs represent naturalistic objects in working memory using chronological subspaces, defying traditional slot models; object features are less orthogonalized in RNNs than in perceptual space.
Distributional Preference Alignment of LLMs via Optimal Transport
·2204 words·11 mins·
Natural Language Processing
Large Language Models
🏢 IBM Research
LLMs are aligned to human preferences distributionally using Optimal Transport, achieving state-of-the-art performance.
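A minimal sketch of the distributional idea: in one dimension, sorting aligns quantiles, so optimal transport reduces to comparing sorted reward samples (the margin and cost below are illustrative assumptions, not the paper's exact objective):

```python
import torch

def ot_preference_loss(chosen_rewards, rejected_rewards, margin=0.0):
    c, _ = torch.sort(chosen_rewards)     # empirical quantiles of "chosen"
    r, _ = torch.sort(rejected_rewards)   # empirical quantiles of "rejected"
    # 1-D OT with a convex cost couples sorted samples; penalize quantiles
    # where rejected >= chosen, i.e. violations of stochastic dominance
    return torch.relu(margin + r - c).mean()

loss = ot_preference_loss(torch.randn(32) + 0.5, torch.randn(32))
```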
Dense Associative Memory Through the Lens of Random Features
·1742 words·9 mins·
Machine Learning
Deep Learning
🏢 IBM Research
Boost associative memory capacity without extra parameters! DrDAM uses random features to approximate Dense Associative Memories, enabling efficient memory addition and retrieval.
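A minimal sketch of the random-feature approximation: the Dense Associative Memory energy involves a log-sum-exp over all stored patterns, and positive random features compress every memory into a single feature-sum vector (the feature map below is the standard positive random-feature construction; the paper's estimator may differ in detail):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_mem, n_feat, beta = 16, 100, 4096, 1.0
W = rng.normal(size=(n_feat, d))                  # random projections

def phi(x):
    # positive random features: <phi(x), phi(q)> approximates exp(beta x.q)
    u = np.sqrt(beta) * x
    return np.exp(W @ u - u @ u / 2) / np.sqrt(n_feat)

patterns = rng.normal(size=(n_mem, d)) / np.sqrt(d)
T = sum(phi(p) for p in patterns)                 # all memories in one vector

q = patterns[0] + 0.1 * rng.normal(size=d)        # noisy query
exact = np.log(np.exp(beta * patterns @ q).sum()) # exact log-sum-exp energy
approx = np.log(phi(q) @ T)                       # feature-space approximation
print(exact, approx)                              # should be close
```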
Balancing Context Length and Mixing Times for Reinforcement Learning at Scale
·1724 words·9 mins·
Machine Learning
Reinforcement Learning
🏢 IBM Research
Longer context in RL boosts generalization but slows down learning; this paper reveals the crucial tradeoff and offers theoretical insights.
Abductive Reasoning in Logical Credal Networks
·1676 words·8 mins·
AI Generated
AI Theory
Optimization
🏢 IBM Research
This paper presents efficient algorithms for abductive reasoning in Logical Credal Networks (LCNs), addressing the MAP and Marginal MAP inference tasks to enable scalable solutions for complex real-world applications.
A two-scale Complexity Measure for Deep Learning Models
·1709 words·9 mins·
Machine Learning
Deep Learning
🏢 IBM Research
New 2sED measure effectively bounds deep learning model complexity, correlating well with training error and offering efficient computation, particularly for deep models via a layerwise approach.
A Surprisingly Simple Approach to Generalized Few-Shot Semantic Segmentation
·2178 words·11 mins·
AI Generated
Computer Vision
Image Segmentation
🏢 IBM Research
Simple rule-based base-class mining (BCM) significantly boosts generalized few-shot semantic segmentation (GFSS) performance, surpassing complex existing methods.