🏢 IBM Research
Worst-Case Offline Reinforcement Learning with Arbitrary Data Support
·450 words·3 mins·
AI Generated
Machine Learning
Reinforcement Learning
🏢 IBM Research
Worst-case offline RL guarantees near-optimal policy performance without data support assumptions, achieving a sample complexity bound of O(ε⁻²).
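For a sense of what a guarantee of this shape says, here is a sketch with assumed notation (not the paper's exact statement):

```latex
% Sketch with assumed notation: \hat{\pi} is the learned policy, \pi^{*} an
% optimal comparator, and J the worst-case return over all MDPs consistent
% with the offline dataset.
\[
  J(\pi^{*}) - J(\hat{\pi}) \;\le\; \epsilon
  \quad\text{with probability at least } 1 - \delta,
  \quad\text{given } n = O\!\bigl(\epsilon^{-2}\,\log(1/\delta)\bigr) \text{ samples.}
\]
```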
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
·2439 words·12 mins·
Natural Language Processing
Large Language Models
🏢 IBM Research
WAGLE: A novel weight attribution-guided LLM unlearning framework boosts unlearning performance by strategically identifying and manipulating influential model weights, achieving a better balance between unlearning efficacy and model utility.
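As a rough illustration of weight attribution for unlearning, a minimal sketch using a simple |gradient × weight| proxy score (WAGLE's actual attribution scheme is derived differently):

```python
import torch

def attribution_mask(model, forget_loss, keep_ratio=0.1):
    # score each weight by a first-order proxy, |grad * weight|, then keep
    # only the top fraction as "influential" for unlearning updates
    forget_loss.backward()
    scores = torch.cat([(p.grad * p).abs().flatten()
                        for p in model.parameters() if p.grad is not None])
    threshold = torch.topk(scores, int(keep_ratio * scores.numel())).values[-1]
    return [(p.grad * p).abs() >= threshold
            for p in model.parameters() if p.grad is not None]

# usage idea: multiply each unlearning gradient update by its mask so only
# the attributed weights move, preserving the rest of the model's utility
```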
Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series
·3445 words·17 mins·
Machine Learning
Few-Shot Learning
🏢 IBM Research
Tiny Time Mixers (TTMs) achieve state-of-the-art zero/few-shot multivariate time series forecasting, outperforming existing benchmarks while drastically reducing computational requirements.
Thought of Search: Planning with Language Models Through The Lens of Efficiency
·282 words·2 mins·
Natural Language Processing
Large Language Models
🏢 IBM Research
This paper introduces ‘Thought of Search,’ a novel, efficient planning approach using LLMs that prioritizes soundness and completeness. It leverages LLMs to generate Python code for search components, such as successor functions and goal tests, which are then run inside classical search algorithms.
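The pattern is easy to sketch: the LLM writes small search components as Python, and a classical, sound and complete search consumes them. Below, hand-written stand-ins for the LLM-generated pieces drive a BFS on a toy water-jug task (illustrative, not the paper's code):

```python
from collections import deque

def successors(state):                     # would be LLM-generated
    a, b = state                           # two jugs: 3L and 5L
    return {(3, b), (a, 5), (0, b), (a, 0),
            (max(0, a - (5 - b)), min(5, b + a)),   # pour A into B
            (min(3, a + b), max(0, b - (3 - a)))}   # pour B into A

def is_goal(state):                        # would be LLM-generated
    return 4 in state

def bfs(start):                            # classical search, fixed by hand
    frontier, seen = deque([(start, [start])]), {start}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        for nxt in successors(state) - seen:
            seen.add(nxt)
            frontier.append((nxt, path + [nxt]))
    return None

print(bfs((0, 0)))   # e.g. a 6-move plan ending in a state containing 4
```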
Shuffling Gradient-Based Methods for Nonconvex-Concave Minimax Optimization
·337 words·2 mins·
AI Generated
AI Theory
Optimization
🏢 IBM Research
New shuffling gradient methods achieve state-of-the-art oracle complexity for nonconvex-concave minimax optimization problems, offering improved performance and efficiency.
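A minimal sketch of a shuffling (random-reshuffling) gradient descent-ascent loop, on an illustrative convex-concave toy rather than the paper's exact algorithm or problem class:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 64, 5
A = rng.normal(size=(n, d))            # f_i(x, y) = y * (a_i @ x) - 0.5 * y**2
x, y = rng.normal(size=d), 0.0
eta_x, eta_y = 0.01, 0.05

for epoch in range(200):
    for i in rng.permutation(n):       # fresh shuffle each epoch, no replacement
        x = x - eta_x * (y * A[i])     # descent step on x using f_i only
        y = y + eta_y * (A[i] @ x - y) # ascent step on y (f_i is concave in y)

print(abs(A.mean(0) @ x), abs(y))      # both drift toward 0 near the saddle point
```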
NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes
·4651 words·22 mins·
Machine Learning
Deep Learning
🏢 IBM Research
NeuralFuse: A novel add-on module learns input transformations to maintain accuracy in low-voltage DNN inference, achieving up to 57% accuracy recovery and 24% energy savings without retraining.
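A minimal sketch of the add-on pattern, assuming a crude fault model (random weight zeroing stands in for the paper's low-voltage SRAM bit-error simulation):

```python
import torch
import torch.nn as nn

class FuseModule(nn.Module):
    """Trainable input transformation prepended to a frozen base model."""
    def __init__(self):
        super().__init__()
        self.transform = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())

    def forward(self, x):
        # perturb the input while keeping it in the valid image range
        return torch.clamp(x + self.transform(x), 0.0, 1.0)

def simulate_low_voltage(model, p=0.01):
    # stand-in fault model: randomly zero out a fraction p of the weights
    with torch.no_grad():
        for w in model.parameters():
            w.mul_((torch.rand_like(w) > p).float())

# training idea: freeze the base model, sample faults each step, and optimize
# only FuseModule so base(fuse(x)) stays accurate under the perturbations
```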
Neural Network Reparametrization for Accelerated Optimization in Molecular Simulations
·2783 words·14 mins·
AI Generated
AI Theory
Optimization
🏢 IBM Research
Accelerate molecular simulations using neural network reparametrization! This flexible method adjusts system complexity, enhances optimization, and maintains continuous access to fine-grained modes, outperforming standard optimization baselines.
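A minimal sketch of the reparametrization trick on a toy energy: coordinates become the output of a network, x = f_θ(z), and the optimizer updates θ instead of x (the spring energy and architecture here are illustrative assumptions):

```python
import torch
import torch.nn as nn

n_atoms = 10
z = torch.randn(n_atoms, 4)               # fixed latent inputs, one per atom
net = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 3))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

def energy(x):
    # toy energy: springs of rest length 1 between every pair of atoms
    diff = x[:, None, :] - x[None, :, :]
    i, j = torch.triu_indices(n_atoms, n_atoms, offset=1)
    dist = (diff ** 2).sum(-1)[i, j].sqrt()
    return ((dist - 1.0) ** 2).sum()

for step in range(500):
    opt.zero_grad()
    e = energy(net(z))                    # gradients flow into theta, not x
    e.backward()
    opt.step()
```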
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
·2386 words·12 mins·
Multimodal Learning
Vision-Language Models
🏢 IBM Research
Large Multimodal Models (LMMs) are limited by their context length during many-shot in-context learning. This paper introduces Multimodal Task Vectors (MTV), a method to compress numerous in-context examples into compact vectors in the model's activations, enabling many-shot multimodal learning without exceeding the context window.
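A minimal sketch of the task-vector pattern this builds on (the layer choice and injection site are assumptions, and the hooked layer is assumed to return a plain (batch, seq, hidden) tensor):

```python
import torch

def extract_task_vector(model, layer, shot_batches):
    acts = []
    hook = layer.register_forward_hook(
        lambda mod, inp, out: acts.append(out[:, -1, :].detach()))
    with torch.no_grad():
        for batch in shot_batches:        # many in-context example prompts
            model(batch)
    hook.remove()
    return torch.cat(acts).mean(0)        # one compact vector replaces them all

def inject_task_vector(layer, vec):
    # add the stored vector to the layer's output at inference time
    return layer.register_forward_hook(lambda mod, inp, out: out + vec)
```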
Limits of Transformer Language Models on Learning to Compose Algorithms
·2755 words·13 mins·
Natural Language Processing
Large Language Models
🏢 IBM Research
Large Language Models struggle with compositional tasks, requiring exponentially more data than expected compared to learning the sub-tasks individually. This paper reveals surprising sample inefficiency in compositional learning.
Invariant subspaces and PCA in nearly matrix multiplication time
·336 words·2 mins·
AI Theory
Optimization
🏢 IBM Research
Generalized eigenvalue problems get solved in nearly matrix multiplication time, providing new, faster PCA algorithms!
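For reference, the object in question is the symmetric generalized eigenproblem A v = λ B v, which recovers PCA when B = I; a small illustration with SciPy's dense solver (the paper's contribution is a faster algorithm, not this routine):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
A = np.cov(X.T)                       # covariance matrix of the data
B = np.eye(8)                         # identity -> plain PCA directions
w, V = eigh(A, B)                     # generalized eigendecomposition
top = V[:, np.argsort(w)[::-1][:2]]   # top-2 principal directions
```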
Geometry of naturalistic object representations in recurrent neural network models of working memory
·4025 words·19 mins·
AI Generated
Machine Learning
Deep Learning
🏢 IBM Research
RNNs represent naturalistic objects in working memory using chronological subspaces, defying traditional slot models; object features are less orthogonalized in RNNs than in perceptual space.
Distributional Preference Alignment of LLMs via Optimal Transport
·2204 words·11 mins·
Natural Language Processing
Large Language Models
🏢 IBM Research
LLMs are aligned to human preferences distributionally using Optimal Transport, achieving state-of-the-art performance.
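A minimal sketch of the distributional idea: in one dimension, sorting aligns quantiles, so optimal transport reduces to comparing sorted reward samples (the margin and cost below are illustrative assumptions, not the paper's exact objective):

```python
import torch

def ot_preference_loss(chosen_rewards, rejected_rewards, margin=0.0):
    c, _ = torch.sort(chosen_rewards)     # empirical quantiles of "chosen"
    r, _ = torch.sort(rejected_rewards)   # empirical quantiles of "rejected"
    # 1-D OT with a convex cost couples sorted samples; penalize quantiles
    # where rejected >= chosen, i.e. violations of stochastic dominance
    return torch.relu(margin + r - c).mean()

loss = ot_preference_loss(torch.randn(32) + 0.5, torch.randn(32))
```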
Dense Associative Memory Through the Lens of Random Features
·1742 words·9 mins·
Machine Learning
Deep Learning
🏢 IBM Research
Boost associative memory capacity without extra parameters! DrDAM uses random features to approximate Dense Associative Memories, enabling efficient memory addition and retrieval.
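A minimal sketch of the random-feature approximation: the Dense Associative Memory energy involves a log-sum-exp over all stored patterns, and positive random features compress every memory into a single feature-sum vector (the feature map below is the standard positive random-feature construction; the paper's estimator may differ in detail):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_mem, n_feat, beta = 16, 100, 4096, 1.0
W = rng.normal(size=(n_feat, d))                  # random projections

def phi(x):
    # positive random features: <phi(x), phi(q)> approximates exp(beta x.q)
    u = np.sqrt(beta) * x
    return np.exp(W @ u - u @ u / 2) / np.sqrt(n_feat)

patterns = rng.normal(size=(n_mem, d)) / np.sqrt(d)
T = sum(phi(p) for p in patterns)                 # all memories in one vector

q = patterns[0] + 0.1 * rng.normal(size=d)        # noisy query
exact = np.log(np.exp(beta * patterns @ q).sum()) # exact log-sum-exp energy
approx = np.log(phi(q) @ T)                       # feature-space approximation
print(exact, approx)                              # should be close
```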
Balancing Context Length and Mixing Times for Reinforcement Learning at Scale
·1724 words·9 mins·
Machine Learning
Reinforcement Learning
🏢 IBM Research
Longer context in RL boosts generalization but slows down learning; this paper reveals the crucial tradeoff and offers theoretical insights.
Abductive Reasoning in Logical Credal Networks
·1676 words·8 mins·
AI Generated
AI Theory
Optimization
🏢 IBM Research
This paper presents efficient algorithms for abductive reasoning in Logical Credal Networks (LCNs), addressing the MAP and Marginal MAP inference tasks to enable scalable solutions for complex real-world applications.
A two-scale Complexity Measure for Deep Learning Models
·1709 words·9 mins·
Machine Learning
Deep Learning
🏢 IBM Research
New 2sED measure effectively bounds deep learning model complexity, correlating well with training error and offering efficient computation, particularly for deep models via a layerwise approach.
A Surprisingly Simple Approach to Generalized Few-Shot Semantic Segmentation
·2178 words·11 mins·
AI Generated
Computer Vision
Image Segmentation
🏢 IBM Research
Simple rule-based base-class mining (BCM) significantly boosts generalized few-shot semantic segmentation (GFSS) performance, surpassing complex existing methods.