
Interpretability

Zipper: Addressing Degeneracy in Algorithm-Agnostic Inference
·1827 words·9 mins
AI Theory Interpretability 🏒 Nankai University
Zipper: A novel statistical device resolves the degeneracy issue in algorithm-agnostic inference, enabling reliable goodness-of-fit tests with enhanced power.
What makes unlearning hard and what to do about it
·5453 words·26 mins
AI Theory Interpretability 🏒 University of Warwick
Researchers developed RUM, a refined unlearning meta-algorithm that significantly improves existing unlearning methods by strategically refining forget sets and employing appropriate unlearning algorithms.
Utilizing Human Behavior Modeling to Manipulate Explanations in AI-Assisted Decision Making: The Good, the Bad, and the Scary
·3625 words·18 mins
AI Generated AI Theory Interpretability 🏒 Purdue University
AI explanations can be subtly manipulated to influence human decisions, highlighting the urgent need for more robust and ethical AI explanation design.
Using Noise to Infer Aspects of Simplicity Without Learning
·2004 words·10 mins
AI Theory Interpretability 🏒 Department of Computer Science, Duke University
Noise in data surprisingly simplifies machine learning models, improving their interpretability without sacrificing accuracy; this paper quantifies this effect across various hypothesis spaces.
Training for Stable Explanation for Free
·2565 words·13 mins
AI Theory Interpretability 🏒 Hong Kong University of Science and Technology
R2ET: training robust ranking explanations with an effective regularizer.
Towards the Dynamics of a DNN Learning Symbolic Interactions
·1849 words·9 mins
AI Theory Interpretability 🏒 Shanghai Jiao Tong University
DNNs learn interactions in two phases: initially removing complex interactions, then gradually learning higher-order ones, leading to overfitting.
The Intelligible and Effective Graph Neural Additive Network
·2248 words·11 mins
AI Theory Interpretability 🏒 Tel Aviv University
GNAN: a novel interpretable graph neural network achieving accuracy comparable to black-box models.
Testing Calibration in Nearly-Linear Time
·1823 words·9 mins
AI Generated AI Theory Interpretability 🏒 Harvard University
This paper presents nearly-linear time algorithms for testing model calibration, improving upon existing methods and providing theoretical lower bounds for various calibration measures.
Stochastic Concept Bottleneck Models
·2532 words·12 mins
AI Generated AI Theory Interpretability 🏒 ETH Zurich
Stochastic Concept Bottleneck Models (SCBMs) revolutionize interpretable ML by efficiently modeling concept dependencies, drastically improving intervention effectiveness and enabling CLIP-based concept models.
Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution
·2842 words·14 mins
AI Theory Interpretability 🏒 Stanford University
Stochastic Amortization accelerates feature and data attribution by training amortized models using noisy, yet unbiased, labels, achieving order-of-magnitude speedups over existing methods.
RegExplainer: Generating Explanations for Graph Neural Networks in Regression Tasks
·2208 words·11 mins
AI Theory Interpretability 🏒 New Jersey Institute of Technology
RegExplainer unveils a novel method for interpreting graph neural networks in regression tasks, bridging the explanation gap by addressing distribution shifts and tackling continuously ordered decision boundaries.
Optimal ablation for interpretability
·3425 words·17 mins
AI Theory Interpretability 🏒 Harvard University
Optimal ablation (OA) improves model interpretability by precisely measuring component importance, outperforming existing methods. OA-based importance shines in circuit discovery, factual recall, and …
One Sample Fits All: Approximating All Probabilistic Values Simultaneously and Efficiently
·1941 words·10 mins
AI Generated AI Theory Interpretability 🏒 National University of Singapore
One-Sample-Fits-All (OFA) framework efficiently approximates all probabilistic values simultaneously, achieving faster convergence rates than existing methods.
On Neural Networks as Infinite Tree-Structured Probabilistic Graphical Models
·2116 words·10 mins
AI Theory Interpretability 🏒 Duke University
DNNs are powerful but lack the clear semantics of PGMs. This paper innovatively constructs infinite tree-structured PGMs that exactly correspond to DNNs, revealing that DNN forward propagation approximates inference in these PGMs.
Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization
·4050 words·20 mins
AI Theory Interpretability 🏒 Queen Mary University of London
Multilinear Mixture of Experts (μMoE) achieves scalable expert specialization in deep neural networks through tensor factorization, enabling efficient fine-tuning and interpretable model editing.
Most Influential Subset Selection: Challenges, Promises, and Beyond
·1721 words·9 mins
AI Theory Interpretability 🏒 University of Illinois Urbana-Champaign
Adaptive greedy algorithms significantly improve the accuracy of identifying the most influential subset of training data, overcoming limitations of existing methods that fail to capture complex interactions.
Model Reconstruction Using Counterfactual Explanations: A Perspective From Polytope Theory
·4015 words·19 mins
AI Generated AI Theory Interpretability 🏒 University of Maryland
Counterfactual Clamping Attack (CCA) improves model reconstruction using counterfactual explanations by leveraging decision boundary proximity, offering theoretical guarantees and enhanced fidelity.
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models
·2097 words·10 mins
Natural Language Processing Interpretability 🏒 MIT
New metrics and p-annealing improve sparse autoencoder training for better language model interpretability.
Measuring Per-Unit Interpretability at Scale Without Humans
·4136 words·20 mins
Computer Vision Interpretability 🏒 Tübingen AI Center
New scalable method measures per-unit interpretability in vision DNNs without human evaluation, revealing anti-correlation between model performance and interpretability.
MambaLRP: Explaining Selective State Space Sequence Models
·3148 words·15 mins
AI Theory Interpretability 🏒 Google DeepMind
MambaLRP enhances explainability of Mamba sequence models by ensuring faithful relevance propagation, achieving state-of-the-art explanation performance, and uncovering model biases.