
Interpretability

Zipper: Addressing Degeneracy in Algorithm-Agnostic Inference
·1827 words·9 mins
AI Theory Interpretability 🏒 Nankai University
Zipper: A novel statistical device resolves the degeneracy issue in algorithm-agnostic inference, enabling reliable goodness-of-fit tests with enhanced power.
What makes unlearning hard and what to do about it
·5453 words·26 mins
AI Theory Interpretability 🏒 University of Warwick
Researchers developed RUM, a refined unlearning meta-algorithm that significantly improves existing unlearning methods by strategically refining forget sets and employing appropriate unlearning algorithms.
Utilizing Human Behavior Modeling to Manipulate Explanations in AI-Assisted Decision Making: The Good, the Bad, and the Scary
·3625 words·18 mins
AI Generated AI Theory Interpretability 🏒 Purdue University
AI explanations can be subtly manipulated to influence human decisions, highlighting the urgent need for more robust and ethical AI explanation design.
Using Noise to Infer Aspects of Simplicity Without Learning
·2004 words·10 mins
AI Theory Interpretability 🏒 Department of Computer Science, Duke University
Noise in data surprisingly simplifies machine learning models, improving their interpretability without sacrificing accuracy; this paper quantifies this effect across various hypothesis spaces.
Training for Stable Explanation for Free
·2565 words·13 mins
AI Theory Interpretability 🏒 Hong Kong University of Science and Technology
R2ET: training robust ranking explanations with an effective regularizer.
Towards the Dynamics of a DNN Learning Symbolic Interactions
·1849 words·9 mins
AI Theory Interpretability 🏒 Shanghai Jiao Tong University
DNNs learn interactions in two phases: initially removing complex interactions, then gradually learning higher-order ones, leading to overfitting.
The Intelligible and Effective Graph Neural Additive Network
·2248 words·11 mins
AI Theory Interpretability 🏒 Tel Aviv University
GNAN: a novel interpretable graph neural network achieving accuracy comparable to black-box models.
Testing Calibration in Nearly-Linear Time
·1823 words·9 mins
AI Generated AI Theory Interpretability 🏒 Harvard University
This paper presents nearly-linear time algorithms for testing model calibration, improving upon existing methods and providing theoretical lower bounds for various calibration measures.
Stochastic Concept Bottleneck Models
·2532 words·12 mins
AI Generated AI Theory Interpretability 🏒 ETH Zurich
Stochastic Concept Bottleneck Models (SCBMs) revolutionize interpretable ML by efficiently modeling concept dependencies, drastically improving intervention effectiveness and enabling CLIP-based concept models.
Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution
·2842 words·14 mins
AI Theory Interpretability 🏒 Stanford University
Stochastic Amortization accelerates feature and data attribution by training amortized models using noisy, yet unbiased, labels, achieving order-of-magnitude speedups over existing methods.
RegExplainer: Generating Explanations for Graph Neural Networks in Regression Tasks
·2208 words·11 mins
AI Theory Interpretability 🏒 New Jersey Institute of Technology
RegExplainer unveils a novel method for interpreting graph neural networks in regression tasks, bridging the explanation gap by addressing distribution shifts and tackling continuously ordered decision boundaries.
Optimal ablation for interpretability
·3425 words·17 mins
AI Theory Interpretability 🏒 Harvard University
Optimal ablation (OA) improves model interpretability by precisely measuring component importance, outperforming existing methods. OA-based importance shines in circuit discovery, factual recall, and …
One Sample Fits All: Approximating All Probabilistic Values Simultaneously and Efficiently
·1941 words·10 mins
AI Generated AI Theory Interpretability 🏒 National University of Singapore
One-Sample-Fits-All (OFA) framework efficiently approximates all probabilistic values simultaneously, achieving faster convergence rates than existing methods.
On Neural Networks as Infinite Tree-Structured Probabilistic Graphical Models
·2116 words·10 mins
AI Theory Interpretability 🏒 Duke University
DNNs are powerful but lack the clear semantics of PGMs. This paper innovatively constructs infinite tree-structured PGMs that exactly correspond to DNNs, revealing that DNN forward propagation approximates inference in these PGMs.
Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization
·4050 words·20 mins
AI Theory Interpretability 🏒 Queen Mary University of London
Multilinear Mixture of Experts (μMoE) achieves scalable expert specialization in deep neural networks through tensor factorization, enabling efficient fine-tuning and interpretable model editing.
Most Influential Subset Selection: Challenges, Promises, and Beyond
·1721 words·9 mins
AI Theory Interpretability 🏒 University of Illinois Urbana-Champaign
Adaptive greedy algorithms significantly improve the accuracy of identifying the most influential subset of training data, overcoming limitations of existing methods that fail to capture complex interactions.
Model Reconstruction Using Counterfactual Explanations: A Perspective From Polytope Theory
·4015 words·19 mins
AI Generated AI Theory Interpretability 🏒 University of Maryland
Counterfactual Clamping Attack (CCA) improves model reconstruction using counterfactual explanations by leveraging decision boundary proximity, offering theoretical guarantees and enhanced fidelity.
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models
·2097 words·10 mins
Natural Language Processing Interpretability 🏒 MIT
New metrics and p-annealing improve sparse autoencoder training for better language model interpretability.
Measuring Per-Unit Interpretability at Scale Without Humans
·4136 words·20 mins
Computer Vision Interpretability 🏒 Tübingen AI Center
New scalable method measures per-unit interpretability in vision DNNs without human evaluation, revealing anti-correlation between model performance and interpretability.
MambaLRP: Explaining Selective State Space Sequence Models
·3148 words·15 mins
AI Theory Interpretability 🏒 Google DeepMind
MambaLRP enhances explainability of Mamba sequence models by ensuring faithful relevance propagation, achieving state-of-the-art explanation performance, and uncovering model biases.