Mechanistic interpretability

Definition

Mechanistic interpretability is a subfield of explainable artificial intelligence research that aims to understand the internal workings of neural networks by identifying the mechanisms that carry out their computations. The approach studies a neural network in much the same way that a compiled binary program can be reverse-engineered: it works backward from the network's internal parameters and activations to recover the functions they implement.
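As a toy illustration of the reverse-engineering analogy, the sketch below hand-sets the weights of a tiny threshold network that computes XOR. It then probes each hidden neuron on every possible input and matches the neuron's behaviour against candidate Boolean functions. The weights, the helper names (`forward`, `truth_table`, `label_hidden_units`), and the truth-table matching approach are all hypothetical choices made for this sketch. Real mechanistic interpretability work studies trained networks with many more neurons, where units rarely map so cleanly onto named functions.

```python
# Toy illustration: reverse-engineering a tiny hand-weighted XOR network.
# All weights and helper names here are hypothetical, chosen for this sketch.

def step(x):
    """Threshold activation: 1 if the pre-activation is positive, else 0."""
    return 1 if x > 0 else 0

# Hidden layer: each tuple is (weight_a, weight_b, bias) for one hidden neuron.
HIDDEN = [
    (1.0, 1.0, -0.5),  # intended to fire when a OR b
    (1.0, 1.0, -1.5),  # intended to fire when a AND b
]
# Output neuron: (weight_hidden0, weight_hidden1, bias).
OUTPUT = (1.0, -1.0, -0.5)

def forward(a, b):
    """Run the network on inputs (a, b); return (hidden activations, output)."""
    hidden = [step(wa * a + wb * b + bias) for wa, wb, bias in HIDDEN]
    out = step(OUTPUT[0] * hidden[0] + OUTPUT[1] * hidden[1] + OUTPUT[2])
    return hidden, out

def truth_table(fn):
    """Evaluate a two-input function on all four Boolean inputs."""
    return {(a, b): fn(a, b) for a in (0, 1) for b in (0, 1)}

def label_hidden_units():
    """Match each hidden neuron's truth table against candidate Boolean functions."""
    candidates = {"OR": lambda a, b: a | b, "AND": lambda a, b: a & b}
    labels = []
    for i in range(len(HIDDEN)):
        # The probe is evaluated immediately, so it uses the current value of i.
        unit = truth_table(lambda a, b: forward(a, b)[0][i])
        matches = [name for name, f in candidates.items() if truth_table(f) == unit]
        labels.append(matches[0] if matches else "unknown")
    return labels

if __name__ == "__main__":
    print(label_hidden_units())  # ['OR', 'AND']
    print(truth_table(lambda a, b: forward(a, b)[1]))
```

Once the hidden neurons are identified as OR and AND, the output neuron's weights (+1 on OR, -1 on AND, negative bias) read as "OR and not AND", which explains why the network as a whole computes XOR.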

Related concepts

AAAI Conference on Artificial Intelligence; ACM Computing Surveys; AI safety; Active learning (machine learning); AlexNet; Anomaly detection; Anthropic; Apprenticeship learning; Artificial neural network; Association for Computational Linguistics; Association rule learning; Atlanta; Autoencoder; Automated machine learning; BIRCH; Batch learning; Bayesian network; Bias–variance tradeoff; Binary file; Boltzmann machine; Boosting (machine learning); Bootstrap aggregating; CURE algorithm; Canonical correlation; Chris Olah; Christopher Potts; Circuit (neural network); Cluster analysis; Coefficient of determination; Computational learning theory; Conditional random field; Conference on Neural Information Processing Systems; Confusion matrix; Convolutional neural network; Crowdsourcing; Curriculum learning; DBSCAN; Data cleaning; Data mining; Decision tree learning; DeepDream; Deep learning; Density estimation; Diffusion model; Dimensionality reduction; Distill (journal); ECML PKDD; Echo state network; Electrochemical RAM; Empirical risk minimization; Ensemble learning; Expectation–maximization algorithm; Explainable artificial intelligence; Factor analysis; Feature engineering; Feature learning; Feedforward neural network; Fuzzy clustering; Gated recurrent unit; Generative adversarial network; Generative model; Glossary of artificial intelligence; Grammar induction; Graphical model; Hidden Markov model; Hierarchical clustering; Human-in-the-loop; Inception (deep learning architecture); Independent component analysis; International Conference on Learning Representations; International Conference on Machine Learning; International Joint Conference on Artificial Intelligence; Isolation forest; Journal of Machine Learning Research; K-means clustering; K-nearest neighbors algorithm; Kernel machines; LeNet; Learning curve (machine learning); Learning to rank; Linear discriminant analysis; Linear regression; List of datasets for machine-learning research; List of datasets in computer vision and image processing; Local outlier factor; Logistic regression; Long short-term memory; Machine Learning (journal); Machine learning; Mamba (deep learning architecture); Mean shift; Memtransistor; Meta-learning (computer science); Multi-agent reinforcement learning; Multimodal learning; Naive Bayes classifier; Neural field; Neural network (machine learning); Neural radiance field; Neuro-symbolic AI; Neuromorphic engineering; Non-negative matrix factorization; North American Chapter of the Association for Computational Linguistics; OPTICS algorithm; Occam learning; Online machine learning; Ontology learning; Outline of machine learning; Perceptron; Physics-informed neural networks; Policy gradient method; Principal component analysis; Probably approximately correct learning; Proper generalized decomposition; Q-learning; Quantum machine learning; Random forest; Random sample consensus; Receiver operating characteristic; Recurrent neural network; Red Hook, New York; Regression analysis; Reinforcement learning; Reinforcement learning from human feedback; Relevance vector machine; Reservoir computing; Restricted Boltzmann machine; Reverse engineering; Rule-based machine learning; Saliency map; Self-organizing map; Self-play (reinforcement learning technique); Self-supervised learning; Semantic analysis (machine learning); Semi-supervised learning; Sparse autoencoder; Sparse dictionary learning; Spiking neural network; State–action–reward–state–action; Statistical classification; Statistical learning theory; Structured prediction; Supervised learning; Support vector machine; T-distributed stochastic neighbor embedding; Temporal difference learning; The Economist; Tomáš Mikolov; Topological deep learning; Transformer (deep learning architecture); U-Net; Unsupervised learning; Vapnik–Chervonenkis theory; Vienna; Vision transformer; Wired (magazine); Word embeddings