Definition
Mechanistic interpretability is a subfield of explainable artificial intelligence research that aims to understand the internal workings of neural networks by analyzing the mechanisms underlying their computations. The approach treats a neural network much like a compiled binary program, which can be reverse-engineered to work out what it does and how it does it.
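A toy illustration may help make the reverse-engineering analogy concrete. The sketch below assumes a tiny two-layer ReLU network whose weights were chosen by hand to compute XOR. The weights, the function names `relu` and `forward`, and the `ablate_h2` flag are all hypothetical and do not come from any trained model or published method. The code records each hidden unit's activation so that its role can be read off. It then uses one simple probe, ablation (zeroing a unit), to check whether that unit actually causes the behavior attributed to it.

```python
# Toy illustration (hypothetical): a hand-weighted two-layer ReLU network that
# computes XOR, inspected by reading its hidden activations and ablating a unit.

def relu(z: float) -> float:
    return max(0.0, z)

def forward(x1: float, x2: float, ablate_h2: bool = False) -> tuple[float, dict[str, float]]:
    """Run the network; return its output together with the hidden activations."""
    h1 = relu(x1 + x2)        # hidden unit 1: counts how many inputs are on
    h2 = relu(x1 + x2 - 1.0)  # hidden unit 2: fires only when both inputs are on
    if ablate_h2:
        h2 = 0.0              # ablation: knock out unit 2 to test its causal role
    out = h1 - 2.0 * h2       # subtracting 2*h2 cancels the count when both are on
    return out, {"h1": h1, "h2": h2}

inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
for x1, x2 in inputs:
    out, acts = forward(x1, x2)
    ablated, _ = forward(x1, x2, ablate_h2=True)
    print(f"x=({x1:.0f},{x2:.0f})  h1={acts['h1']:.0f}  h2={acts['h2']:.0f}  "
          f"out={out:.0f}  out_without_h2={ablated:.0f}")
```

Reading the activations suggests that `h2` acts as an "both inputs on" detector. Ablating it changes the output only for the input (1, 1), which goes from 0 to 2. That result supports the interpretation. Real networks are far larger and their units are rarely this clean, so this example illustrates the idea rather than a practical workflow.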
Related concepts
AAAI Conference on Artificial Intelligence; ACM Computing Surveys; AI safety; Active learning (machine learning); AlexNet; Anomaly detection; Anthropic; Apprenticeship learning; Artificial neural network; Association for Computational Linguistics; Association rule learning; Atlanta; Autoencoder; Automated machine learning; BIRCH; Batch learning; Bayesian network; Bias–variance tradeoff; Binary file; Boltzmann machine; Boosting (machine learning); Bootstrap aggregating; CURE algorithm; Canonical correlation; Chris Olah; Christopher Potts; Circuit (neural network); Cluster analysis; Coefficient of determination; Computational learning theory; Conditional random field; Conference on Neural Information Processing Systems; Confusion matrix; Convolutional neural network; Crowdsourcing; Curriculum learning; DBSCAN; Data cleaning; Data mining; Decision tree learning; DeepDream; Deep learning; Density estimation; Diffusion model; Dimensionality reduction; Distill (journal); ECML PKDD; Echo state network; Electrochemical RAM; Empirical risk minimization; Ensemble learning; Expectation–maximization algorithm; Explainable artificial intelligence; Factor analysis; Feature engineering; Feature learning; Feedforward neural network; Fuzzy clustering; Gated recurrent unit; Generative adversarial network; Generative model; Glossary of artificial intelligence; Grammar induction; Graphical model; Hidden Markov model; Hierarchical clustering; Human-in-the-loop; Inception (deep learning architecture); Independent component analysis; International Conference on Learning Representations; International Conference on Machine Learning; International Joint Conference on Artificial Intelligence; Isolation forest; Journal of Machine Learning Research; K-means clustering; K-nearest neighbors algorithm; Kernel machines; LeNet; Learning curve (machine learning); Learning to rank; Linear discriminant analysis; Linear regression; List of datasets for machine-learning research; List of datasets in computer vision and image processing; Local outlier factor; Logistic regression; Long short-term memory; Machine Learning (journal); Machine learning; Mamba (deep learning architecture); Mean shift; Memtransistor; Meta-learning (computer science); Multi-agent reinforcement learning; Multimodal learning; Naive Bayes classifier; Neural field; Neural network (machine learning); Neural radiance field; Neuro-symbolic AI; Neuromorphic engineering; Non-negative matrix factorization; North American Chapter of the Association for Computational Linguistics; OPTICS algorithm; Occam learning; Online machine learning; Ontology learning; Outline of machine learning; Perceptron; Physics-informed neural networks; Policy gradient method; Principal component analysis; Probably approximately correct learning; Proper generalized decomposition; Q-learning; Quantum machine learning; Random forest; Random sample consensus; Receiver operating characteristic; Recurrent neural network; Red Hook, New York; Regression analysis; Reinforcement learning; Reinforcement learning from human feedback; Relevance vector machine; Reservoir computing; Restricted Boltzmann machine; Reverse engineering; Rule-based machine learning; Saliency map; Self-organizing map; Self-play (reinforcement learning technique); Self-supervised learning; Semantic analysis (machine learning); Semi-supervised learning; Sparse autoencoder; Sparse dictionary learning; Spiking neural network; State–action–reward–state–action; Statistical classification; Statistical learning theory; Structured prediction; Supervised learning; Support vector machine; T-distributed stochastic neighbor embedding; Temporal difference learning; The Economist; Tomáš Mikolov; Topological deep learning; Transformer (deep learning architecture); U-Net; Unsupervised learning; Vapnik–Chervonenkis theory; Vienna; Vision transformer; Wired (magazine); Word embeddings