Definition
In deep learning, the transformer is a family of artificial neural network architectures based on the multi-head attention mechanism. Text is first split into tokens, which are mapped to integer IDs, and each token is converted into a vector by looking it up in a word embedding table. At each layer, each token is then contextualized against the other (unmasked) tokens within the context window through a parallel multi-head attention mechanism, which amplifies the signal from important tokens and diminishes the signal from less important ones.
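The pipeline described above (tokenization, embedding lookup, and masked multi-head attention) can be sketched in plain Python. This is a minimal, non-authoritative illustration under several assumptions. The four-word vocabulary, the whitespace tokenizer, and the helper names (tokenize, attention_head, multi_head_attention) are hypothetical. The weights are random stand-ins for learned parameters. Each head uses the scaled dot-product form softmax(Q K^T / sqrt(d_k)) V from the original transformer paper, and a causal mask serves as one example of masking. Layer normalization, residual connections, positional encodings, and the feedforward sublayer are omitted.

```python
import math
import random

# Hypothetical toy vocabulary. Real transformers use learned subword
# tokenizers (for example byte pair encoding) with far larger vocabularies.
VOCAB = {"the": 0, "cat": 1, "sat": 2, "<unk>": 3}


def tokenize(text):
    """Toy word-level tokenizer: map each word to its integer token ID."""
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in text.lower().split()]


def random_matrix(rows, cols, rng):
    """Random stand-in for a learned weight matrix (rows x cols)."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix, both as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(scores):
    """Numerically stable softmax; entries of -inf receive zero weight."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(x, wq, wk, wv, causal=True):
    """One head of scaled dot-product attention over the token vectors x."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(q[0])
    outputs, weights = [], []
    for i in range(len(x)):
        # Score token i against every token j; masked (future) positions get -inf.
        scores = [
            float("-inf") if causal and j > i
            else sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d_k)
            for j in range(len(x))
        ]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of value vectors: high-weight tokens dominate the result.
        outputs.append([sum(w[j] * v[j][c] for j in range(len(x))) for c in range(len(v[0]))])
    return outputs, weights


def multi_head_attention(x, num_heads, rng, causal=True):
    """Run num_heads independent heads, concatenate, and project back to d_model."""
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    head_outputs, head_weights = [], []
    for _ in range(num_heads):
        wq, wk, wv = (random_matrix(d_model, d_head, rng) for _ in range(3))
        out, w = attention_head(x, wq, wk, wv, causal)
        head_outputs.append(out)
        head_weights.append(w)
    # Concatenate the per-head vectors for each token position.
    concat = [sum((h[i] for h in head_outputs), []) for i in range(len(x))]
    wo = random_matrix(d_model, d_model, rng)
    return matmul(concat, wo), head_weights


if __name__ == "__main__":
    rng = random.Random(0)
    ids = tokenize("The cat sat")              # [0, 1, 2]
    table = random_matrix(len(VOCAB), 8, rng)   # embedding table: one row per token ID
    x = [table[i] for i in ids]                 # embedding lookup
    out, weights = multi_head_attention(x, num_heads=2, rng=rng)
    print(ids)                                  # [0, 1, 2]
    print(len(out), len(out[0]))                # 3 8
    print(weights[0][0])                        # [1.0, 0.0, 0.0]
```

Under the causal mask, the first token can attend only to itself, so its weights in every head are [1.0, 0.0, 0.0]. Passing causal=False lets every token attend to every other token in the window, as in an encoder.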
Related concepts
15.ai
AAAI Conference on Artificial Intelligence
AI21 Labs
AI accelerator
AI agent
AI alignment
AI anthropomorphism
AI boom
AI bubble
AI data center
AI effect
AI literacy
AI nationalism
AI safety
AI slop
AI takeover
AI veganism
AI winter
Action selection
Activation function
Active learning (machine learning)
Adobe Firefly
Adversarial machine learning
Agent2Agent
Aidan Gomez
Alan Turing
Alec Radford
AlexNet
Alex Graves (computer scientist)
Alex Krizhevsky
Alibaba Group
Allen Newell
AlphaDev
AlphaFold
AlphaGenome
AlphaGeometry
AlphaGo
AlphaGo (film)
AlphaGo Zero
AlphaGo versus Fan Hui
AlphaGo versus Ke Jie
AlphaGo versus Lee Sedol
AlphaStar (software)
AlphaZero
Andrej Karpathy
Andrew Ng
Anomaly detection
Anthropic
Applications of artificial intelligence
Apprenticeship learning
Arthur Mensch
Artificial Intelligence Act
Artificial Intelligence Cold War
Artificial general intelligence
Artificial human companion
Artificial intelligence
Artificial intelligence and copyright
Artificial intelligence and elections
Artificial intelligence arms race
Artificial intelligence content detection
Artificial intelligence in architecture
Artificial intelligence in education
Artificial intelligence in fiction
Artificial intelligence in healthcare
Artificial intelligence in mental health
Artificial intelligence in video games
Artificial intelligence visual art
Artificial neural network
Artificial superintelligence
Ashish Vaswani
Association rule learning
Attention (machine learning)
Attention Is All You Need
Audio signal processing
Aurora (text-to-image model)
AutoGPT
Autoencoder
Automated machine learning
Automated reasoning
Automated theorem proving
Automatic summarization
Autoregressive model
Autoregressive models
BERT (language model)
BF16
BIRCH
BLOOM (language model)
Backpropagation
Bag-of-words model
Baidu
Batch learning
Batch normalization
Bayesian network
Bernard Widrow
Bias–variance tradeoff
Blackwell (microarchitecture)
Block matrix
Boltzmann machine
Boosting (machine learning)
Bootstrap aggregating
Byte pair encoding
CUDA
CURE algorithm
CVPR
Cache (computing)
Canonical correlation
Chain-of-thought prompting
ChatGPT
Chatbot
Chatbot psychosis
Chinchilla (language model)
Christopher D. Manning
ChromaDB
Circuit (neural network)
Claude (language model)
Claude Shannon
Cliff Shaw
Cluster analysis
Code generation
Coefficient of determination
Cohere
Common Crawl
Comparison of deep learning software
Competition in artificial intelligence
Complex number
Computational learning theory
Computational linguistics
Computer chess
Computer programming
Computer vision
Conditional random field
Conference on Neural Information Processing Systems
Confusion matrix
Conjugate gradient method
Constitutional AI
Context window
Contextualization (computer science)
Convolution
Convolutional neural network
CrewAI
Crowdsourcing
Curriculum learning
DALL-E
DBRX
DBSCAN
Daniel Kokotajlo (researcher)
Dario Amodei
Data augmentation
Data cleaning
Data mining
Data set
David Silver (computer scientist)
Decision tree learning
DeepDream
DeepSeek
DeepSeek (chatbot)
Deep learning
Deep learning speech synthesis
Demis Hassabis
Density estimation
Differentiable neural computer
Diffusion model
Diffusion process
Dimensionality reduction
Dot-product attention
Dot product
Double descent
DreamBooth
Dream Machine (text-to-video model)
ECML PKDD
ELMo
Echo state network
EfficientNet
Electrochemical RAM
EleutherAI
ElevenLabs
Elman network
Elo rating system
Empirical risk minimization
Ensemble learning
Environmental impact of artificial intelligence
Ernie Bot
Ethics of artificial intelligence
Evaluation function
Existential risk from artificial intelligence
Expectation–maximization algorithm
Explainable artificial intelligence
FP16
Facial recognition system
Factor analysis
Fast weight
Feature engineering
Feature learning
Feedforward neural network
Fei-Fei Li
Fine-tuning (deep learning)
Fine-tuning (machine learning)
Floating-point arithmetic
Flux (text-to-image model)
Foundation model
Framework (computer science)
Frank Rosenblatt
François Chollet
Future of Go Summit
Fuzzy clustering
GPT-1
GPT-2
GPT-3
GPT-4
GPT-4.1
GPT-4.5
GPT-4o
GPT-5
GPT-5.5
GPT-J
GPTZero
GPT Image
Gated Linear Unit
Gated recurrent unit
Gated recurrent units
Gating mechanism
Gato (DeepMind)
Gemini (chatbot)
Gemini (language model)
Gemini Robotics
Gemma (language model)
Generative AI
Generative adversarial network
Generative artificial intelligence
Generative engine optimization
Generative model
Generative pre-trained transformer
Genie (AI model)
Genie (world model)
Geoffrey Hinton
Glitch token
GloVe
Glossary of artificial intelligence
Google
Google AI
Google Antigravity
Google Assistant
Google Brain
Google DeepMind
Google Gemini
Google Labs
Google Neural Machine Translation
Google Pixel
Google Translate
Google Vids
Google Workspace
Gradient descent
Grammar induction
Gram–Schmidt process
Grandmaster (chess)
Graph neural network
Graphical model
Graphics processing unit
Grok (chatbot)
Hallucination (artificial intelligence)
Handwriting recognition
Hans Uszkoreit
Herbert A. Simon
Hidden Markov model
Hierarchical clustering
High Bandwidth Memory
Higher-order neural network
Highway network
History of artificial intelligence
Huawei PanGu
Hugging Face
Human-in-the-loop
Human image synthesis
Humanity's Last Exam
Hyperparameter (machine learning)
Hyperscale computing
IBM Granite
IBM Watson
IBM Watsonx
Ian Goodfellow
Ideogram (text-to-image model)
Ilya Sutskever
Imagen (text-to-image model)
Imitation learning
In-context learning
Inception (deep learning architecture)
Independent component analysis
Inductive bias
Inference (machine learning)
Inference engine
Instruction tuning
Integer
Intelligent agent
International Conference on Learning Representations
International Conference on Machine Learning
International Joint Conference on Artificial Intelligence
Isolation forest
Jais (language model)
James Goodnight
Jan Leike
John Hopfield
John McCarthy (computer scientist)
John Schulman
John von Neumann
Joseph Weizenbaum
Journal of Machine Learning Research
Juergen Schmidhuber
Jürgen Schmidhuber
K-means clustering
K-nearest neighbors algorithm
Kernel machines
Kling AI
Knowledge distillation
Kunihiko Fukushima
LLM-as-a-Judge
LLMs in higher education
LMArena
LSTMs
LaMDA
LangChain
Language model
Language model benchmark
Language modeling
Large language model
Latent diffusion model
Layer normalization
LeNet
Learning curve (machine learning)
Learning rate
Learning to rank
Lethal autonomous weapon
Lexical analysis
Linear discriminant analysis
Linear regression
List of artificial intelligence companies
List of artificial intelligence projects
List of chatbots
List of datasets for machine-learning research
List of datasets in computer vision and image processing
List of large language models
Lists of open-source artificial intelligence software
Llama.cpp
Llama (language model)
Local outlier factor
Locality-sensitive hashing
Logistic regression
Long short-term memory
Lookup table
Loss function
Loss functions for classification
Lotfi A. Zadeh
Low-rank approximation
MMLU
Machine Learning (journal)
Machine learning
Machine translation
Mamba (deep learning architecture)
Man bites dog
Manus (AI agent)
Marvin Minsky
Master (software)
Mean shift
Mechanistic interpretability
Memory paging
Memtransistor
Meta-learning (computer science)
Meta AI
Microsoft AI
Microsoft Copilot
Midjourney
Minerva (model)
Ming-Hsuan Yang
MiniMax (company)
Minimax
Mira Murati
Mistral AI
Mixture of experts
MobileNet
Model Context Protocol
Model compression
MuZero
Multi-agent reinforcement learning
Multilayer perceptron
Multimodal learning
Music and artificial intelligence
Mustafa Suleyman
Naive Bayes classifier
Named-entity recognition
Nano Banana
Nathaniel Rochester (computer scientist)
Natural-language understanding
Natural language
Natural language generation
Natural language processing
Nemotron
Network throughput
Neural Turing machine
Neural field
Neural machine translation
Neural network (machine learning)
Neural radiance field
Neural scaling law
Neuro-symbolic AI
Neuromorphic engineering
Noam Shazeer
Non-negative matrix factorization
Normalization (machine learning)
NotebookLM
Nvidia
Nvidia A100
Nvidia H100
OPTICS algorithm
Oasis (Minecraft clone)
Object-oriented programming
Object hierarchy
Occam learning
Oliver Selfridge
One-hot
Online machine learning
Ontology learning
OpenAI
OpenAI Codex (AI agent)
OpenAI Codex (language model)
OpenAI Five
OpenAI o1
OpenAI o3
OpenAI o4-mini
OpenVINO
Open Neural Network Exchange
Optical character recognition
Oriol Vinyals
Outline of machine learning
Overfitting
PaLM
PagedAttention
Parallel computing
Parameter
Paul Werbos
Perceiver
Perceptron
Percy Liang
Permutation matrix
Perplexity
Perplexity AI
Phi (language model)
Physics-informed neural networks
Pipeline (Unix)
Policy gradient method
Precautionary principle
Principal component analysis
Probably approximately correct learning
Project Debater
Projection matrix
Prompt engineering
Prompt injection
Proper generalized decomposition
Protein structure prediction
PyTorch
Q-learning
Quadratic function
Quantum Artificial Intelligence Lab
Quantum machine learning
Quasi-Newton method
Question answering
Quoc V. Le
Qwen
RMSNorm
Radial basis function kernel
Random forest
Random sample consensus
ReLU
Reasoning model
Receiver operating characteristic
Recraft
Rectifier (neural networks)
Recurrent neural network
Recursive self-improvement
Reflection (artificial intelligence)
Regression analysis
Regularization (mathematics)
Regulation of artificial intelligence
Regulation of artificial intelligence in the United States
Reinforcement learning
Reinforcement learning from human feedback
Relevance vector machine
Reservoir computing
Residual neural network
Restricted Boltzmann machine
Retrieval-augmented generation
Riffusion
Robot Constitution
Robot control
Robotics
Rule-based machine learning
Runway (company)
SGLang
Sam Altman
Seedance 2.0
Self-driving car
Self-organizing map
Self-play (reinforcement learning technique)
Self-supervised learning
Semantic analysis (machine learning)
Semi-supervised learning
Sentiment analysis
Sepp Hochreiter
Seppo Linnainmaa
Seq2seq
Sequence analysis
Seymour Papert
Shun'ichi Amari
Sigmoid function
Small-world network
Small language model
Softmax function
Sora (text-to-video model)
Sparrow (chatbot)
Sparse dictionary learning
Spectrogram
Speculative decoding
Speculative execution
Speech-to-text
Speech recognition
Spiking neural network
Stable Diffusion
State of the art
State–action–reward–state–action
Statistical classification
Statistical learning theory
Statistical machine translation
Stephen Grossberg
Stochastic gradient descent
Stochastic parrot
Structured prediction
Suno (platform)
Supervised learning
Support vector machine
Symbolic artificial intelligence
Synthetic data
T-distributed stochastic neighbor embedding
T5 (language model)
Takeo Kanade
Technology Innovation Institute
Temporal difference learning
TensorFlow
TensorRT
Tensor Processing Unit
Text-to-image model
Text-to-video model
Text Summaries
Text corpus
Text summarization
Textual entailment
The MANIAC
The Pile (dataset)
Time series
Timeline of artificial intelligence
Timeline of machine learning
Toeplitz matrix
Tokenization
Tokenization (lexical analysis)
Topological deep learning
Training, validation, and test data sets
Transfer learning
Transformer (deep learning architecture)
U-Net
Udio
Uncanny valley
Undetectable.ai
Unsupervised learning
VLLM
Vanishing-gradient problem
Vapnik–Chervonenkis theory
Variational autoencoder
Vector database
Veo (text-to-video model)
Vibe coding
Vicuna LLM
VideoPoet
Virtual assistant
Virtual politician
Vision transformer
Walter Pitts
Warren Sturgis McCulloch
WaveNet
Weak artificial intelligence
Web scraping
Weight initialization
Whisper (speech recognition system)
Word2vec
Word embedding
Workplace impact of artificial intelligence
World model (artificial intelligence)
XAI (company)
XLNet
Xiaomi MiMo
Yann LeCun
Yoshua Bengio
You.com