Transformer Decoder-Only Architecture

Overview

This project implements a transformer decoder-only architecture, closely following the "Attention is All You Need" paper by Vaswani et al. It is built with the PyTorch library and is inspired by the "Zero to Hero" series by Andrej Karpathy. The primary goal is to provide a foundational understanding of transformers and to serve as a basis for applying neural data to transformers and for mechanistic interpretability work.

A second implementation now supersedes the first, incorporating what I learned from the Arena course. It will serve as an NLP test bed for the second stage of my learning. Next, I will move on to a neural implementation of transformers.

Features

  • Transformer Decoder-Only Architecture: Implements a transformer model focusing solely on the decoder.
  • Multi-Head Attention: Multiple heads of self-attention in parallel for capturing different aspects of the input.
  • Feed-Forward Neural Networks: A position-wise MLP applied to each token, adding non-linear processing after attention.
  • Layer Normalization: Applied before attention and feed-forward layers for stability.
  • Dropout Regularization: Used to prevent overfitting.
  • Text Generation: Capability to generate new tokens based on a given context.
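
To illustrate how these pieces fit together, here is a minimal sketch of a pre-LayerNorm decoder block with causal multi-head self-attention, a feed-forward MLP, dropout, and residual connections. The class name, dimensions, and use of PyTorch's built-in nn.MultiheadAttention are assumptions for illustration only; the repository's own modules (which, following Karpathy's series, likely build the attention heads from scratch) remain the source of truth.

```python
import torch
import torch.nn as nn


class DecoderBlock(nn.Module):
    """Illustrative pre-LayerNorm decoder block: self-attention + feed-forward MLP."""

    def __init__(self, d_model: int = 128, n_heads: int = 4, dropout: float = 0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: True entries are blocked, so each position only attends to earlier ones.
        seq_len = x.size(1)
        mask = torch.triu(torch.ones(seq_len, seq_len, device=x.device), diagonal=1).bool()
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out                  # residual connection around attention
        x = x + self.ff(self.ln2(x))      # residual connection around the MLP
        return x


# Example: push a batch of 2 sequences of length 10 through one block.
block = DecoderBlock()
out = block(torch.randn(2, 10, 128))
print(out.shape)  # torch.Size([2, 10, 128])
```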

Installation

  1. Clone the repository:
    git clone https://github.com/your-username/transformer-decoder.git
    cd transformer-decoder

Bigram note - As recommended by Anthropic

There is a separate script which removes the attention layers and MLPs to focus solely on the bigram statistics. As stated in "A Mathematical Framework for Transformer Circuits", this is a good first step towards pulling apart transformers.
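
The script itself is not reproduced here; as a rough guide, a bigram-only model of this kind typically reduces to a single embedding table that maps each token directly to next-token logits, with no attention or MLP in between. The sketch below shows that idea in PyTorch; the class name, vocabulary size, and generation loop are illustrative assumptions rather than the repository's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BigramLanguageModel(nn.Module):
    """Bigram-only model: each token directly predicts logits for the next token."""

    def __init__(self, vocab_size: int):
        super().__init__()
        # The embedding table is the whole model: row i holds the next-token
        # logits for token i, i.e. pure bigram statistics once trained.
        self.token_table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        return self.token_table(idx)  # (batch, seq_len, vocab_size) logits

    @torch.no_grad()
    def generate(self, idx: torch.Tensor, max_new_tokens: int) -> torch.Tensor:
        for _ in range(max_new_tokens):
            logits = self(idx)[:, -1, :]               # only the last token matters
            probs = F.softmax(logits, dim=-1)
            next_token = torch.multinomial(probs, num_samples=1)
            idx = torch.cat([idx, next_token], dim=1)  # append the sampled token
        return idx


# Example with an assumed 65-character vocabulary, starting from a single token.
model = BigramLanguageModel(vocab_size=65)
context = torch.zeros((1, 1), dtype=torch.long)
print(model.generate(context, max_new_tokens=20))
```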
