Transformers for Natural Language Processing and Computer Vision: Take Generative AI and LLMs to the next level with Hugging Face, Google Vertex AI, ChatGPT, GPT-4V, and DALL-E 3 3rd Edition
Last updated: October 30, 2024
This repo is continually updated and upgraded.
📝 For details on updates and improvements, see the Changelog.
🚩If you see anything that doesn't run as expected, raise an issue, and we'll work on it!
Look for 🐬 to explore new bonus notebooks, such as OpenAI's o1 reasoning models, Midjourney's API, Google Vertex AI Gemini's API, and OpenAI asynchronous batch API calls!
Look for 🎏 to explore existing notebooks for the latest model or platform releases, such as OpenAI's latest GPT-4o and GPT-4o-mini models.
Look for 🛠 to run existing notebooks updated for new dependency versions and platform API constraints.
This is the code repository for Transformers for Natural Language Processing and Computer Vision, published by Packt.
Explore Generative AI and Large Language Models with Hugging Face, ChatGPT, GPT-4V, and DALL-E 3
Transformers for Natural Language Processing and Computer Vision, Third Edition, explores Large Language Model (LLM) architectures, applications, and various platforms (Hugging Face, OpenAI, and Google Vertex AI) used for Natural Language Processing (NLP) and Computer Vision (CV).
Dive into generative vision transformers and multimodal model architectures and build applications, such as image and video-to-text classifiers. Go further by combining different models and platforms and learning about AI agent replication.
- Learn how to pretrain and fine-tune LLMs
- Learn how to work with multiple platforms, such as Hugging Face, OpenAI, and Google Vertex AI
- Learn about different tokenizers and the best practices for preprocessing language data
- Implement Retrieval Augmented Generation (RAG) and rule bases to mitigate hallucinations
- Visualize transformer model activity for deeper insights using BertViz, LIME, and SHAP
- Create and implement cross-platform chained models, such as HuggingGPT
- Go in-depth into vision transformers with CLIP, DALL-E 2, DALL-E 3, and GPT-4V
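One of the topics above, tokenization, builds on subword merging. As a rough, hypothetical illustration of the byte-pair encoding (BPE) idea (this is a character-level sketch for intuition only, not the book's code or a production tokenizer):

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Return the most common adjacent symbol pair, or None if fewer than two tokens."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get) if pairs else None

def merge_pair(tokens, pair):
    """Merge every occurrence of `pair` into a single symbol."""
    merged, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def bpe(text, num_merges):
    """Apply up to `num_merges` BPE merges, starting from individual characters."""
    tokens = list(text)
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        tokens = merge_pair(tokens, pair)
    return tokens

print(bpe("aaabdaaabac", 2))  # frequent "aa" runs collapse into larger subword units
```

Production tokenizers (covered in the tokenizer chapter) learn merges from a large corpus and operate on bytes, but the merge loop above captures the core mechanism.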
- What Are Transformers?
- Getting Started with the Architecture of the Transformer Model
- Emergent vs Downstream Tasks: The Unseen Depths of Transformers
- Advancements in Translations with Google Trax, Google Translate, and Gemini
- Diving into Fine-Tuning through BERT
- Pretraining a Transformer from Scratch through RoBERTa
- The Generative AI Revolution with ChatGPT
- Fine-Tuning OpenAI GPT Models
- Shattering the Black Box with Interpretable Tools
- Investigating the Role of Tokenizers in Shaping Transformer Models
- Leveraging LLM Embeddings as an Alternative to Fine-Tuning
- Toward Syntax-Free Semantic Role Labeling with ChatGPT and GPT-4
- Summarization with T5 and ChatGPT
- Exploring Cutting-Edge LLMs with Vertex AI and PaLM 2
- Guarding the Giants: Mitigating Risks in Large Language Models
- Beyond Text: Vision Transformers in the Dawn of Revolutionary AI
- Transcending the Image-Text Boundary with Stable Diffusion
- Hugging Face AutoTrain: Training Vision Models without Coding
- On the Road to Functional AGI with HuggingGPT and its Peers
- Beyond Human-Designed Prompts with Generative Ideation
Appendix: Answers to the Questions
You can run the notebooks directly from the table below:
Chat with my custom GPT-4 bot for this repository.
You can ask questions about this repository. You can also copy code from the notebooks into the chat and ask for explanations.
This is a cutting-edge input-augmented chatbot built on OpenAI for this GitHub repository. OpenAI requires a ChatGPT Plus subscription to access it.
Limitations: This is an experimental chatbot. It is dedicated to this GitHub repository and does not replace the explanations provided in the book, but you can certainly have some interesting educational interactions with my GPT-4 chatbot.
If you encounter a problem in the notebooks, you can create an issue in this repository. We will be glad to provide support!
If you feel this book is for you, get your copy today!
Join the community for the latest updates and discussions on the Discord server at Discord
If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost. Simply click on the link to claim your Free PDF
We also provide a PDF file with color images of the screenshots and diagrams used in this book at ColorImages
Denis Rothman graduated from Sorbonne University and Paris-Cité University, designing one of the first patented encoding and embedding systems and teaching at Paris-I Panthéon Sorbonne. He authored one of the first patented word-encoding AI bots/robots. He began his career delivering a Natural Language Processing (NLP) chatbot for Moët et Chandon (LVMH) and an AI tactical defense optimizer for Airbus (formerly Aerospatiale). Denis then authored an AI optimizer for IBM and luxury brands, leading to an Advanced Planning and Scheduling (APS) solution used worldwide. LinkedIn