Name		Name	Last commit message	Last commit date
parent directory ..
L1		L1
L2		L2
L3		L3
L4		L4
L5		L5
L6		L6
README.md		README.md

README.md

Multimodal RAG: Chat with Videos

Dear learner,

Introducing Multimodal RAG: Chat with Videos, a short course made in collaboration with Intel!

This course, taught by Vasudev Lal, Principal AI Research Scientist, Intel Labs, teaches you to build an interactive system for querying video content using multimodal AI. You'll create a sophisticated question-answering system that processes, understands, and interacts with video.

You'll learn to create a Q&A system that interacts with a collection of videos. You’ll use multimodal transformer models, like the BridgeTower model, to combine visual and textual data into a unified semantic space. You will generate embeddings from text and images and store them in a vector database. Then, you'll build a RAG pipeline to retrieve relevant content and use a Large Vision-Language Model (LVLM) to generate responses.

In this course, you will make API calls to access multimodal models hosted by Prediction Guard on Intel’s cloud.

By the end, you'll have the expertise to create AI systems that can intelligently interact with video content.

Throughout the course, you'll get hands-on and build a complete multimodal RAG system that:

Processes and embeds video content (frames, transcripts, and captions)
Stores multimodal data in a vector database
Retrieves relevant video segments given text queries
Generates contextual responses using LVLMs
Maintains multi-turn conversations about video content

Whether you're looking to enhance content management systems, improve accessibility features, or push the boundaries of human-AI interaction, the techniques learned in this course will provide a solid foundation for innovation in multimodal AI applications.

Details

Create a sophisticated question-answering system that processes, understands, and interacts with complex multimodal data.
Explore the concept of multimodal semantic space and its importance in AI.
Learn the differences between traditional RAG and multimodal RAG systems, focusing on the complexities of integrating different models.

Lesson	Video	Code
Introduction	video
Interactive Demo and Multimodal RAG System Architecture	video	code
Multimodal Embeddings	video	code
Preprocessing Videos for Multimodal RAG	video	code
Multimodal Retrieval from Vector Stores	video	code
Large Vision - Language Models (LVLMs)	video	code
Multimodal RAG with Multimodal Langchain	video	code
Conclusion	video

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MultimodalRAGChatwithVideos

MultimodalRAGChatwithVideos

README.md

Multimodal RAG: Chat with Videos

Details

Files

MultimodalRAGChatwithVideos

Directory actions

More options

Directory actions

More options

Latest commit

History

MultimodalRAGChatwithVideos

Folders and files

parent directory

README.md

Multimodal RAG: Chat with Videos

Details