Welcome to the Descriptive Captioning module.
Descriptive Captioning generates sentences that describe not only the objects in an image but also their attributes and actions (adjectives and adverbs). Our technique starts from BLIP (Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation). The main improvement over the original BLIP is in the preprocessing stage, in particular preserving pixel and spatial information.
We also release pretrained models trained on the MS COCO train set. There are two kinds of trained models: one trained from scratch (an empty model) and one fine-tuned from the BLIP pretrained weights. Although the model expects 384×384 input images, users do not have to resize images themselves; preprocessing handles it, as sketched below.
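For illustration, here is a minimal inference sketch showing how a preprocessing pipeline can resize arbitrary inputs to 384×384 before captioning, so callers never resize manually. The `model.generate` call is a BLIP-style assumption about this repo's API, and the normalization constants are the ones from BLIP's released code; neither is a confirmed detail of this module.

```python
# Minimal inference sketch. model.generate() is a hypothetical,
# BLIP-style name standing in for this repo's actual API.
from PIL import Image
import torch
from torchvision import transforms

IMAGE_SIZE = 384  # the model's expected input size

# Resizes any input image to 384x384, so users never resize themselves.
# Mean/std values are taken from BLIP's released code (an assumption here).
preprocess = transforms.Compose([
    transforms.Resize((IMAGE_SIZE, IMAGE_SIZE),
                      interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize((0.48145466, 0.4578275, 0.40821073),
                         (0.26862954, 0.26130258, 0.27577711)),
])

def caption_image(model, image_path: str) -> str:
    image = Image.open(image_path).convert("RGB")
    batch = preprocess(image).unsqueeze(0)  # shape: (1, 3, 384, 384)
    with torch.no_grad():
        captions = model.generate(batch)  # hypothetical generate() call
    return captions[0]
```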
Run our interactive demo in a Colab notebook (no GPU needed). The demo includes inference code for the pretrained checkpoints listed below:
| Dataset | Trained Model | Image Size |
|---|---|---|
| MS COCO 2017 (trained) | Download | 384 |
| CC3M (Conceptual Captions) | Download | 384 |
| Visual Genome | Download | 384 |
| SBU Captions | Download | 384 |
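As a usage example, loading one of the checkpoints above might look like the sketch below. It assumes this repo keeps BLIP's `blip_decoder` entry point, and the checkpoint path is a hypothetical local filename, not a confirmed detail of this release.

```python
# Loading a downloaded checkpoint. blip_decoder is BLIP's constructor;
# whether this repo exposes the same entry point is an assumption.
from models.blip import blip_decoder

model = blip_decoder(
    pretrained="checkpoints/coco_trained.pth",  # hypothetical local path
    image_size=384,  # matches the Image Size column above
    vit="base",
)
model.eval()

# Reuses caption_image() from the sketch earlier in this README.
print(caption_image(model, "example.jpg"))
```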