PyTorchVideo Version 0.1.3

Latest

kalyanvasudev released this 10 Sep 17:02

This release includes several significant new features, bug fixes, and tutorials.
New additions include the following:

Models

Multi-Scale Vision Transformers (MViT), along with its model builders and associated pre-trained models in the model zoo.
MViT is a new state-of-the-art vision transformer model that beats the existing baselines while requiring fewer compute resources.
Audio-Visual SlowFast model, which lets you work with audio and video modalities simultaneously.
Video action detection ResNet model and associated pre-trained models in the model zoo.

Transform

AugMix and MixUp transforms. Simply adding them to your existing training recipes should boost your model's baseline accuracy.
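
For readers unfamiliar with MixUp, here is a minimal sketch of the general technique in plain Python. It is not PyTorchVideo's implementation: the function name `mixup_batch` and its parameters are hypothetical, and it works on plain lists instead of tensors. It assumes the common formulation in which one mixing coefficient per batch is drawn from Beta(alpha, alpha), and each sample and its one-hot label are blended with those of a randomly chosen partner.

```python
import random


def mixup_batch(inputs, labels, num_classes, alpha=1.0, rng=None):
    """Blend each sample with a randomly chosen partner from the same batch.

    inputs: list of equal-length lists of floats (e.g. flattened clips)
    labels: list of integer class indices
    Returns (mixed_inputs, mixed_soft_labels, lam).
    Hypothetical helper for illustration only.
    """
    rng = rng or random.Random()
    # One mixing coefficient for the whole batch, drawn from Beta(alpha, alpha).
    lam = rng.betavariate(alpha, alpha)
    # A random permutation picks the partner of each sample.
    perm = list(range(len(inputs)))
    rng.shuffle(perm)

    def one_hot(label):
        return [1.0 if c == label else 0.0 for c in range(num_classes)]

    mixed_inputs, mixed_labels = [], []
    for i, j in enumerate(perm):
        # Convex combination of the two inputs.
        mixed_inputs.append(
            [lam * a + (1.0 - lam) * b for a, b in zip(inputs[i], inputs[j])]
        )
        # The same convex combination applied to the one-hot labels.
        yi, yj = one_hot(labels[i]), one_hot(labels[j])
        mixed_labels.append(
            [lam * p + (1.0 - lam) * q for p, q in zip(yi, yj)]
        )
    return mixed_inputs, mixed_labels, lam
```

Because the labels become soft targets, a training loop that uses this kind of transform would need a loss that accepts class probabilities instead of integer class indices.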

Datasets

AVA dataset and its associated benchmarks and tutorials.
