
Condition Contrastive Alignment (CCA): Autoregressive Visual Generation Without Guided Sampling

This repo contains model weights and the PyTorch training/sampling code used in

Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment
Huayu Chen, Hang Su, Peize Sun, Jun Zhu
Tsinghua, HKU

🔥 Update

  • [2024.10.16] Model weights are now released!
  • [2024.10.14] Code and arXiv paper are now publicly available!

🌿 Introduction

(TL;DR) We propose CCA as a finetuning technique for AR visual models so that they can generate high-quality images without CFG, cutting sampling costs by half. CCA and CFG have the same theoretical foundations and thus similar features, though CCA is inspired by LLM alignment rather than guided sampling.

Features of CCA:

  • High performance. CCA vastly improves the guidance-free performance of all tested AR visual models, largely removing the need for CFG.
  • Convenient to deploy. CCA does not require any additional dataset beyond the one used for pretraining.
  • Fast to train. CCA requires finetuning pretrained models for only 1 epoch to achieve ideal performance (~1% of the pretraining computation).
  • Consistent with LLM alignment. CCA is theoretically founded on existing LLM alignment methods and bridges the gap between visual-targeted guidance and language-targeted alignment, offering a unified framework for mixed-modal modeling. A rough sketch of the objective follows this list.
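To make the idea concrete, here is a rough PyTorch sketch of a condition-contrastive objective in the DPO/NCA style described above: matched (image, condition) pairs are pushed up relative to a frozen reference model, while mismatched pairs are pushed down. The model interface, the hyperparameters `beta` and `lambda_neg`, and the way negatives are built are illustrative assumptions, not necessarily the exact loss used in the paper.

```python
import torch
import torch.nn.functional as F

def cca_style_loss(model, ref_model, tokens, cond, beta=0.02, lambda_neg=1.0):
    """Sketch of a condition-contrastive finetuning objective.

    `model` is the AR visual model being finetuned, `ref_model` a frozen copy
    of the pretrained weights. Positive pairs are (image tokens, true condition);
    negative pairs reuse the same tokens with conditions shuffled within the batch.
    `beta` and `lambda_neg` are illustrative hyperparameters.
    """
    # Mismatched conditions: roll the class labels within the batch.
    neg_cond = torch.roll(cond, shifts=1, dims=0)

    def seq_logprob(m, c):
        # Sum of per-token log-likelihoods of the image tokens given condition c.
        logits = m(tokens, c)                      # (B, L, vocab) -- assumed interface
        logp = F.log_softmax(logits, dim=-1)
        return logp.gather(-1, tokens.unsqueeze(-1)).squeeze(-1).sum(dim=-1)  # (B,)

    with torch.no_grad():
        ref_pos = seq_logprob(ref_model, cond)
        ref_neg = seq_logprob(ref_model, neg_cond)

    pos = seq_logprob(model, cond) - ref_pos       # log-likelihood ratio, matched pairs
    neg = seq_logprob(model, neg_cond) - ref_neg   # log-likelihood ratio, mismatched pairs

    # Push matched pairs up and mismatched pairs down, DPO/NCA style.
    return -F.logsigmoid(beta * pos).mean() - lambda_neg * F.logsigmoid(-beta * neg).mean()
```

In this sketch the negative conditions are simply the class labels of other images in the same batch, which is what `torch.roll` emulates; no extra data beyond the pretraining set is needed.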

Model Zoo

CCA only finetunes the conditional AR visual models. Weights for the pretrained VAR and LlamaGen models, as well as their tokenizers, are publicly accessible in their respective repos.

If you are only interested in evaluating our CCA-finetuned models, please check out the released checkpoints below (a minimal loading sketch follows the tables).

VAR+CCA

| Base model | Resolution | #Params | FID (w/o CFG) | HF weights 🤗 |
|---|---|---|---|---|
| VAR-d16+CCA | 256 | 310M | 4.03 | var_d16.pth |
| VAR-d20+CCA | 256 | 600M | 3.02 | var_d20.pth |
| VAR-d24+CCA | 256 | 1.0B | 2.63 | var_d24.pth |
| VAR-d30+CCA | 256 | 2.0B | 2.54 | var_d30.pth |

LlamaGen+CCA

| Model | Resolution | #Params | FID (w/o CFG) | HF weights 🤗 |
|---|---|---|---|---|
| LlamaGen-B+CCA | 384 | 111M | 7.04 | c2i_B_384.pt |
| LlamaGen-L+CCA | 384 | 343M | 3.43 | c2i_L_384.pt |
| LlamaGen-XL+CCA | 384 | 775M | 3.10 | c2i_XL_384.pt |
| LlamaGen-XXL+CCA | 384 | 1.4B | 3.12 | c2i_XXL_384.pt |
| LlamaGen-3B+CCA | 384 | 3.0B | 2.69 | c2i_3B_384.pt |
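If you just want to load a released checkpoint, a minimal sketch is below. The Hugging Face repo id is a placeholder (use the links in the tables above), and `build_model` stands in for the constructor from the VAR or LlamaGen codebase; the exact state-dict layout may differ, so check the repos' own loading code.

```python
import torch
from huggingface_hub import hf_hub_download

# Placeholder repo id -- use the actual Hugging Face links from the tables above.
ckpt_path = hf_hub_download(repo_id="<hf-user>/CCA-checkpoints", filename="var_d16.pth")

# Checkpoints are ordinary PyTorch files; load on CPU first and inspect the keys,
# since the weights may be nested under a key such as "model" depending on the codebase.
state_dict = torch.load(ckpt_path, map_location="cpu")

# `build_model` stands in for the constructor from the VAR or LlamaGen repo:
# model = build_model(...)
# model.load_state_dict(state_dict)
# model.eval()
```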

Training

Before proceeding, please download the ImageNet dataset and the pretrained VAR or LlamaGen models, as well as their respective tokenizers.

VAR Command

LlamaGen Command
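The exact distributed launch commands are provided by the repo's training scripts (see the command sections above). As a rough illustration of what one CCA finetuning epoch involves, below is a minimal single-GPU sketch that reuses the `cca_style_loss` function sketched in the introduction; the optimizer, learning rate, and the shape of the data loader are illustrative assumptions.

```python
import copy
import torch

def finetune_one_epoch(model, loader, lr=1e-5):
    """One epoch of guidance-free finetuning with a CCA-style loss (sketch).

    `model` is a pretrained conditional AR visual model and `loader` yields
    (image_tokens, class_condition) batches from pre-tokenized ImageNet.
    The optimizer and learning rate here are illustrative assumptions.
    """
    device = next(model.parameters()).device

    # Frozen reference copy of the pretrained weights.
    ref_model = copy.deepcopy(model).eval()
    for p in ref_model.parameters():
        p.requires_grad_(False)

    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.0)
    model.train()
    for tokens, cond in loader:
        # cca_style_loss is the sketch from the Introduction section above.
        loss = cca_style_loss(model, ref_model, tokens.to(device), cond.to(device))
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()
    return model
```

The key design point is that the reference model stays frozen, so a single epoch of finetuning (roughly 1% of the pretraining compute, per the paper) cannot drift far from the pretrained distribution.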

Evaluation
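The FID numbers in the tables above are for guidance-free sampling. A common ImageNet protocol (used by LlamaGen, among others) is to generate 50,000 class-balanced samples, pack them into an `.npz` file, and score them with the ADM (guided-diffusion) evaluation suite against the standard reference batch; whether this repo follows exactly that protocol is an assumption, and `sample_fn` below is a stand-in for its sampling code.

```python
import numpy as np

def collect_samples_npz(sample_fn, out_path="cca_samples.npz", per_class=50):
    """Pack 50k class-balanced guidance-free samples into an .npz for FID scoring.

    `sample_fn(class_ids)` is a stand-in for the repo's sampling code and is
    assumed to return a (N, H, W, 3) uint8 array of decoded images.
    """
    all_images = []
    for class_id in range(1000):                      # 1000 ImageNet classes
        imgs = sample_fn([class_id] * per_class)      # 50 images per class -> 50k total
        all_images.append(np.asarray(imgs, dtype=np.uint8))

    arr = np.concatenate(all_images, axis=0)          # (50000, H, W, 3)
    np.savez(out_path, arr_0=arr)                     # key expected by the ADM evaluator
    return out_path

# FID/IS can then be computed with the ADM evaluation suite, e.g.
#   python evaluator.py VIRTUAL_imagenet256_labeled.npz cca_samples.npz
# (assuming the standard ImageNet-256 reference batch; check the repo for its exact protocol).
```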

BibTeX

If you find our project helpful, please cite

@article{chen2024CCA,
  title={Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment},
  author={Chen, Huayu and Su, Hang and Sun, Peize and Zhu, Jun},
  journal={arXiv preprint arXiv:2410.09347},
  year={2024}
}
