-
Notifications
You must be signed in to change notification settings - Fork 422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CLI script #153
base: main
Are you sure you want to change the base?
Add CLI script #153
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Thanks so much for the contribution. I'm just getting over covid so I may be slow to respond, but I left some comments below.
Feel free to ask any questions you may have/rebuttal any points I made. I'm not too picky, and can be convinced otherwise if I made some opinionated points you disagree with.
from stable_diffusion_videos import StableDiffusionWalkPipeline | ||
|
||
|
||
def init_arg_parser(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're already installing fire
with the requirements of the package, maybe lets just use that instead? I can update to do this so its not a hassle for you :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not too familiar with fire
but I can give it a try. Tho after quickly skimming the docs, while this would considerably reduce boilerplate, I think I prefer the flexibility of argparse
. E.g. I prefer calling
python scripts/make_video.py --prompts "a cat" "a dog" --seeds 42 1337
over
python scripts/make_video.py --prompts="['a cat', 'a dog']" --seeds=[42,1337] # note that --seeds=[42, 1337] would fail!
Moreover I can feel some dirty hacking would be required to keep support for argument provision through config file using the --cfg
option, which is an important feature IMO.
Let me know what you think. If this is something you really require then I will give it a shot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. I think I agree with you! will have a look when I can
if args.prompts is None: | ||
raise ValueError('no prompt provided') | ||
if args.seeds is None: | ||
args.seeds = [random.getrandbits(16) for _ in args.prompts] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been using randint instead in this scenario, kinda like this though :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using multiple methods for random numbers seems like a good idea the more I think about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wdym @Atomic-Germ ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wdym @Atomic-Germ ?
I rescind my comment, it was a little half-baked..
|
||
# check audio arguments | ||
if args.audio_filepath is not None and args.audio_offsets is None: | ||
raise ValueError('must provide audio_offsets when providing ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes me wonder if this should just be raised in the pipeline code itself instead of the parser (if its not already)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same goes for many of the other raised errors in this script
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe. That's a design question IMO. Do we want to raise errors to the unadvised CLI user as early as possible, while trusting that the developer who writes his owns scripts knows what they are doing? Or do we want to raise errors as close to the problematic code/as late as possible but such that it propagates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I'm fine with the way you did it here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reason I say it though is that walk
used to be a CLI interface when I first made this repo, so it should be the fn catching all the cases...but we can do it this way for now instead, I'm not picky.
pipe = StableDiffusionWalkPipeline.from_pretrained( | ||
args.checkpoint_id, | ||
torch_dtype=torch.float16, | ||
revision="fp16", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think guidance in diffusers these days is erring towards not specifying a revision. Need to check if that only applies to newest versions, etc.
Definitely hardcoding here is a no-no though.
revision="fp16", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will add this as an option
|
||
pipe = StableDiffusionWalkPipeline.from_pretrained( | ||
args.checkpoint_id, | ||
torch_dtype=torch.float16, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think hardcoding dtype here is also a no-no I'm afraid. Let's think of a nicer way to infer this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
has to support MPS/GPU/TPU
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on second thought, no tpu as you'd have to use the other pipeline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
device = "mps" if torch.backends.mps.is_available() else "cuda" if torch.backends.cuda.is_available() else "cpu"
torch_dtype = torch.float32 if torch.backends.mps.is_available() else torch.float16
then use to(device) in place of to("cuda") and torch_dtype=torch_dtype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep will change that
feature_extractor=None, | ||
safety_checker=None, | ||
).to("cuda") | ||
pipe.scheduler = DPMSolverMultistepScheduler.from_config( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hardcoding this likely bad idea too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops that slipped my mind, these should be options too
Maybe worth noting, but the batch_size option set to anything but 1 is going to break on mps. |
Right. We could hard set |
Still haven't started working in applying the suggested changes, will do it soon |
No rush :) whenever you get to it. I appreciate your contributions ❤️ |
This PR adds a script
scripts/make_video.py
to make videos from the command line, for those like me who prefer that over notebooks, to e.g. run from a cluster node. The script takes as argument most if not all the arguments featured inREADME.md
. Help message looks like this:The user can also directly provide a YAML configuration file containing all the arguments to overwrite using
python scripts/make_video.py --cfg <config_file>
. The file should contain fields with the same name as the arguments.The script is the same whether the user wants to add audio or not. If the user wants to add audio, he should provide the
--audio_filepath
and--audio_offsets
arguments.In my opinion, this deprecates
examples/make_music_video.py
. That file seems to be broken anyway (see #150). If the purpose of that script is to serve as a code example, then the snippets inREADME.md
are currently doing a better job. If its purpose is to have a standalone script ready to run from the command line, then this PR implements that and more.Updated
README.md
with an example.