Issues: pytorch/torchtitan
#654 meta device issue with float8 delayed scale [bug] (opened Oct 25, 2024 by weifengpy)
#652 torch.distributed.breakpoint(rank=1) hangs because of --local-ranks-filter 0 [bug] (opened Oct 25, 2024 by weifengpy)
#651 FP8Linear saves new parameters in ckpt and I cannot load the saved ckpt [bug] (opened Oct 24, 2024 by goldhuang)
#644 [Config] Make FSDP reshard_after_forward: bool configurable [enhancement] (opened Oct 22, 2024 by awgu)
#638 What are the expected inference steps after I apply torchao in training? [question] (opened Oct 21, 2024 by goldhuang)
#636 DDP + Pipeline parallelism [question] (opened Oct 20, 2024 by prathameshtd)
add H100 in CI [better_engineering, integration test]
create a note on torchtitan official release [documentation, release_blocking]
#630 Non-DP runs default to float32 precision [enhancement] (opened Oct 18, 2024 by carmocca)
#623 [Triton] Implement Liger Kernels [enhancement] (opened Oct 17, 2024 by casper-hansen)
#620 Is there a way to offload training memory to DRAM (using FSDP2?) for training Llama3-8B with torchtitan? [question] (opened Oct 15, 2024 by 0781532)
#619 Why does torch.compile have better throughput with 128 GPUs than with 8 GPUs? [question] (opened Oct 15, 2024 by dz1iang)
redundant checks in checkpoint.py [better_engineering, good first issue]
#613 Ability to train based on epoch [enhancement, good first issue] (opened Oct 13, 2024 by abatilo)
#610 [Compile] Understand why FSDP2 saves both SDPA out and wo in for bwd [question] (opened Oct 11, 2024 by awgu)
#608 Why is xformers not used for attention computation? [question] (opened Oct 9, 2024 by jason718)
#598 Granular layer selection during Pipeline Parallelism [question] (opened Oct 3, 2024 by bhuvan777)
Gradient norm clipping with pipeline parallelism (PP) [bug, release_blocking]
#594 Support Gemma2 in torchtitan [enhancement] (opened Oct 1, 2024 by pansershrek)
#593 Reproducible numerics for loss, weights, and gradients on a single node (8 GPUs) [enhancement] (opened Oct 1, 2024 by weifengpy)
#586 Inference with the checkpoint [enhancement] (opened Sep 23, 2024 by mathmax12)