throughput logger #798

Open: wants to merge 2 commits into base: master

galrotem
Contributor

Summary:
Introduce throughput logger.

Internal

Context

This stack adds a throughput logger that logs generic per-second throughput, driven by user config.

This diff adds the throughput logger with per-step logging. The next diff will add throughput logging at epoch granularity.

This diff

Adds the throughput logger:

  1. It derives the step time from the iteration time and data wait time timers, which are already collected.
  2. One subtlety: when `on_train_step_end` is called, the iteration time timer has not yet been populated for the current step, while the data wait time timer has. The two lists are therefore offset by one entry, and the throughput logged at that point is for the previous step (step-1). In `on_train_end`, both lists are fully populated, so the last element of each can be used safely.
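
The indexing described in the two points above can be sketched roughly as follows. This is a minimal illustration, not the code in this diff: `step_throughput` is a hypothetical helper, the two plain lists stand in for the recorded durations of the data wait time and iteration time timers, and it assumes that step time is the sum of the two and that the user config maps a name to an item count per step.

```python
from typing import Dict, List, Mapping


def step_throughput(
    throughput_per_batch: Mapping[str, int],
    data_wait_times: List[float],
    iteration_times: List[float],
    at_train_end: bool = False,
) -> Dict[str, float]:
    """Return per-second throughput for the most recent fully timed step.

    Hypothetical helper: the lists mimic timer recordings, in seconds.
    """
    if not iteration_times:
        # First step-end call: no iteration time has been recorded yet.
        return {}
    if at_train_end:
        # Both lists are fully populated, so the last entries line up.
        data_wait = data_wait_times[-1]
    else:
        # The data wait list is one entry ahead of the iteration list,
        # so the matching entry for step-1 is the second to last.
        if len(data_wait_times) < 2:
            return {}
        data_wait = data_wait_times[-2]
    # Assumption: step time = data wait time + iteration time.
    step_time = data_wait + iteration_times[-1]
    if step_time <= 0:
        return {}
    return {
        f"{name}_per_second": count / step_time
        for name, count in throughput_per_batch.items()
    }


if __name__ == "__main__":
    config = {"items": 32}  # assumed shape of the user config
    # During training: two data waits recorded, one iteration recorded.
    print(step_throughput(config, [0.25, 0.5], [0.75]))
    # At the end of training: both lists have the same length.
    print(step_throughput(config, [0.25, 0.5], [0.75, 1.5], at_train_end=True))
```

Returning an empty dict when a matching pair of timings is not yet available keeps the first `on_train_step_end` call a no-op rather than an error, which is one reasonable way to handle the offset.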

Differential Revision: D56496451

@galrotem galrotem force-pushed the export-D56496451 branch 2 times, most recently from b77c415 to 33fc106 Compare April 25, 2024 20:26
galrotem and others added 2 commits April 25, 2024 13:31
Differential Revision: D56496429
Summary:
Introduce throughput logger.

Reviewed By: JKSenthil

Differential Revision: D56496451