
Set proper logging levels #70

Open
TimotheeMickus opened this issue May 23, 2024 · 1 comment
Labels: bug (Something isn't working), enhancement (New feature or request)

Comments

@TimotheeMickus (Collaborator)

Currently the logs are overwhelming and not human-readable.
It would be great to sift through the current messages and set appropriate logging levels (and also to remove any sneaky prints that are surely still around).

@TimotheeMickus added the enhancement label on May 23, 2024
@Waino (Collaborator) commented May 27, 2024

The underlying issue is that we are not setting the logging level correctly when creating the logger: it is always set to show warning and above. Because of this, all of our logging happens at the warning or error level, even for messages that should be debug.
There is also a --verbose flag, which does not interact with the log level and doesn't really work that well.

Before the levels of individual logged messages can be adjusted, we need to fix this underlying issue.
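For illustration, a minimal sketch of what setting the level at logger creation could look like, using Python's standard logging module. The init_logger name and the wiring to --verbose are assumptions about the eventual fix, not existing code:

```python
import logging

def init_logger(log_file=None, log_level=logging.INFO):
    """Create a logger with an explicit level instead of the default WARNING."""
    logger = logging.getLogger()
    logger.setLevel(log_level)

    formatter = logging.Formatter("[%(asctime)s %(levelname)s] %(message)s")

    console_handler = logging.StreamHandler()
    console_handler.setLevel(log_level)
    console_handler.setFormatter(formatter)
    logger.addHandler(console_handler)

    if log_file is not None:
        file_handler = logging.FileHandler(log_file)
        file_handler.setLevel(log_level)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)

    return logger

# e.g. choose the level from the command line:
# logger = init_logger(log_level=logging.DEBUG if opts.verbose else logging.INFO)
```

With the level set here, individual messages can then be logged at debug/info/warning as appropriate and filtered by the chosen level.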

@TimotheeMickus added the bug label on May 27, 2024
Waino added a commit that referenced this issue May 27, 2024
Due to the removal of the grad hook, MultipleOptimizer no longer has a step
method; it has been replaced with externally_managed_step, which takes
information about which optimizers need to be stepped. This means that it is
no longer compatible with torch.cuda.amp.GradScaler.

While fixing this issue, the MultipleOptimizer system was also
refactored.
- MultipleOptimizer and the OpenNMT Optimizer wrapper switched places:
  MultipleOptimizer now wraps the other one, instead of the reverse.
- The OpenNMT Optimizer was renamed to SubOptimizer for clarity.
- SubOptimizer handles learning rate scheduling and grad clipping.
- MultipleOptimizer handles creation of multiple optimizers, grad scaling,
  restoring from checkpoint, backward, zero_grad, deciding which
  suboptimizers to step, and reporting.
- Each optimizer now individually controls its learning rate schedule.
  When new components with freshly initialized parameters are introduced
  by the curriculum, they now apply warmup to the LR of these
  parameters. This should improve stability.
- As each optimizer has its own learning rate, it is not obvious what to
  log in the report_training one-liner, so the learning rate was removed
  from it. Instead, all optimizers log their own learning rates. This is
  currently log spam, but will be lowered to debug in #70.

Each sub-optimizer having its own GradScaler leads to multiple backward
passes and a RuntimeError. There can only be one GradScaler, which must
therefore be the responsibility of MultipleOptimizer.

Closes: #71
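
For reference, a minimal sketch of the division of responsibilities described in the commit message, assuming PyTorch. The class names follow the commit message, but the constructor signatures and method bodies are illustrative, not the actual implementation:

```python
import torch

class SubOptimizer:
    """Wraps a single torch optimizer; owns its LR schedule and grad clipping."""

    def __init__(self, optimizer, lr_scheduler=None, max_grad_norm=None):
        self.optimizer = optimizer
        self.lr_scheduler = lr_scheduler
        self.max_grad_norm = max_grad_norm

    def step(self, grad_scaler):
        # Unscale before clipping so the threshold applies to the true gradients.
        grad_scaler.unscale_(self.optimizer)
        if self.max_grad_norm:
            params = [p for g in self.optimizer.param_groups for p in g["params"]]
            torch.nn.utils.clip_grad_norm_(params, self.max_grad_norm)
        # GradScaler.step skips the update if inf/nan gradients were found.
        grad_scaler.step(self.optimizer)
        if self.lr_scheduler is not None:
            self.lr_scheduler.step()


class MultipleOptimizer:
    """Owns the single GradScaler, backward, zero_grad, and the decision of
    which sub-optimizers to step."""

    def __init__(self, suboptimizers, use_amp=False):
        self.suboptimizers = suboptimizers  # dict: name -> SubOptimizer
        self.grad_scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

    def backward(self, loss):
        # Scale once, backward once: a single GradScaler shared by all sub-optimizers.
        self.grad_scaler.scale(loss).backward()

    def zero_grad(self):
        for sub in self.suboptimizers.values():
            sub.optimizer.zero_grad()

    def externally_managed_step(self, names_to_step):
        # Step only the sub-optimizers whose components received gradients.
        for name in names_to_step:
            self.suboptimizers[name].step(self.grad_scaler)
        self.grad_scaler.update()
```

Keeping the GradScaler in MultipleOptimizer means loss scaling and the backward pass happen exactly once per iteration, while each SubOptimizer still applies its own clipping and LR schedule when it is stepped.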