
Set proper logging levels #70

Open
TimotheeMickus opened this issue May 23, 2024 · 1 comment
Labels: bug (Something isn't working), enhancement (New feature or request)

Comments

@TimotheeMickus (Collaborator)

Currently the logs are overwhelming and not human-readable.
It would be great to sift through the current messages and set appropriate logging levels (and also to remove any sneaky prints that are surely still around).

@TimotheeMickus added the enhancement label on May 23, 2024
@Waino (Collaborator) commented May 27, 2024

The underlying issue is that we are not setting the logging level correctly when creating the logger: it is always set to show warning and above. Because of this, all of our logging happens at the warning or error level, even for messages that should be debug.
There is also a --verbose flag, which does not interact with the log level and doesn't really work that well.

Before the levels of individual logged messages can be adjusted, we need to fix this underlying issue.
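For illustration, a minimal sketch of what setting the level at logger creation could look like, using Python's standard logging module. The init_logger name and the wiring to --verbose are assumptions about the eventual fix, not existing code:

```python
import logging

def init_logger(log_file=None, log_level=logging.INFO):
    """Create a logger with an explicit level instead of the default WARNING."""
    logger = logging.getLogger()
    logger.setLevel(log_level)

    formatter = logging.Formatter("[%(asctime)s %(levelname)s] %(message)s")

    console_handler = logging.StreamHandler()
    console_handler.setLevel(log_level)
    console_handler.setFormatter(formatter)
    logger.addHandler(console_handler)

    if log_file is not None:
        file_handler = logging.FileHandler(log_file)
        file_handler.setLevel(log_level)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)

    return logger

# e.g. choose the level from the command line:
# logger = init_logger(log_level=logging.DEBUG if opts.verbose else logging.INFO)
```

With the level set here, individual messages can then be logged at debug/info/warning as appropriate and filtered by the chosen level.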

@TimotheeMickus added the bug label on May 27, 2024
Waino added a commit that referenced this issue May 27, 2024
Due to the removal of the grad hook, MultipleOptimizer no longer has a step
method; it has been replaced with externally_managed_step, which takes
information about which optimizers need to be stepped. This means that it is
no longer compatible with torch.cuda.amp.GradScaler.

While fixing this issue, the MultipleOptimizer system was also
refactored.
- MultipleOptimizer and the OpenNMT Optimizer wrapper switched places:
  MultipleOptimizer now wraps the other one, instead of the reverse.
- The OpenNMT Optimizer was renamed to SubOptimizer for clarity.
- SubOptimizer handles learning rate scheduling and grad clipping.
- MultipleOptimizer handles creation of multiple optimizers, grad scaling,
  restoring from checkpoint, backward, zero_grad, deciding which
  suboptimizers to step, and reporting.
- Each optimizer now individually controls its learning rate schedule.
  When new components with freshly initialized parameters are introduced
  by the curriculum, they now apply warmup to the LR of these
  parameters. This should improve stability.
- As each optimizer has its own learning rate, it is not obvious what to
  log in the report_training one-liner, so the learning rate was removed
  from it. Instead, all optimizers log their own learning rates. This is
  currently log spam, but will be lowered to debug in #70.

Each sub-optimizer having its own GradScaler leads to multiple backward
passes and a RuntimeError. There can only be one GradScaler, which must
therefore be the responsibility of MultipleOptimizer.

Closes: #71
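
For reference, a minimal sketch of the division of responsibilities described in the commit message, assuming PyTorch. The class names follow the commit message, but the constructor signatures and method bodies are illustrative, not the actual implementation:

```python
import torch

class SubOptimizer:
    """Wraps a single torch optimizer; owns its LR schedule and grad clipping."""

    def __init__(self, optimizer, lr_scheduler=None, max_grad_norm=None):
        self.optimizer = optimizer
        self.lr_scheduler = lr_scheduler
        self.max_grad_norm = max_grad_norm

    def step(self, grad_scaler):
        # Unscale before clipping so the threshold applies to the true gradients.
        grad_scaler.unscale_(self.optimizer)
        if self.max_grad_norm:
            params = [p for g in self.optimizer.param_groups for p in g["params"]]
            torch.nn.utils.clip_grad_norm_(params, self.max_grad_norm)
        # GradScaler.step skips the update if inf/nan gradients were found.
        grad_scaler.step(self.optimizer)
        if self.lr_scheduler is not None:
            self.lr_scheduler.step()


class MultipleOptimizer:
    """Owns the single GradScaler, backward, zero_grad, and the decision of
    which sub-optimizers to step."""

    def __init__(self, suboptimizers, use_amp=False):
        self.suboptimizers = suboptimizers  # dict: name -> SubOptimizer
        self.grad_scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

    def backward(self, loss):
        # Scale once, backward once: a single GradScaler shared by all sub-optimizers.
        self.grad_scaler.scale(loss).backward()

    def zero_grad(self):
        for sub in self.suboptimizers.values():
            sub.optimizer.zero_grad()

    def externally_managed_step(self, names_to_step):
        # Step only the sub-optimizers whose components received gradients.
        for name in names_to_step:
            self.suboptimizers[name].step(self.grad_scaler)
        self.grad_scaler.update()
```

Keeping the GradScaler in MultipleOptimizer means loss scaling and the backward pass happen exactly once per iteration, while each SubOptimizer still applies its own clipping and LR schedule when it is stepped.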