check user-specified model_max_len with hf derived max_model_len #1778
Motivation
This PR aims to improve the handling of context length in the `ModelConfig` class. It addresses potential issues when users specify a context length exceeding the model's derived maximum, enhances error messaging, and introduces a controlled override mechanism. These changes will prevent the NaN errors during decoding that I ran into recently.
Modifications
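The check described in this section can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper name `resolve_context_len`, the set of values treated as "enabled" for the environment variable, the fallback when no length is given, and the use of `warnings.warn` are all assumptions made for the example. Only `context_length`, `derived_context_len`, `SGLANG_ALLOW_LONG_MAX_MODEL_LEN`, and the `ValueError` come from the description.

```python
import os
import warnings
from typing import Optional


def resolve_context_len(context_length: Optional[int], derived_context_len: int) -> int:
    """Hypothetical helper illustrating the check; not the actual ModelConfig code."""
    # Assumption: with no user-specified length, fall back to the derived one.
    if context_length is None:
        return derived_context_len

    # A user value within the derived limit is always accepted.
    if context_length <= derived_context_len:
        return context_length

    # Assumption: "1" or "true" (case-insensitive) enables the override.
    flag = os.environ.get("SGLANG_ALLOW_LONG_MAX_MODEL_LEN", "")
    allow_override = flag.lower() in ("1", "true")

    if allow_override:
        # Override enabled: accept the longer length but warn the user.
        warnings.warn(
            f"User-specified context_length ({context_length}) is greater than "
            f"the derived context length ({derived_context_len}). "
            "This may produce incorrect outputs or NaN values."
        )
        return context_length

    # Override not enabled: refuse and explain how to opt in.
    raise ValueError(
        f"User-specified context_length ({context_length}) is greater than "
        f"the derived context length ({derived_context_len}). "
        "Set SGLANG_ALLOW_LONG_MAX_MODEL_LEN=1 to override."
    )
```

With the variable unset, `resolve_context_len(8192, 4096)` raises `ValueError`; with it set to `1`, the same call returns `8192` and emits a warning.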
The `ModelConfig` class in `model_config.py` has been updated to include a check comparing the user-specified `context_length` against the model's `derived_context_len`. It now handles the `SGLANG_ALLOW_LONG_MAX_MODEL_LEN` environment variable, allowing controlled overrides of the maximum context length. The changes include logic to raise a `ValueError` or print a warning based on the specified length and override settings, with improved error and warning messages for clear user guidance.
Checklist