dateparser.search_dates OverflowError: date value out of range with Russian text and settings={'PREFER_DATES_FROM': 'past'} #1194
Comments
I'm new to open source, but after some debugging I've noticed that the error is caused by the following lines:

    if days[day_index] == day:
        if self.settings.PREFER_DATES_FROM == "past":
            steps = 7  # Too large if dateobj.month & dateobj.year are at their minimum value of 1
        else:
            steps = 0
    else:
        while days[day_index] != day:
            day_index -= 1
            steps += 1
    delta = timedelta(days=-steps)
    dateobj = dateobj + delta  # This is the offending line

The steps = 7 branch only runs when PREFER_DATES_FROM is "past", which explains why the error depends on that setting. If steps is 7, dateobj.day is 7 or less, and dateobj.month and dateobj.year are both at their minimum value of 1, then dateobj = dateobj + delta would give dateobj a year below its allowed minimum. According to the datetime documentation this raises an OverflowError, which explains the message "date value out of range": the result has to lie between dateobj.min and dateobj.max.

Notably, all these conditions (day of 7 or less, month and year equal to 1, and days[day_index] == day) seem to occur only for the very last instance of "среду" (Wednesday). Stepping through with the debugger, they don't line up for any of the other parsed items. I'm not familiar enough with the codebase to tell why this is, but I found that the steps = 7 line came in with #559.
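The overflow described in this comment can be reproduced with the standard library alone, without dateparser. The sketch below assumes, as the comment reports, that the intermediate date sits in the first week of year 1. The `step_back` helper is hypothetical and only illustrates one possible guard; it is not dateparser's code or a proposed patch.

```python
from datetime import datetime, timedelta


def step_back(dateobj, steps):
    """Hypothetical helper: move `steps` days back, or return None on overflow."""
    try:
        return dateobj + timedelta(days=-steps)
    except OverflowError:
        # The result would fall before datetime.min (year 1, January 1).
        return None


# A date in the first week of year 1, the smallest year datetime supports.
early = datetime(1, 1, 5)

try:
    early + timedelta(days=-7)  # same arithmetic as `dateobj + delta` with steps = 7
    overflowed = False
except OverflowError:
    overflowed = True

print(overflowed)                         # True
print(step_back(early, 7))                # None
print(step_back(datetime(1, 1, 8), 7))    # 0001-01-01 00:00:00
```

Whether returning None, clamping to `datetime.min`, or skipping the weekday correction is the right behavior is a design question for the maintainers; the sketch only shows where the exception comes from.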
dateparser version: 1.1.8
Python version: 3.12.0
When searching for dates in a large chunk of Russian text (see example below) with the 'PREFER_DATES_FROM': 'past' setting, dateparser throws the "OverflowError: date value out of range" error.

Additional observations:
- With the word среду (meaning Wednesday) removed, the code works fine. Removing other chunks from the string also prevents the error from being thrown.
- Changing the 'PREFER_DATES_FROM' setting to 'future' or removing it altogether also prevents the error from happening.

Code to reproduce (the text makes no sense because I minimized the string as much as I could while still being able to reproduce the error):