Print titles of pages which dumpgenerator.py failed to download for MemoryError or other fatals #282

Closed
AdaptiveAlchemist opened this issue Oct 17, 2016 · 6 comments

Comments

@AdaptiveAlchemist

Is it possible to add a function so that it outputs a list of pages that are missing and were not downloaded? I use this software often and I have pages missing. If I have a list of which pages failed, I can download them with Special:Export.

@emijrp
Member

emijrp commented Oct 17, 2016

It is usually saved in the errors log. Can you see it?

@AdaptiveAlchemist
Author

No. I know that. I am saying that some pages are missing from the dump and from the list of pages downloaded. The error log you are talking about is different. If I know which pages went missing when downloading the dump, I can download them separately. Thanks
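One way to get such a list after the fact is to compare the list of titles against the `<title>` elements that actually made it into the XML dump. The following is a minimal sketch, not part of dumpgenerator.py: the function names are hypothetical, and it assumes the titles list has one title per line (possibly ending with an `--END--` marker) and that underscores and spaces in titles are interchangeable.

```python
import io
import xml.etree.ElementTree as ET


def normalise(title):
    """Trim whitespace and treat underscores as spaces (an assumption)."""
    return title.strip().replace("_", " ")


def titles_in_dump(xml_file):
    """Yield every <title> in a MediaWiki XML export, parsed as a stream."""
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        # Tags may carry the export namespace, e.g. "{http://...}title".
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "title":
            yield elem.text or ""
        elif local_name == "page":
            elem.clear()  # drop revision text so large dumps stay manageable


def missing_titles(expected_titles, xml_file):
    """Return expected titles that have no <page> entry in the dump."""
    found = {normalise(t) for t in titles_in_dump(xml_file)}
    missing = []
    for raw in expected_titles:
        title = normalise(raw)
        # Skip blank lines and the assumed end-of-list marker.
        if title and title != "--END--" and title not in found:
            missing.append(title)
    return missing


if __name__ == "__main__":
    sample = (
        b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
        b"<page><title>Main Page</title></page>"
        b"<page><title>Foo</title></page>"
        b"</mediawiki>"
    )
    expected = ["Main Page", "Foo", "Bar_Baz", "--END--"]
    print(missing_titles(expected, io.BytesIO(sample)))
```

In practice `expected_titles` could be an open titles file and `xml_file` the path of the dump; both names depend on how the dump was produced.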

@AdaptiveAlchemist
Author

emijrp

@AdaptiveAlchemist
Author

@emijrp See, it probably happens only with very large wikis. The wiki I am talking about has over thirty thousand pages.

@AdaptiveAlchemist
Author

AdaptiveAlchemist commented Dec 7, 2016 via email

Hey bro. Just a question. I was actually talking about the pages that fail because of insufficient RAM.

@nemobis
Member

nemobis commented Feb 7, 2020

As was explained elsewhere, the usual workaround has been to download those pages manually with Special:Export in a browser. The current solution is to use the API and limit the size of the responses; this is mostly handled by the command-line option --xmlrevisions. Please test it!
#311
#18
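For the manual workaround, a Special:Export request for a handful of missing titles can be assembled as below. This is only a sketch under assumptions: `export_url` is a hypothetical helper, `wiki.example.org` is a placeholder, and whether a given wiki accepts these parameters over GET or allows full-history export depends on its configuration.

```python
from urllib.parse import urlencode


def export_url(index_php, titles, current_only=True):
    """Build a Special:Export URL for the given page titles."""
    params = {
        "title": "Special:Export",
        # Special:Export takes one title per line in the "pages" field.
        "pages": "\n".join(titles),
        "action": "submit",
    }
    if current_only:
        params["curonly"] = "1"  # latest revision only, keeps responses small
    else:
        params["history"] = "1"  # full history, may be large or restricted
    return index_php + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder wiki URL; replace with the real index.php of the wiki.
    print(export_url("https://wiki.example.org/index.php", ["Foo", "Bar Baz"]))
```

Requesting titles in small batches keeps each response small, which is the same idea as limiting response size through the API.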

@nemobis nemobis closed this as completed Feb 7, 2020
@nemobis nemobis changed the title from "Dumpgenerator.py to include missing pages" to "Print titles of pages which dumpgenerator.py failed to download for MemoryError or other fatals" Feb 7, 2020