Update to Selenium 3.4+, FF 52 ESR, and add support for Python 3.4+. #152
Merged
PIL is not a dependency of OpenWPM proper, so it would be inappropriate to list it in requirements.txt, but it _is_ used by one of the tests. Kludge a direct installation command into .travis.yml.
This should get us 90% of the way to Python 3 support.
This module was completely removed in Python 3.
This may mean install.sh now needs to install something called 'GeckoDriver', but let's see if we can get away without it.
Since Selenium 3.3 requires a 'geckodriver' executable in the PATH, put <root_dir>/firefox-bin in the PATH if it exists, and rely on PATH search to find 'firefox'.
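The PATH handling described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper name `prepend_to_path` and its `environ` parameter are hypothetical, and the only assumption taken from the commit message is that a `firefox-bin` directory under the project root should be searched first when it exists.

```python
import os


def prepend_to_path(directory, environ=None):
    """Put `directory` at the front of PATH if it exists.

    Hypothetical helper for illustration. Returns True when PATH was
    changed and False when the directory is missing.
    """
    environ = os.environ if environ is None else environ
    if not os.path.isdir(directory):
        return False
    current = environ.get("PATH", "")
    # Prepending means executables in `directory` win the PATH search.
    environ["PATH"] = directory + os.pathsep + current if current else directory
    return True
```

Called as `prepend_to_path(os.path.join(root_dir, "firefox-bin"))`, where `root_dir` stands in for the project root, this would let an ordinary PATH search find both `geckodriver` and `firefox` in that directory.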
* Replace Adblock Plus with uBlock Origin (which does not need precached filter lists)
* Replace Ghostery with Disconnect (ditto)
* Update HTTPS Everywhere to latest version
We _might_ be down to just HTTP instrumentation problems at this point.
The behavior of `open(path, "a+")` differs between Python 2 and Python 3. In the latter, it will try to seek to the end, and if this fails (e.g. if `path` is a pipe) it will throw an exception. To work around this we have to monkey-patch selenium.webdriver.firefox.service.Service.
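The Python 3 failure mode can be reproduced without Selenium. The sketch below is not the monkey-patch from this commit; `open_log_stream` is a hypothetical helper that shows one way to tolerate a non-seekable target: try `"a+"` first and fall back to plain write mode when Python 3 refuses. It assumes a POSIX-style pipe from `os.pipe()`.

```python
import os


def open_log_stream(fd):
    """Open a file descriptor for text output, tolerating pipes.

    Hypothetical helper. closefd=False leaves `fd` open if the first
    attempt fails, so the fallback can reuse it.
    """
    try:
        # On Python 3 this raises for a pipe, because "a+" needs a
        # seekable file (the error is an OSError or a subclass of it).
        return open(fd, "a+", closefd=False)
    except OSError:
        # Plain write mode does not seek, so it works on a pipe.
        return open(fd, "w", closefd=False)


read_fd, write_fd = os.pipe()
stream = open_log_stream(write_fd)
stream.write("ready")
stream.flush()
received = os.read(read_fd, 1024)

stream.close()
os.close(write_fd)
os.close(read_fd)
```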
sqlite3 fetchall() has always returned a list of tuples, even when the query returns a single row; Python 2's sloppy cross-type comparisons let us get away with it.
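A small in-memory example shows the shape of the result and why the value has to be unpacked before comparing it. The table and column names here are made up for illustration and are not taken from the OpenWPM schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (url TEXT)")  # hypothetical table
conn.execute("INSERT INTO visits VALUES ('http://example.com')")

rows = conn.execute("SELECT COUNT(*) FROM visits").fetchall()
# rows is a list containing one 1-tuple, not a bare integer.
# Python 2 would silently evaluate `rows > 0`; Python 3 raises TypeError.
count = rows[0][0]

conn.close()
```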
Selenium 3 + geckodriver don't (currently) self-identify in the DOM
OpenWPM requires a specific version. Instead, let's suggest the user run the install script.
The multiprocess library uses `dill` where multiprocessing would normally use `pickle`. The main benefit is flexibility in the types of arguments we can pass to `run_custom_function`.
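To illustrate the kind of flexibility meant here, the sketch below shows a lambda that the standard `pickle` module rejects, followed by an optional round trip through `dill`. It is an illustration only: `square` and `can_pickle` are hypothetical names, and the `dill` part runs only if that third-party package happens to be installed.

```python
import pickle

square = lambda x: x * x  # a callable with no importable name


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


stdlib_ok = can_pickle(square)  # False: pickle looks functions up by name

try:
    import dill  # third-party; used here only if available
except ImportError:
    dill = None

if dill is not None:
    # dill serializes the function body itself, so the lambda survives.
    restored = dill.loads(dill.dumps(square))
    assert restored(4) == 16
```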
This was referenced Oct 9, 2017
This was referenced Aug 9, 2019
This builds on top of #143. I've successfully run crawls using most of the platform features over the past 2 months, so I think this is ready to merge to master.
The following changes above that PR are included:
One known issue: Ghostery, uBlock Origin, and Disconnect will all phone home in some way when the browser first starts. uBlock Origin downloads all of the lists and Disconnect fetches the public suffix list. I have set any preferences/settings I could find to disable auto-updating, so I think these updates are unavoidable for now. This is undesirable for two reasons: we don't want filter lists changing mid-measurement, and stateless crawls will download these lists on every page visit.
We should be able to prime an otherwise bare browser profile with the filter lists for any enabled extension and load that profile in either a stateful or a stateless crawl. This would also allow us to re-enable Tracking Protection and Safebrowsing, if desired. I'll create a follow-up issue once merged.
Thanks to @zackw for doing all of the heavy lifting.