Update to Selenium 3.4+, FF 52 ESR, and add support for Python 3.4+. #152

Merged Oct 9, 2017 (80 commits)

englehardt (Collaborator) commented Oct 8, 2017

This builds on top of #143. I've successfully run crawls using most of the platform features over the past 2 months, so I think this is ready to merge to master.

This PR includes the following changes on top of #143:

  1. Bumped Selenium and FF to the most recent versions.
  2. Fixed bugs in process management, in the manual test script, and in POST request processing.
  3. Added support for WebExtensions (note that our extension is still an Add-on SDK extension).
  4. Updated browser prefs for FF 52.
  5. Updated bundled extensions.
  6. Disabled Tracking Protection. We need to determine how to gracefully update the Tracking Protection lists before we can re-enable it.
  7. Completely removed the proxy from the tree.
  8. Removed webdriver self-identification protections.
  9. Made some additional minor tweaks, refactorings, and PEP8 fixes.

One known issue: Ghostery, uBlock Origin, and Disconnect all phone home in some way when the browser first starts. uBlock Origin downloads all of its lists and Disconnect fetches the public suffix list. I have set every preference and setting I could find to disable auto-updating, so I think these requests are unavoidable. This is undesirable for two reasons: we don't want filter lists changing mid-measurement, and stateless crawls will download these lists on every page visit.

We should be able to prime an otherwise bare browser profile with the filter lists for any enabled extension and load that profile in either a stateful or a stateless crawl. This would also allow us to re-enable Tracking Protection and Safebrowsing, if desired. I'll create a follow-up issue once this is merged.
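
A rough sketch of the priming idea, not code from this PR: keep one seed profile that already contains the downloaded filter lists, and hand each browser launch its own copy so the seed itself never changes mid-measurement. The function name `fresh_profile_from_seed` and the directory layout are hypothetical, and how the seed gets primed in the first place is left open.

```python
import shutil
import tempfile
from pathlib import Path


def fresh_profile_from_seed(seed_profile_dir):
    """Copy a pre-primed seed profile into a new temporary directory.

    Hypothetical helper: the seed is assumed to already hold the filter
    lists for every enabled extension. Returns the path of the copy.
    """
    seed = Path(seed_profile_dir)
    if not seed.is_dir():
        raise FileNotFoundError("seed profile not found: %s" % seed)
    # copytree needs a destination that does not exist yet, so create a
    # unique parent directory and copy into a child of it.
    target = Path(tempfile.mkdtemp(prefix="primed_profile_")) / "profile"
    shutil.copytree(str(seed), str(target))
    return target
```

A stateless crawl would call this before every page visit, and a stateful crawl once at startup; in both cases the lists come from disk rather than from the network.
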

Thanks to @zackw for doing all of the heavy lifting.

zackw added 30 commits March 9, 2017 10:17
PIL is not a dependency of OpenWPM proper, so it would be inappropriate
to list it in requirements.txt, but it _is_ used by one of the tests.
Kludge a direct installation command into .travis.yml.

This should get us 90% of the way to Python 3 support.

This module was completely removed in Python 3.

This may mean install.sh now needs to install something called 'GeckoDriver',
but let's see if we can get away without it.

 * Rationalize import ordering in some files.
 * Don't run nontrivial code at module scope when invoked as __main__.
 * If jpm is not available, but the .xpi exists, don't bomb out.

Since Selenium 3.3 requires a 'geckodriver' executable in
the PATH, put <root_dir>/firefox-bin in the PATH if it exists,
and rely on PATH search to find 'firefox'.
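
The PATH handling described in this commit message could look roughly like the following. This is a minimal sketch under assumptions, not the code from the commit: the helper name `prepend_firefox_bin_to_path` and its `environ` parameter are made up for illustration.

```python
import os


def prepend_firefox_bin_to_path(root_dir, environ=None):
    """Put <root_dir>/firefox-bin at the front of PATH if it exists.

    Hypothetical helper. `environ` defaults to os.environ; passing a
    plain dict makes the function easy to exercise in isolation.
    Returns the resulting PATH string.
    """
    environ = os.environ if environ is None else environ
    ff_dir = os.path.join(root_dir, "firefox-bin")
    if os.path.isdir(ff_dir):
        current = environ.get("PATH", "")
        # Prepend so a bundled firefox/geckodriver wins over system ones.
        environ["PATH"] = (ff_dir + os.pathsep + current) if current else ff_dir
    return environ.get("PATH", "")
```

After this runs, an ordinary PATH lookup such as `shutil.which("geckodriver")` would find the bundled executables first when the directory is present, and the environment is left untouched when it is not.
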
 * Replace Adblock Plus with uBlock Origin
   (which does not need precached filter lists)
 * Replace Ghostery with Disconnect (ditto)
 * Update HTTPS Everywhere to latest version

We _might_ be down to just HTTP instrumentation problems at this point.

The behavior of `open(path, "a+")` differs between Python 2 and Python 3.
In the latter, it will try to seek to the end, and if this fails (e.g. if
`path` is a pipe) it will throw an exception.  To work around this we
have to monkey-patch selenium.webdriver.firefox.service.Service.
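
A small demonstration of the failure described above, using an anonymous pipe instead of a named one so that it does not block. It only shows the Python 3 behavior of mode `"a+"` on a non-seekable descriptor; it is not the Selenium monkey-patch itself. The helper name `try_open_append_plus` is made up, and the exact exception type and message may vary between Python 3 versions.

```python
import os
import tempfile


def try_open_append_plus(fd):
    """Wrap an already-open descriptor in a file object with mode "a+".

    Returns the file object on success, or the exception on failure.
    """
    try:
        return os.fdopen(fd, "a+")
    except (OSError, ValueError) as exc:
        return exc


# A pipe cannot seek, so Python 3 refuses to open it in "a+" mode.
read_fd, write_fd = os.pipe()
pipe_result = try_open_append_plus(write_fd)
os.close(read_fd)

# A regular file is seekable, so the same mode works there.
tmp_fd, tmp_path = tempfile.mkstemp()
file_result = try_open_append_plus(tmp_fd)

print("pipe:", type(pipe_result).__name__)
print("file:", type(file_result).__name__)

# Clean up whatever was actually opened.
for obj in (pipe_result, file_result):
    if not isinstance(obj, Exception):
        obj.close()
os.remove(tmp_path)
```
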
sqlite3 fetchall() has always returned a list of tuples, even when
the query returns a single row; Python 2's sloppy cross-type comparisons
let us get away with it.
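
To illustrate the point about `fetchall()`, here is a self-contained example against an in-memory database. The table and column names are invented for the demonstration and do not come from OpenWPM's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (site_count INTEGER)")
conn.execute("INSERT INTO visits VALUES (3)")

# Even for a single row with a single column, the result is a list
# holding one 1-tuple, not a bare integer.
rows = conn.execute("SELECT site_count FROM visits").fetchall()

# Python 2 would silently compare the list with an int; Python 3 raises.
try:
    _ = rows > 0
    comparison_allowed = True
except TypeError:
    comparison_allowed = False

# The scalar has to be unpacked explicitly.
value = rows[0][0]
conn.close()
```

Under Python 3 the list-to-integer comparison raises `TypeError`, which is why code that relied on it has to index into the result first.
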
3 participants