Some samples from the dataset are presented here for demonstration purposes. Using the provided script, it is possible to load a piece of code and run the model on it to check for vulnerabilities. The script produces an output image with colored highlights corresponding to the vulnerability status of each code token.
Three examples are provided for each of the seven relevant vulnerability types:
Vulnerability | Source | Link to commit | Changed file | Example file |
---|---|---|---|---|
SQL injection | instacart/lore | protect against potential sql injection | /lore/io/connection.py | sql-1.py |
SQL injection | uktrade/export-wins-data | parameterise the sql query to avoid injection attacks | /wins/views/flat_csv.py | sql-2.py |
SQL injection | newyoming/onewyoming | fix sql injection vulnerability | /experimental/python/buford/model/visitor.py | sql-3.py |
XSS | AMfalme/Horizon_Openstack | Remove dangerous safestring declaration | /openstack_dashboard/dashboards/identity/mappings/tables.py | xss-1.py |
XSS | omirajkar/bench_frappe | fix(blog): Fix possible reflected XSS attack vector | /frappe/website/doctype/blog_post/blog_post.py | xss-2.py |
XSS | Technikradio/C3FOCSite | fix: XSS bug in now exposed user forms | /c3shop/frontpage/management/reservation_actions.py | xss-3.py |
Command injection | dgabbe/os-x | For sudo commands, removed 'sudo' from read commands. Added shlex call to prevent injection attacks. | /os-x-config/standard_tweaks/install_mac_tweaks.py | command_injection-1.py |
Command injection | Atticuss/ajar | fixed command injection issue | /ajar.py | command_injection-2.py |
Command injection | yasong/netzob | Remove security issue related to shell command injection | /src/netzob/Simulator/Channels/RawEthernetClient.py | command_injection-3.py |
XSRF | deepnote/notebook | add xsrf checks on files endpoints | /notebook/files/handlers.py | xsrf-1.py |
XSRF | wbrxcorp/forgetthespiltmilk | xsrf token handling corrected | /frontend/app.py | xsrf-2.py |
XSRF | tricycle/lesswrong | Implement proper modhash checking to stop xsrf | r2/r2/models/account.py | xsrf-3.py |
Path disclosure | fkmclane/python-fooster-web | prevent local file disclosure via url encoded nonormalized paths | /fooster/web/file.py | path_disclosure-1.py |
Path disclosure | zms-publishing/zms4 | applied fix for disclosure of physical-path in zmi | /ZMSItem.py | path_disclosure-2.py |
Path disclosure | zcutlip/pyweb | more path checking in pyweb-add-conent | pyweb_add_content.py | path_disclosure-3.py |
Open redirect | karambir/mozilla-django-oidc | This uses Django's is_safe_url to sanitize the next url for the authentication view. This prevents open redirects. | /mozilla_django_oidc/views.py | open_redirect-1.py |
Open redirect | nyaadevs/nyaa | Fix open redirect | /nyaa/views/account.py | open_redirect-2.py |
Open redirect | richgieg/flask-now | Sanitize the login view's "next" redirect URL | /app/auth/views.py | open_redirect-3.py |
Remote code execution | Internet-of-People/titania-os | #21 Unauthenticated remote root code execution | /vuedj/configtitania/views.py | remote_code_execution-1.py |
Remote code execution | Scout24/monitoring-config-generator | Prevent remote code execution | /src/main/python/monitoring_config_generator/yaml_tools/readers.py | remote_code_execution-2.py |
Remote code execution | pipermerriam/flex | Fix remote code execution issue with yaml.load | flex/core.py | remote_code_execution-3.py |
To try out one of the examples, simply execute:
```bash
python3 demonstrate.py sql 1
```
The first parameter specifies the type of vulnerability and must be one of "sql", "xss", "command_injection", "xsrf", "path_disclosure", "open_redirect", or "remote_code_execution".
The second parameter is the example number, from 1 to 3.
An optional third parameter, "fine", increases the color resolution.
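The parameters above could be validated roughly as in the following minimal sketch. It is hypothetical: the helper names `parse_args` and `example_file` are made up for illustration, and the actual demonstration script may read its arguments differently. The file name pattern follows the "Example file" column of the table.

```python
import argparse

# Vulnerability types accepted as the first parameter (from the list above).
VULN_TYPES = [
    "sql", "xss", "command_injection", "xsrf",
    "path_disclosure", "open_redirect", "remote_code_execution",
]


def parse_args(argv=None):
    """Hypothetical parser for '<type> <number> [fine]'."""
    parser = argparse.ArgumentParser(description="Run the model on one example.")
    parser.add_argument("vuln_type", choices=VULN_TYPES)
    # Example number, converted to int and restricted to 1..3.
    parser.add_argument("example", type=int, choices=[1, 2, 3])
    # Optional literal third argument "fine" for a finer color resolution.
    parser.add_argument("mode", nargs="?", choices=["fine"], default=None)
    args = parser.parse_args(argv)
    return args.vuln_type, args.example, args.mode == "fine"


def example_file(vuln_type, example):
    """Build the example file name, e.g. 'sql-1.py' (see the table)."""
    return f"{vuln_type}-{example}.py"
```

For instance, `parse_args(["sql", "3", "fine"])` yields `("sql", 3, True)`, and `example_file("sql", 3)` yields `"sql-3.py"`. Relying on `choices` lets argparse reject an unknown type or an example number outside 1 to 3 with a usage message.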
The script prints output to the screen that already highlights the vulnerable parts in red and the (probably) non-vulnerable parts in green, but the detailed outcome is saved as an image file. Refer to that file for a closer look at the predictions. It highlights which parts might be vulnerable according to the following color chart:
The following command, for example, runs the first remote code execution example at fine resolution:
```bash
python3 demonstrate.py remote_code_execution 1 fine
```
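To illustrate what a coarse versus a fine color resolution could mean, here is a minimal sketch that maps a per-token vulnerability probability to a color between green and red. The bin counts (4 and 10), the 0.5 threshold for the terminal view, and the helper names `to_color` and `ansi_highlight` are all assumptions for this sketch, not values taken from the script.

```python
def to_color(p, fine=False):
    """Map a vulnerability probability p in [0, 1] to an (R, G, B) tuple.

    Assumed scheme: p is quantized into a few bins (coarse) or more bins
    (fine), then interpolated linearly from green (p=0) to red (p=1).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    bins = 10 if fine else 4  # assumed resolutions, not the script's values
    level = min(int(p * bins), bins - 1) / (bins - 1)
    return (round(255 * level), round(255 * (1 - level)), 0)


def ansi_highlight(tokens, probs, threshold=0.5):
    """Color tokens red (likely vulnerable) or green for terminal output."""
    red, green, reset = "\033[31m", "\033[32m", "\033[0m"
    return "".join(
        (red if p >= threshold else green) + tok + reset
        for tok, p in zip(tokens, probs)
    )
```

With these assumed settings, the coarse mode yields 4 distinct colors and the fine mode 10, so small differences in the model's confidence become visible only in the fine output.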
Alternatively, the script demonstrate_sourcecode.py bypasses the dataset and loads the example source code files directly. The outcome is essentially the same.
```bash
python3 demonstrate_sourcecode.py sql 3 fine
```
The labels in the dataset can also be taken into account. In this case, the script colors false positives differently from true positives, and false negatives differently from true negatives. A separate script serves this purpose; it takes the same parameters as the previous one and also saves its result as a PNG file.
The following example shows the prediction for the first example file for the open redirect vulnerability, using labels for coloring:
```bash
python3 demonstrate_labeled.py open_redirect 1 fine
```
Both vulnerable parts of the file were recognized (light blue), and most of the remaining code was correctly identified as not vulnerable (dark green), with some irregularities around the "edges" of the vulnerable code parts.
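The label-aware coloring can be pictured as sorting each token into one of the four confusion-matrix outcomes. In the minimal sketch below, only light blue (true positive) and dark green (true negative) come from the description above; the colors for false positives and false negatives, the 0.5 threshold, and the helper names `classify` and `label_colors` are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    TRUE_POSITIVE = "true positive"
    FALSE_POSITIVE = "false positive"
    FALSE_NEGATIVE = "false negative"
    TRUE_NEGATIVE = "true negative"


# Light blue and dark green match the description above; the other two
# colors are placeholders chosen for this sketch.
COLORS = {
    Outcome.TRUE_POSITIVE: (173, 216, 230),   # light blue
    Outcome.TRUE_NEGATIVE: (0, 100, 0),       # dark green
    Outcome.FALSE_POSITIVE: (255, 165, 0),    # orange (assumed)
    Outcome.FALSE_NEGATIVE: (255, 0, 0),      # red (assumed)
}


def classify(prob, is_vulnerable, threshold=0.5):
    """Compare the model's prediction for one token with its label."""
    predicted = prob >= threshold  # assumed decision threshold
    if predicted and is_vulnerable:
        return Outcome.TRUE_POSITIVE
    if predicted:
        return Outcome.FALSE_POSITIVE
    if is_vulnerable:
        return Outcome.FALSE_NEGATIVE
    return Outcome.TRUE_NEGATIVE


def label_colors(probs, labels, threshold=0.5):
    """Return one RGB color per token, based on prediction and label."""
    return [COLORS[classify(p, y, threshold)] for p, y in zip(probs, labels)]
```

Separating the four outcomes this way is what makes the "edge" irregularities visible: tokens just outside a labeled vulnerable region that the model still flags show up as false positives rather than blending into the true positives.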