Gene mutation script
This page describes the code for downloading gene mutations for a selected tissue from CosmicDB. It consists of two scripts: getGeneMutations.py and getMutatedFASTAseq.py. The first script fetches mutation data and FASTA sequences for a particular gene or list of genes. The second script filters the most frequent mutations and applies them to the original FASTA sequence.
- requirements: Python 2.7, pandas, requests
- set the COSMICDB_USER and COSMICDB_PASS environment variables
- a download token has to be obtained manually from https://cancer.sanger.ac.uk/cosmic/download; its exact validity period is not known (somewhere between 48 hours and 14 days)
- default WORKING_DIR for mutations is ./genes/mutations/
- default WORKING_DIR for sequences is ./genes/sequences/
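
The setup items above can be sketched as a small configuration helper. This is only an illustration, not the scripts' actual code: the function name load_credentials and the constant names are hypothetical, and only the environment variable names and default directories come from the list above.

```python
import os

# Default output directories taken from the list above.
MUTATIONS_DIR = "./genes/mutations/"
SEQUENCES_DIR = "./genes/sequences/"


def load_credentials(env=None):
    """Return (user, password) from the environment or raise a clear error.

    load_credentials is a hypothetical helper, not part of the original scripts.
    """
    env = os.environ if env is None else env
    required = ("COSMICDB_USER", "COSMICDB_PASS")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["COSMICDB_USER"], env["COSMICDB_PASS"]
```

Failing early with the names of the missing variables is usually easier to debug than an authorization error returned later by the server.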
After the CSV files with samples and gene mutations are downloaded, the top 10 distinct mutations are applied to the gene's FASTA sequence.
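
A minimal sketch of this step is shown below. It rests on several assumptions that the page does not confirm: that the sequence is a protein sequence, that mutations are written in protein-level notation such as "p.V600E", and that only simple single-residue substitutions are applied. The function names are hypothetical and the real script may rank and apply mutations differently.

```python
import re
from collections import Counter

# Matches simple protein-level substitutions such as "p.V600E" (assumed notation).
SUBSTITUTION = re.compile(r"^p\.([A-Z])(\d+)([A-Z])$")


def top_mutations(mutations, n=10):
    """Return the n most frequent distinct mutation strings, most frequent first."""
    return [mutation for mutation, _ in Counter(mutations).most_common(n)]


def apply_substitutions(sequence, mutations):
    """Apply single-residue substitutions to a protein sequence.

    Mutations that are not simple substitutions, that fall outside the
    sequence, or whose reference residue does not match are skipped.
    """
    residues = list(sequence)
    for mutation in mutations:
        match = SUBSTITUTION.match(mutation)
        if not match:
            continue
        ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
        index = pos - 1  # mutation positions are 1-based
        if 0 <= index < len(residues) and residues[index] == ref:
            residues[index] = alt
    return "".join(residues)
```

Because the reference residue is checked before each change, when two substitutions target the same position only the first one in the list (the more frequent one) is applied.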
Usage:
python get_gene_mutation.py -g <gene Uniprot ID or file path> -t <tissue name>
python generateMutatedFASTAseq.py
CosmicDB allows downloading gene mutations that are already filtered for a specific tissue, unlike gene expression data, which has to be filtered for the selected tissue in VINI.
The getGeneMutations method makes up to 10 attempts to download mutations from CosmicDB. CosmicDB sometimes responds with a 401 (Unauthorized) status code at random; in that case the script sleeps for 2 seconds and retries the download request.
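
The retry behaviour described above could look roughly like the following sketch. It is not the script's actual implementation: download_with_retry and its parameters are hypothetical, and the request itself is passed in as a callable so the retry logic stays independent of how the CosmicDB request is built.

```python
import time


def download_with_retry(fetch, attempts=10, delay=2.0, sleep=time.sleep):
    """Call fetch() until it returns a non-401 response or attempts run out.

    fetch is any zero-argument callable returning an object with a
    status_code attribute, for example lambda: requests.get(url).
    """
    response = None
    for attempt in range(attempts):
        response = fetch()
        if response.status_code != 401:
            return response
        if attempt < attempts - 1:
            sleep(delay)  # wait before retrying after an unauthorized response
    return response  # still 401 after the final attempt
```

The caller should still check the status code of the returned response, since the last response is returned even when every attempt ended with 401.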
Types of mutation:
- Substitution - missense
- Substitution - nonsense
- Substitution - coding silent
- Deletion - in frame
- Deletion - frame shift
- Insertion - frame shift
- Complex - deletion inframe
- Complex - frame shift
- Unknown