
Gene mutation script

Josip edited this page Jan 13, 2021 · 16 revisions

This script downloads gene mutations for a selected tissue from CosmicDB. After downloading the CSV files with samples and gene mutations, it modifies the gene's FASTA sequence using the top 10 distinct mutations.
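The selection and editing step can be pictured roughly as follows. This page does not say how mutations are applied, so this is only a minimal sketch. It assumes the mutations use COSMIC's CDS notation (for example `c.35G>A`) and handles only simple single-base substitutions. The functions `top_distinct_mutations` and `apply_substitutions` are hypothetical names, not part of the script.

```python
import re
from collections import Counter

# Simple CDS substitution such as "c.35G>A" (assumed notation).
SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def top_distinct_mutations(mutation_cds, n=10):
    """Return the n most frequent distinct mutation strings.

    Ties keep the order in which mutations were first seen.
    """
    return [mutation for mutation, _ in Counter(mutation_cds).most_common(n)]


def apply_substitutions(sequence, mutations):
    """Apply c.<pos><ref>><alt> substitutions to a coding sequence.

    Positions are 1-based. Mutations that do not match the pattern
    (insertions, deletions, complex changes) or whose reference base
    disagrees with the sequence are skipped.
    """
    bases = list(sequence.upper())
    for mutation in mutations:
        match = SUBSTITUTION.match(mutation)
        if not match:
            continue  # not a simple substitution; left unhandled in this sketch
        pos, ref, alt = int(match.group(1)), match.group(2), match.group(3)
        if 1 <= pos <= len(bases) and bases[pos - 1] == ref:
            bases[pos - 1] = alt
    return "".join(bases)
```

For example, `apply_substitutions("ATGAAA", ["c.2T>G", "c.5A>C", "c.3del"])` yields `"AGGACA"`. The deletion is ignored because only substitutions are modelled here.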

CosmicDB allows downloading gene mutations that are already filtered for a specific tissue. Gene expressions are different: they have to be filtered for the selected tissue in VINI.

The getGeneMutations method makes up to 10 attempts to download mutations from CosmicDB. CosmicDB sometimes responds at random with a 401 (Unauthorized) status code. When that happens, the script sleeps for 2 seconds and then sends the download request again.
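The retry behaviour described above can be sketched as a small loop. In this sketch, `fetch` is a hypothetical callable that returns `(status_code, body)`. It stands in for whatever HTTP request getGeneMutations actually sends. The `sleep` parameter is injectable so the loop can be tested without waiting.

```python
import time


def download_with_retry(fetch, attempts=10, delay=2.0, sleep=time.sleep):
    """Call fetch() until it returns a non-401 response or attempts run out.

    fetch: hypothetical callable returning (status_code, body).
    Any status other than 401 is returned to the caller unchanged.
    """
    for attempt in range(1, attempts + 1):
        status, body = fetch()
        if status != 401:
            return status, body
        if attempt < attempts:
            # CosmicDB occasionally answers 401 spuriously: wait, then retry.
            sleep(delay)
    raise RuntimeError(f"still unauthorized after {attempts} attempts")
```

Only 401 triggers a retry, matching the behaviour described above. Other error codes are passed back so the caller can decide what to do with them.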

The working directory for saving mutations is ./genes/mutations/

The working directory for saving FASTA sequences is ./genes/sequences/
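Here is a minimal sketch of writing a sequence into the FASTA directory. It assumes a `>`-header line and 60-character sequence lines, which is a common FASTA layout. The file name `<gene_id>.fasta` and the helper names are hypothetical. This page does not say how the script names its output files.

```python
import textwrap
from pathlib import Path

MUTATIONS_DIR = Path("genes") / "mutations"
SEQUENCES_DIR = Path("genes") / "sequences"


def format_fasta(header, sequence, width=60):
    """Return a FASTA record: a '>' header line, then fixed-width sequence lines."""
    lines = [f">{header}"] + textwrap.wrap(sequence, width)
    return "\n".join(lines) + "\n"


def save_sequence(gene_id, sequence, base="."):
    """Create both working directories if needed and write the FASTA file."""
    for directory in (MUTATIONS_DIR, SEQUENCES_DIR):
        (Path(base) / directory).mkdir(parents=True, exist_ok=True)
    path = Path(base) / SEQUENCES_DIR / f"{gene_id}.fasta"  # hypothetical naming
    path.write_text(format_fasta(gene_id, sequence))
    return path
```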

Types of mutation:

  • Substitution - missense

  • Substitution - nonsense

  • Substitution - coding silent

  • Deletion - in frame

  • Deletion - frame shift

  • Insertion - frame shift

  • Complex - deletion inframe

  • Complex - frame shift

  • Unknown
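One way to sort downloaded rows by these types is sketched below. It assumes each CSV row is read as a dict, for example with `csv.DictReader`. It also assumes the type is stored in a column named `Mutation Description`, which is an assumption about the COSMIC export. Matching is case-insensitive, because COSMIC may capitalize the labels differently from this list.

```python
from collections import defaultdict

# Mutation types listed on this page.
MUTATION_TYPES = (
    "Substitution - missense",
    "Substitution - nonsense",
    "Substitution - coding silent",
    "Deletion - in frame",
    "Deletion - frame shift",
    "Insertion - frame shift",
    "Complex - deletion inframe",
    "Complex - frame shift",
    "Unknown",
)
_CANONICAL = {name.lower(): name for name in MUTATION_TYPES}


def group_by_type(rows, column="Mutation Description"):
    """Group row dicts by mutation type (the column name is an assumption).

    Known types are normalized to the spelling above; unrecognized values
    are kept under their original text.
    """
    groups = defaultdict(list)
    for row in rows:
        raw = row.get(column, "").strip()
        groups[_CANONICAL.get(raw.lower(), raw)].append(row)
    return dict(groups)
```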

Usage:

python get_gene_mutation.py -g <gene Uniprot ID or file path> -t <tissue name>

python generateMutatedFASTAseq.py
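The `-g` and `-t` options could be handled roughly as sketched below. This is not the script's actual argument parsing. In particular, the rule that `-g` is treated as a file path when such a file exists is an assumption, as is the one-UniProt-ID-per-line file format. The names `parse_args` and `resolve_genes` are hypothetical.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the -g and -t options shown in the usage line above."""
    parser = argparse.ArgumentParser(
        description="Download CosmicDB gene mutations for a tissue (sketch)."
    )
    parser.add_argument("-g", dest="gene", required=True,
                        help="UniProt ID or path to a file of IDs")
    parser.add_argument("-t", dest="tissue", required=True, help="tissue name")
    return parser.parse_args(argv)


def resolve_genes(gene_arg):
    """Return a list of UniProt IDs.

    Assumption: if gene_arg names an existing file, it holds one ID per line;
    otherwise gene_arg itself is a single ID.
    """
    if os.path.isfile(gene_arg):
        with open(gene_arg) as handle:
            return [line.strip() for line in handle if line.strip()]
    return [gene_arg]
```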