Test SPDI 3 bit packing RefSeq scheme
mihaitodor committed Oct 5, 2023
1 parent 2ba8022 commit 223b03c
Showing 8 changed files with 133 additions and 111 deletions.
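The commit title mentions a 3-bit packing scheme for the RefSeq data, but the packing code itself is not among the diffs shown here. Purely as an illustration of the idea, and not the repository's implementation: if the alphabet is A, C, G, T plus N (an assumption), 5 symbols fit in 3 bits each (2^3 = 8 ≥ 5) instead of 8 bits per ASCII character. The names `ALPHABET`, `pack` and `unpack` are hypothetical.

```python
# Hypothetical 3-bit nucleotide packing. The alphabet and code assignments
# are assumptions for illustration, not the repository's actual scheme.
ALPHABET = "ACGTN"
CODE = {base: i for i, base in enumerate(ALPHABET)}


def pack(seq: str) -> bytes:
    """Pack a nucleotide string into bytes, 3 bits per base, most significant bits first."""
    acc = 0
    for base in seq.upper():
        acc = (acc << 3) | CODE[base]
    nbits = 3 * len(seq)
    nbytes = (nbits + 7) // 8
    # Left-align the bit stream so any padding bits at the end are zeros.
    acc <<= nbytes * 8 - nbits
    return acc.to_bytes(nbytes, "big")


def unpack(data: bytes, length: int) -> str:
    """Recover `length` bases from a buffer produced by pack()."""
    acc = int.from_bytes(data, "big")
    shift = len(data) * 8 - 3
    out = []
    for _ in range(length):
        out.append(ALPHABET[(acc >> shift) & 0b111])
        shift -= 3
    return "".join(out)
```

Packing 8 bases this way takes 3 bytes instead of 8. The sequence length has to be stored separately, since padding bits decode as `A`. The big-integer accumulator keeps the sketch short but is quadratic in sequence length, so a packer for whole chromosomes would emit bytes in a streaming fashion instead.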
7 changes: 4 additions & 3 deletions .github/workflows/main.yml
@@ -28,13 +28,14 @@ jobs:
 args: --extend-ignore E501,E741

 - name: Run Tests
-run: python -m pytest
+run: ./fetch_refseq.sh && python -m pytest

+# TODOChange this to dev
 deploy:
 name: Deploy
 runs-on: ubuntu-latest

-if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' && always() && contains(join(needs.*.result, ','), 'success') }}
+if: ${{ contains(join(needs.*.result, ','), 'success') }}
 needs: [test]

 steps:
@@ -43,5 +44,5 @@ jobs:
 - uses: akhileshns/heroku-deploy@v3.12.12
 with:
 heroku_api_key: ${{secrets.HEROKU_API_KEY}}
-heroku_app_name: ${{secrets.HEROKU_APP_NAME}}
+heroku_app_name: ${{secrets.HEROKU_DEV_APP_NAME}}
 heroku_email: ${{secrets.HEROKU_EMAIL}}
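The new deploy condition drops the `push` and `refs/heads/main` checks, so deployment is no longer limited to pushes to `main` (assuming the workflow is also triggered on pull requests, which this hunk does not show). What remains joins the `result` of every job in `needs` into one comma-separated string and tests for the substring `success`. GitHub documents that a job-level `if:` without a status-check function such as `always()` gets an implicit `success()`; the old condition used `always()`, the new one does not. A rough Python emulation of the expression alone (not how GitHub evaluates it); `contains_success` is a hypothetical name:

```python
from typing import Sequence


def contains_success(needs_results: Sequence[str]) -> bool:
    """Emulate contains(join(needs.*.result, ','), 'success') from main.yml.

    Each result is one of the job outcomes GitHub reports (for example
    'success', 'failure', 'cancelled', 'skipped'). GitHub's contains()
    compares strings case-insensitively, hence the lower().
    """
    joined = ",".join(needs_results)
    return "success" in joined.lower()
```

With `needs: [test]` this reduces to "did `test` succeed". With several needed jobs, a single success would satisfy the substring check on its own, which is one reason the implicit `success()` matters.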
6 changes: 6 additions & 0 deletions README.md
@@ -48,3 +48,9 @@ run `python3 -m pytest` from the terminal to execute them all.
 Additionally, since the tests run against the Mongo DB database, if you need to update the test data in this repo, you
 can run `OVERWRITE_TEST_EXPECTED_DATA=true python3 -m pytest` from the terminal and then create a pull request with the
 changes.
+
+## Development environment on Heroku
+
+Pull requests will trigger a deployment to this environment automatically which is accessible at the following URL:
+
+https://fhir-gen-ops-dev-ca42373833b6.herokuapp.com/
4 changes: 0 additions & 4 deletions app/__init__.py
@@ -2,13 +2,9 @@
 import flask
 from flask_cors import CORS
 import os
-# from .refseq import download_refseq_files


 def create_app():
-# First ensure we have the refseq files locally
-# download_refseq_files()
-
 # App and API
 options = {
 'swagger_url': '/',
55 changes: 46 additions & 9 deletions app/api_spec.yml
@@ -102,7 +102,7 @@ paths:
type: boolean
default: false
description: Include sequence phase relationships in response if set to true.

/subject-operations/genotype-operations/$find-subject-specific-variants:
get:
description: >
@@ -177,7 +177,7 @@ paths:
- "germline"
- "somatic"
description: Enables an App to limit results to those that are 'germline' or 'somatic'. Default is to include variants irrespective of genomic source class.

/subject-operations/genotype-operations/$find-subject-structural-intersecting-variants:
get:
description: >
@@ -262,7 +262,7 @@ paths:
type: boolean
default: false
description: Include variants in response if set to true.

/subject-operations/genotype-operations/$find-subject-structural-subsuming-variants:
get:
description: >
@@ -346,7 +346,7 @@ paths:
type: boolean
default: false
description: Include variants in response if set to true.

/subject-operations/genotype-operations/$find-subject-haplotypes:
get:
description: >
@@ -422,7 +422,7 @@ paths:
- "germline"
- "somatic"
description: Enables an App to limit results to those that are 'germline' or 'somatic'. Default is to include variants irrespective of genomic source class.

/subject-operations/genotype-operations/$find-subject-specific-haplotypes:
get:
description: >
@@ -497,7 +497,7 @@ paths:
- "germline"
- "somatic"
description: Enables an App to limit results to those that are 'germline' or 'somatic'. Default is to include variants irrespective of genomic source class.

/subject-operations/phenotype-operations/$find-subject-tx-implications:
get:
description: |-
@@ -614,7 +614,7 @@ paths:
- "germline"
- "somatic"
description: Enables an App to limit results to those that are 'germline' or 'somatic'. Default is to include variants irrespective of genomic source class.

/subject-operations/phenotype-operations/$find-subject-dx-implications:
get:
description: |-
@@ -713,7 +713,7 @@ paths:
- "germline"
- "somatic"
description: Enables an App to limit results to those that are 'germline' or 'somatic'. Default is to include variants irrespective of genomic source class.

/subject-operations/metadata-operations/$find-study-metadata:
get:
description: |-
@@ -1147,6 +1147,7 @@ paths:
 type: string
 pattern: '^\s*[Nn][Pp]_\d{4,10}(\.)?(\d{1,2})?\s*$'
 example: "NP_000005.3"
+
 /utilities/find-the-gene:
 get:
 summary: "Find The Gene"
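The protein RefSeq `pattern` in this hunk can be checked locally: OpenAPI patterns use ECMA-262 regex syntax, and this one means the same thing in Python's `re` (with `re.ASCII`, so `\d` matches only ASCII digits as in JavaScript). A minimal sketch; `NP_PATTERN` and `is_valid_protein_refseq` are hypothetical names. It also shows a quirk of the pattern: the dot and the version digits are independently optional, so a trailing dot with no version is accepted.

```python
import re

# Pattern copied verbatim from the protein RefSeq parameter in app/api_spec.yml.
NP_PATTERN = re.compile(r'^\s*[Nn][Pp]_\d{4,10}(\.)?(\d{1,2})?\s*$', re.ASCII)


def is_valid_protein_refseq(value: str) -> bool:
    """Return True if `value` would pass the spec's pattern check."""
    return NP_PATTERN.match(value) is not None


for candidate in ["NP_000005.3", "  np_000005 ", "NP_000005.", "NP_123", "XP_000005.3"]:
    print(repr(candidate), is_valid_protein_refseq(candidate))
```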
@@ -1170,6 +1171,42 @@ paths:
 pattern: '^\s*[Nn][Cc]_\d{4,10}(\.)(\d{1,2}):\d{1,10}-\d{1,10}\s*$'
 example: "NC_000001.11:11794399-11794400"

+/utilities/seqfetcher/1/sequence/{ref_seq}:
+get:
+summary: "Seqfetcher"
+operationId: "app.utilities_endpoints.seqfetcher"
+tags:
+- "Seqfetcher Utility"
+responses:
+'200':
+description: "Returns RefSeq subsequence"
+content:
+text/plain:
+schema:
+type: string
+parameters:
+- name: ref_seq
+in: path
+required: true
+description: RefSeq
+schema:
+type: string
+example: "NC_000001.10"
+- name: start
+in: query
+required: true
+description: Subsequence start index
+schema:
+type: integer
+example: 1
+- name: end
+in: query
+required: true
+description: Subsequence end index
+schema:
+type: integer
+example: 2
+
 tags:
 - name: Subject Genotype Operations
 - name: Subject Phenotype Operations
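The `find-the-gene` range pattern at the top of this hunk requires a versioned `NC_` accession followed by `:start-end`. A minimal sketch of validating and splitting such a value in Python; the named groups and the `parse_range` helper are hypothetical additions, but the accepted strings are the same as for the spec's pattern. Note that the pattern does not check that start is less than or equal to end.

```python
import re

# Same acceptance as the spec's range pattern, with (hypothetical) named groups
# added so the accession and coordinates can be extracted.
RANGE_PATTERN = re.compile(
    r'^\s*(?P<ref>[Nn][Cc]_\d{4,10}\.\d{1,2}):(?P<start>\d{1,10})-(?P<end>\d{1,10})\s*$',
    re.ASCII,
)


def parse_range(value: str) -> tuple:
    """Split 'NC_000001.11:11794399-11794400' into (accession, start, end)."""
    m = RANGE_PATTERN.match(value)
    if m is None:
        raise ValueError(f"not a valid range: {value!r}")
    return m.group("ref"), int(m.group("start")), int(m.group("end"))


print(parse_range("NC_000001.11:11794399-11794400"))
```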
@@ -1178,7 +1215,7 @@ tags:
- name: Population Phenotype Operations
- name: Feature Coordinates Utility
description: This utility returns genomic feature coordinates and other annotations. All data are from <a href="https://www.ncbi.nlm.nih.gov/genome/guide/human/">NCBI Human Genome Resources</a>. For chromosomes, build 37 and build 38 reference sequences are returned. For genes, genomic coordinates are returned, along with a list of transcripts. MANE transcript is flagged. For transcripts, genomic coordinates are returned, along with the gene name and composite exons, along with exon coordinates. For proteins, the corresponding transcript is returned.

- name: Find The Gene Utility
description: This utility returns all genes that intersect with a provided genomic region.
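The new `/utilities/seqfetcher/1/sequence/{ref_seq}` path takes `ref_seq` as a path parameter and requires integer `start` and `end` query parameters. A minimal client-side sketch for building the request URL. `seqfetcher_url` is a hypothetical helper, and it assumes the API is mounted at the root of the host, which this diff does not confirm.

```python
from urllib.parse import quote, urlencode, urljoin


def seqfetcher_url(base_url: str, ref_seq: str, start: int, end: int) -> str:
    """Build a GET URL for the seqfetcher path declared in app/api_spec.yml.

    base_url should end with '/', otherwise urljoin() replaces its last segment.
    """
    # ref_seq is a path parameter; start and end are required query parameters.
    path = f"utilities/seqfetcher/1/sequence/{quote(ref_seq, safe='')}"
    query = urlencode({"start": start, "end": end})
    return urljoin(base_url, path) + "?" + query
```

With the dev URL from the README, `seqfetcher_url("https://fhir-gen-ops-dev-ca42373833b6.herokuapp.com/", "NC_000001.10", 1, 2)` gives `https://fhir-gen-ops-dev-ca42373833b6.herokuapp.com/utilities/seqfetcher/1/sequence/NC_000001.10?start=1&end=2`. The resulting text can then be fetched with `urllib.request.urlopen(url).read().decode()`.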

28 changes: 0 additions & 28 deletions app/refseq.py

This file was deleted.

12 changes: 11 additions & 1 deletion app/utilities_endpoints.py
@@ -1,6 +1,7 @@
 from flask import abort, jsonify
 from collections import OrderedDict
 from app import common
+from utilities import SPDI_Normalization


 def get_feature_coordinates(
@@ -133,7 +134,8 @@ def get_feature_coordinates(
 protein = protein.split('.')[0]

 try:
-result = common.proteins_data.aggregate([{"$match": {"proteinRefSeq": {'$regex': ".*"+str(protein).replace('*', r'\*')+".*"}}}])
+result = common.proteins_data.aggregate(
+[{"$match": {"proteinRefSeq": {'$regex': ".*"+str(protein).replace('*', r'\*')+".*"}}}])
 result = list(result)
 except Exception as e:
 print(f"DEBUG: Error({e}) under get_feature_coordinates(protein={protein})")
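The reformatted `$match` stage escapes only `*` before embedding `protein` in a `$regex`, so any other regex metacharacter in the input would still be interpreted. Dots are already handled, because the code truncates `protein` at the first `.`, but `re.escape` covers the rest. A minimal sketch; `protein_match_stage` is a hypothetical helper, and it assumes Python's escaping is acceptable to MongoDB's PCRE-style `$regex`, where a backslash before a non-alphanumeric character makes it literal.

```python
import re


def protein_match_stage(protein: str) -> dict:
    """Build a $match stage like the one in get_feature_coordinates, but
    escaping every regex metacharacter in `protein` instead of only '*'."""
    return {"$match": {"proteinRefSeq": {"$regex": ".*" + re.escape(str(protein)) + ".*"}}}


stage = protein_match_stage("NP_000005")
print(stage)
```

Because `$regex` is unanchored, the surrounding `.*` only makes the substring match explicit; it is kept here to stay close to the original query.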
@@ -189,3 +191,11 @@ def find_the_gene(range=None):
 output.append(ord_dict)

 return (jsonify(output))
+
+
+def seqfetcher(ref_seq, start, end):
+    try:
+        subseq = SPDI_Normalization.get_ref_seq_subseq('GRCh37', ref_seq, start, end)
+    except Exception:
+        subseq = SPDI_Normalization.get_ref_seq_subseq('GRCh38', ref_seq, start, end)
+    return f'>{ref_seq}:{start}-{end} Homo sapiens chromosome 1, GRCh37.p13 Primary Assembly\n{subseq}\n\n'
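`seqfetcher` tries GRCh37 first and retries with GRCh38 on any exception, but the header it returns is the fixed text `Homo sapiens chromosome 1, GRCh37.p13 Primary Assembly`, so it can mislabel both the build and the chromosome. A minimal sketch of reporting the build that actually served the request, plus a client-side parser for the `>header\nsequence\n\n` response shape. `get_subseq` stands in for `SPDI_Normalization.get_ref_seq_subseq`, whose `(build, ref_seq, start, end)` argument order is taken from the diff. The shorter header text and all function names are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Stand-in for SPDI_Normalization.get_ref_seq_subseq(build, ref_seq, start, end).
SubseqGetter = Callable[[str, str, int, int], str]


def fetch_with_fallback(get_subseq: SubseqGetter, ref_seq: str, start: int, end: int,
                        builds: Sequence[str] = ("GRCh37", "GRCh38")) -> Tuple[str, str]:
    """Return (build, subseq) from the first build whose lookup succeeds."""
    last_error = None
    for build in builds:
        try:
            return build, get_subseq(build, ref_seq, start, end)
        except Exception as exc:  # mirrors the broad except in the diff
            last_error = exc
    raise LookupError(f"{ref_seq} not found in any of {list(builds)}") from last_error


def seqfetcher_response(get_subseq: SubseqGetter, ref_seq: str, start: int, end: int) -> str:
    build, subseq = fetch_with_fallback(get_subseq, ref_seq, start, end)
    # Hypothetical header: names the build actually used instead of a fixed string.
    return f">{ref_seq}:{start}-{end} {build}\n{subseq}\n\n"


def parse_seqfetcher_response(text: str) -> Tuple[str, str]:
    """Split a '>header\\nsequence\\n\\n' response into (header, sequence)."""
    header, _, rest = text.partition("\n")
    return header.lstrip(">"), rest.strip()
```

Passing the lookup function in as a parameter keeps the fallback logic testable without the downloaded RefSeq files.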
8 changes: 4 additions & 4 deletions fetch_refseq.sh
@@ -10,12 +10,12 @@ cd ./refseq

 echo "Downloading refseq files..."

-curl -sLO https://github.com/FHIR/genomics-operations/releases/download/113c119/GRCh37seq.tar.gz
-curl -sLO https://github.com/FHIR/genomics-operations/releases/download/113c119/GRCh38seq.tar.gz
+curl -sLO https://github.com/FHIR/genomics-operations/releases/download/113c119/GRCh37_refseq.tar.gz
+curl -sLO https://github.com/FHIR/genomics-operations/releases/download/113c119/GRCh38_refseq.tar.gz

 echo "Extracting refseq files..."

-tar -xzf GRCh37seq.tar.gz
-tar -xzf GRCh38seq.tar.gz
+tar -xzf GRCh37_refseq.tar.gz
+tar -xzf GRCh38_refseq.tar.gz

 echo "Finished extracting refseq files."
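After this commit the script fetches `GRCh37_refseq.tar.gz` and `GRCh38_refseq.tar.gz` from the `113c119` release and unpacks them with `tar -xzf`. CI now runs it before `pytest`, replacing the commented-out `download_refseq_files()` call from the deleted `app/refseq.py`. Where `curl` and `tar` are unavailable, roughly the same steps can be written with the Python standard library. A minimal sketch: `download_archives` and `extract_archives` are hypothetical names, and `dest` plays the role of the `./refseq` directory the script `cd`s into.

```python
import tarfile
import urllib.request
from pathlib import Path

RELEASE_URL = "https://github.com/FHIR/genomics-operations/releases/download/113c119"
ARCHIVES = ("GRCh37_refseq.tar.gz", "GRCh38_refseq.tar.gz")


def download_archives(dest: Path, base_url: str = RELEASE_URL) -> list:
    """Download each archive into dest (like `curl -sLO`) and return the local paths."""
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in ARCHIVES:
        target = dest / name
        # urllib follows the HTTP redirects GitHub uses for release assets.
        urllib.request.urlretrieve(f"{base_url}/{name}", target)
        paths.append(target)
    return paths


def extract_archives(paths, dest: Path) -> None:
    """Extract gzip-compressed tarballs into dest (like `tar -xzf`)."""
    for path in paths:
        with tarfile.open(path, "r:gz") as tar:
            # These archives come from the project's own release. On Python
            # versions that support it, passing filter="data" adds path-safety checks.
            tar.extractall(dest)
```

Usage would be `extract_archives(download_archives(Path("refseq")), Path("refseq"))`, run from the repository root.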