Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2397 ns forest markers #2439

Draft
wants to merge 8 commits into
base: master
Choose a base branch
from
Draft

2397 ns forest markers #2439

wants to merge 8 commits into from

Conversation

dosumis
Copy link
Contributor

@dosumis dosumis commented Jul 17, 2024

Turning this into a draft PR as a place for comments and discussion

@aleixpuigb - hope that's OK.

@aleixpuigb aleixpuigb self-assigned this Jul 17, 2024
@dosumis
Copy link
Contributor Author

dosumis commented Jul 17, 2024

ERROR:root:Failed to load pattern file: ../patterns/dosdp-patterns/MarkersToCells.yaml

@aleixpuigb
Copy link
Collaborator

The Marker to cell pattern is not finished, I will leave it in draft mode while I solve the issues.

@aleixpuigb aleixpuigb marked this pull request as draft July 17, 2024 18:06
Fixed YAML syntax error
@dosumis
Copy link
Contributor Author

dosumis commented Jul 23, 2024

@hkir-dev - Could you co-ordinate with @aleixpuigb on this. High priority. We will also need these patterns in the CL-KG build. We need something very similar to BDSO.

Main issue:

  • How to efficiently co-ordinate axioms needed for term partly defined by NS-Forest markers & the marker set term. I guess we used some Python in BDSO. Might be best to have this here so we're not relying on duplicating content between TSVs.

Specific issues:

  1. We need a slot for xrefs. These should take the DOIs for notebooks detailing the analysis. Maybe we could add this as an xref on some text in the comment field on the marker set.
  2. Type marker set as "sequence_collection": "SO:0001260"
  3. Marker set needs regular var for cell type (range "CL:0000000")
  4. We have a new AP for fbetaConfidenceScore: "STATO:0001000"
  5. Patterns have the same pattern_name

Fixing dosdp schema fail
@dosumis dosumis added this to the Sprint 22/07/24 milestone Jul 23, 2024
@dosumis dosumis requested a review from hkir-dev July 23, 2024 15:39
clauses:
- text: "'has_part' some %s"
vars:
- Minimal_markers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the marker urls are generated like http://purl.obolibrary.org/obo/entrez_286133 but in the BDSO it is in the form of http://identifiers.org/ncbigene/286133 which is resolvable.

To do this we need to override the make file rules that run $(DOSDPT) generate command and provide a prefix mapping file to it: --prefixes=template_prefixes.yaml

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will also need an import file for these to get labels. (I think we had ensembl & NCBI for BDSO). I'm wondering whether, longer term, a more sustainable approach would be to have a separate repo for generating marker objects and make imports from that to the KG and CL.

FBeta_confidence_score: "xsd:double"

annotationProperties:
fbetaConfidenceScore: "PCL:0010062"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recently STATO added a new term for F Beta Score ISA-tools/stato#86 (comment).

Should we use STATO:0000663(F-beta score) instead of fbetaConfidenceScore: "PCL:0010062"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes please

@@ -0,0 +1,44 @@
pattern_name: NSForestMarkers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this pattern is running at all. I couldn't see any terms generated by this template in the definitions.owl. I couldn't see any terms in my local build as well.
Errors are:

  • missing column for pattern variable <Marker_set>
  • missing column for pattern variable <Marker_set_xref>
  • missing column for pattern variable <has_characterization_set>

@hkir-dev
Copy link
Contributor

hkir-dev commented Jul 24, 2024

How to efficiently co-ordinate axioms needed for term partly defined by NS-Forest markers & the marker set term. I guess we used some Python in BDSO. Might be best to have this here so we're not relying on duplicating content between TSVs.

Yes, in BDSO, we have a manual step before the ODK prepare_release that reads the source files and programmatically generates the template data files. I didn't notice any script modifications in the CL repo, which makes me hesitant to add these scripts as it could complicate the maintenance of the CL build pipeline. However, if the number of markers will significantly increase in the future and making programmatic management inevitable, I can propose two solutions:

  1. Create a new source file at src/patterns/data/raw/NSForestMarkersSource.tsv programmatically populate the contents of MarkersToCells.tsv and NSForestMarkers.tsv before executing the dosdp templates.
  2. Modify the makefile to include a goal similar to the existing goal to download a dosdp template from a google sheet (though this goal seems inactive currently). Similarly, we can manage markers in a separate repo and pull them at runtime.

@dillerm
Copy link

dillerm commented Jul 31, 2024

Specific issues:

1. We need a slot for xrefs.  These should take the DOIs for notebooks detailing the analysis.  Maybe we could add this as an xref on some text in the comment field on the marker set.

Is this referring to GO's hasDbXref annotation? I'm wondering if IAO's 'definition source' (IAO:0000119) might be more appropriate here.

@dosumis
Copy link
Contributor Author

dosumis commented Aug 6, 2024

This has been superseded by work on https://github.com/Cellular-Semantics/CellMark/tree/main - which will be a source of imports for for CL as well as a source a source of markers more broadly for https://github.com/Cellular-Semantics/CL_KG

@dosumis
Copy link
Contributor Author

dosumis commented Aug 6, 2024

Is this referring to GO's hasDbXref annotation? I'm wondering if IAO's 'definition source' (IAO:0000119) might be more appropriate here.

Using GO's hasDbXref aligns better with current CL standards.

@dosumis dosumis removed this from the Sprint 22/07/24 milestone Aug 6, 2024
@aleixpuigb aleixpuigb removed their assignment Aug 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Ice Box
Development

Successfully merging this pull request may close these issues.

4 participants