
Mat cleanup #1220

Merged (12 commits into devel, Jul 5, 2022)

Conversation

@akaviaLab (Contributor) commented May 10, 2022

  • description of feature/fix
    This makes some minor changes to SBML, mostly so that annotations (and some other fields) are loaded as lists. It also changes the example models a bit. The goal is to ensure that XML written to MATLAB and then reread is identical to the original SBML that was loaded.
    Fixes some of the FIXMEs in sbml.py saying "Should be a list".
  • tests added/passed
  • add an entry to the next release

uri.akavia added 6 commits May 10, 2022 09:14
changes to sbml.py to read annotations as list
improved validate.py so it won't fail with new files
modified annotation.xml, e_coli_core.xml to have more correct XML files and less errors compared to mat file
modified true values in test_annotation.py and test_sbml.py
got sbml to add the SBO term for subsystem
updated wrong_key_caps.mat file and the associated test
minor tidying of sbml.py, test_mat.py
some tidying and flake8 correction of test_sbml.py
…to MATLAB

Fixed some issues in mat.py
Made SBML read groups into subsystem. Made SBML have group annotations as list
SBML will set undefined formula and model name as None
@akaviaLab (Contributor, Author)

@cdiener - These are the updates we were discussing on Gitter.
One question - looking at TestCobraIO, some tests are expected to fail. If I comment the xfail section out, the only one that fails is "fbc1". Would you like me to modify these sections? I'm not sure why they exist or why they are supposed to fail.

@cdiener (Member) commented May 11, 2022

Hi, thanks for the fixes. One comment before I start reviewing. For annotations, both the {"provider": "value"} and {"provider": ["value", ...]} formats are valid and for single values we probably even prefer the first version since it's a bit more concise. So there is no need to change every single-value annotation to the second form. It's okay if the mat reader returns {"provider": ["value"]} annotations for now. I guess you changed it to check for equality of models?
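Since both forms are valid, a test-side equality check could normalize values before comparing. A minimal sketch in plain Python; the helper names are illustrative and not part of cobra's actual API:

```python
def normalize_annotation(annotation):
    """Coerce every annotation value to a list so that the two valid
    forms, {"provider": "value"} and {"provider": ["value"]}, compare
    equal."""
    return {
        key: [value] if isinstance(value, str) else list(value)
        for key, value in annotation.items()
    }


def annotations_equal(left, right):
    """Compare two annotation dicts while ignoring the str-vs-list form."""
    return normalize_annotation(left) == normalize_annotation(right)
```

With helpers like these the test models could keep single-value annotations as strings while the mat reader returns lists.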

@akaviaLab (Contributor, Author)

Yes, I changed it for that reason.
Also, I don't like having a situation where a variable is sometimes a list and sometimes a str - every function that accesses it has to check for str vs. list and process accordingly.

@akaviaLab (Contributor, Author)

Might become irrelevant if we update the annotations, but we can decide for now.

@akaviaLab (Contributor, Author)

@cdiener Ping. Can you please review this?

@cdiener (Member) commented Jun 6, 2022

I'm still somewhat reluctant to change the test models' annotation format here, mostly because the idea is to have the test models manually vetted and then adapt the code base to read them well. The tests and/or the code base still have to be able to recognize that {"provider": "annotation"} is the same as {"provider": ["annotation"]}.

@akaviaLab (Contributor, Author)

Okay. We can discuss this in detail, but there are several changes to the annotation files (which perhaps should be in a separate PR):

  1. Removing some spaces which were confusing the MATLAB/XML comparison and have no other effect.
  2. Converting ncbigi, which is not a valid identifiers.org identifier, to ncbiprotein, which is.
  3. Replacing some wrong annotations that used the bigg.model annotation of a metabolite for some reactions instead of the actual bigg.model reaction annotation.
  4. Removing unused compartments from the notes.xml file, since MATLAB doesn't create compartments that are unused. This led to confusion when comparing XML and MATLAB. I can change it back, but since I made the original notes.xml file, I thought it wouldn't matter that much.

Should I move compare_models to the tests? If so, can you please answer my questions there (add compare.py which contains functions for comparing. #1206)?

Are these the changes you meant?

@cdiener (Member) commented Jun 10, 2022

All of those cases are fine with me. I would just revert the conversions from single annotations to lists. That has to be fixed in the tests, which should accept the equivalence I outlined in the previous comment.

Regarding the comparison, it sounded like @Midnighter had an opinion on that, so I was waiting for more feedback from him.

@akaviaLab (Contributor, Author)

I'm a bit confused about the list for single annotations.
Right now, before my changes, if there is one annotation for a key, it is stored as a string. If another is added, it is converted to a list. Therefore, every function that accesses annotations has to check whether it is a list or a string.
That doesn't make a lot of sense to me - it makes more sense to always use a list, so the behavior is consistent. What is the argument against doing that?
If the annotations are always a list, I don't understand what you mean about the tests.
In my revision of the metadata (#1225), I made the code construct annotations as a list even when given a string, so an annotation that is read compares equal to both the list and string test cases. I don't see how to do that here.
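The str-or-list burden described above means that every consumer of annotations needs a type check along these lines (an illustrative sketch, not code from this PR):

```python
def get_annotation_values(annotation, key):
    """Return the values for one annotation key as a list, regardless of
    whether the stored value is a bare string or already a list."""
    value = annotation.get(key, [])
    if isinstance(value, str):  # single annotation stored as a string
        return [value]
    return list(value)
```

Always storing lists would make this defensive check unnecessary, at the cost of breaking the existing API.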

@cdiener (Member) commented Jun 11, 2022

Because by changing the test models to be more convenient for you, you don't change the fact that annotations are still allowed to be strings or lists for single elements. You just remove the examples where they are strings from the test suite, which then covers fewer data formats. We need to remain backwards compatible with old models in JSON or YAML format, and therefore that case still has to remain supported and tested until we move to a new annotation format. But like I said, it also comes down to only modifying the test models if they are actually wrong, not because it's more convenient for writing tests.

@cdiener (Member) commented Jun 11, 2022

To be clear, my comments are only aimed at the JSON test models you are changing. In the mat parsing you are free to return them as you wish. Also, why are you changing almost all the other test models in this PR (like the pickle and SBML ones)? What do those have to do with the MATLAB interface?

@akaviaLab (Contributor, Author) commented Jun 11, 2022 via email

@codecov-commenter commented Jun 19, 2022

Codecov Report

Merging #1220 (ee4842b) into devel (e838e54) will decrease coverage by 0.25%.
The diff coverage is 66.25%.

@@            Coverage Diff             @@
##            devel    #1220      +/-   ##
==========================================
- Coverage   84.59%   84.33%   -0.26%     
==========================================
  Files          66       66              
  Lines        5491     5509      +18     
  Branches     1264     1268       +4     
==========================================
+ Hits         4645     4646       +1     
- Misses        545      557      +12     
- Partials      301      306       +5     
Impacted Files Coverage Δ
src/cobra/io/mat.py 79.37% <63.76%> (-2.92%) ⬇️
src/cobra/io/sbml.py 80.14% <81.81%> (-0.32%) ⬇️
src/cobra/flux_analysis/__init__.py 100.00% <0.00%> (ø)
src/cobra/flux_analysis/deletion.py 89.47% <0.00%> (ø)

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e838e54...ee4842b. Read the comment docs.

@cdiener (Member) commented Jun 27, 2022

Okay, I'm on board with everything except for the following points, which I'd like to discuss:

  • read annotations into list (other PR or not at all)

In general that makes sense; however, it would be an API-breaking change, so we would need to release a new version for it, and it would not be backwards compatible. Since there is another PR lined up that will add a new annotation format and will need a new version as well, I would skip it in favour of the FBC3 PR.

  • read group back into subsystem to keep read/write consistent, also since
    mat doesn't understand groups, only subsystem (not sure where this should
    be, if at all)

Subsystems are deprecated in favor of groups and should only be read from the formats that still have them, like legacy SBML and MATLAB. So I would not convert them here. This information would be dropped in my book, since it's data in groups that the MATLAB format does not support. I get that this breaks roundtripping, and others may think differently. My worry is what would happen if I have a "subsystem" group that uses advanced features like nested groups. How should that act?

And the unmentioned changes in mini.json: if mini.json passes the JSON schema validator in this version, it should not be changed.

@akaviaLab (Contributor, Author) commented Jun 28, 2022

Should everything else that isn't mat go in another PR, or in this one? Like the changes to the annotations to make them more identifiers.org compliant?
The list change will be saved for the appropriate metadata PR.

@akaviaLab (Contributor, Author)

> what would happen if I have "subsystem" group that uses advanced features like nested groups. How should that act?

Groups are only exported for reactions. If a reaction has both the "subsystem" attribute and groups, "subsystem" wins and is exported, and "groups" is dropped.
If a reaction belongs to a group, that reaction is exported with its subsystem set to the group id/name. Any other entities in the group are not exported.

So if you have something like
Group1 = [rxn1, rxn2, Group2]
Group2 = [rxn3, met1]

The subsystems when exported would look like
rxn1 - Group1
rxn2 - Group1
rxn3 - Group2
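The flattening above can be sketched in plain Python with made-up data structures (id strings rather than cobra's Group API; the helper name is illustrative):

```python
def groups_to_subsystems(groups, reaction_ids):
    """Flatten groups into a reaction -> subsystem mapping for export.

    Only reaction members are kept; nested groups and other entities
    (e.g. metabolites) are dropped, since the mat format only supports
    a single subsystem string per reaction.
    """
    subsystems = {}
    for group_name, members in groups.items():
        for member in members:
            if member in reaction_ids:
                subsystems[member] = group_name
    return subsystems


groups = {
    "Group1": ["rxn1", "rxn2", "Group2"],
    "Group2": ["rxn3", "met1"],
}
exported = groups_to_subsystems(groups, {"rxn1", "rxn2", "rxn3"})
# exported == {"rxn1": "Group1", "rxn2": "Group1", "rxn3": "Group2"}
```

Note that "Group2" as a member of "Group1" and the metabolite "met1" are silently dropped, which matches the export behavior described above.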

@cdiener (Member) commented Jun 28, 2022

I think in general it's easier to review several smaller PRs than one huge one. However, since we have already discussed a lot of the things here, let's just continue as it is. If you want, just use a more descriptive title.

Sorry I misunderstood the subsystem thing. Your solution makes sense.

@akaviaLab (Contributor, Author)

Okay. So I think everything discussed is done.
The annotations-into-list change should go in the metadata PR.

@cdiener (Member) left a review:

Looks good to me. Some minor comments and a doubt about duplicated model files.

"ncbigi": [
"GI:1208453",
"GI:1652654"
"ncbiprotein": [
@cdiener (Member):

What's the difference between the mini.json in src/cobra/data and the one in tests/data? Couldn't the tests just use the first one?

@akaviaLab (Contributor, Author):

None.
The tests use the one in tests, because that's how the functions are designed. The ones in src/cobra/data are example files for users who install cobra without the development setup and without the tests.
If we want the tests to rely on the files in src/cobra/data, we can rewrite the tests and update_pickles.py, which I'd be happy to do in a different PR.

@cdiener (Member):

That sounds good to me 👍

src/cobra/io/mat.py: three outdated review threads (resolved)
@@ -856,7 +903,9 @@ def from_mat_struct(
     rxn_group_names = set(rxn_subsystems).difference({None})
     new_groups = []
     for g_name in sorted(rxn_group_names):
-        group_members = model.reactions.query(lambda x: x.subsystem == g_name)
+        group_members = model.reactions.query(
+            lambda x, g_n=g_name: x.subsystem == g_n
+        )
@cdiener (Member):

I don't think it makes sense to define default values for a lambda function. The previous one looks better to me.

@akaviaLab (Contributor, Author):

Okay. My linter complained about it. Reverted.
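For context, the linter warning here is presumably about the classic late-binding pitfall for closures created in a loop (flake8-bugbear flags this as B023); the default-argument form binds the current value instead. A small self-contained illustration:

```python
# Late binding: each lambda looks up `name` when called, so all of
# them see the loop variable's final value.
late = [lambda: name for name in ["A", "B"]]
assert [f() for f in late] == ["B", "B"]

# Default-argument binding: `n=name` is evaluated when each lambda is
# created, so every closure keeps its own value.
bound = [lambda n=name: n for name in ["A", "B"]]
assert [f() for f in bound] == ["A", "B"]
```

In the PR's loop the lambda is passed to `query` and (presumably) called within the same iteration, so late binding is harmless there, which is why reverting to the simpler form is safe.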

@@ -598,7 +598,7 @@ def _sbml_to_model(
     if not libsbml.SyntaxChecker.isValidSBMLSId(model_id):
         LOGGER.error(f"'{model_id}' is not a valid SBML 'SId'.")
     cobra_model = Model(model_id)
-    cobra_model.name = model.getName()
+    cobra_model.name = model.getName() or None
@cdiener (Member):

Above, you marked SBML returning '' as the model name by default as a TODO, but it seems like this fixes that, right?

@akaviaLab (Contributor, Author):

Yes. This does fix it. I can remove the TODO.
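The `or None` idiom in the diff above works because an empty string is falsy in Python, so the unset-name sentinel ('' in the review comment above) is mapped to None while any real name passes through. A quick illustration of the semantics, independent of libsbml:

```python
def name_or_none(name):
    """Map the empty-string 'no name' sentinel to None, keeping any
    non-empty name unchanged."""
    return name or None
```

For example, `name_or_none("")` is None while `name_or_none("e_coli_core")` is "e_coli_core".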

@akaviaLab (Contributor, Author):

Removed TODO

@cdiener (Member) left a review:

Looks good. I'll merge after the typo is fixed.

release-notes/next-release.md: outdated review thread (resolved)
@cdiener (Member) commented Jul 5, 2022

Awesome, thanks so much for your patience!

@cdiener cdiener merged commit 42af7e9 into opencobra:devel Jul 5, 2022