
Mat cleanup #1220

Merged (12 commits into devel, Jul 5, 2022)

Conversation

@akaviaLab (Contributor) commented May 10, 2022

  • description of feature/fix
    This makes some minor changes to SBML, mostly so that annotations (and some other fields) are loaded as lists. It also changes the example models a bit. The goal is to ensure that XML written to MATLAB and then reread is identical to the original SBML that was loaded.
    Fixes some of the FIXMEs in sbml.py saying "Should be a list".
  • tests added/passed
  • add an entry to the next release

uri.akavia added 6 commits May 10, 2022 09:14
changes to sbml.py to read annotations as list
improved validate.py so it won't fail with new files
modified annotation.xml, e_coli_core.xml to have more correct XML files and less errors compared to mat file
modified true values in test_annotation.py and test_sbml.py
got sbml to add the SBO term for subsystem
updated wrong_key_caps.mat file and the associated test
minor tidying of sbml.py, test_mat.py
some tidying and flake8 correction of test_sbml.py
…to MATLAB

Fixed some issues in mat.py
Made SBML read groups into subsystem. Made SBML have group annotations as list
SBML will set undefined formula and model name as None
@akaviaLab (Contributor, Author)

@cdiener - These are the updates we were discussing on Gitter.
One question - looking at TestCobraIO, some tests are expected to fail. If I comment the xfail section out, the only one that fails is "fbc1". Would you like me to modify these sections? I'm not sure why they exist or why they are supposed to fail.

@cdiener (Member) commented May 11, 2022

Hi, thanks for the fixes. One comment before I start reviewing. For annotations, both the {"provider": "value"} and {"provider": ["value", ...]} formats are valid and for single values we probably even prefer the first version since it's a bit more concise. So there is no need to change every single-value annotation to the second form. It's okay if the mat reader returns {"provider": ["value"]} annotations for now. I guess you changed it to check for equality of models?
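Since both forms are valid, a test-side equality check could normalize values before comparing. A minimal sketch in plain Python; the helper names are illustrative and not part of cobra's actual API:

```python
def normalize_annotation(annotation):
    """Coerce every annotation value to a list so that the two valid
    forms, {"provider": "value"} and {"provider": ["value"]}, compare
    equal."""
    return {
        key: [value] if isinstance(value, str) else list(value)
        for key, value in annotation.items()
    }


def annotations_equal(left, right):
    """Compare two annotation dicts while ignoring the str-vs-list form."""
    return normalize_annotation(left) == normalize_annotation(right)
```

With helpers like these the test models could keep single-value annotations as strings while the mat reader returns lists.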

@akaviaLab (Contributor, Author)

Yes, I changed it for that reason.
Also, I don't like having a situation where a variable is sometimes a list and sometimes a str - every function that accesses it has to check for str vs. list and process accordingly.

@akaviaLab (Contributor, Author)

Might become irrelevant if we update the annotations, but we can decide for now.

@akaviaLab (Contributor, Author)

@cdiener Ping. Can you please review this?

@cdiener (Member) commented Jun 6, 2022

I'm still somewhat reluctant to change the test models' annotation format here, mostly because the idea is to have the test models manually vetted and then adapt the code base to read them well. The tests and/or the code base still have to be able to recognize that {"provider": "annotation"} is the same as {"provider": ["annotation"]}.

@akaviaLab (Contributor, Author)

Okay. We can discuss this in detail, but there are several changes to the annotation files (which perhaps should be in a separate PR):

  1. Removing some spaces which were confusing the MATLAB/XML comparison and have no other effect.
  2. Converting ncbigi, which is not a valid identifiers.org identifier, to ncbiprotein, which is.
  3. Replacing some wrong annotations that used the bigg.model annotation of a metabolite for some reactions instead of the actual bigg.model reaction annotation.
  4. Removing unused compartments from the notes.xml file, since MATLAB doesn't create compartments that are unused. This led to confusion when comparing XML and MATLAB. I can change it back, but since I made the original notes.xml file, I thought it wouldn't matter that much.

Should I move compare_models to the tests? If so, can you please answer my questions there (add compare.py which contains functions for comparing. #1206)?

Are these the changes you meant?

@cdiener (Member) commented Jun 10, 2022

All of those cases are fine with me. I would just revert the conversions from single annotations to lists. That has to be fixed in the tests, which should accept the equivalence I outlined in the previous comment.

Regarding the comparison, it sounded like @Midnighter had an opinion on that, so I was waiting for more feedback from him.

@akaviaLab (Contributor, Author)

I'm a bit confused about the list for single annotations.
Right now, before my changes, if there is one annotation for a key, it is stored as a string. If another is added, it is converted to a list. Therefore, every function that accesses annotations has to check whether it is a list or a string.
That doesn't make a lot of sense to me - it makes more sense to always use a list, so the behavior is consistent. What is the argument against doing that?
If the annotations are always a list, I don't understand what you mean about the tests.
In my revision of the metadata (#1225), I made the code construct annotations as a list even when given a string, so an annotation that is read compares equal to both the list and string test cases. I don't see how to do that here.
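The str-or-list burden described above means that every consumer of annotations needs a type check along these lines (an illustrative sketch, not code from this PR):

```python
def get_annotation_values(annotation, key):
    """Return the values for one annotation key as a list, regardless of
    whether the stored value is a bare string or already a list."""
    value = annotation.get(key, [])
    if isinstance(value, str):  # single annotation stored as a string
        return [value]
    return list(value)
```

Always storing lists would make this defensive check unnecessary, at the cost of breaking the existing API.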

@cdiener (Member) commented Jun 11, 2022

Because by changing the test models to be more convenient for you, you don't change the fact that annotations are still allowed to be strings or lists for single elements. You just remove the examples where they are strings from the test suite, which then covers fewer data formats. We need to remain backwards compatible with old models in JSON or YAML format, and therefore that case still has to remain supported and tested until we move to a new annotation format. But like I said, it also comes down to only modifying the test models if they are actually wrong, not because it's more convenient for writing tests.

@cdiener (Member) commented Jun 11, 2022

To be clear, my comments are only aimed at the JSON test models you are changing. In the mat parsing you are free to return them as you wish. Also, why are you changing almost all the other test models in this PR (like the pickle and SBML ones)? What do those have to do with the MATLAB interface?

@akaviaLab (Contributor, Author) commented Jun 11, 2022 via email

@codecov-commenter commented Jun 19, 2022

Codecov Report

Merging #1220 (ee4842b) into devel (e838e54) will decrease coverage by 0.25%.
The diff coverage is 66.25%.

@@            Coverage Diff             @@
##            devel    #1220      +/-   ##
==========================================
- Coverage   84.59%   84.33%   -0.26%     
==========================================
  Files          66       66              
  Lines        5491     5509      +18     
  Branches     1264     1268       +4     
==========================================
+ Hits         4645     4646       +1     
- Misses        545      557      +12     
- Partials      301      306       +5     
Impacted Files Coverage Δ
src/cobra/io/mat.py 79.37% <63.76%> (-2.92%) ⬇️
src/cobra/io/sbml.py 80.14% <81.81%> (-0.32%) ⬇️
src/cobra/flux_analysis/__init__.py 100.00% <0.00%> (ø)
src/cobra/flux_analysis/deletion.py 89.47% <0.00%> (ø)

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e838e54...ee4842b. Read the comment docs.

@cdiener (Member) commented Jun 27, 2022

Okay, I'm on board with everything except for the following points, which I'd like to discuss:

  • read annotations into list (other PR or not at all)

In general that makes sense; however, it would be an API-breaking change, so we would need to release a new version for it, and it would not be backwards compatible. Since there is another PR lined up that will add a new annotation format and will need a new version as well, I would skip it in favour of the FBC3 PR.

  • read group back into subsystem to keep read/write consistent, also since
    mat doesn't understand groups, only subsystem (not sure where this should
    be, if at all)

Subsystems are deprecated in favor of groups and should only be read from the formats that still have them, like legacy SBML and MATLAB. So I would not convert them here. This information would be dropped in my book, since it's data in groups that the MATLAB format does not support. I get that this breaks roundtripping, and others may think differently. My worry is what would happen if I have a "subsystem" group that uses advanced features like nested groups. How should that act?

And the unmentioned changes in mini.json: if mini.json passes the JSON schema validator in this version, it should not be changed.

@akaviaLab (Contributor, Author) commented Jun 28, 2022

Should everything else that isn't mat go in another PR, or in this one? Like the changes to the annotations to make them more identifiers.org compliant?
The list change will be saved for the appropriate metadata PR.

@akaviaLab (Contributor, Author)

> what would happen if I have "subsystem" group that uses advanced features like nested groups. How should that act?

Groups are only exported for reactions. If a reaction has both the "subsystem" attribute and groups, "subsystem" wins and is exported, and "groups" is dropped.
If a reaction belongs to a group, that reaction is exported with its subsystem set to the group id/name. Any other entities in the group are not exported.

So if you have something like
Group1 = [rxn1, rxn2, Group2]
Group2 = [rxn3, met1]

The subsystems when exported would look like
rxn1 - Group1
rxn2 - Group1
rxn3 - Group2
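The flattening above can be sketched in plain Python with made-up data structures (id strings rather than cobra's Group API; the helper name is illustrative):

```python
def groups_to_subsystems(groups, reaction_ids):
    """Flatten groups into a reaction -> subsystem mapping for export.

    Only reaction members are kept; nested groups and other entities
    (e.g. metabolites) are dropped, since the mat format only supports
    a single subsystem string per reaction.
    """
    subsystems = {}
    for group_name, members in groups.items():
        for member in members:
            if member in reaction_ids:
                subsystems[member] = group_name
    return subsystems


groups = {
    "Group1": ["rxn1", "rxn2", "Group2"],
    "Group2": ["rxn3", "met1"],
}
exported = groups_to_subsystems(groups, {"rxn1", "rxn2", "rxn3"})
# exported == {"rxn1": "Group1", "rxn2": "Group1", "rxn3": "Group2"}
```

Note that "Group2" as a member of "Group1" and the metabolite "met1" are silently dropped, which matches the export behavior described above.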

@cdiener (Member) commented Jun 28, 2022

I think in general it's easier to review several smaller PRs than one huge one. However, since we have already discussed a lot of the things here, let's just continue as it is. If you want, just use a more descriptive title.

Sorry I misunderstood the subsystem thing. Your solution makes sense.

@akaviaLab (Contributor, Author)

Okay. So I think everything discussed is done.
The annotations-into-list change should go in the metadata PR.

@cdiener (Member) left a review:

Looks good to me. Some minor comments and a doubt about duplicated model files.

"ncbigi": [
"GI:1208453",
"GI:1652654"
"ncbiprotein": [
@cdiener (Member):

What's the difference between the mini.json in src/cobra/data and the one in tests/data? Couldn't the tests just use the first one?

@akaviaLab (Contributor, Author):

None.
The tests use the one in tests, because that's how the functions are designed. The ones in src/cobra/data are example files for users who install cobra without the development setup and without the tests.
If we want the tests to rely on the files in src/cobra/data, we can rewrite the tests and update_pickles.py, which I'd be happy to do in a different PR.

@cdiener (Member):

That sounds good to me 👍

src/cobra/io/mat.py: three outdated review threads (resolved)
@@ -856,7 +903,9 @@ def from_mat_struct(
     rxn_group_names = set(rxn_subsystems).difference({None})
     new_groups = []
     for g_name in sorted(rxn_group_names):
-        group_members = model.reactions.query(lambda x: x.subsystem == g_name)
+        group_members = model.reactions.query(
+            lambda x, g_n=g_name: x.subsystem == g_n
+        )
@cdiener (Member):

I don't think it makes sense to define default values for a lambda function. The previous one looks better to me.

@akaviaLab (Contributor, Author):

Okay. My linter complained about it. Reverted.
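For context, the linter warning here is presumably about the classic late-binding pitfall for closures created in a loop (flake8-bugbear flags this as B023); the default-argument form binds the current value instead. A small self-contained illustration:

```python
# Late binding: each lambda looks up `name` when called, so all of
# them see the loop variable's final value.
late = [lambda: name for name in ["A", "B"]]
assert [f() for f in late] == ["B", "B"]

# Default-argument binding: `n=name` is evaluated when each lambda is
# created, so every closure keeps its own value.
bound = [lambda n=name: n for name in ["A", "B"]]
assert [f() for f in bound] == ["A", "B"]
```

In the PR's loop the lambda is passed to `query` and (presumably) called within the same iteration, so late binding is harmless there, which is why reverting to the simpler form is safe.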

@@ -598,7 +598,7 @@ def _sbml_to_model(
     if not libsbml.SyntaxChecker.isValidSBMLSId(model_id):
         LOGGER.error(f"'{model_id}' is not a valid SBML 'SId'.")
     cobra_model = Model(model_id)
-    cobra_model.name = model.getName()
+    cobra_model.name = model.getName() or None
@cdiener (Member):

Above, you marked SBML returning '' as the model name by default as a TODO, but it seems like this fixes that, right?

@akaviaLab (Contributor, Author):

Yes. This does fix it. I can remove the TODO.
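The `or None` idiom in the diff above works because an empty string is falsy in Python, so the unset-name sentinel ('' in the review comment above) is mapped to None while any real name passes through. A quick illustration of the semantics, independent of libsbml:

```python
def name_or_none(name):
    """Map the empty-string 'no name' sentinel to None, keeping any
    non-empty name unchanged."""
    return name or None
```

For example, `name_or_none("")` is None while `name_or_none("e_coli_core")` is "e_coli_core".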

@akaviaLab (Contributor, Author):

Removed TODO

@cdiener (Member) left a review:

Looks good. I'll merge after the typo is fixed.

release-notes/next-release.md: outdated review thread (resolved)
@cdiener (Member) commented Jul 5, 2022

Awesome, thanks so much for your patience!

@cdiener cdiener merged commit 42af7e9 into opencobra:devel Jul 5, 2022