Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add find_active_reactions.py to find active reactions quicker #841

Open
wants to merge 18 commits into
base: devel
Choose a base branch
from

Conversation

shjchan
Copy link

@shjchan shjchan commented Apr 19, 2019

Add find_active_reactions.py and a test_find_active_reactions.py.
Had some discussion with @ChristianLieven and @Midnighter before. Said it might be nice to add these functions for finding active reactions (primarily for finding reactions in loops) which solve an MILP or a few LPs instead of doing FVA. From tests on a few different models, comparing to using cobra.flux_analysis.find_blocked_reactions.py, it can be 3 - 5x faster. Would be good to suggest ways to streamline the code and improve speed. I have been using cobrapy from time to time but not very into its infrastructure. The test may need changes too.

@codecov-io
Copy link

codecov-io commented Apr 19, 2019

Codecov Report

Merging #841 into devel will decrease coverage by 0.46%.
The diff coverage is 75.86%.

Impacted file tree graph

@@            Coverage Diff             @@
##            devel     #841      +/-   ##
==========================================
- Coverage   84.39%   83.93%   -0.47%     
==========================================
  Files          47       48       +1     
  Lines        4205     4433     +228     
  Branches      978     1044      +66     
==========================================
+ Hits         3549     3721     +172     
- Misses        423      458      +35     
- Partials      233      254      +21
Impacted Files Coverage Δ
cobra/flux_analysis/helpers.py 66.66% <100%> (+16.66%) ⬆️
cobra/flux_analysis/loopless.py 78.43% <67.1%> (-11.7%) ⬇️
cobra/flux_analysis/find_active_reactions.py 79.6% <79.6%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9d1987c...67f2863. Read the comment docs.

@cdiener
Copy link
Member

cdiener commented Apr 19, 2019

It seems that you describe a full loop-removal procedure in the paper. If that does not have the same drawbacks as CycleFreeFlux it may be worthwhile to switch to yours. Could you use it for loopless FVA as well?

Copy link
Member

@cdiener cdiener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good style, some minor initial comments for now 😄


def find_active_reactions(model, bigM=10000, zero_cutoff=None,
relax_bounds=True, solve="lp", verbose=False):
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than using a verbose flag and print, use the logging module as done in other modules and output the additional messages with INFO or DEBUG. That allows user to set the verbosity globally rather than individually for each function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the docstring is not in the right format. Needs a small description right after """".

else:
eps = normalize_cutoff(model, zero_cutoff)

eps = max(eps, model.solver.configuration.tolerances.feasibility * 100)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may still be good to use `model.tolerance' here. Not all tolerances (integrality, feasibility, optimality) are implemented for all solvers (for instance GLPK has no optimality tolerance, they use different measures).



def find_reactions_in_cycles(model, bigM=10000, zero_cutoff=1e-1,
relax_bounds=True, solve="lp", verbose=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it is exported it should get a docstring.

return active_rxns


def find_reactions_in_cycles(model, bigM=10000, zero_cutoff=1e-1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does that zero_cutoff not default to model.tolerance as before?

large_model.solver = all_solvers
# solve LPs
rxns_in_cycles_lp = find_reactions_in_cycles(large_model)
# solving MILP or Fast-SNP may not be feasible for some solvers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make tests conditional on the available solvers. We can help you with that or you could look through some of the other tests for some examples.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you refer me to one of the examples? Have made other changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all_solvers is a fixture that will test with all available solvers. Should be good as is.

@shjchan
Copy link
Author

shjchan commented Apr 19, 2019

It seems that you describe a full loop-removal procedure in the paper. If that does not have the same drawbacks as CycleFreeFlux it may be worthwhile to switch to yours. Could you use it for loopless FVA as well?

From what I understand, CycleFreeFlux is a post-optimization loop-removal procedure, which does not guarantee optimality. I guess it is particularly true for the reactions in loops. The method in my paper guarantee optimality as the original loop-law constraints and usually with much less binary variables, thus faster. The functions here are only for the first step of the whole localized-loopless-FVA procedure, i.e., to find a minimal feasible null space. The rest of the implementation will need some significant coding, which I hope I can do some time later.

@cdiener cdiener added the WIP work in progress label Apr 19, 2019
@shjchan
Copy link
Author

shjchan commented Apr 20, 2019

Any idea why the test is stuck at test_fastcc.py? I remember that when I just made the pull request, it ran through it.

Copy link
Member

@Midnighter Midnighter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't have the time yet to go through the entire code. Out of curiosity, how is your algorithm affected by (how does it treat) reactions that are unbounded in one direction? Are they then also intended to be bounded by big M?

cobra/flux_analysis/find_active_reactions.py Outdated Show resolved Hide resolved

Notes
-----
The optmization problem solved is as follow:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be really neat to use the Sphinx directives so that this is rendered as proper math. See here.

@shjchan
Copy link
Author

shjchan commented Apr 20, 2019

I didn't have the time yet to go through the entire code. Out of curiosity, how is your algorithm affected by (how does it treat) reactions that are unbounded in one direction? Are they then also intended to be bounded by big M?

They are bounded by either big M (if relax_bounds=True) or the original bounds (if relax_bounds=False). The additional constraints for irreversible reactions are actually similar to fastcc while for the reversible reactions are a little different.

@shjchan
Copy link
Author

shjchan commented Apr 21, 2019

It seems that you describe a full loop-removal procedure in the paper. If that does not have the same drawbacks as CycleFreeFlux it may be worthwhile to switch to yours. Could you use it for loopless FVA as well?

From what I understand, CycleFreeFlux is a post-optimization loop-removal procedure, which does not guarantee optimality. I guess it is particularly true for the reactions in loops. The method in my paper guarantee optimality as the original loop-law constraints and usually with much less binary variables, thus faster. The functions here are only for the first step of the whole localized-loopless-FVA procedure, i.e., to find a minimal feasible null space. The rest of the implementation will need some significant coding, which I hope I can do some time later.

Good news: the Fast-SNP which I also implemented in this pull request, can be readily used to largely reduce the time for loopless FVA already. I reorganized the newly added functions and added a method options to flux_variability_analysis (just a few lines). Tested on E. coli core model, results exactly the same. Tested on some reactions in iJO1366, around 40x time reduction (16xx sec --> 40 sec).

@cdiener
Copy link
Member

cdiener commented Apr 23, 2019

Good news: the Fast-SNP which I also implemented in this pull request, can be readily used to largely reduce the time for loopless FVA already. I reorganized the newly added functions and added a method options to flux_variability_analysis (just a few lines). Tested on E. coli core model, results exactly the same. Tested on some reactions in iJO1366, around 40x time reduction (16xx sec --> 40 sec).

This sounds great. If I find the time I will give it a go with the problematic example from #698. If it handles that well I think the way to go would be to drop CycleFreeFlux in favor of your method.

@shjchan
Copy link
Author

shjchan commented Apr 23, 2019

@cdiener great. I am pretty sure that it will work fine. By the way, if you have time, could you advise me what goes wrong in the test? It just froze after test_fastcc. I couldn't figure why. I don't think I have changed anything related to that. And I could finish the run if I ran pytest locally.

Copy link
Member

@Midnighter Midnighter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to read your paper in detail but made some comments on the code already. Looks promising 😃

switch_constrs.append(prob.Constraint(r.flux_expression +
coeff_neg * z_neg[r], ub=-eps))

model.add_cons_vars([z for r, z in z_pos.items()])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
model.add_cons_vars([z for r, z in z_pos.items()])
model.add_cons_vars([z for z in z_pos.values()])

coeff_neg * z_neg[r], ub=-eps))

model.add_cons_vars([z for r, z in z_pos.items()])
model.add_cons_vars([z for r, z in z_neg.items()])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
model.add_cons_vars([z for r, z in z_neg.items()])
model.add_cons_vars([z for z in z_neg.values()])

model.add_cons_vars([z for r, z in z_neg.items()])
model.add_cons_vars(switch_constrs)
model.objective = prob.Objective(Zero, sloppy=True, direction="min")
model.objective.set_linear_coefficients({z: 1.0 for r, z in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above you can use the dict.values method.

model.objective = prob.Objective(Zero, sloppy=True, direction="min")
model.objective.set_linear_coefficients({z: 1.0 for r, z in
z_pos.items()})
model.objective.set_linear_coefficients({z: 1.0 for r, z in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

if solve == "milp":
try:
# ensure bigM*z << eps at integrality tolerance limit
eps = max(eps, model.solver.configuration.tolerances.integrality *
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
eps = max(eps, model.solver.configuration.tolerances.integrality *
eps = max(eps, model.tolerance *

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. But here it is important for eps to be at least 10x larger then the integrality tolerance if solving MILP. Otherwise there could be integrality violation to satisfy the constraint so that some reactions do not necessarily have flux.
And I was not sure if it is always defined for every solver (i.e. model.configurationas.tolerances.integrality is always an attribute), so I added try/except here hoping to see if that might help pass the test. But it was probably not the issue and is not needed, correct? Is model.tolerance not less than integrality tolerance if it is used? Any suggestion? Just want to make sure it is large enough when solving MILP and when integrality is used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. We have simplified the tolerance interface in cobrapy. So setting the model.tolerance attempts to set all available tolerances given by the solver. Thus it is also expected that the model.tolerance value is the same one for integrality, feasiblity, etc. So yes, using model.tolerance should do the right thing and also not raise any exceptions.

{r.reverse_variable: 1.0 for r in model.reactions})

iter = 0
while True:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a potentially infinite loop. It'd be safer to introduce a function parameter max_iterations (or similar name) with a default value and then run a loop instead.

Suggested change
while True:
for i in range(max_iterations):

sol = model.optimize()

if sol.status == "optimal":
x = sol.fluxes.to_numpy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the following also works rather than making a copy?

Suggested change
x = sol.fluxes.to_numpy()
x = sol.fluxes.values

sol = model.optimize()

if sol.status == "optimal":
y = sol.fluxes.to_numpy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

constr_proj.ub = bigM
constr_proj.lb = eps
constr_proj.ub = None
sol = model.optimize()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

model.optimize() may return an OptimizationError, should that be handled?

large_model.solver = all_solvers
# solve LPs
rxns_in_cycles_lp = find_reactions_in_cycles(large_model)
# solving MILP or Fast-SNP may not be feasible for some solvers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all_solvers is a fixture that will test with all available solvers. Should be good as is.

@shjchan shjchan closed this May 1, 2019
@shjchan shjchan reopened this May 1, 2019
@shjchan
Copy link
Author

shjchan commented May 1, 2019

I think I have addressed all comments and have restructured the code so that it is more pythonic.

@cdiener
Copy link
Member

cdiener commented Jun 21, 2019

Sorry for the delay, just saw it and will go through it again. Will probably take me a little since it is a lot of code changes. From a quick look I don't see any changes to flux_variability_analysis as mentioned above and we will need more tests.

@cdiener cdiener self-assigned this Jun 21, 2019
@cdiener cdiener added ready Finished PR that requires review and merge. and removed WIP work in progress labels Jun 21, 2019
Copy link
Member

@cdiener cdiener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long delay. Most of it being compliant with pep8 and numpy docstring format as well as rebasing on the latest devel branch. However, the method also does not seem to pass its own tests. Also the tests should also compare your implementation to the reference implementation (method="original" in add_loopless) for a smaller model.

Some functionality seems to overlap with the already existing fastcc function. In particular the find_active_reactions function. It seems to be very similar things.

However, in_cycles version looks very useful and it would be great to add it to FVA. Is it guranteed to remove all cycles from the solution?

I can help out with fixing the the style issues if you want but would leave the methods to you if that is all right.



def fastSNP(model, bigM=1e4, zero_cutoff=None, eps=1e-3, N=None):
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should follow the numpy docstring format. So it needs a short imperative description after """.

x, y = None, None
try:
sol = model.optimize()
except OptimizationError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OptimizationError has not been imported. Also the handling does not seem to be correct. If there is no feasible solution you should probably raise an error as well.

@@ -0,0 +1,486 @@

"""
Find all active reactions by solving a single MILP problem
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to specify what "active" means. There is already a find_blocked_reactions but this module does not seem to do the opposite (identifying any reactions that can carry flux).

LOGGER = logging.getLogger(__name__)


def find_active_reactions(model, bigM=10000, zero_cutoff=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming might be confusing. Maybe find_loopless_reactions or something.


def find_active_reactions(model, bigM=10000, zero_cutoff=None,
relax_bounds=True, solve="lp", verbose=False):
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the docstring is not in the right format. Needs a small description right after """".

model.add_cons_vars(constr_min_flux)

n_lp_solved = 3
feas_tol = model.tolerance
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable is never used.

constr_min_flux.lb = -min_flux
constr_min_flux.ub = -min_flux
constr_min_flux.lb = None
feas_tol = model.tolerance
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable is never used.

constr_proj.lb = None
try:
sol = model.optimize()
except OptimizationError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@@ -249,3 +270,156 @@ def loopless_fva_iter(model, reaction, solution=False, zero_cutoff=None):
best = reaction.flux
model.objective.direction = objective_dir
return best


def fastSNP(model, bigM=1e4, zero_cutoff=None, eps=1e-3, N=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should also live in find_active_reactions.py.

benchmark(find_reactions_in_cycles, model)


def test_find_active_reactions(model, all_solvers):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test fails for me with all solvers:

def test_find_reactions_in_cycles(large_model, all_solvers):
        """Test find_reactions_in_cycles."""
    
        large_model.solver = all_solvers
        # solve LPs
        rxns_in_cycles_lp = find_reactions_in_cycles(large_model)
    
        rxns_in_cycles = ['ABUTt2pp', 'ACCOAL', 'ACKr', 'ACS', 'ACt2rpp',
                          'ACt4pp', 'ADK1', 'ADK3', 'ADNt2pp', 'ADNt2rpp',
                          'ALATA_L', 'ALAt2pp', 'ALAt2rpp', 'ALAt4pp', 'ASPt2pp',
                          'ASPt2rpp', 'CA2t3pp', 'CAt6pp', 'CRNDt2rpp',
                          'CRNt2rpp', 'CRNt8pp', 'CYTDt2pp', 'CYTDt2rpp',
                          'FOMETRi', 'GLBRAN2', 'GLCP', 'GLCP2', 'GLCS1',
                          'GLCtex', 'GLCtexi', 'GLDBRAN2', 'GLGC', 'GLUABUTt7pp',
                          'GLUt2rpp', 'GLUt4pp', 'GLYCLTt2rpp', 'GLYCLTt4pp',
                          'GLYt2pp', 'GLYt2rpp', 'GLYt4pp', 'HPYRI', 'HPYRRx',
                          'ICHORS', 'ICHORSi', 'INDOLEt2pp', 'INDOLEt2rpp',
                          'INSt2pp', 'INSt2rpp', 'NAt3pp', 'NDPK1', 'PPAKr',
                          'PPCSCT', 'PPKr', 'PPM', 'PROt2rpp', 'PROt4pp',
                          'PRPPS', 'PTA2', 'PTAr', 'R15BPK', 'R1PK', 'SERt2rpp',
                          'SERt4pp', 'SUCOAS', 'THFAT', 'THMDt2pp', 'THMDt2rpp',
                          'THRt2rpp', 'THRt4pp', 'TRSARr', 'URAt2pp', 'URAt2rpp',
                          'URIt2pp', 'URIt2rpp', 'VALTA', 'VPAMTr']
    
>       assert set(rxns_in_cycles_lp) == set(rxns_in_cycles)
E       AssertionError: assert {'ABUTt2pp', ...'ACt4pp', ...} == {'ABUTt2pp', '...'ACt4pp', ...}
E         Extra items in the left set:
E         'ICHORS_copy1'
E         'INSt2pp_copy2'
E         'ALAt2pp_copy1'
E         'ICHORS_copy2'
E         'GLYt2pp_copy2'
E         'GLCtex_copy2'...
E         
E         ...Full output truncated (43 lines hidden), use '-vv' to show

@cdiener cdiener removed the ready Finished PR that requires review and merge. label Aug 4, 2019
@cdiener cdiener added the WIP work in progress label Mar 13, 2020
@Midnighter Midnighter added the stale The issue or pull request lacks activity. label Jul 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale The issue or pull request lacks activity. WIP work in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants