"SOLVE kinetic STEADYSTATE sparse" is not solving for steady state. #1394

Open · nrnhines opened this issue Aug 10, 2024 · 13 comments
nrnhines (Collaborator) commented Aug 10, 2024

It appears that the implementation contains terms involving nt->dt.
nocmodl makes use of _ss_sparse_thread, which iteratively solves a nonlinear system using auto const ss_dt = 1e9;
(ideally the solution should use the conservation constraints in place of an equal number of ODEs, to avoid trying to solve a singular matrix, but that is not directly relevant to this issue). This issue was observed when comparing NMODL and nocmodl results for nrn/share/examples/nrniv/nmodl/capmp.mod when launching nrngui capmp.hoc.
The left column is for NMODL, the right is for nocmodl. (Note: I'm using NMODL branch 1uc/fix-cadifpmp-shift.)
The top row shows calcium concentration as a function of time for three cell diameters. In every case the intention was to initialize the concentration to 0.01 mM (single compartment). The NMODL 100 diam (black line) case calculated an initial steady state of approximately 0.0076 instead of the correct 0.01.

[figure: top row, cai vs. t for three diameters (NMODL left column, nocmodl right); bottom row, steady-state calcium current vs. constant calcium concentration]

Ignore the bottom row. I'm showing it only because the missing beginning and end of the steady-state calcium current vs. constant calcium concentration curve for the NMODL simulation is a puzzle (but likely not relevant to the initialization issue for the large-diameter case).
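As an aside on the conservation-constraint remark above: a minimal numpy sketch, using a hypothetical two-state scheme ~ A <-> B (kf, kb) rather than capmp.mod itself, of why the raw steady-state system is singular and how substituting a conservation law restores full rank:

import numpy as np

# At steady state: -kf*A + kb*B = 0 and kf*A - kb*B = 0.
# The two rows are negatives of each other, so the matrix is singular.
kf, kb, total = 3.0, 4.0, 2.0

J_ode = np.array([[-kf,  kb],
                  [ kf, -kb]])
print(np.linalg.matrix_rank(J_ode))    # 1: rank-deficient

# Replace one ODE row with the conservation law A + B = total.
J_fixed = np.array([[-kf,  kb],
                    [1.0,  1.0]])
print(np.linalg.solve(J_fixed, np.array([0.0, total])))  # unique steady state [8/7, 6/7]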

I believe the steady-state initialization issue is due to using the value of nt->dt in the steady-state equations of the NMODL-generated capmp.cpp file. Evidence for this:

oc>diam = 100
oc>init()
oc>cai
	0.0076215222 
oc>dt
	0.025 
oc>dt=10
oc>init()
oc>cai
	0.0099921471 

but my naive expectations are dashed with

oc>dt=1e9
oc>init()
special: /home/hines/neuron/temp/share/examples/nrniv/nmodl/x86_64/capmp.cpp:824: void neuron::nrn_init_capr(const {anonymous}::_nrn_model_sorted_token&, NrnThread*, Memb_list*, int): Assertion `false && "Newton solver did not converge!"' failed.
Aborted (core dumped)
1uc (Collaborator) commented Aug 12, 2024

Thank you, I can reproduce the issue and I'll start debugging now. STEADYSTATE is a very good next keyword to work on.

1uc (Collaborator) commented Aug 13, 2024

I'm using the following simplified MOD file:

NEURON {
  SUFFIX minipump
}

PARAMETER {
  volA = 1e9
  volB = 1e9
  volC = 13.0
  kf = 3.0
  kb = 4.0
}

STATE {
  X
  Y
  Z
}

INITIAL {
  X = 40.0
  Y = 2.0
  Z = 0.0

  SOLVE state STEADYSTATE sparse
}

: Required b/c of a bug in NMODL.
BREAKPOINT {
  SOLVE state METHOD sparse
}

KINETIC state {
  COMPARTMENT volA {X}
  COMPARTMENT volB {Y}
  COMPARTMENT volC {Z}

  ~ X + Y <-> Z (kf, kb)
  CONSERVE Y + Z = 2.0
}

The equation for X in NOCMODL is:

volA * (X - X_old)/dt = - X*Y * kf + Z *kb 

In NMODL we divide the above by volA. For small values of vol? the difference doesn't matter, but for volA = 1e9 we suddenly start seeing failures to converge (to the correct solution).
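A minimal sketch of the scale difference, reusing the minipump numbers above (illustrative values only, not the generated code):

volA, kf, kb = 1e9, 3.0, 4.0
X, Y, Z = 40.0, 2.0, 0.0      # initial state, far from steady state
X_old, dt = X, 1e9

flux = -X * Y * kf + Z * kb   # -240.0

print(volA * (X - X_old) / dt - flux)   # NOCMODL residual: ~240, clearly not converged
print((X - X_old) / dt - flux / volA)   # NMODL residual: ~2.4e-7, nine orders smaller

Dividing by volA compresses the entire residual toward the round-off floor, which plausibly explains why a fixed tolerance starts misbehaving once volA is large.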

I'll create a PR that adopts the battle-tested scaling factors used in NOCMODL.

1uc (Collaborator) commented Aug 13, 2024

However, it's not enough to make capmp.mod work as expected.

1uc (Collaborator) commented Aug 16, 2024

I think it comes down to these three things:

  • scaling of the equations,
  • the tolerances,
  • one case of cancellation magic.

Scaling ODE terms:

volA * (X - X_old)/dt = - X*Y * kf + Z * kb    # NOCMODL
(X - X_old)/dt = (- X*Y * kf + Z * kb)/volA    # NMODL

Scaling CONSERVE terms:

volB*Y + volC*Z = const     # NOCMODL
Z = (volB*Y - const)/volC   # NMODL

The tolerances involved are 1e-5 for NOCMODL vs. 1e-12 for NMODL. (But since the equations are scaled differently, it's hard to compare the two numbers.) Note that NOCMODL uses an l1 error while NMODL uses an l2 error.

Finally, there's the issue that the CONSERVE term has very favourable cancellation properties in NOCMODL:

volB*Y + volC*Z - const

Assume volB >> volC, Y ~= Z, and const = volB*Y(t=0) + volC*Z(t=0). Then, in floating point (once volC*Z drops below the rounding level of volB*Y), we get:

volB*Y + volC*Z == volB*Y

Hence,

(volB*Y + volC*Z) - const  == 0.0 # NOCMODL
(volB*Y - const) + volC*Z == volC*Z # NMODL

Then, depending on the size of volC*Z, this can be large compared to the tolerance of 1e-5 (simply because volC is big, but volB is much bigger).
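A runnable illustration of that algebra, with hypothetical values chosen so that volC*Z falls below one ulp of volB*Y:

volB, volC = 1e9, 13.0
Y, Z = 2.0, 1e-9
const = volB * Y + volC * Z            # rounds to exactly volB*Y == 2e9

print((volB * Y + volC * Z) - const)   # 0.0     (NOCMODL ordering)
print((volB * Y - const) + volC * Z)   # 1.3e-08 (NMODL ordering)

The NMODL ordering leaves a residual of volC*Z even though the state satisfies the conservation law exactly, and that residual dwarfs a 1e-12 tolerance.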

nrnhines (Collaborator, Author) commented

> Note that NOCMODL uses an l1 error while NMODL uses an l2 error.

I'm not familiar with the l1 and l2 error terminology. What do those mean?
Very sorry that you needed to get so deeply into these numerical error analysis weeds.

1uc (Collaborator) commented Aug 16, 2024

Using numpy notation:

l1_err = np.sum(np.abs(x))
l2_err = np.sum(x**2)**0.5

(It's unlikely to be a really important part of the discussion.)

Might as well make it a complete answer: mathematicians like to unify, and here it's the exponent that one can replace with a symbol (p is the favourite):

lp_error = np.sum(np.abs(x)**p)**(1.0/p)

With a little abuse of notation one can allow p = inf:

linf_error = np.max(np.abs(x))
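For concreteness, the three norms on a sample vector:

import numpy as np

x = np.array([3.0, -4.0])
print(np.sum(np.abs(x)))     # l1   = 7.0
print(np.sum(x**2) ** 0.5)   # l2   = 5.0
print(np.max(np.abs(x)))     # linf = 4.0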

1uc (Collaborator) commented Aug 16, 2024

Since we have access to the Jacobian anyway, we can (somewhat) inexpensively do Newton error propagation:
https://en.wikipedia.org/wiki/Propagation_of_uncertainty#Simplification

The idea is that it should give us an estimate of how big we'd expect round-off errors to be. Once we know that, we can pick a tolerance, e.g. 100.0 "newton errors". The good thing is that the estimate is per element of F: we simply require each element of F to have converged.
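A sketch of how such a check might look, based on my reading of the linked simplification; this is not NMODL's actual code, and newton_error_estimate/converged are hypothetical names:

import numpy as np

def newton_error_estimate(J, x):
    # Propagate an assumed per-component round-off eps*|x_j| through the
    # Jacobian: Delta F_i = sqrt(sum_j J_ij^2 * (eps*|x_j|)^2).
    eps = np.finfo(float).eps
    dx = eps * np.abs(x)
    return np.sqrt((J ** 2) @ (dx ** 2))

def converged(F, J, x, safety=100.0):
    # Require each element of F to sit within `safety` "newton errors".
    return np.all(np.abs(F) <= safety * newton_error_estimate(J, x))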

1uc (Collaborator) commented Aug 16, 2024

It requires two fixes:

  1. Set dt = 1e9 before computing the steady state.
  2. Use Newton error propagation to estimate how large roundoff error in F are.

If we do both, we get the plot below. (I have only tested this example and the minipump.mod posted above.)

[screenshot: the resulting plot with both fixes applied]

nrnhines (Collaborator, Author) commented Aug 16, 2024

Looks good. (I wonder where my above Assertion `false && "Newton solver did not converge!"' came from in response to oc>dt=1e9? But that's just idle curiosity.)

1uc (Collaborator) commented Aug 16, 2024

I don't see a segfault, only a very long error message leading to an assert followed by a core dump:

special: /home/hines/neuron/temp/share/examples/nrniv/nmodl/x86_64/capmp.cpp:824: void neuron::nrn_init_capr(const {anonymous}::_nrn_model_sorted_token&, NrnThread*, Memb_list*, int): Assertion `false && "Newton solver did not converge!"' failed.

I'm happy to follow up on the SEGFAULT if it doesn't go away after this has been cleaned up and merged.

nrnhines (Collaborator, Author) commented

Sorry. Changed the comment. There was no segfault.

1uc (Collaborator) commented Aug 19, 2024

That's plausibly explained by something I didn't mention in the summary. One more difference is how F scales with dt:

(X - X_old)/dt = dX    # NOCMODL
X = X_old + dt*dX      # NMODL

meaning the residual would grow linearly with dt. Hence, it would make sense that it starts failing if one pushes dt to very large values.
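A toy illustration with hypothetical numbers: the same small mismatch between X and X_old yields a bounded residual in the first formulation, but one that grows linearly with dt in the second.

X_old, dX = 1.0, 1e-6
X = X_old + 0.5e-6            # iterate close to, but not exactly at, the solution

for dt in (0.025, 10.0, 1e9):
    res_nocmodl = (X - X_old) / dt - dX   # bounded as dt grows
    res_nmodl = X - (X_old + dt * dX)     # magnitude grows ~linearly with dt
    print(dt, res_nocmodl, res_nmodl)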

1uc (Collaborator) commented Aug 19, 2024

The changes have been implemented here: #1403.
