Skip to content

Support exponential sampling from the space of steady states of a metabolic network

Vissarion Fisikopoulos edited this page Mar 27, 2023 · 4 revisions

Overview

The project will expose, from volesti to the Python package dingo, two Markov Chain Monte Carlo sampling methods to sample from the exponential multivariate distribution over a convex polytope.
The methods are:

  1. NUTS Reflective Hamiltonian Monte Carlo (HMC) [1],
  2. Reflective Exact HMC [2],
  3. Riemannian Hamiltonian Monte Carlo [4].

The methods are implemented in C++ and already integrated in volesti codebase.

The contributor will also run experiments with several metabolic networks and compare the results against the case of uniform sampling.

Related work

In constraint-based metabolic modelling, physical and biochemical constraints define a polyhedral convex set of feasible flux vectors. Uniform sampling of this set provides an unbiased characterization of the metabolic capabilities of a biochemical network. The corresponding polyhedra typically lie in hundreds or thousands of dimensions.

To sample uniformly from a convex polytope, dingo uses Multiphase Monte Carlo Sampling method based on Billiard walk [3].
HMC is considered as the most efficient method to sample from general log-concave distributions over a convex polytope.

Exponential sampling from the set of flux vectors will allow for new biological insights.

For more information contact the mentors.

References:

[1] Matthew D. Hoffman, Andrew Gelman, The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo, 2014.
[2] Ari Pakman, Liam Paninski, Exact Hamiltonian Monte Carlo for Truncated Multivariate Gaussians, 2013.
[3] Apostolos Chalkis, Vissarion Fisikopoulos, Elias Tsigaridas, Haris Zafeiropoulos, Geometric algorithms for sampling the flux space of metabolic networks, 2021. [4] Yunbum Kook, Yin Tat Lee, Ruoqi Shen, Santosh S. Vempala - Sampling with Riemannian Hamiltonian Monte Carlo in a Constrained Space 2022

Details of your coding project

The contributor will have to extend the C++ bindings and Python wrappers of the dingo to expose the random walks mentioned above.
Then she/he will have to run a few experiments on benchmark metabolic networks using the new random walks and the MMCS method in dingo and write a brief report with the results.

Difficulty: Medium

Expected impact

The project will provide new biological insights. Thus, it is a very important project for GeomScale Org.

Mentors

  • Vissarion Fisikopoulos <vissarion.fisikopoulos at gmail.com> is an international expert in mathematical software, computational geometry and optimization, and has previous GSOC mentoring experience with Boost C++ libraries (2016-2017) and the R-project (2017).

  • Elias Tsigaridas <elias.tsigaridas at inria.fr> is an expert in computational nonlinear algebra and geometry with experience in mathematical software. He has contributed to the implementation, in C and C++, of several solving algorithms for various open source computer algebra libraries and has previous GSOC mentoring experience with the R-project (2019).

Students, please contact the first and the third mentor after completing at least one of the tests below.

Tests

Students, please do one or more of the following tests before contacting the mentors above.

  • Easy: compile and run dingo. Use the documentation to sample from the flux space of e_coli.
  • Medium: Generate a random polytope in Python and use dingo to sample from it using MMCS.
  • Hard: Use HMC and NUTS HMC for exponential distribution in volesti (C++ implementations) to sample from Birkhoff and skinny polytopes (use the C++ polytope generators in volesti) for dimension d <= 200 and report on the run-times (implement this test in C++).

Solutions of tests

Students, please post a link to your test results here.

  • EXAMPLE STUDENT 1 NAME, LINK TO GITHUB PROFILE, LINK TO TEST RESULTS.