Build direct interface #57

Open
metab0t opened this issue May 20, 2024 · 13 comments
Labels
enhancement (New feature or request), p-medium (Medium priority)

Comments

@metab0t

metab0t commented May 20, 2024

Hello, @staadecker!

I am the author of PyOptInterface, an efficient modeling interface for mathematical optimization in Python. Its performance is competitive with existing solutions, and it is faster than the vendor-provided Python bindings of some optimizers.

As a power systems researcher, I deeply understand the need to construct large-scale optimization models efficiently. PyOptInterface provides a user-friendly, expression-based API for formulating problems with different optimizers. Its lightweight variable and constraint handles can be stored freely in the built-in multidimensional container, a NumPy ndarray, or a Python dataframe, whichever you prefer.

I think that PyOptInterface might be a good abstraction layer for your package to handle different optimizers. If you are interested, you can try out our package and evaluate its performance (both memory consumption and speed). We welcome feedback and suggestions in any form.

@metab0t
Author

metab0t commented May 20, 2024

We believe there are fundamental limitations to file-based I/O, as pointed out in a comment by a developer of JuMP.jl.
For example, file-based I/O makes the following advanced features nearly impossible: incremental modification and re-solve, extensible solver-specific attributes, and on-demand querying of solutions. Reading and writing large files is also slower than in-memory operations.

@staadecker
Member

Hi @metab0t!

Thank you for bringing PyOptInterface to my attention! Your approach, if I understand correctly, of calling Gurobi's C API directly is very neat! Great job implementing that as I imagine it was non-trivial to get the Python-C bindings working properly.

Pyoframe is built on Polars, a Rust-based dataframe library that follows the Apache Arrow DataFrame format (and not NumPy's format). So one issue I foresee with building off of PyOptInterface is the conversion from Polars to your C++ API (which might be slow?). I think long-term this could be a good goal, as I agree file-based I/O is an inefficient way to build models. However, for now, I think file-based IO is good enough: our Polars-based writer is extremely fast (~10s for very large models), Gurobi also reads in the model extremely quickly (~10s for very large models), and we don't use files to read back the results.

Incremental modification, re-solve, etc. are rather niche cases imo, although they are something I'd like to support in the long term, at which point PyOptInterface might make a lot of sense.

If you have an easy way to integrate Polars dataframes with your library, do let me know, as it would be great to support 4 solvers. However, I would guess such an integration is non-trivial and such a project would need to wait. Let me know! (Also happy to set up a call to discuss.)

Thanks for reaching out!!

@metab0t
Author

metab0t commented May 21, 2024

Thanks for your explanation!

I think that it would not be difficult to integrate Polars DataFrames with PyOptInterface. Variables and constraints in PyOptInterface are just lightweight Python objects, and they can be stored in a polars.Object column.

There is no need to store the UB, LB, and RC of variables because they are stored internally by Gurobi and can be queried on demand.

The file-based IO can be skipped because variables and constraints are added to the Gurobi model as soon as they are created. Just call model.optimize(), and the solution can be queried directly using the Variable and Constraint handles previously stored in the Polars DataFrame (which skips the IO mapping process as well).

I will give a brief example later.

@metab0t
Author

metab0t commented May 21, 2024

import polars as pl
import pyoptinterface as poi
from pyoptinterface import gurobi

model = gurobi.Model()

# Create a DataFrame
df = pl.DataFrame({
    "X": [1, 2, 3],
    "Y": [4, 5, 6],
    'lb': 0.0,
    'ub': 2.0
})


def addvar(lb, ub):
    return model.add_variable(lb=lb, ub=ub)


df = df.with_columns(
    pl.struct(["lb", "ub"])
    .map_elements(lambda x: addvar(x["lb"], x["ub"]), return_dtype=pl.Object)
    .alias("Variable")
)

vars = df["Variable"]

model.add_linear_constraint(poi.quicksum(vars), poi.Geq, 1.0)

obj = poi.quicksum(v * v for v in vars)
model.set_objective(obj)

model.optimize()

df = df.with_columns(
    pl.col("Variable").map_elements(lambda x: model.get_value(x), return_dtype=pl.Float64).alias(
        "Value")
)

print(df)

@staadecker This is a simple example that combines PyOptInterface and Polars to solve a QP problem.

@staadecker
Member

Thank you @metab0t !

I don't think we'd want to change the expression generation code over to poi.quicksum, as the whole benefit of this library is the rapid creation of very large expressions using Polars. Additionally, I'd be afraid that .map_elements would be quite slow (perhaps even slower than the file IO).

In any case, I'm currently swamped with work so I need to put this on hold.

@metab0t
Author

metab0t commented May 23, 2024

I have played with Polars and found that its support for Python objects is incomplete (see pola-rs/polars#10189).

The design of Pyoframe is quite neat. Constraint.lhs.data and Variable.data are compact polars.DataFrame objects that store their terms and indices, which would make a possible switch in the future easy.

In general, using a DataFrame to represent multidimensional indices and their sparse combinations is a great choice. I remember the GAMS benchmark and the JuMP.jl response to it, where using DataFrames.jl improved performance significantly. https://github.com/Gurobi/gurobipy-pandas is also an interesting project that uses pandas.DataFrame as a container for optimization objects.

@staadecker
Member

@metab0t thank you, I'm glad you like it :)

Before building the library I actually tried to do something simple like gurobipy-pandas with polars but due to Python objects not being fully supported I couldn't store Gurobi Python expressions in a dataframe as gurobipy-pandas does.

@metab0t
Author

metab0t commented May 23, 2024

Support for Python objects does not seem to be a priority for Polars. Otherwise, an API similar to gurobipy-pandas would be easy to implement (storing persistent variable/constraint objects directly in a DataFrame).

Besides, the expression system of PyOptInterface is quite fast at constructing expressions with many terms. Its core is implemented with an efficient hash map in C++.

I prepared an example based on the facility_problem in the Pyoframe repo at https://gist.github.com/metab0t/c3c685a8b2ec1f14171772bd7bc7ea3e

On my computer, the result is:

Pyoframe elapsed time: 28.71 seconds
POI elapsed time: 14.48 seconds

@staadecker
Member

Very neat comparison. Do you have a breakdown of where the time is being spent in Pyoframe (expression building vs. IO)?

@metab0t
Author

metab0t commented May 23, 2024

I have updated my gist to report the time spent by Pyoframe in detail.

The time spent on expression building, writing the LP file, and reading the LP file is in a ratio of approximately 1:1:2. So expression building takes about 25% of the time and file IO takes the other 75%.
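As a quick check of the arithmetic above, the following sketch converts the reported 1:1:2 ratio into shares of the total time. The stage names and the helper `shares_from_ratio` are illustrative only and do not come from the gist.

```python
def shares_from_ratio(parts):
    """Convert a mapping of stage -> ratio weight into stage -> fraction of total."""
    total = sum(parts.values())
    return {name: weight / total for name, weight in parts.items()}


# Ratio reported in the comment above (approximate).
ratio = {"build expressions": 1, "write LP file": 1, "read LP file": 2}

shares = shares_from_ratio(ratio)
file_io_share = shares["write LP file"] + shares["read LP file"]

for name, share in shares.items():
    print(f"{name}: {share:.0%}")
print(f"file IO total: {file_io_share:.0%}")
```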

@staadecker
Member

Very neat, this confirms that file io is not ideal and that when I have time I should build a direct interface, perhaps using PyOptInterface. For context, expressions are stored in a "narrow" format where each row is a term and there is a column for the term's coefficient, and another with an ID to indicate the variable. Would that be something easily converted to your API? I'm thinking it is at that level that I'd want to pass things off to C.
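To make the "narrow" format concrete, here is a minimal, dependency-free sketch of how such rows might be grouped into per-constraint coefficient and variable-ID lists before being handed to a lower-level API. The `constraint_id` column, the sample rows, and the helper `group_terms` are assumptions for illustration; Pyoframe's actual column names and layout may differ, and plain tuples stand in for DataFrame rows.

```python
from collections import defaultdict

# Hypothetical narrow-format rows: (constraint_id, variable_id, coefficient).
# Each row is one term of a linear expression.
narrow_rows = [
    (0, 0, 1.0),
    (0, 1, 2.0),
    (1, 1, -1.0),
    (1, 2, 3.5),
    (0, 2, 0.5),
]


def group_terms(rows):
    """Group rows by constraint into parallel (coefficients, variable_ids) lists."""
    grouped = defaultdict(lambda: ([], []))
    for constraint_id, variable_id, coefficient in rows:
        coefficients, variable_ids = grouped[constraint_id]
        coefficients.append(coefficient)
        variable_ids.append(variable_id)
    return dict(grouped)


terms_by_constraint = group_terms(narrow_rows)
for constraint_id, (coefficients, variable_ids) in terms_by_constraint.items():
    print(constraint_id, coefficients, variable_ids)
```

In a real implementation the grouping would presumably be done with dataframe operations rather than a Python loop; the loop is used here only to show the target shape of the data.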

@staadecker staadecker reopened this May 24, 2024
@staadecker staadecker changed the title from "Greeting from PyOptInterface" to "Build direct interface" May 24, 2024
@staadecker staadecker added the enhancement (New feature or request) and p-medium (Medium priority) labels May 24, 2024
@staadecker
Member

The file-based IO also requires a lot of code (i.e. all of io.py and io_mappers.py), so it would be great to get rid of it (we can always use Gurobi to generate the .lp file for inspection).

@metab0t
Author

metab0t commented May 24, 2024

That representation of expressions works fine.

In fact, a variable in PyOptInterface is a thin wrapper around its ID, and a linear expression is two vectors holding the coefficients and the indices of the variables.

Storing variable objects (from PyOptInterface or gurobipy) directly in Polars is not recommended because Polars supports the Object type poorly.

You can build ONE big array that stores all the variables in the model, with each variable ID serving as an index into that array. When you want to add a constraint to the model, just traverse all of its rows and construct the expression object.
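The idea can be sketched as follows. `StubModel` is a hypothetical stand-in that only records what it is given; it is not the PyOptInterface API, and the tuple handles, the `">="` sense string, and the dict-based expression are assumptions chosen to keep the example runnable without a solver.

```python
class StubModel:
    """Hypothetical stand-in for a solver model; NOT the PyOptInterface API."""

    def __init__(self):
        self.num_variables = 0
        self.constraints = []

    def add_variable(self, lb, ub):
        # Return an opaque handle, mimicking a thin wrapper around an ID.
        handle = ("var", self.num_variables)
        self.num_variables += 1
        return handle

    def add_linear_constraint(self, terms, sense, rhs):
        # Record the constraint instead of passing it to a solver.
        self.constraints.append((terms, sense, rhs))


model = StubModel()

# ONE flat array of handles; a row's variable ID is its position in this list.
variables = [model.add_variable(lb=0.0, ub=2.0) for _ in range(3)]

# Narrow-format rows for a single constraint: (variable_id, coefficient).
rows = [(0, 1.0), (2, 4.0), (0, 0.5)]

# Traverse the rows and build the expression, merging repeated variables.
expression = {}
for variable_id, coefficient in rows:
    handle = variables[variable_id]
    expression[handle] = expression.get(handle, 0.0) + coefficient

model.add_linear_constraint(expression, ">=", 1.0)
```

Because only integer IDs would live in the dataframe, this layout avoids storing Python objects in Polars at all; the handles stay in an ordinary Python list.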

By the way, PyOptInterface also supports writing the model to LP/MPS files. We use the native C API provided by Gurobi, so the output should be identical to that of gurobipy.


2 participants