Add reduction using SAT and QBF solvers #407

Draft: notValord wants to merge 6 commits into devel

Conversation

notValord (Contributor)

Add an implementation of two new types of reduction using solvers. The basic idea was to explore whether solvers can be used for reduction, and whether it is possible to create minimal nondeterministic automata.

The solvers work with an automaton representation encoded as a CNF formula whose variables represent its transitions and its initial and final states. The solver then tries to find an assignment that satisfies the given clauses. The word sets are updated until an equivalent automaton is found. The number of states is increased step by step from a default value, which ensures that the automaton found is minimal.

The reductions work with the structs SatStats and QbfStats, both inheriting from the base struct AutStats, which defines the number of states and symbols of the automaton, the two sets of words, and an output stream. The declarations can be found in include/mata/nfa/types.hh.

The reductions use the same interface as the other reduction algorithms, which are parametrized using the ParameterMap.

Tests will be added shortly in future commits, together with some small fixes. I was not sure about the proper location for the reduction implementation and its structures; as it is another type of reduction, src/nfa/operations.cc seemed appropriate, and the used structures were put into include/mata/nfa/types.hh, but they can be moved if necessary.
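For illustration, the search loop could be sketched as follows; the helper names (encode_automaton_cnf, run_sat_solver, yields_equivalent_automaton) are hypothetical stand-ins for the PR's actual functions, not its real API:

#include <cstddef>
#include <optional>
#include <vector>

using Cnf = std::vector<std::vector<int>>;      // clauses over integer literals
struct Model { std::vector<bool> assignment; }; // a satisfying assignment

// Hypothetical helpers standing in for the PR's encoding and solver calls.
Cnf encode_automaton_cnf(std::size_t states, std::size_t symbols);
std::optional<Model> run_sat_solver(const Cnf& cnf);
bool yields_equivalent_automaton(const Model& model);

// Grow the state count from a lower bound until a satisfying assignment
// yields an equivalent automaton; the first hit is therefore state-minimal.
std::size_t minimal_state_count(std::size_t lower_bound, std::size_t symbols) {
    for (std::size_t states = lower_bound; ; ++states) {
        const Cnf cnf = encode_automaton_cnf(states, symbols);
        if (auto model = run_sat_solver(cnf); model && yields_equivalent_automaton(*model)) {
            return states;
        }
    }
}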

@Adda0 requested review from Adda0 and ondrik on May 29, 2024.
@ondrik (Member) commented on Jun 3, 2024

We discussed with @notValord removing the Linux binaries of the SAT/QBF solvers and adding them as requirements instead.

@Adda0 (Collaborator) commented on Jun 5, 2024

Hello. Thank you for the PR. One macOS test seems to be failing. Otherwise, let me know when you are finished with the changes so that the PR can be reviewed. In general, the implementation looks great to me after a quick glance.

> as it is another type of reduction, src/nfa/operations.cc seemed appropriate, and the used structures were put into include/mata/nfa/types.hh, but they can be moved if necessary.

types.hh should only be for the general data types used by the user of Nfa. If the data structures are to be used only by the implementation of some algorithm, the types should ideally go as close to the place where they are used as possible, such as in an anonymous namespace inside the source file with the algorithm. Or, if it is too much code, creating some header files with internal namespaces such as plumbing or internal might be appropriate. In this case, unless you plan on using these types elsewhere, I would just place everything in the anonymous namespace. Where do you think these types will be applicable, and where do you plan to use them?
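For illustration, a minimal sketch of that suggestion inside src/nfa/operations.cc (members abbreviated):

namespace {

// Visible only within this translation unit.
struct AutStats {
    size_t state_num; // number of states
    size_t alpha_num; // number of symbols
    // word sets, output stream, ...
};

} // anonymous namespace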

@Adda0 (Collaborator) commented on Jun 5, 2024

> We discussed with @notValord removing the Linux binaries of the SAT/QBF solvers and adding them as requirements instead.

Sounds good to me. Ideally, though, I would go with optional requirements; however, we do not currently have a way to specify those. Maybe, for now, a note in the comment when selecting the specific algorithm would be enough? That way, everyone can add complex algorithms with dependencies, but the main Mata implementation will remain as small and clean as possible. What do you and anyone else think about this?

@ondrik (Member) commented on Jun 5, 2024

> Sounds good to me. Ideally, though, I would go with optional requirements; however, we do not currently have a way to specify those. Maybe, for now, a note in the comment when selecting the specific algorithm would be enough? That way, everyone can add complex algorithms with dependencies, but the main Mata implementation will remain as small and clean as possible. What do you and anyone else think about this?

I agree with optional requirements and tests (maybe generating a warning instead of a failed test).

@Adda0 (Collaborator) left a review


I did a pass through the code. There are some comments about possible refactoring and some performance fixes. We should discuss these and address them before merging the PR.

#define TSEY_NOT -3

// define characters used for SAT and QBF solvers
#define SOL_EOL std::string("0\n")

Use only string literals here for the macros. Otherwise, a std::string will have to be allocated for every single use during runtime. Furthermore, it would be ideal if these were defined as constexpr const char* SOL_EOL, or by using string views and string view literals.
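For example, either of these forms avoids the runtime allocation (keeping the macro's original value):

constexpr const char* SOL_EOL = "0\n";       // plain string literal
// or, with #include <string_view>:
// constexpr std::string_view SOL_EOL{"0\n"};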

@@ -48,6 +50,315 @@ struct Nfa; ///< A non-deterministic finite automaton.
/// An epsilon symbol which is now defined as the maximal value of data type used for symbols.
constexpr Symbol EPSILON = Limits::max_symbol;

// defining indices of the logic operators used in an input vector for the Tseytin transformation
#define TSEY_AND -1

I would go for clarity here and use TSEYTIN_{AND, OR, NOT}. Furthermore, to ensure proper usage, I would define these as constexpr int TSEYTIN_{AND, OR, NOT} = <value>;.

Are these indices and the SOL characters to be used by the users of the library, or are they only needed inside the library algorithms? If it is the latter, I would move these definitions to the appropriate files and enclose them in as small a scope as possible. For example, TSEY_AND is currently only used in src/nfa/operations.cc. Therefore, it should be defined in an anonymous namespace inside that file. Similarly for SOL_xxx.
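A sketch of the suggested form, scoped to src/nfa/operations.cc; the value of TSEYTIN_OR is an assumption, since only the AND (-1) and NOT (-3) values appear in the diff excerpts above:

namespace {
constexpr int TSEYTIN_AND = -1;
constexpr int TSEYTIN_OR  = -2; // assumed value, not shown in the diff
constexpr int TSEYTIN_NOT = -3;
} // anonymous namespace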

* @param[in] output Output stream
* @return The first variable index that was not utilized
*/
size_t reduction_tseytin(const std::vector <int>& input, size_t max_index, std::ostream& output);

Should this method be called tseytin_transformation() instead of a reduction? Furthermore, if I am seeing right, this method does not use the underlying automaton in any way, so it can be a utility function, not even a method of Nfa, no? Stored somewhere in the utils/ folder, probably.
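For readers unfamiliar with the technique, a minimal illustration of the Tseytin transformation (not the PR's code): an AND gate y <-> (a && b) is encoded as three CNF clauses over a fresh variable y.

#include <vector>

using Clause = std::vector<int>; // DIMACS-style literals; a negative literal is a negation

// Clauses encoding y <-> (a && b): (-a | -b | y), (a | -y), (b | -y).
std::vector<Clause> tseytin_and(int a, int b, int y) {
    return { {-a, -b, y}, {a, -y}, {b, -y} };
}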

@@ -582,10 +619,13 @@ inline bool is_included(const Nfa& smaller, const Nfa& bigger, const Alphabet* c
* @param[in] alphabet Alphabet of both NFAs to compute with.
* @param[in] params Optional parameters to control the equivalence check algorithm:
* - "algorithm": "naive", "antichains" (Default: "antichains")
* @param[out] default_runs Optional parameter for intersection runs to return the found word

I think it is wasteful to pass by reference and have a statically initialized pair of pointers (which will get modified when the equivalence check is run a second time; therefore, they will no longer be the default runs). Instead, I think the global variable for the default runs should be removed entirely, and the parameter here should be a pointer, default-initialized to nullptr.
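The suggested signature would then look roughly like this (a sketch; the parameter name witness_runs is assumed, while Nfa, Alphabet, Run, and ParameterMap are Mata's existing types):

bool are_equivalent(const Nfa& lhs, const Nfa& rhs, const Alphabet* alphabet,
                    const ParameterMap& params = {{ "algorithm", "antichains" }},
                    std::pair<Run, Run>* witness_runs = nullptr); // nullptr: caller does not want the runs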

* @param[in] debug Flag for debug output as the reduction can take some time to get the info about the current status
* @return Reduced automaton
*/
Nfa reduce_sat(const Nfa &aut, const ParameterMap& params = {{"solver", "sat"}}, bool debug = false);

I think that both of the specific reduction algorithms should be hidden behind the general reduce() function. That is, they should not be in the public API of Nfa, and should only be accessible in the anonymous namespace of operations.cc. Or, since we are starting to have way too many reduction algorithms, we could create a new source file specifically for reductions, reduction.cc, with potentially a new header file, reduction.hh, which can be included when one wants to call the specific reduction algorithms. Creating new source and header files seems like the best approach to me.
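Dispatch inside the general reduce() could then look roughly like this (a sketch; the parameter key and the file-local function names reduce_simulation and reduce_solvers are assumptions):

// In src/nfa/reduction.cc, with the specific algorithms file-local:
Nfa reduce(const Nfa& aut, const ParameterMap& params) {
    const std::string& algorithm = params.at("algorithm");
    if (algorithm == "simulation") { return reduce_simulation(aut); }
    if (algorithm == "solvers") { return reduce_solvers(aut, params); }
    throw std::runtime_error("reduce: unknown algorithm \"" + algorithm + "\"");
}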

size_t mata::nfa::SatStats::example_clauses(size_t max_index) {
    size_t transitions_num = this->alpha_num * this->state_num * this->state_num;

    for(auto word: this->accept) {

This will create a local copy of every vector. Use auto& when you want to create a reference to an auto-deduced data type. This appears in multiple places around the code. If possible, try to find all of them and use references.
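That is, the loop above would become:

for (const auto& word: this->accept) { // binds by reference; no per-iteration copy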

    return max_index;
}

void mata::nfa::SatStats::recurse_tseytin_accept(const std::vector<int>& base, size_t state, Word word, const unsigned pos,

word should be passed by reference here, no? And similarly in other functions. If possible, try to have a look at the function declarations and see whether some arguments could be passed by reference.
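That is, the definition above would take the word as a const reference, with the other parameters unchanged:

void mata::nfa::SatStats::recurse_tseytin_accept(const std::vector<int>& base, size_t state, const Word& word, const unsigned pos, /* ...remaining parameters as before... */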

/**
 * Base class for representing the input parameters for SAT and QBF reduction
 */
struct AutStats {

The implementation of all of these structs should be somewhere else. These types should not be used by the user of the library, correct? Therefore, I think creating a header file and a source file for reduction (reduction.hh and reduction.cc) and moving the code of all the reductions and all these structs inside is a reasonable approach.
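A possible layout, sketched under the assumption that only the reduce entry points need to be public (the reduce_qbf signature is assumed to mirror reduce_sat):

// include/mata/nfa/reduction.hh -- public entry points only:
namespace mata::nfa {
    Nfa reduce_sat(const Nfa& aut, const ParameterMap& params = {{ "solver", "sat" }}, bool debug = false);
    Nfa reduce_qbf(const Nfa& aut, const ParameterMap& params = {{ "solver", "qbf" }}, bool debug = false);
} // namespace mata::nfa

// src/nfa/reduction.cc -- AutStats, SatStats, QbfStats live here in an
// anonymous namespace, invisible to library users.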

@Adda0 marked this pull request as a draft on October 25, 2024.