Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement new AST pattern-matching library #940

Merged
merged 25 commits into from
Jan 10, 2024
Merged

Implement new AST pattern-matching library #940

merged 25 commits into from
Jan 10, 2024

Conversation

Baltoli
Copy link
Contributor

@Baltoli Baltoli commented Dec 19, 2023

This PR implements a new library to improve the way that we write code to deconstruct and examine sub-terms of large, nested AST patterns. For example, consider getLeftHandSide. This function is around 120 lines of code without explanatory comments, and at one point closes no fewer than 10 nested scopes at once.

Reading, writing and reviewing code of this kind is difficult, tedious and prone to bugs; @Scott-Guest did a good job of untangling the intent of these functions previously, but it's not ideal to be doing a lot of that kind of forensics. There is a sufficient quantity of code in the backend that does this kind of decomposition that I think it's worth finding a more reusable solution.1

My proposed solution is to implement a template metaprogramming library that defines a small DSL modelling similar patterns to Scott's explanatory comments. For example, with a bit of setup, the comment:

// lhs(\rewrites(\and(\equals(_, _), X), _)) = [X]

becomes the following valid C++ code:

map(rewrites(and_(equals_(any, any), X), any), getSingleton)

which matches the structure of the explanation precisely, rather than losing the intent of the code behind a pile of conditional branches and imperative code.

By composing several of these patterns together, we are able to shrink functions like getLeftHandSide down to a tiny fraction of their original size, while also improving their readability and correctness2 and preserving error-checking behaviour.3

The changes in this PR are as follows:

  • Extract the functions in AST.cpp that perform pattern-matching functions to a new, separate translation unit.
  • Implement the pattern-matching DSL in pattern_matching.h; this implementation is well-documented internally and I hope will be easy to review as a standalone feature. Additionally, unit tests for this library are added.
  • Refactor the AST pattern-matching code to use the new abstractions.

Footnotes

  1. Additionally, clang-tidy complains heavily about these functions - simplifying them will allow more code quality and static analysis checks to be enabled.

  2. I found two cases where the specification comments didn't quite match the intent of the code. This is - to be clear - a very good thing for the quality of those comments given the complexity of the code.

  3. One function maps a set of "good" patterns to nullptr and errors for "bad" patterns; we can express that behaviour using the new abstraction.

@Baltoli Baltoli marked this pull request as ready for review December 22, 2023 12:41
Copy link
Contributor

@Scott-Guest Scott-Guest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! This was a much-needed code quality improvement.

I have some ideas in terms of allowing multiple subjects, but let's get this merged first, and I'll do that in a follow-up PR.

(Also, big kudos for the cleanliness, documentation, and testing here! I'd love to see this level of quality in the whole codebase.)

auto bar = term("bar");
auto big = term("baz", term("a", term("a1"), term("a2")), term("b"));

for (auto const &t : {foo, bar, big}) {
Copy link
Contributor

@Scott-Guest Scott-Guest Dec 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know you could use initializer lists in a range-based for loop like this 🤯

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep - I probably wouldn't do this outside of a unit test, though. There are some footguns1 around lifetimes and deduction when you use initializer lists.

Footnotes

  1. That I can't remember in enough detail for my personal rule for real code to sensibly be anything other than "don't do this"

lib/ast/pattern_matching.cpp Outdated Show resolved Hide resolved
@Baltoli Baltoli requested a review from gtrepta January 8, 2024 10:12
@Baltoli
Copy link
Contributor Author

Baltoli commented Jan 8, 2024

some ideas in terms of allowing multiple subjects

Indeed - the subject(L) lens is a specific instance of a more general lens bind(L, std::string) that binds the result of a nested match to a name, and bubbles up a mapping from names to subterms on success. It so happens that the code we use lenses for currently doesn't need multiple subjects, but it would be a nice generalisation to add for the future!

Co-authored-by: Scott Guest <scott.guest@runtimeverification.com>
@rv-jenkins rv-jenkins merged commit 276afab into master Jan 10, 2024
7 checks passed
@rv-jenkins rv-jenkins deleted the patterns branch January 10, 2024 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants