Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify type_check_expr and type_check_expr_in #574

Draft
wants to merge 29 commits into
base: master
Choose a base branch
from

Conversation

erszcz
Copy link
Collaborator

@erszcz erszcz commented Oct 15, 2024

This attempts the unification discussed in #362. The are some obvious takeaways from this exercise, some already mentioned by @xxdavid in that thread:

  • exhaustiveness checking is not done in type_check_expr (somewhat recovered in this draft)
  • operator support is not as good in type_check_expr
  • it seems to me that two "parallel" implementations (type_check_expr and type_check_expr_in) lead to features being added/improved ad-hoc in just one of them, because of similar yet duplicate code and noncomprehensive testing.

What do I mean by the last point? Roughly speaking, to check if syntax for Expr is handled properly by the parallel impls, we need two test cases:

-spec f1() -> some_ty().
f1() ->
    Expr.

-spec f2() -> some_ty().
f2() ->
    X = Expr,
    X.

Where in f1 type_check_expr_in would check that Expr :: some_ty() and in f2 type_check_expr would infer the type of Expr to assign it to X and only then check if X :: some_ty(). This is not obvious nor documented anywhere and I think it has lead to the disparity in syntax support in the two impls. From this perspective, merging the two seems to be compelling.

In order to make tests pass in this draft some tests had to be disabled or moved to known problems. I think that if we want to unify these parallel impls, we should bring them all into should pass/fail categories in this PR.

@erszcz
Copy link
Collaborator Author

erszcz commented Oct 15, 2024

Note for self - an example of a class of the self-check errors we now see is this:

$ cat tzt.erl
-module(tzt).

-type options() :: proplists:proplist().

-spec parse_opts([string()], options()) -> {[string()], options()}.
parse_opts([], Opts) ->
    {[], Opts};
parse_opts([A | Args], Opts) ->
    case A of
        "-h"                       -> {[], [help]};
        "--help"                   -> {[], [help]};
        "--verbose"                -> parse_opts(Args, [verbose | Opts])
    end.
$ ./bin/gradualizer tzt.erl
tzt.erl: The list on line 12 at column 56 is expected to have type options() but it has type [verbose | property(), ...]

        "-h"                       -> {[], [help]};
        "--help"                   -> {[], [help]};
        "--verbose"                -> parse_opts(Args, [verbose | Opts])
                                                       ^^^^^^^^^^^^^^^^

but

$ r3 shell
1> OptionTy1 = gradualizer:type("[verbose | proplists:property(), ...]").
{type,0,nonempty_list,
      [{type,0,union,
             [{atom,0,verbose},
              {remote_type,0,
                           [{atom,1,proplists},{atom,1,property},[]]}]}]}
2> OptionTy2 = gradualizer:type("gradualizer:options()").
{remote_type,0,[{atom,1,gradualizer},{atom,1,options},[]]}
3> typechecker:subtype(OptionTy1, OptionTy2, gradualizer:env()).
true
4>

Verbose log:

$ ./bin/gradualizer --verbose tzt.erl
Checking module tzt
Spawning async task...
Checking function parse_opts/2
Pushing {clauses_controls,true}
6:1: Checking clause :: {[string()], options()}
6:16: Setting var type Opts :: [property()]
7:6: Propagated type of [] :: []
7:10: Propagated type of Opts :: [property()]
7:5: Propagated type of {[], Opts} :: {[], [property()]}
7:5: Checking that {[], Opts} :: {[string()], options()}
8:1: Checking clause :: {[string()], options()}
8:13: Setting var type A :: [char()]
8:17: Setting var type Args :: [] | [string(), ...]
8:24: Setting var type Opts :: [property()]
9:10: Propagated type of A :: [char()]
10:40: Propagated type of [] :: []
10:45: Propagated type of help :: help
10:49: Propagated type of [] :: []
10:44: Propagated type of [help] :: [help, ...]
10:39: Propagated type of {[], [help]} :: {[], [help, ...]}
11:40: Propagated type of [] :: []
11:45: Propagated type of help :: help
11:49: Propagated type of [] :: []
11:44: Propagated type of [help] :: [help, ...]
11:39: Propagated type of {[], [help]} :: {[], [help, ...]}
12:50: Propagated type of Args :: [] | [string(), ...]
12:50: Checking that Args :: [string()]
12:57: Propagated type of verbose :: verbose
12:67: Propagated type of Opts :: [property()]
12:56: Propagated type of [verbose | Opts] :: [verbose | property(), ...]
12:56: Checking that [verbose | Opts] :: options()
Task returned from function parse_opts/2 with [{type_error,
                                                {cons,
                                                 {12,56},
                                                 {atom,{12,57},verbose},
                                                 {var,{12,67},'Opts'}},
                                                {type,0,nonempty_list,
                                                 [{type,0,union,
                                                   [{atom,0,verbose},
                                                    {user_type,
                                                     [{file,"proplists.erl"},
                                                      {location,0}],
                                                     property,[]}]}]},
                                                {user_type,0,options,[]}}]
tzt.erl: The list on line 12 at column 56 is expected to have type options() but it has type [verbose | property(), ...]

        "-h"                       -> {[], [help]};
        "--help"                   -> {[], [help]};
        "--verbose"                -> parse_opts(Args, [verbose | Opts])
                                                       ^^^^^^^^^^^^^^^^

mkdir -p "test_data"
erlc $(TEST_ERLC_OPTS) -o test_data $(test_data_erls)
@mkdir -p "test_data"
@erlc $(TEST_ERLC_OPTS) -o test_data $(test_data_erls)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hehe, I've always been annoyed by build tools that don't show they are doing, i.e. don't print the commands that run (like CMake or Rebar, they hide everything by default). In make you at least see the commands by default. When the commands are printed, you can understand what happens and you can easily repeat the same commands manually.

I'm not going to argue, but I am going to unassign myself as a reviewer of this PR. :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I agree, I don't like opaque complexity either. With this number of repetitions, though, I could tell by heart what it's doing ;) I don't really need to see again and again the parts that don't change, I'm more interested in the output which does change. Anyway, this bit is a marginal detail.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the files don't change, they targets shouldn't need to be rebuilt. Some dependency declaration missing here seems to be the cause of repetition.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the Makefile can probably be tweaked so that tests are built only when they actually change. I would consider that a better option than just silencing the commands.

Copy link
Collaborator

@xxdavid xxdavid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, good attempt! Indeed, the only tests failing after merging the two functions seem to be related to anonymous functions, arithmetical operations and exhaustiveness checking. I am not sure with the lambdas but the latter two can probably be implemented inside type_check_expr. I've skimmed over the changes and I see that a lot of known problems aren't known problems anymore. That's great!

What are we gonna do now with this draft? Do you still want to pursue merging these functions or was it just a demo of it?

Comment on lines -17 to -18
-spec return_fun_intersection() -> fun((any()) -> integer()).
return_fun_intersection() -> fun number/1.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you just moved this function around, but why should this pass? The spec says that return_fun_intersection/0 returns a function that returns an integer, but it actually returns a function that can also return a float (if I pass it a float). A function that might return a float should not be a subtype of a function that returns an integer.

Comment on lines -20 to -23
-spec return_fun_union_intersection()
-> fun((atom()) -> atom()) |
fun((1..3) -> integer()).
return_fun_union_intersection() -> fun number/1.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same goes for this function.

-export([return_fun_no_spec/0]).

-spec return_fun_no_spec() -> integer().
return_fun_no_spec() -> fun no_spec/0.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be solved by this commit xxdavid@13d6893
You can cherry pick it if you want.

Comment on lines +2641 to +2651
{InferredTy, NewEnv} = type_check_expr(Env, Expr),
?verbose(Env, "~sChecking that ~ts :: ~ts~n",
[gradualizer_fmt:format_location(Expr, brief), erl_prettypr:format(Expr), typelib:pp_type(ResTy)]),
NormResTy = normalize(ResTy, Env),
R = ?throw_orig_type(do_type_check_expr_in(Env, NormResTy, Expr), ResTy, NormResTy),
?assert_type(R, env()).
[gradualizer_fmt:format_location(Expr, brief),
erl_prettypr:format(Expr),
typelib:pp_type(ResTy)]),
case subtype(InferredTy, ResTy, NewEnv) of
true ->
NewEnv;
false ->
throw(type_error(Expr, InferredTy, ResTy))
end.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The state of the art in writing typecheckers is to do it the other way around, compared to what you've done here. Always pass in the type that's expected, though if we don't know the type just pass a variable. You'll likely get better propagation of types that way.

See e.g. slide 43 in this presentation by Simon Peyton Jones: https://haskell.foundation/assets/other/Simon%20Peyton%20Jones%20-%202023-06-08%20-%20Type%20inference%20in%20GHC%20Jun%2023.pdf
Video can be found here: https://haskell.foundation/events/2023-ghc-development-workshop.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants