Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Pair bug #77

Open
wants to merge 16 commits into
base: main
Choose a base branch
from
Open

Fix Pair bug #77

wants to merge 16 commits into from

Conversation

haberdashPI
Copy link
Member

@haberdashPI haberdashPI commented Oct 9, 2024

Description

edit: I've tracked down the original problem from #75 and #71 to how transformer was implemented for Pair.

TL;DR: transformer for Pair returned a tuple. This means that the types for a Pair field can go from Any to the concrete type of a value in the Tuple. This violates the assumptions of hoist_type=true. To fix the problem, Pair is now treated as an UnorderedStruct rather than as a DictType.

  • is it possible for isconcretetype to fail during hash_fields: if so, what's an example that fails without the bufix???
    • I don't think this is possible: isconcretetype works so long as hash_fields / hash_elements gets to visit the actual types of the original object: it normally sees all relevant, nested type parameters — problems arise when transformer causes a change in the inferred types and host_type=true
  • update benchmarks

isconcretetype(Vector{Any}) == true causes some problems for StableHashTraits (#75). At the end of the day, to use the fast branch of hashing (with type hoisting) we need to know that the current type and all contained types (where containment is defined in terms of StructType) must be concrete to use the fast type-hashing branch.

Because the code incorrectly used isconcretetype to use the fast branch, this means that before the current PR:

julia> stable_hash(Dict{Symbol,Any}(:a=>Foo(1)); version=4) == stable_hash(Dict{Symbol,Any}(:a=>Bar(1)); version=4)
true

julia> stable_hash(Pair{Symbol,Any}(:a,Foo(1)); version=4) == stable_hash(Pair{Symbol,Any}(:a,Bar(1)); version=4)
true

julia> stable_hash(Pair{Symbol,Any}[:a=>Foo(1)]; version=4) == stable_hash(Pair{Symbol,Any}[:a=>Bar(1)]; version=4)
true

This PR implements is_fully_concrete to check the contained types for concreteness, caching results to avoid visiting a Type structure repeatedly.

FYI for @rasmushenningsson: is_fully_concrete signals Union as being non-concrete. StableHashTraits handles union splitting for arrays only, but has not yet solved for the general case of fast hashing of generic, union-split data structures. (And this PR does not attempt to solve that problem.)

Benchmarks

I've made a few QOL improvements to the benchmarks, and added a test for deeply nested type-unstable values. So these numbers are not perfectly comparable to the older tests (but what we see below looks a lot like the results for #58).

32×6 DataFrame
 Row │ benchmark   hash       version    base        trait       ratio
     │ SubStrin…   SubStrin…  SubStrin…  String      String      Float64
─────┼─────────────────────────────────────────────────────────────────────
   1 │ structs     crc        3          59.209 μs   1.177 ms    19.8759
   2 │ tuples      crc        3          59.083 μs   934.791 μs  15.8217
   3 │ numbers     crc        3          30.750 μs   140.459 μs   4.56777
   4 │ dataframes  crc        3          71.625 μs   290.375 μs   4.0541
   5 │ dicts       crc        3          1.809 ms    7.216 ms     3.98913
   6 │ missings    crc        3          324.250 μs  206.083 μs   0.635568
   7 │ strings     crc        3          1.103 ms    645.875 μs   0.585717
   8 │ symbols     crc        3          1.315 ms    728.042 μs   0.553626
   9 │ structs     sha256     3          505.666 μs  3.122 ms     6.17321
  10 │ tuples      sha256     3          505.750 μs  2.604 ms     5.14904
  11 │ dicts       sha256     3          2.301 ms    9.977 ms     4.33518
  12 │ numbers     sha256     3          253.083 μs  364.375 μs   1.43975
  13 │ dataframes  sha256     3          520.209 μs  743.042 μs   1.42835
  14 │ strings     sha256     3          2.050 ms    2.222 ms     1.08379
  15 │ symbols     sha256     3          2.232 ms    2.211 ms     0.99061
  16 │ missings    sha256     3          591.708 μs  521.666 μs   0.881627
  17 │ dicts       crc        4          1.831 ms    106.825 ms  58.3413
  18 │ structs     crc        4          59.041 μs   723.416 μs  12.2528
  19 │ tuples      crc        4          59.000 μs   490.334 μs   8.31075
  20 │ numbers     crc        4          30.750 μs   143.792 μs   4.67616
  21 │ dataframes  crc        4          72.208 μs   327.542 μs   4.53609
  22 │ symbols     crc        4          1.315 ms    1.478 ms     1.12386
  23 │ missings    crc        4          336.916 μs  254.833 μs   0.75637
  24 │ strings     crc        4          1.096 ms    398.208 μs   0.363384
  25 │ dicts       sha256     4          2.306 ms    113.127 ms  49.0602
  26 │ structs     sha256     4          505.625 μs  1.673 ms     3.3096
  27 │ tuples      sha256     4          505.667 μs  1.461 ms     2.88876
  28 │ dataframes  sha256     4          518.875 μs  784.833 μs   1.51257
  29 │ numbers     sha256     4          252.875 μs  369.709 μs   1.46202
  30 │ symbols     sha256     4          2.253 ms    2.793 ms     1.23962
  31 │ missings    sha256     4          582.250 μs  504.375 μs   0.866252
  32 │ strings     sha256     4          2.058 ms    1.696 ms     0.824243

Copy link

codecov bot commented Oct 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.56%. Comparing base (65b918c) to head (dc3b6bb).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #77      +/-   ##
==========================================
- Coverage   93.57%   93.56%   -0.01%     
==========================================
  Files           7        7              
  Lines         669      668       -1     
==========================================
- Hits          626      625       -1     
  Misses         43       43              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@haberdashPI haberdashPI marked this pull request as ready for review October 9, 2024 21:01
@haberdashPI haberdashPI changed the title Fix isconcretetype bug Fix Pair bug Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant