vm: replace jump table with switch #479

Open
wants to merge 4 commits into
base: master

@serprex serprex commented Mar 12, 2024

A couple of years ago the Go compiler finally implemented jump tables for switch statements:
https://go-review.googlesource.com/c/go/+/357330

To avoid the function call overhead of evalInstruction, mainLoop and mainLoopWithContext are combined into a single loop so that the instruction dispatch can be inlined into it.
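
For readers unfamiliar with the two dispatch styles, here is a minimal sketch of the difference. It is not gopher-lua's code: the `opcode` values, `instr`, `vm`, `jumpTable`, `runTable` and `runSwitch` are all hypothetical names for a toy stack machine. The table version pays one indirect call per instruction, while the switch version keeps every handler in the loop body. As far as I understand, the compiler only emits a real jump table for larger dense switches on some architectures, so a four-case switch like this one shows the shape of the change rather than the generated code.

```go
package main

import "fmt"

// opcode is a hypothetical instruction tag; these are not gopher-lua's opcodes.
type opcode uint8

const (
	opPush opcode = iota // push arg onto the stack
	opAdd                // pop two values, push their sum
	opMul                // pop two values, push their product
	opHalt               // stop execution
)

// instr is a toy instruction: an opcode plus one integer operand.
type instr struct {
	op  opcode
	arg int
}

// vm is a toy stack machine used only for this illustration.
type vm struct{ stack []int }

func (m *vm) push(v int) { m.stack = append(m.stack, v) }

func (m *vm) pop() int {
	v := m.stack[len(m.stack)-1]
	m.stack = m.stack[:len(m.stack)-1]
	return v
}

// jumpTable maps each opcode to a handler; a handler returns true to halt.
var jumpTable = [...]func(m *vm, in instr) bool{
	opPush: func(m *vm, in instr) bool { m.push(in.arg); return false },
	opAdd:  func(m *vm, in instr) bool { b, a := m.pop(), m.pop(); m.push(a + b); return false },
	opMul:  func(m *vm, in instr) bool { b, a := m.pop(), m.pop(); m.push(a * b); return false },
	opHalt: func(m *vm, in instr) bool { return true },
}

// runTable dispatches through the function table: one indirect call per instruction.
func runTable(prog []instr) int {
	m := &vm{}
	for pc := 0; pc < len(prog); pc++ {
		in := prog[pc]
		if jumpTable[in.op](m, in) {
			break
		}
	}
	return m.pop()
}

// runSwitch dispatches with a switch, so the handlers live inside the loop body.
func runSwitch(prog []instr) int {
	m := &vm{}
	for pc := 0; pc < len(prog); pc++ {
		in := prog[pc]
		switch in.op {
		case opPush:
			m.push(in.arg)
		case opAdd:
			b, a := m.pop(), m.pop()
			m.push(a + b)
		case opMul:
			b, a := m.pop(), m.pop()
			m.push(a * b)
		case opHalt:
			return m.pop()
		}
	}
	return m.pop()
}

func main() {
	// (2 + 3) * 4
	prog := []instr{{opPush, 2}, {opPush, 3}, {opAdd, 0}, {opPush, 4}, {opMul, 0}, {opHalt, 0}}
	fmt.Println(runTable(prog), runSwitch(prog)) // prints "20 20"
}
```

Both functions compute the same result; only the dispatch mechanism differs, which is what the benchmarks below compare.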

Benchmarks done with: AMD Ryzen 7 7840U w/ Radeon 780M Graphics

before:

BenchmarkCallFrameStackPushPopAutoGrow-16                 549974              2160 ns/op
BenchmarkCallFrameStackPushPopFixed-16                   2073770               575.6 ns/op
BenchmarkCallFrameStackPushPopShallowAutoGrow-16        24444289                50.84 ns/op
BenchmarkCallFrameStackPushPopShallowFixed-16           56415529                21.05 ns/op
BenchmarkCallFrameStackPushPopFixedNoInterface-16        2096766               568.6 ns/op
BenchmarkCallFrameStackUnwindAutoGrow-16                  746608              2117 ns/op
BenchmarkCallFrameStackUnwindFixed-16                    2377479               498.8 ns/op
BenchmarkCallFrameStackUnwindFixedNoInterface-16         2349202               498.5 ns/op
BenchmarkRegistryPushPopAutoGrow-16                        10000            110720 ns/op
BenchmarkRegistryPushPopFixed-16                           10887            110825 ns/op
BenchmarkRegistrySetTop-16                                297716              4107 ns/op
PASS
ok      github.com/yuin/gopher-lua      24.827s

with only switch:

BenchmarkCallFrameStackPushPopAutoGrow-16                 575610              2178 ns/op
BenchmarkCallFrameStackPushPopFixed-16                   2014279               602.0 ns/op
BenchmarkCallFrameStackPushPopShallowAutoGrow-16        23916462                51.39 ns/op
BenchmarkCallFrameStackPushPopShallowFixed-16           58842040                21.84 ns/op
BenchmarkCallFrameStackPushPopFixedNoInterface-16        2033984               571.5 ns/op
BenchmarkCallFrameStackUnwindAutoGrow-16                  741211              1685 ns/op
BenchmarkCallFrameStackUnwindFixed-16                    2379696               503.4 ns/op
BenchmarkCallFrameStackUnwindFixedNoInterface-16         2392480               500.6 ns/op
BenchmarkRegistryPushPopAutoGrow-16                         9115            112333 ns/op
BenchmarkRegistryPushPopFixed-16                           10574            111390 ns/op
BenchmarkRegistrySetTop-16                                294739              4028 ns/op
PASS
ok      github.com/yuin/gopher-lua      24.605s

with combined main loops:

BenchmarkCallFrameStackPushPopAutoGrow-16                 555296              2249 ns/op
BenchmarkCallFrameStackPushPopFixed-16                   1945405               588.3 ns/op
BenchmarkCallFrameStackPushPopShallowAutoGrow-16        23535645                51.98 ns/op
BenchmarkCallFrameStackPushPopShallowFixed-16           60960530                21.30 ns/op
BenchmarkCallFrameStackPushPopFixedNoInterface-16        2071460               590.7 ns/op
BenchmarkCallFrameStackUnwindAutoGrow-16                  705776              1691 ns/op
BenchmarkCallFrameStackUnwindFixed-16                    2370794               508.9 ns/op
BenchmarkCallFrameStackUnwindFixedNoInterface-16         2382297               513.4 ns/op
BenchmarkRegistryPushPopAutoGrow-16                        10000            110263 ns/op
BenchmarkRegistryPushPopFixed-16                           10845            110933 ns/op
BenchmarkRegistrySetTop-16                                295574              4091 ns/op
PASS
ok      github.com/yuin/gopher-lua      24.584s

with evalInstruction inlined & lifting reg assignment out of loop:

BenchmarkCallFrameStackPushPopAutoGrow-16                 573594              2150 ns/op
BenchmarkCallFrameStackPushPopFixed-16                   1806942               674.8 ns/op
BenchmarkCallFrameStackPushPopShallowAutoGrow-16        24414471                51.14 ns/op
BenchmarkCallFrameStackPushPopShallowFixed-16           59304620                18.99 ns/op
BenchmarkCallFrameStackPushPopFixedNoInterface-16        2063114               596.8 ns/op
BenchmarkCallFrameStackUnwindAutoGrow-16                  650049              1675 ns/op
BenchmarkCallFrameStackUnwindFixed-16                    2300353               511.5 ns/op
BenchmarkCallFrameStackUnwindFixedNoInterface-16         2361276               513.6 ns/op
BenchmarkRegistryPushPopAutoGrow-16                         9720            114780 ns/op
BenchmarkRegistryPushPopFixed-16                            9316            114070 ns/op
BenchmarkRegistrySetTop-16                                289208              4033 ns/op
PASS
ok      github.com/yuin/gopher-lua      23.390s
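
As a rough illustration of what "lifting reg assignment out of loop" means in the last variant, the sketch below hoists a pointer load out of a loop. The `registry` and `state` types and both functions are hypothetical stand-ins, not gopher-lua's actual types, and the hoisting is only valid under the assumption that the field is never reassigned while the loop runs.

```go
package main

import "fmt"

// registry and state are hypothetical stand-ins, not gopher-lua's types.
type registry struct{ slots []int }

type state struct{ reg *registry }

// sumReload reads s.reg again on every iteration.
func sumReload(s *state, idx []int) int {
	total := 0
	for _, i := range idx {
		reg := s.reg // loaded inside the loop
		total += reg.slots[i]
	}
	return total
}

// sumHoisted reads s.reg once before the loop.
// Assumption: s.reg is not replaced while the loop is running.
func sumHoisted(s *state, idx []int) int {
	reg := s.reg // loaded once, outside the loop
	total := 0
	for _, i := range idx {
		total += reg.slots[i]
	}
	return total
}

func main() {
	s := &state{reg: &registry{slots: []int{10, 20, 30, 40}}}
	idx := []int{0, 2, 3}
	fmt.Println(sumReload(s, idx), sumHoisted(s, idx)) // prints "80 80"
}
```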


serprex commented Mar 12, 2024

A bit off topic, but the benchmarks in the wiki are quite outdated. Go has improved since 1.7 (glua is now only ~4x slower than upstream Lua), but Python 3's performance has also improved greatly since then. I reran fib35 for lua/luajit/glua/python3:

> time lua _glua-tests/fib35.lua
real    0m0.358s
user    0m0.354s
sys     0m0.004s

> time luajit _glua-tests/fib35.lua
real    0m0.052s
user    0m0.040s
sys     0m0.004s

> time python _glua-tests/fib35.py
real    0m0.813s
user    0m0.804s
sys     0m0.004s

> time ./glua _glua-tests/fib35.lua # this PR
real    0m1.698s
user    0m1.692s
sys     0m0.007s

> time ./glua _glua-tests/fib35.lua # master
real    0m1.732s
user    0m1.706s
sys     0m0.004s

Granted, it's a rather synthetic benchmark.
