Templatize enregisterLocalVars in various methods of LSRA #85144

Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch

Issue Details: Let's see how much TP gain we get and whether it is even worth pursuing. Fixes: #82848
Will need to wait until we have the superpmi collection ready.
Here are the latest numbers. I have no idea why the linux-x64 numbers are exactly the opposite of the windows-x64 numbers, but I am not even sure how much we can rely on the linux-x64 numbers, as I mentioned in #85144 (comment). I will try out templatizing
@dotnet/jit-contrib
The TP diffs are quite mixed, with big improvements and big regressions. Is it worth it?
From my understanding, the numbers shown for Windows are the true/real ones, and the rest are fabricated/artificial IMO, because those binaries are not even compiled with VC++, and it is very hard to believe the numbers flip between the Windows and altjit binaries.
Probably. We are still eliminating branches in a lot of hot code (e.g. #85144 (comment)), and some of that is eliminated from the iterations over
@BruceForstall - any thoughts on this?
Reminds me of @tannergooding's comment in #83648 (comment)
[There's some bug with the Diffs Extension page where it's showing multiple diffs outputs. I guess that's because the diffs job was run twice in the same PR.] The MinOpts TP numbers look good for all platforms except linux-x64. The FullOpts TP numbers are mixed for all platforms (regressions for coreclr_tests especially, mostly improvements otherwise), but a pure regression for linux-x64. So:
@jakobbotsch There's an interesting point here: can we create a linux-x64 build of our instruction counting PIN tool, and add native linux-x64 TP measurement, such that we're measuring the clang-built RyuJIT? PIN does support linux-x64 builds. We could keep doing the linux-x64 cross-compiler TP, which we could use to validate how representative our cross-compiler TP measurements are (by manually comparing the two runs). We currently don't do any SPMI AzDO pipeline runs on Linux, but it should work.
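For reference, a minimal instruction-counting Pintool looks roughly like the classic `inscount` sample shipped with the PIN kit. This is only a sketch against the public PIN API and needs the PIN kit to build; it is not the actual tool used by the SPMI pipeline:

```cpp
#include "pin.H"
#include <fstream>

static UINT64 icount = 0;

// Analysis routine: called before every executed instruction.
static VOID DoCount() { icount++; }

// Instrumentation callback: insert the counting call before each instruction.
static VOID Instruction(INS ins, VOID* v)
{
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)DoCount, IARG_END);
}

// Write the final count when the instrumented process exits.
static VOID Fini(INT32 code, VOID* v)
{
    std::ofstream("inscount.out") << "Count " << icount << std::endl;
}

int main(int argc, char* argv[])
{
    if (PIN_Init(argc, argv))
        return 1;
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram(); // never returns
    return 0;
}
```

Since PIN builds for linux-x64 as well, the same tool source could in principle count instructions of the clang-built RyuJIT.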
Yes, and I don't have any explanation for why that is happening, especially since the Windows numbers show the opposite result and both the Windows and Linux (cross-compiled) binaries are produced by the same VC++ compiler.
Yes, I think that too. My point is that the VC++ compiler could have made certain decisions that increase the number of instructions executed, and the eliminated branch instructions are included in that number. Even if the number of instructions has increased, the number of branches (which have higher latency and branch mispredictions) is reduced (as shown in my analysis in #85144 (comment)), and that should lead to better execution performance. The current TP diff can in no way tell whether execution speed has increased or not, because we don't take into account various details of the instruction pipeline and their latency/throughput. I feel the TP diff is a good yardstick to tell us whether a change could cause a TP impact, but we need to interpret those numbers on a case-by-case basis. EDIT:
I am not sure how to find that out. Is there a way to do that?
Yes, it would be possible (and we have talked about it before); it's just work. I've opened #85749 to track it.
My typical trick is the following:

```diff
diff --git a/src/coreclr/tools/superpmi/superpmi/superpmi.cpp b/src/coreclr/tools/superpmi/superpmi/superpmi.cpp
index c0d64df1acc..e31e423bf15 100644
--- a/src/coreclr/tools/superpmi/superpmi/superpmi.cpp
+++ b/src/coreclr/tools/superpmi/superpmi/superpmi.cpp
@@ -607,6 +607,12 @@ int __cdecl main(int argc, char* argv[])
                 break;
             case NearDifferResult::SuccessWithoutDiff:
+                PrintDiffsCsvRow(
+                    diffCsv,
+                    reader->GetMethodContextIndex(),
+                    mcb.size,
+                    baseMetrics.NumCodeBytes, diffMetrics.NumCodeBytes,
+                    baseMetrics.NumExecutedInstructions, diffMetrics.NumExecutedInstructions);
                 break;
             case NearDifferResult::Failure:
                 if (o.mclFilename != nullptr)
```

Then you can invoke superpmi.exe with

You can open

It would be nice to make this easier. I opened #85755 to track that.
(force-pushed from 086324d to 32b9a30)
@BruceForstall @jakobbotsch - how do we fix this? I assume the jitrollingbuild should be using
Also hitting this, although the
It will fix itself. #84676 changed the builds to use clang 16, but the rolling build hasn't run since then (it's running right now). After it builds, you'll need to merge & push (I think) so your baseline will be a newly built JIT.
While we should protect against division by zero in the code, there's probably something more fundamental failing. What run was the Python crash from? I see your diffs run for win-x64 has:
Yes, this is the same error I am seeing in
The TP diffs look like a pure win on linux-x64 native, which doesn't match the linux-x64 cross TP. Odd. The only regression remaining is the slight win-x86 one.