Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NativeAOT] Some cleanup of assembly helpers in hijack area. #72542

Merged
merged 17 commits into from
Jul 22, 2022

Conversation

VSadov
Copy link
Member

@VSadov VSadov commented Jul 20, 2022

Mostly removing stuff to simplify porting to unix-arm64

  • Deleting some dead code
  • Merged macros with too few uses into callers, where it made the code more clear or similar across platforms.
  • Making win-x64 to use the same shape of hijack helpers as other working implementations
    (re:[NativeAOT] thread suspension on Unix #71187 (comment))
  • Making RhpGcProbeHijack to be similar between win-x64, unix-x64, win-arm64.
    Especially the win-arm64 implementation that had many differences.
  • Added/tweaked a few comments

There should be no new functionality or any behavior changes here. Just deleting/refactoring code.

@dotnet dotnet deleted a comment from azure-pipelines bot Jul 21, 2022
@dotnet dotnet deleted a comment from azure-pipelines bot Jul 21, 2022
@VSadov VSadov marked this pull request as ready for review July 21, 2022 20:11
@VSadov VSadov requested review from janvorli and jkotas July 22, 2022 03:49
@@ -320,7 +320,7 @@ C_FUNC(\Name):
DEFAULT_FRAME_SAVE_FLAGS = PTFF_SAVE_ALL_PRESERVED + PTFF_SAVE_RSP

.macro PUSH_COOP_PINVOKE_FRAME trashReg
push_nonvol_reg rbp // push RBP frame // TODO: do we need this? not on windows.
Copy link
Member Author

@VSadov VSadov Jul 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have checked and Windows debuggers do not have issues with unwinding without introducing rbp frame, so I did not change anything.

@@ -1413,7 +1352,7 @@ COOP_PINVOKE_HELPER(void, RhpReversePInvoke, (ReversePInvokeFrame * pFrame))
if (pCurThread->InlineTryFastReversePInvoke(pFrame))
return;

RhpReversePInvokeAttachOrTrapThread2(pFrame);
pCurThread->ReversePInvokeAttachOrTrapThread(pFrame);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the C++ code generated for this going to be as good as before? It looks like this was carefully setup to allow the slow path to be tailcalled.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The slow path would be very rare. It handles a case when a thread has not seen managed code yet.
I thought that the helper was added to call from assembly and now we don't

I will check if we are still tailcalling.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c++ actually inlines RhpReversePInvokeAttachOrTrapThread2 after the change.
The impact on the fast part of the method is not too bad, but keeping the slow path separate still has some minor benefits like pushing fewer registers in prolog.

I will revert this part.

for (size_t i = 0; i < ARRAY_SIZE(NormalHijackTargets); i++)
{
if (NormalHijackTargets[i] == address)
if (&RhpGcProbeHijack == address)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation looks off.

;; Build a map of symbols representing offsets into the transition frame (see PInvokeTransitionFrame in
;; rhbinder.h) and keep these two in sync.
map 0
field OFFSETOF__PInvokeTransitionFrame__m_PreservedRegs
field 10 * 8 ; x19..x28
m_CallersSP field 8 ; SP at routine entry
field 19 * 8 ; x0..x18
Copy link
Member Author

@VSadov VSadov Jul 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no point in saving/restoring volatile registers as at return site they contain garbage.
lr,fp are saved as a part of stack alloc/dealloc, so no need to save lr again.
NZCV was not actually stored and there is no need.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do need to save returns though as they may have live data.

jnz LOCAL_LABEL(DoRhpGcProbe)
ret

LOCAL_LABEL(DoRhpGcProbe):
or ecx, DEFAULT_FRAME_SAVE_FLAGS + PTFF_SAVE_RAX + PTFF_SAVE_RDX
jmp C_FUNC(RhpGcProbe)
NESTED_END RhpGcProbeHijack, _TEXT

NESTED_ENTRY RhpGcProbe, _TEXT, NoHandler
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RhpGcProbe does not match what the funciton does now that RhpTrapThreads check was removed from it. Should it be called RhpWaitForGC instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RhpGcProbe was not the best name.
The actual purpose of the function is to set up the stackwalk-enabling frame around the RhpWaitForGC2, which does the actual waiting. It is hard to capture that in couple words.

RhpWaitForGC seems a bit better. The function does typically result in waiting for GC.

@@ -326,8 +286,8 @@ __PPF_ThreadReg SETS "r2"
NESTED_ENTRY RhpGcProbeRare
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete RhpGcProbe and rename RhpGcProbeRare to RhpGcProbe to match other platforms?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not put much effort in keeping 32bit platforms in sync. I only did some easy opportunistic changes.

I think we may be iterating on 64bit implementation a bit more and keeping 32bit behind is not a big deal. When/if we get to 32bit, absorbing all the changes should not be hard. It might even be that porting 64bit code, that works and tested, would be a shorter path than adjusting existing 32bit code. Probably it will be a mix of both approaches.

I will check if these changes are easy to do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have fixed the entry points so that all platforms look the same - one RhpGcProbeHijack that checks for the trap flag, and if set, sets bits for saved registers and calls RhpWaitForGC. It might even work when we get to 32bit.

@@ -198,7 +162,8 @@ RhpGcProbe proc
SynchronousRendezVous:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete RhpTrapThreads check to match other platforms?

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo comments. Thank you!

@VSadov
Copy link
Member Author

VSadov commented Jul 22, 2022

/azp run runtime-extra-platforms

@dotnet dotnet deleted a comment from azure-pipelines bot Jul 22, 2022
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@VSadov
Copy link
Member Author

VSadov commented Jul 22, 2022

Thanks!!

@VSadov VSadov merged commit 41def7c into dotnet:main Jul 22, 2022
@VSadov VSadov deleted the cleanup1 branch July 22, 2022 14:44
@ghost ghost locked as resolved and limited conversation to collaborators Aug 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants