matrix4x4createfromaxisangletest failing on NativeAOT #72149

runfoapp · 2022-07-14T00:27:21Z

Frequency - last 30 days as of 8/27:

8/26 - Rolling run 1971318 (7.0) - 4x hits
8/22 - PR 1959582 (main) - PR Enable Console Logging test in NativeAOT #74381 - 4x hits
8/19 - PR 1953466 - PR #52851 skip CurrentDirectorySet on MacCatalyst #74228 - 4x hits
8/15 - Rolling run 1943460 - 4x hits
7/28-8/26 - 57x Rolling runs of release/7.0-preview7 - see last 30 days
- Note: Except Rolling run 1913180 with 5x hits, all other runs had 1x hit

Note: main branch failures in last 30 days

Runfo report

Build	Kind	Start Time
1871493	PR 71819	2022-11-07
1871687	Rolling	2022-11-07
1871847	PR 71819	2022-11-07
1872236	PR 71948	2022-11-07
1872339	PR 71943	2022-11-07
1873007	Rolling	2022-11-07
1873836	PR 71988	2022-12-07
1873835	PR 71175	2022-12-07
1874128	PR 71385	2022-12-07
1874181	PR 71187	2022-12-07
1874234	Rolling	2022-12-07
1874881	PR 71971	2022-12-07
1875008	PR 71385	2022-12-07
1877271	PR 72078	2022-13-07
1877160	Rolling	2022-13-07
1877428	Rolling	2022-13-07
1877456	PR 71948	2022-13-07
1878318	PR 72108	2022-13-07
1878499	Rolling	2022-13-07
1878950	PR 71385	2022-13-07
1878990	Rolling	2022-13-07
1880070	Rolling	2022-14-07
1880254	Rolling	2022-14-07
1880316	PR 72167	2022-14-07
1880554	PR 72091	2022-14-07
1880908	PR 72184	2022-14-07
1881247	Rolling	2022-14-07
1881678	Rolling	2022-14-07
1882597	Rolling	2022-15-07
1882765	PR 72145	2022-15-07
1882819	Rolling	2022-15-07
1882810	PR 72252	2022-15-07
1882817	PR 72236	2022-15-07
1884136	Rolling	2022-15-07

dotnet-issue-labeler · 2022-07-14T00:27:24Z

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

jkotas · 2022-07-14T00:28:09Z

cc @LakshanF

jtschuster · 2022-07-14T21:19:58Z

Deleted duplicates in runfo, didn't mean to close this

LakshanF · 2022-07-14T21:38:39Z

#72108 has the RR passing (including the Linux-arm64 platform). Waiting for some checks to complete in the runtime pipeline that seems to be due to service outage

LakshanF · 2022-07-15T11:02:53Z

The NativeAOT rolling runs are passing with #72108

vitek-karas · 2022-07-19T08:18:45Z

Failed again in #71485

MichalStrehovsky · 2022-07-19T08:24:56Z

Blocking the test in #72437.

The failure looks non-determinstic. We've seen it fail on Windows and Linux x64 (I didn't go over all the above failures, but it might not be a problem on ARM64).

The test is using Vectors. I wonder if there's a problem with preservation of XMM registers or something like that.

Cc @VSadov as this might be in your area of interest. (This has been happening for a while so it's not from the Linux suspension PR that just merged.)

VSadov · 2022-07-19T13:47:38Z

Have we seen this failure on win-x64?

I would suspect preserving of the xmm result in 2-register return case. Win-x64 would not be affected by that.
Otherwise it is something else, but likely still related to preserving xmm results.

vitek-karas · 2022-07-19T17:23:38Z

It failed for me on win-x64: https://dev.azure.com/dnceng/public/_build/results?buildId=1888829&view=ms.vss-test-web.build-test-results-tab&runId=49307428&resultId=129815&paneView=debug

VSadov · 2022-07-20T17:23:35Z

I've looked through a few failures and it looks like the failures are x64-specific. They were seen on Linux and Windows, but I have not seen a failure on ARM64

This is interesting because Windows and Linux have different specifics in terms of preserving xmm registers (i.e. in Linux all xmm are volatile, but not on Windows, on the other hand Linux can return two xmm registers from a call)

It is hard to see commonality here, that would also be specific to NativeAOT.
The only thing that I can think of is that, I believe, NativeAOT disables use of AVX by default, so vector math may be emitted differently.

VSadov · 2022-08-28T00:51:53Z

now I can build the same commit locally and see what the diffs are in the bits or when executing/debugging

VSadov · 2022-08-28T02:11:45Z

The difference start when the following line is executed:

            expected = Matrix4x4.CreateFromQuaternion(Quaternion.CreateFromAxisAngle(Vector3.Normalize(Vector3.One), radians));

LOADS DIFFERENT DATA ==> 00007FF763CB7E5E  movups      xmm0,xmmword ptr [__readonlydata_System_Numerics_Vectors_Tests_System_Numerics_Tests_Matrix4x4Tests__Matrix4x4CreateFromAxisAngleTest+40h (07FF7647DF070h)]  
00007FF763CB7E65  movups      xmm1,xmmword ptr [__readonlydata_System_Numerics_Vectors_Tests_System_Numerics_Tests_Matrix4x4Tests__Matrix4x4CreateFromAxisAngleTest+50h (07FF7647DF080h)]  
00007FF763CB7E6C  andps       xmm1,xmm0  
00007FF763CB7E6F  mulps       xmm0,xmm1  
00007FF763CB7E72  movaps      xmm1,xmm0  
00007FF763CB7E75  shufps      xmm1,xmm0,0B1h

The first movups loads some static value (I suspect it is Vector.One ?)

My locally built exe, that passes the test, loads

XMM0 = DDDDDDDD3F800000-3F8000003F800000

The failing exe from the lab build loads:

XMM0 = FFFFFFFF3F800000-3F8000003F800000

3f800000 is float32 1.0 , so we have three 1.0 floats. Not sure what the rest is. Garbage? it seems it makes a difference.

@MichalStrehovsky
I think someone else should take a look. I have only very vague idea where these __readonlydata_ things come from.

jkotas · 2022-08-28T03:33:30Z

Vector3 is 12 bytes, but the readonly data that Vector3.One is loaded from is 16 bytes. The JIT leaves the top 4 bytes of the readonly data uninitialized that leads to failures depending on what the uninitialized data happened to be. This looks like a JIT bug. @dotnet/jit-contrib Could you please take a look?

The questions to ask:

Are the top 4 bytes of xmm register that holds Vector3 data required to be initialized to 0?
Is the JIT expected to initialize the top 4 bytes of the Vector3 readonly data to 0?

jkotas · 2022-08-28T03:35:05Z

This is silent bad codegen. We should fix it for .NET 7.

ghost · 2022-08-28T03:35:49Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Frequency - last 30 days as of 8/27:

8/26 - Rolling run 1971318 (7.0) - 4x hits
8/22 - PR 1959582 (main) - PR Enable Console Logging test in NativeAOT #74381 - 4x hits
8/19 - PR 1953466 - PR #52851 skip CurrentDirectorySet on MacCatalyst #74228 - 4x hits
8/15 - Rolling run 1943460 - 4x hits
7/28-8/26 - 57x Rolling runs of release/7.0-preview7 - see last 30 days
- Note: Except Rolling run 1913180 with 5x hits, all other runs had 1x hit

Note: main branch failures in last 30 days

Runfo report

Build	Kind	Start Time
1871493	PR 71819	2022-11-07
1871687	Rolling	2022-11-07
1871847	PR 71819	2022-11-07
1872236	PR 71948	2022-11-07
1872339	PR 71943	2022-11-07
1873007	Rolling	2022-11-07
1873836	PR 71988	2022-12-07
1873835	PR 71175	2022-12-07
1874128	PR 71385	2022-12-07
1874181	PR 71187	2022-12-07
1874234	Rolling	2022-12-07
1874881	PR 71971	2022-12-07
1875008	PR 71385	2022-12-07
1877271	PR 72078	2022-13-07
1877160	Rolling	2022-13-07
1877428	Rolling	2022-13-07
1877456	PR 71948	2022-13-07
1878318	PR 72108	2022-13-07
1878499	Rolling	2022-13-07
1878950	PR 71385	2022-13-07
1878990	Rolling	2022-13-07
1880070	Rolling	2022-14-07
1880254	Rolling	2022-14-07
1880316	PR 72167	2022-14-07
1880554	PR 72091	2022-14-07
1880908	PR 72184	2022-14-07
1881247	Rolling	2022-14-07
1881678	Rolling	2022-14-07
1882597	Rolling	2022-15-07
1882765	PR 72145	2022-15-07
1882819	Rolling	2022-15-07
1882810	PR 72252	2022-15-07
1882817	PR 72236	2022-15-07
1884136	Rolling	2022-15-07

Author:	runfoapp[bot]
Assignees:	VSadov
Labels:	`area-CodeGen-coreclr`, `area-NativeAOT-coreclr`
Milestone:	7.0.0

BruceForstall · 2022-08-28T03:59:10Z

cc @tannergooding

EgorBo · 2022-08-28T07:14:04Z

We try to promote 12b simd to 16b everywhere where possible so I guess the data section's value has to be 16 bytes with zeroed last 4 bytes indeed

tannergooding · 2022-08-28T15:58:46Z

Are the top 4 bytes of xmm register that holds Vector3 data required to be initialized to 0?
Is the JIT expected to initialize the top 4 bytes of the Vector3 readonly data to 0?

We don't have any strict requirement here. For most operations (add, and, subtract, etc) the actual functionality is "element-wise" and so the value of the top 4-bytes doesn't matter.

The few cases where it crosses lanes (such as DotProduct or Sum) we're supposed to ensure the upper bits are zeroed or ignored. In the case of DotProduct we emit the encoding that says to only use the lowest 3 elements.

tannergooding · 2022-08-28T16:00:38Z

Given this is NAOT only and NAOT must target a lower baseline, its possible there is a bug in the dpps emulation emitted for SSE2 only hardware that isn't correctly accounting for this.

I'll take a look tomorrow.

MichalStrehovsky · 2022-08-28T23:21:25Z

Given this is NAOT only and NAOT must target a lower baseline

If we don't have any other test coverage for the lower baselines, it might be a good quality week topic. With NAOT one at least has a deterministic repro once a bad binary has been produced. Running into this in a JIT configuration would be a lot more difficult to root cause.

This is not the first issue caused by this test hole - there were #72081 and #72158 just in the past month. NAOT covers the lowest denominator but there's a whole spectrum in between the lowest denominator and whatever machines we run the tests on that can happen in the real world and it looks like we don't test.

tannergooding · 2022-08-28T23:45:29Z

We do test, but only in the outer loop and only for the runtime tests. There has been a longstanding issue to also run the library tests under the various JIT Stress options for ISA enablement

JulieLeeMSFT · 2022-08-29T16:29:54Z

Given this is NAOT only and NAOT must target a lower baseline, its possible there is a bug in the dpps emulation emitted for SSE2 only hardware that isn't correctly accounting for this.

I'll take a look tomorrow.

Assigning to Tanner.

jkotas · 2022-08-30T13:48:32Z

@tannergooding Did you have a chance to take a look?

tannergooding · 2022-08-30T14:03:19Z

Yes. but it doesn't look to be in the Vector3.Dot emulation.

The relevant logic is here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lowerxarch.cpp#L3350

We pick Sse.Multiply, Sse3.HorizontalAdd -or- Sse.Shuffle, and Sse.Add and then the very first thing we do is mask things so that the unused component is zero: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lowerxarch.cpp#L3418

Notably we could do this "more efficiently" now that we have GT_CNS_VEC, but that's an unrelated JIT throughput optimization

tannergooding · 2022-08-30T14:07:39Z

The called Matrix and Quaternion methods aren't accelerated, just the Vector3.Normalize call, the various field accesses, the UnitX/Y/Z property accesses, and the Vector3 constructor call.

The test doesn't really use any special tricks or code paths either:

runtime/src/libraries/System.Numerics.Vectors/tests/Matrix4x4Tests.cs

Line 582 in 4c24000

public void Matrix4x4CreateFromAxisAngleTest()

jkotas · 2022-08-31T00:37:15Z

Here is codegen for Vector3.Length that I believe is the root cause of the problem:

[MethodImpl(MethodImplOptions.NoInlining)]
static float f(Vector3 v)
  => v.Length();

movss   xmm0,dword ptr [rcx+8] ds:00000052`9e6ff828=3f800000
movsd   xmm1,mmword ptr [rcx]
shufps  xmm1,xmm0,44h
                 
// xmm1 has Vector3. The top 4 bytes of xmm1 have undefined value. They happen to be well defined in this specific case, but the JIT assumes the general case where they are undefined.

movups  xmm0,xmmword ptr [xxx!_readonlydata_xxx_Program____Main___g__f_0_0 (00007ff6`59060ee0)] // 0x0000 FFFF FFFF FFFF - mask for the top 4 bytes
andps   xmm0,xmm1 // xmm0 has properly masked Vector3
mulps   xmm0,xmm1 // We are multiplying properly masked Vector3 with unmasked Vector3!

movaps  xmm1,xmm0
shufps  xmm1,xmm0,0B1h
addps   xmm1,xmm0
movaps  xmm0,xmm1
shufps  xmm0,xmm1,4Eh
addps   xmm0,xmm1
sqrtss  xmm0,xmm0
ret

I think mulps xmm0,xmm1 is wrong. It should be mulps xmm0,xmm0 so that we multiply the masked values. The bad code works most of the time (0 multiplied by undefined value is zero) except when the undefined value happens to be NaN that will turn the whole thing into NaN.

Does this analysis make sense? Can you take it from here?

tannergooding · 2022-08-31T02:14:43Z

Yes, that makes sense. It should definitely be multiplying tmp * tmp, rather than tmp * original.

I'll see about getting a fix up first thing tomorrow.

ghost added the untriaged New issue has not been triaged by the area owner label Jul 14, 2022

agocke changed the title ~~system.numerics.tests.matrix4x4tests.matrix4x4createfromaxisangletest~~ matrix4x4createfromaxisangletest failing on NativeAOT Jul 14, 2022

agocke added the blocking-clean-ci-optional Blocking optional rolling runs label Jul 14, 2022

jkotas added the area-NativeAOT-coreclr label Jul 14, 2022

runfoapp bot mentioned this issue Jul 14, 2022

Infrastructure - Status/Health #702

Closed

runfoapp bot closed this as completed Jul 14, 2022

ghost removed the untriaged New issue has not been triaged by the area owner label Jul 14, 2022

jtschuster reopened this Jul 14, 2022

ghost added the untriaged New issue has not been triaged by the area owner label Jul 14, 2022

LakshanF self-assigned this Jul 14, 2022

LakshanF closed this as completed Jul 15, 2022

ghost removed the untriaged New issue has not been triaged by the area owner label Jul 15, 2022

MichalStrehovsky mentioned this issue Jul 18, 2022

Make Enum.GetValues AOT-safe #72236

Closed

vitek-karas mentioned this issue Jul 19, 2022

Linker to NativeAOT sync #71485

Merged

vitek-karas reopened this Jul 19, 2022

ghost added the untriaged New issue has not been triaged by the area owner label Jul 19, 2022

MichalStrehovsky added this to the 7.0.0 milestone Jul 19, 2022

ghost removed the untriaged New issue has not been triaged by the area owner label Jul 19, 2022

LakshanF removed their assignment Jul 19, 2022

MichalStrehovsky removed the blocking-clean-ci-optional Blocking optional rolling runs label Jul 19, 2022

jkotas modified the milestones: 8.0.0, 7.0.0 Aug 28, 2022

jkotas added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed area-NativeAOT-coreclr labels Aug 28, 2022

JulieLeeMSFT assigned tannergooding and unassigned VSadov Aug 29, 2022

jkotas mentioned this issue Aug 31, 2022

Fix use of uninitialized memory for Vector3 constants #74857

Merged

tannergooding mentioned this issue Sep 1, 2022

Ensure that the SSE fallback for Vector3.Dot masks off the unused element of op1 and op2 #74932

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label Sep 1, 2022

tannergooding closed this as completed in #74932 Sep 2, 2022

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Sep 2, 2022

karelz mentioned this issue Sep 8, 2022

[release/7.0] Ensure that the SSE fallback for Vector3.Dot masks off the unused element of op1 and op2 #74980

Merged

ghost locked as resolved and limited conversation to collaborators Oct 2, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

matrix4x4createfromaxisangletest failing on NativeAOT #72149

matrix4x4createfromaxisangletest failing on NativeAOT #72149

runfoapp bot commented Jul 14, 2022 •

edited by karelz

Loading

dotnet-issue-labeler bot commented Jul 14, 2022

jkotas commented Jul 14, 2022

jtschuster commented Jul 14, 2022

LakshanF commented Jul 14, 2022

LakshanF commented Jul 15, 2022

vitek-karas commented Jul 19, 2022

MichalStrehovsky commented Jul 19, 2022

VSadov commented Jul 19, 2022

vitek-karas commented Jul 19, 2022

VSadov commented Jul 20, 2022

VSadov commented Aug 28, 2022

VSadov commented Aug 28, 2022 •

edited

Loading

jkotas commented Aug 28, 2022

jkotas commented Aug 28, 2022

ghost commented Aug 28, 2022

Runfo report

BruceForstall commented Aug 28, 2022

EgorBo commented Aug 28, 2022 •

edited

Loading

tannergooding commented Aug 28, 2022

tannergooding commented Aug 28, 2022

MichalStrehovsky commented Aug 28, 2022

tannergooding commented Aug 28, 2022

JulieLeeMSFT commented Aug 29, 2022

jkotas commented Aug 30, 2022

tannergooding commented Aug 30, 2022

tannergooding commented Aug 30, 2022

jkotas commented Aug 31, 2022 •

edited

Loading

tannergooding commented Aug 31, 2022

matrix4x4createfromaxisangletest failing on NativeAOT #72149

matrix4x4createfromaxisangletest failing on NativeAOT #72149

Comments

runfoapp bot commented Jul 14, 2022 • edited by karelz Loading

Runfo report

dotnet-issue-labeler bot commented Jul 14, 2022

jkotas commented Jul 14, 2022

jtschuster commented Jul 14, 2022

LakshanF commented Jul 14, 2022

LakshanF commented Jul 15, 2022

vitek-karas commented Jul 19, 2022

MichalStrehovsky commented Jul 19, 2022

VSadov commented Jul 19, 2022

vitek-karas commented Jul 19, 2022

VSadov commented Jul 20, 2022

VSadov commented Aug 28, 2022

VSadov commented Aug 28, 2022 • edited Loading

jkotas commented Aug 28, 2022

jkotas commented Aug 28, 2022

ghost commented Aug 28, 2022

Runfo report

BruceForstall commented Aug 28, 2022

EgorBo commented Aug 28, 2022 • edited Loading

tannergooding commented Aug 28, 2022

tannergooding commented Aug 28, 2022

MichalStrehovsky commented Aug 28, 2022

tannergooding commented Aug 28, 2022

JulieLeeMSFT commented Aug 29, 2022

jkotas commented Aug 30, 2022

tannergooding commented Aug 30, 2022

tannergooding commented Aug 30, 2022

jkotas commented Aug 31, 2022 • edited Loading

tannergooding commented Aug 31, 2022

runfoapp bot commented Jul 14, 2022 •

edited by karelz

Loading

VSadov commented Aug 28, 2022 •

edited

Loading

EgorBo commented Aug 28, 2022 •

edited

Loading

jkotas commented Aug 31, 2022 •

edited

Loading