Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NativeAOT] Assert when compiling System.Runtime.Tests #73027

Closed
VSadov opened this issue Jul 28, 2022 · 6 comments · Fixed by #73085
Closed

[NativeAOT] Assert when compiling System.Runtime.Tests #73027

VSadov opened this issue Jul 28, 2022 · 6 comments · Fixed by #73085

Comments

@VSadov
Copy link
Member

VSadov commented Jul 28, 2022

The assert happens with current bits from main.
I hoped that it could be somehow caused by the barrier issue that was fixed in #72919.

However the assert reproduces after the fix has been merged and appears to be deterministic.

It looks like it is happening in the constructor of DataflowAnalyzedMethodNode

        public DataflowAnalyzedMethodNode(MethodIL methodIL)
        {
            Debug.Assert(methodIL.OwningMethod.IsTypicalMethodDefinition);
here ==>    Debug.Assert(!CompilerGeneratedState.IsNestedFunctionOrStateMachineMember(methodIL.OwningMethod));
            _methodIL = methodIL;
        }
  Process terminated. Assertion failed.
     at ILCompiler.DependencyAnalysis.DataflowAnalyzedMethodNode..ctor(MethodIL methodIL) in /home/robox/vsadov/runtime01/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/DataflowAnalyzedMethodNode.cs:line 27
     at ILCompiler.DependencyAnalysis.NodeFactory.<>c.<CreateNodeCaches>b__39_33(MethodILKey il) in /home/robox/vsadov/runtime01/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/NodeFactory.cs:line 386
     at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)
     at ILCompiler.UsageBasedMetadataManager.AddDataflowDependency(DependencyList& dependencies, NodeFactory factory, MethodIL methodIL, String reason) in /home/robox/vsadov/runtime01/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/UsageBasedMetadataManager.cs:line 931
     at Internal.IL.ILImporter.ImportCall(ILOpcode opcode, Int32 token) in /home/robox/vsadov/runtime01/src/coreclr/tools/aot/ILCompiler.Compiler/IL/ILImporter.Scanner.cs:line 264
     at Internal.IL.ILImporter.ImportBasicBlock(BasicBlock basicBlock) in /home/robox/vsadov/runtime01/src/coreclr/tools/Common/TypeSystem/IL/ILImporter.cs:line 427
     at Internal.IL.ILImporter.ImportBasicBlocks() in /home/robox/vsadov/runtime01/src/coreclr/tools/Common/TypeSystem/IL/ILImporter.cs:line 311
     at Internal.IL.ILImporter.Import() in /home/robox/vsadov/runtime01/src/coreclr/tools/aot/ILCompiler.Compiler/IL/ILImporter.Scanner.cs:line 156
     at ILCompiler.ILScanner.CompileSingleMethod(ScannedMethodNode methodCodeNodeNeedingCode) in /home/robox/vsadov/runtime01/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/ILScanner.cs:line 123
     at System.Threading.Tasks.Parallel.<>c__DisplayClass19_0`1.<ForWorker>b__1(RangeWorker& currentWorker, Int32 timeout, Boolean& replicationDelegateYieldedBeforeCompletion)
     at System.Threading.Tasks.TaskReplicator.Replica.Execute()
     at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
     at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
     at System.Threading.ThreadPoolWorkQueue.Dispatch()
     at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
  Aborted

To repro, compile runtime and tests on a linux-arm64 machine like:

./build.sh -arch arm64 -os Linux -s clr.alljits+clr.tools+clr.nativeaotlibs+clr.nativeaotruntime+libs+libs.tests -rc Checked -lc Release /p:TestNativeAot=true -test -maxcpucount:10

The failure is not specific to linux-arm64

win-x64 repro:

build.cmd -s clr.alljits+clr.tools+clr.nativeaotlibs+clr.nativeaotruntime+libs+libs.tests -rc Checked -lc Release /p:TestNativeAot=true -test --maxcpucount:16
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jul 28, 2022
@VSadov
Copy link
Member Author

VSadov commented Jul 28, 2022

CC: @MichalStrehovsky - this is with current bits in main.
Not sure yet if this is specific to linux-arm64 or reproduces on other configs as well.

@MichalStrehovsky
Copy link
Member

The logic is new (#71485) - I don't think it's severe - I think you can just comment it out locally to unblock yourself.

Cc @vitek-karas

@VSadov
Copy link
Member Author

VSadov commented Jul 28, 2022

It happens on win-x64 as well

@VSadov VSadov changed the title [NativeAOT] Assert when compiling System.Runtime.Tests on linux-arm64 [NativeAOT] Assert when compiling System.Runtime.Tests Jul 28, 2022
@VSadov
Copy link
Member Author

VSadov commented Jul 28, 2022

There is a similar assert in InterproceduralScan

        // Scans the method as well as any nested functions (local functions or lambdas) and state machines
        // reachable from it.
        public virtual void InterproceduralScan(MethodIL startingMethodBody)
        {
            MethodDesc startingMethod = startingMethodBody.OwningMethod;
            Debug.Assert(startingMethod.IsTypicalMethodDefinition);

            // We should never have created a DataFlowAnalyzedMethodNode for compiler generated methods
            // since their data flow analysis is handled as part of their parent method analysis.
            Debug.Assert(!CompilerGeneratedState.IsNestedFunctionOrStateMachineMember(startingMethod));

I am guessing if I comment one, I will start hitting the other.

@VSadov
Copy link
Member Author

VSadov commented Jul 29, 2022

after commenting the assert in both places tests can build and pass.

@MichalStrehovsky MichalStrehovsky removed the untriaged New issue has not been triaged by the area owner label Jul 29, 2022
@MichalStrehovsky MichalStrehovsky added this to the 7.0.0 milestone Jul 29, 2022
@vitek-karas
Copy link
Member

Thanks for finding this Vlad.

The short version is that we haven't ported all of the functionality from linker yet - in this case specifically the "Reflection access to compiler generated methods".

The situation of this repro:

  • Assembly which is fully preserved via a descriptor
  • User method which looks something like this:
void UserMethod()
{
    if (AlwaysFalse)  // Or a feature switch which will evaluate to constant false
    {
        LocalFunction();
    }

    void LocalFunction() { do something which requires data flow }
}
  • In this case the compiler generated method (the LocalFunction) will be "marked" (both by linker and AOT)
  • But AOT will try to do data flow on it
    • This is because it doesn't have an associated user method - the code to find that doesn't see the call to it since it has been removed by branch removal (constant prop)
    • This leads to the asserts, since we never expect to do data flow on compiler generated method directly (only indirectly through its user method)
  • Linker has a guard against this which we haven't ported yet - basically this line: https://github.com/dotnet/linker/blob/65e21d78731ae5825b3921a870221fbc8fdeee67/src/linker/Linker.Steps/MarkStep.cs#L3425

I'll look into the fix some more - there's also an interesting question regarding warning behavior in this case (both for linker and AOT).

/cc @sbomer

vitek-karas added a commit to vitek-karas/linker that referenced this issue Jul 29, 2022
See the test for description of what it does. It's basically the linker repro for the case hit in dotnet/runtime#73027 (comment)
vitek-karas added a commit to vitek-karas/runtime that referenced this issue Jul 29, 2022
… code.

The problem occurs when an entire type/assembly is preserved through explicit rooting (command line, rd.xml, ...). If such type contains a local function (for example) which is only called from a branch which is going to be removed by constant-prop/branch removal the internal tracking of compiler generated methods will see this local function as orphaned (not belonging to any user method). This leads to a case where we will try to run data flow on the local function - but that should never happen for compiler generated methods directly -> assert.

The fix is (just like in the linker), to never run data flow on compiler generated methods directly - they should only run data flow as part of their respective user method.

Fixes dotnet#73027
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jul 29, 2022
vitek-karas added a commit to dotnet/linker that referenced this issue Jul 29, 2022
See the test for description of what it does. It's basically the linker repro for the case hit in dotnet/runtime#73027 (comment)
MichalStrehovsky pushed a commit that referenced this issue Aug 1, 2022
… code. (#73085)

The problem occurs when an entire type/assembly is preserved through explicit rooting (command line, rd.xml, ...). If such type contains a local function (for example) which is only called from a branch which is going to be removed by constant-prop/branch removal the internal tracking of compiler generated methods will see this local function as orphaned (not belonging to any user method). This leads to a case where we will try to run data flow on the local function - but that should never happen for compiler generated methods directly -> assert.

The fix is (just like in the linker), to never run data flow on compiler generated methods directly - they should only run data flow as part of their respective user method.

Fixes #73027
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Aug 1, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Aug 31, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants