Skip to content

[cDAC] Stack walk GC reference scanning and bug fixes (1/5)#127395

Open
max-charlamb wants to merge 4 commits intodotnet:mainfrom
max-charlamb:cdac-stackrefs-pr1
Open

[cDAC] Stack walk GC reference scanning and bug fixes (1/5)#127395
max-charlamb wants to merge 4 commits intodotnet:mainfrom
max-charlamb:cdac-stackrefs-pr1

Conversation

@max-charlamb
Copy link
Copy Markdown
Member

Summary

Part 1 of 5 stacked PRs splitting #126408 into reviewable pieces.

What this PR contains

  • Stack walker fixes (IsFirst preserved for skipped frames, UpdateState refactor)
  • PromoteCallerStack / PromoteCallerStackUsingGCRefMap for transition frames
  • GCRefMapDecoder + FindGCRefMap with ReadyToRun import section resolution
  • SOSDacImpl.GetStackReferences using cDAC contract
  • GCInfoDecoder.EnumerateLiveSlots goto removal
  • GcSignatureTypeProvider for GC type classification
  • Data descriptor additions (TransitionBlock, ExceptionInfo, frame types)
  • RequiresInstArg, IsAsyncMethod, HasRetBuffArg on IRuntimeTypeSystem

Stack overview

PR Content Status
→ This PR Stack walk fixes + GC scanning 🔄
PR 2 RuntimeSignatureDecoder (ELEMENT_TYPE_INTERNAL) Pending
PR 3 ArgIterator port from crossgen2 Pending
PR 4 Native stress framework (cdacstress.cpp) Pending
PR 5 Managed stress tests + CI pipeline Pending

Testing

  • 1707/1707 unit tests pass ✅
  • Dump tests (StackWalkDumpTests, StackReferenceDumpTests) validate end-to-end

Note

This PR description was created with AI assistance from Copilot.

Implement GC reference scanning for stub/transition frames and fix
stack walker state machine bugs:

- PromoteCallerStack/PromoteCallerStackUsingGCRefMap for transition frames
- GCRefMap decoder for ReadyToRun import section resolution
- FindGCRefMap with FindReadyToRunModule fallback
- SOSDacImpl.GetStackReferences using cDAC contract
- Fix IsFirst preserved for skipped frames
- Fix skipped frame handling moved to UpdateState
- GCInfoDecoder goto removal (ReportUntrackedAndSucceed local function)
- RequiresInstArg, IsAsyncMethod, HasRetBuffArg on IRuntimeTypeSystem
- ExceptionInfo ClauseForCatch fields for catch handler detection
- Data descriptor additions for frame types and TransitionBlock layout

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
See info in area-owners.md if you want to be subscribed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements the first slice of cDAC stack-walk GC reference scanning, extending data descriptors/contracts and wiring SOSDacImpl.GetStackReferences to use the cDAC StackWalk contract (including transition-frame root scanning via GCRefMap and signature-based fallback).

Changes:

  • Add/extend cDAC data descriptors and contract data types for transition frames, ReadyToRun import sections, and exception clause ranges needed for stack GC root enumeration.
  • Refactor StackWalk filtering/state transitions and implement managed/live-slot + frame-based GC root scanning paths.
  • Extend contract APIs (ExecutionManager/GCInfo/RuntimeTypeSystem) and update solution/test project structure to include StressTests.

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/native/managed/cdac/tests/MockDescriptors/MockDescriptors.ExecutionManager.cs Updates mock ReadyToRunInfo descriptor layout with import section fields.
src/native/managed/cdac/tests/Microsoft.Diagnostics.DataContractReader.Tests.csproj Treats StressTests as a separate test project and excludes it from unit-test compilation.
src/native/managed/cdac/cdac.slnx Adds StressTests project to the cdac solution filter.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/SOSDacImpl.cs Implements GetStackReferences using cDAC StackWalk contract, with DEBUG cross-check.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/ReadyToRunInfo.cs Reads/imports R2R import section pointer + count into the contract data.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Frames/TransitionBlock.cs Exposes descriptor-provided layout offsets used for GCRefMap/arg-reg scanning.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Frames/StubDispatchFrame.cs Adds GCRefMap + lazy-resolution fields for stub dispatch frame root scanning.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Frames/ExternalMethodFrame.cs New contract data type for ExternalMethodFrame GCRefMap-based scanning.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Frames/DynamicHelperFrame.cs New contract data type for DynamicHelperFrame flag-based scanning.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/ExceptionInfo.cs Adds catch-handler clause range fields for interruptible-offset override logic.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/StackWalk_1.cs Refactors stack-walk filtering/Next integration and adds GC root enumeration behaviors.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GcSignatureTypeProvider.cs Adds signature-type classifier used by signature decoding for GC scanning.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/GC/GCRefMapDecoder.cs Adds managed decoder for the compact GCRefMap bitstream.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StackWalk/FrameHandling/FrameIterator.cs Adds return-address retrieval and frame-type-specific GC root scanning (GCRefMap/MetaSig paths).
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/RuntimeTypeSystem_1.cs Implements RequiresInstArg and IsAsyncMethod for TransitionFrame argument layout decisions.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/IGCInfoDecoder.cs Extends decoder surface to expose interruptible ranges.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/GCInfo_1.cs Plumbs GetInterruptibleRanges through the GCInfo contract.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/GCInfoDecoder.cs Removes goto, fixes untracked-slot reporting behavior, and keeps decoding accessible.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/ExecutionManager/ExecutionManager_2.cs Exposes FindReadyToRunModule in ExecutionManager v2 contract.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/ExecutionManager/ExecutionManager_1.cs Exposes FindReadyToRunModule in ExecutionManager v1 contract.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/ExecutionManager/ExecutionManagerCore.cs Implements R2R module resolution via RangeSection lookup.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/DataType.cs Adds DataType entries for new frame data types.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IRuntimeTypeSystem.cs Adds RequiresInstArg and IsAsyncMethod to abstractions interface.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IGCInfo.cs Adds InterruptibleRange and GetInterruptibleRanges to the public abstraction contract.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IExecutionManager.cs Adds FindReadyToRunModule to the abstraction contract.
src/coreclr/vm/readytoruninfo.h Adds cdac_data offsets for import sections in ReadyToRunInfo.
src/coreclr/vm/frames.h Adds cdac_data for ExternalMethodFrame/DynamicHelperFrame and fields for StubDispatchFrame GCRefMap resolution.
src/coreclr/vm/datadescriptor/datadescriptor.inc Adds/extends descriptors for new frame fields, transition block offsets, and catch clause ranges.
docs/design/datacontracts/StackWalk.md Documents new descriptors used by stack walking + GC reference scanning.
docs/design/datacontracts/RuntimeTypeSystem.md Documents new RuntimeTypeSystem contract methods.
docs/design/datacontracts/GCInfo.md Documents new GCInfo contract API for interruptible ranges.


/// <summary>
/// Finds the R2R module that contains the given address.
/// Used by FindGCRefMap to resolve m_pZapModule when it's null.
Copy link
Copy Markdown
Member

@jkotas jkotas Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this unconditionally and drop ZapModule from the constract? ZapModule seems to be a nice-to-have cache.

(Also, ZapModule can be renamed to Module or ReadyToRunModule. Zap is a very old codename name for crossgen/ngen that we have almost eradicated from the codebase.)

Copilot AI review requested due to automatic review settings April 27, 2026 21:21
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated 4 comments.

Comment thread docs/design/datacontracts/StackWalk.md Outdated
Comment thread src/native/managed/cdac/cdac.slnx Outdated
- Use ex.HResult instead of E_FAIL in GetStackReferences catch block
- Remove ZapModule references from StackWalk.md documentation
- Replace magic 0x11 with (byte)SignatureTypeKind.ValueType
- Remove StressTests csproj reference from cdac.slnx (PR5 only)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Member

@davidwrighton davidwrighton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this looks good, but there are LOTS of details missing their contract documentation.

Comment thread docs/design/datacontracts/GCInfo.md
| `StubDispatchFrame` | `MethodDescPtr` | Pointer to Frame's method desc |
| `StubDispatchFrame` | `RepresentativeMTPtr` | Pointer to Frame's method table pointer |
| `StubDispatchFrame` | `RepresentativeSlot` | Frame's method table slot |
| `StubDispatchFrame` | `GCRefMap` | Cached pointer to GC reference map blob for caller stack promotion |
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the description of how the GCRefMap works, and its layout?

/// Returns TargetPointer.Null if the Frame has no return address (e.g., non-active ICF,
/// base Frame types, FuncEvalFrame during exception eval).
/// </summary>
public TargetPointer GetReturnAddress()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs documentation.

Copilot AI review requested due to automatic review settings April 28, 2026 20:56
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated 3 comments.

Comment on lines 66 to 72

GC_SPBASE_FIRST = GC_CALLER_SP_REL,
GC_SPBASE_LAST = GC_FRAMEREG_REL,
}

public readonly record struct InterruptibleRange(uint StartOffset, uint EndOffset);

public readonly record struct GcSlotDesc
{
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InterruptibleRange was moved out of this file, but GcInfoDecoder<TTraits> still references InterruptibleRange (e.g., List<InterruptibleRange>). Since this file is in ...Contracts.GCInfoHelpers and the type now lives in Microsoft.Diagnostics.DataContractReader.Contracts, the simple name won’t resolve without adding a using Microsoft.Diagnostics.DataContractReader.Contracts; (or fully-qualifying the type). As-is, this should fail to compile.

Copilot uses AI. Check for mistakes.
_ => throw new InvalidOperationException($"Unknown stack slot base: {spBase}"),
};

TargetPointer addr = new(baseAddr.Value + (ulong)(long)spOffset);
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When computing the stack slot address, baseAddr.Value + (ulong)(long)spOffset relies on unsigned wraparound for negative offsets. That can produce incorrect addresses on 32-bit targets (high bits become set) and is generally fragile. Prefer computing with signed arithmetic (e.g., cast baseAddr.Value to long, add spOffset, then cast back to ulong) so negative offsets subtract correctly without wrap issues.

Suggested change
TargetPointer addr = new(baseAddr.Value + (ulong)(long)spOffset);
TargetPointer addr = new((ulong)((long)baseAddr.Value + spOffset));

Copilot uses AI. Check for mistakes.
// InlinedCallFrame: returns 0 if inactive, else m_pCallerReturnAddress
case FrameType.InlinedCallFrame:
Data.InlinedCallFrame icf = target.ProcessedData.GetOrAdd<Data.InlinedCallFrame>(currentFramePointer);
return InlinedCallFrameHasActiveCall(icf) ? new TargetPointer(icf.CallerReturnAddress) : TargetPointer.Null;
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In GetReturnAddress(), the InlinedCallFrame case wraps an existing TargetPointer in new TargetPointer(icf.CallerReturnAddress). Since CallerReturnAddress is already a TargetPointer, this extra conversion is unnecessary; returning icf.CallerReturnAddress directly would be simpler and avoids implicit conversions.

Suggested change
return InlinedCallFrameHasActiveCall(icf) ? new TargetPointer(icf.CallerReturnAddress) : TargetPointer.Null;
return InlinedCallFrameHasActiveCall(icf) ? icf.CallerReturnAddress : TargetPointer.Null;

Copilot uses AI. Check for mistakes.
CDAC_TYPE_FIELD(StubDispatchFrame, T_POINTER, MethodDescPtr, cdac_data<FramedMethodFrame>::MethodDescPtr)
CDAC_TYPE_FIELD(StubDispatchFrame, T_UINT32, RepresentativeSlot, cdac_data<StubDispatchFrame>::RepresentativeSlot)
CDAC_TYPE_FIELD(StubDispatchFrame, T_POINTER, GCRefMap, cdac_data<StubDispatchFrame>::GCRefMap)
CDAC_TYPE_FIELD(StubDispatchFrame, T_POINTER, ZapModule, cdac_data<StubDispatchFrame>::ZapModule)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we delete ZapModule from the cdac contracts?

I see my comment about this got resolved, but it is still here - is it needed for some reason?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry Claude resolved that. I'm planning to push another iteration. I'll include this and the gcrefmap removal

CDAC_TYPE_FIELD(StubDispatchFrame, T_POINTER, RepresentativeMTPtr, cdac_data<StubDispatchFrame>::RepresentativeMTPtr)
CDAC_TYPE_FIELD(StubDispatchFrame, T_POINTER, MethodDescPtr, cdac_data<FramedMethodFrame>::MethodDescPtr)
CDAC_TYPE_FIELD(StubDispatchFrame, T_UINT32, RepresentativeSlot, cdac_data<StubDispatchFrame>::RepresentativeSlot)
CDAC_TYPE_FIELD(StubDispatchFrame, T_POINTER, GCRefMap, cdac_data<StubDispatchFrame>::GCRefMap)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GCRefMap field can be deleted too. It is a nice to have cache as well. It should be fine to do the full lookup in cdac.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants