[LLHD] Run Mem2Reg per slot to fix cubic scaling by fabianschuiki · Pull Request #10321 · llvm/circt

fabianschuiki · 2026-04-25T00:12:16Z

Reapply commit bcc1685 with an additional fix to make the block-entry merge logic monotone. The previous attempt was reverted in 82ec37f because it caused Mem2Reg to hang on cyclic CFGs: `mergeFlavor` could return either the unique non-null predecessor def (`common`) or a cached merge def, and across iterations the value at an entry would flip between the two as back-edge state propagated. Make the merge sticky by always returning the cached merge def once it has been created, so each block entry moves through null -> common -> merged at most once and the fixpoint terminates.
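The sticky merge can be sketched in a few lines of C++. This is a hypothetical model, not the pass's actual code: `Def`, `EntryMerge`, and this particular `mergeFlavor` signature are assumptions, with plain pointers standing in for MLIR values. It shows the fix described above: once predecessors have disagreed and a merge def exists, every later call returns that merge def, no matter what the back edges carry.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a reaching definition; the real pass works with
// MLIR values. Defs are compared by address.
struct Def {
  int id;
};

// Hypothetical merge state for one block entry and one assignment flavor.
struct EntryMerge {
  // Merge def (think: a block argument) created the first time predecessors
  // disagree. Stays null until then.
  std::unique_ptr<Def> merged;

  // Combine the reaching defs arriving from each predecessor. A null entry
  // means no def has reached along that edge yet.
  const Def *mergeFlavor(const std::vector<const Def *> &predDefs) {
    // Sticky: once a merge def exists, always return it. Otherwise the result
    // could flip between the common def and the merge def as back-edge state
    // changes, and the fixpoint iteration might never settle.
    if (merged)
      return merged.get();

    const Def *common = nullptr;
    for (const Def *d : predDefs) {
      if (!d)
        continue; // this edge carries no def yet
      if (!common) {
        common = d; // first non-null def seen
      } else if (d != common) {
        // Predecessors disagree: materialize the merge def exactly once.
        assert(!merged && "merge def is created at most once");
        merged = std::make_unique<Def>(Def{-1});
        return merged.get();
      }
    }
    return common; // null, or the unique def all non-null edges agree on
  }
};
```

During fixpoint iteration, an entry in this model can no longer oscillate: after the merge def has been returned once, the early return pins the result, even if a later iteration sees all predecessors agreeing again.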

The lattice used to track sets and maps of all slots in the region at every program point, and every propagation update copied and compared the full state. Composing three O(N) factors yielded roughly O(N^3) total work, with a 1000-signal stress test taking on the order of two minutes.

Run the analysis once per slot instead, creating a separate tiny lattice that focuses only on interactions with that single slot. State in the LatticeValue collapses to a single `needed` flag and a pair of reaching-def pointers (one for the blocking flavor, one for the delayed flavor of assignments), so every propagation update is O(1). Block-entry merge tracking, inserted-probe tracking, and the loops in `insertProbes`, `insertDrives`, and `insertBlockArgs` simplify correspondingly. Cross-slot state shrinks to a small cache of the `llhd.constant_time` ops inserted at block terminators, so we don't end up with one constant op per promoted slot.
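The shape of the per-slot state can be sketched as follows. Everything here is a hypothetical simplification: `Def`, `Flavor`, `DriveOp`, `applyDrive`, and `analyzeAllSlots` are illustrative names, only drives within a single straight-line block are modeled, and the `needed` flag is carried along but not interpreted. The point is that copying or comparing a `LatticeValue` touches three fields regardless of how many slots the region contains, and that each per-slot run visits only the ops on its own slot.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a reaching definition (an MLIR value in the pass).
struct Def {
  int id;
};

// The two assignment flavors the lattice tracks separately.
enum class Flavor { Blocking, Delayed };

// Per-slot lattice value: a `needed` flag plus one reaching def per flavor.
// Its size does not depend on how many slots the region contains.
struct LatticeValue {
  bool needed = false; // carried along, not interpreted in this sketch
  const Def *blockingDef = nullptr;
  const Def *delayedDef = nullptr;

  const Def *&def(Flavor f) {
    return f == Flavor::Blocking ? blockingDef : delayedDef;
  }

  // Fixpoint check: compares three fields, so it is O(1).
  bool operator==(const LatticeValue &o) const {
    return needed == o.needed && blockingDef == o.blockingDef &&
           delayedDef == o.delayedDef;
  }
};

// Hypothetical op: drive `value` into `slot` with the given flavor.
struct DriveOp {
  int slot;
  Flavor flavor;
  const Def *value;
};

// Transfer function for a drive: replace the reaching def of its flavor.
LatticeValue applyDrive(LatticeValue in, Flavor f, const Def *value) {
  in.def(f) = value;
  return in;
}

// Run the analysis once per slot over a single straight-line block. Ops are
// bucketed by slot first, so each run only visits the ops on its own slot.
std::vector<LatticeValue> analyzeAllSlots(int numSlots,
                                          const std::vector<DriveOp> &ops) {
  std::vector<std::vector<const DriveOp *>> bySlot(numSlots);
  for (const DriveOp &op : ops) {
    assert(op.slot >= 0 && op.slot < numSlots && "drive on unknown slot");
    bySlot[op.slot].push_back(&op);
  }
  std::vector<LatticeValue> results(numSlots);
  for (int slot = 0; slot < numSlots; ++slot)
    for (const DriveOp *op : bySlot[slot])
      results[slot] = applyDrive(results[slot], op->flavor, op->value);
  return results;
}
```

A slot with no drives keeps the default value. Extending this to a CFG would add block-entry merges of the kind described in the first paragraph, but each merge would still operate on this constant-size value rather than on per-region sets and maps.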

After the change the 1000-signal stress test runs in around 700 ms -- roughly two orders of magnitude faster. Add four scaling stress tests of increasing size to guard against regressions.

Fixes #10314.

Assisted-by: Claude Opus 4.7


seldridge · 2026-04-25T00:23:45Z

Kind of an extreme nit, but could you structure this PR as two revert commits and a patch commit on top of it?

fabianschuiki · 2026-04-25T00:37:34Z

Results of circt-tests run for c7e3c01 compared to results for 7ca0d4a: no change to test results.

seldridge · 2026-04-25T02:10:37Z

@jpienaar: Could you test that this fixes the internal hang in your flow?

fabianschuiki · 2026-04-27T15:34:53Z

> but could you structure this PR as two revert commits and a patch commit on top of it?

That's what I initially tried. But the second of the two reverted commits fully supersedes the first one. So this is more like a re-implementation that gets us to the same end point as the two commits combined, with a different implementation.

jpienaar · 2026-04-28T17:08:22Z

Sorry for the delay, will try tonight.

jpienaar · 2026-04-28T17:29:01Z

Yes, looks like it resolved the previous issue we ran into.

fabianschuiki requested a review from seldridge April 25, 2026 00:12

fabianschuiki requested a review from maerhart as a code owner April 25, 2026 00:12

fabianschuiki added the LLHD label Apr 25, 2026

fabianschuiki wants to merge 1 commit into `main` from `fschuiki/mem2reg-reapply`


3 participants
