Skip to content

[LLHD] Run Mem2Reg per slot to fix cubic scaling#10321

Open
fabianschuiki wants to merge 1 commit intomainfrom
fschuiki/mem2reg-reapply
Open

[LLHD] Run Mem2Reg per slot to fix cubic scaling#10321
fabianschuiki wants to merge 1 commit intomainfrom
fschuiki/mem2reg-reapply

Conversation

@fabianschuiki
Copy link
Copy Markdown
Contributor

Reapply commit bcc1685 with an additional fix to make the block-entry merge logic monotone. The previous attempt was reverted in 82ec37f because it caused Mem2Reg to hang on cyclic CFGs: mergeFlavor could return either the unique non-null predecessor def (common) or a cached merge def, and across iterations the value at an entry would flip between the two as back-edge state propagated. Make the merge sticky by always returning the cached merge def once it has been created, so each block entry moves through null -> common -> merged at most once and the fixpoint terminates.

The lattice used to track sets and maps of all slots in the region at every program point, and every propagation update copied and compared the full state. Composing three O(N) factors yielded roughly O(N^3) total work, with a 1000-signal stress test taking on the order of two minutes.

Run the analysis once per slot instead, creating a separate tiny lattice that focuses only on interactions with that single slot. State in the LatticeValue collapses to a single needed flag and a pair of reaching-def pointers (one for the blocking flavor, one for the delayed flavor of assignments), so every propagation update is O(1). Block-entry merge tracking, inserted-probe tracking, and the loops in insertProbes, insertDrives, and insertBlockArgs simplify correspondingly. Cross-slot state shrinks to a small cache of the llhd.constant_time ops inserted at block terminators, so we don't end up with one constant op per promoted slot.

After the change the 1000-signal stress test runs in around 700 ms -- roughly two orders of magnitude faster. Add four scaling stress tests of increasing size to guard against regressions.

Fixes #10314.

Assisted-by: Claude Opus 4.7

Reapply commit bcc1685 with an additional fix to make the block-entry
merge logic monotone. The previous attempt was reverted in 82ec37f
because it caused Mem2Reg to hang on cyclic CFGs: `mergeFlavor` could
return either the unique non-null predecessor def (`common`) or a
cached merge def, and across iterations the value at an entry would
flip between the two as back-edge state propagated. Make the merge
sticky by always returning the cached merge def once it has been
created, so each block entry moves through null -> common -> merged at
most once and the fixpoint terminates.

The lattice used to track sets and maps of all slots in the region at
every program point, and every propagation update copied and compared
the full state. Composing three O(N) factors yielded roughly O(N^3)
total work, with a 1000-signal stress test taking on the order of two
minutes.

Run the analysis once per slot instead, creating a separate tiny
lattice that focuses only on interactions with that single slot. State
in the LatticeValue collapses to a single `needed` flag and a pair of
reaching-def pointers (one for the blocking flavor, one for the
delayed flavor of assignments), so every propagation update is O(1).
Block-entry merge tracking, inserted-probe tracking, and the loops in
`insertProbes`, `insertDrives`, and `insertBlockArgs` simplify
correspondingly. Cross-slot state shrinks to a small cache of the
`llhd.constant_time` ops inserted at block terminators, so we don't
end up with one constant op per promoted slot.

After the change the 1000-signal stress test runs in around 700 ms --
roughly two orders of magnitude faster. Add four scaling stress tests
of increasing size to guard against regressions.

Fixes #10314.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@seldridge
Copy link
Copy Markdown
Member

Kind of an extreme nit, but could you structure this PR as two revert commits and a patch commit on top of it?

@fabianschuiki
Copy link
Copy Markdown
Contributor Author

Results of circt-tests run for c7e3c01 compared to results for 7ca0d4a: no change to test results.

@seldridge
Copy link
Copy Markdown
Member

@jpienaar: Could you test that this fixes the internal hang in your flow?

@fabianschuiki
Copy link
Copy Markdown
Contributor Author

but could you structure this PR as two revert commits and a patch commit on top of it?

That's what I initially tried. But the second of the two reverted commits fully supersedes the first one. So this is more like a re-implementation that gets us to the same end point as the two commits combined, with a different implementation.

@jpienaar
Copy link
Copy Markdown
Member

Sorry for delay, will try tonight

@jpienaar
Copy link
Copy Markdown
Member

Yes looked like it resolved the previous issue we ran into.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[circt-verilog] Hang in llhd-mem2reg

3 participants