fix: replace lazy regex in removeXmlComments with depth-tracking scan to prevent nested comment bypass by Copilot · Pull Request #28927 · github/gh-aw

Copilot · 2026-04-28T11:31:35Z

removeXmlComments used a lazy regex (<!--[\s\S]*?--!?>) that stops at the first closer it finds. On nested input it therefore consumed only the span from the outer opener to the inner closer, leaving the outer payload visible. The do...while loop didn't help because no second <!-- opener remained after the first pass.

Root cause

Input:  <!-- <!-- --> PAYLOAD -->
Lazy regex matches: <!-- <!-- -->
Remaining:          PAYLOAD -->   ← reaches the agent
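
The bypass can be reproduced with a short sketch. The regex is the one quoted in this description; the function name `removeXmlCommentsLazy` and the exact loop shape are assumptions made for illustration, not the original source.

```javascript
// Sketch of the pre-fix behaviour (hypothetical reconstruction).
// The regex is taken from the PR description; the loop shape is assumed.
const LAZY_COMMENT = /<!--[\s\S]*?--!?>/g;

function removeXmlCommentsLazy(s) {
  let previous;
  do {
    previous = s;
    // Lazy match: runs from an opener to the FIRST closer that follows it.
    s = s.replace(LAZY_COMMENT, "");
  } while (s !== previous);
  return s;
}

const bypass = "<!-- <!-- --> PAYLOAD -->";
console.log(JSON.stringify(removeXmlCommentsLazy(bypass)));
// prints " PAYLOAD -->"
```

The second loop iteration finds no `<!--` in the remaining text, so the payload and the stray closer survive.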

Fix

Replaced the lazy regex with a character-by-character depth-tracking scan:

Increments commentDepth on each <!-- opener
Decrements on each --> / --!> closer (only when inside a comment)
Appends to output only at commentDepth === 0

<!-- <!-- --> PAYLOAD -->
<!--          depth → 1
    <!--      depth → 2
         -->  depth → 1
              PAYLOAD   skipped (depth=1)
                    --> depth → 0
Result: ""  ✓
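
The trace above can be regenerated with a small instrumented scan. This is a sketch that follows the three rules listed under "Fix"; `traceCommentDepth` and the shape of its return value are hypothetical names introduced here, not part of the PR.

```javascript
// Hypothetical helper: same depth rules as described above, but it also
// records every depth transition so the trace can be inspected.
function traceCommentDepth(s) {
  const events = [];
  let output = "";
  let depth = 0;
  let position = 0;
  while (position < s.length) {
    if (s.startsWith("<!--", position)) {
      depth++; // opener always increases depth
      events.push({ token: "<!--", position, depth });
      position += 4;
    } else if (depth > 0 && s.startsWith("--!>", position)) {
      depth--; // malformed closer, only meaningful inside a comment
      events.push({ token: "--!>", position, depth });
      position += 4;
    } else if (depth > 0 && s.startsWith("-->", position)) {
      depth--; // normal closer, only meaningful inside a comment
      events.push({ token: "-->", position, depth });
      position += 3;
    } else {
      if (depth === 0) output += s[position]; // keep text outside comments
      position++;
    }
  }
  return { output, events };
}

const { output, events } = traceCommentDepth("<!-- <!-- --> PAYLOAD -->");
for (const e of events) {
  console.log(`${e.token} @${e.position} -> depth ${e.depth}`);
}
// <!-- @0 -> depth 1
// <!-- @5 -> depth 2
// --> @10 -> depth 1
// --> @22 -> depth 0
console.log(JSON.stringify(output)); // prints ""
```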

Tests added

Unit tests (sanitize_content.test.cjs) — 10 new cases covering:

<!-- <!-- --> PAYLOAD --> → "" (original bypass)
before <!-- <!-- --> PAYLOAD --> after → "before after"
<!-- <!-- <!-- --> --> PAYLOAD --> → "" (arbitrary nesting depth)
Multiple independent comments leaving surrounding text
Empty comment 
Unclosed comment (all content after opener stripped to EOF)
Adjacent comments with no text between
Nested bypass with --!> outer closer
Stray --> without opener preserved as literal text

JS fuzz harness (actions/setup/js/fuzz_remove_xml_comments_harness.cjs) — wraps removeXmlComments for the Go fuzzer.

Go fuzz test (pkg/workflow/remove_xml_comments_fuzz_test.go) — FuzzRemoveXmlComments with 33 seed cases. Invariants verified per seed:

No panics / crashes
Output length ≤ input length
Output is a character-by-character subsequence of input (no characters synthesised)
<!-- text --> always strips to ""
The nested bypass <!-- <!-- --> text --> always strips to ""
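
The length and subsequence invariants can be expressed as pure checks on an (input, output) pair. The sketch below is in JavaScript rather than Go and is not the code from the fuzz test; `isSubsequence` and `checkInvariants` are hypothetical names. Both strings are compared in the same unit (UTF-16 code units) so that non-ASCII input is handled consistently.

```javascript
// Returns true when every code unit of `sub` appears in `s` in order.
function isSubsequence(sub, s) {
  let si = 0;
  for (let i = 0; i < sub.length; i++) {
    // Advance through s until the current code unit of sub is found.
    while (si < s.length && s[si] !== sub[i]) si++;
    if (si === s.length) return false;
    si++;
  }
  return true;
}

// Hypothetical checker: lists which invariants an output violates.
function checkInvariants(input, output) {
  const violations = [];
  if (output.length > input.length) {
    violations.push("output longer than input");
  }
  if (!isSubsequence(output, input)) {
    violations.push("output is not a subsequence of input");
  }
  return violations;
}

console.log(checkInvariants("a<!--x-->b", "ab")); // prints []
console.log(checkInvariants("abc", "ca"));
// prints [ 'output is not a subsequence of input' ]
```

A sanitizer that only deletes characters satisfies both checks for every input, which is why they are useful as fuzz invariants.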

The lazy regex `<!--[\s\S]*?--!?>` matched only `<!-- <!-- -->` in `<!-- <!-- --> PAYLOAD -->`, leaving PAYLOAD visible in the sanitized output. The do-while loop did not help because no second `<!--` opener remained after the first pass.

Replace it with a depth-tracking scan that increments commentDepth on each `<!--` opener and decrements on each `-->` or `--!>` closer, skipping all content at commentDepth > 0. This correctly handles arbitrary nesting depth.

Add regression tests:
- `<!-- <!-- --> PAYLOAD -->` → `""`
- `before <!-- <!-- --> PAYLOAD --> after` → `"before after"`
- `<!-- <!-- <!-- --> --> PAYLOAD -->` → `""`

Fixes: nested HTML comment bypass (gh-aw-security#2066)
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/d33fff74-a28c-40d0-8636-53c99bda69ca
Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>

pelikhan · 2026-04-28T11:56:59Z

@copilot add fuzz tests + more unit tests

- Add fuzz_remove_xml_comments_harness.cjs (JS harness for Go fuzzer)
- Add remove_xml_comments_fuzz_test.go (Go fuzz test with 33 seed cases)
  - Validates: no crashes, output ≤ input length, output is a subsequence of input (no chars synthesised), simple/nested comment bypass invariants
- Add 7 more unit tests in sanitize_content.test.cjs:
  - Multiple independent comments
  - Empty comment
  - Unclosed comment (all content stripped to EOF)
  - Adjacent comments with no text between
  - Nested bypass with --!> outer closer
  - Stray closer --> without opener (preserved as literal text)

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/03e7d81d-149b-4b4f-b71c-aab59da2b609
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-04-28T12:17:08Z

@copilot add fuzz tests + more unit tests

Done in f56a127. Added:

JS fuzz harness (actions/setup/js/fuzz_remove_xml_comments_harness.cjs) — wraps removeXmlComments for the Go fuzzer, reads { text } from stdin and returns { result, error }.
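
A minimal sketch of the request/response shape described above, assuming the harness parses one JSON object with a `text` field and answers with `{ result, error }`. The function name `handleFuzzRequest`, the use of `null` for the unused field, and the error formatting are assumptions; the actual harness file is not shown in this conversation.

```javascript
// Hypothetical core of a fuzz harness: takes the raw JSON request and the
// function under test, and returns the JSON response as a string.
function handleFuzzRequest(rawInput, sanitize) {
  try {
    const { text } = JSON.parse(rawInput);
    return JSON.stringify({ result: sanitize(text), error: null });
  } catch (err) {
    // Report failures in-band so the caller can tell them apart from crashes.
    const message = err && err.message ? err.message : String(err);
    return JSON.stringify({ result: null, error: message });
  }
}

// Stand-in for removeXmlComments so the sketch runs on its own.
const identity = (t) => t;
console.log(handleFuzzRequest('{"text":"abc"}', identity));
// prints {"result":"abc","error":null}

// In a real harness the request would come from stdin, for example:
//   const raw = require("fs").readFileSync(0, "utf8");
//   process.stdout.write(handleFuzzRequest(raw, removeXmlComments));
```

Reporting errors in the JSON body rather than on stderr lets the Go side distinguish "the function threw" from "the harness could not run".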

Go fuzz test (pkg/workflow/remove_xml_comments_fuzz_test.go) — 33 seed cases covering simple comments, nested opener bypass patterns, unclosed comments, stray closers, adjacent comments, and unicode. Invariants checked per seed:

No panic / crash
Output length ≤ input length
Output is a character-by-character subsequence of input (no characters synthesised)
<!-- text --> always strips to ""
<!-- <!-- --> text --> (the bypass pattern) always strips to ""

7 more unit tests in sanitize_content.test.cjs:

Multiple independent comments leaving surrounding text
Empty comment
Unclosed comment (all content after opener stripped to EOF)
Adjacent comments with no text between
Nested bypass with --!> outer closer
Stray --> without opener preserved as literal text

github-actions · 2026-04-28T14:23:49Z

🧪 Test Quality Sentinel Report

Test Quality Score: 95/100

✅ Excellent test quality

Metric	Value
New/modified tests analyzed	11 (10 JS + 1 Go fuzz)
✅ Design tests (behavioral contracts)	11 (100%)
⚠️ Implementation tests (low value)	0 (0%)
Tests with error/edge cases	10 (91%)
Duplicate test clusters	0
Test inflation detected	No (JS: 1.7:1; Go fuzz: 7.1:1 — expected for fuzz infrastructure)
🚨 Coding-guideline violations	None

Test Classification Details

View All 11 Tests

Test	File	Classification	Issues Detected
`should remove nested comment opener bypass`	`sanitize_content.test.cjs`	✅ Design	Regression test for CVE pattern; verifies observable output
`should remove nested comment bypass with surrounding text`	`sanitize_content.test.cjs`	✅ Design	Verifies surrounding text preserved
`should remove deeply nested comment openers`	`sanitize_content.test.cjs`	✅ Design	Edge case: 3-level nesting
`should remove multiple independent comments leaving surrounding text`	`sanitize_content.test.cjs`	✅ Design	Multi-comment output verified
`should strip all content after unclosed comment opener`	`sanitize_content.test.cjs`	✅ Design	Edge case: unclosed opener consumes to EOF
`should remove adjacent comments with no text between them`	`sanitize_content.test.cjs`	✅ Design	Edge case: adjacent comments
`should preserve a stray closer --> with no matching opener`	`sanitize_content.test.cjs`	✅ Design	Edge case: stray closer is literal text
`FuzzRemoveXmlComments` (corpus: 30+ seeds)	`remove_xml_comments_fuzz_test.go`	✅ Design	Behavioral invariants: no crashes, output ≤ input length, nested bypass fully stripped, output is subsequence of input

Flagged Tests — Requires Review

None. All new tests are classified as high-value design tests.

Language Support

Tests analyzed:

🟨 JavaScript (*.test.cjs): 10 tests (vitest)

Notable Highlights

The test additions are exemplary for a security fix:

Go fuzz test (FuzzRemoveXmlComments) cross-tests the JavaScript implementation from Go's fuzzing framework, providing continuous random-input validation. It enforces four non-trivial invariants:
- Function never panics/crashes Node.js
- Output length ≤ input length (only removal, never synthesis)
- Output is a subsequence of the input (no characters synthesized)
- Nested-opener bypass pattern always produces empty output
The 7.1:1 line ratio (206 Go lines vs 29 production lines changed) is expected and acceptable for a fuzz harness — the extra lines are cross-language bridge infrastructure, not duplicated test logic.

Verdict

✅ Check passed. 0% of new tests are implementation tests (threshold: 30%). All 11 new tests verify observable behavioral contracts and include edge-case coverage directly relevant to the security fix.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

Assert on observable outputs, return values, or state changes
Cover error paths and boundary conditions
Would catch a behavioral regression if deleted
Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

Assert on internal function calls (mocking internals)
Only test the happy path with typical inputs
Break during legitimate refactoring even when behavior is correct
Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References:

§25058278946

🧪 Test quality analysis by Test Quality Sentinel · ● 413.4K · ◷

github-actions

✅ Test Quality Sentinel: 95/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 11 new tests verify observable behavioral contracts with strong edge-case coverage for the nested XML comment bypass fix.

Copilot

Pull request overview

Fixes an XML/HTML comment stripping bypass in removeXmlComments by replacing the previous lazy regex with a depth-tracking scanner, and adds regression + fuzz coverage.

Changes:

Replace regex-based removeXmlComments with a depth-tracking scan that correctly strips nested comment patterns.
Add unit tests covering nested comment bypasses, malformed closers, unclosed comments, and stray closers.
Add a JS harness and a Go fuzz test to fuzz removeXmlComments via Node.

Show a summary per file

File	Description
actions/setup/js/sanitize_content_core.cjs	Implements depth-tracking scanning to remove nested XML/HTML comments safely.
actions/setup/js/sanitize_content.test.cjs	Adds regression tests for nested comment bypass and related edge cases.
actions/setup/js/fuzz_remove_xml_comments_harness.cjs	Provides a Node/JS entrypoint to call `removeXmlComments` from Go fuzzing.
pkg/workflow/remove_xml_comments_fuzz_test.go	Adds a Go fuzz test that drives the JS harness and checks invariants.

Copilot's findings

Comments suppressed due to low confidence (4)

pkg/workflow/remove_xml_comments_fuzz_test.go:196

runRemoveXmlCommentsTest treats any non-empty stderr as an “expected error” and returns (nil, nil), which can silently mask real failures (Node crash, syntax error, missing harness, etc.) and make the fuzz target effectively do nothing. It would be better to either return the error to the caller, or only t.Skip on clearly-detected environment issues (e.g., missing node/missing file) while failing on other stderr-producing errors.

	err = cmd.Run()
	if err != nil {
		if stderr.Len() > 0 {
			return nil, nil // Expected error (e.g., harness not found)
		}

pkg/workflow/remove_xml_comments_fuzz_test.go:138

Similarly, the nested-bypass invariant wraps arbitrary fuzzed text as "<!-- <!-- --> " + text + " -->", but if text contains -->/--!> it can close the outer comment early and legitimately leave trailing characters, causing false positives. The invariant should either sanitize/escape text to exclude closers, or only assert the bypass property for inputs that don’t contain closer sequences.

			// The nested-opener bypass must always be stripped: wrapping the text in
			// <!-- <!-- --> ... --> must produce no output
			nestedBypass := "<!-- <!-- --> " + text + " -->"
			nestedResult, nestedErr := runRemoveXmlCommentsTest(nestedBypass)
			if nestedErr == nil && nestedResult != nil && nestedResult.Error == nil {
				if nestedResult.Result != "" {
					t.Errorf("Nested comment bypass not fully stripped: input=%q, output=%q",
						nestedBypass, nestedResult.Result)
				}
			}

pkg/workflow/remove_xml_comments_fuzz_test.go:161

isSubsequenceOf iterates sub by rune but scans s by byte (len(s) and rune(s[si])). This will mis-handle any non-ASCII input (multi-byte UTF-8), which is especially relevant since the fuzz seeds include Unicode/emoji. Consider comparing using []rune for both strings, or comparing byte slices consistently if you want a byte-level subsequence.

func isSubsequenceOf(sub, s string) bool {
	si := 0
	for _, c := range sub {
		found := false
		for si < len(s) {
			if rune(s[si]) == c {
				si++
				found = true
				break
			}
			si++
		}
		if !found {
			return false
		}
	}
	return true
}

pkg/workflow/remove_xml_comments_fuzz_test.go:186

The harness path is likely incorrect: this test looks for js/fuzz_remove_xml_comments_harness.cjs relative to the package working directory, but the harness added in this PR is located at actions/setup/js/fuzz_remove_xml_comments_harness.cjs. As written, the fuzz test will fail to find/run the harness. Consider resolving the path relative to this Go file (via runtime.Caller) or pointing to actions/setup/js/... explicitly, consistent with where the harness lives in this repo.

	harnessPath := filepath.Join("js", "fuzz_remove_xml_comments_harness.cjs")

	cmd := exec.Command("node", harnessPath)
	cmd.Stdin = bytes.NewReader(inputJSON)

Files reviewed: 4/4 changed files
Comments generated: 2

Copilot · 2026-04-28T14:31:40Z

+			// A simple comment with no nested openers must be fully removed
+			simpleComment := "<!-- " + text + " -->"
+			simpleResult, simpleErr := runRemoveXmlCommentsTest(simpleComment)
+			if simpleErr == nil && simpleResult != nil && simpleResult.Error == nil {
+				if simpleResult.Result != "" {
+					t.Errorf("Simple comment not fully removed: input=%q, output=%q",
+						simpleComment, simpleResult.Result)
+				}
+			}


The “simple comment must be fully removed” invariant is tested by wrapping the fuzzed text as "<!-- " + text + " -->", but text may contain comment delimiters such as -->, --!>, or <!--. A text containing --> closes the wrapper early and legitimately yields leftover text, which makes the assertion fail spuriously. Consider restricting/escaping text for this invariant, or using a fixed non-delimiter payload for the simple-comment check.

This issue also appears in the following locations of the same file:

line 129

line 145

line 183

line 192
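
One way to apply the "restrict the input" suggestion is to skip the wrapped-comment assertions whenever the fuzzed text itself contains a delimiter. The sketch below is in JavaScript rather than Go and is deliberately conservative: it also excludes openers, although only closers can end the wrapper early. `containsCommentDelimiter` and `wrapForSimpleInvariant` are hypothetical names.

```javascript
// Delimiters that change comment depth when they appear inside the payload.
const COMMENT_DELIMITERS = ["<!--", "-->", "--!>"];

function containsCommentDelimiter(text) {
  return COMMENT_DELIMITERS.some((d) => text.includes(d));
}

// Returns the wrapped input for the simple-comment invariant, or null when
// the invariant should not be asserted for this text.
function wrapForSimpleInvariant(text) {
  return containsCommentDelimiter(text) ? null : "<!-- " + text + " -->";
}

console.log(wrapForSimpleInvariant("plain payload")); // prints <!-- plain payload -->
console.log(wrapForSimpleInvariant("a --> b"));       // prints null
```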

Copilot · 2026-04-28T14:31:41Z

+  let result = "";
+  let commentDepth = 0;
+  let position = 0;
+  while (position < s.length) {
+    const ch = s[position];
+    if (ch === "<" && s.startsWith("<!--", position)) {
+      // Comment opener — increase nesting depth regardless of current depth
+      commentDepth++;
+      position += 4;
+    } else if (commentDepth > 0 && ch === "-" && s.startsWith("--!>", position)) {
+      // Malformed comment closer --!> (only meaningful inside an open comment)
+      commentDepth--;
+      position += 4;
+    } else if (commentDepth > 0 && ch === "-" && s.startsWith("-->", position)) {
+      // Normal comment closer --> (only meaningful inside an open comment)
+      commentDepth--;
+      position += 3;
+    } else {
+      // Include character in output only when outside all comment regions
+      if (commentDepth === 0) {
+        result += ch;
+      }
+      position++;
+    }


Building result via result += ch in a per-character loop can become quadratic and memory-heavy for large inputs (this sanitizer supports ~500KB strings). Consider accumulating into an array of chunks/characters and join("") at the end (or another linear-time builder pattern) to keep worst-case runtime predictable.
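
A sketch of the chunk-accumulating variant suggested here, assuming the same depth rules as the diff above. It records the start of each kept run and slices it out in one piece instead of appending single characters. `removeXmlCommentsChunked` is a hypothetical name, and whether `+=` is actually quadratic depends on the JavaScript engine.

```javascript
// Same depth-tracking rules as the diff, but kept text is collected as
// slices in an array and joined once at the end.
function removeXmlCommentsChunked(s) {
  const chunks = [];
  let commentDepth = 0;
  let position = 0;
  let runStart = -1; // start of the current kept run, -1 when none is open

  const flush = (end) => {
    if (runStart !== -1) {
      chunks.push(s.slice(runStart, end));
      runStart = -1;
    }
  };

  while (position < s.length) {
    if (s.startsWith("<!--", position)) {
      flush(position); // kept text ends where a comment begins
      commentDepth++;
      position += 4;
    } else if (commentDepth > 0 && s.startsWith("--!>", position)) {
      commentDepth--;
      position += 4;
    } else if (commentDepth > 0 && s.startsWith("-->", position)) {
      commentDepth--;
      position += 3;
    } else {
      if (commentDepth === 0 && runStart === -1) runStart = position;
      position++;
    }
  }
  flush(s.length);
  return chunks.join("");
}

console.log(JSON.stringify(removeXmlCommentsChunked("<!-- <!-- --> PAYLOAD -->"))); // prints ""
console.log(JSON.stringify(removeXmlCommentsChunked("text --> more"))); // prints "text --> more"
console.log(JSON.stringify(removeXmlCommentsChunked("keep <!-- never closed"))); // prints "keep "
```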

Initial plan

7f13c7e

Copilot AI assigned Copilot and szabta89 Apr 28, 2026

Copilot started work on behalf of szabta89 April 28, 2026 11:31 View session

Copilot AI linked an issue Apr 28, 2026 that may be closed by this pull request

removeXmlComments lazy regex strips inner nested comment only — outer injection payload reaches agent as literal text #28926

Closed

Copilot AI changed the title ~~[WIP] Fix lazy regex in removeXmlComments to prevent payload leakage~~ fix: replace lazy regex in removeXmlComments with depth-tracking scan to prevent nested comment bypass Apr 28, 2026

Copilot finished work on behalf of szabta89 April 28, 2026 11:43

Copilot AI requested a review from szabta89 April 28, 2026 11:43

Copilot started work on behalf of pelikhan April 28, 2026 11:57 View session

Copilot finished work on behalf of pelikhan April 28, 2026 12:19

Copilot AI requested a review from pelikhan April 28, 2026 12:19

github-actions Bot mentioned this pull request Apr 28, 2026

[aw] No-Op Runs #28859

Open

pelikhan marked this pull request as ready for review April 28, 2026 14:17

Copilot AI review requested due to automatic review settings April 28, 2026 14:17

Copilot started reviewing on behalf of pelikhan April 28, 2026 14:18 View session

github-actions Bot approved these changes Apr 28, 2026

View reviewed changes

pelikhan merged commit 965a61b into main Apr 28, 2026
43 of 45 checks passed

pelikhan deleted the copilot/fix-remove-xml-comments-regex branch April 28, 2026 14:28

Copilot AI reviewed Apr 28, 2026

View reviewed changes

pelikhan merged 3 commits into main from copilot/fix-remove-xml-comments-regex


4 participants
