Skip to content

fix: gracefully skip invalid repositories in webhooks instead of aborting#6360

Open
AftAb-25 wants to merge 2 commits intomindersec:mainfrom
AftAb-25:fix/6359-webhook-batch-abort
Open

fix: gracefully skip invalid repositories in webhooks instead of aborting#6360
AftAb-25 wants to merge 2 commits intomindersec:mainfrom
AftAb-25:fix/6359-webhook-batch-abort

Conversation

@AftAb-25
Copy link
Copy Markdown
Contributor

@AftAb-25 AftAb-25 commented Apr 13, 2026

Description

This PR improves batch resilience inside the GitHub App Webhook processor (processInstallationRepositoriesAppEvent) by preventing a single invalid repository entry from aborting the entire synchronization event.

When an installation_repositories payload arrives from GitHub, it iterates through both addedRepos and removedRepos to synchronize the installation. Previously, if any single repository failed the basic name or ID parsing validation (e.g. an empty name string), the function would instantly return an error.

This caused two minor but unwanted behaviors:

  1. It returned a 500 error to GitHub, causing GitHub to retry the exact same webhook payload and repeatedly hit the same validation error.
  2. It dropped subsequent valid repositories in the same batch, meaning that minor data anomalies could temporarily stall syncing.

Changes

  • Swapped the return nil, err across both the added and removed loops with a graceful zerolog warning and a continue, allowing valid entries to continue processing.
  • Ensured the repositoryRemoved() loop validates repo.GetID() != 0 to safely surface errors rather than passing zero-values downstream.
  • Added explicit unit tests (app_test.go) validating batch resilience when dropping mixed invalid repositories into both the added and removed slices.

Fixes #6359

Checklist

  • Code compiles cleanly
  • Includes tests for the changes

@AftAb-25 AftAb-25 requested a review from a team as a code owner April 13, 2026 17:14
@AftAb-25 AftAb-25 force-pushed the fix/6359-webhook-batch-abort branch from 0b41aa4 to b58a74b Compare April 13, 2026 17:18
@coveralls
Copy link
Copy Markdown

coveralls commented Apr 13, 2026

Coverage Status

coverage: 60.349% (+1.0%) from 59.39% — AftAb-25:fix/6359-webhook-batch-abort into mindersec:main

@AftAb-25 AftAb-25 force-pushed the fix/6359-webhook-batch-abort branch from b58a74b to 2fca364 Compare April 13, 2026 17:30
@evankanderson
Copy link
Copy Markdown
Member

Description

This PR patches a massive loop abort vulnerability inside the GitHub App Webhook processor (processInstallationRepositoriesAppEvent) that was causing data starvation and ghost-access leaks.

  1. It instantly returned a 500 error to GitHub, tricking GitHub into repeatedly retrying the exact same webhook payload, crashing on the exact same bad repo every time.
  2. Every subsequent valid repository in the addedRepos batch was completely dropped.
  3. Because the abort happened top-down, the loop never even reached event.GetRepositoriesRemoved(). Any repositories the user explicitly revoked access to in that same batch were ignored, causing Minder to inappropriate retain data access tracking indefinitely.

Do you have evidence that any of these actually happen, or is this hypothetical on bad data from GitHub? I'm happy to harden Minder against bad GitHub data, but given that the payload is delivered via HTTPS with a signature from GitHub, it seems a bit of a stretch to claim this is a "massive vulnerability" causing "data starvation and ghost-access leaks".

In particular, if Minder misses notification that a repository was removed from the app installation, GitHub should still enforce the access control and prevent Minder from accessing that repository. (Minder may churn extra on the API call failures, but this shouldn't stop Minder from processing other repositories.)

Comment on lines +148 to +149
func strPtr(s string) *string { return &s }
func int64Ptr(i int64) *int64 { return &i }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using the generic ptr.Ptr() from internal/util/ptr rather than defining your own.

Comment on lines +27 to +70
validRepo := func(id int, name, fullName string) *repo {
idVal := int64(id)
return &repo{
ID: &idVal,
Name: &name,
FullName: &fullName,
}
}

invalidRepo := func() *repo {
// name is empty, triggers repositoryAdded validation error
emptyName := ""
return &repo{
Name: &emptyName,
}
}

zeroIDRepo := func() *repo {
// ID is 0, triggers repositoryRemoved validation error
var zero int64
name := "bad-repo"
return &repo{
ID: &zero,
Name: &name,
}
}

mockInstallation := db.ProviderGithubAppInstallation{
ProjectID: uuid.NullUUID{UUID: projectID, Valid: true},
ProviderID: uuid.NullUUID{UUID: providerID, Valid: true},
}

baseMocks := func(ctrl *gomock.Controller) db.Store {
return df.NewMockStore(
df.WithSuccessfulGetInstallationIDByAppID(mockInstallation, 54321),
df.WithSuccessfulGetProviderByID(
db.Provider{
ID: providerID,
Definition: json.RawMessage(autoregEnabled),
},
providerID,
),
)(ctrl)
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't generally define private functions like this in tests; declare it as a pure function below the test framework if you need to modularize it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just call this file app_test.go. Alternatively, consider extending the existing test cases in internal/providers/github/webhook/handlers_githubwebhooks_test.go, which include cases for garbage content, but not 0 repository IDs.

autoregEnabled := `{"github-app": {}, "auto_registration": {"entities": {"repository": {"enabled": true}}}}`

validRepo := func(id int, name, fullName string) *repo {
idVal := int64(id)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not take an int64 argument here?

Comment on lines +36 to +42
invalidRepo := func() *repo {
// name is empty, triggers repositoryAdded validation error
emptyName := ""
return &repo{
Name: &emptyName,
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the next seems like the callers could re-use the validRepo function, rather than needing to create a separate function.

@AftAb-25
Copy link
Copy Markdown
Contributor Author

Hi @evankanderson, thank you for the detailed pushback — you're raising completely fair points and I want to address them honestly.

On the severity framing:

You're right that "massive vulnerability" and "ghost-access leaks" were overstatements on my part. GitHub's HTTPS signature validation means the payload is authentic, and as you correctly note, GitHub's own access control layer still enforces revocation even if Minder misses a repositories_removed notification. I should not have framed this as a security vulnerability — the more accurate framing is a batch resilience gap.

On the actual observable behavior:

The concrete problem is narrower than I described: if GitHub delivers a malformed or partially-corrupt installation_repositories payload (e.g. a repo entry with an empty Name field, which can happen during transient API issues or GitHub App rollouts), the current code aborts the entire batch with a 500. This causes:

  1. GitHub retries the same payload repeatedly (GitHub retries 5xx responses with exponential backoff for up to 72 hours per the GitHub Webhooks docs)
  2. All valid repositories in that same payload are skipped for that delivery window

The fix is minimal defensive coding — swap return nil, err for a Warn log + continue — which makes the handler consistent with how most other Minder webhook handlers treat per-item validation errors.

Adjusted PR description:

I've toned down the description to accurately reflect this as a batch resilience improvement rather than a security fix. Happy to update the PR title too if that helps.

Let me know if you'd like me to make any other adjustments!

@AftAb-25
Copy link
Copy Markdown
Contributor Author

AftAb-25 commented Apr 23, 2026

@evankanderson You're 100% right to ask for evidence—this was actually a proactive finding discovered during a code audit, rather than a bug flagged from a live production incident or customer report.

While reviewing the webhook processing logic, I noticed that processInstallationRepositoriesAppEvent lacked the fault tolerance we usually apply to batch processors. My concern was that if an installation_repositories webhook from GitHub ever arrived with a repository missing its Name or ID (e.g., due to a temporary GitHub API degradation, a repo being transferred/deleted mid-flight during the payload generation, or a malformed test payload), our handler would abort the entire batch and return a 500.

I don't have concrete log evidence of GitHub sending malformed installation_repositories payloads in the wild, so my original PR description calling it a "massive vulnerability" was definitely overstated on my part!

My goal here is simply to proactively harden the loop in the engine so that if an anomaly does occur, Minder degrades gracefully by skipping the bad entry and syncing the rest, rather than dropping the entire payload batch.

@AftAb-25 AftAb-25 changed the title fix: prevent silent abort of webhook payload when a single repo validation fails fix: gracefully skip invalid repositories in webhooks instead of aborting Apr 23, 2026
- Rename app_batch_test.go -> app_test.go per reviewer suggestion
- Use ptr.Ptr[T]() from internal/util/ptr instead of local strPtr/int64Ptr helpers
- Move validRepo/invalidRepo/zeroIDRepo to top-level package functions
- Change validRepo to accept int64 instead of int
- Fix processInstallationRepositoriesAppEvent to skip (not abort) on invalid repos
- Add zero-ID guard for removed repos in the same loop

Addresses review comments from @evankanderson
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: GitHub App Webhook silently drops entire batch of added/removed repositories if a single repo fails validation

3 participants