[copilot-session-insights] Daily Copilot Agent Session Analysis — 2026-04-27 #28719
This discussion has been marked as outdated by Copilot Session Insights. A newer discussion is available at Discussion #28933.
Executive Summary
Key Metrics
📈 Session Trends Analysis
Completion Patterns
The overall completion rate has remained consistently low (2–10%) since tracking began, with the notable exception of Apr 23 (24% — the all-time record driven by 4 concurrent agent successes). Today's 2% rate reflects a low-agent-count day with one agent still in-progress at snapshot time. The Apr 26 parallelism peak (5 agents) produced only 20% success — suggesting diminishing returns when too many branches compete simultaneously.
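The snapshot-timing caveat above can be made concrete. The sketch below assumes hypothetical session records with an `outcome` field (the report does not show the actual telemetry schema) and computes the completion rate two ways: the raw rate, where an in-progress agent counts against the day, and an adjusted rate that excludes agents still running at snapshot time.

```python
from collections import Counter

def completion_rates(sessions):
    """Return (raw_rate, finished_only_rate) for a day's sessions.

    The raw rate counts in-progress agents as non-completions, which is
    how a snapshot can understate success; the adjusted rate excludes them.
    """
    counts = Counter(s["outcome"] for s in sessions)
    total = sum(counts.values())
    finished = total - counts["in_progress"]
    raw = counts["completed"] / total if total else 0.0
    adjusted = counts["completed"] / finished if finished else 0.0
    return raw, adjusted

# Hypothetical low-agent-count day: one success, one failure, one agent
# still running at snapshot time.
day = [
    {"outcome": "completed"},
    {"outcome": "failed"},
    {"outcome": "in_progress"},
]
```

On this toy day the raw rate is one third while the adjusted rate is one half, illustrating how a single in-progress agent drags the reported figure down.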
Duration & Efficiency
Session duration peaked in mid-April (51+ min for complex multi-CI-round tasks) and has trended down toward 8–16 min through late April, suggesting simpler or more focused tasks are being assigned. The Apr 26 record-5-agent day had only 1.8 min average — indicating many agents were snapshotted very early in their lifecycles. Today's 8.3 min reflects the single sub-PR agent making meaningful progress.
Active Sessions (2026-04-27)
- Branch 1: `copilot/sub-pr-28676` → PR #28688 "[WIP] docs: Organization Practices" (adding Organization Practices pages: Safe Rollout, Sharing Workflows)
- Branch 2: `copilot/create-agentic-workflows`

Success Factors ✅
Patterns associated with successful task completion (based on 16-day history):
- Focused, scoped tasks under ~15 minutes: sessions in the 8–20 min range consistently complete. Multi-CI-round sessions (51+ min) also succeed but are uncommon.
- Functional / security / code-quality tasks attract agents: `update-golang-org-x-vuln`, `fix-concurrency-issues`, and `fix-package-specification-extractor` all had agents assigned and succeeded.
- Single-branch focus days: days with 1–2 concurrent Copilot branches produce higher per-agent success rates than high-parallelism days (3+ branches).
- Sub-PR iterative model enables refinement: branches like `copilot/sub-pr-28676` allow Copilot to re-engage multiple times to address reviewer feedback, a pattern that supports incremental quality improvement.

Failure Signals ⚠️
- Startup failure (`copilot/create-agentic-workflows`): infrastructure couldn't launch the workflow. This is a new failure mode, previously unseen in the dataset. Root cause unknown from available data; no conversation log retrievable (auth constraint).
- Snapshot-time in-progress agents: 2 out of 7 analysis days recorded agents as still in-progress at snapshot time (Apr 18, Apr 27). These are counted as partial results and skew completion rates downward. Not a true failure.
- Gate-saturation without a Copilot agent: branches accumulating 14+ `action_required` gate sessions but no Copilot agent stall indefinitely (e.g., `fix-cli-integer-params` on Apr 23 with 18 gate-only sessions).
- High parallelism / diminishing returns: the Apr 26 record of 5 simultaneous Copilot branches produced only 20% success vs. near-100% on single-branch days. Possible resource contention or snapshot-timing effects.
Prompt Quality Analysis 📝
High-Quality Task Characteristics
- Conventional commit prefixes (`fix:`, `feat:`, `refactor:`, `perf:`): found in ~75% of successful recent PRs
- Example high-quality PR titles: `fix(go-logger): replace MCP build verification with native bash commands` (precise, actionable) and `Normalize report formatting guidelines across daily workflow prompts`
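A prefix convention like this is easy to lint for. The sketch below uses a prefix set assumed from the list above (the report does not define an authoritative set) to flag task titles that lack a conventional-commit-style prefix:

```python
import re

# Prefix set assumed from the report's examples; extend as conventions evolve.
CONVENTIONAL = re.compile(r"^(fix|feat|refactor|perf|docs)(\([\w-]+\))?: ")

def has_conventional_prefix(title):
    """True if the task/PR title starts with a conventional-commit prefix,
    optionally scoped, e.g. 'fix(go-logger): ...'."""
    return bool(CONVENTIONAL.match(title))
```

Such a check could run as a pre-dispatch lint so vague titles are caught before an agent is assigned.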
Low-Quality Task Characteristics
- `again` suffix: `fix-daily-issues-report-generator-again` signals the previous fix didn't hold; agents lack context on prior attempts
- Overly broad names: `create-agentic-workflows` is too broad, prone to scope creep and startup failures

Experimental Analysis — Sub-PR Iteration Pattern Analysis
This run applied the experimental strategy: Sub-PR Iteration Pattern Analysis
What Was Measured
The sub-PR branching model (`copilot/sub-pr-NNNN`) differs from standard Copilot branches: instead of a single end-to-end session, the branch accumulates multiple short "addressing comment" agent sessions as reviewers leave feedback. Today's `copilot/sub-pr-28676` (PR #28688) followed this pattern.

Findings
Effectiveness: Medium
Recommendation: Keep and refine — track how many commenting cycles a sub-PR branch averages before merge, and whether WIP labels correlate with lower gate pass rates.
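The recommended cycle tracking could start as simply as counting agent sessions per sub-PR branch. A sketch, again assuming hypothetical session records with a `branch` field:

```python
import re
from collections import Counter

SUB_PR = re.compile(r"^copilot/sub-pr-\d+$")

def reengagement_counts(agent_sessions):
    """Number of agent sessions per sub-PR branch: a proxy for how many
    reviewer-feedback cycles the branch has absorbed so far."""
    counts = Counter(s["branch"] for s in agent_sessions
                     if SUB_PR.match(s["branch"]))
    return dict(counts)
```

Averaging these counts over merged sub-PRs would give the cycles-before-merge metric the recommendation asks for.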
Notable Observations
Startup Failure — New Pattern
The `copilot/create-agentic-workflows` branch recorded 1 `startup_failure` and 17 `skipped` sessions. This is the first startup_failure observed across 16 days of tracking. The branch name suggests it may be a meta-workflow (testing the agentic workflow infrastructure itself), which could explain why startup conditions are more fragile.

PR Ecosystem Health
Of the 1,000 PRs sampled from the Copilot swe-agent:
Label distribution reveals a healthy review process:
- `lgtm`: 63 PRs (human approval signal working)
- `smoke-copilot` / `smoke-claude`: 98 PRs (automated smoke test coverage)
- `needs-work`: 34 PRs (reviewer feedback loop active)

Conversation Log Availability
The agent conversation transcript for session 28676 returned a GitHub authentication error (`this command requires an OAuth token`). Deep behavioral analysis (reasoning patterns, tool selection, error recovery) was not possible today. Infrastructure-level session metadata provided sufficient data for structural analysis.

Trends Over Time
View 16-Day Historical Summary
Key trend: Agent success rate is inversely correlated with concurrent agent count. 1–2 agent days consistently show 100% success; 4–6 agent days show 20–50%.
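The claimed inverse correlation can be checked directly from the per-day numbers. A sketch with illustrative (not actual) per-day pairs shaped like the figures quoted above:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical (concurrent_agents, success_rate) pairs mirroring the trend:
# 1–2 agent days near 100% success, 4–6 agent days at 20–50%.
days = [(1, 1.0), (2, 1.0), (4, 0.5), (5, 0.2), (6, 0.2)]
r = pearson([a for a, _ in days], [s for _, s in days])
```

On data shaped like this, `r` comes out strongly negative, consistent with the stated trend; the real 16-day table would be needed for the actual coefficient.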
Actionable Recommendations
For Users Writing Task Descriptions
- Use conventional commit prefixes (`fix:`, `feat:`, `docs:`, `perf:`, `refactor:`): provides semantic clarity for both the agent and gate routing.
- Avoid `again` suffixes on repeat tasks: instead, reference the original PR or describe what specifically failed last time. Agents lack memory of prior attempts without explicit context.
- Prefer specific task names: broad names like `create-agentic-workflows` correlate with startup failures and skipped runs. Specific names like `fix-mcp-timeout-issue` drive better outcomes.

For System Improvements
- Startup failure alerting: the `copilot/create-agentic-workflows` startup_failure is the first observed. An alert or automatic retry for startup_failure outcomes would recover this work without manual intervention.
- Sub-PR session tracking: add telemetry to correlate how many "addressing comment" re-engagements a single sub-PR branch accumulates before merge. This would reveal the optimal reviewer-feedback cadence.
- Parallelism cap consideration: Apr 22 and Apr 26 (5–6 concurrent agents) showed the lowest success rates. A soft cap at 3–4 concurrent Copilot branches may improve per-agent throughput.
For Tool Development
Statistical Summary
Next Steps
- Investigate `startup_failure` root cause on `copilot/create-agentic-workflows`

Analysis generated automatically on 2026-04-27
Run ID: 24993437782
Workflow: Copilot Session Insights