Remove ADO collection, add GitHub-based dataset screener and candidate verification workflows #630

Draft
Copilot wants to merge 4 commits into main from copilot/implement-dataset-automation

Conversation

Contributor

Copilot AI commented Apr 28, 2026

The dataset collection code was tightly coupled to Azure DevOps/NAV. This removes all ADO logic and replaces it with GitHub-native tooling, adds a PR screener with minimum quality gates, and wires up two new CI workflows for the candidate pipeline.

Removed

  • ado_client.py, ado_utils.py, collect_nav.py, version_resolver.py, build_entry.py
  • collect nav CLI command
  • ado_token / resolve_ado_token from Config

Added: bcbench collect screen

Screens a BCApps PR against minimum dataset requirements before collection:

  • ≥ 2 project paths (ensures both the fix project and the test project are touched)
  • Non-empty fix patch and test patch
  • At least one extractable test codeunit

Exits with code 1 on failure, making it CI-friendly.

$ bcbench collect screen 12345
Screening PR #12345 from microsoft/BCApps
--------------------------------------------------
Project paths (2):
  - App\Apps\W1\Sustainability\app
  - App\Apps\W1\Sustainability\test

✓ >= 2 project paths: PASS (2 found)
✓ Fix patch present: PASS
✓ Test patch present: PASS
✓ Testable functions found: PASS (1 found)

Result: PASSED - Suitable for dataset inclusion
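
The gates listed above can be sketched in Python roughly as follows. This is an illustration only: the PR names a `ScreeningResult` dataclass in collect_gh.py, but the field names, the `failures` and `passed` properties, and the `exit_code` helper shown here are assumptions and may not match the actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ScreeningResult:
    """Hypothetical shape; the real dataclass in collect_gh.py may differ."""

    project_paths: list[str] = field(default_factory=list)
    has_fix_patch: bool = False
    has_test_patch: bool = False
    # Count of extractable test codeunits / testable functions (assumed name).
    testable_count: int = 0

    @property
    def failures(self) -> list[str]:
        """Return one message per gate that did not pass."""
        problems = []
        if len(self.project_paths) < 2:
            problems.append(
                f"need >= 2 project paths, found {len(self.project_paths)}"
            )
        if not self.has_fix_patch:
            problems.append("fix patch is empty")
        if not self.has_test_patch:
            problems.append("test patch is empty")
        if self.testable_count < 1:
            problems.append("no testable functions found")
        return problems

    @property
    def passed(self) -> bool:
        """A candidate passes only when every gate passes."""
        return not self.failures


def exit_code(result: ScreeningResult) -> int:
    """Map a screening result to a process exit code (1 on failure)."""
    return 0 if result.passed else 1
```

Collecting every failed gate, rather than stopping at the first, lets the CLI print a full pass/fail line per check as in the sample output, while the exit code still collapses to a single 0 or 1 for CI.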

Added: GitHub Actions workflows

screen-candidate.yml — manual dispatch; takes a PR number + optional repo, runs bcbench collect screen and reports pass/fail.
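
As a rough sketch of the "runs and reports pass/fail" step, a wrapper could invoke the screener and translate its exit code. The command line matches the usage shown in this PR; the helper names (`build_screen_command`, `report`, `run_screen`) are hypothetical, and the optional repo input is left out because its flag name is not given here.

```python
import subprocess


def build_screen_command(pr_number: int) -> list[str]:
    """Build the argv for the screener, matching the usage shown in this PR."""
    return ["bcbench", "collect", "screen", str(pr_number)]


def report(returncode: int) -> str:
    """Translate the screener's exit code into a pass/fail label."""
    return "pass" if returncode == 0 else "fail"


def run_screen(pr_number: int) -> str:
    """Run the screener and report pass/fail without raising on exit code 1."""
    completed = subprocess.run(build_screen_command(pr_number), check=False)
    return report(completed.returncode)
```

Calling `run_screen(12345)` requires `bcbench` to be installed and on the PATH; `check=False` keeps a failed screening from raising, so the caller decides how to surface the result.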

verify-candidate.yml — manual dispatch; takes an instance ID already on a branch, spins up a BC container, and runs Verify-BuildAndTests.ps1 — same verification path as dataset-validation.yml but for a single candidate entry.

Original prompt

Do NOT need to worry about forks, now implement

[Chronological Review: The conversation began with the user requesting a review of the codebase and brainstorming solutions for issues with the existing dataset in `bcbench.jsonl`. The user highlighted that many tasks from older versions were obsolete and that capabilities from version 28 were not represented. The need to automate dataset collection from the BCApps repository was emphasized, along with a plan to rewrite the `collection` code to focus on GitHub. The user then specified requirements for the screener and verification workflows, indicating preferences for simplicity and prototype development. The conversation concluded with the user instructing not to worry about forks and to proceed with implementation.]

[Intent Mapping:

  1. "Review the codebase and help me brainstorm" - User seeks assistance in addressing dataset issues.
  2. "We need to build a smart way to automate the dataset collection from BCApps repository." - User wants to automate data collection due to migration from NAV.
  3. "We should probably rewrite it, ignore all the ADO related logic, focus on GitHub." - User requests a rewrite of the collection code.
  4. "For the screener: it need >= 2 project paths..." - User specifies requirements for the screener workflow.
  5. "For the verify-candidate workflow, can we use the dataset validation and verification workflow?" - User inquires about workflow integration.
  6. "Do NOT need to worry about forks, now implement." - User instructs to proceed with implementation without considering forks.]

[Technical Inventory:

  • Dataset: bcbench.jsonl - Contains tasks and instances for the project.
  • Repository: BC-Bench - Managed by Microsoft, current branch is main.
  • Workflows: Dataset verification, screener, and verify-candidate workflows discussed for automation and validation.]

[Code Archaeology:

  • File: collection - Code related to dataset collection that needs rewriting.
  • Workflows: Discussion on screener and verification workflows, focusing on simplicity for prototyping.]

[Progress Assessment:

  • Completed: Initial brainstorming and identification of issues with the dataset.
  • Pending: Implementation of the new collection code and workflows as per user specifications.]

[Context Validation: All critical information for continuation is captured, including user requests for automation, workflow requirements, and the focus on GitHub.]

[Recent Commands Analysis:

  • Last Agent Commands: User requested to implement the new workflows and collection code without considering forks.
  • Tool Results Summary: No specific tool results were returned as the focus was on user requests and instructions.
  • Pre-Summary State: The agent was actively discussing the implementation of workflows and dataset collection automation when the token budget was exceeded.
  • Operation Context: The commands executed were directly related to the user's goal of automating dataset collection and improving workflow efficiency.]
  1. Conversation Overview:
  • Primary Objectives:
    • "Review the codebase and help me brainstorm" - Address dataset issues.
    • "We need to build a smart way to automate the dataset collection from BCApps repository." - Automate data collection due to NAV migration.
    • "Do NOT need to worry about forks, now implement." - Proceed with implementation.
  • Session Context: The conversation flowed from identifying issues with the dataset to discussing automation and workflow requirements, culminating in a directive to implement changes.
  • User Intent Evolution: The user transitioned from brainstorming to specifying detailed requirements for workflows and ultimately directed to implement without considering forks.
  2. Technical Foundation:
  • Dataset: bcbench.jsonl - Contains tasks and instances for the project.
  • Repository: BC-Bench - Managed by Microsoft, current branch is main.
  • Workflows: Dataset verification, screener, and verify-candidate workflows discussed for automation and validation.
  3. Codebase Status:
  • File Name: collection
    • Purpose: Code related to dataset collection that needs rewriting.
    • Current State: Needs to be rewritten to focus on GitHub.
    • Key Code Segments: Not specified yet, but will involve automation logic.
  • Workflows: Discussion on screener and verification workflows, focusing on simplicity for prototyping.
  4. Problem Resolution:
  • Issues Encountered: Obsolete tasks in the dataset and the need for automation.
  • Solutions Implemented: Initial brainstorming and identification of requirements for new workflows.
  • Debugging Context: No specific debugging mentioned, but the focus is on creating a prototype.
  • Lessons Learned: Emphasis on simplicity and adaptability in the prototype phase.
  5. Progress Tracking:
  • Completed Tasks: Initial brainstorming and identification of dataset issues.
  • Partially Complete Work: Implementation of new collection code and workflows is pending.
  • Val...

Created from VS Code.

Copilot AI and others added 2 commits April 28, 2026 09:49
…idate workflows

- Remove ado_client.py, ado_utils.py, collect_nav.py, version_resolver.py, build_entry.py
- Remove ADO config fields (ado_token, resolve_ado_token)
- Remove `collect nav` CLI command
- Add ScreeningResult dataclass and screen_gh_candidate() to collect_gh.py
- Add `collect screen` CLI command with pass/fail output
- Add screen-candidate.yml and verify-candidate.yml GitHub Actions workflows
- Update tests: replace NAV CLI tests with screen CLI tests

Agent-Logs-Url: https://github.com/microsoft/BC-Bench/sessions/52de5655-9bb5-4d4f-b23c-21dc80a3ddda

Co-authored-by: haoranpb <27280733+haoranpb@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Implement automation for dataset collection from BCApps repository" to "Remove ADO collection, add GitHub-based dataset screener and candidate verification workflows" Apr 28, 2026
Copilot AI requested a review from haoranpb April 28, 2026 09:55
…ve failure handling and documentation

Co-authored-by: Copilot <copilot@github.com>