Remove ADO collection, add GitHub-based dataset screener and candidate verification workflows #630
Draft
…idate workflows
- Remove ado_client.py, ado_utils.py, collect_nav.py, version_resolver.py, build_entry.py
- Remove ADO config fields (ado_token, resolve_ado_token)
- Remove `collect nav` CLI command
- Add ScreeningResult dataclass and screen_gh_candidate() to collect_gh.py
- Add `collect screen` CLI command with pass/fail output
- Add screen-candidate.yml and verify-candidate.yml GitHub Actions workflows
- Update tests: replace NAV CLI tests with screen CLI tests
Agent-Logs-Url: https://github.com/microsoft/BC-Bench/sessions/52de5655-9bb5-4d4f-b23c-21dc80a3ddda
Co-authored-by: haoranpb <27280733+haoranpb@users.noreply.github.com>

…ment
Agent-Logs-Url: https://github.com/microsoft/BC-Bench/sessions/52de5655-9bb5-4d4f-b23c-21dc80a3ddda
Co-authored-by: haoranpb <27280733+haoranpb@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Implement automation for dataset collection from BCApps repository" to "Remove ADO collection, add GitHub-based dataset screener and candidate verification workflows" on Apr 28, 2026.
…ve failure handling and documentation
Co-authored-by: Copilot <copilot@github.com>
The dataset collection code was tightly coupled to Azure DevOps/NAV. This removes all ADO logic and replaces it with GitHub-native tooling, adds a PR screener with minimum quality gates, and wires up two new CI workflows for the candidate pipeline.
Removed
- ado_client.py, ado_utils.py, collect_nav.py, version_resolver.py, build_entry.py
- `collect nav` CLI command
- `ado_token` / `resolve_ado_token` from `Config`
Added: `bcbench collect screen`
Screens a BCApps PR against minimum dataset requirements before collection.
Exits with code 1 on failure, making it CI-friendly.
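A minimal sketch of how a pass/fail screening result can map onto a CI-friendly exit code. The name `ScreeningResult` comes from the commit message, but the fields, the two quality gates, and the helper names `screen_sketch`, `exit_code`, and `report` are assumptions for illustration only; the actual requirements checked by `bcbench collect screen` are not listed in this description.

```python
from dataclasses import dataclass, field


@dataclass
class ScreeningResult:
    """Outcome of screening one PR (hypothetical fields, not the PR's actual ones)."""

    pr_number: int
    failures: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # A candidate passes only when no gate recorded a failure.
        return not self.failures


def screen_sketch(pr_number: int, changed_files: list[str]) -> ScreeningResult:
    """Apply two illustrative gates; the real requirements are assumed, not known."""
    result = ScreeningResult(pr_number=pr_number)
    if not changed_files:
        result.failures.append("PR changes no files")
    # Assumed gate: a candidate must touch at least one test file.
    if not any("test" in path.lower() for path in changed_files):
        result.failures.append("PR touches no test files")
    return result


def exit_code(result: ScreeningResult) -> int:
    """Map a screening result to a process exit code: 0 on pass, 1 on failure."""
    return 0 if result.passed else 1


def report(result: ScreeningResult) -> str:
    """Render a short pass/fail summary with one line per failed gate."""
    status = "PASS" if result.passed else "FAIL"
    lines = [f"PR #{result.pr_number}: {status}"]
    lines += [f"  - {reason}" for reason in result.failures]
    return "\n".join(lines)
```

A CLI entry point built on this shape could print `report(result)` and then call `sys.exit(exit_code(result))`, so a workflow step fails whenever any gate fails.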
Added: GitHub Actions workflows
- screen-candidate.yml: manual dispatch; takes a PR number and an optional repo, runs `bcbench collect screen`, and reports pass/fail.
- verify-candidate.yml: manual dispatch; takes an instance ID already on a branch, spins up a BC container, and runs Verify-BuildAndTests.ps1. This is the same verification path as dataset-validation.yml, but for a single candidate entry.
Original prompt
Do NOT need to worry about forks, now implement
[Chronological Review: The conversation began with the user requesting a review of the codebase and brainstorming solutions for issues with the existing dataset in `bcbench.jsonl`. The user highlighted that many tasks from older versions were obsolete and that capabilities from version 28 were not represented. The need to automate dataset collection from the BCApps repository was emphasized, along with a plan to rewrite the `collection` code to focus on GitHub. The user then specified requirements for the screener and verification workflows, indicating preferences for simplicity and prototype development. The conversation concluded with the user instructing not to worry about forks and to proceed with implementation.]
[Intent Mapping:
[Technical Inventory:
- bcbench.jsonl - Contains tasks and instances for the project.
[Code Archaeology:
- collection - Code related to dataset collection that needs rewriting.
[Progress Assessment:
[Context Validation: All critical information for continuation is captured, including user requests for automation, workflow requirements, and the focus on GitHub.]
[Recent Commands Analysis:
1. Conversation Overview:
- Primary Objectives:
- "Review the codebase and help me brainstorm" - Address dataset issues.
- "We need to build a smart way to automate the dataset collection from BCApps repository." - Automate data collection due to NAV migration.
- "Do NOT need to worry about forks, now implement." - Proceed with implementation.
- Session Context: The conversation flowed from identifying issues with the dataset to discussing automation and workflow requirements, culminating in a directive to implement changes.
- User Intent Evolution: The user transitioned from brainstorming to specifying detailed requirements for workflows and ultimately directed to implement without considering forks.
- Technical Foundation:
- Dataset:
- Repository: BC-Bench - Managed by Microsoft, current branch is main.
- Workflows: Dataset verification, screener, and verify-candidate workflows discussed for automation and validation.
- Codebase Status:
- File Name:
- Purpose: Code related to dataset collection that needs rewriting.
- Current State: Needs to be rewritten to focus on GitHub.
- Key Code Segments: Not specified yet, but will involve automation logic.
- Workflows: Discussion on screener and verification workflows, focusing on simplicity for prototyping.
- Problem Resolution:
- Issues Encountered: Obsolete tasks in the dataset and the need for automation.
- Solutions Implemented: Initial brainstorming and identification of requirements for new workflows.
- Debugging Context: No specific debugging mentioned, but the focus is on creating a prototype.
- Lessons Learned: Emphasis on simplicity and adaptability in the prototype phase.
- Progress Tracking:
- Completed Tasks: Initial brainstorming and identification of dataset issues.
- Partially Complete Work: Implementation of new collection code and workflows is pending.
- Val...
Created from VS Code.