Reproducible evidence
Real plots, real datasets
This page separates measured demo evidence from conceptual architecture. The current plots are generated locally from public sklearn datasets, and each plot is saved together with its CSV output.
Selective routing run
Dataset: Wisconsin Diagnostic Breast Cancer, 569 cases, 30 features, 2 classes. Model: standardized logistic regression. Split: 200 held-out test cases. Full-test accuracy in this run is 99.0% (198 of 200 test cases correct).
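A minimal sketch of how a run like this could be reproduced. This page does not state the routing rule, so the confidence-threshold rule here is an assumption: a case is handled automatically when the top predicted probability is at least `threshold`, and deferred otherwise. The function names, the `threshold=0.95` default, the stratified split, `seed`, and `max_iter` are all illustrative choices rather than the settings behind the published plots, so the numbers it produces may differ from the 99.0% reported above.

```python
from typing import Sequence, Tuple


def selective_routing(
    confidences: Sequence[float],
    correct: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (coverage, accuracy on auto-handled cases) for one threshold.

    Assumed rule: cases with confidence >= threshold are handled
    automatically; the rest are deferred (for example, to human review).
    """
    kept = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    coverage = len(kept) / len(confidences)
    # With nothing auto-handled, selective accuracy is undefined; report NaN.
    accuracy = sum(kept) / len(kept) if kept else float("nan")
    return coverage, accuracy


def run_wdbc_demo(threshold: float = 0.95, seed: int = 0) -> Tuple[float, float]:
    """Fit a standardized logistic regression and route 200 held-out cases."""
    # Imported lazily so selective_routing() stays usable without scikit-learn.
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)  # 569 cases, 30 features
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=200, stratify=y, random_state=seed  # seed is an assumption
    )
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)

    # Confidence is the largest class probability for each test case.
    confidences = model.predict_proba(X_test).max(axis=1).tolist()
    correct = (model.predict(X_test) == y_test).tolist()
    return selective_routing(confidences, correct, threshold)
```

Calling `run_wdbc_demo()` returns the share of test cases routed automatically and the accuracy on that share. Sweeping `threshold` over a grid gives the points for a coverage-versus-accuracy curve.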
Conformal prediction-set run
Dataset: sklearn Digits, 1,797 handwritten digit images, 64 features, 10 classes. Model: standardized logistic regression. Split: training set (the 942 remaining cases, if the split uses the whole dataset), 405 calibration cases, 450 test cases. A case counts as auto-executed when its conformal prediction set contains exactly one label.
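The auto-execution rule can be sketched with standard split conformal prediction. This is an assumed variant, not necessarily the one behind the published plots: the nonconformity score is one minus the probability of the true label, and the threshold is the ceil((n + 1)(1 - alpha))-th smallest calibration score. The miscoverage level `alpha=0.1`, the `seed`, the stratified splits, and all function names are hypothetical; this page does not state them.

```python
import math
from typing import List, Sequence, Tuple


def conformal_threshold(cal_scores: Sequence[float], alpha: float) -> float:
    """Split-conformal threshold from calibration nonconformity scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration cases for this alpha: every label is kept.
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(probs: Sequence[float], qhat: float) -> List[int]:
    """Labels whose score (1 - probability) does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= qhat]


def run_digits_demo(alpha: float = 0.1, seed: int = 0) -> Tuple[float, float]:
    """Return (auto-execution rate, empirical coverage) on 450 test cases."""
    # Imported lazily so the two helpers above work without scikit-learn.
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_digits(return_X_y=True)  # 1,797 images, 64 features
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=450, stratify=y, random_state=seed
    )
    X_train, X_cal, y_train, y_cal = train_test_split(
        X_rest, y_rest, test_size=405, stratify=y_rest, random_state=seed
    )
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000))
    model.fit(X_train, y_train)

    # Digits labels are 0..9, so the column index equals the label.
    cal_proba = model.predict_proba(X_cal)
    cal_scores = [1.0 - float(cal_proba[i, y_cal[i]]) for i in range(len(y_cal))]
    qhat = conformal_threshold(cal_scores, alpha)

    sets = [prediction_set(row, qhat) for row in model.predict_proba(X_test)]
    auto_rate = sum(len(s) == 1 for s in sets) / len(sets)
    coverage = sum(int(t) in s for s, t in zip(sets, y_test)) / len(sets)
    return auto_rate, coverage
```

A test case is auto-executed when `len(prediction_set(...)) == 1`; empty sets and sets with several labels would be deferred. Lowering `alpha` generally widens the sets, which trades auto-execution rate for coverage.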
Real agent benchmark context
The site also includes a downloaded copy of the SWE-bench Verified test split: 500 real GitHub issue-fix tasks from public repositories. This is not a router result, but it grounds the enterprise-agent story in an actual software-agent benchmark rather than in classifier proxies alone.