For funders

Fund the benchmark the field doesn't yet have.

We are seeking $500K - $1M to build k12eval-bench v1: a 5,000-item, 3-rater gold-standard evaluation set for AI grading in K-12, released under open licenses and made permanently public.

The funded artifact remains public infrastructure. Cograder is the steward, not the owner. Every AI grading system in K-12, including our own, will be measured against the same yardstick.

Lead funders are recognized in the dataset citation, the published methodology paper, and the bench's permanent attribution.

What the funding pays for
  • 3-rater scoring at scale. 5,000 items × 3 expert teachers = 15,000 paid scoring sessions, plus reconciliation.
  • Methodology paper + datasheet. Peer-reviewable documentation of how the bench was built, who's in it, and how to use it.
  • Open eval suite + leaderboard. Submission API and public leaderboard for any model, any vendor.
  • Bilingual + fairness coverage. Stratified sampling across language, grade band, and student demographics.
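To make the 3-rater workflow concrete, here is a minimal reconciliation sketch. The rule shown (median of three scores, with wide disagreements flagged for adjudication) is an illustrative example, not the funded protocol:

```python
from statistics import median

def reconcile(scores, max_spread=1):
    """Reconcile three independent rater scores for one item.

    Returns (consensus, needs_adjudication): consensus is the median
    of the three scores, and items where raters disagree by more than
    max_spread points are flagged for expert adjudication.
    """
    assert len(scores) == 3, "protocol calls for exactly three raters"
    spread = max(scores) - min(scores)
    return median(scores), spread > max_spread

# Two raters agree, one is close: median wins, no adjudication.
print(reconcile([3, 3, 4]))   # (3, False)
# Wide disagreement: flag the item for a senior adjudicator.
print(reconcile([1, 3, 4]))   # (3, True)
```

The adjudication flag is what turns 15,000 raw scoring sessions into a defensible consensus label for each item.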
Submit application
The case for funding

Why this can't be funded the usual way.

AI grading is being deployed in K-12 classrooms faster than the field can study it. Districts are buying tools without a shared way to evaluate them. Vendors self-report. Researchers don't have a current corpus to study. State procurement officers don't have a yardstick.

Foundational research exists (ASAP, PERSUADE, ETS, work from the AES community) but it predates the LLM era, covers narrow tasks, and was built before AI grading became something teachers actually used at scale.

No commercial actor has both the data and the incentive to build the missing infrastructure alone. Vendors won't release a benchmark that scores them objectively. Districts can't fund this from procurement budgets. Foundation model labs evaluate themselves on their own terms.

This is the kind of gap foundations exist to fill: a public-good infrastructure project that has a clear public benefit, a credible steward willing to release the work openly, and a multi-year scope too large for any single actor to build in isolation.

Funder tiers

Three tiers. Defined attribution. Public output.

All funders contribute to the same public benefit project. Tier defines attribution and engagement, not ownership.

Lead funder
$2M+
1-2 slots
What you get
  • Name on the project at every level
  • Attribution in every methodology paper and dataset citation
  • Advisory seat on framework governance
  • Annual readout from the research team
Major funder
$500K - $2M
3-5 slots
What you get
  • Attribution in dataset citations and the annual State of AI Grading report
  • Listed prominently on funder roster
  • Early access to preprints and methodology drafts
  • Quarterly research updates
Supporting funder
$100K - $500K
Multiple slots
What you get
  • Acknowledged in datasheet credits
  • Listed on funder roster
  • Annual research summary
  • Recognition as a contributor to a public-good project
Program scope

What v1 will build.

Five workstreams, sequenced over a multi-year program. Detailed budget, milestone schedule, and reporting cadence are shared with funders during due diligence.

01
A trained evaluator network
A standing roster of credentialed K-12 teachers, formally calibrated as research raters and paid for their time. The single largest investment in the program. Without this, every metric on the leaderboard is contestable.
02
A gold-standard bench at scale
10,000+ student responses across writing and math, independently scored by three expert raters and reconciled to consensus. Stratified by grade, subject, language, and student demographics. Versioned, datasheeted, and citable.
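A stratified design like the one above can be sketched in a few lines. The strata, field names, and per-stratum counts here are illustrative assumptions, not the bench's actual sampling plan:

```python
import random
from collections import defaultdict

def stratified_sample(pool, strata_key, per_stratum, seed=0):
    """Draw a fixed number of responses from every stratum.

    pool: list of response records; strata_key: function mapping a
    record to its stratum, e.g. (grade_band, subject, language);
    per_stratum: number of responses to draw from each stratum.
    Seeded RNG keeps the draw reproducible across bench versions.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for record in pool:
        buckets[strata_key(record)].append(record)
    sample = []
    for stratum, items in sorted(buckets.items()):
        if len(items) < per_stratum:
            raise ValueError(f"stratum {stratum} has only {len(items)} responses")
        sample.extend(rng.sample(items, per_stratum))
    return sample
```

Fixing per-stratum counts up front, rather than sampling the pool uniformly, is what guarantees that small subgroups (a language, a grade band) are represented well enough to support fairness analysis.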
03
Peer-reviewed methodology
A formal framework document defining metrics, sample sizes, rater calibration, and fairness analysis for any IRR study on AI grading. Target venues: NeurIPS, AERA, EDM, AIED. Includes a separate fairness audit toolkit for districts.
04
Comprehensive vendor and model coverage
Every major foundation model and every commercial AI grading vendor in the K-12 market, evaluated on the same yardstick. Annual rerun cycles as new models drop. Public submission API for vendors to enter their own systems.
05
Independent governance and the annual report
Expansion to an independent advisory board with seats for academic researchers, district leaders, and lead funders. Production of the annual State of AI Grading in K-12 report, the field's reference publication for the year.

The program is designed to support a single Lead funder underwriting the full scope, or a coordinated group of Major and Supporting funders covering individual workstreams.

The funded artifact remains public

K12Eval is steward, not owner.

All datasets are released under CC BY 4.0. All code, eval scripts, and the methodology framework are released under MIT. The bench, the leaderboard, and the annual State of AI Grading report are free to use, cite, and build on. Funders are recognized contributors to permanent public infrastructure.

Cograder, the convening organization, commits in writing that no funded artifact will be commercialized, paywalled, or restricted. The roadmap for v1 includes expanding governance to an independent advisory board with seats reserved for academic researchers, district leaders, and lead funders.

Application

Start a conversation.

Applications are reviewed personally within 5 business days. The response includes the full prospectus, methodology spec, line-item budget, and next steps for due diligence.

Prefer to talk first?
For program officers and foundation leaders who want a direct conversation before formally applying.
[email protected]
© 2026 The K12Eval Project · Convened by cograder · Public benefit research initiative