Data Sources

The Decoded Rank, which is SchoolDecoder's school ranking adjusted for school context, is computed from public data only. Every input on this page is downloadable from the agency that publishes it. SchoolDecoder's contribution is the spatial join, the composite socioeconomic index, the Alpha model, the shrinkage step, and the editorial layer that turns numbers into a SchoolDecoder Scorecard.

This page lists every source, what it provides, the granularity, the update cycle, and the cases where we substitute one source for another. The intent is that a journalist or researcher could reproduce the inputs from the agency-of-record without going through SchoolDecoder.

Assessment data

State departments of education publish school-level test results. The format and the level of detail vary enormously by state, so SchoolDecoder writes a separate parser per state, then normalizes results into a common schema.

Neither Oregon nor Washington publishes school-level scale scores in its public Report Card data — only the proficiency-level distribution (the percent of tested students at Level 1, Level 2, Level 3, and Level 4). SchoolDecoder turns those four numbers into a single continuous index per school, grade, subject, and year, called the Performance Level Composite (or, in plainer copy, the achievement composite). The composite is a weighted average of the four levels, lands in a fixed range, and preserves the school's rank ordering against the unpublished underlying scale score. The "Why a composite, not a scale score?" subsection below explains the formula, the published evidence behind the rank-correlation claim, and what we will switch to if researcher-access scale scores become available.

Oregon — OSAS

Oregon administers the Oregon Statewide Assessment System (OSAS), built on the Smarter Balanced consortium's items and scale for grades 3–8 in ELA and math. The Oregon Department of Education's Assessment Group Reports publish school-level percent-by-performance-level (Levels 1 through 4) and tested counts for each grade and subject, but not school-level mean scale scores — those are retained inside ODE and not posted publicly. School-level English-language-learner and special-education shares are also published by ODE. Results are typically posted in fall following spring testing.

  • Source of record: Oregon Department of Education, Assessment Group Reports
  • Scale: Smarter Balanced consortium scale for grades 3–8 ELA/math
  • Public school-level metrics: percent-by-performance-level and tested counts (scale scores are not posted publicly)
  • SchoolDecoder modeling input: achievement composite derived from the four published levels
  • Subjects used at launch: ELA and math, grades 3–8
  • High school: Oregon tests grade 11. High schools can receive a separate HS Decoded Rank when the high-school assessment data clears eligibility; that rank is Alpha-only in the current release and is not blended with graduation, AP, or on-track outcomes.

Washington — OSPI

Washington administers the Smarter Balanced Assessment (SBA) for grades 3–8 in ELA and math, the same consortium assessment family that Oregon's OSAS is built on. The Office of Superintendent of Public Instruction (OSPI) publishes the Washington State Report Card with school-level percent-by-performance-level and tested counts. As with Oregon, OSPI does not publish school-level mean scale scores; the public surface reports the four levels and the percent meeting standard. School-level ELL and special-education shares are also published by OSPI. The cut scores for grades 3–8 ELA and math match the Smarter Balanced consortium cut scores published in the Smarter Balanced technical report.

  • Source of record: Washington OSPI, Washington State Report Card
  • Scale: Smarter Balanced consortium scale for grades 3–8 ELA/math
  • Public school-level metrics: percent-by-performance-level and tested counts (scale scores are not posted publicly)
  • SchoolDecoder modeling input: achievement composite derived from the four published levels
  • Subjects used at launch: ELA and math, grades 3–8
  • High school: Washington tests grade 10. High schools can receive a separate HS Decoded Rank when the high-school assessment data clears eligibility; that rank is Alpha-only in the current release and is not blended with graduation, AP, or on-track outcomes.

High-school outcome context

High-school pages may show graduation, AP coursework, AP exam, SAT/ACT participation, and 9th-grade on-track context when public source files are available. These are descriptive outcome fields, not ingredients in the current HS Decoded Rank. The HS rank is Alpha-only: it uses the high-school tested grade, public socioeconomic context, shrinkage, and HS-pool eligibility rules.

The K-8 Decoded Rank composite and the HS Alpha-only rank are intentionally separate. K-8 pages use the v1.2 composite that blends Alpha with cohort progression, performance spread, trajectory, and participation. HS pages do not yet have an equivalent multi-signal composite because the public HS outcome sources have different lags, coverage, and definitions by state.

AP coursework and AP exam fields can come from different source years. Course enrollment is usually sourced from CRDC-style coursework access files, while exam takers and 3+ pass rates are sourced from AP exam outcome files when a state publishes them. A school can therefore show "AP coursework" for one source year and "AP exams" for another; the high-school card prints those source years separately when both are present.

Why Smarter Balanced for the cross-state metro

Portland-Vancouver crosses the Oregon-Washington line. Because both states use Smarter Balanced for grades 3–8 ELA and math on a common scale, raw scores in those grades and subjects are comparable across the state line. This is the only setting in which SchoolDecoder shows a single cross-state raw Achievement Rank. For all other cross-state metros, raw rankings are shown within each state separately.

Why a composite, not a scale score?

Both ODE and OSPI keep the underlying scale-score distributions inside their assessment-services groups and release only the four-level summary to the public Report Card surface. Anyone can confirm this by opening a school's page on the Oregon Assessment Group Reports portal or the Washington State Report Card and reading what is posted: tested counts, percent at Level 1, Level 2, Level 3, and Level 4, and percent meeting standard. There is no public column for mean scale score.

SchoolDecoder needs a continuous, school-level performance measure for the Alpha model — a single number that says "this is roughly how the school's tested students did this year." Percent meeting standard alone collapses the four levels into one cut-point and throws away the difference between a school where half the non-proficient students are at Level 2 and one where they are at Level 1. The four levels carry that information; we just need to turn them into a single number.

The standard way to do this, used by most state accountability systems when scale scores are not posted, is a Performance Level Composite (sometimes called an Achievement Level Indicator). For each school, grade, subject, and year, SchoolDecoder computes:

composite = (1 × %L1 + 2 × %L2 + 3 × %L3 + 4 × %L4) / (%L1 + %L2 + %L3 + %L4) × 100

The result lands in the range [100, 400] and is unit-invariant: the formula returns the same number whether the input levels are expressed as 0–100 percentages or 0–1 fractions. A school where every tested student is at Level 1 scores 100; a school where every tested student is at Level 4 scores 400. A school evenly split across the four levels lands at 250.
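The formula can be sketched in a few lines of Python. This is a minimal illustration, not SchoolDecoder's production code: the function name `performance_level_composite` and the two example schools are hypothetical.

```python
def performance_level_composite(l1, l2, l3, l4):
    """Weighted average of the four performance levels, scaled to 100-400.

    Inputs are the shares of tested students at Levels 1-4, given either
    as 0-100 percentages or as 0-1 fractions (the ratio cancels the unit).
    """
    total = l1 + l2 + l3 + l4
    if total <= 0:
        raise ValueError("level shares must sum to a positive number")
    weighted = 1 * l1 + 2 * l2 + 3 * l3 + 4 * l4
    return weighted / total * 100


# Two hypothetical schools, both with 50% meeting standard (Level 3 + Level 4),
# but with different distributions below the cut.
school_a = performance_level_composite(10, 40, 30, 20)  # non-proficient mostly at Level 2
school_b = performance_level_composite(40, 10, 30, 20)  # non-proficient mostly at Level 1
```

Percent meeting standard cannot tell these two schools apart; the composite places school_a about 30 points above school_b because more of its non-proficient students sit at Level 2.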

The composite is a derived index, not a scale score. We do not call it a scale score on any SchoolDecoder page, in the API, or in editorial copy. The advantage of the composite is that it preserves rank ordering: across the published validation literature, the rank correlation between a Performance Level Composite and the true scale score on the same students is typically at or above 0.95. The disadvantage is that it is lossier than the underlying scale score — a fourteen-point spread between two schools on the composite does not correspond to a clean fourteen-point spread in the underlying continuous score, and SchoolDecoder treats composite distances as ordinal rather than cardinal once they hit the Alpha model. The Alpha step z-scores the composite within each (state, grade, subject, year) cell before running the regression, so most of the residual ordering is preserved through the pipeline.
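The within-cell standardization can be sketched as follows. This is an illustration under stated assumptions, not the Alpha model's actual code: the row layout, the function name `zscore_within_cells`, the use of the population standard deviation, and the zero fallback for cells with no spread are all choices made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_within_cells(rows):
    """Return copies of rows with a 'z' field standardized within each cell.

    Each row is a dict with 'state', 'grade', 'subject', 'year', 'composite'.
    A cell is one (state, grade, subject, year) combination.
    """
    cells = defaultdict(list)
    for row in rows:
        key = (row["state"], row["grade"], row["subject"], row["year"])
        cells[key].append(row["composite"])

    # Mean and population standard deviation per cell (assumption: pstdev).
    stats = {key: (mean(vals), pstdev(vals)) for key, vals in cells.items()}

    out = []
    for row in rows:
        key = (row["state"], row["grade"], row["subject"], row["year"])
        mu, sigma = stats[key]
        # A cell with one school or identical composites has no spread; use 0.0.
        z = (row["composite"] - mu) / sigma if sigma > 0 else 0.0
        out.append({**row, "z": z})
    return out
```

Standardizing within the cell means a school is compared only with schools that took the same test, in the same grade, subject, and year, before any regression is run.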

If researcher-access scale scores become available (for example, through a state DOE data request or a NAEP-linked calibration), SchoolDecoder will switch the modeling input to the true scale score. We do not expect the switch to change the published rankings dramatically: the rank-correlation evidence suggests most schools' Decoded Ranks would move by at most a few positions. The methodology page will be updated to note the switch.

Expansion roadmap

State coverage expands as parsers are written, validated against the common schema, and reviewed against the model checklist. The order is shaped by assessment compatibility and metro priority.

Phase | States | Reason
Launch | Oregon, Washington | Portland-Vancouver launch metro; shared Smarter Balanced scale
Near-term | California, Connecticut, Massachusetts | Smarter Balanced (CA, CT) plus a strong-thesis market (MA)
Mid-term | Texas, Colorado, Florida, Georgia, Illinois, Virginia, North Carolina | Large states with state-specific scales; each gets its own parser
Long-term | Remaining states | EDFacts proficiency as a baseline; state DOE data where parsers exist

School directory — NCES CCD

The National Center for Education Statistics' Common Core of Data (CCD) is the canonical school directory for U.S. public schools. CCD provides the school name, NCES ID, address, latitude/longitude, district (LEA), grades offered, enrollment, free/reduced-price lunch counts, locale code, charter and magnet flags, and student-teacher ratio.

SchoolDecoder accesses CCD via the Urban Institute's Education Data Portal, which harmonizes CCD variables and exposes them through a clean JSON API. The Urban Institute distributes CCD under the Open Database License (ODbL); the underlying NCES data is in the public domain.

  • Source of record: NCES Common Core of Data, accessed via Urban Institute Education Data Portal
  • Granularity: school for most fields; LEA for English-language-learner and special-education counts
  • Update cycle: annual, typically finalized in October–November
  • Coverage: all U.S. public schools

Civil rights data — CRDC

The Civil Rights Data Collection (CRDC) is the federal source for school-level English-language-learner enrollment, special-education enrollment, discipline, AP/IB course access, and wider course access. CRDC is biennial and lagged — the most recent published collection is typically about two years behind the current year.

For ELL and special-education shares, SchoolDecoder prefers state DOE data where the state publishes school-level numbers cleanly. CRDC is used as a fallback when state-level data is not available, and LEA-level CCD numbers are used only as a last resort. Each school record stores which source supplied its ELL and special-education share so the provenance is auditable.

  • Source of record: Civil Rights Data Collection, U.S. Department of Education
  • Access: Urban Institute Education Data Portal or direct ED downloads
  • Granularity: school
  • Update cycle: biennial with a roughly two-year lag
  • Use: ELL share, special-education share, course access (where shown), discipline (where shown)
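The source preference for ELL and special-education shares (state DOE first, then CRDC, then LEA-level CCD) can be sketched as a small selection function that records provenance alongside the value. The names `ShareWithSource`, `pick_share`, and the source labels are hypothetical, chosen for this example only.

```python
from typing import NamedTuple, Optional


class ShareWithSource(NamedTuple):
    value: Optional[float]
    source: Optional[str]


def pick_share(state_doe=None, crdc=None, ccd_lea=None):
    """Pick the first available share in preference order and record its source."""
    candidates = [
        ("state_doe", state_doe),  # preferred: school-level state data
        ("crdc", crdc),            # fallback: school-level federal data
        ("ccd_lea", ccd_lea),      # last resort: district-level CCD
    ]
    for source, value in candidates:
        # Test against None so that a genuine share of 0.0 is still accepted.
        if value is not None:
            return ShareWithSource(value, source)
    return ShareWithSource(None, None)
```

Storing the source label next to the value is what makes the provenance auditable: a reader of the record can see whether a share came from the state, from CRDC, or from the district-level fallback.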

Census ACS 5-year

The American Community Survey (ACS) 5-year estimates are the primary socioeconomic input to the SES index. SchoolDecoder uses tract-level data on median household income, poverty rate, and adult educational attainment. The 5-year pooled estimates are deliberately preferred over 1-year estimates because pooled estimates are more stable across small geographies; we do not want noisy single-year tract fluctuations driving a school's ranking.

  • Source of record: U.S. Census Bureau, ACS 5-year Detailed Tables
  • Granularity: census tract (block group used where tract is unavailable)
  • Update cycle: annual release, with each release covering a 5-year pooled window
  • Variables used: B19013 (median household income), B17001 (poverty status), B15003 (educational attainment)
  • Vintage at launch: latest available 5-year window at the time of each computation run
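For readers who want to pull the same tables from the agency of record, the sketch below builds a tract-level request URL for the Census Bureau's data API without making a network call. It is an illustration only: the function name `acs_tract_url` is hypothetical, the specific columns within each table are an assumption (this page names the tables, not the cells SchoolDecoder uses), and the endpoint form should be checked against the current Census API documentation before use.

```python
from urllib.parse import urlencode

ACS5_BASE = "https://api.census.gov/data/{year}/acs/acs5"

# Example estimate columns from the three tables ("E" suffix marks an estimate).
# Which cells feed the SES index is an assumption made for this sketch.
ACS_VARIABLES = [
    "B19013_001E",  # median household income
    "B17001_001E",  # poverty-status universe (total)
    "B17001_002E",  # income below poverty level
    "B15003_001E",  # population 25 and over (total)
    "B15003_022E",  # bachelor's degree
]


def acs_tract_url(year, state_fips, county_fips="*"):
    """Build a tract-level ACS 5-year request URL for one state (and county)."""
    query = [
        ("get", ",".join(["NAME"] + ACS_VARIABLES)),
        ("for", "tract:*"),
        ("in", f"state:{state_fips}"),
        ("in", f"county:{county_fips}"),
    ]
    # Keep ':', '*', and ',' unescaped so the URL stays human-readable.
    return ACS5_BASE.format(year=year) + "?" + urlencode(query, safe=":*,")
```

Calling `acs_tract_url(year, "41", "051")` with the end year of a 5-year window yields a URL for every tract in Multnomah County, Oregon; a poverty rate would then be the below-poverty count divided by the poverty-status universe.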

Historical rank context vintage

Historical school rows currently use the same normalized context vintage as the current release rather than a separate ACS vintage for each assessment year. For example, a 2019 historical rank may use the same ACS 5-year tract context used by the current release. This is intentional for the current product: historical movement is meant to show how the school's results and ranks move under one consistent public-context frame, not to model year-by-year neighborhood change.

A strict same-year historical rebuild would require pulling and validating the matching ACS 5-year windows for each assessment year, rebuilding the tract joins and SES index for those years, and then recomputing Alpha, ranks, peer groups, and exports. SchoolDecoder may add that mode later, but the current historical tables should be read as fixed-context rank history.

NCES EDGE geocodes

NCES's Education Demographic and Geographic Estimates (EDGE) program publishes school latitude/longitude geocodes, locale classification, and CBSA assignment. EDGE geocodes are how SchoolDecoder places each school on a map and how a school is associated with a Census tract through a spatial join. EDGE is also where the locale classification (city, suburb, town, rural) and the urban-centric locale code come from.

  • Source of record: NCES EDGE
  • Granularity: school point and school-to-geography crosswalks
  • Update cycle: annual

OMB CBSA definitions

Metropolitan and micropolitan statistical areas (CBSAs) are defined by the Office of Management and Budget. CBSA definitions are county-based and stable from year to year, with periodic revisions. Each school's county comes from CCD; the county is mapped to a CBSA using the OMB delineation file. This produces the metro assignment for every school in the country.

For the Portland-Vancouver launch metro, the relevant CBSA code is 38900 and the county composition is Clackamas, Columbia, Multnomah, Washington, and Yamhill in Oregon, plus Clark and Skamania in Washington.

  • Source of record: OMB Bulletin Delineation Files
  • Granularity: county-to-CBSA crosswalk
  • Update cycle: revised periodically; SchoolDecoder updates when OMB publishes a new bulletin
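The county-to-CBSA step is a straightforward lookup. The sketch below hard-codes only the seven Portland-Vancouver counties named above, keyed by their standard 5-digit county FIPS codes; the dictionary and the function name `cbsa_for_county` are hypothetical, and a real run would load the full OMB delineation file instead of this excerpt.

```python
# County FIPS -> CBSA code, excerpted for the Portland-Vancouver metro (38900).
COUNTY_TO_CBSA = {
    "41005": "38900",  # Clackamas, OR
    "41009": "38900",  # Columbia, OR
    "41051": "38900",  # Multnomah, OR
    "41067": "38900",  # Washington, OR
    "41071": "38900",  # Yamhill, OR
    "53011": "38900",  # Clark, WA
    "53059": "38900",  # Skamania, WA
}


def cbsa_for_county(county_fips):
    """Return the CBSA code for a county FIPS, or None if it is not in the table."""
    # Zero-pad so integer inputs such as 6037 still match 5-digit keys.
    return COUNTY_TO_CBSA.get(str(county_fips).zfill(5))
```

A school whose CCD county is Clark County, Washington (53011) therefore lands in the same metro as a school in Multnomah County, Oregon (41051).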

How a school is matched to a census tract

SchoolDecoder uses the school latitude/longitude from NCES EDGE and the Census Bureau's TIGER/Line tract polygons to assign each school to a tract. The operation is a spatial point-in-polygon join: the school's geocode is the point; the tract is the polygon containing it.

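import geopandas as gpd

# School points (NCES EDGE geocodes) and tract polygons (TIGER/Line).
schools = gpd.read_file("schools_with_latlong.geojson")
tracts = gpd.read_file("census_tracts.geojson")

# Both layers must share a coordinate reference system before joining.
tracts = tracts.to_crs(schools.crs)

# Left join keeps every school; unmatched schools get null tract fields.
schools_with_tracts = gpd.sjoin(
    schools, tracts, how="left", predicate="within"
)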

The tract assignment then drives the ACS lookup: the median household income, poverty rate, and adult educational attainment for that tract enter the SES index for that school.

A school's geocode places its building. It does not describe the residential distribution of enrolled students. For neighborhood schools, the building's tract is a reasonable proxy for the families served. For opt-in schools, magnets, charters, virtual schools, and boundary-exception schools, the tract is a less accurate proxy. SchoolDecoder surfaces this with an enrollment-model badge on affected pages, and the caveat is one of the named limitations on the limitations page.

SES index construction

SchoolDecoder does not use any single variable as the school's socioeconomic context. The SES index is a composite of percentiles drawn from several sources:

  • ACS tract median household income
  • ACS tract poverty rate
  • ACS tract adult educational attainment
  • Free/reduced-price lunch share from CCD, with adjustments for the Community Eligibility Provision (described on the limitations page)
  • Direct-certification rate where the state publishes it
  • State-level "economically disadvantaged" field where the state publishes it

The composite is more robust than any single variable. It is also more robust to Community Eligibility Provision distortions, which are described on the limitations page.
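A composite of percentiles can be sketched as below. This is an illustration under loud assumptions, not SchoolDecoder's actual index: the component names, the equal weighting, the orientation flags, the mean-rank tie convention, and the rule of averaging only the components a school has are all choices made for the example. Community Eligibility Provision adjustments are not modeled.

```python
def percentile_ranks(values):
    """Map each non-missing value to a 0-100 percentile rank; None stays None.

    Ties share the mean rank of their group.
    """
    present = [v for v in values if v is not None]
    n = len(present)
    out = []
    for v in values:
        if v is None:
            out.append(None)
            continue
        below = sum(1 for p in present if p < v)
        equal = sum(1 for p in present if p == v)
        out.append(100.0 * (below + 0.5 * equal) / n)
    return out


# Hypothetical component names; True means a higher value implies higher SES.
COMPONENTS = {
    "median_income": True,
    "poverty_rate": False,
    "adult_attainment": True,
    "frl_share": False,
    "direct_cert_rate": False,
    "econ_disadvantaged_share": False,
}


def ses_index(schools):
    """Equal-weight mean of oriented component percentiles, skipping missing ones."""
    oriented = {}
    for name, higher_is_better in COMPONENTS.items():
        pct = percentile_ranks([s.get(name) for s in schools])
        oriented[name] = [
            None if p is None else (p if higher_is_better else 100.0 - p)
            for p in pct
        ]
    index = []
    for i in range(len(schools)):
        parts = [oriented[n][i] for n in COMPONENTS if oriented[n][i] is not None]
        index.append(sum(parts) / len(parts) if parts else None)
    return index
```

Because each component is converted to a percentile before averaging, a dollar-denominated income figure and a 0-1 poverty share contribute on the same footing, and a school missing one optional field (for example, a direct-certification rate its state does not publish) still receives an index from the remaining components.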

Data freshness

SchoolDecoder updates monitored launch states within two weeks of each state release. National coverage updates state by state as data becomes available.

Concretely, that means:

  • For Oregon and Washington, SchoolDecoder ingests new state release files, runs the parser and validation pipeline, and rebuilds affected pages within two weeks of public release.
  • For Census ACS, the SES index is recomputed after each annual ACS 5-year release, typically in December, and then propagated through Alpha and ranks.
  • For NCES CCD, the school directory and demographic context are refreshed after each annual CCD finalization, typically in October–November.
  • For CRDC, the ELL and special-education context is refreshed after each biennial CRDC release.

Every school page surfaces three vintages near the bottom of the page: the assessment year, the ACS vintage, and the CCD year used for that school's calculation. The page's last-updated date is also shown.

Update cadence per source

Source | Cadence | Typical timing
State DOE assessment file (Oregon, Washington) | Annual | Fall following spring testing
NCES CCD directory + demographics | Annual | October–November
NCES EDGE geocodes | Annual | Aligned with CCD
CRDC | Biennial, lagged | About two-year lag
Census ACS 5-year | Annual release of 5-year pool | December
OMB CBSA definitions | Periodic | When OMB publishes a new bulletin
SchoolDecoder Alpha + ranks | Per release window | Within two weeks of state release

Source attribution and licensing

The underlying data sources used by SchoolDecoder are public.

  • State assessment data is published by state departments of education and is in the public domain unless the state's site terms note otherwise.
  • NCES CCD, CRDC, and EDGE data are produced by the U.S. Department of Education and are in the public domain.
  • Urban Institute's Education Data Portal harmonizes CCD and CRDC and is distributed under the Open Database License (ODbL); attribution to Urban Institute is appropriate when their harmonized form is used.
  • Census ACS data is produced by the U.S. Census Bureau and is in the public domain.
  • OMB CBSA delineation files are produced by the Office of Management and Budget and are in the public domain.

SchoolDecoder's derived outputs — School Alpha, the Decoded Rank, the SES index, the rank shift, and the SchoolDecoder Scorecard — are SchoolDecoder's. For citation guidance and access tiers, see SchoolDecoder data and access.