TEA Data Engine (teadata)

teadata is a snapshot-first Python engine for Texas education data. It provides snapshot-based loading, District and Campus objects, a chainable >> query DSL, an enrichment pipeline, and a teadata-config CLI.

Installation

PyPI

pip install teadata

From source

git clone https://github.com/adpena/teadata.git
cd teadata
uv sync --all-extras

Quick Start

from teadata import DataEngine

# Preferred runtime path: load the latest discovered snapshot.
engine = DataEngine.from_snapshot(search=True)

# District lookup by district number, campus number, or name.
aldine = engine.get_district("101902")
print(aldine.name)

# Campuses physically inside district boundaries.
for campus in aldine.campuses[:5]:
    print(campus.name, campus.campus_number)

Public API Surface

Primary imports:

from teadata import DataEngine, District, Campus

Core behaviors:

Snapshot and Asset Behavior

teadata is intentionally cache-first.

Artifacts typically used at runtime:

If snapshot/store files are Git LFS pointers or missing locally, runtime asset resolvers can fetch real files when URL env vars are provided.
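The pointer-detection side of this can be illustrated with a small, self-contained sketch. Real Git LFS pointer files are tiny text files that begin with the spec version line, so checking the first bytes is enough. The helper names and the `TEADATA_SNAPSHOT_URL` variable here are assumptions for illustration, not teadata's documented resolver API or env var:

```python
import os
import urllib.request
from pathlib import Path

LFS_SIGNATURE = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path: Path) -> bool:
    """Heuristically detect a Git LFS pointer file by its leading bytes."""
    try:
        with path.open("rb") as f:
            return f.read(len(LFS_SIGNATURE)) == LFS_SIGNATURE
    except OSError:
        return False

def resolve_asset(path: Path, url_env: str = "TEADATA_SNAPSHOT_URL") -> Path:
    """Return a usable local asset, fetching from a URL env var when the
    local copy is missing or is only an LFS pointer.

    Hypothetical helper: the env var name and fetch strategy are
    assumptions, not teadata's actual resolver behavior.
    """
    if path.exists() and not is_lfs_pointer(path):
        return path
    url = os.environ.get(url_env)
    if not url:
        raise FileNotFoundError(f"{path} unavailable and {url_env} not set")
    urllib.request.urlretrieve(url, path)
    return path
```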

Environment Variables

Query DSL

DataEngine and Query objects support chainable queries via the >> operator.

# Resolve district then expand to district-operated campuses.
q = engine >> ("district", "ALDINE ISD") >> ("campuses_in",)

# Filter, sort, and take.
top = (
    q
    >> ("filter", lambda c: (c.enrollment or 0) > 1000)
    >> ("sort", lambda c: c.enrollment or 0, True)
    >> ("take", 10)
)

rows = top.to_df(columns=["name", "campus_number", "enrollment"])
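The >> chaining style can be illustrated with a minimal, self-contained sketch of the operator pattern (each step returns a new query object via `__rshift__`). This is an illustration only, not teadata's actual Query implementation:

```python
class MiniQuery:
    """Toy query object: each >> step returns a new MiniQuery."""

    def __init__(self, items):
        self.items = list(items)

    def __rshift__(self, op):
        # Ops are ("name", *args) tuples, mirroring the style above.
        name, *args = op
        if name == "filter":
            return MiniQuery(x for x in self.items if args[0](x))
        if name == "sort":
            key, reverse = args[0], args[1] if len(args) > 1 else False
            return MiniQuery(sorted(self.items, key=key, reverse=reverse))
        if name == "take":
            return MiniQuery(self.items[: args[0]])
        raise ValueError(f"unknown op: {name}")

campuses = [{"name": "A", "enrollment": 2500},
            {"name": "B", "enrollment": 800},
            {"name": "C", "enrollment": 1400}]

q = (
    MiniQuery(campuses)
    >> ("filter", lambda c: c["enrollment"] > 1000)
    >> ("sort", lambda c: c["enrollment"], True)
    >> ("take", 1)
)
# q.items == [{"name": "A", "enrollment": 2500}]
```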

Supported lookup semantics include:

Spatial and transfer helpers include:

Enrichment Pipeline

teadata/enrichment provides registered enrichers for district and campus datasets.

Included enrichers cover:

Pipeline behavior is fault-tolerant by design: dataset-level failures are generally logged and do not hard-stop the full build.
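The fault-tolerance pattern described above can be sketched as a per-enricher try/except loop. Names here are illustrative, not teadata's actual registry API:

```python
import logging

logger = logging.getLogger("enrichment")

def run_enrichers(engine_state, enrichers):
    """Apply each registered enricher in turn.

    A dataset-level failure is logged and skipped so one bad dataset
    cannot hard-stop the full build -- a sketch of the behavior
    described above, with assumed function names.
    """
    applied, failed = [], []
    for name, enricher in enrichers.items():
        try:
            enricher(engine_state)
            applied.append(name)
        except Exception:
            logger.exception("enricher %s failed; continuing", name)
            failed.append(name)
    return applied, failed
```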

Data Build Pipeline

teadata/load_data.py builds a full DataEngine and updates cached artifacts.

uv run python -m teadata.load_data

At a high level, it:

  1. resolves year-aware source paths from teadata/teadata_sources.yaml
  2. warm-loads compatible snapshot cache when signatures match
  3. otherwise builds districts/campuses from spatial files
  4. applies enrichment datasets
  5. writes snapshot + sqlite sidecars back to .cache/
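Steps 2-3 hinge on a source-signature check. One way to implement "warm-load when signatures match" is to hash source paths, sizes, and mtimes into a cheap fingerprint; teadata's actual signature scheme may differ, and the names below are assumptions:

```python
import hashlib
from pathlib import Path

def sources_signature(paths):
    """Hash source paths + mtimes + sizes into a cheap cache signature."""
    h = hashlib.sha256()
    for p in sorted(Path(p) for p in paths):
        stat = p.stat()
        h.update(f"{p}:{stat.st_mtime_ns}:{stat.st_size}".encode())
    return h.hexdigest()

def load_or_build(cache_dir, source_paths, build):
    """Warm-load when the cached signature matches, else rebuild.

    Sketch of the pattern only; `build` stands in for the full
    districts/campuses + enrichment build.
    """
    sig = sources_signature(source_paths)
    sig_file = Path(cache_dir) / "snapshot.sig"
    if sig_file.exists() and sig_file.read_text() == sig:
        return "warm"   # compatible snapshot: reuse cached artifacts
    build()             # otherwise rebuild from source files
    sig_file.write_text(sig)
    return "rebuilt"
```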

Config and CLI (teadata-config)

teadata/teadata_config.py provides YAML/TOML config loading, year resolution, schema checks, and dataset joins.
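Year resolution can be sketched as "most recent source at or before the requested year" -- a common fallback strategy, though teadata_config's actual rules may differ. The function and data shapes here are illustrative assumptions:

```python
def resolve_year_path(sources: dict, dataset: str, year: int) -> str:
    """Pick the best available year for a dataset.

    Falls back to the most recent year <= the requested one.
    """
    years = sources[dataset]
    candidates = [y for y in years if y <= year]
    if not candidates:
        raise KeyError(f"no {dataset} source at or before {year}")
    return years[max(candidates)]

# Hypothetical year-keyed source mapping, in the spirit of
# teadata_sources.yaml:
sources = {"enrollment": {2021: "data/enr_2021.csv",
                          2023: "data/enr_2023.csv"}}

resolve_year_path(sources, "enrollment", 2024)  # falls back to the 2023 file
```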

CLI entrypoint:

uv run teadata-config --help

Subcommands:

Testing

uv run pytest

Current tests cover:

PyPI Size Limits and Current Packaging Status

PyPI's current default limits are documented at:

https://docs.pypi.org/project-management/storage-limits/

Before the packaging trim, the 0.0.118 artifacts exceeded the per-file limit:

Current slimmed artifacts are below the limit:

To stay under PyPI file limits while preserving runtime behavior:

Release Policy

License

Apache License 2.0. See LICENSE.