teadata is a snapshot-first Python engine for Texas education data.
It provides:

- `District` and `Campus` domain models

Install from PyPI:

```bash
pip install teadata
```

Or set up a development checkout:

```bash
git clone https://github.com/adpena/teadata.git
cd teadata
uv sync --all-extras
```
```python
from teadata import DataEngine

# Preferred runtime path: load the latest discovered snapshot.
engine = DataEngine.from_snapshot(search=True)

# District lookup by district number, campus number, or name.
aldine = engine.get_district("101902")
print(aldine.name)

# Campuses physically inside district boundaries.
for campus in aldine.campuses[:5]:
    print(campus.name, campus.campus_number)
```
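The `from_snapshot(search=True)` call discovers a snapshot and loads it. As a rough, stdlib-only illustration of that idea (not teadata's actual code), the sketch below picks the most recently modified `repo_*.pkl` or `repo_*.pkl.gz` file from a list of candidate directories and unpickles it, detecting gzip by its two magic bytes. Choosing the "latest" file by modification time, and the helper names `find_latest_snapshot` and `load_snapshot`, are assumptions made for illustration.

```python
from __future__ import annotations

import gzip
import pickle
from pathlib import Path
from typing import Any, Iterable

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def find_latest_snapshot(search_dirs: Iterable[Path]) -> Path | None:
    """Hypothetical: newest repo_*.pkl / repo_*.pkl.gz in the given dirs, by mtime."""
    candidates: list[Path] = []
    for directory in search_dirs:
        if directory.is_dir():
            # "repo_*.pkl" does not match names ending in ".pkl.gz", so no duplicates.
            candidates.extend(directory.glob("repo_*.pkl"))
            candidates.extend(directory.glob("repo_*.pkl.gz"))
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def load_snapshot(path: Path) -> Any:
    """Unpickle a snapshot, transparently handling gzip compression."""
    with path.open("rb") as fh:
        is_gzip = fh.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzip else open
    with opener(path, "rb") as fh:
        return pickle.load(fh)
```

Unpickling can execute arbitrary code, so a loader like this should only be pointed at snapshots from a trusted source.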
Primary imports:

```python
from teadata import DataEngine, District, Campus
```
Core behaviors:

- `DataEngine.from_snapshot(...)` supports `.pkl` and `.pkl.gz` snapshots and multiple payload shapes.
- … `.cache`, and parent `.cache` directories.
- `District` and `Campus` support dynamic metadata attributes through `meta`.
- `Campus.to_dict()` always includes `percent_enrollment_change` (numeric when available, otherwise `"N/A"`).

teadata is intentionally cache-first.
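To make the `meta` and `to_dict()` behaviors listed above concrete, here is a hypothetical stand-in class (`CampusSketch`, not teadata's `Campus`). It assumes unknown attributes fall back to `meta` through `__getattr__`, and it assumes a percent change computed from a made-up `prior_enrollment` field; teadata's real field names and formula may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class CampusSketch:
    """Hypothetical stand-in for a campus record; not teadata's Campus class."""

    name: str
    campus_number: str
    enrollment: int | None = None
    prior_enrollment: int | None = None  # hypothetical field used for the % change
    meta: dict[str, Any] = field(default_factory=dict)

    def __getattr__(self, key: str) -> Any:
        # Only called when normal lookup fails, so declared fields win over meta.
        meta = self.__dict__.get("meta", {})
        if key in meta:
            return meta[key]
        raise AttributeError(f"{type(self).__name__} has no attribute {key!r}")

    def percent_enrollment_change(self) -> float | str:
        # Assumed formula: 100 * (current - prior) / prior, else "N/A".
        if self.enrollment is None or not self.prior_enrollment:
            return "N/A"
        return 100.0 * (self.enrollment - self.prior_enrollment) / self.prior_enrollment

    def to_dict(self) -> dict[str, Any]:
        return {
            "name": self.name,
            "campus_number": self.campus_number,
            "enrollment": self.enrollment,
            **self.meta,
            # Always present, mirroring the described Campus.to_dict() contract.
            "percent_enrollment_change": self.percent_enrollment_change(),
        }
```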
Artifacts typically used at runtime:

- `repo_*.pkl` / `repo_*.pkl.gz` (engine snapshot)
- `boundaries_*.sqlite` (boundary WKB sidecar)
- `map_payloads_*.sqlite` (map payload sidecar)
- `entities_*.sqlite` (entity lookup sidecar)

If snapshot or store files are missing locally or are Git LFS pointers, the runtime asset resolvers can fetch the real files when the corresponding `*_URL` environment variables are set.
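The fallback described above can be sketched as a small resolver: prefer an explicit path from an environment variable (or a default path), accept it only if it exists and is not a Git LFS pointer, and otherwise download from the `*_URL` variable into a cache directory. The function names, the resolution order, the reuse of an already-cached download, and the `fallback_cache_dir` parameter are assumptions; only the `TEADATA_*` variable names come from this README. Pointer detection relies on the first line every Git LFS pointer file starts with.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Callable, Mapping

# First line of every Git LFS pointer file (LFS spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """True if the file starts like a Git LFS pointer rather than real content."""
    try:
        with path.open("rb") as fh:
            return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX
    except OSError:
        return False


def resolve_asset(
    default: Path,
    path_var: str,
    url_var: str,
    fetch: Callable[[str, Path], None],
    fallback_cache_dir: Path,
    env: Mapping[str, str] = os.environ,
) -> Path | None:
    """Hypothetical resolver: explicit/default path first, then URL download."""
    candidate = Path(env[path_var]) if env.get(path_var) else default
    if candidate.is_file() and not is_lfs_pointer(candidate):
        return candidate

    url = env.get(url_var)
    if not url:
        return None  # nothing usable locally and no URL to fall back on

    cache_dir = Path(env.get("TEADATA_ASSET_CACHE_DIR") or fallback_cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / candidate.name
    if not target.is_file() or is_lfs_pointer(target):
        fetch(url, target)  # download only when no real cached copy exists
    return target
```

In real use, `fetch` could be `lambda url, dest: urllib.request.urlretrieve(url, dest)`; injecting it keeps the sketch testable without network access.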
Environment variables:

- `TEADATA_SNAPSHOT`: explicit snapshot path.
- `TEADATA_SNAPSHOT_URL`: URL used when the snapshot candidate is missing or is a Git LFS pointer.
- `TEADATA_BOUNDARY_STORE`: explicit boundary SQLite path.
- `TEADATA_BOUNDARY_STORE_URL`: URL fallback for the boundary store.
- `TEADATA_MAP_STORE`: explicit map SQLite path.
- `TEADATA_MAP_STORE_URL`: URL fallback for the map store.
- `TEADATA_ENTITY_STORE`: explicit entity SQLite path.
- `TEADATA_ENTITY_STORE_URL`: URL fallback for the entity store.
- `TEADATA_ASSET_CACHE_DIR`: overrides the cache directory used for downloaded assets.
- `TEADATA_DISABLE_INDEXES`: disables the default spatial acceleration indexes.
- `TEADATA_LOG_MEMORY`: enables memory snapshot logging.

`DataEngine` and `Query` chains use the `>>` operator:
```python
# Resolve the district, then expand to district-operated campuses.
q = engine >> ("district", "ALDINE ISD") >> ("campuses_in",)

# Filter, sort, and take.
top = (
    q
    >> ("filter", lambda c: (c.enrollment or 0) > 1000)
    >> ("sort", lambda c: c.enrollment or 0, True)
    >> ("take", 10)
)
rows = top.to_df(columns=["name", "campus_number", "enrollment"])
```
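The `>>` chaining relies on Python operator overloading: `a >> b` calls `a.__rshift__(b)`. A hypothetical, minimal `QuerySketch` class (not teadata's `Query`) shows how tuple steps such as `("filter", pred)`, `("sort", key, True)`, and `("take", n)` can be dispatched, with each step returning a new query so chains never mutate earlier results. Reading the third `sort` element as "reverse" is an assumption, and `to_rows` stands in for `to_df` to avoid a pandas dependency.

```python
from __future__ import annotations

from typing import Any, Iterable


class QuerySketch:
    """Hypothetical minimal query: `q >> step` dispatches tuple steps via __rshift__."""

    def __init__(self, items: Iterable[Any]) -> None:
        self.items = list(items)

    def __rshift__(self, step: tuple) -> QuerySketch:
        op, *args = step
        if op == "filter":
            (predicate,) = args
            return QuerySketch(x for x in self.items if predicate(x))
        if op == "sort":
            key, *rest = args
            reverse = bool(rest[0]) if rest else False  # assumed meaning of 3rd element
            return QuerySketch(sorted(self.items, key=key, reverse=reverse))
        if op == "take":
            (n,) = args
            return QuerySketch(self.items[:n])
        raise ValueError(f"unsupported step: {op!r}")

    def to_rows(self, columns: list[str]) -> list[dict[str, Any]]:
        # Stand-in for to_df(columns=...): plain dicts instead of a DataFrame.
        return [{col: getattr(item, col) for col in columns} for item in self.items]
```

Because `>>` is left-associative, `q >> a >> b` evaluates as `(q >> a) >> b`, which is what lets the multi-line chain above read top to bottom.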
Supported lookup semantics include:

- … (`*`, `?`, SQL-like `%`/`_`)
- … (`"123"` and `"'000123"`)

Spatial and transfer helpers include:

- `nearest_charter_same_type(...)`
- `transfers_out(...)` / `transfers_in(...)`

`teadata/enrichment` provides registered enrichers for district and campus datasets.
Included enrichers cover:
Pipeline behavior is fault-tolerant by design: dataset-level failures are generally logged and do not hard-stop the full build.
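A registry plus a per-dataset `try`/`except` is one common way to get the fault-tolerant behavior described above. The sketch assumes a dict-based registry filled by a decorator and campuses stored as plain dicts keyed by campus number; `register`, `run_enrichers`, and the sample enrichers are hypothetical and are not the contents of `teadata/enrichment`.

```python
from __future__ import annotations

import logging
from typing import Callable

logger = logging.getLogger("enrichment_sketch")

Enricher = Callable[[dict], None]
ENRICHERS: dict[str, Enricher] = {}  # hypothetical registry: dataset name -> enricher


def register(name: str) -> Callable[[Enricher], Enricher]:
    """Decorator that records an enricher under a dataset name."""
    def decorator(func: Enricher) -> Enricher:
        ENRICHERS[name] = func
        return func
    return decorator


@register("enrollment")
def add_enrollment(campuses: dict) -> None:
    source = {"101902001": 1200}  # made-up data standing in for a real dataset
    for number, record in campuses.items():
        record["enrollment"] = source.get(number)


@register("missing_dataset")
def add_missing(campuses: dict) -> None:
    raise FileNotFoundError("source file not available")


def run_enrichers(campuses: dict) -> list[str]:
    """Run every registered enricher; log and skip failures instead of aborting."""
    failed: list[str] = []
    for name, enrich in ENRICHERS.items():
        try:
            enrich(campuses)
        except Exception:
            logger.exception("enricher %r failed; continuing with the build", name)
            failed.append(name)
    return failed
```

Returning the failed names lets a build script report partial success at the end instead of stopping at the first broken dataset.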
`teadata/load_data.py` builds a full `DataEngine` and updates cached artifacts:

```bash
uv run python -m teadata.load_data
```
At a high level, it:

- … `teadata/teadata_sources.yaml`
- … `.cache/teadata-config` …

`teadata/teadata_config.py` provides YAML/TOML config loading, year resolution, schema checks, and dataset joins.
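"Year resolution" could mean several things. One plausible policy, sketched here purely as an assumption, is to use the requested year if it is configured and otherwise fall back to the latest earlier year. The hypothetical `resolve` helper takes the same arguments as the `resolve <cfg> <section> <dataset> <year>` subcommand, but the config shape (section, then dataset, then year mapped to a path) and the file names are invented for illustration.

```python
from __future__ import annotations

import bisect
from typing import Mapping


def resolve(config: Mapping, section: str, dataset: str, year: int) -> tuple[int, str] | None:
    """Return (year, path) for the requested year or the latest earlier one."""
    # YAML/TOML loaders may yield string keys such as "2023"; normalize to int.
    by_year = {int(y): path for y, path in config[section][dataset].items()}
    years = sorted(by_year)
    idx = bisect.bisect_right(years, year)  # number of configured years <= requested
    if idx == 0:
        return None  # every configured year is later than the request
    chosen = years[idx - 1]
    return chosen, by_year[chosen]
```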
CLI entrypoint:

```bash
uv run teadata-config --help
```

Subcommands:

- `init <out.yaml>`
- `resolve <cfg> <section> <dataset> <year>`
- `report <cfg> [--json] [--min N] [--max N]`
- `join <cfg> <year> [--datasets a,b,c] [--parquet out.parquet] [--duckdb out.duckdb --table t]`

Run the test suite with:

```bash
uv run pytest
```
Current tests cover:

- … `percent_enrollment_change` …

PyPI's default storage limits, as currently documented:

- File size limit: 100 MB
- Project size limit: 10 GB

Reference: https://docs.pypi.org/project-management/storage-limits/
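A quick pre-upload check against the per-file limit can be scripted with `pathlib`. This sketch assumes "100 MB" means 100 * 1024 * 1024 bytes and only inspects `.whl` and `.tar.gz` files; `oversized_artifacts` is a hypothetical helper, and the exact limit for a given project should be confirmed in the PyPI documentation.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: PyPI's "100 MB" default read as 100 MiB; confirm for your project.
PYPI_FILE_LIMIT_BYTES = 100 * 1024 * 1024


def oversized_artifacts(dist_dir: Path, limit: int = PYPI_FILE_LIMIT_BYTES) -> list[tuple[str, int]]:
    """List (filename, size_in_bytes) for wheels/sdists in dist_dir above `limit`."""
    too_big: list[tuple[str, int]] = []
    for path in sorted(dist_dir.iterdir()):
        if path.is_file() and path.name.endswith((".whl", ".tar.gz")):
            size = path.stat().st_size
            if size > limit:
                too_big.append((path.name, size))
    return too_big
```

Calling `oversized_artifacts(Path("dist"))` after a build and failing when the list is non-empty catches an oversized artifact before an upload attempt is rejected.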
Before the packaging trim, the 0.0.118 artifacts exceeded the per-file limit:

- 448 MB
- 446 MB

The current slimmed artifacts are below the limit:

- `dist/teadata-0.0.118-py3-none-any.whl`: about 74 MB
- `dist/teadata-0.0.118.tar.gz`: about 72 MB

To stay under PyPI file limits while preserving runtime behavior:

- … `.pkl.gz`) and selected sidecars (`boundaries_*.sqlite`, `entities_*.sqlite`).
- `.pkl` files are excluded from distributions.
- `map_payloads_*.sqlite` is excluded from distributions; provide it at runtime via `TEADATA_MAP_STORE` or `TEADATA_MAP_STORE_URL`.
- `TEADATA_*_URL` can hydrate missing local sidecars automatically.

… (`v0.0.101`, `v0.0.102`, …).

Apache License 2.0. See `LICENSE`.