This file provides guidance to AI agents when working with code in this repository.
This is an HDX (Humanitarian Data Exchange) scraper that downloads Common Operational Datasets - Administrative Boundaries (COD-AB) from OCHA's ArcGIS server (gis.unocha.org) and publishes them as country datasets to HDX.
## Commands

```shell
# Setup environment
uv sync
source .venv/bin/activate
pre-commit install

# Run the pipeline
python run.py
# Or via taskipy:
uv run task app

# Run tests with coverage
uv run task test

# Run a single test
pytest tests/test_cod_ab.py::TestCODAB::test_cod_ab -v

# Linting and formatting (pre-commit will also run ruff on commit)
uv run task ruff

# Export requirements files
uv run task export
```
## Architecture

The pipeline entry point (`__main__.py`) authenticates against ArcGIS via `generate_token()` and produces two metadata files: `metadata_all.parquet` (all versions) and `metadata_latest.parquet` (latest per country).

```
src/hdx/scraper/cod_ab_country/
├── __main__.py          # Entry point, main pipeline
├── config.py            # Configuration/environment variables
├── utils.py             # Core utilities (token, HTTP client, metadata access)
├── dataset.py           # HDX dataset generation
├── dataset_utils.py     # GDB comparison to detect changes
├── formats.py           # Format conversion (GDB/SHP/GeoJSON/XLSX)
├── download/
│   ├── utils.py         # Field parsing from ArcGIS
│   ├── metadata/
│   │   ├── __init__.py  # Download global metadata
│   │   └── refactor.py  # Transform metadata table
│   └── boundaries/
│       ├── __init__.py  # Download country boundaries
│       ├── feature.py   # Download individual feature layer
│       └── refactor.py  # Normalize boundary data
└── config/
    └── hdx_dataset_static.yaml  # Static HDX metadata (license, etc.)
```
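The relationship between the two metadata outputs is a "latest version per country" reduction of the full version table. A minimal sketch of that reduction with pandas (column names here are hypothetical, not taken from the repository):

```python
import pandas as pd

# Hypothetical metadata rows: one per (country, version) pair, as in
# metadata_all.parquet. Real column names in the repository may differ.
all_versions = pd.DataFrame(
    {
        "iso3": ["AFG", "AFG", "KEN"],
        "version": [1, 2, 1],
        "service": ["cod_ab_afg_v1", "cod_ab_afg_v2", "cod_ab_ken_v1"],
    }
)

# metadata_latest.parquet keeps only the highest version per country:
# sort by version, then take the last row within each iso3 group.
latest = (
    all_versions.sort_values("version")
    .groupby("iso3", as_index=False)
    .tail(1)
    .reset_index(drop=True)
)
```

In the real pipeline both frames would be written out with `to_parquet()`; the sketch only shows the grouping logic.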
- `config.py`: centralizes all configuration via environment variables (ArcGIS credentials, retry settings, GDAL options, ISO3 filtering)
- `utils.py`: HTTP client with retry logic (tenacity), token generation, layer list extraction, metadata retrieval
- `download/metadata/`: downloads the global metadata table via ESRIJSON, refactors it into two Parquet files (all versions + latest)
- `download/boundaries/`: downloads Feature Layers per country, converts ESRIJSON to normalized GeoParquet
- `formats.py`: converts GeoParquet to GDB, SHP (zipped), GeoJSON, and XLSX using the GDAL CLI
- `dataset.py`: builds HDX Dataset objects with metadata, notes, tags, and file resources
- `dataset_utils.py`: SHA256 comparison of GDB files to prevent re-uploading unchanged data

Data flows through the pipeline as follows:

```
┌─────────────────────────────────────────────────────┐
│ ArcGIS Server (gis.unocha.org)                      │
│ COD_Global_Metadata + cod_ab_XXX_vYY services       │
└──────────────────────────┬──────────────────────────┘
                           │ ESRIJSON queries
                           ↓
┌─────────────────────────────────────────────────────┐
│ GDAL Vector Operations                              │
│ (read ESRIJSON, validate geometry, convert)         │
└──────────────────────────┬──────────────────────────┘
                           ↓
                ┌──────────┴──────────┐
                ↓                     ↓
        metadata.parquet     boundaries.parquet
        (all + latest)       (normalized GeoParquet)
                │                     │
                └──────────┬──────────┘
                           ↓
             ┌─────────────────────────────┐
             │ Format Conversion (GDAL)    │
             │ GDB, SHP, GeoJSON, XLSX     │
             └─────────────┬───────────────┘
                           ↓
             ┌─────────────────────────────┐
             │ GDB Compare (vs existing)   │
             └─────────────┬───────────────┘
                           ↓
             ┌─────────────────────────────┐
             │ HDX Dataset Generation      │
             │ + Upload (create_in_hdx)    │
             └─────────────────────────────┘
```
## Configuration

HTTP requests are wrapped in a tenacity `@retry` decorator.

Environment variables (or `.env` file):

- `ARCGIS_USERNAME`, `ARCGIS_PASSWORD`: ArcGIS authentication
- `ISO3_INCLUDE`, `ISO3_EXCLUDE`: filter countries to process (optional)

Home directory files:

- `~/.hdx_configuration.yaml`: HDX API key and site config
- `~/.useragents.yaml`: user agent config (key: `hdx-scraper-cod-ab`)

Dependencies:

- The GDAL command-line tools (`gdal`) must be installed and in `PATH`
- HDX Python libraries (`hdx-python-api`, `hdx-python-country`, `hdx-python-utilities`)
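One plausible way `ISO3_INCLUDE` / `ISO3_EXCLUDE` could be applied is an include-then-exclude filter over the country list. This is a hedged sketch: the actual parsing in `config.py` (separator, case handling, precedence) may differ.

```python
import os


def iso3_filter(all_iso3s: list[str]) -> list[str]:
    """Apply the optional ISO3_INCLUDE / ISO3_EXCLUDE env vars.

    Hypothetical helper for illustration: assumes comma-separated,
    upper-case ISO3 codes, with include applied before exclude.
    """
    include = {c for c in os.environ.get("ISO3_INCLUDE", "").split(",") if c}
    exclude = {c for c in os.environ.get("ISO3_EXCLUDE", "").split(",") if c}
    if include:
        all_iso3s = [c for c in all_iso3s if c in include]
    return [c for c in all_iso3s if c not in exclude]


os.environ["ISO3_INCLUDE"] = "AFG,KEN,MLI"
os.environ["ISO3_EXCLUDE"] = "MLI"
subset = iso3_filter(["AFG", "KEN", "MLI", "NGA"])  # ["AFG", "KEN"]
```

Leaving both variables unset would process every country, matching the "(optional)" note above.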