Methodology

How FiscalGuard works

FiscalGuard combines unsupervised machine learning with a cryptographic audit ledger to surface high-risk transactions in public-sector spending and provide a verifiable record of every analysis performed.

1. Feature engineering

For each transaction we derive seven numerical features designed to capture the signals auditors look for:

log-scaled amount magnitude
amount z-score within the issuing department
vendor rarity (inverse log frequency)
department rarity
day of month
day of week
"suspiciously round" amount indicator

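As a rough illustration, the seven features could be derived along the following lines. This is a minimal sketch using only the Python standard library, not FiscalGuard's actual code. The names `engineer_features` and `is_suspiciously_round`, the 1 / log(1 + count) rarity formula, its reuse for departments, and the "exact multiple of 1,000" roundness rule are all assumptions made for the example.

```python
import math
from collections import Counter
from datetime import date
from statistics import mean, pstdev


def is_suspiciously_round(amount, step=1000):
    """Assumed rule: a non-zero amount that is an exact multiple of `step`."""
    return amount != 0 and amount % step == 0


def engineer_features(transactions):
    """Return one 7-element feature row per transaction.

    Each transaction is a dict with keys `date` (datetime.date),
    `vendor`, `department`, and `amount` (a number).
    """
    vendor_counts = Counter(t["vendor"] for t in transactions)
    dept_counts = Counter(t["department"] for t in transactions)

    # Per-department mean and standard deviation, used for the z-score.
    amounts_by_dept = {}
    for t in transactions:
        amounts_by_dept.setdefault(t["department"], []).append(t["amount"])
    dept_stats = {d: (mean(v), pstdev(v)) for d, v in amounts_by_dept.items()}

    rows = []
    for t in transactions:
        amount = t["amount"]
        mu, sigma = dept_stats[t["department"]]
        rows.append([
            math.log1p(abs(amount)),                         # log-scaled magnitude
            (amount - mu) / sigma if sigma > 0 else 0.0,     # z-score in department
            1.0 / math.log(1 + vendor_counts[t["vendor"]]),  # vendor rarity (assumed formula)
            1.0 / math.log(1 + dept_counts[t["department"]]),  # department rarity (assumed formula)
            float(t["date"].day),                            # day of month (1-31)
            float(t["date"].weekday()),                      # day of week (Monday = 0)
            1.0 if is_suspiciously_round(amount) else 0.0,   # round-amount indicator
        ])
    return rows


example = [
    {"date": date(2024, 1, 15), "vendor": "A", "department": "D1", "amount": 5000.0},
    {"date": date(2024, 1, 16), "vendor": "A", "department": "D1", "amount": 1234.56},
]
features = engineer_features(example)
```

A department with a single transaction has zero standard deviation, so the sketch falls back to a z-score of 0 rather than dividing by zero.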
2. Anomaly scoring

Isolation Forest (Liu, Ting & Zhou, 2008) builds an ensemble of random isolation trees; transactions that are isolated after fewer random splits, and therefore have shorter average path lengths, receive higher anomaly scores.
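
To make the link between path length and score concrete, the sketch below computes the anomaly score defined in the Isolation Forest paper, s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the mean path length over the trees and c(n) normalizes by the expected path length for a sample of n points. It only covers the scoring step, not tree construction, and the function names are ours for illustration.

```python
import math

EULER_GAMMA = 0.5772156649  # constant used to approximate harmonic numbers


def average_path_length(n):
    """c(n) = 2 H(n-1) - 2 (n-1) / n, with H(i) approximated by ln(i) + gamma."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0  # exact value; the log approximation of H(1) is poor here
    harmonic = math.log(n - 1) + EULER_GAMMA
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """Score in (0, 1]: values near 1 mean the point was isolated quickly."""
    if n < 2:
        raise ValueError("need a sample of at least two points")
    return 2.0 ** (-mean_path_length / average_path_length(n))


# A point isolated after 2 splits on average scores higher than one needing 8.
quick = anomaly_score(2.0, 256)
slow = anomaly_score(8.0, 256)
```

A transaction whose mean path length equals c(n) scores exactly 0.5, which is why scores well above 0.5 are read as anomalous.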

Local Outlier Factor (Breunig et al., 2000) compares each transaction's local density to that of its k-nearest neighbors, surfacing context-dependent outliers that a global model would miss.
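
The density comparison can be sketched in a few lines of plain Python. This is a simplified, brute-force version for illustration only: it takes exactly k neighbors per point (the original definition also includes ties at the k-distance), it is quadratic in the number of points, and it assumes there are no duplicate points. The function name `local_outlier_factor` is ours.

```python
import math


def local_outlier_factor(points, k):
    """Return the LOF of every point; values well above 1 suggest an outlier."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbors of each point (excluding itself) and its k-distance.
    neighbors, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(k / sum(reach))

    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


# Four points in a tight square plus one far away.
scores = local_outlier_factor([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)], k=2)
```

Points inside the square have the same density as their neighbors, so their factor is 1, while the distant point is far less dense than its neighbors and gets a much larger factor.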

A user-controlled contamination parameter sets the cutoff: for example, a value of 5% flags the transactions with the top 5% of anomaly scores.
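
A minimal sketch of how such a cutoff might be applied, assuming higher scores mean more anomalous. The name `flag_top_fraction` and the choice to round the flagged count to the nearest whole transaction are assumptions for the example.

```python
def flag_top_fraction(scores, contamination):
    """Return one boolean per score, True for the highest-scoring fraction."""
    if not 0.0 < contamination < 1.0:
        raise ValueError("contamination must be between 0 and 1")
    n_flagged = round(contamination * len(scores))
    # Rank indices from highest to lowest score; ties keep their input order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_flagged])
    return [i in flagged for i in range(len(scores))]


# With 100 scores and contamination 0.05, the 5 highest scores are flagged.
flags = flag_top_fraction([float(i) for i in range(100)], 0.05)
```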

3. SHA-256 audit ledger

Every analysis run serializes the flagged transactions as canonical JSON and hashes the result with SHA-256. That hash, together with the run parameters, the timestamp, and the prior block's hash, is itself hashed to form the next block of an append-only chain.
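
The two-level hashing can be sketched as follows. The field names (`payload_hash`, `params`, `timestamp`, `prev_hash`, `block_hash`), the all-zero placeholder for the first block's prior hash, and the particular canonical form (sorted keys, no insignificant whitespace) are assumptions for illustration; the real ledger may serialize differently.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed stand-in for the first block's "prior hash"


def canonical_json(value):
    """Serialize with sorted keys and fixed separators so equal data hashes equally."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_block(flagged, params, timestamp, prev_hash):
    """Hash the flagged transactions, then hash that digest with the run metadata."""
    header = {
        "payload_hash": sha256_hex(canonical_json(flagged)),
        "params": params,
        "timestamp": timestamp,
        "prev_hash": prev_hash,
    }
    return {**header, "block_hash": sha256_hex(canonical_json(header))}


first = make_block(
    [{"vendor": "A", "amount": 5000}],
    {"contamination": 0.05},
    "2024-01-15T00:00:00Z",
    GENESIS_HASH,
)
second = make_block([], {"contamination": 0.05}, "2024-01-16T00:00:00Z", first["block_hash"])
```

Because each block's hash covers the previous block's hash, changing any earlier run changes every hash after it.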

The Audit Ledger page can recompute every block hash and confirm that the chain has not been tampered with. This is a civic-tech analogue of the blockchain-anchored audit trails described in JFMIP's federal financial-management playbook and Treasury's Bureau of the Fiscal Service grants pilots.
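
A verification pass of this kind might look like the sketch below, which walks the chain, recomputes each block's hash from its other fields, and checks the link to the previous block. The block layout (`prev_hash` and `block_hash` keys, an all-zero prior hash for the first block) and the helper names are assumptions for the example, not the ledger's actual schema.

```python
import hashlib
import json


def block_hash(block):
    """Hash every field of a block except its stored hash."""
    body = {k: v for k, v in block.items() if k != "block_hash"}
    text = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify_chain(chain, genesis_hash="0" * 64):
    """Return the index of the first invalid block, or None if the chain is intact."""
    prev = genesis_hash
    for i, block in enumerate(chain):
        if block["prev_hash"] != prev or block_hash(block) != block["block_hash"]:
            return i
        prev = block["block_hash"]
    return None


def append_block(chain, fields, genesis_hash="0" * 64):
    """Demo helper: link `fields` to the previous block and store the new hash."""
    prev = chain[-1]["block_hash"] if chain else genesis_hash
    block = {**fields, "prev_hash": prev}
    block["block_hash"] = block_hash(block)
    chain.append(block)


ledger = []
append_block(ledger, {"params": {"contamination": 0.05}, "timestamp": "2024-01-15T00:00:00Z"})
append_block(ledger, {"params": {"contamination": 0.10}, "timestamp": "2024-01-16T00:00:00Z"})
intact = verify_chain(ledger)  # None while nothing has been altered
```

Editing a block in place is caught at that block. Editing it and recomputing its own hash is caught at the next block, whose stored prior hash no longer matches.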

4. Intended users

Government auditors and Inspectors General
City and state comptrollers
Oversight bodies and legislative budget offices
Civic-tech researchers, NGOs, and data journalists

5. Data sources

The dashboard ships with snapshots of real public spending data fetched directly from official open-data portals. None of the preloaded samples are synthetic. FiscalGuard also accepts any user-uploaded CSV with date, vendor, department, and amount columns.
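
As an illustration of the upload requirement, the sketch below checks that a CSV has the four required columns before parsing it. It is not the dashboard's own loader: the function name `parse_transactions`, case-insensitive header matching, ISO 8601 dates, and the stripping of thousands separators from amounts are all assumptions.

```python
import csv
import io
from datetime import date

REQUIRED_COLUMNS = ("date", "vendor", "department", "amount")


def parse_transactions(csv_text):
    """Parse CSV text into transaction dicts, rejecting files that lack a required column."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("the CSV is empty")

    header = [name.strip().lower() for name in rows[0]]
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError("missing required columns: " + ", ".join(missing))
    index = {c: header.index(c) for c in REQUIRED_COLUMNS}

    transactions = []
    for row in rows[1:]:
        if not row:
            continue  # skip blank lines
        transactions.append({
            "date": date.fromisoformat(row[index["date"]].strip()),
            "vendor": row[index["vendor"]].strip(),
            "department": row[index["department"]].strip(),
            "amount": float(row[index["amount"]].replace(",", "")),
        })
    return transactions


sample = 'Date,Vendor,Department,Amount\n2024-01-15,Acme Corp,Parks,"1,234.50"\n'
parsed = parse_transactions(sample)
```

Extra columns are ignored, so a portal export with additional fields can be uploaded unchanged.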


Each snapshot is a deterministic strided sample taken at fetch time and capped at roughly 6,000 rows per source for in-browser performance. The original portals remain the canonical source of truth.
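
One plausible way to take such a sample is to keep every k-th row, with k chosen so the result fits under the cap. The sketch below assumes that scheme; the function name `stride_sample` and the exact stride rule are ours and may differ from what the dashboard does.

```python
import math


def stride_sample(rows, max_rows=6000):
    """Keep every k-th row, with k the smallest stride that fits under max_rows."""
    if len(rows) <= max_rows:
        return list(rows)
    stride = math.ceil(len(rows) / max_rows)
    return list(rows[::stride])


# 20,000 source rows with a cap of 6,000 gives a stride of 4.
snapshot = stride_sample(list(range(20000)))
```

Because no randomness is involved, the same source data always yields the same snapshot, and the sample preserves the original row order.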