How FiscalGuard works
FiscalGuard combines unsupervised machine learning with a cryptographic audit ledger to surface high-risk transactions in public-sector spending and provide a verifiable record of every analysis performed.
1. Feature engineering
For each transaction we derive seven numerical features designed to capture the signals auditors look for:
- log-scaled amount magnitude
- amount z-score within the issuing department
- vendor rarity (inverse log frequency)
- department rarity
- day-of-month and day-of-week (counted as two features)
- "suspiciously round" amount indicator
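The feature list above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not FiscalGuard's actual code: the function name `engineer_features`, the use of `1 / log(1 + count)` for rarity, and the "multiple of 1,000" test for round amounts are all assumptions chosen to match the descriptions above.

```python
import math
from collections import Counter, defaultdict
from statistics import mean, pstdev


def engineer_features(rows):
    """Turn transactions into seven-number feature vectors.

    `rows` is a list of dicts with keys: date (datetime.date),
    vendor (str), department (str), amount (float).
    """
    vendor_counts = Counter(r["vendor"] for r in rows)
    dept_counts = Counter(r["department"] for r in rows)

    # Per-department mean and (population) standard deviation of amounts.
    amounts_by_dept = defaultdict(list)
    for r in rows:
        amounts_by_dept[r["department"]].append(r["amount"])
    dept_stats = {d: (mean(a), pstdev(a)) for d, a in amounts_by_dept.items()}

    features = []
    for r in rows:
        amount = r["amount"]
        mu, sigma = dept_stats[r["department"]]
        features.append([
            math.log1p(abs(amount)),                           # log-scaled magnitude
            (amount - mu) / sigma if sigma > 0 else 0.0,       # z-score within department
            1.0 / math.log(1 + vendor_counts[r["vendor"]]),    # vendor rarity (assumed form)
            1.0 / math.log(1 + dept_counts[r["department"]]),  # department rarity (assumed form)
            float(r["date"].day),                              # day of month, 1-31
            float(r["date"].weekday()),                        # day of week, Monday = 0
            1.0 if amount != 0 and amount % 1000 == 0 else 0.0,  # round amount (assumed rule)
        ])
    return features
```

Each row yields one vector, so the output can be passed directly to an anomaly detector that expects a list of equal-length numeric rows.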
2. Anomaly scoring
Isolation Forest (Liu, Ting & Zhou, 2008) builds an ensemble of random isolation trees; transactions that can be isolated in fewer random splits (that is, with a shorter average path length across the trees) receive higher anomaly scores.
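To make the link between path length and score concrete, the sketch below implements the scoring formula from the Isolation Forest paper, s(x, n) = 2^(-E[h(x)] / c(n)), where c(n) is the average path length of an unsuccessful binary-search-tree lookup over n points. The function names are hypothetical, and whether FiscalGuard computes scores this way or relies on a library is not stated here.

```python
import math

EULER_GAMMA = 0.5772156649015329


def average_path_length(n):
    """c(n) from the paper: 2*H(n-1) - 2*(n-1)/n, with H(i) ~ ln(i) + gamma."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0  # exact value, since H(1) = 1
    harmonic = math.log(n - 1) + EULER_GAMMA
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """Score in (0, 1]; shorter mean path length gives a higher score."""
    if n < 2:
        raise ValueError("need a subsample of at least 2 points")
    return 2.0 ** (-mean_path_length / average_path_length(n))
```

A transaction whose mean path length equals c(n) scores exactly 0.5; scores approaching 1 indicate points that were isolated after very few splits.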
Local Outlier Factor (Breunig et al., 2000) compares each transaction's local density to that of its k-nearest neighbors, surfacing context-dependent outliers that a global model would miss.
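The density comparison behind Local Outlier Factor can be sketched in plain Python. This is a simplified, brute-force illustration under stated assumptions: it uses Euclidean distance, takes exactly k neighbors (ignoring distance ties), and assumes no duplicate points. The name `lof_scores` is hypothetical, and a production system would more likely use an optimized library implementation.

```python
import math


def lof_scores(points, k):
    """Return one LOF value per point; values well above 1 suggest outliers."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors, k_distance = [], []
    for i in range(n):
        # Indices of the other points, nearest first.
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average neighbor density divided by the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]
```

Points inside a uniformly dense cluster score close to 1, while a point whose neighbors are much denser than it is scores well above 1.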
A user-controlled contamination parameter sets the cutoff: at 5%, for example, the transactions with the top 5% of anomaly scores are flagged.
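As a minimal sketch of how a contamination cutoff can work, the function below flags the highest-scoring fraction of transactions. It assumes a single list of scores where higher means more anomalous; how FiscalGuard combines its two detectors into one ranking is not described here, and the name `flag_top_fraction` is hypothetical.

```python
import math


def flag_top_fraction(scores, contamination):
    """Return a list of booleans marking the top `contamination` share of scores."""
    if not 0.0 < contamination < 1.0:
        raise ValueError("contamination must be between 0 and 1")
    # Small epsilon guards against float noise such as 5.000000000000001.
    n_flag = math.ceil(contamination * len(scores) - 1e-9)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_flag])
    return [i in flagged for i in range(len(scores))]
```

Ranking by index rather than thresholding on a score value keeps the number of flagged rows predictable even when several transactions share the same score.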
3. SHA-256 audit ledger
Every analysis run serializes the flagged transactions to canonical JSON and hashes the result with SHA-256. That hash, together with the run parameters, timestamp, and the prior block's hash, is itself hashed to form an append-only chain.
The Audit Ledger page can recompute every block hash to confirm that the chain has not been tampered with. This is a civic-tech analogue of the blockchain-anchored audit trails described in JFMIP's federal financial-management playbook and in the grants pilots run by Treasury's Bureau of the Fiscal Service.
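The chaining and verification steps can be sketched with Python's `hashlib` and `json` modules. The field names (`payload_hash`, `prev_hash`, `block_hash`), the all-zero genesis hash, and the choice of sorted keys with compact separators as the "canonical" form are assumptions made for illustration; the ledger's actual schema may differ.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first block's predecessor


def canonical_json(obj):
    """Serialize with sorted keys and no extra whitespace so hashes are stable."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_block(flagged, params, timestamp, prev_hash):
    """Hash the flagged transactions, then hash that together with run metadata."""
    header = {
        "payload_hash": sha256_hex(canonical_json(flagged)),
        "params": params,
        "timestamp": timestamp,
        "prev_hash": prev_hash,
    }
    block = dict(header)
    block["block_hash"] = sha256_hex(canonical_json(header))
    return block


def verify_chain(chain):
    """Recompute every block hash and check each link to its predecessor."""
    prev = GENESIS_HASH
    for block in chain:
        header = {k: v for k, v in block.items() if k != "block_hash"}
        if block["prev_hash"] != prev:
            return False
        if sha256_hex(canonical_json(header)) != block["block_hash"]:
            return False
        prev = block["block_hash"]
    return True
```

Because each block's hash covers the previous block's hash, editing any earlier run changes every hash after it, which is what lets a verifier detect tampering by recomputation alone.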
4. Intended users
- Government auditors and Inspectors General
- City and state comptrollers
- Oversight bodies and legislative budget offices
- Civic-tech researchers, NGOs, and data journalists
5. Data sources
The dashboard ships with snapshots of real public spending data fetched directly from official open-data portals. None of the preloaded samples are synthetic. FiscalGuard also accepts any user-uploaded CSV with date, vendor, department, and amount columns.
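For readers preparing an upload, the sketch below shows one way to check a CSV for the four required columns before analysis. It is an assumption-laden illustration: the function name `load_transactions` is hypothetical, dates are assumed to be ISO formatted (YYYY-MM-DD), and stripping `$` and thousands separators from amounts is a guess at common input, not documented FiscalGuard behavior.

```python
import csv
import io
from datetime import date

REQUIRED_COLUMNS = ("date", "vendor", "department", "amount")


def load_transactions(csv_text):
    """Parse CSV text into a list of transaction dicts, validating the header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("missing required columns: " + ", ".join(missing))

    rows = []
    for record in reader:
        rows.append({
            "date": date.fromisoformat(record["date"].strip()),  # assumes ISO dates
            "vendor": record["vendor"].strip(),
            "department": record["department"].strip(),
            # Assumed cleanup: drop currency symbol and thousands separators.
            "amount": float(record["amount"].replace("$", "").replace(",", "")),
        })
    return rows
```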
Each snapshot is a deterministic strided sample taken at fetch time, capped at roughly 6,000 rows per source to keep in-browser performance acceptable. The original portals remain the canonical source of truth.
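A deterministic strided sample can be sketched as follows. The function name `stride_sample` and the exact rule for choosing the step are assumptions; the only details taken from the description above are that sampling is deterministic and capped at roughly 6,000 rows.

```python
import math


def stride_sample(rows, cap=6000):
    """Keep every step-th row so that at most `cap` rows remain."""
    if len(rows) <= cap:
        return list(rows)
    step = math.ceil(len(rows) / cap)  # smallest stride that respects the cap
    return rows[::step]
```

Unlike random sampling, a fixed stride returns the same rows every time for the same input, so a snapshot can be reproduced from the portal data as it stood at fetch time.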