
Methodology

CalcFi Open Data is a CC-BY mirror, not a research dataset of original observations. Every value comes from a named primary source. This page is a reference for how the mirror is built and how its provenance is documented.

The full methodology paper (under review at SSRN) is published as a working paper.

Design principles

  1. Primary-source first. Every series carries the canonical agency URL in the CSV header (# Primary URL:). If a number changes upstream, the next refresh reflects that.
  2. Tidy schema. One row per observation, three canonical columns: date, value, unit. Comment lines (# ...) prefix the CSV so the file is both human-readable and Frictionless-compatible.
  3. Reproducible. Every commit corresponds to a Hugging Face dataset revision. Every release has a Figshare/Zenodo/OSF DOI.
  4. License-clear. Code is MIT. Data is CC-BY 4.0. Every primary source is itself CC-BY, public domain (US Government), or released under a publicly stated license that permits mirroring (Federal Reserve releases, World Bank Open Data).
  5. No paywalled redirects. Every primary URL in the catalog points to a publicly-available data page.

CSV layout

```csv
# Source: BLS via FRED (CPIAUCSL)
# Primary URL: https://fred.stlouisfed.org/series/CPIAUCSL
# License: Public domain (US Government)
# Updated: 2026-05-15
# Cadence: monthly
date,value,unit
1947-01-01,21.48,index
1947-02-01,21.62,index
...
```

The comment block is the provenance trail. Tools that skip comment lines (pandas read_csv with comment="#", polars pl.read_csv(comment_prefix="#"), or d3 with a custom parser) see only the tidy columns.
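Reading the file with pandas or polars as described discards the header. If you want both parts, the following standard-library sketch keeps the provenance fields alongside the tidy rows. It assumes every comment line has the form # Key: value, as in the CSV example above; the helper name read_calcfi_csv is hypothetical and not part of any CalcFi package.

```python
import csv
import io


def read_calcfi_csv(text: str) -> tuple[dict[str, str], list[dict[str, str]]]:
    """Split a CalcFi-style CSV into (provenance, rows).

    Hypothetical helper: assumes every comment line looks like
    "# Key: value" and precedes the "date,value,unit" header row.
    """
    provenance: dict[str, str] = {}
    data_lines: list[str] = []
    for line in text.splitlines():
        if line.startswith("#"):
            # Split on the first ": " so the colon in "https://" stays in the value.
            key, _, value = line.lstrip("#").strip().partition(": ")
            provenance[key] = value
        elif line.strip():
            data_lines.append(line)
    # DictReader treats the first remaining line (date,value,unit) as the header.
    rows = list(csv.DictReader(io.StringIO("\n".join(data_lines))))
    return provenance, rows


sample = """# Source: BLS via FRED (CPIAUCSL)
# Primary URL: https://fred.stlouisfed.org/series/CPIAUCSL
# License: Public domain (US Government)
# Updated: 2026-05-15
# Cadence: monthly
date,value,unit
1947-01-01,21.48,index
1947-02-01,21.62,index
"""

provenance, rows = read_calcfi_csv(sample)
print(provenance["Primary URL"])  # https://fred.stlouisfed.org/series/CPIAUCSL
print(rows[0])  # {'date': '1947-01-01', 'value': '21.48', 'unit': 'index'}
```

Values come back as strings; convert with float() where you need numbers.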

Refresh cadence

Each series ships with a # Cadence: line in the header. Typical values:

  • Daily: Treasury yields, FX rates, energy
  • Weekly: Freddie Mac PMMS, EIA gasoline
  • Monthly: CPI, PCE, unemployment, hourly earnings
  • Quarterly: GDP per capita
  • Annual: World Bank inflation and unemployment
  • Snapshot: crypto (CoinGecko) and FDIC deposit rates (published monthly upstream, but currently snapshot-only in the mirror)

The mirror is refreshed in batches. For the freshest numbers, always check the primary source. For reproducible analysis, pin to a specific DOI-versioned release (or the matching Hugging Face dataset revision).
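The # Updated: and # Cadence: header lines can be combined into a rough freshness check that tells you when to go to the primary source. The sketch below is a heuristic under stated assumptions: the per-cadence day counts are illustrative choices, not values published by CalcFi, and the 7-day allowance comes from the mirror lag stated under Limitations. The function name is_stale is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

# Illustrative day counts per "# Cadence:" value; an assumption, not CalcFi's.
CADENCE_DAYS = {"daily": 1, "weekly": 7, "monthly": 31, "quarterly": 92, "annual": 366}

# The page states the mirror lags primary sources by 0-7 days.
MAX_MIRROR_LAG = timedelta(days=7)


def is_stale(updated: str, cadence: str, today: date) -> Optional[bool]:
    """Hypothetical check: is the '# Updated:' stamp older than one cadence
    interval plus the maximum documented mirror lag?

    Returns None for cadences with no fixed interval, such as "snapshot".
    """
    days = CADENCE_DAYS.get(cadence.strip().lower())
    if days is None:
        return None
    age = today - date.fromisoformat(updated)
    return age > timedelta(days=days) + MAX_MIRROR_LAG


print(is_stale("2026-05-15", "monthly", date(2026, 5, 20)))  # False
print(is_stale("2026-05-15", "monthly", date(2026, 7, 1)))   # True
print(is_stale("2026-05-15", "snapshot", date(2026, 7, 1)))  # None
```

A True result only means the mirror is older than this heuristic expects; the primary URL remains the authority for the current value.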

Provenance schema

Every series ships a Frictionless datapackage.json with:

  • name: slug
  • title: human-readable name
  • license: CC-BY-4.0 for the mirror; the primary-source license is listed in sources[0].license
  • sources[0].title: agency name + dataset identifier
  • sources[0].path: canonical primary URL
  • keywords: for catalog search
  • schema.fields: CSV column types and units
  • schema.primaryKey: ["date"]

This is what calcfidata.metadata(slug) returns.
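To make the descriptor layout concrete, here is a minimal sketch of a datapackage.json carrying the fields listed above, plus a hypothetical missing_provenance checker that reports any that are absent. The slug, title, and keywords in the example are illustrative, not taken from the catalog, and the checker is not part of calcfidata. If calcfidata.metadata(slug) returns a plain dict (an assumption), it could be passed in directly.

```python
import json

# Top-level fields the methodology page says every datapackage.json carries.
REQUIRED_TOP_LEVEL = ("name", "title", "license", "sources", "keywords", "schema")


def missing_provenance(descriptor: dict) -> list[str]:
    """Hypothetical checker: list provenance fields absent from a descriptor."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in descriptor]
    # Treat a missing or empty "sources" list as one empty source entry.
    sources = descriptor.get("sources") or [{}]
    for key in ("title", "path", "license"):
        if key not in sources[0]:
            missing.append(f"sources[0].{key}")
    schema = descriptor.get("schema", {})
    for key in ("fields", "primaryKey"):
        if key not in schema:
            missing.append(f"schema.{key}")
    return missing


# Illustrative descriptor; the slug and title are made up for this example.
example = json.loads("""
{
  "name": "example-cpi",
  "title": "Consumer Price Index (illustrative)",
  "license": "CC-BY-4.0",
  "sources": [{"title": "BLS via FRED (CPIAUCSL)",
               "path": "https://fred.stlouisfed.org/series/CPIAUCSL",
               "license": "Public domain (US Government)"}],
  "keywords": ["cpi", "inflation"],
  "schema": {"fields": [{"name": "date", "type": "date"},
                        {"name": "value", "type": "number"},
                        {"name": "unit", "type": "string"}],
             "primaryKey": ["date"]}
}
""")

print(missing_provenance(example))  # []
```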

Series selection criteria

The 34 series were selected because each:

  1. Has a long, consistent history (≥10 years for most; longer for CPI/Treasury/Fed Funds)
  2. Is referenced regularly in financial journalism and consumer-facing calculators
  3. Has a clear, free primary source
  4. Updates on a known cadence, so the cost of maintaining the mirror is bounded

Crypto and deposit rates are intentionally snapshot-only — building reliable longitudinal mirrors for those requires either a paid API (out of scope) or a multi-year scrape window (in progress).

What's not included (and why)

  • Stock prices. Yahoo, Stooq, and Tiingo all have terms of service that conflict with redistribution, so CalcFi's individual calculators link out to the primary source instead.
  • Real-time intraday data. Out of scope; the mirror is for trend/research/calculator use, not live trading.
  • Country-by-country breakdowns of every World Bank indicator. Only the most-referenced US and world summaries are included, to keep the package light.

Limitations

  • Snapshot-only series (crypto, FDIC deposits) have one observation. They're useful for cross-referencing against today's value but not for trend analysis.
  • The mirror lags primary sources by 0-7 days depending on cadence. For the freshest available number, use the primary URL.
  • Vintage data (revised vs. originally released values) follows the primary source. For most series, the value you see is the latest revision, not the original release.
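The snapshot-only limitation can be made explicit in analysis code. The sketch below computes a first-to-last percent change over rows shaped like the tidy date,value,unit columns and refuses to produce a number for a one-observation series. The function name percent_change is hypothetical; the two CPI rows are the sample values from the CSV example above.

```python
def percent_change(rows: list[dict[str, str]]) -> float:
    """Hypothetical helper: first-to-last percent change of a tidy series.

    Snapshot-only series carry a single observation, so there is no
    trend to measure; they raise ValueError instead of returning 0.
    """
    if len(rows) < 2:
        raise ValueError("need at least two observations; series is snapshot-only")
    # ISO 8601 dates (YYYY-MM-DD) sort correctly as strings.
    ordered = sorted(rows, key=lambda row: row["date"])
    first = float(ordered[0]["value"])
    last = float(ordered[-1]["value"])
    return (last / first - 1.0) * 100.0


cpi = [
    {"date": "1947-02-01", "value": "21.62", "unit": "index"},
    {"date": "1947-01-01", "value": "21.48", "unit": "index"},
]
print(round(percent_change(cpi), 3))  # 0.652
```

Sorting by date first means the result does not depend on row order in the file. Because values follow the primary source's latest revision, rerunning this on a later refresh can give a slightly different number.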

Citation

If you use this dataset in published research, please cite it.