April 2026 Updates
May 01, 2026
The WiNDC Household build is complete and will generate data for all years from 2011 onward. I’ve also been comparing the Julia-generated data to the GAMS-generated data. So far, the two agree to within five percent on 99.9% of the data points. The remaining 0.1% differ by more than five percent, and I’m currently investigating why. Overall, I’m pleased with the progress and the level of agreement between the two implementations.
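As a rough illustration of what "within five percent" means here, the following sketch computes the share of matched data points whose relative difference is at most five percent. The values are made up, and measuring the difference relative to the GAMS value is an assumption for illustration, not necessarily how the actual comparison is done.

```julia
# Hypothetical matched values from the two builds (not real WiNDC data).
julia_vals = [100.0, 250.0, 0.5, 42.0]
gams_vals  = [101.0, 249.0, 0.6, 42.0]

# Relative difference, measured against the GAMS value (an assumption).
# Reference values equal to zero would need separate handling.
rel_diff(a, b) = abs(a - b) / abs(b)

tol = 0.05

# true where the two values agree to within the tolerance
within = [rel_diff(j, g) <= tol for (j, g) in zip(julia_vals, gams_vals)]

# Share of points within tolerance, and the indices of those that are not
share_within = count(within) / length(within)
outliers = findall(.!within)

println("share within tolerance: ", share_within)
println("indices to investigate: ", outliers)
```

The `outliers` vector is the useful part in practice: it points at the rows worth inspecting by hand.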
Overview of YAML Files
To make the data easier to generate when new data is released, I’ve put all the configuration into a YAML file. YAML stands for “YAML Ain’t Markup Language” and is a human-readable data serialization format.
At a high level, YAML is designed to describe structured data in a way that is easy for people to read and edit. It is commonly used for configuration files because it is much less cluttered than formats such as XML, while still being expressive enough to represent nested objects, lists, strings, numbers, and boolean values.
The most important idea in YAML is that structure is defined by
indentation. A set of key: value pairs represents a
mapping, which is similar to a dictionary in Python. A list is written
with leading dashes, and nested content is created by indenting
underneath a key or list item. Because whitespace carries meaning,
consistent indentation is essential.
YAML supports a few basic kinds of values:
- scalars, such as strings, integers, floating point values, and booleans
- mappings, which associate keys with values
- sequences, which are ordered lists of values
It also supports comments using the # symbol, which
makes it useful for documenting configuration choices directly inside
the file. For longer text, YAML can represent multi-line strings in a
readable way, and it can be used to organize repeated or hierarchical
settings without much visual noise.
For example, a small YAML file might look like this:

dataset:
  year: 2024
  region: USA
  include_households: true
sectors:
  - agriculture
  - manufacturing
  - services

In this example, dataset is a mapping with several
fields, while sectors is a sequence. That combination of
named settings and lists is typical of how YAML is used in practice.
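To connect this to Julia, here is a sketch of the data structure the example corresponds to, reading dataset as a mapping and sectors as a separate top-level sequence. It is written out by hand rather than parsed. A YAML parser such as YAML.jl should return nested dictionaries and vectors of roughly this shape, though the exact container types may differ.

```julia
# Hand-written Julia equivalent of the YAML example:
# mappings become Dicts, sequences become Vectors, scalars stay scalars.
config = Dict(
    "dataset" => Dict(
        "year" => 2024,
        "region" => "USA",
        "include_households" => true,
    ),
    "sectors" => ["agriculture", "manufacturing", "services"],
)

# Nested values are reached by chaining keys and indices.
year = config["dataset"]["year"]
second_sector = config["sectors"][2]   # Julia indexing starts at 1

println(year)
println(second_sector)
```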
In short, YAML is best thought of as a clean, indentation-based language for describing data. It is not a programming language, but it is a convenient way to store parameters, options, and metadata in a form that both people and software can work with easily.
Updating the Household YAML File
To build the data you need three files:
- here
- here
- here - Contains `capital_tax_rates`, `labor_tax_rates`, `income_elasticities` and `windc_pce_share`
Extract the zip files and note the location of the extracted files.
The household.yaml file will need to be updated in a few
locations, to point at each of these files.
The first section of the household.yaml file is the
metadata section:

metadata:
  title: Household Data Configuration
  description: Configuration file for household data sources
  census_api_key: census_api_key_here
  bea_api_key: bea_api_key_here
  save_data: true
  maps:
    state_map:
    windc_naics_map:
  years:
    - 2024
    - 2023
    - 2022
    - 2021
    - 2020
    - 2019
    - 2018
    - 2017
    - 2014
    - 2011

You will need a Census API key
and a BEA API key to
access the data. You can obtain these keys from the respective websites.
You can adjust which years of data you want to generate by modifying the
years list.
The maps section contains paths to the mapping files. If
these are left empty, which is the default, the provided
mapping files are used; these
can be found in the GitHub repository. If you have custom mapping files, you
can specify their paths here.
The next section of the YAML file is data.
Be sure to update the paths to the state table and the
capital_tax_rates, labor_tax_rates,
income_elasticities and windc_pce_share files.
Any field that says api: true does not need an updated path
as it will pull the data directly from the API using the provided
keys.
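Since a mistyped path is an easy mistake to make at this step, it can help to check the paths before starting the build. The sketch below is one way to do that in plain Julia. The helper name `missing_entries`, the `state_table` key, and the placeholder paths are all hypothetical; substitute the locations you noted when extracting the files.

```julia
# Return the names of entries whose path does not exist on disk.
# `paths` maps a descriptive name to a file or directory path.
function missing_entries(paths::AbstractDict)
    return sort([name for (name, p) in paths if !ispath(p)])
end

# Hypothetical placeholder paths; replace with your own.
paths = Dict(
    "state_table"         => "path/to/state_table",
    "capital_tax_rates"   => "path/to/capital_tax_rates",
    "labor_tax_rates"     => "path/to/labor_tax_rates",
    "income_elasticities" => "path/to/income_elasticities",
    "windc_pce_share"     => "path/to/windc_pce_share",
)

for name in missing_entries(paths)
    println("Path not found for: ", name)
end
```

This only confirms that something exists at each location; it does not check that the file contents are what the build expects.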
The final section details some magic numbers from specific government sources. They are included here so that they can be easily updated when new data is released.
Building the Data
To build the data, first make sure you have a Julia environment set
up and add the WiNDCHousehold package. I also recommend
adding DataFrames and MPSGE for working with
the data.
Finally, the code to build the data and run the model is as follows.
Be sure to update the hh_path variable to point at your
updated household.yaml file.
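
using WiNDCHousehold
using WiNDCHousehold.WiNDCContainer
using DataFrames
using MPSGE

# Path to your edited configuration file
hh_path = raw"update/me/to/point/at/household.yaml"

# Load the raw inputs described in household.yaml
state_table, HH_Raw_Data = WiNDCHousehold.household_raw_data(hh_path)

# Build the household table from the state table and the raw data
HH = WiNDCHousehold.build_household_table(
    state_table,
    HH_Raw_Data;
)

# Construct the model
M = household_model(HH);

# An iteration limit of 0 evaluates the model at its starting point without iterating
MPSGE.solve!(M, cumulative_iteration_limit = 0)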