
Aggregating WiNDC National

Mitch Phillipson January 06, 2026


This example walks through the code in this repository. The repository contains an example of aggregating the WiNDC National dataset down to a size that can be run with a trial PATH license (fewer than 300 variables in the model).

We recommend using Julia with VSCode. Download VSCode from this link and Julia from this link. After installing Julia, install the Julia extension in VSCode by searching for “Julia” in the extensions tab.

Setting up the Environment

To run this code, clone the repository to your local machine. Start Julia in the repository folder and activate/instantiate the environment:

julia> ]
(v1.12) pkg> activate .
(National_Aggregated.jl) pkg> instantiate

This will download the packages listed in Project.toml and create a Manifest.toml file. If you are familiar with Python, this is essentially creating a virtual environment for this project. If you are unfamiliar with environments, they are a way to pin package versions for a project without affecting other projects on your machine.

In this example, we are pinning WiNDCNational version 0.3.2 in the Project.toml file. We do this because this version of WiNDCNational has a summary.yaml file that does not include 2024. The second example below demonstrates how to modify the summary.yaml file in the WiNDCNational package to include 2024.

The First Example

The first example is in first_example.jl. This example demonstrates how to aggregate the sectors/commodities in the WiNDC National dataset. Open this file; with the Julia extension installed, you can run each line by pressing Shift + Enter in VSCode. Feel free to run each line in the file. We explain our logic below.

Loading Packages

We load packages with two different methods. The first method is using, which loads all exported functions from the package. For example, using DataFrames means we can use all the functions in the DataFrames package without prefixing them with DataFrames (e.g. subset instead of DataFrames.subset). The second method is import, which only loads the given items. For example, import MPSGE means that to use any function from the MPSGE package, we must prefix it with MPSGE (e.g. MPSGE.solve!). We also use import WiNDCContainer: aggregate, which loads only the aggregate function from the WiNDCContainer package, so we can use aggregate without prefixing it with WiNDCContainer.
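As a sketch of these loading styles, here is a self-contained example that uses the Statistics and LinearAlgebra standard libraries in place of this project's packages:

```julia
using Statistics           # brings all exported names (e.g. mean) into scope
import LinearAlgebra       # names must be qualified: LinearAlgebra.norm
import LinearAlgebra: dot  # brings only dot into scope, unqualified

v = [1.0, 2.0, 3.0]
mean(v)                # 2.0 — exported by Statistics, no prefix needed
LinearAlgebra.norm(v)  # qualified access after a plain import
dot(v, v)              # 14.0 — explicitly imported, no prefix needed
```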

Building the Data

In WiNDCNational, we load the summary-level US National dataset by calling:

X = build_us_table(:summary)

This function downloads the necessary data files and builds a National object that contains the summary-level national data. The WiNDCNational package provides a summary.yaml file that defines where to download the data files and the structure of the data. We will discuss this file in the next example.

WiNDCNational supports the detailed-level table as well. You can load it by calling:

build_us_table(:detailed)

The detailed data is available for the years 2007, 2012, and 2017. If you need only a few detailed sectors/commodities, you can follow this example to aggregate the detailed data to your desired level of detail.

We do not automatically calibrate the data. This is primarily for transparency, so users can see the data before calibration. To calibrate the data, we call:

summary, M = calibrate(X)

This function returns two items: a National object, summary, containing the calibrated data, and a JuMP model, M, containing the calibration model.

Exploring the Data

Before diving into the aggregation process, we should discuss how to explore the data. The National object, summary, contains all the data in a structured format. This object holds three dataframes: the data, the sets, and the elements. The data can be viewed using table:

table(summary)

This dataframe has five columns: row, column, year, parameter, and value. The first three columns define the domain of the data, the parameter column defines the type of data (e.g. intermediate_demand, import, etc.), and the value column contains the actual data values. Each entry in a domain column corresponds to an element in a set. We can understand this by first looking at the sets:
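To make the layout concrete, here is a tiny base-Julia mock of this long format (the codes and values are made up for illustration; the real object is a DataFrame):

```julia
# Hypothetical rows in the five-column long format described above.
data = [
    (row = Symbol("111CA"), col = :food, year = 2023, parameter = :intermediate_demand, value = -12.5),
    (row = Symbol("111CA"), col = :roW,  year = 2023, parameter = :import,              value = -3.1),
]

# Selecting rows for one parameter mimics filtering the data table:
filter(r -> r.parameter == :intermediate_demand, data)
```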

sets(summary)

The table below is a subset of the sets:

name        description    domain
commodity   Commodities    row
sector      Sectors        col
year                       year
trade       Trade          col
export      Exports        col

The domain of the set commodity is row, meaning that commodities appear only in the row column of the data table. Similarly, the domain of the set sector is col, meaning that sectors appear only in the col column. We can extract all the data with elements from a specific set by using the table function with extra arguments. For example, to extract all the data involving commodities:

table(summary, :commodity)

The only entries in the row column of the resulting table will be commodities. Notice you didn’t need to specify the row column, because the set commodity only has a domain of row. If you want further restrictions, such as only commodities and sectors, you can call:

table(summary, :commodity, :sector)

To view the elements in a set, we can use the elements function. For example, to view all the commodities:

elements(summary, :commodity)

Similar to the table function, you can specify multiple sets. For example, to view all the commodities and sectors:

elements(summary, :commodity, :sector)

Parameters are sets with elements in the parameter column of the data. To view all the parameters, we can use a DataFrame subset:

subset(sets(summary), :domain => ByRow(==(:parameter)))

By convention, parameter names are Title_Case (each word capitalized and separated by underscores) and the corresponding elements are snake_case. For example, the parameter Intermediate_Demand corresponds to the element intermediate_demand:
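The convention can be sketched with a small hypothetical helper (not part of the package) that lowercases a parameter name to obtain its element:

```julia
# Illustrative only: maps a Title_Case parameter name to its snake_case element.
to_element(parameter::Symbol) = Symbol(lowercase(String(parameter)))

to_element(:Intermediate_Demand)  # :intermediate_demand
```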

elements(summary, :Intermediate_Demand)

Just like sets, parameters can be extracted from the data table:

table(summary, :Intermediate_Demand)

You may notice all the values in the above table are negative. By convention, we take inputs/demand/uses to be negative and outputs/supply to be positive. This simplifies the computation of aggregate parameters, as we can use DataFrame operations without adjusting the sign of the values. Signs can be flipped using the normalize keyword argument in the table function:

table(summary, :Intermediate_Demand; normalize = :Use)

The value of the normalize keyword can be any parameter. We used Use above as that is the parameter that contains all the uses (inputs/demand).

Some parameters, such as Use, are composite parameters. They are made up of multiple underlying parameters. The parameter Use is composed of all the parameters defined in the BEA Use table:

elements(summary, :Use)

Finally, you can filter the data to extract a single element from a set. For example, to extract the NAICS code 111CA commodity:

table(summary, :commodity => Symbol("111CA"))

Or, to extract the NAICS code 111CA commodity in the year 2023:

table(summary, :commodity => Symbol("111CA"), :year => 2023)

Aggregation

To aggregate a set we need to define a mapping from old elements to new elements. In this example we will aggregate both commodities and sectors. The easiest way to define the mapping is to write a CSV file and modify it in Excel. We’ve seen how to extract the commodities and sectors; now we just need to write them to a CSV file:

elements(summary, :commodity, :sector) |>
    x -> CSV.write("sets/pre_aggregation.csv", x)

This uses a pipe, |>, in Julia. You can interpret this as taking the output of the left-hand side and making it be x in the right-hand side. So the above code is equivalent to:

temp = elements(summary, :commodity, :sector)
CSV.write("sets/pre_aggregation.csv", temp)

just with no temporary variable.
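The pipe works with any functions; a small base-Julia illustration:

```julia
# Each step's output becomes the next step's input.
result = [3, 1, 2] |> sort |> x -> x .* 10   # [10, 20, 30]

# Equivalent, with an explicit temporary variable:
temp = sort([3, 1, 2])
result2 = temp .* 10
```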

Open this file in Excel and add a column (ours is called aggregate) that defines the new aggregated element for each commodity/sector. Save this file as sets/aggregation.csv. You can define any aggregation you want; we used the base NAICS codes to aggregate the commodities and sectors to their respective 2-digit NAICS codes. You can add as many columns as you’d like; you just need to select them when loading the file back into Julia.
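For reference, the first few rows of such a file might look like the following (the aggregate labels are illustrative, and the column order may differ in your file):

```
name,set,aggregate
111CA,commodity,agriculture
111CA,sector,agriculture
```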

Now we can load this file back into Julia:

aggregate_sectors = CSV.read(
    "sets/aggregation.csv",
    DataFrame,
    select = [:aggregate, :name, :set],
    types = Symbol
)

This command will read the CSV file into a DataFrame, selecting only the columns aggregate, name, and set, and converting the values to Symbol type.

The aggregation is then trivial using the aggregate function:

X = aggregate(
    summary,
    aggregate_sectors;
    set = :set,              # optional, defines which column contains the set names
    old_element = :name,     # optional, defines which column contains the old element names
    new_element = :aggregate # optional, defines which column contains the new element names
)

The returned X is a new National object with the aggregated commodities and sectors. We can check the commodity elements to verify the aggregation worked:

elements(X, :commodity)

MPSGE Model

The model is created using the national_mpsge function:

M = national_mpsge(X)

To view the documentation for this function, you can either view the NationalModel Documentation, or use the Julia help mode by typing ? in the Julia REPL and then entering national_mpsge.

By default, the model is only for the year 2023. If you want to create a model for a different year, you can specify the year keyword argument. To create a model for 2017, use:

M_2017 = national_mpsge(X; year = 2017)

The printed model is quite long, as you may have observed. We are working on a more concise model view.

To solve the model, use the solve! function from the MPSGE package. We will first set the cumulative_iteration_limit to 0 so that we can verify the data is calibrated at the benchmark solution:

MPSGE.solve!(M; cumulative_iteration_limit=0)

If the model solves successfully, you should see a Postsolved Residual that is very close to 0 (mine is around 1e-11). After solving the model, you can extract results using value. For example, to view the value of the Y variable for agriculture:

MPSGE.value(M[:Y][:agriculture])

It’s not surprising that the value is 1; that is the benchmark solution. Let’s set a counterfactual shock and resolve the model. There are three parameters available to shock:

- Import_Tariff[commodity]
- Absorption_Tax[commodity]
- Output_Tax[sector]

Let’s set tariffs to be a flat 30% across all commodities:

MPSGE.set_value!(M[:Import_Tariff], 0.3)

Now we resolve the model:

MPSGE.solve!(M)

Now if we check the value of Y for agriculture again:

MPSGE.value(M[:Y][:agriculture])

We expect an increase in the output of agricultural goods of about 2.3%. However, it is much more interesting if we can rebuild the entire data table to see the full effects of the tariff shock. We can do this using the reconstruct_table function from the WiNDCNational package:

Y = WiNDCNational.reconstruct_table(X, M)

This is a new National object, Y, that contains the counterfactual data after the tariff shock. We can compare the benchmark and counterfactual data using an inner join:

innerjoin(
    table(X, :year => 2023),
    table(Y, :year => 2023),
    on = [:row, :col, :year, :parameter],
    renamecols = "_benchmark" => "_counterfactual"
)

We use an innerjoin because some parameters, like Investment_Final_Demand, get aggregated in the reconstruction. We can view the elements for both X and Y to see the differences:

elements(X, :investment_final_demand)
elements(Y, :investment_final_demand)

X is more disaggregated than Y. This is because the model uses the aggregated data, while the original data preserves more detail. For a perfect comparison, we should aggregate X to the same level of detail as the model. We’ll leave this as an exercise for the reader.

Unsurprisingly, the comparison shows both imports and exports decrease across most commodities due to the tariff shock. Feel free to explore the data further; it is fairly straightforward to create graphs and tables from the dataframes.

The Second Example

If you explored the data in the previous example, you may have noticed that the year 2024 is missing. This is because the summary.yaml file in WiNDCNational version 0.3.2 does not include 2024. In this example, we will demonstrate how to modify the summary.yaml file to include 2024 data.

The file that WiNDCNational is using is located here. It is pinned to the v0.3.2 tag in the WiNDCNational.jl repository. Copy and paste the contents of this file into a new file. You can name this file anything you’d like; since the repo currently has a summary.yaml file, name yours something else.

There are only two changes needed to include 2024 data. First, we need to add 2024 to the years list near the top of the file. This should be straightforward; follow the existing formatting. Second, we need to update the download_url_common field. The BEA tends to change the URLs with each data release. The updated URL is:

"https://apps.bea.gov/industry/release/zip/SUPPLY-USE.zip"

And that’s it. Save the file and reload the data using:

X = build_us_table("summary.yaml") # Change this to your file name
summary, M = calibrate(X)

Let’s check, just to verify that the 2024 data has been loaded:

table(summary, :year => 2024)

You can take this data, aggregate it, and run the same simulation as before. If you get the exact same results, you’ve probably forgotten the year = 2024 keyword when creating the model.

Best of luck with your modeling! Reach out if you have questions.

Exercises

  1. Aggregate the summary data to match the reconstructed data after solving the model. Use this to compare benchmark and counterfactual data.
  2. Create a model for each year in the data, set a shock, solve the model, and reconstruct the data. You should have a single table with all years. Compare benchmark and counterfactual data across all years.
  3. Create graphs to summarize the effects of your shock.