Summary: The Penn Wharton Budget Model COVID-19 Tracker provides daily, real-time estimates of economic activity and of the evolution of the COVID-19 pandemic at the state level. All estimates are based on publicly available data. The full set of underlying data is available for download.

Download full dataset

Description of All Variables in Full Dataset

Real GDP Growth Tracker

The real GDP growth tracker measures overall economic activity on a given day, expressed in terms of the four-quarter percent change in real GDP. The average value of the tracker over one calendar quarter corresponds to the expected percent change in real GDP in that quarter compared with one year earlier.

The tracker is based on daily measures of consumer spending; traffic at commercial locations; labor market activity; small business employment, wages, and revenues; and concentrations of pollutants related to industrial and commercial activity. These inputs are combined by taking their first principal component, which is then scaled to four-quarter growth in real GDP.
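
As a rough illustration of this step, the Python sketch below extracts the first principal component of several standardized daily series by power iteration and then rescales it. The function names are hypothetical, and the rescaling rule (matching a target mean and standard deviation) is an assumption, since the text does not specify how the component is scaled to GDP growth.

```python
import math

def standardize(series):
    """Rescale a series to mean 0 and (population) standard deviation 1."""
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    return [(v - mean) / sd for v in series]

def first_principal_component(columns, iterations=500):
    """Return (loadings, scores) for the first PC of the standardized columns.

    Uses power iteration on the correlation matrix, started from equal
    weights. This assumes the inputs are mostly positively correlated.
    """
    z = [standardize(c) for c in columns]
    k, n = len(z), len(z[0])
    corr = [[sum(z[i][t] * z[j][t] for t in range(n)) / n for j in range(k)]
            for i in range(k)]
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(v[i] * z[i][t] for i in range(k)) for t in range(n)]
    return v, scores

def scale_to_target(scores, target_mean, target_sd):
    """Assumed scaling rule: match the mean and spread of a reference series."""
    return [target_mean + target_sd * s for s in standardize(scores)]

# Hypothetical inputs: three daily indicators driven by one common factor.
factor = [0.0, 1.0, 2.0, 3.0, 4.0]
inputs = [factor, [2 * f + 1 for f in factor], [3 * f - 2 for f in factor]]
loadings, scores = first_principal_component(inputs)
tracker = scale_to_target(scores, target_mean=2.0, target_sd=1.0)
```

Because the three hypothetical inputs are exact linear functions of one factor, they receive equal loadings and the scaled tracker simply follows that factor.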

Employment Tracker

The employment tracker measures the number of people working on a given day.

The tracker is based on daily measures of small business employment; employment of low-income workers; time spent at workplaces; and online activity related to job loss or new hiring. These inputs are combined by taking their first principal component, which is then scaled to monthly civilian employment and expressed in terms of change since the beginning of 2020.

Social Contact Rate Tracker

The social contact tracker measures the frequency of social interactions involving close physical proximity between people who do not live together.

The tracker is based on daily measures of physical proximity to other persons and of time spent at home. These inputs are combined by taking their first principal component.

COVID-19 Infections

Reported COVID-19 cases represent only a fraction of the total number of infections. Some people infected with COVID-19 never present with symptoms, and among those who display COVID-19 symptoms, many never receive a diagnostic test (often as a result of limited testing capacity, which varies over time).

We therefore adjust confirmed case counts based on a separate measure of infections that relies solely on confirmed COVID-19 deaths – the reporting of which is more consistent across time and geography than cases. Early in the pandemic, one percent of infections are assumed to result in death; this infection fatality ratio (IFR) is assumed to fall to 0.3 percent over time. Combined with an assumption of an average of 22 days from the onset of symptoms until death, the IFR assumption allows us to back out the “true” number of infections at any point and thus an implied COVID-19 reporting rate.
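
The arithmetic behind this adjustment can be sketched as follows. This is a simplified illustration rather than the tracker's code: the function names are hypothetical, the time path of the IFR is supplied by the caller because the text does not give it, and each death is attributed to an infection dated exactly 22 days earlier.

```python
def infections_from_deaths(daily_deaths, ifr_by_day, lag_days=22):
    """Implied infections on day t = deaths on day t + lag_days, divided by IFR on day t.

    The last `lag_days` days cannot be estimated because the
    corresponding deaths have not yet been observed.
    """
    implied = []
    for t in range(len(daily_deaths) - lag_days):
        implied.append(daily_deaths[t + lag_days] / ifr_by_day[t])
    return implied

def reporting_rate(reported_cases, implied_infections):
    """Share of implied infections that show up as confirmed cases."""
    return [cases / infections if infections > 0 else None
            for cases, infections in zip(reported_cases, implied_infections)]

# Hypothetical numbers: 10 deaths at a 1 percent IFR and 3 deaths at a
# 0.3 percent IFR each imply roughly 1,000 infections 22 days earlier.
deaths = [0] * 22 + [10, 3]
ifr = [0.01, 0.003]
implied = infections_from_deaths(deaths, ifr)
rates = reporting_rate([100, 250], implied)
```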

We build a regression model that predicts the reporting rate as a function of time and the share of COVID-19 tests coming back positive (testing data come from the COVID Tracking Project). We use this model to fit estimated reporting rates for each state, which are then used to scale up reported cases to arrive at total infections.

COVID-19 Effective Reproduction Number (R)

The effective reproduction number R measures the average number of secondary infections caused by each primary infection. When R is greater than 1, the number of new infections grows exponentially; when R is less than 1, new infections decline and the outbreak shrinks.

We estimate R using a statistical method developed by Cori et al. (2013) [5] that requires daily new infections and an assumption about the serial interval (the amount of time between successive cases). We then use the estimated time series of R values as an input to a compartmental epidemiological model. We calibrate our estimates of R by iteratively guessing the parameters of the serial interval distribution and finding the assumption that best matches the model’s estimate of COVID-19 infections.
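
For readers unfamiliar with the Cori et al. approach, the sketch below computes the posterior mean of R over a sliding window: the Gamma prior shape plus the infections in the window, divided by the prior rate plus the total infectiousness in the window. The function names, the seven-day window, the prior parameters, and the discretized serial interval weights are illustrative assumptions; the calibration of the serial interval described above is not reproduced.

```python
def total_infectiousness(incidence, si_weights):
    """Lambda_t = sum over k >= 1 of w_k * I_(t-k), where w_k is the
    probability that the serial interval equals k days."""
    lam = []
    for t in range(len(incidence)):
        lam.append(sum(w * incidence[t - k]
                       for k, w in enumerate(si_weights, start=1)
                       if t - k >= 0))
    return lam

def estimate_r(incidence, si_weights, window=7,
               prior_shape=1.0, prior_scale=5.0):
    """Posterior mean of R for each day, using the trailing `window` days.

    With a Gamma(prior_shape, prior_scale) prior and Poisson incidence,
    the posterior mean is
    (prior_shape + sum of I) / (1 / prior_scale + sum of Lambda).
    Days without a full window are returned as None.
    """
    lam = total_infectiousness(incidence, si_weights)
    r = [None] * len(incidence)
    for t in range(window, len(incidence)):
        days = range(t - window + 1, t + 1)
        shape = prior_shape + sum(incidence[s] for s in days)
        rate = 1.0 / prior_scale + sum(lam[s] for s in days)
        r[t] = shape / rate
    return r

# Hypothetical check: with constant daily infections, R should be close to 1.
flat = estimate_r([1000] * 20, si_weights=[0.2, 0.5, 0.3])
# Infections doubling daily with a one-day serial interval imply R near 2.
doubling = estimate_r([100 * 2 ** t for t in range(15)], si_weights=[1.0])
```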

COVID-19 Confirmed Cases and Deaths

We obtain counts of confirmed COVID-19 cases and deaths from the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, the COVID Tracking Project, USAFacts, and the New York Times. These sources employ different data collection methods and assumptions and sometimes differ in terms of the number and timing of new cases or deaths. [6] We isolate the common trend in new cases and deaths by taking the first principal component of estimates from all four sources. We remove residual noise with a local regression smoother over a small span.

Homebase

We obtain measures of small business employment, hours, and wages from Homebase, an employee scheduling and time-tracking software company. Homebase provides anonymized data at the establishment-worker-day level. We aggregate to the establishment-day level and construct four measures: number of employees, total hours worked, total wages, and an indicator for whether the establishment was open on a given day, defined as positive hours worked. We normalize all values relative to establishment averages over the first seven weeks of 2020. We then aggregate to the state level based on zip code, weighting establishments by their total employees, hours worked, or wages in early 2020.
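
The normalization and weighting can be illustrated with a short Python sketch. The record layout and the function name are hypothetical, each establishment is assumed to be already mapped from zip code to state, and the daily series is assumed to start on January 1, 2020, so that the first 49 entries cover the first seven weeks. Each establishment's own baseline average serves as its weight, which is one reading of "weighting establishments by their total employees, hours worked, or wages in early 2020."

```python
def state_index(establishments, baseline_days=49):
    """Weighted average of baseline-normalized series, by state.

    establishments: list of dicts with keys
      'state'  - state code (zip-to-state mapping assumed done upstream)
      'values' - daily values (e.g., hours worked), all of equal length
    """
    sums, weights = {}, {}
    for est in establishments:
        values = est["values"]
        baseline = sum(values[:baseline_days]) / baseline_days
        if baseline <= 0:
            continue  # cannot normalize an establishment with no baseline activity
        normalized = [v / baseline for v in values]
        day_sums = sums.setdefault(est["state"], [0.0] * len(values))
        for t, x in enumerate(normalized):
            day_sums[t] += baseline * x  # weight = baseline size
        weights[est["state"]] = weights.get(est["state"], 0.0) + baseline
    return {s: [x / weights[s] for x in sums[s]] for s in sums}

# Hypothetical data: a small establishment halves its hours on day 50,
# while a larger one is unchanged.
sample = [
    {"state": "PA", "values": [10.0] * 49 + [5.0]},
    {"state": "PA", "values": [30.0] * 49 + [30.0]},
]
index = state_index(sample)
```

In this example the state index falls from 1.0 during the baseline period to 0.875 on the last day, since the smaller establishment carries one quarter of the weight.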

Air Quality

We obtain measures of air pollutant concentrations from the Environmental Protection Agency’s Air Quality Index and Air Quality System programs. We use measures of four major pollutants: nitrogen dioxide, sulfur dioxide, carbon monoxide, and particulate matter. Concentrations of these pollutants reflect a range of economic activities, including industrial production, power generation, construction, and motor vehicle usage. To account for seasonality, we normalize pollution levels relative to the average for the same week in 2018 and 2019.
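
A minimal sketch of this seasonal normalization, assuming it takes the form of a ratio to the reference-year average (the text does not say whether a ratio or a difference is used); the function name and the week-keyed dictionaries are hypothetical:

```python
def seasonal_ratio(current_by_week, reference_years):
    """Divide each week's value by the average for the same week
    across the reference years (e.g., 2018 and 2019)."""
    ratios = {}
    for week, value in current_by_week.items():
        refs = [year[week] for year in reference_years if week in year]
        if refs:
            ratios[week] = value / (sum(refs) / len(refs))
    return ratios

# Hypothetical nitrogen dioxide readings by week of year.
levels_2020 = {1: 9.0, 2: 12.0}
levels_2018 = {1: 10.0, 2: 10.0}
levels_2019 = {1: 20.0, 2: 14.0}
normalized = seasonal_ratio(levels_2020, [levels_2018, levels_2019])
```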

Google Trends

We construct proxies for job loss and new hires based on search intensity from Google Trends. For job loss, we obtain data on searches that contain any of the terms “file for unemployment,” “unemployment benefits,” or “unemployment insurance.” [7] For new hires, we obtain data on searches that contain any of the terms “W-4,” “W-9,” or “I-9.” To account for seasonality, we normalize values relative to the average for the same week in 2019.

SafeGraph

We obtain several measures derived from mobile device location data from the SafeGraph Social Distancing Metrics. SafeGraph aggregates devices to the census block group level based on “common nighttime location” and provides various indicators of mobility and behavior. We use measures of distance travelled, time spent at home, number of devices at home for the full day, number of devices at a fixed location outside the home during regular workday hours (a proxy for work), and number of devices engaged in delivery activity.

PlaceIQ/DEX

We obtain a measure of mobile device “exposure” constructed by Couture et al. (2020) based on mobile device location data from PlaceIQ. The device exposure index (DEX) reflects the average number of devices that visited locations also visited by residents of a county. It is an indirect measure of the extent to which individuals are congregating in common locations. Locations covered by the DEX are largely commercial venues such as restaurants and retail establishments.

Opportunity Insights

We obtain measures of debit and credit card spending, small business revenues, and low-income employment from the Opportunity Insights Economic Tracker. These measures are based on private sector data from several sources and are described in Chetty, Friedman, Hendren, Stepner, and the Opportunity Insights Team (2020). We obtain these measures as seven-day averages and estimate daily values based on the pseudoinverse of the moving average matrix.
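
To make the last step concrete: if A is the matrix that maps a daily series to its seven-day moving averages, then applying the pseudoinverse of A to the published averages gives the daily series with the smallest sum of squares that reproduces those averages exactly. The sketch below is an illustration under stated assumptions. It assumes A has one row per complete seven-day window (the text does not describe edge handling) and uses the closed form A'(AA')^-1, which equals the Moore-Penrose pseudoinverse when A has full row rank. All function names are hypothetical.

```python
def moving_average_matrix(n_days, window=7):
    """Row r averages days r through r + window - 1."""
    rows = n_days - window + 1
    return [[1.0 / window if r <= c < r + window else 0.0
             for c in range(n_days)] for r in range(rows)]

def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def daily_from_moving_average(averages, window=7):
    """Minimum-norm daily series whose moving averages equal `averages`."""
    n_days = len(averages) + window - 1
    a = moving_average_matrix(n_days, window)
    m = len(a)
    gram = [[sum(a[i][k] * a[j][k] for k in range(n_days)) for j in range(m)]
            for i in range(m)]                      # A A'
    z = solve_linear_system(gram, averages)         # (A A')^-1 y
    return [sum(a[i][k] * z[i] for i in range(m)) for k in range(n_days)]  # A' z

# Hypothetical daily spending, its seven-day averages, and the recovered series.
true_daily = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 4.0, 2.0, 5.0, 6.0, 7.0, 3.0, 8.0]
A = moving_average_matrix(len(true_daily))
averages = [sum(w * v for w, v in zip(row, true_daily)) for row in A]
estimated = daily_from_moving_average(averages)
```

The recovered series generally differs from the true daily values, because many daily series share the same moving averages; the pseudoinverse selects the one with the smallest norm.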

Google Mobility

We obtain several measures derived from mobile device location data from Google Community Mobility Reports. Google tracks the number of visits and duration of stay at different types of locations. We use measures of time spent at grocery and pharmacy establishments, retail and recreation establishments, residential locations, and workplace locations.

Unacast

We obtain measures of distance travelled, visits to non-essential businesses, and encounter density (physical proximity to others) from the Unacast Social Distancing Scoreboard. These measures are derived from mobile device location data and reflect the behavior of all devices in a state. Measures of distance travelled and non-essential visits are expressed as percent differences from the average for the same day of the week prior to March 8, 2020. Encounter density is the average number of times an individual is within 50 meters of another person, normalized by a county’s physical size and relative to the pre-COVID national average.

All underlying data series are available for download here.

[5] Anne Cori, Neil M. Ferguson, Christophe Fraser, and Simon Cauchemez, “A New Framework and Software to Estimate Time-Varying Reproduction Numbers During Epidemics,” American Journal of Epidemiology, Volume 178, Issue 9, 1 November 2013, Pages 1505–1512, https://doi.org/10.1093/aje/kwt133
[6] We make one manual adjustment to the data. In late June, New Jersey redefined COVID-19 deaths in a way that resulted in a backlog of nearly 2,000 deaths being reported at once. We impute deaths for that day and distribute the excess reported deaths proportionally to previously reported deaths for all days prior.
[7] See Goldsmith-Pinkham and Sojourner (2020), who use search intensity to forecast weekly unemployment insurance claims.
