
US SPI Product Background

Input Data

The US Unified Precipitation data used to create the US SPI was developed by the Climate Prediction Center (CPC, http://www.cpc.ncep.noaa.gov) from multiple sources of US rain gauge data. Sources for the recent data include the River Forecast Centers (RFC, about 6000 gauge stations per day) and the Climate Anomaly Data Base (several hundred gauge stations per day). Before March 4, 1998, only RFC data is available, with about 3000-6000 gauge stations per day. The historical dataset was created from three sources: NCDC daily co-op stations (1948-...), the CPC dataset (River Forecast Centers data plus 1st-order stations, 1992-...), and daily accumulations from the hourly precipitation dataset (1948-...).

The datasets were quality checked using: 1) a duplicate station check, 2) a data check (including a buddy check and a standard deviation check against climatology), and 3) a NEXRAD radar check for spurious zeros (for data since 1998). The data were then gridded onto a 0.25° x 0.25° grid covering 140°W-60°W, 20°N-60°N using a modified Cressman scheme.
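As a quick check, the 0.25° spacing and the stated extents imply a grid of 321 longitudes by 161 latitudes, which matches the 321x161 arrays used by the SPI program described below. The sketch computes that shape and also shows a basic single-pass Cressman weighted average at one grid point. It uses only the textbook Cressman weight; CPC's modifications to the scheme are not described here, and the influence radius and the simple degree-based distance are illustrative assumptions, not CPC's settings.

```python
import math
from typing import List, Optional, Tuple

# Grid extents and spacing from the text (degrees; west longitudes are negative).
LON_MIN, LON_MAX = -140.0, -60.0
LAT_MIN, LAT_MAX = 20.0, 60.0
STEP = 0.25


def grid_shape(lon_min: float, lon_max: float, lat_min: float, lat_max: float,
               step: float) -> Tuple[int, int]:
    """Return (n_lon, n_lat), counting both edge points of each axis."""
    n_lon = round((lon_max - lon_min) / step) + 1
    n_lat = round((lat_max - lat_min) / step) + 1
    return n_lon, n_lat


def cressman_weight(r: float, radius: float) -> float:
    """Classic Cressman weight (R^2 - r^2) / (R^2 + r^2) inside R, zero outside."""
    if r >= radius:
        return 0.0
    return (radius ** 2 - r ** 2) / (radius ** 2 + r ** 2)


def cressman_point(lon: float, lat: float,
                   gauges: List[Tuple[float, float, float]],
                   radius: float) -> Optional[float]:
    """Weighted average of gauge values (lon, lat, value) around one grid point.

    Distance is measured in degrees, with longitude differences scaled by
    cos(latitude); this flat-earth shortcut is a simplification for illustration.
    Returns None when no gauge lies inside the radius.
    """
    coslat = math.cos(math.radians(lat))
    num = den = 0.0
    for g_lon, g_lat, value in gauges:
        r = math.hypot((g_lon - lon) * coslat, g_lat - lat)
        w = cressman_weight(r, radius)
        num += w * value
        den += w
    return num / den if den > 0 else None


if __name__ == "__main__":
    print(grid_shape(LON_MIN, LON_MAX, LAT_MIN, LAT_MAX, STEP))  # (321, 161)
```

Gauges closer to the grid point get weights near 1 and gauges near the edge of the radius get weights near 0, so the grid value is dominated by nearby observations.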

The current US precipitation data, from 1999 on, was downloaded in 32-bit IEEE floating point format directly from the CPC ftp site. The historical US Unified Precipitation data spanning 1948 to 1998 was provided in netCDF format by the NOAA-CIRES Climate Diagnostics Center, Boulder, Colorado, USA, at http://www.cdc.noaa.gov/. More detailed information about the dataset is available at:

http://www.cpc.ncep.noaa.gov/research_papers/ncep_cpc_atlas/7/index.php

Dataset Comparisons

Since the historical and current datasets were created from different precipitation sources, spatial and temporal comparisons were needed to ensure that there are no large discrepancies between them that would affect the SPI. This was done using the three-year overlap (1996-1998) between the datasets to calculate the average difference and the correlation for each grid cell.

The average precipitation over the entire US was calculated for each month from January 1996 to December 1998 for both the current data and the historical data (fig. 1 [1]). The difference between the monthly means for the entire US is relatively small, with the means of the recent dataset generally slightly higher than those of the historical dataset; the average difference between the means of the two datasets is 0.36 cm. However, the differences between the datasets are not uniform throughout the US, as seen in figure 2. In this image, negative values depict areas where the historical precipitation data are higher than the recent data, while positive values mean the recent dataset values tend to be higher than the historical dataset values.
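A minimal sketch of the two comparisons described above: a US-wide mean for each month with the average difference of those means, and a per-cell average difference map. It assumes each monthly grid is a nested list with None for cells outside the US mask (the actual array layout and mask handling are not described), and it assumes the 0.36 cm figure is the signed average of the monthly differences (recent minus historical); the text does not say whether that figure is signed or absolute. The function names are hypothetical.

```python
from typing import List, Optional

# One monthly grid as rows of cells; None marks cells outside the US mask.
Grid = List[List[Optional[float]]]


def domain_mean(grid: Grid) -> float:
    """Average precipitation over all unmasked cells of one monthly grid."""
    values = [v for row in grid for v in row if v is not None]
    return sum(values) / len(values)


def mean_difference_of_means(recent: List[Grid], historical: List[Grid]) -> float:
    """Average over the overlap months of (recent US mean - historical US mean)."""
    diffs = [domain_mean(r) - domain_mean(h) for r, h in zip(recent, historical)]
    return sum(diffs) / len(diffs)


def cell_mean_difference(recent: List[Grid], historical: List[Grid]) -> Grid:
    """Per-cell average of (recent - historical) over the overlap months.

    Positive values mean the recent dataset tends to be wetter, the sign
    convention the text uses when describing the difference map.
    """
    n_rows, n_cols = len(recent[0]), len(recent[0][0])
    out: Grid = [[None] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            diffs = [r[i][j] - h[i][j] for r, h in zip(recent, historical)
                     if r[i][j] is not None and h[i][j] is not None]
            if diffs:
                out[i][j] = sum(diffs) / len(diffs)
    return out
```

For the 1996-1998 overlap, `recent` and `historical` would each hold 36 monthly grids in the same order.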

The central US is dominated by average monthly differences of -0.40 cm to 0.40 cm, especially in the northern states; the north-central states, in particular North Dakota, South Dakota, Nebraska, Montana, and Wyoming, show values from -0.40 cm to 0.20 cm. The eastern US is dominated by average differences from 0 cm to 0.20 cm, and to a lesser extent by values from -0.20 cm to 0 cm. A small area in Massachusetts has values of -0.60 cm to -0.80 cm. In the western states neither dataset is consistently higher, except along the Oregon and Washington coasts, which have the largest range of mean differences in the entire US. Along the northwestern coast, the average monthly difference between the recent and historical datasets ranges from -0.80 cm to 0.80 cm in Washington and from -0.40 cm to 0.80 cm in Oregon. Washington has the most extreme average differences, with most values strongly positive along the coast and decreasing inland. This means that along the coast the recent precipitation values are, on average, much higher than the historical values.

In terms of the SPI calculated from these datasets: because the gamma parameters are fitted to the historical record (1948-1998) while current months come from the recent dataset, areas where the recent values are higher than the historical values would tend to get an SPI that is higher than it should be. Conversely, where the historical values are higher than the recent values, the calculated SPI would be lower than it should be. The effect is more pronounced in areas that receive very little rain, where even a small difference between the two datasets can be large relative to the precipitation actually received, causing the SPI to be off by a large amount.
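The sensitivity described above can be illustrated with the usual SPI transform, in which a monthly total x is mapped through the mixed distribution H(x) = q + (1 - q)G(x), where G is the gamma CDF with shape alpha and scale beta, and then converted to a standard normal quantile. This is the standard SPI formulation, not code taken from the US SPI program; the series-based gamma CDF is a helper written for this sketch, and the two cells' parameters are hypothetical. The sketch adds the same 0.36 cm bias to a typical month at a dry cell and at a wet cell and compares how far the SPI moves.

```python
import math
from statistics import NormalDist


def gamma_cdf(x: float, alpha: float, beta: float) -> float:
    """Gamma CDF (shape alpha, scale beta) via the series for the lower incomplete gamma.

    P(a, t) = t^a e^(-t) / Gamma(a + 1) * sum_{n>=0} t^n / ((a + 1)...(a + n)),
    with t = x / beta. Adequate for the moderate t values used here.
    """
    if x <= 0:
        return 0.0
    t = x / beta
    term = total = 1.0
    n = 1
    while term > 1e-15 * total:
        term *= t / (alpha + n)
        total += term
        n += 1
    log_prefactor = alpha * math.log(t) - t - math.lgamma(alpha + 1.0)
    return min(1.0, total * math.exp(log_prefactor))


def spi(x: float, q: float, alpha: float, beta: float) -> float:
    """SPI of monthly total x given zero probability q and gamma parameters."""
    h = q + (1.0 - q) * gamma_cdf(x, alpha, beta)
    h = min(max(h, 1e-10), 1.0 - 1e-10)  # keep the normal quantile finite
    return NormalDist().inv_cdf(h)


def spi_shift_at_mean(bias: float, q: float, alpha: float, beta: float) -> float:
    """Change in SPI when a bias is added to a month equal to the gamma mean."""
    mean = alpha * beta
    return spi(mean + bias, q, alpha, beta) - spi(mean, q, alpha, beta)


if __name__ == "__main__":
    bias_cm = 0.36  # the US-wide average difference quoted above, used only as an example
    dry = spi_shift_at_mean(bias_cm, q=0.0, alpha=1.0, beta=0.5)  # hypothetical dry cell, mean 0.5 cm
    wet = spi_shift_at_mean(bias_cm, q=0.0, alpha=3.0, beta=3.0)  # hypothetical wet cell, mean 9 cm
    print(f"SPI shift, dry cell: {dry:.2f}; wet cell: {wet:.2f}")
```

Run as a script, it prints both shifts; for the same absolute bias, the dry cell's SPI moves several times further than the wet cell's.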

The correlation map shows that the correlation between the recent and historical precipitation datasets generally ranges from 0.90 to 1. In some areas along the Rocky Mountains and in Nevada, the correlation drops to 0.40, meaning the two datasets track each other less closely there. Washington has the worst correlation, with one area near its northern border dropping to -0.30. Overall, however, the two datasets follow the same month-to-month pattern throughout the US.
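A minimal sketch of the per-cell correlation behind the correlation map, computing a coefficient for each cell over the months where both datasets have values. The text does not say which correlation coefficient was used; Pearson's is assumed here, and the nested-list grid with None for masked cells and the function names are likewise illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence

Grid = List[List[Optional[float]]]  # None marks cells outside the US mask


def pearson(xs: Sequence[float], ys: Sequence[float]) -> Optional[float]:
    """Pearson correlation of two equal-length series; None if undefined."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant series (e.g. a cell with no rain at all) has no correlation
    return sxy / math.sqrt(sxx * syy)


def correlation_map(recent: List[Grid], historical: List[Grid]) -> Grid:
    """Per-cell correlation between the two datasets over the overlap months."""
    n_rows, n_cols = len(recent[0]), len(recent[0][0])
    out: Grid = [[None] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            pairs = [(r[i][j], h[i][j]) for r, h in zip(recent, historical)
                     if r[i][j] is not None and h[i][j] is not None]
            if pairs:
                xs, ys = zip(*pairs)
                out[i][j] = pearson(xs, ys)
    return out
```

A coefficient near 1 means the two datasets rise and fall together month to month even if one is consistently wetter, which is why the correlation map and the difference map answer different questions.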

Taken together, the high correlation between the recent and historical datasets throughout the US and the small mean differences between them indicate that both can be used together to calculate an accurate US SPI.

Methods

The recent US precipitation data had to be converted from the original daily, big-endian format downloaded from the CPC ftp site into monthly, little-endian data. This was simply a matter of byte-swapping the data, removing an extra 2(?) bits of data from the beginning and end of the dataset, and adding the daily grids together to create a monthly precipitation grid. To create monthly grids from the historical data, it was only necessary to convert the daily netCDF data into floating-point data that could then be summed into monthly grids.
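The conversion of the recent data can be sketched with Python's `struct` module: decode one daily file as big-endian 32-bit IEEE floats, discard the extra data at each end, sum the daily grids into a monthly grid, and write the result as little-endian floats. Because the text is unsure how much extra data must be removed, the header and trailer sizes are caller-supplied parameters rather than known values; any missing-value flags in the CPC files are not handled, and the function names are hypothetical.

```python
import struct
from typing import Iterable, List, Optional


def read_daily_grid(raw: bytes, n_values: int,
                    header_bytes: int, trailer_bytes: int) -> List[float]:
    """Decode one daily file of big-endian 32-bit IEEE floats.

    header_bytes / trailer_bytes: size of the extra data at the start and end
    of the file to discard. The exact amount is uncertain in the source, so it
    is left to the caller.
    """
    body = raw[header_bytes:len(raw) - trailer_bytes]
    if len(body) != 4 * n_values:
        raise ValueError(f"expected {4 * n_values} data bytes, found {len(body)}")
    # '>' = big-endian, 'f' = 4-byte IEEE float; decoding with an explicit
    # byte order performs the byte swap on little-endian machines.
    return list(struct.unpack(f">{n_values}f", body))


def monthly_total(daily_grids: Iterable[List[float]]) -> List[float]:
    """Sum daily grids cell by cell into one monthly grid."""
    total: Optional[List[float]] = None
    for day in daily_grids:
        total = list(day) if total is None else [t + v for t, v in zip(total, day)]
    if total is None:
        raise ValueError("no daily grids supplied")
    return total


def to_little_endian(values: List[float]) -> bytes:
    """Encode a grid as little-endian 32-bit IEEE floats."""
    return struct.pack(f"<{len(values)}f", *values)
```

For the 321x161 grid, `n_values` would be 51681; a month's output is then `to_little_endian(monthly_total(read_daily_grid(...) for each day))`.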

The US SPI program is a modified version of the Africa SPI program created by Greg Husak.

(The following is adapted from Greg's basic introduction to SPIs, modified to match the changes made to the Africa SPI program.)

“Developing the shape and scale parameters is a fairly straightforward process. Applying the necessary equations to the data was strictly a matter of writing computer code to evaluate them for the given inputs. This section breaks down the computer code and equations used to calculate the mean, the chance of a no-rain event, and the shape and scale parameters, which can later be used as inputs into the gamma distribution.
The precipitation grids used as inputs in this research were 161x321 arrays in netCDF format. Since these were monthly aggregates from 51 years (1948-1998 inclusive), the data was rather large and cumbersome. The initial step was to group the 612 files (51 years x 12 months) into one large file for each month. This created 12 files, each representing a 321x161x51 array of integers. These files were the principal input to the IDL program written to calculate the alpha and beta coefficients for each cell in the US grid.
At each grid cell, the 51 values, one from each year, are read into a single 51-element array. Once this one-dimensional array has been extracted from the larger 321x161x51 array, it is handled independently of the larger file. Working on this small array makes the calculations less time-consuming, because the program only needs to go through the large file once for each cell rather than combing through it for every calculation.
Now that the small array is ready, the first calculation is the mean of only the positive elements in the 51-element array. This is a tedious process, because the program must first go through the array and count the values that are greater than zero; while doing this, it also sums all the positive values. Once all the elements have been checked, the sum of the positive values is divided by the number of positive values. For the remainder of this paper, the term mean refers to the mean of the positive values only, discarding values of zero. Once calculated, this mean is stored as a floating-point number in a 321x161 floating-point array at the appropriate (x,y) location.
The next step is to calculate the likelihood of a monthly total being zero. This is estimated as the number of observations with a value of zero divided by the total number of observations. The program checks each element in the array and counts those equal to zero. After all the elements have been analyzed, the count is divided by the number of years (51 in this case), giving the ratio of months with no rain to the total number of months. This results in a floating-point value between 0.0 and 1.0, which is stored in a 321x161 floating-point array at the appropriate (x,y) location.
After this mixture coefficient has been calculated, the program moves on to the shape and scale parameters. Oeztuerk (1981) begins this calculation with the A coefficient, defined as the natural log of the mean minus the average natural log of the positive values (the sum of their natural logs divided by the number of positive values, n). A is an intermediate step in the calculation of the shape parameter.
Once A is calculated, the shape (alpha) parameter can be estimated. If A is less than or equal to zero, alpha is set to zero. If A is greater than zero, alpha is estimated by multiplying A by four-thirds, adding one, and taking the square root of this sum; one is then added to the square root, and the result is divided by four times A. This calculation is straightforward and can be seen in the following figure.
The scale (beta) parameter is calculated next. It depends heavily on the shape parameter and the mean, since it equals the mean divided by alpha. This can be problematic where the shape parameter is extremely small, because dividing by it artificially inflates the scale parameter.
The result of this series of computations is four 321x161 floating-point arrays for each of the 12 months. The first array is the mean monthly rainfall, the second is the mixture coefficient, the third is the shape parameter, and the last is the scale parameter. The arrays are then written to files to be accessed later.”
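For reference, the per-cell steps in the quotation above can be written as equations. This is a reconstruction from the prose, not the original figure; the symbols n (number of positive monthly totals x_i), m (number of zero totals), and N (number of years, 51 here) are notation introduced for this summary.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (x_i > 0) \\
q &= \frac{m}{N} \\
A &= \ln \bar{x} - \frac{1}{n}\sum_{i=1}^{n} \ln x_i \\
\alpha &= \begin{cases}
  \dfrac{1 + \sqrt{1 + \tfrac{4A}{3}}}{4A}, & A > 0 \\[4pt]
  0, & A \le 0
\end{cases} \\
\beta &= \frac{\bar{x}}{\alpha}
\end{aligned}
```

Because the log of a mean is never smaller than the mean of the logs, A is never negative, and the A ≤ 0 branch is reached only when all positive values in a cell are equal. In that case beta is undefined, and the text does not say how it is handled.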

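The per-cell procedure in the quotation above can be sketched in Python as follows. This is not the original IDL program: grids are assumed to be nested lists indexed [row][column], the monthly input is assumed to be a dictionary keyed by (year, month), and the function names are hypothetical. Returning None for beta when alpha is zero is also an assumption, since the text does not say how that case is handled.

```python
import math
from typing import Dict, List, Optional, Tuple

Grid = List[List[float]]
CellFit = Tuple[Optional[float], float, float, Optional[float]]


def group_by_month(grids: Dict[Tuple[int, int], Grid]) -> Dict[int, List[Grid]]:
    """Turn {(year, month): grid} into {month: [grid for each year, oldest first]}."""
    stacks: Dict[int, List[Grid]] = {}
    for year, month in sorted(grids):
        stacks.setdefault(month, []).append(grids[(year, month)])
    return stacks


def fit_cell(series: List[float]) -> CellFit:
    """Return (positive mean, probability of zero, alpha, beta) for one cell's series."""
    q = sum(1 for v in series if v == 0) / len(series)  # mixture coefficient
    positives = [v for v in series if v > 0]
    if not positives:
        return None, q, 0.0, None
    n = len(positives)
    mean = sum(positives) / n  # mean of the positive values only
    a = math.log(mean) - sum(math.log(v) for v in positives) / n
    if a <= 0:
        alpha = 0.0
    else:
        alpha = (1.0 + math.sqrt(1.0 + 4.0 * a / 3.0)) / (4.0 * a)
    beta = mean / alpha if alpha > 0 else None  # assumption: undefined when alpha is 0
    return mean, q, alpha, beta


def fit_month(stack: List[Grid]) -> Tuple[list, list, list, list]:
    """Fit every cell of one calendar month's stack (one grid per year).

    Returns four grids: positive mean, zero probability, alpha, beta.
    """
    n_rows, n_cols = len(stack[0]), len(stack[0][0])
    out = [[[None] * n_cols for _ in range(n_rows)] for _ in range(4)]
    for i in range(n_rows):
        for j in range(n_cols):
            series = [year_grid[i][j] for year_grid in stack]  # e.g. 51 values, one per year
            for k, value in enumerate(fit_cell(series)):
                out[k][i][j] = value
    means, qs, alphas, betas = out
    return means, qs, alphas, betas
```

Calling `fit_month` once per entry of `group_by_month(...)` yields the four arrays for each of the 12 months.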
References

Oeztuerk, A., 1981. On the Study of a Probability Distribution for Precipitation Totals. Journal of Applied Meteorology, 20, 1499-1505.


Fig. 1: Monthly means of historical and recent datasets for the entire US



Fig. 3: Difference of average monthly values by grid cell (historical data minus current data)



Fig. 4: Correlation of recent and historical datasets for January 1996 to December 1998


2003 US SPI maps (Jan-Aug)



[1] Please note: figures 2 and 3 contain straight lines of grid cells extending south and west of Texas. These are due to an error in the United States mask and do not affect the means and correlations these maps depict.

 