US SPI Product Background
Input Data
The US Unified Precipitation data used
to create the US SPI was developed by the Climate Prediction
Center (http://www.cpc.ncep.noaa.gov) using multiple sources
of US rain gauge data. Sources for the recent data include
the River Forecast Centers (~6000 gauge stations per day) and
the Climate Anomaly Data Base (several hundred gauge stations
per day). Prior to March 4, 1998, only RFC data are available,
with about 3000 to 6000 gauge stations per day. The historical
dataset was created using three sources: NCDC daily co-op stations
(1948-...), the CPC dataset (River Forecast Center data plus
first-order stations, 1992-...), and daily accumulations from
the hourly precipitation dataset (1948-...).
The datasets were quality checked using:
1) a duplicate station check, 2) a data check (including
a buddy check, and a standard deviation check against climatology),
and 3) a NEXRAD radar check for spurious zeros (for data
since 1998). The data were then gridded at 0.25 x 0.25 degree
resolution over 140W-60W, 20N-60N using a modified Cressman scheme.
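As a quick check on the grid geometry, the sketch below works out how many grid points the stated domain holds at 0.25 degree spacing. It assumes both edges of the domain are grid points, which is an inference from the 161x321 array size quoted later in this document rather than something stated by CPC; the function name n_points is made up for illustration.

```python
# Grid geometry quoted in the text. Counting both domain edges as grid
# points is an assumption inferred from the 161x321 array size.
RES_DEG = 0.25
LAT_MIN, LAT_MAX = 20.0, 60.0      # 20N to 60N
LON_MIN, LON_MAX = -140.0, -60.0   # 140W to 60W


def n_points(lo, hi, step):
    """Number of grid points from lo to hi inclusive at the given spacing."""
    return int(round((hi - lo) / step)) + 1


n_lat = n_points(LAT_MIN, LAT_MAX, RES_DEG)
n_lon = n_points(LON_MIN, LON_MAX, RES_DEG)
print(n_lat, n_lon)
```

Under that assumption the domain holds 161 rows of latitude and 321 columns of longitude.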
The current US
precipitation data, from 1999 on, was downloaded in 32-bit
IEEE floating point format directly from the CPC ftp site.
The historical US Unified Precipitation data spanning 1948
to 1998 was provided in netCDF format by the NOAA-CIRES
Climate Diagnostics Center, Boulder, Colorado, USA,
at http://www.cdc.noaa.gov/.
More detailed information about the dataset is available
at:
http://www.cpc.ncep.noaa.gov/research_papers/ncep_cpc_atlas/7/index.html
Dataset Comparisons
Because the historical and current
datasets were created from different precipitation
sources, spatial and temporal comparisons were needed
to ensure that no large discrepancies between
the datasets would affect the SPI. This was done using
the three-year overlap from 1996 to 1998 present in both
datasets to calculate average differences and correlation
by grid cell.
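A minimal sketch of such a per-cell comparison is given here, assuming each dataset is held as a dictionary mapping a (row, column) cell to its list of monthly totals over the overlap period. The function names and the data layout are hypothetical, and this is not the code used to produce the figures. The difference is taken as recent minus historical, matching the sign convention used for figure 2.

```python
import math


def mean_difference(recent, historical):
    """Average of (recent - historical); positive means recent is wetter."""
    return sum(r - h for r, h in zip(recent, historical)) / len(recent)


def pearson_r(x, y):
    """Pearson correlation of two equal-length series (NaN if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def compare_cells(recent_grid, historical_grid):
    """Return {cell: (mean difference, correlation)} for every grid cell."""
    return {
        cell: (
            mean_difference(recent_grid[cell], historical_grid[cell]),
            pearson_r(recent_grid[cell], historical_grid[cell]),
        )
        for cell in recent_grid
    }


# Hypothetical three-month series for a single cell, in cm.
recent = {(0, 0): [2.0, 3.0, 4.0]}
historical = {(0, 0): [1.0, 2.0, 3.0]}
result = compare_cells(recent, historical)
```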
The average precipitation
for the entire US was calculated for each month spanning
the period of January 1996 to December 1998 for both the
current data and the historical data (fig. 1). This shows
that the difference of monthly means for
the entire US is relatively small, with the means of the
recent dataset generally being slightly higher than the
means of the historical dataset. The average difference
between the means of these datasets is 0.36 cm. The differences
between the datasets are not uniform throughout the US,
however, as seen in figure 2. In this image, negative values
depict areas where the historical precipitation data are
higher than the recent data, while positive values mean
the recent dataset values tend to be higher than the historical
dataset values.
The central US appears to
be dominated by average monthly differences of -0.40 cm
to 0.40 cm, especially in the northern states, with the
north-central states showing a range of -0.40
cm to 0.20 cm, in particular North Dakota, South Dakota,
Nebraska, Montana, and Wyoming. The eastern US appears dominated
by average differences ranging from 0 cm to 0.20 cm, and
less so by values ranging from -0.20 cm to 0 cm. There is
a small area in Massachusetts that appears to have a value
of -0.60 cm to -0.80 cm. The western states do not appear
to be dominated by either dataset, except for
the Oregon and Washington coasts, which have the largest
ranges of mean differences in the entire US. The average
monthly difference between the recent dataset and historical
dataset along the northwestern coast ranges from -0.80 cm to
0.80 cm in Washington, and from -0.40 cm to 0.80 cm in Oregon.
Washington seems to have the most extreme average differences between
datasets, with a majority of those values being highly positive
along the coast and decreasing inland. This
means that along the coast, the recent precipitation
values are much higher, on average, than the historical values.
In terms of the SPI that is
calculated using these datasets, areas where the recent
precipitation values are higher than the historical
values would tend to have an SPI value that is
higher than it should be. The opposite is true for areas
where the historical precipitation values
are higher than the recent values: the calculated
SPI value would be lower than it should be. This effect
would be more pronounced in areas that receive very little
rain, as even a very small average difference between the
recent and historical datasets could be large relative to
the precipitation received,
in turn causing the SPI calculation to be off
by a very large amount.
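To make the point about dry areas concrete, the short sketch below expresses the same dataset offset as a fraction of the typical monthly rainfall in a wet cell and in a dry cell. All numbers are hypothetical and chosen only for illustration; they are not taken from the comparison maps.

```python
def relative_offset(offset_cm, typical_monthly_cm):
    """Dataset offset expressed as a fraction of typical monthly rainfall."""
    return offset_cm / typical_monthly_cm


# Hypothetical values: the same 0.2 cm offset in a wet and a dry cell.
wet = relative_offset(0.2, 10.0)   # offset is 2% of a 10 cm month
dry = relative_offset(0.2, 0.5)    # offset is 40% of a 0.5 cm month
```

The same absolute offset is a far larger share of the signal in the dry cell, which is why the SPI there is more sensitive to the mismatch.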
The correlation map shows
that, in general, the correlation between the recent
and historical precipitation datasets ranges
from 0.90 to 1. In some areas along the Rocky Mountains
and in Nevada, the correlation drops to 0.40, meaning
the two datasets track each other less closely there. Washington
appears to have the worst correlation, with one particular
area near its northern border dropping to -0.30. Overall,
the two datasets seem to follow the same trend throughout
the US.
The high correlation
of the recent and historical datasets throughout the US,
paired with the small mean differences between the datasets,
means that both can be used together in calculating an accurate
US SPI.
Methods
It was necessary to convert
the recent US precipitation
data into monthly, little-endian data from the original
daily, big-endian format downloaded from the CPC ftp site.
This was simply a matter of byte-swapping the data,
removing an extra 2(?) bits of data from the beginning and
end of the dataset, and summing the daily grids to create
a monthly precipitation grid. To convert the historical
data into monthly precipitation grids, it was only necessary
to convert the daily netCDF data into floating point data
that could then be summed into monthly grids.
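The conversion of the recent data can be sketched as follows, using only the Python standard library. This is an illustrative outline rather than the program actually used: the function names are hypothetical, and because the size of the extra data at each end is uncertain in the description above, it is left as a caller-supplied pad_bytes parameter instead of a fixed value. Missing-value handling is not covered.

```python
import struct


def decode_daily_grid(raw, n_values, pad_bytes=0):
    """Decode one big-endian 32-bit IEEE float grid into Python floats.

    pad_bytes is a hypothetical parameter: the amount of extra data to
    strip from each end of the record before decoding.
    """
    body = raw[pad_bytes:len(raw) - pad_bytes]
    return list(struct.unpack(">%df" % n_values, body))


def monthly_total(daily_grids):
    """Sum the daily grids cell by cell to get one monthly grid."""
    return [sum(cell_values) for cell_values in zip(*daily_grids)]


def encode_little_endian(values):
    """Pack a grid as little-endian 32-bit floats."""
    return struct.pack("<%df" % len(values), *values)


# Two hypothetical "days" on a two-cell grid.
day1 = struct.pack(">2f", 1.5, 0.25)
day2 = struct.pack(">2f", 0.5, 0.75)
month = monthly_total([decode_daily_grid(d, 2) for d in (day1, day2)])
out = encode_little_endian(month)
```

Decoding with an explicit ">" format and re-encoding with "<" performs the byte swap without depending on the byte order of the machine running the code.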
The US SPI program is a modified
version of the Africa SPI program created by Greg Husak.
(The following is adapted from Greg's
basic introduction to SPIs to match the changes made to
the Africa SPI program.)
“Developing the shape and scale parameters
is a fairly straightforward process. Applying the necessary
equations to the data was strictly a matter of writing
computer code that would evaluate the necessary equations
given the inputs. This section will break down the computer
code and equations used to calculate the mean, the chance
of a no rain event, and the shape and scale parameters,
which can later be used as inputs into the gamma distribution.
The precipitation grids used as the inputs
in this research were a 161x321 array in netCDF format.
Since these were monthly aggregates from 51 years (1948-1998
inclusive) the data were rather large and cumbersome. The
initial step was to group the 612 (51 years x 12 months)
files into one large file for each month. This created 12
files that each represented a 321x161x51 array of integers. These
files were the principal input into the IDL program written
to calculate the alpha and beta coefficients for each cell
in the US grid.
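The grouping step can be pictured with the sketch below, which works on in-memory objects rather than files. It assumes the monthly grids are available in a dictionary keyed by (year, month); that layout and the function name are hypothetical, and the original IDL program is not reproduced here.

```python
def group_by_calendar_month(monthly_grids):
    """Group grids keyed by (year, month) into one year-ordered list per month."""
    stacks = {month: [] for month in range(1, 13)}
    for year, month in sorted(monthly_grids):
        stacks[month].append(monthly_grids[(year, month)])
    return stacks


# Placeholder "grids": a single number encoding year and month.
grids = {
    (year, month): year * 100 + month
    for year in range(1948, 1999)
    for month in range(1, 13)
}
stacks = group_by_calendar_month(grids)
```

With 51 years of input, each of the 12 stacks holds 51 grids, one per year, in chronological order.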
At each grid cell the 51 values, one from
each year, are read into a single 51-element array. Once
this one-dimensional array has been extracted from the larger
321x161x51 array, it is dealt with independently of the
larger file. This smaller element makes calculations less
time consuming because the program only needs to go through
the large file one time for each cell, rather than having
to comb through it for every calculation.
Now that the small array is ready, the first
calculation is a mean of only the positive elements in the
51-element array. This is a tedious process because first
the program must go through and figure out the number of
values in the array that are greater than zero. While it's
doing this it also sums up all the positive values. Once
all the elements in the array have been checked the sum
of the positive values is divided by the number of positive
values. For the remainder of this paper, when the term mean
is used it refers to the mean of the positive values only,
discarding values of zero. Once this mean is calculated,
as a floating point number, it is put into a 321x161 floating
point array at the appropriate (x,y) location.
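A minimal sketch of this step for a single grid cell is shown here. The function name is hypothetical, and returning 0.0 when a cell has no positive values is an assumption, since the text does not say how that case is handled.

```python
def positive_mean(values):
    """Mean of the strictly positive values; zeros are discarded."""
    count = 0
    total = 0.0
    for value in values:
        if value > 0:
            count += 1
            total += value
    if count == 0:
        return 0.0  # assumption: the all-zero case is not specified in the text
    return total / count


# Hypothetical four-year series for one cell.
cell_mean = positive_mean([0.0, 2.0, 4.0, 0.0])
```

The count and the sum are accumulated in the same pass over the array, as the text describes.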
The next step is to calculate the likelihood
of a monthly total being zero. This is estimated by calculating
the number of observations with a value of zero, divided
by the total number of observations. The program analyzes
each element in the array to find the number of array elements
equal to zero. After all the elements have been analyzed
the counter is then divided by the number of years, 51 in
this case, to give the ratio of months with no rain, to
the total number of months. This results in a floating point
value between 0.0 and 1.0, and that value is put into a
321x161 floating point array at the appropriate (x,y)
location.
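The zero-rain ratio for one grid cell can be sketched in the same style; the function name is hypothetical.

```python
def zero_fraction(values):
    """Fraction of observations equal to zero (between 0.0 and 1.0)."""
    zero_count = 0
    for value in values:
        if value == 0:
            zero_count += 1
    return zero_count / len(values)


# Hypothetical four-year series for one cell.
no_rain_ratio = zero_fraction([0.0, 2.0, 4.0, 0.0])
```

For a real cell the divisor would be the 51 years of record, as stated above.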
Following the calculation of the mixture
coefficient, the program begins the calculation of the shape
and scale parameters. Oeztuerk (1981) begins the calculation
of the shape and scale by calculating the A coefficient.
This coefficient is the natural log of the mean minus the
average of the natural logs of the individual positive
values (their sum divided by the number of positive
values, n). A is an intermediate step in
the calculation of the shape parameter.
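In symbols, the description above corresponds to A = ln(mean) - (1/n) * sum(ln(x_i)), taken over the n positive values x_i. The sketch below evaluates this for one grid cell; the function name is hypothetical, and it assumes the cell has at least one positive value.

```python
import math


def a_coefficient(values):
    """A = ln(mean of positives) - mean of ln(positives).

    Assumes at least one positive value is present.
    """
    positives = [v for v in values if v > 0]
    n = len(positives)
    mean = sum(positives) / n
    mean_log = sum(math.log(v) for v in positives) / n
    return math.log(mean) - mean_log


# Hypothetical series: identical positive values give A = 0.
a_flat = a_coefficient([2.0, 2.0, 0.0])
a_spread = a_coefficient([1.0, 9.0])
```

A is zero when all positive values are equal and grows as the values become more spread out.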
Once A is calculated the shape, or
alpha, parameter can be estimated. If the value of A
is less than or equal to zero then alpha is equal to zero.
If A is greater than zero, alpha is estimated by
multiplying A by four-thirds, adding one to this value
and then taking the square root of this sum. The square
root is then added to one and the result divided by four times A.
This calculation is straightforward and can be seen in the
following figure.
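Written out, the steps above amount to alpha = (1 + sqrt(1 + 4A/3)) / (4A) for A greater than zero, and alpha = 0 otherwise. A minimal sketch follows; the function name is hypothetical.

```python
import math


def shape_parameter(a):
    """Estimate alpha from the A coefficient, following the steps in the text."""
    if a <= 0:
        return 0.0
    return (1.0 + math.sqrt(1.0 + 4.0 * a / 3.0)) / (4.0 * a)


# With A = 2.25 the square root is exactly 2, so alpha = 3 / 9.
alpha = shape_parameter(2.25)
```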
The calculation of the scale, or beta, parameter
is the next step after calculating the shape parameter.
The beta parameter is highly dependent on the alpha coefficient
and the mean as it is equal to the mean divided by the alpha
value. This can be slightly problematic where the shape
parameter is extremely small because it creates an artificial
inflation of the scale parameter.
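The scale calculation and its sensitivity to a small shape value can be sketched as below. The function name is hypothetical, and returning NaN when alpha is zero is an assumption made so the sketch does not divide by zero; the text does not say how the original program treats that case.

```python
import math


def scale_parameter(mean, alpha):
    """beta = mean / alpha; NaN when alpha is zero (an assumption)."""
    if alpha == 0:
        return float("nan")
    return mean / alpha


# Hypothetical values: the same mean with an ordinary and a tiny alpha.
beta_typical = scale_parameter(6.0, 2.0)    # 3.0
beta_inflated = scale_parameter(6.0, 0.01)  # roughly 600
```

The second call shows the inflation described above: a very small alpha produces a very large beta for the same mean.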
The result of this series of computations
is four 321x161 floating point arrays for each of the 12 months.
The first array is the mean monthly rainfall, the second
is the mixture coefficient, the third is the shape
parameter and the final is the scale parameter. The arrays
are then written to files to be accessed later.”