## Section 3 responses

3. It is alleged that proxy temperature deductions and instrumental temperature data have been improperly combined to conceal mismatch between the two data series. An attempt to hide the difficulty of combining these two data series and to mislead is alleged to be revealed in the following sentence in a November 1999 email from Professor Phillip Jones which is alleged to imply a conscious attempt to mislead: “I’ve just completed Mike’s Nature trick of adding in the real temps to each series for the last 20 years (i.e. from 1981 onwards) and from 1961 for Keith’s to hide the decline”.

**See specific questions but more general issues can be commented on here**

February 21, 2010 at 4:30 pm

Background paper on data analysis

http://www.gps.caltech.edu/~tapio/papers/imputation.pdf

Analysis of Incomplete Climate Data: Estimation of Mean Values and Covariance Matrices and Imputation of Missing Values

TAPIO SCHNEIDER

Atmospheric and Oceanic Sciences Program, Princeton University, Princeton, New Jersey

(Manuscript received 3 December 1999, in final form 27 March 2000)

ABSTRACT

Estimating the mean and the covariance matrix of an incomplete dataset and filling in missing values with imputed values is generally a nonlinear problem, which must be solved iteratively. The expectation maximization (EM) algorithm for Gaussian data, an iterative method both for the estimation of mean values and covariance matrices from incomplete datasets and for the imputation of missing values, is taken as the point of departure for the development of a regularized EM algorithm. In contrast to the conventional EM algorithm, the regularized EM algorithm is applicable to sets of climate data, in which the number of variables typically exceeds the sample size. The regularized EM algorithm is based on iterated analyses of linear regressions of variables with missing values on variables with available values, with regression coefficients estimated by ridge regression, a regularized regression method in which a continuous regularization parameter controls the filtering of the noise in the data. The regularization parameter is determined by generalized cross-validation, such as to minimize, approximately, the expected mean-squared error of the imputed values. The regularized EM algorithm can estimate, and exploit for the imputation of missing values, both synchronic and diachronic covariance matrices, which may contain information on spatial covariability, stationary temporal covariability, or cyclostationary temporal covariability. A test of the regularized EM algorithm with simulated surface temperature data demonstrates that the algorithm is applicable to typical sets of climate data and that it leads to more accurate estimates of the missing values than a conventional noniterative imputation technique.
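
For readers who want a feel for the iteration the abstract describes, here is a rough sketch in Python with NumPy. It is not the paper's code. The function name `regularized_em_impute` and the parameters `h`, `max_iter` and `tol` are made up for illustration. Two simplifications are assumed: the ridge parameter `h` is held fixed rather than chosen by generalized cross-validation, and the ridge penalty simply inflates the diagonal of the covariance of the available variables, which is one common form and may differ in detail from the paper.

```python
import numpy as np

def regularized_em_impute(X, h=0.1, max_iter=200, tol=1e-8):
    """Fill NaNs in X (rows = samples, columns = variables).

    Illustrative sketch only: fixed ridge parameter h, no cross-validation.
    """
    X = np.array(X, dtype=float)
    miss = np.isnan(X)
    n, p = X.shape
    # Initial guess: replace each missing entry with its column mean.
    X[miss] = np.take(np.nanmean(X, axis=0), np.nonzero(miss)[1])
    C_res = np.zeros((p, p))  # accumulated covariance of imputation errors
    for _ in range(max_iter):
        # Re-estimate mean and covariance from the completed data.
        mu = X.mean(axis=0)
        Xc = X - mu
        C = (Xc.T @ Xc + C_res) / (n - 1)
        X_new = X.copy()
        C_res = np.zeros((p, p))
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            a = ~m
            if not a.any():
                # Nothing observed in this row: fall back to the mean.
                X_new[i] = mu
                C_res += C
                continue
            C_aa = C[np.ix_(a, a)]
            C_am = C[np.ix_(a, m)]
            # Ridge regression of missing on available variables.
            reg = C_aa + h**2 * np.diag(np.diag(C_aa))
            B = np.linalg.solve(reg, C_am)
            X_new[i, m] = mu[m] + (X[i, a] - mu[a]) @ B
            # Residual covariance of the values just imputed.
            C_res[np.ix_(m, m)] += C[np.ix_(m, m)] - C_am.T @ B
        change = np.linalg.norm(X_new[miss] - X[miss])
        scale = max(np.linalg.norm(X_new[miss]), 1.0)
        X = X_new
        if change <= tol * scale:
            break
    return X
```

Calling `regularized_em_impute(data)` on an array with `NaN` in the gaps returns a completed copy. Observed values are left untouched and only the gaps are re-estimated on each pass.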