Section 6.5

6. The scrutiny and re-analysis of data by other scientists is a vital process if hypotheses are to be rigorously tested and improved. It is alleged that there has been a failure to make available important data, or the procedures used to adjust and analyse those data, thereby subverting a crucial scientific process. It is alleged that there has been a systematic policy of denying access to data that has been used in publications, referring to an email from Jones to Mann on 2nd February 2005 which contains the following:  “And don’t leave stuff lying around on ftp sites – you never know who is trawling them. The two MMs have been after the CRU station data for years. If they ever hear there is a Freedom of Information Act now in the UK, I think I’ll delete the file rather than send to anyone. Does your similar act in the US force you to respond to enquiries within 20 days?—our does! The UK works on precedents, so the first request will test it. We also have a data protection act, which I will hide behind”.

QUESTIONS TO ADDRESS

 1. Do you agree that releasing data for others to use and to test hypotheses is an important principle?

 2. If so, do you agree that this principle has been abused?

 3. If so, should not data be released for use by those with the intention to undermine your case, or is there a distinction you would wish to make between legitimate and illegitimate use?

 4. If not, do others have reasonable access to the data at all levels and to the description of processing steps, in order to be able to carry out such a re-analysis?

 5. Can you describe clearly the data-sets and relevant meta-data that have been released, what has not been released, and to what extent it is in usable form?

6. Where has it been released?

7. Where access is limited, not possible, or not meaningful for legitimate reasons, please explain why.

4 Responses to “Section 6.5”

  1. Jimchip Says:

    1119534778 Jun 23, 2005 (Jones to Moberg):

    “I thought Keith had put those two series on our web site, but I can’t find them either. However, I found them ages and put them with some of the other long tree-ring series. So here they are with others. The ones you want should be in columns 1 and 2. The file starts in 1628BC, so it takes a while to get to them. They start in AD 500. I vaguely recall chopping off the 402-499 and 441-499 years because of sample size. Keith has more trw series now, so they could be improved. Keith should have a reconstruction from the Grudd et al. (2002) paper in The Holocene, but they must be on his machine.”
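
    The file layout Jones describes (a multi-column series file whose first column is the year, running from 1628 BC, with the wanted series only starting in AD 500) could be read along these lines. This is a minimal sketch only: the filename, the whitespace-delimited format, negative years for BC, and the exact column positions are assumptions for illustration, not details confirmed in the email.

        import numpy as np

        # Hypothetical filename and layout: whitespace-delimited, first column
        # is the year (negative for BC), with the two requested tree-ring
        # series in the next two columns ("columns 1 and 2" of the email).
        data = np.genfromtxt("long_trw_series.txt")

        years = data[:, 0]
        series = data[:, 1:3]          # the two series Moberg asked about

        mask = years >= 500            # they only start in AD 500
        print(years[mask][0], series[mask][0])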

  2. jimchip Says:

    1177158252.txt

    >> On the 1990 paper, I have put the locations and the data for
    >> the rural stations used in the paper on the CRU website. All
    >> the language is about me not being able to send them the
    >> station data used for the grids (as used in 1990!). I don’t
    >> have this information, as we have much more data now
    >> (much more in Australia and China than then) and probably
    >> more stations in western USSR are as well.
    >>
    >> As for the other request, I don’t have the information on
    >> the sources of all the sites used in the CRUTEM3 database.
    >> We are adding in new datasets regularly (all of NZ from
    >> Jim Renwick recently) , but we don’t keep a source code
    >> for each station. Almost all sites have multiple sources and
    >> only a few sites have single sources. I know things roughly
    >> by country and could reconstruct it, but it would take a while.

    From Phil, quoted within the following email:
    From: “Kevin Trenberth”
    To: mann@psu.edu
    Subject: Re: FYI
    Date: Sat, 21 Apr 2007 08:24:12 -0600 (MDT)
    Reply-to: trenbert@ucar.edu
    Cc: “Phil Jones”, “Ben Santer”

  3. Jimchip Says:

    1059664704.txt The “dirty laundry” email

    From: “Michael E. Mann”
    To: Tim Osborn
    Subject: Re: reconstruction errors
    Date: Thu, 31 Jul 2003 11:18:24 -0400

    Tim,
    Attached are the calibration residual series for experiments based on available networks
    back to:
    AD 1000
    AD 1400
    AD 1600
    I can’t find the one for the network back to 1820! But basically, you’ll see that the
    residuals are pretty red for the first 2 cases, and then not significantly red for the 3rd
    case–its even a bit better for the AD 1700 and 1820 cases, but I can’t seem to dig them
    up. In any case, the incremental changes are modest after 1600–its pretty clear that key
    predictors drop out before AD 1600, hence the redness of the residuals, and the notably
    larger uncertainties farther back…
    You only want to look at the first column (year) and second column (residual) of the files.
    I can’t even remember what the other columns are!
    Let me know if that helps. Thanks,
    mike
    p.s. I know I probably don’t need to mention this, but just to insure absolutely clarify on
    this, I’m providing these for your own personal use, since you’re a trusted colleague. So
    please don’t pass this along to others without checking w/ me first. This is the sort of
    “dirty laundry” one doesn’t want to fall into the hands of those who might potentially try
    to distort things…
    At 02:58 PM 7/31/2003 +0100, you wrote:

    Thanks for the explanation, Mike. Now I see it, it looks familiar – so perhaps you’ve
    explained it to me previously (if you have, then sorry for asking twice!).
    I now understand how you compute them in theory. I have two further questions though
    (sorry):
    (1) how do you compute them in practise? Do you actually integrate the spectrum of the
    residuals?
    (2) how would I estimate an uncertainty for a particular band of time scales (e.g.
    decadal to secular, f=0.0 to 0.1)? If integrating the spectrum of the residuals, I
    wonder whether integrating from f=0 to f=0.02 and then f=0.02 to (e.g.) f=0.1 (note this
    last limit has changed) would give me the right error for time scales of 10 years and
    longer (i.e. for a 10-yr low pass filter)? The way I had planned to do this was to
    assume the residuals could be modelled as a first order autoregressive process, with
    lag-1 autocorrelation r1=0.0 after 1600 (essentially white) and r1=??? before 1600. Do
    you know what the lag-1 autocorrelation of the residuals is for the network that goes
    back to 1000 AD?
    The stuff back 2000 years will be interesting, though the GCM runs we’re starting to
    look at go back only 500 (Hadley Centre) or 1000 (German groups), so MBH99 seems fine
    for now.
    Cheers
    Tim
    At 14:28 31/07/2003, you wrote:

    Tim,
    The one-sigma *total* uncertainty is determined from adding the low f and high f
    components of uncertainty in quadrature. The low f and high f uncertainties aren’t
    uncertainties for a particular (e.g. 30 year or 40-year) running mean, they are band
    integrated estimates of uncertainties (high-frequency band from f=0 to f=0.02,
    low-frequency band from f=0.02 to f=0.5 cycle/year) taking into account the spectrum of
    the residual variance (the broadband or “white noise” mean of which is the nominal
    variance of the calibration residuals)
    Alternatively, one could calculate uncertainties for a particular timescale average
    using the standard deviation of the calibration residuals, and applying a square-root-N’
    argument (where N’ is the effective degrees of freedom in the calibration residuals). I
    believed I did this at one point, and got similar results.
    Let me know if this needs further clarification. Thanks,
    mike
    p.s. you might want to try to using Mann and Jones N. Hem if you’re going back further
    than AD 1000? Crowley has some EBM results now back to 0 AD, and is in the process of
    comparing w/ that. SHould be interesting…
    At 02:04 PM 7/31/2003 +0100, you wrote:

    Hi Mike,
    we’ve recently been making plans with Simon Tett at the Hadley Centre for comparing
    model simulations with various climate reconstructions, including the MBH98 and MBH99
    Northern Hemisphere temperatures. I was stressing the importance of including
    uncertainty estimates in the comparison and that the error estimates should depend on
    the timescale (e.g. smoothing filter or running mean) that had been applied.
    I then looked at the file that I have been using for the uncertainties associated with
    MBH99 (see attachment), which I must have got from you some time ago. Column 1 is year,
    2 is the “raw” standard error, 3 is 2*SE.
    But what are columns 4 and 5? I’ve been plotting column 4, labelled “1 sig (lowf)” when
    plotted your smoothed reconstruction, assuming that this is the error appropriate to
    low-pass filtered data. I’d also assumed that the last column “1 sig (highf)” was
    appropriate to high-pass filtered data. I also noticed that the sum of the squared high
    and low errors equalled the square of the raw error, which is nice.
    But I’ve realised that I don’t understand how you estimate these errors, nor what time
    scale the lowf and highf cutoff uses (maybe 40-year smoothed as in the IPCC plots?).
    From MBH99 it sounds like post-1600 you assume uncorrelated gaussian calibration
    residuals. In which case you would expect the errors for a 40-year mean to be reduced
    by sqrt(40). This doesn’t seem to match the values in the attached file. Pre-1600 you
    take into account that the residuals are autocorrelated (red noise rather than white),
    so presumably the reduction is less than sqrt(40), but some factor (how do you compute
    this?).
    The reason for my questions is that I would like to (1) check whether I’ve been doing
    the right thing in using column 4 of the attached file with your smoothed
    reconstruction, and (2) I’d like to estimate the errors for a range of time scales, so I
    can compare decadal means, 30-year means, 50-year means etc.
    Thanks in advance for any help you can give me here.
    Tim
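
    The exchange above describes the uncertainty bookkeeping in two complementary ways: band-integrated 1-sigma values obtained from the spectrum of the calibration residuals (split at f = 0.02 cycles/year) and combined in quadrature, or a square-root-N’ argument based on the effective degrees of freedom. Below is a minimal sketch of the band-integration and quadrature idea on a synthetic stand-in series, assuming a crude periodogram estimator and assigning “low frequency” to f below the split (Mann’s reply labels the bands the other way round, and the actual MBH99 estimator is not specified here).

        import numpy as np

        def band_uncertainties(residuals, f_split=0.02):
            """Split the residual variance into a low-frequency band
            (f < f_split, in cycles/year) and a high-frequency band by
            summing a periodogram; return the 1-sigma value of each."""
            resid = np.asarray(residuals, dtype=float)
            resid = resid - resid.mean()
            n = resid.size
            # Normalised so that power.sum() equals the residual variance
            power = np.abs(np.fft.fft(resid)) ** 2 / n ** 2
            freqs = np.abs(np.fft.fftfreq(n, d=1.0))   # annual data
            sig_low = np.sqrt(power[freqs < f_split].sum())
            sig_high = np.sqrt(power[freqs >= f_split].sum())
            return sig_low, sig_high

        rng = np.random.default_rng(0)
        resid = rng.standard_normal(400)   # stand-in for calibration residuals
        sig_low, sig_high = band_uncertainties(resid)
        # The quadrature sum recovers the total 1-sigma, as Osborn noticed
        print(sig_low, sig_high, np.hypot(sig_low, sig_high), resid.std())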

  4. Jimchip Says:

    1059674663.txt Osborn’s original question. Even he doesn’t understand what Mann is doing. “From MBH99 it sounds like…”

    From: Tim Osborn
    To: “Michael E. Mann”
    Subject: reconstruction errors
    Date: Thu Jul 31 14:04:23 2003

    Hi Mike,

    we’ve recently been making plans with Simon Tett at the Hadley Centre for comparing model simulations with various climate reconstructions, including the MBH98 and MBH99 Northern Hemisphere temperatures. I was stressing the importance of including uncertainty estimates in the comparison and that the error estimates should depend on the timescale (e.g. smoothing filter or running mean) that had been applied.

    I then looked at the file that I have been using for the uncertainties associated with MBH99 (see attachment), which I must have got from you some time ago. Column 1 is year, 2 is the “raw” standard error, 3 is 2*SE.

    But what are columns 4 and 5? I’ve been plotting column 4, labelled “1 sig (lowf)” when plotted your smoothed reconstruction, assuming that this is the error appropriate to low-pass filtered data. I’d also assumed that the last column “1 sig (highf)” was appropriate to high-pass filtered data. I also noticed that the sum of the squared high and low errors equalled the square of the raw error, which is nice.

    But I’ve realised that I don’t understand how you estimate these errors, nor what time scale the lowf and highf cutoff uses (maybe 40-year smoothed as in the IPCC plots?). From MBH99 it sounds like post-1600 you assume uncorrelated gaussian calibration residuals. In which case you would expect the errors for a 40-year mean to be reduced by sqrt(40). This doesn’t seem to match the values in the attached file. Pre-1600 you take into account that the residuals are autocorrelated (red noise rather than white), so presumably the reduction is less than sqrt(40), but some factor (how do you compute this?).
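
    Osborn’s sqrt(40) point can be made concrete with the usual effective-sample-size approximation for AR(1) residuals, N’ = N(1 − r1)/(1 + r1): for white residuals the error of a 40-year mean drops by sqrt(40), while for red residuals it drops only by sqrt(N’). The sketch below uses that standard approximation; the r1 = 0.5 value is purely illustrative, and this is not necessarily the factor MBH99 actually applied.

        import numpy as np

        def mean_error_reduction(n_years, r1):
            """Factor by which the 1-sigma error of an n_years mean is
            reduced relative to the annual calibration SE, assuming the
            residuals are AR(1) with lag-1 autocorrelation r1 and using
            the effective sample size N' = N * (1 - r1) / (1 + r1)."""
            n_eff = n_years * (1.0 - r1) / (1.0 + r1)
            return np.sqrt(n_eff)

        # Post-1600 (essentially white) residuals: the full sqrt(40) ~ 6.3
        print(mean_error_reduction(40, 0.0))
        # Pre-1600 red residuals, illustrative r1 = 0.5: only ~ 3.7
        print(mean_error_reduction(40, 0.5))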
