The Magazine of Digital Library Research

D-Lib Magazine

November/December 2016
Volume 22, Number 11/12

Appendix I

Assessing Stewardship Maturity of the Global Historical Climatology Network-Monthly (GHCN-M) Dataset: Use Case Study and Lessons Learned

Ge Peng (1), Jay Lawrimore (2), Valerie Toner (3), Christina Lief (2), Richard Baldwin (2), Nancy Ritchey (2), Danny Brinegar (2) and Stephen A. Del Greco (2)

(1) Cooperative Institute for Climate and Satellites-North Carolina, North Carolina State University and NOAA's National Centers for Environmental Information
(2) NOAA's National Centers for Environmental Information
(3) STG, Inc. and NOAA's National Centers for Environmental Information

 

Corresponding Author: Ge Peng (Ge.Peng@noaa.gov)

 

DSMM Document for the Global Historical Climatology Network-Monthly (GHCN-M) Version 3 Dataset

This Appendix to "Assessing Stewardship Maturity of the Global Historical Climatology Network-Monthly (GHCN-M) Dataset: Use Case Study and Lessons Learned" includes tables showing the stewardship maturity ratings and detailed justifications for the GHCN-M version 3 dataset.


Table A: Dataset and Data Stewardship Maturity Assessment Metadata

Dataset Title: Global Historical Climatology Network-Monthly, Version 3
Dataset Information URL: http://www.ncdc.noaa.gov/ghcnm/v3.php; http://dx.doi.org/10.7289/V5X34VDR
Data Provider POC (Name; E-mail; Affiliation): Jay Lawrimore; Jay.Lawrimore@noaa.gov; NCEI/CWC/CSD/DSB
Dataset POC (Name; E-mail; Affiliation): Jay Lawrimore; Jay.Lawrimore@noaa.gov; NCEI/CWC/CSD/DSB
SMM Version (Document ID and Version Number): NCDC-CICS-SMM_0001_Rev.1 12/09/2014 (Peng et al., 2015)
SMM POC (Name; E-mail; Affiliation): Ge Peng; Ge.Peng@noaa.gov; Cooperative Institute for Climate and Satellites, North Carolina (CICS-NC), North Carolina State University (NCSU) & NOAA's National Centers for Environmental Information (NCEI) [1]
SMM Template Version (Document ID and Version Numbers): NCDC-CICS-SMM_0001_Rev.1 v4.0 06/23/2015 (Peng, 2015)
SMM Template POC (Name; E-mail; Affiliation): Ge Peng; Ge.Peng@noaa.gov; Cooperative Institute for Climate and Satellites, North Carolina (CICS-NC), North Carolina State University (NCSU) & NOAA's National Centers for Environmental Information (NCEI)
SMM Assessment Version (v<nn>r<mm>, e.g., v01r00): v01r03
SMM Assessment Date (MM/DD/YYYY): 09/04/2015
SMM Assessment POC (Name; E-mail; Affiliation):
    Jay Lawrimore; Jay.Lawrimore@noaa.gov; NCEI/CWC/CSD/DSB
    Valerie Toner; valerie.toner@noaa.gov; NCEI/DSD/AB
    Christina Lief; Christina.Lief@noaa.gov; NCEI/DSD/AB
    Ge Peng; Ge.Peng@noaa.gov; NCEI/CWC/CSD/PRB & CICS-NC
    Rich Baldwin; Rich.Baldwin@noaa.gov; NCEI/DSD/DAB
Stewardship Maturity Ratings (kc1/kc2/kc3/kc4/kc5/kc6/kc7/kc8/kc9): 4.0/2.0/2.5/4.5/3.5/2.5/3.0/2.5/3.5
SMM Original Assessment Date (MM/DD/YYYY): 06/08/2015
SMM Original Assessment POC (Name; E-mail; Affiliation): Valerie Toner; valerie.toner@noaa.gov; Archive Specialist, Contractor with Team ERT/STG, an affiliate of NOAA's National Centers for Environmental Information (NCEI)
SMM Last Modified Date (MM/DD/YYYY): 09/04/2015
SMM Last Modification POC (Name; E-mail; Affiliation): Rich Baldwin; Rich.Baldwin@noaa.gov; NCEI/DSD/DAB
SMM Modified Date (MM/DD/YYYY): 07/22/2015
SMM Modification POC (Name; E-mail; Affiliation):
    Ge Peng; Ge.Peng@noaa.gov; NCEI/CWC/CSD/PRB & CICS-NC
    Jay Lawrimore; Jay.Lawrimore@noaa.gov; NCEI/CWC/CSD/DSB
    Christina Lief; Christina.Lief@noaa.gov; NCEI/DSD/AB
    Valerie Toner; valerie.toner@noaa.gov; NCEI/DSD/AB
SMM Modified Date (MM/DD/YYYY): 07/06/2015
SMM Modification POC (Name; E-mail; Affiliation): Jay Lawrimore; Jay.Lawrimore@noaa.gov; NCEI/CWC/CSD/DSB
SMM Modified Date (MM/DD/YYYY): 06/25/2015
SMM Modification POC (Name; E-mail; Affiliation): Christina Lief; Christina.Lief@noaa.gov; NCEI/DSD/AB

[1] NCEI includes the organizations previously referred to as National Climatic Data Center (NCDC), National Geophysical Data Center (NGDC), and National Oceanographic Data Center (NODC).
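
The slash-delimited rating string in Table A lines up with the nine key components of Table B, in the same order. As a minimal sketch (the function names, the regular expression, and the half-level range check are illustrative assumptions, not part of the SMM template), the rating string and the assessment version label could be parsed like this:

```python
import re

# Key components kc1..kc9, in the order they appear in Table B.
KEY_COMPONENTS = [
    "Preservability",
    "Accessibility",
    "Usability",
    "Production Sustainability",
    "Data Quality Assurance",
    "Data Quality Control/Monitoring",
    "Data Quality Assessment",
    "Transparency/Traceability",
    "Data Integrity",
]


def parse_ratings(rating_string):
    """Map a 'kc1/kc2/.../kc9' rating string to {component: rating}."""
    values = [float(part) for part in rating_string.split("/")]
    if len(values) != len(KEY_COMPONENTS):
        raise ValueError(f"expected {len(KEY_COMPONENTS)} ratings, got {len(values)}")
    for value in values:
        # Assumption: ratings lie on the Level 1 to 5 scale in half-level steps.
        if not 1.0 <= value <= 5.0 or (value * 2) % 1 != 0:
            raise ValueError(f"rating out of range or not a half step: {value}")
    return dict(zip(KEY_COMPONENTS, values))


def parse_assessment_version(label):
    """Split a 'v<nn>r<mm>' label such as 'v01r03' into (version, revision)."""
    # Assumption: <nn> and <mm> are exactly two digits, as in the 'v01r00' example.
    match = re.fullmatch(r"v(\d{2})r(\d{2})", label)
    if match is None:
        raise ValueError(f"not a v<nn>r<mm> label: {label!r}")
    return int(match.group(1)), int(match.group(2))


ratings = parse_ratings("4.0/2.0/2.5/4.5/3.5/2.5/3.0/2.5/3.5")
version = parse_assessment_version("v01r03")
```

Keeping the component names in one ordered list means a rating string with a missing or extra entry is rejected rather than silently shifted onto the wrong component.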

Table A References:

Peng, G., 2015: NCDC-CICSNC Scientific Data Stewardship Maturity Matrix Template. Version: v4.0 06/23/2015. Figshare. https://doi.org/10.6084/m9.figshare.1211954

Peng, G., J.L. Privette, E.J. Kearns, N.A. Ritchey, and S. Ansari, 2015: A unified framework for measuring stewardship practices applied to digital environmental datasets. Data Science Journal, 13. https://doi.org/10.2481/dsj.14-049

 

Table B: Detailed Justifications for the GHCN-M Version 3 Dataset

Maturity Scale (across):
Level 1: Ad Hoc; Not Managed
Level 2: Minimal; Managed; Limited
Level 3: Intermediate; Managed; Defined; Partially Implemented
Level 4: Advanced; Managed; Well-Defined; Fully Implemented
Level 5: Optimal; Level 4 + Measured, Controlled, Audit

Key Component (below): criteria for each level, followed by the Stewardship Maturity Rating and Justification or Evidence, and Comments.

Preservability (The state of being preservable)
Level 1: Any storage location; Data only
Level 2: Non-designated repository; Redundancy; Limited archiving metadata
Level 3: Designated archive; Redundancy; Community-standard archiving metadata; Conforming to limited archiving standards
Level 4: Level 3 + Conforming to community archiving standards
Level 5: Level 4 + Archiving process performance controlled, measured, and audited; Future archiving standard changes planned
Stewardship Maturity Rating: Level 4
Justification or Evidence:
  • Archived at NCEI-NC (a designated NOAA data center that complies with NARA archive standards)
  • Conforms to the NCEI-NC archiving process and guidelines, which follow the OAIS Reference Model (CCSDS, 2012)
  • Complies with the NCEI-NC archive procedures and requirements set forth in the Submission Agreement (SA)
  • Offsite backup copy available
  • Collection-level metadata conforms to the ISO 19115 metadata standard

Accessibility (The state of being searchable and accessible publicly)
Level 1: Not publicly available; Person-to-person
Level 2: Publicly available; Direct file download (e.g., via anonymous FTP server); Collection/dataset level searchable online
Level 3: Level 2 + Non-standard data service; Limited data server performance; Granule/file level searchable; Limited search metrics
Level 4: Level 3 + Community-standard data service; Enhanced data server performance; Conforming to community search metrics; Dissemination report metrics defined and implemented internally
Level 5: Level 4 + Dissemination reports available online; Future technology and standard changes planned
Stewardship Maturity Rating: Level 2
Justification or Evidence:
  • Direct file download via FTP server
  • Data and metadata are space-delimited ASCII files in gzip'd tar files (one file containing all station data and one file containing all station metadata)
  • Collection-level searchable (Google, NCEI, NOAA Catalog, Geoportal) but not searchable at file level
  • Dissemination report available internally
Comments: Next version will be at Level 3 (metadata will be in the Historical Observing Metadata Repository (HOMR) and data will be provided via the Climate Data Online (CDO) portal).

Usability (The state of being easy to use)
Level 1: Extensive product-specific knowledge required; No documentation online
Level 2: Non-standard data format; Limited documentation (e.g., user's guide) online
Level 3: Community standard-based interoperable format & metadata; Documentation (e.g., source code, product algorithm document, processing or/and data flow diagram) online
Level 4: Level 3 + Basic capability (e.g., subsetting, aggregating) & data characterization (overall/global, e.g., climatology, error estimates) available online
Level 5: Level 4 + Enhanced online capability (e.g., visualization, multiple data formats); Community metrics of data characterization (regional/cell) online; External ranking
Stewardship Maturity Rating: Level 2.5
Justification or Evidence:
  • README file online
  • Product algorithm reference list is online
  • Some source code online
  • The complete source code and the data flow and process flow diagrams are not online
  • Data and file-level metadata are in ASCII format, which is supported by the in situ community
  • The current ASCII files are not self-describing
Comments: Next version will be at Level 3 (with all source code, data flow and process flow diagrams online) and will use a self-describing data format.

Production Sustainability (The state of data production being sustainable and extendable)
Level 1: Ad Hoc or Not applicable; No obligation or deliverable requirement
Level 2: Short-term; Individual PI's commitment (grant obligations)
Level 3: Medium-term; Institutional commitment (contractual deliverables with specs and schedule defined)
Level 4: Long-term; Institutional commitment; Product improvement process in place
Level 5: Level 4 + National or international commitment; Changes for technology planned
Stewardship Maturity Rating: Level 4.5
Justification or Evidence:
  • Long-term institutional and international commitment
  • Data are being updated regularly
  • Product improvement in place
  • Product under version control

Data Quality Assurance (The state of data quality being assured)
Level 1: Data quality assurance (DQA) procedure unknown or none
Level 2: Ad Hoc and random; DQA procedure not defined and documented
Level 3: DQA procedure defined and documented and partially implemented
Level 4: DQA procedure well documented, fully implemented and available online with master reference data; Limited data quality assurance metadata
Level 5: Level 4 + DQA procedure monitored and reported; Conforming to community quality metadata & standards; External review
Stewardship Maturity Rating: Level 3.5
Justification or Evidence:
  • DQA procedure defined in JGR-Atmospheres journal article (Lawrimore et al., 2011) and also described online under "Data Assurance" tab
  • Quality assurance procedures fully implemented
  • Community metrics are produced and made available online
  • No data quality assurance metadata

Data Quality Control/Monitoring (The state of data quality being controlled and monitored)
Level 1: None or Sampling unknown or spotty; Analysis unknown or random in time
Level 2: Sampling and analysis are regular in time and space; Limited product-specific metrics defined & implemented
Level 3: Level 2 + Sampling and analysis are frequent and systematic but not automatic; Community metrics defined and partially implemented; Procedure documented and available online
Level 4: Level 3 + Anomaly detection procedure well-documented and fully implemented using community metrics, automatic, tracked and reported; Limited quality monitoring metadata
Level 5: Level 4 + Cross-validation of temporal & spatial characteristics; Physical consistency check; Conforming to community quality metadata & standards; Dynamic providers/users feedback in place
Stewardship Maturity Rating: Level 2.5
Justification or Evidence:
  • Quality flagged and statistics metrics are online
  • Regular monthly manual reviews of automatically generated plots or statistics are conducted
  • Quality monitoring metrics are consistent with those of the in situ community
  • Procedure is not documented or available online
  • No data quality control/monitoring metadata
Comments: For the next version, documentation on the procedure(s) will be online (Level 3).

Data Quality Assessment (The state of data quality being assessed)
Level 1: Algorithm/method/model theoretical basis assessed (methods and results online)
Level 2: Level 1 + Research product assessed (methods and results online)
Level 3: Level 2 + Operational product assessed (methods and results online)
Level 4: Level 3 + Quality metadata assessed; Limited quality assessment metadata
Level 5: Level 4 + Assessment performed on a recurring basis; Conforming to community quality metadata & standards; External ranking
Stewardship Maturity Rating: Level 3
Justification or Evidence:
  • Information on the product algorithm and data quality assessment procedures is available in the JGR-Atmospheres article (Lawrimore et al., 2011) and online
  • Assessment of the operational product, i.e., GHCN-Monthly version 3.x, was done by comparison with other datasets and is included in the latest IPCC report (IPCC, 2013)
  • No data quality assessment metadata

Transparency/Traceability (The state of being transparent, trackable, and traceable)
Level 1: Limited product information available; Person-to-person
Level 2: Product information available in literature
Level 3: Algorithm Theoretical Basis Document (ATBD) & source code online; Dataset configuration managed (CM); Unique Object Identifier (OID) assigned (dataset, documentation, source code); Data citation tracked (e.g., utilizing Digital Object Identifier (DOI) system)
Level 4: Level 3 + Operational Algorithm Description (OAD) online, OID assigned, and under CM
Level 5: Level 4 + System information online; Complete data provenance online
Stewardship Maturity Rating: Level 2.5
Justification or Evidence:
  • Product information in the literature (Lawrimore et al., 2011)
  • Dataset ID is assigned (NCDC DSI 9100_03) and under CM
  • Dataset DOI is assigned and tracked
  • Detailed summary of each software modification and the resulting impacts on global temperatures is available online (Williams et al., 2012; Gleason et al., 2015)
  • The Pairwise Homogeneity Adjustment algorithm software is available online
Comments: A Descriptive Product Information Document will be generated and made available for the next version (Level 3).

Data Integrity (The state of data integrity being verifiable)
Level 1: Unknown or no data ingest integrity check
Level 2: Data ingest integrity verifiable (e.g., checksum technology)
Level 3: Level 2 + Data archive integrity verifiable
Level 4: Level 3 + Data access integrity verifiable; Conforming to community data integrity technology standard
Level 5: Level 4 + Data authenticity verifiable (e.g., data signature technology); Performance of data integrity check monitored and reported
Stewardship Maturity Rating: Level 3.5
Justification or Evidence:
  • Data integrity is checked at ingest, archive, and dissemination using checksum technology
  • No checksum is available online for users to verify data files at access
Comments: Recommend including a checksum/MANIFEST file on the FTP server when staging the data files.
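
The recommendation above, staging a checksum manifest next to the data files so that users can verify what they download, can be sketched as follows. This is only an illustration: the SHA-256 algorithm, the `MANIFEST` line layout (`<hex digest>  <file name>`), and the function names are assumptions, not the NCEI procedure.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(directory, manifest_name="MANIFEST"):
    """Provider side: record one '<digest>  <name>' line per staged file."""
    directory = Path(directory)
    lines = [
        f"{sha256_of(path)}  {path.name}"
        for path in sorted(directory.iterdir())
        if path.is_file() and path.name != manifest_name
    ]
    (directory / manifest_name).write_text("\n".join(lines) + "\n")


def verify_manifest(directory, manifest_name="MANIFEST"):
    """User side: return the names of files that are missing or do not match."""
    directory = Path(directory)
    mismatched = []
    for line in (directory / manifest_name).read_text().splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        expected, name = line.split(maxsplit=1)
        path = directory / name
        if not path.is_file() or sha256_of(path) != expected:
            mismatched.append(name)
    return mismatched
```

A user would download the data files together with the manifest and call `verify_manifest` on the download directory; an empty list means every listed file matches its recorded digest.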

Table B References:

CCSDS (The Consultative Committee for Space Data Systems), 2012: Reference model for an open archival information system (OAIS) — Recommendation for Space Data System Practices. Version CCSDS 650.0-M-2 June 2012. 135 pp.

Gleason, B., C. Williams, M. Menne, and J. Lawrimore, 2015: Modifications to GHCN-Monthly (version 3.3.0) and USHCN (version 2.5.5) processing systems. NCEI Technical Report No. GHCNM-15-01. 23 pp.

IPCC, 2013: Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change, T. F. Stocker, D. Qin, G.-K. Plattner, M. Tignor, S.K. Allen, J. Boschung, A. Nauels, Y. Xia, V. Bex and P.M. Midgley, Eds. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, 1535 pp.

Lawrimore, J. H., M. J. Menne, B. E. Gleason, C. N. Williams, D. B. Wuertz, R. S. Vose, and J. Rennie, 2011: An overview of the Global Historical Climatology Network monthly mean temperature data set, version 3, J. Geophys. Res., 116, D19121. https://doi.org/10.1029/2011JD016187

Williams, C., M. Menne, and J. Lawrimore, 2012: Modifications to Pairwise Homogeneity Adjustment software to address coding errors and improve run-time efficiency. NCEI Technical Report No. GHCNM-12-02. 31 pp.
