CRYSTAL-FACE Data Exchange

2002-07-08

Contents

1  Overview
    1.1  ASCII Data and Image Files in the ESPO Archive
    1.2  Quick-look Data Plots
    1.3  Large Data Files
2  Details
    2.1  ESPO Archive
        2.1.1  File Names
        2.1.2  Data File Formats
        2.1.3  Image File Formats
    2.2  Quick-look Plots
        2.2.1  Use Project Web Site
        2.2.2  Use Team Web Site
    2.3  Large Data Files
        2.3.1  Use Project Archive
        2.3.2  Use Alternative Archive

1  Overview

This document outlines the current plan for exchanging and archiving data collected during CRYSTAL-FACE. The key elements are included, but some details have not yet been completely defined, so these pages will be updated as new information becomes available. If you have questions, concerns or problems with any aspect of the data exchange, please send a note to gaines@cloud1.arc.nasa.gov.

The primary objectives are to encourage a timely exchange of data during and after CRYSTAL-FACE by providing facilities and predefined mechanisms by which participants will have access to those data. The emphasis, both during and after the mission, is to exchange those data which other participants want to use. All data and image files should include well-defined variables so they can be readily compared with other datasets. They should also clearly indicate the source (instrument, model, etc.) of the data. The term data is used loosely to include both measured and derived quantities, so theory teams with products which will aid in the interpretation of measured quantities are also encouraged to participate in the data exchange. The following subsections outline the three types of data discussed in this document.

1.1  ASCII Data and Image Files in the ESPO Archive

The Earth Science Project Office (ESPO) archive will be the only archive provided by the project during CRYSTAL-FACE, and it is expected that most instrument teams will submit their preliminary data to this archive during the mission. It will serve as the main clearing-house for preliminary data during the mission, and as data are recalibrated and files updated, it will become the final archive for those files. For most in-situ instruments this will contain the complete datasets they distribute to the community. For most remote sensing instruments this will contain a reduced resolution subset of their complete datasets, and final versions of their high resolution datasets will be stored as binary files in a different archive (see 1.3). Please limit the size of your files stored in the ESPO archive to less than 20 MegaBytes (MB); that is not a hard limit, just a guideline. See 2.1 for more details.

1.2  Quick-look Data Plots

Quick-look plots are displayed prominently on web pages shortly after the data are collected. Some of these plots may also be submitted to one of the archives; those that are not will exist only for as long as the web site which hosts them. Instrument teams that plan to produce quick-look plots should also archive data files from which the plots can be regenerated. The archives are described in 1.1 and 1.3. The Project Office will, with conditions, provide resources for displaying quick-look plots; see 2.2.1 for details. Alternatively, your team may have a web site tailored to displaying its data plots; if it prefers to use that site, review the criteria listed in 2.2.2 and respond accordingly.

1.3  Large Data Files

Some data products will be too large (>20 MB) to exchange via the ESPO archive, and those will more efficiently be exchanged in a portable binary file format. The project is considering options for providing an archive for final versions of these large binary data files and associated image files, and those plans will be included in 2.3.1 once the details have been finalized. Your team may already use or maintain an existing archive for its large data products, and if that archive satisfies the conditions in 2.3.2 then it can be used as an alternative to the archive the project plans to provide for large files. Those teams which produce large data products are encouraged to also contribute reduced resolution data and/or image files, as in 1.1 and 1.2, for other investigators to use until the high resolution products are archived. Requests for these high resolution data products before they are archived will be handled at the discretion of the PI providing the data.

2  Details

2.1  ESPO Archive

The Earth Science Project Office (ESPO) maintains an online archive at Ames Research Center which can be accessed at http://espoarchive.nasa.gov/archive/, or via ftp and telnet logins to the several archive accounts on espoarchive.nasa.gov. The archive is primarily used for the exchange of ASCII data files from several NASA missions, but it can also accommodate a few types of standard image files. The usual procedure is to password protect access to data from each mission until those data are deemed public, and at that time the archive is opened to everyone via the web or anonymous ftp. You can use your web browser to draw simple plots of the archived data, check for newly archived files, and browse the archived files via its ftp capabilities.

The name of the CRYSTAL-FACE archive account is crystalf, and the login password for that account has been sent to all team leaders. Send a note to gaines@cloud1.arc.nasa.gov if you have not received the login password or if you are having trouble accessing that archive account. All investigators will share this account to submit/retrieve files to/from the archive. You can re-submit files as often as you like, with older versions being replaced by the new versions.

During the mission there will be a mirror of the CRYSTAL-FACE archive on the LAN in Key West, cf46.crystalface.nasa.gov (198.116.48.46). The archive procedures on both machines are the same. Files can be submitted/retrieved to/from either archive, but to reduce traffic on the WAN to the Key West LAN you are encouraged to use the archive on that LAN if your computer is on that LAN, and to use the archive at Ames if your computer is not on the Key West LAN.

If you will be submitting files to the archive, it is recommended that you log in to the crystalf account and review the ~/docs/README.TXT file. It contains a tutorial on the basic archive procedures, describes the information you need to convey to the archivist before you submit files to the archive, and describes other useful files in the ~/docs/ directory.

2.1.1  File Names

All data and image files in the ESPO archives use a standardized naming convention. A complete description of that convention can be found at http://espoarchive.nasa.gov/archive/, and anyone planning to submit files to the archive should review that document. Here we will just reiterate some issues of general interest.

All file names include a date, and some may include a time, and those dates and times are always expressed in UTC. To facilitate inter-comparisons of data collected from a particular aircraft, the following conventions apply to data files pertaining to a particular aircraft flight:

  1. The date in the file name always refers to the UT launch date.
  2. All of a particular type of data for each flight is included in one file. Thus, there should be no times or volume counters in the file name.
  3. Multiple launches on the same UT date are differentiated using the launch counter, Ln, in the file name:
    1. DD20020702.CIT for first launch of the Citation on 2 July 2002 UT.
    2. DD20020702__L2.CIT for second launch on 2 July 2002 UT. There are two underscores separating the date and the launch counter, L2.

Other types of files, i.e. non-aircraft specific data files and all image files, can contain a time in the file name, and the date/time refers to the UT date/time at which the data begin. For example, suppose you are archiving a sequence of image files with each image depicting 20 minutes of data. Then your file names would be like:

  1. IM20020702_1712.JPG for data depiction beginning at 17:12 on 2 July 2002 UT. There is only one underscore separating the date and time.
  2. IM20020702_1732.JPG for data depiction beginning at 17:32 on 2 July 2002 UT.
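
The naming examples above can be sketched in code. The following is a minimal illustration, not the archive's own tooling: the function names are hypothetical, and it assumes (from the examples only) that a name is a short prefix, the UT date as YYYYMMDD, an optional suffix, and an extension. The complete convention is described at the archive web site and takes precedence.

```python
from datetime import date, datetime, timedelta


def flight_file_name(prefix, ext, launch_date, launch=1):
    """Name for a per-flight aircraft data file, e.g. DD20020702__L2.CIT.

    launch_date is the UT launch date. Following the examples, the first
    launch of the day carries no counter; later launches append __Ln
    (two underscores).
    """
    name = prefix + launch_date.strftime("%Y%m%d")
    if launch > 1:
        name += "__L%d" % launch
    return name + "." + ext


def timed_file_name(prefix, ext, start):
    """Name for a non-flight or image file, e.g. IM20020702_1712.JPG.

    start is the UT date/time at which the data begin. A single
    underscore separates the date from the HHMM time.
    """
    return prefix + start.strftime("%Y%m%d_%H%M") + "." + ext


if __name__ == "__main__":
    print(flight_file_name("DD", "CIT", date(2002, 7, 2)))
    print(flight_file_name("DD", "CIT", date(2002, 7, 2), launch=2))
    # A sequence of images, each depicting 20 minutes of data.
    first = datetime(2002, 7, 2, 17, 12)
    for i in range(2):
        print(timed_file_name("IM", "JPG", first + timedelta(minutes=20 * i)))
```

The dates and times passed in are taken to be UT already; the sketch performs no time zone conversion.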

To locate a particular file in the ESPO archive, you need to know its name and the directory in which it resides. The best way to find a file is from the data and image file catalogs, located in the ~/docs/ directory of the crystalf archive account. You can obtain those catalogs by directing your web browser to http://espoarchive.nasa.gov/archive/ and selecting the Download Files (FTP) link. Then proceed to the ~/docs/ directory and you will find:

  1. ~/docs/datatable.txt - ASCII data file catalog.
  2. ~/docs/imagetable.txt - image file catalog.

Each of those file catalogs contains (or will contain) four columns, which list the file name code, subdirectory, point of contact and content description for each type of data or image file in the archive. The data file subdirectories are relative to the ~/data/ directory and the image file subdirectories are relative to the ~/images/ directory.

The file name codes are constructed from the constant parts of the file names, so for the above examples, the corresponding file name codes are:

  1. DD20020702__L2.CIT has the code DD.CIT
  2. IM20020702_1712.JPG has the code IM.JPG
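
As a rough sketch of how a file name code could be derived when looking up a file in the catalogs, the function below strips the variable parts (date, optional time, optional launch counter) from a name. The function name and the regular expression are assumptions based only on the examples in this section, not on the full naming convention.

```python
import re

# Assumed layout: letters, 8-digit date, optional _HHMM, optional __Ln, extension.
_NAME_PATTERN = re.compile(
    r"^([A-Za-z]+)\d{8}(?:_\d{4})?(?:__L\d+)?\.([A-Za-z0-9]+)$"
)


def file_name_code(file_name):
    """Return the constant parts of an archive file name, e.g. DD.CIT."""
    match = _NAME_PATTERN.match(file_name)
    if match is None:
        raise ValueError("unrecognized file name: %r" % file_name)
    return match.group(1) + "." + match.group(2)


if __name__ == "__main__":
    for name in ("DD20020702__L2.CIT", "IM20020702_1712.JPG"):
        print(name, "has the code", file_name_code(name))
```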

2.1.2  Data File Formats

There are nine different ASCII data file formats used in the ESPO archive. Each includes a file header, followed by the data records. The file headers have fields allocated for specific types of information, and those should be used to clearly define the originator of the file, the source of data and the units used for each variable. A complete description of the data file formats can be found at http://espoarchive.nasa.gov/archive/.

It is useful to keep in mind the following points when you design your data files:

  1. Use compact files, since they are faster to transfer and read, and require less disk space to store. Each character in an ASCII file adds a byte to the file size, so eliminate extraneous characters.
    1. The precision of numeric data should be tailored to the accuracy of those data, i.e., if your data are accurate to three significant digits then there is seldom a benefit in recording your data with more than four significant digits.
    2. Eliminate extraneous whitespace padding between numeric data.
    3. Use the scale factors to record real numbers as scaled integers, and thus eliminate the recording of decimal points in each numeric value.
  2. Define your numeric Independent Variables in such a way that their values can be differentiated with single precision numbers. Independent Variables must be monotonic, so ensuring that changes in consecutive values can be resolved by single precision numbers will ensure that software tools using single precision can read your files.
  3. Use only the printable ASCII characters in your data files, which have ASCII decimal values from 32 through 126. The TAB character is not allowed.
  4. The maximum number of characters in each line is 132.
  5. If you would like help in casting your data into the ASCII file formats, contact the archivist at gaines@cloud1.arc.nasa.gov.
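
Several of the points above can be checked mechanically before a file is submitted. The sketch below is illustrative only and is not part of the archive's procedures; the function names are hypothetical. It checks one line against the character and length rules, tests whether a sequence of Independent Variable values stays strictly monotonic after rounding to single precision, and converts a real number to a scaled integer. The last function assumes that a recorded integer multiplied by its scale factor gives the real value; consult the data file format description for the actual definition.

```python
import struct

MAX_LINE_LENGTH = 132  # maximum number of characters in each line


def check_line(line):
    """Return a list of problems in one line (given without its newline)."""
    problems = []
    if len(line) > MAX_LINE_LENGTH:
        problems.append(
            "line has %d characters (limit %d)" % (len(line), MAX_LINE_LENGTH)
        )
    for column, char in enumerate(line, start=1):
        # Only ASCII decimal values 32 through 126 are allowed; TAB is 9.
        if not 32 <= ord(char) <= 126:
            problems.append("character %r at column %d" % (char, column))
    return problems


def to_single(value):
    """Round a Python float (double precision) to single precision."""
    return struct.unpack("f", struct.pack("f", value))[0]


def resolvable_in_single_precision(values):
    """True if the values remain strictly monotonic as single precision."""
    singles = [to_single(v) for v in values]
    pairs = list(zip(singles, singles[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def to_scaled_int(value, scale):
    """Integer n such that n * scale approximates value (assumed meaning)."""
    return round(value / scale)


if __name__ == "__main__":
    print(check_line("61234 2731 1013"))
    print(check_line("61234\t2731"))
    # Seconds since midnight at 0.1 s steps survive single precision ...
    print(resolvable_in_single_precision([61234.5, 61234.6, 61234.7]))
    # ... but closely spaced day numbers of this magnitude do not.
    print(resolvable_in_single_precision([2452458.50001, 2452458.50002]))
    print(to_scaled_int(273.15, 0.01))
```

The second monotonicity example shows why the choice of Independent Variable matters: a large offset consumes the available digits, so a variable such as elapsed seconds from a stated reference is easier to resolve than one with a large constant part.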

2.1.3  Image File Formats

Images can be archived in any of the following standard image file formats: GIF, JPEG, PNG, PDF. It is important to include enough annotations in the image so that by viewing the image one can determine its subject, and readily compare it with other datasets. The annotations should also indicate the source of the data depicted in the image.

2.2  Quick-look Plots

As mentioned in 1.2, you can use either the web site provided by the project or your own site to display your quick-look plots. The following sections define the conditions for using those sites.

2.2.1  Use Project Web Site

The project will provide a web site for displaying your quick-look plots, subject to the following conditions:

  1. Acceptable image file formats are GIF, JPEG, PNG and PDF.
  2. Only one plot per instrument per flight will be displayed. Multiple plots should be combined into a multi-page PDF file.
  3. A web-based form will be used to submit your plots for display. The URL of that form is http://www.espo.nasa.gov/crystalface/quicklook_form.html.

2.2.2  Use Team Web Site

Your team may use its own web site for displaying its quick-look plots, subject to the following conditions:

  1. Acceptable image file formats are GIF, JPEG, PNG and PDF.
  2. Your site will be readily accessible by CRYSTAL-FACE investigators for the foreseeable future.
  3. Your team uses the web form at http://www.espo.nasa.gov/crystalface/quicklook_form.html to notify the Project Office of the URL of your site, so links to that site can be included in the CRYSTAL-FACE web pages.

2.3  Large Data Files

2.3.1  Use Project Archive

Details will be provided once they have been defined.

2.3.2  Use Alternative Archive

As mentioned in 1.3, your team may already use or maintain an existing archive for its large data products, and you can use that to archive your large data products from CRYSTAL-FACE if the following conditions are satisfied:

  1. CRYSTAL-FACE investigators will have access to your files for the foreseeable future.
  2. Binary data files are written in a portable format so they can be read on computers running Unix, Macintosh and Windows operating systems.
  3. Your team provides format descriptions for your data files.
  4. Your team provides computer programs for reading its binary data files with, preferably, at least one version of the program(s) not requiring proprietary software.
  5. Your team sends a note to the curator of the CRYSTAL-FACE web site, hknisely@mail.arc.nasa.gov, including:
    1. The URL of the archive your team will use.
    2. A description of any special procedures investigators need to follow to obtain your files from the archive.