Time-Series Files

SeisBase.read_data! - Function
read_data!(S, fmt, filestr [, keywords])

Read data in file format fmt matching file pattern filestr into SeisData object S.

read_data!(S, filestr [, keywords])

Read from files matching file pattern filestr into SeisData object S. Calls guess(filestr) to identify the file type based on the first file matching pattern filestr. Much slower than manually specifying file type.

  • Formats: ah1, ah2, bottle, geocsv, geocsv.slist, lennartz, mseed, passcal, suds, sac, segy, SeisBase, slist, uw, win32
  • Keywords: cf, full, jst, memmap, nx_add, nx_new, strict, swap, v, vl

This function is fully described in the official documentation at https://SeisBase.readthedocs.io/ (TODO: Change site) in section Time-Series Files.

See also: SeisBase.KW, get_data, guess, rseis

SeisBase.read_data - Function
S = read_data(fmt, filestr [, keywords])

Read data in file format fmt matching file pattern filestr into SeisData object S.

S = read_data(filestr [, keywords])

Read from files matching file pattern filestr into SeisData object S. Calls guess(filestr) to identify the file type based on the first file matching pattern filestr. Much slower than manually specifying file type.

  • Formats: ah1, ah2, bottle, geocsv, geocsv.slist, lennartz, mseed, passcal, suds, sac, segy, SeisBase, slist, uw, win32
  • Keywords: cf, full, jst, memmap, nx_add, nx_new, strict, swap, v, vl

This function is fully described in the official documentation at https://SeisBase.readthedocs.io/ (TODO: Change site) in section Time-Series Files.

See also: SeisBase.KW, get_data, guess, rseis


Supported File Formats

| File Format | String | Strict Match |
|---|---|---|
| AH-1 | ah1 | id, fs, gain, loc, resp, units |
| AH-2 | ah2 | id, fs, gain, loc, resp |
| Bottle (UNAVCO) | bottle | id, fs, gain |
| GeoCSV, time-sample pair | geocsv | id |
| GeoCSV, sample list | geocsv.slist | id |
| Lennartz ASCII | lennartz | id, fs |
| Mini-SEED | mseed | id, fs |
| PASSCAL SEG Y | passcal | id, fs, gain, loc |
| SAC | sac | id, fs, gain |
| SEG Y (rev 0 or rev 1) | segy | id, fs, gain, loc |
| SeisBase | SeisBase | id, fs, gain, loc, resp, units |
| SLIST (ASCII sample list) | slist | id, fs |
| SUDS | suds | id |
| UW data file | uw | id, fs, gain, units |
| Win32 | win32 | id, fs, gain, loc, resp, units |

Strings are case-sensitive to prevent any performance impact from using matches and/or lowercase().

Note that read_data with file format "SeisBase" largely exists as a convenience wrapper; it reads only the first SeisBase object from each file that can be converted to a SeisData structure. For more complicated read operations, use rseis.

Warning: GeoCSV files must be Unix text files; DOS text files, whose lines end in "\r\n", will not read properly. Convert with dos2unix or equivalent Windows PowerShell commands.
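
Where dos2unix is not available, the same conversion can be sketched in plain Julia. The helper name crlf_to_lf is hypothetical and not part of SeisBase; the sketch assumes the file fits in memory.

```julia
# Hypothetical helper (not part of SeisBase): rewrite a DOS text file with
# Unix line endings, so that each line ends in "\n" rather than "\r\n".
function crlf_to_lf(src::AbstractString, dest::AbstractString)
    text = read(src, String)                    # load the whole file as a string
    write(dest, replace(text, "\r\n" => "\n"))  # write the converted copy
    return dest
end

# Possible usage (file names are placeholders):
# crlf_to_lf("data_dos.csv", "data_unix.csv")
# S = read_data("geocsv", "data_unix.csv")
```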

Supported Keywords

| Keyword | Used By | Type | Default | Meaning |
|---|---|---|---|---|
| cf | win32 | String | "" | win32 channel info filestr |
| full | [1] | Bool | false | read full header into :misc? |
| ll | segy | UInt8 | 0x00 | set loc in :id? (see Setting the Location Subfield) |
| memmap | * | Bool | false | use Mmap.mmap to buffer file? |
| nx_add | [2] | Int64 | 360000 | minimum size increase of x |
| nx_new | [2] | Int64 | 86400000 | length(x) for new channels |
| jst | win32 | Bool | true | are sample times JST (UTC+9)? |
| swap | [3] | Bool | true | byte swap? |
| strict | * | Bool | true | use strict match? |
| v | * | Integer | 0 | verbosity |
| vl | * | Bool | false | verbose source logging? |

  • [1]: used by ah1, ah2, sac, segy, suds, uw; information read into :misc varies by file format.
  • [2]: used by bottle, mseed, suds, win32.
  • [3]: used by mseed, passcal, segy; swap is automatic for sac.

Performance Tips

  1. memmap=true improves read speed for some formats, particularly ASCII readers, but requires caution. In our benchmarks, the following significant (>3%) speed changes are observed:
    • Significant speedup: ASCII formats, including metadata formats
    • Slight speedup: mini-SEED
    • Significant slowdown: SAC
  2. With mseed or win32 data, adjust nx_new and nx_add based on the sizes of the data vectors that you expect to read. If the largest has Nmax samples, and the smallest has Nmin, we recommend nx_new=Nmin and nx_add=Nmax-Nmin.

    Default values can be changed in SeisBase keywords, e.g.,

    SeisBase.KW.nx_new = 60000
    SeisBase.KW.nx_add = 360000

    The system-wide defaults are nx_new=86400000 and nx_add=360000. Using these values with very small jobs will greatly decrease performance.
  3. strict=true may slow read_data based on the fields matched as part of the file format. In general, any file format that can match on more than id and fs will read slightly slower with this option.
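
The sizing rule for nx_new and nx_add in the second tip can be sketched as a small helper. The function name suggest_nx and the example lengths are hypothetical; only the rule itself (nx_new=Nmin, nx_add=Nmax-Nmin) comes from the text above.

```julia
# Hypothetical helper (not part of SeisBase): derive nx_new and nx_add from
# the expected lengths of the data vectors, using nx_new = Nmin and
# nx_add = Nmax - Nmin.
function suggest_nx(expected_lengths::AbstractVector{<:Integer})
    isempty(expected_lengths) && throw(ArgumentError("need at least one length"))
    nmin, nmax = extrema(expected_lengths)
    return (nx_new = Int64(nmin), nx_add = Int64(nmax - nmin))
end

# Example: one hour of data at 1 Hz, 40 Hz, and 100 Hz.
kw = suggest_nx([3600, 144_000, 360_000])
# kw.nx_new == 3600 and kw.nx_add == 356400

# The result could then be passed as keywords (filestr is a placeholder):
# S = read_data("mseed", filestr; kw...)
```

If all expected vectors have the same length, this rule yields nx_add=0; in that case it may be safer to keep a positive nx_add, since the keyword is described as a minimum size increase.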

Channel Matching

By default, read_data continues a channel if data read from file matches the channel id (field :id). In some cases this is not enough to guarantee a good match. With strict=true, read_data matches against fields :id, :fs, :gain, :loc, :resp, and :units. However, not all of these fields are stored natively in all file formats. Column "Strict Match" in the first table lists which fields are stored (and can be logically matched) in each format with strict=true.
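
The matching rule can be illustrated with a minimal sketch. Everything here is hypothetical (the names STRICT_FIELDS and continues_channel, the NamedTuple stand-ins for channels, and the example channel ID) and does not reflect SeisBase internals; the per-format field lists are a subset copied from the "Strict Match" column.

```julia
# Illustration only; not SeisBase code. Fields per format are copied from the
# "Strict Match" column of the Supported File Formats table (subset shown).
const STRICT_FIELDS = Dict(
    "mseed" => (:id, :fs),
    "sac"   => (:id, :fs, :gain),
    "segy"  => (:id, :fs, :gain, :loc),
    "suds"  => (:id,),
)

# Return true if `new` may continue `old` under the chosen matching rule.
function continues_channel(old, new, fmt::AbstractString; strict::Bool)
    fields = strict ? STRICT_FIELDS[fmt] : (:id,)
    return all(getproperty(old, f) == getproperty(new, f) for f in fields)
end

# Two stand-in channels that differ only in gain.
a = (id = "UW.MSH..EHZ", fs = 100.0, gain = 1.0)
b = (id = "UW.MSH..EHZ", fs = 100.0, gain = 2.0)

continues_channel(a, b, "sac"; strict=false)  # true: only :id is compared
continues_channel(a, b, "sac"; strict=true)   # false: :gain differs
```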

Examples

  1. S = read_data("uw", "99011116541W", full=true)
    • Read UW-format data file 99011116541W
    • Store full header information in :misc
  2. read_data!(S, "sac", "MSH80*.SAC")
    • Read SAC-format files matching string pattern MSH80*.SAC
    • Read into existing SeisData object S
  3. S = read_data("win32", "20140927*.cnt", cf="20140927*ch", nx_new=360000)
    • Read win32-format data files with names matching pattern 20140927*.cnt
    • Use ASCII channel information filenames that match pattern 20140927*ch
    • Assign new channels an initial size of nx_new=360000 samples

Memory Mapping

memmap=true is considered unsafe because Julia language handling of SIGBUS/SIGSEGV and associated risks is undocumented as of SeisBase v1.0.0. Thus, for example, we don't know what a connection failure during memory-mapped file I/O does. In some languages, this situation without additional signal handling was notorious for corrupting files.

Under no circumstances should memmap=true be used to read files directly from a drive whose host device power management is independent of the destination computer's. This includes all workflows that involve reading files directly into memory from a connected data logger. It is not a sufficient workaround to set a data logger to "always on".
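
To make the mechanism concrete, the sketch below contrasts an ordinary buffered read with a memory-mapped read using Julia's standard Mmap library. The function names are hypothetical and SeisBase's internal use of Mmap.mmap may differ. The relevant point is that a mapped file is paged in lazily, so an I/O failure can surface when the bytes are accessed rather than as an ordinary exception from a read call.

```julia
using Mmap

# Illustration of the mechanism only; not SeisBase code.
# Both functions return the file's bytes as a Vector{UInt8}.

# Ordinary read: the bytes are copied into memory before the call returns.
read_buffered(path::AbstractString) = read(path)

# Memory-mapped read: the file is mapped, then copied so that the result
# does not depend on the mapping after the stream is closed.
function read_mapped(path::AbstractString)
    open(path, "r") do io
        copy(Mmap.mmap(io, Vector{UInt8}))
    end
end
```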

Format Descriptions and Notes

Additional format information can be accessed from the command line by typing SeisBase.formats("FMT"), where FMT is the format name; type keys(SeisBase.formats) for a list of format names.

  • AH (Ad-Hoc) was developed as a machine-independent seismic data format based on External Data Representation (XDR).
  • Bottle is a single-channel format maintained by UNAVCO (USA).
  • GeoCSV: an extension of "human-readable", tabular file format Comma-Separated Values (CSV).
  • Lennartz: a variant of sample list (SLIST) used by Lennartz portable digitizers.
  • PASSCAL: A single-channel variant of SEG Y with no file header, developed by PASSCAL/New Mexico Tech and used with PASSCAL field equipment.
  • SAC: the Seismic Analysis Code data format, originally developed by LLNL for the eponymous command-line interpreter.
  • SEED: adopted by the International Federation of Digital Seismograph Networks (FDSN) as an omnibus seismic data standard. mini-SEED is a data-only variant that uses only data blockettes.
  • SEG Y: Society of Exploration Geophysicists data format. Common in the energy industry. Developed and maintained by SEG.
  • SLIST: An ASCII file with a one-line header and data written to file in ASCII string format.
  • SUDS: A similar format to SEED, developed by the US Geological Survey (USGS) in the late 1980s.
  • UW: created in the 1970s by the Pacific Northwest Seismic Network (PNSN), USA, for event archival; used until the early 2000s.
  • Win32: maintained by the National Research Institute for Earth Science and Disaster Prevention (NIED), Japan. Continuous data are divided into files that contain a minute of data from multiple channels stored in one-second segments.

Format-Specific Information

SEG Y

Only SEG Y rev 0 and rev 1 with standard headers are supported. The following are known support limitations:

  1. A few SEG Y headers are partially implemented or unused. These will be refined as we obtain more test data with standardized SEG Y headers and known results.

  2. Not all SEG Y files use the gain formula in the SEG Y rev 1 manual. Users are urged to consult equipment manufacturers and/or coders whose software converts proprietary data formats to SEG Y.

  3. SeisBase does not use the Textual File Header (file bytes 1-3200) or Extended Textual File Header records, as these were never standardized. Specify full=true to read the raw bytes into vectors in :misc. These byte vectors can be parsed manually by the user after file read.
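
As a starting point for manual parsing, the sketch below splits a Textual File Header into lines. It relies on the SEG Y convention that this header is 3200 bytes, organized as 40 records of 80 characters. The helper name is hypothetical, the bytes are assumed to be ASCII (EBCDIC headers would need translation first), and the key under which the bytes are stored in :misc is not specified here, so the byte vector is passed in directly.

```julia
# Hypothetical helper (not part of SeisBase): split a 3200-byte SEG Y
# Textual File Header into its 40 records of 80 characters each.
# Assumes ASCII content; trailing spaces are removed from each record.
function textual_header_lines(raw::AbstractVector{UInt8})
    length(raw) == 3200 ||
        throw(ArgumentError("expected 3200 bytes, got $(length(raw))"))
    return [String(rstrip(String(raw[i:i+79]))) for i in 1:80:3200]
end
```

Each element of the returned vector is one header record, which can then be searched or printed as ordinary text.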

Setting the Location Subfield

The location subfield within :id ("LL" in NN.SSSS.LL.CC) is normally blank, but can be set from an arbitrary Int32 quantity in SEG Y. The reason for this behavior is that SEG Y has at least six "recommended" quantities that can indicate a unique channel. Use one by passing the corresponding value from the table below to keyword "ll":

| Code | U | Bytes | :misc | Usual trace header quantity |
|---|---|---|---|---|
| 0x00 | | | | None (Default); don't set LL |
| 0x01 | Y | 001-004 | traceseqline | Trace sequence number within line |
| 0x02 | Y | 005-008 | traceseqfile | Trace sequence number within SEG Y file |
| 0x03 | | 009-012 | rec_no | Original field record number |
| 0x04 | Y | 013-016 | channel_no | Trace number within original field record |
| 0x05 | | 017-020 | energysrcpt | Energy source point number |
| 0x06 | | 021-024 | cdp | Ensemble number |
| 0x07 | ? | 025-028 | traceinensemble | Trace number within the ensemble |
| 0x08 | | 037-040 | src-rec_dist | Distance from center of source point |
| 0x09 | | 041-044 | rec_ele | Receiver group elevation |
| 0x0a | | 045-048 | src_ele | Surface elevation at source |
| 0x0b | | 049-052 | src_dep | Source depth below surface (positive) |
| 0x0c | | 053-056 | recdatumele | Datum elevation at receiver group |
| 0x0d | | 057-060 | srcdatumele | Datum elevation at source |
| 0x0e | | 061-064 | srcwaterdep | Water depth at source |
| 0x0f | | 065-068 | recwaterdep | Water depth at group |
| 0x10 | | 073-076 | src_x | Source coordinate - X |
| 0x11 | | 077-080 | src_y | Source coordinate - Y |
| 0x12 | | 081-084 | rec_x | Group coordinate - X |
| 0x13 | | 085-088 | rec_y | Group coordinate - Y |
| 0x14 | | 181-184 | cdp_x | X coordinate of ensemble (CDP) position |
| 0x15 | | 185-188 | cdp_y | Y coordinate of ensemble (CDP) position |
| 0x16 | | 189-192 | inline_3d | For 3-D poststack data, in-line number |
| 0x17 | | 193-196 | crossline_3d | For 3-D poststack data, cross-line number |
| 0x18 | | 197-200 | shot_point | Shotpoint number (2-D post-stack data) |
| 0x19 | | 205-208 | trans_mant | Transduction Constant (mantissa) |
| 0x1a | ? | 233-236 | unassigned_1 | Unassigned - For optional information |
| 0x1b | ? | 237-240 | unassigned_2 | Unassigned - For optional information |

A SEG Y file usually increments one (or more) of 0x01, 0x02, or 0x04 for each trace. Unfortunately, we can't imagine any way to use all three, or even two, in a SEGY-compliant channel ID.

Warning: for any quantity above,

  1. Numeric values >1296 lead to nonstandard characters in the LL subfield
  2. Numeric values >7200 lead to non-unique :id fields, with undefined results
  3. Numeric values >9216 cause read_data to throw an InexactError
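
These thresholds can be checked before a read with a small helper such as the one below. The function name and the returned symbols are hypothetical; only the three upper thresholds come from the warning above, and values not covered by it (such as negative numbers) are not examined.

```julia
# Hypothetical pre-check (not part of SeisBase) that encodes the three
# documented thresholds for a trace header value used to set LL.
function ll_value_status(v::Integer)
    v > 9216 && return :inexact_error      # read_data throws an InexactError
    v > 7200 && return :non_unique_id      # :id fields are no longer unique
    v > 1296 && return :nonstandard_chars  # LL contains nonstandard characters
    return :ok                             # none of the documented thresholds is exceeded
end

ll_value_status(24)    # :ok
ll_value_status(1500)  # :nonstandard_chars
```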

UW

Only UW v2 (UW-2) data files are supported. We have no reason to believe that any UW-1 data files are in circulation, and external converters to UW-2 exist.

Win32

Use older channel files with caution. They were not controlled by any central authority until the late 2010s. Inconsistencies between different versions of the same channel file were found by SeisBase developers as recently as 2015.

Other File I/O Functions

SeisBase.rseis - Function
rseis(fstr::String[, c::Array{Int64,1}=C, v::Integer=0, memmap::Bool=false])

Read SeisBase files matching file pattern fstr into memory. If an array of record indices is passed to keyword c, only those record indices are read from each file.

  • Set v>0 to control verbosity.
  • Set memmap=true to use memory mapping. Faster but potentially unsafe.

SeisBase.sachdr - Function
sachdr(f)

Print formatted SAC headers from file f to stdout. Does not accept wildcard file strings.

SeisBase.segyhdr - Function
segyhdr(f[; passcal=false, ll=LL, swap=false])

Print formatted, sorted SEG-Y headers of file f to stdout. Use keyword passcal=true for PASSCAL/NMT modified SEG Y; use swap=true for big-endian PASSCAL. See SeisBase read_data documentation for ll codes.
