Time-Series Files
SeisBase.read_data! — Function
read_data!(S, fmt, filestr [, keywords])
Read data in file format fmt matching file pattern filestr into SeisData object S.
read_data!(S, filestr [, keywords])
Read from files matching file pattern filestr into SeisData object S. Calls guess(filestr) to identify the file type based on the first file matching pattern filestr. Much slower than manually specifying file type.
- Formats: ah1, ah2, bottle, geocsv, geocsv.slist, lennartz, mseed, passcal, suds, sac, segy, SeisBase, slist, uw, win32
- Keywords: cf, full, jst, memmap, nx_add, nx_new, strict, swap, v, vl
This function is fully described in the official documentation at https://SeisBase.readthedocs.io/ (TODO: Change site) in section Time-Series Files.
See also: SeisBase.KW, get_data, guess, rseis
SeisBase.read_data — Function
S = read_data(fmt, filestr [, keywords])
Read data in file format fmt matching file pattern filestr into SeisData object S.
S = read_data(filestr [, keywords])
Read from files matching file pattern filestr into SeisData object S. Calls guess(filestr) to identify the file type based on the first file matching pattern filestr. Much slower than manually specifying file type.
- Formats: ah1, ah2, bottle, geocsv, geocsv.slist, lennartz, mseed, passcal, suds, sac, segy, SeisBase, slist, uw, win32
- Keywords: cf, full, jst, memmap, nx_add, nx_new, strict, swap, v, vl
This function is fully described in the official documentation at https://SeisBase.readthedocs.io/ (TODO: Change site) in section Time-Series Files.
See also: SeisBase.KW, get_data, guess, rseis
Supported File Formats
File Format | String | Strict Match |
---|---|---|
AH-1 | ah1 | id, fs, gain, loc, resp, units |
AH-2 | ah2 | id, fs, gain, loc, resp |
Bottle (UNAVCO) | bottle | id, fs, gain |
GeoCSV, time-sample pair | geocsv | id |
GeoCSV, sample list | geocsv.slist | id |
Lennartz ASCII | lennartz | id, fs |
Mini-SEED | mseed | id, fs |
PASSCAL SEG Y | passcal | id, fs, gain, loc |
SAC | sac | id, fs, gain |
SEG Y (rev 0 or rev 1) | segy | id, fs, gain, loc |
SeisBase | SeisBase | id, fs, gain, loc, resp, units |
SLIST (ASCII sample list) | slist | id, fs |
SUDS | suds | id |
UW data file | uw | id, fs, gain, units |
Win32 | win32 | id, fs, gain, loc, resp, units |
Strings are case-sensitive to prevent any performance impact from using matches and/or lowercase().
Note that read_data with file format "SeisBase" largely exists as a convenience wrapper; it reads only the first SeisBase object from each file that can be converted to a SeisData structure. For more complicated read operations, rseis should be used.
Warning: GeoCSV files must be Unix text files; DOS text files, whose lines end in "\r\n", will not read properly. Convert with dos2unix or equivalent Windows PowerShell commands.
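If dos2unix is not available, the same conversion can be done in Julia itself. The following is a minimal sketch using only Base functions; crlf_to_lf is a hypothetical helper name, not part of SeisBase, and the demo file is a stand-in rather than a real GeoCSV file. It reads the whole file into memory, so it assumes the file is small enough for that.

```julia
# Convert DOS line endings ("\r\n") to Unix line endings ("\n").
# `crlf_to_lf` is a hypothetical helper, not a SeisBase function.
function crlf_to_lf(src::AbstractString, dst::AbstractString)
    txt = read(src, String)                    # whole file as one String
    write(dst, replace(txt, "\r\n" => "\n"))   # write the converted text
    return dst
end

# Demo on a temporary file standing in for a DOS-format text file
src = tempname()
write(src, "a,b\r\n1,2\r\n")
dst = crlf_to_lf(src, tempname())
```

Writing to a separate destination file keeps the original intact until the converted copy has been checked.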
Supported Keywords
Keyword | Used By | Type | Default | Meaning |
---|---|---|---|---|
cf | win32 | String | "" | win32 channel info filestr |
full | [1] | Bool | false | read full header into :misc? |
ll | segy | UInt8 | 0x00 | set loc in :id? (see below) |
memmap | * | Bool | false | use Mmap.mmap to buffer file? |
nx_add | [2] | Int64 | 360000 | minimum size increase of x |
nx_new | [3] | Int64 | 86400000 | length(x) for new channels |
jst | win32 | Bool | true | are sample times JST (UTC+9)? |
swap | [4] | Bool | true | byte swap? |
strict | * | Bool | true | use strict match? |
v | * | Integer | 0 | verbosity |
vl | * | Bool | false | verbose source logging? |
- [1]: used by ah1, ah2, sac, segy, suds, uw; information read into :misc varies by file format.
- [2]: used by bottle, mseed, suds, win32.
- [3]: used by bottle, mseed, suds, win32.
- [4]: used by mseed, passcal, segy; swap is automatic for sac.
Performance Tips
memmap=true improves read speed for some formats, particularly ASCII readers, but requires caution. In our benchmarks, the following significant (>3%) speed changes are observed:
- Significant speedup: ASCII formats, including metadata formats
- Slight speedup: mini-SEED
- Significant slowdown: SAC
- With mseed or win32 data, adjust nx_new and nx_add based on the sizes of the data vectors that you expect to read. If the largest has Nmax samples, and the smallest has Nmin, we recommend nx_new=Nmin and nx_add=Nmax-Nmin.
Default values can be changed in SeisBase keywords, e.g.,
SeisBase.KW.nx_new = 60000
SeisBase.KW.nx_add = 360000
The system-wide defaults are nx_new=86400000 and nx_add=360000. Using these values with very small jobs will greatly decrease performance.
strict=true may slow read_data based on the fields matched as part of the file format. In general, any file format that can match on more than id and fs will read slightly slower with this option.
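The nx_new=Nmin, nx_add=Nmax-Nmin recommendation above can be sketched as a small helper. This is only an illustration: suggest_nx is a hypothetical name, the sample counts are made up, and the floor of 1 on nx_add is an added assumption for the case where all vectors have the same length.

```julia
# Hypothetical helper: pick nx_new and nx_add from the expected
# number of samples per channel, following nx_new=Nmin, nx_add=Nmax-Nmin.
function suggest_nx(lengths::AbstractVector{<:Integer})
    nmin, nmax = extrema(lengths)
    # Assumption (not from the docs): keep nx_add at least 1 when Nmax == Nmin
    return (nx_new = Int64(nmin), nx_add = Int64(max(nmax - nmin, 1)))
end

# Made-up expected sample counts for three channels
kw = suggest_nx([360_000, 720_000, 8_640_000])
# kw.nx_new == 360000, kw.nx_add == 8280000
```

Because the result is a NamedTuple, it could be splatted into a call as keywords, e.g. read_data("mseed", filestr; kw...), rather than changing the defaults in SeisBase.KW.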
Channel Matching
By default, read_data continues a channel if data read from file matches the channel id (field :id). In some cases this is not enough to guarantee a good match. With strict=true, read_data matches against fields :id, :fs, :gain, :loc, :resp, and :units. However, not all of these fields are stored natively in all file formats. Column "Strict Match" in the first table lists which fields are stored (and can be logically matched) in each format with strict=true.
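The matching rule can be illustrated with a toy model. This is not SeisBase's implementation: STRICT_FIELDS copies four rows of the "Strict Match" column, continues_channel is a hypothetical helper, and the channel and header are plain NamedTuples with made-up values.

```julia
# Fields compared with strict=true, copied from the "Strict Match" column
# for a few formats (illustrative subset only).
STRICT_FIELDS = Dict(
    "geocsv" => [:id],
    "mseed"  => [:id, :fs],
    "sac"    => [:id, :fs, :gain],
    "uw"     => [:id, :fs, :gain, :units],
)

# Hypothetical helper: would data with header `hdr` continue channel `chan`?
function continues_channel(chan, hdr, fmt::String; strict::Bool)
    fields = strict ? STRICT_FIELDS[fmt] : [:id]   # non-strict compares :id only
    return all(getproperty(chan, f) == getproperty(hdr, f) for f in fields)
end

# Made-up channel and file header that differ only in gain
chan = (id = "UW.MSH..EHZ", fs = 100.0, gain = 1.0, units = "m/s")
hdr  = (id = "UW.MSH..EHZ", fs = 100.0, gain = 2.0, units = "m/s")

continues_channel(chan, hdr, "sac", strict=true)    # false: gain differs
continues_channel(chan, hdr, "sac", strict=false)   # true: only :id is compared
continues_channel(chan, hdr, "mseed", strict=true)  # true: mseed matches on id and fs
```

The same pair of records gives different results for sac and mseed because mini-SEED does not store gain, so gain cannot take part in the match.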
Examples
S = read_data("uw", "99011116541W", full=true)
- Read UW-format data file 99011116541W
- Store full header information in :misc
read_data!(S, "sac", "MSH80*.SAC")
- Read SAC-format files matching string pattern MSH80*.SAC
- Read into existing SeisData object S
S = read_data("win32", "20140927*.cnt", cf="20140927*ch", nx_new=360000)
- Read win32-format data files with names matching pattern 20140927*.cnt
- Use ASCII channel information filenames that match pattern 20140927*ch
- Assign new channels an initial size of nx_new samples
Memory Mapping
memmap=true is considered unsafe because Julia language handling of SIGBUS/SIGSEGV and associated risks is undocumented as of SeisBase v1.0.0. Thus, for example, we don't know what a connection failure during memory-mapped file I/O does. In some languages, this situation without additional signal handling was notorious for corrupting files.
Under no circumstances should memmap=true be used to read files directly from a drive whose host device power management is independent of the destination computer's. This includes all workflows that involve reading files directly into memory from a connected data logger. It is not a sufficient workaround to set a data logger to "always on".
Format Descriptions and Notes
Additional format information can be accessed from the command line by typing SeisBase.formats("FMT"), where FMT is the format name; type keys(SeisBase.formats) for a list.
- AH (Ad-Hoc) was developed as a machine-independent seismic data format based on External Data Representation (XDR).
- Bottle is a single-channel format maintained by UNAVCO (USA).
- GeoCSV: an extension of "human-readable", tabular file format Comma-Separated Values (CSV).
- Lennartz: a variant of sample list (SLIST) used by Lennartz portable digitizers.
- PASSCAL: A single-channel variant of SEG Y with no file header, developed by PASSCAL/New Mexico Tech and used with PASSCAL field equipment.
- SAC: the Seismic Analysis Code data format, originally developed by LLNL for the eponymous command-line interpreter.
- SEED: adopted by the International Federation of Digital Seismograph Networks (FDSN) as an omnibus seismic data standard. mini-SEED is a data-only variant that uses only data blockettes.
- SEG Y: Society of Exploration Geophysicists data format. Common in the energy industry. Developed and maintained by SEG.
- SLIST: An ASCII file with a one-line header and data written to file in ASCII string format.
- SUDS: A similar format to SEED, developed by the US Geological Survey (USGS) in the late 1980s.
- UW: created in the 1970s by the Pacific Northwest Seismic Network (PNSN), USA, for event archival; used until the early 2000s.
- Win32: maintained by the National Research Institute for Earth Science and Disaster Prevention (NIED), Japan. Continuous data are divided into files that contain a minute of data from multiple channels stored in one-second segments.
Format-Specific Information
SEG Y
Only SEG Y rev 0 and rev 1 with standard headers are supported. The following are known support limitations:
- A few SEG Y headers are partially implemented or unused. These will be refined as we obtain more test data with standardized SEG Y headers and known results.
- Not all SEG Y files use the gain formula in the SEG Y rev 1 manual. Users are urged to consult equipment manufacturers and/or coders whose software converts proprietary data formats to SEG Y.
- SeisBase does not use the Textual File Header (file bytes 1-3200) or Extended Textual File Header records, as these were never standardized. Specify full=true to read the raw bytes into vectors in :misc. These byte vectors can be parsed manually by the user after file read.
Setting the Location Subfield
The location subfield within :id ("LL" in NN.SSSS.LL.CC) is normally blank, but can be set from an arbitrary Int32 quantity in SEG Y. The reason for this behavior is that SEG Y has at least six "recommended" quantities that can indicate a unique channel. Use one by passing the corresponding value from the table below to keyword "ll":
Code | U | Bytes | :misc | Usual trace header quantity |
---|---|---|---|---|
0x00 | | | | None (Default); don't set LL |
0x01 | Y | 001-004 | traceseqline | Trace sequence number within line |
0x02 | Y | 005-008 | traceseqfile | Trace sequence number within SEG Y file |
0x03 | | 009-012 | rec_no | Original field record number |
0x04 | Y | 013-016 | channel_no | Trace number within original field record |
0x05 | | 017-020 | energysrcpt | Energy source point number |
0x06 | | 021-024 | cdp | Ensemble number |
0x07 | ? | 025-028 | traceinensemble | Trace number within the ensemble |
0x08 | | 037-040 | src-rec_dist | Distance from center of source point |
0x09 | | 041-044 | rec_ele | Receiver group elevation |
0x0a | | 045-048 | src_ele | Surface elevation at source |
0x0b | | 049-052 | src_dep | Source depth below surface (positive) |
0x0c | | 053-056 | recdatumele | Datum elevation at receiver group |
0x0d | | 057-060 | srcdatumele | Datum elevation at source |
0x0e | | 061-064 | srcwaterdep | Water depth at source |
0x0f | | 065-068 | recwaterdep | Water depth at group |
0x10 | | 073-076 | src_x | Source coordinate - X |
0x11 | | 077-080 | src_y | Source coordinate - Y |
0x12 | | 081-084 | rec_x | Group coordinate - X |
0x13 | | 085-088 | rec_y | Group coordinate - Y |
0x14 | | 181-184 | cdp_x | X coordinate of ensemble (CDP) position |
0x15 | | 185-188 | cdp_y | Y coordinate of ensemble (CDP) position |
0x16 | | 189-192 | inline_3d | For 3-D poststack data, in-line number |
0x17 | | 193-196 | crossline_3d | For 3-D poststack data, cross-line number |
0x18 | | 197-200 | shot_point | Shotpoint number (2-D post-stack data) |
0x19 | | 205-208 | trans_mant | Transduction Constant (mantissa) |
0x1a | ? | 233-236 | unassigned_1 | Unassigned — For optional information |
0x1b | ? | 237-240 | unassigned_2 | Unassigned — For optional information |
A SEG Y file usually increments one (or more) of 0x01, 0x02, or 0x04 for each trace. Unfortunately, we can't imagine any way to use all three, or even two, in a SEGY-compliant channel ID.
Warning: for any quantity above,
- Numeric values >1296 lead to nonstandard characters in the LL subfield
- Numeric values >7200 lead to non-unique :id fields, with undefined results
- Numeric values >9216 cause read_data to throw an InexactError
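Before choosing an ll code, it may help to check the largest value that the corresponding trace header quantity takes in a file against the limits above. The helper below is a hypothetical sketch (ll_check is not a SeisBase function); it encodes only the three documented thresholds and says nothing about how SeisBase builds the LL characters.

```julia
# Hypothetical helper: classify a trace-header value by the documented limits.
function ll_check(n::Integer)
    n > 9216 && return :inexact_error      # read_data throws an InexactError
    n > 7200 && return :non_unique_id      # :id fields no longer unique
    n > 1296 && return :nonstandard_chars  # nonstandard characters in LL
    return :ok
end

ll_check(48)     # :ok
ll_check(5000)   # :nonstandard_chars
ll_check(10000)  # :inexact_error
```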
UW
Only UW v2 (UW-2) data files are supported. We have no reason to believe that any UW-1 data files are in circulation, and external converters to UW-2 exist.
Win32
Use older channel files with caution. They were not controlled by any central authority until the late 2010s. Inconsistencies between different versions of the same channel file were found by SeisBase developers as recently as 2015.
Other File I/O Functions
SeisBase.rseis — Function
rseis(fstr::String[, c::Array{Int64,1}=C, v::Integer=0, memmap::Bool=false])
Read SeisBase files matching file pattern fstr into memory. If an array of record indices is passed to keyword c, only those record indices are read from each file.
- Set v>0 to control verbosity.
- Set memmap=true to use memory mapping. Faster but potentially unsafe.
SeisBase.sachdr — Function
sachdr(f)
Print formatted SAC headers from file f to stdout. Does not accept wildcard file strings.
SeisBase.segyhdr — Function
segyhdr(f[; passcal=false, ll=LL, swap=false])
Print formatted, sorted SEG-Y headers of file f to stdout. Use keyword passcal=true for PASSCAL/NMT modified SEG Y; use swap=true for big-endian PASSCAL. See SeisBase read_data documentation for ll codes.