Other formats
Other formats you might find data in
While those standard representations cover most of the ways data can be represented, some formats deserve a special call-out. Their contents still map onto those core data types, but their form, and the way we interact with them, is different enough to warrant separate treatment here.
Text-based storage (ASCII)
Shapefiles and Sidecar files
Sidecar files are separate files that work in concert to represent a single concept.
In the geospatial domain, one of the most common ways you’ll encounter this concept is the shapefile format. Popularized by ESRI, this format appears as a single file within the ESRI tool ecosystem, but on your hard drive it is actually composed of multiple files that share the same base filename and differ only in their extensions. A graphical view of those files is shown above.
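To make the sidecar idea concrete, here is a minimal Python sketch (standard library only) that groups the files in a directory by their shared base filename. The function name group_sidecars is hypothetical. The sketch assumes the base filename itself contains no dots, so that a metadata file such as roads.shp.xml lands in the same group as roads.shp.

```python
from collections import defaultdict
from pathlib import Path


def group_sidecars(directory):
    """Map each base filename to the sorted list of extensions found with it."""
    groups = defaultdict(list)
    for path in Path(directory).iterdir():
        # Skip directories, hidden files, and files without any extension.
        if not path.is_file() or path.name.startswith(".") or "." not in path.name:
            continue
        # Split on the FIRST dot so "roads.shp.xml" groups under "roads".
        # Assumption: the base filename itself contains no dots.
        base, ext = path.name.split(".", 1)
        groups[base].append("." + ext.lower())
    return {base: sorted(exts) for base, exts in groups.items()}
```

Each key in the returned dictionary is what a GIS tool would show you as one "file", and the list next to it is what actually sits on disk.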
For a shapefile to be considered “valid”, it’s common to require just the .shp, .shx, and .dbf files, which I’ve called out as “principal” here. The projection file (.prj) is the definition of the projection, the .sbn and .sbx files are used to optimize spatial analyses, the .xml file defines metadata, and the .cpg file defines encoding. One of the common limitations of shapefiles is file size. Neither the .shp nor the .dbf component file can exceed 2 GB (2^31 bytes), which works out to roughly 70,000,000 points. Another common limitation is that these files can have no more than 255 attributes (fields), and field names are limited to 10 constrained characters. These limits are commonly hit with high-resolution data and authoritative datasets, even at the county to national level.
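The rules in this paragraph can be sketched as a couple of quick pre-flight checks. This is a minimal illustration, not a full validator: the function names check_shapefile and check_field_names are hypothetical. It only looks for lowercase extensions, it does not open or parse any of the files, and it checks field name length only, not which characters are allowed.

```python
from pathlib import Path

REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")  # the "principal" files
MAX_COMPONENT_BYTES = 2**31  # 2 GB limit on the .shp and .dbf components
MAX_FIELDS = 255
MAX_FIELD_NAME_LENGTH = 10


def check_shapefile(shp_path):
    """Return a list of problems found on disk; an empty list means none."""
    shp_path = Path(shp_path)
    problems = []
    for ext in REQUIRED_EXTENSIONS:
        # Sidecars share the base filename and differ only in extension.
        component = shp_path.with_suffix(ext)
        if not component.exists():
            problems.append(f"missing required component: {component.name}")
        elif ext in (".shp", ".dbf") and component.stat().st_size > MAX_COMPONENT_BYTES:
            problems.append(f"{component.name} exceeds 2 GB")
    return problems


def check_field_names(names):
    """Return a list of problems with a proposed set of attribute names."""
    problems = []
    if len(names) > MAX_FIELDS:
        problems.append(f"{len(names)} fields exceeds the limit of {MAX_FIELDS}")
    for name in names:
        if len(name) > MAX_FIELD_NAME_LENGTH:
            problems.append(f"field name too long: {name!r}")
    return problems
```

Running check_field_names on your column names before you export is a cheap way to learn that a name like population_density will not survive the trip intact.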
A note on the definition of “principal” here. Technically, reading data from a shapefile does not require a projection file, but virtually all software needs some form of spatial definition to place the resulting data in space. Without one, most of the commonly used software will either make a default guess or force you to make a declaration. The latter is preferred in my view (and is how the sf implementations behave), but the former is perfectly acceptable so long as you actually stop to read the notice it gives you and understand the implications. If you give me some data but don’t tell me where it goes, I can faff about all day, but I’m not going to try and put it on a map until you tell me where it should go. This statement stands in stark contrast to my actions: see RRASSLER’s shotgun_proj_test.R function, and my inability to detach myself from datums.
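As a sketch of the "force a declaration" behavior, the Python function below uses the projection sidecar when it exists, falls back to whatever the caller explicitly declares, and otherwise refuses to guess. The name resolve_crs and its declared_crs parameter are hypothetical; this illustrates the policy only and is not how sf or any other library implements it.

```python
from pathlib import Path


def resolve_crs(shp_path, declared_crs=None):
    """Return a projection definition for a shapefile, refusing to guess."""
    prj_path = Path(shp_path).with_suffix(".prj")
    if prj_path.exists():
        # The .prj sidecar stores the projection definition as plain text.
        return prj_path.read_text().strip()
    if declared_crs is not None:
        # No sidecar on disk, so rely on the caller's explicit declaration.
        return declared_crs
    # Neither source is available: stop rather than assume a default.
    raise ValueError(
        f"{prj_path.name} not found and no CRS declared; "
        "tell me where this data goes before I put it on a map"
    )
```

Choosing to raise an error instead of defaulting is the design decision here: a wrong guess still produces a map, just one with the data in the wrong place.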
While not a strict expression of this format, the most common variation on this concept in the hydraulic sciences is the HEC-RAS model format. HEC-RAS solves the energy > elevation portion of the integrated modeling stack and is primarily used to represent channels in one dimension. To encode that representation and all the other inputs the model needs, the data is spread across many files.