Now that we understand what a GIS is and what it can do, the next step is to understand how a GIS is made. But first you need to take a crash course in GIS data.
If you think the term GIS is vague, then you haven’t seen anything yet. There is a dizzying array of formats used for storing GIS data.
Before we delve into the various formats, let’s take a look at some fundamentals. There are two main types of GIS data: vector and raster.
Vector Data
You can think of vector data as instructions for how to render data. The best way to visualise it is to think of it as a spreadsheet with columns that contain your regular data, but in addition it always has an extra column called “geometry”.
That column contains one or more coordinates that describe how to draw the point, line or polygon that represents that feature on the face of the earth.
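To make the spreadsheet analogy concrete, here is a minimal Python sketch. The feature names, the "kind" column and the coordinates are made-up sample data, and the (longitude, latitude) ordering is an assumption chosen for the example rather than a rule of any particular format.

```python
# Hypothetical sample data: each feature is one "row" with ordinary attribute
# columns plus a "geometry" column holding (longitude, latitude) pairs.
features = [
    {"name": "Sample city", "kind": "city",
     "geometry": {"type": "Point", "coordinates": (26.10, 44.43)}},
    {"name": "Sample road", "kind": "road",
     "geometry": {"type": "LineString",
                  "coordinates": [(26.10, 44.43), (25.60, 45.65)]}},
    {"name": "Sample park", "kind": "park",
     "geometry": {"type": "Polygon",
                  "coordinates": [[(26.0, 44.4), (26.2, 44.4),
                                   (26.2, 44.5), (26.0, 44.4)]]}},
]


def count_vertices(geometry):
    """Count the coordinate pairs needed to draw a geometry."""
    coords = geometry["coordinates"]
    if geometry["type"] == "Point":
        return 1
    if geometry["type"] == "LineString":
        return len(coords)
    if geometry["type"] == "Polygon":
        # A polygon is a list of rings; each ring is a list of coordinate pairs.
        return sum(len(ring) for ring in coords)
    raise ValueError(f"unsupported geometry type: {geometry['type']}")


for feature in features:
    geometry = feature["geometry"]
    print(feature["name"], geometry["type"], count_vertices(geometry))
```

The regular columns ("name", "kind") behave exactly like spreadsheet cells. Only the geometry column needs special handling.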
Raster Data
If vector data is abstract, raster data is literal. Raster data is a bitmap image such as a TIFF or JPEG. This format is usually used for satellite imagery, aerial photography, elevation models and topographic maps.
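By way of contrast, a raster can be sketched as a plain grid of cell values. The tiny elevation grid below is invented for illustration; real rasters are far larger and are normally read with a dedicated library.

```python
# Hypothetical 3 x 4 elevation raster: every cell holds a height in metres.
elevation = [
    [120, 135, 150, 160],
    [110, 128, 149, 171],
    [100, 115, 140, 180],
]

rows = len(elevation)
cols = len(elevation[0])

# Because the data is just a grid of numbers, summary statistics are simple loops.
highest = max(max(row) for row in elevation)
mean = sum(sum(row) for row in elevation) / (rows * cols)

print(f"{rows} rows x {cols} columns, highest cell {highest} m, mean {mean:.1f} m")
```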
Introducing the Shapefile
The Shapefile is the most common format in GIS. It’s a geospatial vector data format that can be read by almost all GIS systems.
The name "Shapefile" is a little deceptive, because a shapefile is made up of several files. The three mandatory ones are the .SHP, the .DBF and the .SHX, and they are almost always accompanied by a .PRJ.
It’s not important that you remember what’s in each part of a Shapefile, but I think a brief explanation will help you better understand how GIS data is structured in general.
SHP
contains the geometry of each feature.
DBF
is a dBase file which contains the attribute data for all of the features in the dataset. The dBase file is very similar to a sheet in a spreadsheet.
SHX
is the shape index. It records where each feature sits within the .SHP file, which allows GIS systems to find features more quickly.
PRJ
is the projection file. It contains information about the “projection” and “coordinate system” the data uses.
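Because a Shapefile is really a bundle of files, a common source of trouble is sharing it with one part missing. The sketch below checks a list of file names for the four parts described above. The function name and the sample file names are hypothetical.

```python
from pathlib import PurePath

# The parts described above, keyed by file extension.
SHAPEFILE_PARTS = {
    ".shp": "geometry",
    ".dbf": "attribute table",
    ".shx": "index",
    ".prj": "projection",
}


def missing_parts(filenames, base_name):
    """Return the extensions in SHAPEFILE_PARTS that are absent for base_name."""
    present = {
        PurePath(name).suffix.lower()
        for name in filenames
        if PurePath(name).stem == base_name
    }
    return sorted(ext for ext in SHAPEFILE_PARTS if ext not in present)


# Hypothetical folder listing.
files = ["roads.shp", "roads.dbf", "roads.SHX", "rivers.shp"]
print(missing_parts(files, "roads"))
print(missing_parts(files, "rivers"))
```

Extensions are compared in lower case because Shapefile parts turn up with both upper and lower case extensions in practice.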
Geometry Type
Every Shapefile can only contain one geometry type. This means that every feature in the dataset will be either a point, a line or a polygon. You can’t have a dataset that contains a mixture of geometry types.
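In practice this means mixed data has to be split into one dataset per geometry type before it can be saved as Shapefiles. A minimal sketch of that grouping step, using made-up features and a hypothetical "geometry_type" key:

```python
from collections import defaultdict


def split_by_geometry_type(features):
    """Group feature names by geometry type, one group per would-be Shapefile."""
    groups = defaultdict(list)
    for feature in features:
        groups[feature["geometry_type"]].append(feature["name"])
    return dict(groups)


# Hypothetical mixed dataset.
mixed = [
    {"name": "Sample city", "geometry_type": "Point"},
    {"name": "Sample road", "geometry_type": "LineString"},
    {"name": "Sample town", "geometry_type": "Point"},
]

for geometry_type, names in split_by_geometry_type(mixed).items():
    print(geometry_type, names)
```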
Most beginner and intermediate level GIS users never need to look any further than the Shapefile for storing and sharing map data.
So that wraps up our introduction to the Shapefile.
JARGON ALERT : Projections & Coordinate Systems
You could fill an entire book on this subject, but it would be pretty dry reading for most of us!
The short answer is that the earth is a three dimensional sphere and your screen is two dimensional and flat. In order to display the earth on your screen, it first needs to be flattened.
This was originally achieved by literally "projecting" a spherical map onto a flat surface by shining a light through it.
Projections create distortions, with each method of projection producing a very accurate representation in one part of the world, and various types of distortions elsewhere.
In the Mercator projection, this distortion is most evident near the poles. This is the reason that on some maps Greenland looks the same size as the whole of South America.
Map projections can be deceptive! #dataviz #maps #GIS #projectionmapping #mapping pic.twitter.com/LAx6Ir9Dqq
— Neil Kaye (@neilrkaye) October 12, 2018
Check out The true size of... for a great interactive map that lets you explore this distortion.
There are many different formulas for flattening the earth, each designed to cause less distortion in specific places on earth.
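If you are curious what one of these formulas looks like, here is a sketch of the spherical Mercator projection in Python. It treats the earth as a perfect sphere, which is a simplification, and the function names are my own. The second function computes how much Mercator inflates areas at a given latitude, which is the effect that makes Greenland look so large.

```python
import math

# WGS84 semi-major axis in metres, used here as the radius of a perfect sphere.
EARTH_RADIUS_M = 6_378_137.0


def mercator(lon_deg, lat_deg, radius=EARTH_RADIUS_M):
    """Project longitude/latitude in degrees to spherical Mercator x/y in metres."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = radius * lon
    y = radius * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


def area_exaggeration(lat_deg):
    """Factor by which Mercator inflates areas at a latitude: 1 / cos^2(lat)."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


# 72 degrees north is an illustrative latitude that passes through Greenland.
for lat in (0, 45, 60, 72):
    print(f"latitude {lat:>2}: areas drawn {area_exaggeration(lat):.1f}x too large")
```

The exaggeration grows without limit towards the poles, which is why Mercator maps are usually cut off well short of them.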
You don’t need to understand how this process works as the data you use will already have the correct coordinate system. And if you are making a new dataset, the default coordinate system used in most GIS systems (WGS84) will be suitable 99% of the time.
If you want to dive a little deeper, check out Mango CEO Chris explaining projections.
Other Common GIS Data Formats
There are lots of other formats used in GIS, each with its own distinctive benefits and drawbacks. Here’s a quick list of other common formats that you might come across:
CSV - Comma Separated Value File
Although the CSV isn’t exclusively a mapping format, it is often used in mapping. The beauty of the CSV is its simplicity. This simplicity means it can be read by almost any program, including Excel or Google Docs.
It’s literally a text file where columns are separated by commas and rows are separated by line breaks. When used in mapping, two extra columns are added to hold the x and y, or lat and lon.
For mapping purposes this format is only really used for sharing point layers. The downside of the CSV is that it is very easy to break. Just one comma in the wrong place and the file becomes unreadable.
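Here is a short sketch of both points using Python’s built-in csv module. The column names "lat" and "lon" and the sample rows are assumptions for the example; real files use all sorts of column names.

```python
import csv
import io

# Hypothetical point layer. The second name contains a comma, so it is quoted.
good_csv = '''name,lat,lon
Bucharest,44.43,26.10
"Cluj-Napoca, Cluj",46.77,23.59
'''

# The same row without the quotes: the stray comma shifts every column along.
broken_csv = '''name,lat,lon
Cluj-Napoca, Cluj,46.77,23.59
'''


def read_points(text):
    """Parse CSV text into (name, lat, lon) tuples."""
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        points.append((row["name"], float(row["lat"]), float(row["lon"])))
    return points


print(read_points(good_csv))

try:
    read_points(broken_csv)
except ValueError as error:
    # " Cluj" has landed in the lat column and cannot be read as a number.
    print("broken file:", error)
```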
File GeoDatabase
A file geodatabase (or "FileGDB") is a collection of files in a folder on disk that can store, query, and manage both spatial and non-spatial data.
This is a popular format amongst advanced GIS users. But despite originally being touted as the favourite to replace the old but entrenched Shapefile as the de facto standard for sharing GIS data, the FileGDB never gained the popular support that many believed it would.
The main reason is its lack of support amongst open source GIS platforms.
MapInfo® Tab
This format is very similar to the Shapefile and is the default format used by the MapInfo® desktop GIS system.
KML
This is the format most likely to be known by non-GIS users, as it is the default file format of Google Earth.
Unlike the other formats covered here, KML does more than just store geometry and attribute data. It also contains lots of configuration options for Google Earth maps.
This extra information however makes KML less portable, as the additional information is only relevant to Google Earth and isn’t of any value to other GIS systems.
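To give a feel for the format, here is a sketch that reads a single placemark from a hand-written KML snippet using Python’s standard library. The placemark and the "#city-pin" style reference are invented. The style reference is an example of the display configuration that other GIS systems would simply ignore.

```python
import xml.etree.ElementTree as ET

# KML 2.2 elements live in this XML namespace.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical, minimal KML document with one placemark.
kml_text = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Bucharest</name>
    <styleUrl>#city-pin</styleUrl>
    <Point><coordinates>26.10,44.43,0</coordinates></Point>
  </Placemark>
</kml>"""


def read_placemark(text):
    """Extract the name and position of the first point placemark."""
    root = ET.fromstring(text)
    placemark = root.find("kml:Placemark", KML_NS)
    name = placemark.find("kml:name", KML_NS).text
    raw = placemark.find("kml:Point/kml:coordinates", KML_NS).text
    # KML writes coordinates as longitude,latitude,altitude.
    lon, lat, alt = (float(value) for value in raw.strip().split(","))
    return {"name": name, "lon": lon, "lat": lat, "alt": alt}


print(read_placemark(kml_text))
```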
GeoJSON
JSON, or to give it its full name, JavaScript Object Notation, is a lightweight data interchange format.
It’s primarily used by software developers due to the ease with which it can be processed by web applications.
GeoJSON is a form of JSON that also contains geometry data. It’s not often used as a format for sharing spatial data for human consumption but is very popular as an output for APIs (application programming interfaces).
GeoJSON data of the national border of Romania:
{ "type": "FeatureCollection", "features": [ { "type": "Feature", "id": "ROU", "properties": { "stroke": "#555555", "stroke-width": 2, "stroke-opacity": 1, "fill": "#555555", "fill-opacity": 0.5, "name": "Romania", "url": "https://www.gov.ro/en" }, "geometry": { "type": "Polygon", "coordinates": [ [ [22.710531,47.882194],[23.142236,48.096341],[23.760958,47.985598],[24.402056,47.981878],[24.866317,47.737526],[25.207743,47.891056],[25.945941,47.987149],[26.19745,48.220881],[26.619337,48.220726],[26.924176,48.123264],[27.233873,47.826771],[27.551166,47.405117],[28.12803,46.810476],[28.160018,46.371563],[28.054443,45.944586],[28.233554,45.488283],[28.679779,45.304031],[29.149725,45.464925],[29.603289,45.293308],[29.26543,45.035391],[29.141612,44.82021],[28.837858,44.913874],[28.558081,43.707462],[27.970107,43.812468],[27.2424,44.175986],[26.065159,43.943494],[25.569272,43.688445],[24.100679,43.741051],[23.332302,43.897011],[22.944832,43.823785],[22.65715,44.234923],[22.474008,44.409228],[22.705726,44.578003],[22.459022,44.702517],[22.145088,44.478422],[21.562023,44.768947],[21.483526,45.18117],[20.874313,45.416375],[20.762175,45.734573],[20.220192,46.127469],[21.021952,46.316088],[21.626515,46.994238],[22.099768,47.672439],[22.710531,47.882194] ] ] } } ] }
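This is why developers like the format: any language with a JSON parser can read it. As a sketch, the Python below loads a much smaller, made-up GeoJSON polygon and works out its bounding box. The function name and the sample rectangle are hypothetical, and only simple Polygon features are handled.

```python
import json

# Hypothetical GeoJSON: a single rectangular polygon feature.
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"name": "Sample rectangle"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[22.0, 44.0], [29.0, 44.0], [29.0, 48.0],
                         [22.0, 48.0], [22.0, 44.0]]]
      }
    }
  ]
}
"""


def bounding_box(feature):
    """Return (min_lon, min_lat, max_lon, max_lat) for a Polygon feature."""
    geometry = feature["geometry"]
    if geometry["type"] != "Polygon":
        raise ValueError("this sketch only handles Polygon geometries")
    outer_ring = geometry["coordinates"][0]  # the first ring is the outline
    lons = [point[0] for point in outer_ring]
    lats = [point[1] for point in outer_ring]
    return min(lons), min(lats), max(lons), max(lats)


collection = json.loads(geojson_text)
for feature in collection["features"]:
    print(feature["properties"]["name"], bounding_box(feature))
```

Note that GeoJSON coordinates are written longitude first, then latitude.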
GeoTIFF
The GeoTIFF is the most widely supported raster data format. TIFF is a bitmap image format similar to GIF, PNG or JPEG, and is commonly the output of drone, aerial, or satellite capture.
A GeoTIFF is just a regular TIFF that also contains special metadata that allows us to know where it should be placed on a map.
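In essence, that metadata tells the software where the image’s top-left corner sits on the map and how much ground each pixel covers. The sketch below shows the arithmetic for a simple north-up image. The function and parameter names are my own and the numbers are invented; it does not read a real GeoTIFF, which would normally be done with a GIS library.

```python
def pixel_to_map(col, row, origin_x, origin_y, pixel_width, pixel_height):
    """Map coordinates of a pixel's top-left corner in a north-up image.

    origin_x / origin_y: map coordinates of the image's top-left corner.
    pixel_width / pixel_height: size of one pixel in map units.
    """
    x = origin_x + col * pixel_width
    # Rows count downwards from the top of the image, so y decreases.
    y = origin_y - row * pixel_height
    return x, y


# Hypothetical image whose top-left corner is at 20 E, 49 N,
# with each pixel covering 0.25 degrees in both directions.
print(pixel_to_map(0, 0, 20.0, 49.0, 0.25, 0.25))
print(pixel_to_map(4, 2, 20.0, 49.0, 0.25, 0.25))
```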
GeoTIFFs are often stored uncompressed, which makes for large files, although the format can also hold compressed data. There are other raster formats built around heavier compression to reduce the file size, but these tend to be proprietary formats that require additional paid software to use.