{ "cells": [ { "cell_type": "markdown", "id": "68393dc8", "metadata": {}, "source": [ "# Introduction to geographic data objects in Python\n", "\n", "In this chapter, we will learn how geometric objects (vector) are represented in Python using a library called [shapely](https://shapely.readthedocs.io/en/stable/manual.html) [^shapely]. In the following parts of this chapter, we will use a library called `geopandas` extensively which uses these `shapely` geometries to represent the geographic features in the data. Understanding how these geometric objects work and can be created in Python is extremely useful, because these objects are the fundamental buildings blocks that enable us doing geographic data analysis. " ] }, { "cell_type": "markdown", "id": "a8469fac", "metadata": {}, "source": [ "## Representing vector geometries with `shapely` \n", "\n", "`Shapely` is a fundamental Python package for representing vector data geometries on a computer. Basic knowledge of shapely is important for using higher-level tools that depend on it, such as `geopandas`. \n", "\n", "\n", " Under the hood `shapely` actually uses a C++ library called [GEOS](https://trac.osgeo.org/geos) [^GEOS] to construct the geometries, which is one of the standard libraries behind various Geographic Information Systems (GIS) software, such as [PostGIS](https://postgis.net/) [^PostGIS] or [QGIS](http://www.qgis.org/en/site/) [^QGIS]. Objects and methods available in shapely adhere mainly to [the Open Geospatial Consortium’s Simple Features Access Specification](https://www.ogc.org/standards/sfa) [^OGC_sfa] making them compatible with various GIS tools.\n", "\n", "In this section, we give a quick overview of creating geometries using `shapely`. For a full list of `shapely` objects and methods, see [the shapely user manual online](https://shapely.readthedocs.io/en/stable/manual.html) [^shapely]." ] }, { "cell_type": "markdown", "id": "2e91c9c5", "metadata": {}, "source": [ "### Creating point geometries\n", "\n", "When creating geometries with `shapely`, we first need to import the geometric object class (such as `Point`) that we want to create from `shapely.geometry` which contains all possible geometry types. After importing the `Point` class, creating a point is easy: we just pass `x` and `y` coordinates into the `Point()` -class (with a possible `z` -coordinate) which will create the point for us:" ] }, { "cell_type": "code", "execution_count": 1, "id": "878e4187", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "image/svg+xml": [ "" ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from shapely.geometry import Point\n", "\n", "point = Point(2.2, 4.2)\n", "point3D = Point(9.26, -2.456, 0.57)\n", "\n", "point" ] }, { "cell_type": "markdown", "id": "747afbae", "metadata": {}, "source": [ "_**Figure 6.1**. ADD PROPER FIGURE CAPTION!._\n", "\n", "Jupyter Notebook is automatically able to visualize the point shape on the screen. We can use the print statement to get the text representation of the point geometry as [Well Known Text (WKT)](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry) [^WKT]. The letter Z indicates 3D coordinates. " ] }, { "cell_type": "code", "execution_count": 2, "id": "fd5befd6", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "POINT (2.2 4.2)\n", "POINT Z (9.26 -2.456 0.57)\n" ] } ], "source": [ "print(point)\n", "print(point3D)" ] }, { "cell_type": "markdown", "id": "6beca2a9", "metadata": {}, "source": [ "There are different approaches for extracting the coordinates of a `Point` as numerical values. The property the `coords` gives us to access the coordinates of the point geometry as a `CoordinateSequence` which data structure for storing a list of coordinates. For our purposes, we can convert the `coords` into a list to access its contents. " ] }, { "cell_type": "code", "execution_count": 3, "id": "0ef6c6a7", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(2.2, 4.2)]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(point.coords)" ] }, { "cell_type": "markdown", "id": "b05b5149", "metadata": {}, "source": [ "We can also access the coordinates directly using the `x` and `y` properties of the `Point` object." ] }, { "cell_type": "code", "execution_count": 4, "id": "0c4c27cd", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.2\n", "4.2\n" ] } ], "source": [ "print(point.x)\n", "print(point.y)" ] }, { "cell_type": "markdown", "id": "9fd420da", "metadata": {}, "source": [ "Points and other shapely objects have many useful built-in attributes and methods. See [shapely documentation ](https://shapely.readthedocs.io/en/stable/manual.html#general-attributes-and-methods) [^shapely] for a full list. For example, it is possible to calculate the Euclidian distance between points, or to create a buffer polygon for the point object. However, all of these functionalities are integrated into `geopandas` and we will go through them later in the book. \n" ] }, { "cell_type": "markdown", "id": "7641e6fa", "metadata": {}, "source": [ "### Creating LineString geometries\n", "\n", "\n", "Creating `LineString` -objects is fairly similar to creating `Point`-objects. We need at least two points for creating a line. We can construct the line using either a list of `Point`-objects or pass the point coordiantes as coordinate-tuples to the `LineString` constructor." ] }, { "cell_type": "code", "execution_count": 5, "id": "31ff39e0", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "image/svg+xml": [ "" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from shapely.geometry import Point, LineString\n", "\n", "point1 = Point(2.2, 4.2)\n", "point2 = Point(7.2, -25.1)\n", "point3 = Point(9.26, -2.456)\n", "\n", "line = LineString([point1, point2, point3])\n", "line_from_tuples = LineString([(2.2, 4.2), (7.2, -25.1), (9.26, -2.456)])\n", "line" ] }, { "cell_type": "markdown", "id": "b03324f1-e590-4be0-bd1d-a3a2acfb0f65", "metadata": {}, "source": [ "_**Figure 6.2**. ADD PROPER FIGURE CAPTION!._" ] }, { "cell_type": "code", "execution_count": 6, "id": "4e5da9c8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "LINESTRING (2.2 4.2, 7.2 -25.1, 9.26 -2.456)\n" ] } ], "source": [ "print(line)" ] }, { "cell_type": "markdown", "id": "2fe4dc61", "metadata": {}, "source": [ "As we can see from above, the WKT representation of the `line` -variable constitutes of multiple coordinate-pairs. `LineString` -objects have many useful built-in attributes and methods similarly as `Point` -objects. It is for instance possible to extract the coordinates, calculate the length of the `LineString`, find out the centroid of the line, create points along the line at specific distance, calculate the closest distance from a line to specified Point, or simplify the geometry. A full list of functionalities can be read from `shapely` documentation [^shapely]. Most of these functionalities are directly implemented in `geopandas` (see next chapter), hence you very seldom need to parse these information directly from the `shapely` geometries yourself. However, here we go through a few of them for reference. We can extract the coordinates of a LineString similarly as with `Point`:" ] }, { "cell_type": "code", "execution_count": 7, "id": "bb87a20c", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "[(2.2, 4.2), (7.2, -25.1), (9.26, -2.456)]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(line.coords)" ] }, { "cell_type": "markdown", "id": "757ebec8", "metadata": {}, "source": [ "As a result, we have a list of coordinate tuples (x,y) inside a list. If you need to access all `x` -coordinates or all `y` -coordinates of the line, you can do it directly using the `xy` attribute: " ] }, { "cell_type": "code", "execution_count": 8, "id": "4821c6d2", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2.2, 7.2, 9.26]\n", "[4.2, -25.1, -2.456]\n" ] } ], "source": [ "xcoords = list(line.xy[0])\n", "ycoords = list(line.xy[1])\n", "\n", "print(xcoords)\n", "print(ycoords)" ] }, { "cell_type": "markdown", "id": "d6c5ad83", "metadata": {}, "source": [ "It is possible to retrieve specific attributes such as `length` of the line and center of the line (`centroid`) straight from the `LineString` object itself:" ] }, { "cell_type": "code", "execution_count": 9, "id": "3de3f95a", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Length of our line: 52.46 units\n", "Centroid: POINT (6.229961354035622 -11.892411157572392)\n" ] } ], "source": [ "length = line.length\n", "centroid = line.centroid\n", "print(f\"Length of our line: {length:.2f} units\")\n", "print(f\"Centroid: {centroid}\")" ] }, { "cell_type": "markdown", "id": "c3d8402d", "metadata": {}, "source": [ "As you can see, the centroid of the line is again a Shapely Point object. In practice, you would rarely access these attributes directly from individual `shapely` geometries, but we can do the same things for a set of geometries at once using `geopandas`. " ] }, { "cell_type": "markdown", "id": "28e421a1", "metadata": {}, "source": [ "### Creating Polygon geometries\n", "\n", "Creating a `Polygon` -object continues the same logic of creating `Point` and `LineString` objects. A `Polygon` can be created by passing a list of `Point` objects or a list of coordinate-tuples as input for the `Polygon` class. Polygon needs at least three coordinate-tuples to form a surface. In the following, we use the same points from the earlier `LineString` example to create a `Polygon`." ] }, { "cell_type": "code", "execution_count": 10, "id": "4969848f", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "image/svg+xml": [ "" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from shapely.geometry import Polygon\n", "\n", "poly = Polygon([point1, point2, point3])\n", "poly" ] }, { "cell_type": "markdown", "id": "8503ab35-6437-47a3-85ce-2f375b6c1a5e", "metadata": {}, "source": [ "_**Figure 6.3**. ADD PROPER FIGURE CAPTION!._" ] }, { "cell_type": "code", "execution_count": 11, "id": "3be334d8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "POLYGON ((2.2 4.2, 7.2 -25.1, 9.26 -2.456, 2.2 4.2))\n" ] } ], "source": [ "print(poly)" ] }, { "cell_type": "markdown", "id": "5de30468", "metadata": {}, "source": [ "Notice that the `Polygon` WKT representation has double parentheses around the coordinates (i.e. `POLYGON (())` ). This is because Polygon can also have holes inside of it. A `Polygon` is constructed from *exterior* ring and and optiona *interior* ring, that can be used to represent a hole in the polygon. You can get more information about the `Polygon` object by running `help(poly)` of from the [shapely online documentation](https://shapely.readthedocs.io/en/stable/manual.html?highlight=Polygon#Polygon) [^polygon]. Here is a simplified extract from the output of `help(Polygon)`:\n", "\n", "```\n", "class Polygon(shapely.geometry.base.BaseGeometry)\n", " | Polygon(shell=None, holes=None)\n", " | \n", " | A two-dimensional figure bounded by a linear ring\n", " | \n", " | A polygon has a non-zero area. It may have one or more negative-space\n", " | \"holes\" which are also bounded by linear rings. If any rings cross each\n", " | other, the feature is invalid and operations on it may fail.\n", " | \n", " | Attributes\n", " | ----------\n", " | exterior : LinearRing\n", " | The ring which bounds the positive space of the polygon.\n", " | interiors : sequence\n", " | A sequence of rings which bound all existing holes.\n", " | \n", " | Parameters\n", " | ----------\n", " | shell : sequence\n", " | A sequence of (x, y [,z]) numeric coordinate pairs or triples.\n", " | Also can be a sequence of Point objects.\n", " | holes : sequence\n", " | A sequence of objects which satisfy the same requirements as the\n", " | shell parameters above\n", "```" ] }, { "cell_type": "markdown", "id": "6c063ab5", "metadata": {}, "source": [ "If we want to create a `Polygon` with a hole, we can do this by using parameters `shell` for the exterior and `holes` for the interiors. \n", "\n", "Let's see how we can create a `Polygon` with a hole in it. Notice, that because a `Polygon` can have multiple holes, the `hole_coords` variable below contains nested square brackets (`[[ ]]`), which is due to the possibility of having multiple holes in a single `Polygon`. First, let's define the coordinates for the exterior and interior rings." ] }, { "cell_type": "code", "execution_count": 12, "id": "47cd3e68", "metadata": {}, "outputs": [], "source": [ "# Define the exterior\n", "exterior = [(-180, 90), (-180, -90), (180, -90), (180, 90)]\n", "\n", "# Define the hole\n", "hole = [[(-170, 80), (-170, -80), (170, -80), (170, 80)]]" ] }, { "cell_type": "markdown", "id": "7906660f", "metadata": {}, "source": [ "Create the polygon with and without the hole:" ] }, { "cell_type": "code", "execution_count": 13, "id": "79f577d5", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "" ], "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "poly_without_hole = Polygon(shell=exterior)\n", "poly_without_hole" ] }, { "cell_type": "markdown", "id": "aad2de60-c895-4bc6-9aed-021b40a896f5", "metadata": {}, "source": [ "_**Figure 6.4**. ADD PROPER FIGURE CAPTION!._" ] }, { "cell_type": "code", "execution_count": 14, "id": "61b12644", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "" ], "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "poly_with_hole = Polygon(shell=exterior, holes=hole)\n", "poly_with_hole" ] }, { "cell_type": "markdown", "id": "7da4a6be", "metadata": {}, "source": [ "_**Figure 6.5**. ADD PROPER FIGURE CAPTION!._\n", "\n", "Let's also check how the WKT representation of the polygon looks like." ] }, { "cell_type": "code", "execution_count": 15, "id": "67c2bdd4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "POLYGON ((-180 90, -180 -90, 180 -90, 180 90, -180 90), (-170 80, -170 -80, 170 -80, 170 80, -170 80))\n" ] } ], "source": [ "print(poly_with_hole)" ] }, { "cell_type": "markdown", "id": "95ba9743", "metadata": {}, "source": [ "As we can see the `Polygon` has now two different tuples of coordinates. The first one represents the outer ring and the second one represents the inner ring, i.e. the hole.\n", "\n", "\n", "There are many useful attributes and methods related to shapely `Polygon`, such as `area`, `centroid`, `bounding box`, `exterior`, and `exterior-length`. For a full list, the `shapely` documentation [^shapely]. Same attributes and methods are also available when working with polygon data in `geopandas`. Here are a couple of polygon attributes that are often useful when doing geographic data analysis." ] }, { "cell_type": "code", "execution_count": 16, "id": "be96e5a7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Polygon centroid: POINT (6.22 -7.785333333333334)\n", "Polygon Area: 86.789\n", "Polygon Bounding Box: (2.2, -25.1, 9.26, 4.2)\n", "Polygon Exterior: LINEARRING (2.2 4.2, 7.2 -25.1, 9.26 -2.456, 2.2 4.2)\n", "Polygon Exterior Length: 62.16395199996553\n" ] } ], "source": [ "print(\"Polygon centroid: \", poly.centroid)\n", "print(\"Polygon Area: \", poly.area)\n", "print(\"Polygon Bounding Box: \", poly.bounds)\n", "print(\"Polygon Exterior: \", poly.exterior)\n", "print(\"Polygon Exterior Length: \", poly.exterior.length)" ] }, { "cell_type": "markdown", "id": "b0b8f977", "metadata": {}, "source": [ "Notice, that the `length` and `area` information are presented here in decimal degrees because our input coordinates were passed as longitudes and latitudes. We can get this information in more sensible format (in meters and m2) when we start working with data in a projected coordinate system later in the book. \n", "\n", "Box polygons that represent the minimum bounding box of given coordinates are useful in many applications. `shapely.box` can be used for creating rectangular box polygons based on on minimum and maximum `x` and `y` coordinates that represent the coordinate information of the bottom-left and top-right corners of the rectangle. Here we will use `shapely.box` to re-create the same polygon exterior. " ] }, { "cell_type": "code", "execution_count": 17, "id": "6a863ed7", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "" ], "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from shapely.geometry import box\n", "\n", "min_x, min_y = -180, -90\n", "max_x, max_y = 180, 90\n", "box_poly = box(minx=min_x, miny=min_y, maxx=max_x, maxy=max_y)\n", "box_poly" ] }, { "cell_type": "markdown", "id": "3ca04e09-8bda-4ee3-adaa-be9221a10d6e", "metadata": {}, "source": [ "_**Figure 6.6**. ADD PROPER FIGURE CAPTION!._" ] }, { "cell_type": "code", "execution_count": 18, "id": "560d3f75", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "POLYGON ((180 -90, 180 90, -180 90, -180 -90, 180 -90))\n" ] } ], "source": [ "print(box_poly)" ] }, { "cell_type": "markdown", "id": "30855618", "metadata": {}, "source": [ "In practice, the `box` function is quite useful for example when you want to select geometries from a specific area of interest. In these cases, you only need to find out the coordinates of two points on the map to be able create the bounding box polygon. " ] }, { "cell_type": "markdown", "id": "4e5932b2", "metadata": {}, "source": [ "### Creating MultiPoint, MultiLineString and MultiPolygon geometries" ] }, { "cell_type": "markdown", "id": "1355f971", "metadata": {}, "source": [ "Creating a collection of `Point`, `LineString` or `Polygon` objects is very straightforward now as you have seen how to create the basic geometric objects. In the `Multi` -versions of these geometries, you just pass a list of points, lines or polygons to the `MultiPoint`, `MultiLineString` or `MultiPolygon` constructors as shown below:" ] }, { "cell_type": "code", "execution_count": 19, "id": "68841a11", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "" ], "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from shapely.geometry import MultiPoint, MultiLineString, MultiPolygon\n", "\n", "multipoint = MultiPoint([Point(2, 2), Point(3, 3)])\n", "multipoint" ] }, { "cell_type": "markdown", "id": "0bea0c27-7cd8-487f-a249-c01917abb89a", "metadata": {}, "source": [ "_**Figure 6.7**. ADD PROPER FIGURE CAPTION!._" ] }, { "cell_type": "code", "execution_count": 20, "id": "2f9f26b8", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "" ], "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "multiline = MultiLineString(\n", " [LineString([(2, 2), (3, 3)]), LineString([(4, 3), (6, 4)])]\n", ")\n", "multiline" ] }, { "cell_type": "markdown", "id": "cab3ca6e-22f2-4749-b0e6-eeb19f4d0467", "metadata": {}, "source": [ "_**Figure 6.8**. ADD PROPER FIGURE CAPTION!._" ] }, { "cell_type": "code", "execution_count": 21, "id": "a69a2178", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "" ], "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "multipoly = MultiPolygon(\n", " [Polygon([(0, 0), (0, 4), (4, 4)]), Polygon([(6, 6), (6, 12), (12, 12)])]\n", ")\n", "multipoly" ] }, { "cell_type": "markdown", "id": "2880b746-e98d-469f-af7c-9dd2bd202a3e", "metadata": {}, "source": [ "_**Figure 6.9**. ADD PROPER FIGURE CAPTION!._" ] }, { "cell_type": "markdown", "id": "b33c96be", "metadata": {}, "source": [ "#### Question 6.1\n", "\n", "Create these shapes using Shapely!\n", "\n", "- **Triangle** \n", "- **Square** \n", "- **Circle**" ] }, { "cell_type": "code", "execution_count": 22, "id": "3e3eaf81", "metadata": { "tags": [ "remove_cell" ] }, "outputs": [], "source": [ "# Use this cell to enter your solution." ] }, { "cell_type": "code", "execution_count": 23, "id": "e4276616", "metadata": { "tags": [ "hide_cell", "remove_book_cell" ] }, "outputs": [ { "data": { "image/svg+xml": [ "" ], "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Solution\n", "\n", "# Triangle\n", "Polygon([(0, 0), (2, 4), (4, 0)])\n", "\n", "# Square\n", "Polygon([(0, 0), (0, 4), (4, 4), (4, 0)])\n", "\n", "# Circle (using a buffer around a point)\n", "point = Point((0, 0))\n", "point.buffer(1)" ] }, { "cell_type": "markdown", "id": "b0b1d104", "metadata": {}, "source": [ "## Footnotes\n", "\n", "[^GEOS]: \n", "[^OGC_sfa]: \n", "[^polygon]: \n", "[^PostGIS]: \n", "[^QGIS]: \n", "[^shapely]: \n", "[^WKT]: " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.9" } }, "nbformat": 4, "nbformat_minor": 5 }