Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,8 @@
# Toy Problems

Toy problems for learning spatial data science

## Index

1. Mock Data
2. Deterministic Interpolation using IDW
157 changes: 157 additions & 0 deletions idw/01_math.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,157 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deterministic Interpolation using Inverse Distance Weighted\n",
"\n",
"*Spatial interpolation is the process of estimating unknown values across space by making use of the values that we do know and using the assumptions that we can make based on Tobler's First Law of Geography. It may also sometimes be referred to as spatial predicition. Inverse Distance Weighted (IDW) interpolation is a deterministic interpolation method (meaning that the outputs for a given set of inputs will always be the same) that makes use of a fundamental mathematical concept in GIScience known as the Weighted Linear Combination (WLC).*\n",
"\n",
"Last updated: Nov 28, 2023"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The Math behind IDW\n",
"\n",
"IDW is ultimately built off of the WLC, where the weight is defined as inverse distance, but in order to understand the process as a deeper level of understanding, we need to understand the mathematical definitions behind the method, which means we need to familiarize ourselves with the notation that is used for writing out mathematical formulas (called sigma notation), and see what the mean and weighted mean (WLC) look like with this notation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sigma Notation\n",
"\n",
"Sigma notation is used to shorten up the amount of information that we need to write for the process of summation (adding everything in a series together) by making use of the Greek letter sigma ($\\Sigma$).\n",
"\n",
"In other words, if I had a series of values from 1 to 10 and I wanted to add them up, rather than actually writing out all of the values I could use sigma notation to show the same thing without writing it all out.\n",
"\n",
"$$sum = 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10$$\n",
"\n",
"$$sum = \\sum \\limits _{i=1} ^{10} i $$\n",
"\n",
"Let's break down this formula above even further. It can be read as: the sum of i as i goes from 1 to 10 is equal to $sum$. So the value below the sigma indicates our starting value, the value above the sigma indicates our ending value, and the expression to the right of the sigma indicates what we are adding to our sum as we iterate through the different values of $i$. In this case, since we just want the sum of values from 1 to 10, we just use an expession of $i$ itself, since we just want to add every value of $i$ together.\n",
"\n",
"With this in mind, we can then confidently say that:\n",
"\n",
"$$\\sum \\limits _{i=1} ^{10} i = 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 = 55$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Mean\n",
"\n",
"Now that we understand what sigma notation is and how we can use it, we can also see how to use it to represent the mean of a series.\n",
"\n",
"If we want to take the mean of a series called $x$ (which could be anything, just like out set of values from 1 to 10 in the previous example), we can access each value of $x$ through an index called $i$. So $x_i$ would indicate the value at index $i$ in $x$. In order to loop through all the indexes we will also need to know the total number of values in $x$, also referred to as the length, which we will call $n$.\n",
"\n",
"If we put this all together, we can write this as:\n",
"\n",
"$$\\bar{x} = \\frac{\\sum \\limits _{i=1} ^{n} x_{i}}{n}$$\n",
"\n",
"This expression is read as: the mean of series $x$ is the sum of the value $x_{i}$ as $i$ goes from 1 to $n$, over $n$.\n",
"\n",
"Intuitively, this makes sense, since the mean of a series is the sum of the values divided by the number of values in the series.\n",
"\n",
"This following also shows the different equivalent ways of writing this:\n",
"\n",
"$$\\bar{x} = \\frac{\\sum \\limits _{i=1} ^{n} x_{i}}{n} = \\frac{x_{1} + x_{2} + ... + x_{n}}{n}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Weighted Linear Combination (WLC)\n",
"\n",
"WLC is simply an extension of a simple mean that adds in weights to the equation. In it's mist simple case, it is the same as a weighted mean. In addition, instead of using the number of values in $x$ as the denominator, we will use the sum of $w$ as the denominator.\n",
"\n",
"If we have a series of values $x$ and a corresponding series of weights $w$, where $w_{i}$ is the weight of value $x_{i}$, we can compute the WLC/weighted mean as:\n",
"\n",
"$$\\bar{x} = \\frac{\\sum \\limits _{i=1} ^{n} w_{i} * x_{i}}{\\sum \\limits _{i=1} ^{n} w_{i}}$$\n",
"\n",
"This expression is read as: the weighted mean of series $x$ is the sum of the value $x_{i}$ multiplied by the weight $w_{i}$ as $i$ goes from 1 to $n$, over the sum of the value $w_{i}$ as $i$ goes from 1 to $n$.\n",
"\n",
"This following also shows the different equivalent ways of writing this:\n",
"\n",
"$$\\bar{x} = \\frac{\\sum \\limits _{i=1} ^{n} w_{i} * x_{i}}{\\sum \\limits _{i=1} ^{n} w_{i}} = \\frac{x_{1} * w_{1} + x_{2} * w_{2} + ... + x_{n} * w_{n}}{w_{1} + w_{2} + ... + w_{n}}$$\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Distance\n",
"\n",
"The fundamental change that is made to transform WLC into IDW is the use of distance as a weight. In order to do this, we first need to define and understand distance itself. There are many different types of distance that we can use to measure how close two points are (e.g., Euclidean, Manhattan, Haversine, etc...) and for the purpose of this demonstration, these mathematical defintions of the verious ways distance can be calculated will not be covered in depth.\n",
"\n",
"So if we want to find the distance between points 0 and $i$, we would define this as:\n",
"\n",
"$$d_{0, i}$$\n",
"\n",
"We can also say that:\n",
"\n",
"$$d_{0, i} = d(x_{0}, X)$$\n",
"\n",
"where $x_{0}$ is the target point and $X$ is a set of tuples containing values for the latitude, longitude, and a value.\n",
"\n",
"Mathematically, this could be written as $x_{i} = \\{lat_{i}, lon_{i}, val_{i} \\}$ where $x_{i}$ is a point in $X$.\n",
"\n",
"All this shows is how we can depict distance between two points using their indexes using this specific notation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Inverse Distance\n",
"\n",
"If we were to use the raw distance as a weight, higher distance values would yield greater weights (i.e., more similarity between the values), but we know that the opposite is actually the typically the case. So in order to model that smaller distances should yield higher weights, we can use inverse distance. All this means is that we can use 1 divided by the distance to produce inverse distance, which will give us weights that reflect the notion that closer values will be more similar.\n",
"\n",
"Mathematically, we can define the inverse distance as follows:\n",
"\n",
"$$inverse \\: distance = \\frac{1}{d_{0, i}}$$\n",
"\n",
"Typically, we want shorten this as follows (since a negative exponent simply flips the numerator and denominator):\n",
"\n",
"$$\\frac{1}{d_{0, i}} = d_{0, i}^{- \\alpha}$$\n",
"\n",
"where alpha ($\\alpha$) is equal to 1.\n",
"\n",
"In this notation, we also introduce an exponent called alpha ($\\alpha$) which is used to model the fact that distance does not follow a linear decay, but rather is exponential. Simply put, the effect of distance on our values follows an exponential trend rather than a linear trend, so we can use $\\alpha$ to model what this exponential decay looks like. Typically speaking, $\\alpha$ is between 1 and 3, but 1.5 or 2 are the most common. As $\\alpha$ increases, more emphasis is placed on closer points, and vice versa."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Inverse Distance Weighted\n",
"\n",
"Now that we know the basics of WLC and inverse distance, we can combine them to get the math behind IDW.\n",
"\n",
"For a value that we want to predict called $x_{val, 0}$, we can say the following:\n",
"\n",
"$$x_{val, 0} = \\frac{\\sum \\limits _{i=1} ^{n} x_{val, i} * d_{0, i}^{- \\alpha}}{\\sum \\limits _{i=1} ^{n} d_{0, i}^{- \\alpha}}$$\n",
"\n",
"This can be read as: the value of $x_{val, 0}$ (i.e., the value of $x_{0}$) is equal to the sum of the value $x_{val, i}$ multiplied by the inverse distance from $x_{0}$ to $x_{i}$ as $i$ goes from 1 to $n$, over the sum of the inverse distance from $x_{0}$ to $x_{i}$ as $i$ goes from 1 to $n$.\n",
"\n",
"This following also shows the different equivalent ways of writing this:\n",
"\n",
"$$x_{val, 0} = \\frac{\\sum \\limits _{i=1} ^{n} x_{val, i} * d_{0, i}^{- \\alpha}}{\\sum \\limits _{i=1} ^{n} d_{0, i}^{- \\alpha}} = \\frac{x_{1} * d_{0, 1}^{- \\alpha} + x_{2} * d_{0, 2}^{- \\alpha} + ... + x_{n} * d_{0, n}^{- \\alpha}}{d_{0, 1}^{- \\alpha} + d_{0, 2}^{- \\alpha} + ... + d_{0, n}^{- \\alpha}}$$"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
495 changes: 495 additions & 0 deletions idw/02_basic_idw.ipynb

Large diffs are not rendered by default.

Loading