I always say this whenever the topic of Kalman Filters comes up:
If you're learning the Kalman Filter in isolation, you're kind of learning it backwards and missing out on huge "aha" moments that the surrounding theory can unlock.
To truly understand the Kalman Filter, you need to study Least Squares (aka linear regression), then recursive Least Squares, then the Information Filter (which is a different formulation of the KF). Then you'll realize the KF is just recursive Least Squares reformulated in a way to prioritize efficiency in the update step.
This PDF gives a concise overview:
[1] http://ais.informatik.uni-freiburg.de/teaching/ws13/mapping/...
I appreciate you taking the time to help people understand higher level concepts.
From a different perspective... I have no traditional background in mathematics or physics. I do not understand the first line of the pdf you posted nor do I understand the process for obtaining the context to understand it.
But I have intellectual curiosity. So the best path forward for my understanding is one that can maintain that curiosity while making progress. I can reread Six (Not So) Easy Pieces and not understand any of it and still find value in it. I can play with Arnold's cat and, slowly, through no scientific rigor other than the curiosity of the naked ape, I can experience these concepts that have traditionally been behind gates of context I do not possess keys to.
http://gerdbreitenbach.de/arnold_cat/cat.html
I took a look at that line and will give you my thoughts. It is a lot of information in a single line.
The first line is a Gaussian in some extended form. The lowercase p strongly indicates a probability density function. We are faced with moments, but I don't know if they are centralized. Classically we can integrate x·p to get \mu, and x^2·p, subtracting \mu^2, to get a \sigma for a Gaussian. So \mu and \Sigma should be constant, yet we have a strong indication that they are more than scalars (single numbers) and that the capitalization indicates a matrix form. Therefore \mu could also be a vector of similar dimension. The \operatorname{det} forces things back to a scalar; the 2\pi is not affected by det at all and could be moved outside. Whatever \Sigma is, it must be invertible, or p is not well defined, because of the det. The uppercase T is a transpose, here of a vector, which multiplied with a matrix-vector product gives a dot product, therefore a scalar again. Taking a close look at how the exponent -1 is written for \Sigma, we could assume \Sigma to be some invertible square matrix with real values, I guess.
This leads to the conclusion that p is a scalar, yet \mu and x are vectors of length N and \Sigma is an NxN matrix (a guess).
What is omitted is what type of numbers we are dealing with and what vector space we are playing in. We may have capital letters for matrices going forward in this type of notation. The notation doesn't care whether a letter is a scalar or of higher dimension.
So we have a multidimensional probability density function based on a Gaussian. For some reason one moment is a vector and another moment is a matrix.
* Imagine all that thinking from a single line. It is highly overrated, trust me.
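(For anyone following along: the line being picked apart is presumably the standard multivariate Gaussian density,

    p(x) = \frac{1}{\sqrt{(2\pi)^N \det(\Sigma)}} \exp\!\left( -\tfrac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu) \right)

with x and \mu vectors in R^N and \Sigma an N x N covariance matrix, which matches the deconstruction above.)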
[flagged]
> Just stop whining about it in public.
I'm curious if this is how my reply came across?
Not at all. I share your sentiment. Many topics are beyond my intellectual grasp at this time, but I’m always hopeful that my curiosity will lessen their obscurity given time and persistence.
Not at all. Your comment was perfectly fine! And that reply was way out of line...
With no mathematical rigor there is no mathematical understanding. You are robbing yourself, as the concepts are meaningless without the context.
Truly appreciate the power of linear approximations by going through algebra, appreciate the tricks of calculus, marvel at the inherent tradeoffs of knowledge with estimator theory, and see the joy of the central limit theorem being true. All of this knowledge is free, and much more interesting than a formal restatement of "it was not supposed to rain, but I see clouds outside, I guess I'll expect light rain instead of a big thunderstorm".
> With no mathematical rigor there is no mathematical understanding
While I appreciate rigor to really know the deep details, it is not only not a requirement for understanding, but a hurdle. A terrible, insurmountable hurdle.
To first have understanding, I need some kind of intuition. Some explanation that makes sense easily. That explanation is, by the way, typically what the inventor or discoverer had to begin with, before nailing it down with rigor.
> With no mathematical rigor there is no mathematical understanding. You are robbing yourself, as the concepts are meaningless without the context.
I will think more about this, but I'm not sure I agree. I have enjoyed reading Feynman talk about twins and one going on a supersonic vacation without understanding the math. Verisimilitude allows a modeling of understanding with a scalar representation of scientific knowledge, so why not?
Of course I would like to understand the math in its purest form, just the same as I wanted to read 1Q84 in Japanese to be able to fully experience it in its purest form, but my life isn't structured in a way where that is realistic, even if the knowledge of the Japanese language is free.
> Truly appreciate the power of linear approximations by going through algebra, appreciate the tricks of calculus, marvel at the inherent tradeoffs of knowledge with estimator theory, and see the joy of the central limit theorem being true.
I can't even FOIL, so the journey toward understanding can feel unattainable with the time and resources I have. This may absolutely be a limiting belief, but the idea of knowledge being free ignores the time cost for those of us exploring these topics outside of an academic or professional setting.
Indeed everything has an opportunity cost, and every life has its own priorities.
Since you mention Feynman, I would like to observe that many expositors who target the lay audience have the skill of making the audience believe that they have comprehended(1) something of an intellectual world that they have no technical grounding to truly comprehend(2). In my view these are two distinct types of comprehension/understanding. So long as the audience is clear on which type of understanding they are getting, and is not wasting time unwittingly pursuing one type at the expense of the other then I see no harm.
There is a risk, however, that the pop expositors will put you in a headspace where, even if you are faced with accessible but type-2 material, you will not be familiar with what really constitutes understanding. As a mature-age student it took me quite a few years of maths exams to switch from 1 to 2. Nowadays I am more comfortable admitting that I don't understand some piece of math (for that is the first step on the path to learning) than being satisfied with a pop-expository gist.
I've thought a lot about this exact topic. You need both to do well.
You need handwavy and vague versions of things to understand the shape of them and to build intuition.
Then you need to test the intuition and build up levels of rigor.
Especially in the context of the Kalman Filter. I just helped a bunch of middle school students build a system for field localization and position tracking. They don't have all kinds of knowledge: they don't have linear algebra, or a real understanding of what it means for something to be Gaussian, and they can't juggle a bazillion variables. They do understand that their estimates and the readings coming off their sensors have different qualities depending on circumstances, and that the gain needs to vary. They'll never hit the optimum parameters.
But: their system works. They understand how it works (even if they don't know how to quantify how well it works). They understand how changing parameters changes its behavior. When they learn tracking filters and control by root locus and all kinds of things later, they'll have an edge in understanding what things mean and how it actually works. I expect their intuition will give them an easier time in tackling harder problems.
Conversely, I've encountered a bunch of students who know what "multimodal" means but couldn't name a single example in the real world of such a thing. I would argue that they don't even know what they're talking about, even if they can calculate a mixture coefficient under ideal conditions.
There's a lot of fluffy language here that isn't saying much.
Linear algebra is not something that takes years of patient study to gain basic competency in. It has almost no prerequisites and can be learned well enough to understand least squares in a focused weekend or two.
> Truly appreciate the power of linear approximations by going through algebra, appreciate the tricks of calculus, marvel at the inherent tradeoffs of knowledge with estimator theory, and see the joy of the central limit theorem being true.
None of these are needed, or even useful, for understanding the Kalman filter.
I think the easiest way depends on your background knowledge. If you understand linearity of the Gaussian distribution and the Bayesian posterior of Gaussians, the Kalman filter is almost trivial.
For the 1D case we get the prior from the linear prediction X'1 = X0*a + b, for which mean(X'1) = mean(X0)*a + b and var(X'1) = var(X0)*a^2, where a and b give the assumed dynamics.
The posterior for Gaussians is the precision-weighted mean of the prior and the observation: X1 = (1 - K)*X'1 + K*Y, where the weight K = (1/var(Y))/(1/var(X'1) + 1/var(Y)) and Y is the Gaussian observation; the posterior variance is var(X1) = (1 - K)*var(X'1).
Iterating this gives the Kalman filter. Generalizing this to multiple dimensions is straightforward given the linearity of multidimensional Gaussians.
This is what made it really simple to me (after I understood it), but things like the linearity of (multidimensional) Gaussians and the posterior of Gaussians are probably not so simple in themselves.
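To make that recursion concrete, here is a minimal scalar sketch in Python; the function name and the added process-noise variance q are my own illustrative choices, not part of the comment above:

    # A minimal 1D Kalman filter built from the prediction/posterior steps above.
    # a, b: assumed linear dynamics; q: process-noise variance (an addition, so the
    # filter keeps trusting new observations); r: measurement-noise variance of ys.
    def kalman_1d(ys, a=1.0, b=0.0, q=1e-3, r=1.0, mean0=0.0, var0=1.0):
        mean, var = mean0, var0
        out = []
        for y in ys:
            # Predict: push the previous estimate through the linear dynamics.
            mean_pred = a * mean + b
            var_pred = a * a * var + q
            # Update: precision-weighted average of prediction and observation.
            k = (1.0 / r) / (1.0 / var_pred + 1.0 / r)  # == var_pred / (var_pred + r)
            mean = (1.0 - k) * mean_pred + k * y
            var = (1.0 - k) * var_pred
            out.append((mean, var))
        return out

    # Example: smoothing a few noisy readings of a roughly constant signal.
    print(kalman_1d([1.1, 0.9, 1.05, 0.97], q=1e-4, r=0.25))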
What you write is simple. But your scalar model suppresses the common situation of a measurement matrix with output dimension less than the state dimension. Exactly how the Kalman gain formula works in that setting I'm less clear on. Beyond that, additional insight is needed when the measurement function is non-linear and K = P_xy P_y^{-1} as in the UKF. At least I get stuck there, having little formal statistics background.
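(For reference, the general linear case handles exactly that: with prior covariance P, an m x n measurement matrix H mapping the n-dimensional state to an m-dimensional observation (m can be smaller than n), and measurement-noise covariance R, the standard update is

    K = P H^{T} (H P H^{T} + R)^{-1}
    \hat{x} \leftarrow \hat{x} + K (y - H \hat{x})
    P \leftarrow (I - K H) P

so K is n x m and only the observed directions of the state get corrected. The UKF form K = P_{xy} P_{y}^{-1} reduces to this in the linear case, since P_{xy} = P H^{T} and P_{y} = H P H^{T} + R.)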
Indeed. I always recommend Time Series Analysis by Hamilton for this reason. KF comes up as a natural way to solve linear models.
You can keep telling this, but this “esoteric” math is often too much for the people actually implementing the filters.
It's bread-and-butter math for physics, engineering (traditional engineering), geophysics, signal processing, etc.
Why would anyone have people implementing Kalman filters who found the math behind them "esoteric"?
Back in the day, in my wet-behind-the-ears phase, the first time I implemented a Kalman Filter from scratch, the application was magnetic heading normalisation on mag data from an airborne geophysical survey: 3-axis nanotesla sensor inputs on each wing and tail boom, requiring a per-survey calibration pattern to normalise the readings over a fixed location regardless of heading.
This was buried in a suite requiring calculation of the geomagnetic reference field (a big parameterised spherical harmonic equation), upward, downward, and reduction-to-pole continuations of magnetic field equations, raw GPS post-processing corrections, etc.
where "etc." goes on for a shelf full of books with a dense chunk of applied mathematics.
FWIW, I think I understand Kalman filters quite well, but the linked PDF is hard for me to follow, and I'd really struggle to understand it if I didn't already know what it's saying.
I think the lesson there is that the Kalman filter is simpler in the "information form" where the Gaussian distribution is parameterized using the inverse of the covariance matrix.
If you don't already know what that means, you likely won't get much out of that. I think the more intuitive way is to first understand the 1D case, where the filter result is a weighted average of the prediction and the observation and the weights are the multiplicative inverses of the respective variances (the less uncertainty/"imprecision", the more weight you give).
In the multidimensional case the inverse is the matrix inverse but the logic is the same.
More generally the idea is to statistically predict the next step from the previous and then balance out the prediction and the noisy observation based on the confidence you have in each. This intuition covers all Bayesian filters. The Kalman filter is a special case of the Bayesian filter where the prediction is linear and all uncertainties are Gaussian, although it was understood this way only well after Kalman invented the eponymous filter.
Not sure how intuitive that is either, but don't be too worried if these things aren't obvious, because they aren't until you know all the previous steps. To implement or use a Kalman filter you don't really need this statistical understanding.
If you prefer to understand things more "procedurally", check out the particle filter. It's conceptually the Bayesian filter but doesn't require the mathematical analysis. That's the way I really understood the underlying logic.
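A rough Python sketch of that procedural view, as a bootstrap particle filter for the same kind of 1D model (names and noise levels are illustrative; under these linear-Gaussian assumptions it approximates what the Kalman filter computes exactly):

    import math
    import random

    def particle_filter_1d(ys, n=1000, a=1.0, b=0.0, q_std=0.1, r_std=0.5):
        # Start with particles spread over plausible initial states.
        particles = [random.gauss(0.0, 1.0) for _ in range(n)]
        estimates = []
        for y in ys:
            # Predict: push every particle through the dynamics, adding process noise.
            particles = [a * p + b + random.gauss(0.0, q_std) for p in particles]
            # Weight: how well does each particle explain the observation?
            weights = [math.exp(-0.5 * ((y - p) / r_std) ** 2) for p in particles]
            total = sum(weights)
            weights = [w / total for w in weights]
            # Estimate: weighted mean of the particles.
            estimates.append(sum(w * p for w, p in zip(weights, particles)))
            # Resample: duplicate likely particles, drop unlikely ones.
            particles = random.choices(particles, weights=weights, k=n)
        return estimates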
I understood it as re-estimation with a dynamic weight factor based on the perceived error. I know it's more complex than that, but this simplified version was what I needed at one point, and it worked.
I found this article invaluable for understanding the Kalman filter from a Bayesian perspective:
Meinhold, Richard J., and Nozer D. Singpurwalla. 1983. "Understanding the Kalman Filter." American Statistician 37 (May): 123–27.
You are probably right, but many folks following your advice will give up halfway through and never get to KF.
This is more or less the approach that is taken by Dan Simon's "Optimal State Estimation" book that I came here to recommend: https://academic.csuohio.edu/simon-daniel/state-estimation/ All the prerequisites are covered prior to introducing the Kalman filter in chapter 5. Although Simon does not go through the information filter before introducing the Kalman filter, he discusses it later.
However, to understand recursive least squares, in particular the covariance matrix update you're going to need a firm grounding in probability and statistics. Simon makes the case that probability theory is a less strict pre-requisite than multiple-input-multiple-output (state space) linear systems theory (for which I can recommend Chen's "Linear System Theory and Design").
So I would argue that to understand Kalman filters you need to know state space systems modelling, both continuous time and discrete time discretisation methods (this provides the dynamics that describe the time-update step), plus you need to know enough multivariate statistics to understand how the Kalman filter propagates the gaussian random variables (i.e. the Kalman state) through the dynamics and back and forth through the measurement matrices.
That's the way one should learn any subject, be it physics, chemistry, math, etc. However, textbooks don't follow that approach.
I strongly recommend Elements of Physics by Millikan and Gale for anyone who wants to learn pre-quantum physics this way.
Are you me? I feel like I say this every time too! Perfectly captured.
Every time that one comes up, this one comes up https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Pyt... (and vice versa)
That one is one of my favorite resources on Kalman filters. Deriving it from Bayesian principles makes it much more intuitive. It also makes adapting or modifying the filters easier to understand.
Using Jupyter notebooks is really great too.
Another resource: https://news.ycombinator.com/item?id=42755133
If Q and R are constant (as is usually the case), the gain quickly converges, such that the Kalman filter is just an exponential filter with a prediction step. For many people this is a lot easier to understand, and even matches how it is typically used, where Q and R are manually tuned until it “looks good” and never changed again. Moreover, there is just one gain to manually tune instead of multiple quantities Q and R.
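A quick way to see that convergence numerically, as a scalar Python sketch with illustrative q and r values:

    # Iterate the scalar variance/gain recursion with constant q and r;
    # the gain k settles to a fixed value after a handful of steps.
    q, r = 0.01, 1.0
    var = 1.0
    for step in range(10):
        var_pred = var + q                 # predict (dynamics a = 1 for simplicity)
        k = var_pred / (var_pred + r)      # Kalman gain
        var = (1.0 - k) * var_pred         # update
        print(step, round(k, 4))
    # Once k is effectively constant, x = (1 - k)*x_pred + k*y is just
    # exponential smoothing applied after the prediction step.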
This is what I have never really understood about Kalman Filters: how do you pick Q and R? Do you just adjust them until the result looks right? How does that end up working for anything not completely over-fit?
For example, if I'm tracking birds from video footage, I might choose a certain Q, but depending on the time of day the noise statistics might change. What do you do then?
As far as I am aware, there is no symbolic computing tool yet for probability distributions? For example, multiplying two multivariate Gaussian PDFs together and getting the covariance matrix out. Or defining all the ingredients for a Kalman filter (prediction model and observing process) and getting the necessary formulas out (as in sympy's lambdify).
Sympy can manipulate Gaussian distributions symbolically, but the Gaussian is more or less the only distribution that can be really symbolically manipulated.
Though I'm not sure Sympy can handle the conditional (Bayesian posterior) distribution needed for the Kalman filter.
In any case, you are better off working directly with the mean and variance (or covariance matrix) if you want to play around with the Kalman filter in Sympy.
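For what it's worth, here's a small SymPy sketch of that "work directly with the mean and variance" idea, deriving the 1D posterior symbolically (symbol names are illustrative):

    import sympy as sp

    x, m, y = sp.symbols('x m y', real=True)
    p, r = sp.symbols('p r', positive=True)   # prior and measurement variances

    prior = sp.exp(-(x - m)**2 / (2*p)) / sp.sqrt(2*sp.pi*p)
    lik   = sp.exp(-(x - y)**2 / (2*r)) / sp.sqrt(2*sp.pi*r)

    # The product of the two densities is (up to normalization) the posterior.
    # Its mode equals its mean, so solve d/dx log(product) = 0 for x.
    log_post = sp.log(prior * lik)
    post_mean = sp.solve(sp.simplify(sp.diff(log_post, x)), x)[0]
    print(sp.simplify(post_mean))             # (m*r + p*y)/(p + r)

    # Posterior variance is -1 over the second derivative of the log-density.
    post_var = sp.simplify(-1 / sp.diff(log_post, x, 2))
    print(post_var)                           # p*r/(p + r)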
The posterior is also a Gaussian, is it not?
Yes.
I'm pretty sure Wolfram Mathematica can do what you're looking for.
See:
https://reference.wolfram.com/language/howto/WorkWithStatist...
and:
https://reference.wolfram.com/language/ref/MultinormalDistri...
Related. Others?
Kalman filter from the ground up - https://news.ycombinator.com/item?id=37879715 - Oct 2023 (150 comments)
(also what's the best year to put in the title above?)
Anyone else watch Michel van Biezen's (with the bow tie) lectures on Kalman Filters while learning this topic?
- https://www.youtube.com/watch?v=CaCcOwJPytQ&list=PLX2gX-ftPV...
I like how he teaches so many different topics and subjects using nothing more than arithmetic. All computable math is reducible to arithmetic...
Something occurred to me a while back: Can we treat events that only have eyewitness testimony with a Kalman filter somehow in order to strengthen the evidential value of the observations after encoding it into vectors of some sort?
This would treat both lying and inaccuracy as "error"
I'm thinking of things like: reports of Phoenix lights or UFOs in general, ghosts, NDEs, and more prosaically, claims of rape
Only if you can make a linear model of those things…
Why does the model need to be linear?
“Kalman filter” usually refers to the “linear quadratic estimator”, which assumes a linear model in its derivation. This will impact the “predict” step at the very least, and I think also the way the uncertainty propagates. There are nonlinear estimators as well, though they usually have less-nice guarantees (e.g. the particle filter and the extended Kalman filter).
Edit: in fact, I see part three of the book in tfa is devoted to nonlinear Kalman filters. I suspect some of the crowd (myself included) just assumed we were talking about linear Kalman filters
Indeed the Extended Kalman Filter (EKF) and Unscented Kalman Filter (UKF) are both standard methods to deal with non-linear systems. They each provide a method of propagating uncertainty through nonlinear state and measurement functions. FWIW I would assume that EKF and UKF are within scope whenever Kalman Filtering is mentioned but I may be misinformed.
It’s probably just my bias making me think of linear systems — I’ve only used the nonlinear ones in coursework, certainly less familiar. Apologies if I misrepresented it.
The first example of tracking, is this the same thing as dead reckoning? I've always been confused on the term "tracking" since it is used a lot in common speech, but seems to mean some specific type of 'tracking'
Kind of.
"Tracking", here, means providing some kind of `f(time) -> space` API.
Dead reckoning is a mechanism for incorporating velocity and whatnot into a previously estimated position to estimate a new position (and is also one possible way to implement tracking, usually with compounding errors).
The Kalman filter example is better than just dead reckoning. For a simple example, imagine you're standing still but don't know exactly where. You have an API (like GPS) that can estimate your current position within some tolerance. If you're able to query that API repeatedly and the errors aren't correlated, you can pinpoint your location much more precisely.
Back to tracking with non-zero velocity, every new position estimate (e.g., from GPS) can be incorporated with all the information you've seen so far, adjusting your estimates of velocity, acceleration, and position and giving you a much more accurate current estimate but also better data for dead-reckoning estimates while you wait for your next external signal.
The technique (Kalman Filter) is pretty general. It's just merging all your noisy sources of information according to some ruleset (real-world physics being a common ruleset). You can tack on all sorts of other interesting information, like nearby wifi signals or whatever, and even very noisy signals can aggregate to give precise results.
Another application I threw it at once was estimating my true weight, glycogen reserves, ..., from a variety of noisy measurements. The sky's the limit. You just need multiple measurements and a rule for how they interact.
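As a concrete sketch of that kind of fusion, here is a tiny 1D constant-velocity Kalman filter over noisy position fixes (function name, noise levels, and parameters are illustrative, not taken from the comment above):

    import numpy as np

    def track_positions(fixes, dt=1.0, accel_std=0.5, gps_std=5.0):
        """Fuse noisy 1D position fixes into position/velocity estimates."""
        F = np.array([[1.0, dt], [0.0, 1.0]])             # constant-velocity dynamics
        H = np.array([[1.0, 0.0]])                        # we only observe position
        Q = accel_std**2 * np.array([[dt**4 / 4, dt**3 / 2],
                                     [dt**3 / 2, dt**2]]) # process noise (accel model)
        R = np.array([[gps_std**2]])                      # measurement noise
        x = np.zeros((2, 1))                              # [position, velocity]
        P = np.eye(2) * 100.0                             # start very uncertain
        out = []
        for z in fixes:
            # Predict: dead-reckon forward and grow the uncertainty.
            x = F @ x
            P = F @ P @ F.T + Q
            # Update: blend in the new fix according to the gain.
            y = np.array([[z]]) - H @ x                   # innovation
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)
            x = x + K @ y
            P = (np.eye(2) - K @ H) @ P
            out.append((float(x[0, 0]), float(x[1, 0])))
        return out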
This is a very educational and intuitive way of putting it, but to nitpick the Kalman filter is a very special case of this (it assumes a linear ruleset and Gaussian uncertainties in the sensor readings).
What you're describing is in general the Bayesian filter (or Bayesian smoothing if you don't have to give the result immediately).
Yep, that's right. I thought about adding that detail but decided it might detract from the main points. Hopefully anyone interested also sees your comment.
Dead reckoning is a form of prediction: based on past evidence that indicates where you were then, you reckon (best-guess) a current position and determine a direction to move forward to reach some target.
"Past evidence that indicates" is deliberate phrasing; in the majority of these examples we are looking at acquired data with noise: errors, instrument noise, missing returns, etc.
"Tracking" is multi-stage: there's a desired target to be found (or declared absent) in noisy data, which is pattern search and locking; the trajectory (the track) of that target must be best-guessed; and the best-guess forward prediction can be used to assist the search for the target in its new position.
This is not all that can be done with a Kalman filter but it's typical of a class of common applications.
The Kalman Filter is a topic in the more general:
David G. Luenberger, Optimization by Vector Space Methods, John Wiley and Sons, Inc., New York, 1969.
The one sentence you really need to know:
"The filter is named after Rudolf E. Kálmán (May 19, 1930 – July 2, 2016). In 1960, Kálmán published his famous paper describing a recursive solution to the discrete-data linear filtering problem."
Not sure why, but I get this vague notion that the author might have written a book.
If it takes you a whole book to explain Kalman filtering you're doing it wrong.
Perhaps to get the basics sure, but to go through all the edge cases and parameter tuning and failure modes will easily fill a small book. That's not even going into modifying Kalman filters for non-linear systems.
[dead]
You can do all of this with machine learning better now, if you have the data.
Kalman Filtering is machine learning. It has been used as a neural network training method.
I'm implementing real-time dual state/parameter estimators at the limits of computational tractability. Please tell me how "machine learning" can solve my problem better.
You've got nonlinear things happening?
The state dynamics are linear but they are time varying and unknown. I am tracking the dynamics with a second Kalman filter (dual UKF like structure but mostly linear). Main "problem area" is covariance propagation, since the dynamics are defined by GRVs estimated by the parameter estimator but this uncertainty is not propagated between state/parameter estimators. It's a known shortcoming with standard (UKF) dual parameter+state estimation. I've yet to conclude whether it's a real or imaginary problem for my use-case.
Please, if you’re looking to denoise a signal for human input, use the 1-Euro filter instead: https://gery.casiez.net/1euro/
Easier to code, easier to tune parameters without a noise model
It's a one-pole lowpass IIR filter.
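(In its simplest, non-adaptive form that's exponential smoothing; as I understand it, the 1€ filter's twist is adapting the smoothing factor to how fast the signal is moving. A rough sketch:)

    # One-pole lowpass / exponential smoothing: the core of the 1-Euro filter,
    # which additionally adapts alpha based on the estimated speed of the signal.
    def one_pole(xs, alpha=0.1):
        y = xs[0]
        out = []
        for x in xs:
            y += alpha * (x - y)
            out.append(y)
        return out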
But I agree, if a one-pole filter will solve your problem. The Kalman filter is an optimal state estimator, useful when you have:
- a multiple-input and/or multiple-output system
- non-trivial dynamics (i.e. the system of interest is in motion and you have a state-space description of the dynamics)
- noisy measurements
- statistical estimates of the process and measurement noise available
- ideally, iid Gaussian noises
Also good if you want to fuse data on different time scales or with different noise properties.
What have I missed?