HMMs | ecohmm

What is an HMM?

Hidden Markov models (HMMs) are stochastic time series models which comprise two processes:

an observed time series X_1,X_2,...,X_T (often called the "state-dependent" process) and
a hidden (or "latent") state process S_1,S_2,...,S_T taking on finitely many (usually 2 or 3) possible values.

The indices here refer to the different time points at which observations are made. These time points in most instances will lie on a regular grid in time, i.e. the time intervals between observations will be of equal length (otherwise HMMs may not be suitable for analysing the data!).

The basic HMM formulation involves two key assumptions: (i) the probability of being in any particular state at any time t depends only on which state was active at time t-1 (the so-called "Markov property") and (ii) the distribution of the observation at each time t, X_t, is completely determined by the state active at time t, S_t.
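In symbols, the two assumptions can be sketched as follows. The notation is the usual textbook shorthand rather than anything defined on this page: Pr denotes a probability, f a density or probability mass function, and j any one of the possible states.

```latex
\begin{align}
\text{(i)}\quad  & \Pr(S_t = j \mid S_{t-1}, S_{t-2}, \ldots, S_1) = \Pr(S_t = j \mid S_{t-1}), \\
\text{(ii)}\quad & f(X_t \mid X_1, \ldots, X_{t-1}, S_1, \ldots, S_t) = f(X_t \mid S_t).
\end{align}
```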

Assumption (i) implies that there is correlation in the sequence of states, usually such that there is a tendency to remain in a state for some time before switching to a different state. Think of an animal that exhibits a certain behaviour for some time, say resting, before switching to a different behavioural mode, say exploring.

Furthermore, (ii) implies that each observation X_t is generated by one of finitely many (usually 2 or 3) distributions, as selected by the state S_t. For example, within a resting state the movement speed of an animal could be described by a distribution with very small mean and low variance, whereas in an exploring state the corresponding distribution will have a relatively large mean and possibly also a larger variance. Thus, HMMs are mixture models in which the component active at each time point is selected by the correlated state sequence: several distinct distributions are used to account for the fact that the animal's different behavioural modes lead to observations of different magnitudes.
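To make the two assumptions concrete, here is a minimal simulation sketch of a 2-state HMM for movement speed. Everything specific in it is an assumption made for illustration: the function name simulate_hmm, the transition probabilities, the choice of gamma distributions for speed, and the means and standard deviations for the "resting" and "exploring" states.

```python
import numpy as np

def simulate_hmm(T, gamma, delta, means, sds, seed=0):
    """Simulate T time steps from an HMM with gamma-distributed observations.

    gamma : (N, N) transition probability matrix, rows sum to 1
    delta : (N,) initial state distribution
    means, sds : (N,) state-dependent mean and standard deviation
    """
    rng = np.random.default_rng(seed)
    gamma = np.asarray(gamma, dtype=float)
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    n_states = gamma.shape[0]

    # Assumption (i): the state at time t depends only on the state at t-1.
    states = np.empty(T, dtype=int)
    states[0] = rng.choice(n_states, p=delta)
    for t in range(1, T):
        states[t] = rng.choice(n_states, p=gamma[states[t - 1]])

    # Assumption (ii): the observation at time t depends only on the state at t.
    # Convert (mean, sd) to the shape/scale parametrisation of the gamma distribution.
    shape = (means / sds) ** 2
    scale = sds ** 2 / means
    obs = rng.gamma(shape[states], scale[states])
    return states, obs

# Hypothetical parameter values: state 0 = "resting", state 1 = "exploring".
gamma = np.array([[0.95, 0.05],
                  [0.10, 0.90]])
delta = np.array([0.5, 0.5])
states, speeds = simulate_hmm(T=1000, gamma=gamma, delta=delta,
                              means=[0.1, 2.0], sds=[0.05, 1.0])
```

Because the diagonal entries of the transition matrix are large, the simulated state sequence stays in one state for a while before switching, and the speeds alternate between runs of small values and runs of larger, more variable values.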

What makes HMMs interesting for ecologists?

HMMs are good models for many ecological data sets, primarily because they often constitute a very natural and intuitive approach, especially when the states can be interpreted as corresponding roughly to biologically meaningful entities, e.g. behavioural states or survival states. In such cases the HMM can usually be regarded as a good representation of the biological reality in terms of a fairly simple mathematical model. This opens the way for various types of biologically interesting inference, for example on the effect of environmental conditions on animal behaviour (similar to what is done in regression analyses, but taking the temporal structure of the data into account).

Secondly, despite their relatively complex structure, HMMs turn out to be surprisingly easy to handle and are computationally feasible in almost all of the examples we regularly deal with. This makes them convenient and practical tools even for very long and possibly multivariate time series, as are nowadays routinely generated by sensors attached to animals. With the R package moveHMM (developed specifically for animal movement modelling; see Software), HMMs can be fitted to hundreds of thousands of data points, usually in less than an hour. HMMs are also immensely versatile and can be extended in various ways to account for many kinds of relevant patterns that may arise in ecological time series data.
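The computational feasibility mentioned above comes largely from the fact that the likelihood of an HMM can be evaluated by a simple recursion, commonly called the forward algorithm, whose cost grows only linearly with the number of observations. The sketch below illustrates that recursion under stated assumptions; it is not the moveHMM implementation, and the function name hmm_log_likelihood and its arguments are hypothetical.

```python
import numpy as np

def hmm_log_likelihood(dens, gamma, delta):
    """Log-likelihood of an HMM via the scaled forward recursion.

    dens  : (T, N) array; dens[t, j] is the density of observation t under state j
    gamma : (N, N) transition probability matrix, rows sum to 1
    delta : (N,) initial state distribution
    """
    dens = np.asarray(dens, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    phi = np.asarray(delta, dtype=float) * dens[0]
    log_lik = 0.0
    for t in range(dens.shape[0]):
        if t > 0:
            # Propagate one step through the Markov chain, then weight by the
            # state-dependent densities of the current observation.
            phi = (phi @ gamma) * dens[t]
        # Rescale at every step to avoid numerical underflow for long series.
        total = phi.sum()
        log_lik += np.log(total)
        phi = phi / total
    return log_lik
```

Each time step costs one vector-matrix product of size N, so a series of length T with N states takes on the order of T times N squared operations. With 2 or 3 states, this stays cheap even for very long series.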