- Original Article
- Open Access

# Thermodynamics of the binary symmetric channel

*Pacific Journal of Mathematics for Industry*
**volume 8**, Article number: 2 (2016)

## Abstract

We study a hidden Markov process that results from transmitting the binary symmetric Markov source over the memoryless binary symmetric channel. This process has been studied extensively in Information Theory and is often used as a benchmark case for so-called denoising algorithms. Exploiting the link between this process and the 1D Random Field Ising Model (RFIM), we are able to identify the Gibbs potential of the resulting hidden Markov process. Moreover, we obtain a stronger bound on the memory decay rate. We conclude with a discussion of the implications of our results for the development of denoising algorithms.

## Introduction

We study the binary symmetric Markov source over the memoryless binary symmetric channel. More specifically, let \(\{X_{n}\}\) be a stationary two-state Markov chain with values {±1}, and

$$\mathbb{P}\left(X_{n+1}=x'\,|\,X_{n}=x\right)=\begin{cases}1-p, & x'=x,\\ p, & x'\ne x,\end{cases}$$

where 0<*p*<1; denote by \(P=(p_{x,x'}) =\left (\begin {array}{cc} 1-p & p \\ p& 1-p\end {array}\right)\) the corresponding transition probability matrix. Note that \(\pi =\left (\frac {1}{2},\frac {1}{2}\right)\) is the stationary initial distribution for this chain.

The binary symmetric channel will be modelled as a sequence of independent Bernoulli random variables \(\{Z_{n}\}\), independent of \(\{X_{n}\}\), with

$$\mathbb{P}_{Z}\left(Z_{n}=-1\right)=\varepsilon,\qquad \mathbb{P}_{Z}\left(Z_{n}=+1\right)=1-\varepsilon.$$

Finally, put

$$Y_{n}=X_{n}\cdot Z_{n}\tag{1.1}$$

for all *n*. The process \(\{Y_{n}\}\) is a hidden Markov process, because \(Y_{n}\in\{-1,1\}\) is chosen independently for each *n* from an *emission* distribution \(\pi_{X_{n}}\) on {−1,1}: \(\pi_{1}=(\varepsilon,1-\varepsilon)\) and \(\pi_{-1}=(1-\varepsilon,\varepsilon)\), where the two entries are the probabilities of −1 and +1, respectively. More generally, hidden Markov processes are random functions of discrete-time Markov chains: the value \(Y_{n}\) is drawn from a distribution that depends on the value \(X_{n}=x_{n}\) of the underlying Markov chain, independently for each *n*. Applications of hidden Markov processes include automatic character and speech recognition, information theory, statistics and bioinformatics, see [4, 13]. The particular example (1.1) considered in the present paper is probably one of the simplest, and it is often used as a benchmark for testing algorithms.
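The model (1.1) is easy to simulate. The following is a minimal sketch, not taken from the paper: it assumes that \(X_{0}\) is drawn from the stationary law (½, ½), that the chain flips its sign with probability *p* at each step, and that the \(Z_{n}\) are independent of each other and of the chain with \(\mathbb{P}(Z_{n}=-1)=\varepsilon\). The function name `simulate_bsc_hmm` is hypothetical.

```python
import random


def simulate_bsc_hmm(n, p, eps, seed=None):
    """Sample a path of length n of the source X, the channel noise Z and the output Y = X * Z.

    Assumptions: X_0 is uniform on {-1, 1} (the stationary law), X flips its
    sign with probability p at each step, and the Z_k are i.i.d. with
    P(Z_k = -1) = eps, independent of X.
    """
    rng = random.Random(seed)
    x = [rng.choice((-1, 1))]
    for _ in range(n - 1):
        # Markov step: flip with probability p, stay with probability 1 - p
        x.append(-x[-1] if rng.random() < p else x[-1])
    # binary symmetric channel: flip each symbol independently with probability eps
    z = [-1 if rng.random() < eps else 1 for _ in range(n)]
    y = [xk * zk for xk, zk in zip(x, z)]
    return x, z, y


x, z, y = simulate_bsc_hmm(100_000, p=0.1, eps=0.2, seed=1)
flip_rate = sum(a != b for a, b in zip(x, x[1:])) / (len(x) - 1)
noise_rate = sum(a != b for a, b in zip(x, y)) / len(x)
print(f"empirical flip rate {flip_rate:.3f}, empirical noise rate {noise_rate:.3f}")
```

With \(10^{5}\) steps the empirical flip rate of the chain and the fraction of positions where *Y* differs from *X* should be close to *p* and *ε*, respectively.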

In particular, this example has been studied extensively in connection with the computation of the entropy of the output process \(\{Y_{n}\}\), see e.g. [8–10, 12].

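The law \(\mathbb {Q}\) of the process \(\{Y_{n}\}\) is the push-forward of \(\mathbb {P}\times \mathbb {P}_{Z}\) under \(\psi : \{-1,1\}^{\mathbb {Z}}\times \{-1,1\}^{\mathbb {Z}}\to \{-1,1\}^{\mathbb {Z}}\), with \(\psi((x_{n}),(z_{n}))=(x_{n}\cdot z_{n})\). We write \(\mathbb {Q}=(\mathbb {P}\times \mathbb {P}_{Z})\circ \psi ^{-1}\). For every *m*≤*n* and \({y_{m}^{n}}:=(y_{m},\ldots, y_{n})\in \{-1,1\}^{n-m+1}\), the measure of the corresponding cylinder set is given by

$$\mathbb{Q}\left(y_{m}^{n}\right)=\sum_{x_{m}^{n}\in\{-1,1\}^{n-m+1}}\frac{1}{2}\prod_{i=m}^{n-1}p_{x_{i},x_{i+1}}\prod_{i=m}^{n}\pi_{x_{i}}(y_{i}),\tag{1.2}$$

where \(\pi_{x}(y)\) denotes the probability that the emission distribution \(\pi_{x}\) assigns to *y*.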

Two particular cases are easy to analyze. If \(p=\frac {1}{2}\), then \(\{X_{n}\}\) is a sequence of independent identically distributed random variables with \(\mathbb {P}(X_{n}=-1)=\mathbb {P}(X_{n}=+1)=\frac {1}{2}\), and \(\{Y_{n}\}\) has the same distribution. If \(\varepsilon =\frac {1}{2}\), then \(\pi_{x}(y)=\frac{1}{2}\) for all *x*, *y*, and the formula above implies that

$$\mathbb{Q}\left(y_{m}^{n}\right)=2^{-(n-m+1)},$$

and hence, again, \(\{Y_{n}\}\) is a sequence of independent random variables with \(\mathbb {Q}\left (Y_{n}=-1\right)=\mathbb {Q}\left (Y_{n}=+1\right)=\frac {1}{2}\).
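For short words the cylinder probability (1.2) can be evaluated by brute force, summing over all hidden words of the same length. The sketch below is ours, with a hypothetical function name: it writes the transition probabilities as 1−*p* or *p* for equal or different neighbours, and the emission probabilities as 1−*ε* or *ε* for *y* equal or different to *x*. It checks the normalization, consistency of marginals, and the two special cases p=½ and ε=½ discussed above.

```python
import itertools


def cylinder_prob(y, p, eps):
    """Q(y_m, ..., y_n) for the word y, by summing over all hidden words x of the
    same length (cost 2**len(y): only meant for short words)."""
    total = 0.0
    for x in itertools.product((-1, 1), repeat=len(y)):
        weight = 0.5  # stationary initial distribution (1/2, 1/2)
        for a, b in zip(x, x[1:]):
            weight *= (1 - p) if a == b else p        # transition p_{a,b}
        for xi, yi in zip(x, y):
            weight *= (1 - eps) if xi == yi else eps  # emission pi_x(y)
        total += weight
    return total


words = list(itertools.product((-1, 1), repeat=4))
print(sum(cylinder_prob(w, 0.1, 0.2) for w in words))  # normalization: should be close to 1
print(cylinder_prob((1, 1, -1, 1), 0.1, 0.5))          # eps = 1/2: should be close to 2**-4
print(cylinder_prob((1, 1, -1, 1), 0.5, 0.2))          # p = 1/2: should be close to 2**-4
```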

The paper is organized as follows. In Section 2 we exploit methods of Statistical Mechanics to derive expressions for the probabilities of cylinder events (1.2). In Section 3, analyzing these expressions, we show that the measure \(\mathbb {Q}\) has nice thermodynamic properties; in particular, it falls into the class of *g*-measures with exponential decay of variation (memory decay). We also obtain a novel estimate of the decay rate, which is stronger than the estimates derived in previous works. In Section 4 we study two-sided conditional probabilities and show that \(\mathbb {Q}\) is in fact a Gibbs state in the sense of Statistical Mechanics. We also discuss the well-known denoising algorithm DUDE, and suggest that the Gibbs property of \(\mathbb {Q}\) explains why DUDE performs so well in this particular example. Furthermore, we argue that developing denoising algorithms based on thermodynamic Gibbs ideas can result in superior performance.

## Random field Ising model

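It was observed in [18] that the probability \(\mathbb {Q}(y_{m},\ldots,y_{n})\) of a cylinder event \(\{Y_{m}=y_{m},\ldots,Y_{n}=y_{n}\}\), *m*≤*n*, can be expressed via the partition function of a random field Ising model. We exploit this observation further. Assume *p*,*ε*∈(0,1), and put

$$J=\frac{1}{2}\log\frac{1-p}{p},\qquad K=\frac{1}{2}\log\frac{1-\varepsilon}{\varepsilon}.$$

Then \(p_{x,x'}=e^{Jxx'}/(2\cosh J)\) and \(\pi_{x}(y)=e^{Kxy}/(2\cosh K)\) for all \(x,x',y\in\{-1,1\}\), so for any \((y_{m},\ldots,y_{n})\in\{-1,1\}^{n-m+1}\) the cylinder probability (1.2) can be rewritten as

$$\mathbb{Q}\left(y_{m}^{n}\right)=\frac{1}{2}\,c_{J}^{\,n-m}\,c_{K}^{\,n-m+1}\,\mathcal{Z}_{m,n}\left(y_{m}^{n}\right),$$

where

$$c_{J}=\frac{1}{2\cosh J},\qquad c_{K}=\frac{1}{2\cosh K}.$$

The non-trivial part of the cylinder probability is the sum over all hidden configurations \((x_{m},\ldots,x_{n})\):

$$\mathcal{Z}_{m,n}\left(y_{m}^{n}\right)=\sum_{x_{m}^{n}\in\{-1,1\}^{n-m+1}}\exp\left(J\sum_{i=m}^{n-1}x_{i}x_{i+1}+K\sum_{i=m}^{n}y_{i}x_{i}\right),$$

which is in fact the partition function of the Ising model with coupling *J* and external random fields \(Ky_{i}\), whose signs are given by the \(y_{i}\)'s. Applying the recursive method of [14], the partition function can be evaluated in the following fashion [1]. Consider the functions

$$A(w)=\frac{1}{2}\log\frac{\cosh(w+J)}{\cosh(w-J)},\qquad B(w)=\frac{1}{2}\log\left(4\cosh(w+J)\cosh(w-J)\right).$$

One readily checks that if *s*=±1, then for all \(w\in \mathbb {R}\)

$$\sum_{x=\pm1}e^{x(Js+w)}=2\cosh(w+Js)=e^{A(w)s+B(w)}.$$

Now the partition function can be evaluated by summing over the right-most spin. Namely, suppose *m*<*n* and \({y_{m}^{n}}\in \{-1,1\}^{n-m+1}\); then

$$\mathcal{Z}_{m,n}\left(y_{m}^{n}\right)=\sum_{x_{m}^{n-1}}\exp\left(J\sum_{i=m}^{n-2}x_{i}x_{i+1}+K\sum_{i=m}^{n-1}y_{i}x_{i}\right)\sum_{x_{n}=\pm1}e^{x_{n}\left(Jx_{n-1}+w_{n}^{(n)}\right)},$$

where \(w_{n}^{(n)}=Ky_{n}\). Hence,

$$\mathcal{Z}_{m,n}\left(y_{m}^{n}\right)=e^{B\left(w_{n}^{(n)}\right)}\sum_{x_{m}^{n-1}}\exp\left(J\sum_{i=m}^{n-2}x_{i}x_{i+1}+K\sum_{i=m}^{n-2}y_{i}x_{i}+\left(Ky_{n-1}+A\left(w_{n}^{(n)}\right)\right)x_{n-1}\right),$$

and thus the new sum has exactly the same form, but instead of \(Ky_{n-1}\) we now have \(w_{n-1}^{(n)}=Ky_{n-1}+A\left (w_{n}^{(n)}\right)\). Continuing the summation over the remaining right-most *x*-spins, one gets

$$\mathcal{Z}_{m,n}\left(y_{m}^{n}\right)=2\cosh\left(w_{m}^{(n)}\right)\exp\left(\sum_{i=m+1}^{n}B\left(w_{i}^{(n)}\right)\right),$$

where

$$w_{n}^{(n)}=Ky_{n},\qquad w_{i}^{(n)}=Ky_{i}+A\left(w_{i+1}^{(n)}\right),\quad i=n-1,\ldots,m;$$

equivalently, since *A*(0)=0, we can define \(w_{n+1}^{(n)}=0\) and \(w_{i}^{(n)}=Ky_{i}+A\left(w_{i+1}^{(n)}\right)\) for all *i*≤*n*. Therefore, we obtain the following expressions for the cylinder and conditional probabilities:

$$\mathbb{Q}\left(y_{m}^{n}\right)=\frac{\cosh\left(w_{m}^{(n)}\right)}{(2\cosh J)^{n-m}(2\cosh K)^{n-m+1}}\exp\left(\sum_{i=m+1}^{n}B\left(w_{i}^{(n)}\right)\right),$$

$$\mathbb{Q}\left(y_{m}\,|\,y_{m+1}^{n}\right)=\frac{\cosh\left(w_{m}^{(n)}\right)\exp\left(B\left(w_{m+1}^{(n)}\right)\right)}{4\cosh J\cosh K\cosh\left(w_{m+1}^{(n)}\right)}.$$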
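The right-to-left summation can be checked numerically against direct evaluation of the Ising sum. The sketch below is ours and assumes the parametrization \(J=\frac12\log\frac{1-p}{p}\), \(K=\frac12\log\frac{1-\varepsilon}{\varepsilon}\), \(A(w)=\frac12\log\frac{\cosh(w+J)}{\cosh(w-J)}\), \(B(w)=\frac12\log\left(4\cosh(w+J)\cosh(w-J)\right)\), under which \(2\cosh(w+Js)=e^{A(w)s+B(w)}\) for *s*=±1. For a word of length *L* it compares \(\cosh(w_{0})\exp\left(\sum_{i\ge1}B(w_{i})\right)/\left((2\cosh J)^{L-1}(2\cosh K)^{L}\right)\), with fields computed from the right starting at zero, against \(\frac12(2\cosh J)^{-(L-1)}(2\cosh K)^{-L}\) times the brute-force partition function. All function names are hypothetical.

```python
import itertools
import math


def couplings(p, eps):
    """J = 1/2 log((1-p)/p) and K = 1/2 log((1-eps)/eps)."""
    return 0.5 * math.log((1 - p) / p), 0.5 * math.log((1 - eps) / eps)


def A(w, J):
    return 0.5 * math.log(math.cosh(w + J) / math.cosh(w - J))


def B(w, J):
    return 0.5 * math.log(4 * math.cosh(w + J) * math.cosh(w - J))


def fields(y, J, K):
    """Fields w_i = K*y_i + A(w_{i+1}), computed right to left from w_{n+1} = 0."""
    w = [0.0] * len(y)
    nxt = 0.0
    for i in range(len(y) - 1, -1, -1):
        w[i] = K * y[i] + A(nxt, J)
        nxt = w[i]
    return w


def cylinder_prob_recursive(y, p, eps):
    """Q(y) = cosh(w_0) exp(sum_{i>=1} B(w_i)) / ((2 cosh J)^(L-1) (2 cosh K)^L)."""
    J, K = couplings(p, eps)
    w = fields(y, J, K)
    L = len(y)
    log_q = (math.log(math.cosh(w[0])) + sum(B(wi, J) for wi in w[1:])
             - (L - 1) * math.log(2 * math.cosh(J)) - L * math.log(2 * math.cosh(K)))
    return math.exp(log_q)


def cylinder_prob_bruteforce(y, p, eps):
    """Q(y) = 1/2 (2 cosh J)^-(L-1) (2 cosh K)^-L times the Ising partition function."""
    J, K = couplings(p, eps)
    L = len(y)
    Z = sum(math.exp(J * sum(a * b for a, b in zip(x, x[1:]))
                     + K * sum(xi * yi for xi, yi in zip(x, y)))
            for x in itertools.product((-1, 1), repeat=L))
    return 0.5 * Z / ((2 * math.cosh(J)) ** (L - 1) * (2 * math.cosh(K)) ** L)


y = (1, -1, -1, 1, 1)
print(cylinder_prob_recursive(y, 0.1, 0.2), cylinder_prob_bruteforce(y, 0.1, 0.2))
```

The recursive version costs O(*L*) instead of O(\(2^{L}\)), which is what makes the expressions above usable for long words.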

## Thermodynamic formalism

Let \(\Omega =A^{\mathbb Z_{+}}\), where *A* is a finite alphabet, be the space of one-sided infinite sequences \(\boldsymbol{\omega}=(\omega_{0},\omega_{1},\ldots)\) in the alphabet *A* (\(\omega_{i}\in A\) for all *i*). We equip *Ω* with the metric

$$d\left(\boldsymbol{\omega},\boldsymbol{\tilde{\omega}}\right)=2^{-k\left(\boldsymbol{\omega},\boldsymbol{\tilde{\omega}}\right)},$$

where \(k\left (\boldsymbol {\omega },\boldsymbol {\tilde {\omega }}\right)=0\) if \(\omega _{0}\ne \tilde \omega _{0}\), and \(k\left (\boldsymbol {\omega },\boldsymbol {\tilde {\omega }}\right) = \max \{k\in \mathbb N: \omega _{i}=\tilde \omega _{i}\ \forall i=0,\ldots,k-1\}\) otherwise. Denote by *S*:*Ω*→*Ω* the left shift:

$$(S\boldsymbol{\omega})_{i}=\omega_{i+1},\qquad i\in\mathbb{Z}_{+}.$$

A Borel probability measure \(\mathbb {P}\) is translation invariant if

$$\mathbb{P}\left(S^{-1}C\right)=\mathbb{P}(C)$$

for any Borel event *C*⊆*Ω*.

Let us recall the following well-known definitions:

### Definition 3.1.

Suppose \(\mathbb {P}\) is a fully supported translation invariant measure on \(\Omega =A^{\mathbb Z_{+}}\), where *A* is a finite alphabet.

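(i) The measure \(\mathbb {P}\) is called a ***g*-measure** if, for some positive continuous function \(g:\Omega\to(0,1)\) satisfying the normalization condition

$$\sum_{a\in A}g(a,\omega_{0},\omega_{1},\ldots)=1$$

for all \(\boldsymbol{\omega}=(\omega_{0},\omega_{1},\ldots)\in\Omega\), one has

$$\mathbb{P}\left(\omega_{0}\,|\,\omega_{1},\omega_{2},\ldots\right)=g(\boldsymbol{\omega})$$

for \(\mathbb {P}\)-a.a. \(\boldsymbol{\omega}\in\Omega\).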

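(ii) The measure \(\mathbb {P}\) is **Bowen-Gibbs** for a continuous potential \(\phi :\Omega \to \mathbb {R}\) if there exist constants \(P\in \mathbb {R}\) and *C*≥1 such that for all \(\boldsymbol{\omega}\in\Omega\) and every \(n\in \mathbb N\)

$$\frac{1}{C}\le\frac{\mathbb{P}\left(\left\{\boldsymbol{\tilde\omega}\in\Omega:\ \tilde\omega_{i}=\omega_{i},\ i=0,\ldots,n-1\right\}\right)}{\exp\left((S_{n}\phi)(\boldsymbol{\omega})-nP\right)}\le C,$$

where \((S_{n}\phi)(\boldsymbol {\omega })=\sum _{k=0}^{n-1} \phi (S^{k} \boldsymbol {\omega })\).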

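(iii) The measure \(\mathbb {P}\) is called an **equilibrium state** for a continuous potential \(\phi :\Omega \to \mathbb {R}\) if \(\mathbb {P}\) attains the maximum of the functional \(\mathbb{P}'\mapsto h(\mathbb{P}')+\int\phi\,d\mathbb{P}'\), i.e.,

$$h(\mathbb{P})+\int\phi\,d\mathbb{P}=\sup_{\mathbb{P}'\in\mathcal{M}_{1}^{*}(\Omega)}\left[h(\mathbb{P}')+\int\phi\,d\mathbb{P}'\right],\tag{3.1}$$

where \(h(\mathbb {P})\) is the Kolmogorov-Sinai entropy of \(\mathbb {P}\) and the supremum is taken over the set \(\mathcal M_{1}^{*}(\Omega)\) of all translation invariant Borel probability measures on *Ω*.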

It is well known that every *g*-measure \(\mathbb {P}\) is an equilibrium state for log *g*, and that every Bowen-Gibbs measure \(\mathbb {P}\) for a potential *ϕ* is an equilibrium state for *ϕ* as well.

### Theorem 3.1.

Suppose *p*,*ε*∈(0,1). Then the measure \(\mathbb {Q}=\mathbb {Q}_{p,\varepsilon }\) on \(\{-1,1\}^{\mathbb {Z}_{+}}\) (cf. (2.2)) is a *g*-measure. Moreover, the corresponding function *g* has *exponential decay of variation:* define the *n*-th variation of *g* by

then

Furthermore, *ρ*(*p*,*ε*)=0 if \(p=\frac {1}{2}\) or \(\varepsilon =\frac {1}{2}\); for all \(p\ne \frac {1}{2}\)

Finally, the measure \(\mathbb {Q}\) is also a Bowen-Gibbs measure for a Hölder continuous potential \(\phi :\{-1,1\}^{\mathbb {Z}_{+}}\to \mathbb R\).

The result of Theorem 3.1 is actually true in much greater generality: namely, for distributions of functions of Markov chains \(\{Y_{n}\}\) whose underlying Markov chain \(\{X_{n}\}\) has a strictly positive transition probability matrix *P*; see [15] for a review of several results of this nature. However, the example considered in the present paper is rather exceptional, since one is able to identify the *g*-function and the Gibbs potential *ϕ* *explicitly*. Another interesting question is the estimate of the decay rate *ρ*. In [15] a number of previously known estimates of the rate of exponential decay in (3.3) have been compared; the best known estimate for *ρ*

is due to [8] and [7]. Quite surprisingly, this estimate does not depend on *ε*, and in fact it was conjectured in [15] that the estimate could be improved, e.g., by incorporating dependence on *ε*. The proof of Theorem 3.1 shows that this is indeed the case, and one obtains the sharper estimate (3.3).

Let us start by introducing some notation and proving a technical result. Suppose *p*,*ε*∈(0,1), with \(p\ne \frac {1}{2}\) and \(\varepsilon \ne \frac {1}{2}\). Fix \(\boldsymbol {y}=(y_{0},y_{1},\ldots)\in \{-1,1\}^{\mathbb {Z}_{+}}\). For every \(n\in \mathbb {Z}_{+}\), define the sequence \(w^{(n)}_{i}=w^{(n)}_{i}(\boldsymbol {y})\), \(i\in \mathbb {Z}_{+}\), as follows:

$$w_{i}^{(n)}=0\ \text{ for } i>n,\qquad w_{i}^{(n)}=Ky_{i}+A\left(w_{i+1}^{(n)}\right)\ \text{ for } i\le n.$$

If we introduce the maps \(F_{-1},F_{1}:\mathbb {R}\to \mathbb {R}\), given by

$$F_{s}(w)=sK+A(w),\qquad s=\pm1,$$

then for *i*≤*n*,

$$w_{i}^{(n)}=F_{y_{i}}\circ F_{y_{i+1}}\circ\cdots\circ F_{y_{n}}(0).\tag{3.4}$$
As we will show, the maps \(F_{-1}\), \(F_{1}\) are strict contractions; as a result, for every *i* the sequence \(\left \{w_{i}^{(n)}\right \}_{n\ge i}\) converges as *n*→*∞*, in fact exponentially fast.
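The exponential convergence can be observed numerically. A short computation (ours, not from the text) gives \(A'(w)=\frac12\left(\tanh(w+J)-\tanh(w-J)\right)\), whose absolute value is at most \(\tanh|J|=|1-2p|\) when \(J=\frac12\log\frac{1-p}{p}\); together with \(|A|\le|J|\) this yields the elementary bound \(\left|w_{0}^{(n)}(\boldsymbol y)-w_{0}^{(m)}(\boldsymbol y)\right|\le(|K|+|J|)\,|1-2p|^{n+1}\) for *m*≥*n*, which the sketch checks on a random **y**. The function name and parameter values are illustrative assumptions.

```python
import math
import random


def A(w, J):
    """A(w) = 1/2 log(cosh(w + J) / cosh(w - J))."""
    return 0.5 * math.log(math.cosh(w + J) / math.cosh(w - J))


def w0_truncated(y, n, J, K):
    """w_0^{(n)}(y) = F_{y_0} o F_{y_1} o ... o F_{y_n}(0), where F_s(w) = s*K + A(w)."""
    w = 0.0
    for i in range(n, -1, -1):
        w = K * y[i] + A(w, J)
    return w


p, eps = 0.2, 0.3
J = 0.5 * math.log((1 - p) / p)
K = 0.5 * math.log((1 - eps) / eps)
rng = random.Random(0)
y = [rng.choice((-1, 1)) for _ in range(80)]

reference = w0_truncated(y, 79, J, K)  # stands in for the limit w_0(y)
for n in (0, 5, 10, 20, 40):
    gap = abs(w0_truncated(y, n, J, K) - reference)
    bound = (abs(K) + abs(J)) * abs(1 - 2 * p) ** (n + 1)
    print(f"n={n:2d}  gap={gap:.2e}  bound={bound:.2e}")
```

The observed gaps are typically well below the bound, which is consistent with the remark below that the contraction rate can be improved.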

### Lemma 3.2.

For every \(i\in \mathbb {Z}_{+}\) and all \(\boldsymbol {y}\in \{-1,1\}^{\mathbb {Z}_{+}}\) one has

$$\left|w_{i}^{(n)}(\boldsymbol{y})\right|\le|K|+|J|\qquad\text{for all } n\in\mathbb{Z}_{+}.$$

Moreover, there exist constants *ϱ*∈(0,1) and *C*>0, both independent of **y**, such that

for all *n*≥*i*. Furthermore, \(w_{i}(\boldsymbol{y})=w_{0}(S^{i}\boldsymbol{y})\) for all \(i\in \mathbb {Z}_{+}\) and **y**, and \(w_{0}:\{-1,1\}^{\mathbb {Z}_{+}}\to \mathbb {R}\) (and hence every \(w_{i}\)) is Hölder continuous:

$$\left|w_{0}(\boldsymbol{y})-w_{0}(\boldsymbol{\tilde y})\right|\le C'\,d\left(\boldsymbol{y},\boldsymbol{\tilde y}\right)^{\theta}$$

for some *C*′,*θ*>0 and all \( \boldsymbol {y},\boldsymbol {\tilde {y}}\in \{-1,1\}^{\mathbb {Z}_{+}}\).

### *Proof.*

Suppose *i*≤*n*≤*m*. Then

One has

and hence

Combined with the fact that for all \(i\in \mathbb {Z}_{+}\)

one therefore concludes that for *i*≤*n*≤*m*

Hence, \({\lim }_{n\to \infty } w_{i}^{(n)}=:w_{i}\) exists and

From representation (3.4) it is clear that for *n*≥*i*,

$$w_{i}^{(n)}(\boldsymbol{y})=w_{0}^{(n-i)}\left(S^{i}\boldsymbol{y}\right),$$

and hence \(w_{i}(\boldsymbol{y})=w_{0}(S^{i}\boldsymbol{y})\).

Suppose \(\boldsymbol {y}=(y_{0},y_{1},\ldots),\boldsymbol {\tilde {y}}=(\tilde y_{0},\tilde y_{1},\ldots)\in \{-1,1\}^{\mathbb {Z}_{+}}\) are such that \(d(\boldsymbol {y},\boldsymbol {\tilde {y}})=2^{-k}\) for some \(k\in \mathbb N\), i.e., \(y_{0}= \tilde y_{0}\), …, \(y_{k-1}=\tilde y_{k-1}\). Then

and hence \(w_{0}:\{-1,1\}^{\mathbb {Z}_{+}}\to \mathbb {R}\) is Hölder continuous. □

The estimate of the contraction rate in the Lemma above can be improved. If \(p=\frac {1}{2}\), then *A*(*w*)≡0, and hence \(w_{i}^{(n)}\equiv Ky_{i}\) for all *n*≥*i*. We may therefore assume that \(p\ne \frac {1}{2}\). Let us also assume that \(\varepsilon \ne \frac {1}{2}\), i.e., *K*≠0.

Let us now consider the second iterates:

We are going to show that for \(p\ne \frac {1}{2}\) and all \(\varepsilon \ne \frac {1}{2}\), one has

Firstly, note that

where \(\alpha=(1-p)^{2}+p^{2}\) and \(1-\alpha=2p(1-p)\). Let \(\Delta>0\) be sufficiently small so that \(\cosh(2K+2A(w))>\cosh(K)\) for all \(w\in[-\Delta,\Delta]\), and hence for all \(w\in[-\Delta,\Delta]\)

For *w*∉[−*Δ*,*Δ*], one has

Hence,

and thus (3.5) holds for \(\bar \varrho =\sqrt {\rho ^{(2)}}<|1-2p|\) and some constant \(\tilde C>0\). In particular, we are now able to conclude that

Even sharper bounds can be achieved by studying the minimum of the denominator in (3.8) or higher iterates of the maps *F*.
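To get a feel for the improvement from second iterates, one can estimate numerically the contraction factor of \(F_{a}\circ F_{b}\). Since \((F_{a}\circ F_{b})'(w)=A'(bK+A(w))\,A'(w)\), a grid search over \([-|K|-|J|,\,|K|+|J|]\) (which contains every \(w_{i}^{(n)}\), because \(|A|\le|J|\)) gives an approximate per-step rate \(\sqrt{\sup|(F_{a}\circ F_{b})'|}\), compared below with \(|1-2p|\). This is our own numerical illustration under the parametrization \(J=\frac12\log\frac{1-p}{p}\), \(K=\frac12\log\frac{1-\varepsilon}{\varepsilon}\), not the bound (3.5) derived in the text; a finite grid only approximates the supremum from below, and the function names are hypothetical.

```python
import math


def A(w, J):
    return 0.5 * math.log(math.cosh(w + J) / math.cosh(w - J))


def dA(w, J):
    """A'(w) = (tanh(w + J) - tanh(w - J)) / 2; |A'(w)| <= tanh|J| = |1 - 2p|."""
    return 0.5 * (math.tanh(w + J) - math.tanh(w - J))


def two_step_rate(p, eps, grid_size=4001):
    """sqrt of the largest |(F_a o F_b)'(w)| found on a grid over |w| <= |K| + |J|."""
    J = 0.5 * math.log((1 - p) / p)
    K = 0.5 * math.log((1 - eps) / eps)
    R = abs(K) + abs(J)
    grid = [-R + 2 * R * k / (grid_size - 1) for k in range(grid_size)] + [0.0]
    # the derivative of F_a o F_b does not depend on a, only on b
    worst = max(abs(dA(b * K + A(w, J), J) * dA(w, J)) for w in grid for b in (-1, 1))
    return math.sqrt(worst)


for p, eps in [(0.1, 0.1), (0.1, 0.3), (0.3, 0.2), (0.3, 0.45)]:
    print(f"p={p}, eps={eps}: two-step rate ~ {two_step_rate(p, eps):.4f}, |1-2p| = {abs(1 - 2 * p):.4f}")
```

When *ε*=½ (so *K*=0) both factors peak at *w*=0 and no gain is possible, in line with the text's assumption *K*≠0.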

### *Proof of Theorem 3.1.*

The cases \(p=\frac {1}{2}\) or \(\varepsilon =\frac {1}{2}\) are obvious: in these cases, \(\mathbb {Q}\) is the Bernoulli \(\left (\frac {1}{2}, \frac {1}{2}\right)\)-measure on \(\{-1,1\}^{\mathbb {Z}_{+}}\), and hence *ρ*(*p*,*ε*)=0. Thus we will assume that \(p,\varepsilon \ne \frac {1}{2}\).

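To show that \(\mathbb {Q}\) is a *g*-measure, it is sufficient to show that the conditional probabilities \(\mathbb {Q}\left (y_{0}|{y_{1}^{n}}\right)\) converge uniformly as *n*→*∞*. Given that

$$\mathbb{Q}\left(y_{0}\,|\,y_{1}^{n}\right)=\frac{\cosh\left(w_{0}^{(n)}(\boldsymbol{y})\right)\exp\left(B\left(w_{1}^{(n)}(\boldsymbol{y})\right)\right)}{4\cosh J\cosh K\cosh\left(w_{1}^{(n)}(\boldsymbol{y})\right)}$$

and using the result of Lemma 3.2, \( w_{i}^{(n)}(\boldsymbol {y}) \rightrightarrows w_{i}(\boldsymbol {y})\) as *n*→*∞*, we obtain uniform convergence of the conditional probabilities; hence \(\mathbb {Q}\) is a *g*-measure with *g* given by

$$g(\boldsymbol{y})=\frac{\cosh\left(w_{0}(\boldsymbol{y})\right)\exp\left(B\left(w_{1}(\boldsymbol{y})\right)\right)}{4\cosh J\cosh K\cosh\left(w_{1}(\boldsymbol{y})\right)}.\tag{3.11}$$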
Taking into account that \(w_{0}\) and \(w_{1}=w_{0}\circ S\) are Hölder continuous functions satisfying (3.9), and that cosh, exp, and *B* are smooth functions, we can conclude that *g* is also Hölder continuous with the same decay of variation:

for some \(C_{3}>0\) (\(C_{4}=C_{2}C_{3}\), cf. (3.9)), and hence
Let us introduce the following functions: for \(\boldsymbol {y}\in \{-1,1\}^{\mathbb {Z}_{+}}\), put

Taking into account that \(w_{1}(\boldsymbol{y})=w_{0}(S\boldsymbol{y})\), one has

Since every *g*-measure is also an equilibrium state for log *g*, we conclude that \(\mathbb {Q}\) is an equilibrium state for

The difference \(\tilde \phi (\boldsymbol {y})-\phi (\boldsymbol {y})\) has a very special form: it is the sum of a so-called coboundary, \(\log h(\boldsymbol{y})-\log h(S\boldsymbol{y})\), and a constant, \(-\log\lambda_{J,K}\). Two potentials whose difference is of such a form have identical sets of equilibrium states. The reason is that for any translation invariant measure \(\mathbb {Q}'\) one has \(\int\left(\log h-\log h\circ S\right)d\mathbb{Q}'=0\), and hence

$$h(\mathbb{Q}')+\int\tilde\phi\,d\mathbb{Q}'=h(\mathbb{Q}')+\int\phi\,d\mathbb{Q}'-\log\lambda_{J,K}.$$

Therefore, if \(\mathbb {Q}'\) achieves the maximum in the right-hand side of (3.1) for \(\tilde \phi \), then \(\mathbb {Q}'\) achieves the maximum for *ϕ* as well. Thus \(\mathbb {Q}\) is also an equilibrium state for

Any equilibrium state for a Hölder continuous potential *ϕ* is also a Bowen-Gibbs measure [3]. In our particular case, a direct proof of the Bowen-Gibbs property of \(\mathbb {Q}\) is straightforward. Indeed, using (2.2) and the notation introduced above, for every \(\boldsymbol{y}=(y_{0},y_{1},\ldots)\) one has

Therefore, for \(P=\log\lambda_{J,K}\),

It only remains to demonstrate that the right-hand side is bounded from below and above, uniformly in *n* and \(\boldsymbol{y}=(y_{0},y_{1},\ldots)\), by some positive constants \(\underline C,\overline C\), respectively. Indeed, since *p*,*ε*∈(0,1), \(I=[-|K|-|J|,\,|K|+|J|]\) is a finite interval, and by the previous Lemma \(w_{i}^{(n)}(\boldsymbol {y})\in I\) for all *i* and *n*. Using (3.5), one readily checks that the following choice of constants suffices:

□
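The *g*-function and its decay of variation can be explored numerically. The sketch below is ours and assumes \(J=\frac12\log\frac{1-p}{p}\), \(K=\frac12\log\frac{1-\varepsilon}{\varepsilon}\), \(A(w)=\frac12\log\frac{\cosh(w+J)}{\cosh(w-J)}\), \(B(w)=\frac12\log\left(4\cosh(w+J)\cosh(w-J)\right)\). Dividing the cylinder probability of \(y_{0}^{n}\) by that of \(y_{1}^{n}\) then gives \(\mathbb{Q}(y_{0}|y_{1}^{n})=\cosh(w_{0}^{(n)})e^{B(w_{1}^{(n)})}/\left(4\cosh J\cosh K\cosh(w_{1}^{(n)})\right)\), which the code compares with brute-force ratios. It also estimates the *n*-th variation by comparing words that share their first *n* symbols; since \(g=\frac12+\frac12(1-2p)(1-2\varepsilon)y_{0}\tanh(w_{1})\) and \(|w_{i}|\le|K|+|J|\), the spread cannot exceed \(|1-2p|^{n}|1-2\varepsilon|(|K|+|J|)\). The variation proxy (a lower estimate using short tails only) and all names are our illustrative choices.

```python
import itertools
import math


def couplings(p, eps):
    return 0.5 * math.log((1 - p) / p), 0.5 * math.log((1 - eps) / eps)


def A(w, J):
    return 0.5 * math.log(math.cosh(w + J) / math.cosh(w - J))


def B(w, J):
    return 0.5 * math.log(4 * math.cosh(w + J) * math.cosh(w - J))


def g_truncated(y, p, eps):
    """Q(y_0 | y_1, ..., y_{L-1}) for a finite word y = (y_0, ..., y_{L-1})."""
    J, K = couplings(p, eps)
    w1 = 0.0
    for s in reversed(y[1:]):  # fields w_{L-1}, ..., w_1, starting from w_L = 0
        w1 = K * s + A(w1, J)
    w0 = K * y[0] + A(w1, J)
    return (math.cosh(w0) * math.exp(B(w1, J))
            / (4 * math.cosh(J) * math.cosh(K) * math.cosh(w1)))


def cylinder_prob(y, p, eps):
    """Brute-force Q(y) via the hidden chain (for checking only)."""
    total = 0.0
    for x in itertools.product((-1, 1), repeat=len(y)):
        wt = 0.5
        for a, b in zip(x, x[1:]):
            wt *= (1 - p) if a == b else p
        for xi, yi in zip(x, y):
            wt *= (1 - eps) if xi == yi else eps
        total += wt
    return total


def variation_proxy(n, p, eps, tail_len=4):
    """Largest spread of g over words sharing their first n symbols
    (tails of length tail_len only): a lower estimate of var_n(g)."""
    worst = 0.0
    for prefix in itertools.product((-1, 1), repeat=n):
        vals = [g_truncated(prefix + tail, p, eps)
                for tail in itertools.product((-1, 1), repeat=tail_len)]
        worst = max(worst, max(vals) - min(vals))
    return worst


p, eps = 0.2, 0.3
for n in (2, 4, 6, 8):
    print(n, variation_proxy(n, p, eps))
```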

We complete this section with a curious continued fraction representation of the *g*-function (3.11).

### Proposition 3.3.

For every \(\boldsymbol {y} =(y_{0},y_{1},\ldots)\in \{-1,1\}^{\mathbb {Z}_{+}}\), one has

where for *i*≥1

### *Proof.*

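Using elementary transformations, one can show that for every \(\boldsymbol {y}=(y_{0},y_{1},\ldots)\in \{-1,1\}^{\mathbb {Z}_{+}}\) one has

$$g(\boldsymbol{y})=\frac{1}{2}+\frac{1}{2}(1-2p)(1-2\varepsilon)\,y_{0}\tanh\left(w_{1}(\boldsymbol{y})\right).$$

Since for all \(w\in \mathbb {R}\)

$$\tanh\left(A(w)\right)=\tanh(J)\tanh(w)=(1-2p)\tanh(w),\qquad\text{and}\qquad\tanh(Ks)=(1-2\varepsilon)\,s\ \text{ for } s=\pm1,$$

for every \(i\in \mathbb {Z}_{+}\) one has

$$\tanh(w_{i})=\frac{(1-2\varepsilon)y_{i}+(1-2p)\tanh(w_{i+1})}{1+(1-2\varepsilon)(1-2p)\,y_{i}\tanh(w_{i+1})}.$$

Therefore, if we let \(z_{i}=(1-2p)(1-2\varepsilon)\,y_{i-1}\tanh(w_{i})\), \(i\in \mathbb N\), then

$$z_{i}=(1-2p)\,y_{i-1}y_{i}\,\frac{(1-2\varepsilon)^{2}+z_{i+1}}{1+z_{i+1}}=(1-2p)\,y_{i-1}y_{i}\left(1-\frac{4\varepsilon(1-\varepsilon)}{1+z_{i+1}}\right).$$

Since \(g(\boldsymbol {y}) =\frac {1}{2} +\frac {1}{2}z_{1}\), we obtain the continued fraction expansion (3.12). □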
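The backward recursion behind Proposition 3.3 can be checked numerically. The sketch assumes the relations \(\tanh(Ks)=(1-2\varepsilon)s\) for *s*=±1 and \(\tanh(A(w))=(1-2p)\tanh w\), which hold for \(J=\frac12\log\frac{1-p}{p}\), \(K=\frac12\log\frac{1-\varepsilon}{\varepsilon}\); combining them with \(\tanh(a+b)=\frac{\tanh a+\tanh b}{1+\tanh a\tanh b}\) gives \(z_{i}=(1-2p)y_{i-1}y_{i}\frac{(1-2\varepsilon)^{2}+z_{i+1}}{1+z_{i+1}}\). For a word of length *L*, starting from \(z_{L}=0\) (the analogue of \(w_{L}=0\)), it compares \(\frac12+\frac12z_{1}\) with *g* computed from the fields. Function names are hypothetical.

```python
import itertools
import math


def g_via_fields(y, p, eps):
    """g = 1/2 + 1/2 (1-2p)(1-2eps) y_0 tanh(w_1), with fields computed from w_L = 0."""
    J = 0.5 * math.log((1 - p) / p)
    K = 0.5 * math.log((1 - eps) / eps)
    w = 0.0
    for s in reversed(y[1:]):  # w_{L-1}, ..., w_1
        w = K * s + 0.5 * math.log(math.cosh(w + J) / math.cosh(w - J))
    return 0.5 + 0.5 * (1 - 2 * p) * (1 - 2 * eps) * y[0] * math.tanh(w)


def g_via_continued_fraction(y, p, eps):
    """Evaluate z_1 backwards from z_L = 0 via
    z_i = (1-2p) y_{i-1} y_i ((1-2eps)^2 + z_{i+1}) / (1 + z_{i+1})."""
    z = 0.0
    for i in range(len(y) - 1, 0, -1):
        z = (1 - 2 * p) * y[i - 1] * y[i] * ((1 - 2 * eps) ** 2 + z) / (1 + z)
    return 0.5 + 0.5 * z


max_gap = max(abs(g_via_fields(y, 0.1, 0.2) - g_via_continued_fraction(y, 0.1, 0.2))
              for y in itertools.product((-1, 1), repeat=7))
print(max_gap)  # differences at the level of floating-point rounding
```

The continued-fraction form needs no transcendental functions at all, and \(|z_{i}|\le|1-2p||1-2\varepsilon|<1\) keeps every denominator away from zero.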

## Two-sided conditional probabilities and denoising

In the previous section we established that \(\mathbb {Q}\) is a Bowen-Gibbs measure. The notion of a Gibbs measure originates in Statistical Mechanics, and is not equivalent to the Bowen-Gibbs definition. In Statistical Mechanics, one is interested in ** two-sided** conditional probabilities

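The method of Section 2 can be used to evaluate the conditional probabilities \(\mathbb {Q}\left (y_{0}|y_{-m}^{-1},{y_{1}^{n}}\right)\), *m*,*n*>0, for \(\boldsymbol {y}=(\ldots,y_{-1},y_{0},y_{1},\ldots)\in \{-1,1\}^{\mathbb {Z}}\). Indeed,

$$\mathbb{Q}\left(y_{0}\,|\,y_{-m}^{-1},y_{1}^{n}\right)=\frac{\mathbb{Q}\left(y_{-m}^{-1}\,y_{0}\,y_{1}^{n}\right)}{\mathbb{Q}\left(y_{-m}^{-1}\,y_{0}\,y_{1}^{n}\right)+\mathbb{Q}\left(y_{-m}^{-1}\,\bar y_{0}\,y_{1}^{n}\right)},$$

where \(\bar y_{0}=-y_{0}\). We can evaluate the partition function

$$\mathcal{Z}_{-m,n}\left(y_{-m}^{n}\right)=\sum_{x_{-m}^{n}\in\{-1,1\}^{n+m+1}}\exp\left(J\sum_{i=-m}^{n-1}x_{i}x_{i+1}+K\sum_{i=-m}^{n}y_{i}x_{i}\right)$$

by first summing over the spins on the right, \(x_{n},\ldots,x_{1}\), and then over the spins on the left, \(x_{-m},\ldots,x_{-1}\). One has

$$\mathcal{Z}_{-m,n}\left(y_{-m}^{n}\right)=\exp\left(\sum_{i=-m}^{-1}B\left(w_{i}^{(-m)}\right)+\sum_{i=1}^{n}B\left(w_{i}^{(n)}\right)\right)2\cosh\left(Ky_{0}+A\left(w_{-1}^{(-m)}\right)+A\left(w_{1}^{(n)}\right)\right),$$

where now \(w_{-m}^{(-m)}=Ky_{-m}\),

$$w_{i}^{(-m)}=Ky_{i}+A\left(w_{i-1}^{(-m)}\right),\qquad i=-m+1,\ldots,-1,$$

and, as in Section 2, \(w_{n}^{(n)}=Ky_{n}\) and \(w_{i}^{(n)}=Ky_{i}+A\left(w_{i+1}^{(n)}\right)\), \(i=n-1,\ldots,1\). Therefore,

$$\mathbb{Q}\left(y_{0}\,|\,y_{-m}^{-1},y_{1}^{n}\right)=\frac{\cosh\left(Ky_{0}+A\left(w_{-1}^{(-m)}\right)+A\left(w_{1}^{(n)}\right)\right)}{\cosh\left(Ky_{0}+A\left(w_{-1}^{(-m)}\right)+A\left(w_{1}^{(n)}\right)\right)+\cosh\left(K\bar y_{0}+A\left(w_{-1}^{(-m)}\right)+A\left(w_{1}^{(n)}\right)\right)}.$$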
Again, given this expression, one easily establishes uniform convergence and existence of the limits,

Thus the two-sided conditional probabilities are also regular, cf. Theorem 3.1.
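A numerical check of the two-sided computation, written as our own sketch: it assumes \(J=\frac12\log\frac{1-p}{p}\), \(K=\frac12\log\frac{1-\varepsilon}{\varepsilon}\), \(A(w)=\frac12\log\frac{\cosh(w+J)}{\cosh(w-J)}\), accumulates fields from the right (\(w_{i}=Ky_{i}+A(w_{i+1})\), starting from zero beyond \(y_{n}\)) and from the left (\(u_{i}=Ky_{i}+A(u_{i-1})\), starting from zero before \(y_{-m}\)), and evaluates \(\cosh(Ky_{0}+h)/\left(\cosh(Ky_{0}+h)+\cosh(-Ky_{0}+h)\right)\) with \(h=A(u_{-1})+A(w_{1})\). The brute-force comparison enumerates hidden configurations, so it is only meant for short windows; function names are hypothetical.

```python
import itertools
import math


def A(w, J):
    return 0.5 * math.log(math.cosh(w + J) / math.cosh(w - J))


def two_sided_conditional(left, y0, right, p, eps):
    """Q(y_0 | y_{-m}^{-1} = left, y_1^n = right) via left and right field recursions."""
    J = 0.5 * math.log((1 - p) / p)
    K = 0.5 * math.log((1 - eps) / eps)
    w = 0.0
    for s in reversed(right):  # w_n, ..., w_1
        w = K * s + A(w, J)
    u = 0.0
    for s in left:             # u_{-m}, ..., u_{-1}
        u = K * s + A(u, J)
    h = A(u, J) + A(w, J)      # effective field on x_0 coming from both sides
    return math.cosh(K * y0 + h) / (math.cosh(K * y0 + h) + math.cosh(-K * y0 + h))


def cylinder_prob(y, p, eps):
    """Brute-force Q(y) by summing over hidden words (short words only)."""
    total = 0.0
    for x in itertools.product((-1, 1), repeat=len(y)):
        wt = 0.5
        for a, b in zip(x, x[1:]):
            wt *= (1 - p) if a == b else p
        for xi, yi in zip(x, y):
            wt *= (1 - eps) if xi == yi else eps
        total += wt
    return total


def two_sided_bruteforce(left, y0, right, p, eps):
    num = cylinder_prob(left + (y0,) + right, p, eps)
    return num / (num + cylinder_prob(left + (-y0,) + right, p, eps))


print(two_sided_conditional((1, 1, -1), 1, (1, -1, -1), 0.1, 0.2),
      two_sided_bruteforce((1, 1, -1), 1, (1, -1, -1), 0.1, 0.2))
```

With an empty left context the formula reduces to the one-sided *g*-function, which is a convenient consistency check on the two recursions.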

### 4.1 Denoising

Reconstruction of signals corrupted by noise during transmission is one of the classical problems in Information Theory. Suppose we observe a sequence \(\{y_{n}\}\), *n*=1,…,*N*, given by (1.1), i.e.,

$$y_{n}=x_{n}\cdot z_{n},$$

where \(\{x_{n}\}\) is some unknown realisation of the Markov chain, and \(\{z_{n}\}\) is an unknown realisation of the Bernoulli sequence \(\{Z_{n}\}\). The natural question is: given the observed data \(y^{N}=(y_{1},\ldots,y_{N})\), what is the optimal choice of \(\hat X_{n}=\hat X_{n}\left (y^{N}\right)\), the estimate of \(X_{n}\), such that the empirical zero-one loss (bit error rate)

$$\frac{1}{N}\sum_{n=1}^{N}\mathbf{1}\left[\hat X_{n}\left(y^{N}\right)\ne x_{n}\right]$$

is minimal in expectation. The corresponding standard maximum a posteriori probability (MAP) estimator (denoiser) is given by

$$\hat X_{n}\left(y^{N}\right)=\arg\max_{x\in\{-1,1\}}\mathbb{P}\left[X_{n}=x\,|\,Y^{N}=y^{N}\right].$$

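If the parameters of the Markov chain (i.e., *P*) and of the channel (i.e., *Π*) are known, the conditional probabilities \(\mathbb {P}\left [X_{n}= x\,|\,Y^{N}=y^{N}\right ]\) can be found using the **backward-forward algorithm**. Namely, one has

$$\mathbb{P}\left[X_{n}=x\,|\,Y^{N}=y^{N}\right]=\frac{\alpha_{n}(x)\beta_{n}(x)}{\sum_{x'\in\{-1,1\}}\alpha_{n}(x')\beta_{n}(x')},\tag{4.1}$$

where

$$\alpha_{n}(x)=\mathbb{P}\left[Y_{1}^{n}=y_{1}^{n},\,X_{n}=x\right],\qquad\beta_{n}(x)=\mathbb{P}\left[Y_{n+1}^{N}=y_{n+1}^{N}\,|\,X_{n}=x\right]$$

are the so-called forward and backward variables, satisfying simple recurrence relations:

$$\alpha_{1}(x)=\frac{1}{2}\pi_{x}(y_{1}),\quad\alpha_{n+1}(x')=\pi_{x'}(y_{n+1})\sum_{x}\alpha_{n}(x)\,p_{x,x'},\qquad\beta_{N}(x)=1,\quad\beta_{n}(x)=\sum_{x'}p_{x,x'}\,\pi_{x'}(y_{n+1})\,\beta_{n+1}(x').$$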
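The key observation of [16] is that the probability distribution \(\mathbb {P}\left [X_{n}=\cdot \ | Y^{N}=y^{N}\right ]\), viewed as a column vector, can be expressed in terms of the two-sided conditional probabilities \(\mathbb {Q}\left [Y_{n}=\cdot \ | Y^{N\setminus n} = y^{N\setminus n}\right ]\), with *N*∖*n*={1,…,*N*}∖{*n*}, as follows:

$$\mathbb{P}\left[X_{n}=\cdot\,|\,Y^{N}=y^{N}\right]=\frac{\pi_{y_{n}}\odot\Pi^{-T}\,\mathbb{Q}\left[Y_{n}=\cdot\,|\,Y^{N\setminus n}=y^{N\setminus n}\right]}{\mathbf{1}^{T}\left(\pi_{y_{n}}\odot\Pi^{-T}\,\mathbb{Q}\left[Y_{n}=\cdot\,|\,Y^{N\setminus n}=y^{N\setminus n}\right]\right)},\tag{4.2}$$

where *Π* is the emission matrix, \(\Pi^{-T}=(\Pi^{T})^{-1}\), and \(\pi_{-1},\pi_{1}\) are the columns of *Π* (rows and columns indexed by −1, +1 in this order):

$$\Pi=\begin{pmatrix}1-\varepsilon&\varepsilon\\ \varepsilon&1-\varepsilon\end{pmatrix}=\left(\pi_{-1}\ \ \pi_{1}\right),$$

and ⊙ is the componentwise product of vectors of equal lengths,

$$(u\odot v)_{x}=u_{x}v_{x}.$$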

Expression (4.2) opens the possibility of constructing denoisers when the parameters of the underlying Markov chain are unknown; we continue to assume that the channel is known. Indeed, the two-sided conditional probabilities \( \mathbb {Q}\left [Y_{n}=\cdot \ | Y^{N\setminus n} = y^{N\setminus n}\right ]\) could be estimated from the data. The Discrete Universal Denoiser (DUDE) [16] estimates the conditional probabilities as

$$\widehat{\mathbb{Q}}\left[Y_{n}=c\,|\,Y_{n-k_{N}}^{n-1}=a_{-k_{N}}^{-1},\,Y_{n+1}^{n+k_{N}}=b_{1}^{k_{N}}\right]=\frac{m\left(a_{-k_{N}}^{-1},c,b_{1}^{k_{N}}\right)}{\sum_{c'\in\{-1,1\}}m\left(a_{-k_{N}}^{-1},c',b_{1}^{k_{N}}\right)},\tag{4.3}$$

where \(m\left (a_{-k_{N}}^{-1},c,b_{1}^{k_{N}}\right)\) is the number of occurrences of the word \(a_{-k_{N}}^{-1}cb_{1}^{k_{N}}\) in the observed sequence \(y^{N}=(y_{1},\ldots,y_{N})\); the length of the left and right contexts is set to \(k_{N}=c\log N\), *c*>0.
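A compact sketch of a DUDE-type denoiser for the binary symmetric channel, combining the count estimate (4.3) with expression (4.2). It is our simplified illustration, not the reference implementation of [16]: the context length `k` is fixed by hand (here 4) instead of \(k_{N}=c\log N\), positions closer than `k` to either end are left unchanged, and the decision is the argmax of \(\pi_{y_{n}}\odot\Pi^{-1}m\), where *m* is the vector of raw counts for the centre symbol. Normalizing *m* by a positive constant, as in (4.3), would not change the argmax, and \(\Pi^{-T}=\Pi^{-1}\) because *Π* is symmetric. Function and parameter names are hypothetical.

```python
import random
from collections import Counter

STATES = (-1, 1)


def dude_denoise(y, eps, k):
    """Two-sided context-counting denoiser for a binary symmetric channel (sketch).

    Counts m(a, c, b) over all windows (left context a, centre c, right context b)
    with contexts of length k, then sets
        x_hat_n = argmax_x  pi_{y_n}(x) * (Pi^{-1} m(a, ., b))(x),
    where Pi = [[1-eps, eps], [eps, 1-eps]] (rows/columns ordered -1, +1).
    Positions closer than k to either end are left unchanged.
    """
    N = len(y)
    counts = Counter()
    for n in range(k, N - k):
        counts[(tuple(y[n - k:n]), y[n], tuple(y[n + 1:n + k + 1]))] += 1
    det = 1 - 2 * eps  # determinant of Pi; eps = 1/2 must be excluded
    x_hat = list(y)
    for n in range(k, N - k):
        left, right = tuple(y[n - k:n]), tuple(y[n + 1:n + k + 1])
        m = [counts[(left, c, right)] for c in STATES]
        # v = Pi^{-1} m (Pi is symmetric, so Pi^{-T} = Pi^{-1})
        v = [((1 - eps) * m[0] - eps * m[1]) / det,
             (-eps * m[0] + (1 - eps) * m[1]) / det]
        # componentwise product with the column pi_{y_n} of Pi
        score = [(1 - eps if x == y[n] else eps) * vx for x, vx in zip(STATES, v)]
        x_hat[n] = -1 if score[0] > score[1] else 1
    return x_hat


# Usage: simulate the binary symmetric Markov source and the channel, then denoise.
rng = random.Random(11)
p, eps, N = 0.05, 0.2, 100_000
x = [rng.choice(STATES)]
for _ in range(N - 1):
    x.append(-x[-1] if rng.random() < p else x[-1])
y = [xi * (-1 if rng.random() < eps else 1) for xi in x]
x_hat = dude_denoise(y, eps, k=4)
raw_ber = sum(u != v for u, v in zip(y, x)) / N
dude_ber = sum(u != v for u, v in zip(x_hat, x)) / N
print(f"bit error rate without denoising {raw_ber:.3f}, after denoising {dude_ber:.3f}")
```

The denoised error rate should come out far below the raw rate of about *ε*; the exact figure depends on the seed, *N* and `k`, and this sketch is not meant to reproduce the numbers in the table below.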

DUDE has shown excellent performance in a number of test cases. In particular, for the binary symmetric memoryless channel and the symmetric Markov chain considered in this paper, its performance is comparable to that of the backward-forward algorithm (4.1), which requires full knowledge of the source distribution, while DUDE is completely oblivious in that respect. In our opinion, the excellent performance of DUDE in this case is partially due to the fact that \(\mathbb {Q}\) is a Gibbs measure, admitting smooth two-sided conditional probabilities, which are well approximated by (4.3) and thus can be estimated from the data. It would be interesting to evaluate its performance in cases when the output measure is not Gibbs.
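For comparison with DUDE, the backward-forward computation (4.1) with known parameters can be sketched for this two-state model as follows. This is our illustration, not code from [16]: it assumes the initial law (½, ½), transition probabilities 1−*p* or *p* and emission probabilities 1−*ε* or *ε*, rescales the forward and backward variables at every step (a standard device against numerical underflow that does not change the posterior), and includes a brute-force posterior for checking on short sequences. All function names are hypothetical.

```python
import itertools
import random

STATES = (-1, 1)


def _trans(a, b, p):
    """Transition probability p_{a,b} of the symmetric chain."""
    return 1 - p if a == b else p


def _emit(x, obs, eps):
    """Emission probability pi_x(obs) of the binary symmetric channel."""
    return 1 - eps if x == obs else eps


def posterior_marginals(y, p, eps):
    """Scaled forward-backward; returns [(P[X_n=-1|y], P[X_n=+1|y]) for each n]."""
    N = len(y)
    alpha = []
    a = {x: 0.5 * _emit(x, y[0], eps) for x in STATES}
    for n in range(N):
        if n > 0:  # alpha_n(x) = pi_x(y_n) * sum_u alpha_{n-1}(u) p_{u,x}
            a = {x: _emit(x, y[n], eps) * sum(a[u] * _trans(u, x, p) for u in STATES)
                 for x in STATES}
        s = a[-1] + a[1]
        a = {x: a[x] / s for x in STATES}  # rescale; the posterior is unaffected
        alpha.append(a)
    beta = [None] * N
    b = {x: 1.0 for x in STATES}
    beta[N - 1] = b
    for n in range(N - 2, -1, -1):  # beta_n(x) = sum_v p_{x,v} pi_v(y_{n+1}) beta_{n+1}(v)
        b = {x: sum(_trans(x, v, p) * _emit(v, y[n + 1], eps) * b[v] for v in STATES)
             for x in STATES}
        s = b[-1] + b[1]
        b = {x: b[x] / s for x in STATES}
        beta[n] = b
    post = []
    for fa, fb in zip(alpha, beta):
        q_minus, q_plus = fa[-1] * fb[-1], fa[1] * fb[1]
        post.append((q_minus / (q_minus + q_plus), q_plus / (q_minus + q_plus)))
    return post


def posterior_bruteforce(y, p, eps):
    """P[X_n = +1 | y] for every n, by enumerating all hidden words (short y only)."""
    weights = {}
    for x in itertools.product(STATES, repeat=len(y)):
        w = 0.5
        for u, v in zip(x, x[1:]):
            w *= _trans(u, v, p)
        for xi, yi in zip(x, y):
            w *= _emit(xi, yi, eps)
        weights[x] = w
    total = sum(weights.values())
    return [sum(w for x, w in weights.items() if x[n] == 1) / total for n in range(len(y))]


def map_denoise(y, p, eps):
    """MAP decision argmax_x P[X_n = x | y] (ties broken towards +1)."""
    return [1 if q_plus >= q_minus else -1 for q_minus, q_plus in posterior_marginals(y, p, eps)]


# Usage: simulate the source and the channel, then denoise with known parameters.
rng = random.Random(3)
p, eps, N = 0.05, 0.2, 20_000
x = [rng.choice(STATES)]
for _ in range(N - 1):
    x.append(-x[-1] if rng.random() < p else x[-1])
y = [xi * (-1 if rng.random() < eps else 1) for xi in x]
x_hat = map_denoise(y, p, eps)
raw_ber = sum(u != v for u, v in zip(y, x)) / N
map_ber = sum(u != v for u, v in zip(x_hat, x)) / N
print(f"bit error rate without denoising {raw_ber:.3f}, with MAP {map_ber:.3f}")
```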

The invention of DUDE sparked great interest in two-sided approaches to information-theoretic problems. It turns out that, although efficient algorithms for the estimation of one-sided models exist, the analogous two-sided problem is substantially more difficult. As alternatives to (4.3), other methods to estimate two-sided conditional probabilities have been suggested, e.g., [6, 11, 17]. For example, Yu and Verdú [17] proposed a **Backward-Forward Product** (BFP) model:

and the one-sided conditional probabilities \(\widetilde{\mathbb{Q}}(y_{0}|y_{<0})\), \(\widetilde{\mathbb{Q}}(y_{0}|y_{>0})\) can be estimated using standard one-sided algorithms. Note that, in our model,

in general does not coincide with

Nevertheless, the BFP model seems to perform extremely well [17].

Among other alternatives, let us mention the possibility of extending standard one-sided algorithms to algorithms that estimate two-sided conditional probabilities from data. This approach is investigated in joint work with S. Berghout [2], where the denoising performance of the resulting Gibbsian models is evaluated. The Gibbsian algorithm performs better than DUDE: bit error rates for noise level *ε*=0.2 and various values of *p* are given in the table below (smaller rates are better).

| *p* | Gibbs | DUDE |
|---|---|---|
| 0.05 | 5.30 % | 5.58 % |
| 0.10 | 9.91 % | 10.48 % |
| 0.15 | 13.20 % | 13.77 % |
| 0.20 | 18.34 % | 18.77 % |

One could also try to estimate the Gibbsian potential directly, e.g., using the estimation procedure proposed in [5]. This method showed promising performance in experiments on language classification and authorship attribution. In conclusion, let us also mention that the direct two-sided Gibbs modeling of stochastic processes opens possibilities for applying semi-parametric statistical procedures, as opposed to the universal (parameter free) approach of DUDE.

## References

- 1
Behn, U, Zagrebnov, VA: One-dimensional Markovian-field Ising model: physical properties and characteristics of the discrete stochastic mapping. J. Phys. A. 21(9), 2151–2165 (1988). ISSN:0305-4470, MR952930 (89j:82024).

- 2
Berghout, S, Verbitskiy, E: On bi-directional modeling of information sources (2015).

- 3
Bowen, R: Some systems with unique equilibrium states. Math. Systems Theory. 8(3), 193–202. (1974/75). ISSN:0025-5661, MR0399413 (53 #3257).

- 4
Ephraim, Y, Merhav, N: Hidden Markov processes, Special issue on Shannon theory: perspective, trends, and applications. IEEE Trans. Inform. Theory. 48(6), 1518–1569 (2002). ISSN:0018-9448, MR1909472 (2003f:94024), doi:10.1109/TIT.2002.1003838.

- 5
Ermolaev, V, Verbitskiy, E: Thermodynamic Gibbs Formalism and Information Theory. In: *The Impact of Applications on Mathematics (M. Wakayama et al., eds.), Mathematics for Industry*, pp. 349–362. Springer Japan (2014).

- 6
Fernandez, F, Viola, A, Weinberger, MJ: Efficient Algorithms for Constructing Optimal Bi-directional Context Sets. Data Compression Conference (DCC), 179–188 (2010). http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5453460. doi:10.1109/DCC.2010.23.

- 7
Fernández, R, Ferrari, PA, Galves, A: Coupling renewal and perfect simulation of chains of infinite order, Ubatuba (2001).

- 8
Hochwald, BM, Jelenkovic, PR: State learning and mixing in entropy of hidden Markov processes and the Gilbert-Elliott channel. IEEE Trans. Inform. Theory. 45(1), 128–138 (1999). ISSN:0018-9448, MR1677853 (99k:94028).

- 9
Jacquet, P, Seroussi, G, Szpankowski, W: On the entropy of a hidden Markov process. Theoret. Comput. Sci. 395(2–3), 203–219 (2008).

- 10
Ordentlich, E, Weissman, T: Approximations for the entropy rate of a hidden Markov process. In: *Proceedings International Symposium on Information Theory 2005*, pp. 2198–2202 (2005). http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1523737. doi:10.1109/ISIT.2005.1523737.

- 11
Ordentlich, E, Weinberger, MJ, Weissman, T: Multi-directional context sets with applications to universal denoising and compression. In: *ISIT 2005. Proceedings International Symposium on Information Theory*, pp. 1270–1274 (2005). http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1523546. doi:10.1109/ISIT.2005.1523546.

- 12
Pollicott, M: Computing entropy rates for hidden Markov processes. In: *Entropy of hidden Markov processes and connections to dynamical systems, London Math. Soc. Lecture Note Ser.*, pp. 223–245. Cambridge Univ. Press, Cambridge (2011). MR2866670 (2012i:37010).

- 13
Rabiner, LR: A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE. 77, 257–286 (1989).

- 14
Ruján, P: Calculation of the free energy of Ising systems by a recursion method. Physica A: Stat Theoretical Phys. 91(3–4), 549–562 (1978).

- 15
Verbitskiy, EA: Thermodynamics of Hidden Markov Processes. In: Marcus, B, Petersen, K, Weissman, T (eds.) *Papers from the Banff International Research Station Workshop on Entropy of Hidden Markov Processes and Connections to Dynamical Systems*, pp. 258–272. London Mathematical Society Lecture Note Series (2011). http://dx.doi.org/10.1017/CBO9780511819407.010.

- 16
Weissman, T, Ordentlich, E, Seroussi, G, Verdú, S, Weinberger, MJ: Universal discrete denoising: known channel. IEEE Trans. Inform. Theory. 51(1), 5–28 (2005). ISSN: 0018-9448 MR2234569 (2008h:94036).

- 17
Yu, J, Verdú, S: Schemes for bidirectional modeling of discrete stationary sources. IEEE Trans. Inform. Theory. 52(11), 4789–4807 (2006). ISSN:0018-9448, MR2300356 (2007m:94144).

- 18
Zuk, O, Domany, E, Kanter, I, Aizenman, M: From Finite-System Entropy to Entropy Rate for a Hidden Markov Process. IEEE Signal Process. Lett. 13(9), 517–520 (2006).

## Acknowledgments

Part of the work described in this paper was completed during the author's visit to the Institute of Mathematics for Industry, Kyushu University. The author is grateful for the hospitality during his stay and for the support of the World Premier International Researcher Invitation Program.

## Rights and permissions

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## About this article

### Keywords

- Hidden Markov models
- Gibbs states
- Thermodynamic formalism
- Denoising

### Mathematics Subject Classification

- 37D35
- 82B20