Fluctuation Theorems with Retrodiction rather than Reverse Processes

Irreversibility is usually captured by a comparison between the process that happens and a corresponding "reverse process". In the last decades, this comparison has been extensively studied through fluctuation relations. Here we revisit fluctuation relations from the standpoint, suggested decades ago by Watanabe, that the comparison should involve the prediction and the retrodiction on the unique process, rather than two processes. We identify a necessary and sufficient condition for a retrodictive reading of a fluctuation relation. The retrodictive narrative also brings to the fore the possibility of deriving fluctuation relations based on various statistical divergences, and clarifies some of the traditional assumptions as arising from the choice of a reference prior.


I. PROCESSES VERSUS INFERENCES
Quantitative approaches to irreversibility traditionally involve a comparison between the process physically happening, usually called the forward process, and a corresponding reverse (or backward) process. The definition of the latter is intuitive only for some processes: somewhat ironically, not for those that are paradigmatic of irreversibility. Indeed, consider the erasure channel, which sends every possible input state to a unique, fixed output state: what should one take as its reverse process?
In a previous paper 1 , two of us proposed to look at irreversibility as arising out of our logical inference, rather than out of physical processes. Specifically, we proposed to define the reverse process in terms of Bayesian retrodiction. This is a universal recipe. This retrodictive element can be identified a posteriori in all previously reported fluctuation relations that we checked, including the most famous ones, both classical 2-5 and quantum 6 , that are highlighted in the many available reviews [7][8][9][10] . Besides recovering "intuitive" reverse processes, retrodiction provides a definition for the nonintuitive ones, which smoothly removes anomalies that were reported with other tentative definitions. Thus it seems plausible that all fluctuation relations can be understood in terms of retrodiction (though the literature is too vast and sparse to make a definitive call, we shall strengthen the evidence with Result 3 below).
In the pursuit of this line of research, we recognize that our previous paper was not radical enough. If the retrodictive origin of irreversibility is assumed, the narrative of the two processes becomes superfluous: using retrodiction to define a reverse process is an unnecessary step. There is only one process, the one that happens; what is being compared are our forward and backward inferences on it: prediction and retrodiction.
The replacement of irreversibility with irretrodictability was pioneered by Watanabe 11,12 , though prior to our previous work no connection had been drawn with the fluctuation theorems derived in the last twenty years. Under this change of viewpoint, it is the same physics that is being described, freed from excess baggage in the narrative (and thus, possibly, in the interpretation). Besides epistemological economy, we are going to show that this viewpoint is fruitful, as it opens previously unnoticed vistas.
The plan of the paper is as follows. In Section II we present a self-contained introduction to retrodiction, both classical and quantum; and Section III describes two case studies in detail. Section IV deals with fluctuation relations: we show that these relations are intimately related to statistical distances ("divergences") and that Bayesian retrodiction arises from the requirement that the fluctuating variable can be computed locally. We also compare the fluctuation relations obtained in the retrodictive narrative with those obtained in the reverse-process narrative. Section V reflects back on the structure of retrodiction, elaborating on the role of the unavoidable reference prior.
A word on the presentation. This paper covers topics from statistics, thermodynamics, and quantum information. We have tried to keep the presentation self-contained. We have also adopted a compact approach to references: besides those that prove specific results, we shall cite mostly reference books and reviews, and occasionally a few works that we consider clear and exemplary, useful as entry points for the reader, without any expectation of being exhaustive.

II. RETRODICTION: GENERALITIES
In this paper, we consider processes with discrete alphabets. The input of the channel is denoted x ∈ {1, 2, ..., d x } and the output y ∈ {1, 2, ..., d y }. We shall always have in mind d x = d y = d, keeping the notation different only when clarity demands it.
Also, throughout the paper, we assume that all probability vectors have strictly positive elements; and also all channels have only strictly positive entries (with the exception of the permutation channels studied in subsection III A). Arbitrarily small entries would be indistinguishable from an exact zero, certainly in practice, and perhaps also in principle depending on one's understanding of probabilities.

A. Bayesian retrodiction on classical information
As the basic setting of retrodiction, consider the most elementary form of statistical inference: at the output of a known channel ϕ(y|x), one observes the outcome y = y*, and wants to infer something about the input x. In this paper, we focus on Bayesian retrodiction, whose goal is to update one's belief on the distribution of x. This requires a prior belief, the reference prior, denoted ξ(x). The total prior knowledge is therefore captured by the joint probability distribution P_ξ(x,y) = ξ(x)ϕ(y|x); in particular, the prior knowledge about y is ξ̂(y) = ∑_x ξ(x)ϕ(y|x). When the knowledge on y is updated to y = y*, one performs Bayes' update on the total knowledge, whence the updated knowledge on x follows as

P_ξ(x|y*) = ξ(x)ϕ(y*|x)/ξ̂(y*).   (1)

This is the most elementary example of retrodiction. Slightly less basic, though also widely discussed in the statistical literature, is the retrodiction on x based on "soft evidence" on y. This refers to the situation in which the update on y is not a sharp value y = y*, but a distribution u(y). In real life, soft evidence may arise by sheer uncertainty (e.g. reading the outcome y in very dim light) or by virtual evidence (e.g. the doctor told me a definite result z = z* for my test, but I saw that he was tired and I fear that he may have misread the actual result y written on the sheet). The translation of such uncertainties into a quantitative u(y) is not trivial 13,14 , but we take it for granted. For such situations, Bayes' update (1) is generalised to Jeffrey's update 15

P_u(x) = ∑_y u(y) P_ξ(x|y).   (2)

In the case of virtual evidence, Jeffrey's update is a direct consequence 16,17 of Bayes' update starting from z = z*, under the assumption that the variable z influences directly only y and not x (cf. the example above: the tiredness of the doctor has no direct influence on whether I am actually sick). In other cases, it may be considered as an actual addition to the rules of Bayesian inference (this was Jeffrey's own view).
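As a concrete sketch of these update rules (the channel, prior, and soft evidence below are made-up numbers, not taken from the paper), one can check numerically that Jeffrey's update is the u-weighted mixture of Bayes posteriors:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3

# A made-up channel phi(y|x), stored as a column-stochastic matrix M[y, x]
phi = rng.random((d, d))
phi /= phi.sum(axis=0)

xi = np.array([0.5, 0.3, 0.2])          # reference prior xi(x)
xi_out = phi @ xi                       # prior knowledge about y

# Bayes' update after the sharp observation y = y*
y_star = 1
bayes_posterior = xi * phi[y_star, :] / xi_out[y_star]

# Full retrodiction table P_xi(x|y): one Bayes posterior per column y
P_x_given_y = (phi * xi).T / xi_out

# Jeffrey's update for soft evidence u(y): the u-weighted mixture of posteriors
u = np.array([0.1, 0.7, 0.2])
jeffrey_posterior = P_x_given_y @ u

assert np.isclose(bayes_posterior.sum(), 1.0)
assert np.isclose(jeffrey_posterior.sum(), 1.0)
# Sharp evidence u = delta_{y, y*} recovers Bayes' update
assert np.allclose(P_x_given_y[:, y_star], bayes_posterior)
```

The last assertion illustrates the remark in the text: Bayes' update is the special case of Jeffrey's update for sharp evidence.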
Thus, the conditional probability P_ξ(x|y) plays the role of a channel for the retrodiction, in short the retrodiction channel. For the remainder of the paper, we change the notation to

φ̂_ξ(x|y) = ξ(x)ϕ(y|x)/ξ̂(y).   (3)

We shall make use at our convenience of a matrix representation. The channel ϕ(y|x) is represented by the column-stochastic matrix [M_ϕ]_{y,x} = ϕ(y|x). Similarly, the retrodiction channel φ̂_ξ(x|y) is represented by the column-stochastic matrix [Mφ̂_ξ]_{x,y} = φ̂_ξ(x|y). In this notation, we similarly represent input and output distributions p(x) as column vectors v_p. For instance, the relations that define the reference prior can be written as v_ξ̂ = M_ϕ v_ξ and v_ξ = Mφ̂_ξ v_ξ̂.

B. Two remarks
Before continuing, we bring up two crucial remarks. The first is about the reference prior. It is well known 11,12,17 , and our presentation above makes it clear once again, that this element of subjectivity is an unavoidable feature of Bayesian retrodiction for a generic channel. The question of the choice of the prior is a recurring topic in Bayesian statistics. The literature on fluctuation relations does not mention it as such, the assumption being stated in more physical language. We shall get back to this point in Section V. Here, for the sake of definiteness, we just mention two possible choices. One is the uniform prior ξ(x) = 1/d for all x. Another is the steady state γ of ϕ, defined by γ̂ = γ. Every stochastic map has at least one steady state, and exactly one if all its entries are strictly positive. It follows immediately from φ̂_γ(x|y) = [γ(x)/γ(y)] ϕ(y|x) that the uniform prior is a steady state if and only if ϕ is bistochastic.
The second remark, which others also felt the need to highlight 18 , is that retrodiction is not inversion. A channel ϕ has a linear inverse if there exists M such that M M_ϕ = 1. In the case of an invertible channel, given a valid output distribution p̂(y) = ∑_x ϕ(y|x)p(x), one is able to recover the input distribution p(x). But for most invertible channels, M is not a channel itself: there exist distributions u(y) such that M v_u is not a valid probability vector. In particular, since the image of the probability simplex under M_ϕ is convex, for a non-permutation channel there exists y* such that no input distribution p(x) is mapped by M_ϕ onto δ_{y,y*}; yet retrodicting from δ_{y,y*} is the most basic example of Bayesian inference. Ultimately, of course, the difference is in the task: retrodiction does not aim at reconstructing the prior through repeated sampling, but at updating one's belief after a single run of the process 19 .
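The distinction can be made concrete with a small numerical sketch (the channel parameters and the prior below are illustrative assumptions): the linear inverse of an invertible stochastic matrix generically has negative entries and maps sharp evidence outside the simplex, while the Bayesian retrodiction channel is always a valid channel.

```python
import numpy as np

# An invertible bit channel that is not a permutation (a, b illustrative)
a, b = 0.3, 0.1
M = np.array([[1 - a, b],
              [a, 1 - b]])              # column-stochastic: M[y, x] = phi(y|x)

M_inv = np.linalg.inv(M)                # the linear inverse exists (a + b != 1)...
assert (M_inv < 0).any()                # ...but it is not a channel

# Applied to the sharp evidence delta_{y,0}, it leaves the probability simplex
v = M_inv @ np.array([1.0, 0.0])
assert v.min() < 0

# The Bayesian retrodiction channel, instead, is a bona fide channel
xi = np.array([0.5, 0.5])               # illustrative reference prior
M_hat = (M * xi).T / (M @ xi)
assert (M_hat >= 0).all() and np.allclose(M_hat.sum(axis=0), 1.0)
```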
In subsection III A, we shall see a remarkable coincidence: the channels for which the retrodiction channel coincides with the inverse, and those for which the retrodiction channel is independent of the reference prior, are exactly the same.
C. Retrodiction on "quantum-inside" classical channels
According to our current knowledge, the most general description of the inner workings of any classical input-output channel is given by quantum theory. The "quantum-inside" description of a classical channel is as follows (Fig. 1). The classical input x prepares a system in a state ρ_x. The system is then sent through a quantum channel (a completely positive, trace-preserving, CPTP, map) E, and is eventually measured with the positive operator-valued measure (POVM) {Π_y}, leading to the classical outcome y. All in all:

ϕ(y|x) = Tr(Π_y E[ρ_x]).

We want to derive the quantum description of the associated classical retrodiction channel (3): that is, finding states ρ̂_y, a CPTP map Ê, and a POVM {Π̂_x}_x, such that

φ̂_ξ(x|y) = Tr(Π̂_x Ê[ρ̂_y]).   (6)

For this, we first need to define the adjoint E† of the channel, that is the map such that Tr(A E[B]) = Tr(E†[A] B) for all operators A, B. With it, one can write φ̂_ξ(x|y) = Tr([ξ(x)ρ_x] E†[Π_y]/ξ̂(y)). This looks like (6), but in general one has Tr(E†[Π_y]/ξ̂(y)) ≠ 1, so these operators cannot be read directly as states. In order to identify proper states, channel and measurement, one has to introduce a reference state Ξ = ∑_x ξ(x)ρ_x. As in the classical case, we assume that Ξ and Ξ̂ = E[Ξ] have full rank, to skip caveats for situations of measure zero. Then one possible construction of the quantum elements of (6) uses

ρ̂_y = S(Ξ̂)[Π_y]/ξ̂(y),  Ê = S(Ξ) ∘ E† ∘ S(Ξ̂^{-1}),  Π̂_x = ξ(x) S(Ξ^{-1})[ρ_x],   (9)

where we have introduced the notation S(A)[B] = √A B √A for a positive operator A. Starting from this basic construction, one can obtain others: the dressed elements ρ̂′_y = U_s[ρ̂_y], Ê′ = U_m ∘ Ê ∘ U_s^{-1}, Π̂′_x = U_m[Π̂_x] also lead to (6) for any pair of unitary channels (U_s, U_m) [Eq. (12)].
The key observation is that the retrodiction channel Ê turns out to be the Petz recovery or Petz transpose map 20,21 of E for the reference state Ξ [Eq. (9)], or a rotated version thereof [Eq. (12)].
The Petz map, a widely used tool in quantum information [22][23][24] , was previously identified on formal grounds as the generalisation of retrodiction within the quantum formalism [25][26][27][28] . First of all, in the case where all the states and the channels are diagonal in the same basis, (9) reduces to (3). Furthermore, just as the Bayesian retrodiction φ̂_ξ depends on a reference prior ξ, the Petz map Ê_α depends on a reference state α 29 . Interestingly, the Petz map was also used for quantum fluctuation relations 30 , but the connection with retrodiction was not noticed.
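The following sketch illustrates the Petz transpose map numerically, under stated assumptions (the helper names random_channel, apply, adjoint, sqrtm_psd, the Kraus-operator construction, and the particular reference state are ours, for illustration only). It checks two generic properties of the Petz map: it is trace-preserving, and it recovers the reference state exactly, Ê[E[Ξ]] = Ξ.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_channel(d, k=3):
    """A random CPTP map given by k Kraus operators (illustrative)."""
    A = rng.normal(size=(k * d, d)) + 1j * rng.normal(size=(k * d, d))
    V, _ = np.linalg.qr(A)                 # orthonormal columns: V^dag V = 1
    return [V[i * d:(i + 1) * d, :] for i in range(k)]

def apply(K, rho):                          # E[rho] = sum_i K_i rho K_i^dag
    return sum(Ki @ rho @ Ki.conj().T for Ki in K)

def adjoint(K, A):                          # E^dag[A] = sum_i K_i^dag A K_i
    return sum(Ki.conj().T @ A @ Ki for Ki in K)

def sqrtm_psd(A):                           # square root of a PSD matrix
    w, U = np.linalg.eigh(A)
    return (U * np.sqrt(np.clip(w, 0, None))) @ U.conj().T

d = 2
K = random_channel(d)
Xi = np.diag([0.6, 0.4])                    # full-rank reference state
Xi_out = apply(K, Xi)

# Petz transpose map of E for reference state Xi:
# rho -> sqrt(Xi) E^dag[ Xi_out^{-1/2} rho Xi_out^{-1/2} ] sqrt(Xi)
s_in = sqrtm_psd(Xi)
s_out_inv = np.linalg.inv(sqrtm_psd(Xi_out))

def petz(rho):
    return s_in @ adjoint(K, s_out_inv @ rho @ s_out_inv) @ s_in

rho = apply(K, np.diag([0.2, 0.8]))         # some valid output state
assert np.isclose(np.trace(petz(rho)).real, 1.0)   # trace-preserving
assert np.allclose(petz(Xi_out), Xi)               # recovers the reference state
```

Trace preservation follows from E†[1] = 1 (unitality of the adjoint of a CPTP map), which the Kraus normalisation guarantees.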

III. RETRODICTION: TWO CASE STUDIES
In this Section, we first present retrodiction on Hamiltonian channels (both classical and quantum), which are provably the only ones for which the retrodictive map is independent of the reference prior and is identical to the inverse. Then we discuss retrodiction for all classical bit channels (d = 2): precisely because it is elementary, this case study is useful to clarify features and dispel possible confusion about retrodiction.

A. Case study: Hamiltonian channels
We call Hamiltonian channels, both classical and quantum, channels that are both deterministic and invertible (Watanabe 12 referred to these channels as "bilaterally deterministic"). The flows do not cross, and each state belongs to one and only one trajectory.
For classical information, we have y = f(x) with f a bijection (in the discrete case, a permutation), and so x = f^{-1}(y) is uniquely defined. In this case, it is absolutely natural to expect

φ̂(x|y) = δ_{x, f^{-1}(y)},

independent of the reference prior. It is readily verified from Eq. (3) that this is indeed the case, since for a bijection we have ξ̂(y) = ∑_x ξ(x)δ_{y,f(x)} = ξ(f^{-1}(y)). This result has very appealing features: the retrodiction channel coincides with the inverse and is independent of the arbitrary choice of reference prior. Appealing as they are, these features cannot be taken as paradigmatic, because they are actually unique to this case.
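A minimal numerical check of this statement (the permutation f and the priors below are illustrative): for a permutation channel, the retrodiction channel equals the inverse permutation for every reference prior.

```python
import numpy as np

# A permutation channel on d = 3: y = f(x) with f = (0->1, 1->2, 2->0)
P = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])               # M[y, x] = delta_{y, f(x)}

for xi in (np.ones(3) / 3, np.array([0.7, 0.2, 0.1])):
    M_hat = (P * xi).T / (P @ xi)       # retrodiction channel for prior xi
    assert np.allclose(M_hat, P.T)      # inverse permutation, independent of xi
```

For a permutation matrix the inverse is the transpose, which is why `P.T` appears as the expected retrodiction channel.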
Result 1. The following three statements are equivalent: (I) The channel ϕ is a permutation.
(II) The retrodiction channelφ is independent of the reference prior.
(III) There exists a reference prior ξ , for which the retrodiction channelφ ξ is the inverse channel ϕ −1 .
Proof. We present a full proof here, putting on record that the equivalence of (I) and (II) was already proved in Watanabe's pioneering study 12 .
Assume (III), i.e. Mφ̂_ξ M_ϕ = 1 for some reference prior ξ; its off-diagonal entries read ∑_y φ̂_ξ(x|y)ϕ(y|x′) = ∑_y ξ(x)ϕ(y|x)ϕ(y|x′)/ξ̂(y) for x ≠ x′. Thus, for all the off-diagonal terms to be zero we need

ϕ(y|x)ϕ(y|x′) = 0 for all y and all x ≠ x′.   (15)

This means that the product of any two entries of a given y-row will always be zero. Hence, there can be at most one non-zero entry in that row, which means that the matrix M_ϕ can have at most d non-zero entries. But there are d columns, and the sum of all the elements of each column must be 1. Thus, the only possibility is that each row and each column have exactly one non-zero entry, and the value of that entry is 1. This defines a permutation matrix and concludes the proof. Incidentally, condition (15) shows that whether Mφ̂_ξ M_ϕ = 1 holds does not depend on the reference prior; so at that point we had proved directly (III) → (II).
The same result holds for retrodiction on quantum information; in fact, Result 1 was presented first for reasons of clarity, but can be seen as a special case of the following:
Result 2. The following three statements are equivalent: (I) The channel E is a unitary channel.
(II) The retrodiction channel Ê is independent of the reference state.
(III) There exists a reference state α, for which the retrodiction channel Ê_α is the inverse channel E^{-1}.
Finally, for the proof of (III) → (I): since any Petz map is CPTP, the starting assumption Ê_α = E^{-1} implies that E^{-1} is a CPTP map. But it is known that a CPTP map E with the same input and output spaces has a CPTP inverse (that is, it is invertible, and the inverse is itself a channel) if and only if it is unitary 31,32 .

B. Classical one-bit channels
As a second case study, we consider classical stochastic processes for d = 2 (Fig. 2). We write a generic channel as

M_ϕ = ( 1−a   b
         a   1−b ),   (16)

with 0 ≤ a, b ≤ 1. Its steady state γ = (b, a)/(a+b) is unique unless a = b = 0 (this being expected, since every state is a steady state of the identity channel). The corresponding retrodiction channel with generic reference prior ξ is

Mφ̂_ξ = ( (1−a)ξ(0)/ξ̂(0)   aξ(0)/ξ̂(1)
           bξ(1)/ξ̂(0)    (1−b)ξ(1)/ξ̂(1) ),   (17)

with ξ̂(0) = (1−a)ξ(0) + bξ(1) and ξ̂(1) = 1 − ξ̂(0). Interestingly, the retrodiction channel built on the steady state has the same stochastic matrix as the channel itself:

Mφ̂_γ = M_ϕ.   (18)

This can be verified without calculation, by noticing that φ̂_γ(x|x) = ϕ(x|x) and that Mφ̂_γ must also be column-stochastic. The channel (16) is invertible if and only if a + b ≠ 1. Result 1 of course holds: the retrodiction channel will be the inverse if and only if a = b = 0 (identity channel) or a = b = 1 (bit-flip channel). For all the other invertible channels, retrodiction and inversion do not coincide, whatever the choice of the reference prior.
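The steady-state property Mφ̂_γ = M_ϕ is easy to confirm numerically for a generic bit channel (the values of a and b below are illustrative):

```python
import numpy as np

a, b = 0.4, 0.25                        # illustrative, with a + b != 1
M = np.array([[1 - a, b],
              [a, 1 - b]])              # M[y, x] = phi(y|x)

gamma = np.array([b, a]) / (a + b)      # steady state: M @ gamma = gamma
assert np.allclose(M @ gamma, gamma)

# Retrodiction channel built on the steady state: same stochastic matrix
M_hat = (M * gamma).T / (M @ gamma)
assert np.allclose(M_hat, M)
```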
The non-invertible channels, a + b = 1, make for an interesting case study; we change the notation a → ε, so that

M_ϕ = ( 1−ε   1−ε
         ε     ε  ).   (19)

First notice that M_ϕ v_p = (1−ε, ε)^T for all input distributions p. In other words, these channels erase whatever information is present in the input, and produce a fixed output distribution (which, of course, coincides with their steady state). In this sense, they could all be called erasure channels, though the name is usually given to the case ε = 0.
Because at the output all information on the input has been destroyed, one may naively expect the retrodiction channel to produce a completely random outcome. But this is forgetting the importance of the reference prior in retrodiction. Plugging the erasure channel into (3), one readily finds

φ̂_ξ(x|y) = ξ(x) for all y.

The retrodiction channel of an erasure channel is the erasure channel that returns the reference prior, a result that can be easily extended to any alphabet dimension 33 . In agreement with (18), if the reference prior is the steady state, the retrodiction channel is the same erasure channel 34 . These observations are summarized in Fig. 3.
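The erasure case can likewise be checked in a few lines (the values of ε and of the reference prior are illustrative): the retrodiction channel of an erasure channel is the erasure channel that returns the prior.

```python
import numpy as np

eps = 0.2                               # illustrative value
M = np.array([[1 - eps, 1 - eps],
              [eps, eps]])              # erasure channel: fixed output for any input

for p in (np.array([1.0, 0.0]), np.array([0.3, 0.7])):
    assert np.allclose(M @ p, [1 - eps, eps])   # input information destroyed

xi = np.array([0.9, 0.1])               # illustrative reference prior
M_hat = (M * xi).T / (M @ xi)
# The retrodiction channel ignores y and returns the prior xi
assert np.allclose(M_hat, np.column_stack([xi, xi]))
```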

IV. FLUCTUATION RELATIONS FROM RETRODICTION
The topic of this section, fluctuation relations, originated in statistical thermodynamics. As we shall see, the formal structure of these relations can be derived without any reference to that branch of physics. As it happens, we shall mention thermodynamics only in the very last paragraph of the section. The explicit application of these formulas to important situations in thermodynamics was discussed in our previous paper 1 .

A. The process and its statistics
As we noted in the introduction, it is customary in studies of irreversibility to define the physical process as the forward process, and to compare it to its corresponding reverse process. Here we adopt a different narrative:
• There is only one process, the one that is happening.
• A (forward) prediction on the process starts with a prior p(x) on the input, and infers the predicted distribution

P_F(x,y) = p(x)ϕ(y|x).   (22)

• A retrodiction on the process starts with a prior q(y) on the output, and infers the retrodicted distribution

P_R[ξ](x,y) = q(y)φ̂_ξ(x|y).   (23)

The explicit mention of ξ will be dropped for simplicity in the remainder of this Section and resumed in Section V.
We proceed to derive fluctuation relations with our narrative, and later we show the comparison with the reverse-process narrative.

B. Derivation of the fluctuation relations
Consider a variable Ω(x,y) that depends on the initial and final states, and may be determined by the process. Its predicted distribution is

μ_F(ω) = ∑_{x,y} P_F(x,y) δ(ω − Ω(x,y)),   (24)

while its retrodicted distribution is

μ_R(ω) = ∑_{x,y} P_R(x,y) δ(ω − Ω(x,y)),   (25)

where we introduce the ratio

R(x,y) = P_R(x,y)/P_F(x,y).   (26)

So, the difference between μ_F(ω) and μ_R(ω) is encoded in this ratio of probabilities, which is exactly the quantity that appears in the statistical f-divergence 35,36

D_f(P_F||P_R) = ∑_{x,y} P_F(x,y) f(R(x,y)),   (27)

where the function f(r) must be convex for r ∈ R+ and satisfy f(1) = 0. The "entropy production", on which the thermodynamical literature bases fluctuation relations, uses f(r) = −ln(r), which generates the reverse Kullback-Leibler distance D_KL(P_F||P_R). But we don't need to choose that particular function at this stage: for any function f(r) invertible 37 for r ∈ R+, if we set

Ω(x,y) = f(R(x,y)),   (28)

we have by definition that the fluctuating variable takes the values

ω = f(r), with r = R(x,y).   (29)

Besides, there immediately follows from (24) and (25) the fluctuation relation

μ_R(ω) = f^{-1}(ω) μ_F(ω),   (30)

that is the generalisation of Crooks' theorem 4 . By integrating over ω, one obtains the integral fluctuation relation

∫ dω f^{-1}(ω) μ_F(ω) = 1,   (31)

that depends only on the process. This is the generalisation of Jarzynski's equality 3 .
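These relations can be verified numerically on a randomly generated channel and priors (a sketch with assumed numbers; the ratio convention used in the code, retrodicted over predicted probabilities, is spelled out in the comments). We use the thermodynamical choice f(r) = −ln(r), for which the integral relation is the Jarzynski-type equality, plus one other invertible f.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3

# Assumed ingredients: a random channel phi(y|x) stored as M[y, x],
# a reference prior xi, and priors p (input) and q (output).
phi = rng.random((d, d)); phi /= phi.sum(axis=0)
xi = rng.random(d); xi /= xi.sum()
p = rng.random(d); p /= p.sum()
q = rng.random(d); q /= q.sum()

phi_hat = (phi * xi).T / (phi @ xi)     # Bayesian retrodiction channel

P_F = phi * p                           # predicted joint, stored as [y, x]
P_R = (phi_hat * q).T                   # retrodicted joint, stored as [y, x]
R = P_R / P_F                           # ratio: retrodicted / predicted

# Thermodynamical choice f(r) = -ln(r): omega = ln(P_F / P_R)
omega = -np.log(R)

# Integral fluctuation relation (Jarzynski-type): <exp(-omega)>_{P_F} = 1
assert np.isclose((P_F * np.exp(-omega)).sum(), 1.0)

# Another invertible choice, f(r) = r - 1, gives <omega>_{P_F} = 0
omega2 = R - 1
assert np.isclose((P_F * omega2).sum(), 0.0)
```

For f(r) = −ln(r) the pointwise statement P_R = e^{−ω} P_F holds by construction; the point of the sketch is that the same integral structure holds for any invertible f.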

C. Comparison between retrodiction and reverse process
In all the literature we are aware of, fluctuation relations are presented as a measure of the statistical difference between the forward and the reverse process, not between the predicted and retrodicted distributions of a single process. The difference between the two narratives has mathematical manifestations that we are going to discuss now.
For the sake of definiteness, let us start with a canonical example. Suppose that the variable of interest is entropy, and that in the process under study it changes by ∆S. In a retrodictive approach, (23) defines the retrodicted distribution for that same process. But if one looks at (23) as defining a reverse process, for that process the change of entropy will be rather −∆S.
Generalising this observation, the distribution of the variable in the reverse process reads

μ̄_R(ω̄) = ∑_{x,y} P_R(x,y) δ(ω̄ − Ω̄(x,y)),   (32)

where, under assumption (28),

Ω̄(x,y) = f(1/R(x,y)) = g(Ω(x,y)), with g(ω) = f(1/f^{-1}(ω)),   (33)

because the roles of P_F and P_R are exchanged between the forward and the reverse process [for the choice f(r) = −ln(r), there follows the expected minus sign g(ω) = f(1/r) = −f(r) = −ω]. The resulting fluctuation relation then reads 1

μ_F(ω) = |g′(ω)| μ̄_R(g(ω))/f^{-1}(ω).   (34)

As expected, μ_F evaluated at ω is now related to μ̄_R evaluated at g(ω). The Jacobian factor, which comes from the change of variable in the δ-function, ensures that the integral fluctuation relation takes exactly the same form as (31). A comparison of (30) and (34) for various choices of f is given in Table I. For the thermodynamical case f(r) = −ln(r), we have |g′(ω)| = 1, and therefore the only difference between (30) and (34) is that μ_R is evaluated at ω while μ̄_R is evaluated at −ω. Thus in thermodynamics not only the Jarzynski equality, but also the Crooks fluctuation theorem is the same in both narratives (up to that sign change). Interestingly, even when reporting experiments in which the reverse process was actually implemented, it is the retrodictive version that is usually plotted for its visual convenience: see for instance the pioneering verification of Crooks' fluctuation theorem with folding and unfolding of RNA 38 .

D. Fluctuation relations and Bayesian retrodiction
In the retrodictive narrative, the fluctuation relation (30) and its corollary (31) are statistical properties of the random variable ω defined by (29). They are formally valid for the statistical comparison between arbitrary P_F and P_R, with no reference to the notion of retrodiction, let alone to its mathematical expression (3). In the reverse-process narrative, one studies the distribution of the values of the variable when the roles of P_F and P_R are swapped [Eq. (33)]; but even then, the fluctuation relations follow without having specified any mathematical relation between P_F and P_R. So, what is the role of Bayesian retrodiction, or that of a proper definition of the reverse process? We are going to prove that it singles out a specific structure for R(x, y), and that this simple result has far-reaching consequences in the context of thermodynamics.
Result 3. The ratio R(x,y) [Eq. (26)] is of the form F(x)G(y), for some functions F and G, if and only if P_F and P_R are related as (22) and (23), with the latter constructed from Bayesian retrodiction [Eq. (3)]. In this case,

R(x,y) = [ξ(x)/p(x)] [q(y)/ξ̂(y)].   (35)

TABLE I. The fluctuating variable ω = f(r) [Eq. (29)] for various choices of f, with the corresponding f-divergence, the fluctuation relation (FR) for retrodiction (30), the FR for the reverse process (34), and the integral FR (31); the choice f(r) = −ln(r) corresponds to the reverse Kullback-Leibler divergence. The fourth column is kept in the form (34) without possible algebraic simplifications, to facilitate the identification of g(ω) and |g′(ω)|.

Proof. If P_F and P_R are given by (22) and (23), using (3) it is trivial to derive (35). In the other direction: without loss of generality we keep the form (22) for P_F and, using the product rule of joint probabilities, we write P_R(x,y) = q(y)η(x|y) for the conditional distribution (channel) η and marginal q. The assumption R(x,y) = F(x)G(y) then reads η(x|y) = F̃(x)G̃(y)ϕ(y|x), with F̃(x) = F(x)p(x) and G̃(y) = G(y)/q(y). Since the l.h.s. is strictly positive 39 , sign(F̃(x)) = sign(G̃(y)) must hold for all (x,y). Now, being a channel, η must satisfy ∑_x η(x|y) = 1, that is 1/G̃(y) = ∑_x F̃(x)ϕ(y|x). So finally

η(x|y) = ξ(x)ϕ(y|x)/ξ̂(y) = φ̂_ξ(x|y),

where ξ(x) = F̃(x)/∑_x F̃(x) is a valid probability distribution because all the F̃(x) have the same sign.

While this result may look purely anecdotal or formal, let us recall that in the usual thermodynamical interpretation Ω(x,y) = −ln(R(x,y)) is the (non-adiabatic) stochastic entropy production 40 . Thus, whenever the stochastic entropy production can be computed locally (that is, independently of the correlations between microstates x and y), a structure of Bayesian retrodiction is unavoidable (in the reverse-process narrative: the reverse process must be defined through Bayesian retrodiction).
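Result 3 can be probed numerically (a sketch with random, assumed numbers; the ratio convention, retrodicted over predicted probabilities, is spelled out in the code): when P_R is built from Bayesian retrodiction, the ratio factorizes into a product of a function of x and a function of y.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4

# Random (assumed) channel, reference prior, and input/output priors
phi = rng.random((d, d)); phi /= phi.sum(axis=0)   # M[y, x] = phi(y|x)
xi = rng.random(d); xi /= xi.sum()
p = rng.random(d); p /= p.sum()
q = rng.random(d); q /= q.sum()

xi_out = phi @ xi
phi_hat = (phi * xi).T / xi_out         # Bayesian retrodiction channel

P_F = phi * p                           # P_F(x,y) = p(x) phi(y|x), stored [y, x]
P_R = (phi_hat * q).T                   # P_R(x,y) = q(y) phi_hat(x|y), stored [y, x]
R = P_R / P_F                           # ratio: retrodicted / predicted

# The ratio factorizes as F(x) G(y), with F = xi/p and G = q/xi_hat
F = xi / p
G = q / xi_out
assert np.allclose(R, np.outer(G, F))   # outer[y, x] = G[y] F[x]
```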

V. ON THE CHOICE OF THE REFERENCE PRIOR
In the literature on fluctuation relations, based on thermodynamics, the wording "reference prior" is absent. Its role is usually taken by an assumption of "detailed balance". In all the examples that we have looked into 1 , this corresponds to the choice of the steady state as reference prior. The operational interpretation of this choice is very physical: one takes as reference the process in which nothing changes. It also has a very neat consequence when it comes to fluctuation relations: the ratio R(x,y) given in (35), and thus the variable that enters the fluctuation relations, depends on the channel ϕ only through its steady state γ. With this choice, one is clearly studying fluctuations around equilibrium 41 .
Inspired by statistical comparisons, one may opt for a different definition of the reference prior. One possibility is trying to keep prediction and retrodiction as close as possible. With such a goal, let us take as a figure of merit the Kullback-Leibler distance D_KL(P_F||P_R[ξ]), averaged over the possible priors p(x) and q(y) [Eq. (36)]. Interestingly, the optimum turns out to be achieved by the uniform prior. We present the proof in Appendix A.
In the same spirit, one can study the reference priors that minimize other figures of merit averaged over the possible priors p(x) and q(y). We ran some simple numerical checks at d = 2 for two other figures of merit. For the Kullback-Leibler distance D_KL(P_R[ξ]||P_F), the steady state is generically not optimal, while the uniform prior seems to be optimal again, even though the dependence on ξ is different from (36). For the guessing probability, i.e. the probability that argmax P_F(x,y) = argmax P_R[ξ](x,y), neither the steady state nor the uniform prior is generically optimal.

VI. FINAL CONSIDERATIONS: DO WE NEED A REVERSE PHYSICAL PROCESS?
The everyday meaning of (ir)reversibility in nature is captured by the perceived "arrow of time": if the video of the evolution played backward makes sense, the process is reversible; if it doesn't make sense, it is irreversible.
Science has gone very far in bringing this intuition onto quantitative ground. The standard underlying narrative still involves two processes: the one that we observe, and the associated reverse process (not deemed to be strictly impossible, but very unlikely). This reverse process is generically not the video played backward: to cite an extreme example, nobody conceives bombs that fly upward to their airplanes while cities are being rebuilt from rubble 42,43 . In the case of controlled protocols in the presence of an unchanging environment, the reverse process is implemented by reversing the protocol. If the environment were to change (in an uncontrolled way, by definition of environment), the connection between the physical process and the associated reverse one would become thinner.
With our line of research, we are exploring the possibility that the narrative of the reverse process may not be needed at all. In the wording pioneered by Watanabe, irreversibility may rather be irretrodictability. So far, this program has found no obstacle, and has even clarified situations that were deemed puzzling in the case of some quantum channels 1 . The vistas opened by this approach also allow us to expand the scope of fluctuation relations (Section IV) and to discuss the choice of a reference prior (Section V).
Barring surprises à la John Bell, this conflict of narratives won't be discriminated by experiments. Indeed, on the one hand, the retrodiction channels (both classical and quantum) are by construction valid channels: nothing forbids the physical implementation of the corresponding processes, as indeed was done in the experimental verifications of Crooks' theorem 38 . On the other hand, to falsify the retrodictive narrative, one would have to find a reverse process related to its original process in a way that cannot be expressed by (or worse, contradicts) logical reasoning: it is hard to see how such a claim could ever be made. So, one's narrative of choice will depend on the fruitfulness of the intuition, the economy of concepts, the elegance of the formulas... In this paper, we have hinted at the superiority of the retrodictive narrative in all these respects.