NISQ: Error Correction, Mitigation, and Noise Simulation


Ningping Cao,1, ∗ Junan Lin,1, ∗ David Kribs,2 Yiu-Tung Poon,3 Bei Zeng,4, 1, † and Raymond Laflamme1

1Institute for Quantum Computing and Department of Physics and Astronomy, University of Waterloo, Waterloo, ON N2L 3G1, Canada

2Department of Mathematics & Statistics, University of Guelph, Guelph, ON N1G 2W1, Canada

3Department of Mathematics, Iowa State University, Ames, IA, USA 50011

4Department of Physics, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong

(Dated: November 4, 2021)

Error-correcting codes were invented to correct errors on noisy communication channels. Quantum error correction (QEC), however, may have a wider range of uses, including information transmission, quantum simulation/computation, and fault tolerance. These uses invite us to rethink QEC, in particular the role that quantum physics plays in encoding and decoding. The fact that many quantum algorithms, especially near-term hybrid quantum-classical algorithms, use only limited types of local measurements on quantum states has led to various new techniques called Quantum Error Mitigation (QEM). This work addresses the differences and connections between QEC and QEM by examining different application scenarios. We demonstrate that QEM protocols which aim to recover the output density matrix of a quantum circuit do not always preserve important quantum resources, such as entanglement with another party. We then discuss the implications of noise invertibility for the task of error mitigation, and give an explicit construction called the quasi-inverse for non-invertible noise, which is trace preserving, while the Moore-Penrose pseudoinverse may not be. We also study the consequences of erroneously characterizing the noise channels, and derive conditions under which a QEM protocol can reduce the noise.

I. INTRODUCTION

The field of quantum information processing has entered an era featuring noisy, intermediate-scale quantum (NISQ) devices. Despite some recent demonstrations of computational advantages over classical computers [1, 2], NISQ devices still face significant challenges before they can become practically useful. In particular, noise in NISQ processors can spoil the computation and possibly lead to incorrect final results.

Conventionally, the main tool for protecting the processor from noise has been quantum error correction (QEC). QEC protocols are designed to allow a user to detect, and eventually correct, errors that happen during a quantum computation. While many approaches to QEC have been developed, few have been tested on real quantum processors because of their significant hardware requirements. First, QEC generally encodes quantum information into a much larger Hilbert space, which requires a correspondingly large device. Second, the error rates of quantum operations (gates) on the processor must be below a certain threshold value for QEC to reduce the effective error instead of introducing more errors. Meeting both requirements is generally difficult on most state-of-the-art devices available today.

Recently, the field of quantum error mitigation (QEM) has emerged with the goal of decreasing the effective noise level on near-term devices while circumventing these two obstacles. The general idea is that if one has some knowledge about the noise processes occurring in a particular piece of hardware, then one should be able to use that knowledge to reduce (part of) the effect of that noise. Importantly, it is desirable for such protocols to require little or no additional hardware overhead in order to improve the computational accuracy. Numerous protocols that fall into this category have been developed during the past few years [3–7].

∗ These authors contributed equally to this work.
† [email protected]

The parallel development of both fields naturally leads to the question: under what circumstances should one apply QEC rather than QEM, and vice versa? Current experimental apparatus favors QEM because of limitations in hardware quality. However, there are deeper differences between the two that restrict the use of QEM for some experimental goals.

In this work, we first examine the relation between QEC and QEM from a high-level perspective. In Section II we give examples from classical and quantum communication, demonstrating the different usage scenarios of QEC and QEM. We argue that the invertibility of noise limits the performance of optimal QEM protocols, and propose a construction called the quasi-inverse for the case of non-invertible noise. We prove that, compared to a conventional choice of pseudoinverse, namely the Moore-Penrose pseudoinverse, the quasi-inverse has the advantage of being trace preserving, which is advantageous when running computer simulations. In Section III we study the effects of imperfect characterization of noise channels, and give a sufficient condition for when an optimal QEM protocol can improve the expectation value of any observable.

arXiv:2111.02345v1 [quant-ph] 3 Nov 2021


II. THE CASE OF COMMUNICATION

A. Classical communication

FIG. 1: An illustrative figure for noisy classical communication.

We start by considering a communication task and, for simplicity, begin with the classical setting. Here, a sender Alice would like to transmit a $k$-bit string to Bob. An example of such a string is:

$$ s = 11010101000100110101010011\ldots \tag{1} $$

Alice and Bob share a classical communication channel $C$ that is subject to noise, and Alice transmits the string by sending it through $C$ only once. Suppose that 40% of the bits in the string are 0's and 60% are 1's. For simplicity, assume that the noise is described by a binary symmetric channel with strength $p$, denoted $\mathrm{BSC}_p$ (note that we reserve script letters for quantum channels). This channel preserves the sent bit with probability $1-p$, and flips it (symmetrically from 0 to 1 and from 1 to 0) with probability $p$.

Classical error-correcting codes have been developed to fight this noise; an illustration is given in Fig. 1. The simplest example is the 3-bit repetition code, defined by the encoding

$$ 0 \to 000, \qquad 1 \to 111, \tag{2} $$

i.e., each bit is encoded as three copies of itself. The number of channel uses therefore increases threefold, to $3k$. Decoding is done by a majority vote on each block of three received bits, so that at the receiving end Bob again obtains a bit string of length $k$. Assuming that $\mathrm{BSC}_p$ acts independently on each bit, a decoded bit is wrong only if at least two of its three copies are flipped, so the code reduces the probability of error from $p$ to $3p^2(1-p) + p^3 = O(p^2)$.
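A minimal Monte Carlo sketch of the 3-bit repetition code over $\mathrm{BSC}_p$ may help make the error estimate concrete. It assumes NumPy, a fixed seed, $p = 0.1$, and the 40%/60% bit statistics of the example; the helper names `bsc`, `encode_repetition`, and `decode_majority` are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def bsc(bits, p):
    """Binary symmetric channel: flip each bit independently with probability p."""
    flips = rng.random(bits.shape) < p
    return np.where(flips, 1 - bits, bits)

def encode_repetition(bits):
    """3-bit repetition code: 0 -> 000, 1 -> 111 (one row per logical bit)."""
    return np.repeat(bits[:, None], 3, axis=1)

def decode_majority(blocks):
    """Majority vote over each block of three received bits."""
    return (blocks.sum(axis=1) >= 2).astype(int)

k, p = 200_000, 0.1
message = (rng.random(k) < 0.6).astype(int)    # roughly 40% zeros, 60% ones

raw_error = np.mean(bsc(message, p) != message)
coded_error = np.mean(decode_majority(bsc(encode_repetition(message), p)) != message)
predicted = 3 * p**2 * (1 - p) + p**3           # 0.028 for p = 0.1

print(f"uncoded {raw_error:.4f}, coded {coded_error:.4f}, predicted {predicted:.4f}")
```

Running it should show a coded error rate close to the predicted value, against roughly $p$ without encoding, at the cost of three channel uses per bit.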

It has long been recognized that the key ingredient enabling classical error correction is trading redundancy for fidelity. This concept has been transferred to quantum error-correcting codes (QECCs), where direct imitation of classical codes is forbidden by the no-cloning principle. The complexity of quantum noise has also posed a great challenge to building quantum codes. Nonetheless, many QECCs have been developed to fight different kinds of noise. Beyond communication, QECCs have also found applications in fields such as quantum computation, quantum simulation, and fault tolerance. We discuss QECCs further in Section II B.

Next, consider another seemingly “natural” way of reducing the effect of noise. We start by writing down a matrix representation of the noise,

$$
\mathrm{BSC}_p \to \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix} := N_{\mathrm{BSC},p}, \tag{3}
$$

where both the input and output bases are ordered as $\{0, 1\}$. $N_{\mathrm{BSC},p}$ acts on probability vectors by matrix multiplication, so that if Alice sends the bit 0, the output is

$$
N_{\mathrm{BSC},p} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1-p \\ p \end{pmatrix}, \tag{4}
$$

so that Bob gets 0 with probability $1-p$ and 1 with probability $p$.

Note that since classical noise is governed by the laws of classical physics, in principle an almighty Bob could learn exactly whether a noise process occurred during the transmission. If this were the case, the uncertainty in our probabilistic error model would disappear, and he could perfectly recover the input message. In reality, however, Bob does not have such information, and Eq. (3) represents his complete knowledge about the noise. Suppose that, in addition, Bob knows the value of $p$, and that $p < 1/2$. What, then, is the best Bob can do in an attempt to reduce the effects of the noise? If Bob receives a 1, he only knows that Alice more likely sent a 1 than a 0, so the best deterministic procedure is simply to keep the bit. Applying this argument to all received bits, we see that the best Bob can do is to keep all received bits intact.

There is, however, another way for Bob to use his knowledge of $p$. If $k$ is sufficiently large, Bob can fully recover the 40%–60% distribution of Alice's input. Specifically, he applies the inverse of $N_{\mathrm{BSC},p}$ (which exists for $p \neq 1/2$) to his received distribution, resulting in

$$ N_{\mathrm{BSC},p}^{-1}\, N_{\mathrm{BSC},p}\, v_A = v_A, \tag{5} $$

where $v_A = (0.4, 0.6)^T$ is Alice's input distribution. However, Bob cannot use this restored distribution to recover Alice's message. The best he can do is to use it to randomly generate a new $k$-bit string, in which Alice's message is completely destroyed. This procedure is analogous to QEM in the classical communication setting, so we call it classical error mitigation (CEM). In particular, CEM does not increase the channel capacity, defined by

$$ C = \max_{p(x)} I(X:Y), \tag{6} $$

where $X$ and $Y$ are random variables denoting the input and output, respectively, $I(X:Y)$ is the mutual information between $X$ and $Y$, and the maximum is taken over all input probability distributions $p(x)$. This is a direct consequence of the data processing inequality.
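The CEM procedure can be sketched numerically as follows, assuming NumPy and the same illustrative parameters ($p = 0.1$, a 40%/60% source); the variable names are hypothetical. Bob estimates the received bit distribution, inverts $N_{\mathrm{BSC},p}$ to restore $v_A$, and then resamples a fresh string from it. The restored distribution is accurate, yet the regenerated string agrees with Alice's message only about as often as two independent draws would ($0.4^2 + 0.6^2 = 0.52$), well below the roughly 90% agreement of the raw received string.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

k, p = 200_000, 0.1
message = (rng.random(k) < 0.6).astype(int)                      # Alice: ~40% zeros
received = np.where(rng.random(k) < p, 1 - message, message)     # BSC_p

# Bob's empirical output distribution, ordered as (P(0), P(1)).
v_received = np.array([np.mean(received == 0), np.mean(received == 1)])

# CEM: undo the noise on the *distribution* by inverting N_BSC,p.
N_bsc = np.array([[1 - p, p],
                  [p, 1 - p]])
v_restored = np.linalg.solve(N_bsc, v_received)

# With only v_restored, the best Bob can do is sample a new string from it.
regenerated = (rng.random(k) < v_restored[1]).astype(int)

agree_received = np.mean(received == message)          # about 1 - p
agree_regenerated = np.mean(regenerated == message)    # about 0.4**2 + 0.6**2
print(v_restored, agree_received, agree_regenerated)
```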


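[Figure 2 schematic: (a) $\rho_{\mathrm{in}} \to \mathcal{N} \to \rho_{\mathrm{out}}$; (b) $\rho_{\mathrm{in}} \to \mathcal{E} \to \mathcal{N} \to \mathcal{R} \to \rho_{\mathrm{out}}$]

FIG. 2: (a) Quantum communication model where Alice sends a state $\rho_{\mathrm{in}}$ to Bob through a noisy channel $\mathcal{N}$. (b) An attempt to reduce the effect of $\mathcal{N}$ using QEC, through an encoding operation $\mathcal{E}$ and a recovery operation $\mathcal{R}$.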

B. Quantum Communication

The problem of preserving information becomes much more interesting and complicated in quantum communication. For example, many different definitions of a “quantum” channel capacity can be proposed, based on different considerations [8]. Below, we consider two situations in order to contrast QEC and QEM.

1. When to use both QEC and QEM

First, consider the situation in Fig. 2a, where Alice prepares $k$ copies of the input state $\rho_{\mathrm{in}}$ and sends them to Bob through a noisy quantum channel $\mathcal{N}$. This involves $k$ uses of the quantum channel. How can Alice and Bob reduce the effect of the noise $\mathcal{N}$?

The earliest QECCs were developed by considering exactly this model [9]. QECCs work by using an encoding scheme, which is a completely positive and trace-preserving (CPTP) map $\mathcal{E}: \mathbb{C}^{2^k} \to \mathbb{C}^{2^n}$ (where $n > k$), and then decoding at Bob's end using another CPTP map $\mathcal{R}: \mathbb{C}^{2^n} \to \mathbb{C}^{2^k}$. Here $\mathbb{C}^{2^n}$ denotes the complex Euclidean space of dimension $2^n$. This involves $n$ uses of the quantum channel, which increases the redundancy. It has been shown that doing so can protect the system against certain types of noise.

Let us contrast this with QEM. In QEM, we again assume that Bob has some knowledge about the noise $\mathcal{N}$; in order not to constrain Bob's power, we further assume that he knows the exact form of $\mathcal{N}$. Upon receiving $k$ copies of $\rho_{\mathrm{out}}$, Bob first makes measurements to reconstruct $\rho_{\mathrm{out}}$ (e.g., by quantum state tomography), and then numerically applies the inverse map $\mathcal{N}^{-1}$ (assuming for now that it exists) to recover $\rho_{\mathrm{in}}$.
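As a minimal sketch of this QEM procedure, assume (hypothetically) that $\mathcal{N}$ is a single-qubit depolarizing channel that shrinks the Bloch vector $\vec r$ of the state by a known factor $f$, so that $\mathcal{N}^{-1}$ simply rescales it by $1/f$. Bob estimates $\vec r_{\mathrm{out}}$ from finite-shot Pauli $X$, $Y$, $Z$ measurements and then divides by $f$ classically. NumPy, the seed, and the function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

PAULI = {
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def depolarize_bloch(r, f):
    """Assumed noise model: the Bloch vector shrinks uniformly, r -> f * r."""
    return f * np.asarray(r, dtype=float)

def pauli_tomography(r_true, shots):
    """Estimate (<X>, <Y>, <Z>) from `shots` single-shot +/-1 outcomes per axis."""
    p_plus = (1 + r_true) / 2                 # P(outcome = +1) on each axis
    n_plus = rng.binomial(shots, p_plus)
    return 2 * n_plus / shots - 1

def bloch_to_rho(r):
    """rho = (I + r_x X + r_y Y + r_z Z) / 2."""
    return 0.5 * (np.eye(2) + sum(c * PAULI[a] for c, a in zip(r, "XYZ")))

f, shots = 0.8, 100_000
r_in = np.array([1.0, 0.0, 0.0])              # the state |+><+|
r_out_est = pauli_tomography(depolarize_bloch(r_in, f), shots)
r_mitigated = r_out_est / f                   # N^{-1} applied classically
rho_mitigated = bloch_to_rho(r_mitigated)
```

With finite shots, `r_mitigated` can have norm slightly above 1, so the mitigated estimate need not be a physical state; this is one practical consequence of $\mathcal{N}^{-1}$ not being a physical operation.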

One should now recognize the similarity between QEM and CEM for reconstructing the input distribution. Indeed, a density matrix is a probabilistic description of the outcomes of any possible measurement on a quantum system; the only difference is perhaps that quantum noise is more complex in nature. In this scenario, it is sufficient for Bob to reconstruct $\rho_{\mathrm{in}}$ as a mathematical object in order to eliminate the effect of $\mathcal{N}$, since this determines the outcome statistics of any measurement Bob can possibly make on the system. Thus, QEM can be useful in this case.

[Figure 3 schematic: (a) Source → $\rho_A$ to Alice, Source → $\mathcal{N}$ → $\rho_B$ to Bob; (b) as in (a), with Bob applying $\mathcal{N}^{-1}$ after $\mathcal{N}$; (c) $n$ noisy copies of the pair are distilled into $k$ copies]

FIG. 3: (a) Alice and Bob receive an EPR pair $\Phi^+$ from a source, and noise $\mathcal{N}$ acts on Bob's channel. (b) A QEM approach. (c) A QEC approach.

We can make this view more concrete by considering an example of a current QEM protocol, the quasiprobability decomposition method [3]. The goal here is strictly weaker than the one above: we would like to recover the expectation value $\mathrm{Tr}[O\rho_{\mathrm{in}}]$ for some observable $O$. It is weaker because we are now only concerned with the particular measurements corresponding to $O$. The protocol assumes that a set of imperfect gates $\{G_i\}$, which forms a spanning set for single-qubit operations, is available. In particular, the ideal channel (the identity in this case) can be decomposed as a quasiprobability distribution over the $G_i$'s, i.e., a linear combination whose real coefficients sum to one but may be negative. Upon receiving the state, Bob applies gates from $\{G_i\}$ sampled with probabilities proportional to the magnitudes of these coefficients, measures the observable $O$, and reweights each outcome by the sign of the corresponding coefficient and by the overall normalization (the sum of the magnitudes). This yields an unbiased estimate of $\mathrm{Tr}[O\rho_{\mathrm{in}}]$. This example clearly shows that error mitigation is helpful if our goal is to restore the outcomes of some measurements, i.e., some classical “shadows” of the quantum system.
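The sampling step can be sketched with a toy model. Purely for illustration, assume the noise is a single-qubit depolarizing channel that shrinks the Bloch vector by $f$, and that Bob's correction gates are *perfect* Paulis $I, X, Y, Z$ (a simplification; the protocol in [3] works with imperfect gates $G_i$). The inverse map is then the quasiprobability combination $q_I\,\rho + q\,(X\rho X + Y\rho Y + Z\rho Z)$ with $q = (1 - 1/f)/4 < 0$ and $q_I = 1 - 3q$. NumPy and all names below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]

def depolarize(rho, f):
    """Depolarizing noise: rho -> f * rho + (1 - f) * Tr(rho) * I / 2."""
    return f * rho + (1 - f) * np.trace(rho) * I2 / 2

def inverse_quasiprobabilities(f):
    """Coefficients (q_I, q_X, q_Y, q_Z) with sum_i q_i P_i rho P_i = depolarize^{-1}(rho)."""
    q = (1 - 1 / f) / 4
    return np.array([1 - 3 * q, q, q, q])

def mitigated_expectation(rho_noisy, pauli_obs, quasi, shots):
    """Monte Carlo quasiprobability estimate of Tr[O N^{-1}(rho_noisy)] for a Pauli O."""
    gamma = np.abs(quasi).sum()               # sampling overhead
    probs = np.abs(quasi) / gamma
    signs = np.sign(quasi)
    # <O> after each correction gate P_i (computed exactly here; measured in practice).
    branch = np.array([np.real(np.trace(pauli_obs @ P @ rho_noisy @ P.conj().T))
                       for P in PAULIS])
    idx = rng.choice(4, size=shots, p=probs)
    # Single-shot +/-1 outcomes with P(+1) = (1 + <O>) / 2.
    outcomes = np.where(rng.random(shots) < (1 + branch[idx]) / 2, 1.0, -1.0)
    return float(np.mean(gamma * signs[idx] * outcomes))

f = 0.8
rho_in = np.array([[1, 0], [0, 0]], dtype=complex)    # |0><0|, so Tr[Z rho_in] = 1
rho_noisy = depolarize(rho_in, f)
noisy_value = np.real(np.trace(Z @ rho_noisy))         # 0.8 up to rounding
estimate = mitigated_expectation(rho_noisy, Z, inverse_quasiprobabilities(f), 200_000)
```

The estimate is unbiased, but each sample carries a factor $\gamma = \sum_i |q_i| > 1$, so more shots are needed than for the unmitigated measurement.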

2. When to use QEC only

Next, consider the situation where not only the classical image $\rho$ but also the quantum object itself is of interest. This is the case, for example, when entanglement is used as a resource by two spatially separated users, Alice and Bob, to achieve some task. Consider Fig. 3, where a central source would like to distribute $k$ copies of the maximally entangled Bell pair

$$ \Phi^+ = \frac{|00\rangle_{AB} + |11\rangle_{AB}}{\sqrt{2}} \tag{7} $$

to Alice and Bob. The quantum channel to Bob is noisy and is described by a channel $\mathcal{N}$. Again, assume for now that $\mathcal{N}$ is invertible.


First, the channel to Alice is noiseless, so Alice alone has no knowledge of $\mathcal{N}$. This implies that allowing a one-way communication channel from Alice to Bob cannot improve the fidelity of Bob's channel, so any possible operation must be performed on Bob's side only. A “natural” QEM protocol for Bob is then to first measure his qubits across the $k$ copies to obtain $\rho_B$, and then apply the inverse map $\mathcal{N}^{-1}$, as shown in Fig. 3b. The recovered state represents all of Bob's knowledge of his qubit, since from it he can calculate the probability of any measurement outcome he can obtain.

The problem with this protocol is that it is useless for the task at hand: the measurement destroys all entanglement between Alice and Bob. In fact, analogous to the classical case where Bob recovers Alice's input distribution and generates a random $k$-bit string, here Bob knows in advance that he will ideally obtain the maximally mixed state $I/2$; the above protocol is therefore equivalent to Bob preparing $(I/2)^{\otimes k}$ locally and discarding all qubits received from the source!
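The point can be checked numerically. The sketch below (NumPy; the depolarizing strength $f = 0.8$ and all helper names are illustrative assumptions) applies depolarizing noise to Bob's half of $\Phi^+$ and compares its fidelity with $\Phi^+$ against the “mitigated” outcome, in which Bob's qubit is replaced by his perfectly recovered local state $I/2$. The mitigated local state is exactly right, yet the joint fidelity drops to $1/4$, below that of the noisy, unmitigated pair.

```python
import numpy as np

phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho_ab = np.outer(phi_plus, phi_plus.conj())

def partial_trace(rho, keep):
    """Reduced state of a two-qubit rho; keep='A' or keep='B'."""
    r = rho.reshape(2, 2, 2, 2)               # indices (a, b, a', b')
    return np.einsum("abcb->ac", r) if keep == "A" else np.einsum("abad->bd", r)

def depolarize_bob(rho, f):
    """Depolarizing noise on Bob's qubit: f * rho + (1 - f) * rho_A ⊗ I/2."""
    return f * rho + (1 - f) * np.kron(partial_trace(rho, "A"), np.eye(2) / 2)

def fidelity_with_phi_plus(rho):
    return float(np.real(phi_plus.conj() @ rho @ phi_plus))

f = 0.8
rho_noisy = depolarize_bob(rho_ab, f)
rho_b = partial_trace(rho_noisy, "B")         # Bob's reduced state, here exactly I/2
# QEM on Bob's side recovers N^{-1}(rho_b) = I/2, but what Alice and Bob
# then share is the product state rho_A ⊗ I/2.
rho_mitigated = np.kron(partial_trace(rho_ab, "A"), np.eye(2) / 2)

print(fidelity_with_phi_plus(rho_noisy), fidelity_with_phi_plus(rho_mitigated))
```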

The “correct” way to reduce the effect of $\mathcal{N}$ is to use a family of procedures called entanglement purification protocols (EPPs) [10], as shown in Fig. 3c. In an EPP, Alice and Bob start from $n > k$ copies of the noisy Bell state and obtain $k$ pairs at the end that are closer to the pure state $\Phi^+$. We omit the details of the different possible procedures here, but emphasize that all variants of EPP use only local operations and classical communication (LOCC). Here again we see that redundancy is necessary for this task, just as in classical error correction for preserving classical information.

The above analysis shows that EPPs can protect entanglement against noise. What is perhaps more profound is that a class of EPPs called one-way EPPs (1-EPPs) also permits the construction of a QECC [10]. This is enabled by quantum teleportation: the one-way constraint creates time-separated EPR pairs like $\Phi^+$, allowing a quantum object in an arbitrary state $|\xi\rangle$ to be teleported forward in time. This effectively creates a faithful transmission of quantum information from Alice to Bob, despite the noise affecting the Bell pairs from the source. This type of QECC is particularly relevant in distributed quantum computing, where quantum information needs to be transported among spatially separated locations. In this case the local density matrices, although they represent the full knowledge available at each individual location, do not capture the whole picture, so using QEM to recover the ideal local density matrices is not useful.

The above example shows a deep distinction between QEM, which can only restore the classical image of a quantum system, and QEC, which can restore the quantum system itself, along with all non-classical resources the system possesses. It is instructive to compare this with the classical counterpart, namely the task of preserving classical information (see Section II A), where we also argued that EC is helpful while EM is not. Indeed, it has long been recognized that entanglement plays a role similar to that of classical information [10]. The task of preserving entanglement may thus be viewed as a quantum analog of preserving classical information, which can only be achieved using QEC. Furthermore, recall that in Section II B 1 we argued that recovering density matrices in QEM is analogous to recovering classical distributions in CEM. This completes our comparison between EM and EC, summarized in Table I.

| | EC | EM |
|---|---|---|
| Classical | Classical information | Classical distribution |
| Quantum | Entanglement | Density matrices |

TABLE I: Comparison between EC and EM in the classical and quantum settings. The entries list what each protocol is capable of restoring in each setting.

C. Invertible Noise

Finally, before moving on to discuss QEM in quantum computation, in this section we address how the nature of the noise determines the “upper limit” of any QEM protocol. Again we start from classical communication. Consider once more the $\mathrm{BSC}_p$ noise model. The matrix representation of the noise in Eq. (3) is only a probabilistic description of the underlying physical processes, each of which is either nothing (the identity) or a bit flip. The classical bit-flip channel, denoted $F$, acts as follows:

$$ F(0) = 1, \qquad F(1) = 0. \tag{8} $$

We see that $F$ is invertible since it is a bijection, i.e., there is a one-to-one correspondence between its inputs and outputs. Moreover, $F^{-1} = F$. If one has full knowledge of an invertible noise map in a particular transmission (not the probabilistic description of Eq. (3)), then the noise can in principle be eliminated without redundancy. In the BSC language: if Bob knows that a bit flip happened during a transmission, he can flip the bit back to eliminate the noise.

Consider another common classical error model, the binary erasure channel with probability $p$, denoted $\mathrm{BEC}_p$. In this model both 0 and 1 are transmitted correctly with probability $1-p$ and erased with probability $p$ (Bob knows when a bit has been erased). The underlying physical process in this example is an erasure channel, denoted $E$, which has the effect

$$ E(0) = e, \qquad E(1) = e, \tag{9} $$

where $e$ denotes the erased state. Since both 0 and 1 are mapped to the same output $e$, $E$ is not a bijection and is thus non-invertible. In other words, even if Bob knows that an erasure has occurred, he cannot infer what Alice intended to send. Therefore, such errors cannot be corrected without using redundancy. Importantly, the 3-bit repetition code introduced in Eq. (2) can protect against erasure errors.
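A short simulation illustrates the last remark. It assumes NumPy and encodes an erasure as the hypothetical marker `-1`. Since erasures never flip bits, any surviving copy in a block of three gives the correct value, so decoding fails only when all three copies are erased, which happens with probability $p^3$.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
ERASED = -1  # hypothetical marker for the erased symbol e

def bec(bits, p):
    """Binary erasure channel: replace each bit by ERASED with probability p."""
    return np.where(rng.random(bits.shape) < p, ERASED, bits)

def decode_erasures(blocks):
    """Return the first non-erased copy in each block of three, or ERASED."""
    decoded = np.full(blocks.shape[0], ERASED)
    for j in range(blocks.shape[1]):
        fill = (decoded == ERASED) & (blocks[:, j] != ERASED)
        decoded[fill] = blocks[fill, j]
    return decoded

k, p = 200_000, 0.3
message = (rng.random(k) < 0.6).astype(int)
decoded = decode_erasures(bec(np.repeat(message[:, None], 3, axis=1), p))

failure_rate = np.mean(decoded == ERASED)                          # about p**3 = 0.027
wrong_bits = np.sum((decoded != ERASED) & (decoded != message))    # always 0
```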

Now we consider the quantum case. At the physical level, a quantum noise process is described by a CPTP map $\mathcal{N}$. Like classical noise, quantum noise can be either invertible or non-invertible. The invertibility of a quantum noise map can be deduced from its matrix representation, as shown in Theorem 1. There are three distinct possibilities. The first is that $\mathcal{N}$ is invertible and $\mathcal{N}^{-1}$ is CPTP. For an invertible $\mathcal{N}$, the inverse $\mathcal{N}^{-1}$ is unique, Hermitian preserving (HP), and trace preserving (TP) [11]. If the input and output spaces have the same dimension, the channel has a CPTP inverse iff it is a unitary channel [12, 13].

It is particularly instructive to re-examine the case of Fig. 3 under this assumption on $\mathcal{N}$. Since $\mathcal{N}^{-1}$ is then a unitary channel, Bob can in principle implement it on his qubit, which fully restores the Bell pair $|\Phi^+\rangle$ between him and Alice. This in fact also corresponds to a QECC, with a trivial encoding map $\mathcal{E} = \mathcal{I}$ and a recovery map $\mathcal{R} = \mathcal{N}^{-1}$ (see Fig. 2b). Thus, in principle no redundancy is needed to correct this noise. However, note that Bob's recovery operation is local, and therefore cannot increase the amount of entanglement by any valid measure, according to the fundamental postulate of entanglement theory [14]. Since the state after recovery is maximally entangled, the state before recovery must be maximally entangled as well, meaning that this (local unitary) noise model does not decrease the entanglement between Alice and Bob. Thus, this noise model is rather trivial from the point of view of preserving entanglement.

The second possibility is that $\mathcal{N}$ is invertible but $\mathcal{N}^{-1}$ is not CPTP. A condition for when this happens is given later in Proposition 1. Many experimentally relevant noise models, such as the phase damping channel and the depolarizing channel, fall into this category. Since $\mathcal{N}^{-1}$ is not a physically realizable operation, it cannot be implemented experimentally on the target system, so the above method of restoring quantum information without redundancy fails. Using QEM procedures, one can still recover the classical information in principle, by first extracting it through measurements and then numerically applying the inverse map $\mathcal{N}^{-1}$. But measurement inevitably disturbs the system being measured and destroys any entanglement it may have with other systems. So in this case, the best possible QEM can restore the classical information, but not entanglement.

To describe the inverse maps, it is useful to first define matrix representations of quantum states and maps. In this work we denote the space of linear operators mapping a Hilbert space $\mathcal{H}_A$ to $\mathcal{H}_B$ as $L(\mathcal{H}_A, \mathcal{H}_B)$, or $L(\mathcal{H}_A)$ for short if $\mathcal{H}_A = \mathcal{H}_B$. Let $T(\mathcal{H}_A, \mathcal{H}_B)$ be the space of linear maps from $L(\mathcal{H}_A)$ to $L(\mathcal{H}_B)$. Let $e_i$ denote the standard basis vector with a 1 at position $i$ and 0 elsewhere, and let $E_{a,b}$ denote the standard basis element of $L(\mathcal{H}_A, \mathcal{H}_B)$ with a 1 at position $(a, b)$ and 0 elsewhere.

Definition 1. (Vectorization of linear operators.) The vec mapping $v(\cdot): L(\mathcal{H}_A, \mathcal{H}_B) \to \mathcal{H}_B \otimes \mathcal{H}_A$ is the unique linear mapping that satisfies $v(E_{a,b}) = e_b \otimes e_a$.

Next we define representations for quantum maps.

Definition 2. (Choi representation.) The Choi representation of a map $\mathcal{M} \in T(\mathcal{H}_A, \mathcal{H}_B)$ is defined by $C(\mathcal{M}) = \sum_{a,b} \mathcal{M}(E_{a,b}) \otimes E_{a,b}$.

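Definition 3. (Natural representation.) The natural (or, equivalently, superoperator) representation of a map $\mathcal{M} \in T(\mathcal{H}_A, \mathcal{H}_B)$ is the unique linear operator $v(\mathcal{M}) \in L(\mathcal{H}_A \otimes \mathcal{H}_A, \mathcal{H}_B \otimes \mathcal{H}_B)$ that satisfies $v(\mathcal{M})\, v(A) = v(\mathcal{M}(A))$ for all $A \in L(\mathcal{H}_A)$.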

In the natural representation, the action of a channel $\mathcal{N}$ on a quantum state $\rho$ can be written as the superoperator $v(\mathcal{N})$ multiplying the vector representation $v(\rho)$ of $\rho$ [15]. The vector representation $v(\rho)$ inherits its ordering from the superoperator; hence we reuse the notation $v(\cdot)$ for vector representations of quantum states and observables (often written as the double ket $|\rho\rangle\rangle$ in the literature).
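The following NumPy sketch (an illustration, not code from the paper) implements Definitions 1–3 for qubits. With $v(E_{a,b}) = e_b \otimes e_a$, $v(\cdot)$ is column stacking, and for a channel with Kraus operators $K_j$ the superoperator is $\sum_j \bar{K}_j \otimes K_j$. The amplitude-damping Kraus operators, the strength `gamma`, and the test state are assumed examples.

```python
import numpy as np

def vec(X):
    """Definition 1: column stacking, so that vec(E_ab) = e_b ⊗ e_a."""
    return X.reshape(-1, order="F")

def unvec(v, d):
    return v.reshape(d, d, order="F")

def superoperator(kraus_ops):
    """Definition 3 for X -> sum_j K_j X K_j^†, using vec(A X B) = (B^T ⊗ A) vec(X)."""
    return sum(np.kron(K.conj(), K) for K in kraus_ops)

def choi(S, d):
    """Definition 2: C(M) = sum_{a,b} M(E_ab) ⊗ E_ab, with M given by its superoperator S."""
    C = np.zeros((d * d, d * d), dtype=complex)
    for a in range(d):
        for b in range(d):
            E = np.zeros((d, d), dtype=complex)
            E[a, b] = 1
            C += np.kron(unvec(S @ vec(E), d), E)
    return C

gamma = 0.36   # assumed amplitude-damping strength
kraus = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
         np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]
S = superoperator(kraus)
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])
out_direct = sum(K @ rho @ K.conj().T for K in kraus)
```

One can then confirm that `unvec(S @ vec(rho), 2)` reproduces `out_direct`, that the Choi matrix is positive semidefinite (CP), and that its partial trace over the output factor is the identity (TP).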

The following theorem follows directly from the representation theory of linear maps.

Theorem 1. A quantum channel $\mathcal{N}$ is invertible iff $v(\mathcal{N})$ is an invertible matrix.

Here is an example in which the inverse $\mathcal{N}^{-1}$ of a CPTP map $\mathcal{N}$ is not CP.

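Example 1. Let the Choi representation of a quantum channel $\mathcal{N}$ be

$$
C(\mathcal{N}) = \begin{pmatrix}
\frac{3}{4} & 0 & -\frac{i}{8} & \frac{1}{2}+\frac{i}{8} \\
0 & \frac{1}{4} & -\frac{i}{8} & \frac{i}{8} \\
\frac{i}{8} & \frac{i}{8} & \frac{1}{4} & 0 \\
\frac{1}{2}-\frac{i}{8} & -\frac{i}{8} & 0 & \frac{3}{4}
\end{pmatrix}.
$$

The superoperator is

$$
v(\mathcal{N}) = \begin{pmatrix}
\frac{3}{4} & \frac{i}{8} & -\frac{i}{8} & \frac{1}{4} \\
0 & \frac{1}{2}-\frac{i}{8} & -\frac{i}{8} & 0 \\
0 & \frac{i}{8} & \frac{1}{2}+\frac{i}{8} & 0 \\
\frac{1}{4} & -\frac{i}{8} & \frac{i}{8} & \frac{3}{4}
\end{pmatrix}.
$$

Therefore, the inverse of $v(\mathcal{N})$ is

$$
v(\mathcal{N}^{-1}) = \begin{pmatrix}
\frac{3}{2} & \frac{1}{4}-\frac{i}{2} & \frac{1}{4}+\frac{i}{2} & -\frac{1}{2} \\
0 & 2+\frac{i}{2} & \frac{i}{2} & 0 \\
0 & -\frac{i}{2} & 2-\frac{i}{2} & 0 \\
-\frac{1}{2} & -\frac{1}{4}+\frac{i}{2} & -\frac{1}{4}-\frac{i}{2} & \frac{3}{2}
\end{pmatrix}.
$$

Its Choi representation is

$$
C(\mathcal{N}^{-1}) = \begin{pmatrix}
\frac{3}{2} & 0 & \frac{1}{4}+\frac{i}{2} & 2-\frac{i}{2} \\
0 & -\frac{1}{2} & \frac{i}{2} & -\frac{1}{4}-\frac{i}{2} \\
\frac{1}{4}-\frac{i}{2} & -\frac{i}{2} & -\frac{1}{2} & 0 \\
2+\frac{i}{2} & -\frac{1}{4}+\frac{i}{2} & 0 & \frac{3}{2}
\end{pmatrix}.
$$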


The Choi representation $C(\mathcal{N}^{-1})$ has negative eigenvalues. Therefore, $\mathcal{N}^{-1}$ is an HPTP map, but not CP.
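Example 1 can be checked numerically. The sketch below (NumPy; the helper name is hypothetical) enters the printed matrices, treats $v(\cdot)$ as column stacking per Definition 1, and builds Choi matrices exactly as in Definition 2. The printed Choi matrices turn out to equal $\sum_{a,b} E_{a,b} \otimes \mathcal{M}(E_{a,b})$, i.e., Definition 2 with the tensor factors swapped; the swap is a unitary conjugation and does not change the eigenvalues, and the code applies it explicitly before comparing.

```python
import numpy as np

S = np.array([[3/4, 1j/8, -1j/8, 1/4],
              [0, 1/2 - 1j/8, -1j/8, 0],
              [0, 1j/8, 1/2 + 1j/8, 0],
              [1/4, -1j/8, 1j/8, 3/4]])                         # v(N)
S_inv_printed = np.array([[3/2, 1/4 - 1j/2, 1/4 + 1j/2, -1/2],
                          [0, 2 + 1j/2, 1j/2, 0],
                          [0, -1j/2, 2 - 1j/2, 0],
                          [-1/2, -1/4 + 1j/2, -1/4 - 1j/2, 3/2]])  # v(N^{-1})
C_printed = np.array([[3/4, 0, -1j/8, 1/2 + 1j/8],
                      [0, 1/4, -1j/8, 1j/8],
                      [1j/8, 1j/8, 1/4, 0],
                      [1/2 - 1j/8, -1j/8, 0, 3/4]])             # C(N)
C_inv_printed = np.array([[3/2, 0, 1/4 + 1j/2, 2 - 1j/2],
                          [0, -1/2, 1j/2, -1/4 - 1j/2],
                          [1/4 - 1j/2, -1j/2, -1/2, 0],
                          [2 + 1j/2, -1/4 + 1j/2, 0, 3/2]])     # C(N^{-1})

SWAP = np.eye(4)[[0, 2, 1, 3]]   # exchanges the two tensor factors

def choi_from_superop(S, d=2):
    """Definition 2, with M(X) = unvec(S @ vec(X)) and column-stacking vec."""
    C = np.zeros((d * d, d * d), dtype=complex)
    for a in range(d):
        for b in range(d):
            E = np.zeros((d, d))
            E[a, b] = 1
            out = (S @ E.reshape(-1, order="F")).reshape(d, d, order="F")
            C += np.kron(out, E)
    return C

S_inv = np.linalg.inv(S)
print(np.round(np.linalg.eigvalsh(C_inv_printed), 4))   # includes negative values
```

In this example $v(\mathcal{N})$ has eigenvalues $1, \tfrac12, \tfrac12, \tfrac12$, consistent with Proposition 1 below.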

D. Non-invertible Noise

The third possibility is that $\mathcal{N}$ is non-invertible. Under such noise, even the classical information cannot, in principle, be completely recovered without using redundancy. However, it can be partly restored.

It is known that the superoperator $v(\mathcal{N}^{-1})$ of the inverse channel $\mathcal{N}^{-1}$ equals the inverse $v(\mathcal{N})^{-1}$ of the superoperator $v(\mathcal{N})$. However, if $\mathcal{N}$ is not invertible, the generalized inverse of $v(\mathcal{N})$ is not unique. This naturally raises the question of how to construct an inverse-like map for a non-invertible channel.

A commonly used generalized inverse is the Moore-Penrose inverse [16, 17]. We will show later that the Moore-Penrose inverse of a CPTP map may fail to be TP; a qubit channel example is given in Example 2. When a generalized inverse $\mathcal{N}^g$ is not TP, the states output by $\mathcal{N}^g$ do not have unit trace, which breaks metrics such as the fidelity.
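To illustrate how the Moore-Penrose inverse can fail to be TP, the NumPy sketch below uses a hypothetical qubit channel (not the paper's Example 2): it resets the populations to $\mathrm{diag}(q, 1-q)$ and scales the coherences by $c$, which is CPTP when $c^2 \le q(1-q)$. Its superoperator is written in the column-stacking convention of Definition 1, where TP means $w^T v(\mathcal{M}) = w^T$ with $w = v(I) = (1, 0, 0, 1)^T$.

```python
import numpy as np

q, c = 0.8, 0.3                      # assumed parameters, c**2 <= q * (1 - q)
S = np.array([[q, 0, 0, q],
              [0, c, 0, 0],
              [0, 0, c, 0],
              [1 - q, 0, 0, 1 - q]])  # v(N) in (rho00, rho10, rho01, rho11) ordering
w = np.array([1.0, 0.0, 0.0, 1.0])   # w @ vec(X) == Tr(X) for column stacking

def is_tp(M):
    """Trace preservation in the natural representation: w^T M == w^T."""
    return np.allclose(w @ M, w)

S_mp = np.linalg.pinv(S)             # Moore-Penrose pseudoinverse of v(N)

# Apply the pseudoinverse to |1><1| and read off the trace of the output.
out = S_mp @ np.array([0.0, 0.0, 0.0, 1.0])
print(is_tp(S), is_tp(S_mp), w @ out)
```

Here `S_mp` is a valid generalized inverse, yet the output for $|1\rangle\langle 1|$ has trace $0.4/1.36 \approx 0.29$ rather than 1.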

Here we provide a construction of an inverse-like map $\mathcal{N}^+$. Let the dimension of the input and output spaces be $d$. Take the Jordan decomposition of the superoperator of $\mathcal{N}$,

$$ v(\mathcal{N}) = Q \cdot J \cdot Q^{-1}, \tag{10} $$

where $J$ is the Jordan normal form and $Q$ is an invertible matrix whose columns are the generalized eigenvectors of $v(\mathcal{N})$. If $v(\mathcal{N})$ is diagonalizable, the Jordan normal form $J = \mathrm{diag}[\lambda_1, \cdots, \lambda_{d^2}]$ is the diagonal matrix containing the eigenvalues $\lambda_i$ of $v(\mathcal{N})$.

We take the inverse-like map $\mathcal{N}^+$ to be

$$ v(\mathcal{N}^+) = Q \cdot J' \cdot Q^{-1}. \tag{11} $$

If $v(\mathcal{N})$ is diagonalizable, $J'$ is the diagonal matrix that leaves the 0's in $J$ untouched and takes the reciprocal of the remaining diagonal elements of $J$. If $v(\mathcal{N})$ is defective, we construct $J'$ block by block as follows. A $k \times k$ Jordan block $J_{\lambda_i}$ of $\lambda_i$ ($\lambda_i \neq 0$) in $J$ is

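$$
J_{\lambda_i} = \begin{pmatrix}
\lambda_i & 1 & & \\
 & \lambda_i & \ddots & \\
 & & \ddots & 1 \\
 & & & \lambda_i
\end{pmatrix},
$$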

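and we let the corresponding block $J'_{\lambda_i}$ in $J'$ be the inverse of $J_{\lambda_i}$:

$$
J'_{\lambda_i} := J_{\lambda_i}^{-1} = \begin{pmatrix}
\frac{1}{\lambda_i} & -\frac{1}{\lambda_i^{2}} & \cdots & \cdots & (-1)^{k+1}\frac{1}{\lambda_i^{k}} \\
 & \frac{1}{\lambda_i} & -\frac{1}{\lambda_i^{2}} & \cdots & (-1)^{k}\frac{1}{\lambda_i^{k-1}} \\
 & & \ddots & \ddots & \vdots \\
 & & & \frac{1}{\lambda_i} & -\frac{1}{\lambda_i^{2}} \\
 & & & & \frac{1}{\lambda_i}
\end{pmatrix}.
$$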

For a $k\times k$ Jordan block with zero diagonal ($\lambda_i = 0$), which is the nilpotent matrix $N$, we can set the corresponding block in $J'$ to be the zero matrix $0_k$. Since $N$ is not invertible and $N\cdot N^{k-1} = N^{k} = 0$, choosing the block to be $0_k$ gives the same composed block as choosing $N^{k-1}$. There is thus a certain freedom in the choice of this block.

Note that, for invertible channels, the $\mathcal{N}^{+}$ described above is the inverse of the channel ($\mathcal{N}^{+} = \mathcal{N}^{-1}$). For non-invertible channels, the construction in Eq. (11) does not satisfy the defining condition of a generalized inverse ($\mathcal{N}\circ\mathcal{N}^{+}\circ\mathcal{N} \neq \mathcal{N}$ when the dimension of a nilpotent Jordan block is greater than one), and we will call $\mathcal{N}^{+}$ the quasi-inverse.

The resulting composed map is $v(\mathcal{N})v(\mathcal{N}^{+}) = Q J'' Q^{-1}$, where $J''$ is a diagonal matrix containing only 0's and 1's on its main diagonal. When $v(\mathcal{N})$ is diagonalizable and its spectrum already contains only 1's and 0's, the quasi-inverse is the channel itself and does not recover any more information. In fact, no generalized inverse would improve the outcome in this case.
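The block-wise rule for $J'$ is easy to check with exact rational arithmetic. The sketch below is plain Python with hypothetical helper names (`jordan_block`, `quasi_inverse_block`, `matmul`) introduced only for illustration. For $\lambda\neq 0$ it builds $J'_{\lambda}$ from the closed-form entries $(-1)^{q-p}\lambda^{-(q-p+1)}$ of the inverse Jordan block; for $\lambda=0$ it uses the zero block $0_k$. It then confirms that $JJ'$ is an identity or zero block, while $JJ'J\neq J$ for a nilpotent block with $k>1$.

```python
from fractions import Fraction


def jordan_block(lam, k):
    """k x k Jordan block: lam on the diagonal, 1 on the superdiagonal."""
    return [[Fraction(lam) if r == c else Fraction(int(c == r + 1)) for c in range(k)]
            for r in range(k)]


def quasi_inverse_block(lam, k):
    """Block of J' for a k x k Jordan block with eigenvalue lam.

    lam != 0: the exact inverse, entries (-1)**(c - r) / lam**(c - r + 1) for c >= r.
    lam == 0: the zero block 0_k (one admissible choice for a nilpotent block).
    """
    lam = Fraction(lam)
    if lam == 0:
        return [[Fraction(0)] * k for _ in range(k)]
    return [[Fraction((-1) ** (c - r)) / lam ** (c - r + 1) if c >= r else Fraction(0)
             for c in range(k)] for r in range(k)]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[r][t] * b[t][c] for t in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def identity(k):
    return [[Fraction(int(r == c)) for c in range(k)] for r in range(k)]


# Non-zero eigenvalue: J'_lam is a genuine inverse, so J J' = I.
J = jordan_block(Fraction(2, 5), 3)
print(matmul(J, quasi_inverse_block(Fraction(2, 5), 3)) == identity(3))  # True

# Nilpotent block (lam = 0, k = 2): J J' = 0_k, but J J' J = 0_k != J,
# so the zero block is not a generalized inverse of J.
J0 = jordan_block(0, 2)
Z = quasi_inverse_block(0, 2)
print(matmul(J0, Z) == [[0, 0], [0, 0]])   # True
print(matmul(matmul(J0, Z), J0) == J0)     # False
```

Exact `Fraction` arithmetic is used so that equalities such as $JJ'=I$ can be tested without floating-point tolerances.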

The following proposition gives another condition for a quantum channel to have a non-CP (quasi-)inverse.

Proposition 1. If a non-zero eigenvalue $\lambda$ of a quantum channel $\mathcal{N}$ has modulus less than 1 ($|\lambda| < 1$), then the inverse (or quasi-inverse) channel $\mathcal{N}^{+}$ is not completely positive.

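Proof. $\mathcal{N}$ is a CPTP map, therefore its spectral radius is one [18], i.e. $|J_{ii}| \le 1$ for every main diagonal element $J_{ii}$ of $J$. Since $\mathcal{N}$ has a non-zero eigenvalue with modulus less than 1, there exists $j \in \{1, \cdots, d^2\}$ with $0 < |J_{jj}| < 1$. By construction, $|J'_{jj}| = 1/|J_{jj}| > 1$, i.e. the spectral radius of $\mathcal{N}^{+}$ is greater than one. Since $\mathcal{N}^{+}$ is trace preserving (Theorem 2) and every CPTP map has spectral radius one, $\mathcal{N}^{+}$ is not completely positive.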

Note that the spectrum of a quantum channel can be defined independently of its representation. In this section, we mainly work with superoperators (the natural representation), but Proposition 1 still holds in other representations (e.g. the Pauli representation).

Superoperators are powerful when calculating channel compositions and their actions on quantum states. Unlike the Choi representation, the natural representation does not directly reveal many critical properties of quantum channels, such as CP, TP, and HP. However, we found that the eigenstructure of the superoperator is essential for its properties. Lemma 1 and Lemma 2 give insight into why the Moore-Penrose inverse is not TP in certain cases. Then, in Theorem 2, we prove that the quasi-inverse of a TP map is also TP.

Denote by $\mathrm{sTr}[\cdot]$ the trace operation in the vector representation $v(A)$ of a $d\times d$ matrix $A$, so that $\mathrm{sTr}[v(A)] = \mathrm{Tr}(A)$.

Lemma 1. If a linear map $\mathcal{N}: M_d \to M_d$ is trace preserving, then the eigenvectors $v$ and generalized eigenvectors $v_g$ of the superoperator $v(\mathcal{N})$ with eigenvalue $\lambda \neq 1$ are traceless, i.e. $\mathrm{sTr}[v] = \mathrm{sTr}[v_g] = 0$.


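Proof. For an eigenvector $v$ of $v(\mathcal{N})$, we have $v(\mathcal{N})v = \lambda v$. Since $\mathcal{N}$ is trace preserving, $\mathrm{sTr}[v] = \mathrm{sTr}[v(\mathcal{N})v] = \mathrm{sTr}[\lambda v] = \lambda\,\mathrm{sTr}[v]$. Since $\lambda \neq 1$, we have $\mathrm{sTr}[v] = 0$.

For a $k\times k$ Jordan block of eigenvalue $\lambda_g$ with $k > 1$, denote the first generalized eigenvector by $v_{g_1}$. We have

$$ [v(\mathcal{N}) - \lambda_g I]\,v_{g_1} = v, \tag{12} $$

where $v$ is the eigenvector corresponding to $\lambda_g$. Taking the trace of both sides gives $\mathrm{sTr}[(v(\mathcal{N}) - \lambda_g I)v_{g_1}] = \mathrm{sTr}[v]$. By trace preservation, the left hand side is $\mathrm{sTr}[v_{g_1} - \lambda_g v_{g_1}] = (1-\lambda_g)\,\mathrm{sTr}[v_{g_1}]$, and the right hand side is zero by the argument above. Since $\lambda_g \neq 1$, $\mathrm{sTr}[v_{g_1}] = 0$. By induction, all $v_{g_i}$ are traceless for $i \in \{1, \cdots, k-1\}$.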
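Lemma 1 can be checked numerically on the channel of Example 2 below. The sketch assumes row-major vectorization, $v(A)_{id+j}=A_{ij}$, which reproduces the matrices quoted in Example 2, so that $\mathrm{sTr}$ sums the entries at positions $id+i$. The eigenpairs are hand-computed for this illustration (they are not given in the text) and are verified inside the code; `s_tr`, `apply` and `is_trace_preserving` are hypothetical helper names.

```python
from fractions import Fraction as F

D = 2  # qubit: v(A) has D * D = 4 entries


def s_tr(vec, d=D):
    """sTr[v(A)] = Tr(A) for the row-major vectorization v(A)[i*d + j] = A[i][j]."""
    return sum(vec[i * d + i] for i in range(d))


def apply(m, vec):
    """Matrix-vector product."""
    return [sum(m[r][c] * vec[c] for c in range(len(vec))) for r in range(len(m))]


def is_trace_preserving(m, d=D):
    """TP <=> sTr[v(N) e_j] = sTr[e_j] for every standard basis vector e_j."""
    n = d * d
    basis = [[int(r == c) for r in range(n)] for c in range(n)]
    return all(s_tr(apply(m, e), d) == s_tr(e, d) for e in basis)


# Superoperator of the channel N in Example 2.
vN = [[F(x, 20) for x in row] for row in
      [[8, 1, 1, 8], [0, 6, 2, 0], [0, 2, 6, 0], [12, -1, -1, 12]]]

# Hand-computed eigenpairs of v(N); each one is verified below.
eigenpairs = [
    (F(0), [1, 0, 0, -1]),
    (F(1), [2, 0, 0, 3]),
    (F(2, 5), [F(1, 4), 1, 1, F(-1, 4)]),
    (F(1, 5), [0, 1, -1, 0]),
]

print(is_trace_preserving(vN))  # True
for lam, v in eigenpairs:
    assert apply(vN, v) == [lam * x for x in v]  # (lam, v) really is an eigenpair
    print(lam, s_tr(v))  # eigenvalue and sTr of its eigenvector
```

Running it prints `True`, then `0 0`, `1 5`, `2/5 0` and `1/5 0`: every eigenvector with $\lambda\neq 1$ is traceless, and only the $\lambda=1$ eigenvector carries trace, as Lemma 1 predicts.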

Lemma 2. For a trace preserving linear map $\mathcal{N}: M_d \to M_d$, if there is a $k\times k$ ($k > 1$) defective Jordan block of eigenvalue $\lambda = 1$ in $v(\mathcal{N})$, then the eigenvector $v$ and the first $k-2$ generalized eigenvectors $v_{g_i}$ have to be traceless, i.e. $\mathrm{sTr}[v] = \mathrm{sTr}[v_{g_i}] = 0$ for $i \in \{1, \cdots, k-2\}$.

Proof. Assume that $\mathrm{sTr}[v] \neq 0$. The first generalized eigenvector satisfies $[v(\mathcal{N}) - I]v_{g_1} = v$. Taking the trace of both sides, the left hand side equals zero by trace preservation, while the right hand side does not. This is a contradiction. The same argument holds for the remaining generalized eigenvectors except the last one.

From Lemma 1 and Lemma 2, we know that all eigenvectors $v_\lambda$ with $\lambda \neq 1$ of a TP map have to be traceless. When $\lambda = 1$, if its algebraic multiplicity equals its geometric multiplicity, $\mathrm{sTr}[v(\mathcal{N})v_\lambda] = \mathrm{sTr}[v_\lambda]$ (i.e. the trace of $v_\lambda$ is not changed under the action of $v(\mathcal{N})$); if the algebraic multiplicity does not equal the geometric multiplicity, the eigenvectors and generalized eigenvectors are traceless except for the last generalized eigenvector. This tells us that the eigenstructure of the superoperator $v(\mathcal{N})$ is crucial for $\mathcal{N}$ to be TP. The way we construct the quasi-inverse $\mathcal{N}^{+}$ largely preserves the eigenstructure, while the Moore-Penrose inverse $\mathcal{N}^{p}$ focuses more on the singular value structure. This hints that $\mathcal{N}^{+}$ should be TP, while $\mathcal{N}^{p}$ may not be.

Theorem 2. The quasi-inverse $\mathcal{N}^{+}$ of a trace preserving map $\mathcal{N}$ is also trace preserving.

Since the columns of $Q$ form a basis, to prove that $\mathcal{N}^{+}$ is trace preserving we need to prove

$$ \mathrm{sTr}\bigl[v(\mathcal{N}^{+})\,v_\lambda\bigr] = \mathrm{sTr}[v_\lambda] $$

for every eigenvector and generalized eigenvector $v_\lambda$ of $v(\mathcal{N})$ in $Q$. From the construction of $\mathcal{N}^{+}$, we get trace preservation almost for free. The proof can be found in Appendix A. Moreover, it is easy to see from the proof that the composed map $\mathcal{N}^{+}\circ\mathcal{N}$ is also trace preserving.

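Example 2. Here we give an example where the Moore-Penrose inverse $\mathcal{N}^{p}$ of a CPTP map is not TP, but our constructed quasi-inverse $\mathcal{N}^{+}$ is TP. Consider a noise channel $\mathcal{N}$ whose Choi representation is given by

$$ C(\mathcal{N}) = \frac{1}{20}\begin{pmatrix} 8 & 0 & 1 & 6 \\ 0 & 12 & 2 & -1 \\ 1 & 2 & 8 & 0 \\ 6 & -1 & 0 & 12 \end{pmatrix}, $$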

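then its superoperator is

$$ v(\mathcal{N}) = \frac{1}{20}\begin{pmatrix} 8 & 1 & 1 & 8 \\ 0 & 6 & 2 & 0 \\ 0 & 2 & 6 & 0 \\ 12 & -1 & -1 & 12 \end{pmatrix}, $$

with $J = \mathrm{diag}(0, 1, \tfrac{2}{5}, \tfrac{1}{5})$ and $J' = \mathrm{diag}(0, 1, \tfrac{5}{2}, 5)$.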

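The superoperator of the quasi-inverse $\mathcal{N}^{+}$ is

$$ v(\mathcal{N}^{+}) = \begin{pmatrix} \frac{2}{5} & \frac{5}{16} & \frac{5}{16} & \frac{2}{5} \\ 0 & \frac{15}{4} & -\frac{5}{4} & 0 \\ 0 & -\frac{5}{4} & \frac{15}{4} & 0 \\ \frac{3}{5} & -\frac{5}{16} & -\frac{5}{16} & \frac{3}{5} \end{pmatrix}. $$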

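The Choi representation of $\mathcal{N}^{+}$ is

$$ C(\mathcal{N}^{+}) = \begin{pmatrix} \frac{2}{5} & 0 & \frac{5}{16} & \frac{15}{4} \\ 0 & \frac{3}{5} & -\frac{5}{4} & -\frac{5}{16} \\ \frac{5}{16} & -\frac{5}{4} & \frac{2}{5} & 0 \\ \frac{15}{4} & -\frac{5}{16} & 0 & \frac{3}{5} \end{pmatrix}. $$

The Choi representation has negative eigenvalues. Therefore, the channel $\mathcal{N}^{+}$ is trace preserving and Hermitian preserving, but not completely positive.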
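The numbers above can be reproduced exactly from Eq. (11). The sketch below is plain Python with exact `fractions` arithmetic and hypothetical helper names. It assumes row-major vectorization $v(A)_{id+j}=A_{ij}$, under which the Choi matrix is the reshuffling $C_{(i,a),(j,b)}=v(\mathcal{N})_{(a,b),(i,j)}$; this convention maps $v(\mathcal{N})$ to the $C(\mathcal{N})$ quoted above. The columns of $Q$ are hand-computed eigenvectors of $v(\mathcal{N})$ for the eigenvalues $0, 1, \tfrac{2}{5}, \tfrac{1}{5}$. A negative $2\times 2$ principal minor of the Hermitian matrix $C(\mathcal{N}^{+})$ serves as a simple certificate of a negative eigenvalue.

```python
from fractions import Fraction as F


def matmul(a, b):
    return [[sum(a[r][t] * b[t][c] for t in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def inverse(m):
    """Exact Gauss-Jordan inverse of an invertible square matrix."""
    n = len(m)
    a = [[F(x) for x in row] + [F(int(r == c)) for c in range(n)]
         for r, row in enumerate(m)]
    for col in range(n):
        piv = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def choi_from_superop(s, d=2):
    """C[(i, a), (j, b)] = v(N)[(a, b), (i, j)] for row-major vectorization."""
    return [[s[a * d + b][i * d + j] for j in range(d) for b in range(d)]
            for i in range(d) for a in range(d)]


# Columns of Q: eigenvectors of v(N) for the eigenvalues 0, 1, 2/5, 1/5.
Q = [[1, 2, F(1, 4), 0],
     [0, 0, 1, 1],
     [0, 0, 1, -1],
     [-1, 3, F(-1, 4), 0]]
J_prime = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, F(5, 2), 0], [0, 0, 0, 5]]

v_plus = matmul(matmul(Q, J_prime), inverse(Q))  # Eq. (11)
expected = [[F(2, 5), F(5, 16), F(5, 16), F(2, 5)],
            [0, F(15, 4), F(-5, 4), 0],
            [0, F(-5, 4), F(15, 4), 0],
            [F(3, 5), F(-5, 16), F(-5, 16), F(3, 5)]]
print(v_plus == expected)  # True

# TP: rows 0 and 3 (the diagonal entries of the output) must add up to sTr = (1, 0, 0, 1).
print([v_plus[0][c] + v_plus[3][c] for c in range(4)] == [1, 0, 0, 1])  # True

# Not CP: the principal 2x2 minor of C(N+) on indices {0, 3} is negative,
# so the Hermitian matrix C(N+) cannot be positive semidefinite.
C = choi_from_superop(v_plus)
print(C[0][0] * C[3][3] - C[0][3] * C[3][0])  # -5529/400
```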

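The Moore-Penrose inverse of $v(\mathcal{N})$ is

$$ v(\mathcal{N}^{p}) = \begin{pmatrix} \frac{115}{294} & \frac{10}{441} & \frac{10}{441} & \frac{505}{882} \\ \frac{50}{147} & \frac{3245}{882} & -\frac{1165}{882} & -\frac{100}{441} \\ \frac{50}{147} & -\frac{1165}{882} & \frac{3245}{882} & -\frac{100}{441} \\ \frac{115}{294} & \frac{10}{441} & \frac{10}{441} & \frac{505}{882} \end{pmatrix}, $$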

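and its Choi representation is

$$ C(\mathcal{N}^{p}) = \begin{pmatrix} \frac{115}{294} & \frac{50}{147} & \frac{10}{441} & \frac{3245}{882} \\ \frac{50}{147} & \frac{115}{294} & -\frac{1165}{882} & \frac{10}{441} \\ \frac{10}{441} & -\frac{1165}{882} & \frac{505}{882} & -\frac{100}{441} \\ \frac{3245}{882} & \frac{10}{441} & -\frac{100}{441} & \frac{505}{882} \end{pmatrix}, $$

which is Hermitian preserving but not trace preserving.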
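The quoted $v(\mathcal{N}^{p})$ can be confirmed to be the Moore-Penrose inverse of $v(\mathcal{N})$ by checking the four Penrose conditions exactly; for real matrices these are $AXA=A$, $XAX=X$, and symmetry of $AX$ and $XA$. A minimal sketch with exact rational arithmetic, assuming the row-major convention $v(A)_{id+j}=A_{ij}$, under which trace preservation would require rows 0 and 3 of $v(\mathcal{N}^{p})$ to add up to $(1,0,0,1)$.

```python
from fractions import Fraction as F


def matmul(a, b):
    return [[sum(a[r][t] * b[t][c] for t in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


A = [[F(x, 20) for x in row] for row in
     [[8, 1, 1, 8], [0, 6, 2, 0], [0, 2, 6, 0], [12, -1, -1, 12]]]  # v(N)
X = [[F(115, 294), F(10, 441), F(10, 441), F(505, 882)],
     [F(50, 147), F(3245, 882), F(-1165, 882), F(-100, 441)],
     [F(50, 147), F(-1165, 882), F(3245, 882), F(-100, 441)],
     [F(115, 294), F(10, 441), F(10, 441), F(505, 882)]]            # v(N^p)

AX, XA = matmul(A, X), matmul(X, A)
penrose = [
    matmul(AX, A) == A,    # A X A = A
    matmul(XA, X) == X,    # X A X = X
    transpose(AX) == AX,   # A X is symmetric (real entries, so Hermitian)
    transpose(XA) == XA,   # X A is symmetric
]
print(all(penrose))  # True

# Trace preservation would need rows 0 and 3 (the diagonal of the output) to sum to (1, 0, 0, 1).
print([X[0][c] + X[3][c] for c in range(4)] == [1, 0, 0, 1])  # False
```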

Fig. 4 shows the impact of different recovery channels. For this particular channel $\mathcal{N}$, the quasi-inverse perfectly recovers the expectation value of the Pauli operator $X$.

The composition of a singular channel and its quasi-inverse, $\mathcal{N}^{\mathrm{re}} := \mathcal{N}^{+}\circ\mathcal{N}$, has a kernel of dimension at least one. The effect of the composed channel $\mathcal{N}^{\mathrm{re}}$ on transmitted quantum states is analyzed in Appendix B.
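The Fig. 4 claim for the composed channel $\mathcal{N}^{\mathrm{re}} = \mathcal{N}^{+}\circ\mathcal{N}$ can be reproduced directly. One can check that rows 1 and 2 of $v(\mathcal{N}^{+})v(\mathcal{N})$ coincide with those of the identity, so the off-diagonal entries $\rho_{01}$ and $\rho_{10}$, and hence $\mathrm{Tr}(\rho X)=\rho_{01}+\rho_{10}$, pass through unchanged. The sketch below assumes row-major vectorization and samples Bloch vectors uniformly from the unit ball by rejection sampling. The sampling scheme and seed are our own choices, not necessarily those used for Fig. 4.

```python
import random
from fractions import Fraction as F

# Superoperators from Example 2 (row-major vectorization assumed).
V_N = [[F(x, 20) for x in row] for row in
       [[8, 1, 1, 8], [0, 6, 2, 0], [0, 2, 6, 0], [12, -1, -1, 12]]]
V_PLUS = [[F(2, 5), F(5, 16), F(5, 16), F(2, 5)],
          [0, F(15, 4), F(-5, 4), 0],
          [0, F(-5, 4), F(15, 4), 0],
          [F(3, 5), F(-5, 16), F(-5, 16), F(3, 5)]]


def apply(m, vec):
    """Act with a 4x4 superoperator on a vectorized 2x2 matrix (complex entries)."""
    return [sum(float(m[r][c]) * vec[c] for c in range(4)) for r in range(4)]


def random_state(rng):
    """v(rho) for rho = (I + xX + yY + zZ)/2 with (x, y, z) uniform in the Bloch ball."""
    while True:
        x, y, z = (rng.uniform(-1.0, 1.0) for _ in range(3))
        if x * x + y * y + z * z <= 1.0:
            return [(1 + z) / 2, (x - 1j * y) / 2, (x + 1j * y) / 2, (1 - z) / 2]


def expect_x(vec):
    """Tr(rho X) = rho_01 + rho_10."""
    return (vec[1] + vec[2]).real


rng = random.Random(2021)
for _ in range(5):
    rho = random_state(rng)
    recovered = apply(V_PLUS, apply(V_N, rho))  # N^re(rho) = N^+ o N(rho)
    print(f"{expect_x(rho):+.6f}  {expect_x(recovered):+.6f}")
```

Each printed pair agrees up to floating-point round-off.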


[Figure 4 plot: $\mathrm{Tr}(\rho X)$ versus state label (0 to 50); legend: $\mathcal{N}^{p}$, $\mathcal{N}^{+}$, ideal.]

FIG. 4: The expectation values of the Pauli $X$ operator, $\mathrm{Tr}(\rho X)$, for 50 randomly generated quantum states. The x-axis is a dummy label for the tested states. The quasi-inverse channel $\mathcal{N}^{+}$ perfectly recovers the expectation values (green triangles), while the Moore-Penrose inverse $\mathcal{N}^{p}$ sometimes worsens the results (blue dots).

III. QEM IN QUANTUM COMPUTATION

In the previous section, we discussed how the nature of the noise determines whether it is theoretically possible to fully recover the quantum and/or classical information, within the framework of classical and quantum communication. We defined non-invertible noise and constructed a quasi-inverse for that case. In this section, we study the effects of recovery operations when performing QEM in quantum computation. To do so, we focus on invertible noise only, which still covers most “weak” forms of noise commonly encountered in experiments. From our previous discussion, we know that the ideal output density matrix can in principle be recovered under such noise channels, for example by numerically applying the inverse noise channel to the output state. First, we consider different methods of implementing this inverse in practice.

A. Possibilities of QEM in quantum computation

Assume now that there is a noisy quantum circuit of depth $n$, where each layer can be represented by a unitary map $\mathcal{U}_i$ with $i = 1, \ldots, n$. The ideal output would be

$$ \rho^{\mathrm{ideal}}_{\mathrm{out}} = \mathcal{U}_n\circ\cdots\circ\mathcal{U}_1(\rho_{\mathrm{in}}). $$

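In practice, the gates $\mathcal{U}_i$ are implemented imperfectly. Making the standard Markovian assumption on the noise, each imperfect $\mathcal{U}_i$ can be decomposed as $\mathcal{N}_i\circ\mathcal{U}_i$, where each $\mathcal{N}_i$ is a CPTP map and can be distinct for different $i$. We thus have

$$ \rho_{\mathrm{out}} = \mathcal{N}_n\circ\mathcal{U}_n\circ\cdots\circ\mathcal{N}_1\circ\mathcal{U}_1(\rho_{\mathrm{in}}), \tag{13} $$

where $\rho_{\mathrm{in}}$ is the input quantum state, $\rho_{\mathrm{out}}$ (below also denoted $\rho^{\exp}_{\mathrm{out}}$) is the quantum state coming out of the noisy circuit, $\mathcal{U}_i$ are the desired operations, and $\mathcal{N}_i$ is the noise corresponding to gate $\mathcal{U}_i$.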

To perform QEM, one first tries to learn (part or all of) the noise models, then recovers the ideal gates through either physical or numerical means. Thus, if we wish to analyze the performance of the best possible QEM strategy, we may assume that all $\mathcal{N}_i$'s are known exactly. In reality, these $\mathcal{N}_i$'s are obtained from experiments, either during the calibration stage or as part of the QEM process, which necessarily involves inaccuracies in the reconstruction. Denote the noise models obtained from experiments by $\hat{\mathcal{N}}_i$, which are approximations of $\mathcal{N}_i$. Let $\mathcal{N}_i^{-1}$ denote the inverse of $\mathcal{N}_i$. In this section, we consider channels with the same input and output dimensions.

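First, consider the case where $\mathcal{N}_i^{-1}$ exists and is CPTP for all $i$. Recall that this is true iff $\mathcal{N}_i$ is a unitary channel. Then, in principle, one can insert an additional gate implementing $\mathcal{N}_i^{-1}$ after each (noisy) gate $\mathcal{U}_i$ to fully invert the noise effect [11]. In reality, the experimentally obtained noise models are $\hat{\mathcal{N}}_i$. Thus, the output from this method will be

$$ \rho_{\mathrm{EM}} = \hat{\mathcal{N}}_n^{-1}\circ\mathcal{N}_n\circ\mathcal{U}_n\circ\cdots\circ\hat{\mathcal{N}}_1^{-1}\circ\mathcal{N}_1\circ\mathcal{U}_1(\rho_{\mathrm{in}}). \tag{14} $$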

Naturally, there are two main sources of additional error. First, the experimentally learned noise model $\hat{\mathcal{N}}_i$ does not always equal $\mathcal{N}_i$, so $\hat{\mathcal{N}}_i^{-1}\circ\mathcal{N}_i$ does not necessarily equal the identity. Second, even if $\mathcal{N}_i$ can be learned ideally, the physical implementation of $\hat{\mathcal{N}}_i^{-1}$ will also not be ideal and can introduce more errors.

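Next, consider the case where $\mathcal{N}_i^{-1}$ exists but is not CPTP. In this case, it is impossible to physically restore the ideal output state. However, we can still perform the inverse numerically to recover the density matrix of the output state. The error-mitigated output density matrix is given by

$$ \rho_{\mathrm{EM}} = \mathcal{E}^{-1}_{\mathrm{EM}}(\rho^{\exp}_{\mathrm{out}}) = \widetilde{\mathcal{N}}_1^{-1}\circ\widetilde{\mathcal{N}}_2^{-1}\circ\cdots\circ\widetilde{\mathcal{N}}_n^{-1}(\rho^{\exp}_{\mathrm{out}}), \tag{15} $$

where

$$ \widetilde{\mathcal{N}}_i^{-1} = \mathcal{U}_n\circ\cdots\circ\mathcal{U}_{i+1}\circ\mathcal{N}_i^{-1}\circ\mathcal{U}_{i+1}^{\dagger}\circ\cdots\circ\mathcal{U}_n^{\dagger}. \tag{16} $$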

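For example, a depth-3 circuit can be numerically inverted by

$$ \begin{aligned} \rho_{\mathrm{EM}} &= \widetilde{\mathcal{N}}_1^{-1}\circ\widetilde{\mathcal{N}}_2^{-1}\circ\widetilde{\mathcal{N}}_3^{-1}(\rho^{\exp}_{\mathrm{out}}) \\ &= \mathcal{U}_3\circ\mathcal{U}_2\circ\mathcal{N}_1^{-1}\circ\mathcal{U}_2^{\dagger}\circ\mathcal{N}_2^{-1}\circ\mathcal{U}_3^{\dagger}\circ\mathcal{N}_3^{-1}(\rho^{\exp}_{\mathrm{out}}). \end{aligned} \tag{17} $$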
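Eqs. (13), (15) and (16) can be exercised on a single qubit using Pauli transfer matrices (PTMs), where a state is the real vector $(1,x,y,z)$ of Pauli coefficients and every map is a real $4\times 4$ matrix. This is a minimal sketch with exact arithmetic. The gate sequence ($H$, $S$, $H$), the noise channels (amplitude damping with $\gamma=7/16$, depolarizing, dephasing) and all helper names are illustrative assumptions. None of the noise channels is unitary, so every $\mathcal{N}_i^{-1}$ is non-CP, which is exactly the case where the inverse can only be applied numerically.

```python
from fractions import Fraction as F


def matmul(a, b):
    return [[sum(a[r][t] * b[t][c] for t in range(4)) for c in range(4)] for r in range(4)]


def matvec(a, v):
    return [sum(a[r][c] * v[c] for c in range(4)) for r in range(4)]


def transpose(a):
    """For the (orthogonal) PTM of a unitary channel, U^dagger = U^T."""
    return [list(row) for row in zip(*a)]


def inverse(m):
    """Exact Gauss-Jordan inverse of an invertible 4x4 matrix."""
    a = [[F(x) for x in row] + [F(int(r == c)) for c in range(4)] for r, row in enumerate(m)]
    for col in range(4):
        piv = next(r for r in range(col, 4) if a[r][col] != 0)
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(4):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[4:] for row in a]


def compose(*maps):
    """compose(A, B, C) = A o B o C (C acts first)."""
    out = [[int(r == c) for c in range(4)] for r in range(4)]
    for m in maps:
        out = matmul(out, m)
    return out


# PTMs in the basis (I, X, Y, Z); states are (1, x, y, z).
H = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, -1, 0], [0, 1, 0, 0]]
S = [[1, 0, 0, 0], [0, 0, -1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
amp_damp = [[1, 0, 0, 0], [0, F(3, 4), 0, 0], [0, 0, F(3, 4), 0], [F(7, 16), 0, 0, F(9, 16)]]
depol = [[1, 0, 0, 0], [0, F(3, 4), 0, 0], [0, 0, F(3, 4), 0], [0, 0, 0, F(3, 4)]]
dephase = [[1, 0, 0, 0], [0, F(3, 4), 0, 0], [0, 0, F(3, 4), 0], [0, 0, 0, 1]]

gates = [H, S, H]                   # U_1, U_2, U_3
noise = [amp_damp, depol, dephase]  # N_1, N_2, N_3 (all invertible, none unitary)
rho_in = [1, 0, 0, 1]               # |0><0|


def conjugated_inverse(i):
    """Eq. (16); 0-based i here corresponds to N_{i+1} in the 1-based notation of the text."""
    later = gates[i + 1:]
    return compose(*reversed(later), inverse(noise[i]), *[transpose(u) for u in later])


rho_exp = rho_in                    # Eq. (13)
for u, n in zip(gates, noise):
    rho_exp = matvec(n, matvec(u, rho_exp))

rho_ideal = rho_in
for u in gates:
    rho_ideal = matvec(u, rho_ideal)

rho_em = rho_exp                    # Eq. (15): the factor for the last layer acts first
for i in reversed(range(len(gates))):
    rho_em = matvec(conjugated_inverse(i), rho_em)

print(rho_em == rho_ideal)          # True
```

Because the inverses are exact, `rho_em` equals the ideal output $(1, 0, -1, 0)$, i.e. the $-1$ eigenstate of $Y$.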

The numerical inverse method does not involve implementing physical gates, but it still requires that the noise processes be accurately characterized. Many current numerical QEM protocols can be categorized as trying to obtain the exact noise-inverted output, meaning that their optimal performance is upper bounded by Eq. (17). For example, the quasi-probability sampling method [3] directly attempts to implement the ideal gates numerically through imperfect ones, and learning-based QEM attempts to implicitly learn the noise models and invert them through regression [19].

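In general, we can expand Eq. (15) to get

$$ \begin{aligned} \rho_{\mathrm{EM}} &= \mathcal{E}^{-1}_{\mathrm{EM}}(\rho^{\exp}_{\mathrm{out}}) \\ &= [\mathcal{U}_n\circ\cdots\circ\mathcal{U}_2]\circ[\mathcal{N}_1^{-1}\circ\cdots\circ\mathcal{U}_{n-1}^{\dagger}\circ\mathcal{N}_{n-1}^{-1}\circ\mathcal{U}_n^{\dagger}\circ\mathcal{N}_n^{-1}] \\ &\quad\circ[\mathcal{N}_n\circ\mathcal{U}_n\circ\cdots\circ\mathcal{N}_1\circ\mathcal{U}_1](\rho_{\mathrm{in}}) \\ &= \mathcal{U}_{n\cdots 1}\circ(\mathcal{U}^{\dagger}\mathcal{N}^{-1})_{1\cdots n}(\rho^{\exp}_{\mathrm{out}}), \end{aligned} \tag{18} $$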

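where we defined

$$ \mathcal{U}_{n\cdots 1} := \mathcal{U}_n\circ\cdots\circ\mathcal{U}_1 $$

and

$$ (\mathcal{U}^{\dagger}\mathcal{N}^{-1})_{1\cdots n} := \mathcal{U}_1^{\dagger}\circ\mathcal{N}_1^{-1}\circ\cdots\circ\mathcal{U}_n^{\dagger}\circ\mathcal{N}_n^{-1}. \tag{19} $$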

Here $\mathcal{U}_{n\cdots 1}$ is the ideal circuit sequence, and $(\mathcal{U}^{\dagger}\mathcal{N}^{-1})_{1\cdots n}$ is the channel that maps $\rho^{\exp}_{\mathrm{out}}$ back to $\rho_{\mathrm{in}}$. Their composition first maps the experimental output state $\rho^{\exp}_{\mathrm{out}}$ back to the input state $\rho_{\mathrm{in}}$ and then performs the ideal operations $\mathcal{U}_{n\cdots 1}$; this is illustrated by the blue arrows in Fig. 5.
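Eqs. (18) and (19) can be checked in the same spirit: the pulled-back map $(\mathcal{U}^{\dagger}\mathcal{N}^{-1})_{1\cdots n}$ should return $\rho^{\exp}_{\mathrm{out}}$ exactly to $\rho_{\mathrm{in}}$, after which $\mathcal{U}_{n\cdots 1}$ yields the ideal output. This is a minimal single-qubit sketch using PTMs in the basis $(I,X,Y,Z)$. To keep it short, it assumes Pauli noise, whose PTM is diagonal and therefore trivially invertible. The specific gates, probabilities and input state are made up for illustration.

```python
from fractions import Fraction as F


def matvec(a, v):
    return [sum(a[r][c] * v[c] for c in range(4)) for r in range(4)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def pauli_channel(px, py, pz):
    """Diagonal PTM of rho -> (1-px-py-pz) rho + px XrhoX + py YrhoY + pz ZrhoZ."""
    return [[1, 0, 0, 0], [0, 1 - 2 * (py + pz), 0, 0],
            [0, 0, 1 - 2 * (px + pz), 0], [0, 0, 0, 1 - 2 * (px + py)]]


def diag_inverse(m):
    """Inverse of a diagonal PTM with non-zero diagonal (HPTP, generally not CP)."""
    return [[1 / F(m[r][r]) if r == c else 0 for c in range(4)] for r in range(4)]


def pull_back(state, gates, noise):
    """(U^dag N^{-1})_{1...n} of Eq. (19): N_n^{-1} acts first, U_1^dag acts last."""
    for u, n in reversed(list(zip(gates, noise))):
        state = matvec(transpose(u), matvec(diag_inverse(n), state))
    return state


H = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, -1, 0], [0, 1, 0, 0]]
S = [[1, 0, 0, 0], [0, 0, -1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
gates = [H, S, H, S]
noise = [pauli_channel(F(1, 20), F(1, 40), F(1, 10)),
         pauli_channel(F(1, 50), F(1, 50), F(1, 50)),
         pauli_channel(0, 0, F(1, 8)),
         pauli_channel(F(1, 10), 0, 0)]
rho_in = [1, F(1, 3), F(-1, 2), F(1, 5)]  # a valid (mixed) qubit state

rho_exp = rho_in                          # Eq. (13)
for u, n in zip(gates, noise):
    rho_exp = matvec(n, matvec(u, rho_exp))

back = pull_back(rho_exp, gates, noise)
rho_em = back
for u in gates:                           # U_{n...1} applied to the pulled-back state
    rho_em = matvec(u, rho_em)

rho_ideal = rho_in
for u in gates:
    rho_ideal = matvec(u, rho_ideal)

print(back == rho_in, rho_em == rho_ideal)  # True True
```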

A naive numerical implementation of the channel inverse requires simulating the quantum circuit $\mathcal{U}_{n\cdots 1}$, which is inherently expensive. Generally speaking, the computational complexity of computing Eq. (18) is even higher than that of classically simulating the ideal circuit, and this is in addition to the cost of characterizing the noise channels. Therefore, directly computing such an inverse channel is not an efficient way to mitigate errors, and thus not very useful in practice. Theoretically, however, comparing the error-mitigated results with classically simulated ones can reveal how precise our knowledge of the device noise is. Moreover, the result from this “optimal method” upper bounds the performance of any error mitigation protocol.

Finally, we mention briefly that when only an approximate version of the ideal output is wanted, one may apply an effective channel method, where only one effective recovery map $\mathcal{N}^{-1}_{\mathrm{eff}}$ is applied (either physically or numerically) at the end, i.e.,

$$ \rho_{\mathrm{EM}} = \mathcal{N}^{-1}_{\mathrm{eff}}(\rho^{\exp}_{\mathrm{out}}). \tag{20} $$

The effective recovery map $\mathcal{N}^{-1}_{\mathrm{eff}}$ has tunable parameters. Normally, this method involves first estimating the parameters of $\mathcal{N}_{\mathrm{eff}}$ (the effective noise channel) from experimental data, then calculating $\mathcal{N}^{-1}_{\mathrm{eff}}$ and applying it to new experimental data to mitigate errors. Methods that fall into this category include decoherence compensation in NMR experiments and depolarizing-model-based EM [20]. Recent work has also considered continuous inversion through the Petz recovery map [21].
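A minimal sketch of the effective-channel idea of Eq. (20), under an assumption that is ours and purely illustrative: the effective noise is a single-qubit depolarizing channel $\mathcal{N}_{\mathrm{eff}}(\rho)=(1-\lambda)\rho+\lambda I/2$, whose PTM is $\mathrm{diag}(1,f,f,f)$ with $f=1-\lambda$. The parameter $f$ is fitted from a single calibration observable with a known ideal value. This is not the specific procedure of Refs. [20, 21]; the helper names and synthetic data are hypothetical.

```python
from fractions import Fraction as F


def fit_shrink_factor(measured_cal, ideal_cal):
    """Estimate f = 1 - lambda from one calibration observable with known ideal value."""
    return F(measured_cal) / F(ideal_cal)


def effective_inverse(f):
    """PTM of N_eff^{-1} when N_eff is depolarizing with PTM diag(1, f, f, f)."""
    return [[1, 0, 0, 0], [0, 1 / f, 0, 0], [0, 0, 1 / f, 0], [0, 0, 0, 1 / f]]


def matvec(a, v):
    return [sum(a[r][c] * v[c] for c in range(4)) for r in range(4)]


# Synthetic data: the true effective noise is depolarizing with lambda = 1/5 (f = 4/5).
true_f = F(4, 5)
ideal = [1, F(3, 5), 0, F(-2, 5)]  # (1, <X>, <Y>, <Z>) of the ideal output
measured = [1] + [true_f * b for b in ideal[1:]]

# Calibration: a circuit whose ideal <Z> is 1 (e.g. preparing |0>) reads out f * 1.
f_hat = fit_shrink_factor(measured_cal=true_f * 1, ideal_cal=1)
mitigated = matvec(effective_inverse(f_hat), measured)  # Eq. (20)
print(mitigated == ideal)  # True
```

If the real noise is not depolarizing, this single rescaling is exactly the kind of model mismatch analyzed in Sec. III B.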

B. Imperfect Characterization of Noise Channels

We now study the effects of imperfectly characterized noise channels on the performance of QEM. It is generally acknowledged that characterizing noise models in a quantum system is highly resource demanding [22]. In many current error mitigation protocols, the noise channel is assumed to follow a particular model [23], such as a depolarizing channel $\mathcal{D}$. It is then natural to ask how incorrectly characterized noise channels $\{\hat{\mathcal{N}}_i\}$ would affect the mitigation outcome. As mentioned before, we assume all $\mathcal{N}_i$'s to be invertible in this subsection.

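As shown in Fig. 5, while the ideal circuits and the experimental operations are CPTP maps, the channels $(\mathcal{U}^{\dagger}\mathcal{N}^{-1})_{1\cdots n}$ and $(\mathcal{U}^{\dagger}\hat{\mathcal{N}}^{-1})_{1\cdots n}$ (defined as in Eq. (19)) are not necessarily CPTP anymore. The difference $\hat{\mathcal{N}}_i - \mathcal{N}_i$ between the estimates $\hat{\mathcal{N}}_i$ and the actual channels $\mathcal{N}_i$ limits the performance of EM, independent of how the inverses are achieved, and it only affects the difference between $(\mathcal{U}^{\dagger}\mathcal{N}^{-1})_{1\cdots n}$ and $(\mathcal{U}^{\dagger}\hat{\mathcal{N}}^{-1})_{1\cdots n}$.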

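[Figure 5 diagram: maps between $\rho_{\mathrm{in}}$, $\rho^{\mathrm{ideal}}_{\mathrm{out}}$, $\rho^{\exp}_{\mathrm{out}}$ and $\rho_{\mathrm{EM}}$, with the values $\mathrm{Tr}(\rho^{\mathrm{ideal}}_{\mathrm{out}}A)$, $\mathrm{Tr}(\rho^{\exp}_{\mathrm{out}}A)$ and $\mathrm{Tr}(\rho_{\mathrm{EM}}A)$. Arrows: $\mathcal{U}_{n\cdots 1}$ (unitary, CPTP); $(\mathcal{U}^{\dagger}\mathcal{N}^{-1})_{1\cdots n}$ (HPTP); $(\mathcal{U}^{\dagger}\hat{\mathcal{N}}^{-1})_{1\cdots n}$ (HPTP). Brackets: $(\mathcal{U}^{\dagger}\mathcal{N}^{-1})_{1\cdots n} - \mathcal{U}^{\dagger}_{1\cdots n}$ and the difference between $(\mathcal{U}^{\dagger}\mathcal{N}^{-1})_{1\cdots n}$ and $(\mathcal{U}^{\dagger}\hat{\mathcal{N}}^{-1})_{1\cdots n}$.]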

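FIG. 5: Schematic diagram of the maps. The blue arrows indicate the channel $\mathcal{E}^{-1}_{\text{EM-ideal}} := \mathcal{U}_{n\cdots 1}\circ(\mathcal{U}^{\dagger}\mathcal{N}^{-1})_{1\cdots n}$ for ideal error mitigation, and the red arrows indicate the channel $\mathcal{E}^{-1}_{\mathrm{EM}} = \mathcal{U}_{n\cdots 1}\circ(\mathcal{U}^{\dagger}\hat{\mathcal{N}}^{-1})_{1\cdots n}$ for actual error mitigation. The error between the actual noise channels $\mathcal{N}_i$ and the estimates $\hat{\mathcal{N}}_i$ causes the difference between $(\mathcal{U}^{\dagger}\mathcal{N}^{-1})_{1\cdots n}$ and $(\mathcal{U}^{\dagger}\hat{\mathcal{N}}^{-1})_{1\cdots n}$, which leads to a deviation in the mitigated result.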

From the perspective of output states, the goal of EM is to bring the output states closer to the ideal. In terms of state fidelity, this is to ensure that

$$ F(\rho_{\mathrm{EM}}, \rho^{\mathrm{ideal}}_{\mathrm{out}}) > F(\rho^{\exp}_{\mathrm{out}}, \rho^{\mathrm{ideal}}_{\mathrm{out}}), \tag{21} $$

where $F(\rho_1, \rho_2) := \mathrm{tr}\sqrt{\sqrt{\rho_1}\,\rho_2\sqrt{\rho_1}}$ is the fidelity between $\rho_1$ and $\rho_2$.

If the actual noise channels $\{\mathcal{N}_i\}$ are invertible and the noise characterization is perfect ($\hat{\mathcal{N}}_i = \mathcal{N}_i$), the errors can theoretically be perfectly mitigated, and Eq. (21) is naturally satisfied. Realistically, $\hat{\mathcal{N}}_i \neq \mathcal{N}_i$, which opens a gap between the ideal output state $\rho^{\mathrm{ideal}}_{\mathrm{out}}$ and the error-mitigated state $\rho_{\mathrm{EM}}$. We would like to know how much the imperfections in characterizing $\mathcal{N}$ worsen the fidelity.
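For a single qubit, the fidelity defined in Eq. (21) has a closed form that avoids matrix square roots: $F(\rho_1,\rho_2)^2=\mathrm{Tr}(\rho_1\rho_2)+2\sqrt{\det\rho_1\det\rho_2}$. The sketch below uses it to test the improvement criterion of Eq. (21). It assumes qubit states given by Bloch vectors; the example states and helper names are made up for illustration.

```python
import math


def fidelity(r, s):
    """F(rho1, rho2) = Tr sqrt(sqrt(rho1) rho2 sqrt(rho1)) for qubit Bloch vectors r, s.

    Uses F**2 = Tr(rho1 rho2) + 2 sqrt(det rho1 det rho2), valid for physical 2x2
    density matrices (|r|, |s| <= 1); Tr(rho1 rho2) = (1 + r.s)/2, det = (1 - |r|^2)/4.
    """
    dot = sum(a * b for a, b in zip(r, s))
    det1 = (1 - sum(a * a for a in r)) / 4
    det2 = (1 - sum(b * b for b in s)) / 4
    # max(..., 0.0) only guards against tiny negative round-off for pure states.
    return math.sqrt((1 + dot) / 2 + 2 * math.sqrt(max(det1 * det2, 0.0)))


def em_improves(r_em, r_exp, r_ideal):
    """Criterion (21): F(rho_EM, rho_ideal) > F(rho_exp, rho_ideal)."""
    return fidelity(r_em, r_ideal) > fidelity(r_exp, r_ideal)


r_ideal = (0.0, 0.0, 1.0)    # |0><0|
r_exp = (0.0, 0.0, 0.8)      # noisy output, Bloch vector shrunk
r_em = (0.05, 0.0, 0.97)     # imperfectly mitigated output
print(round(fidelity(r_ideal, r_ideal), 12))  # 1.0
print(em_improves(r_em, r_exp, r_ideal))      # True
```

The closed form only applies to physical states, which is one concrete way to see why fidelity stops being meaningful once a non-CP inverse produces an output with Bloch vector longer than 1.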

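Let $\Delta\mathcal{N}_i := \hat{\mathcal{N}}_i - \mathcal{N}_i$ and $\Delta\mathcal{N}_i^{-1} := \hat{\mathcal{N}}_i^{-1} - \mathcal{N}_i^{-1}$. Note that $\Delta\mathcal{N}_i\,\mathcal{N}_i^{-1} + \hat{\mathcal{N}}_i\,\Delta\mathcal{N}_i^{-1} = 0$; therefore, $\Delta\mathcal{N}_i$ and $\Delta\mathcal{N}_i^{-1}$ are related to each other. We mainly use $\Delta\mathcal{N}_i^{-1}$ in the later discussion.

Fig. 5 shows that the errors $\{\Delta\mathcal{N}_i^{-1}\}$ only affect $(\mathcal{U}^{\dagger}\mathcal{N}^{-1})_{1\cdots n}$ and $(\mathcal{U}^{\dagger}\hat{\mathcal{N}}^{-1})_{1\cdots n}$ in the EM inverse channels $\mathcal{E}^{-1}_{\text{EM-ideal}}$ and $\mathcal{E}^{-1}_{\mathrm{EM}}$, respectively. The difference between $\rho_{\mathrm{EM}}$ and $\rho^{\mathrm{ideal}}_{\mathrm{out}}$ is

$$ \begin{aligned} \rho_{\mathrm{EM}} - \rho^{\mathrm{ideal}}_{\mathrm{out}} &= \mathcal{E}^{-1}_{\mathrm{EM}}(\rho^{\exp}_{\mathrm{out}}) - \mathcal{U}_{n\cdots 1}(\rho_{\mathrm{in}}) \\ &= \mathcal{U}_{n\cdots 1}\circ\bigl[(\mathcal{U}^{\dagger}\hat{\mathcal{N}}^{-1})_{1\cdots n} - (\mathcal{U}^{\dagger}\mathcal{N}^{-1})_{1\cdots n}\bigr](\rho^{\exp}_{\mathrm{out}}). \end{aligned} \tag{22} $$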
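With $\Delta\mathcal{N}_i = \hat{\mathcal{N}}_i - \mathcal{N}_i$ and $\Delta\mathcal{N}_i^{-1} = \hat{\mathcal{N}}_i^{-1} - \mathcal{N}_i^{-1}$, the identity $\Delta\mathcal{N}_i\,\mathcal{N}_i^{-1} + \hat{\mathcal{N}}_i\,\Delta\mathcal{N}_i^{-1} = 0$ holds exactly, because the two terms reduce to $\hat{\mathcal{N}}_i\mathcal{N}_i^{-1} - I$ and $I - \hat{\mathcal{N}}_i\mathcal{N}_i^{-1}$. If $\hat{\mathcal{N}}_i$ in the second term is replaced by $\mathcal{N}_i$, the identity holds only to first order, with remainder $-\Delta\mathcal{N}_i\,\Delta\mathcal{N}_i^{-1}$. The sketch below is a minimal exact check. It uses single-qubit amplitude-damping PTMs for both the actual channel ($\gamma=7/16$) and a hypothetical estimate ($\hat\gamma=11/36$), chosen so that $\sqrt{1-\gamma}$ stays rational.

```python
from fractions import Fraction as F


def matmul(a, b):
    return [[sum(a[r][t] * b[t][c] for t in range(4)) for c in range(4)] for r in range(4)]


def add(a, b, sign=1):
    """Entry-wise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def amp_damp(gamma, s):
    """PTM (basis I, X, Y, Z) of amplitude damping; s = sqrt(1 - gamma) passed in exactly."""
    assert s * s == 1 - gamma
    return [[1, 0, 0, 0], [0, s, 0, 0], [0, 0, s, 0], [gamma, 0, 0, 1 - gamma]]


def amp_damp_inverse(gamma, s):
    """Closed-form inverse of the PTM above (gamma < 1); it is HPTP but not CP."""
    return [[1, 0, 0, 0], [0, 1 / s, 0, 0], [0, 0, 1 / s, 0],
            [-gamma / (1 - gamma), 0, 0, 1 / (1 - gamma)]]


N, N_inv = amp_damp(F(7, 16), F(3, 4)), amp_damp_inverse(F(7, 16), F(3, 4))            # actual
N_hat, N_hat_inv = amp_damp(F(11, 36), F(5, 6)), amp_damp_inverse(F(11, 36), F(5, 6))  # estimate

dN = add(N_hat, N, -1)              # Delta N_i
dN_inv = add(N_hat_inv, N_inv, -1)  # Delta N_i^{-1}
zero = [[0] * 4 for _ in range(4)]

exact = add(matmul(dN, N_inv), matmul(N_hat, dN_inv))
print(exact == zero)  # True

# With N_i instead of the estimate in the second term, a second-order remainder survives.
residual = add(matmul(dN, N_inv), matmul(N, dN_inv))
print(residual == [[-x for x in row] for row in matmul(dN, dN_inv)])  # True
```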

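In the middle bracket of Eq. (22), the errors $\{\Delta\mathcal{N}_i^{-1}\}$ are scrambled by the layers of unitaries $\mathcal{U}_i^{\dagger}$. Let

$$ \Delta\mathcal{N} := (\mathcal{U}^{\dagger}\mathcal{N}^{-1})_{1\cdots n} - (\mathcal{U}^{\dagger}\hat{\mathcal{N}}^{-1})_{1\cdots n}, \tag{23} $$

so that Eq. (22) becomes $\rho^{\mathrm{ideal}}_{\mathrm{out}} - \rho_{\mathrm{EM}} = \mathcal{U}_{n\cdots 1}\circ\Delta\mathcal{N}(\rho^{\exp}_{\mathrm{out}})$. Take the first-order estimate $\Delta\mathcal{N}^{(1)}$ of $\Delta\mathcal{N}$, where each term in $\Delta\mathcal{N}^{(1)}$ contains only one of the $\Delta\mathcal{N}_i^{-1}$ (see Eq. (C2) in Appendix C for the explicit expression). The first-order error between the states is $\Delta\rho_{\mathrm{EM}} := \mathcal{U}_{n\cdots 1}\circ\Delta\mathcal{N}^{(1)}(\rho^{\exp}_{\mathrm{out}})$. We then define $F(\rho_{\mathrm{EM}}, \rho_{\mathrm{EM}} + \Delta\rho_{\mathrm{EM}})$ to be the first-order estimate $F^{(1)}(\rho_{\mathrm{EM}}, \rho^{\mathrm{ideal}}_{\mathrm{out}})$ of the state fidelity $F(\rho_{\mathrm{EM}}, \rho^{\mathrm{ideal}}_{\mathrm{out}})$. The following proposition gives a bound on this quantity.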

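Proposition 2. The first-order estimate of the fidelity between $\rho_{\mathrm{EM}}$ and $\rho^{\mathrm{ideal}}_{\mathrm{out}}$ satisfies

$$ \Bigl(1 - \tfrac{1}{2}\sqrt{d}\,C_{\exp}\,\bigl\|v(\Delta\mathcal{N}^{(1)})\bigr\|\Bigr)^{2} \le F^{(1)}(\rho_{\mathrm{EM}}, \rho^{\mathrm{ideal}}_{\mathrm{out}}) \le 1 - \tfrac{1}{4}\Bigl(l_{\mathcal{U}}\cdot\bigl\|v(\Delta\mathcal{N}^{(1)})\,v(\rho^{\exp}_{\mathrm{out}})\bigr\|\Bigr)^{2}, \tag{24} $$

where $C_{\exp} := \|v(\mathcal{U}_{n\cdots 1})\|\cdot\|v(\rho^{\exp}_{\mathrm{out}})\|$ is an experiment-related constant, and $l_{\mathcal{U}} := \inf_{\|x\|=1}\|v(\mathcal{U}_{n\cdots 1})x\|$ is the lower Lipschitz constant of the ideal operations $\mathcal{U}_{n\cdots 1}$. The norm $\|\cdot\|$ is the 2-norm for vectors and the induced matrix norm for matrices.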

We can see that $F^{(1)}(\rho_{\mathrm{EM}}, \rho^{\mathrm{ideal}}_{\mathrm{out}})$ is bounded in terms of $\Delta\mathcal{N}^{(1)}$ and other experimental constants. Therefore, by bounding the errors $\{\Delta\mathcal{N}_i^{-1}\}$ in channel estimation, one can constrain the fidelity using Eq. (24). In fact, this result can be understood easily from the left side of Fig. 5. Details can be found in Appendix C.

If the task realized by the given circuit only concerns the expectation values of a set of observables $\{A_i\}$, then the goal of QEM can be simplified to recovering the ideal expectation values $\mathrm{Tr}(\rho^{\mathrm{ideal}}_{\mathrm{out}}A_i)$.

[Figure 6 diagram: the values $\mathrm{Tr}(\rho A)$ for $\rho^{\mathrm{ideal}}_{\mathrm{out}}$, $\rho^{\exp}_{\mathrm{out}}$ and $\rho_{\mathrm{EM}}$, with two intervals labeled $\Delta a$.]

FIG. 6: Schematic diagram for mitigating errors in the expectation value $\mathrm{Tr}(\rho A)$. The goal of error mitigation is to have $\mathrm{Tr}(\rho_{\mathrm{EM}}A)$ in the green zone for the observable $A$ of interest.

As shown in Fig. 6, one would wish the error-mitigated result to lie within the green area. Since we cannot perfectly characterize the noise models $\mathcal{N}_i$, we would like to know a condition that guarantees that $\mathrm{Tr}(\rho_{\mathrm{EM}}A)$ lands in the green zone. We show in Appendix D that the following is a sufficient condition for this goal.

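Proposition 3. If the following condition, Eq. (25), is satisfied, EM is guaranteed to improve the expectation value of any observable $A$ for any circuit $\mathcal{U}_{n\cdots 1}$:

$$ \|v(\Delta\mathcal{N})\| \le l_{\text{ideal-exp}}, \tag{25} $$

where $l_{\text{ideal-exp}} := \inf_{\|x\|=1}\bigl\|v\bigl((\mathcal{U}^{\dagger}\mathcal{N}^{-1})_{1\cdots n} - \mathcal{U}^{\dagger}_{1\cdots n}\bigr)x\bigr\|$ is the lower Lipschitz constant of $v\bigl((\mathcal{U}^{\dagger}\mathcal{N}^{-1})_{1\cdots n} - \mathcal{U}^{\dagger}_{1\cdots n}\bigr)$.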

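In the above result, the channels $(\mathcal{U}^{\dagger}\mathcal{N}^{-1})_{1\cdots n}$ and $\mathcal{U}^{\dagger}_{1\cdots n}$ map $\rho^{\exp}_{\mathrm{out}}$ and $\rho^{\mathrm{ideal}}_{\mathrm{out}}$ back to $\rho_{\mathrm{in}}$, respectively. Condition (25), in general, asks that $\Delta\mathcal{N}$ (Eq. (23)) be smaller than $(\mathcal{U}^{\dagger}\mathcal{N}^{-1})_{1\cdots n} - \mathcal{U}^{\dagger}_{1\cdots n}$, which is straightforward to see from the brackets in Fig. 5. Since this proposition holds for any observable and any circuit, it also applies to the quantum state fidelity.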

Note that Eq. (25) is a stringent requirement. If $v\bigl((\mathcal{U}^{\dagger}\mathcal{N}^{-1})_{1\cdots n} - \mathcal{U}^{\dagger}_{1\cdots n}\bigr)$ has a nontrivial null space, then $l_{\text{ideal-exp}} = 0$, and Eq. (25) forces the noise channel estimates to be perfect, i.e. $\hat{\mathcal{N}}_i = \mathcal{N}_i$ for all $i \in \{1, \cdots, n\}$. We did not introduce extra assumptions on the circuits and noise while deriving this sufficient condition; knowing more about the circuit and noise can loosen the requirement.

Normally, certain noise models are assumed while identifying device noise. These assumptions reduce the number of parameters and the resources needed for characterization. However, the assumed model cannot in general be made arbitrarily close to the actual noise $\mathcal{N}$ in the system, which opens a gap between the ideal outcomes and the error-mitigated outcomes of the given circuit. In particular, if the error model is overly simplified, it can degrade EM performance.

We consider a simple example of a depth-1, single-qubit quantum channel, where the actual noise $\mathcal{N}$ is a Pauli channel. Suppose one believes that the noise in the system is mainly depolarizing, and tries to use the depolarizing channel $\mathcal{D}$ to approximate the actual noise. After fitting the parameters in $\mathcal{D}$, the (quasi-)inverse $\mathcal{D}^{+}$ of the estimated $\mathcal{D}$ is used to recover information (i.e. $\mathcal{D}^{+}\circ\mathcal{N}(\rho)$).

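The Kraus representations of $\mathcal{N}$ and $\mathcal{D}$ are $\mathcal{N}: \{\sqrt{p_1}\,I, \sqrt{p_2}\,X, \sqrt{p_3}\,Y, \sqrt{1-p_1-p_2-p_3}\,Z\}$ and $\mathcal{D}: \{\sqrt{1-\tfrac{3\lambda}{4}}\,I, \sqrt{\tfrac{\lambda}{4}}\,X, \sqrt{\tfrac{\lambda}{4}}\,Y, \sqrt{\tfrac{\lambda}{4}}\,Z\}$. For a given set $\{p_1, p_2, p_3\}$, the optimal $\lambda$ minimizing $\|\mathcal{N} - \mathcal{D}\|_\star$ varies with the chosen representation and the choice of norm $\|\cdot\|_\star$. The symmetry of the parameters in $\mathcal{D}$ makes it impossible to perfectly capture noise $\mathcal{N}$ whose $p_i$'s lack such a symmetry.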

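Note that the two vectors $\vec{n} := \bigl(\sqrt{p_1}, \sqrt{p_2}, \sqrt{p_3}, \sqrt{1-p_1-p_2-p_3}\bigr)$ and $\vec{d} := \bigl(\sqrt{1-\tfrac{3\lambda}{4}}, \sqrt{\tfrac{\lambda}{4}}, \sqrt{\tfrac{\lambda}{4}}, \sqrt{\tfrac{\lambda}{4}}\bigr)$ are also representations of $\mathcal{N}$ and $\mathcal{D}$, respectively. Since $\vec{n}$ and $\vec{d}$ are normalized, minimizing the distance between $\mathcal{N}$ and $\mathcal{D}$ in this representation is equivalent to maximizing $\vec{n}\cdot\vec{d}$, i.e.

$$ \max_{\lambda\in[0,1]}\Bigl\{\sqrt{p_1\Bigl(1-\frac{3\lambda}{4}\Bigr)} + \Bigl[\sqrt{p_2} + \sqrt{p_3} + \sqrt{1-p_1-p_2-p_3}\Bigr]\sqrt{\frac{\lambda}{4}}\Bigr\}. $$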

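When $p_1 = \tfrac{1}{2}$ and $p_2 = p_3 = 0$, the channel applies a phase flip error with probability $\tfrac{1}{2}$ and leaves the state unchanged with probability $\tfrac{1}{2}$; the corresponding optimal value is $\lambda_{\max} = \tfrac{1}{3}$. This $\lambda_{\max}$ gives the smallest distance between $\mathcal{N}$ and $\mathcal{D}$ achievable in this metric. Assume one fits the parameter $\lambda$ from experiments and obtains the estimate $\lambda = \tfrac{1}{3}$; then the channel $\{\sqrt{\tfrac{3}{4}}\,I, \sqrt{\tfrac{1}{12}}\,X, \sqrt{\tfrac{1}{12}}\,Y, \sqrt{\tfrac{1}{12}}\,Z\}$ will be believed to be $\mathcal{N}$.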
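The objective in the maximization above is concave in $\lambda$ (a non-negative combination of square roots of affine functions of $\lambda$), so a simple ternary search finds its maximizer. A minimal sketch in plain Python with hypothetical helper names; it recovers $\lambda_{\max} = 1/3$ for $p_1 = 1/2$, $p_2 = p_3 = 0$.

```python
import math


def overlap(lam, p1, p2, p3):
    """n . d for the Kraus-coefficient vectors of N (Pauli) and D (depolarizing)."""
    p4 = 1 - p1 - p2 - p3
    return (math.sqrt(p1) * math.sqrt(1 - 3 * lam / 4)
            + (math.sqrt(p2) + math.sqrt(p3) + math.sqrt(p4)) * math.sqrt(lam / 4))


def best_lambda(p1, p2, p3, tol=1e-12):
    """Ternary search on [0, 1]; valid because the overlap is concave in lambda."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if overlap(m1, p1, p2, p3) < overlap(m2, p1, p2, p3):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


lam = best_lambda(0.5, 0.0, 0.0)  # phase flip with probability 1/2
print(round(lam, 6))              # 0.333333
```

As a sanity check, if the $p_i$ themselves come from a depolarizing channel with parameter $\lambda_0$, then $\vec{n}=\vec{d}$ at $\lambda=\lambda_0$ and the search returns $\lambda_0$.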

Then $\hat{\mathcal{N}}^{-1} = \mathcal{D}^{-1}$ will be used to perform error mitigation. In Fig. 7, we can see that the actual channel $\mathcal{N}$ in fact preserves the expectation value of $Z$. Because of the incorrect assumption on the noise model, the mitigated results are actually worse (see the blue triangles in Fig. 7). Also note that, since $\mathcal{D}^{-1}$ is not CP, the outputs $\mathcal{D}^{-1}\circ\mathcal{N}(\rho)$ are no longer valid quantum states. In this case, the fidelity function is not bounded above by 1, and is thus no longer a valid metric. We give further details in Appendix E.
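For Pauli channels the PTM is diagonal, which makes this mismatch easy to quantify. The sketch below uses exact arithmetic and hypothetical helper names. With the fitted $\lambda = 1/3$, $\mathcal{D}$ has PTM $\mathrm{diag}(1, \tfrac{2}{3}, \tfrac{2}{3}, \tfrac{2}{3})$, so $\mathcal{D}^{-1}\circ\mathcal{N}$ multiplies $\langle Z\rangle$ by $3/2$ even though $\mathcal{N}$ leaves it unchanged. Converting the PTM of $\mathcal{D}^{-1}$ back to Pauli weights gives negative entries; since the Choi eigenvalues of a Pauli-diagonal map are proportional to these weights, this certifies that $\mathcal{D}^{-1}$ is not CP.

```python
from fractions import Fraction as F


def pauli_ptm(p_i, p_x, p_y, p_z):
    """Diagonal PTM (I, X, Y, Z) of rho -> p_i rho + p_x XrhoX + p_y YrhoY + p_z ZrhoZ."""
    return [F(1), p_i + p_x - p_y - p_z, p_i - p_x + p_y - p_z, p_i - p_x - p_y + p_z]


def pauli_weights(diag):
    """Invert pauli_ptm: recover (p_i, p_x, p_y, p_z) from a diagonal PTM."""
    _, a, b, c = diag
    return [(1 + a + b + c) / 4, (1 + a - b - c) / 4, (1 - a + b - c) / 4, (1 - a - b + c) / 4]


lam = F(1, 3)                                              # fitted depolarizing parameter
N = pauli_ptm(F(1, 2), 0, 0, F(1, 2))                      # actual: phase flip, prob. 1/2
D = pauli_ptm(1 - 3 * lam / 4, lam / 4, lam / 4, lam / 4)  # assumed depolarizing model
D_inv = [1 / x for x in D]                                 # PTM of D^{-1}
mitigated = [x * y for x, y in zip(D_inv, N)]              # PTM of D^{-1} o N

print(" ".join(str(x) for x in N))                     # 1 0 0 1
print(" ".join(str(x) for x in mitigated))             # 1 0 0 3/2
print(" ".join(str(x) for x in pauli_weights(D_inv)))  # 11/8 -1/8 -1/8 -1/8
```

Any input with $|\langle Z\rangle| > 2/3$ is therefore mapped outside the Bloch ball, consistent with the values beyond $\pm 1$ in Fig. 7.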

While the above is a rather extreme example of channel mismatch, its message is a warning: the gap between our knowledge of the noise and the actual noise in devices should also be considered when mitigating errors. Although we can lower bound the fidelity between the error-mitigated state $\rho_{\mathrm{EM}}$ and $\rho^{\mathrm{ideal}}_{\mathrm{out}}$ using Proposition 2, improving results through error mitigation still implies a competition between the experimental accuracy and the accuracy of the noise characterization (Proposition 3 and Fig. 5). To improve experimental readout with EM protocols, increasing experimental accuracy demands better knowledge of the device noise, which translates into expensive procedures and sampling costs for noise characterization.

[Figure 7 plot: $\mathrm{Tr}(\rho Z)$ (axis from $-1.5$ to $1.5$) versus state label (0 to 50), comparing mitigated and ideal values.]

FIG. 7: Expectation value of $Z$ for 50 randomly generated states. The x-axis is a dummy label for these tested states.

IV. CONCLUSION

In this paper, we discussed the different scenarios for using error mitigation and error correction. While error mitigation has demonstrated its use for most near-term quantum algorithms, it may destroy quantum resources like entanglement in other scenarios, such as distributed quantum computation. It is thus an interesting open question to further study and classify the use cases of QEM protocols. We also showed that the nature of the noise processes limits the optimal performance of QEM, and analyzed three distinct cases where the noise is invertible and CPTP, invertible but not CPTP, and non-invertible. In the first case, both classical and quantum information are preserved; in the second, classical information can be perfectly restored but part of the quantum information is lost; and in the third, both classical and quantum information are at least partly lost.

We next focused on the case of non-invertible noise. For non-invertible noise channels, at least part of the information carried by the quantum states $\rho$ is inevitably erased, and the generalized inverse is not unique. In this case, we constructed an inverse-like channel, called the quasi-inverse, to restore the information. We proved that the quasi-inverse is always trace preserving, while the Moore-Penrose inverse may not be. This has important implications for computing upper bounds on the performance of QEM protocols. Many previous works concern the CPTP inverses of quantum channels, mainly because only CPTP maps can be physically implemented. However, any HPTP map can be written as a linear combination of CPTP maps, which opens the possibility of physically implementing the CPTP components separately and then post-processing the results from each branch together. Moreover, a widely considered recovery map, the Petz recovery channel, is specific to a particular state. An HPTP channel that optimally recovers all input states of a non-invertible channel has rarely been considered in the previous literature. Our results in Section II D provide a new point of view on superoperators and invite people to tackle the above problem; more work can be done in this direction.

When the noise channels are invertible, the improvement from EM protocols is constrained by our knowledge of the noise in the device of interest. The gap between $\{\hat{\mathcal{N}}_i\}$ and $\{\mathcal{N}_i\}$ bounds the fidelity between the ideal state and the error-mitigated state. We derived a sufficient condition that theoretically guarantees an improvement. By further expanding the norm of the first-order error $\Delta\mathcal{N}^{(1)}$ (Eq. (C6)), and then considering the sampling cost of reaching such an accuracy, one can compute the minimal resources required to learn the noise before applying EM processes. Generally speaking, to guarantee an improvement after EM procedures, the noise characterization needs to be more accurate than the experiment. That is, the better the experiment, the more expensive the noise characterization needs to be. The EM inverse channel $\mathcal{E}^{-1}_{\mathrm{EM}}$ involves the ideal circuit $\mathcal{U}_{n\cdots 1}$. Directly computing or decomposing such a channel is more expensive than classically simulating the ideal result, which limits the power of numerically implementing such an inverse. We also need to be cautious when implementing a non-CP inverse. If the device noise is perfectly captured, the non-CP inverse does not cause trouble. However, the gap between $\{\hat{\mathcal{N}}_i\}$ and $\{\mathcal{N}_i\}$ may lead to non-physical numerical outcomes, in which case commonly used metrics (such as fidelity) fail.

ACKNOWLEDGMENTS

N.C. thanks Maxwell Fitzsimmons for helpful discussions. This research was undertaken thanks in part to funding from the Government of Canada through the Natural Sciences and Engineering Research Council of Canada (NSERC).

[1] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C. Bardin, Rami Barends, Rupak Biswas, Sergio Boixo, Fernando G. S. L. Brandao, David A. Buell, Brian Burkett, Yu Chen, Zijun Chen, Ben Chiaro, Roberto Collins, William Courtney, Andrew Dunsworth, Edward Farhi, Brooks Foxen, Austin Fowler, Craig Gidney, Marissa Giustina, Rob Graff, Keith Guerin, Steve Habegger, Matthew P. Harrigan, Michael J. Hartmann, Alan Ho, Markus Hoffmann, Trent Huang, Travis S. Humble, Sergei V. Isakov, Evan Jeffrey, Zhang Jiang, Dvir Kafri, Kostyantyn Kechedzhi, Julian Kelly, Paul V. Klimov, Sergey Knysh, Alexander Korotkov, Fedor Kostritsa, David Landhuis, Mike Lindmark, Erik Lucero, Dmitry Lyakh, Salvatore Mandra, Jarrod R. McClean, Matthew McEwen, Anthony Megrant, Xiao Mi, Kristel Michielsen, Masoud Mohseni, Josh Mutus, Ofer Naaman, Matthew Neeley, Charles Neill, Murphy Yuezhen Niu, Eric Ostby, Andre Petukhov, John C. Platt, Chris Quintana, Eleanor G. Rieffel, Pedram Roushan, Nicholas C. Rubin, Daniel Sank, Kevin J. Satzinger, Vadim Smelyanskiy, Kevin J. Sung, Matthew D. Trevithick, Amit Vainsencher, Benjamin Villalonga, Theodore White, Z. Jamie Yao, Ping Yeh, Adam Zalcman, Hartmut Neven, and John M. Martinis, “Quantum supremacy using a programmable superconducting processor,” Nature 574, 505–510 (2019).

[2] Han-Sen Zhong, Hui Wang, Yu-Hao Deng, Ming-Cheng Chen, Li-Chao Peng, Yi-Han Luo, Jian Qin, Dian Wu, Xing Ding, Yi Hu, Peng Hu, Xiao-Yan Yang, Wei-Jun Zhang, Hao Li, Yuxuan Li, Xiao Jiang, Lin Gan, Guangwen Yang, Lixing You, Zhen Wang, Li Li, Nai-Le Liu, Chao-Yang Lu, and Jian-Wei Pan, “Quantum computational advantage using photons,” Science 370, 1460–1463 (2020).

[3] Kristan Temme, Sergey Bravyi, and Jay M Gambetta, “Error Mitigation for Short-Depth Quantum Circuits,” Physical Review Letters 119, 180509 (2017).

[4] Suguru Endo, Simon C Benjamin, and Ying Li, “Practical Quantum Error Mitigation for Near-Future Applications,” Physical Review X 8, 031027 (2018).

[5] Sam McArdle, Xiao Yuan, and Simon Benjamin, “Error-Mitigated Digital Quantum Simulation,” Physical Review Letters 122, 180501 (2019).

[6] Filip B. Maciejewski, Zoltan Zimboras, and Michał Oszmaniec, “Mitigation of readout noise in near-term quantum devices by classical post-processing based on detector tomography,” Quantum 4, 257 (2020).

[7] Balint Koczor, “Exponential Error Suppression for Near-Term Quantum Devices,” Physical Review X 11, 031057 (2021).

[8] Laszlo Gyongyosi, Sandor Imre, and Hung Viet Nguyen, “A Survey on Quantum Channel Capacities,” IEEE Communications Surveys & Tutorials 20, 1149–1205 (2018).

[9] Peter W Shor, “Scheme for reducing decoherence in quantum computer memory,” Physical Review A 52, R2493–R2496 (1995).

[10] Charles H Bennett, David P. DiVincenzo, John A Smolin, and William K Wootters, “Mixed-state entanglement and quantum error correction,” Physical Review A 54, 3824–3851 (1996).

[11] Jiaqing Jiang, Kun Wang, and Xin Wang, “Physical implementability of quantum maps and its application in error mitigation,” (2020).

[12] John Preskill, “Lecture notes for physics 229: Quantum information and computation,” California Institute of Technology 16 (1998).

[13] Ashwin Nayak and Pranab Sen, “Invertible quantum operations and perfect encryption of quantum states,” arXiv preprint quant-ph/0605041 (2006).

[14] Michal Horodecki, “Entanglement measures,” Quantum Information and Computation 1, 3–26 (2001).

[15] John Watrous, The theory of quantum information (Cambridge University Press, 2018).

[16] Eliakim H Moore, “On the reciprocal of the general algebraic matrix,” Bull. Am. Math. Soc. 26, 394–395 (1920).


[17] Roger Penrose, “A generalized inverse for matrices,” in Mathematical Proceedings of the Cambridge Philosophical Society, Vol. 51 (Cambridge University Press, 1955) pp. 406–413.

[18] Michael M Wolf and David Perez-Garcia, “The inverse eigenvalue problem for quantum channels,” arXiv preprint arXiv:1005.4545 (2010).

[19] Armands Strikis, Dayue Qin, Yanzhu Chen, Simon C. Benjamin, and Ying Li, “Learning-based quantum error mitigation,” (2020).

[20] Joseph Vovrosh, Kiran E Khosla, Sean Greenaway, Christopher Self, Myungshik Kim, and Johannes Knolle, “Efficient mitigation of depolarizing errors in quantum simulations,” arXiv preprint arXiv:2101.01690 (2021).

[21] Hyukjoon Kwon, Rick Mukherjee, and MS Kim, “Reversing open quantum dynamics via continuous Petz recovery map,” arXiv preprint arXiv:2104.03360 (2021).

[22] Erik Nielsen, John King Gamble, Kenneth Rudinger, Travis Scholten, Kevin Young, and Robin Blume-Kohout, “Gate set tomography,” Quantum 5, 557 (2021).

[23] Joseph Vovrosh, Kiran E. Khosla, Sean Greenaway, Christopher Self, M. S. Kim, and Johannes Knolle, “Simple mitigation of global depolarizing errors in quantum simulations,” Physical Review E 104 (2021), 10.1103/physreve.104.035309.

Appendix A: The quasi-inverse of a TP map is also TP

Here we provide the proof of Theorem 2.

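Proof. Let $J_\lambda$ be the Jordan block of eigenvalue $\lambda$ in $J$, where $J$ is defined in Eq. (10). The inverse of a $k\times k$ Jordan block $J_\lambda$ ($\lambda \neq 0$) of $v(\mathcal{N})$ is

$$J_\lambda^{-1} = (\lambda I + N)^{-1} = \lambda^{-1}\left(I - \lambda^{-1}N + \cdots + (-\lambda^{-1})^{k-1}N^{k-1}\right) = \lambda^{-1}\left(\sum_{i=0}^{k-1}\left(-\lambda^{-1}N\right)^{i}\right) =: J'_\lambda,$$

where $N$ is the $k\times k$ nilpotent matrix (ones on the superdiagonal, so that $N^k = 0$) and $\lambda$ is the eigenvalue. By construction (Eq. (11)), we have

$$v(\mathcal{N}^+)\,Q = Q\,J', \qquad \text{(A1)}$$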

where $Q$ contains the eigenvectors and generalized eigenvectors of $v(\mathcal{N})$. For the particular block under consideration, the corresponding eigenvector and generalized eigenvectors are

$$Q = \begin{pmatrix} \cdots & v_{g_0} & v_{g_1} & \cdots & v_{g_{k-1}} & \cdots \end{pmatrix}.$$

Let $e_i$, $i \in \{0, \ldots, k-1\}$, be the standard basis vectors for this block. Applying both sides of Eq. (A1) to $e_j$, the left-hand side is

$$v(\mathcal{N}^+)\,Q e_j = v(\mathcal{N}^+)\,v_{g_j},$$

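and the right-hand side is

$$Q J' e_j = Q\,\lambda^{-1}\left(\sum_{i=0}^{k-1}(-\lambda)^{-i}N^{i}e_j\right).$$

It is easy to show that $N^{i}e_j = e_{j-i}$ for $j \ge i$, and $N^{i}e_j = 0$ for $j < i$. Therefore

$$Q J' e_j = Q\,\lambda^{-1}\left(\sum_{i=0}^{j}(-\lambda)^{-i}e_{j-i}\right) = \lambda^{-1}\left(\sum_{i=0}^{j}(-\lambda)^{-i}v_{g_{j-i}}\right).$$

Thus

$$v(\mathcal{N}^+)\,v_{g_j} = \lambda^{-1}\left(\sum_{i=0}^{j}(-\lambda)^{-i}v_{g_{j-i}}\right). \qquad \text{(A2)}$$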
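As a quick numerical sanity check of the Jordan-block inverse and of Eq. (A2), the following minimal NumPy sketch builds $\lambda I + N$ for an arbitrary choice $\lambda = 0.7$, $k = 4$, compares the truncated series $J'_\lambda$ with a direct matrix inverse, and checks each column $J'_\lambda e_j$ against the right-hand side of Eq. (A2) written in block coordinates. The helper names (`nilpotent`, `jordan_block_inverse`, `column_formula`) are illustrative and not taken from the paper.

```python
import numpy as np

def nilpotent(k):
    """k x k nilpotent matrix with ones on the superdiagonal, so N e_j = e_{j-1}."""
    return np.eye(k, k=1)

def jordan_block_inverse(lam, k):
    """J'_lambda = lambda^{-1} * sum_{i=0}^{k-1} (-lambda^{-1} N)^i, valid for lambda != 0."""
    N = nilpotent(k)
    term = np.eye(k)
    total = np.zeros((k, k))
    for _ in range(k):          # the series stops at i = k - 1 because N^k = 0
        total += term
        term = term @ (-N / lam)
    return total / lam

def column_formula(lam, k, j):
    """Right-hand side of (A2) in block coordinates: lambda^{-1} sum_{i<=j} (-lambda)^{-i} e_{j-i}."""
    out = np.zeros(k)
    for i in range(j + 1):
        out[j - i] += (-lam) ** (-i) / lam
    return out

lam, k = 0.7, 4
J = lam * np.eye(k) + nilpotent(k)
Jp = jordan_block_inverse(lam, k)
print(np.allclose(Jp, np.linalg.inv(J)))                                       # True
print(all(np.allclose(Jp[:, j], column_formula(lam, k, j)) for j in range(k)))  # True
```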

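For $\lambda \notin \{0, 1\}$, taking the trace of both sides of Eq. (A2) gives

$$\mathrm{sTr}\left[v(\mathcal{N}^+)\,v_{g_j}\right] = \mathrm{sTr}\left[\lambda^{-1}\left(\sum_{i=0}^{j}(-\lambda)^{-i}v_{g_{j-i}}\right)\right].$$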


From Lemma 1, we know that $\mathrm{sTr}[v_{g_j}] = 0$, so the right-hand side is also 0. That is, $\mathrm{sTr}[v(\mathcal{N}^+)\,v_{g_j}] = \mathrm{sTr}[v_{g_j}]$ holds for every $j \in \{0, \ldots, k-1\}$.

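When $\lambda = 1$, Lemma 2 gives the same result for every $j$ except $j = k-1$. We now check the case $j = k-1$ for $\lambda = 1$:

$$v(\mathcal{N}^+)\,v_{g_{k-1}} = \sum_{i=0}^{k-1}(-1)^{i}v_{g_{k-1-i}}.$$

Taking the trace of both sides,

$$\mathrm{sTr}\left[v(\mathcal{N}^+)\,v_{g_{k-1}}\right] = \mathrm{sTr}\left[\sum_{i=0}^{k-1}(-1)^{i}v_{g_{k-1-i}}\right].$$

From Lemma 2, all of the eigenvector and generalized eigenvectors in this block have zero trace except the $(k-1)$th one, which enters the sum with coefficient $(-1)^0 = 1$. Hence

$$\mathrm{sTr}\left[v(\mathcal{N}^+)\,v_{g_{k-1}}\right] = \mathrm{sTr}\left[v_{g_{k-1}}\right].$$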

Finally, when $\lambda = 0$, $J'_\lambda = 0_k$, where $0_k$ is the $k\times k$ zero matrix. Thus $v(\mathcal{N}^+)\,v_{g_j} = Q J' e_j = 0$, and by Lemma 1 the traces of both sides are zero.

We have now shown that the trace of every column (eigenvector or generalized eigenvector) of $Q$ is unchanged under the action of $v(\mathcal{N}^+)$.

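Since $Q$ is invertible, any $v(\rho)$ can be expanded in the columns $v^{\lambda_j}_i$ of $Q$, and we have

$$\mathrm{sTr}\left[v(\mathcal{N}^+)\Big(\sum_{ij}a_{ij}v^{\lambda_j}_i\Big)\right] = \mathrm{sTr}\left[\sum_{ij}a_{ij}\,v(\mathcal{N}^+)\,v^{\lambda_j}_i\right] = \mathrm{sTr}\left[\sum_{ij}a_{ij}\,v^{\lambda_j}_i\right].$$

Hence, $\mathcal{N}^+$ is trace preserving.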
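The contrast between the quasi-inverse and the Moore–Penrose pseudoinverse can be seen in a small example. The sketch below assumes a column-stacking convention for $v(\cdot)$ and uses the single-qubit reset channel $\rho \mapsto \mathrm{Tr}(\rho)\,|0\rangle\langle 0|$ as a hypothetical non-invertible test case (it is not one of the paper's examples). Its eigenvectors are written down by hand; the quasi-inverse is formed as $Q J' Q^{-1}$ with the zero-eigenvalue block replaced by zero, as in the proof above, and trace preservation is tested through $\mathrm{vec}(I)^{T} M = \mathrm{vec}(I)^{T}$.

```python
import numpy as np

d = 2
# Hypothetical test channel: reset to |0>, Kraus K0 = |0><0|, K1 = |0><1|
K = [np.array([[1, 0], [0, 0]]), np.array([[0, 1], [0, 0]])]
vN = sum(np.kron(k.conj(), k) for k in K)     # column-stacking superoperator

# Eigenvectors written by hand: v(N) Q = Q J with J = diag(1, 0, 0, 0)
Q = np.array([[1, 0, 0, 1],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, -1]], dtype=float)
J = np.diag([1.0, 0.0, 0.0, 0.0])
assert np.allclose(vN @ Q, Q @ J)

# Quasi-inverse: invert the non-zero eigenvalue (1 -> 1), replace the zero block by 0
J_prime = np.diag([1.0, 0.0, 0.0, 0.0])
vN_quasi = Q @ J_prime @ np.linalg.inv(Q)
vN_mp = np.linalg.pinv(vN)                    # Moore-Penrose pseudoinverse

vec_I = np.eye(d).flatten(order="F")          # sTr(v) = vec_I @ v

def is_trace_preserving(M):
    """A superoperator M is TP iff vec(I)^T M = vec(I)^T."""
    return np.allclose(vec_I @ M, vec_I)

print(is_trace_preserving(vN_quasi))  # True
print(is_trace_preserving(vN_mp))     # False
```

Because this particular $v(\mathcal{N})$ is idempotent, its quasi-inverse coincides with the channel itself, whereas the pseudoinverse maps every input to a matrix whose trace is only $\rho_{00}$.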

Appendix B: Survived States

[Figure 8 plots: (a) fidelity versus θ (radians); (b) fidelity for each of the 50 randomly generated states.]

FIG. 8: (a) Non-recovered and recovered fidelities for $\rho_\theta$, where $\rho_\theta = \frac{1}{2}\begin{pmatrix} 1 & e^{i\theta} \\ e^{-i\theta} & 1 \end{pmatrix}$; (b) results for 50 randomly generated states: the green horizontal line is the average fidelity for states recovered by the quasi-inverse channel $\mathcal{N}^+$; the blue and red lines are for using the Moore–Penrose inverse $\mathcal{N}^p$ and for applying no recovery, respectively.


For the composition of a non-invertible channel and its quasi-inverse defined in Eq. (11), $v(\mathcal{N}^+)v(\mathcal{N})$, the kernel has dimension at least one. For such a channel, part of the information inevitably leaks out of the system.

Consider the case in which the kernel is one-dimensional, and denote the vector spanning it by $\vec{k}$. A density matrix $\rho$ that is unaffected by the channel $v(\mathcal{N}^+)v(\mathcal{N})$ satisfies

$$[v(\mathcal{N}^+)v(\mathcal{N})]\,v(\rho) = v(\rho),$$

i.e., $v(\rho)\cdot\vec{k} = 0$.

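In Example 2, $\vec{k} = \left[\frac{\sqrt{2}}{2}, 0, 0, -\frac{\sqrt{2}}{2}\right]$. The quantum states that can survive this composed channel perfectly are

$$\rho = \begin{pmatrix} \frac{1}{2} & b + ic \\ b - ic & \frac{1}{2} \end{pmatrix}, \qquad b^2 + c^2 \le \tfrac{1}{4}.$$

These states form the disc $\mathrm{tr}(\rho Z) = 0$ in the Bloch sphere. We can now see the effect of the composed channel in Example 2. Any one-qubit state $\rho$ can be decomposed into two parts, $\rho = \lambda_1\rho_{xy} + \lambda_2\rho_z$; the channel erases $\rho_z$ and leaves $\rho_{xy}$ untouched.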
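The survival condition $v(\rho)\cdot\vec{k} = 0$ is easy to test directly. The sketch below uses the kernel vector $\vec{k}$ quoted above for Example 2 and assumes column-stacking for $v(\cdot)$; for this $\vec{k}$ only the diagonal entries of $\rho$ enter, so row-stacking gives the same answer. The two test states and the tolerance are arbitrary illustrative choices.

```python
import numpy as np

k_vec = np.array([np.sqrt(2) / 2, 0, 0, -np.sqrt(2) / 2])   # kernel vector of Example 2

def vec(rho):
    """Column-stacking vectorization v(rho)."""
    return rho.flatten(order="F")

def survives(rho, tol=1e-12):
    """Condition v(rho) . k = 0 from the text."""
    return abs(vec(rho) @ k_vec) < tol

b, c = 0.3, -0.2
rho_disc = np.array([[0.5, b + 1j * c], [b - 1j * c, 0.5]])   # Tr(rho Z) = 0
rho_up = np.array([[0.9, 0.1], [0.1, 0.1]])                  # Tr(rho Z) = 0.8
print(survives(rho_disc), survives(rho_up))  # True False
```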

Let $F_+(\rho) := F[(\mathcal{N}^+\circ\mathcal{N})(\rho), \rho]$, $F_p(\rho) := F[(\mathcal{N}^p\circ\mathcal{N})(\rho), \rho]$, and $F_{or}(\rho) := F[\mathcal{N}(\rho), \rho]$, where $F(\rho_1, \rho_2) = \mathrm{tr}\left(\sqrt{\sqrt{\rho_1}\,\rho_2\sqrt{\rho_1}}\right)$ is the fidelity between $\rho_1$ and $\rho_2$. In Fig. 8a, we can see that the Moore–Penrose inverse channel $\mathcal{N}^p$ does not always preserve $\rho_\theta$, while the quasi-inverse $\mathcal{N}^+$ recovers such states perfectly. Fig. 8b shows the fidelities for 50 randomly generated states. In this example, the quasi-inverse channel $\mathcal{N}^+$ does not decrease the fidelity ($F_+ \ge F_{or}$), whereas the Moore–Penrose inverse $\mathcal{N}^p$ sometimes moves the recovered states even further away from the original states (Fig. 4 partly explains why). Although $\mathcal{N}^p$ performs better than $\mathcal{N}^+$ in certain cases, the average of $F_+$ is greater than that of $F_p$.
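The fidelity used above, $F(\rho_1, \rho_2) = \mathrm{tr}\sqrt{\sqrt{\rho_1}\,\rho_2\sqrt{\rho_1}}$ (the non-squared convention), can be evaluated with a few lines of NumPy. This is a minimal sketch assuming Hermitian positive semidefinite inputs; the matrix square root is taken through an eigendecomposition with tiny negative eigenvalues clipped, and `rho_theta` reproduces the family $\rho_\theta$ from Fig. 8.

```python
import numpy as np

def psd_sqrt(rho):
    """Square root of a Hermitian PSD matrix via eigh; round-off negatives are clipped to 0."""
    w, V = np.linalg.eigh(rho)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

def fidelity(rho1, rho2):
    """F(rho1, rho2) = tr sqrt( sqrt(rho1) rho2 sqrt(rho1) )."""
    s = psd_sqrt(rho1)
    return float(np.real(np.trace(psd_sqrt(s @ rho2 @ s))))

def rho_theta(theta):
    """The states rho_theta = (1/2) [[1, e^{i theta}], [e^{-i theta}, 1]] of Fig. 8."""
    return 0.5 * np.array([[1, np.exp(1j * theta)], [np.exp(-1j * theta), 1]])

zero = np.array([[1, 0], [0, 0]], dtype=complex)
plus = rho_theta(0.0)
print(round(fidelity(zero, plus), 4))                      # 0.7071
print(round(fidelity(rho_theta(1.2), rho_theta(1.2)), 4))  # 1.0
```

For pure states this reduces to the overlap $|\langle\psi|\phi\rangle|$, which is why $|0\rangle$ and $|+\rangle$ give $1/\sqrt{2}$.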

Appendix C: The effect of imperfect knowledge about noise channels on fidelity

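From the main text, we know that

$$\rho_{\rm EM} = \mathcal{U}_{n\cdots1}\circ(\mathcal{U}^\dagger\tilde{\mathcal{N}}^{-1})_{1\cdots n}(\rho^{\rm exp}_{\rm out}),$$

$$\rho^{\rm ideal}_{\rm out} = \mathcal{U}_{n\cdots1}\circ(\mathcal{U}^\dagger\mathcal{N}^{-1})_{1\cdots n}(\rho^{\rm exp}_{\rm out}),$$

where $\mathcal{U}_{n\cdots1} := \mathcal{U}_n\circ\cdots\circ\mathcal{U}_1$ is the ideal circuit (see Fig. 5), $(\mathcal{U}^\dagger\mathcal{N}^{-1})_{1\cdots n} := \mathcal{U}^\dagger_1\circ\mathcal{N}^{-1}_1\circ\cdots\circ\mathcal{U}^\dagger_n\circ\mathcal{N}^{-1}_n$ (cf. Eq. (C3)), and the tilde marks the same composition built from the estimated inverses $\tilde{\mathcal{N}}^{-1}_i$.

Imperfect knowledge about $\mathcal{N}_i$ leads to an imperfect inverse $\tilde{\mathcal{N}}^{-1}_i$. Let $\tilde{\mathcal{N}}^{-1}_i = \mathcal{N}^{-1}_i + \Delta\mathcal{N}^{-1}_i$, where $\mathcal{N}^{-1}_i$ is the perfect inverse of $\mathcal{N}_i$. Let $\Delta\rho_{\rm EM} := \rho_{\rm EM} - \rho^{\rm ideal}_{\rm out}$; then

$$\Delta\rho_{\rm EM} = \mathcal{U}_{n\cdots1}\circ\left[(\mathcal{U}^\dagger\tilde{\mathcal{N}}^{-1})_{1\cdots n} - (\mathcal{U}^\dagger\mathcal{N}^{-1})_{1\cdots n}\right](\rho^{\rm exp}_{\rm out}) = \mathcal{U}_{n\cdots1}\circ\Delta\mathcal{N}(\rho^{\rm exp}_{\rm out}). \qquad \text{(C1)}$$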

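If we keep only the first-order term $\Delta\mathcal{N}^{(1)}$ of $\Delta\mathcal{N}$ (first order in the errors $\Delta\mathcal{N}^{-1}_i$), the first-order correction $\Delta\rho^{(1)}_{\rm EM}$ is

$$\Delta\rho^{(1)}_{\rm EM} = \mathcal{U}_{n\cdots1}\circ\Delta\mathcal{N}^{(1)}\circ\mathcal{U}_{\rm exp}(\rho_{\rm in}) \qquad \text{(C2)}$$

$$= \mathcal{U}_{n\cdots1}\circ\left(\sum_{i=1}^{n}\mathcal{U}^\dagger_1\circ\mathcal{N}^{-1}_1\circ\cdots\circ\mathcal{U}^\dagger_i\circ\Delta\mathcal{N}^{-1}_i\circ\cdots\circ\mathcal{U}^\dagger_n\circ\mathcal{N}^{-1}_n\right)\circ\mathcal{U}_{\rm exp}(\rho_{\rm in}), \qquad \text{(C3)}$$

where $\mathcal{U}_{\rm exp} := \mathcal{N}_n\circ\mathcal{U}_n\circ\cdots\circ\mathcal{N}_1\circ\mathcal{U}_1$ is the actual experimental operation, so that $\rho^{\rm exp}_{\rm out} = \mathcal{U}_{\rm exp}(\rho_{\rm in})$.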
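The first-order expansion in Eqs. (C1) to (C3) can be checked numerically on a small example. The sketch below is a hypothetical two-layer single-qubit circuit ($n = 2$) with rotation gates and depolarizing noise in the parameterization of $\mathcal{D}$ from Appendix E, superoperators in column-stacking form, and random matrices standing in for the estimation errors $\Delta\mathcal{N}^{-1}_i$. It checks that exact inverses reproduce the ideal output and that the gap between $\Delta\rho_{\rm EM}$ and $\Delta\rho^{(1)}_{\rm EM}$ shrinks quadratically with the size of the errors.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def sup(kraus):
    """Column-stacking superoperator: v(E) = sum_k conj(K_k) (x) K_k."""
    return sum(np.kron(K.conj(), K) for K in kraus)

def depolarizing(lam):
    """Illustrative noise with Kraus {sqrt(1-3lam/4) I, sqrt(lam/4) X, sqrt(lam/4) Y, sqrt(lam/4) Z}."""
    return sup([np.sqrt(1 - 3 * lam / 4) * I2] + [np.sqrt(lam / 4) * P for P in (X, Y, Z)])

def rot(P, t):
    """exp(-i t P / 2) for a Pauli matrix P."""
    return np.cos(t / 2) * I2 - 1j * np.sin(t / 2) * P

# Hypothetical circuit: ideal gates U_1, U_2 and noise channels N_1, N_2
U = [rot(X, 0.4), rot(Y, 0.9)]
vU = [sup([u]) for u in U]
vUd = [sup([u.conj().T]) for u in U]              # v(U_i^dagger)
vN = [depolarizing(0.05), depolarizing(0.08)]
vNinv = [np.linalg.inv(m) for m in vN]            # perfect inverses N_i^{-1}

rho_in = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])
v_in = rho_in.flatten(order="F")
v_exp = vN[1] @ vU[1] @ vN[0] @ vU[0] @ v_in      # rho_out^exp = U_exp(rho_in)
vU21 = vU[1] @ vU[0]                              # ideal circuit U_{2,1}

def recovery(inv):
    """(U^dagger N^{-1})_{1..2} = U_1^dag o N_1^{-1} o U_2^dag o N_2^{-1}."""
    return vUd[0] @ inv[0] @ vUd[1] @ inv[1]

def residual(eps, seed=7):
    """Norm of Delta rho_EM minus its first-order part, for errors of size ~eps."""
    rng = np.random.default_rng(seed)
    dinv = [eps * rng.standard_normal((4, 4)) for _ in range(2)]   # hypothetical Delta N_i^{-1}
    est = [vNinv[i] + dinv[i] for i in range(2)]
    d_rho = vU21 @ (recovery(est) - recovery(vNinv)) @ v_exp        # Eq. (C1)
    dN1 = (vUd[0] @ dinv[0] @ vUd[1] @ vNinv[1]
           + vUd[0] @ vNinv[0] @ vUd[1] @ dinv[1])                  # first-order term, Eq. (C3)
    d_rho1 = vU21 @ dN1 @ v_exp                                     # Eq. (C2)
    return np.linalg.norm(d_rho - d_rho1)

print(np.allclose(vU21 @ recovery(vNinv) @ v_exp, vU21 @ v_in))  # True
print(residual(1e-3) / residual(1e-4))  # close to 100: the neglected term is quadratic in eps
```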

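Then $\rho_{\rm EM} \approx \rho^{\rm ideal}_{\rm out} + \Delta\rho^{(1)}_{\rm EM}$, and the fidelity between $\rho_{\rm EM}$ and $\rho^{\rm ideal}_{\rm out}$ is approximately

$$F(\rho_{\rm EM}, \rho^{\rm ideal}_{\rm out}) \approx F^{(1)}(\rho_{\rm EM}, \rho^{\rm ideal}_{\rm out}) := F(\rho^{\rm ideal}_{\rm out} + \Delta\rho^{(1)}_{\rm EM}, \rho^{\rm ideal}_{\rm out}) = \left\|\sqrt{\rho^{\rm ideal}_{\rm out} + \Delta\rho^{(1)}_{\rm EM}}\,\sqrt{\rho^{\rm ideal}_{\rm out}}\right\|_{\rm tr},$$

which is the first-order approximation of $F(\rho_{\rm EM}, \rho^{\rm ideal}_{\rm out})$.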

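By the Fuchs–van de Graaf inequalities, $1 - F \le D \le \sqrt{1 - F^2}$ for the (non-squared) fidelity $F$ used here, we have

$$[1 - D(\Delta\rho^{(1)}_{\rm EM})]^2 \le F^{(1)}(\rho_{\rm EM}, \rho^{\rm ideal}_{\rm out}) \le \sqrt{1 - D^2(\Delta\rho^{(1)}_{\rm EM})}, \qquad \text{(C4)}$$

where the lower bound uses $1 - D \ge (1 - D)^2$ for $D \in [0, 1]$, $D(\cdot) := \frac{1}{2}\|\cdot\|_{\rm tr}$ is the trace distance, and $\|\cdot\|_{\rm tr}$ is the trace norm. It is also known that

$$\|A\|_F \le \|A\|_{\rm tr} \le \sqrt{r}\,\|A\|_F,$$

where $r$ is the rank of $A$ (so $r \le d$ for a $d$-dimensional system), $\|A\|_F$ is the Frobenius norm, which equals $\|v(A)\|$, and $\|\cdot\|$ is the 2-norm. Then

$$\|\Delta\rho^{(1)}_{\rm EM}\|_F = \|v(\Delta\rho^{(1)}_{\rm EM})\| = \|v(\mathcal{U}_{n\cdots1})\,v(\Delta\mathcal{N}^{(1)})\,v(\mathcal{U}_{\rm exp})\,v(\rho_{\rm in})\| = \|v(\mathcal{U}_{n\cdots1})\,v(\Delta\mathcal{N}^{(1)})\,v(\rho^{\rm exp}_{\rm out})\|,$$

$$l_U\cdot\|v(\Delta\mathcal{N}^{(1)})\,v(\rho^{\rm exp}_{\rm out})\| \le \|\Delta\rho^{(1)}_{\rm EM}\|_F \le \|v(\mathcal{U}_{n\cdots1})\|\cdot\|v(\Delta\mathcal{N}^{(1)})\|\cdot\|v(\rho^{\rm exp}_{\rm out})\|, \qquad \text{(C5)}$$

where $l_U := \inf_{\|x\|=1}\|v(\mathcal{U}_{n\cdots1})x\|$ is the lower Lipschitz constant of the superoperator of the ideal circuit. Notice that, on the right-hand side of Eq. (C5), $\|v(\mathcal{U}_{n\cdots1})\|$ and $\|v(\rho^{\rm exp}_{\rm out})\|$ are known for a given experiment; denote their product $\|v(\mathcal{U}_{n\cdots1})\|\cdot\|v(\rho^{\rm exp}_{\rm out})\|$ by $C_{\rm exp}$. Combining Eq. (C4) with Eq. (C5) and $\frac{1}{2}\|A\|_F \le D(A) \le \frac{1}{2}\sqrt{d}\,\|A\|_F$, the fidelity between the mitigated state and the ideal state is bounded as

$$\left(1 - \frac{1}{2}\sqrt{d}\,C_{\rm exp}\,\|v(\Delta\mathcal{N}^{(1)})\|\right)^2 \le F^{(1)}(\rho_{\rm EM}, \rho^{\rm ideal}_{\rm out}) \le \sqrt{1 - \frac{1}{4}\left(l_U\cdot\|v(\Delta\mathcal{N}^{(1)})\,v(\rho^{\rm exp}_{\rm out})\|\right)^2},$$

where the lower bound assumes $\frac{1}{2}\sqrt{d}\,C_{\rm exp}\,\|v(\Delta\mathcal{N}^{(1)})\| \le 1$.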
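The two standard ingredients used here, the Fuchs–van de Graaf inequalities in the form $1 - F \le D \le \sqrt{1 - F^2}$ for the non-squared fidelity and the norm relation $\|A\|_F \le \|A\|_{\rm tr} \le \sqrt{r}\,\|A\|_F$, can be spot-checked numerically on random density matrices. The sketch below is only an illustration under those conventions; the random-state generator, dimension, and tolerances are arbitrary choices and not part of the derivation.

```python
import numpy as np

def psd_sqrt(rho):
    """Square root of a Hermitian PSD matrix via eigh (round-off negatives clipped)."""
    w, V = np.linalg.eigh(rho)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

def fidelity(r1, r2):
    """Non-squared fidelity tr sqrt( sqrt(r1) r2 sqrt(r1) )."""
    s = psd_sqrt(r1)
    return float(np.real(np.trace(psd_sqrt(s @ r2 @ s))))

def trace_norm(A):
    """Sum of singular values."""
    return float(np.sum(np.linalg.svd(A, compute_uv=False)))

def random_state(d, rng):
    """Random density matrix G G^dagger / Tr(G G^dagger) with Gaussian G."""
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(0)
d = 4
ok = True
for _ in range(200):
    r1, r2 = random_state(d, rng), random_state(d, rng)
    F = fidelity(r1, r2)
    A = r1 - r2
    D = 0.5 * trace_norm(A)                   # trace distance
    fro = np.linalg.norm(A)                   # Frobenius norm = ||v(A)||
    r = np.linalg.matrix_rank(A)
    ok &= (1 - F <= D + 1e-9) and (D <= np.sqrt(max(0.0, 1 - F**2)) + 1e-9)
    ok &= (fro <= trace_norm(A) + 1e-9) and (trace_norm(A) <= np.sqrt(r) * fro + 1e-9)
print(ok)  # True
```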

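In addition, by the triangle inequality and the submultiplicativity of the operator norm, the norm of $v(\Delta\mathcal{N}^{(1)})$ satisfies

$$\left\|v(\Delta\mathcal{N}^{(1)})\right\| \le \prod_{k=1}^{n}\left\|v(\mathcal{U}^\dagger_k)\right\|\cdot\sum_{i=1}^{n}\left\|v(\Delta\mathcal{N}^{-1}_i)\right\|\prod_{\substack{j\in\{1,\ldots,n\}\\ j\neq i}}\left\|v(\mathcal{N}^{-1}_j)\right\|, \qquad \text{(C6)}$$

where $\mathcal{U}^\dagger_k$ and $\mathcal{N}^{-1}_j$ are known for a given EM task. The error $\Delta\mathcal{N}^{-1}_i$ on each inverse therefore appears explicitly in the lower bound on the fidelity, and by relating the size of these errors to the sampling cost of characterizing the noise, one can bound the fidelity in terms of that sampling cost.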
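Bound (C6) can also be checked numerically. The following sketch uses a hypothetical three-layer single-qubit circuit (random $Y$ rotations, depolarizing noise in the parameterization of Appendix E, random error matrices of size $10^{-3}$), forms the first-order term of Eq. (C3) by replacing exactly one $\mathcal{N}^{-1}_i$ with $\Delta\mathcal{N}^{-1}_i$ and summing over $i$, and compares its induced 2-norm with the right-hand side of (C6). All of these choices are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def sup(kraus):
    """Column-stacking superoperator: v(E) = sum_k conj(K_k) (x) K_k."""
    return sum(np.kron(K.conj(), K) for K in kraus)

def depolarizing(lam):
    return sup([np.sqrt(1 - 3 * lam / 4) * I2] + [np.sqrt(lam / 4) * P for P in (X, Y, Z)])

def op_norm(M):
    """Induced 2-norm (largest singular value)."""
    return np.linalg.norm(M, 2)

rng = np.random.default_rng(1)
n = 3
angles = rng.uniform(0, np.pi, size=n)
# v(U_k^dagger) for hypothetical gates U_k = exp(-i t_k Y / 2)
vUd = [sup([np.cos(t / 2) * I2 + 1j * np.sin(t / 2) * Y]) for t in angles]
vNinv = [np.linalg.inv(depolarizing(lam)) for lam in (0.02, 0.05, 0.1)]
dinv = [1e-3 * rng.standard_normal((4, 4)) for _ in range(n)]   # hypothetical Delta N_i^{-1}

def chain(factors):
    """U_1^dag o F_1 o U_2^dag o F_2 o ... o U_n^dag o F_n as a matrix product."""
    M = np.eye(4, dtype=complex)
    for k in range(n):
        M = M @ vUd[k] @ factors[k]
    return M

# First-order term of Eq. (C3): swap one N_i^{-1} for Delta N_i^{-1}, sum over i
dN1 = sum(chain([dinv[j] if j == i else vNinv[j] for j in range(n)]) for i in range(n))

lhs = op_norm(dN1)
rhs = np.prod([op_norm(u) for u in vUd]) * sum(
    op_norm(dinv[i]) * np.prod([op_norm(vNinv[j]) for j in range(n) if j != i])
    for i in range(n))
print(lhs <= rhs)  # True
```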

Appendix D: A sufficient condition on improving expectation values

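The goal of error mitigation on the expectation value of an observable $A$ is

$$\left|\mathrm{Tr}(\rho_{\rm EM}A) - \mathrm{Tr}(\rho^{\rm ideal}_{\rm out}A)\right| \le \left|\mathrm{Tr}(\rho^{\rm ideal}_{\rm out}A) - \mathrm{Tr}(\rho^{\rm exp}_{\rm out}A)\right|. \qquad \text{(D1)}$$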

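The left-hand side of Eq. (D1) is

$$\left|\mathrm{Tr}[(\rho_{\rm EM} - \rho^{\rm ideal}_{\rm out})A]\right| = \left|\mathrm{Tr}(\Delta\rho_{\rm EM}A)\right| = \left|\mathrm{Tr}\left(\mathcal{U}_{n\cdots1}\circ\Delta\mathcal{N}(\rho^{\rm exp}_{\rm out})\cdot A\right)\right| = \left|\left\langle v(\mathcal{U}^\dagger_{n\cdots1})\,v(A^\dagger),\ v(\Delta\mathcal{N})\,v(\rho^{\rm exp}_{\rm out})\right\rangle\right|. \qquad \text{(D2)}$$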

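The right-hand side of Eq. (D1) equals

$$\left|\mathrm{Tr}[(\rho^{\rm ideal}_{\rm out} - \rho^{\rm exp}_{\rm out})A]\right| = \left|\mathrm{Tr}\left[\left(\mathcal{U}_{n\cdots1}\circ(\mathcal{U}^\dagger\mathcal{N}^{-1})_{1\cdots n} - \mathcal{I}\right)(\rho^{\rm exp}_{\rm out})\cdot A\right]\right| = \left|\mathrm{Tr}\left[\mathcal{U}_{n\cdots1}\circ\left[(\mathcal{U}^\dagger\mathcal{N}^{-1})_{1\cdots n} - \mathcal{U}^\dagger_{1\cdots n}\right](\rho^{\rm exp}_{\rm out})\cdot A\right]\right|$$

$$= \left|\left\langle v(\mathcal{U}^\dagger_{n\cdots1})\,v(A^\dagger),\ v\left((\mathcal{U}^\dagger\mathcal{N}^{-1})_{1\cdots n} - \mathcal{U}^\dagger_{1\cdots n}\right)v(\rho^{\rm exp}_{\rm out})\right\rangle\right|, \qquad \text{(D3)}$$

where $\mathcal{U}^\dagger_{1\cdots n} := \mathcal{U}^\dagger_1\circ\cdots\circ\mathcal{U}^\dagger_n$, so that $\mathcal{U}_{n\cdots1}\circ\mathcal{U}^\dagger_{1\cdots n} = \mathcal{I}$.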

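It is difficult to draw conclusions directly from Eqs. (D2) and (D3), since $v(\Delta\mathcal{N})$ and $v((\mathcal{U}^\dagger\mathcal{N}^{-1})_{1\cdots n} - \mathcal{U}^\dagger_{1\cdots n})$ can be arbitrary. However,

$$\left|\left\langle v(\mathcal{U}^\dagger_{n\cdots1})v(A^\dagger),\ v(\Delta\mathcal{N})v(\rho^{\rm exp}_{\rm out})\right\rangle\right| \le \left\|v(\mathcal{U}^\dagger_{n\cdots1})v(A^\dagger)\right\|\,\|v(\rho^{\rm exp}_{\rm out})\|\,\|v(\Delta\mathcal{N})\|,$$

$$\left|\left\langle v(\mathcal{U}^\dagger_{n\cdots1})v(A^\dagger),\ v\left((\mathcal{U}^\dagger\mathcal{N}^{-1})_{1\cdots n} - \mathcal{U}^\dagger_{1\cdots n}\right)v(\rho^{\rm exp}_{\rm out})\right\rangle\right| \ge \left\|v(\mathcal{U}^\dagger_{n\cdots1})v(A^\dagger)\right\|\,\|v(\rho^{\rm exp}_{\rm out})\|\,\inf_{\|x\|=1}\left\|v\left((\mathcal{U}^\dagger\mathcal{N}^{-1})_{1\cdots n} - \mathcal{U}^\dagger_{1\cdots n}\right)x\right\|.$$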

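Therefore, if

$$\|v(\Delta\mathcal{N})\| \le \inf_{\|x\|=1}\left\|v\left((\mathcal{U}^\dagger\mathcal{N}^{-1})_{1\cdots n} - \mathcal{U}^\dagger_{1\cdots n}\right)x\right\|,$$

Eq. (D1) is guaranteed. That is, the EM process improves the expectation value for any observable $A$ and any desired circuit $\mathcal{U}_{n\cdots1}$ whenever the above condition is satisfied. This is a stringent requirement: if $v((\mathcal{U}^\dagger\mathcal{N}^{-1})_{1\cdots n} - \mathcal{U}^\dagger_{1\cdots n})$ has a nontrivial null space, the right-hand side vanishes, which forces the noise channel estimates $\tilde{\mathcal{N}}_i$ to be perfect, i.e., $\tilde{\mathcal{N}}_i = \mathcal{N}_i$ for all $i \in \{1, \ldots, n\}$.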
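Condition (D1) itself can be evaluated directly in a toy setting. The sketch below is a hypothetical single-layer example ($n = 1$) in which both the true noise and its estimate are depolarizing channels in the parameterization of $\mathcal{D}$ from Appendix E. With $n = 1$ the gate factors cancel, so the mitigated state is the estimated inverse applied to $\rho^{\rm exp}_{\rm out} = \mathcal{N}(\rho^{\rm ideal}_{\rm out})$, and only the ideal output state is needed. The noise strengths, the state, and the observable $Z$ are arbitrary choices.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def sup(kraus):
    """Column-stacking superoperator: v(E) = sum_k conj(K_k) (x) K_k."""
    return sum(np.kron(K.conj(), K) for K in kraus)

def depolarizing(lam):
    return sup([np.sqrt(1 - 3 * lam / 4) * I2] + [np.sqrt(lam / 4) * P for P in (X, Y, Z)])

def expval(v, A):
    """Tr(rho A) for a column-stacked density matrix v."""
    return np.real(np.trace(v.reshape(2, 2, order="F") @ A))

def d1_sides(lam_true, lam_est, rho_ideal, A):
    """(LHS, RHS) of Eq. (D1) for n = 1: rho_EM = (estimated inverse)(rho_exp)."""
    v_ideal = rho_ideal.flatten(order="F")
    v_exp = depolarizing(lam_true) @ v_ideal
    v_em = np.linalg.solve(depolarizing(lam_est), v_exp)   # apply the estimated inverse
    lhs = abs(expval(v_em, A) - expval(v_ideal, A))
    rhs = abs(expval(v_ideal, A) - expval(v_exp, A))
    return lhs, rhs

rho_ideal = np.array([[0.8, 0.1], [0.1, 0.2]], dtype=complex)
lhs, rhs = d1_sides(0.10, 0.08, rho_ideal, Z)   # slight under-estimate of the noise
print(lhs <= rhs)  # True: (D1) holds
lhs, rhs = d1_sides(0.10, 0.30, rho_ideal, Z)   # strong over-estimate
print(lhs <= rhs)  # False: mitigation makes <Z> worse
```

In this example, a mild under-estimate of the noise still improves $\langle Z\rangle$, while over-correcting by a large margin violates (D1).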


Appendix E: Examples of oversimplified noise channels

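The Kraus representations of $\mathcal{N}$ and $\mathcal{D}$ are

$$\mathcal{N}: \left\{\sqrt{p_1}\,I,\ \sqrt{p_2}\,X,\ \sqrt{p_3}\,Y,\ \sqrt{1 - p_1 - p_2 - p_3}\,Z\right\};$$

$$\mathcal{D}: \left\{\sqrt{1 - \tfrac{3\lambda}{4}}\,I,\ \sqrt{\tfrac{\lambda}{4}}\,X,\ \sqrt{\tfrac{\lambda}{4}}\,Y,\ \sqrt{\tfrac{\lambda}{4}}\,Z\right\}.$$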

For a given set $\{p_1, p_2, p_3\}$, what is the optimal $\lambda$ that minimizes $\|\mathcal{N} - \mathcal{D}\|$ in a chosen norm $\|\cdot\|$? One approach is to write down a matrix representation of $\mathcal{N}$ and $\mathcal{D}$ and then solve for $\lambda$ by minimizing $\|\mathcal{N} - \mathcal{D}\|$ in that norm. The outcome of the optimization can differ between representations and between norms. For a given norm $\|\cdot\|$, the distance attained at the optimal $\lambda$ lower-bounds $\|\mathcal{N} - \mathcal{D}\|$ for any $\lambda$ that an experimental characterization could return.

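As mentioned in the main text, the two vectors $\vec{n} := \left(\sqrt{p_1}, \sqrt{p_2}, \sqrt{p_3}, \sqrt{1 - p_1 - p_2 - p_3}\right)$ and $\vec{d} := \left(\sqrt{1 - \tfrac{3\lambda}{4}}, \sqrt{\tfrac{\lambda}{4}}, \sqrt{\tfrac{\lambda}{4}}, \sqrt{\tfrac{\lambda}{4}}\right)$ are also representations of $\mathcal{N}$ and $\mathcal{D}$, respectively. Since $\vec{n}$ and $\vec{d}$ are normalized, $\|\vec{n} - \vec{d}\|^2 = 2 - 2\,\vec{n}\cdot\vec{d}$, so minimizing the distance between $\mathcal{N}$ and $\mathcal{D}$ is equivalent to maximizing $\vec{n}\cdot\vec{d}$, i.e.,

$$\max_{\lambda\in[0,1]}\left\{\sqrt{p_1}\sqrt{1 - \tfrac{3\lambda}{4}} + \left[\sqrt{p_2} + \sqrt{p_3} + \sqrt{1 - p_1 - p_2 - p_3}\right]\sqrt{\tfrac{\lambda}{4}}\right\}.$$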

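This can be solved by setting the derivative of the expression with respect to $\lambda$ to zero and comparing the result with the endpoints of the interval. The result is

$$\lambda_{\max} = \frac{\left[\sqrt{p_2} + \sqrt{p_3} + \sqrt{1 - p_1 - p_2 - p_3}\right]^2 p_1}{\frac{9}{4}p_1^2 + \frac{3}{4}p_1\left[\sqrt{p_2} + \sqrt{p_3} + \sqrt{1 - p_1 - p_2 - p_3}\right]^2}, \quad \text{or } \lambda = 1, \text{ or } \lambda = 0. \qquad \text{(E1)}$$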
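To illustrate how Eq. (E1) is used, the short sketch below evaluates $\vec{n}\cdot\vec{d}$ at the stationary point (when it lies in $[0, 1]$) and at the endpoints $\lambda \in \{0, 1\}$, and keeps the best one. Because the objective is a sum of concave square-root terms, the stationary point is the maximum whenever it lies in the interval. The function names are illustrative; the stationary-point formula requires $p_1 > 0$, so the sketch falls back to the endpoints otherwise. The two printed cases correspond to the examples listed later in this appendix.

```python
import numpy as np

def overlap(lam, p1, p2, p3):
    """n . d as a function of lambda (the expression maximized above)."""
    s = np.sqrt(p2) + np.sqrt(p3) + np.sqrt(1 - p1 - p2 - p3)
    return np.sqrt(p1) * np.sqrt(1 - 3 * lam / 4) + s * np.sqrt(lam / 4)

def lambda_stationary(p1, p2, p3):
    """Stationary point from Eq. (E1); assumes p1 > 0."""
    s2 = (np.sqrt(p2) + np.sqrt(p3) + np.sqrt(1 - p1 - p2 - p3)) ** 2
    return s2 * p1 / (9 / 4 * p1**2 + 3 / 4 * p1 * s2)

def best_lambda(p1, p2, p3):
    """Compare the stationary point (if inside [0, 1]) with lambda = 0 and lambda = 1."""
    candidates = [0.0, 1.0]
    if p1 > 0:
        lm = lambda_stationary(p1, p2, p3)
        if 0 <= lm <= 1:
            candidates.append(lm)
    return max(candidates, key=lambda lam: overlap(lam, p1, p2, p3))

print(round(best_lambda(0.5, 0.0, 0.0), 6))  # 0.333333  (p1 = 1/2, p2 = p3 = 0)
print(best_lambda(0.0, 1.0, 0.0))            # 1.0       (p1 = p3 = 0, p2 = 1)
```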

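The superoperators of $\mathcal{N}$ and $\mathcal{D}$ are

$$v(\mathcal{N}) = p_1\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix} + p_2\begin{pmatrix}0&0&0&1\\0&0&1&0\\0&1&0&0\\1&0&0&0\end{pmatrix} + p_3\begin{pmatrix}0&0&0&1\\0&0&-1&0\\0&-1&0&0\\1&0&0&0\end{pmatrix} + (1 - p_1 - p_2 - p_3)\begin{pmatrix}1&0&0&0\\0&-1&0&0\\0&0&-1&0\\0&0&0&1\end{pmatrix},$$

$$v(\mathcal{D}) = \left(1 - \tfrac{3\lambda}{4}\right)\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix} + \frac{\lambda}{4}\begin{pmatrix}0&0&0&1\\0&0&1&0\\0&1&0&0\\1&0&0&0\end{pmatrix} + \frac{\lambda}{4}\begin{pmatrix}0&0&0&1\\0&0&-1&0\\0&-1&0&0\\1&0&0&0\end{pmatrix} + \frac{\lambda}{4}\begin{pmatrix}1&0&0&0\\0&-1&0&0\\0&0&-1&0\\0&0&0&1\end{pmatrix}.$$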
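The matrices above can be reproduced from the Kraus operators through $v(\mathcal{E}) = \sum_k \bar{K}_k \otimes K_k$. This sketch assumes column-stacking (for Pauli Kraus operators, row-stacking yields the same matrices), compares the Kraus-built $v(\mathcal{N})$ with the four-term sum written out above for one arbitrary choice of $\{p_i\}$, and checks the first example below: at $p_1 = p_3 = 0$, $p_2 = 1$ and $\lambda = 1$, $v(\mathcal{D})$ has rank one while $v(\mathcal{N})$ is full rank. Function names are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def sup(kraus):
    """Column-stacking superoperator: v(E) = sum_k conj(K_k) (x) K_k."""
    return sum(np.kron(K.conj(), K) for K in kraus)

def pauli_channel(p1, p2, p3):
    return sup([np.sqrt(p1) * I2, np.sqrt(p2) * X, np.sqrt(p3) * Y,
                np.sqrt(1 - p1 - p2 - p3) * Z])

def depolarizing(lam):
    return sup([np.sqrt(1 - 3 * lam / 4) * I2] + [np.sqrt(lam / 4) * P for P in (X, Y, Z)])

def displayed_vN(p1, p2, p3):
    """The four-term matrix sum for v(N) written out in the text."""
    M_X = np.fliplr(np.eye(4))
    M_Y = np.array([[0, 0, 0, 1], [0, 0, -1, 0], [0, -1, 0, 0], [1, 0, 0, 0]])
    M_Z = np.diag([1, -1, -1, 1])
    return p1 * np.eye(4) + p2 * M_X + p3 * M_Y + (1 - p1 - p2 - p3) * M_Z

print(np.allclose(pauli_channel(0.5, 0.2, 0.1), displayed_vN(0.5, 0.2, 0.1)))  # True

# First example: p1 = p3 = 0, p2 = 1, lambda = 1
vN, vD = pauli_channel(0, 1, 0), depolarizing(1.0)
half_block = 0.5 * np.array([[1, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 1]])
print(np.allclose(vD, half_block))                               # True
print(np.linalg.matrix_rank(vN), np.linalg.matrix_rank(vD))      # 4 1
```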

Even with the optimal $\lambda$ from Eq. (E1), the distance between $\mathcal{N}$ and $\mathcal{D}$ is nonzero whenever $p_2$, $p_3$, and $1 - p_1 - p_2 - p_3$ are not all equal.

The following are two examples with different sets $\{p_i\}$.

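1. When $p_1 = p_3 = 0$ and $p_2 = 1$, the optimal $\lambda_{\max}$ is 1. Therefore

$$v(\mathcal{N}) = \begin{pmatrix}0&0&0&1\\0&0&1&0\\0&1&0&0\\1&0&0&0\end{pmatrix}, \qquad v(\mathcal{D}) = \frac{1}{2}\begin{pmatrix}1&0&0&1\\0&0&0&0\\0&0&0&0\\1&0&0&1\end{pmatrix}.$$

In this case, the estimated $\mathcal{D}$ is not an invertible channel, while $\mathcal{N}$ is invertible. Using an inverse of $\mathcal{D}$ for recovery will definitely worsen the outcomes.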


[Figure 9 plots: (a) Tr(Yρ) versus state index, with the ideal values shown; (b) fidelity versus state index.]

FIG. 9: (a) Expectation values of Pauli Y for 50 randomly generated states; (b) fidelities of 50 randomly generated states. The x-axis is a dummy label for the tested states. Because the channel $\mathcal{D}^{-1}\circ\mathcal{N}$ is not physical (not CP), the outputs $\mathcal{D}^{-1}\circ\mathcal{N}(\rho)$ are not valid quantum states. In this case the fidelity is no longer a good metric for distinguishing two “states”.

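2. When $p_1 = \frac{1}{2}$ and $p_2 = p_3 = 0$, Eq. (E1) gives $\lambda_{\max} = \frac{1}{3}$, and

$$v(\mathcal{N}) = \begin{pmatrix}1&0&0&0\\0&0&0&0\\0&0&0&0\\0&0&0&1\end{pmatrix}, \qquad v(\mathcal{D}) = \frac{1}{6}\begin{pmatrix}5&0&0&1\\0&4&0&0\\0&0&4&0\\1&0&0&5\end{pmatrix}.$$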

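The inverse of $\mathcal{D}$ is

$$v(\mathcal{D}^{-1}) = \frac{1}{4}\begin{pmatrix}5&0&0&-1\\0&6&0&0\\0&0&6&0\\-1&0&0&5\end{pmatrix}.$$

Therefore,

$$v(\mathcal{N}^{\rm re}) := v(\mathcal{D}^{-1})\,v(\mathcal{N}) = \frac{1}{4}\begin{pmatrix}5&0&0&-1\\0&0&0&0\\0&0&0&0\\-1&0&0&5\end{pmatrix}.$$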

The resulting channel $\mathcal{N}^{\rm re}$ has eigenvalues $\{\frac{3}{2}, 1, 0, 0\}$, which will worsen the outcome. In Fig. 9, we tested 50 randomly generated quantum states $\rho$ for this example. Fig. 7 shows the expectation value of Pauli Y for these 50 states. The information in the expectation value $\mathrm{Tr}(Y\rho)$ is erased by the noise channel $\mathcal{N}$ and cannot be restored by $\mathcal{D}^{-1}$. For $\mathrm{Tr}(Z\rho)$ in Fig. 7, the channel $\mathcal{D}^{-1}$ has made the outcome worse: $\mathcal{N}$ leaves $\mathrm{Tr}(Z\rho)$ unchanged, while $\mathcal{N}^{\rm re}$ rescales it by $3/2$. Fig. 9b shows the fidelities $F(\mathcal{N}(\rho), \rho)$ and $F(\mathcal{N}^{\rm re}(\rho), \rho)$. Since $\mathcal{D}^{-1}$ is not CP, the outputs $\mathcal{D}^{-1}\circ\mathcal{N}(\rho)$ are no longer valid quantum states; the fidelity function is then not always bounded by 1 and is no longer a good metric. This explains why the recovery $\mathcal{D}^{-1}$ does not improve any expectation value yet appears to give higher fidelities.
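The numbers in the second example are easy to reproduce. The sketch below takes $v(\mathcal{N})$ and $v(\mathcal{D})$ exactly as displayed, inverts $v(\mathcal{D})$, and checks the spectrum of $v(\mathcal{N}^{\rm re})$. It then applies $\mathcal{N}^{\rm re}$ to one arbitrarily chosen valid state (column-stacking assumed) to show the $3/2$ amplification of $\mathrm{Tr}(Z\rho)$ and the appearance of a negative eigenvalue, i.e., an output that is not a quantum state.

```python
import numpy as np

vN = np.diag([1.0, 0.0, 0.0, 1.0])                                             # p1 = 1/2, p2 = p3 = 0
vD = np.array([[5, 0, 0, 1], [0, 4, 0, 0], [0, 0, 4, 0], [1, 0, 0, 5]]) / 6    # lambda = 1/3
vDinv = np.linalg.inv(vD)
vNre = vDinv @ vN

expected_Dinv = np.array([[5, 0, 0, -1], [0, 6, 0, 0], [0, 0, 6, 0], [-1, 0, 0, 5]]) / 4
print(np.allclose(vDinv, expected_Dinv))          # True
print(np.sort(np.linalg.eigvals(vNre).real))      # approximately [0, 0, 1, 1.5]

Zop = np.diag([1.0, -1.0])
rho = np.array([[0.9, 0.1 - 0.2j], [0.1 + 0.2j, 0.1]])     # an arbitrary valid state
out = (vNre @ rho.flatten(order="F")).reshape(2, 2, order="F")
print(np.trace(Zop @ rho).real, np.trace(Zop @ out).real)  # approximately 0.8 and 1.2
print(np.linalg.eigvalsh(out))                             # approximately [-0.1, 1.1]: not a state
```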