
Variable-length Feedback Codes with Several Decoding Times for the Gaussian Channel

Recep Can Yavas, Victoria Kostina, and Michelle Effros

Abstract—We investigate variable-length feedback (VLF) codes for the Gaussian point-to-point channel under maximal power, average error probability, and average decoding time constraints. Our proposed strategy chooses $K < \infty$ decoding times $n_1, n_2, \ldots, n_K$ rather than allowing decoding at any time $n = 0, 1, 2, \ldots$. We consider stop-feedback, which is one-bit feedback transmitted from the receiver to the transmitter at times $n_1, n_2, \ldots$ only to inform her whether to stop. We prove an achievability bound for VLF codes with the asymptotic approximation $\ln M \approx \frac{NC(P)}{1-\epsilon} - \sqrt{N \ln^{(K-1)}(N) \frac{V(P)}{1-\epsilon}}$, where $\ln^{(K)}(\cdot)$ denotes the $K$-fold nested logarithm function, $N$ is the average decoding time, and $C(P)$ and $V(P)$ are the capacity and dispersion of the Gaussian channel, respectively. Our achievability bound evaluates a non-asymptotic bound and optimizes the decoding times $n_1, \ldots, n_K$ within our code architecture.

Index Terms—Variable-length stop-feedback codes, Gaussian channel, second-order achievability bound.

I. INTRODUCTION

Although Shannon's work [1] shows that feedback does not increase the capacity of memoryless, point-to-point channels, it is known that feedback has several important benefits in channel coding, such as simplifying coding schemes and improving higher-order achievable rates. Several results demonstrate this effect in the fixed-length regime. Feedback simplifies coding in Horstein's scheme for the binary symmetric channel [2] and Schalkwijk and Kailath's scheme [3] for the Gaussian channel. Wagner et al. [4] show that feedback improves the second-order achievable rate for any discrete memoryless channel (DMC) with multiple capacity-achieving input distributions giving distinct dispersions.

The benefits of feedback increase for codes with arbitrary decoding times (called variable-length or rateless codes). In [5], Burnashev shows that feedback significantly improves the optimal error exponent of variable-length codes for DMCs. In [6], Polyanskiy et al. extend the work of Burnashev to the finite-length regime with non-vanishing error probabilities, introducing variable-length feedback (VLF) and variable-length feedback with termination (VLFT) codes, and deriving achievability and converse bounds for their performance. In VLF codes, the receiver decides when to stop transmissions and decode; in VLFT codes, the transmitter decides when to stop transmissions using its knowledge of the source message and the feedback that it receives from the receiver. As a

R. C. Yavas, V. Kostina, and M. Effros are with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125, USA (e-mail: ryavas, vkostina, [email protected]). This work was supported in part by the National Science Foundation (NSF) under grants CCF-1817241 and CCF-1956386.

special case of VLF codes, Polyanskiy et al. define variable-length stop-feedback (VLSF) codes that use feedback from the receiver to the transmitter at each potential decoding time to inform the transmitter whether to stop transmitting; in this regime, the codewords are fixed when the code is designed, and feedback affects how much of a codeword is sent but not that codeword's value. The result in [6, Th. 2] shows that variable-length decoding improves the first-order term in the asymptotic expansion of the maximum achievable message size from $NC$ to $\frac{NC}{1-\epsilon}$, where $C$ is the capacity of the DMC, $N$ is the average decoding time, and $\epsilon$ is the average error probability. The second-order achievable term within the class of VLF and VLFT codes is $O(\ln N)$, which means that VLF and VLFT codes have zero dispersion, and the convergence to the capacity is much faster than that achieved by fixed-length codes [7], [8].

Variations of VLSF and VLFT codes are studied in [9]–[15]. In [9], Kim et al. consider VLSF codes where decoding must occur at a decoding time less than or equal to some constant $\ell$ and the decoding times satisfy $n_i = id$ for some $d \in \mathbb{Z}^+$. In [10], Altuğ et al. modify the VLSF coding paradigm by replacing the average decoding time constraint with a constraint on the probability that the decoding time exceeds a target value; the benefit in the first-order term does not appear under this probabilistic delay constraint [10]. Truong and Tan [11], [12] extend the results in [6] to the Gaussian point-to-point and multiple access channels under an average power constraint. Trillingsgaard et al. [13] study the VLSF scenario where a common message is transmitted across a $K$-user discrete memoryless broadcast channel.

For the Gaussian channel, the effect of feedback depends both on whether the power constraint limits maximum or average power¹ and on whether the code is fixed-length or variable-length. For example, in the fixed-length regime, feedback does not improve the code performance in its first-, second-, or third-order terms under the maximal power constraint [8], but does improve the achievable second-order term under the average power constraint [16]. In the variable-length regime where the decoder can decode at any time and an average power constraint is employed, feedback improves performance in the first-order term [11], [12].

While high rates of feedback are impractical for many applications (especially wireless applications on half-duplex devices), most prior work on VLSF codes [9]–[13] allows an unbounded number of potential decoding times, i.e., $K = \infty$.

¹The maximal power constraint is also called the short-term, per-codeword, or peak power constraint; the average power constraint is also known as the long-term or expected power constraint.


Exceptions include [14], which numerically evaluates the performance of a VLSF code with $K < \infty$ decoding times using a new reliability-output Viterbi algorithm at the decoder, and [15], which introduces a sequential differential optimization algorithm to optimize the choices of $K$ potential decoding times $n_1, \ldots, n_K$, approximating the probability $\mathbb{P}[\tau \le n]$ that random stopping time $\tau$ is no greater than $n$ by a differentiable function $F(n)$.

This paper presents the first asymptotic expansion for the rate achievable by VLSF codes between the two extremes: $K = 1$ (the fixed-length regime analyzed in [7], [17]) and $K = \infty$ (the classical variable-length regime defined in [6, Def. 1]). We consider VLSF codes over the Gaussian point-to-point channel and limit the number of decoding times to some finite integer $K$ that does not grow with the average decoding time $N$. We impose a new maximal power constraint, bounding the power of codewords at every potential decoding time. The feedback rate of our code is $\frac{k}{n_k}$ when the decoding time is $n_k$. Thus our feedback rate approaches 0 as $n_k$ grows for each $n_k$, while most other VLSF codes use feedback rate 1 bit per channel use. Throughout the paper, we employ the average error and average decoding time criteria. Our main result shows that for VLSF codes with $2 \le K < \infty$ decoding times, message set size $M$ satisfying

\[ \ln M \approx \frac{NC(P)}{1-\epsilon} - \sqrt{N \ln^{(K-1)}(N) \frac{V(P)}{1-\epsilon}} \tag{1} \]

is achievable. Here $\ln^{(K)}(\cdot)$ denotes the $K$-fold nested logarithm function, $N$ is the average decoding time, and $C(P)$ and $V(P)$ are the capacity and dispersion of the Gaussian channel, respectively. The order of the second-order term in (1) depends on $K$. The convergence to $\frac{C(P)}{1-\epsilon}$ in (1) is slower than the convergence to $C(P)$ in the fixed-length scenario, which has second-order term $O(\sqrt{N})$ [7]. The $K = 2$ case in (1) recovers the variable-length scenario without feedback, which has second-order term $O(\sqrt{N \ln N})$ [6, Th. 1], achieved with $n_1 = 0$; our bound in Theorem 4 shows that when $K = \infty$, the asymptotic approximation in (1) is achievable with the second-order term replaced by $-O(\sqrt{N})$. Our result in (1) demonstrates how the performance of VLSF codes interpolates between these two extremes. Despite the order-wise dependence on $K$, (1) grows so slowly with $K$ that it suggests little benefit to choosing a large $K$. For example, when $K = 4$, $\sqrt{N \ln^{(K-1)}(N)}$ behaves very similarly to $O(\sqrt{N})$ for practical values of $N$ (e.g., $N \in [10^3, 10^5]$).
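As a rough numerical illustration of this slow growth (our sketch, not part of the paper), the following Python snippet evaluates the nested logarithm function (formally defined in (2) below) and the right-hand side of (1) with the lower-order terms dropped; the function and parameter names are ours.

```python
import math

def nested_log(k, x):
    """K-fold nested logarithm: ln applied k times, per the definition in (2)."""
    for _ in range(k):
        if x <= 0:
            raise ValueError("nested logarithm undefined for this (k, x)")
        x = math.log(x)
    return x

def capacity(P):    # C(P) in nats per channel use
    return 0.5 * math.log(1 + P)

def dispersion(P):  # V(P)
    return P * (P + 2) / (2 * (1 + P) ** 2)

def lnM_approx(N, K, P, eps):
    """Right-hand side of (1), ignoring lower-order terms."""
    return (N * capacity(P) / (1 - eps)
            - math.sqrt(N * nested_log(K - 1, N) * dispersion(P) / (1 - eps)))

print(nested_log(3, 1000))   # ~0.659, the K = 4, N = 1000 value quoted in Section III
for K in (2, 3, 4):
    print(K, lnM_approx(1e5, K, P=1.0, eps=1e-3) / 1e5)   # rate in nats per channel use
```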

Notice, however, that the given achievability result provides a lower bound on the benefit of increasing $K$; bounding the benefit from above would require a converse, a topic left to future work. Table I, below, combines the $K = 1$ summary from [16, Table I] with the corresponding results for $K > 1$ to summarize the performance of VLSF codes for the Gaussian channel in different communication scenarios.

In what follows, Section II introduces VLSF codes with $K$ decoding times, and Section III presents our main results and discusses their implications. Proofs are found in Section IV.

II. PROBLEM STATEMENT

A. Notation

For any positive integers $k$ and $n$, $[k] \triangleq \{1, \ldots, k\}$ and $x^n \triangleq (x_1, \ldots, x_n)$. For any $x^n$ and integers $n_1 \le n_2 \le n$, $x_{n_1}^{n_2} \triangleq (x_{n_1}, x_{n_1+1}, \ldots, x_{n_2})$. All-zero and all-one vectors are denoted by $\mathbf{0}$ and $\mathbf{1}$, respectively. The identity matrix of dimension $n$ is denoted by $\mathsf{I}_n$. We denote the minimum of two scalars $a$ and $b$ by $a \wedge b$. The Euclidean norm of vector $x^n$ is denoted by $\|x^n\| \triangleq \sqrt{\sum_{i=1}^{n} x_i^2}$. The $n$-dimensional sphere with radius $r$ is denoted by $\mathcal{S}^n(r) \triangleq \{x^n \in \mathbb{R}^n : \|x^n\| = r\}$. We use $\ln(\cdot)$ to denote the natural logarithm. We measure information in nats. We use the standard $O(\cdot)$ and $o(\cdot)$ notations, i.e., $f(n) = O(g(n))$ if $\limsup_{n \to \infty} |f(n)/g(n)| < \infty$ and $f(n) = o(g(n))$ if $\lim_{n \to \infty} |f(n)/g(n)| = 0$. We denote the distribution of a random variable $X$ by $P_X$, and we write $\mathcal{N}(\mu, \mathsf{V})$ to denote the multivariate Gaussian distribution with mean $\mu$ and covariance matrix $\mathsf{V}$. We use $Q(\cdot)$ to represent the complementary Gaussian cumulative distribution function $Q(x) \triangleq \frac{1}{\sqrt{2\pi}} \int_x^\infty \exp\{-\frac{t^2}{2}\}\, dt$ and $Q^{-1}(\cdot)$ to represent its functional inverse.

The $k$-fold nested logarithm function is defined as
\[
\ln^{(k)}(x) \triangleq
\begin{cases}
\ln(x) & \text{if } k = 1,\ x > 0 \\
\ln(\ln^{(k-1)}(x)) & \text{if } k > 1,\ \ln^{(k-1)}(x) > 0,
\end{cases}
\tag{2}
\]
and undefined for all other $(k, x)$ pairs.

B. Channel Model

The output of a memoryless, point-to-point Gaussian channel in response to input $X^n \in \mathbb{R}^n$ is
\[ Y^n = X^n + Z^n, \tag{3} \]
where $Z_1, \ldots, Z_n$ are $\mathcal{N}(0, 1)$ random variables independent of $X^n$ and of each other.

The channel's capacity and dispersion are
\begin{align}
C(P) &\triangleq \frac{1}{2}\ln(1 + P) \tag{4} \\
V(P) &\triangleq \frac{P(P+2)}{2(1+P)^2}, \tag{5}
\end{align}
respectively. The information density of a channel $P_{Y^n|X^n}$ under input distribution $P_{X^n}$ is defined as
\[ \imath(x^n; y^n) \triangleq \ln \frac{P_{Y^n|X^n}(y^n|x^n)}{P_{Y^n}(y^n)}, \tag{6} \]
where $P_{Y^n}$ is the marginal of $P_{X^n} P_{Y^n|X^n}$.

C. VLSF Codes with K Decoding Times

We consider VLSF codes with a finite number of potential decoding times $n_1 < n_2 < \cdots < n_K$. The receiver chooses to end the transmission at the first time $n_k \in \{n_1, \ldots, n_K\}$ at which it is ready to decode. The transmitter learns of the receiver's decision via a single bit of feedback at each of times $n_1, \ldots, n_k$. Feedback bit "0" at time $n_i$ means that the receiver is not yet ready to decode and the transmitter should continue; feedback bit "1" means that the receiver can decode at time $n_i$ and the transmitter must stop. We impose a maximal power constraint at each possible decoding time as well as average decoding time and average error probability constraints. Definition 1, below, formalizes our code description.

TABLE I
THE PERFORMANCE OF VLSF CODES FOR THE GAUSSIAN CHANNEL IN SCENARIOS DISTINGUISHED BY THE NUMBER OF AVAILABLE DECODING TIMES $K$, THE TYPE OF THE POWER CONSTRAINT, AND THE PRESENCE OF FEEDBACK.

| Regime | Power constraint | First-order term | Second-order term (lower bound) | Second-order term (upper bound) |
| Fixed-length ($K = 1$), no feedback | Max. power | $NC(P)$ | $-\sqrt{NV(P)}\,Q^{-1}(\epsilon)$ ([7], [17]) | $-\sqrt{NV(P)}\,Q^{-1}(\epsilon)$ ([7]) |
| Fixed-length ($K = 1$), no feedback | Ave. power | $NC\big(\frac{P}{1-\epsilon}\big)$ | $-\sqrt{N \ln N\, V\big(\frac{P}{1-\epsilon}\big)}$ ([18]) | $-\sqrt{N \ln N\, V\big(\frac{P}{1-\epsilon}\big)}$ ([18]) |
| Fixed-length ($K = 1$), feedback | Max. power | $NC(P)$ | $-\sqrt{NV(P)}\,Q^{-1}(\epsilon)$ ([7], [17]) | $-\sqrt{NV(P)}\,Q^{-1}(\epsilon)$ ([8]) |
| Fixed-length ($K = 1$), feedback | Ave. power | $NC\big(\frac{P}{1-\epsilon}\big)$ | $-O(\ln^{(K)}(N))$ ([16]) | $+\sqrt{N \ln N\, V\big(\frac{P}{1-\epsilon}\big)}$ ([16]) |
| Variable-length ($K < \infty$) | Max. power | $\frac{NC(P)}{1-\epsilon}$ | $-\sqrt{N \ln^{(K-1)}(N) \frac{V(P)}{1-\epsilon}}$ (Theorem 1) | $+O(1)$ ([11]) |
| Variable-length ($K < \infty$) | Ave. power | $\frac{NC(P)}{1-\epsilon}$ | $-\sqrt{N \ln^{(K-1)}(N) \frac{V(P)}{1-\epsilon}}$ (Theorem 1) | $+O(1)$ ([11]) |
| Variable-length ($K = \infty$) | Max. power | $\frac{NC(P)}{1-\epsilon}$ | $-\sqrt{N \frac{4C(P)\ln J(P)}{1-\epsilon}}$ (Theorem 4) | $+O(1)$ ([11]) |
| Variable-length ($K = \infty$) | Ave. power | $\frac{NC(P)}{1-\epsilon}$ | $-\ln N$ ([11]) | $+O(1)$ ([11]) |

Definition 1: Fix $\epsilon \in (0, 1)$, positive scalars $N$ and $P$, and non-negative integers $n_1 < \ldots < n_K$ and $M$. An $(N, \{n_i\}_{i=1}^{K}, M, \epsilon, P)$ VLSF code comprises

1) a finite alphabet $\mathcal{U}$ and a probability distribution $P_U$ on $\mathcal{U}$ defining a common randomness random variable $U$ that is revealed to both the transmitter and the receiver before the start of the transmission,²

2) a sequence of encoding functions $\mathsf{f}_n : \mathcal{U} \times [M] \to \mathbb{R}$, $n = 1, \ldots, n_K$, that assign a codeword
\[ \mathsf{f}(u, m)^{n_K} \triangleq (\mathsf{f}_1(u, m), \ldots, \mathsf{f}_{n_K}(u, m)) \tag{7} \]
to each message $m \in [M]$ and common randomness instance $u \in \mathcal{U}$. Each codeword satisfies the nested maximal power constraint $P$ on all sub-codewords, giving
\[ \|\mathsf{f}(u, m)^{n_k}\|^2 \le n_k P \quad \forall m \in [M],\ u \in \mathcal{U},\ k \in [K], \tag{8} \]

3) a non-negative integer-valued random stopping time $\tau \in \{n_1, \ldots, n_K\}$ for the filtration generated by $\{U, Y^{n_i}\}_{i=1}^{K}$ that satisfies an average decoding time constraint
\[ \mathbb{E}[\tau] \le N, \tag{9} \]

4) $K$ decoding functions $\mathsf{g}_{n_k} : \mathcal{U} \times \mathbb{R}^{n_k} \to [M]$ for $k \in [K]$, satisfying an average error probability constraint
\[ \mathbb{P}[\mathsf{g}_\tau(U, Y^\tau) \ne W \,|\, X^\tau = \mathsf{f}(U, W)^\tau] \le \epsilon, \tag{10} \]
where the message $W$ is uniformly distributed on the set $[M]$.

Random variable $U$ is common randomness shared by the transmitter and receiver. As in [6], [13], [19], the traditional random-coding argument does not prove the existence of a single (deterministic) code that simultaneously satisfies two conditions on the code (e.g., (9) and (10)). Therefore, randomized codes are necessary for our achievability argument; here, $|\mathcal{U}| \le 2$ suffices [19, Appendix D].

²The realization $u$ of $U$ specifies the codebook.

The average power constraint on length-$n_K$ codewords in Truong and Tan's VLSF code [12, Def. 1] is given by
\begin{align}
\mathbb{E}\big[\|\mathsf{f}(U, W)^{n_K}\|^2\big] &= \frac{1}{M} \sum_{m=1}^{M} \sum_{i=1}^{n_K} \mathbb{E}\big[(\mathsf{f}_i(U, m))^2\big] \tag{11} \\
&\le NP. \tag{12}
\end{align}
As noted in [12, eq. (100)-(103)], for any code with stopping time $\tau$, the expected value on the left-hand side of (11) can be replaced by the expected value $\mathbb{E}\big[\|\mathsf{f}(U, W)^{\tau}\|^2\big]$ because setting the symbols after time-slot $\tau$ to 0 does not affect the average error probability. Taking the expected value of (8) with respect to stopping time $\tau \in \{n_1, \ldots, n_K\}$ and message $W$ shows that any code that satisfies our maximal power constraint (8) also satisfies the average power constraint (12).

We define the maximum achievable message size $M^*(N, K, \epsilon, P)$ with $K$ decoding times and nested maximal power constraint $P$ as
\[ M^*(N, K, \epsilon, P) = \max\{M : \text{an } (N, \{n_i\}_{i=1}^{K}, M, \epsilon, P) \text{ VLSF code exists}\}. \tag{13} \]
An $(N, \{n_i\}_{i=1}^{K}, M, \epsilon, P)_{\mathrm{ave}}$ VLSF code satisfying an average power constraint and its corresponding maximum achievable message size $M^*(N, K, \epsilon, P)_{\mathrm{ave}}$ are defined analogously by replacing the maximal power constraint (8) by the average power constraint (12).

D. Related Work

The following discussion summarizes prior asymptotic expansions of the maximum achievable message size for the Gaussian channel.

1) $M^*(N, 1, \epsilon, P)$: For $K = 1$, $P > 0$, and $\epsilon \in (0, 1)$, Tan and Tomamichel [17, Th. 1] and Polyanskiy et al. [7, Th. 54] show that
\[ \ln M^*(N, 1, \epsilon, P) = NC(P) - \sqrt{NV(P)}\, Q^{-1}(\epsilon) + \frac{1}{2}\ln N + O(1). \tag{14} \]
The converse for (14) is derived in [7, Th. 54] and the achievability for (14) in [17, Th. 1], which generates i.i.d. codewords uniformly distributed on the $n$-dimensional sphere with radius $\sqrt{nP}$ and applies maximum likelihood (ML) decoding. These results imply that random codewords uniformly distributed on a sphere and ML decoding are, together, third-order optimal, meaning that the gap between the achievability and converse bounds in (14) is $O(1)$.

2) $M^*(N, 1, \epsilon, P)_{\mathrm{ave}}$: For $K = 1$ with an average power constraint, Yang et al. show in [18] that
\[ \ln M^*(N, 1, \epsilon, P)_{\mathrm{ave}} = NC\Big(\frac{P}{1-\epsilon}\Big) - \sqrt{N \ln N\, V\Big(\frac{P}{1-\epsilon}\Big)} + O(\sqrt{N}). \tag{15} \]
Yang et al. use a power control argument to show the achievability of (15). They divide the messages into disjoint sets $\mathcal{A}$ and $[M] \setminus \mathcal{A}$, where $|\mathcal{A}| = M(1-\epsilon)(1-o(1))$. For the messages in $\mathcal{A}$, they use an $\big(N, \{N\}, |\mathcal{A}|, \frac{2}{\sqrt{N \ln N}}, \frac{P}{1-\epsilon}(1-o(1))\big)$ VLSF code with a single decoding time $N$. The codewords are generated i.i.d. uniformly on the sphere with radius $\sqrt{N \frac{P}{1-\epsilon}(1-o(1))}$. The messages in $[M] \setminus \mathcal{A}$ are assigned the all-zero codeword. The converse for (15) follows from an application of the meta-converse [7, Th. 26].

3) $M^*(N, \infty, \epsilon, P)_{\mathrm{ave}}$: For VLSF codes with $K = \infty$, $n_i = i - 1$ for all $i$, and average power constraint (12), Truong and Tan show in [11, Th. 1] that for any $\epsilon \in (0, 1)$ and $P > 0$,
\begin{align}
\ln M^*(N, \infty, \epsilon, P)_{\mathrm{ave}} &\ge \frac{NC(P)}{1-\epsilon} - \ln N + O(1) \tag{16} \\
\ln M^*(N, \infty, \epsilon, P)_{\mathrm{ave}} &\le \frac{NC(P)}{1-\epsilon} + \frac{h_b(\epsilon)}{1-\epsilon}, \tag{17}
\end{align}
where $h_b(\epsilon) \triangleq -\epsilon \ln \epsilon - (1-\epsilon)\ln(1-\epsilon)$ is the binary entropy function (in nats). The bounds in (16)–(17) indicate that the $\epsilon$-capacity (the first-order achievable term) is
\[ \liminf_{N \to \infty} \frac{1}{N}\ln M^*(N, K, \epsilon, P) = \frac{C(P)}{1-\epsilon}. \tag{18} \]
The achievable dispersion term is zero, i.e., the second-order term in the fundamental limit in (16)–(17) is $o(\sqrt{N})$. The results in (16)–(17) are analogous to the fundamental limits for DMCs [6, Th. 2] and follow from arguments similar to those in [6]. Since the information density $\imath(X; Y)$ for the Gaussian channel is unbounded, bounding the expected value of the decoding time in the proof of [11, Th. 1] requires different techniques from those applicable to DMCs [6].

III. MAIN RESULT

Our main result is an asymptotic achievability bound for the scenario where $K \ge 2$ decoding times are available.

Theorem 1: Fix a finite integer $K \ge 2$ and real numbers $P > 0$ and $\epsilon \in (0, 1)$. For the Gaussian channel (3) with noise variance 1 and $N$ sufficiently large, the maximum message size (13) achievable by $(N, \{n_i\}_{i=1}^{K}, M, \epsilon, P)$ VLSF codes satisfies
\[ \ln M^*(N, K, \epsilon, P) \ge \frac{NC(P)}{1-\epsilon} - \sqrt{N \ln^{(K-1)}(N) \frac{V(P)}{1-\epsilon}} + O\Bigg(\sqrt{\frac{N}{\ln^{(K-1)}(N)}}\Bigg). \tag{19} \]

Proof: See Section IV-D.

As noted previously, any $(N, \{n_i\}_{i=1}^{K}, M, \epsilon, P)$ VLSF code is also an $(N, \{n_i\}_{i=1}^{\infty}, M, \epsilon, P)_{\mathrm{ave}}$ VLSF code. Therefore, (17) provides an upper bound on $\ln M^*(N, K, \epsilon, P)$. Theorem 1 and the converse bound in (17) together imply that the $\epsilon$-capacity (18) achievable within the class of VLSF codes in Def. 1 is $\frac{C(P)}{1-\epsilon}$. Neither switching from the average to the maximal power constraint nor limiting the number of decoding times to finite $K$ changes the first-order term. In fact, the same $\epsilon$-capacity is achievable by a variable-length code without feedback that decodes at time 0 with probability $\epsilon(1-o(1))$ (see the achievability proof of [6, Th. 1]). However, comparing Theorem 1 and (16), we see that the second-order term of our new achievability bound is significantly worse than the earlier results. Whether this is the consequence of our tighter power constraint and finite $K$ or a weakness of our code construction is a topic for future research.

In the achievability bound in Theorem 1, the order of the second-order term, $-\sqrt{N \ln^{(K-1)}(N) \frac{V(P)}{1-\epsilon}}$, depends on the number of decoding times $K$: as $K$ grows, the achievable rate converges to the capacity more quickly. However, the dependence on $K$ is weak since $\ln^{(K-1)}(N)$ grows very slowly in $N$ even when $K$ is small. For example, for $K = 4$ and $N = 1000$, $\ln^{(K-1)}(N) \approx 0.659$. Furthermore, for $\epsilon = 10^{-3}$, $P = 1$, and $N = 1000$, 83.6% of the $\epsilon$-capacity is achieved with $K = 1$, 85.3% with $K = 2$, 92.2% with $K = 3$, and 95.4% with $K = 4$. The achievability bounds for $K \in [4]$ and the converse bound (17) are illustrated in Fig. 1.

Theorem 1 builds on the following achievability bound for an $\big(N, \{n_i\}_{i=1}^{K}, M, \frac{1}{\sqrt{N \ln N}}, P\big)$ VLSF code.

Theorem 2: Fix an integer $K \ge 1$ and a real number $P > 0$. For the Gaussian channel (3) with noise variance 1 and $N$ sufficiently large, the maximum message size (13) achievable by $\big(N, \{n_i\}_{i=1}^{K}, M, \frac{1}{\sqrt{N \ln N}}, P\big)$ VLSF codes satisfies
\[ \ln M^*\Big(N, K, \frac{1}{\sqrt{N \ln N}}, P\Big) \ge NC(P) - \sqrt{N \ln^{(K)}(N) V(P)} + O\Bigg(\sqrt{\frac{N}{\ln^{(K)}(N)}}\Bigg). \tag{20} \]

Proof: See Section IV-B.

The $K = 1$ case in Theorem 2 is recovered by [20, Th. 7], which investigates the moderate deviations regime in channel coding.

Fig. 1. The achievability (Theorem 1) and converse (17) bounds for the maximum achievable rate $\frac{\ln M^*(N, K, \epsilon, P)}{N}$ are shown for $K = 1$ (14) and $K \in \{2, 3, 4\}$ (Theorem 1), $P = 1$ and $\epsilon = 10^{-3}$. The achievability bounds use the asymptotic approximation, i.e., we ignore the $O(\cdot)$ term in (19).

Theorems 1 and 2 follow from an application of the non-asymptotic achievability bound in Theorem 3, below.

Theorem 3: Fix a constant $\gamma$ and decoding times $n_1 < \cdots < n_K$. For any positive numbers $N, P$, and $\epsilon \in (0, 1)$, there exists an $(N, \{n_i\}_{i=1}^{K}, M, \epsilon, P)$ VLSF code for the Gaussian channel (3) with
\begin{align}
\epsilon &\le \mathbb{P}[\imath(X^{n_K}; Y^{n_K}) < \gamma] + (M-1)\exp\{-\gamma\} + \mathbb{P}\Bigg[\bigcup_{i=1}^{K} \big\{\|X^{n_i}\|^2 > n_i P\big\}\Bigg] \tag{21} \\
N &\le n_1 + \sum_{i=1}^{K-1} (n_{i+1} - n_i)\, \mathbb{P}\Bigg[\bigcap_{j=1}^{i} \{\imath(X^{n_j}; Y^{n_j}) < \gamma\}\Bigg], \tag{22}
\end{align}
where $P_{X^{n_K}}$ is a product of distributions of $K$ sub-vectors of lengths $n_j - n_{j-1}$, $j \in [K]$, i.e.,
\[ P_{X^{n_K}}(x^{n_K}) = \prod_{j=1}^{K} P_{X_{n_{j-1}+1}^{n_j}}(x_{n_{j-1}+1}^{n_j}), \tag{23} \]
where $n_0 = 0$.

Proof: See Section IV-A.

The following result is an achievability bound for an unbounded number of decoding times, i.e., $K = \infty$, under the maximal power constraint (8).

Theorem 4: Fix real numbers $P > 0$ and $\epsilon \in (0, 1)$. For the Gaussian channel (3) with noise variance 1, the maximum achievable message size (13) of an $(N, \{n_i\}_{i=1}^{\infty}, M, \epsilon, P)$ VLSF code satisfies
\[ \ln M^*(N, \infty, \epsilon, P) \ge \frac{NC(P)}{1-\epsilon} - \sqrt{N \frac{4C(P)\ln J(P)}{1-\epsilon}} - \ln N + O(1), \tag{24} \]
where
\[ J(P) \triangleq 27\sqrt{\frac{\pi}{8}}\, \frac{1+P}{\sqrt{1+2P}}. \tag{25} \]

Although $K = \infty$ enables decoding at every time, we prove Theorem 4 using an achievability argument that employs only times $n_i = (i-1)\ell_N$, $i = 1, 2, \ldots$, where $\ell_N = O(\sqrt{N})$.

Proof: See Section IV-F.

While the second-order term $-\sqrt{N \ln^{(K-1)}(N) \frac{V(P)}{1-\epsilon}}$ in Theorem 1 behaves similarly to $-O(\sqrt{N})$ for practical values of $N$, it is worse than $-O(\sqrt{N})$ for any finite $K$. Theorem 4 demonstrates that the second-order term $-O(\sqrt{N})$ is achievable when $K = \infty$.

IV. PROOFS

We prove Theorems 3, 2, 1, and 4 in that order.

A. Proof of Theorem 3

The proof of Theorem 3 extends the arguments of the random coding bound in [6, Th. 3] to the scenario where we allow only $K$ decoding times, and each codeword satisfies the list of maximal power constraints in (8). Variations of Theorem 3 are proved in [9] and [14]. Our encoding function generates codewords of length $n_K$ that comprise $K$ independent sub-codewords. The proof of Theorem 3 extends the analysis in [9, Sec. III] to account for the maximal power constraint (8) on the sub-codewords.

Random code design: Fix some $P_{X^{n_K}}$ such that (23) holds. Define the common randomness random variable $U$ on $\mathbb{R}^{M n_K}$ with the distribution
\[ P_U = \underbrace{P_{X^{n_K}} \times P_{X^{n_K}} \times \cdots \times P_{X^{n_K}}}_{M \text{ times}}. \tag{26} \]
The realization of $U$ defines the $M$ length-$n_K$ codewords $X_1^{n_K}, X_2^{n_K}, \ldots, X_M^{n_K}$.

Decoder design: Define the information density for message $m$ and decoding time $n_k$ as
\[ S_{m, n_k} \triangleq \imath(X_m^{n_k}; Y^{n_k}) \quad \text{for } m \in [M],\ k \in [K]. \tag{27} \]
Denote the set of decoding times by
\[ \mathcal{T} \triangleq \{n_1, n_2, \ldots, n_K\}. \tag{28} \]
The decoder makes a decision at the first decoding time $n_k \in \mathcal{T}$ at which the information density exceeds the threshold, i.e., $S_{m, n_k} \ge \gamma$ for some message $m \in [M]$. If $S_{m, n_t} < \gamma$ for all $m \in [M]$ and $n_t \in \mathcal{T}$, then none of the information densities ever reaches the threshold, and the decision is made at time $n_K$. The decoding time $\tau^*$ is given by
\[
\tau^* \triangleq
\begin{cases}
n_k & \text{if } \exists\, m \in [M] \text{ s.t. } S_{m, n_k} \ge \gamma, \text{ and } S_{m, n_t} < \gamma\ \forall\, m \in [M],\ n_t \in \mathcal{T},\ t \in [k-1] \\
n_K & \text{if } S_{m, n_t} < \gamma\ \forall\, m \in [M],\ n_t \in \mathcal{T}.
\end{cases}
\tag{29}
\]
If there exists some $m \in [M]$ such that $S_{m, \tau^*} \ge \gamma$, then the decoder's output at time $\tau^*$ is
\[ \mathsf{g}_{\tau^*}(U, Y^{\tau^*}) \triangleq \max\{m \in [M] : S_{m, \tau^*} \ge \gamma\}, \tag{30} \]
and otherwise, the decoder outputs an arbitrary message.
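The following Python sketch simulates this threshold rule (our illustration, not the paper's construction): for simplicity it draws i.i.d. $\mathcal{N}(0, P)$ codewords, for which the information density relative to the $\mathcal{N}(\mathbf{0}, (1+P)\mathsf{I}_n)$ output distribution has a simple closed form, whereas the proof below uses sub-codewords drawn uniformly on spheres.

```python
import numpy as np

rng = np.random.default_rng(0)

def info_density(x, y, P):
    """i(x^n; y^n) in nats for the Gaussian channel (3) with X ~ N(0, P) i.i.d.,
    so that P_{Y^n} = N(0, (1 + P) I_n)."""
    n = len(x)
    return n / 2 * np.log(1 + P) + np.sum(y**2 / (2 * (1 + P)) - (y - x)**2 / 2)

def stop_feedback_trial(M, times, gamma, P):
    """Transmit message m = 0; return (decoding time, decoded message) per (29)-(30)."""
    nK = times[-1]
    codebook = rng.normal(0.0, np.sqrt(P), size=(M, nK))
    y = codebook[0] + rng.normal(0.0, 1.0, size=nK)   # channel output for W = 0
    for nk in times:                                   # stop at first time any S_{m,nk} >= gamma
        S = np.array([info_density(codebook[m, :nk], y[:nk], P) for m in range(M)])
        if np.any(S >= gamma):
            return nk, int(np.flatnonzero(S >= gamma).max())   # max such index, cf. (30)
    return nK, 0   # threshold never crossed: arbitrary output at time n_K

nk, m_hat = stop_feedback_trial(M=64, times=(80, 100, 120, 140), gamma=np.log(64) + 20, P=1.0)
print(nk, m_hat == 0)
```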

Expected decoding time analysis: Define the stopping times
\[ \tau_m \triangleq \inf\{n_k \in \mathcal{T} : S_{m, n_k} \ge \gamma\}, \quad \forall m \in [M]. \tag{31} \]
When the set over which we take the infimum is empty, we define the infimum to be equal to $\infty$. For example, in (31), the case $\tau_m = \infty$ signifies that the information density of the message $m$ never reaches the threshold, i.e., $S_{m, n_k} < \gamma$ for all $n_k \in \mathcal{T}$.

The expected decoding time is bounded as
\begin{align}
\mathbb{E}[\tau^*] &= \mathbb{E}\Big[\min_{m \in [M]} \tau_m \wedge n_K\Big] \tag{32} \\
&\le \mathbb{E}[\tau_1 \wedge n_K \,|\, W = 1] \tag{33} \\
&= \sum_{n=0}^{\infty} \mathbb{P}[\tau_1 \wedge n_K > n \,|\, W = 1] \tag{34} \\
&= n_1 + \sum_{i=1}^{K-1} (n_{i+1} - n_i)\, \mathbb{P}[\tau_1 > n_i \,|\, W = 1], \tag{35}
\end{align}
which is equal to the upper bound (22) on the average decoding time.

Error probability analysis: Denote
\begin{align}
\tau &\triangleq \inf\{n_k \in \mathcal{T} : \imath(X^{n_k}; Y^{n_k}) \ge \gamma\} \tag{36} \\
\bar{\tau} &\triangleq \inf\{n_k \in \mathcal{T} : \imath(\bar{X}^{n_k}; Y^{n_k}) \ge \gamma\}, \tag{37}
\end{align}
where $\bar{X}^{n_K}$ is any codeword other than the transmitted $X^{n_K}$ and therefore is independent of $Y^{n_K}$, i.e.,
\[ P_{X^{n_K} \bar{X}^{n_K} Y^{n_K}}(x^{n_K}, \bar{x}^{n_K}, y^{n_K}) = P_{X^{n_K}}(x^{n_K})\, P_{X^{n_K}}(\bar{x}^{n_K})\, P_{Y^{n_K}|X^{n_K}}(y^{n_K}|x^{n_K}). \tag{38} \]

The error probability is bounded as
\begin{align}
\epsilon &\le \mathbb{P}[\mathsf{g}_{\tau^*}(U, Y^{\tau^*}) \ne 1 \,|\, W = 1] \tag{39} \\
&\le \mathbb{P}\Bigg[\bigcup_{m=2}^{M} \{\tau_m \le \tau_1 < \infty\} \cup \{\tau_1 = \infty\} \cup \bigcup_{i=1}^{K} \big\{\|X_1^{n_i}\|^2 > n_i P\big\} \,\Bigg|\, W = 1\Bigg] \tag{40} \\
&\le \mathbb{P}\Bigg[\bigcup_{m=2}^{M} \{\tau_m < \infty\} \,\Bigg|\, W = 1\Bigg] + \mathbb{P}[\tau_1 = \infty \,|\, W = 1] + \mathbb{P}\Bigg[\bigcup_{i=1}^{K} \big\{\|X_1^{n_i}\|^2 > n_i P\big\} \,\Bigg|\, W = 1\Bigg] \tag{41} \\
&\le (M-1)\, \mathbb{P}[\bar{\tau} < \infty] + \mathbb{P}[\tau = \infty] + \mathbb{P}\Bigg[\bigcup_{i=1}^{K} \big\{\|X^{n_i}\|^2 > n_i P\big\}\Bigg], \tag{42}
\end{align}
where (39)–(40) follow from the definition of the decoder in (30). In (40), the event that the codeword for message $W = 1$ violates the power constraint (8) at any decoding time is treated as an error. Inequality (42) follows from the union bound and the fact that given $W = 1$, $\tau_1$ is distributed the same as $\tau$, and $\tau_m$, $m \ne 1$, is distributed the same as $\bar{\tau}$. Using the same arguments as in [6, eq. (111)-(118)], we have
\begin{align}
\mathbb{P}[\bar{\tau} < \infty] &= \mathbb{E}\big[\exp\{-\imath(X^{n_K}; Y^{n_K})\}\, 1\{\tau < \infty\}\big] \tag{43} \\
&= \mathbb{E}\big[\exp\{-\imath(X^{\tau}; Y^{\tau})\}\, 1\{\tau < \infty\}\big] \tag{44} \\
&\le \exp\{-\gamma\}, \tag{45}
\end{align}
where (43) follows from changing measure from $P_{\bar{X}^{n_K} Y^{n_K}}$ to $P_{X^{n_K} Y^{n_K}}$, (44) follows since $\{\exp\{-\imath(X^{n_k}; Y^{n_k})\} : n_k \in \mathcal{T}\}$ is a martingale due to the product distribution in (38), and (45) follows from the definition of $\tau$ in (36). The second probability in (42) is bounded as
\begin{align}
\mathbb{P}[\tau = \infty] &= \mathbb{P}\Bigg[\bigcap_{i=1}^{K} \{\imath(X^{n_i}; Y^{n_i}) < \gamma\}\Bigg] \tag{46} \\
&\le \mathbb{P}[\imath(X^{n_K}; Y^{n_K}) < \gamma], \tag{47}
\end{align}
where (46) follows from (36). Combining (42), (45), and (47) completes the proof of Theorem 3.

B. Proof of Theorem 2

In the proof that follows, we first present three lemmas used in the proof of Theorem 2 (step 1); we then choose the distribution $P_{X^{n_K}}$ of the random codewords (step 2), bound the relevant information density probabilities (step 3), and choose the parameters $n_1, \ldots, n_K, \gamma$ in Theorem 3 (step 4). Finally, we analyze the bounds in Theorem 3 under the chosen distribution and parameters (step 5).

1) Supporting lemmas: Lemma 1, below, is the moderate deviations result that bounds the probability that a sum of $n$ i.i.d. random variables exceeds a function of $n$ that grows at most as quickly as $n^{2/3}$.

Lemma 1 (Petrov [21, Ch. 8, Th. 4]): Let $Z_1, \ldots, Z_n$ be i.i.d. random variables. Let $\mathbb{E}[Z_1] = 0$, $\sigma^2 = \mathrm{Var}[Z_1]$, and $\mu_3 = \mathbb{E}[Z_1^3]$. Suppose that the moment generating function $\mathbb{E}[\exp\{t Z_1\}]$ is finite in a neighborhood of $t = 0$. (This condition is known as Cramér's condition.) Given a nonnegative $z_n = O(n^{1/6})$, as $n \to \infty$ it holds that
\[ \mathbb{P}\Bigg[\sum_{i=1}^{n} Z_i \ge z_n \sigma \sqrt{n}\Bigg] = Q(z_n)\exp\Bigg\{\frac{z_n^3 \mu_3}{6\sqrt{n}\, \sigma^3}\Bigg\} + O\Bigg(\frac{1}{\sqrt{n}}\exp\Bigg\{-\frac{z_n^2}{2}\Bigg\}\Bigg). \tag{48} \]

Lemma 2, below, uniformly bounds the Radon-Nikodym derivative of the channel output distribution in response to the uniform distribution on a sphere compared to the channel output distribution in response to the i.i.d. Gaussian distribution.

Lemma 2 (MolavianJazi and Laneman [22, Prop. 2]): Let $X^n$ be distributed uniformly over the $n$-dimensional sphere with radius $\sqrt{nP}$, $\mathcal{S}^n(\sqrt{nP})$. Let $\bar{X}^n \sim \mathcal{N}(\mathbf{0}, P\mathsf{I}_n)$. Let $P_{Y^n}$ and $P_{\bar{Y}^n}$ denote the channel output distributions in response to $P_{X^n}$ and $P_{\bar{X}^n}$, respectively, where $P_{Y^n|X^n}$ is the point-to-point Gaussian channel (3). Then there exists an $n_0 \in \mathbb{N}$ such that for all $n \ge n_0$ and $y^n \in \mathbb{R}^n$, it holds that
\[ \frac{P_{Y^n}(y^n)}{P_{\bar{Y}^n}(y^n)} \le J(P) \triangleq 27\sqrt{\frac{\pi}{8}}\, \frac{1+P}{\sqrt{1+2P}}. \tag{49} \]

The following lemma gives the asymptotic expansion of the root of an equation; it is used to find the asymptotic expansion for the gap between two consecutive decoding times $n_i$ and $n_{i+1}$.

Lemma 3: Let $f(x)$ be a differentiable increasing function that satisfies $f'(x) \to 0$ as $x \to \infty$. Suppose that
\[ x + f(x) = y. \tag{50} \]
Then, as $x \to \infty$ it holds that
\[ x = y - f(y)(1 - o(1)). \tag{51} \]

Proof of Lemma 3: Define the function $F(x) \triangleq x + f(x) - y$. Applying Newton's method with the starting point $x_0 = y$ yields
\begin{align}
x_1 &= x_0 - \frac{F(x_0)}{F'(x_0)} \tag{52} \\
&= y - \frac{f(y)}{1 + f'(y)} \tag{53} \\
&= y - f(y)\big(1 - f'(y) + O(f'(y)^2)\big). \tag{54}
\end{align}
We have $f'(y) = o(1)$ by assumption. Equality (54) follows from the Taylor series expansion of the function $\frac{1}{1+x}$ around $x = 0$. Let
\[ x^\star = y - f(y)(1 - o(1)). \tag{55} \]
From Taylor's theorem, it follows that
\[ f(x^\star) = f(y) - f'(y_0) f(y)(1 - o(1)) \tag{56} \]
for some $y_0 \in [y - f(y)(1 - o(1)), y]$. Therefore, $f'(y_0) = o(1)$, and $f(x^\star) = f(y)(1 - o(1))$. Putting (55)–(56) in (50), we see that $x^\star$ is a solution to the equality in (50).
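A quick numerical check of Lemma 3 (our illustration; the solver and the choice $f(x) = \sqrt{x}$ are ours): we solve $x + f(x) = y$ by bisection and compare the root with $y - f(y)$.

```python
import math

def solve_root(f, y, iters=60):
    """Find x with x + f(x) = y by bisection; x + f(x) is increasing for increasing f."""
    lo, hi = 0.0, y
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid + f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

f = math.sqrt   # differentiable, increasing, f'(x) -> 0 as x -> infinity
for y in (1e3, 1e6, 1e9):
    x = solve_root(f, y)
    print(y, x, y - f(y))   # (51): the last two columns agree up to o(f(y))
```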

2) Random encoder design: To prove Theorem 2, we choose the distribution of the random codewords, $P_{X^{n_K}}$, in Theorem 3 as follows. Set $n_0 = 0$. For each codeword, we independently draw sub-codewords $X_{n_{j-1}+1}^{n_j}$, $j \in [K]$, from the uniform distribution on the $(n_j - n_{j-1})$-dimensional sphere with radius $\sqrt{(n_j - n_{j-1})P}$, i.e.,
\begin{align}
P_{X^{n_K}}(x^{n_K}) &= \prod_{j=1}^{K} P_{X_{n_{j-1}+1}^{n_j}}(x_{n_{j-1}+1}^{n_j}) \tag{57} \\
P_{X_{n_{j-1}+1}^{n_j}}(x_{n_{j-1}+1}^{n_j}) &= \frac{\delta\Big(\big\|x_{n_{j-1}+1}^{n_j}\big\|^2 - (n_j - n_{j-1})P\Big)}{S_{(n_j - n_{j-1})}\big(\sqrt{(n_j - n_{j-1})P}\big)}, \tag{58}
\end{align}
where $\delta(\cdot)$ is the Dirac delta function, and
\[ S_n(r) = \frac{2\pi^{n/2}}{\Gamma(n/2)} r^{n-1} \tag{59} \]
is the surface area of an $n$-dimensional sphere $\mathcal{S}^n(r)$ with radius $r$. Notice that codewords chosen under the input distribution in (57) never violate the power constraint (8), giving
\[ \mathbb{P}\Bigg[\bigcup_{i=1}^{K} \big\{\|X^{n_i}\|^2 > n_i P\big\}\Bigg] = 0. \tag{60} \]

Our random coding design distributes codewords uniformly at random on the subset of the surface of $\mathcal{S}^{n_K}(\sqrt{n_K P})$ that satisfies the power constraint with equality at each of the dimensions $n_1, \ldots, n_K$; our analysis in [23] shows that for any finite $K$, and sufficiently large increments $n_i - n_{i-1}$ for all $i \in [K]$, using this restricted subset instead of the entire $n_K$-dimensional power sphere results in no change in the asymptotic expansion (14) up to the third-order term.

From Shannon's work in [24], it is well-known that for the Gaussian channel with a maximal power constraint, drawing i.i.d. Gaussian codewords yields a performance inferior to that achieved by the uniform distribution on the power sphere. As a result, almost all tight achievability bounds for the Gaussian channel in the fixed-length regime under a variety of settings (e.g., all four combinations of the maximal/average power constraint and feedback/no feedback [8], [16]–[18] in Table I) employ random codewords drawn uniformly on the power sphere. A notable exception is Truong and Tan's result in (16) [11, Th. 1] for VLSF codes with an average power constraint; that result employs i.i.d. Gaussian inputs. The Gaussian distribution works in this scenario because when $K = \infty$, the usually dominant term $\mathbb{P}[\imath(X^{n_K}; Y^{n_K}) < \gamma]$ in (21) disappears. The second term $(M-1)\exp\{-\gamma\}$ in (21) is not affected by the input distribution. Unfortunately, the approach from [11, Th. 1] does not work here since drawing codewords i.i.d. $\mathcal{N}(0, P)$ satisfies the average power constraint (12) but not the maximal power constraint (8). When $K < \infty$ and the probability $\mathbb{P}[\imath(X^{n_K}; Y^{n_K}) < \gamma]$ dominates, using i.i.d. $\mathcal{N}(0, P)$ inputs achieves a worse second-order term in the asymptotic expansion (19) of the maximum achievable message size. This implies that when $K < \infty$, using our uniform distribution on a subset of the power sphere is superior to choosing codewords i.i.d. $\mathcal{N}(0, P)$ even under the average power constraint. In particular, i.i.d. $\mathcal{N}(0, P)$ inputs achieve (19) with the dispersion $V(P)$ replaced by the variance $\tilde{V}(P) = \frac{P}{1+P}$ of $\imath(X; Y)$ when $X \sim \mathcal{N}(0, P)$; here $\tilde{V}(P)$ is greater than the dispersion $V(P)$ for all $P > 0$ (see [25, eq. (2.56)]). Whether our input distribution is optimal in the second-order term remains an open question.

3) Bounding the probability of the information density random variable: For each $k \in [K]$, we here bound the probability
\[ \mathbb{P}[\imath(X^{n_k}; Y^{n_k}) < \gamma] \tag{61} \]
that appears in Theorem 3 under the input distribution (57). Note that the random variable $\imath(X^{n_k}; Y^{n_k})$ is not a sum of $n_k$ i.i.d. random variables. We wish to apply the moderate deviations result in Lemma 1. To do this, we first bound the probability in (61) as follows. Let $P_{\bar{Y}^{n_k}}$ be $\mathcal{N}(\mathbf{0}, (1+P)\mathsf{I}_{n_k})$. By Lemma 2, we bound (61) as
\begin{align}
\mathbb{P}[\imath(X^{n_k}; Y^{n_k}) < \gamma] &= \mathbb{P}\Bigg[\ln \frac{P_{Y^{n_k}|X^{n_k}}(Y^{n_k}|X^{n_k})}{P_{\bar{Y}^{n_k}}(Y^{n_k})} < \gamma + \ln \frac{P_{Y^{n_k}}(Y^{n_k})}{P_{\bar{Y}^{n_k}}(Y^{n_k})}\Bigg] \tag{62} \\
&\le \mathbb{P}\Bigg[\ln \frac{P_{Y^{n_k}|X^{n_k}}(Y^{n_k}|X^{n_k})}{P_{\bar{Y}^{n_k}}(Y^{n_k})} < \gamma + k \ln J(P)\Bigg], \tag{63}
\end{align}

where $J(P)$ is the constant given in (49), and (63) follows from the fact that $P_{Y^{n_k}}$ is a product of $k$ output distributions with dimensions $n_j - n_{j-1}$, $j \in [k]$, each induced by a uniform distribution over a sphere with the corresponding radius. As argued in [7], [17], [22], [23], by spherical symmetry, the distribution of the random variable
\[ \ln \frac{P_{Y^{n_k}|X^{n_k}}(Y^{n_k}|X^{n_k})}{P_{\bar{Y}^{n_k}}(Y^{n_k})} \tag{64} \]
depends on $X^{n_k}$ only through its norm $\|X^{n_k}\|$. Since $\|X^{n_k}\|^2 = n_k P$ with probability 1, any choice of $x^{n_k}$ such that $\|x^{n_i}\|^2 = n_i P$ for $i \in [k]$ gives
\[ \mathbb{P}\Bigg[\ln \frac{P_{Y^{n_k}|X^{n_k}}(Y^{n_k}|X^{n_k})}{P_{\bar{Y}^{n_k}}(Y^{n_k})} < \gamma + k \ln J(P)\Bigg] = \mathbb{P}\Bigg[\ln \frac{P_{Y^{n_k}|X^{n_k}}(Y^{n_k}|X^{n_k})}{P_{\bar{Y}^{n_k}}(Y^{n_k})} < \gamma + k \ln J(P) \,\Bigg|\, X^{n_k} = x^{n_k}\Bigg]. \tag{65} \]
We set $x^{n_k} = (\sqrt{P}, \sqrt{P}, \ldots, \sqrt{P}) = \sqrt{P}\,\mathbf{1}$ to obtain an i.i.d. sum in (65). Given $X^{n_k} = \sqrt{P}\,\mathbf{1}$, the distribution of $\imath(X^{n_k}; Y^{n_k})$ is the same as the distribution of the sum
\[ \sum_{i=1}^{n_k} A_i \tag{66} \]
of $n_k$ i.i.d. random variables
\[ A_i = C(P) + \frac{P}{2(1+P)}\Big(1 - Z_i^2 + \frac{2}{\sqrt{P}} Z_i\Big), \quad i \in [n_k]. \tag{67} \]
Here $Z_1, \ldots, Z_{n_k}$ are drawn independently from $\mathcal{N}(0, 1)$ (see, e.g., [7, eq. (205)]). The mean and variance of $A_1$ are
\begin{align}
\mathbb{E}[A_1] &= C(P) \tag{68} \\
\mathrm{Var}[A_1] &= V(P). \tag{69}
\end{align}
From (63)–(66), we get
\[ \mathbb{P}[\imath(X^{n_k}; Y^{n_k}) < \gamma] \le \mathbb{P}\Bigg[\sum_{i=1}^{n_k} A_i < \gamma + k \ln J(P)\Bigg]. \tag{70} \]
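As an empirical sanity check of (67)–(70) (our sketch, not part of the proof), the following snippet samples the $A_i$, compares their mean and variance with (68)–(69), and Monte Carlo-estimates a probability of the form on the right-hand side of (70); the threshold is set two standard deviations below the mean, with the $k \ln J(P)$ offset omitted for simplicity.

```python
import numpy as np

P = 1.0
C = 0.5 * np.log(1 + P)                  # capacity (4)
V = P * (P + 2) / (2 * (1 + P) ** 2)     # dispersion (5)

rng = np.random.default_rng(1)
n_k, trials = 200, 20_000
Z = rng.normal(size=(trials, n_k))
A = C + P / (2 * (1 + P)) * (1 - Z**2 + 2 / np.sqrt(P) * Z)   # samples of (67)

print(A.mean(), C)    # (68): empirical mean of A_1 vs C(P)
print(A.var(), V)     # (69): empirical variance of A_1 vs V(P)

gamma = n_k * C - 2.0 * np.sqrt(n_k * V)
print(np.mean(A.sum(axis=1) < gamma))    # near Q(2) ~ 0.023, consistent with Lemma 1
```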

To verify that Lemma 1 is applicable to the right-hand side of (70), it only remains to show that $\mathbb{E}[(A_1 - C(P))^3]$ is finite, and that $A_1 - C(P)$ satisfies Cramér's condition, that is, there exists some $t_0 > 0$ such that $\mathbb{E}[\exp\{t(A_1 - C(P))\}] < \infty$ for all $|t| < t_0$. From (67), $(A_1 - C(P))^3$ is distributed the same as a 6-degree polynomial of the Gaussian random variable $Z \sim \mathcal{N}(0, 1)$. This polynomial has a finite mean since all moments of $Z$ are finite. Let $c \triangleq \frac{P}{2(1+P)}$, $d \triangleq \frac{2}{\sqrt{P}}$, and $t' \triangleq tc$. To show that Cramér's condition holds, we directly compute
\begin{align}
\mathbb{E}[\exp\{t(A_1 - C(P))\}] &= \mathbb{E}\big[\exp\{t'(1 - Z^2 + dZ)\}\big] \tag{71} \\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\Big\{-\frac{x^2}{2} + t'(1 - x^2 + dx)\Big\}\, dx \tag{72} \\
&= \frac{1}{\sqrt{1 + 2t'}} \exp\Big\{t' + \frac{t'^2 d^2}{2(1 + 2t')}\Big\}. \tag{73}
\end{align}
Thus $\mathbb{E}[\exp\{t(A_1 - C(P))\}] < \infty$ for $t' > -\frac{1}{2}$, and $t_0 = \frac{1}{2c} > 0$ together imply that $A_1 - C(P)$ satisfies Cramér's condition.

4) Choosing the decoding times $n_1, \ldots, n_K$: We choose $\gamma, n_1, \ldots, n_K$ so that the equalities
\[ \gamma + K \ln J(P) = n_k C(P) - \sqrt{n_k \ln^{(K-k+1)}(n_k) V(P)} \tag{74} \]
hold for all $k \in [K]$. This choice minimizes the upper bound (22) on the average decoding time (see Section IV-C). Applying Lemma 3 to (74) with
\begin{align}
x &= n_{i+1} \tag{75} \\
y &= n_i - \frac{1}{C(P)}\sqrt{n_i \ln^{(K-i+1)}(n_i) V(P)} \tag{76} \\
f(x) &= -\frac{1}{C(P)}\sqrt{n_{i+1} \ln^{(K-i)}(n_{i+1}) V(P)} \tag{77}
\end{align}
gives the following gaps between consecutive decoding times:
\[ n_{i+1} - n_i = \frac{1}{C(P)}\Big(\sqrt{n_i \ln^{(K-i)}(n_i) V(P)} - \sqrt{n_i \ln^{(K-i+1)}(n_i) V(P)}\Big)(1 + o(1)) \tag{78} \]
for $i \in \{1, \ldots, K-1\}$.
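For intuition, the following sketch (ours, not the paper's) solves (74) numerically by bisection: it anchors $\gamma + K \ln J(P)$ via $n_K$ and then finds $n_1 < \cdots < n_{K-1}$. Note that $\ln^{(4)}(n) > 0$ requires $n > e^{e^e} \approx 3.8 \times 10^6$, so the toy example below uses $K = 3$.

```python
import math

def nested_log(k, x):
    for _ in range(k):
        x = math.log(x)   # assumes n is large enough that each log stays positive
    return x

def C(P): return 0.5 * math.log(1 + P)
def V(P): return P * (P + 2) / (2 * (1 + P) ** 2)

def rhs(n, k, K, P):
    """Right-hand side of (74): n C(P) - sqrt(n ln^{(K-k+1)}(n) V(P))."""
    return n * C(P) - math.sqrt(n * nested_log(K - k + 1, n) * V(P))

def decoding_times(nK, K, P):
    """Solve (74) for n_1 < ... < n_{K-1}, anchoring the common value at n_K."""
    target = rhs(nK, K, K, P)          # equals gamma + K ln J(P) by (74) with k = K
    times = []
    for k in range(1, K):
        lo, hi = 16.0, float(nK)       # rhs increases in n over this range
        for _ in range(200):
            mid = (lo + hi) / 2
            if rhs(mid, k, K, P) < target:
                lo = mid
            else:
                hi = mid
        times.append(round(lo))
    return times + [nK]

print(decoding_times(nK=2000, K=3, P=1.0))   # three increasing decoding times
```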

5) Analyzing the bounds in Theorem 3: For each $k \in [K]$, applying Lemma 1 to (70) with $\gamma, n_1, \ldots, n_K$ satisfying (74) gives
\begin{align}
\mathbb{P}[\imath(X^{n_k}; Y^{n_k}) < \gamma] &\le Q\Big(\sqrt{\ln^{(K-k+1)}(n_k)}\Big)\exp\Bigg\{-\frac{(\ln^{(K-k+1)}(n_k))^{3/2}\, \mu_3(P)}{6\sqrt{n_k}\, V(P)^{3/2}}\Bigg\} + O\Bigg(\frac{1}{\sqrt{n_k}}\exp\Bigg\{-\frac{\ln^{(K-k+1)}(n_k)}{2}\Bigg\}\Bigg) \tag{79} \\
&\le \frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{\ln^{(K-k)}(n_k)}} \frac{1}{\sqrt{\ln^{(K-k+1)}(n_k)}}\Bigg(1 + O\Bigg(\frac{(\ln^{(K-k+1)}(n_k))^{3/2}}{\sqrt{n_k}}\Bigg)\Bigg) \tag{80}
\end{align}
for $k < K$, where
\[ \mu_3(P) \triangleq \mathbb{E}\big[(A_1 - C(P))^3\big] < \infty, \tag{81} \]
and (80) follows from the Taylor series expansion $\exp(x) = 1 + x + O(x^2)$ as $x \to 0$ and the well-known bound [21, e.g., Ch. 8, eq. (2.46)]
\[ Q(x) \le \frac{1}{\sqrt{2\pi}} \frac{1}{x}\exp\Big\{-\frac{x^2}{2}\Big\} \quad \text{for } x > 0. \tag{82} \]
For $k = K$, Lemma 1 gives
\[ \mathbb{P}[\imath(X^{n_K}; Y^{n_K}) < \gamma] \le \frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{n_K}} \frac{1}{\sqrt{\ln n_K}}\Bigg(1 + O\Bigg(\frac{(\ln n_K)^{3/2}}{\sqrt{n_K}}\Bigg)\Bigg). \tag{83} \]

We bound the probability term in (22) from above by $\mathbb{P}[\imath(X^{n_i}; Y^{n_i}) < \gamma]$. Then, by Theorem 3, there exists a VLSF code with $K$ decoding times $n_1 < n_2 < \cdots < n_K$ such that the expected decoding time is bounded as
\[ N \le n_1 + \sum_{i=1}^{K-1} (n_{i+1} - n_i)\, \mathbb{P}[\imath(X^{n_i}; Y^{n_i}) < \gamma]. \tag{84} \]
By (78), we have
\[ n_k = n_1(1 + o(1)) \tag{85} \]
for $k \in [K]$. Plugging (78), (80), and (85) into (84), we get
\[ N \le n_1 + \frac{\sqrt{V(P)}}{\sqrt{2\pi}\, C(P)} \frac{\sqrt{n_1}}{\sqrt{\ln^{(K)}(n_1)}}(1 + o(1)). \tag{86} \]
Applying Lemma 3 to (86), we get
\[ n_1 \ge N - \frac{\sqrt{V(P)}}{2C(P)} \frac{\sqrt{N}}{\sqrt{\ln^{(K)}(N)}} \tag{87} \]
for $n_1$ large enough. Comparing (87) and (78), we observe that for $n_1$ large enough,
\[ n_1 < N < n_2 < \cdots < n_K. \tag{88} \]
Finally, we set message size $M$ such that
\[ \ln M = \gamma - \ln N. \tag{89} \]

Plugging (83) and (89) into (21), we bound the error probability as
\begin{align}
\epsilon &\le \mathbb{P}[\imath(X^{n_K}; Y^{n_K}) < \gamma] + (M-1)\exp\{-\gamma\} \tag{90} \\
&\le \frac{1}{2} \frac{1}{\sqrt{n_K}} \frac{1}{\sqrt{\ln n_K}} + \frac{1}{N} \tag{91} \\
&\le \frac{1}{2} \frac{1}{\sqrt{N}} \frac{1}{\sqrt{\ln N}} + \frac{1}{N}, \tag{92}
\end{align}
where (91) holds for $n_K$ large enough and (92) follows from (88). Inequality (92) implies that $\epsilon \le \frac{1}{\sqrt{N \ln N}}$ for $N$ large enough. Plugging (87) and (89) into (74) with $k = 1$, we conclude that there exists an $(N, \{n_i\}_{i=1}^{K}, M, \frac{1}{\sqrt{N \ln N}}, P)$ VLSF code with
\[ \ln M \ge NC(P) - \sqrt{N \ln^{(K)}(N) V(P)} - \frac{1}{2}\sqrt{\frac{N V(P)}{\ln^{(K)}(N)}} - \ln N - K \ln J(P) \tag{93} \]
for $N$ large enough, which completes the proof.

C. Second-order Optimality of $(n_1, \ldots, n_K)$ in Theorem 2 within the Chosen Code Construction

We here investigate the optimality of our parameter choices in (74). For a fixed $M$ and the given code construction, we would like to minimize the average decoding time subject to the desired error probability. We consider the optimization problem in Theorem 3, that is,
\[
\begin{aligned}
\min \quad & n_1 + \sum_{i=1}^{K-1} (n_{i+1} - n_i)\, \mathbb{P}[\imath(X^{n_i}; Y^{n_i}) < \gamma] \\
\text{s.t.} \quad & \frac{1}{\sqrt{n_K \ln n_K}} = \mathbb{P}[\imath(X^{n_K}; Y^{n_K}) < \gamma] + (M-1)\exp\{-\gamma\}.
\end{aligned}
\tag{94}
\]

We set $n_K$ and $\gamma$ to satisfy the equations
\begin{align}
\frac{1}{\sqrt{n_K \ln n_K}} &= \mathbb{P}[\imath(X^{n_K}; Y^{n_K}) < \gamma](1 + o(1)) \tag{95} \\
\ln M &= \gamma - \ln n_K. \tag{96}
\end{align}
The optimality of the choices in (95)–(96) is discussed later in Section IV-E.

Given the choices in (95)–(96), the problem reduces to minimizing
\[ N(n_1, \ldots, n_{K-1}) = n_1 + \sum_{i=1}^{K-1} (n_{i+1} - n_i)\, \mathbb{P}[\imath(X^{n_i}; Y^{n_i}) < \gamma] \tag{97} \]
for fixed $n_K$ and $\gamma$ under the constraints (95)–(96).

Next, we define the functions
\begin{align}
g(n) &\triangleq \frac{nC(P) - \gamma}{\sqrt{n V(P)}} \tag{98} \\
F(n) &\triangleq Q(-g(n)) = 1 - Q(g(n)) \tag{99} \\
f(n) &\triangleq F'(n) = \frac{1}{\sqrt{2\pi}}\exp\Big\{-\frac{g(n)^2}{2}\Big\}\, g'(n). \tag{100}
\end{align}
Assume that $\gamma = \gamma_n$ is such that $g(n) = O(n^{1/6})$ and $\lim_{n \to \infty} g(n) = \infty$. Then by Lemma 1, the step-wise constant function of $n$, $\mathbb{P}[\imath(X^n; Y^n) \ge \gamma]$, is approximated by the differentiable function $F(n)$ as
\[ \mathbb{P}[\imath(X^n; Y^n) \ge \gamma] = F(n)(1 + o(1)). \tag{101} \]
Taylor series expansions give
\begin{align}
1 - F(n) &= \frac{1}{g(n)} \frac{1}{\sqrt{2\pi}}\exp\Big\{-\frac{g(n)^2}{2}\Big\}(1 + o(1)) \tag{102} \\
f(n) &= (1 - F(n))\, g(n)\, g'(n)(1 + o(1)) \tag{103} \\
g'(n) &= \frac{C(P)}{\sqrt{n V(P)}}(1 + o(1)). \tag{104}
\end{align}

Let $\mathbf{n}^* = (n_1^*, \ldots, n_{K-1}^*)$ denote the solution to the optimization problem in (94). Then $\mathbf{n}^*$ must satisfy the Karush-Kuhn-Tucker conditions $\nabla N(\mathbf{n}^*) = \mathbf{0}$, giving
\begin{align}
\frac{\partial N}{\partial n_1}\bigg|_{\mathbf{n} = \mathbf{n}^*} &= F(n_1^*) - (n_2^* - n_1^*) f(n_1^*) = 0 \tag{105} \\
\frac{\partial N}{\partial n_k}\bigg|_{\mathbf{n} = \mathbf{n}^*} &= F(n_k^*) - F(n_{k-1}^*) - (n_{k+1}^* - n_k^*) f(n_k^*) = 0 \tag{106}
\end{align}

for $k = 2, \ldots, K-1$.

The method of approximating the probability $\mathbb{P}[\imath(X^n; Y^n) \ge \gamma]$ by a differentiable function $F(n)$ is introduced in [15, Sec. III] for the optimization problem in (94). In [15], Heidarzadeh et al. consider $F(n)$ as a proxy for the cumulative distribution function of the decoding time random variable $\tau$. They derive the Karush-Kuhn-Tucker conditions in (105)–(106) and use them to bound the performance of VLSF codes with $K$ decoding times. The channel considered there is the binary erasure channel, and the analysis employs a random linear coding scheme. The analysis in [15] derives the approximation $F(n)$ and uses it to numerically solve the equations (105)–(106) for fixed $K$, $M$, and $\epsilon$. Unlike [15], we find the analytic solution to (105)–(106) for the Gaussian channel within our code construction as decoding times $n_1, \ldots, n_K$ approach infinity, and derive the achievable rate in Theorem 1 as a function of $K$.

Let $\mathbf{n} = (n_1, \ldots, n_{K-1})$ be the decoding times chosen in (74). We evaluate
\begin{align}
g(n_i) &= \sqrt{\ln^{(K-i+1)}(n_i)}\,(1 + o(1)) \tag{107} \\
1 - F(n_i) &= \frac{1}{\sqrt{2\pi}} \frac{1}{g(n_{i+1})\, g(n_i)}(1 + o(1)) \tag{108} \\
f(n_i) &= \frac{1}{\sqrt{2\pi}} \frac{g'(n_i)}{g(n_{i+1})} \tag{109} \\
n_{i+1} - n_i &= \frac{g(n_{i+1})}{g'(n_i)}(1 + o(1)) \tag{110}
\end{align}
for $i = 1, \ldots, K-1$, and
\[ \nabla N(\mathbf{n}) = \Big(1 - \frac{1}{\sqrt{2\pi}},\, -\frac{1}{\sqrt{2\pi}},\, -\frac{1}{\sqrt{2\pi}},\, \ldots,\, -\frac{1}{\sqrt{2\pi}}\Big)(1 + o(1)). \tag{111} \]

Our goal is to find a vector $\Delta\mathbf{n} = (\Delta n_1, \ldots, \Delta n_{K-1})$ such that
\[ \nabla N(\mathbf{n} + \Delta\mathbf{n}) = \mathbf{0}. \tag{112} \]
Assume that $\Delta n = O(\sqrt{n})$. By plugging $\mathbf{n} + \Delta\mathbf{n}$ into (102)–(104) and using the Taylor series expansion of $g(n + \Delta n)$, we get
\begin{align}
1 - F(n + \Delta n) &= (1 - F(n))\exp\{-\Delta n\, g(n) g'(n)\}(1 + o(1)) \tag{113} \\
f(n + \Delta n) &= f(n)\exp\{-\Delta n\, g(n) g'(n)\}(1 + o(1)). \tag{114}
\end{align}

Using (113)–(114) and putting $\mathbf{n} + \Delta\mathbf{n}$ in (105)–(106), we solve (112) as
\begin{align}
\Delta n_1 &= -\frac{\ln\sqrt{2\pi}}{g(n_1)\, g'(n_1)}(1 + o(1)) \tag{115} \\
&= -\frac{\sqrt{V(P)}\, \ln\sqrt{2\pi}}{C(P)} \frac{\sqrt{n_1}}{\sqrt{\ln^{(K)}(n_1)}}(1 + o(1)) \tag{116} \\
\Delta n_i &= \frac{1}{2} \frac{g(n_{i-1})^2}{g(n_i)\, g'(n_i)} = o(\Delta n_1) \tag{117}
\end{align}
for $i = 2, \ldots, K-1$. Hence, $\mathbf{n} + \Delta\mathbf{n}$ satisfies the optimality criterion, and $\mathbf{n}^* = \mathbf{n} + \Delta\mathbf{n}$.

Now it only remains to evaluate the gap $N(\mathbf{n}^*) - N(\mathbf{n})$. We have
\begin{align}
N(\mathbf{n}^*) - N(\mathbf{n}) &= \Bigg(\Delta n_1 + \sum_{i=1}^{K-1} (n_{i+1} - n_i)\, Q(g(n_i))\big(\exp\{-\Delta n_i\, g(n_i) g'(n_i)\} - 1\big)\Bigg)(1 + o(1)) \tag{118} \\
&= \Bigg(\Delta n_1 + \Big(1 - \frac{1}{\sqrt{2\pi}}\Big)\frac{1}{g(n_1)\, g'(n_1)} - \sum_{i=2}^{K-1} \Delta n_i\Bigg)(1 + o(1)) \tag{119} \\
&= -L \frac{\sqrt{n_1}}{\sqrt{\ln^{(K)}(n_1)}}(1 + o(1)), \tag{120}
\end{align}
where $L = \big(\ln\sqrt{2\pi} + \frac{1}{\sqrt{2\pi}} - 1\big)\frac{\sqrt{V(P)}}{C(P)}$ is a positive constant. From the relationship between $n_k$ and $n_1$ in (85) and the equality (120), we get
\[ N(\mathbf{n}) = N(\mathbf{n}^*) + L \frac{\sqrt{N(\mathbf{n}^*)}}{\sqrt{\ln^{(K)}(N(\mathbf{n}^*))}}(1 + o(1)). \tag{121} \]

Plugging (121) into our VLSF achievability bound (93), we get
\[ \ln M \ge N(\mathbf{n}^*) C(P) - \sqrt{N(\mathbf{n}^*) \ln^{(K)}(N(\mathbf{n}^*)) V(P)} - O\Bigg(\sqrt{\frac{N(\mathbf{n}^*)}{\ln^{(K)}(N(\mathbf{n}^*))}}\Bigg). \tag{122} \]
Comparing (122) and (93), we see that the decoding times chosen in (74) have the optimal second-order term in the asymptotic expansion of the maximum achievable message size within our code construction. Moreover, the order of the third-order term in (122) is the same as the order of the third-order term in (93). Whether the proposed code structure is optimal remains an open question.

D. Proof of Theorem 1

Our achievability proof is inspired by Polyanskiy et al.'s coding scheme in [6] for DMCs, where the number of decoding times is unlimited. In [6], to prove an achievability result with an average error probability $\epsilon$ and an average decoding time $N$, the decoder decodes to an arbitrary message at time $n_1 = 0$ with probability $p = \frac{\epsilon - \epsilon_N'}{1 - \epsilon_N'}$. With probability $1 - p$, the code uses another VLSF code with average decoding time $N'$ and average error probability $\epsilon_N' = \frac{1}{N'}$. With the same choice of $\epsilon_N'$, Truong and Tan [11], [12] adapt the coding scheme in [6] to the Gaussian point-to-point and multiple access channels with an unlimited number of decoding times and an average power constraint. Like [6], [11], [12], our coding strategy fixes the smallest decoding time $n_1$ to 0. To achieve the best second-order term within our code construction (see Section IV-E, below), we set
\[ \epsilon_N' \triangleq \frac{1}{\sqrt{N' \ln N'}}. \tag{123} \]

Before the transmission starts, the decoder generates a random variable $D \sim \mathrm{Bernoulli}(p)$ with
\[ p \triangleq \frac{\epsilon - \epsilon_N'}{1 - \epsilon_N'}. \tag{124} \]
If $D = 1$, then the decoder decodes to an arbitrary message at time $n_1 = 0$; otherwise the encoder and decoder use an $(N', \{n_k\}_{k=2}^{K}, M, \epsilon_N', P)$ VLSF code. The average error probability is bounded by
\[ 1 \cdot p + \epsilon_N' \cdot (1 - p) = \epsilon, \tag{125} \]
and the average decoding time is
\[ N = p \cdot 0 + (1 - p) \cdot N' = (1 - p) N'. \tag{126} \]
From (124) and (126), we get the asymptotic expansion
\[ N' = \frac{N}{1-\epsilon}\Bigg(1 + O\Bigg(\frac{1}{\sqrt{N \ln N}}\Bigg)\Bigg). \tag{127} \]

By Theorem 2, there exists an $(N', \{n_k\}_{k=2}^{K}, M, \epsilon_N', P)$ VLSF code with
\[ \ln M = N' C(P) - \sqrt{N' \ln^{(K-1)}(N') V(P)} + O\Bigg(\sqrt{\frac{N'}{\ln^{(K-1)}(N')}}\Bigg). \tag{128} \]
Plugging (127) into (128) and using the Taylor series expansion of the function $\sqrt{N' \ln^{(K-1)}(N')}$ completes the proof.

E. Second-order Optimality of the Parameter $\epsilon_N'$ in Section IV-D within the Chosen Code Construction

In this section, we show that the error value $\epsilon_N'$ chosen in (123) is second-order optimal in the sense that, under the VLSF code design in Section IV-D, the error value $\epsilon_N'^*$ that minimizes the average decoding time in (126) yields the same asymptotic expansion (19) of the maximum achievable message size as $\epsilon_N'$ up to the second-order term.

From the code construction in Section IV-D, Theorem 3, (124), and (126), the average decoding time is
\[ N(n_2, \ldots, n_K, \gamma) = N'(1-\epsilon)\frac{1}{1 - \epsilon_N'}, \tag{129} \]
where
\begin{align}
N' &= n_2 + \sum_{i=2}^{K-1} (n_{i+1} - n_i)\, \mathbb{P}[\imath(X^{n_i}; Y^{n_i}) < \gamma] \tag{130} \\
\epsilon_N' &= \mathbb{P}[\imath(X^{n_K}; Y^{n_K}) < \gamma] + M \exp\{-\gamma\}. \tag{131}
\end{align}

Let $(\mathbf{n}^*, \gamma^*) = (n_2^*, \ldots, n_K^*, \gamma^*)$ be the solution of $\nabla N(\mathbf{n}^*, \gamma^*) = \mathbf{0}$. As in Section IV-C, we use the approximation $\mathbb{P}[\imath(X^{n_i}; Y^{n_i}) < \gamma] = Q(g(n_i))(1 + o(1))$, where $g(n)$ is given in (98). The prior discussion (105)–(106)³ finds the values of $n_2^*, n_3^*, \ldots, n_{K-1}^*$ that minimize $N'$. Minimizing $N'$ also minimizes $N$ in (129) since $\epsilon_N'$ depends only on $n_K$ and $\gamma$, and $\epsilon$ is a constant. Therefore, to minimize $N$, it only remains to find $(n_K^*, \gamma^*)$ such that
\begin{align}
\frac{\partial N}{\partial n_K}\bigg|_{(\mathbf{n}, \gamma) = (\mathbf{n}^*, \gamma^*)} &= 0 \tag{132} \\
\frac{\partial N}{\partial \gamma}\bigg|_{(\mathbf{n}, \gamma) = (\mathbf{n}^*, \gamma^*)} &= 0. \tag{133}
\end{align}

Consider the case $K > 2$. Solving (132) and (133) using (105)–(110) gives
\begin{align}
g(n_K^*) &= \sqrt{\ln n_K^* + \ln^{(2)}(n_K^*) + \ln^{(3)}(n_K^*) + O(1)} \tag{134} \\
0 &= c_0 + N'\bigg(\frac{1}{\sqrt{2\pi}\, n_K^*}\exp\Big\{-\frac{g(n_K^*)^2}{2}\Big\}(1 + o(1)) - M \exp\{-\gamma^*\}\bigg), \tag{135}
\end{align}
where $c_0$ is a positive constant. Solving (134)–(135) simultaneously, we obtain
\[ \ln M = \gamma^* - \ln n_K^* + O(1). \tag{136} \]

³Due to (130), we invoke the proof in Section IV-C with $K-1$ decoding times instead of $K$.

Plugging (134) and (136) into (131), we get
\[ \epsilon_N'^* = \frac{c_1}{\sqrt{n_K^* \ln^{(2)}(n_K^*)}\, \ln n_K^*}(1 + o(1)), \tag{137} \]
where $c_1$ is a constant. Let $(\mathbf{n}, \gamma) = (n_2, \ldots, n_K, \gamma)$ be the parameters chosen in (74) and (89). Note that $\epsilon_N'^*$ is order-wise different from $\epsilon_N'$ in (123). Following steps similar to (118)–(120), we compute
\[ N(\mathbf{n}^*, \gamma^*) - N(\mathbf{n}, \gamma) = -O\Bigg(\frac{\sqrt{n_K^*}}{\sqrt{\ln n_K^*}}\Bigg). \tag{138} \]

Plugging (138) into (19) gives
\[ \ln M = \frac{N(\mathbf{n}^*, \gamma^*) C(P)}{1-\epsilon} - \sqrt{N(\mathbf{n}^*, \gamma^*) \ln^{(K-1)}(N(\mathbf{n}^*, \gamma^*)) \frac{V(P)}{1-\epsilon}} + O\Bigg(\sqrt{\frac{N(\mathbf{n}^*, \gamma^*)}{\ln^{(K-1)}(N)}}\Bigg). \tag{139} \]
Comparing (19) and (139), we see that although (123) and (137) are different, the parameters $(\mathbf{n}, \gamma)$ chosen in (74) and (89) have the same second-order term in the asymptotic expansion of the maximum achievable message size as the parameters $(\mathbf{n}^*, \gamma^*)$ that minimize the average decoding time within our code structure.

For $K = 2$, the summation term in (130) disappears; in this case, the solution to (132) gives
\[ \epsilon_N'^* = \frac{c_2}{\sqrt{n_K^* \ln n_K^*}}(1 + o(1)) \tag{140} \]
for some constant $c_2$. Following the steps in (138)–(139), we conclude that the parameter choices in (74) and (89) are second-order optimal within our code construction for $K = 2$ as well.

F. Proof of Theorem 4

As in the proof of Theorem 1, we first show an achievability result for a vanishing error probability. As in [6], [12], we set the vanishing error probability to be $\frac{1}{N}$.

1) Achievability of VLSF codes with a vanishing error probability:

Lemma 4: Fix $P > 0$. For the Gaussian channel (3) with noise variance 1, it holds that
\[ \ln M^*\Big(N, \infty, \frac{1}{N}, P\Big) \ge NC(P) - \sqrt{N\, 4C(P)\ln J(P)} - \ln N + O(1). \tag{141} \]

Proof of Lemma 4: To prove Lemma 4, we apply Theorem 3 with $K = \infty$ and
\[ \mathcal{T} = \{n_k = k\ell_N : k \in \mathbb{Z}^+\}. \tag{142} \]
Here $\ell_N$ is the gap between consecutive decoding times. That gap is constant within a given code but grows with the average decoding time $N$ for which the code is designed. How $\ell_N$ grows with $N$ is determined later (see (157), below).

Choosing the input distribution: Let the distribution $P_{X^{n_K}}$ of the random codewords be chosen as in (57). In this case, $K = \infty$.

Expected decoding time analysis: Following the steps in (32)–(33), we bound the average decoding time as
\[ \mathbb{E}[\tau^*] \le \mathbb{E}[\tau], \tag{143} \]
where $\tau$ is given in (36). We define the random variable
\[ \xi \triangleq \inf\Big\{k \ge 1 : \frac{1}{\ell_N}\imath(X^{k\ell_N}; Y^{k\ell_N}) > \frac{\gamma}{\ell_N}\Big\} = \frac{\tau}{\ell_N}. \tag{144} \]
To bound $\mathbb{E}[\xi]$, we use the following result from the renewal theory literature.

To bound E [ξ], we use the following result from the renewaltheory literature.

Lemma 5 (Lorden [26, Th. 1]): Let X1, X2, . . . be i.i.d. ran-dom variables with E [X1] = µ > 0 and E

[(max{0, X})2

]=

m < ∞. Let Sn =∑ni=1Xi and τ(γ) = inf{n ∈ N : Sn >

γ}. Then,

supγ>0

E [τ(γ)] ≤ γ

µ+m

µ2. (145)

Note that
\[ \frac{1}{\ell_N}\imath(X^{k\ell_N}; Y^{k\ell_N}) = \sum_{i=1}^{k} B_i \tag{146} \]
is a sum of $k$ i.i.d. random variables $B_1, \ldots, B_k$. The expected value of $B_1$ is bounded as
\begin{align}
\mathbb{E}[B_1] &= \frac{1}{\ell_N}\mathbb{E}\big[\imath(X^{\ell_N}; Y^{\ell_N})\big] \tag{147} \\
&= \frac{1}{\ell_N}\mathbb{E}\Bigg[\ln \frac{P_{Y^{\ell_N}|X^{\ell_N}}(Y^{\ell_N}|X^{\ell_N})}{P_{\bar{Y}^{\ell_N}}(Y^{\ell_N})}\Bigg] \tag{148} \\
&\qquad - \frac{1}{\ell_N}\mathbb{E}\Bigg[\ln \frac{P_{Y^{\ell_N}}(Y^{\ell_N})}{P_{\bar{Y}^{\ell_N}}(Y^{\ell_N})}\Bigg] \tag{149} \\
&\ge C(P) - \frac{1}{\ell_N}\max_{y^{\ell_N} \in \mathbb{R}^{\ell_N}} \ln \frac{P_{Y^{\ell_N}}(y^{\ell_N})}{P_{\bar{Y}^{\ell_N}}(y^{\ell_N})} \tag{150} \\
&\ge C(P) - \frac{1}{\ell_N}\ln J(P), \tag{151}
\end{align}

where $\bar{Y}^{\ell_N}$ is a $\mathcal{N}(\mathbf{0}, (1+P)\mathsf{I}_{\ell_N})$ random vector, and $J(P)$ is given in (49). Inequalities (150) and (151) follow from (68) and Lemma 2, respectively. Similarly,
\[ \mathrm{Var}[B_1] = \frac{V(P)}{\ell_N}(1 + o(1)) \tag{152} \]
for any $\ell_N \to \infty$ as $N \to \infty$. Applying Lemma 5 to (144), where $(X_1, X_2, \ldots)$ and $\gamma$ in Lemma 5 are set to $(B_1, B_2, \ldots)$ and $\frac{\gamma}{\ell_N}$, respectively, we get
\begin{align}
\mathbb{E}[\xi] &\le \frac{\gamma/\ell_N}{\mathbb{E}[B_1]} + \frac{\mathbb{E}[(\max\{0, B_1\})^2]}{(\mathbb{E}[B_1])^2} \tag{153} \\
&\le \frac{\gamma/\ell_N}{C(P) - \frac{1}{\ell_N}\ln J(P)} + 1 + \frac{\mathrm{Var}[B_1]}{(\mathbb{E}[B_1])^2}, \tag{154}
\end{align}
where (154) follows from (151) and the observation that $\mathbb{E}[(\max\{0, B_1\})^2] \le \mathbb{E}[B_1^2]$. Setting
\[ \gamma = NC(P) - \ell_N C(P) - \frac{N}{\ell_N}\ln J(P) + O(1), \tag{155} \]
we get from (144) and (151)–(154) that
\[ \mathbb{E}[\tau] \le N. \tag{156} \]
By (143), $\mathbb{E}[\tau^*]$ is also bounded by $N$. To maximize the right-hand side of (155), we set
\[ \ell_N \triangleq \sqrt{\frac{N \ln J(P)}{C(P)}}. \tag{157} \]

Error probability analysis: Following the steps in (39)–(47), the average error probability is bounded as
\[ \mathbb{P}[\mathsf{g}_{\tau^*}(U, Y^{\tau^*}) \ne W] \le (M-1)\exp\{-\gamma\} + \mathbb{P}\Bigg[\bigcup_{i=1}^{\infty} \big\{\|X^{n_i}\|^2 > n_i P\big\}\Bigg]. \tag{158} \]
Note that, as argued in [6, eq. (109)], $\mathbb{P}[\tau = \infty] = 0$ by (156). Since the random codewords (57) satisfy the maximal power constraint (8) with probability 1, the second term in (158) is 0. Setting
\[ \ln M = \gamma - \ln N \tag{159} \]
bounds the error probability by $\frac{1}{N}$.

Combining (155), (157), and (159) completes the proof of Lemma 4.
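To make (155) and (157) concrete (our numerical illustration; the formulas are the paper's, the chosen numbers are ours), the snippet below computes $\ell_N$ and $\gamma$, and checks that the gap $NC(P) - \gamma$ equals $\sqrt{4NC(P)\ln J(P)}$, the second-order term in (141).

```python
import math

def C(P): return 0.5 * math.log(1 + P)
def J(P): return 27 * math.sqrt(math.pi / 8) * (1 + P) / math.sqrt(1 + 2 * P)   # (25)

P, N = 1.0, 10_000
lN = math.sqrt(N * math.log(J(P)) / C(P))                    # gap between decoding times, (157)
gamma = N * C(P) - lN * C(P) - (N / lN) * math.log(J(P))     # threshold (155), O(1) dropped
print(lN, gamma)
print(N * C(P) - gamma, math.sqrt(4 * N * C(P) * math.log(J(P))))   # equal by the choice of lN
```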

2) Completing the proof of Theorem 4: The strategy is similar to that of the proof of Theorem 1 with the difference that we set
\[ \epsilon_N' \triangleq \frac{1}{N'}. \tag{160} \]
Before the transmission starts, we generate a random variable $D \sim \mathrm{Bernoulli}(p)$ with
\[ p \triangleq \frac{\epsilon - \epsilon_N'}{1 - \epsilon_N'}. \tag{161} \]
If $D = 1$, then the decoder decodes to an arbitrary message at time $n_1 = 0$; otherwise the encoder and decoder use an $(N', \{n_k\}_{k=2}^{\infty}, M, \epsilon_N', P)$ VLSF code. Following steps identical to (125)–(127), we see that the average error probability of the code is bounded by $\epsilon$ and that the average decoding time is $N = N'(1-p)$, or
\[ N' = \frac{N}{1-\epsilon}\Big(1 + O\Big(\frac{1}{N}\Big)\Big). \tag{162} \]
Combining Lemma 4 and (162) completes the proof.

REFERENCES

[1] C. Shannon, "The zero error capacity of a noisy channel," IRE Trans. on Inf. Theory, vol. 2, no. 3, pp. 8–19, Sep. 1956.

[2] M. Horstein, "Sequential transmission using noiseless feedback," IEEE Trans. Inf. Theory, vol. 9, no. 3, pp. 136–143, Jul. 1963.

[3] J. Schalkwijk and T. Kailath, "A coding scheme for additive noise channels with feedback–I: No bandwidth constraint," IEEE Trans. Inf. Theory, vol. 12, no. 2, pp. 172–182, Apr. 1966.

[4] A. B. Wagner, N. V. Shende, and Y. Altuğ, "A new method for employing feedback to improve coding performance," IEEE Trans. Inf. Theory, vol. 66, no. 11, pp. 6660–6681, Nov. 2020.

[5] M. V. Burnashev, "Data transmission over a discrete channel with feedback: Random transmission time," Problems of Information Transmission, vol. 12, no. 4, pp. 10–30, 1976.

[6] Y. Polyanskiy, H. V. Poor, and S. Verdú, "Feedback in the non-asymptotic regime," IEEE Trans. Inf. Theory, vol. 57, no. 8, pp. 4903–4925, Aug. 2011.

[7] Y. Polyanskiy, H. V. Poor, and S. Verdú, "Channel coding rate in the finite blocklength regime," IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2307–2359, May 2010.

[8] S. L. Fong and V. Y. F. Tan, "Asymptotic expansions for the AWGN channel with feedback under a peak power constraint," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Hong Kong, China, Jun. 2015, pp. 311–315.

[9] S. H. Kim, D. K. Sung, and T. Le-Ngoc, "Variable-length feedback codes under a strict delay constraint," IEEE Communications Letters, vol. 19, no. 4, pp. 513–516, Apr. 2015.

[10] Y. Altuğ, H. V. Poor, and S. Verdú, "Variable-length channel codes with probabilistic delay guarantees," in Proc. 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, Sep. 2015, pp. 642–649.

[11] L. V. Truong and V. Y. F. Tan, "On Gaussian MACs with variable-length feedback and non-vanishing error probabilities," arXiv:1609.00594v2, Sep. 2016.

[12] ——, "On Gaussian MACs with variable-length feedback and non-vanishing error probabilities," IEEE Trans. Inf. Theory, vol. 64, no. 4, pp. 2333–2346, Apr. 2018.

[13] K. F. Trillingsgaard, W. Yang, G. Durisi, and P. Popovski, "Common-message broadcast channels with feedback in the nonasymptotic regime: Stop feedback," IEEE Trans. Inf. Theory, vol. 64, no. 12, pp. 7686–7718, Dec. 2018.

[14] A. R. Williamson, T. Chen, and R. D. Wesel, "Variable-length convolutional coding for short blocklengths with decision feedback," IEEE Trans. Commun., vol. 63, no. 7, pp. 2389–2403, Jul. 2015.

[15] A. Heidarzadeh, J. Chamberland, R. D. Wesel, and P. Parag, "A systematic approach to incremental redundancy with application to erasure channels," IEEE Trans. Commun., vol. 67, no. 4, pp. 2620–2631, Apr. 2019.

[16] L. V. Truong, S. L. Fong, and V. Y. F. Tan, "On Gaussian channels with feedback under expected power constraints and with non-vanishing error probabilities," IEEE Trans. Inf. Theory, vol. 63, no. 3, pp. 1746–1765, Mar. 2017.

[17] V. Y. F. Tan and M. Tomamichel, "The third-order term in the normal approximation for the AWGN channel," IEEE Trans. Inf. Theory, vol. 61, no. 5, pp. 2430–2438, May 2015.

[18] W. Yang, G. Caire, G. Durisi, and Y. Polyanskiy, "Optimum power control at finite blocklength," IEEE Trans. Inf. Theory, vol. 61, no. 9, pp. 4598–4615, Sep. 2015.

[19] R. C. Yavas, V. Kostina, and M. Effros, "Random access channel coding in the finite blocklength regime," IEEE Trans. Inf. Theory, Dec. 2020, early access, doi: 10.1109/TIT.2020.3047630.

[20] Y. Polyanskiy and S. Verdú, "Channel dispersion and moderate deviations limits for memoryless channels," in Proc. 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2010, pp. 1334–1339.

[21] V. V. Petrov, Sums of Independent Random Variables. New York, USA: Springer, Berlin, Heidelberg, 1975.

[22] E. MolavianJazi and J. N. Laneman, "A second-order achievable rate region for Gaussian multi-access channels via a central limit theorem for functions," IEEE Trans. Inf. Theory, vol. 61, no. 12, pp. 6719–6733, Dec. 2015.

[23] R. C. Yavas, V. Kostina, and M. Effros, "Gaussian multiple and random access in the finite blocklength regime," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Los Angeles, CA, USA, Jun. 2020, pp. 3013–3018.

[24] C. E. Shannon, "Probability of error for optimal codes in a Gaussian channel," The Bell System Technical Journal, vol. 38, no. 3, pp. 611–656, May 1959.

[25] E. MolavianJazi, "A unified approach to Gaussian channels with finite blocklength," Ph.D. dissertation, University of Notre Dame, Jul. 2014.

[26] G. Lorden, "On excess over the boundary," Ann. Math. Statist., vol. 41, no. 2, pp. 520–527, Apr. 1970.