

Magn Reson Med. 2019;00:1–15. wileyonlinelibrary.com/journal/mrm | 1 © 2019 International Society for Magnetic Resonance in Medicine

Received: 24 January 2019 | Revised: 14 June 2019 | Accepted: 9 July 2019

DOI: 10.1002/mrm.27921

FULL PAPER

Highly undersampled magnetic resonance imaging reconstruction using autoencoding priors

Qiegen Liu¹ | Qingxin Yang¹ | Huitao Cheng²,³ | Shanshan Wang² | Minghui Zhang¹ | Dong Liang²,³

¹Department of Electronic Information Engineering, Nanchang University, Nanchang, China
²Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, P. R. China
³Medical AI Research Center, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, P. R. China

Correspondence
Dong Liang, Paul C. Lauterbur Research Center for Biomedical Imaging, Institute of Biomedical and Health Engineering, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, P. R. China.
Email: [email protected]

Funding information
National Natural Science Foundation of China, Grant/Award Numbers: 61661031 and 61871206; Basic Research Program of Shenzhen, Grant/Award Number: JCYJ20150831154213680; Natural Science Foundation of Jiangxi Province, Grant/Award Number: 20181BAB202003

Purpose: Although recent deep learning methodologies have shown promising results in fast MR imaging, how to exploit them to learn an explicit prior and leverage it within the observation constraint is still desired.

Methods: A denoising autoencoder (DAE) network is leveraged as an explicit prior to address the highly undersampled MR image reconstruction problem. First, inspired by the observation that the prior information learned from high-dimension signals is more effective than that from the low-dimension counterpart in image restoration tasks, we train the network in a multichannel scenario and apply the learned network to single-channel image reconstruction by a variable augmentation technique. Second, because multiple implementations of artificial noise generation in the DAE favor a better underlying result, we introduce a 2-sigma rule so that they complement each other for improving the final reconstruction. The whole algorithm is tackled by proximal gradient descent.

Results: Experimental results under varying sampling trajectories and acceleration factors consistently demonstrate the superiority of the enhanced autoencoding priors, in terms of peak signal-to-noise ratio, structural similarity, and high-frequency error norm.

Conclusion: A simple and effective way to incorporate the DAE prior into highly undersampled MR reconstruction is proposed. Once the DAE prior is obtained, it can be applied to reconstruction tasks with different sampling trajectories and acceleration factors, and achieves superior performance in comparison with state-of-the-art methods.

KEYWORDS
autoencoding priors, image reconstruction, magnetic resonance imaging, multichannel prior, proximal gradient descent

1 | INTRODUCTION

Image reconstruction from highly undersampled k-space data is a classical problem in MRI, which is of great necessity for reducing acquisition time.1 However, as the rate of k-space sampling declines, we may face different challenges, such as noise amplification, blurred object edges, and aliasing artifacts, attributed to underdetermined conditions.

To alleviate these issues, additional constraints and prior knowledge are desired. Introducing explicit priors as good


regularizers were usually used at the early stage, such as Tikhonov regularization.2-4 In the past decade, the development of MR reconstruction methods has mainly come from novel image priors based on compressed sensing and matrix completion.5-15 One important trend in exploiting sparsity and low-rank prior information comes with adaptive schemes, either in a patch-derived matrix or an image-derived convolution formulation.10-15

Very recently, deep learning has been adopted to assist fast imaging and general image restoration problems,16-18 and these methods can be roughly categorized into two types based on the consistency between the training data and testing data. The first category comprises supervised learning approaches, which usually need input-output pairs in the training stage and start from the same type of input data in the testing stage. There are roughly two types of supervised learning methods. One is data-driven, which collects a large training set to train a network that maps the observed data to the ideal reconstruction purely from the perspective of end-to-end learning.19-23 The representative and widely used method of this kind is the convolutional neural network (CNN) MRI.19 Generally, the better the objective functions and network architectures used, the better the reconstruction obtained. The other is model-driven, which originates from conventional sparsity-constrained imaging models. This type of method unrolls the iterative reconstruction process and uses network layers to play the role of iterations.24-28 For example, Sun et al24 adopted the iterative format of the alternating direction method of multipliers to form a network for reconstructing images from undersampled k-space data.

The other category comprises unsupervised learning approaches for fast MR imaging, which aim to learn the probability distribution of the images to be reconstructed by network training. After that, the network-learned image priors are applied to the constrained image reconstruction framework as an explicit constraint.29-31 Recently, Tezcan et al29 trained a variational autoencoder on patches of fully sampled MR images to capture the distribution of patches and used this prior for image reconstruction. Similarly, the strategy was also adopted in the work of Zhang et al30 and Bigdeli et al,31 where networks were trained on noisy versions of fully sampled data and then used for general image restoration tasks. In this work, we further exploit this recent concept of network-learned priors30,31 as a regularization term for single-channel MRI reconstruction. The denoising autoencoder (DAE) is utilized as an effective prior in our iterative reconstruction procedure because of its flexible representation and excellent robustness in image restoration.30,31 The main contributions of this work are as follows:

• To the best of our knowledge, this is the first work to introduce the DAE prior for MRI reconstruction. Unlike recent deep CNN-based methods using an end-to-end learning fashion,19-28 we use network learning as a tool to learn general prior information and incorporate it into the constrained reconstruction framework, whose flowchart is shown in Figure 1. Once the network-learned image prior is obtained, it can be applied to reconstruction tasks with different sampling trajectories and acceleration factors, and can guarantee promising results.

• More important, two advanced strategies are proposed to enhance the naïve DAE prior, termed EDAEP (enhanced DAEP). First, considering that a high-dimension learning strategy may favor a more accurate representation of an image prior, we generate an artificial multichannel training set by means of a variable augmentation technique, learn the prior from network training, and then use it for the original single-channel image reconstruction task. Second, recognizing that the noise distribution is the most important parameter affecting the DAE prior (DAEP), and that different implementations of noise levels may favor different image features, the 2-sigma rule and an averaging technique are jointly utilized to improve the prior robustness.

FIGURE 1 Demonstration of the flowchart of the proposed EDAEPRec model at the training phase and the MRI reconstruction phase

2 | THEORY

2.1 | Review

Compressed sensing (CS) states that an image can be reconstructed from very few linear measurements, as long as it has a sparse representation.1 Therefore, the performance of CS-MRI methods heavily depends on the sparsity prior inherent in MR images, under image manifold or structural constraints. If such a manifold prior J(u) is defined, the image reconstruction model can be given as follows (Equation 1):

where u ∈ C^N is the image to be reconstructed, F_p ∈ C^{Q×N} denotes the partially sampled Fourier encoding matrix, and f ∈ C^Q represents the acquired raw data in k-space. λ is a hyperparameter that balances the prior constraint and the data-fidelity term.

2.2 | DAEP

A DAE is an autoencoder trained to reconstruct a signal that was corrupted with artificial noise.34-37 Previously, Alain and Bengio38 and Nguyen et al39 used DAEs to construct generative models, and pointed out that the output of an optimal DAE is a local mean of the true data density, and that the autoencoder error (the difference between its output and input) is a mean-shift vector.40 Inspired by this observation, Bigdeli et al31 used the magnitude of the autoencoder error as a prior (DAEP) for image restoration. Specifically, denoting the DAE as D_{σ_η} and setting the input image to be u, the network output is D_{σ_η}(u) = D(u + η). Then, the DAE D_{σ_η} is trained to minimize the following formulation (Equation 2):

where the expectation is over all images u and Gaussian noise η with standard deviation σ_η. It should be pointed out that the formulation in Equation 2 resembles the nonlocal total variation and block-matching-and-3D-filtering (BM3D) priors.32,33 A main difference among them is that the network-driven representation in Equation 2 is obtained robustly from a corrupted input and is subsequently used for recovering the originally "clean" solution. In a previous work,38 Alain et al revealed that the network output D_{σ_η}(u) is related to the true data density p(u) as follows (Equation 3):

where g_{σ_η}(η) denotes the Gaussian kernel with standard deviation σ_η. As indicated, the network output D_{σ_η}(u) is a weighted average of images in the neighborhood of the network input u. More important, the autoencoder error D_{σ_η}(u) − u is proportional to the gradient of the log likelihood of the smoothed density,38 that is (Equation 4):

We can see that the autoencoder error vanishes at stationary points, including local extrema, of the true density smoothed by the Gaussian kernel.38 Therefore, the squared magnitude of the autoencoder error can naturally be used as a prior. Besides the basic DAEP described above, more elegant and sophisticated strategies for enhancing the DAEP prior will be introduced later.
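Equations 3 and 4 can be verified numerically on a toy 1-D example. The sketch below is our own illustration (not the paper's code): for a Gaussian data density p, it evaluates the optimal DAE output of Equation 3 by numerical integration and checks that the autoencoder error equals σ_η² times the score of the Gaussian-smoothed density, as Equation 4 states.

```python
import numpy as np

# Toy 1-D check of Equations 3-4 with p = N(mu, s^2) and noise std sigma.
mu, s, sigma = 0.5, 1.0, 0.3

def dae_output(u, half=8.0, n=20001):
    # Equation 3: D(u) = int (u-e) g(e) p(u-e) de / int g(e) p(u-e) de,
    # evaluated by a Riemann sum on a fine uniform grid (the spacing cancels).
    e = np.linspace(-half, half, n)
    g = np.exp(-e**2 / (2 * sigma**2))          # Gaussian kernel g_sigma (unnormalized)
    p = np.exp(-((u - e) - mu)**2 / (2 * s**2)) # data density p (unnormalized)
    w = g * p
    return np.sum((u - e) * w) / np.sum(w)

def smoothed_score(u):
    # d/du log[(g_sigma * p)(u)]: for Gaussian p, the smoothed density is
    # N(mu, s^2 + sigma^2), so the score has a closed form.
    return -(u - mu) / (s**2 + sigma**2)

u0 = 1.7
err = dae_output(u0) - u0                # autoencoder error (left side of Eq. 4)
pred = sigma**2 * smoothed_score(u0)     # right side of Eq. 4
```

Because u0 lies above the mode of the density, the error is negative: the DAE output is pulled back toward regions of higher probability, exactly the mean-shift behavior described above.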

2.3 | DAEP for MRI reconstruction

By taking advantage of the conventional maximum a posteriori framework and the discriminative ability of a deep CNN-induced prior, we integrate the network-induced regularization term with the data-fidelity term. That is, the posterior probability of the desired image is p(u|f) = p(f|u)p(u)∕p(f), and we maximize it by minimizing the associated negative log likelihoods L (Equation 5):

Hence, the reconstruction of the underlying image u can be attained by solving the following objective function (Equation 6):

The reconstruction can be conducted by alternating between a k-space updating step and an image recovery step. A main difference from the classical iterative methods10,12 is that, at the image recovery step, artificial Gaussian noise is first added to the previous intermediate result, and then the DAEP constraint is enforced on it to pursue a more desirable result. The solver of this model, DAEP for MRI reconstruction (DAEPRec), will be detailed later.

\[ \min_{u}\; J(u) + \lambda \left\| F_{p} u - f \right\|^{2} \tag{1} \]

\[ L_{\mathrm{DAE}} = \mathbb{E}_{\eta,u}\!\left[ \left\| u - D_{\sigma_{\eta}}(u) \right\|^{2} \right] \tag{2} \]

\[ D_{\sigma_{\eta}}(u) = \frac{\int (u - \eta)\, g_{\sigma_{\eta}}(\eta)\, p(u - \eta)\, d\eta}{\int g_{\sigma_{\eta}}(\eta)\, p(u - \eta)\, d\eta} \tag{3} \]

\[ D_{\sigma_{\eta}}(u) - u = \sigma_{\eta}^{2}\, \nabla \log \left[ g_{\sigma_{\eta}} * p \right](u) \tag{4} \]

\[ \max_{u}\; p(u \mid f) \;\Leftrightarrow\; \min_{u}\; \left[ L(u) + L(f \mid u) \right] \tag{5} \]

\[ \min_{u}\; \left\| u - D_{\sigma_{\eta}}(u) \right\|^{2} + \lambda \left\| F_{p} u - f \right\|^{2} \tag{6} \]


The success of DAEPRec mainly lies in two properties. First, the more powerful representation ability of the DAEP prior ‖u − D_{σ_η}(u)‖² enables high-quality reconstruction. As stated in earlier works,30,31,41 the image recovery capability induced by the network-driven prior is superior to many state-of-the-art nonlocal patch-based priors such as the BM3D prior. Second, unlike traditional alternating methods that update the solution on the intermediate result directly, DAEPRec updates the solution on the intermediate result plus an artificial noise perturbation. As is well known, in many state-of-the-art methods, the intermediate result in the iterative procedure is prone to be over-smoothed because of the deficiency of priors. The artificially added noise in DAEPRec helps obtain a more robust and more refined version of the intermediate result.

However, there is still room for improving the performance of DAEPRec by modifying the basic DAEP model. First, motivated by the idea of utilizing channel prior information in supervised learning, we train a CNN with 3-channel images as input-output samples to learn the image prior. After the network-driven prior is learned, we use an auxiliary variable augmentation technique to incorporate it into the single-channel image reconstruction task. Second, under the assumption that using different noise levels and different implementations of the generated simulated noise in DAEP may enhance the network representation capability and robustness, we enforce this strategy to improve the performance of reconstruction.

2.4 | Enhanced DAEP: multichannel learning

The core idea of the first modification of the DAEP prior is to extend its discriminative ability. The straightforward way is to extend the information/representation dimension, such as adding a channel dimension to the input. Intuitively, the respective images at different channels of the network input should have similar feature details.

At the network training stage, assuming the channel number of the training image is 3 and denoting the vector variable with 3 channels as U = [u, u, u], the EDAEP prior is defined as follows (Equation 7):

where the network output is D_{σ_η}(U) = D(U + η). As in Equation 2, 3-channel artificial Gaussian noise η is added to U as the network input. After augmenting the channel number of the input data, the data among the 3 channels at the network input (i.e., U + η) have an inherent correlation, while the artificial noise is totally random.

After the network tailored to the 3-channel samples {U | U = [u, u, u]} is trained, we turn to the reconstruction stage for the single-channel image u, as shown in Figure 2. Similar to the variable augmentation operation in the training strategy, we copy and rearrange the single-channel image into 3 identical channels at the reconstruction stage, and thus pave the way to apply the trained 3-channel prior to the tested single-channel image. After processing by the network-driven prior, we average the 3-channel output to get a single-channel variable.
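The copy–apply–average procedure above can be sketched in a few lines. This is a minimal illustration; `net3` is a placeholder for the trained 3-channel DAE, which we do not reproduce here:

```python
import numpy as np

def apply_3channel_prior(u, net3):
    # Variable augmentation at the reconstruction stage: replicate the
    # single-channel image into 3 identical channels, apply the trained
    # 3-channel network-driven prior, then average the 3 output channels
    # back into a single-channel result.
    U = np.stack([u, u, u], axis=-1)   # U = [u, u, u]
    out = net3(U)                      # placeholder for the trained DAE
    return out.mean(axis=-1)
```

With an identity network, the whole operation reduces to the identity on u, which makes the averaging step easy to sanity-check.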

In fact, the principle behind both DAEP and EDAEP is their capability of recovering a signal from a corrupted input. As is well known, the recovery performance largely depends on the corresponding regularization terms.1,8-10 More specifically, the solution, updated iteratively via the data-consistency term and the regularization term, progressively tends to the optimal solution. In the proposed EDAEP model, the channel number of the input samples at the training stage is a crucial factor that contributes to the network learning capability and the subsequent recovery ability. To illustrate the strength of utilizing multiple channels, we compare the capability of the DAEP and EDAEP models for removing additive Gaussian noise at the same standard deviation (i.e., measuring ‖u − D_{σ_η}(u)‖² and ‖U − D_{σ_η}(U)‖² in terms of peak signal-to-noise ratio [PSNR] values). For 12 selected MR test images, the PSNRs obtained by EDAEP are about 2 dB higher than those of DAEP. Specifically, in the case of the standard deviation being 15, the average PSNR values of DAEP and EDAEP are 32.87 and 35.26 dB. Additionally, in the case of the standard deviation being 25, the obtained PSNR values of DAEP and EDAEP are 30.21 and 32.82 dB. In fact, this is consistent with the phenomenon that the prior information learned from multichannel images is more effective than that from the single-channel counterpart in image restoration tasks.43-45 By incorporating the network-derived prior from multichannel images into the reconstruction stage, it is predictable that the overall reconstruction performance will be substantially improved. More supportive theoretical explanations and experimental analyses of the multichannel training strategy are provided in Supporting Information S1.

\[ L_{\mathrm{EDAE}} = \mathbb{E}_{\eta,U}\!\left[ \left\| U - D_{\sigma_{\eta}}(U) \right\|^{2} \right] \tag{7} \]

2.5 | Enhanced DAEP: 2-sigma rule and average technique

One of the most important parameters in DAEPRec is the setting of the simulated Gaussian noise level. In most applications of representation learning, good representations should contain features at different levels of granularity. Denoising autoencoders34-36 provide a particularly natural framework in which to formalize this intuition. In a DAE, the network is trained to reconstruct each data point from a corrupted version. The noise-adding process used to perform the corruption is chosen by the user, and is an important tuning parameter that affects the final representation. In an earlier work,36 Glorot et al noticed that using either too low or too high a level of noise in a DAE harms the learnt representation accuracy. In a previous work,37 Geras and Sutton introduced scheduled denoising autoencoders, based on the intuition that features at different scales can be learned by training the same network at multiple noise levels; the network can thus learn multiscale features with a schedule of gradually decreasing noise levels. Although this idea is promising, training networks in such a "sequential" way is highly time-consuming, and the determination of the noise levels is not easy. By contrast, in this work we adopt a "parallel" way: we learn the networks at two sigma values separately using two workstations. This strategy elegantly circumvents the additional time cost while taking advantage of learning from different levels of artificial corruption.

In the case of simulated noise at different levels, for example with standard deviations σ_{η1} and σ_{η2}: if σ_{η1} is set relatively small, it tends to produce more texture details at the cost of more noise in the results; on the other hand, if σ_{η2} is set relatively large, it tends to produce smoother results. Therefore, we can average them so that they benefit each other. The assumed prior is as follows (Equation 8):

At the high noise level, the training data are highly corrupted, which forces the network to learn more global, coarse-grained features of the data. Meanwhile, at a low noise level, the network is devoted to learning features for reconstructing finer details of the training data. From Equation 8, we can see that the enhanced prior includes a combination of both coarse- and fine-grained features. Besides Gaussian noise with different standard deviations, we can further improve the result by averaging several implementations of the gradient operators.
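In the mean-shift view of Equation 4, the approximate gradient contribution of each DAE term is its autoencoder error, so the averaged prior of Equation 8 can be sketched as averaging two error fields. This is our simplification (the exact gradient would also involve the network Jacobian); `dae_sigma1` and `dae_sigma2` are placeholders for the two separately trained networks:

```python
import numpy as np

def two_sigma_error(u, dae_sigma1, dae_sigma2):
    # Autoencoder errors from the network trained at the smaller noise level
    # (finer texture) and the larger one (coarser, smoother features),
    # averaged in the spirit of Equation 8.
    e1 = u - dae_sigma1(u)
    e2 = u - dae_sigma2(u)
    return 0.5 * (e1 + e2)
```

When both networks act as the identity, the averaged error vanishes, consistent with the error vanishing at stationary points of the smoothed density.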

2.6 | Solver of EDAEPRec

By assembling the discussions in the above two subsections, the final formulation of the EDAEPRec model is as follows (Equation 9):

At the reconstruction stage, the vector is U = [u, u, u]. A noticeable property of our approach is that, even though the gradient maps of the trained network with regard to the 3 channels are not the same, the update can still be calculated and provides promising results. Comparing Equation 6 with Equation 9, it can be seen that DAEPRec is a special case of the presented EDAEPRec. The determination of the balance parameter λ is the same as in classical regularization methods, such as the dictionary-learning-based reconstruction methods.10,12

\[ L_{\mathrm{EDAE}} = \frac{1}{2} \left\{ \mathbb{E}_{\eta,U}\!\left[ \left\| U - D_{\sigma_{\eta 1}}(U) \right\|^{2} \right] + \mathbb{E}_{\eta,U}\!\left[ \left\| U - D_{\sigma_{\eta 2}}(U) \right\|^{2} \right] \right\} \tag{8} \]

\[ \min_{u}\; \frac{1}{2} \left\| U - D_{\sigma_{\eta 1}}(U) \right\|^{2} + \frac{1}{2} \left\| U - D_{\sigma_{\eta 2}}(U) \right\|^{2} + \lambda \left\| F_{p} u - f \right\|^{2} \tag{9} \]

FIGURE 2 Visual illustration of the 3-channel network scheme at the training phase and the auxiliary variable augmentation technique used for the single-channel intermediate image at the iterative reconstruction phase


Similar to the conventional iterative methods in earlier works,10,12 we update the solution in a two-step alternating manner by tackling the data-fidelity term and the regularization term in turn. A detailed description of the solver is provided in Supporting Information S2. In particular, the proximal gradient descent method46 is employed to tackle it.

Algorithm EDAEPRec
1: Initialization: u^0 = F_p^T f, S_1 = F F_p^T f
2: for k = 1, 2, …, K do
3:   while stop criterion not satisfied (loop in m) do
4:     u^{k−1,m} = u^{k−1,m−1} − Mean[∇_U G(U^{k−1,m−1})] / λ
5:   end (while)
6:   u^k = u^{k−1,m+1}
7:   update u^{k+1} by frequency interpolation using S_2 = F u^k
8:   F u(k_x, k_y) = S_2(k_x, k_y) if (k_x, k_y) ∉ Ω; [λ S_1(k_x, k_y) + S_2(k_x, k_y)] / (λ + 1) if (k_x, k_y) ∈ Ω
9: end (for)
10: Output: u^K
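A minimal NumPy sketch of the loop above may clarify the two alternating steps. The network-prior gradient Mean[∇_U G] is abstracted as a user-supplied callable `prior_grad`; this is an assumption of the sketch, not the released code:

```python
import numpy as np

def data_consistency(u, S1, mask, lam):
    # Steps 7-8: merge the current estimate with the retained k-space samples
    # S1 on the sampling set Omega (mask); unsampled frequencies pass through.
    S2 = np.fft.fft2(u)
    Fu = np.where(mask, (lam * S1 + S2) / (lam + 1.0), S2)
    return np.fft.ifft2(Fu)

def edaep_rec(f, mask, prior_grad, lam=1.0, K=5, M=3):
    # Proximal-gradient sketch of Algorithm EDAEPRec.
    # f: undersampled k-space (zeros off the mask); mask: boolean sampling set.
    u = np.fft.ifft2(f)                  # u^0 = F_p^T f (zero-filled recon)
    for _ in range(K):
        for _ in range(M):               # inner loop on the prior term (step 4)
            u = u - prior_grad(u) / lam
        u = data_consistency(u, f, mask, lam)
    return u
```

With a fully sampled mask and a zero prior gradient, the recursion leaves the zero-filled reconstruction unchanged, which is a convenient consistency check.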

3 | METHODS

We evaluated the performance of the proposed method using a variety of sampling schemes with different undersampling factors. Sampling schemes include 2D random sampling, Cartesian sampling with random phase encodings (1D random), and pseudo-radial sampling. Reconstruction results on 31 2D complex-valued MRI data sets of size 256 × 256 are presented. Following many pioneering works on CS-MRI,1,10 the CS data acquisition was simulated by subsampling the 2D discrete Fourier transform of the MR images. The proposed method, EDAEPRec, was compared with the dictionary-learning-based DLMRI10 and fast dictionary learning on classified patches,47 the reference-derived sparse representation method patch-based nonlocal operator (PANO),11 the grouped low-rank-based nonlocal low-rank compressive sensing (NLR-CS) method,48 and the recent CNN-based DC-CNN method.21 The processing units used in this research are an Intel Core i7-7700 CPU and a GeForce Titan XP GPU. For the convenience of reproducible research, code is available at: https://github.com/yqx7150/EDAEPRec/blob/master/version2. Figure 3A displays the fully sampled reconstruction of 3 T2-weighted MRI images from the 31 testing data.

In each given example, the parameters of all compared methods are set to their default values. In our method, the nominal values of the various parameters are set as follows: filter size 3 × 3, filter number 64, and �=1. The setting of the parameter λ is the same as in DLMRI.10 The quality of the reconstruction is quantified by PSNR, structural similarity (SSIM),49 and high-frequency error norm (HFEN).10 Denoting u and ū as the reconstructed image and the ground truth, the PSNR is defined as shown in Equation 10:

Additionally, the SSIM is defined as shown by Equation 11:

\[ \mathrm{PSNR}(u, \bar{u}) = 20 \log_{10} \frac{\mathrm{Max}(\bar{u})}{\left\| u - \bar{u} \right\|_{2}} \tag{10} \]

\[ \mathrm{SSIM}(u, \bar{u}) = \frac{(2\mu_{u}\mu_{\bar{u}} + c_{1})(2\sigma_{u\bar{u}} + c_{2})}{(\mu_{u}^{2} + \mu_{\bar{u}}^{2} + c_{1})(\sigma_{u}^{2} + \sigma_{\bar{u}}^{2} + c_{2})} \tag{11} \]

FIGURE 3 Visual illustration of 3 single-channel testing data and one 3-channel training sample. A, The fully sampled reconstruction of 3 T2-weighted MRI images: Test1, Test2, and Test3. B, Schematic illustration of generating the 3-channel training data for EDAEP learning


where μ_u and σ_u² are the mean and variance of u, and σ_{uū} is the covariance of u and ū. c₁ = (k₁V)² and c₂ = (k₂V)² are constants used to maintain stability, where V is the dynamic range of the pixel values, k₁ = 0.01, and k₂ = 0.03.
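Both metrics can be computed directly from their definitions. The sketch below implements Equation 10 as written (raw ℓ2 error norm, with no per-pixel normalization) and a single-window simplification of Equation 11; note that SSIM is usually computed over local windows and averaged, which we omit here for brevity:

```python
import numpy as np

def psnr(u, ref):
    # Equation 10: 20 * log10( Max(ref) / ||u - ref||_2 )
    return 20.0 * np.log10(np.max(np.abs(ref)) / np.linalg.norm(u - ref))

def ssim_global(u, ref, k1=0.01, k2=0.03):
    # Equation 11 evaluated once over the whole image (global means/variances),
    # a simplification of the usual locally windowed SSIM.
    V = ref.max() - ref.min()                       # dynamic range of pixel values
    c1, c2 = (k1 * V) ** 2, (k2 * V) ** 2
    mu_u, mu_r = u.mean(), ref.mean()
    cov = ((u - mu_u) * (ref - mu_r)).mean()        # covariance of u and ref
    num = (2 * mu_u * mu_r + c1) * (2 * cov + c2)
    den = (mu_u ** 2 + mu_r ** 2 + c1) * (u.var() + ref.var() + c2)
    return num / den
```

As expected, SSIM of an image against itself is 1, and a single perturbed pixel of size 0.1 in a unit-range image gives a PSNR of exactly 20 dB under Equation 10.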

Finally, the HFEN metric is used to quantify the quality of reconstruction of edges and fine features. It uses a rotationally symmetric Laplacian-of-Gaussian (LoG) filter to capture edges, with a kernel of size 15 × 15 pixels and a standard deviation of 1.5 pixels. HFEN is calculated as the ratio between two norms (Equation 12):

3.1 | Training data sets and training convergence

The raw data were acquired using a 3D fast-spin-echo sequence with T2 weighting on a 3.0T whole-body MR system (MAGNETOM TrioTim; Siemens Healthcare, Erlangen, Germany). TR and TE were 2500 and 149 ms, respectively. Seven volunteers participated, with 192 slices per subject. The thickness of each slice was 0.86 mm, the field of view was 220 mm², and the voxel size was 0.9 × 0.9 × 0.9 mm. The number of coils was 12. Coil compression (http://people.eecs.berkeley.edu/~mlustig/Software.html) was first applied to obtain single-coil (i.e., 1-channel) data.42 Then, the 1-channel image was copied into a 3-channel image. An illustration of generating the 3-channel complex-valued MR data is shown in Figure 3B. From the data acquired on the MR system, a training data set of 500 images was used in this work: 400 images for network training and 100 for validation. The proposed method falls into the category of unsupervised learning; similar to other methods, such as the dictionary learning approaches,10,12 it is relatively robust to the choice of training data. Additionally, the naïve DAE in Bigdeli et al31 for natural image restoration was also trained on a data set of several hundred samples. On the other hand, the learned prior contains a number of network parameters whose estimation accuracy depends on the complexity of the training data. Therefore, more diverse data tend to achieve better representation capability for training a large network, at the expense of computational cost. Considering the balance between representation capability and computational cost, a modest training data set and network architecture are adopted in our current setting.
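The 3-channel training-pair generation described above (and depicted in Figure 3B) can be sketched as follows; treating the input as a real-valued array and the noise level as a free parameter are illustrative assumptions of this sketch:

```python
import numpy as np

def make_3channel_sample(img, sigma, rng):
    # Replicate a single-channel image into 3 identical channels (the clean
    # regression target U) and corrupt each channel with independent Gaussian
    # noise of std sigma (the network input), as in the EDAEP training setup.
    U = np.stack([img, img, img], axis=-1)
    noisy = U + rng.normal(0.0, sigma, size=U.shape)
    return noisy, U
```

The clean channels are exact copies (inherent correlation across channels), while the added noise is independent per channel, which is exactly the structure the multichannel prior is meant to exploit.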

The network architecture used for learning a DAE in this work is the residual encoder‐decoder network (RED‐Net),41 as shown in Figure 4. The RED‐Net network consists of 18 layers, including 9 convolutional and 9 deconvolutional layers symmetrically arranged. Shortcuts connect matching convolutional and deconvolutional layers. Each layer is fol-lowed by its rectified linear units (ReLU). The convolution/deconvolution layer is the core building block. The convo-lution kernels are of size 3 × 3. As can be observed from Figure 4A,B, the number of channels at the input and output

(12)HFEN =‖LoG(u)−LoG(u)‖2

F

�‖LoG(u)‖2

F

F I G U R E 4 The training network architectures used in DAEP and EDAEP learning, respectively. The channel number at the input and output layers is 1 in DAEP network and 3 in EDAEP network, respectively

Page 8: Highly undersampled magnetic resonance imaging ... · probability of the desired image is p(u˜f)=p(f˜u)p(u)∕p(f), and we maximize it by minimizing the associate negative log likelihoods


The channel number at the input and output layer is 1 in the DAEP network and 3 in the EDAEP network, whereas the channel number for the remaining layers in both networks is 64. The difference is that single‐channel data were taken as input in the DAEP network and 3‐channel data in the EDAEP network, with artificial Gaussian noise added to each channel. At the reconstruction stage, the intermediate result of the last iteration was duplicated into 3 channels as the input of the EDAEP. Both DAEP and EDAEP handle complex data by concatenating the real and imaginary parts as channels; that is, the real and imaginary components are fed to the network simultaneously. Hence, the input of DAEP is finally converted from the complex space Cm×n into the real space Rm×n×2, and that of EDAEP from Cm×n×3 into Rm×n×6.
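The complex‐to‐real mapping just described (Cm×n → Rm×n×2 for DAEP, Cm×n×3 → Rm×n×6 for EDAEP) amounts to stacking real and imaginary parts as extra channels. A minimal sketch (the function names are ours, not the paper's):

```python
import numpy as np

def complex_to_channels(u):
    """Stack real and imaginary parts as channels:
    C^(m x n) -> R^(m x n x 2), C^(m x n x 3) -> R^(m x n x 6)."""
    u = np.atleast_3d(u)                      # promote (m, n) to (m, n, 1)
    return np.concatenate([u.real, u.imag], axis=-1)

def channels_to_complex(x):
    """Inverse mapping: recombine the real/imaginary channel halves."""
    c = x.shape[-1] // 2
    return x[..., :c] + 1j * x[..., c:]
```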

The residual learning was performed by minimizing the L2 distance between the output layer and the ground truth. We used the Caffe package with an Adam solver, a weight decay of 10^−4, and an initial learning rate of 10^−3. Additionally, the first‐ and second‐moment decay rates were set to β1 = 0.9 and β2 = 0.999. Figure 5 presents the training/validation loss and PSNRs of the DAEP network with respect to σ = 15. The temporal evolution of the 2 losses demonstrates that the network training gradually converges. Additionally, the PSNR value begins to flatten after 4 × 10^5 iterations.
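For reference, one Adam update with the hyperparameters quoted above (learning rate 10^−3, β1 = 0.9, β2 = 0.999, weight decay 10^−4) can be sketched as follows; the eps stabilizer is a standard default we assume, not a value given in the paper:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=1e-4):
    """One Adam update on weights w at step t (1-indexed)."""
    grad = grad + weight_decay * w           # L2 weight decay folded into the gradient
    m = beta1 * m + (1 - beta1) * grad       # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias corrections
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```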

4 | RESULTS

4.1 | Benefits of multichannel learning and 2‐sigma rule

To investigate the structural strength of utilizing multichannel information, Figure 6A depicts the visual profiles of the network. As can be seen, the representative convolutional kernels of the final convolutional layer within the residual block in DAEP tend toward a random distribution, whereas the convolutional kernels in EDAEP contain more structural information. This indicates that, although the artificial noises in the 3 channels are randomly distributed, the 3‐channel image has inherent high‐dimensional similarity characteristics, so joint learning of the 3‐channel noisy images can capture this manifold information efficiently. This partly explains why the discriminative ability of EDAEP is better than that of the naïve DAEP.

With regard to the 2‐sigma rule, the gradient images with σ1 = 15 and σ2 = 25 are demonstrated in Figure 6B. As can be observed, the gradient error image with σ2 = 25 captures more large‐scale structural information, whereas the gradient error image with σ1 = 15 captures more small‐scale detail information. As a result, adopting the 2‐sigma rule gives rise to a performance improvement. Besides, the iterative performance under the 2‐sigma rule is more robust; a visual comparison is shown in Supporting Information Figure S1.
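In spirit, the 2‐sigma rule combines the prior gradients delivered by DAEs trained at two noise levels, so that small‐scale and large‐scale information complement each other. A minimal sketch, using Gaussian smoothers as stand‐ins for the trained networks and equal weights as an illustrative assumption:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def prior_gradient_2sigma(u, dae_small, dae_large, w_small=0.5, w_large=0.5):
    """Combine the DAE residuals DAE(u) - u from two noise levels.
    Each residual points toward the learned image manifold; the equal
    weights are illustrative, not the paper's exact weighting."""
    g_small = dae_small(u) - u   # small-sigma network: fine-scale detail
    g_large = dae_large(u) - u   # large-sigma network: large-scale structure
    return w_small * g_small + w_large * g_large

# Stand-in "denoisers" for the demo only: Gaussian smoothers,
# NOT the trained RED-Net DAEs used in the paper.
def dae_sigma15(u):
    return gaussian_filter(u, sigma=1.0)

def dae_sigma25(u):
    return gaussian_filter(u, sigma=2.0)
```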

4.2 | Comparisons under different sampling ratios

The comparison performance under different sampling ratios was investigated. Figure 7A involves pseudo radial sampling of k‐space data under acceleration factors R = 3.3, 4, 5, and 10. Table 1 exhibits the PSNR, SSIM, and HFEN results of the competing methods. It can be observed that EDAEPRec achieves the best values for almost all sampling ratios and all test images. Additionally, the superiority becomes more significant at higher undersampling ratios. This phenomenon indicates that our proposed regularization is more effective in severely ill‐posed circumstances. The supervised learning method DC‐CNN ranks second at the R = 4 and R = 5 acceleration factors. However, its performance degrades quickly at acceleration factor R = 10.

F I G U R E 5 The loss and PSNR (dB) values in the EDAEP training process



Figure 7B shows the reconstruction comparisons on image Test1. We can see that the reconstructions with EDAEPRec exhibit better detail preservation than those with DLMRI and PANO and are almost devoid of aliasing artifacts. In contrast, the low‐rank–based NLR‐CS method suffers from oversmoothing deficiency, and the CNN‐based DC‐CNN method still exhibits some artifacts. Moreover, we present an enlarged area to reveal the structures and fine details preserved by each method. As can be observed in the zoom‐in regions enclosed by the green box, only the proposed EDAEPRec method successfully preserves the vertical line‐like pattern. More visual results are detailed in Supporting Information Figure S2.

F I G U R E 6 Some properties of DAEPRec and EDAEPRec. A, Representative convolutional kernels and feature maps in DAEP and EDAEP. B, Illustration of the intermediate results in the procedure of utilizing the 2‐sigma rule

F I G U R E 7 Reconstruction results of competing methods at different acceleration factors



TABLE 1 Average PSNR, SSIM, and HFEN values (mean ± std) of reconstructing 31 test images (a) at radial sampling trajectories with different acceleration factor R and (b) at different sampling trajectories with the same percentage

(a)
                 DLMRI             PANO              FDLCP             NLR‐CS            DC‐CNN            DAEPRec           EDAEPRec
R = 3.3   PSNR   33.43 (±0.87)     34.64 (±1.26)     34.89 (±1.20)     35.31 (±1.48)     35.71 (±1.38)     34.04 (±1.00)     35.62 (±1.16)
          SSIM   0.9054 (±0.0101)  0.9152 (±0.0161)  0.9135 (±0.0170)  0.9099 (±0.2333)  0.9234 (±0.0176)  0.9111 (±0.0122)  0.9279 (±0.0130)
          HFEN   0.63 (±0.0544)    0.56 (±0.0799)    0.50 (±0.0670)    0.47 (±0.0778)    0.44 (±0.0659)    0.62 (±0.0589)    0.42 (±0.0676)
R = 4     PSNR   32.41 (±0.87)     33.65 (±1.23)     34.04 (±1.18)     34.35 (±1.43)     34.07 (±1.26)     33.21 (±0.98)     34.49 (±1.15)
          SSIM   0.8866 (±0.0113)  0.8995 (±0.0180)  0.8980 (±0.0196)  0.8938 (±0.0256)  0.8992 (±0.0256)  0.8973 (±0.0135)  0.9151 (±0.0146)
          HFEN   0.84 (±0.0852)    0.73 (±0.0999)    0.62 (±0.0821)    0.61 (±0.0950)    0.69 (±0.0950)    0.76 (±0.0695)    0.64 (±0.0822)
R = 5     PSNR   31.21 (±0.83)     32.44 (±1.16)     32.97 (±1.16)     33.32 (±1.24)     32.68 (±1.10)     32.29 (±0.96)     33.49 (±1.14)
          SSIM   0.8602 (±0.0135)  0.8777 (±0.0194)  0.8770 (±0.224)   0.8812 (±0.0243)  0.8791 (±0.0243)  0.8797 (±0.0150)  0.8990 (±0.0167)
          HFEN   1.10 (±0.0933)    0.96 (±0.1217)    0.80 (±0.1037)    0.79 (±0.1119)    0.95 (±0.1119)    0.94 (±0.0840)    0.79 (±0.1015)
R = 10    PSNR   27.39 (±0.98)     28.58 (±1.00)     29.34 (±1.13)     29.51 (±1.16)     28.39 (±0.93)     29.65 (±0.94)     30.30 (±1.17)
          SSIM   0.7444 (±0.0288)  0.7805 (±0.0254)  0.7856 (±0.0333)  0.7845 (±0.0243)  0.7710 (±0.0243)  0.8160 (±0.0213)  0.8319 (±0.0254)
          HFEN   2.18 (±0.2198)    1.90 (±0.2014)    1.60 (±0.2008)    1.65 (±0.2140)    1.93 (±0.2140)    1.56 (±0.1469)    1.40 (±0.1842)

(b)
                               DLMRI             PANO              FDLCP             NLR‐CS            DC‐CNN            DAEPRec           EDAEPRec
R = 6.7, 2D Random      PSNR   27.63 (±0.98)     29.12 (±1.06)     30.14 (±1.19)     30.34 (±1.21)     28.78 (±1.02)     29.90 (±0.91)     30.68 (±1.20)
                        SSIM   0.7518 (±0.0257)  0.7964 (±0.0243)  0.8004 (±0.0328)  0.8087 (±0.0327)  0.7873 (±0.0327)  0.8232 (±0.0201)  0.8433 (±0.0258)
                        HFEN   2.02 (±0.1849)    1.77 (±0.1976)    1.44 (±0.1890)    1.46 (±0.1992)    1.83 (±0.1992)    1.49 (±0.1312)    1.31 (±0.1921)
R = 6.7, Pseudo Radial  PSNR   29.36 (±0.99)     30.60 (±1.12)     31.31 (±1.15)     31.35 (±1.05)     30.57 (±1.04)     30.99 (±0.96)     32.00 (±1.19)
                        SSIM   0.8103 (±0.0243)  0.8372 (±0.0231)  0.8391 (±0.0275)  0.8494 (±0.0232)  0.8348 (±0.0232)  0.8512 (±0.0177)  0.8716 (±0.0213)
                        HFEN   1.58 (±0.1991)    1.37 (±0.1694)    1.13 (±0.1463)    1.17 (±0.1341)    1.38 (±0.1341)    1.22 (±0.1102)    1.05 (±0.1470)
R = 6.7, 1D Cartesian   PSNR   26.50 (±1.08)     27.51 (±0.98)     27.91 (±1.07)     28.23 (±1.10)     27.05 (±0.89)     28.67 (±1.20)     28.85 (±1.32)
                        SSIM   0.7390 (±0.0416)  0.7683 (±0.0320)  0.7776 (±0.0335)  0.7798 (±0.0361)  0.7506 (±0.0361)  0.8012 (±0.0304)  0.8041 (±0.0348)
                        HFEN   2.51 (±0.2223)    2.28 (±0.2214)    2.15 (±0.2448)    2.03 (±0.2391)    2.44 (±0.2391)    1.86 (±0.2526)    1.81 (±0.2807)



4.3 | Comparisons under different sampling patterns

The performance under different sampling masks was also investigated. Figure 8A involves variable‐density 2D random sampling, pseudo radial sampling, and Cartesian sampling under the same acceleration factor R = 6.7 (i.e., acquiring only 15% of the k‐space data). The PSNR and SSIM performance is reported in Table 1. It can be observed that the measures of both NLR‐CS and EDAEPRec are better than those of DLMRI, PANO, and DC‐CNN.

Visually, reconstruction results at R = 6.7 under 2D random and 1D Cartesian undersampling are depicted in Figure 8B (more results are provided in Supporting Information Figure S3). As can be seen in the first row, there are no obvious blurring or artifacts induced by the proposed EDAEPRec method. Furthermore, the regions enclosed by the green boxes were zoomed in and are presented at the bottom right corners of the images. EDAEPRec provides a better reconstruction of object edges and finer textures than the other competing methods.

The reconstruction results at R = 6.7 under 1D Cartesian undersampling are shown in the second row. In this case, given that very few high‐frequency k‐space data were sampled, the reconstructed results are much worse than their counterparts under 2D random sampling. As can be observed, the results obtained by DLMRI, PANO, and DC‐CNN suffer from severe blurring along the horizontal direction, whereas EDAEPRec alleviates the blurring effect greatly and still provides sharp edges.

To better illustrate the gain of EDAEPRec over the naïve DAEPRec, a visual plot of their PSNR values in Table 1 is depicted in Supporting Information Figure S4, where it can be observed that EDAEPRec consistently outperforms DAEPRec.

5 | DISCUSSION

F I G U R E 8 Reconstruction results of various methods from different trajectories at the same acceleration factor R = 6.7

As stated in the Introduction section, the introduction of higher‐dimensional training data is the key innovation of our method. Table 2 lists the performance of EDAEPRec with varying channel numbers of multichannel training data. It can be seen that the performance indeed improves with higher‐dimension training data. At acceleration factors R = 4 and 5, the PSNR/SSIM values obtained by the 2‐channel network are inferior to those of the 3‐channel network, and those of the 3‐channel network are in turn inferior to those of the 4‐channel network. However, the improvement gained from 3‐channel to 4‐channel training is marginal. Moreover, in the case of R = 10, the reconstruction PSNR/SSIM values obtained with the 4‐channel network are lower than those with the 3‐channel network. This phenomenon indicates that as the problem becomes more ill‐posed, the strength of enforcing a high‐dimensional prior decreases. Therefore, we chose 3 as the default channel number in this work. It is worth noting that, on graphics processing unit (GPU) hardware, the computational time of DAEPRec and EDAEPRec for reconstructing an image of size 256 × 256 at one iteration is 0.10 and 0.23 seconds, respectively.

T A B L E 2 Performance of EDAEPRec for 31 test images with varying channel number of multichannel training data and varying 2‐sigma values

Channel number R = 4 R = 5 R = 10

2‐channel 34.29 (±1.09) 33.30 (±1.08) 30.31 (±1.10)

0.9129 (±0.0140) 0.8967 (±0.0161) 0.8327 (±0.0235)

0.65 (±0.0744) 0.80 (±0.0931) 1.39 (±0.1638)

3‐channel 34.49 (±1.15) 33.49 (±1.14) 30.33 (±1.17)

0.9151 (±0.0146) 0.8990 (±0.0167) 0.8329 (±0.0254)

0.64 (±0.0822) 0.79 (±0.1015) 1.30 (±0.1842)

4‐channel 34.55 (±1.19) 33.52 (±1.19) 30.07 (±1.27)

0.9155 (±0.0151) 0.8992 (±0.0176) 0.8249 (±0.0286)

0.64 (±0.0877) 0.80 (±0.1123) 1.47 (±0.2214)

(σ1, σ2) R = 4 R = 5 R = 10

(10, 20) 34.62 (±1.25) 33.58 (±1.26) 29.95 (±1.34)

0.9160 (±0.0158) 0.8995 (±0.0185) 0.8217 (±0.0302)

0.65 (±0.0951) 0.81 (±0.1228) 1.50 (±0.2408)

(10, 25) 34.57 (±1.19) 33.57 (±1.17) 30.25 (±1.26)

0.9157 (±0.0151) 0.8998 (±0.0171) 0.8300 (±0.0276)

0.64 (±0.0866) 0.79 (±0.1072) 1.42 (±0.2120)

(10, 30) 34.41 (±1.15) 33.42 (±1.14) 30.34 (±1.13)

0.9140 (±0.0149) 0.8978 (±0.0169) 0.8325 (±0.0241)

0.65 (±0.0825) 0.80 (±0.1016) 1.39 (±0.1700)

(15, 20) 34.59 (±1.20) 33.57 (±1.20) 30.05 (±1.28)

0.9160 (±0.0152) 0.8998 (±0.0177) 0.8246 (±0.0288)

0.64 (±0.0890) 0.79 (±0.1133) 1.47 (±0.2245)

(15, 25) 34.49 (±1.15) 33.49 (±1.14) 30.30 (±1.17)

0.9151 (±0.0146) 0.8990 (±0.0167) 0.8319 (±0.0254)

0.64 (±0.0822) 0.79 (±0.1015) 1.40 (±0.1842)

(15, 30) 34.33 (±1.10) 33.34 (±1.09) 30.29 (±1.10)

0.9133 (±0.0142) 0.8972 (±0.0162) 0.8323 (±0.0236)

0.65 (±0.0765) 0.80 (±0.0957) 1.40 (±0.1674)

(20, 20) 34.47 (±1.16) 33.46 (±1.16) 30.07 (±1.23)

0.9149 (±0.0147) 0.8988 (±0.0171) 0.8258 (±0.0273)

0.64 (±0.0835) 0.79 (±0.1066) 1.46 (±0.2084)

(20, 25) 34.35 (±1.10) 33.36 (±1.10) 30.14 (±1.16)

0.9137 (±0.0141) 0.8977 (±0.0162) 0.8287 (±0.0251)

0.64 (±0.0773) 0.80 (±0.0970) 1.44 (±0.1892)

(20, 30) 34.28 (±1.09) 33.30 (±1.09) 30.27 (±1.09)

0.9127 (±0.0141) 0.8965 (±0.0162) 0.8313 (±0.0235)

0.65 (±0.0749) 0.80 (±0.0941) 1.40 (±0.1642)

Average PSNR, SSIM, and HFEN values (mean ± std) are recorded.

Page 13: Highly undersampled magnetic resonance imaging ... · probability of the desired image is p(u˜f)=p(f˜u)p(u)∕p(f), and we maximize it by minimizing the associate negative log likelihoods

| 13LIU et aL.

For the test image Test2, we investigated the 2‐sigma rule for EDAEPRec by varying the values of the standard deviations (σ1, σ2). As can be observed in Table 2, the achieved PSNR value is most striking at the point (15, 25). On the other hand, the performance is very predictable; that is, whether σ1 is 10, 15, or 20, the PSNR value increases from σ2 = 20 to 25 and decreases from σ2 = 25 to 30. Therefore, σ1 = 15 and σ2 = 25 were chosen in all the examples. Moreover, the sensitivity of EDAEPRec to filter number, filter size, and training strategy is examined in Supporting Information Tables S1‐S3, respectively.

6 | CONCLUSIONS

In this work, we presented a new and enhanced DAE prior for MRI reconstruction from undersampled k‐space data. To pursue an optimal prior, besides introducing the basic DAE, we further enhanced it into an advanced and powerful prior by imposing higher‐dimensional structural information through multichannel learning. Additionally, a 2‐sigma rule was introduced so that the two noise levels complement each other in improving the final reconstruction. Both qualitative and quantitative experimental results verified its superior performance under a variety of sampling trajectories and acceleration factors.

ACKNOWLEDGMENTS

The authors sincerely thank the anonymous reviewers for their valuable comments and constructive suggestions, which were very helpful in improving this article. This work was supported in part by the National Natural Science Foundation of China (61871206, 61661031), the Basic Research Program of Shenzhen (JCYJ20150831154213680), and the Natural Science Foundation of Jiangxi Province (20181BAB202003).

ORCID

Qiegen Liu  https://orcid.org/0000-0003-4717-2283 Shanshan Wang  https://orcid.org/0000-0002-0575-6523 Dong Liang  https://orcid.org/0000-0003-0131-2519

REFERENCES

1. Lustig M, Donoho D, Pauly JM. Sparse MRI: the application of compressed sensing for rapid MR imaging. Magn Reson Med. 2007;58:1182–1195.

2. Lin FH, Kwong KK, Belliveau JW, Wald LL. Parallel imaging reconstruction using automatic regularization. Magn Reson Med. 2004;51:559–567.

3. Qu P, Luo J, Zhang B, Wang J, Shen GX. An improved iterative SENSE reconstruction method. Magn Reson Eng. 2007;31:44–50.

4. Ying L, Xu D, Liang ZP. On Tikhonov regularization for image reconstruction in parallel MRI. In Proceedings of the 26th Annual International Conference of the IEEE EMBS, San Francisco, CA, 2004. pp. 1056–1059.

5. Block KT, Uecker M, Frahm J. Undersampled radial MRI with multiple coils. Iterative image reconstruction using a total variation constraint. Magn Reson Med. 2007;57:1086–1098.

6. Ma S, Yin W, Zhang Y, Chakraborty A. An efficient algorithm for compressed MR imaging using total variation and wavelets. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, AK, 2008. pp. 1–8.

7. Akçakaya M, Nam S, Hu P, et al. Compressed sensing with wavelet domain dependencies for coronary MRI: a retrospective study. IEEE Trans Med Imaging. 2011;30:1090–1099.

8. Knoll F, Bredies K, Pock T, Stollberger R. Second order total generalized variation (TGV) for MRI. Magn Reson Med. 2011;65:480–491.

9. Liang D, Wang H, Chang Y, Ying L. Sensitivity encoding reconstruction with nonlocal total variation regularization. Magn Reson Med. 2011;65:1384–1392.

10. Ravishankar S, Bresler Y. MR image reconstruction from highly undersampled k‐space data by dictionary learning. IEEE Trans Med Imaging. 2011;30:1028–1041.

11. Qu X, Hou Y, Lam F, Guo D, Zhong J, Chen Z. Magnetic resonance image reconstruction from undersampled measurements using a patch‐based nonlocal operator. Med Image Anal. 2014;18:843–856.

12. Liu Q, Yang K, Luo J, Zhu Y, Liang D. Highly undersampled magnetic resonance imaging reconstruction using two‐level Bregman method with dictionary updating. IEEE Trans Med Imaging. 2013;32:1290–1301.

13. Xiong J, Liu Q, Wang Y, Xu X. A two‐stage convolutional sparse prior model for image restoration. J Vis Commun Image R. 2017;48:268–280.

14. Liu Q, Leung H. Synthesis‐analysis deconvolutional network for compressed sensing. In Proceedings of the 24th IEEE International Conference on Image Processing (ICIP), Beijing, China, 2017. pp. 1940–1944.

15. He J, Liu Q, Christodoulou AG, Ma C, Lam F, Liang ZP. Accelerated high‐dimensional MR imaging with sparse sampling using low‐rank tensors. IEEE Trans Med Imaging. 2016;32:2119–2129.

16. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016. pp. 770–778.

17. Bae W, Yoo J, Ye JC. Beyond deep residual learning for image restoration: persistent homology‐guided manifold simplification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017. pp. 145–153.

18. Lim B, Son S, Kim H, Nah S, Lee KM. Enhanced deep residual networks for single image super‐resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017. pp. 136–144.

19. Wang S, Su Z, Ying L, et al. Accelerating magnetic resonance imaging via deep learning. In Proceedings of the IEEE 13th International Symposium on Biomedical Imaging (ISBI), Prague, Czech Republic, 2016. pp. 514–517.

20. Hammernik K, Knoll F, Sodickson D, Pock T. Learning a variational model for compressed sensing MRI reconstruction. In Proceedings of the 24th Annual Meeting & Exhibition of the ISMRM, Singapore, 2016. p. 1088.

21. Schlemper J, Caballero J, Hajnal JV, Price A, Rueckert D. A deep cascade of convolutional neural networks for MR image reconstruction. In International Conference on Information Processing in Medical Imaging (IPMI); June 25–30, 2017; Boone, NC. pp. 647–658.

22. Mardani M, Gong E, Cheng JY, et al. Deep generative adversarial networks for compressed sensing automates MRI. arXiv preprint. 2017;arXiv:1706.00051.

23. Ye JC, Han Y, Cha E. Deep convolutional framelets: a general deep learning framework for inverse problems. SIAM J Imaging Sci. 2018;11:991–1048.

24. Sun J, Li H, Xu Z. Deep ADMM‐net for compressive sensing MRI. In Proceedings of the Thirtieth Conference on Neural Information Processing Systems (NIPS), Barcelona, Spain, 2016. pp. 10–18.

25. Putzky P, Welling M. Recurrent inference machines for solving inverse problems. arXiv preprint. 2017;arXiv:1706.04008.

26. Adler J, Öktem O. Solving ill‐posed inverse problems using iterative deep neural networks. Inverse Prob. 2017;33:1–24.

27. Zhang J, Ghanem B. ISTA‐Net: interpretable optimization‐inspired deep network for image compressive sensing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, 2018. pp. 1828–1837.

28. Diamond S, Sitzmann V, Heide F, Wetzstein G. Unrolled optimization with deep priors. arXiv preprint. 2017;arXiv:1705.08041.

29. Tezcan KC, Baumgartner CF, Luechinger R, Pruessmann KP, Konukoglu E. MR image reconstruction using deep density priors. IEEE Trans Med Imaging. 2019;38:99.

30. Zhang K, Zuo W, Chen Y, Meng D, Zhang L. Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising. IEEE Trans Image Process. 2017;26:3142–3155.

31. Bigdeli SA, Zwicker M. Image restoration using autoencoding priors. arXiv preprint. 2017;arXiv:1703.09964.

32. Katkovnik V, Egiazarian K. Nonlocal image deblurring: variational formulation with nonlocal collaborative l₀‐norm prior. In Proceedings of the International Workshop on Local and Non‐local Approximation in Image Processing, Tuusula, Finland, 2009. pp. 46–55.

33. Zhang X, Burger M, Bresson X, Osher S. Bregmanized nonlocal regularization for deconvolution and sparse reconstruction. SIAM J Imaging Sci. 2010;3:253–276.

34. Vincent P, Larochelle H, Bengio Y, Manzagol PA. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning (ICML), Helsinki, Finland, 2008.

35. Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol PA. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. JMLR. 2010;11:3371–3408.

36. Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, PMLR, Fort Lauderdale, FL, 2011. pp. 315–323.

37. Geras KJ, Sutton C. Scheduled denoising autoencoders. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, 2015.

38. Alain G, Bengio Y. What regularized auto‐encoders learn from the data‐generating distribution. JMLR. 2014;15:3743–3773.

39. Nguyen A, Yosinski J, Bengio Y, Dosovitskiy A, Clune J. Plug & play generative networks: conditional iterative generation of images in latent space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017. pp. 4467–4477.

40. Comaniciu D, Meer P. Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell. 2002;24:603–619.

41. Mao XJ, Shen C, Yang YB. Image restoration using very deep convolutional encoder‐decoder networks with symmetric skip connections. In Proceedings of the Thirtieth Conference on Neural Information Processing Systems (NIPS), Barcelona, Spain, 2016. pp. 2802–2810.

42. Zhang T, Pauly JM, Vasanawala SS, Lustig M. Coil compression for accelerated imaging with Cartesian sampling. Magn Reson Med. 2013;69:571–582.

43. Joshi N, Zitnick CL, Szeliski R, Kriegman D. Image deblurring and denoising using color priors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, 2009. pp. 1550–1557.

44. Mousavi HS, Monga V. Sparsity‐based color image super resolution via exploiting cross channel constraints. IEEE Trans Image Process. 2017;26:5094–5106.

45. Dai S, Han M, Xu W, Wu Y, Gong Y, Katsaggelos AK. Softcuts: a soft edge smoothness prior for color image super‐resolution. IEEE Trans Image Process. 2009;18:969–981.

46. Parikh N, Boyd S. Proximal algorithms. Found Trends Optim. 2014;1:127–239.

47. Zhan Z, Cai JF, Guo D, Liu Y, Chen Z, Qu X. Fast multiclass dictionaries learning with geometrical directions in MRI reconstruction. IEEE Trans Biomed Eng. 2016;63:1850–1861.

48. Dong W, Shi G, Li X, Ma Y, Huang F. Compressive sensing via nonlocal low‐rank regularization. IEEE Trans Image Process. 2014;23:3618–3632.

49. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process. 2004;13:600–612.

SUPPORTING INFORMATION

Additional supporting information may be found online in the Supporting Information section at the end of the article.

SUPPORTING INFORMATION S1 More supportive theoretical explanations and experimental analyses of the multi‐channel training strategy
SUPPORTING INFORMATION S2 More detailed description of the EDAEPRec solver
FIGURE S1 Performance of the reconstruction PSNR results of DAEPRec (A) and EDAEPRec (B) vs iterations at radial undersampling pattern with acceleration factor R = 10. The noise randomly generated at each stage of DAEPRec may introduce some instability and lead to local solutions, so the iterative behavior of the basic DAEPRec needs to be strengthened. In contrast, the measures of EDAEPRec change quickly during the first few iterations and then settle into a convergence zone. The iterative performance under the 2‐sigma rule is more robust



FIGURE S2 More reconstruction results of competing methods under acceleration factor R = 10. The reconstruction comparisons on image Test1 and their corresponding errors are shown. The reconstructions with EDAEPRec exhibit higher resolution than those with DLMRI and PANO and are almost devoid of aliasing artifacts. In contrast, the low‐rank–based NLR‐CS method suffers from oversmoothing deficiency, and the CNN‐based DC‐CNN method still exhibits some artifacts. As can be observed in the zoom‐in regions enclosed by the green box, only EDAEPRec successfully preserves the vertical line‐like pattern. The reconstruction errors indicate that EDAEPRec attains the lowest error magnitude
FIGURE S3 More reconstruction results of various methods at different trajectories with acceleration factor R = 6.7. In (A), there is no obvious blurring or artifacts induced by EDAEPRec, which provides a better reconstruction of object edges and finer textures than the other competing methods. In (B), the reconstructed results are much worse than their counterparts in the case of 2D random sampling. The results obtained by DLMRI, PANO, and DC‐CNN suffer severe blurring along the horizontal direction, whereas EDAEPRec alleviates the blurring effect greatly and still provides sharp edges
FIGURE S4 A visual PSNR plot of DAEPRec and EDAEPRec. EDAEPRec consistently performs better than DAEPRec

TABLE S1 Performance of EDAEPRec for image Test2 with different filter numbers at varying acceleration factor R. The performance improves as the network width/filter number increases
TABLE S2 Performance of EDAEPRec for image Test2 with different filter sizes at varying acceleration factor R. The larger the filter size, the better the performance, at the cost of running time
TABLE S3 Performance of EDAEPRec using networks learned from complex data and magnitude data (called EDAEPRec_Mag). In EDAEPRec_Mag, the network‐driven prior is applied to the real and imaginary components of the complex‐valued MRI data separately. In EDAEPRec, the network‐driven prior performs reconstruction in the same way as at the training stage, i.e., the real and imaginary parts are set as channel components of the network input simultaneously. The PSNRs obtained by EDAEPRec are 0.35~0.7 dB higher than those of EDAEPRec_Mag

How to cite this article: Liu Q, Yang Q, Cheng H, Wang S, Zhang M, Liang D. Highly undersampled magnetic resonance imaging reconstruction using autoencoding priors. Magn Reson Med. 2019;00:1–15. https://doi.org/10.1002/mrm.27921