VLSI 2014 IEEE TITLES
Zuara Technologies Battle with bugs
No.82, Station road, Radha nagar, Chrompet, Chennai-44. Mobile: 09095188016. Mail.id: [email protected]
Web site: www.zuaratech.com
82, Station road, Radha nagar,Chrompet Chennai-44
Mob.: 9095188016/9677465689
1. ASIC and FPGA Implementation of the Gaussian Mixture Model Algorithm for
Real-Time Segmentation of High Definition Video
Background identification is a common feature in many video processing systems. This paper
proposes two hardware implementations of the OpenCV version of
the Gaussian mixture model (GMM), a background identification algorithm. The implemented
version of the algorithm allows a fast initialization of the background model, while an innovative,
hardware-oriented formulation of the GMM equations enables the proposed circuits to
perform real-time background identification on high-definition (HD) video sequences with a frame
size of 1920 × 1080. The first of the two circuits is designed with commercial field-programmable
gate-array (FPGA) devices as the target. When implemented on a Virtex6 vlx75t, the proposed circuit
processes 91 HD fps (frames per second) and uses 3% of the FPGA logic resources. The second circuit
targets UMC 90-nm CMOS standard cell technology and is
proposed in two versions. Both versions can process at a frame rate higher than 60 HD fps. The
first version uses the constant voltage scaling technique to provide a low-power implementation.
It occupies a silicon area of 28847 µm² and dissipates 15.3
pJ/pixel. The second version is designed to reduce silicon area utilization and occupies 21847
µm² with an energy dissipation of 49.4 pJ/pixel.
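The throughput and energy figures above can be cross-checked with simple arithmetic. The sketch below assumes that every pixel of a 1920 × 1080 frame is processed exactly once per frame; the function names are illustrative and do not come from the paper.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# Assumption: each pixel of a 1920 x 1080 frame is processed once per frame.
WIDTH, HEIGHT = 1920, 1080
PIXELS_PER_FRAME = WIDTH * HEIGHT


def pixel_rate(fps):
    """Pixels per second needed to sustain the given frame rate."""
    return PIXELS_PER_FRAME * fps


def energy_per_frame_uj(pj_per_pixel):
    """Energy per frame in microjoules for a given energy per pixel (pJ)."""
    return PIXELS_PER_FRAME * pj_per_pixel * 1e-6  # 1 pJ = 1e-6 uJ


if __name__ == "__main__":
    print(pixel_rate(91))             # FPGA version, 91 fps
    print(pixel_rate(60))             # ASIC versions, 60 fps lower bound
    print(energy_per_frame_uj(15.3))  # low-power ASIC version
    print(energy_per_frame_uj(49.4))  # low-area ASIC version
```

For example, 91 fps corresponds to 188,697,600 pixels per second, and 15.3 pJ/pixel corresponds to about 31.7 µJ per frame.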
2. Design and FPGA Implementation of High-Speed, Fixed-Latency Serial
Transceivers
Fixed-latency serial links are important components of distributed measurement and control
systems. However, most high-speed Serializer-Deserializer (SerDes) chips do not keep the same
link latency after each power-up or reset. In this paper, we propose a fixed-latency serial
transceiver based on dynamic clock phase shifting and changeable delay tuning
technologies. Our solution can process all possible phase offsets between the transmitted and
received clocks, so it relaxes the requirement of fanning in the same reference clock both to the
transmitter and to the receiver. It also eliminates the reset-relock process in the roulette approach.
We present a specific example of implementation based on the serial transceiver in a Xilinx Virtex-5
FPGA. The experimental results indicate that our transceiver can achieve a
deterministic latency with sub-nanosecond precision.
3. DART: A Programmable Architecture for NoC Simulation on FPGAs
The increased demand for on-chip communication bandwidth as a result of the multicore trend
has made packet-switched networks-on-chip (NoCs) a more compelling choice for the
communication backbone in next-generation systems. However, NoC designs have many power,
area, and performance tradeoffs in topology, buffer sizes, routing algorithms, and flow control
mechanisms; hence, the study of new NoC designs can be very time intensive. To address these
challenges, we propose DART, a fast and flexible FPGA-based NoC simulation architecture.
Rather than laying the NoC out in hardware on the FPGA like previous approaches, our design
virtualizes the NoC by mapping its components to a generic NoC simulation engine, composed
of a fully connected collection of fundamental components (e.g., routers and flit queues). This
approach has two main advantages: 1) since it is virtualized, it can simulate any NoC, and 2)
any NoC can be mapped to the engine without rebuilding it, which can take significant time for a
large FPGA design. We demonstrate 1) that an implementation of DART on a Virtex-II Pro
FPGA can achieve over 100× speedup over the cycle-based software simulator Booksim,
while maintaining the same level of simulation accuracy, and 2) that a more modern Virtex-6
FPGA can accommodate a 49-node DART implementation.
4. Defense Against Primary User Emulation Attacks in Cognitive Radio Networks
Using Advanced Encryption Standard
This paper considers primary user emulation attacks in cognitive radio networks operating in the
white spaces of the digital TV (DTV) band. We propose a reliable AES-assisted DTV scheme, in
which an AES-encrypted reference signal is generated at the TV transmitter and used as the sync
bits of the DTV data frames. By allowing a shared secret between the transmitter and the
receiver, the reference signal can be regenerated at the receiver and used to achieve accurate
identification of the authorized primary users. In addition, when combined with the analysis of
the autocorrelation of the received signal, the presence of the malicious user can be detected
accurately whether or not the primary user is present. We analyze the effectiveness of the
proposed approach through both theoretical analysis and simulation examples. It is shown that
with the AES-assisted DTV scheme, the primary user, as well as the malicious user, can be detected
with high accuracy under primary user emulation attacks. It should be emphasized that the
proposed scheme requires no changes in hardware or system structure except for a plug-in AES
chip. Potentially, it can be applied directly to today's DTV system
under primary user emulation attacks for more efficient spectrum sharing.
5. Energy-Efficient Resource Allocation in OFDM Systems With Distributed Antennas
In this paper, we develop an energy-efficient resource-allocation scheme with proportional
fairness for downlink multiuser orthogonal frequency-division multiplexing
(OFDM) systems with distributed antennas. Our aim is to maximize energy efficiency (EE) under
the constraints of the overall transmit power of each remote access unit (RAU), proportional
fairness data rates, and bit error rates (BERs). Because of the nonconvex nature of the
optimization problem, obtaining the optimal solution is extremely computationally complex.
Therefore, we develop a low-complexity suboptimal algorithm, which separates
subcarrier allocation and power allocation. For the low-complexity algorithm, we first allocate
subcarriers by assuming equal power distribution. Then, by exploiting the properties of fractional
programming, we transform the nonconvex optimization problem in fractional form into an
equivalent optimization problem in subtractive form, which includes a tractable solution. Next,
an optimal energy-efficient power-allocation algorithm is developed to maximize EE while
maintaining proportional fairness. Through computer simulation, we demonstrate the
effectiveness of the proposed low-complexity algorithm and illustrate the fundamental tradeoff
between energy- and spectral-efficient transmission designs.
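The step from a fractional objective to a subtractive one can be illustrated with Dinkelbach's method on a toy problem. The sketch below is not the paper's multiuser algorithm: it assumes a single link with rate log2(1 + gain·p), a fixed circuit power, and a power limit p_max, all of which are hypothetical parameters chosen for illustration.

```python
import math


def dinkelbach_ee(gain, circuit_power, p_max, tol=1e-9, max_iter=50):
    """Maximize log2(1 + gain*p) / (circuit_power + p) over 0 <= p <= p_max.

    Toy single-link model (an assumption for illustration). Each iteration
    solves the subtractive problem max_p rate(p) - q * power(p), whose
    maximizer has a closed form, then updates q to the achieved ratio.
    """
    q = 0.0
    p = p_max  # with q = 0 the subtractive problem just maximizes the rate
    for _ in range(max_iter):
        if q > 0.0:
            # Stationary point of log2(1 + gain*p) - q*(circuit_power + p).
            p = 1.0 / (q * math.log(2.0)) - 1.0 / gain
            p = min(max(p, 0.0), p_max)
        rate = math.log2(1.0 + gain * p)
        power = circuit_power + p
        if abs(rate - q * power) < tol:  # F(q) = 0 at the optimum
            break
        q = rate / power
    return p, q


if __name__ == "__main__":
    p_opt, ee_opt = dinkelbach_ee(gain=10.0, circuit_power=1.0, p_max=10.0)
    print(p_opt, ee_opt)
```

The returned q is the energy efficiency (rate per unit power) at convergence. The subtractive problem is concave in p, which is why each iteration has a closed-form solution.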
6. Design Flow for Flip-Flop Grouping in Data-Driven Clock Gating
Clock gating is a predominant technique used for power saving. It is observed that the commonly
used synthesis-based gating still leaves a large amount of redundant clock pulses. Data-
driven gating aims to disable these. To reduce the hardware overhead involved, flip-flops (FFs)
are grouped so that they share a common clock enabling signal. The question of what is
the group size maximizing the power savings is answered in a previous paper. Here we answer
the question of which FFs should be placed in a group to maximize the power reduction. We
propose a practical solution based on the toggling activity correlations of FFs and their physical
position proximity constraints in the layout. Our data-driven clock gating is integrated into an
Electronic Design Automation (EDA) commercial backend design flow, achieving total power
reduction of 15%-20% for various types of large-scale state-of-the-art industrial and
academic designs in 40 and 65 nanometer process technologies. These savings are achieved on
top of the savings obtained by clock gating synthesis performed by commercial EDA tools,
and gating manually inserted into the register transfer level design.
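Why the choice of group members matters can be seen in a small model. The sketch below assumes the usual data-driven gating rule, in which a group is clocked whenever any of its flip-flops is about to change state; the toggle traces and the pulse-counting function are hypothetical and are not taken from the paper's design flow.

```python
def clock_pulses(toggles, groups):
    """Count clock pulses delivered to flip-flops under data-driven gating.

    toggles[f][t] is 1 when flip-flop f changes state in cycle t (D != Q).
    A group is clocked in cycle t if at least one of its members toggles,
    and then every member of that group receives a pulse.
    """
    n_cycles = len(toggles[0])
    pulses = 0
    for group in groups:
        for t in range(n_cycles):
            if any(toggles[f][t] for f in group):  # OR of the XOR(D, Q) signals
                pulses += len(group)
    return pulses


# Hypothetical toggle traces: FFs 0 and 1 toggle together, as do FFs 2 and 3.
TOGGLES = [
    [1, 0, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 1, 0],
]

correlated = clock_pulses(TOGGLES, [(0, 1), (2, 3)])    # 8 pulses, none redundant
uncorrelated = clock_pulses(TOGGLES, [(0, 2), (1, 3)])  # 16 pulses, half redundant
ungated = len(TOGGLES) * len(TOGGLES[0])                # 32 pulses without gating
```

Grouping flip-flops with correlated toggling removes every redundant pulse in this toy trace, while pairing uncorrelated flip-flops leaves half of the delivered pulses redundant.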
7. Effect of Image Downsampling on Steganographic Security
The accuracy of steganalysis in digital images primarily depends on the statistical properties of
neighboring pixels, which are strongly affected by the image acquisition pipeline as well as any
processing applied to the image. In this paper, we study how the detectability of embedding
changes is affected when the cover image is downsampled prior to embedding. This topic is
important for practitioners because the vast majority of images posted on
websites, image sharing portals, or attached to e-mails are downsampled. It is also relevant to
researchers as the security of steganographic algorithms is commonly evaluated on databases of
downsampled images. In the first part of this paper, we investigate empirically how the
steganalysis results depend on the parameters of the resizing algorithm: the choice of the
interpolation kernel, the scaling factor (resize ratio), antialiasing, and the downsampled pixel
grid alignment. We report on several novel phenomena that appear valid universally across the
tested cover sources, steganographic methods, and steganalysis features. This paper continues
with a theoretical analysis of the simplest interpolation kernel, the box kernel. By fitting a
Markov chain model to pixel rows, we analytically compute the Fisher information rate for any
mutually independent embedding operation and derive the proper scaling of the secure payload
with resizing. For least significant bit (LSB) matching and a limited range of downscaling, the
theory fits experiments rather well, which indicates the existence of a new scaling law expressing
the length of the secure payload when the cover size is modified by subsampling.
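For readers unfamiliar with the box kernel, the sketch below shows the simplest case: downsampling by an integer factor, where each output pixel is the mean of a square block of input pixels. It is an illustration only; the restriction to integer factors and to image sizes that are multiples of the factor is an assumption made here, not a limitation stated in the paper.

```python
def box_downsample(image, factor):
    """Downsample a grayscale image by an integer factor using the box kernel.

    Each output pixel is the mean of a factor x factor block of input pixels.
    Assumes the image dimensions are multiples of `factor`.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [image[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out
```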
8. An FPGA-Based Fully Synchronized Design of a Bilateral Filter for Real-Time
Image Denoising
In this paper, a detailed description of a synchronous field-programmable gate array
implementation of a bilateral filter for image processing is given. The bilateral filter is chosen for
one unique reason: It reduces noise while preserving details. The design is described on register-
transfer level. The distinctive feature of our design concept consists of changing the clock
domain in a manner that kernel-based processing is possible, which means the processing of the
entire filter window at one pixel clock cycle. This feature of the kernel-based design is supported
by the arrangement of the input data into groups so that the internal clock of the design is a
multiple of the pixel clock given by a targeted system. Additionally, by the exploitation of the
separability and the symmetry of one filter component, the complexity of the design is greatly
reduced. Combining these features, the bilateral filter is implemented as a highly parallelized
pipeline structure with very economical and effective utilization of dedicated resources. Due to
the modularity of the filter design, kernels of different sizes can be implemented with low effort
using our design and given instructions for scaling. As the original form of the bilateral filter with
no approximations or modifications is implemented, the resulting image quality depends on the
chosen filter parameters only. Due to the quantization of the filter coefficients, only negligible
quality loss is introduced.
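The noise-reducing, detail-preserving behavior of the bilateral filter can be seen in a brute-force software reference. The sketch below is a 1-D floating-point version written for illustration; it does not model the FPGA pipeline, the clock-domain scheme, or the coefficient quantization described above, and the parameter names are assumptions.

```python
import math


def bilateral_1d(signal, radius, sigma_s, sigma_r):
    """Brute-force 1-D bilateral filter (reference form, no approximations).

    Each output sample is a weighted mean of its neighbors; the weight is the
    product of a spatial Gaussian (distance) and a range Gaussian (intensity
    difference). Borders are handled by truncating the window.
    """
    out = []
    for i, center in enumerate(signal):
        num = den = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            w_s = math.exp(-((i - j) ** 2) / (2.0 * sigma_s ** 2))
            w_r = math.exp(-((center - signal[j]) ** 2) / (2.0 * sigma_r ** 2))
            num += w_s * w_r * signal[j]
            den += w_s * w_r
        out.append(num / den)
    return out
```

On a step edge, samples on the far side of the edge receive a range weight close to zero, so the edge stays sharp while small fluctuations on either side are averaged out.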
9. Subjective evaluation of HEVC and AVC/H.264 in mobile environments
This paper compares the quality of AVC/H.264 and HEVC encoded video in low bandwidth
mobile environments. In this study, the focus within the mobile environment is smart phones.
The key characteristics of a smart phone are a smaller screen size, which usually ranges from 3.5 inches
diagonal to 5.0 inches diagonal for high-end smart phones, and typical cellular network
bandwidth, which is 3G or faster. Subjective evaluations were conducted to evaluate the user
experience on a mobile device with a small screen size and video coded at 200 and 400 Kbps.
The studies showed compelling evidence that a user's experience in low bandwidth mobile
environments is very similar between HEVC and AVC/H.264. The results suggest the benefits of
HEVC over AVC/H.264 in a mobile environment with lower video bitrates and resolutions are
not as clear.
10. Improved Method to Select the Lagrange Multiplier for Rate-Distortion Based
Motion Estimation in Video Coding
The motion estimation (ME) process used in the H.264/AVC reference software is based on
minimizing a cost function that involves two terms (distortion and rate) that are properly
balanced through a Lagrangian parameter, usually denoted as λ_motion. In this paper we propose
an algorithm to improve the conventional way of estimating λ_motion and, consequently, the ME
process. First, we show that the conventional estimation of λ_motion turns out to be significantly
less accurate when ME-compromising events, which make the ME process perform poorly,
happen. Second, with the aim of improving the coding efficiency in these cases, an efficient
algorithm is proposed that allows the encoder to choose between three different values of
λ_motion for the Inter 16x16 partition size. To be more precise, for this partition size, the
proposed algorithm allows the encoder to additionally test λ_motion = 0 and λ_motion arbitrarily
large, which correspond to minimum distortion and minimum rate solutions, respectively. By
testing these two extreme values, the algorithm avoids making large ME errors. The
experimental results on video segments exhibiting this type of ME-compromising event reveal
an average rate reduction of 2.20% for the same coding quality with respect to the JM15.1
reference software of H.264/AVC. The algorithm has been also tested in comparison with a
state-of-the-art algorithm called context adaptive Lagrange multiplier. Additionally, two
illustrative examples of the subjective performance improvement are provided.
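The role of the Lagrangian parameter can be made concrete with a small sketch. It assumes the usual cost J = D + λ·R, evaluated over a list of candidate motion vectors; the candidate values are hypothetical and the function is not taken from the JM reference software.

```python
def best_motion_vector(candidates, lam):
    """Return the candidate minimizing J = distortion + lam * rate.

    `candidates` is a list of (motion_vector, distortion, rate_bits) tuples.
    """
    return min(candidates, key=lambda c: c[1] + lam * c[2])


# Hypothetical candidates: (motion vector, distortion e.g. SAD, rate in bits).
CANDIDATES = [
    ((0, 0), 900, 1),    # cheapest to code, poor match
    ((1, -1), 500, 6),
    ((7, 3), 420, 14),   # best match, most expensive to code
]

min_distortion = best_motion_vector(CANDIDATES, lam=0.0)  # lambda = 0
balanced = best_motion_vector(CANDIDATES, lam=20.0)       # intermediate lambda
min_rate = best_motion_vector(CANDIDATES, lam=1e9)        # lambda arbitrarily large
```

With λ = 0 the search returns the minimum-distortion vector, and with a very large λ it returns the minimum-rate vector, which are the two extreme solutions the abstract refers to.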
11. An Overview of Information Hiding in H.264/AVC Compressed Video
Information hiding refers to the process of inserting information into a host to serve specific
purpose(s). In this paper, information hiding methods in the H.264/AVC compressed video
domain are surveyed. First, the general framework of information hiding is conceptualized by
relating the state of an entity to a meaning (i.e., sequences of bits). This concept is illustrated by
using various data representation schemes such as bit plane replacement, spread spectrum,
histogram manipulation, divisibility, mapping rules, and matrix encoding. Venues at which
information hiding takes place are then identified, including prediction process, transformation,
quantization, and entropy coding. Related information hiding methods at each venue are briefly
reviewed, along with the presentation of the targeted applications, appropriate diagrams, and
references. A timeline diagram is constructed to chronologically summarize the invention of
information hiding methods in the compressed still image and video domains since 1992. A
comparison among the considered information hiding methods is also conducted in terms of
venue, payload, bitstream size overhead, video quality, computational complexity, and video
criteria. Further perspectives and recommendations are presented to provide a better
understanding of the current trend of information hiding and to identify new opportunities for
information hiding in compressed video.
12. VLSI Architecture Design of Guided Filter for 30 Frames/s Full-HD
Video
Filtering is widely used in image and video processing for various applications. Recently, the
guided filter has been proposed and has become one of the popular filtering methods. In this paper, to
meet the computational demand of guided filtering in full-HD video, a double integral image
architecture for guided filter ASIC design is proposed. In addition, a reformulation of the guided
filter formula is proposed, which can prevent the error resulting from truncation in the fractional
part and modify the regularization parameter on the user's demand. The hardware architecture of
the guided image filter is then proposed and can be embedded in mobile devices to achieve real-
time HD applications. To the best of our knowledge, this paper is also the first ASIC design for
guided image filter. With a TSMC 90-nm cell library, the design can operate at 100 MHz and
support Full-HD (1920 × 1080) at 30 frames/s with a 92.9K gate count and 3.2 KB of on-chip
memory. Moreover, in terms of hardware efficiency, our architecture is also the best compared with
previous works based on the bilateral filter.
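The guided filter is built from window means, and an integral image (summed-area table) makes each window sum a constant-time lookup. The sketch below shows the standard single integral image in software; it is background for the idea only and does not reproduce the double integral image architecture proposed in the paper.

```python
def integral_image(image):
    """Summed-area table with one extra row and column of zeros.

    sat[r][c] holds the sum of image[0:r][0:c].
    """
    rows, cols = len(image), len(image[0])
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += image[r][c]
            sat[r + 1][c + 1] = sat[r][c + 1] + row_sum
    return sat


def box_sum(sat, top, left, bottom, right):
    """Sum of image[top:bottom][left:right] using four table lookups."""
    return sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]
```

Dividing `box_sum` by the window area gives the window mean, regardless of the window size.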
13. Property Analysis of XOR-Based Visual Cryptography
A (k,n) visual cryptographic scheme (VCS) encodes a secret image into n shadow images
(printed on transparencies) distributed among n participants. When any k participants
superimpose their transparencies on an overhead projector (OR operation), the secret image can
be visually revealed by a human visual system without computation. However, the monotone
property of OR operation degrades the visual quality of reconstructed image for OR-based VCS
(OVCS). Accordingly, XOR-based VCS (XVCS), which uses XOR operation for decoding, was
proposed to enhance the contrast. In this paper, we investigate the relation between OVCS and
XVCS. Our main contribution is to theoretically prove that the basis matrices of (k,n)-OVCS can
be used in (k,n)-XVCS. Meanwhile, the contrast is enhanced 2^(k-1) times.
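The contrast gain can be checked by hand for the smallest case, k = n = 2. The sketch below uses the standard (2,2) basis matrices with a pixel expansion of 2 and defines contrast as the difference in black-subpixel counts divided by the pixel expansion; the helper names are illustrative, and the column permutation used in a real scheme is omitted because it does not change the contrast.

```python
def stack(rows, op):
    """Combine the participants' subpixel rows column by column with `op`."""
    result = list(rows[0])
    for row in rows[1:]:
        result = [op(a, b) for a, b in zip(result, row)]
    return result


def contrast(white_basis, black_basis, op):
    """Contrast = (weight of black result - weight of white result) / m."""
    m = len(white_basis[0])  # pixel expansion
    return (sum(stack(black_basis, op)) - sum(stack(white_basis, op))) / m


# Standard (2,2) basis matrices; 1 denotes a black subpixel.
WHITE = [[1, 0], [1, 0]]
BLACK = [[1, 0], [0, 1]]

or_contrast = contrast(WHITE, BLACK, lambda a, b: a | b)   # OR stacking
xor_contrast = contrast(WHITE, BLACK, lambda a, b: a ^ b)  # XOR decoding
```

Here the OR contrast is 1/2 and the XOR contrast is 1, a factor of 2 = 2^(k-1) for k = 2.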
14. Effectiveness of Leakage Power Analysis Attacks on DPA-Resistant Logic Styles
Under Process Variations
This paper extends the analysis of the effectiveness of Leakage Power Analysis (LPA) attacks to
cryptographic VLSI circuits on which circuit level countermeasures against Differential Power
Analysis (DPA) are adopted. Security metrics used for assessing the DPA-resistance of crypto
core implementations, such as the minimum number to disclosure (MTD) and the asymptotic
correlation coefficient, have been extended to the case of LPA. The LPA-resistance has been
evaluated in terms of MTD as a function of the on-chip noise. Noise variances up to 10000 times
greater than the signal variance have been taken into account and LPA attacks have been
successfully executed for all the logic styles under analysis using fewer than 100000
measurements. Moreover, the role of process variations has been investigated through extensive
Monte Carlo simulations in order to evaluate their impact on the leakage model for the logic
styles under analysis. Results show that LPA attacks can be successfully carried out on the
different anti-DPA logic styles even in the presence of process variations. To the best of our
knowledge, this work proves for the first time the effectiveness of LPA attacks in a real scenario
where on-chip noise and process variations are taken into account.
15. Data Hiding in Encrypted H.264/AVC Video Streams by Codeword Substitution
Digital video sometimes needs to be stored and processed in an encrypted format to maintain
security and privacy. For the purpose of content notation and/or tampering detection, it is
necessary to perform data hiding in these encrypted videos. In this way, data hiding in the encrypted
domain without decryption preserves the confidentiality of the content. In addition, it is more
efficient than decryption followed by data hiding and re-encryption. In this paper, a novel
scheme of data hiding directly in the encrypted version of H.264/AVC video stream is proposed,
which includes the following three parts, i.e., H.264/AVC video encryption, data embedding, and
data extraction. By analyzing the property of H.264/AVC codec, the codewords of
intraprediction modes, the codewords of motion vector differences, and the codewords of
residual coefficients are encrypted with stream ciphers. Then, a data hider may embed additional
data in the encrypted domain by using codeword substitution technique, without knowing the
original video content. In order to adapt to different application scenarios, data extraction can be
done either in the encrypted domain or in the decrypted domain. Furthermore, video file size is
strictly preserved even after encryption and data embedding. Experimental results have
demonstrated the feasibility and efficiency of the proposed scheme.
16. Optimal Transport for Secure Spread-Spectrum Watermarking of Still Images
This paper studies the impact of secure watermark embedding in digital images by proposing a
practical implementation of secure spread-spectrum watermarking using distortion optimization.
Because strong security properties (key-security and subspace-security) can be achieved using
natural watermarking (NW), since this particular embedding leaves the distribution of the host and
watermarked signals unchanged, we use elements of transportation theory to minimize the global
distortion. Next, we apply this new modulation, called transportation NW (TNW), to design a
secure watermarking scheme for grayscale images. The TNW uses a multiresolution image
decomposition combined with a multiplicative embedding which is taken into account at the
distribution level. We show that the distortion solely relies on the variance of the wavelet
subbands used during the embedding. In order to maximize a target robustness after JPEG
compression, we select different combinations of subbands offering the lowest Bit Error Rates
for a target PSNR ranging from 35 to 55 dB and we propose an algorithm to select them. The use
of transportation theory also provides an average PSNR gain of 3.6 dB with respect to
the previous embedding for a set of 2000 images.
17. Impulse Noise Estimation and Removal for OFDM Systems
Orthogonal Frequency Division Multiplexing (OFDM) is a modulation scheme that is widely
used in wired and wireless communication systems. While OFDM is ideally suited to deal with
frequency selective channels and AWGN, its performance may be dramatically impacted by the
presence of impulse noise. In fact, very strong noise impulses in the time domain might result in
the erasure of whole OFDM blocks of symbols at the receiver. Impulse noise can be mitigated by
considering it as a sparse signal in time, and using recently developed algorithms for sparse
signal reconstruction. We propose an algorithm that utilizes the guard band null subcarriers for
the impulse noise estimation and cancellation. Instead of relying on ℓ1 minimization as done
in some popular general-purpose compressive sensing schemes, the proposed method jointly
exploits the specific structure of this problem and the available a priori information for sparse
signal recovery. The computational complexity of the proposed algorithm is very competitive
with respect to sparse signal reconstruction schemes based on ℓ1 minimization. The proposed
method is compared with other state-of-the-art methods in terms of achievable rates
for an OFDM system with impulse noise and AWGN.
18. Bit-Level Optimization of Adder-Trees for Multiple Constant Multiplications
for Efficient FIR Filter Implementation
The multiple constant multiplication (MCM) scheme is widely used for implementing transposed
direct-form FIR filters. While the research focus of MCM has been on more effective common
subexpression elimination, the optimization of adder-trees, which sum up the computed
subexpressions for each coefficient, is largely overlooked. In this paper, we have identified the resource
minimization problem in the scheduling of adder-tree operations for the MCM block, and
presented a mixed integer programming (MIP) based algorithm for more efficient MCM-based
implementation of FIR filters. Experimental results show that up to 15% reduction of area and
11.6% reduction of power (with an average of 8.46% and 5.96% respectively) can be achieved
on top of an already optimized adder/subtractor network of the MCM block.
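As background, the sketch below shows what an MCM block does in a transposed direct-form FIR filter: every coefficient multiplies the same input sample, so the products can be built from shifts and a few shared adders. The coefficients 5, 13, and 20 are hypothetical, and the code illustrates only the shift-and-add idea, not the MIP-based adder-tree scheduling proposed in the paper.

```python
def mcm_block(x):
    """Multiply x by the constants 5, 13 and 20 using shifts and two adders.

    The partial result 5x is a shared subexpression: 13x = 8x + 5x and
    20x = 4 * 5x (a shift, which costs no adder in hardware).
    """
    x5 = (x << 2) + x      # adder 1: 4x + x
    x13 = (x << 3) + x5    # adder 2: 8x + 5x
    x20 = x5 << 2          # wiring only
    return x5, x13, x20


def fir_transposed(samples, products_of):
    """Transposed direct-form FIR: one MCM block feeds a chain of delayed adders."""
    n_taps = len(products_of(0))
    regs = [0] * (n_taps - 1)
    out = []
    for x in samples:
        p = products_of(x)  # products h[0]*x, h[1]*x, ...
        out.append(p[0] + regs[0])
        for k in range(n_taps - 2):
            regs[k] = p[k + 1] + regs[k + 1]  # uses the old value of regs[k + 1]
        regs[-1] = p[-1]
    return out
```

Feeding a unit impulse through `fir_transposed` with `mcm_block` returns the coefficients 5, 13, 20 in order, which is a quick way to confirm the structure.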
19. Frequency Estimation of Distorted and Noisy Signals in Power Systems by FFT-
Based Approach
This paper focuses on the accurate frequency estimation of power signals corrupted by
stationary white noise. The noneven item interpolation FFT based on the triangular self-
convolution window is described. A simple analytical expression for the variance of noise
contribution on the frequency estimation is derived, which shows the variances of frequency
estimation are proportional to the energy of the adopted window. Based on the proposed method,
the noise level of the measurement channel can be estimated, and optimal parameters (e.g.,
sampling frequency and window length) of the interpolation FFT algorithm that minimize the
variances of frequency estimation can thus be determined. The application in a power quality
analyzer verified the usefulness of the proposed method.
20. Accurate and Efficient On-Chip Spectral Analysis for Built-In Testing and
Calibration Approaches
The fast Fourier transform (FFT) algorithm is widely used as a standard tool to carry out spectral
analysis because of its computational efficiency. However, the presence of multiple tones
frequently requires a fine frequency resolution to achieve sufficient accuracy, which imposes the
use of a large number of FFT points that results in large area and power overheads. In this paper,
an FFT method is proposed for on-chip spectral analysis of multi-tone signals with particular
harmonic and intermodulation components. This accurate FFT analysis approach is based on
coherent sampling, but it requires a significantly smaller number of points to make
the FFT realization more suitable for on-chip built-in testing and calibration applications that
require area and power efficiency. The technique was assessed by comparing the simulation
results from the proposed method of single and multiple tones with the simulation results
obtained from the FFT of coherently sampled tones. The results indicate that the proper selection
of test tone frequencies can avoid spectral leakage even with multiple narrowly spaced tones.
When low-frequency signals are captured with an analog-to-digital converter (ADC) for on-chip
analysis, the overall accuracy is limited by the ADC's resolution, linearity, noise, and bandwidth
limitations. Post-layout simulations of a 16-point FFT showed that third-order intermodulation
(IM3) testing with two tones can be performed with 1.5-dB accuracy for IM3 levels of up to 50
dB below the fundamental tones that are quantized with a 10-bit resolution. In a 45-nm CMOS
technology, the layout area of the 16-point FFT for on-chip built-in testing is 0.073 mm², and its
estimated power consumption is 6.47 mW.
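The effect of coherent sampling on spectral leakage can be reproduced with a direct 16-point DFT. The sketch below is a floating-point illustration with hypothetical test tones; it does not model the paper's reduced-point FFT method, the ADC, or any quantization.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitudes of the N-point DFT, computed directly from the definition."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples)))
            for k in range(n)]


def tone(cycles, n):
    """n samples of a unit cosine that completes `cycles` periods in the record."""
    return [math.cos(2 * math.pi * cycles * i / n) for i in range(n)]


N = 16
coherent = dft_magnitudes(tone(3, N))        # integer cycles: bins 3 and 13 only
non_coherent = dft_magnitudes(tone(3.5, N))  # fractional cycles: leakage in all bins
```

When the record holds an integer number of tone periods, all energy falls in the tone's bin and its mirror image; with a fractional number of periods, every bin picks up some energy.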
21. Area-Delay-Power Efficient Fixed-Point LMS Adaptive Filter With Low
Adaptation-Delay
In this paper, we present an efficient architecture for the implementation of a delayed least mean
square adaptive filter. For achieving lower adaptation-delay and area-delay-power efficient
implementation, we use a novel partial product generator and propose a strategy for optimized
balanced pipelining across the time-consuming combinational blocks of the structure. From
synthesis results, we find that the proposed design offers nearly 17% less area-delay product
(ADP) and nearly 14% less energy-delay product (EDP) than the best of the existing systolic
structures, on average, for filter lengths N=8, 16, and 32. We propose an efficient fixed-point
implementation scheme of the proposed architecture, and derive the expression for steady-state
error. We show that the steady-state mean squared error obtained from the analytical result
matches with the simulation result. Moreover, we have proposed a bit-level pruning of the
proposed architecture, which provides nearly 20% saving in ADP and 9% saving in EDP over
the proposed structure before pruning without noticeable degradation of steady-state-error
performance.
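For readers unfamiliar with the delayed LMS algorithm that this architecture implements, the following floating-point sketch shows the weight update using an error and input vector that are `delay` samples old. The filter length, step size, delay, and the system being identified are all hypothetical values; the paper's fixed-point arithmetic, partial product generator, and pipelining are not modelled.

```python
import random


def delayed_lms(x, d, num_taps, mu, delay):
    """Delayed LMS: weights are updated with the error and input from `delay` samples ago.

    delay=0 reduces to the standard LMS update.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps      # buf[k] holds x[n - k]
    pending = []                # (input vector, error) pairs waiting to be applied
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))
        e = dn - y
        errors.append(e)
        pending.append((buf, e))
        if len(pending) > delay:
            old_buf, old_e = pending.pop(0)
            w = [wi + mu * old_e * xi for wi, xi in zip(w, old_buf)]
    return w, errors


# Hypothetical system-identification demo (noise-free, N = 8 taps).
random.seed(0)
h_true = [0.5, -0.3, 0.2, 0.1, 0.0, -0.05, 0.02, 0.01]
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
d = [
    sum(h_true[k] * x[n - k] for k in range(len(h_true)) if n - k >= 0)
    for n in range(len(x))
]
w, errors = delayed_lms(x, d, num_taps=8, mu=0.005, delay=5)
max_weight_error = max(abs(wi - hi) for wi, hi in zip(w, h_true))
print(f"largest weight error after adaptation: {max_weight_error:.2e}")
```

A smaller adaptation delay permits a larger stable step size, which is one reason the abstract emphasises low adaptation-delay.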
22. Efficient Integer DCT Architectures for High Efficiency Video CODEC standard
In this paper, we present area- and power-efficient architectures for the implementation of
integer discrete cosine transform (DCT) of different lengths to be used in High Efficiency Video
Coding (HEVC). We show that an efficient constant matrix-multiplication scheme can be used to
derive parallel architectures for 1-D integer DCT of different lengths. We also show that the
proposed structure could be reusable for DCT of lengths 4, 8, 16, and 32 with a throughput of 32
DCT coefficients per cycle irrespective of the transform size. Moreover, the proposed
architecture could be pruned to reduce the complexity of implementation substantially with only
a marginal effect on the coding performance. We propose power-efficient structures for folded
and full-parallel implementations of 2-D DCT. From the synthesis result, it is found that the
proposed architecture involves nearly 14% less area-delay product (ADP) and 19% less energy
per sample (EPS) compared to the direct implementation of the reference algorithm, on average,
for integer DCT of lengths 4, 8, 16, and 32. Also, an additional 19% saving in ADP and 20%
saving in EPS can be achieved by the proposed pruning algorithm with nearly the same
throughput rate. The proposed architecture is found to support ultrahigh definition 7680 × 4320
at 60 frames/s video, which is one of the applications of HEVC.
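The constant matrix multiplication idea can be sketched for the smallest case, the 4-point HEVC core transform. The coefficient matrix below is the one defined by the HEVC standard; the even-odd decomposition and the particular shift-add factorizations of 83 and 36 are a common textbook approach and are not claimed to be the architecture proposed in the paper. Scaling and rounding stages are omitted, and the function names are illustrative.

```python
# 4-point HEVC core transform matrix (integer approximation of the DCT).
HEVC_DCT4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def dct4_direct(x):
    """Reference: plain matrix-vector product with 16 multiplications."""
    return [sum(c * v for c, v in zip(row, x)) for row in HEVC_DCT4]


def mul83(v):
    """83 * v using shifts and adds only: 83 = 64 + 16 + 2 + 1."""
    return (v << 6) + (v << 4) + (v << 1) + v


def mul36(v):
    """36 * v using shifts and adds only: 36 = 32 + 4."""
    return (v << 5) + (v << 2)


def dct4_shift_add(x):
    """Even-odd decomposition with multiplier-free constant products."""
    e0, e1 = x[0] + x[3], x[1] + x[2]   # even part feeds outputs 0 and 2
    o0, o1 = x[0] - x[3], x[1] - x[2]   # odd part feeds outputs 1 and 3
    return [
        (e0 + e1) << 6,
        mul83(o0) + mul36(o1),
        (e0 - e1) << 6,
        mul36(o0) - mul83(o1),
    ]


sample = [10, -3, 7, 255]
print(dct4_direct(sample))
print(dct4_shift_add(sample))
```

Both functions return the same coefficients; the second uses only additions, subtractions, and shifts, which is what makes constant-matrix schemes attractive in hardware.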
23. Low-Cost Low-Power ASIC Solution for Both DAB+ and DAB Audio Decoding
DAB+ is the upgraded version of digital audio broadcasting (DAB). DAB and DAB+ coexist in
many countries, so receivers are required to be compatible with both standards. In this paper, a
solution integrating an MPEG1-LayerII (MP2) decoder and an advanced audio coding
(AAC) low-complexity (AAC LC) decoder is proposed to provide basic audio decoding for both
DAB and DAB+. It also utilizes simple methods to improve high frequencies and stereo quality
instead of complicated spectrum band replication and parametric stereo. A highly integrated low-
power audio decoder design compatible with DAB/DAB+ and using a purely ASIC approach is
presented. As a result of the system structure optimization and hardware sharing, the audio
decoder is fabricated in 1P4M 0.18-µm CMOS technology using only 3.2 mm² silicon area
(including 147 456 bits RAM and 170 496 bits ROM). The power consumption of the audio
decoder is 10.4 mW for DAB audio decoding and 8.5 mW for DAB+ audio decoding.
Laboratory and field tests show that the function is correct and the audio quality is good for
receiving both DAB and DAB+. The audio decoder is thus proven to be a low-cost low-
power solution for the two existing DAB standards.
24. Low-Power Digital Signal Processor Architecture for Wireless Sensor Nodes
Radio communication exhibits the highest energy consumption in wireless sensor nodes. Given
their limited energy supply from batteries or scavenging, these nodes must trade data
communication for on-the-node computation. Currently, they are designed around off-the-
shelf low-power microcontrollers. But by employing a more appropriate processing element, the
energy consumption can be significantly reduced. This paper describes the design and
implementation of the newly proposed folded-tree architecture for on-the-node data processing
in wireless sensor networks, using parallel prefix operations and data locality in hardware.
Measurements of the silicon implementation show an improvement of 10-20× in terms of energy
as compared to traditional modern micro-controllers found in sensor nodes.
25. Memory Footprint Reduction for Power-Efficient Realization of 2-D Finite
Impulse Response Filters
We have analyzed memory footprint and combinational complexity to arrive at a systematic
design strategy to derive area-delay-power-efficient architectures for two-dimensional (2-D)
finite impulse response (FIR) filter. We have presented novel block-based structures for
separable and non-separable filters with less memory footprint by memory sharing and memory-
reuse along with appropriate scheduling of computations and design of storage architecture. The
proposed structures involve L times less storage per output (SPO), and nearly L times less energy
consumption per output (EPO) compared with the existing structures, where L is the input block-
size. They involve L times more arithmetic resources than the best of the corresponding existing
structures, and produce L times more throughput with less memory band-width (MBW) than
others. We have also proposed separate generic structures for separable and non-separable filter-
banks, and a unified structure of filter-bank constituting symmetric and general filters. The
proposed unified structure for 6 parallel filters involves nearly 3.6L times more multipliers, 3L
times more adders, $(N^{2}-N+2)$ fewer registers than similar existing unified structure, and computes
6L times more filter outputs per cycle with 6L times less MBW than the existing design, where
N is FIR filter size in each dimension. ASIC synthesis result shows that for filter size (4 × 4),
input-block size L=4, and image-size (512 × 512), proposed block-based non-separable and
generic non-separable structures, respectively, involve 5.95 times and 11.25 times less area-
delay-product (ADP), and 5.81 times and 15.63 times less EPO than the corresponding existing
structures. The proposed unified structure involves 4.64 times less ADP and 9.78 times less EPO
than the corresponding existing structure.
26. Ultra-High Throughput Low-Power Packet Classification
Packet classification is used by networking equipment to sort packets into flows by comparing
their headers to a list of rules, with packets placed in the flow determined by the matched rule. A
flow is used to decide a packet's priority and the manner in which it is processed. Packet
classification is a difficult task due to the fact that all packets must be processed at wire speed
and rulesets can contain tens of thousands of rules. The contribution of this paper is a hardware
accelerator that can classify up to 433 million packets per second when using rule sets containing
tens of thousands of rules with a peak power consumption of only 9.03 W when using a Stratix
III field-programmable gate array (FPGA). The hardware accelerator uses a modified version of
the HyperCuts packet classification algorithm, with a new pre-cutting process used to reduce the
amount of memory needed to save the search structure for large rulesets so that it is small
enough to fit in the on-chip memory of an FPGA. The modified algorithm also removes the need
for floating point division to be performed when classifying a packet, allowing higher clock
speeds and thus obtaining higher throughputs.
27. A Configurable and Low-Power Mixed Signal SoC for Portable ECG Monitoring
Applications
This paper describes a mixed-signal ECG System-on-chip (SoC) that is capable of implementing
configurable functionality with low-power consumption
for portable ECG monitoring applications. A low-voltage and high performance analog front-end
extracts 3-channel ECG signals and single channel impedance measurement with
high signal quality. A custom digital signal processor provides the configurability and advanced
functionality like motion artifact removal and R peak detection. The SoC is implemented in
0.18-µm CMOS process and consumes a minimum of 31.1 µW from a 1.2-V supply.
28. Partial Access Mode: New Method for Reducing Power Consumption of Dynamic
Random Access Memory
Demands have been placed on a dynamic random access memory (DRAM) to not only have
increased memory capacity and data transfer speed, but also have reduced operating and standby
currents. When a system uses a DRAM, a refresh operation is necessary because of its data
retention time restriction: each bit of the DRAM is stored as an amount of electrical charge in a
storage capacitor that is discharged by the leakage current. Power consumption for the refresh
operation increases in proportion to the memory capacity. We propose a new method to reduce the
refresh power consumption by effectively extending the memory cell retention time. Conversion
from 1 cell/bit to $2^{N}$ cells/bit reduces the variation in the retention time among memory
cells. Although active power increases by a factor of $2^{N}$, the refresh time increases by more
than $2^{N}$ as a consequence of the fact that the majority decision does better than averaging
for the tail distribution of retention time. The conversion can be realized very simply from the
structure of the DRAM array circuit, and it reduces the frequency of disturbance and power
consumption by two orders of magnitude. On the basis of this conversion method, we propose
a partial access mode to reduce power consumption dynamically when the full memory capacity
is not required.
29. Reliability-Oriented Placement and Routing Algorithm for SRAM-Based FPGAs
As the feature size shrinks to the nanometer scale, SRAM-based FPGAs will become
increasingly vulnerable to soft errors. Existing reliability-
oriented placement and routing approaches primarily focus on reducing the fault occurrence
probability (node error rate) of soft errors. However, our analysis shows that, besides the fault
occurrence probability, the propagation probability (error propagation probability) plays an
important role and should be taken into consideration. In this paper, we first propose a cube-
based analysis algorithm to efficiently and accurately estimate the error propagation
probability. Based on such a model, we propose a novel reliability-
oriented placement and routing algorithm that combines both the fault occurrence probability and
the error propagation probability together to enhance system-level robustness against soft errors.
Experimental results show that, compared with the baseline versatile place and route technique,
the proposed scheme can reduce the failure rate by 20.73%, and increase the mean time between
failures by 39.44%.
30. Time-Based All-Digital Technique for Analog Built-in Self-Test
A scheme for built-in self-test of analog signals with minimal area overhead for measuring on-
chip voltages in an all-digital manner is presented. The method is well suited for a distributed
architecture, where the routing of analog signals over long paths is minimized. A clock is routed
serially to the sampling heads placed at the nodes of analog test voltages. This sampling head
present at each test node, which consists of a pair of delay cells and a pair of flip-flops, locally
converts the test voltage to a skew between a pair of subsampled signals, thus giving rise to as
many subsampled signal pairs as the number of nodes. To measure a certain analog voltage, the
corresponding subsampled signal pair is fed to a delay measurement unit to measure the skew
between this pair. The concept is validated by designing a test chip in a UMC 130-nm CMOS
process. Sub-millivolt accuracy for static signals is demonstrated for a measurement time of a
few seconds, and an effective number of bits of 5.29 is demonstrated for low-bandwidth signals
in the absence of sample-and-hold circuitry.
31. Improved 8-Point Approximate DCT for Image and Video Compression Requiring
Only 14 Additions
The need for low energy consumption in video processing systems such as HEVC for the
multimedia market has led to extensive development of fast algorithms for the efficient
approximation of 2-D DCT transforms. The DCT is employed in a multitude of compression
standards due to its remarkable energy compaction properties. Multiplier-free approximate DCT
transforms have been proposed that offer superior compression performance at very low circuit
complexity. Such approximations can be realized in digital VLSI hardware using additions and
subtractions only, leading to significant reductions in chip area and power consumption
compared to conventional DCTs and integer transforms. In this paper, we introduce a novel 8-
point DCT approximation that requires only 14 addition operations and no multiplications. The
proposed transform possesses low computational complexity and is compared to state-of-the-art
DCT approximations in terms of both algorithm complexity and peak signal-to-noise ratio. The
proposed DCT approximation is a candidate for reconfigurable video standards such as HEVC.
The proposed transform and several other DCT approximations are mapped to systolic-array
digital architectures and physically realized as digital prototype circuits using FPGA technology
and mapped to 45 nm CMOS technology.
32. Reconfigurable CORDIC-Based Low-Power DCT Architecture Based on Data
Priority
This paper presents a low-power coordinate rotation digital computer (CORDIC)-based
reconfigurable discrete cosine transform (DCT) architecture. The main idea of this paper is based
on the interesting fact that all the computations in DCT are not equally important in generating
the frequency domain outputs. Considering the importance difference in the DCT coefficients,
the number of CORDIC iterations can be dynamically changed to efficiently tradeoff image
quality for power consumption. Thus, the computational energy can be significantly reduced
without seriously compromising the image quality. The proposed CORDIC-based 2-D DCT
architecture is implemented using 0.13-µm CMOS process, and the experimental results show
that our reconfigurable DCT achieves power savings ranging from 22.9% to 52.2% over the
CORDIC-based Loeffler DCT at the cost of minor image quality degradations.
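The trade-off between CORDIC iteration count and accuracy, which the abstract exploits to save power, can be sketched in software. This is a generic rotation-mode CORDIC in floating point, not the paper's datapath; the rotation angle 3π/16 and the iteration counts 4 and 16 are assumptions chosen only to make the effect visible.

```python
import math


def cordic_rotate(x, y, angle, iterations):
    """Rotate (x, y) by `angle` radians with `iterations` shift-add micro-rotations.

    Converges for |angle| below roughly 1.74 rad. The accumulated CORDIC gain
    is divided out at the end so results are directly comparable.
    """
    z = angle
    gain = 1.0
    for i in range(iterations):
        step = 2.0 ** -i                  # a right shift by i bits in hardware
        direction = 1.0 if z >= 0 else -1.0
        x, y = x - direction * y * step, y + direction * x * step
        z -= direction * math.atan(step)  # a small lookup table in hardware
        gain *= math.sqrt(1.0 + step * step)
    return x / gain, y / gain


def rotation_error(angle, iterations):
    """Distance between the CORDIC result and the exact rotation of (1, 0)."""
    cx, cy = cordic_rotate(1.0, 0.0, angle, iterations)
    return math.hypot(cx - math.cos(angle), cy - math.sin(angle))


theta = 3 * math.pi / 16          # hypothetical DCT-style rotation angle
error_few = rotation_error(theta, 4)
error_many = rotation_error(theta, 16)
print(f"4 iterations:  error {error_few:.4f}")
print(f"16 iterations: error {error_many:.6f}")
```

Fewer iterations mean fewer shift-add stages switching per rotation at the cost of a larger rotation error, so less important coefficients can tolerate a shorter iteration schedule.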
33. Data Encoding Techniques for Reducing Energy Consumption in Network-on-Chip
As technology shrinks, the power dissipated by the links of a network-on-chip (NoC) starts to
compete with the power dissipated by the other elements of the communication subsystem,
namely, the routers and the network interfaces (NIs). In this paper, we present a set of data
encoding schemes aimed at reducing the power dissipated by the links of an NoC. The proposed
schemes are general and transparent with respect to the underlying NoC fabric (i.e., their
application does not require any modification of the routers and link architecture). Experiments
carried out on both synthetic and real traffic scenarios show the effectiveness of the proposed
schemes, which save up to 51% of power dissipation and 14% of energy consumption
without any significant performance degradation and with less than 15% area overhead in the NI.
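The abstract does not spell out its encoding schemes, so the sketch below uses classic bus-invert coding instead, purely to illustrate how encoding data before it enters a link can reduce bit transitions and hence dynamic power. The 8-bit link width, the word sequence, and all function names are assumptions.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width):
    """Send each word inverted when that toggles fewer data wires; one extra wire flags inversion."""
    mask = (1 << width) - 1
    previous = 0
    encoded = []
    for word in words:
        if hamming(previous, word) > width // 2:
            sent, inverted = word ^ mask, 1
        else:
            sent, inverted = word, 0
        encoded.append((sent, inverted))
        previous = sent
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words from (data, invert flag) pairs."""
    mask = (1 << width) - 1
    return [sent ^ mask if inverted else sent for sent, inverted in encoded]


def raw_transitions(words):
    """Wire toggles on an unencoded link that starts at all zeros."""
    previous, total = 0, 0
    for word in words:
        total += hamming(previous, word)
        previous = word
    return total


def encoded_transitions(encoded, width):
    """Wire toggles on the encoded link, counting the invert wire as well."""
    link_words = [sent | (inverted << width) for sent, inverted in encoded]
    return raw_transitions(link_words)


WIDTH = 8
traffic = [0x00, 0xFF, 0x00, 0xFF]       # worst case for an unencoded link
encoded = bus_invert_encode(traffic, WIDTH)
print(raw_transitions(traffic), encoded_transitions(encoded, WIDTH))
```

Like the schemes in the abstract, this kind of encoder and decoder can sit in the network interface, leaving routers and links unchanged.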
34. Achieving High-Performance On-Chip Networks With Shared-Buffer Routers
On-chip routers typically have buffers dedicated to their input or output ports for temporarily
storing packets in case contention occurs on output physical channels. Buffers, unfortunately,
consume significant portions of router area and power budgets. While running a traffic trace,
however, not all input ports of routers have incoming packets that need to be transferred
simultaneously. Therefore, a large number of buffer queues in the network are empty and other
queues are mostly busy. This observation motivates us to design a router architecture with shared
queues (RoShaQ), a router architecture that maximizes buffer utilization by allowing the sharing of
multiple buffer queues among input ports. In fact, sharing queues uses buffers more
efficiently and hence achieves higher throughput when the network load becomes heavy. On
the other hand, at light traffic load, our router achieves low latency by allowing packets to
effectively bypass these shared queues. Experimental results on a 65-nm CMOS standard-cell
process show that over synthetic traffic, RoShaQ has 17% less latency and 18% higher
saturation throughput than a typical virtual-channel (VC) router. Because of its higher
performance, RoShaQ consumes 9% less energy per transferred packet than the VC router given the
same buffer space capacity. Over real multitask applications and E3S embedded benchmarks
using the near-optimal NMAP mapping algorithm, RoShaQ has 32% lower latency than the VC router
and achieves the same application throughput with 30% lower energy per packet.
35. Energy Efficiency Optimization Through Codesign of the Transmitter and Receiver
in High-Speed On-Chip Interconnects
A novel equalized global link architecture and driver-receiver codesign flow are proposed for
high-speed and low-energy on-chip communication by utilizing a continuous-time linear
equalizer (CTLE). The proposed global link is analyzed using a linear system method, and the
formula of CTLE eye opening is derived to provide high-level design guidelines and insights.
Compared with the separate driver-receiver design flow, over 50% energy reduction is observed.
The final optimal solution achieves 20-Gb/s signaling over 10 mm, 2.6-µm pitch on-chip
transmission line with 15.5-ps/mm latency and 0.196-pJ/b energy using 45-nm technology.
Monte Carlo simulation also shows that 3σ/µ for power and delay variation in the proposed
global link are 13.1% and 4.6%, respectively.