Project Presentation: Physical Unclonable Functions

Project Presentation: Physical Unclonable Functions Michelle Dickson


Transcript of Project Presentation: Physical Unclonable Functions


Project Presentation: Physical Unclonable Functions
Michelle Dickson

Outline
- Project Goals
- Resource Selection
- PUF Architecture
- Implementation
- Results
- Status & Future Work
- Conclusion

Project Goals
- Implement a Physical Unclonable Function
- Determine feasibility of an authentication scheme based on the PUF's unique key generation
- Is such an implementation robust enough to withstand environmental variations?

I initially took an interest in this project topic because I was skeptical about the robustness of the Physical Unclonable Function (PUF). I wanted to determine whether an implemented PUF could produce consistent results across variations in the environment, in order to decide whether an authentication scheme based on the PUF's ability to produce a unique key would be feasible.

Resource Selection
- Hardware: Virtex-II Pro FF1152 Development Board
- 2VP20 FPGA
- QTY: 2

I was able to obtain two development boards for the project. They are Virtex-II Pro FF1152 Development Boards with 2VP20 FPGAs. I wanted two boards so I could confirm that the results produced by the implemented PUF did indeed vary with different hardware.

Resource Selection
Tools
- Xilinx ISE Version 9.2i
  - Project Navigator
  - iMPACT
  - FPGA Editor
  - Constraints Editor
  - PACE
- Modelsim XE III Version 6.2g

In order to implement the PUF on the FPGA, I also had to obtain a few tools. For implementation, I used the Xilinx ISE Version 9.2i toolset. This includes the Project Navigator, iMPACT for programming the device, FPGA Editor for manually controlling placement and routing, Constraints Editor for setting up timing constraints, and PACE for setting area constraints. I also used Modelsim XE III Version 6.2g for simulation.

PUF Architecture
Common PUF Architectures
- Arbiter PUF
- Ring Oscillator PUF

After selecting the hardware, I had to decide on the architecture for my design. There are three common PUF architectures that are discussed by researchers.

The first is the arbiter PUF. In this scheme, a signal pulse is input to two 2-input multiplexors. The path lengths must be identical. A selector bit determines the output of the two multiplexors, and that output is routed as input into two more multiplexors, as shown in the image. An array of these stages is cascaded, and the output of the last multiplexor is latched. In this implementation, the output is based solely on the variation in trace lengths introduced by manufacturing. In essence, the output is the result of a race between two different paths that are theoretically the same length.

The second type of PUF is the Ring Oscillator PUF. In this architecture, a series of inverters is connected in a chain, and the output of the chain is looped back around as its input. The result is an oscillator whose frequency depends on the delay in the inverters and in the interconnecting traces. Again, due to manufacturing variations, the frequency will vary for every ring oscillator implementation. The output oscillations are counted and compared, and a bit is produced as a result of the comparison.

PUF Architecture
Common PUF Architectures (continued)
- Butterfly PUF

The third common type of PUF implementation is the butterfly PUF. This method is based on the idea of creating structures that behave similarly to an SRAM cell during the startup phase. The butterfly PUF cell is a cross-coupled circuit which can be brought to a floating or unstable state before being allowed to settle to one of two possible states. The structure is constructed with symmetric routing. Each latch has a PRE input (which sets Q to 1) and a CLR input (which sets Q to 0). The PRE signal on the top latch and the CLR signal on the bottom latch are tied low, while the excite signal is tied to the CLR signal on the top latch and the PRE signal on the bottom latch. The outputs of the latches are cross-coupled. The excite signal is brought high for a few clock cycles and then returned to low. The output then settles to a state, which produces the single output bit.

Selected PUF Architecture
Ring Oscillator PUF Implementation
- Each RO is composed of one NAND gate and 40 inverters
- 16 ROs implemented on the FPGA
- Compare the outputs of 2 ROs
  - If the result is greater than, output is 1
  - If the result is less than or equal, output is 0
- Output is 8-bit signature
Motivation for selecting RO implementation
- Fairly simple to implement
- Does not require careful routing or layout
  - Differences in oscillator frequencies will dominate skews in routing
- Extensive work published on RO implementations
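As a rough illustration of the ring oscillator scheme on this slide, here is a minimal behavioral sketch in Python. It is not the VHDL/schematic design from the project. The nominal stage delay, the delay spread, the measurement window, and the choice to compare adjacent oscillators (0 with 1, 2 with 3, and so on) are all assumptions made for the example; the slide only states that 16 ROs are compared two at a time to give 8 bits.

```python
import random

NUM_OSCILLATORS = 16   # from the slide
STAGES_PER_RO = 41     # one NAND gate plus 40 inverters


def make_device(seed, nominal_delay_ns=0.5, spread=0.05):
    """Return the loop delay (ns) of each RO on one hypothetical device.

    The per-stage delays are drawn at random to stand in for manufacturing
    variation; nominal_delay_ns and spread are illustrative assumptions.
    """
    rng = random.Random(seed)
    return [
        sum(rng.gauss(nominal_delay_ns, nominal_delay_ns * spread)
            for _ in range(STAGES_PER_RO))
        for _ in range(NUM_OSCILLATORS)
    ]


def count_cycles(loop_delay_ns, window_ns):
    """Count full oscillation periods that fit in the measurement window."""
    period_ns = 2 * loop_delay_ns  # the edge travels the loop twice per period
    return int(window_ns // period_ns)


def signature(loop_delays, window_ns=20_000):
    """Compare oscillators pairwise: bit is 1 if A > B, otherwise 0."""
    counts = [count_cycles(d, window_ns) for d in loop_delays]
    return [
        1 if counts[i] > counts[i + 1] else 0
        for i in range(0, len(counts), 2)  # assumed pairing: (0,1), (2,3), ...
    ]


if __name__ == "__main__":
    for seed in (1, 2):
        print("device", seed, signature(make_device(seed)))
```

Because the model is deterministic per seed, it shows only how per-device delay differences turn into bits. It does not model noise, temperature, or supply variation.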

For my project, I chose to implement a Ring Oscillator PUF. I chose this because it seemed fairly simple to implement and it does not require the careful manual routing or layout that the other two require. This is because the differences in the oscillator frequencies will dominate the skews caused by asymmetric routing. Finally, I chose this method because there is a lot of published data on Ring Oscillator PUF implementations, and many of them claim to be resilient to environmental variation. For my implementation, I did a simple 8-bit PUF. I have 16 Ring Oscillators implemented on the FPGA, and each Ring Oscillator is composed of one NAND gate and 40 inverters. The outputs of 2 Ring Oscillators are compared to produce a single bit of the 8-bit signature. If the comparison results in A greater than B, the bit is a 1. Otherwise, the bit is a 0. The 8 comparisons produce an 8-bit signature for the device.

Implementation
- Ring Oscillator component implemented in schematic

Using the Xilinx ISE tools, I implemented a single Ring Oscillator by creating it in a schematic. Here we see the NAND gate and the 40 inverters. The output of the final inverter is looped back to the input of the NAND gate and is also connected to the clock input of the 16-bit counter. A clock enable signal tells the counter when to increment, and a clear signal resets the count output to 0.

Implementation
Inputs to Ring Oscillator
- Feed input: tied to 1
- PUF enable input: enables the counter
- Clear input: clears the counter to 0s
Output from Ring Oscillator
- Output is 15 bit count value

There are three inputs to the Ring Oscillator: the feed input, the PUF enable input, and the clear input. The Feed input is tied high. The PUF enable input is high when the counter should be incrementing and counting the oscillation cycles produced by the combinational logic. The clear input is tied to a system reset and will reset the counter output to 0.

The output of the Ring Oscillator is a 15-bit count value.

Implementation
- The rest of the circuit is implemented in VHDL
  - Instantiate 16 ROs
  - Compare the count of two ROs after a certain period of time to produce a bit
    - If A is greater than B, the bit value is 1
    - Else, the bit value is 0
  - Oscillation time varied from several thousand clock cycles to 40 seconds
  - The 8-bit output value is displayed on LEDs on the development board
- To verify functionality of the rest of the circuit
  - Create a testbench with skewed clock inputs for the ROs
  - Run simulation in Modelsim
  - Verify that PUF bits accurately reflect the variation in oscillator frequencies

The rest of the circuit is implemented in VHDL. I instantiate 16 Ring Oscillators and map the signals appropriately. Then I compare the count for two Ring Oscillators and produce a bit based on the outcome of the comparison. When I implemented this design, I varied the oscillation time from several thousand clock cycles all the way up to 40 seconds. The 8-bit output value was displayed on the 8 on-board LEDs. An illuminated LED corresponded to a bit of 0 while a non-illuminated LED corresponded to 1. A push-button on the board generated a system reset.

The ring oscillator circuit cannot be verified through simulation, because the tool will continue to evaluate the combinational logic loop and will quickly time out. Therefore, I was only able to verify the functionality of the rest of the circuit through simulation. To do this, I created a testbench and added skewed clock inputs in place of the ring oscillators. I was then able to verify that each PUF bit accurately reflected the variation in the oscillator frequencies.
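The testbench idea described above can be sketched in Python rather than VHDL: replace each ring oscillator with a clock of known period, count edges over a fixed window, and check that every PUF bit matches what the periods predict. The periods, the window length, and the adjacent pairing are hypothetical stimulus values chosen for the example, not the ones used in the original Modelsim testbench.

```python
def count_edges(period_ns, window_ns):
    """Rising edges a free-running clock delivers to a counter in the window."""
    return window_ns // period_ns


def puf_bits(counts):
    """One bit per adjacent pair of counters: 1 if A > B, else 0."""
    return [1 if counts[i] > counts[i + 1] else 0
            for i in range(0, len(counts), 2)]


def run_testbench(periods_ns, window_ns=100_000):
    """Drive each counter with its skewed clock and return the PUF bits."""
    counts = [count_edges(p, window_ns) for p in periods_ns]
    return puf_bits(counts)


# Hypothetical skewed clock periods (ns), one per oscillator slot.
PERIODS_NS = [10, 11, 13, 12, 10, 10, 9, 14, 12, 11, 15, 16, 11, 10, 14, 13]

# A shorter period means a faster clock and a larger count, so the
# expected bit is 1 exactly when clock A is strictly faster than clock B.
EXPECTED = [1 if PERIODS_NS[i] < PERIODS_NS[i + 1] else 0
            for i in range(0, len(PERIODS_NS), 2)]

if __name__ == "__main__":
    observed = run_testbench(PERIODS_NS)
    print("observed:", observed)
    print("expected:", EXPECTED)
    print("match:", observed == EXPECTED)
```

Note that the third pair uses equal periods on purpose, to exercise the "less than or equal gives 0" branch of the comparison.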

Implementation
- After verifying the rest of the circuit through simulation, implement the PUF in actual hardware
  - Synthesize design
  - Implement
  - Generate programming file and configure device

After I verified that the rest of the circuit functioned as expected, I was ready to implement the PUF in the FPGA. Using the Xilinx ISE tools, I synthesized the design. The generated RTL schematic is shown here. I verified that the RTL schematic looked accurate, implemented the design, and generated the programming file to configure the device.

Implementation
Obstacles
- First, I had to set up the development environment and familiarize myself with the board and the tools
  - Search the internet for board documentation
  - Verify that I am able to program the FPGA and drive the outputs
- Xilinx tool attempts to optimize the circuit and removes the useful components
  - The result is an empty logic design that cannot be mapped
- Limited visibility of internal logic values makes troubleshooting difficult
  - Sometimes the circuit appeared to be functioning as desired, but in reality it was not
- Every time a bit file is created and synthesized in hardware, the results vary
  - Another troubleshooting hurdle

At this point, I encountered a number of obstacles. The first was simply to set up my development environment and become familiar with the hardware and the tools. I hadn't done any VHDL coding in a couple of years and had never set up a development environment from scratch. In addition, the board I acquired did not come with any documentation, but thanks to the internet, I was able to find a user's manual. So before I could do anything else, I needed to prove that I could burn the FPGA and get it to behave in an expected manner.

After this was accomplished, I was ready to try implementing my PUF design. Here I found a number of hurdles. First off, the Xilinx tools don't like combinational logic driving clock inputs, and they complain about it quite a bit. In addition, if a circuit function doesn't directly drive an output, the tools consider it unnecessary and remove it from the design by default. The result was that the synthesis phase would succeed, but the implementation (map, place, and route) would fail because all of the logic had been trimmed out.

Once I was able to actually get something to place and route successfully, troubleshooting was a challenge because I only had 8 LEDs and a reset button to interface with. Sometimes things would appear to be working, but as I continued to troubleshoot, I would learn that they weren't in fact doing what I thought. Another anomaly made troubleshooting difficult as well: every time I implemented the design (after a small VHDL change, for example), the mapping and the place and route were not the same as the previous time. This meant that between minor modifications, results could vary drastically.

Implementation
Solutions to some obstacles
- To prevent the Xilinx tool from removing the circuit
  - Each net has to be assigned a KEEP attribute with a value of TRUE
  - Disable equivalent register removal
  - Disable optimization properties
  - Disable trim unconnected signals
  - *This allowed me to actually synthesize the design in hardware
- To create more consistent results between implementations on the same device
  - Limit max fanout to 5
  - Create area constraints for each Ring Oscillator
  - *This simplified my troubleshooting efforts

I was able to overcome most of these obstacles. First, to prevent the Xilinx tool from removing the circuit, I had to assign a KEEP attribute to each net and set the value to TRUE. In addition, I had to tweak synthesis and implementation properties: disabling equivalent register removal, disabling the optimization properties, and disabling the trim unconnected signals feature. This allowed me to actually synthesize the design in the hardware.

Second, to create more consistent results between implementations on the same device (after making small schematic or VHDL changes), I limited the max fanout to 5 and created area constraints for each ring oscillator. This simplified my troubleshooting efforts because, between changes, I could more accurately see the effect of each change.

Implementation
- Placed and routed design

This image just shows the placed and routed design. 8 main blocks can be seen. These are the locations to which I constrained the Ring Oscillators.

Results
- I tested my PUF implementation on two different development boards
  - Tests for the reference were completed at ambient temperature with nominal power inputs
- Through troubleshooting, I found
  - Oscillators were indeed oscillating at different frequencies
    - To verify this, I simply changed the VHDL to check for count equality instead of inequality and verified that each comparison consistently produced a FALSE value
  - Varying the oscillation time before checking the PUF output did not seem to make a large difference
    - Whether I waited several thousand clock cycles or 40 seconds, the output seemed to have the same consistency
    - For this reason, I chose a shorter oscillation period such that the counter would not cycle back to 0x0000 and result in potentially inconsistent comparison results
  - Results were most consistent when the ring oscillator counter is always enabled and when the ring oscillator feed input is always tied to 1
    - The alternative was to only activate these inputs during the oscillation time before checking for inequality
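The concern about the counter cycling back to 0x0000 can be made concrete with a small Python sketch. The counter width is left as a parameter, since the notes mention a 16-bit counter while the slide lists a 15-bit count output. The cycle counts and the helper names are hypothetical values chosen to show the effect.

```python
def wrapped_count(true_cycles, width_bits=16):
    """Value a free-running counter of the given width would show."""
    return true_cycles % (1 << width_bits)


def compare_bit(cycles_a, cycles_b, width_bits=16):
    """PUF bit as the hardware would see it: compares the wrapped counts."""
    a = wrapped_count(cycles_a, width_bits)
    b = wrapped_count(cycles_b, width_bits)
    return 1 if a > b else 0


def max_safe_window_ns(fastest_period_ns, width_bits=16):
    """Longest window in which the fastest oscillator cannot wrap."""
    return ((1 << width_bits) - 1) * fastest_period_ns


if __name__ == "__main__":
    # Short window: A is slightly faster than B and neither counter wraps.
    print("short window bit:", compare_bit(6_560, 6_550))
    # Long window: A has wrapped past 65535 but B has not, so the
    # comparison flips even though A is still the faster oscillator.
    print("long window bit: ", compare_bit(65_600, 65_500))
    print("safe window for a 40 ns period:", max_safe_window_ns(40), "ns")
```

In this model, the same pair of oscillators yields a different bit depending only on the window length, which is one way a long oscillation period could produce inconsistent comparisons.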

As I mentioned before, I obtained two development boards so that I could compare results between the two boards. Up to this point, all tests have been conducted at ambient temperature with nominal power inputs.

Through troubleshooting, I was able to determine that the oscillators were indeed oscillating at different frequencies. To verify this, I simply changed the VHDL to check for count equality instead of inequality and verified that each comparison consistently produced a false value. I was also able to determine that varying the length of time I allowed the oscillators to run before capturing the count value didn't seem to make a big difference to the output. Whether I waited several thousand clock cycles or 40 seconds, the output seemed to be the same, with the same consistency. As a result, I chose a shorter oscillation period such that the counter would not cycle back to 0x0000, because I worried that a wrap-around had higher potential to produce inconsistent results.

Finally, I found that the results were the most consistent when the ring oscillator counter was always enabled and when the feed input was always tied to 1. For this reason, both of these inputs were held high and constant. The alternative was to bring these inputs high only during the oscillation time before checking for inequality, but as I stated, those results were less consistent.

Results
- Using the same bit file, each board produced a unique 8-bit output; however, they only differed by one bit
  - Board 1 produced the output 01001101
  - Board 2 produced the output 01011101
- The results were not very consistent, even at ambient conditions
  - Out of 100 trials, Board 1 produced a different output 10 times
    - Two bits were not consistent
    - Bit 6 varied 3 times out of 100
    - Bit 2 varied 7 times out of 100
  - Out of 100 trials, Board 2 produced a different output 40 times
    - Two bits were not consistent
    - Bit 6 varied 2 times out of 100
    - Bit 4 varied 39 times out of 100
    - NOTE: On one trial, both bit 6 and bit 4 varied
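The figures on this slide can be cross-checked with a few lines of Python. The sketch below uses only the numbers reported above. The function names are illustrative, and the overlap counts (0 for Board 1, 1 for Board 2) are taken from the slide's note that both bits varied together on one Board 2 trial.

```python
TRIALS = 100
SIGNATURE_BITS = 8

BOARD_1 = "01001101"
BOARD_2 = "01011101"

# Reported bit flips per board: {bit number: times it varied in 100 trials}.
FLIPS = {
    "Board 1": {6: 3, 2: 7},
    "Board 2": {6: 2, 4: 39},
}
# Trials in which two bits varied at once, per the slide's NOTE.
OVERLAPS = {"Board 1": 0, "Board 2": 1}


def hamming_distance(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def failing_trials(board):
    """Trials with at least one wrong bit (inclusion-exclusion on two bits)."""
    return sum(FLIPS[board].values()) - OVERLAPS[board]


def bit_error_rate(board):
    """Fraction of all observed bits that differed from the reference."""
    return sum(FLIPS[board].values()) / (TRIALS * SIGNATURE_BITS)


if __name__ == "__main__":
    print("bits differing between boards:", hamming_distance(BOARD_1, BOARD_2))
    for board in FLIPS:
        print(board, "failing trials:", failing_trials(board),
              "bit error rate:", bit_error_rate(board))
```

The per-bit counts are consistent with the reported totals of 10 and 40 differing outputs. The two boards differ in 1 of 8 bits, well below the roughly half of the bits one would hope for from two independent devices.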

Status & Future Work
- Since I haven't been able to obtain consistent results at ambient, I have not experimented with any environmental variations
- To improve the consistency of the results, future work would include
  - Calculate the actual frequency produced by each oscillator
  - Select those oscillators with frequencies that are farther apart for comparison
  - Pro: Results will be more consistent and presumably less susceptible to environmental variations
  - Con: This means that each time a PUF is implemented in hardware, it requires manual tweaking to ensure consistency
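One possible way to "select those oscillators with frequencies that are farther apart" is sketched below in Python. This is an illustrative heuristic, not something implemented in the project: it sorts the oscillators by measured count and pairs each one in the slower half with the corresponding one in the faster half. The measured counts are made-up values.

```python
def select_pairs(counts):
    """Pair oscillators so that each pair has a wide gap in measured count.

    Oscillators are ranked by count; rank k in the slower half is paired
    with rank k in the faster half. Each pair is returned as
    (lower index, higher index) so the A/B roles stay tied to position.
    """
    order = sorted(range(len(counts)), key=lambda i: counts[i])
    half = len(order) // 2
    pairs = []
    for k in range(half):
        slow, fast = order[k], order[k + half]
        pairs.append((min(slow, fast), max(slow, fast)))
    return pairs


def margins(counts, pairs):
    """Absolute count difference for each pair; larger is more robust."""
    return [abs(counts[a] - counts[b]) for a, b in pairs]


def bits(counts, pairs):
    """PUF bits for the chosen pairs: 1 if A > B, else 0."""
    return [1 if counts[a] > counts[b] else 0 for a, b in pairs]


if __name__ == "__main__":
    measured = [100, 101, 150, 149, 120, 180, 90, 130]  # hypothetical counts
    adjacent = [(i, i + 1) for i in range(0, len(measured), 2)]
    chosen = select_pairs(measured)
    print("adjacent pairing margins:", margins(measured, adjacent))
    print("selected pairing margins:", margins(measured, chosen))
    print("selected pairs:", chosen, "bits:", bits(measured, chosen))
```

The trade-off named on the slide shows up directly: the pairing depends on counts measured on each individual device, so it has to be derived and stored per device.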

Status & Future Work
- To improve the randomness of results across different hardware
  - Implement the Ring Oscillator as a hard macro
  - Don't put area constraints on the place and route tool
  - Pro: This will make the output unique for each piece of hardware in which the PUF is implemented
  - Con: Can't guarantee consistency
- To improve practicality
  - Expand the circuit to generate a 128-bit key instead of an 8-bit key
  - Set up a challenge-response based authentication scheme and use a board communication channel
  - Cascade two boards together to determine feasibility of a system-level signature

Conclusion
- Although I haven't been able to obtain consistent results to date, I can see that a Ring Oscillator PUF could be used to generate a unique hardware ID. However, the implementation difficulty has been over-simplified by researchers.
- From my project experience, I don't believe consistent results can be obtained without manual intervention and significant testing
  - This results in added production costs
  - Environmental variations would only exacerbate the problem
- Adding multiplexors to select which oscillators to compare would make a challenge-response authentication scheme possible
  - Again, manual intervention and testing would be required during production to ensure adequate results
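The multiplexor-based challenge-response idea in the conclusion could look roughly like the Python sketch below. It was not part of the project. The challenge format (a list of oscillator index pairs standing in for multiplexor select lines), the enrolled response, and the mismatch tolerance are all assumptions made for illustration.

```python
def respond(counts, challenge):
    """Return one response bit per (a, b) oscillator pair in the challenge.

    Each pair plays the role of the multiplexor select inputs: it picks
    which two oscillator counts are compared. Bit is 1 if A > B, else 0.
    """
    return [1 if counts[a] > counts[b] else 0 for a, b in challenge]


def authenticate(enrolled_response, observed_response, max_mismatches=1):
    """Accept if the observed response is within the allowed bit mismatches.

    A tolerance is assumed here because the measured bits were not fully
    consistent between trials; the value 1 is arbitrary.
    """
    if len(enrolled_response) != len(observed_response):
        return False
    mismatches = sum(e != o
                     for e, o in zip(enrolled_response, observed_response))
    return mismatches <= max_mismatches


if __name__ == "__main__":
    device_counts = [5, 3, 9, 9]            # hypothetical oscillator counts
    challenge = [(0, 1), (1, 2), (2, 3)]    # hypothetical challenge
    enrolled = respond(device_counts, challenge)
    print("enrolled response:", enrolled)
    print("same device accepted:",
          authenticate(enrolled, respond(device_counts, challenge)))
```

The enrolled responses would have to be recorded per device during production, which is the manual intervention and testing cost the conclusion refers to.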

Questions?
Please contact Michelle [email protected]@iastate.edu