ECE 254A
Advanced Computer Architecture: Supercomputers
Fall 2006
University of California, Santa Barbara
Department of Electrical and Computer Engineering
Project 2
“Designing a Snoopy Cache”
Ali Umut IRTURK 789139-3
ECE Department & ECON Department
Graduate Student
11/6/2006
1) OVERVIEW OF THE PROJECT
2) DISCOVERING THE INPUT AND OUTPUT PORTS
3) DETAILED INFORMATION ABOUT THE DESIGNED CACHES
4) TEST BENCHES
5) FIGURES
6) CODES
1) Overview of the Project
The aim of this project is to design, in Verilog, a snoopy cache protocol that maintains
coherence for multiple processors. In my project, I designed 7 blocks to make this cache
protocol fully functional.
The design modules are:
1) Cpu: There are two Cpus in my design. These design units request reads from and
writes to the caches.
2) Cache: There are two Caches in my design. The caches are two-way set associative.
There are 8 entries in the cache, and each cache entry has 11 bits, which include the
data, tag, update, dirty and valid bits. These design subjects are considered in detail in
the following sections.
3) Memory Mapping Unit: This unit converts the virtual address to the
physical address. I designed the virtual address line as 7 bits and the physical
address line as 5 bits. The conversion is done simply by cutting the most significant
two bits of the virtual address.
4) Memory Bus Controller: Because different modules may try to access the bus at
the same time, a memory bus controller is needed.
5) Memory: The memory has 32 entries, each entry has 10 bits.
These design modules can be seen in Figures 1-6, and the next section, named
"Discovering the Input and Output Ports", gives detailed information about the usage of
the modules.
The most important parts of the design are:
1) Two-phase clocking: This gives us the advantage of using both edges of the
clock during state changes.
2) Snooping and Invalidation: Snooping protocols maintain coherence for multiple
processors. To meet the coherence requirement of snooping protocols, I used the
write invalidate protocol.
The third section gives detailed information about these parts of the project.
After implementing these modules, I wrote 4 different test benches to verify that my
design works properly: test benches for read misses, write misses, snooping, and
invalidation. Detailed information is given in the fourth section of the project report.
2) Discovering the Input and Output Ports
I will consider every component one by one and identify its input and output ports.
At this step, however, I do not yet specify the widths of the inputs and outputs.
a) Cpu:
As noted above, my design contains two Cpus (Cpu A and Cpu B) and two caches
(Cache A and Cache B). When information is needed from the caches, or information
needs to be written, one of the Cpus accesses its cache. Thus:
When a read is needed:
i) The Cpu must signal this situation with "a read signal."
ii) The Cpu must indicate where the data is with "address bits". At this
step, however, the Memory Mapping Unit is used.
When a write is needed:
iii) The Cpu must signal this situation with "a write signal."
iv) The Cpu must indicate which data needs to be written with "data bits".
v) The Cpu must indicate where the data will be written with "address bits"
(the same bits used for reads). At this step, again, the Memory Mapping Unit is used.
This shows that there must be 4 outputs from the Cpus to the Caches (Cache inputs
from the Cpu). I named them cpu_cac_NAME_NAMEoftheCPU or
cpu_mmu_NAME_NAMEoftheCPU. The read signal and write signal each need only
1 bit; the address and data widths are decided later.
This module can be seen in Figure 1.
b) Memory Mapping Unit and the Relationship between the Cpus, MMU and
Caches
The Memory Mapping Unit converts the virtual address to the physical address.
I designed the virtual address line as 7 bits and the physical address line as 5 bits;
the conversion simply cuts the most significant two bits from the virtual address line.
The address bits from Cpu A or Cpu B arrive at the Memory Mapping Unit as input.
After converting them to a physical address, the MMU sends the address bits to Cache A
or Cache B.
This module can be seen in Figure 2.
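The MMU's translation rule above can be modeled in a few lines. This is a behavioral Python sketch of the described behavior, not the Verilog itself:

```python
def mmu_translate(virtual_addr: int) -> int:
    """Behavioral model of the Memory Mapping Unit: a 7-bit virtual
    address becomes a 5-bit physical address by dropping the two
    most significant bits."""
    assert 0 <= virtual_addr < 2**7
    return virtual_addr & 0b11111  # keep only the low 5 bits

# Example using an address from the test benches: 0011101 -> 11101
print(bin(mmu_translate(0b0011101)))  # -> 0b11101
```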
c) Memory:
If a miss occurs in the Cache after a Cpu request, the cache must access the
memory to retrieve the data. Thus, the memory needs an output port to transfer the data
to the cache:
Output Ports:
i) The requested data is sent via "data bits" from the Memory to the Cache. I
named this mem_cac_data.
ii) Because we are dealing with different caches and have a Memory Bus
Controller, there must be an indicator bit that tells the caches the desired data is
available. I named this output data_avail_memA or B, which is just 1 bit.
Input Ports:
Input ports for the Memory come from the Caches. As mentioned before, if a miss
occurs, the memory must be accessed to retrieve the desired block; and if the Cpu wants
to write information to the Memory through the Cache, there must be several outputs
from the Cache to the Memory.
If a miss occurs in the Cache:
i) This information must be given to the Memory by sending "a read bit."
ii) The Cache must indicate where the data is with "address bits".
If a write is considered:
iii) The Cache must signal this situation with "a write bit."
iv) The Cache must indicate which data needs to be written with "data bits".
v) If a priority-write situation occurs, I designated a signal to indicate it,
priority_wrt_A or B, which is just 1 bit.
This shows that there must be 5 outputs from the Cache to the Memory (Memory
inputs from the Cache). They are named cac_mem_NAME_NameoftheCache.
The read bit, write bit and priority write bit each need only 1 bit.
This module can be seen in Figure 3.
d) Memory Bus Controller
Because different modules can access the bus at the same time, the memory bus
controller is required in this project. The Memory Bus Controller receives the requests
and gives control of the bus to the requesting block. The aim of the priority write bit is
to designate the priority of the write-back process.
Inputs to the Memory Bus Controller:
i) The request from Cache A: bus_req_A
ii) Priority request from Cache A: priority_req_A
iii) The request from Cache B: bus_req_B
iv) Priority request from Cache B: priority_req_B
v) The request from the memory: bus_mem
Output to the Caches and Memory:
vi) A bit showing the bus is granted to Cache A: bus_A
vii) A bit showing the bus is granted to Cache B: bus_B
viii) A bit showing the bus is granted to the Memory: bus_mem
This module can be seen in Figure 4.
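The arbitration rule can be sketched behaviorally. The fixed priority order shown here (priority write-back requests first, then Cache A before Cache B, then the memory) is an assumption read off the test-bench traces, where the bus always goes to Cache A first:

```python
def arbitrate(req_A, req_B, req_mem, prio_A=False, prio_B=False):
    """Behavioral model of the Memory Bus Controller's grant decision.
    Returns which requester gets the bus this cycle, or None.
    The exact tie-breaking order is an assumption, not taken from the RTL."""
    if prio_A:
        return "A"     # priority write back from Cache A
    if prio_B:
        return "B"     # priority write back from Cache B
    if req_A:
        return "A"     # ordinary request from Cache A wins ties
    if req_B:
        return "B"
    if req_mem:
        return "mem"
    return None
```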
e) Cache:
The caches are the other important part of this design. There must be several
outputs from the Cache to the Cpu and Memory, and there are several inputs from the
other design modules: the Cpu, MMU, Memory and Memory Bus Controller.
The relationship between the Cache and the Cpu
As discussed in the Cpu part, there must be 4 inputs from the Cpu to the Cache:
the read, write, address and data bits. And there must be another input from the MMU:
the address bits of the physical address.
Output Ports:
If the Cpu gives the read signal and sends the address of the data:
i) If the data requested by the processor appears in the Cache, this is called a
hit. First, this information must be given to the Cpu by sending "a hit bit." Then the
found data must be sent back to the Cpu, so the Cache needs an output to the Cpu for
"data bits".
ii) If the data is not found in the Cache, the request is called a miss. The
memory is then accessed to retrieve the block containing the requested data. This
information must be given to the Cpu by sending "a miss bit."
This shows that there must be 3 outputs from the Cache to the Cpu (Cpu inputs from
the Cache). These are named cac_cpu_NAME_NameoftheCache. The hit signal and
miss signal each need only 1 bit.
As a result, by considering the above blocks, we can draw the cache block (without
snooping and invalidation), which can be seen in Figure 5.
At this point, every required module for the project has been designed. I combined these
modules in Figure 7, where we can see the general picture.
3) Detailed Information About the Designed Caches:
Set Associative Cache Design, Snooping and Invalidation, and Write Back
Where can a block be placed in a cache?
As we know, there are three different answers to this question:
1) Direct Mapped Cache Design
2) Fully Associative Cache Design
3) Set Associative Cache Design
In the first project, the design of a simple cache, I used a direct mapped cache
design. In this project, I used a two-way set associative cache design to make the
implementation more realistic.
In this kind of cache design, a block can be placed in a restricted set of places in the
cache. Here a set is a group of blocks in the cache. A block is first mapped onto a set,
and then the block can be placed anywhere within that set. The set is chosen by bit
selection; that is,
(Block address) MOD (Number of sets in the cache)
For every index there are two blocks available for storage. We can picture this
situation as two pages stacked on top of each other, which gives a better intuition for the
concept.
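The bit-selection rule can be illustrated with a small Python sketch. Note that the textbook formula uses the low-order address bits, while the Cache_A Verilog listed in Section 6 takes its 3-bit index from bits [4:2] of the 5-bit physical address and its tag from bits [1:0]:

```python
def set_index_mod(block_addr: int, num_sets: int) -> int:
    """Textbook bit selection: (block address) MOD (number of sets),
    i.e. the low-order log2(num_sets) bits of the address."""
    return block_addr % num_sets

def set_index_verilog(addr5: int) -> int:
    """The split actually used by the Cache_A module:
    cpu_index = addr[4:2] (3-bit index), cpu_tag = addr[1:0]."""
    return (addr5 >> 2) & 0b111
```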
But this raises another important question:
Which block should be replaced on a Cache Miss?
After a miss occurs, the cache controller must select a block to be replaced by the
desired data. In our situation there are two candidate blocks. As we know, there are
three primary strategies for selecting which block to replace:
1) Random
2) Least-Recently Used
3) First In, First Out
In this project I used the second strategy, Least-Recently Used (LRU). This
approach reduces the chance of throwing out information that will be needed soon. To
achieve this, accesses to blocks must be recorded; I did this by using update bits in the
cache entries. Relying on the past to predict the future, the block replaced is the one that
has been unused for the longest time. In my design there are two pages, so there are two
candidate blocks to replace. I always check the update bits to see which block was
recently used; its update bit must be 1. When I write data at an index, I set that page's
update bit to 1, and it is important to clear the update bit of the other page to 0 for later.
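The update-bit bookkeeping can be modeled behaviorally. In this Python sketch, bit 7 is the update bit, matching the OR/AND masks (11'b00010000000 and its complement) used in the Verilog:

```python
UPDATE = 1 << 7  # position of the update bit in the 11-bit entry

def mark_used(set_entries, way):
    """After an access, set the accessed way's update bit and clear
    the other way's, like the masks in the Verilog listing."""
    set_entries[way] |= UPDATE
    set_entries[1 - way] &= ~UPDATE
    return set_entries

def victim(set_entries):
    """On a miss, the way whose update bit is 0 is the LRU victim."""
    return 0 if (set_entries[0] & UPDATE) == 0 else 1
```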
Here comes another important question:
What happens on a Write?
As we know again, there are two basic options when writing to the cache:
1) Write Through
2) Write Back
I used Write Through in the simple cache design, as it is easier to implement
than write back. In that scheme, the information was written both to the block in the
cache and to the block in memory. However, this project involves other important
concepts which I discuss later in this report, so I used the Write Back method here. In
this method, the information is written only to the block in the cache; the modified
cache block is written to memory only when it is replaced.
When using the Write Back method, a new feature must be introduced: dirty bits,
whose use reduces the frequency of writing back blocks on replacement. This status bit
indicates whether the block is dirty, meaning modified while in the cache, or clean,
meaning not modified. If it is clean, the block is not written back on a miss, because
identical information can be found in memory. This is another bit in the cache entry.
The write back method thus has the advantage of using less memory bandwidth.
The cache entry used in this project is shown in the figure below.
Figure - Cache Entry. The additions relative to the first project are the dirty bit and the
update bit described above.
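Based on the bit positions used in the Verilog assigns (valid = bit 10, dirty = bit 9, update = bit 7, tag = bits 6:5, data = bits 4:0; bit 8 appears unused), the 11-bit entry can be packed and unpacked as follows. This is a behavioral sketch of the entry layout, not the RTL:

```python
def pack_entry(valid, dirty, update, tag, data):
    """Pack an 11-bit cache entry using the bit positions from the
    Cache_A assigns: valid=10, dirty=9, update=7, tag=6:5, data=4:0."""
    assert 0 <= tag < 4 and 0 <= data < 32
    return (valid << 10) | (dirty << 9) | (update << 7) | (tag << 5) | data

def unpack_entry(e):
    """Inverse of pack_entry: recover the named fields."""
    return {"valid": (e >> 10) & 1, "dirty": (e >> 9) & 1,
            "update": (e >> 7) & 1, "tag": (e >> 5) & 3, "data": e & 31}
```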
Having decided on the Cache Entry, we need to consider how a block is found if it is
in the cache.
Snooping Protocols:
This is the most important part of the project. Snooping protocols maintain
coherence for multiple processors; the general name for these protocols is cache
coherence protocols. The key to implementing a cache coherence protocol is tracking
the sharing state of any data block. There are two classes of such protocols:
1) Directory based
2) Snooping
In my project, I used the snooping protocols. Here, every cache that has a copy of
the data from a block of physical memory also has a copy of the sharing status of the
block, and no centralized state is kept. The point is that the caches are on a
shared-memory bus, and all cache controllers snoop on the bus to determine whether or
not they have a copy of a block that is requested on the bus.
To maintain the coherence requirement in snooping protocols, there are two
methods:
1) Write invalidate protocol
2) Write update (write broadcast) protocol
In my design, I used the write invalidate protocol, as described in our lectures. In the
write invalidate protocol, a processor has exclusive access to a data item before it writes
that item. The name is "write invalidate" because it invalidates other copies on a write.
Exclusive access ensures that no other readable or writable copies of an item exist when
the write occurs, because all other cached copies of the item have already been
invalidated.
For invalidation, the processor simply acquires bus access and broadcasts the address
to be invalidated on the bus. All processors continuously snoop on the bus, watching the
addresses. Each processor checks whether the address on the bus is in its cache; if so,
the corresponding data in the cache is invalidated.
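The snoop-and-invalidate step can be modeled behaviorally. This Python sketch uses the same address split as the Verilog (index = bits [4:2], tag = bits [1:0]); the dictionary-based cache model is an illustration, not the RTL:

```python
def on_bus_write(cache, addr):
    """Write-invalidate snoop: each cache watches addresses on the
    shared bus; a valid line matching a written address is invalidated.
    `cache` maps set index -> (tag, state)."""
    index = (addr >> 2) & 0b111  # bits [4:2], as in the Verilog
    tag = addr & 0b11            # bits [1:0]
    line = cache.get(index)
    if line is not None and line[0] == tag and line[1] != "Invalid":
        cache[index] = (tag, "Invalid")
    return cache
```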
Cache State Transitions:
We have three states, which can be seen in Figure 9: Invalid, Shared, Exclusive.
In the Invalid State:
There are two possible events for the addressed cache block: a read miss and a write
miss.
Suppose a Cpu requests a read, and a read miss occurs. Then the read miss must be
placed on the bus, and after the data is stored in the cache, the state must change to
Shared.
Suppose a Cpu requests a write, and a write miss occurs. Then the write miss must
be placed on the bus, and after the data is stored in the cache, the state must change to
Exclusive.
In the Shared State:
There are three possible events for the addressed cache block: a read miss, a read hit
and a write miss.
Suppose a Cpu requests a read, and a read miss occurs. Then the read miss must be
placed on the bus. After the data is stored in the cache, the state stays Shared.
Suppose a Cpu requests a read, and a read hit occurs. Then the state stays Shared.
Suppose a Cpu requests a write, and a write miss occurs. Then the write miss must
be placed on the bus. After the data is stored in the cache, the state must change to
Exclusive.
In the Exclusive State:
There are four possible events for the addressed cache block: a read miss, a read hit,
a write hit and a write miss.
Suppose a Cpu requests a read, and a read miss occurs. Then the read miss must be
placed on the bus. After the data is stored in the cache, the state must change to Shared.
Suppose a Cpu requests a read, and a read hit occurs. Then the state stays Exclusive.
Suppose a Cpu requests a write, and a write miss occurs. Then the write miss must
be placed on the bus. After the data is stored in the cache, the state stays Exclusive.
Suppose a Cpu requests a write, and a write hit occurs. Then the state stays
Exclusive.
These different situations can be seen clearly in Figure 9.
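The transitions described above can be collected into a table keyed by (current state, observed event). This is a direct transcription of the text and Figure 9, not additional protocol behavior:

```python
# (current state, event) -> (next state, required bus action)
TRANSITIONS = {
    ("Invalid",   "read_miss"):  ("Shared",    "place read miss on bus"),
    ("Invalid",   "write_miss"): ("Exclusive", "place write miss on bus"),
    ("Shared",    "read_miss"):  ("Shared",    "place read miss on bus"),
    ("Shared",    "read_hit"):   ("Shared",    None),
    ("Shared",    "write_miss"): ("Exclusive", "place write miss on bus"),
    ("Exclusive", "read_miss"):  ("Shared",    "place read miss on bus"),
    ("Exclusive", "read_hit"):   ("Exclusive", None),
    ("Exclusive", "write_hit"):  ("Exclusive", None),
    ("Exclusive", "write_miss"): ("Exclusive", "place write miss on bus"),
}

def next_state(state, event):
    """Look up the next state and bus action for the addressed block."""
    return TRANSITIONS[(state, event)]
```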
4) Test Benches:
1) Testing Read processes:
The goal is to see that the Memory Bus Controller is working properly.
In my design, the cache entries are initially filled with zeros. As we know, I have two
Cpus, which send read requests at the same time to different caches: Cpu A sends a
request to Cache A and Cpu B to Cache B. Because the caches are filled with zeros,
read misses occur. Each cache then needs to access the memory to retrieve data.
However, simultaneous access is not possible: the Memory Bus Controller gives control
to one cache at a time. The data is retrieved from the memory (then the grant is given to
the other cache) and stored in the first page of the granted cache. After this process, the
same read requests are sent with the same addresses, and read hits must occur. The
process:
1) Cpu A requests the data at address 0011101 and Cpu B requests the data at
address 1110011.
2) Because both Caches are filled with zeros, no tags match and read misses
occur.
3) Both caches send requests to the Memory Bus Controller:
bus_bus_bus_req_A = 1 and bus_bus_bus_req_B = 1.
4) First the bus is granted to Cache A.
5) cac_mem_rd_A is set to 1, which means the Cache will read the data at the
desired address. After this, the bus is granted to Cache B.
6) cac_memAdd_A is set to 11101. The memory holds 00010 at address
11101, so mem_cac_data_A is loaded with the desired data 00010.
7) The data is stored in cache A, page 0, and sent to the Cpu: cac_cpu_data_A
is set to 00010.
8) After the data is stored in both caches, the same read requests occur, and
this results in a read hit in cache A.
Read hits occurred after the read misses. The requests were handled by the memory bus
controller, and the coordination between the caches, the memory bus controller and the
memory works properly.
2) Testing Read processes:
The goal is to see that the two-way associativity is working properly.
This test bench is related to Test Bench 1. In Test Bench 1, the Cpus requested data at
the same address twice. What happens if the Cpus request data from a different address
between the two requests to the first address? On the first requests, cache misses occur
and the memory bus is granted to one of the Caches. After retrieving the data from the
memory for both caches, the data is written to the first pages of the caches. If the caches
then request data from a different address, the new data is written to the second pages of
the caches. Finally, if the Cpu requests the data from the first address again, there must
be a hit. This shows that the two-way associativity is working properly and the update
bits are maintained.
1) Cpu A requests the data at address 0011101 and Cpu B requests the data at
address 1110011.
2) Because both Caches are filled with zeros, no tags match and read misses
occur.
3) Both caches send requests to the Memory Bus Controller:
bus_bus_bus_req_A = 1 and bus_bus_bus_req_B = 1.
4) First the bus is granted to Cache A.
5) cac_mem_rd_A is set to 1, which means the Cache will read the data at the
desired address. After this, the bus is granted to Cache B.
6) cac_memAdd_A is set to 11101. The memory holds 00010 at address
11101, so mem_cac_data_A is loaded with the desired data 00010.
7) The data is stored in cache A, page 0, and sent to the Cpu: cac_cpu_data_A
is set to 00010.
8) Cpu A requests the data at a new address, 1000101, and Cpu B at a new
address, 0101010.
9) No tags match and read misses occur.
10) Both caches send requests to the Memory Bus Controller:
bus_bus_bus_req_A = 1 and bus_bus_bus_req_B = 1.
11) The bus is granted to Cache A again.
12) cac_mem_rd_A is set to 1, which means the Cache will read the data at the
desired address. After this, the bus is granted to Cache B.
13) cac_memAdd_A is set to 00101. The memory holds 11010 at address
00101, so mem_cac_data_A is loaded with the desired data 11010.
14) The data is stored in cache A, page 1, and sent to the Cpu: cac_cpu_data_A
is set to 11010.
15) Again, Cpu A requests the data at the first address, 0011101, and Cpu B at
the first address, 1110011.
16) This results in a read hit in cache A.
A read hit occurred after two read misses. Because of the two-way associative cache
organization, the first data is stored in page 0 and the second data in page 1, and the
request for the first data is still satisfied with a hit.
3) Testing Write processes:
In this test, both Cpus request writes to their caches. The first write attempt results
in a write miss. The data is written to the first pages of the caches, a write back to the
memory is performed, and the other copy is invalidated. The second write attempt to
the same address results in a write hit.
1) Cpu A requests a write at address 0011101 with the data 11011, and Cpu B
requests a write at address 1110011 with the data 10001.
2) Write misses occur.
3) Both caches send requests to the Memory Bus Controller:
bus_req_A = 1 and bus_req_B = 1.
4) First the bus is granted to Cache A.
5) cac_mem_wrt_A is set to 1 and cac_mem_data_A is set to 11011.
6) After the data is written to the cache and to the memory with the write
back, invalidation is performed.
7) Again, Cpu A requests a write at address 0011101 with the data 00100,
and Cpu B at address 1110011 with the data 01110.
8) Write hits occur and no write back is performed.
4) Testing Snooping and Write-back:
Any transition to the exclusive state (which is required for a processor to write the
block) requires a write miss to be placed on the bus, causing all other caches to make
the block invalid. In addition, if some other cache holds the block in the exclusive state,
that cache generates a write back, which supplies the block containing the desired
address.
1) Cpu A requests a write at address 0011101 with the data 11011.
2) A write miss occurs.
3) Cache A sends a request for the bus: bus_req_A = 1.
4) The bus is granted to Cache A.
5) cac_mem_wrt_A is set to 1 and cac_mem_data_A is set to 11011.
6) After the data is written to the cache and to the memory with the write
back, invalidation is performed.
7) Again, Cpu A requests a write at address 0011101 with the data 00100,
which is different data.
8) A write hit occurs and no write back is performed. However, this makes
the data dirty.
9) Then Cache B reads the same address, which triggers snooping.
10) Cache A must perform a priority write back of the dirty data to memory.
5) Figures
Figure 1: The Resulting CPU Module
Figure 2: Resulting Memory Mapping Module
Figure 3: The Resulting Memory Module
Figure 4: The Resulting Memory Bus Controller Module
Figure 5: The Resulting Cache Module
Figure 6: The relationship between Cache A and Cache B
Figure 7: General Design of the Project
Figure 8: The address sent by the Cpu is matched against both caches. The Cpu Tag and
Cpu Index construct the address sent by the Cpu.
Figure 9: Cache State Transitions
Figure 1: The Resulting CPU Module
Figure 2: Resulting Memory Mapping Module
Figure 3 : The Resulting Memory Module
Figure 4 : The Resulting Memory Bus Controller Module
Figure 5 : The Resulting Cache Module
Figure 6 : The relationship between Cache A and Cache B
Figure 7 : General Design of the Project
Figure 8 : The address sent by the Cpu is matched against both caches. The Cpu Tag and Cpu Index construct the address sent by the Cpu.
Figure 9 : Cache State Transitions
6) Codes:
Cache_A
`timescale 1ns/100ps

// Module
module Cache_A(
    // Global inputs and outputs
    clock,
    // Asynchronous active low reset
    rst_l,
    // Inputs from CPU
    cpu_cac_add_A, cpu_cac_rd_A, cpu_cac_wrt_A, cpu_cac_data_A,
    // Inputs from Main Memory
    mem_cac_data_A, data_avail_memA,
    // Input from MMU
    mmu_cac_add_A,
    // Outputs (requests) to the Memory Bus Controller
    bus_bus_bus_req_A, priority_bus_bus_req_A,
    // Snooping input ports
    snoop_B, snoop_add_B, invalidate_B,
    // Outputs to CPU
    cac_cpu_hit_A, cac_cpu_miss_A, cac_cpu_data_A,
    // Outputs to Main Memory
    cac_data_avail_mem_Add_A, cac_mem_data_A, cac_mem_rd_A,
    cac_mem_wrt_A, priority_wrt_A,
    // Input (grant) from the Memory Bus Controller
    bus_A,
    // Snooping output ports
    snoop_A, snoop_add_A, invalidate_A
);

// Input Ports
// Global
input clock;
input rst_l;
// CPU
input [4:0] cpu_cac_data_A;
input [4:0] cpu_cac_add_A;
input cpu_cac_rd_A;
input cpu_cac_wrt_A;
// Memory
input [4:0] mem_cac_data_A;
input data_avail_memA;
// MMU (this declaration is missing in the original listing)
input [4:0] mmu_cac_add_A;
// Memory Bus Controller
input bus_A;
// Snooping
input snoop_B;
input [4:0] snoop_add_B;
input invalidate_B;

// Output Ports
// CPU
output [4:0] cac_cpu_data_A;
output cac_cpu_hit_A;
output cac_cpu_miss_A;
// Memory
output [4:0] cac_mem_data_A;
output [4:0] cac_data_avail_mem_Add_A;
output cac_mem_rd_A;
output cac_mem_wrt_A;
output priority_wrt_A;
// Memory Bus Controller
output bus_bus_bus_req_A;
output priority_bus_bus_req_A;
// Snooping
output snoop_A;
output [4:0] snoop_add_A;
output invalidate_A; // the original declares "invalidation_A", but the driven reg is invalidate_A

// Registers
// CPU
reg [4:0] cac_cpu_data_A;
reg cac_cpu_hit_A;
reg cac_cpu_miss_A;
// Memory
reg [4:0] cac_mem_data_A;
reg [4:0] cac_memAdd_A;
reg cac_mem_rd_A;
reg cac_mem_wrt_A;
reg priority_wrt_A;
// Memory Bus Controller
reg bus_bus_bus_req_A;
reg priority_bus_bus_req_A;
// Snooping
reg snoop_A;
reg [4:0] snoop_add_A;
reg snoop_page_1;
reg snoop_page_2;
reg invalidate_A;
// Cache Buffer
reg [10:0] buffer_1 [0:7];
reg [10:0] buffer_2 [0:7];
// Read/Write regs
reg read_A;
reg write_A;
reg dirty;
// Cache FSM
reg [2:0] state;
reg [2:0] next_s;
reg [2:0] back_t_s;

// Nets
// Cache-CPU Interface
wire [10:0] cpu_buffer_1; // Buffer value at current index in page 0
wire [10:0] cpu_buffer_2; // Buffer value at current index in page 1
wire [2:0] cpu_index;     // Index value from CPU address
wire [1:0] cpu_tag;       // Tag value from CPU address
wire [1:0] cur_tag_1;
wire [1:0] cur_tag_2;
wire [4:0] add_mem;
wire [4:0] cac_data;      // Current Cache Data
wire [4:0] mem_data;
wire [1:0] snoop_tag;
wire [2:0] snoop_index;
wire [10:0] snoop_buffer_1;
wire [10:0] snoop_buffer_2;
wire [10:0] cac_data_1;
wire [10:0] cac_data_2;
wire valid_1;
wire valid_2;
wire dirty_1;
wire dirty_2;
wire update_1;
wire update_2;

// Integer
integer i;

// Parameters
parameter S0 = 0; // Initial
parameter S1 = 1; // Wait State 1 (the original repeats "S0 = 1" here, a typo)
parameter S2 = 2; // Store State
parameter S3 = 3; // Wait State 2
parameter S4 = 4;
parameter S5 = 5; // priority_write state
// Assigns
// Cache-CPU
assign cpu_tag = cpu_cac_add_A[1:0];
assign cur_tag_1 = cpu_buffer_1[6:5];
assign cur_tag_2 = cpu_buffer_2[6:5];
assign cpu_index = cpu_cac_add_A[4:2];
assign add_mem = cpu_cac_add_A[4:0];
assign valid_1 = cpu_buffer_1[10];
assign valid_2 = cpu_buffer_2[10];
assign dirty_1 = cpu_buffer_1[9];
assign dirty_2 = cpu_buffer_2[9];
assign update_1 = cpu_buffer_1[7];
assign update_2 = cpu_buffer_2[7];
assign cpu_buffer_1 = buffer_1[cpu_index];
assign cpu_buffer_2 = buffer_2[cpu_index];
// Cache-Memory
assign cac_data_1 = buffer_1[cpu_index];
assign cac_data_2 = buffer_2[cpu_index];
assign mem_data = mem_cac_data_A[4:0];
assign snoop_tag = snoop_add_B[1:0];
assign snoop_index = snoop_add_B[4:2];
assign snoop_buffer_1 = buffer_1[snoop_index];
assign snoop_buffer_2 = buffer_2[snoop_index];

// Begin Cache_A
// 2 phase clock
always @(negedge clock)
    next_s <= state;

always @(posedge clock or negedge rst_l)
begin
    if(rst_l == 0) // Store the initials
    begin
        cac_mem_rd_A <= 1'b0;
        cac_mem_wrt_A <= 1'b0;
        cac_cpu_data_A <= 5'b0;
        cac_cpu_miss_A <= 1'b0;
        cac_cpu_hit_A <= 1'b0;
        cac_mem_data_A <= 5'b0;
        cac_memAdd_A <= 5'b0;
        snoop_A <= 1'b0;
        invalidate_A <= 1'b0;
        bus_bus_bus_req_A <= 1'b0;
        state <= S0;
        for(i = 0; i <= 7; i = i + 1) // Store 0's into both pages
        begin
            buffer_1[i] <= 11'b0;
            buffer_2[i] <= 11'b0;
        end
    end
    else // If it is not a hard reset
    begin
        case(next_s)

        // State S0 - Initial
        S0:
        begin
            // store initials again
            cac_cpu_data_A <= 5'b0;
            cac_cpu_miss_A <= 1'b0;
            cac_cpu_hit_A <= 1'b0;
            cac_mem_data_A <= 5'b0;
            cac_memAdd_A <= 5'b0;
            cac_mem_rd_A <= 1'b0;
            cac_mem_wrt_A <= 1'b0;
            priority_bus_bus_req_A <= 1'b0;
            snoop_A <= 1'b0;
            snoop_page_1 <= 1'b0;
            snoop_page_2 <= 1'b0;
            invalidate_A <= 1'b0;
            read_A <= 1'b0;
            write_A <= 1'b0;
            dirty <= 1'b0;
            priority_wrt_A <= 1'b0;
            bus_bus_bus_req_A <= 1'b0;

            // If it is a read
            if(cpu_cac_rd_A == 1'b1)
            begin
                read_A <= 1'b1;
                if(cpu_tag == cur_tag_1) // For Page 1
                begin
                    // read miss in INVALID state
                    if(valid_1 == 1'b0)
                    begin
                        $display ("Read Miss In Invalid State PAGE 1");
                        cac_cpu_miss_A <= 1'b1;
                        bus_bus_bus_req_A <= 1'b1;
                        state <= S1; // wait for the bus grant ("S0" in the flattened original)
                    end
                    // read hit in EXCLUSIVE state
                    if((dirty_1 == 1'b1) & (valid_1 == 1'b1))
                    begin
                        $display ("Read Hit In Exclusive State PAGE 1");
                        cac_cpu_hit_A <= 1'b1;
                        cac_cpu_data_A <= cpu_buffer_1[4:0];
                        buffer_1[cpu_index] <= buffer_1[cpu_index] | 11'b00010000000;
                        buffer_2[cpu_index] <= buffer_2[cpu_index] & 11'b11101111111;
                        state <= S0;
                    end
                    // read hit in SHARED state
                    if((dirty_1 == 1'b0) & (valid_1 == 1'b1))
                    begin
                        $display ("Read Hit In Shared State PAGE 1");
                        cac_cpu_hit_A <= 1'b1;
                        cac_cpu_data_A <= cpu_buffer_1[4:0];
                        buffer_1[cpu_index] <= buffer_1[cpu_index] | 11'b00010000000;
                        buffer_2[cpu_index] <= buffer_2[cpu_index] & 11'b11101111111;
                        state <= S0;
                    end
                end
                if(cpu_tag == cur_tag_2) // Check Page 2 of the Cache
                begin
                    if(valid_2 == 1'b0)
                    begin
                        $display ("Read Miss In Invalid State PAGE 2");
                        cac_cpu_miss_A <= 1'b1;
                        bus_bus_bus_req_A <= 1'b1;
                        state <= S1;
                    end
                    // read hit in EXCLUSIVE state
                    if((dirty_2 == 1'b1) & (valid_2 == 1'b1))
                    begin
                        $display ("Read Hit In Exclusive State PAGE 2");
                        cac_cpu_hit_A <= 1'b1;
                        cac_cpu_data_A <= cpu_buffer_2[4:0];
                        buffer_2[cpu_index] <= buffer_2[cpu_index] | 11'b00010000000;
                        buffer_1[cpu_index] <= buffer_1[cpu_index] & 11'b11101111111;
                        state <= S0;
                    end
                    // read hit in SHARED state
                    if((dirty_2 == 1'b0) & (valid_2 == 1'b1))
                    begin
                        $display ("Read Hit In Shared State PAGE 2");
                        cac_cpu_hit_A <= 1'b1;
                        cac_cpu_data_A <= cpu_buffer_2[4:0];
                        buffer_2[cpu_index] <= buffer_2[cpu_index] | 11'b00010000000;
                        buffer_1[cpu_index] <= buffer_1[cpu_index] & 11'b11101111111;
                        state <= S0;
                    end
                end
                if((cpu_tag != cur_tag_1) & (cpu_tag != cur_tag_2))
                begin
                    $display ("Read Miss In Invalid State");
                    cac_cpu_miss_A <= 1'b1;
                    bus_bus_bus_req_A <= 1'b1;
                    state <= S1;
                end
            end

            // if it is a WRITE
            if(cpu_cac_wrt_A == 1'b1)
            begin
                write_A <= 1'b1;
                if(cpu_tag == cur_tag_1)
                begin
                    // write miss in INVALID state
                    if(valid_1 == 1'b0)
                    begin
                        $display ("Write Miss In Invalid State PAGE 1");
                        cac_cpu_miss_A <= 1'b1;
40
bus_bus_bus_req_A <= 1'b1; state <= S0; end //write miss in EXCLUSIVE state else if((dirty_1 == 1'b1) & (valid_1 == 1'b1)) begin $display ("Write Hit In Exculsive State PAGE 1"); cac_cpu_hit_A <= 1'b1;
buffer_1[cpu_index] <= {1'b1,1'b1,1'b0,1'b1,cpu_tag,cpu_cac_data_A}; buffer_2[cpu_index] <= buffer_2[cpu_index] & 11'b11101111111;
state <= S0; end //write hit in SHARED state else if((dirty_1 == 1'b0) & (valid_1 == 1'b1)) begin $display ("Write Hit In Shared State PAGE 1");
buffer_1[cpu_index] <= {1'b1,1'b1,1'b0,1'b1,cpu_tag,cpu_cac_data_A}; buffer_2[cpu_index] <= buffer_2[cpu_index] & 11'b11101111111;
bus_bus_bus_req_A <= 1'b1; back_t_s <= S0; state <= S4; end end else if(cpu_tag == cur_tag_2) begin if(valid_2 == 1'b0) begin $display ("Write Miss In Invalid State PAGE 2"); cac_cpu_miss_A <= 1'b1; bus_bus_bus_req_A <= 1'b1; state <= S0; end //write hit in EXCLUSIVE state else if((dirty_2 == 1'b1) & (valid_2 == 1'b1)) begin $display ("Write Hit In Exculsive State PAGE 2"); cac_cpu_hit_A <= 1'b1;
buffer_2[cpu_index] <= {1'b1,1'b1,1'b0,1'b1,cpu_tag,cpu_cac_data_A}; buffer_1[cpu_index] <= buffer_1[cpu_index] & 11'b11101111111;
state <= S0; end //write hit in SHARED state else if((dirty_2 == 1'b0) & (valid_2 == 1'b1)) begin $display ("Write Hit In Shared State PAGE 2");
buffer_2[cpu_index] <= {1'b1,1'b1,1'b0,1'b1,cpu_tag,cpu_cac_data_A}; buffer_1[cpu_index] <= buffer_1[cpu_index] & 11'b11101111111;
bus_bus_bus_req_A <= 1'b1; back_t_s <= S0; state <= S4; end end else if((cpu_tag != cur_tag_1) & (cpu_tag != cur_tag_2)) begin $display ("Write Miss In Invalid State"); cac_cpu_miss_A <= 1'b1; bus_bus_bus_req_A <= 1'b1; state <= S0; end end //if snooping
if(snoop_B == 1'b1) begin if((snoop_tag == snoop_buffer_1[6:5])&(snoop_buffer_1[10] == 1'b1)) begin if(snoop_buffer_1[9] == 1'b1) begin priority_bus_bus_req_A <= 1'b1; snoop_page_1 <= 1'b1; back_t_s <= S0; state <= S3; end else begin state <= S0; end end else if((snoop_tag == snoop_buffer_2[6:5])&(snoop_buffer_2[10] == 1'b1)) begin if(snoop_buffer_2[9] == 1'b1) begin priority_bus_bus_req_A <= 1'b1; snoop_page_2 <= 1'b1; back_t_s <= S0; state <= S3; end else begin state <= S0; end end else begin state <= S0; end end //if Invalidation if(invalidate_B == 1'b1) begin if((snoop_tag == snoop_buffer_1[6:5])&(snoop_buffer_1[10] == 1'b1)) begin buffer_1[snoop_index] <= (buffer_1[snoop_index] & 11'b00101111111); buffer_2[snoop_index] <= (buffer_2[snoop_index] | 11'b00010000000); state <= S0; end else if((snoop_tag == snoop_buffer_2[6:5])&(snoop_buffer_2[10] == 1'b1)) begin buffer_2[snoop_index] <= (buffer_2[snoop_index] & 11'b00101111111); buffer_1[snoop_index] <= (buffer_1[snoop_index] | 11'b00010000000); state <= S0; end else begin state <= S0; end end end //State S0 - Wait S0:
begin cac_cpu_miss_A <= 1'b0; priority_wrt_A <= 1'b0; $display ("Waiting in S0"); if(bus_A == 1'b1) begin cac_mem_rd_A <= 1'b1; cac_memAdd_A <= add_mem; snoop_A <= 1'b1; snoop_add_A <= add_mem; bus_bus_bus_req_A <= 1'b0; state <= S2; end //if invalidation else if(invalidate_B == 1'b1) begin if((snoop_tag == snoop_buffer_1[6:5])&(snoop_buffer_1[10] == 1'b1)) begin buffer_1[snoop_index] <= (buffer_1[snoop_index] & 11'b00101111111); buffer_2[snoop_index] <= (buffer_2[snoop_index] | 11'b00010000000); if(snoop_add_B == cpu_cac_add_A) begin bus_bus_bus_req_A <= 1'b0; cac_cpu_miss_A <= 1'b1; state <= S0; end else begin state <= S0; end end else if((snoop_tag == snoop_buffer_2[6:5])&(snoop_buffer_2[10] == 1'b1)) begin
buffer_2[snoop_index] <= (buffer_2[snoop_index] & 11'b00101111111); buffer_1[snoop_index] <= (buffer_1[snoop_index] | 11'b00010000000);
if(snoop_add_B == cpu_cac_add_A) begin bus_bus_bus_req_A <= 1'b0; cac_cpu_miss_A <= 1'b1; state <= S0; end else begin state <= S0; end end else begin state <= S0; end end else begin state <= S0; end end //S0 //State S2
S2: begin dirty <= 1'b0; snoop_A <= 1'b0; cac_mem_wrt_A <= 1'b0; cac_mem_rd_A <= 1'b0; snoop_page_1 <= 1'b0; snoop_page_2 <= 1'b0; priority_wrt_A <= 1'b0; if((data_avail_memA == 1'b1)|(dirty == 1'b1)) begin if(update_1 == 1'b0) //Data needs to be written in page 1 begin if(dirty_1 == 1'b1) begin cac_mem_wrt_A <= 1'b1; cac_memAdd_A <= add_mem; cac_mem_data_A <= cpu_cac_data_A; buffer_1[cpu_index] <= buffer_1[cpu_index] & 11'b10111111111; dirty <= 1'b1; state <= S2; end
else if(dirty_1 == 1'b0) begin buffer_1[cpu_index] <= {1'b1,1'b0,1'b0,1'b1,cpu_tag,mem_cac_data_A}; buffer_2[cpu_index] <= buffer_2[cpu_index] & 11'b11101111111; if((read_A == 1'b1) & (write_A == 1'b0)) begin state <= S0; end else if((read_A == 1'b0) & (write_A == 1'b1)) begin buffer_1[cpu_index] <= {1'b1,1'b1,1'b0,1'b1,cpu_tag,cpu_cac_data_A}; cac_mem_wrt_A <= 1'b1; cac_memAdd_A <= add_mem; cac_mem_data_A <= cpu_cac_data_A; invalidate_A <= 1'b1; snoop_add_A <= add_mem; state <= S0; end end end else if(update_2 == 1'b0) begin if(dirty_2 == 1'b1) begin cac_mem_wrt_A <= 1'b1; cac_memAdd_A <= add_mem; cac_mem_data_A <= cpu_cac_data_A; buffer_2[cpu_index] <= buffer_2[cpu_index] & 11'b10111111111; dirty <= 1'b1; state <= S2; end else if(dirty_2 == 1'b0) begin buffer_2[cpu_index] <= {1'b1,1'b0,1'b0,1'b1,cpu_tag,mem_cac_data_A}; buffer_1[cpu_index] <= buffer_1[cpu_index] & 11'b11101111111; if((read_A == 1'b1) & (write_A == 1'b0)) begin state <= S0; end
if((read_A == 1'b0) & (write_A == 1'b1))
begin buffer_2[cpu_index] <= {1'b1,1'b1,1'b0,1'b1,cpu_tag,cpu_cac_data_A}; cac_mem_wrt_A <= 1'b1; cac_memAdd_A <= add_mem; cac_mem_data_A <= cpu_cac_data_A; invalidate_A <= 1'b1; snoop_add_A <= add_mem; state <= S0; end end end end //if snooping else if(snoop_B == 1'b1) begin if((snoop_tag == snoop_buffer_1[6:5]) & (snoop_buffer_1[10] == 1'b1)) begin if(snoop_buffer_1[9] == 1'b1) begin priority_bus_bus_req_A <= 1'b1; snoop_page_1 <= 1'b1; back_t_s <= S2; state <= S3; end
else begin state <= S2; end end else if((snoop_tag == snoop_buffer_2[6:5]) & (snoop_buffer_2[10] == 1'b1)) begin if(snoop_buffer_2[9] == 1'b1) begin priority_bus_bus_req_A <= 1'b1; snoop_page_2 <= 1'b1; back_t_s <= S2; state <= S3; end
else begin state <= S2; end end else begin state <= S2; end end //if invalidation else if(invalidate_B == 1'b1) begin if((snoop_tag == snoop_buffer_1[6:5])& (snoop_buffer_1[10] == 1'b1)) begin
buffer_1[snoop_index] <= buffer_1[snoop_index] & 11'b00101111111; buffer_2[snoop_index] <= buffer_2[snoop_index] | 11'b00010000000; state <= S2; end else if((snoop_tag == snoop_buffer_2[6:5]) & (snoop_buffer_2[10] == 1'b1)) begin buffer_2[snoop_index] <= buffer_2[snoop_index] & 11'b00101111111; buffer_1[snoop_index] <= buffer_1[snoop_index] | 11'b00010000000; state <= S2; end else begin state <= S2; end end end // S2 // State S3 S3: begin if(bus_A == 1'b1) begin priority_bus_bus_req_A <= 1'b0; if((snoop_page_1 == 1'b1)&(snoop_page_2 == 1'b0)) begin priority_wrt_A <= 1'b1; cac_memAdd_A <= snoop_add_B; cac_mem_data_A <= snoop_buffer_1[4:0];
buffer_1[snoop_index] <= buffer_1[snoop_index] & 11'b10111111111; state <= S5;
end
else if((snoop_page_1 == 1'b0)&(snoop_page_2 == 1'b1)) begin priority_wrt_A <= 1'b1; cac_memAdd_A <= snoop_add_B; cac_mem_data_A <= snoop_buffer_2[4:0];
buffer_2[snoop_index] <= buffer_2[snoop_index] & 11'b10111111111; state <= S5;
end end else begin state <= S3; end end//S3 //State S4 S4: begin if(bus_A == 1'b1) begin bus_bus_bus_req_A <= 1'b0; cac_cpu_hit_A <= 1'b1; invalidate_A <= 1'b1; snoop_add_A <= add_mem; cac_mem_wrt_A <= 1'b1;
cac_memAdd_A <= add_mem; cac_mem_data_A <= cpu_cac_data_A; state <= back_t_s; end end // S4 //State S5 S5: begin state <= back_t_s; end endcase end
end endmodule //Cache_A
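The state machine above manipulates the 11-bit cache entries through hard-coded bit masks. The sketch below is my reading of those masks, not the author's code; the field and mask names are assumptions. It documents the entry layout {valid, dirty, unused, update, tag[1:0], data[4:0]} and gives each mask a symbolic name:

```verilog
// Illustrative sketch only -- field positions inferred from the masks in
// Cache_A; the names VALID, DIRTY, UPDATE, etc. are assumed.
module cache_entry_layout;
  // Bit positions within an 11-bit entry {valid, dirty, unused, update, tag, data}
  localparam VALID  = 10;  // 1 = entry holds usable data
  localparam DIRTY  = 9;   // 1 = EXCLUSIVE (modified), 0 = SHARED
  localparam UPDATE = 7;   // 1 = this page was referenced most recently
  // The masks as used in the state machine:
  localparam [10:0] SET_UPDATE = 11'b00010000000; // OR:  mark page most recent
  localparam [10:0] CLR_UPDATE = 11'b11101111111; // AND: age the other page
  localparam [10:0] INVALIDATE = 11'b00101111111; // AND: clear valid, dirty, update
  localparam [10:0] CLR_DIRTY  = 11'b10111111111; // AND: clean after write-back
  initial // example entry: valid, dirty, updated, tag 2'b01, data 5'b11011
    $display("entry = %b", {1'b1, 1'b1, 1'b0, 1'b1, 2'b01, 5'b11011});
endmodule
```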
Memory Module
`timescale 1ns/100ps module Memory (clock, rst_1, cac_data_avail_mem_Add_A, cac_mem_data_A, cac_mem_rd_A, cac_mem_wrt_A, cac_data_avail_mem_Add_B, cac_mem_data_B, cac_mem_rd_B, cac_mem_wrt_B, mem_cac_data_B, mem_cac_data_A, bus_bus_mem, priority_wrt_A, priority_wrt_B, memA, memB, bus_mem ); //Inputs input clock, rst_1; input cac_mem_rd_A,cac_mem_wrt_A,cac_mem_rd_B,cac_mem_wrt_B; input [4:0] cac_data_avail_mem_Add_A; input [4:0] cac_data_avail_mem_Add_B; input [4:0] cac_mem_data_A; input [4:0] cac_mem_data_B; input priority_wrt_A; input priority_wrt_B; input bus_mem; //Outputs output [4:0] mem_cac_data_A; output [4:0] mem_cac_data_B; output bus_bus_mem; output memA; output memB; //Registers reg [0:9] memArray [31:0]; reg [4:0] mem_cac_data_A; reg [4:0] mem_cac_data_B; reg [4:0] mem_cache_data; reg memA; reg memB; reg bus_bus_mem; reg ready_bit_A; reg ready_bit_B; reg nextA; reg nextB; reg [2:0] state; reg [4:0] add;
reg [4:0] next_add; //Internals parameter S0 = 0; parameter S1 = 1; parameter S2 = 2; parameter S3 = 3; parameter S4 = 4; //Memory always @(posedge clock or negedge rst_1) begin
if (~rst_1) begin memArray[31] = 10'b1111100000; memArray[30] = 10'b1111000001; memArray[29] = 10'b1110100010; memArray[28] = 10'b1110000011; memArray[27] = 10'b1101100100; memArray[26] = 10'b1101000101; memArray[25] = 10'b1100100110; memArray[24] = 10'b1100000111; memArray[23] = 10'b1011101000; memArray[22] = 10'b1011001001; memArray[21] = 10'b1010101010; memArray[20] = 10'b1010001011; memArray[19] = 10'b1001101100; memArray[18] = 10'b1001001101; memArray[17] = 10'b1000101110; memArray[16] = 10'b1000001111; memArray[15] = 10'b0111110000; memArray[14] = 10'b0111010001; memArray[13] = 10'b0110110010; memArray[12] = 10'b0110010011; memArray[11] = 10'b0101110100; memArray[10] = 10'b0101010101; memArray[9] = 10'b0100110110; memArray[8] = 10'b0100010111; memArray[7] = 10'b0011111000; memArray[6] = 10'b0011011001; memArray[5] = 10'b0010111010; memArray[4] = 10'b0010011011; memArray[3] = 10'b0001111100; memArray[2] = 10'b0001011101; memArray[1] = 10'b0000111110; memArray[0] = 10'b0000011111; mem_cac_data_A <= 5'b0; mem_cac_data_B <= 5'b0; memA <= 1'b0; memB <= 1'b0; nextA <= 1'b0; nextB <= 1'b0;
add <= 5'b0; state <= S0; bus_bus_mem <= 1'b0; end else
begin state <= S0; end case(state) S0: begin memA <= 1'b0; memB <= 1'b0;
ready_bit_A <= 1'b0; ready_bit_B <= 1'b0; bus_bus_mem <= 1'b0;
add <= 5'b0; // IF WRITE if (priority_wrt_A == 1'b1) begin memArray[cac_data_avail_mem_Add_A] <= {cac_data_avail_mem_Add_A, cac_mem_data_A}; state <= S0; end else if (priority_wrt_B == 1'b1) begin memArray[cac_data_avail_mem_Add_B] <= {cac_data_avail_mem_Add_B, cac_mem_data_B}; state <= S0; end else if ((cac_mem_wrt_A == 1'b1) & (cac_mem_rd_A == 1'b0)) begin memArray[cac_data_avail_mem_Add_A] <= {cac_data_avail_mem_Add_A, cac_mem_data_A}; state <= S0; end else if ((cac_mem_wrt_B == 1'b1) & (cac_mem_rd_B == 1'b0)) begin memArray[cac_data_avail_mem_Add_B] <= {cac_data_avail_mem_Add_B, cac_mem_data_B}; state <= S0; end // FINISH WRITE else if ((cac_mem_wrt_A == 1'b0) & ((cac_mem_rd_A == 1'b1)|(nextA == 1'b1))) begin if ((cac_mem_rd_A == 1'b1)&(nextA == 1'b0)) begin add <= cac_data_avail_mem_Add_A; ready_bit_A <= cac_mem_rd_A; state <= S1; end else if ((cac_mem_rd_A == 1'b0)&(nextA == 1'b1)) begin add <= next_add; ready_bit_A <= 1'b1; nextA <= 1'b0; state <= S1; end end else if ((cac_mem_wrt_B == 1'b0) & ((cac_mem_rd_B == 1'b1)|(nextB == 1'b1)))
begin if ((cac_mem_rd_B == 1'b1)&(nextB == 1'b0)) begin add <= cac_data_avail_mem_Add_B; ready_bit_B <= cac_mem_rd_B; state <= S1; end else if ((cac_mem_rd_B == 1'b0)&(nextB == 1'b1)) begin add <= next_add; ready_bit_B <= 1'b1; nextB <= 1'b0; state <= S1; end end else begin state <= S0; end end//S0 S1: begin bus_bus_mem <= 1'b1; if(priority_wrt_A == 1'b1) begin memArray[cac_data_avail_mem_Add_A] <= {cac_data_avail_mem_Add_A, cac_mem_data_A}; state <= S2; end else if (priority_wrt_B == 1'b1) begin memArray[cac_data_avail_mem_Add_B] <= {cac_data_avail_mem_Add_B, cac_mem_data_B}; state <= S2; end else if(cac_mem_wrt_A == 1'b1) begin memArray[cac_data_avail_mem_Add_A] <= {cac_data_avail_mem_Add_A, cac_mem_data_A}; state <= S2; end else if (cac_mem_wrt_B == 1'b1) begin memArray[cac_data_avail_mem_Add_B] <= {cac_data_avail_mem_Add_B, cac_mem_data_B}; state <= S2; end else begin state <= S2; end end//S1 S2: begin if(priority_wrt_A == 1'b1)
begin memArray[cac_data_avail_mem_Add_A] <= {cac_data_avail_mem_Add_A, cac_mem_data_A}; state <= S2; end else if (priority_wrt_B == 1'b1) begin memArray[cac_data_avail_mem_Add_B] <= {cac_data_avail_mem_Add_B, cac_mem_data_B}; state <= S2; end else if (cac_mem_rd_A == 1'b1) begin nextA <= 1'b1; next_add <= cac_data_avail_mem_Add_A; state <= S2; end else if (cac_mem_rd_B == 1'b1) begin nextB <= 1'b1; next_add <= cac_data_avail_mem_Add_B; state <= S2; end else if (cac_mem_wrt_A == 1'b1) begin memArray[cac_data_avail_mem_Add_A] <= {cac_data_avail_mem_Add_A, cac_mem_data_A}; state <= S2; end else if (cac_mem_wrt_B == 1'b1) begin memArray[cac_data_avail_mem_Add_B] <= {cac_data_avail_mem_Add_B, cac_mem_data_B}; state <= S2; end else if (bus_mem == 1'b1) begin bus_bus_mem <= 1'b0; if ((ready_bit_A == 1'b1)&(ready_bit_B == 1'b0)) begin mem_cac_data_A <= memArray[add]; memA <= 1'b1; state <= S0; end if ((ready_bit_A == 1'b0)&(ready_bit_B == 1'b1)) begin mem_cac_data_B <= memArray[add]; memB <= 1'b1; state <= S0; end end else begin state <= S2; end end//S2 endcase
end endmodule
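The 32 reset values loaded into memArray follow a regular pattern: each entry holds the 5-bit physical address concatenated with its bitwise complement. Under that reading, the table could equivalently be written as a loop. This is a sketch, not the author's code:

```verilog
// Equivalent initialization sketch: memArray[i] = {i, ~i} for i = 0..31.
module memory_init_sketch (input clock, input rst_1);
  reg [9:0] memArray [31:0];
  integer i;
  always @(posedge clock or negedge rst_1)
    if (~rst_1)
      for (i = 0; i < 32; i = i + 1)
        memArray[i] <= {i[4:0], ~i[4:0]};
endmodule
```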
Memory Bus Controller
`timescale 1ns/100ps module MemoryBusController( clock, rst_1, bus_bus_req_A, bus_bus_req_B, bus_bus_mem, bus_mem, bus_A, bus_B, priority_bus_req_A, priority_bus_req_B ); // Inputs input clock; input rst_1; input bus_bus_req_A; input bus_bus_req_B; input priority_bus_req_A; input priority_bus_req_B; input bus_bus_mem; //Outputs output bus_A; output bus_B; output bus_mem; //Registers reg bus_A; reg bus_B; reg bus_mem; reg [2:0] Bus_state; //Internals wire clock; wire rst_1; wire bus_bus_req_A; wire bus_bus_req_B; wire priority_bus_req_A; wire priority_bus_req_B; wire bus_bus_mem; //Parameters parameter S0 = 0; //Initial parameter S1 = 1; //Granting the bus to the Cache A parameter S2 = 2; //Granting the bus to the Cache B parameter S3 = 3; //Granting the bus to the memory parameter S4 = 4; //Wait
// Module always @(negedge clock or negedge rst_1) begin if(rst_1 == 0) begin //Initials Bus_state <= S0;
bus_A <= 1'b0; bus_B <= 1'b0; bus_mem <= 1'b0; end else case(Bus_state) S0: //Initial begin bus_A <= 1'b0; bus_B <= 1'b0; bus_mem <= 1'b0; if (priority_bus_req_A == 1'b1) //Priority Cache A begin Bus_state <= S1; end else if (priority_bus_req_B == 1'b1) //Priority Cache B begin Bus_state <= S2; end else if (bus_bus_req_A == 1'b0 & bus_bus_req_B == 1'b0 & bus_bus_mem == 1'b1) //Memory begin Bus_state <= S3; end else if (bus_bus_req_A == 1'b1 & bus_bus_req_B == 1'b0 & bus_bus_mem == 1'b1) begin Bus_state <= S3; end else if (bus_bus_req_A == 1'b0 & bus_bus_req_B == 1'b1 & bus_bus_mem == 1'b1) begin Bus_state <= S3; end else if (bus_bus_req_A == 1'b1 & bus_bus_req_B == 1'b1 & bus_bus_mem == 1'b1) begin Bus_state <= S3; end else if (bus_bus_req_A == 1'b1 & bus_bus_req_B == 1'b0 & bus_bus_mem == 1'b0) begin Bus_state <= S1; end else if (bus_bus_req_A == 1'b0 & bus_bus_req_B == 1'b1 & bus_bus_mem == 1'b0) begin Bus_state <= S2; end
else if (bus_bus_req_A == 1'b1 & bus_bus_req_B == 1'b1 & bus_bus_mem == 1'b0) begin Bus_state <= S1; end else begin Bus_state <= S0; end end S1: //For Cache A begin bus_A <= 1'b1; Bus_state <= S0; end S2: //For Cache B begin bus_B <= 1'b1; Bus_state <= S0; end S3: //For memory begin bus_mem <= 1'b1; Bus_state <= S0; end S4: begin bus_A <= 1'b0; bus_B <= 1'b0; bus_mem <= 1'b0;
Bus_state <= S0; end endcase end endmodule
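The long if/else chain in state S0 encodes a fixed arbitration order: priority request from Cache A first, then priority request from Cache B, then the memory, then ordinary requests from Cache A and Cache B. Condensed into a sketch (illustrative only, not the author's code; the request-input name mem_bus_req is assumed):

```verilog
// Condensed view of the S0 arbitration decision above.
module arbitration_sketch (input priority_bus_req_A, priority_bus_req_B,
                           input bus_bus_req_A, bus_bus_req_B, mem_bus_req,
                           output reg [2:0] next_state);
  localparam S0 = 0, S1 = 1, S2 = 2, S3 = 3; // same encodings as the controller
  always @(*) begin
    if      (priority_bus_req_A) next_state = S1; // Cache A, priority write-back
    else if (priority_bus_req_B) next_state = S2; // Cache B, priority write-back
    else if (mem_bus_req)        next_state = S3; // memory beats ordinary requests
    else if (bus_bus_req_A)      next_state = S1; // Cache A
    else if (bus_bus_req_B)      next_state = S2; // Cache B
    else                         next_state = S0; // idle
  end
endmodule
```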
Memory Mapping Unit
`timescale 1ns / 100ps module MMU ( clock, rst_1, cpu_cac_add_A, cpu_cac_add_B, cpu_cac_rd_A, cpu_cac_rd_B, cpu_cac_wrt_A, cpu_cac_wrt_B, cpu_cac_data_A, cpu_cac_data_B, cpu_mmu_add_A, cpu_mmu_add_B, cpu_mmu_rd_A, cpu_mmu_rd_B, cpu_mmu_wrt_A, cpu_mmu_wrt_B,
cpu_mmu_data_A, cpu_mmu_data_B ); //Inputs input clock, rst_1; input cpu_mmu_rd_A, cpu_mmu_rd_B; input cpu_mmu_wrt_A, cpu_mmu_wrt_B; input [4:0] cpu_mmu_data_A; input [4:0] cpu_mmu_data_B; input [6:0] cpu_mmu_add_A; input [6:0] cpu_mmu_add_B; //Outputs output cpu_cac_rd_A, cpu_cac_rd_B; output cpu_cac_wrt_A, cpu_cac_wrt_B; output [4:0] cpu_cac_data_A; output [4:0] cpu_cac_data_B; output [4:0] cpu_cac_add_A; output [4:0] cpu_cac_add_B; //Registers reg cpu_cac_rd_A, cpu_cac_rd_B; reg cpu_cac_wrt_A, cpu_cac_wrt_B; reg [4:0] cpu_cac_data_A; reg [4:0] cpu_cac_data_B; reg [4:0] cpu_cac_add_A; reg [4:0] cpu_cac_add_B; //Internals wire clock, rst_1, cpu_mmu_rd_A, cpu_mmu_rd_B; wire cpu_mmu_wrt_A, cpu_mmu_wrt_B; //Begin// always @(posedge clock or negedge rst_1) begin if (~rst_1) begin //initials cpu_cac_add_A <= 5'b0; cpu_cac_add_B <= 5'b0; cpu_cac_data_A <= 5'b0; cpu_cac_data_B <= 5'b0; cpu_cac_rd_A <= 1'b0; cpu_cac_rd_B <= 1'b0; cpu_cac_wrt_A <= 1'b0; cpu_cac_wrt_B <= 1'b0; end else begin cpu_cac_wrt_A <= cpu_mmu_wrt_A; cpu_cac_wrt_B <= cpu_mmu_wrt_B; cpu_cac_rd_A <= cpu_mmu_rd_A; cpu_cac_rd_B <= cpu_mmu_rd_B; cpu_cac_data_A <= cpu_mmu_data_A;
cpu_cac_data_B <= cpu_mmu_data_B; cpu_cac_add_A <= cpu_mmu_add_A[4:0]; cpu_cac_add_B <= cpu_mmu_add_B[4:0]; end end endmodule
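As described in the overview, the MMU translates a 7-bit virtual address into a 5-bit physical address by dropping the two most significant bits. A small illustrative example, using an address that the test benches actually issue:

```verilog
// Translation example: virtual 7'b0011101 -> physical 5'b11101.
module mmu_translation_example;
  reg [6:0] virtual_add;
  reg [4:0] physical_add;
  initial begin
    virtual_add  = 7'b0011101;       // address issued by Test Bench 1/2
    physical_add = virtual_add[4:0]; // drop the two most significant bits
    $display("physical = %b", physical_add); // prints 11101
  end
endmodule
```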
Test Benches:
Test Bench 1/2
`timescale 1ns / 100ps module Test_Bench_1_2 (); //Inputs reg clock, rst_l, cpu_mmu_rd_A, cpu_mmu_wrt_A; reg cpu_mmu_rd_B, cpu_mmu_wrt_B; reg [4:0] cpu_mmu_data_A; reg [4:0] cpu_mmu_data_B; reg [6:0] cpu_mmu_add_A; reg [6:0] cpu_mmu_add_B; // Cache - Cpu wire cac_cpu_hit_A; wire cac_cpu_miss_A; wire [4:0] cac_cpu_data_A; wire cac_cpu_hit_B; wire cac_cpu_miss_B; wire [4:0] cac_cpu_data_B; // Cache - Memory Bus Controller wire bus_req_A, priority_req_A; wire bus_req_B, priority_req_B, req; wire mem, enable_mem, bus_A; // Cache - Memory wire cac_mem_rd_A, cac_mem_wrt_A, data_avail_mem_A, priority_wrt_A; wire cac_mem_rd_B, cac_mem_wrt_B, priority_wrt_B, data_avail_mem_B; wire [4:0] cac_mem_add_A, cac_mem_data_A, mem_cac_data_A; wire [4:0] cac_mem_add_B, cac_mem_data_B, mem_cac_data_B; // MMU - Cache wire cpu_cac_rd_A, cpu_cac_wrt_A; wire cpu_cac_rd_B, cpu_cac_wrt_B; wire [4:0] cpu_cac_add_A, cpu_cac_data_A; wire [4:0] cpu_cac_add_B, cpu_cac_data_B; //Cache A - B wire snoop_A, snoop_B; wire invalidate_A, invalidate_B; wire [4:0] snoop_add_A; wire [4:0] snoop_add_B; // Instantiate Memory Memory memory_inst (clock, rst_l, cac_mem_add_A, cac_mem_data_A, cac_mem_rd_A, cac_mem_wrt_A, mem_cac_data_A, cac_mem_add_B, cac_mem_data_B, cac_mem_rd_B, cac_mem_wrt_B, mem_cac_data_B, bus_req_mem, data_avail_mem_A, data_avail_mem_B, priority_wrt_A, priority_wrt_B, bus_mem);
//Instantiate MMU MMU mmu_inst (clock, rst_l, cpu_mmu_add_A, cpu_mmu_add_B, cpu_mmu_rd_A, cpu_mmu_rd_B, cpu_mmu_wrt_A, cpu_mmu_wrt_B, cpu_mmu_data_A, cpu_mmu_data_B, cpu_cac_add_A, cpu_cac_add_B, cpu_cac_rd_A, cpu_cac_rd_B, cpu_cac_wrt_A, cpu_cac_wrt_B, cpu_cac_data_A, cpu_cac_data_B); //Instantiate CacheA CacheA cacheA_inst (clock, rst_l, cpu_cac_add_A, cpu_cac_rd_A, cpu_cac_wrt_A, cpu_cac_data_A, mem_cac_data_A, data_avail_mem_A, cac_cpu_hit_A, cac_cpu_miss_A, cac_cpu_data_A, cac_mem_add_A, cac_mem_data_A, cac_mem_rd_A, cac_mem_wrt_A, priority_wrt_A, bus_req_A, bus_A, priority_req_A, snoop_B, snoop_add_B, snoop_A, snoop_add_A, invalidate_A, invalidate_B); //Instantiate CacheB CacheB cacheB_inst (clock, rst_l, cpu_cac_add_B, cpu_cac_rd_B, cpu_cac_wrt_B, cpu_cac_data_B, mem_cac_data_B, data_avail_mem_B, cac_cpu_hit_B, cac_cpu_miss_B, cac_cpu_data_B, cac_mem_add_B, cac_mem_data_B, cac_mem_rd_B, cac_mem_wrt_B, priority_wrt_B, bus_req_B, bus_B, priority_req_B, snoop_A, snoop_add_A, snoop_B, snoop_add_B, invalidate_A, invalidate_B); //Instantiate MemoryBusController MemoryBusController bus_controller_inst (clock, rst_l, bus_req_A, bus_req_B, req_mem, priority_req_A, priority_req_B, enable_mem, bus_A, bus_B);
always begin #5 clock <= ~clock; end // Start //initials initial begin clock <= 1'b0; rst_l <= 1'b1; cpu_mmu_rd_A <= 1'b0; cpu_mmu_rd_B <= 1'b0; cpu_mmu_wrt_A <= 1'b0; cpu_mmu_wrt_B <= 1'b0; cpu_mmu_add_A <= 7'b0; cpu_mmu_add_B <= 7'b0; cpu_mmu_data_A <= 5'b0; cpu_mmu_data_B <= 5'b0; # 10 rst_l <= 1'b0; # 10 rst_l <= 1'b1; # 10 cpu_mmu_rd_A <= 1'b1; cpu_mmu_add_A <= 7'b0011101; cpu_mmu_rd_B <= 1'b1; cpu_mmu_add_B <= 7'b1110011; # 10 cpu_mmu_rd_A <= 1'b0; cpu_mmu_rd_B <= 1'b0; # 100 # 10 cpu_mmu_rd_A <= 1'b1; cpu_mmu_add_A <= 7'b1000101;
cpu_mmu_rd_B <= 1'b1; cpu_mmu_add_B <= 7'b0101010; # 10 cpu_mmu_rd_A <= 1'b0; cpu_mmu_rd_B <= 1'b0; # 100 #10 cpu_mmu_rd_A <= 1'b1; cpu_mmu_add_A <= 7'b0011101; cpu_mmu_rd_B <= 1'b1; cpu_mmu_add_B <= 7'b1110011; # 10 cpu_mmu_rd_A <= 1'b0; cpu_mmu_rd_B <= 1'b0; end endmodule
Test Bench 3
# 10 cpu_mmu_wrt_A <= 1'b1; cpu_mmu_add_A <= 7'b0011101; cpu_mmu_data_A <= 5'b11011; cpu_mmu_wrt_B <= 1'b1; cpu_mmu_add_B <= 7'b1110011; cpu_mmu_data_B <= 5'b10001; # 10 cpu_mmu_wrt_A <= 1'b0; cpu_mmu_wrt_B <= 1'b0; # 150 #10 cpu_mmu_wrt_A <= 1'b1; cpu_mmu_add_A <= 7'b0011101; cpu_mmu_data_A <= 5'b00100; cpu_mmu_wrt_B <= 1'b1; cpu_mmu_add_B <= 7'b1110011; cpu_mmu_data_B <= 5'b01110; # 10 cpu_mmu_wrt_A <= 1'b0; cpu_mmu_wrt_B <= 1'b0; end endmodule
Test Bench 4
# 10 cpu_mmu_wrt_A <= 1'b1; cpu_mmu_add_A <= 7'b0011101; cpu_mmu_data_A <= 5'b11011; # 10 cpu_mmu_wrt_A <= 1'b0; # 100 # 10 cpu_mmu_wrt_A <= 1'b1; cpu_mmu_add_A <= 7'b0011101; cpu_mmu_data_A <= 5'b00100; # 10 cpu_mmu_wrt_A <= 1'b0; # 50 #10 cpu_mmu_rd_B <= 1'b1; cpu_mmu_add_B <= 7'b0011101; # 10 cpu_mmu_rd_B <= 1'b0; #200; end endmodule
References
1) John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach.
2) David A. Patterson and John L. Hennessy, Computer Organization and Design.
3) Michael D. Ciletti, Advanced Digital Design with the Verilog HDL.