Darmstadt University of Technology, Computer Architecture Group
Minimising the Hardware Resources for a Cellular Automaton with Moving Creatures
FB Informatik, FG Rechnerarchitektur
Mathias Halbach, Rolf Hoffmann
Content
– Introduction
– Description of the problem
– Methodology
– Implementation Variants
– Conclusion
Creatures and the Environment
Challenge: Find the best behaviour of moving creatures automatically.
Question in detail: n creatures are moving around in an unknown environment in order to visit all cells in the shortest time without map generation.
Modelling
based on the cellular automata model by John von Neumann
– massively parallel
– perfectly suited to be supported by hardware
fixed obstacles, empty cells, moving creatures
each creature (max. one per cell)
– can observe and move only in one direction (front cell),
– can only move to empty cells,
– must turn left or right,
– has a state s (= creature's memory) and an algorithm (= rule).
Figure: front cell free: move and turn; front cell blocked: turn only.
Problem Description
environment
– obstacle h
location
– creature c or c_i
– position p_i
– position of front cell
creature's
– state s_i
– direction r_i
signals
– turn d_i
– moving condition m_i (= front cell can be visited)
next position
next state and direction
Methodology of Evaluation
∀ algorithms:
– generate state machine (behaviour)
– discard similar state machines
– ∀ environments: simulate and evaluate
– discard bad behaviour
– store performance of state machine for later statistics
identify best algorithms
detailed statistics (state machines, environments)
Figure labels: FPGA Hardware, Software.
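The evaluation loop above can be sketched in a few lines. This is only an illustration under stated assumptions: `rank_state_machines`, `simulate`, and `min_score` are hypothetical names, "similar" is reduced to exact duplicates, and "bad behaviour" is taken to mean a score below a threshold in any environment. The slides do not give the actual criteria.

```python
def rank_state_machines(machines, environments, simulate, min_score):
    """Rank candidate state machines by their total score over all environments.

    machines     -- iterable of hashable state machine descriptions
    environments -- list of environments to simulate in
    simulate     -- hypothetical callable (machine, environment) -> score,
                    e.g. the number of visited cells
    min_score    -- assumed threshold below which behaviour counts as bad
    """
    seen = set()
    results = {}
    for machine in machines:
        # Stand-in for "discard similar state machines": drop exact duplicates.
        if machine in seen:
            continue
        seen.add(machine)
        # For all environments: simulate and evaluate.
        scores = [simulate(machine, env) for env in environments]
        # Discard bad behaviour.
        if min(scores) < min_score:
            continue
        # Store performance for later statistics.
        results[machine] = sum(scores)
    # Identify best algorithms: highest total score first.
    return sorted(results.items(), key=lambda item: item[1], reverse=True)
```

In the talk, the simulation step is the part supported by FPGA hardware; here it is just a function argument.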
Amount of State Machines
inspected different n-state algorithms
– n = 5: (2 × 5)^(2 × 5) = 10^10 = 10,000,000,000
– n = 6: (2 × 6)^(2 × 6) = 12^12 = 8,916,100,448,256
– n = 7: (2 × 7)^(2 × 7) = 14^14 = 11,112,006,825,558,016
very time consuming
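The counts can be checked with a short calculation. One reading that is consistent with the formula (an interpretation, not stated on the slide): a state table has 2n entries (n states times two values of the moving condition), and each entry can take 2n values (n next states times two turn directions). The function name is hypothetical.

```python
def count_state_machines(n):
    """Number of n-state tables with 2n entries and 2n choices per entry."""
    return (2 * n) ** (2 * n)

for n in (5, 6, 7):
    print(n, f"{count_state_machines(n):,}")
```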
One of the Best 6-State Algorithms
s  m   next s  action
0  0   1       L
0  1   3       Lm
1  0   2       L
1  1   1       Rm
2  0   0       L
2  1   5       Lm
3  0   4       R
3  1   0       Rm
4  0   5       R
4  1   4       Lm
5  0   3       R
5  1   2       Rm
Figure: state graph of the algorithm (states 0 to 5, edges labelled L, R, Lm, Rm).
s: control state
r: direction
v: new direction
m: creature can move
L/R: turn left/right if blocked (m = 0)
Lm/Rm: turn left/right and move if free (m = 1)
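To make the table concrete, here is a minimal single-creature simulation. The transition table is transcribed from this slide, reading each input column as (s, m); in the table the move actions Lm/Rm appear exactly in the m = 1 columns. Everything else is an assumption for illustration: four directions with 90 degree turns, moving into the front cell before the new direction applies, the field border acting as an obstacle, and the names `TABLE`, `DIRS`, `step`, and `visited_cells`.

```python
# Transition table from the slide: (state s, movable m) -> (next state, action).
TABLE = {
    (0, 0): (1, "L"), (0, 1): (3, "Lm"),
    (1, 0): (2, "L"), (1, 1): (1, "Rm"),
    (2, 0): (0, "L"), (2, 1): (5, "Lm"),
    (3, 0): (4, "R"), (3, 1): (0, "Rm"),
    (4, 0): (5, "R"), (4, 1): (4, "Lm"),
    (5, 0): (3, "R"), (5, 1): (2, "Rm"),
}

# Assumption: directions 0..3 are north, east, south, west as (row, col) offsets.
DIRS = [(-1, 0), (0, 1), (1, 0), (0, -1)]

def step(grid, pos, r, s):
    """Advance one creature by one generation; returns (pos, r, s)."""
    fy, fx = pos[0] + DIRS[r][0], pos[1] + DIRS[r][1]
    inside = 0 <= fy < len(grid) and 0 <= fx < len(grid[0])
    # m = 1 only if the front cell exists and is empty (".").
    m = 1 if inside and grid[fy][fx] == "." else 0
    s_next, action = TABLE[(s, m)]
    if action.endswith("m"):
        pos = (fy, fx)  # move into the front cell
    # Turn left (counter-clockwise) or right (clockwise).
    r_next = (r - 1) % 4 if action[0] == "L" else (r + 1) % 4
    return pos, r_next, s_next

def visited_cells(grid, pos, r=0, s=0, generations=100):
    """Set of cells the creature has occupied after the given generations."""
    visited = {pos}
    for _ in range(generations):
        pos, r, s = step(grid, pos, r, s)
        visited.add(pos)
    return visited
```

A call such as `visited_cells(["...", ".#.", "..."], (0, 0))` runs the creature on a 3 × 3 field with one obstacle (`#`) in the centre.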
Performance of Different Algorithms
top 10 results for one environment, 6 states
What about multiple creatures?
Figure axes: generations, crossed cells.
Problem: Collision
multiple creatures want to move to the same empty cell
one solution
– no priority
– forbidden for all
our implementation
– front cell collects requests
– evaluates (Σ > 1?)
– sends collision warning (occupied) or grant signal
– can be done within the current clock cycle
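The request counting described above can be sketched in software as follows. It is only an analogue of the hardware signals: the function name `arbitrate` and the dictionary-based interface are hypothetical, and the input is assumed to contain only creatures whose front cell is empty.

```python
from collections import Counter

def arbitrate(front_cells):
    """Decide which creatures may move.

    front_cells -- dict mapping creature index -> position of its front cell
    Returns a dict creature index -> True (grant) or False (occupied).
    """
    # Each front cell collects the requests addressed to it.
    requests = Counter(front_cells.values())
    # A request is granted only if the sum of requests for that cell is 1;
    # with more than one request, all requesters get the collision warning.
    return {cid: requests[cell] == 1 for cid, cell in front_cells.items()}

print(arbitrate({0: (1, 1), 1: (1, 1), 2: (0, 3)}))
```

Creatures 0 and 1 request the same cell, so neither may move ("no priority, forbidden for all"), while creature 2 is granted its move.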
Other Kind of Implementation
pipeline structure
– phase 1: forecast for collision based on position
– phase 2: calculate next state, position, and direction
– forwarding needed!
– forwarding results in single-cycle implementation
– i.e. no pipelining possible
Implementation Variant: Uniform
union of environment cell and creature cell
massively parallel algorithm
– if creature is blocked: state and direction are set to new values;
– if creature can move: empty cell becomes creature with new direction and state, meanwhile creature becomes empty cell;
– all other empty cells and obstacles remain unchanged.
collision detection
– for empty cell which is front cell of multiple creatures: send occupied signal and remain empty;
– for creature: if front cell sends occupied, do blocked action.
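The uniform rule can be sketched as one synchronous update in which every cell computes its next value from its own content and its neighbours. This is a software illustration under assumptions, not the hardware design: the cell encoding (`"#"`, `"."`, or a tuple `(r, s)`), the four 90 degree directions, the border acting as an obstacle, and the names `uniform_step`, `turn`, and `rule` are all hypothetical. The `rule(s, m)` callable is assumed to return a move action (`"Lm"`/`"Rm"`) for m = 1 and a turn-only action (`"L"`/`"R"`) for m = 0.

```python
# Assumption: directions 0..3 are north, east, south, west as (row, col) offsets.
DIRS = [(-1, 0), (0, 1), (1, 0), (0, -1)]

def turn(r, action):
    """New direction after a left (L...) or right (R...) turn."""
    return (r - 1) % 4 if action[0] == "L" else (r + 1) % 4

def uniform_step(grid, rule):
    """One generation. Cells are "#" (obstacle), "." (empty) or (r, s) (creature)."""
    rows, cols = len(grid), len(grid[0])

    def cell(y, x):
        # Positions outside the field behave like obstacles.
        return grid[y][x] if 0 <= y < rows and 0 <= x < cols else "#"

    def facing(y, x):
        """Creatures whose front cell is (y, x)."""
        found = []
        for r, (dy, dx) in enumerate(DIRS):
            c = cell(y - dy, x - dx)
            if isinstance(c, tuple) and c[0] == r:
                found.append(c)
        return found

    new = [list(row) for row in grid]
    for y in range(rows):
        for x in range(cols):
            c = grid[y][x]
            if c == ".":
                requesters = facing(y, x)
                if len(requesters) == 1:
                    # Empty cell becomes the creature with new direction and state.
                    r, s = requesters[0]
                    s_next, action = rule(s, 1)
                    new[y][x] = (turn(r, action), s_next)
                # With several requesters the cell stays empty (occupied signal).
            elif isinstance(c, tuple):
                r, s = c
                fy, fx = y + DIRS[r][0], x + DIRS[r][1]
                free = cell(fy, fx) == "." and len(facing(fy, fx)) == 1
                if free:
                    new[y][x] = "."  # creature cell becomes empty
                else:
                    # Blocked or collision: do the blocked action.
                    s_next, action = rule(s, 0)
                    new[y][x] = (turn(r, action), s_next)
    return new
```

Because each new cell value depends only on the old grid, all cells can be evaluated in parallel, which is what makes the rule suitable for direct hardware mapping.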
Implementation Variant: Separated
creatures (state machines) anchored at a fixed place
connection to area via position pointer (i.e. multiplexer)
creature stores
– position p_i
– state s_i
– direction r_i
collision detection by extra logic
Implementation Variant: Augmented A
intelligent environment (iEnv.) similar to "uniform"
creature functions separated from environment
environment stores creature index c_i and direction r
creature is state machine
– input from area: movable m_i
– output to area: turn d_i
Implementation Variant: Augmented B
intelligent creature (iCrea.) similar to "separated"
environment cells know about
– own index (position) 0, 1, 2, …
– occupied by creature i (p_i)
– direction to front cell r_i
– the calculated multiple access status of the front cell m_i
multiplexer and collision detection are shifted into the area cells
Chip Overview for 8 × 8 Field with 8 Creatures
FPGA Altera Cyclone EP1C20F324C7
– clock at 50 MHz
– change algorithm (rule) during runtime
– generate statistics
– showing sequence of calculation
serial interface
– connected to a PC
– for control and to archive data easily
Experiment Result (Fit to 50 MHz)
environment             uniform                   separated          augmented A (iEnv.)   augmented B (iCrea.)
8 × 8 (1 creature)      4,761 LE, 52 MHz          1,312 LE, 65 MHz   3,181 LE, 56 MHz      1,439 LE, 61 MHz
8 × 8 (2 creatures)     5,414 LE, 81 MHz          1,623 LE, 60 MHz   3,295 LE, 52 MHz      1,873 LE, 56 MHz
8 × 8 (8 creatures)     7,687 LE, 76 MHz          3,785 LE, 50 MHz   6,727 LE, 39 MHz      5,674 LE, 64 MHz
12 × 12 (2 creatures)   11,582 LE, 67 MHz         2,515 LE, 54 MHz   6,346 LE, 51 MHz      2,988 LE, 51 MHz
16 × 16 (2 creatures)   18,883 LE, doesn't fit    3,159 LE, 53 MHz   11,088 LE, 52 MHz     4,846 LE, 55 MHz
17 × 16 (2 creatures)   20,007 LE, doesn't fit    not possible       11,766 LE, 51 MHz     5,433 LE, 56 MHz
LE = logic elements
Analysis
Uniform: restricted to small field size
Separated: reasonable for a small number of creatures, but limited field size (256 positions)
Augmented A (iEnv.): restricted to small field size
Augmented B (iCrea.): reasonable for large fields or many creatures
Figure: 2 creatures, 8 × 8 cells
Conclusion
Optimizing a creature's state machine needs a lot of computation time.
The logic should be minimized in order to increase the degree of parallelism and speed up the optimization.
Different implementation variants have been proposed and evaluated.
It is better to separate the functionality of the cells than to integrate them together.
Thank You for Your Attention!
Mathias [email protected] Darmstadt, Computer Architecture Group
Hochschulstrasse 10
64289 Darmstadt, Germany
℡ +49 6151 16 3713
+49 6151 16 5410
http://www.ra.informatik.tu-darmstadt.de/
Field Size 4 × 4 (relatively more creatures)
Figure: logic elements (0 to 6000) versus number of creatures (1, 2, 4, 8, 12, 16) for the variants uniform, separated, augmented A, and augmented B.