Post on 20-Jan-2018
Simulation of O2 offline processing – 02/2015
Faculty of Electrical Engineering, Mechanical Engineering and Naval Architecture
Eugen Mudnić
What has been done:
• Created Omnet++ DE framework for simulation of massive data processing
– Implemented network flow model
– Implemented simulation of simple global file system
– Implemented simulation of job generation
– Implemented simulation of (primitive):
  • Processing node
  • Storage node
– Started tests/debugging of simulation framework
Omnet++ (4.6)
– A lot of C++11 code
– More manageable code than previous C++ versions
– Requires good C++ programming skills
Implemented network flow model - topology
– Bandwidth sharing links, discrete data flow changes
– Included some dynamics for smaller files (to be refined if necessary)
– Model is defined programmatically from standard Omnet++ module/channel topology description (NED language with optional visualization)
– Network simulation consumes most of simulation time
– Test case:
  • 3x300 EPN (10Gbps)
  • 1 EPN = 8 slots
  • 3 x SE (400Gbps)
  • Non-blocking switches
  • Simulation time ?
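The idea of bandwidth-sharing links with discrete flow changes can be illustrated with a minimal sketch. It assumes a single link, equal sharing among active flows, and all flows starting at t = 0. The names `Flow` and `simulateSharedLink` are hypothetical and are not taken from the framework, and the actual model may share bandwidth differently. Rates are recomputed only when a flow completes, so the number of events grows with the number of flows rather than with the amount of data.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One data flow crossing the link (hypothetical structure).
struct Flow {
    double remainingBits;  // data still to be transferred
    double finishTime;     // filled in by the simulation, in seconds
};

// Simulates flows that all start at t = 0 and share one link equally
// (an assumption of this sketch). Rates change only at discrete events,
// here the completion of a flow. Returns the time the last flow finishes.
double simulateSharedLink(std::vector<Flow>& flows, double capacityBps) {
    assert(capacityBps > 0.0);
    double now = 0.0;
    std::vector<Flow*> active;
    for (Flow& f : flows) {
        if (f.remainingBits > 0.0) active.push_back(&f);
        else f.finishTime = 0.0;
    }
    while (!active.empty()) {
        // Equal share of the link for every active flow.
        const double rate = capacityBps / static_cast<double>(active.size());
        // Next event: the flow with the least remaining data completes.
        auto next = std::min_element(
            active.begin(), active.end(),
            [](const Flow* a, const Flow* b) {
                return a->remainingBits < b->remainingBits;
            });
        const double dt = (*next)->remainingBits / rate;
        now += dt;
        // Advance every active flow by the data moved during dt.
        for (Flow* f : active) {
            f->remainingBits = std::max(0.0, f->remainingBits - rate * dt);
        }
        (*next)->remainingBits = 0.0;
        (*next)->finishTime = now;
        active.erase(next);
    }
    return now;
}
```

For example, a 10 Gbit flow and a 30 Gbit flow on a 10 Gbps link each get 5 Gbps until the smaller one completes at 2 s; the larger one then uses the full link and completes at 4 s.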
System configuration / job workload / ….
Processing node
• Groups of processing nodes (A,B,C) with common parameters
• Multiple execution slots per node
• Capabilities (could be matched with job requirements)
• Slots[0..n] <- bandwidth -> BUS <- bandwidth -> network
• Job execution (at this moment):
– Load input files (remote->local storage/memory)
– Execute (exec. time based on kHEPSpec of the machine)
– Save output (->remote storage)
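The three job phases above can be sketched as a simple time estimate. This is an illustration under stated assumptions, not the framework's code: the work of a job is assumed to be given in kHEPSpec-seconds, the transfer path is assumed to be limited by the slower of the slot-to-BUS and BUS-to-network bandwidths, and contention with other slots or flows is ignored. `NodeParams`, `Job`, `JobTimes` and `estimateJobTimes` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Parameters shared by a group of processing nodes (hypothetical names).
struct NodeParams {
    double kHepSpecPerSlot;  // computing power of one execution slot
    double slotBusBps;       // slot <-> BUS bandwidth, bits/s
    double busNetworkBps;    // BUS <-> network bandwidth, bits/s
};

// Job description; the work unit (kHEPSpec-seconds) is an assumption.
struct Job {
    double inputBits;
    double outputBits;
    double workKHepSpecSeconds;
};

struct JobTimes {
    double load;  // remote -> local storage/memory
    double exec;  // execution on one slot
    double save;  // local -> remote storage
    double total() const { return load + exec + save; }
};

// Estimates phase durations for one job on an otherwise idle node.
JobTimes estimateJobTimes(const Job& job, const NodeParams& node) {
    assert(node.kHepSpecPerSlot > 0.0);
    assert(node.slotBusBps > 0.0 && node.busNetworkBps > 0.0);
    // The slower of the two hops limits the transfer rate.
    const double pathBps = std::min(node.slotBusBps, node.busNetworkBps);
    return JobTimes{job.inputBits / pathBps,
                    job.workKHepSpecSeconds / node.kHepSpecPerSlot,
                    job.outputBits / pathBps};
}
```

In the simulation itself the load and save phases would be driven by the network flow model rather than by a fixed rate, so this estimate is a lower bound for an uncontended node.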
Storage node
• Groups of storage nodes (A,B,C) with common parameters
• One storage unit <- bandwidth -> BUS <- bandwidth -> network
– More detailed model required
• Global file system / storage node content:
• Storage state – preserved in database for successive simulations
Global file system
• What is stored where – minimal description
• Where job can find required input files
– Some files have a fixed position
– Others exist on a given SE with some probability
• File types
• Storage elements
• File instances
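A minimal sketch of such a "what is where" catalogue is given here, assuming that a fixed position can be expressed as a placement probability of 1.0 and that file instances are kept in memory (the slides mention a database for preserving state between runs). `PlacementRule`, `FileInstance` and `FileCatalogue` are hypothetical names, not the framework's types.

```cpp
#include <cassert>
#include <random>
#include <string>
#include <vector>

// Rule saying where files of a given type may be found (hypothetical).
struct PlacementRule {
    std::string fileType;
    std::string storageElement;
    double probability;  // 1.0 = fixed position, otherwise chance of presence
};

// One concrete copy of a file type on a storage element.
struct FileInstance {
    std::string fileType;
    std::string storageElement;
};

class FileCatalogue {
public:
    // Creates an instance on the rule's SE with the rule's probability.
    void place(const PlacementRule& rule, std::mt19937& rng) {
        assert(rule.probability >= 0.0 && rule.probability <= 1.0);
        std::bernoulli_distribution exists(rule.probability);
        if (exists(rng)) {
            instances_.push_back({rule.fileType, rule.storageElement});
        }
    }

    // Returns the storage elements where a job can find the file type.
    std::vector<std::string> locate(const std::string& fileType) const {
        std::vector<std::string> result;
        for (const FileInstance& inst : instances_) {
            if (inst.fileType == fileType) {
                result.push_back(inst.storageElement);
            }
        }
        return result;
    }

private:
    std::vector<FileInstance> instances_;
};
```

A job scheduler could call `locate` for each required input file and choose the storage element closest to the processing node; that selection policy is outside the scope of this sketch.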
Simulation running
• 20000 jobs -> 900 processing nodes
– Input 60000 files/output 20000 files ~4PB data
• EPN_A uses SE_A for data input
• Real time ~4h - simulation time ~4h
• Simulation time depends heavily on data transport parallelism
• At this moment not optimized
Current work - further steps
• Settled Omnet framework for massive job processing simulation
• Current work: improving performance, debugging
• Further steps: customizing to O2 data processing scenarios
– Implement O2 job workload management system
– Define O2-like network/EPN/storage topology
– Define data distribution on storage elements (what is where)
– More detailed storage and processing node model