Graph Algorithms for Irregular, Unstructured Data
John Feo
Center for Adaptive Supercomputing Software
Pacific Northwest National Laboratory
July, 2010
Analytic methods and applications
Community thought leaders
Blog Analysis
Community Activities
Facebook - 300 M users
Connect-the-dots
Bus
Hayashi
Zaire
Train
Anthrax
Money
Endo
National Security
People, Places, & Actions
Semantic Web
Anomaly detection
Security
N-x contingency analysis
SmartGrid
Data analytics
Sample queries:
Allegiance switching: identify entities that switch communities
Community structure: identify the genesis and dissipation of communities
Phase change: identify significant changes in the network structure
Traditional graph partitioning often fails:
Topology: the interaction graph is low-diameter and has no good separators
Irregularity: communities are not uniform in size
Overlap: individuals are members of one or more communities
1000x growth in 3 years!
Facebook has more than 300 million active users
Graphs are not grids
Graphs arising in informatics are very different from the grids used in scientific computing
Static or slowly evolving
Planar
Nearest neighbor communication
Work performed per cell or node
Work modifies local data
Scientific Grids
Dynamic
Non-planar
Communications are non-local and dynamic
Work performed by crawlers or autonomous agents
Work modifies data in many places
Graphs for Data Informatics
Small-world and scale-free
In low-diameter graphs
work explodes
difficult to partition
high percentage of nodes are visited
“Six degrees of separation”
Large hubs are in grey
In scale-free graphs
difficult to partition
work concentrates in a few nodes
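The "work explodes" point can be illustrated with a small sketch. It assumes a toy setup that is not from the talk: a 16-node ring stands in for a grid-like graph, and the same ring plus one hub vertex stands in for a low-diameter graph with a large hub. The helper names (`bfs_frontier_sizes`, `ring`, `ring_with_hub`) are hypothetical.

```python
def bfs_frontier_sizes(adj, source):
    """Return the number of vertices discovered at each BFS level."""
    seen = {source}
    frontier = [source]
    sizes = []
    while frontier:
        sizes.append(len(frontier))
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return sizes


def ring(n):
    """Grid-like toy graph: each vertex touches only its two neighbors."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def ring_with_hub(n):
    """Same ring plus one hub vertex (id n) connected to every vertex."""
    adj = ring(n)
    adj[n] = list(range(n))
    for i in range(n):
        adj[i].append(n)
    return adj


print(bfs_frontier_sizes(ring(16), 0))           # [1, 2, 2, 2, 2, 2, 2, 2, 1]
print(bfs_frontier_sizes(ring_with_hub(16), 0))  # [1, 3, 13]
```

On the ring the frontier stays at two vertices per step. With a single hub, the second step already touches 13 of the 17 vertices, and almost all of that work passes through the hub.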
Paths
Shortest path
Betweenness
Min/max flow
Structures
Spanning trees
Connected components
Graph isomorphism
Groups
Matching/Coloring
Partitioning
Equivalence
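As one concrete instance of the methods listed here, connected components can be sketched with a sequential union-find. This is an illustrative choice, not the algorithm used in the talk, and the function name `connected_components` is hypothetical.

```python
def connected_components(num_vertices, edges):
    """Group vertices 0..num_vertices-1 into connected components."""
    parent = list(range(num_vertices))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # Attach the larger root id under the smaller one.
            parent[max(root_u, root_v)] = min(root_u, root_v)

    groups = {}
    for v in range(num_vertices):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


print(connected_components(6, [(0, 1), (1, 2), (3, 4)]))
# [[0, 1, 2], [3, 4], [5]]
```

Each `find` chases pointers through memory one word at a time, which is the single-word, low-locality access pattern the talk attributes to graph workloads.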
Graph methods
Influential Factors
Degree distribution
Normal
Scale-free
Planar or non-planar
Static or dynamic
Weighted or unweighted
Weight distribution
Typed or untyped edges
Load imbalance
Non-planar
Concurrent inserts and deletions
Difficult to partition
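The link between degree distribution and load imbalance can be made concrete with a small sketch. It assumes two toy edge lists that are not from the talk: a 6-vertex ring (every vertex has the same degree) and a 6-vertex star (one hub holds every edge). The name `degree_histogram` is hypothetical.

```python
from collections import Counter


def degree_histogram(edges):
    """Map each degree value to the number of vertices with that degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    histogram = Counter(degree.values())
    return dict(sorted(histogram.items()))


ring_edges = [(i, (i + 1) % 6) for i in range(6)]  # uniform degrees
star_edges = [(0, i) for i in range(1, 6)]         # one hub, five leaves

print(degree_histogram(ring_edges))  # {2: 6}
print(degree_histogram(star_edges))  # {1: 5, 5: 1}
```

If work is assigned per vertex, the ring gives every worker the same load, while the star gives one worker five times the load of any other.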
Challenges
Problem size
Ton of bytes, not ton of flops
Little data locality
Have only parallelism to tolerate latencies
Low computation to communication ratio
Single word access
Threads limited by loads and stores
Frequent synchronization
Node, edge, record
Work tends to be dynamic and imbalanced
Let any processor execute any thread
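"Let any processor execute any thread" can be approximated in software with a shared work queue. The sketch below is a minimal illustration using Python threads, not a description of how any particular machine schedules work. The function name `process_with_shared_queue` and the `sum(range(cost))` stand-in for per-task work are assumptions made for the example.

```python
import queue
import threading


def process_with_shared_queue(task_costs, num_workers=4):
    """Run tasks of uneven cost; any idle worker takes the next task."""
    tasks = queue.Queue()
    for task_id, cost in enumerate(task_costs):
        tasks.put((task_id, cost))

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task_id, cost = tasks.get_nowait()
            except queue.Empty:
                return  # no work left
            value = sum(range(cost))  # stand-in for irregular work
            with lock:
                results[task_id] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = process_with_shared_queue([10, 1000, 3, 50000])
print(results[0], results[2])  # 45 3
```

No task is bound to a worker in advance, so one expensive task does not leave the other workers idle while cheap tasks wait behind it.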
Grids, Uniform, and Scale-Free Graphs
USA Roadmap
Uniform
Scale-Free
METIS Partitioner
System requirements
Global shared memory
No simple data partitions
Local storage for thread private data
Network support for single word accesses
Transfer multiple words when locality exists
Multi-threaded processors
Hide latency with parallelism
Single cycle context switching
Multiple outstanding loads and stores per thread
Full-and-empty bits
Efficient synchronization
Wait in memory
Message driven operations
Dynamic work queues
Hardware support for thread migration
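The full-and-empty bit idea can be emulated in software to show its semantics: each memory word carries a flag, a reader waits until the word is full and leaves it empty, and a writer waits until the word is empty and leaves it full. The sketch below is a software emulation under that assumption, built on `threading.Condition`; it is not the hardware mechanism, and the class and method names (`FullEmptyWord`, `write_when_empty`, `read_when_full`) are hypothetical.

```python
import threading


class FullEmptyWord:
    """One memory word with an emulated full/empty flag."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._value = None

    def write_when_empty(self, value):
        """Wait until the word is empty, store the value, mark it full."""
        with self._cond:
            while self._full:
                self._cond.wait()
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_when_full(self):
        """Wait until the word is full, take the value, mark it empty."""
        with self._cond:
            while not self._full:
                self._cond.wait()
            value = self._value
            self._full = False
            self._cond.notify_all()
            return value


def demo(n=5):
    """Pass n values from a producer to a consumer through one word."""
    word = FullEmptyWord()
    received = []

    def producer():
        for i in range(n):
            word.write_when_empty(i)

    def consumer():
        for _ in range(n):
            received.append(word.read_when_full())

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


print(demo())  # [0, 1, 2, 3, 4]
```

The synchronization state lives with the data word itself, so fine-grained locking of a node, edge, or record needs no separate lock object. In hardware the waiting happens in memory rather than in a software condition variable.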
Cray XMT
Center for Adaptive Supercomputing Software
Driving Development of Next-Generation Massively Multithreading Architectures
Sponsored by DOD
Summary
The new HPC is irregular and sparse
There are commercial and consumer applications
If the applications are important enough, machines will be built
HPC is too large and too diverse for “one size fits all”
We need to build the right machines for the problems we have to solve