Graph Algorithms for Irregular, Unstructured Data

John Feo

Center for Adaptive Supercomputing Software

Pacific Northwest National Laboratory

July, 2010

Analytic methods and applications

Community thought leaders

Blog Analysis

Community Activities

Facebook - 300 M users

Connect-the-dots (figure; node labels: Bus, Hayashi, Zaire, Train, Anthrax, Money, Endo)

National Security

People, Places, & Actions

Semantic Web

Anomaly detection

Security

N-x contingency analysis

SmartGrid

Data analytics

Sample queries:

Allegiance switching: identify entities that switch communities

Community structure: identify the genesis and dissipation of communities

Phase change: identify significant change in the network structure
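The allegiance-switching query can be illustrated with a minimal sketch. It assumes each entity carries exactly one community label per snapshot, which is a simplification, since real communities can overlap. The function name, entity names, and snapshot data are hypothetical.

```python
def allegiance_switches(before, after):
    """Map each entity present in both snapshots to (old, new) if its label changed."""
    return {
        entity: (before[entity], after[entity])
        for entity in before.keys() & after.keys()
        if before[entity] != after[entity]
    }

# Hypothetical community labels at two points in time (one label per entity).
snapshot_t0 = {"ann": "A", "bob": "A", "cho": "B", "dee": "B"}
snapshot_t1 = {"ann": "A", "bob": "B", "cho": "B", "eli": "A"}

switches = allegiance_switches(snapshot_t0, snapshot_t1)
print(switches)  # {'bob': ('A', 'B')}
```

Entities that appear in only one snapshot ("dee", "eli") are ignored here; tracking them would belong to the community-structure query instead.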

Traditional graph partitioning often fails:

Topology: interaction graph is low-diameter and has no good separators

Irregularity: communities are not uniform in size

Overlap: individuals are members of one or more communities

1000x growth in 3 years!

Facebook has more than 300 million active users

http://www.new.facebook.com/album.php?profile&id=20531316728

Graphs are not grids

Graphs arising in informatics are very different from the grids used in scientific computing

Scientific Grids

Static or slowly evolving

Planar

Nearest neighbor communication

Work performed per cell or node

Work modifies local data

Graphs for Data Informatics

Dynamic

Non-planar

Communications are non-local and dynamic

Work performed by crawlers or autonomous agents

Work modifies data in many places

Small-world and scale-free

In low-diameter graphs: work explodes; difficult to partition; high percentage of nodes are visited

“Six degrees of separation”

Large hubs are in grey

In scale-free graphs: difficult to partition; work concentrates in a few nodes
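A small sketch can make "work explodes" concrete. It counts how many vertices a breadth-first search first reaches at each depth, on a ring (grid-like, nearest-neighbor) and on the same ring with one added hub. Both toy graphs and all function names are assumptions made for illustration; they are not taken from the slides.

```python
from collections import deque

def bfs_level_sizes(adj, source):
    """Return the number of vertices first reached at each BFS depth."""
    depth = {source: 0}
    sizes = [1]
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                if depth[v] == len(sizes):
                    sizes.append(0)   # first vertex seen at a new depth
                sizes[depth[v]] += 1
                frontier.append(v)
    return sizes

def ring(n):
    """Each vertex talks only to its two nearest neighbors."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def ring_with_hub(n):
    """Same ring plus one hub vertex (id n) adjacent to every ring vertex."""
    adj = ring(n)
    adj[n] = list(range(n))
    for i in range(n):
        adj[i].append(n)
    return adj

print(bfs_level_sizes(ring(16), 0))           # [1, 2, 2, 2, 2, 2, 2, 2, 1]
print(bfs_level_sizes(ring_with_hub(16), 0))  # [1, 3, 13]
```

On the ring the frontier stays at two vertices per step. With a single hub, nearly the whole graph is reached at depth two, and the hub alone holds 16 of the edges, which is the concentration of work the slide describes.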

Paths:

Shortest path

Betweenness

Min/max flow

Structures:

Spanning trees

Connected components

Graph isomorphism

Groups:

Matching/Coloring

Partitioning

Equivalence
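As one concrete instance of the methods listed above, connected components can be computed with a union-find structure. This is a sequential sketch under simple assumptions (vertices numbered 0..n-1, undirected edge list); it is not the multithreaded formulation a machine like the Cray XMT would use, and the example edge list is made up.

```python
def connected_components(num_vertices, edges):
    """Group vertices 0..num_vertices-1 into components using union-find."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_v] = root_u        # merge the two components

    groups = {}
    for v in range(num_vertices):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

# Hypothetical edge list: two small components and one isolated vertex.
edges = [(0, 1), (1, 2), (3, 4)]
components = connected_components(6, edges)
print(components)  # [[0, 1, 2], [3, 4], [5]]
```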

Graph methods

Influential Factors

Degree distribution:

Normal

Scale-free

Planar or non-planar

Static or dynamic

Weighted or unweighted

Weight distribution

Typed or untyped edges

Load imbalance

Non-planar

Concurrent inserts and deletions

Difficult to partition

Challenges

Problem size: ton of bytes, not ton of flops

Little data locality: have only parallelism to tolerate latencies

Low computation to communication ratio: single word access; threads limited by loads and stores

Frequent synchronization: node, edge, record

Work tends to be dynamic and imbalanced: let any processor execute any thread
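"Let any processor execute any thread" can be sketched as workers pulling vertices from one shared queue instead of owning a fixed partition. The sketch below uses Python threads purely to show the scheduling pattern; CPython threads will not give a real speedup here. The toy graph (one hub, many leaves) and the function name are assumptions.

```python
import queue
import threading

def process_vertices(adj, num_workers=4):
    """Sum neighbor ids per vertex; idle workers take whatever vertex is next."""
    work = queue.Queue()
    for v in adj:
        work.put(v)
    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                v = work.get_nowait()
            except queue.Empty:
                return                  # no work left for this worker
            total = sum(adj[v])         # cost is proportional to the degree of v
            with results_lock:
                results[v] = total

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Hypothetical imbalanced graph: vertex 0 is a hub, vertices 1..100 are leaves.
adj = {0: list(range(1, 101))}
adj.update({i: [0] for i in range(1, 101)})

results = process_vertices(adj)
print(len(results), results[0])  # 101 5050
```

With a static partition, whichever worker owned the hub would finish last; with a shared queue the other workers keep draining the leaves while one handles the hub.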

Grids, Uniform, and Scale-Free Graphs

USA Roadmap

Uniform

Scale-Free

METIS Partitioner

System requirements

Global shared memory: no simple data partitions; local storage for thread private data

Network support for single word accesses: transfer multiple words when locality exists

Multi-threaded processors: hide latency with parallelism; single cycle context switching; multiple outstanding loads and stores per thread

Full-and-empty bits: efficient synchronization; wait in memory

Message driven operations: dynamic work queues; hardware support for thread migration
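The full-and-empty bit idea can be emulated in software to show its semantics: every word carries a one-bit state, a write waits until the word is empty, and a read waits until it is full. The class and method names below are this sketch's own, and a condition variable stands in for what the hardware does in memory, so this illustrates the behavior only, not the cost.

```python
import threading

class FullEmptyCell:
    """One memory word with a software-emulated full/empty bit."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._value = None

    def write_when_empty(self, value):
        """Wait until the cell is empty, store the value, mark it full."""
        with self._cond:
            while self._full:
                self._cond.wait()
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_when_full(self):
        """Wait until the cell is full, take the value, mark it empty."""
        with self._cond:
            while not self._full:
                self._cond.wait()
            value = self._value
            self._full = False
            self._cond.notify_all()
            return value

cell = FullEmptyCell()
received = []

def consumer():
    for _ in range(5):
        received.append(cell.read_when_full())

reader = threading.Thread(target=consumer)
reader.start()
for i in range(5):
    cell.write_when_empty(i)   # blocks until the previous value was consumed
reader.join()
print(received)  # [0, 1, 2, 3, 4]
```

The producer and consumer synchronize on the data word itself, with no separate lock object visible to the caller, which is the "wait in memory" property named in the requirements.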

Cray XMT

Center for Adaptive Supercomputing Software

Driving Development of Next-Generation Massively Multithreading Architectures

Sponsored by DOD

Summary

The new HPC is irregular and sparse: there are commercial and consumer applications

If the applications are important enough, machines will be built

HPC is too large and too diverse for “one size fits all”: we need to build the right machines for the problems we have to solve
