Targeted Bottleneck #1: Rule Matching EECS Electrical Engineering and Computer Sciences B ERKELEY P...

Targeted Bottleneck #1: Rule Matching

BERKELEY PAR LAB

Parallel Cascading Style Sheets Leo Meyerovich, Chan Siu Man, Chan Siu On, Ras Bodik

{lmeyerov, siuman, siuon, bodik}@eecs.berkeley.edu

Status

Targeted Bottleneck #2: Layout

The Solution

Focus on parallelizing layout; later, build outwards with languages and libraries

Parallelize layout bottleneck #1: rule matching
• Task-parallel design for the embarrassingly parallel general case
• SIMD solution for the common special case

Parallelize layout bottleneck #2: layout solving
• Define semantics of a CSS kernel to expose structure and dependencies
• Optimistically parallel solving informed by the semantics

The Problem

Visual layout solving is a bottleneck for web browsers:

• Network improvements (e.g., 3G, LTE) make the software a bottleneck
• Layout slows initial response time
• Layout solving blocks subsequent JavaScript manipulations
• It is a common computational task on devices, so it is a power concern

Hard to Optimize:

• The specification is defined in English: dependencies are unclear
• Core parts are undefined and/or have competing implementations
• Flow-based layouts are intuitively modeled sequentially
• Sequential optimizations are already in place (10-15 years of work)

“Multiprocessors are no help to TeX” -- Knuth

What Happens When a Page Loads?

[Figure: page-load pipeline. Stages: parse, normalize, rule match, layout, paint. Data: document, style, XML, rules, style tree, frame tree, pixel image.]

Normalize: Rearrange the XML document using tree rewrite rules to simplify layout

Rule match: For each XML document node, find and combine the style rules matching it

Layout: Compute box size, position, color, font, etc. of document nodes

Paint: Convert the tree of relatively positioned atomic content into a pixel grid

Where Does the CPU Time Go?

Templates style documents. Properties are batched inside rules. Rules are path predicates that may apply to multiple document nodes.

CSS  ::= (Rule '{' <property>* '}')*
Rule ::= E [',' E]*                              (disjunction)
E    ::= E1 E2                                   (descendant)
       | E1 '>' E2                               (child)
       | E1 '+' E2                               (sibling)
       | L
L    ::= [<tag> | ... | '#' <id>] ('.' <class>)*…   (node attributes)

Rule Matching Task Decomposition

The basic sequential matching algorithm examines nodes individually. It simultaneously walks up from a node and backwards from a selector. A common optimization is to initially compact the tree into a trie.
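A minimal sketch of the walk described above, not the browser's actual code. The `Node` and `Simple` types and the selector encoding (a left-to-right list of `(combinator, Simple)` steps) are assumptions made for illustration. Matching starts at the rightmost step and moves up the tree as it moves left through the selector.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:                      # hypothetical document node
    tag: str
    id: Optional[str] = None
    classes: frozenset = frozenset()
    parent: Optional["Node"] = None
    prev: Optional["Node"] = None        # previous sibling

@dataclass(frozen=True)
class Simple:                    # hypothetical encoding of the L production
    tag: Optional[str] = None
    id: Optional[str] = None
    classes: frozenset = frozenset()

    def test(self, n: Node) -> bool:
        return ((self.tag is None or self.tag == n.tag)
                and (self.id is None or self.id == n.id)
                and self.classes <= n.classes)

def matches(node, selector, i=None):
    """selector: [(None, s0), (comb1, s1), ...]; comb is ' ', '>' or '+'."""
    if i is None:
        i = len(selector) - 1            # start at the rightmost step
    comb, simple = selector[i]
    if not simple.test(node):
        return False
    if i == 0:
        return True
    if comb == ">":                      # child: step to the parent
        return node.parent is not None and matches(node.parent, selector, i - 1)
    if comb == "+":                      # sibling: step to the previous sibling
        return node.prev is not None and matches(node.prev, selector, i - 1)
    anc = node.parent                    # descendant: try every ancestor
    while anc is not None:
        if matches(anc, selector, i - 1):
            return True
        anc = anc.parent
    return False

# Tree: div.menu > ul > a
div = Node("div", classes=frozenset({"menu"}))
ul = Node("ul", parent=div)
a = Node("a", parent=ul)
descendant = [(None, Simple("div", classes=frozenset({"menu"}))), (" ", Simple("a"))]
child = [(None, Simple("div")), (">", Simple("a"))]
print(matches(a, descendant), matches(a, child))
```

The trie optimization mentioned above is left out of this sketch.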

Conflicts between properties from different rules are broken using a total ordering on rules. We skip this step for now.

There are at least two simple axes of decomposition for parallelism:

1. Partitioning the document tree
2. Partitioning the rules
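The two axes can be sketched as follows. This is an illustration under assumptions, not the poster's implementation: `rule_matches` is a hypothetical stand-in for a real selector matcher, nodes are flattened into a list, and Python threads are used only to show the task structure (they will not give real speedups for CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor

def rule_matches(rule, node):
    # Hypothetical matcher: rule = (tag or None, required classes),
    # node = (tag, classes).
    tag, classes = rule
    return (tag is None or tag == node[0]) and classes <= node[1]

def chunks(n, k):
    """Split range(n) into at most k contiguous index ranges."""
    step = max(1, -(-n // k))
    return [range(s, min(s + step, n)) for s in range(0, n, step)]

def match_sequential(nodes, rules):
    return [[r for r, rule in enumerate(rules) if rule_matches(rule, node)]
            for node in nodes]

def match_by_nodes(nodes, rules, workers=4):
    """Axis 1: each task matches all rules against one slice of the nodes."""
    def work(idxs):
        return [[r for r, rule in enumerate(rules) if rule_matches(rule, nodes[i])]
                for i in idxs]
    with ThreadPoolExecutor(workers) as pool:
        parts = list(pool.map(work, chunks(len(nodes), workers)))
    return [row for part in parts for row in part]

def match_by_rules(nodes, rules, workers=4):
    """Axis 2: each task matches one slice of the rules against all nodes."""
    def work(idxs):
        return [(i, r) for r in idxs for i, node in enumerate(nodes)
                if rule_matches(rules[r], node)]
    with ThreadPoolExecutor(workers) as pool:
        parts = list(pool.map(work, chunks(len(rules), workers)))
    out = [[] for _ in nodes]
    for part in parts:                   # parts arrive in rule order
        for i, r in part:
            out[i].append(r)
    return out

nodes = [("div", frozenset({"menu"})), ("a", frozenset()), ("p", frozenset({"menu", "x"}))]
rules = [("div", frozenset()), (None, frozenset({"menu"})), ("a", frozenset())]
print(match_by_nodes(nodes, rules) == match_by_rules(nodes, rules))
```

Partitioning by nodes needs no merge step, while partitioning by rules must regroup results per node afterwards.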

SIMD Parallelism

Optimize the Common Case

Trees are small and rules are quickly computed:
• data partitions should be big, or dispatch should be fast
• single-node or small descendant-chain selectors are common

By expanding a tree into a table of paths, we might be able to match multiple rules against a node, or multiple nodes against a rule, at once.
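One way to picture the table of paths, as a plain-Python sketch under assumptions: the tree is given as hypothetical parallel arrays `tags` and `parent`, and only child chains such as `li > a` are handled. Each selector step becomes one uniform comparison over a whole column, which is the shape of loop that SIMD hardware or an array library could execute in bulk. The code itself is scalar.

```python
def path_table(parent, tags, depth):
    """Row i holds the tags of node i, its parent, grandparent, ...,
    padded with None to a fixed width of `depth`."""
    table = []
    for i in range(len(tags)):
        row, n = [], i
        while n is not None and len(row) < depth:
            row.append(tags[n])
            n = parent[n]
        table.append(row + [None] * (depth - len(row)))
    return table

def match_child_chain(table, chain):
    """chain lists tags right to left, e.g. ['a', 'li'] for `li > a`.
    Assumes len(chain) <= the table width."""
    hits = [True] * len(table)
    for col, tag in enumerate(chain):    # one column-wise comparison per step
        hits = [h and row[col] == tag for h, row in zip(hits, table)]
    return hits

# Tree: ul(0) -> li(1) -> a(2), ul(0) -> li(3) -> span(4)
tags = ["ul", "li", "a", "li", "span"]
parent = [None, 0, 1, 0, 3]
table = path_table(parent, tags, depth=3)
print(match_child_chain(table, ["a", "li"]))
```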

Possible Future Work

People (UC Berkeley)

End of current phase: understanding & modeling
• browser profiling
• CSS layout semantics
• sequential layout solving
• analyzing typical page
• sequential rule matching

Next phase: experimentation

• parallel layout solver
• parallel rule matcher, SIMD rule matcher
• extending layout model

preprocessing

Exploit Pipeline Parallelism

Initial load is dominated by parsing and processing, which can also be described well by a pipeline of attribute and tree rewrite grammars. Doing so, and combining with layout grammars, provides room for both local and global optimizations.

Explore Memoization and Adaptivity

Changes to layout often have only local impact; it is unclear how to factor this in with parallelism.

Parallel Extensions

• animation operators
• expose dependencies to a scripting runtime
• …

Generalize the Solver

It is unclear what types of constraints CSS solves; we want to generalize our solver.

Given style properties on nodes, information is propagated around the tree to yield sizes, colors, positions, etc. of nodes.

Solving has long, convoluted dependencies:

1. Tree-rewrite grammar to normalize
2. Attribute grammar for tentative widths
3. Attribute grammar for final widths
4. Attribute grammar for heights, positions

Proposing the First Semantics of CSS

Layout Decomposition

Each attribute grammar reveals 2-3 passes and thus a way to decompose based on tree structure:

1. Downwards flows, forking on branches
2. Upwards flows, topological wavefront (joins)
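The two flow directions can be illustrated with a toy box model. Everything in this sketch is an assumption made for illustration (the `Box` type, uniform padding, heights that simply stack); it is not the CSS semantics the poster proposes. Widths flow down from parent to children, and heights flow up once all children are done.

```python
from dataclasses import dataclass, field

@dataclass
class Box:                       # hypothetical layout node
    padding: int = 0
    leaf_height: int = 0         # intrinsic height, used only by leaves
    children: list = field(default_factory=list)
    width: int = 0
    height: int = 0

def assign_widths(box, available):
    """Downward flow: a child's width depends only on its parent's."""
    box.width = available
    inner = available - 2 * box.padding
    for child in box.children:   # independent subtrees: a parallel version could fork here
        assign_widths(child, inner)

def assign_heights(box):
    """Upward flow: a parent's height is known once all children are done."""
    for child in box.children:   # a parallel version would join on these
        assign_heights(child)
    if box.children:
        content = sum(c.height for c in box.children)
    else:
        content = box.leaf_height
    box.height = content + 2 * box.padding

root = Box(padding=10, children=[
    Box(leaf_height=20),
    Box(padding=5, children=[Box(leaf_height=30)]),
])
assign_widths(root, 200)
assign_heights(root)
print(root.width, root.height)
```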

Optimistic Parallelism

The semantics reveal and isolate the calculations that induce sequential dependencies threaded through the grammar; these dependencies are unlikely to arise in practice.

We can wrap these in futures and optimistically decompose ignoring them, recomputing later as needed.
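A sketch of the optimistic pattern, under loudly stated assumptions: the rare dependency is modeled as a hypothetical float on one block that narrows the next sibling, and `layout_block` is a made-up cost model. Every block is first laid out in a future as if no float existed; only blocks whose guess turns out wrong are recomputed.

```python
from concurrent.futures import ThreadPoolExecutor

def layout_block(text_len, width):
    """Hypothetical model: height = number of lines of `width` characters."""
    return -(-text_len // width)         # ceiling division

def layout_siblings(blocks, width, workers=4):
    """blocks: list of (text_len, float_width); a nonzero float_width
    narrows the block that follows it."""
    with ThreadPoolExecutor(workers) as pool:
        # Optimistic step: lay out every block as if no float precedes it.
        guesses = [pool.submit(layout_block, n, width) for n, _ in blocks]
        heights = [f.result() for f in guesses]
    recomputed = 0
    for i in range(1, len(blocks)):
        intrusion = blocks[i - 1][1]
        if intrusion:                    # the guess was wrong: redo just this block
            heights[i] = layout_block(blocks[i][0], width - intrusion)
            recomputed += 1
    return heights, recomputed

heights, recomputed = layout_siblings([(100, 0), (100, 20), (100, 0)], width=50)
print(heights, recomputed)
```

When floats are rare, almost all of the work stays in the parallel step and the sequential repair loop touches few blocks.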

Leo Meyerovich Chan Siu Man

Chan Siu On Ras Bodik

[Figure: CPU time breakdown by layout, rule matching, painting, and parsing/XML munging.]

Average from Slashdot, Wikipedia, Digg. Numbers from Firefox; Microsoft reports layout/matching = 40% in IE.

[Figure: average % of rule matches by selectors per rule (from 16 popular pages).]