CSS parsing: performance tips & tricks

Post on 07-Jan-2017

CSS Parsing performance tips & tricks

Roman Dvornov Avito

Moscow, September 2016

Frontend lead in Avito

Specializes in SPA

Maintainer of: basis.js, CSSO, component-inspector, csstree and others

This talk is the continuation of "CSS parsing" (in Russian): tinyurl.com/csstree-intro

CSSTree

CSSTree – fastest detailed CSS parser

How this project was born

About a year ago I started to maintain CSSO (a CSS minifier): github.com/css/csso

CSSO was based on Gonzales (a CSS parser): github.com/css/gonzales

What's wrong with Gonzales
• Development stopped in 2013
• Inconvenient and buggy AST format
• Parsing mistakes
• Excessively complex code base
• Slow, high memory consumption, pressure on GC

But I didn't want to spend my time developing the parser…

Alternatives?

You can find a lot of CSS parsers

Common problems
• No longer developed
• Outdated (don't support the latest CSS features)
• Buggy
• Inconvenient AST
• Slow

PostCSS parser is a good choice if you need one now: postcss.org

PostCSS pros
• Actively developed
• Parses CSS well, even non-standard syntax + tolerant mode
• Saves formatting info
• Handy API to work with AST
• Fast

General con: selectors and values are not parsed (they are represented as strings)

That forces developers to
• Use non-robust or inefficient approaches
• Invent their own parsers
• Use additional parsers: postcss-selector-parser, postcss-value-parser

Switching to PostCSS meant writing our own selector and value parsers, which is pretty much the same as writing an entirely new parser

However, as a result of continuous refactoring over a few months, the CSSO parser was completely rewritten (which was not planned)

And it was extracted to a separate project: github.com/csstree/csstree

Performance

My previous talk about parser performance: "CSSO – performance boost story" (in Russian), tinyurl.com/csso-speedup

After my talk at the HolyJS conference the parser's performance was improved one more time :)

* Thanks to Vyacheslav @mraleph Egorov for inspiration

Benchmark: bootstrap.css v3.3.7 (146Kb), github.com/postcss/benchmark

CSSTree: 24 ms
Mensch: 31 ms
CSSOM: 36 ms
PostCSS: 38 ms
Rework: 81 ms
PostCSS Full: 100 ms
Gonzales: 175 ms
Stylecow: 176 ms
Gonzales PE: 214 ms
ParserLib: 414 ms

Chart legend: non-detailed AST / detailed AST

PostCSS Full = PostCSS + postcss-selector-parser + postcss-value-parser

Epic fail: as I realised later, I had extracted the wrong version of the parser 😱

github.com/csstree/csstree/commit/57568c758195153e337f6154874c3bc42dd04450

Time after parser update: CSSTree takes 13 ms instead of 24 ms on the same benchmark (bootstrap.css v3.3.7, github.com/postcss/benchmark)

Parsers: basic training

Main steps

• Tokenization

• Tree assembling

Tokenization

Split text into tokens
• whitespaces – [ \n\r\t\f]+
• keyword – [a-zA-Z…]+
• number – [0-9]+
• string – "string" or 'string'
• comment – /* comment */
• punctuation – [;,.#\{\}\[\]\(\)…]

.foo {
  width: 10px;
}

[
  '.', 'foo', ' ', '{', '\n ', 'width', ':', ' ', '10', 'px', ';', '\n', '}'
]

We need more info about every token: type and location

It is more efficient to compute type and location at the tokenization step

.foo {
  width: 10px;
}

[
  { type: 'FullStop', value: '.', offset: 0, line: 1, column: 1 },
  …
]
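To make the idea concrete, here is a minimal sketch of an eager tokenizer that computes type and location in the same pass. It is deliberately naive (regular expressions and a substring per token, exactly the things the talk later removes). The `tokenize` function, the patterns and all type names except `FullStop` are assumptions made for this illustration, not CSSTree's actual code.

```javascript
// Hypothetical token patterns, loosely following the token list above.
var TOKEN_PATTERNS = [
    ['WhiteSpace', /^[ \n\r\t\f]+/],
    ['Identifier', /^[a-zA-Z_-]+/],
    ['Number', /^[0-9]+/],
    ['String', /^"[^"]*"|^'[^']*'/],
    ['Comment', /^\/\*[\s\S]*?\*\//]
];

// Hypothetical names for single-character punctuation tokens.
var PUNCTUATION = {
    '.': 'FullStop', '#': 'Hash', ':': 'Colon', ';': 'Semicolon',
    '{': 'LeftCurlyBracket', '}': 'RightCurlyBracket'
};

function tokenize(source) {
    var tokens = [];
    var offset = 0;
    var line = 1;
    var column = 1;

    while (offset < source.length) {
        var rest = source.slice(offset);
        var value = rest.charAt(0);
        var type = PUNCTUATION[value] || 'Punctuation';

        // the first matching pattern wins, otherwise it is a single punctuation char
        for (var i = 0; i < TOKEN_PATTERNS.length; i++) {
            var match = rest.match(TOKEN_PATTERNS[i][1]);
            if (match) {
                type = TOKEN_PATTERNS[i][0];
                value = match[0];
                break;
            }
        }

        tokens.push({ type: type, value: value, offset: offset, line: line, column: column });

        // advance the location past the token (only '\n' is treated as a line break)
        for (var j = 0; j < value.length; j++) {
            if (value.charAt(j) === '\n') {
                line++;
                column = 1;
            } else {
                column++;
            }
        }
        offset += value.length;
    }

    return tokens;
}
```

Calling `tokenize('.foo {\n  width: 10px;\n}')` yields one object per token, each carrying its own strings and numbers, which is the memory cost discussed in the next part.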

Tree assembling

Creating a node

function getSelector() {
  var selector = {
    type: 'Selector',
    sequence: []
  };

  // main loop

  return selector;
}

Main loop

for (; currentToken < tokenCount; currentToken++) {
  switch (tokens[currentToken].type) {
    case TokenType.Hash: // #
      selector.sequence.push(getId());
      break;
    case TokenType.FullStop: // .
      selector.sequence.push(getClass());
      break;
    …
  }
}

Result

{
  "type": "StyleSheet",
  "rules": [{
    "type": "Atrule",
    "name": "import",
    "expression": {
      "type": "AtruleExpression",
      "sequence": [ ... ]
    },
    "block": null
  }]
}
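The node creation and main loop shown for tree assembling can be combined into a small runnable sketch. The numeric `TokenType` values, the `parseSelector` and `getNamed` helpers and the `Id`/`Class`/`Identifier` node shapes are assumptions for this example; only the `Selector` node with a `sequence` array comes from the slides.

```javascript
// Hypothetical numeric token types, for this sketch only.
var TokenType = { Hash: 1, FullStop: 2, Identifier: 3 };

function parseSelector(tokens) {
    var pos = 0;

    // Consumes "#name" or ".name" and returns a node of the given type.
    function getNamed(nodeType) {
        pos++; // skip '#' or '.'
        var name = tokens[pos];
        if (!name || name.type !== TokenType.Identifier) {
            throw new Error('Identifier is expected');
        }
        return { type: nodeType, name: name.value };
    }

    // creating a node
    var selector = { type: 'Selector', sequence: [] };

    // main loop: dispatch on the type of the current token
    for (; pos < tokens.length; pos++) {
        switch (tokens[pos].type) {
            case TokenType.Hash: // #
                selector.sequence.push(getNamed('Id'));
                break;
            case TokenType.FullStop: // .
                selector.sequence.push(getNamed('Class'));
                break;
            case TokenType.Identifier:
                selector.sequence.push({ type: 'Identifier', name: tokens[pos].value });
                break;
            default:
                throw new Error('Unexpected token');
        }
    }

    return selector;
}
```

Each helper advances the shared position past what it consumed, so the loop only has to look at the type of the current token to decide which node to build.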

Parser performance boost Part 2: new horizons

[
  { type: 'FullStop', value: '.', offset: 0, line: 1, column: 1 },
  …
]

Token's cost: 24 + 5 * 4 + array = min 50 bytes per token

Our project: ~1Mb of CSS, 254 062 tokens = min 12.7 Mb

Out of the box: changing approach

Computing all tokens at once and then assembling a tree is much easier, but it needs more memory and is therefore slower

Scanner (lazy tokenizer)

Key API

scanner.token      // current token or null
scanner.next()     // going to next token
scanner.lookup(N)  // look ahead, returns Nth token from current token

• lookup(N): fills the tokens buffer up to N tokens (if they are not computed yet) and returns the Nth token from the buffer (index N - 1)
• next(): shifts a token from the buffer, if any, or computes the next token

The same number of tokens is computed, but not all at once, which requires less memory
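The lazy scanner described above could be sketched as follows. This is an illustration under assumptions, not CSSTree's code: `readToken` is a hypothetical low-level function that returns the token starting at a given offset (or `null` at the end of input), and the token shape is invented for the demo.

```javascript
class Scanner {
    constructor(source, readToken) {
        this.source = source;
        this.readToken = readToken; // hypothetical: (source, offset) -> token or null
        this.offset = 0;            // where the next uncomputed token starts
        this.buffer = [];           // tokens computed ahead of time by lookup()
        this.token = null;          // current token or null
        this.next();
    }

    // computes one more token from the source
    compute() {
        var token = this.readToken(this.source, this.offset);
        if (token !== null) {
            this.offset = token.end;
        }
        return token;
    }

    // shift a token from the buffer, if any, or compute the next token
    next() {
        this.token = this.buffer.length > 0 ? this.buffer.shift() : this.compute();
    }

    // fill the buffer up to n tokens and return the n-th token ahead (or null)
    lookup(n) {
        while (this.buffer.length < n) {
            var token = this.compute();
            if (token === null) {
                break;
            }
            this.buffer.push(token);
        }
        return this.buffer.length >= n ? this.buffer[n - 1] : null;
    }
}

// Simplistic reader for the demo: a run of letters or any single character is a token.
function readToken(source, offset) {
    if (offset >= source.length) {
        return null;
    }
    var match = /^[a-z]+/i.exec(source.slice(offset));
    var end = offset + (match ? match[0].length : 1);
    return { value: source.slice(offset, end), start: offset, end: end };
}
```

Only the current token and the few tokens requested by `lookup()` are alive at any moment, but every token is still a freshly allocated object, which is where the GC pressure mentioned next comes from.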

Problem: the approach puts pressure on GC

Reducing token's cost step by step

[
  { type: 'FullStop', value: '.', offset: 0, line: 1, column: 1 },
  …
]

A string type is easy to understand, but it's for internal use only, so we can replace it with a number

…
// '.'.charCodeAt(0)
var FULLSTOP = 46;
…

[
  { type: FULLSTOP, value: '.', offset: 0, line: 1, column: 1 },
  …
]

[
  { type: 46, value: '.', offset: 0, line: 1, column: 1 },
  …
]

We can avoid storing the substring in the token: it's very expensive for punctuation (moreover, those substrings are never used). Many constructions are assembled from several substrings, and one long substring is better than a concatenation of several small ones

Before:

[ { type: 46, value: '.', offset: 0, line: 1, column: 1 }, … ]

After:

[ { type: 46, start: 0, end: 1, line: 1, column: 1 }, … ]

Look, Ma! No strings, just numbers!

Moreover, not an Array but a TypedArray: instead of an array of objects we use arrays of numbers

Array vs. TypedArray (a TypedArray…)
• Can't have holes
• Faster in theory (less checking)
• Can be stored outside the heap (when big enough)
• Prefilled with zeros

[ { type: 46, start: 0, end: 1, line: 1, column: 1 }, … ]

field:  type        start        end          line         column
array:  Uint8Array  Uint32Array  Uint32Array  Uint32Array  Uint32Array
bytes:  1           4            4            4            4

17 bytes per token; sized by tokens count: 254 062 x 17 = 4.3Mb

4.3Mb vs. 12.7Mb (min)
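A minimal sketch of this layout, with one typed array per token field instead of one object per token. The `TokenStorage` name and its methods are assumptions for illustration only.

```javascript
// One typed array per field ("structure of arrays") instead of an array of objects.
class TokenStorage {
    constructor(capacity) {
        this.count = 0;
        this.type = new Uint8Array(capacity);
        this.start = new Uint32Array(capacity);
        this.end = new Uint32Array(capacity);
        this.line = new Uint32Array(capacity);
        this.column = new Uint32Array(capacity);
    }

    // stores a token as five numbers, no object or string is allocated
    push(type, start, end, line, column) {
        var i = this.count++;
        this.type[i] = type;
        this.start[i] = start;
        this.end[i] = end;
        this.line[i] = line;
        this.column[i] = column;
    }

    // total size of the underlying buffers in bytes
    byteLength() {
        return this.type.byteLength + this.start.byteLength + this.end.byteLength +
            this.line.byteLength + this.column.byteLength;
    }
}

var storage = new TokenStorage(254062);
storage.push(46, 0, 1, 1, 1);      // the '.' token from the example
console.log(storage.byteLength()); // 254062 * 17 = 4319054 bytes, i.e. ~4.3Mb
```

The capacity has to be known up front, which is exactly the problem raised on the next slide.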

Houston, we have a problem: a TypedArray has a fixed length, but we don't know how many tokens will be found

Sizing the arrays by symbols count instead (a token is at least one symbol long, so there can't be more tokens than symbols): 17 bytes per token, 983 085 x 17 = 16.7Mb

16.7Mb vs. 12.7Mb (min)

Don't give up, let's look at the arrays more closely

start = [ 0, 5, 6, 7, 9, 11, …, 35 ]
end = [ 5, 6, 7, 9, 11, 12, …, 36 ]

start + end = offset:

offset = [ 0, 5, 6, 7, 9, 11, …, 35, 36 ]
start = offset[i]
end = offset[i + 1]
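Since every token ends where the next one starts, a single array with one extra item is enough. A small sketch of that merge follows; the `buildOffsets`, `tokenStart` and `tokenEnd` helpers and the input of token lengths are assumptions for this example.

```javascript
// Builds one offset array from token lengths; offset[0] is always 0
// and the array has one more item than there are tokens.
function buildOffsets(tokenLengths) {
    var offset = new Uint32Array(tokenLengths.length + 1);

    for (var i = 0; i < tokenLengths.length; i++) {
        offset[i + 1] = offset[i] + tokenLengths[i];
    }

    return offset;
}

function tokenStart(offset, i) {
    return offset[i];
}

function tokenEnd(offset, i) {
    return offset[i + 1];
}

// '.', 'foo', ' ', '{' -> lengths 1, 3, 1, 1
var offset = buildOffsets([1, 3, 1, 1]);
console.log(tokenStart(offset, 1), tokenEnd(offset, 1)); // 1 4
```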

With start and end merged into a single offset array:

field:  type        offset       line         column
array:  Uint8Array  Uint32Array  Uint32Array  Uint32Array
bytes:  1           4            4            4

13 bytes per token: 983 085 x 13 = 12.7Mb

lines & columns

a {
  top: 0;
}

lines = [ 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3 ]
columns = [ 1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1 ]

lines & columns

line = lines[offset];
column = offset - lines.lastIndexOf(line - 1, offset);

This is acceptable only for short lines, which is why we cache the start offset of the last line
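Here is a sketch of computing the column on demand from the `lines` array, with the start offset of the last looked-up line cached so that `lastIndexOf` runs only when the line changes. The `computeLines` and `createLocator` names are assumptions, and only `\n` is treated as a line break to keep the example short.

```javascript
// lines[i] is the line number of the character at offset i
function computeLines(source) {
    var lines = new Uint32Array(source.length);
    var line = 1;

    for (var i = 0; i < source.length; i++) {
        lines[i] = line;
        if (source.charCodeAt(i) === 10) { // '\n'
            line++;
        }
    }

    return lines;
}

function createLocator(lines) {
    var cachedLine = 1;
    var cachedLineStart = 0; // offset of the first character of cachedLine

    return function getLocation(offset) {
        var line = lines[offset];

        if (line !== cachedLine) {
            // the last character of the previous line (or -1 for line 1), plus one
            cachedLineStart = lines.lastIndexOf(line - 1, offset) + 1;
            cachedLine = line;
        }

        return { line: line, column: offset - cachedLineStart + 1 };
    };
}

var getLocation = createLocator(computeLines('a {\n  top: 0;\n}'));
console.log(getLocation(5)); // { line: 2, column: 2 }
```

Tokens are usually visited in order, so consecutive lookups mostly hit the cached line and skip the backward search.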

With columns computed on demand:

field:  type        offset       line
array:  Uint8Array  Uint32Array  Uint32Array
bytes:  1           4            4

9 bytes per token: 983 085 x 9 = 8.8Mb

8.8Mb vs. 12.7Mb (min)

Reduce operations with strings

Performance «killers»*
• RegExp
• String concatenation
• toLowerCase/toUpperCase
• substr/substring
• …

* Polluting the GC pulls performance down

We can't avoid using these things, but we can get rid of the rest

Avoid string concatenations

var start = scanner.tokenStart;

scanner.next();
scanner.next();

// one substring over the whole range instead of concatenating token values
return source.substring(start, scanner.tokenEnd);

String comparison

function cmpStr(source, start, end, str) {
  if (end - start !== str.length) {
    return false;
  }

  for (var i = start; i < end; i++) {
    var sourceCode = source.charCodeAt(i);
    var strCode = str.charCodeAt(i - start);

    if (sourceCode !== strCode) {
      return false;
    }
  }

  return true;
}

• No substring!
• Length fast-check
• Compare strings by char codes

Case insensitive comparison of strings*?

* That means avoiding toLowerCase/toUpperCase

Heuristics
• Comparison with the reference strings only (str)
• Reference strings may be in lower case and contain latin letters only (no unicode)
• I read once on Twitter…

Setting the 6th bit to 1 changes an upper case latin letter to lower case (works for latin ASCII letters only)

'A' = 01000001
'a' = 01100001

('A'.charCodeAt(0) | 32) === 'a'.charCodeAt(0)

Case insensitive string comparison

function cmpStr(source, start, end, str) {
  …
  for (var i = start; i < end; i++) {
    …
    // source[i].toLowerCase()
    if (sourceCode >= 65 && sourceCode <= 90) { // 'A' .. 'Z'
      sourceCode = sourceCode | 32;
    }

    if (sourceCode !== strCode) {
      return false;
    }
  }
  …
}
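Putting the fragments from the slides together gives the following runnable version. The function body is assembled from the code shown in the talk; the sample source string and the calls are assumptions added to show how it might be used. Note that `str` must already be in lower case, as the heuristics require.

```javascript
// Compares source.substring(start, end) with a lower case ASCII reference string
// without creating any temporary strings.
function cmpStr(source, start, end, str) {
    // length fast-check
    if (end - start !== str.length) {
        return false;
    }

    for (var i = start; i < end; i++) {
        var sourceCode = source.charCodeAt(i);
        var strCode = str.charCodeAt(i - start);

        // source[i].toLowerCase() for 'A' .. 'Z'
        if (sourceCode >= 65 && sourceCode <= 90) {
            sourceCode = sourceCode | 32;
        }

        if (sourceCode !== strCode) {
            return false;
        }
    }

    return true;
}

var source = '@MEDIA screen';
console.log(cmpStr(source, 1, 6, 'media'));  // true
console.log(cmpStr(source, 1, 6, 'import')); // false, stops on the length check
```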

Benefits
• Most comparisons stop on the length check
• No substring (no pressure on GC)
• No temporary strings (e.g. result of toLowerCase/toUpperCase)
• String comparison doesn't pollute GC

Results
• RegExp
• string concatenation
• toLowerCase/toUpperCase
• substr/substring

No arrays in AST

What's wrong with arrays?
• As arrays grow, their memory has to be relocated frequently (unnecessary memory moves)
• Pressure on GC
• We don't know the size of the resulting arrays

Solution?

Bi-directional list

(Diagram: a chain of AST nodes linked to each other)

Needs a little bit more memory than arrays, but…

Pros
• No memory relocation
• No GC pollution during AST assembly
• next/prev references for free
• Cheap insertion and deletion
• Better for monomorphic walkers
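A bi-directional list with these properties can be sketched in a few lines. The `List` class and its item shape (`prev`, `next`, `data`) are assumptions for illustration and are not taken from CSSTree's sources.

```javascript
class List {
    constructor() {
        this.head = null;
        this.tail = null;
    }

    // appends a node without moving any existing items in memory
    append(data) {
        var item = { prev: this.tail, next: null, data: data };

        if (this.tail !== null) {
            this.tail.next = item;
        } else {
            this.head = item;
        }
        this.tail = item;

        return item;
    }

    // removes an item by relinking its neighbours; nothing is shifted or copied
    remove(item) {
        if (item.prev !== null) {
            item.prev.next = item.next;
        } else {
            this.head = item.next;
        }

        if (item.next !== null) {
            item.next.prev = item.prev;
        } else {
            this.tail = item.prev;
        }

        item.prev = null;
        item.next = null;
    }

    // walks the list from head to tail
    each(fn) {
        for (var cursor = this.head; cursor !== null; cursor = cursor.next) {
            fn(cursor.data, cursor);
        }
    }
}

var list = new List();
list.append({ type: 'Identifier', name: 'a' });
var classItem = list.append({ type: 'Class', name: 'foo' });
list.append({ type: 'Id', name: 'bar' });
list.remove(classItem);
```

All items have the same shape, which is the point of the "monomorphic walkers" bullet: a walker always sees objects with identical `prev`/`next`/`data` fields.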

These approaches and others reduced memory consumption and pressure on GC, and made the parser twice as fast as before

These changes are what took CSSTree from 24 ms to 13 ms on the benchmark (bootstrap.css v3.3.7, github.com/postcss/benchmark)

But the story goes on 😋

Parser performance boost story. Part 3: a week after FrontTalks

In general
• Simplify AST structure
• Less memory consumption
• Array reuse
• list.map().join() -> loop + string concatenation
• and others…
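The `list.map().join()` item can be illustrated as follows. Both functions are hypothetical (the node shape with a `name` field is an assumption); the second one avoids the intermediate array that `map()` allocates. How much this helps depends on the engine, so treat it as a sketch of the idea rather than a guaranteed speedup.

```javascript
// Before: allocates an intermediate array of strings, then joins it.
function translateWithMapJoin(nodes) {
    return nodes.map(function(node) {
        return node.name;
    }).join(',');
}

// After: a plain loop that appends to a single result string.
function translateWithLoop(nodes) {
    var result = '';

    for (var i = 0; i < nodes.length; i++) {
        if (i > 0) {
            result += ',';
        }
        result += nodes[i].name;
    }

    return result;
}

var nodes = [{ name: 'a' }, { name: 'b' }, { name: 'c' }];
console.log(translateWithLoop(nodes) === translateWithMapJoin(nodes)); // true
```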

Once more about token costs

[ { type: 46, start: 0, end: 1, line: 1, column: 1 }, … ]

array:  types       offsets      lines
type:   Uint8Array  Uint32Array  Uint32Array
bytes:  1           4            4

9 bytes per token: 983 085 x 9 = 8.8Mb

lines can be computed on demand

Without the lines array:

array:  types       offsets
type:   Uint8Array  Uint32Array
bytes:  1           4

5 bytes per token: 983 085 x 5 = 4.9Mb

Do we really need all 32 bits for the offset?

Heuristics: no one parses more than 16Mb of CSS

offset = [ 0, 5, 6, 7, 9, 11, 11, …, 1234 ]
type = [ 1, 47, 47, 4, 4, 47, 5, …, 3 ]

offset + type = offsetAndType:

offsetAndType[i] = type[i] << 24 | offset[i]
offsetAndType = [ 16777216, 788529157, … ]

start = offsetAndType[i] & 0xFFFFFF;
type = offsetAndType[i] >> 24;
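The packing of a type (8 bits) and an offset (24 bits, hence the 16Mb limit) into one 32-bit number can be checked with a short sketch. The helper names are assumptions. The sketch uses the unsigned shift `>>>` where the slides use `>>`; the two give the same result for types below 128, while `>>>` also keeps types 128..255 positive.

```javascript
var OFFSET_MASK = 0xFFFFFF; // low 24 bits: offsets up to 16 777 215 (16Mb)
var TYPE_SHIFT = 24;        // high 8 bits: token type

function pack(type, offset) {
    // ">>> 0" converts the signed 32-bit result to the unsigned value
    // that a Uint32Array item would hold
    return (type << TYPE_SHIFT | offset) >>> 0;
}

function unpackOffset(value) {
    return value & OFFSET_MASK;
}

function unpackType(value) {
    return value >>> TYPE_SHIFT;
}

var offsetAndType = new Uint32Array(2);
offsetAndType[0] = pack(1, 0);
offsetAndType[1] = pack(47, 5);

console.log(offsetAndType[0], offsetAndType[1]); // 16777216 788529157
console.log(unpackType(offsetAndType[1]), unpackOffset(offsetAndType[1])); // 47 5
```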

With types and offsets packed into one array:

array:  offsetAndType
type:   Uint32Array
bytes:  4

4 bytes per token: 983 085 x 4 = 3.9Mb

3.9-7.8 Mb vs. 12.7 Mb (min)

class Scanner {
  ...
  next() {
    var next = this.currentToken + 1;

    this.currentToken = next;
    this.tokenStart = this.tokenEnd;
    this.tokenEnd = this.offsetAndType[next + 1] & 0xFFFFFF;
    this.tokenType = this.offsetAndType[next] >> 24;
  }
}

Needs 2 reads for 3 values (tokenEnd becomes tokenStart)

But 2 reads look redundant, let's fix it…

offset = [ 0, 5, 6, 7, 9, 11, 11, …, 1234 ]
type = [ 1, 47, 47, 4, 4, 47, 5, …, 3 ]

offsetAndType[i] = type[i] << 24 | offset[i]
start = end
end = offsetAndType[i + 1] & 0xFFFFFF;
type = offsetAndType[i] >> 24;

The first offset is always zero, so we can shift offsets to the left:

offset = [ 5, 6, 7, 9, 11, 11, …, 1234 ]
type = [ 1, 47, 47, 4, 4, 47, 5, …, 3 ]

offsetAndType[i] = type[i] << 24 | offset[i] // offset[i + 1] of the unshifted array
start = end
end = offsetAndType[i] & 0xFFFFFF;
type = offsetAndType[i] >> 24;

class Scanner {
  ...
  next() {
    var next = this.currentToken + 1;

    this.currentToken = next;
    this.tokenStart = this.tokenEnd;
    this.tokenEnd = this.offsetAndType[next] & 0xFFFFFF;
    this.tokenType = this.offsetAndType[next] >> 24;
  }
}

Now we need just one read:

class Scanner {
  ...
  next() {
    var next = this.currentToken + 1;

    this.currentToken = next;
    this.tokenStart = this.tokenEnd;
    next = this.offsetAndType[next];
    this.tokenEnd = next & 0xFFFFFF;
    this.tokenType = next >> 24;
  }
}

-50% reads (~250k) 👌
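For reference, here is a self-contained version of such a scanner that can actually be run. The constructor, the `tokenCount` field and the end-of-input handling (type `0`) are assumptions added to make the example complete; the body of `next()` follows the slide.

```javascript
class Scanner {
    // offsetAndType[i] = type << 24 | end offset of the i-th token
    constructor(offsetAndType, tokenCount) {
        this.offsetAndType = offsetAndType;
        this.tokenCount = tokenCount;
        this.currentToken = -1;
        this.tokenStart = 0;
        this.tokenEnd = 0;
        this.tokenType = 0;
    }

    next() {
        var next = this.currentToken + 1;

        this.currentToken = next;
        this.tokenStart = this.tokenEnd;

        if (next < this.tokenCount) {
            next = this.offsetAndType[next]; // the only read from the array
            this.tokenEnd = next & 0xFFFFFF;
            this.tokenType = next >> 24;
        } else {
            this.tokenType = 0; // assumption: type 0 marks the end of input
        }
    }
}

// three tokens of types 1, 47, 47 that end at offsets 5, 6 and 7
var scanner = new Scanner(new Uint32Array([1 << 24 | 5, 47 << 24 | 6, 47 << 24 | 7]), 3);
scanner.next();
console.log(scanner.tokenType, scanner.tokenStart, scanner.tokenEnd); // 1 0 5
```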

Re-use

The scanner creates arrays every time it parses a new string

New strategy
• Preallocate a 16Kb buffer by default
• Create a new buffer only if the current one is smaller than needed for parsing
• Significantly improves performance, especially when parsing a number of small CSS fragments
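The reuse strategy can be sketched like this. The `getBuffer` helper and the interpretation of "16Kb" as 16 * 1024 array items are assumptions for this example. A reused buffer still holds data from the previous parse, so the scanner has to track the token count itself instead of relying on zeros.

```javascript
var MIN_BUFFER_SIZE = 16 * 1024; // assumption: size in items, not bytes
var sharedBuffer = new Uint32Array(MIN_BUFFER_SIZE);

// Returns a buffer big enough for a source of the given length,
// allocating a new one only when the current buffer is too small.
function getBuffer(sourceLength) {
    // there can't be more tokens than characters; +1 leaves room for a trailing item
    var needed = sourceLength + 1;

    if (sharedBuffer.length < needed) {
        sharedBuffer = new Uint32Array(needed);
    }

    return sharedBuffer;
}

var first = getBuffer(10);
var second = getBuffer(100);
console.log(first === second); // true: small fragments share the preallocated buffer
```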

Current results: CSSTree went from 24 ms to 13 ms and now to 7 ms on the same benchmark (bootstrap.css v3.3.7, github.com/postcss/benchmark)

And still not the end… 😋

One more thing

CSSTree is not just about performance

New feature*: parsing and matching of CSS values syntax

* Currently unique across CSS parsers

Example

CSS syntax reference: csstree.github.io/docs/syntax.html

CSS values validator: csstree.github.io/docs/validator.html

Your own validator in 8 lines of code

var csstree = require('css-tree');
var syntax = csstree.syntax.defaultSyntax;
var ast = csstree.parse('… your css …');

csstree.walkDeclarations(ast, function(node) {
  if (!syntax.match(node.property.name, node.value)) {
    console.log(syntax.lastMatchError);
  }
});

Some tools and plugins
• csstree-validator – npm package + cli command
• stylelint-csstree-validator – plugin for stylelint
• gulp-csstree – plugin for gulp
• SublimeLinter-contrib-csstree – plugin for Sublime Text
• vscode-csstree – plugin for VS Code
• csstree-validator – plugin for Atom

More is coming…

Conclusion

If you want your JavaScript to work as fast as C, make it look like C

Previous talks
• CSSO – performance boost story (in Russian): tinyurl.com/csso-speedup
• CSS parsing (in Russian): tinyurl.com/csstree-intro

github.com/csstree/csstree

Your feedback is welcome

Roman Dvornov @rdvornov

github.com/lahmatiy rdvornov@gmail.com

Questions?

github.com/csstree/csstree