This is a short summary of the great article How Browsers Work by Tali Garsiel. In my opinion it is a must-read not only for web developers, but for everyone who wants to know how browsers work.
I’m not a web developer, but I was interested in this article because I wanted to know more about how browsers work inside. It helps to understand how a browser renders an HTML page and how it loads and processes CSS and JavaScript files.
Here is a list of my highlights from reading the How Browsers Work article:
As you can see, the browser needs to build a few different trees: the DOM tree, the render tree, the style rule tree, the style context tree, etc.
The rendering engine will start getting the contents of the requested document from the networking layer. This will usually be done in 8K chunks.
To render a page faster, the browser doesn’t wait until the whole page, its CSS, and its scripts are loaded. Instead, it reads a chunk of HTML, parses it, and builds the DOM and render trees. The render tree is then used to paint the page, so the user can see something early. Knowing the chunk size gives a rough idea of how quickly the HTML is processed: if the page is under 8K (or whatever the specific browser’s limit is), we can expect it to be displayed noticeably faster.
For better user experience, the rendering engine will try to display contents on the screen as soon as possible. It will not wait until all HTML is parsed before starting to build and layout the render tree.
The algorithm is expressed as a state machine. Each state consumes one or more characters of the input stream and updates the next state according to those characters. The decision is influenced by the current tokenization state and by the tree construction state. This means the same consumed character will yield different results for the correct next state, depending on the current state.
A special state machine is used to parse the HTML document. I was curious how browsers parse HTML, because it’s pretty hard to parse such a forgiving language. From the article it’s clear that a state machine drives the tokenizer (a toy sketch of the idea follows the quoted example below). The article describes:
The input to the tree construction stage is a sequence of tokens from the tokenization stage. The first mode is the "initial mode". Receiving the "html" token will cause a move to the "before html" mode and a reprocessing of the token in that mode. This will cause a creation of the HTMLHtmlElement element and it will be appended to the root Document object.
The state will be changed to "before head". We receive the "body" token. An HTMLHeadElement will be created implicitly although we don't have a "head" token and it will be added to the tree.
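To make the state machine idea more concrete, here is a tiny toy tokenizer I sketched in TypeScript. It is not the real HTML5 tokenizer (the spec defines many more states with their own names); it just shows how consuming one character at a time moves the machine between a few states and emits tokens.

```typescript
// A toy tokenizer sketch: a few states consuming one character at a time.
// This is NOT the real HTML5 tokenizer, just an illustration of the idea.

type Token =
  | { kind: "text"; data: string }
  | { kind: "startTag"; name: string }
  | { kind: "endTag"; name: string };

enum State { Data, TagOpen, TagName, EndTagName }

function tokenize(input: string): Token[] {
  const tokens: Token[] = [];
  let state = State.Data;
  let buffer = "";

  const flushText = () => {
    if (buffer) tokens.push({ kind: "text", data: buffer });
    buffer = "";
  };

  for (const ch of input) {
    switch (state) {
      case State.Data:
        if (ch === "<") { flushText(); state = State.TagOpen; }
        else buffer += ch;
        break;
      case State.TagOpen:
        if (ch === "/") { state = State.EndTagName; }
        else { buffer = ch; state = State.TagName; }
        break;
      case State.TagName:
        if (ch === ">") { tokens.push({ kind: "startTag", name: buffer }); buffer = ""; state = State.Data; }
        else buffer += ch;
        break;
      case State.EndTagName:
        if (ch === ">") { tokens.push({ kind: "endTag", name: buffer }); buffer = ""; state = State.Data; }
        else buffer += ch;
        break;
    }
  }
  flushText();
  return tokens;
}

// tokenize("<html><body>Hello</body></html>")
// -> startTag(html), startTag(body), text("Hello"), endTag(body), endTag(html)
```

The same character ("<", ">", a letter) leads to a different next state depending on the current one, which is exactly the point the quote above makes.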
Browsers use a stack to order tags correctly and to fix mismatched opened/closed tags. Opened tags are pushed onto the stack, and when they are closed (and corrected if needed), their elements end up in the DOM tree, so the DOM tree is always in a correct state. Different browsers handle new DOM elements differently: some set a dirty bit, others fire events. While the document is parsed, DOM elements are processed and added to the render tree, which becomes the source for the painting phase. The stack idea is sketched below.
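Here is a second toy sketch, building on the tokenizer above (it reuses the same Token type): a stack of open elements that inserts elements into the tree and implicitly closes tags that were left open. The real error recovery rules in browsers are of course much more involved.

```typescript
// A toy tree builder using a stack of open elements.
// Reuses the Token type from the tokenizer sketch above.

interface ElementNode { name: string; children: (ElementNode | string)[] }

function buildTree(tokens: Token[]): ElementNode {
  const root: ElementNode = { name: "#document", children: [] };
  const openElements: ElementNode[] = [root]; // stack of open elements

  const current = () => openElements[openElements.length - 1];

  for (const token of tokens) {
    if (token.kind === "startTag") {
      const el: ElementNode = { name: token.name, children: [] };
      current().children.push(el); // insert into the tree immediately
      openElements.push(el);       // and keep it on the stack until closed
    } else if (token.kind === "endTag") {
      // Pop until the matching element is closed; this implicitly closes
      // any elements left open by mistake (very simplified error recovery).
      while (openElements.length > 1 && current().name !== token.name) {
        openElements.pop();
      }
      if (openElements.length > 1) openElements.pop();
    } else {
      current().children.push(token.data);
    }
  }
  return root;
}

// buildTree(tokenize("<html><body><p>Hi</body></html>"))
// The unclosed <p> is implicitly closed when </body> is seen.
```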
When the parsing is finished browser will mark the document as interactive and start parsing scripts that are in "deferred" mode - those who should be executed after the document is parsed. The document state will be then set to "complete" and a "load" event will be fired.
The model of the web is synchronous. Authors expect scripts to be parsed and executed immediately when the parser reaches a <script> tag. The parsing of the document halts until the script was executed. If the script is external then the resource must be first fetched from the network - this is also done synchronously, the parsing halts until the resource is fetched. This was the model for many years and is also specified in HTML 4 and 5 specifications. Authors could mark the script as "defer" and thus it will not halt the document parsing and will execute after it is parsed. HTML5 adds an option to mark the script as asynchronous so it will be parsed and executed by a different thread.
Asynchronous scripts can be loaded earlier, maybe even before the page is rendered, because the browser doesn’t have to halt parsing while fetching them. Still, there is a single main rendering thread, so after an async script is loaded and executed, any changes it makes to the tree are applied and processed by that main thread. The three script modes are sketched below.
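To summarize the three modes in code, here is a hypothetical sketch of how a parser could treat a script token: a classic script halts parsing until it is fetched and executed, a deferred one is queued until parsing is finished, and an async one is fetched and run independently. All the names here (ScriptToken, fetchAndRun and so on) are invented for illustration, this is not real browser code.

```typescript
// A toy sketch of how a parser might treat script tokens (not real browser code).

interface ScriptToken { src: string; defer: boolean; async: boolean }

const deferredQueue: ScriptToken[] = [];

async function handleScript(
  token: ScriptToken,
  fetchAndRun: (src: string) => Promise<void>,
): Promise<void> {
  if (token.async) {
    // async: fetch in parallel, run whenever ready; parsing continues.
    void fetchAndRun(token.src);
  } else if (token.defer) {
    // defer: remember it, run after the document has been parsed.
    deferredQueue.push(token);
  } else {
    // classic model: halt parsing until the script is fetched and executed.
    await fetchAndRun(token.src);
  }
}

async function onDocumentParsed(fetchAndRun: (src: string) => Promise<void>): Promise<void> {
  for (const script of deferredQueue) {
    await fetchAndRun(script.src); // deferred scripts run in document order
  }
  // after this the document state becomes "complete" and "load" fires
}
```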
While executing scripts, another thread parses the rest of the document and finds out what other resources need to be loaded from the network and loads them. This way resources can be loaded on parallel connections and the overall speed is better. Note - the speculative parser doesn't modify the DOM tree and leaves that to the main parser, it only parses references to external resources like external scripts, style sheets and images.
This is how the browser can parse the document and, at the same time, load resources such as CSS, scripts, and images.
The style contexts contain end values. The values are computed by applying all the matching rules in the correct order and performing manipulations that transform them from logical to concrete values. For example - if the logical value is percentage of the screen it will be calculated and transformed to absolute units. The rule tree idea is really clever. It enables sharing these values between nodes to avoid computing them again. This also saves space.
As the article says, style handling is implemented very cleverly. A tree structure and other optimizations help to handle style inheritance and the large number of style properties correctly.
The article also says:
All the matched rules are stored in a tree. The bottom nodes in a path have higher priority. The tree contains all the paths for rule matches that were found. Storing the rules is done lazily. The tree isn't calculated at the beginning for every node, but whenever a node style needs to be computed the computed paths are added to the tree.
As a result, the concrete values (like height, margin, etc.) for each style are not computed until they are needed, as the sketch below illustrates.
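Here is a simplified sketch of that lazy idea in TypeScript: a computed value is produced the first time it is requested and then cached, and a percentage is turned into a concrete pixel value only at that moment. It ignores the rule tree sharing described above, and all the names are invented.

```typescript
// A toy sketch of lazily computed style values (not real browser internals).

type Declarations = { [property: string]: string };

class ComputedStyle {
  private cache = new Map<string, number>();

  constructor(
    private declarations: Declarations,
    private parent: ComputedStyle | null,
    private containerWidthPx: number,
  ) {}

  // Concrete pixel value for a property, computed on first access only.
  getPx(property: string): number {
    const cached = this.cache.get(property);
    if (cached !== undefined) return cached;

    const raw: string | undefined = this.declarations[property];
    let value: number;
    if (raw === undefined) {
      // not declared: inherit from the parent, or fall back to 0
      value = this.parent ? this.parent.getPx(property) : 0;
    } else if (raw.endsWith("%")) {
      // logical value (percentage) turned into a concrete pixel value
      value = (parseFloat(raw) / 100) * this.containerWidthPx;
    } else {
      value = parseFloat(raw); // assume "px" for simplicity
    }

    this.cache.set(property, value);
    return value;
  }
}

// Example: a width declared as a percentage is resolved only when asked for.
const body = new ComputedStyle({ width: "100%" }, null, 800);
const div = new ComputedStyle({ width: "50%" }, body, 800);
// div.getPx("width") === 400; properties that are never queried are never computed.
```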
After parsing the style sheet, the rules are added to one of several hash maps, according to the selector. There are maps by id, by class name, by tag name and a general map for anything that doesn't fit into those categories. If the selector is an id, the rule will be added to the id map, if it's a class it will be added to the class map etc.
This manipulation makes it much easier to match rules. There is no need to look in every declaration - we can extract the relevant rules for an element from the maps. This optimization eliminates 95+% of the rules, so that they need not even be considered during the matching process.
Hash maps are used as indexes. This is another performance optimization, sketched below.
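A minimal sketch of such an index, with invented data shapes: rules are bucketed by id, class, or tag name, so matching an element only looks at a few relevant buckets instead of every rule in the style sheet.

```typescript
// A toy sketch of bucketing CSS rules by their selector key.

interface Rule { selector: string; declarations: { [prop: string]: string } }

class RuleIndex {
  private byId = new Map<string, Rule[]>();
  private byClass = new Map<string, Rule[]>();
  private byTag = new Map<string, Rule[]>();
  private general: Rule[] = [];

  add(rule: Rule): void {
    const sel = rule.selector;
    if (sel.startsWith("#")) this.push(this.byId, sel.slice(1), rule);
    else if (sel.startsWith(".")) this.push(this.byClass, sel.slice(1), rule);
    else if (/^[a-z]+$/i.test(sel)) this.push(this.byTag, sel.toLowerCase(), rule);
    else this.general.push(rule); // anything that doesn't fit the simple cases
  }

  // Candidate rules for an element: only the relevant buckets are consulted.
  candidatesFor(tag: string, id: string | null, classes: string[]): Rule[] {
    const out: Rule[] = [...this.general];
    if (id) out.push(...(this.byId.get(id) ?? []));
    for (const cls of classes) out.push(...(this.byClass.get(cls) ?? []));
    out.push(...(this.byTag.get(tag.toLowerCase()) ?? []));
    return out;
  }

  private push(map: Map<string, Rule[]>, key: string, rule: Rule): void {
    const bucket = map.get(key);
    if (bucket) bucket.push(rule);
    else map.set(key, [rule]);
  }
}

// const index = new RuleIndex();
// index.add({ selector: ".error", declarations: { color: "red" } });
// index.candidatesFor("div", null, ["error"]) only returns the ".error" rules.
```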
Before repainting, webkit saves the old rectangle as a bitmap. It then paints only the delta between the new and old rectangles.
Well, another performance (and user experience) optimization. The idea is sketched below.
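Just to illustrate the idea of painting only the changed area, here is a naive sketch that diffs two bitmaps and returns the smallest rectangle covering the changed pixels. Real engines (including WebKit) track invalidated rectangles rather than diffing pixels, so this is only an illustration of the concept, not the actual mechanism.

```typescript
// Toy sketch: find the smallest rectangle that covers all pixels that differ
// between the old and new bitmaps, so only that area needs to be repainted.

interface Rect { x: number; y: number; width: number; height: number }

function dirtyRect(oldPixels: number[][], newPixels: number[][]): Rect | null {
  let minX = Infinity, minY = Infinity, maxX = -1, maxY = -1;
  for (let y = 0; y < newPixels.length; y++) {
    for (let x = 0; x < newPixels[y].length; x++) {
      if (oldPixels[y]?.[x] !== newPixels[y][x]) {
        if (x < minX) minX = x;
        if (y < minY) minY = y;
        if (x > maxX) maxX = x;
        if (y > maxY) maxY = y;
      }
    }
  }
  if (maxX < 0) return null; // nothing changed, nothing to repaint
  return { x: minX, y: minY, width: maxX - minX + 1, height: maxY - minY + 1 };
}
```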
The browsers try to do the minimal possible actions in response to a change. So changes to an element's color will cause only repaint of the element. Changes to the element position will cause layout and repaint of the element, its children and possibly siblings. Adding a DOM node will cause layout and repaint of the node. Major changes, like increasing font size of the "html" element, will cause invalidation of caches, re-layout and repaint of the entire tree.
An expected optimization: the larger the change, the more re-layout and repaint work is needed.