FixBrowser: Inside the rich text component (+web demo) (2025/02/10)

This blog post describes the rich text component that the FixBrowser web browser uses to display HTML pages and to interact with the user.

About FixBrowser

FixBrowser is a lightweight web browser created from scratch. It is designed from the ground up with privacy in mind. This is achieved by a whitelist approach, in which only the resources strictly needed by the website are loaded, and by having JavaScript disabled (wait! read the next paragraph before dismissing it).

However, many websites are not usable with JavaScript disabled. FixBrowser therefore contains an updated set of fix scripts that fix and even improve various websites, as well as groups of websites that share a common technology (such as WordPress, Disqus forums, etc.).

FixBrowser is not yet usable for practical browsing. However, you can use the FixProxy tool, which uses the "backend" part of the browser (everything other than the actual rendering and layout) in a regular web browser. I've been using this proxy as my primary way of web browsing for multiple years with good results.

The approach

The complexity of implementing a web browser from scratch was greatly reduced by not supporting JavaScript. This creates a cascade effect of simplification on every component and the overall architecture.

In a regular browser the complexity is very high because JavaScript can trigger any kind of update at any time. This requires maintaining complex data structures that store the relationships between the HTML, CSS and view models.

The view model maintains its own hierarchy by having a stacking context. It is hierarchical to allow things like having position: relative act as an "anchor" to which nested positioned elements are relative.

When JavaScript is not supported at all, the models can be decoupled from each other, following the UNIX philosophy of "do one thing and do it well". Instead of having everything intermixed, you can process each model separately. This means the processing can go in a straight line: HTML → CSS → view model.

There is no need to maintain complex data structures that allow HTML or CSS to be changed dynamically. This saves both CPU and memory and is much simpler to implement. For example, applying the CSS cascade rules can be achieved by a simple recursion. The layers can be reduced to a simple linear list instead of being hierarchical. After the processing is done, only the data structures of the view model remain.
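
As a rough illustration of this one-way processing, the sketch below resolves styles in a single recursive walk and emits a flat list for the next stage. It is written in Python rather than FixScript, and the `Node` class, the `resolve` function and the small set of inherited properties are all assumptions made for the example, not FixBrowser's actual code. It only covers inheritance with per-node overrides, which is a small part of the full cascade.

```python
from dataclasses import dataclass, field

# Properties that children inherit from their parent (a tiny, assumed subset).
INHERITED = {"color", "font-size"}

@dataclass
class Node:
    tag: str
    style: dict = field(default_factory=dict)      # declarations matched for this node
    children: list = field(default_factory=list)

def resolve(node, parent_style=None, out=None):
    """Compute the final style of every node in one top-down recursive pass."""
    parent_style = parent_style or {}
    out = [] if out is None else out
    # Start from the inherited subset of the parent's computed style...
    computed = {k: v for k, v in parent_style.items() if k in INHERITED}
    # ...then let the node's own declarations override it.
    computed.update(node.style)
    out.append((node.tag, computed))
    for child in node.children:
        resolve(child, computed, out)
    return out

# Hypothetical input tree; nothing here is kept after the pass except `flat`.
tree = Node("body", {"color": "black", "margin": "8px"}, [
    Node("p", {"font-size": "14px"}, [Node("em", {"color": "red"})]),
    Node("div"),
])
flat = resolve(tree)
for tag, style in flat:
    print(tag, style)
```

Because nothing can change the tree afterwards, the function needs no back-references from styles to nodes: the flat result is all that the next stage receives.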

Rich text component

The rich text component is designed as a toolkit for implementing your own representations. It can be used for a simple edit box, for complex rendering of web pages, or for anything in between. More complex representations (such as tables) are achieved by nesting the rich text layers.

The core of the component is a Layer. It contains zero or more blocks (e.g. paragraphs), where each block contains inline text, inline objects or floating objects. Both blocks and inline objects have their style information attached.

The inline objects are laid out into separate rows. The text is split if it's longer than a single row. Each row has the height of the tallest object present on the row, and text and objects are aligned to a common baseline.
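
The row rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the component's real algorithm: the `Item` class, the fixed character width and the greedy filling strategy are invented for the example, and row height is taken as the largest ascent plus the largest descent so that everything can share one baseline.

```python
from dataclasses import dataclass

@dataclass
class Item:
    width: int
    ascent: int     # height above the baseline
    descent: int    # depth below the baseline
    label: str = ""

def split_text(text, char_width=8, ascent=12, descent=4):
    """Split text into one item per word so it can wrap between words."""
    return [Item(len(w) * char_width + char_width, ascent, descent, w)
            for w in text.split()]

def layout_rows(items, max_width):
    """Greedily place items into rows no wider than max_width."""
    rows, current, used = [], [], 0
    for item in items:
        # Start a new row when the item does not fit; an oversized item
        # still gets a row of its own.
        if current and used + item.width > max_width:
            rows.append(current)
            current, used = [], 0
        current.append(item)
        used += item.width
    if current:
        rows.append(current)
    return rows

def row_metrics(row):
    """Return (row height, baseline offset from the top of the row)."""
    ascent = max(i.ascent for i in row)
    descent = max(i.descent for i in row)
    return ascent + descent, ascent

# Three words followed by a hypothetical inline image that sits on the baseline.
items = split_text("hello rich text") + [Item(40, 30, 0, "<image>")]
rows = layout_rows(items, max_width=100)
for row in rows:
    print([i.label for i in row], row_metrics(row))
```

In this example the second row is taller than the first because the image shares it with the word "text", and the word is pushed down to the image's baseline.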

The structure of Layer

Block

The model is available in the browser/richtext/model.fix file. It actually consists of just a single class, Block, that contains a packed array of structures containing flows. There are two kinds of flows: text and objects. Text is handled specially as it can be word wrapped. Each flow contains an opaque reference to a style. This reference is not interpreted by the rich text component in any way; it is just passed along for your purposes. The Block also has its own style attached to it.

The add, set and remove methods manipulate the flows, and the get_flow_* methods retrieve information about the stored flows.
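
To make the idea of a packed array of flows more concrete, here is a toy Python model. Only the names Block, add, set, remove and the get_flow_ prefix come from the text above; the parameter lists, the three-slot layout and the specific getters (`get_flow_kind`, `get_flow_data`, `get_flow_style`) are assumptions for illustration and may not match model.fix.

```python
FLOW_TEXT, FLOW_OBJECT = 0, 1
FLOW_SIZE = 3   # assumed layout: kind, data, style

class Block:
    """Toy block: all flows live in one flat list, FLOW_SIZE slots each."""

    def __init__(self, style=None):
        self.style = style    # opaque style of the block itself
        self._flows = []      # [kind0, data0, style0, kind1, data1, style1, ...]

    def __len__(self):
        return len(self._flows) // FLOW_SIZE

    def _base(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        return index * FLOW_SIZE

    def add(self, kind, data, style=None):
        self._flows.extend((kind, data, style))

    def set(self, index, kind, data, style=None):
        base = self._base(index)
        self._flows[base:base + FLOW_SIZE] = (kind, data, style)

    def remove(self, index):
        base = self._base(index)
        del self._flows[base:base + FLOW_SIZE]

    # Hypothetical getters standing in for the get_flow_* family.
    def get_flow_kind(self, index):
        return self._flows[self._base(index)]

    def get_flow_data(self, index):
        return self._flows[self._base(index) + 1]

    def get_flow_style(self, index):
        # The style is returned untouched; the model never looks inside it.
        return self._flows[self._base(index) + 2]

block = Block(style="paragraph")
block.add(FLOW_TEXT, "Hello ", "normal")
block.add(FLOW_OBJECT, "image.png", "inline-image")
block.add(FLOW_TEXT, "world", "bold")
block.set(2, FLOW_TEXT, "there", "bold")   # replace the last flow
block.remove(1)                            # drop the inline object
```

Keeping the flows in one flat array avoids allocating an object per flow, which is one plausible reason for the packed layout.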

Layer

To create a Layer you need to pass an array of Blocks along with an instance of a ModelToView class. The ModelToView class converts the representation in Blocks to the view representation using the LineBuilder class, which is responsible for laying out the individual rows of text and inline objects to a given width. Whenever the Layer is resized this process is repeated.

You can look at an example of how ModelToView can be implemented here.
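
The relationship between the three classes can be sketched as follows. This is only a guess at the shape of the design, written in Python: the class names come from the text, but every method (`add_part`, `build`, `resize`), the use of plain strings as blocks and the word-based widths are hypothetical.

```python
class LineBuilder:
    """Toy builder: collects parts into rows of at most `width` units."""

    def __init__(self, width):
        self.width = width
        self.rows = [[]]
        self._used = 0

    def add_part(self, label, width):
        # Wrap to a new row when the part does not fit on the current one.
        if self.rows[-1] and self._used + width > self.width:
            self.rows.append([])
            self._used = 0
        self.rows[-1].append(label)
        self._used += width

class WordModelToView:
    """Hypothetical converter: a block is a plain string, one part per word."""

    def build(self, block, builder):
        for word in block.split():
            builder.add_part(word, len(word) + 1)

class Layer:
    def __init__(self, blocks, model_to_view):
        self.blocks = blocks
        self.model_to_view = model_to_view
        self.rows = []

    def resize(self, width):
        # The whole view is rebuilt from the model on every resize.
        self.rows = []
        for block in self.blocks:
            builder = LineBuilder(width)
            self.model_to_view.build(block, builder)
            self.rows.extend(builder.rows)

layer = Layer(["one two three", "four"], WordModelToView())
layer.resize(20)
wide = layer.rows
layer.resize(8)
narrow = layer.rows
print(wide)
print(narrow)
```

The point of the split is that Layer knows nothing about what a block means; swapping in a different ModelToView changes the representation without touching the layout code.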

LineBuilder

It supports these operations:

Each inline part has its own handler that is responsible for sizing, painting and handling events from the user.

Demo

The demo shows a test of the rich text component. It shows these features:

The model of the shown content is defined in code here.

How it is used for rendering of HTML pages

The features of the rich text component are all that is needed to support rendering full HTML pages. Some examples of how things are handled:

The result is that the rich text component can be fully developed and all edge cases resolved without any concern for the complexities of rendering HTML pages. This would be impossible with JavaScript support, as that would require handling everything in one giant interconnected mess instead of having a nice separation of concerns.

Unicode support

My general stance on Unicode support is to simplify the processing of Unicode in a way that makes it simple to use by other code. If that can't be done, the question is whether that particular feature should be supported at all. I'm willing to not support every language out there if supporting it would make the implementation complexity spike, as that has a cascade effect on everything.

Characters in Unicode are hard. They can be a single code point or a combination of multiple code points. Many characters can be represented in multiple ways (e.g. precomposed vs decomposed characters) but they should be treated as equal. Each "true" character, as perceived from a user's perspective, is called a grapheme cluster.

Grapheme clusters should be treated as single characters, which means that selection and editing actions should treat each one as an opaque block. However, while a character is being written it needs to be temporarily handled specially, as the user needs to add code points to compose the final character using a specific input method.
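
The difference between precomposed and decomposed forms is easy to see with Python's standard `unicodedata` module. The snippet below is only an illustration of the problem and says nothing about how FixBrowser handles it.

```python
import unicodedata

precomposed = "\u00e9"     # "é" as a single code point
decomposed = "e\u0301"     # "e" followed by COMBINING ACUTE ACCENT

# Both render as the same character, yet they differ as raw strings.
same_raw = precomposed == decomposed
lengths = (len(precomposed), len(decomposed))

# Normalizing both to the same form makes them compare equal.
same_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
same_nfd = unicodedata.normalize("NFD", precomposed) == decomposed

print(same_raw, lengths, same_nfc, same_nfd)
```

Code that counts or compares code points directly would treat these two spellings of the same character differently, which is exactly what a user should never notice.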

Grapheme clusters

As explained before, the overall approach to the design is to follow the UNIX philosophy of "doing one thing and doing it well". Mixing multiple aspects together, such as handling the complexity of Unicode along with rich text handling, goes against this philosophy.

Instead I've come up with a simple solution (finding such simple solutions can be surprisingly hard though) that decouples these problems. FixBrowser is written in an extensible language (FixScript) that allows a wide range of extensions, both directly in the language and by adjusting the native runtime. This made it possible to implement dynamic allocation of characters that represent whole grapheme clusters. It even integrates with the garbage collector (GC) to deallocate characters that are no longer used.

This way the rich text implementation can be blissfully ignorant about the whole concept and work with grapheme clusters as single characters like in the good old pre-Unicode days when you dealt with just one language/encoding.
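
One way to picture such dynamically allocated characters is a table that hands out a synthetic character value for every multi-code-point cluster. The Python sketch below is speculative: the post does not say how FixScript represents these characters, so the `ClusterTable` class, the choice of values above the Unicode range and the explicit `release` call (standing in for the GC integration) are all assumptions. The clustering rule is also simplified to "base character plus following combining marks" rather than the full Unicode segmentation rules.

```python
import unicodedata

FIRST_SYNTHETIC = 0x110000   # assumption: ids start past the last real code point

class ClusterTable:
    """Maps multi-code-point clusters to synthetic single 'characters'."""

    def __init__(self):
        self._by_cluster = {}   # tuple of code points -> synthetic id
        self._by_id = {}        # synthetic id -> tuple of code points
        self._free = []         # released ids available for reuse
        self._next = FIRST_SYNTHETIC

    def _intern(self, cluster):
        if len(cluster) == 1:
            return cluster[0]   # ordinary character, nothing to allocate
        if cluster not in self._by_cluster:
            if self._free:
                new_id = self._free.pop()
            else:
                new_id = self._next
                self._next += 1
            self._by_cluster[cluster] = new_id
            self._by_id[new_id] = cluster
        return self._by_cluster[cluster]

    def encode(self, text):
        """Return one integer per cluster (simplified clustering rule)."""
        clusters = []
        for ch in text:
            if clusters and unicodedata.combining(ch):
                clusters[-1].append(ord(ch))
            else:
                clusters.append([ord(ch)])
        return [self._intern(tuple(c)) for c in clusters]

    def decode(self, chars):
        return "".join(
            "".join(map(chr, self._by_id[c])) if c >= FIRST_SYNTHETIC else chr(c)
            for c in chars)

    def release(self, char_id):
        """Stand-in for GC integration: explicitly free a synthetic id."""
        cluster = self._by_id.pop(char_id)
        del self._by_cluster[cluster]
        self._free.append(char_id)

table = ClusterTable()
text = "cafe\u0301 e\u0301"      # decomposed "é" appears twice
chars = table.encode(text)       # one integer per user-perceived character
```

Code that works on `chars` can index, select and delete by position without ever splitting a cluster, because each cluster is a single element of the list.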

Right-to-left languages

I have not done any implementation work on this yet, but from preliminary research it doesn't appear to be that complicated to add. It affects the direction in which inline objects are laid out and how selection is done. This is something that can be handled directly by the rich text component without adding too much complexity (famous last words).

Fundraising

The project needs your help. This initial round needs to raise €5000. It will allow me to work on FixBrowser to make it usable for actual browsing and also to implement some additional areas based on a vote. The work will take about a year, with major improvements expected within about 6 months.

You can donate using a PayPal account or a debit/credit card (no PayPal account required).
