By: Simon Brooke :: 24 February 2025
This essay is likely to be revised, probably several times. It is tracked on archive.org, so that you'll be able to go back through versions. I'm not promising to do serious work on this proposal by myself, but if others are interested I think it may be worth pushing forward with.
Discussion of this proposal can be found here, and, if you wish to contribute, I'd recommend that in the first instance you post to that thread.
Updated: 25th February (three times); 26th February; 27th February.
Why?
The Web has become a remarkably hostile and abusive place, in which the vast majority of websites you visit are enshittified — they spy on and track your actions in ways which are hard for us to understand or give truly informed consent to. Many commercial news sites, for example, share information about your activities on them with literally hundreds of other commercial entities.
They use mechanisms which are literally invisible to you to infer which devices you use, and therefore to recognise that a user of a given device is likely to be you, and also to track which other devices are often used in places where you are, and consequently who your family, friends, and other habitual associates are.
These mechanisms of tracking use features which were added to the web in its early, relatively innocent days, and many of which are genuinely useful for other purposes which are not abusive. The use of cookies, for example, can allow a website to remember what your preferred weather forecast locations are. The use of images can add useful diagrams or attractive pictures to a web page.
So this essay is about whether, and how, we can build a limited version of the Web which protects us from as many of the abuses as possible while retaining as many as possible of the desirable features of the modern Web.
The firewall
My idea for a Light Weight Web is a firewall implemented in the browser, which actively and as a positive choice does not implement features of the Web — including, if necessary, features mandated by standards — which are disproportionately used in ways which are abusive. A Light Weight Web browser consequently could not be used abusively in these ways, but could interact normally with existing non-abusive websites, and with existing abusive websites at reduced functionality.
The things which would be disallowed include:
Only one cookie – of no more than 64 bits – per host part (DNS name), stored for at most 30 days;
Cookies are useful, because HTTP itself is stateless. If you are trying to allow a user to interact with any back end data source, you need to know which conversation each request is part of. So it's worth allowing one cookie per host domain; it might even be worth allowing more. My rationale for allowing only one is that the host can use the value of that one cookie as a key into a conversations database, in which it can store related data.
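As a sketch of that rationale, here is how a server might keep all conversation state on its own side, keyed by the single cookie value. This is illustrative only: none of these names are part of any specification, and the 64-bit `u64` key reflects the limit proposed above.

```rust
// Hypothetical server-side sketch: the single allowed cookie carries a
// 64-bit value, which the server uses as the key into a conversations
// store. All related state lives on the server, never in the client.
use std::collections::HashMap;

#[derive(Default)]
struct Conversation {
    username: Option<String>,
    preferences: HashMap<String, String>,
}

#[derive(Default)]
struct ConversationStore {
    conversations: HashMap<u64, Conversation>,
}

impl ConversationStore {
    /// Find, or create, the conversation identified by the cookie value.
    fn conversation_for(&mut self, cookie_value: u64) -> &mut Conversation {
        self.conversations.entry(cookie_value).or_default()
    }
}
```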
I am not yet certain what should happen when an attempt is made to set a second cookie; it could either overwrite the first or be ignored. I'm also not sure what should be done about the names of cookies; if cookie names are stored they provide a backdoor mechanism for servers to store additional data, so my opinion is that they should probably not be.
All elements of a `set-cookie` header except the `NAME=VALUE` part and `Max-Age=NN` shall be ignored. `Max-Age` values greater than the maximum limit (by default 30 days) shall be ignored.
I currently think that, probably, whatever name the sending server sends should not be stored, and when returning a cookie to a server which has previously set one, the current value should be sent as the value of a cookie called `lww-cookie`.
A limited number of 'well known' cookie names might perhaps be honoured (and thus returned), possibly including things like `session-id`, `username`, `account-id`.
Useful criticism of this limitation from David Chisnall:
64 bits is not enough for a vaguely secure system. Cookies are not under the server's control. If you are using 56-bit user ID and that's there as raw data trusted on the host, then it's easy for an attacker on any site with a non-trivial number of users to just send random numbers in the cookie and hijack existing sessions. Remember the birthday paradox.
128 bits is probably fine, since that would let you store an ephemeral UUID.
That said, a 128-bit limit requires that you store all state on the server. No equivalent of local storage means that things like disconnected operation for web apps are not feasible. Maybe that's a goal, but I'd generally prefer to minimise the amount of server state (which is not under my control) rather than minimise the amount of client state (which is mine to inspect).
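To make these rules concrete, here is a minimal sketch of how a conforming client might filter a `set-cookie` header: keep only the value and an in-range `Max-Age`, and discard the server-chosen name (ignoring, for simplicity, the possible whitelist of well-known names). The constants are assumptions; the size limit in particular is exactly the 64-bit versus 128-bit question Chisnall raises.

```rust
const MAX_AGE_LIMIT: u64 = 30 * 24 * 60 * 60; // default maximum: 30 days, in seconds
const MAX_VALUE_BYTES: usize = 16; // 128 bits of raw data, per the criticism above

struct StoredCookie {
    value: String, // later returned to the server as `lww-cookie=<value>`
    max_age: u64,
}

/// Filter a set-cookie header per the rules above: keep only the VALUE
/// (the server's NAME is dropped) and an in-range Max-Age; every other
/// attribute is ignored.
fn accept_set_cookie(header_value: &str) -> Option<StoredCookie> {
    let mut parts = header_value.split(';').map(str::trim);
    // The first element is NAME=VALUE; the name itself is deliberately dropped.
    let (_name, value) = parts.next()?.split_once('=')?;
    if value.len() > MAX_VALUE_BYTES {
        return None; // oversized values are rejected outright
    }
    let mut max_age = MAX_AGE_LIMIT;
    for attribute in parts {
        if let Some((key, val)) = attribute.split_once('=') {
            if key.eq_ignore_ascii_case("max-age") {
                if let Ok(seconds) = val.parse::<u64>() {
                    if seconds <= MAX_AGE_LIMIT {
                        max_age = seconds; // greater values are ignored, per the rule above
                    }
                }
            }
        }
    }
    Some(StoredCookie { value: value.to_string(), max_age })
}
```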
Cookies can only be set by and can only be read by the host whose name is visible in the address bar
This is actually a side effect of 'All resources must be fetched from the same host,' below. But it's worth stating separately, because setting third party cookies is the overwhelmingly most common and most intrusive form of web use tracking today.
All resources for a page must be fetched from the same host that is showing in the address bar;
Aside from the issue of cross-site cookies, allowing resources to be fetched cross-site is also a powerful mechanism for tracking, even when those cookies are blocked. For example, if you use web fonts from Google's font server, or JavaScript libraries from googleapis.com, Google can track all your users' interactions with your site simply by analysing their server logs.
Similarly, Google, Meta, Microsoft and Twitter (and many other abusive companies) make available 'widgets' which you can embed into your web pages to, for example, allow your users to post your content to their social media. These widgets typically embed images pulled directly from the company's own website, again allowing the companies to track all your users.
The most extreme and abusive example of this practice is the notorious 'Facebook pixel,' a transparent single-pixel image whose presence users of the page cannot even see. Of course, when a page fetches the Facebook pixel, Facebook will attempt to set many tracking cookies. But even if those cookies are blocked, Facebook can still track you via your IP address in their logs.
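A minimal sketch of the rule, assuming the `url` crate purely for parsing (any URL parser would serve): a resource may be fetched only when its host exactly matches the host shown in the address bar.

```rust
use url::Url; // assumed dependency; any URL parser would do

/// A resource may be fetched only if its host exactly matches the host
/// shown in the address bar: no subdomains, no 'same site' loopholes.
/// Exact matching is what makes third-party trackers simply unreachable.
fn may_fetch(address_bar: &Url, resource: &Url) -> bool {
    address_bar.host_str().is_some() && address_bar.host_str() == resource.host_str()
}

fn main() {
    let page = Url::parse("https://example.org/news").unwrap();
    let font = Url::parse("https://fonts.googleapis.com/css").unwrap();
    assert!(!may_fetch(&page, &font)); // the third-party font fetch is refused
}
```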
No JavaScript function can interact with any host other than the one in the address bar;
In the Light Weight Web, the address bar is precious. It shows you what entity you are interacting with, and implies that your communication is with that entity and that entity alone, and that it is not being tracked by other entities. So the content of the address bar can only be changed by you directly, either by selecting a link anchor in the current page or by typing (or copy-pasting) directly into the address bar.
For the same reason, it will be impossible in a Light Weight Web client for a script in a page to move programmatically to another page, without voluntary user action.
In the same spirit, `<meta http-equiv="refresh"...>` tags will be honoured only by displaying a link to the user as part of the browser display;
Make any element with a Z-axis set dismissible by the user, with a standard close box;
'Popovers' (floating elements which obscure content) which can only be dismissed by selecting 'I agree,' or some other phrase indicating consent to some demand, are a very common abusive feature of the modern web. There are non-abusive uses for floating elements, but dismissibility will not interfere with those uses.
Prevent all popup windows;
Be conservative with new functionality;
New functionality is continuously being developed for the Web. It makes web browsers more and more complex, and thus harder to audit; further, this new functionality is mainly driven by corporate interests who profit from abusive uses of web technology, and consequently it must be assumed that these interesting new toys include by design new vectors for abuse.
Therefore each version of the Light Weight Web specification shall specify the versions of web standards which clients will support, and those versions will normally be substantially behind the bleeding edge.
Secure by default;
Light Weight Web browsers shall support HTTPS, only.
- Steve Holden very sensibly points out that if we're going to mandate HTTPS, we also need to mandate a set of root certificates, which opens a whole new can of worms.
Subsequent versions of the Light Weight Web specification may block additional functionality not explicitly blocked by this document. In particular, some limitation on the use(s) of client-side scripting seems desirable, and it isn't clear to me that it is desirable to have a Turing-complete scripting language available in the browser.
Limitations to client-side scripting capability might include:
- Limits on what client-side scripts can do, additional to the limit on writing to the address bar, and on fetching resources from hosts other than that stated in the address bar, described above;
- Limits on the total code size of client side scripts;
- Limits on the memory size a client side script can use.
This is all subject to further discussion.
X-LWW-Version header
Obviously, it will be beneficial for the client to communicate to the server that Light Weight Web restrictions are being applied, because when a sufficient proportion of a site's visitors are using Light Weight Web clients, it will benefit the publishers of that site to publish content which conforms to the Light Weight Web restrictions.
For this reason, conforming Light Weight Web clients must add a header `X-LWW-Version` to each `GET`, `HEAD` and `POST` request they issue. The value of this header shall be a dotted version number, initially `0.0.1`.
A server wishing to claim that the content it is serving conforms to the Light Weight Web specification may add a similar header to its responses.
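As a sketch, here is the request a conforming client would issue, written out as raw HTTP/1.1 for clarity (a real client would of course use its HTTP library's header mechanism):

```rust
const LWW_VERSION: &str = "0.0.1";

/// Build a raw HTTP/1.1 request carrying the X-LWW-Version header,
/// which conforming clients add to every GET, HEAD and POST.
fn lww_request(method: &str, path: &str, host: &str) -> String {
    format!(
        "{method} {path} HTTP/1.1\r\nHost: {host}\r\nX-LWW-Version: {LWW_VERSION}\r\nConnection: close\r\n\r\n"
    )
}
```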
Visual Indicator
The chrome of a Light Weight Web browser shall show a standardised icon on a green background when the action of loading a page does not violate any of the restrictions of the Light Weight Web, and the same icon with a strike through it on a red background when an attempt was made to violate one or more restrictions. Selecting this icon shall result in the display of a summary of the restrictions violated.
Related proposals/tools/essays
Aral Balkan's Small Web proposal, and the related kitten server side development kit;
I'm greatly in sympathy with the idea of 'personal web servers,' in the spirit of the personal home pages of the early web. But we don't currently have technology sufficiently user friendly that ordinary non-technical people can set up such things for themselves, so this proposal seems to me to need a lot of work and technical underpinning. As it stands, it's entirely server side, so entirely compatible with this (client side) proposal.
Privacy Badger is a browser extension (which I use and recommend) which blocks a lot of abusive uses of web technology.
NoScript is a browser extension which blocks all client side scripting by default, enabling it only for domains trusted by the user. (I'm grateful to @Khrys@mamot.fr for pointing this one out).
John Allsopp's long and very thoughtful essay Reweirding the Web, which serves as a high level introduction to a group of essays which lament the death of the 'old web' and call for its renewal:
Anil Dash's essay The Web We Lost;
Molly White's essay We can have a different web;
Maria Farrell and Robin Berjon's essay We Need To Rewild The Internet (fascinating to me for its parable about the invention of modern plantation forestry):
The first magnificent bounty had not been the beginning of endless riches, but a one-off harvesting of millennia of soil wealth built up by biodiversity and symbiosis. Complexity was the goose that laid golden eggs, and she had been slaughtered...
Our online spaces are not ecosystems, though tech firms love that word. They’re plantations; highly concentrated and controlled environments, closer kin to the industrial farming of the cattle feedlot or battery chicken farms that madden the creatures trapped within.
Andrew Stephens's (short) essay Save the Web by Being Nice;
Maggie Appleton's essay A Brief History & Ethos of the Digital Garden;
All of these linked essays are relevant to this proposal and worth reading.
Routes forward
The simplest way to produce a 'proof of concept' Light Weight Web browser would be to create a browser extension for an existing browser, a sort of 'Privacy Wolverine', which implements as many of the Light Weight Web restrictions as the extension API will allow (and I haven't yet investigated how many that is). The result would be a browser even heavier than the base browser, because of the added extension; and it's unlikely that it would be possible to prevent other extensions from circumventing the restrictions, so it is not a very satisfactory solution.
The alternative is to wrap an existing rendering engine in a Light Weight Web network client, which would be doable — the network client part isn't actually very complicated, especially as it has to implement only one protocol.
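As a sketch of that shape, the network client is essentially a policy-enforcing fetch function which the rendering engine calls. The `reqwest` and `url` crates here are stand-ins, and the policy checks, not the HTTP machinery, are the point:

```rust
use url::Url; // assumed dependencies: the `url` and `reqwest` (blocking) crates

/// Fetch a resource on behalf of a page, enforcing the restrictions
/// described above: HTTPS only, same host as the address bar, and an
/// X-LWW-Version header on every outgoing request.
fn fetch_for_page(page: &Url, resource: &Url) -> Result<Vec<u8>, String> {
    // Secure by default: HTTPS only.
    if resource.scheme() != "https" {
        return Err("non-HTTPS fetch refused".into());
    }
    // All resources must come from the host shown in the address bar.
    if page.host_str().is_none() || resource.host_str() != page.host_str() {
        return Err(format!("cross-host fetch of {resource} refused"));
    }
    let response = reqwest::blocking::Client::new()
        .get(resource.as_str())
        .header("X-LWW-Version", "0.0.1")
        .send()
        .map_err(|e| e.to_string())?;
    response
        .bytes()
        .map(|body| body.to_vec())
        .map_err(|e| e.to_string())
}
```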
Rendering engines
Anything that renders a web page needs a rendering engine. Modern rendering engines are very large, complex pieces of software, and the common and most complete ones are primarily developed by corporations whose main business is exploitative. While both Apple's WebKit and Google's Blink derive from KHTML and are thus in theory open source, auditing them to eliminate abusive code would be a very major task.
The Gecko and Servo engines were both originally products of Mozilla, a not-for-profit, but one which is mainly funded by abusive corporations. While Mozilla's Firefox browser, built on Gecko, has some privacy-respecting features, it's still as well to be wary of it, and again auditing it would be a major task.
So here are the candidates that I'm at present aware of:
| Engine | Open source? | Active? | Written in | Builds? | SLOC (see below) | Summary |
|---|---|---|---|---|---|---|
| Amaya | MIT-style licence | No | C | No | 2,572,437 (but only 623,057 in git repo?) | Abandoned but interesting project: not merely a rendering engine but a browser and also a near-WYSIWYG editor for HTML. |
| Blink | BSD and LGPLv2.1 | Yes | C++ | ? | 4,740,170 | Google's fork of Apple's fork of KHTML. A task to audit, and certainly full of all the things we want to block. |
| Gecko | Mozilla Public Licence | Yes | C, C++, JavaScript | ? | 40,747,015 | Developed entirely by Mozilla after it became a not-for-profit. Nevertheless, too big to audit. |
| Lexbor | Apache-2.0 licence | Yes | C | Yes | 940,137 | Relatively lightweight independent open source rendering engine. |
| Litehtml | BSD-3-Clause licence | Yes | C++/C | Yes | 66,512 | Not strictly a complete rendering engine, and not recommended by its author as a basis for a browser. |
| KHTML | Mostly GPL3 | No | C++ | ? | 357,936 | The granddaddy of all modern commercial browser engines, but no longer maintained. |
| NetSurf | GPL2 | Yes | C | Yes | 1,250,426 | CSS and JavaScript implementations are both substantially incomplete. |
| Nyxt | No | Yes | Common Lisp | No | 30,483 | Small, and written in a language I'm comfortable with. |
| Servo | MPL-2.0 licence | Yes | Rust | ? | 10,425,155 | The use of Rust is interesting; it's a much more modern language than other engines are written in. |
SLOC: Source lines of code
The comparison of source lines of code here is crude. I've simply run the `cloc` tool over a checkout of the source repository. So the count does include things which are not strictly code or which do not form part of the built artefact, and a more scientific count would probably result in substantially lower values; but by inspection I don't think it would change the rank ordering.
Builds?
A question mark in the build column indicates that I've not yet built the artefact, either because I haven't yet tried or because I encountered problems which were not blockers, and further tries might succeed; 'yes' indicates that it built without faults just by following the build instructions; 'no' indicates that attempting to build according to the build instructions failed.
Merits and demerits
Amaya
Surveying the scene, I'm much in sympathy with Amaya, which I used and liked back in the day. It will support much of the functionality that I think the Light Weight Web needs, and, additionally, is part of the toolkit which would make it easy for non-technical users to maintain their own websites. It isn't up to modern layout standards, but isn't that far behind.
Build failed on Debian because configure could not find the MesaGL libraries, although they are in fact present. I suspect it would build with a bit of further messing about.
KHTML
KHTML has a reputation for well-architected, well-written code, and the fact that Apple, Nokia, Google and Microsoft have each in turn chosen it as a base to build from may be counted as an endorsement of this. It's also a remarkably small codebase, as compared to the others.
Litehtml
Even smaller is Litehtml. I haven't yet attempted to build this, and I haven't found anything that uses it that I can test; but it is available as a library for Debian, so someone is using it!
Lexbor
My investigation of Lexbor has been cursory. It certainly deserves more examination. It builds cleanly, but as there is no actual browser as part of the build, it's not easy to evaluate.
Servo
All of the above are written in C or C++, which are not languages I think highly of. Servo is written in Rust, which is a much more modern language and should make it somewhat harder to write extremely buggy code. However, very surprisingly, it does not render CSS styled pages well. It's also, in my opinion, too big to meaningfully audit.
Nyxt
Nyxt is written in Common Lisp, and, as programs written in very high level languages so often are, it's startlingly compact. However, the download instructions for Linux do not work, and I have thus far failed to build it from source: I encountered two faults during the build, one of which I overcame, the second I haven't yet. So I can't (yet) evaluate it. Also, it doesn't appear to be open source, which probably rules it out.
Conclusions
This is, at present, a document intended to provoke discussion: is a limited, less-abusive version of the Web worth building, and, if so, are the restrictions I propose the right ones?
Remember, a person who used a Light Weight Web browser as their daily driver could still keep a full-fat one around for when they wanted to slum it and indulge in unsafe sex with Netflix, Meta or Amazon, so creating a Light Weight Web browser doesn't limit people at all; it merely provides them with a safe and relatively hard to track browsing experience for most of the time. It also, possibly, if enough people adopt it, provides a motivation for commercial web sites to clean up their acts and pull less abusive shit.
I believe that a Light Weight Web Browser, which rendered the majority of websites adequately if not pixel-perfect, would be a not-enormous effort to build, if built on an existing rendering engine. However I do not propose, at least at present, to build anything alone. Unless other people come on board with it, this project will go nowhere.
Addendum
As I was writing this essay, Mozilla was introducing new 'Terms' for Firefox. I'm grateful to @mttaggart@infosec.exchange for this analysis:
Have spent my night reading browser Terms and Privacy Policies. Why? Because I love you and hate myself, apparently.
So here's the deal: that "non-exclusive, royalty-free, worldwide license" you're granting to Firefox/Moz when you upload data through it? It is boilerplate language. Pretty common actually!
But not in browsers. In fact, not a single browser ToS has anything resembling this provision.
Know what does?
I wonder why Mozilla would want to use the same language those platforms do.
This is, to put it lightly, worrying. Mozilla has subsequently rowed back on it slightly in an update to a blog post, but the language stands. This seems to me to make a Light Weight Web — and one not dependent on organisations which are themselves dependent on abusers — even more vital.