Jacques Mattheij

Technology, Coding and Business

What You See Is What You Get

What you see is what you get was a way to distinguish between word processors and layout software that used ‘markup’ and those that allowed you to interactively manipulate your documents in an on-screen representation of what the final output would look like. HTML editors started out as mark-up aids and rapidly moved to a what-you-see-is-what-you-get approach where the visual representation was the final product.

Unfortunately, the final product as presented to the user these days has another aspect to it, one that means that what you see really is no longer what you get. You get much more than what you see, and not everything that you get is desirable or even good.

I’ve been on the web since the very early days, think Peter Tattam’s Trumpet Winsock and the Mosaic web browser. In the beginning the web was mostly about information: handcrafted pages with text on them and maybe an image every now and then. Eye candy was rare; for the most part we were happy that stuff worked, and what it looked like was secondary. A page would simply be a chunk of data that you received through the internet from a repository someone else had set up, called a web server, and that would be that. Most pages were entirely self-contained and if they contained images those images were for the most part stored on the same server that you got the page from.

As the web became more and more successful it achieved critical mass at some point in 1995 or so, and suddenly everybody wanted to be on the web: it was the year of the homepage. Still, most of these pages were simple. They contained some text, and every now and then you’d see a form, but that form was utilitarian and simply conveyed a bit of information on a voluntary basis to the owner of the page. Nothing got sent if you didn’t click ‘submit’ beyond your previous request for the page and its content. By and large content was still supplied as a page with some extra resources specifically crafted for that page; embedding - if it was done at all - was considered poor taste at best and bandwidth theft at worst.

That phase ended dramatically when in 98 or so a new trick allowed websites to programmatically alter the page that you were looking at already by making requests behind the scenes. This allowed all kinds of neat functionality which eventually led to the ‘single page web applications’ or ‘web apps’ that are now commonplace. In and of itself this is simply technology being used as a medium to deliver functionality and I don’t have a problem with that.

What I do have a problem with is that same technology being used to stealthily take over what seem to be otherwise innocent pages with information by embedding all manner of hostile programs (none of which are verified by the owners of the pages that embed them!). These programs take the form of ‘widgets’ and there is an endless number of them. Very few of them are actually needed if you take into account the richness of the html/css/js combo already present on the page. For instance, take twitter. For many years twitter grew because of a thing called ‘rss feeds’. Twitter decided that rss feeds were a ‘hole’ in their architecture, one which allowed blog owners and others to embed and/or read the tweets of their users without twitter being in control of the data and, more importantly, without twitter having access to the consumers of that data.

For companies like twitter such control is of such overriding concern that they’ll abandon perfectly good open standards to force-feed their proprietary and privacy-invading solutions upon the readers of other people’s pages. In twitter’s collective mind (if you can speak about the mind of a company at all) that data is theirs, not yours, and you don’t get to do with it whatever you want.

I object to that, and strongly so. The RSS feeds allowed creative re-use of that data (If This Then That for instance) in ways unforeseen by the people at twitter. RSS was simple, open and easy to use but it wasn’t controllable and so it had to go.

I’ve singled out twitter here because I’ve had to jump through considerable hoops to get the ‘recent tweets’ bar on the right of this blog post to work without any calls to twitter that would expose your data (you being the reader of this page) to a company with whom you may not be doing business and who have no credible reason to want to know what you are reading.

Because that’s what all those widgets are about: building up profiles about users. Which websites and which pages within those websites you visit, which articles you read and the words in those articles are major building blocks for profiles that allow targeted advertising, and targeted advertising is worth lots of money. So all those widgets that a webpage will load are ultimately there for only one reason: to transport money from your pocket to other (presumably larger) pockets.

Advertising is both a blessing and a curse. It allows customers to find out about new products and services, and it can help pay for content. For many years advertising was a dog on a leash: it was useful and it allowed services to be available for free or at a price-point that we all could afford in exchange for a minor slice of our time or attention.

But advertisers and their customers weren’t content with that. The dog slipped its leash and now we have to deal with continuous and total invasion of our privacy and with companies like Google, Twitter, Facebook, Yahoo and so on building ever larger datastores. Effectively, these companies know more about you than you probably know (or would like to admit) about yourself. Advertising, instead of being an accessory and a means to an end, has come to dominate the web to the point where even your phone mostly exists to deliver targeted ads rather than useful services to your eyeballs. And all that goes for companies that you might not even have accounts with. For instance, Facebook is perfectly capable of building up a profile on a user even if that user doesn’t actually have a facebook page, and with a bit of work they could probably figure out where in their user-graph that particular non-facebook user should be placed. After all, you only need 33 Bits of Information to uniquely identify a person. A couple of days of browsing gives you a very large multiple of that, after which correlating that information with the people that are in their database gives Facebook ample traction to associate you with other profiles and to figure out your relation to them. Even better if someone helpfully tags a photo that has you in it, and either identifies you or leaves your head ‘untagged’.
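
The ‘33 bits’ figure is simple arithmetic: singling out one person among N people takes log2(N) bits. Here is a minimal sketch in Python, assuming a round world population of 8 billion (my number, not a measured one), which comes out just under 33. The traits and their odds in the second half are made up purely to show how quickly the bits add up; real fingerprinting data will differ.

```python
import math

# Assumption: a round world population figure, used only for illustration.
WORLD_POPULATION = 8_000_000_000


def bits_to_identify(population: int) -> float:
    """Bits of information needed to single out one member of a population."""
    return math.log2(population)


def bits_revealed(fraction_sharing_trait: float) -> float:
    """Bits learned about someone from a trait shared by this fraction of people."""
    return -math.log2(fraction_sharing_trait)


needed = bits_to_identify(WORLD_POPULATION)
print(f"bits needed: {needed:.1f}")

# Hypothetical traits and odds (not real measurements). Summing the bits
# assumes the traits are statistically independent, which is optimistic.
traits = {
    "browser and version": 1 / 50,
    "time zone": 1 / 24,
    "screen resolution": 1 / 30,
    "installed fonts": 1 / 1000,
}
total = sum(bits_revealed(p) for p in traits.values())
print(f"bits revealed by four traits: {total:.1f}")
```

Even with these invented odds, a handful of mundane traits covers most of the distance, and none of them involve what you actually read.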

Widgets are an incredibly important tool in those data collection and profile building schemes. I’d love for browsers to come with an option defaulting to ‘true’ that would simply stop javascript from being loaded from external websites. Until then we have the NoScript browser extension, I guess, but the fact that it has to be installed drops the number of users down to single digit percentages. This is the reason google makes a browser in the first place. Think about it: how suspect is it that what effectively is the world’s largest advertising agency gives you a free TV?

If you’re using Ghostery or some other ad/tracker blocker you can see how many of these externally loaded javascript programs are embedded in the various websites that you visit. The results are sometimes amazing: on some pages more than 20 different companies are being told about your visit to that page. And all of that is normally totally invisible to you, the user that this is all about.
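
If you’d rather check a page by hand, the idea can be approximated in a few lines of Python. This is a rough sketch and not how Ghostery works: it only looks at static script tags in the HTML you feed it, so it misses scripts injected later, iframes, tracking pixels and fonts, and it naively treats any other host name (including your own subdomains) as third party. The page URL and the hosts in the sample are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ScriptHostCollector(HTMLParser):
    """Collects the host names that <script src=...> tags on a page point to."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative and protocol-relative URLs against the page URL.
            host = urlparse(urljoin(self.page_url, src)).hostname
            if host:
                self.hosts.add(host)


def third_party_script_hosts(page_url: str, html: str) -> set:
    """Return script hosts that differ from the host serving the page."""
    collector = ScriptHostCollector(page_url)
    collector.feed(html)
    own_host = urlparse(page_url).hostname
    return {h for h in collector.hosts if h != own_host}


# Hypothetical page and hosts, for illustration only.
SAMPLE = """
<html><head>
<script src="/js/site.js"></script>
<script src="//widgets.example.net/follow.js"></script>
<script src="https://stats.example.org/track.js"></script>
</head><body>Hello</body></html>
"""

print(sorted(third_party_script_hosts("https://blog.example.com/post/", SAMPLE)))
# prints ['stats.example.org', 'widgets.example.net']
```

Each host in that list is a party that gets told about the visit before the reader has done anything at all.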

During the migration of this blog from Octopress to Hugo (a lightning fast static site generator written in ‘go’) I had a good opportunity to get rid of all those embedded bits and pieces which should kill several birds with one stone:

  • no tracking of who visits this site
  • faster page load times
  • less chance of malware slipping in under the radar

So, I don’t want your widgets, twitter, even though I like and appreciate your service. Killing off the RSS feeds was a nasty move; the web was supposed to be open, not made out of silos of other people’s data. And google analytics can take a hike too. Fonts can be hosted locally just as easily as they can be pulled in from a CDN, maybe at the expense of a tiny bit of bandwidth. And if you want to share this page on facebook you’ll have to cut-and-paste the url, and if that’s too much trouble then I’m ok with that too.

So, on these pages: What you see is what you get!