I was surprised that there are actually some quite interesting views and proposals in the discussions above.
If you liked this, you are probably going to like my article about web bloat.
Credit card companies should know all about phishing, right? McCann should know all about marketing, right? Combine the two in Serbia and you will get a marketing campaign that just went viral, although for the wrong reasons.
Mastercard Serbia organised a prize contest “Always with you” that asks female customers to share contents of their purse on Facebook. If you read the text carefully, it is not required to photo your card. However, the example photo clearly shows the credit card details of a fictive customer:
Lured by prizes, many customers posted photos of their private stuff. And some copied Mastercard promo — their credit card, with full details, is visible in the photo:
This is the first phishing campaign that I know that was organised by credit company itself!
The funny thing that is that nobody in Mastercard, McCann agency or legal team noticed the problem. There is a lengthy legal document explaining the conditions of the prize contest:
That document is signed by Mastercard Europe SA and McCann Ltd Belgrade, so it seems it has passed multiple levels of corporate approval. And Mastercard didn’t seem to notice the problem until six days later when a serbian security blogger wrote about it.
In my modest opinion, the lesson of this story is to be careful how you hire. I am biased because I run an employee assessment company, but smiling people with lovely résumés can still be bozos. And when you have incompetent people in the company, it doesn’t matter what formal company procedures you have in place.
P.S. As user edent from HN noticed, photo sharing of credit cards is nothing uncommon for Twitter: https://twitter.com/needadebitcard
P.P.S. As of today (May 18), entire “Always with you” campaign is deleted from Facebook.
Today, a friend sent me a funny Google job posting. Here is the highlight:
10 years of sales experience? Amazon EC2 (IaaS) only came out of beta in Oct 2008, Google App Engine (PaaS) only had a limited release in Apr 2008. Now is Feb 2017, so even if you got started selling EC2 or App Engine from the very first day, you would only have 8 years of experience.
I know you are Google, but it is a bit too high of a bar. You still haven’t invented the time machine.
After many months in the making, today we are happy to announce v1 of PXT Protocol (MIT license). This is a big thing for our small team, as we aim to provide an alternative to HTTP/HTML.
Before I dive into technical details of our unconventional approach, I must explain the rationale. Bear with me.
Notice the exponential growth. As of July 2016, the average web page is 2468 kB in size and requires 143 requests.
But computers and bandwidth are also getting exponentially faster, so what’s the problem?
Web bloat creates four “S” problems:
One can say “Problems, schmoblems! We had problems like this in the past, and we lived with them. The average web page will continue to grow.”
No, it will not. Because there is a magic limit—let’s call it the bloat inflection point:
For pages that are small and non-bloated (most pre-2010 pages), PXT only solves security and support problems. But today’s average web page will also gain big size and speed improvements. The Internet passed the bloat inflection point early this year, and nobody noticed.
Responsive webs just make it worse. The fashion now is to have one sentence per viewport and then a gigantic background image behind it.
What if a client-specific page was rendered on a server, and then
streamed to a “dumb browser” using the most efficient compression?
Like all great ideas, this sounds quite dumb. I mean, sending text as compressed images?! But I did a quick test…
Let me show you a simple non-PXT demo; you can follow it without installing any software.
The procedure is simple:
Bloat score (BS for short) is defined as:
BS = TotalPageSize / ImageSize
We can derive a nice rule from the bloat score:
You know your web is crap if the full image representation of the
page is smaller than the actual page (BS>1).
I expected some screenshots to beat full page loads, but I was wrong. Screenshots won in every case. See for yourself in the table below: Image columns contain links to comparison images.
(1366 x ?)
(1366 x ?)
(1366 x 768)
|Page||Size (kB)||# of req.||Image (kB)||BS||Image (kB)||BS||Image (kB)||BS|
Tags Slow Websites
|Vice Bootnet to|
|RTWeekly Future of|
|Betahaus Creative Problem Solving||5,100||55||3,670||1.4||871||5.9||393||13.0|
Which column should you look at? That is highly debatable:
So, depending on how aggressive you want to be with buffer size and compression, data saving for above pages varies from 3.6x to 51.7x!
But, to be honest, I cheated a bit. Images are static—the interaction part is missing. And you’ll notice in the table that I hand-picked bloated websites, they are all above average. What happens with normal websites?
For the simple interaction, let’s use a technology that’s been around since 1997. And works in IE! People drafting HTML 3.2 got annoyed with designers requesting a “designer” look and consistent display over browsers. Rounded rectangles and stuff. In a moment of despair they said f**k you, we’ll give you everything. Create a UI out of a image and then make arbitrary vector shapes over clickable areas. And so client image maps were born.
For an example of “normal” page, should we use a really popular page or a really optimized page? How about both—let’s use the most popular web page created by the smartest computer scientists: the Google SERP. SERPs are loaded over 3.5 billion times per day and they are perfect for optimization. SERPs have no images, just a logo and text. Unlike other pages, you know user behavior exactly: 76% of users click on the first five links. Fewer than 9% of users click on the next page or perform another search.
People always bash media sites for being bloated and flooded with ads. But Google SERPs increased in size from 10 kB in 1998 to 389 kB today. And content is pretty much the same, 10 links. Google.com is fast to load not because of optimization; it is fast because today you have a fast connection.
The image map for the SERP demo above has a fixed width and height, which is one of the reasons we need PXT. The first PXT request sends device viewport details, so the server knows which image to render.
But before we get into PXT, we need to ask ourselves a question…
Since the first computers were connected, there was a fight. Between the “thin” tribe and the “fat” tribe.
The thin tribe wanted to render everything on the source server and make the destination server a “dumb” terminal. Quick, simple, and zero dependency. But the fat tribe said no, it’s stupid to transfer every graphics element. Let’s make a “smart” client that executes rendering or part of the business logic on the destination server. Then you don’t need to transfer every graphics element, just the minimum data. The fat tribe always advertised three benefits of smart clients: smaller bandwidth, less latency, and that the client can render arbitrary stuff.
But, in the early days of computing, “graphics” was just plain text. Data was pretty much the same as its graphic representation, and people could live with a short latency after they pressed enter at a command line. The thin tribe won and the text terminal conquered the world. The peak of this era was the IBM mainframe, a server that can simultaneously serve thousands of clients thanks to its I/O processors. The fat tribe retreated, shaking its collective fist, saying, “Just you wait—one day graphics will come, and we’ll be back!”
They waited until the 80s. Graphics terminals become popular, but they were sluggish. Sending every line, color, or icon over the wire sucked up the bandwidth. When dragging and rearranging elements with the mouse, you could see the latency. Unlike simple text flow, graphics brought myriad screen resolutions, color depths, and DPI.
“We told you so!” said the fat tribe, and started creating smart client-server solutions. Client-servers and PCs were all the rage in the 80s. But even bigger things were on the horizon.
In 1989, a guy named Tim was thinking about how to create world wide web of information. He decided not to join the tribe but to go the middle route. His invention, HTML, would transfer only the semantic information, not the representation. You could override how fonts or colors looked in your client, to the joy of fat tribe. But for all relevant computing you would do a round trip to the server, to the delight of the thin tribe. Scrolling, resizing, and text selection were instantaneous: there was only a wait when you decided to go to the next page. Tim’s invention took the world by the storm. It was exactly the “graphics terminal” that nobody wished for but everybody needed. It was open and people started creating clients and adding more features.
The first candy was inline images. They required more bandwidth, but the designers promised to be careful and always embed the optimized thumbnail in the page. They also didn’t like the free floating text, so they started using tables to make fixed layouts.
Publishers wanted audio and video, and then they wanted ads.
Soon the web became a true fat client, and everybody liked it.
The thin tribe was acting like a crybaby: “You can’t have so many dependencies—the latest Java, latest Flash, latest Real media encoder, different styles for different browsers, it’s insane!” They went on to develop Remote desktop, Citrix XenDesktop, VNC, and other uncool technologies used by guys in grey suits. But they knew that adding crap to the client couldn’t last forever. And there is a fundamental problem with HTML…
Look at the homepages of Tim Berners-Lee, Bjarne Stroustrup, and Donald Knuth. All three together have 235 kB, less than one Google SERP. Images are optimized, most of the content is above the fold, and their pages were “responsive” two decades before responsive design became a thing. But they are all ugly. If the father of the WWW, the father of C++, and the father of computer algorithms were in an evening web development class, they would all get an F and be asked to do their homepages again.
The average Joe prefers form over content and is too lazy to write optimized code. And the average Joe includes me. A few months ago homepage of my previous startup become slightly slower. I opened the source HTML and found out that nine customer reference logos were embedded in full resolution, like this 150 kB monster. I asked a developer to optimize pages using css sprites. He complied with that, but told me he would leave 13 other requests for web chat unchanged, because they are async and provided by a third party (Olark). To be honest, I would behave the same if I were a web developer. Implementing customer features will bring us more money than implementing CSS sprites. And no web developer ever got a promotion because he spend the whole night tweaking JPEG compression from 15% to 24%. To summarize:
You can’t blame web developers for making a completely rational decision.
Web developers always get the blame for web bloat. But if a 2468 kB page weight is the average, not an exception, then it is a failure of the technology, not all the people who are using it.
The thin tribe realized that with a good compressor and good bandwidth the game changes. OnLive Game Service was launched in 2010, allowing you to stream games from the cloud. The next year, Gaikai launched their service for cloud gaming. They were not competitors for long: Sony purchased Gaikai in 2012, and all OnLive patents in 2015. They used the technology to create PlayStation Now. Today I can play more than 400 live games on Samsung Smart TV, at 30 frames per second. But I still need to wait 8.3 second to fully load the CNN homepage. Who is crazy here?
Remember main arguments of the fat tribe: smaller bandwidth, less latency, and that the client can render arbitrary stuff. Seems that with websites of 2016, thin tribe can do all of that equally good or better.
I want my web to be as snappy as PlayStation Now. That is why we need…
Which is short for PiXel Transfer protocol. Let’s see how the full stack works, all the way from a designer to an end user.
Specifically, this is how browsing happens:
Notice the heavy use of caching. If you have a page footer or logo, they are going to be transferred only once, as on the subsequent pages the server is going to send only the zone ID.
I know what you are thinking. This all looks nice for presentation, but the web is more than a display. Although it was loved by designers, one of the biggest flaws of Flash was that Flash indexing by web crawlers never worked well. So, what about the SEO?
The future of the search is optical recognition and deep learning. Google Drive has done OCR on PDF and images since 2010. Google Photos recognizes people and things, for example any bicycle in my personal photos. And YouTube does voice recognition over videos, so people can easily skip boring parts of my video. With the web becoming much more than text, why rely on text metadata at all?
With that final point, I invite you to check the PXT project page at GitHub.
UPDATE: Check the discussion on Reddit.
By definition, CAPTCHA should be easy to read by humans but hard to read by machines. Apparently, they don’t agree with that at D&B:
They put CAPTCHA in a plain HTML text, and then put an ugly background below so it can’t be read by humans.
I knew corporate developers can be of lower quality, but this is hilarious 🙂
UPDATE 2: Many people got offended, but I stick to my personal opinion: many corporations have recruitment practices that reject good programmers and attract bad ones. Good programmers don’t want to work in an environment where meetings require a tie, the development process is waterfall and the only way to increase your salary is to become a manager. But most of all, the screening of programmers should be done by technical department and not by corporate HR. And that is not hard, here you can create programming tests for Java and C# and send them to your candidates in less than 5 min.