[philiptellis] /bb|[^b]{2}/
Never stop Grokking


Monday, August 30, 2021

The metrics game

A recent tweet by Punit Sethi about a Wordpress plugin that reduces Largest Contentful Paint (LCP) without actually improving user experience led to a discussion about faking/gaming metrics.

Core Web Vitals

Google recently started using the LCP and other Core Web Vitals (aka CWV) as a signal for ranking search results. Google's goal in using CWV as a ranking signal is to make the web better for end users. The understanding is that these metrics (Input delays, Layout shift, and Contentful paints) reflect the end user experience, so sites with good CWV scores should (in theory) be better for users... reducing wait time, frustration, and annoyance with the web.

If I've learnt anything over the last 20 years of working with the web, it's that getting to the top of a Google search result page (SRP) is a major goal for most site owners, so metrics that affect that ranking tend to be researched a lot. The LCP is no different, and the result often shows up in "quick fix" plugins like the one Punit discusses above. Web performance (Page Load Time) was only ever spoken about as a sub-topic in highly technical spaces until Google decided to start using it as a signal for page ranking, and then suddenly everyone wanted to make their sites faster.

My background in performance

I started working with web performance in the mid 2000s at Yahoo!. We had amazing Frontend Engineering experts at Yahoo!, and for the first time, engineering processes on the front-end were as strong as the back-end. In many cases we had to be far more disciplined, because Frontend Engineers do not have the luxury of their code being private and running on pre-selected hardware and software specs.

At the time, Yahoo! had a performance team of one person — Steve "Chief Performance Yahoo" Souders. He'd gotten a small piece of JavaScript to measure front-end performance onto the header of all pages by pretending it was an "Ad", and Ash Patel, who may have been an SVP at the time, started holding teams accountable for their performance.

Denial

Most sites' first reaction was to deny the results, showing scans from Keynote and Gomez, which at the time measured load times only synthetically, from the perspective of well-connected backbone agents, and were very far off from the numbers that roundtrip was showing.

The Wall of Shame

I wasn't working on any public facing properties, but became interested in Steve's work when he introduced the Wall of Fame/Shame (depending on which way you sorted it). It would periodically show up on the big screen at URLs (the Yahoo! cafeteria). Steve now had a team of 3 or 4, and somehow in late 2007 I managed to get myself transferred into this team.

The Wall of Shame showed a stock-ticker-like view where a site's current performance was compared against its performance from a week ago, and one day we saw a couple of sites (I won't mention them) jump from the worst position to the best! We quickly visited the sites and timed things with a stop-watch, but they didn't actually appear much faster. In many instances they might even have been slower. We started looking through the source and saw what was happening.

The sites had discovered AJAX!

Faking it

There was almost nothing loaded on the page before the onload event. The only content was some JavaScript that ran at onload and downloaded the framework and data for the rest of the site. Once loaded, the site was a long-lived single page application with far fewer traditional page views.

Site owners argued that it would make the overall experience better, and they weren't intentionally trying to fake things. Unfortunately we had no way to actually measure this, so we added a way for them to call an API when their initial framework had completed loading. That way we'd get some data to trend over time.

At Yahoo! we had the option of speaking to every site builder and working with them to make things better. Outside, though, it's a different matter.

Measuring Business Impact

Once we'd started LogNormal (and continuing with mPulse), and were serving multiple customers, it soon became clear that we'd need both business and engineering champions at each customer site. We needed to sell the business case for performance, but also make sure engineering used it for their benefit rather than gaming the metrics. We started correlating business metrics like revenue, conversions, and activity with performance. There is no cheap way to game these metrics because they depend on the behaviour of real users.

Sites that truly cared about performance, and the business impact of that performance, worked hard to make their sites faster.

This changed when Google started using speed as a ranking signal.

With this change, sites now had to serve two users, and when in conflict, Real Users lost out to Googlebot. After all, you can't serve real users if they can't see your site. Switching to CWV does not change the situation because things like Page Load Time, Largest Contentful Paint, and Layout Shift can all be faked or gamed by clever developers.

Ungameable Metrics

This brings us back to the metrics that, as we've seen, couldn't be gamed. Things like time spent on a site, bounce rate, conversions, and revenue are an indication of actual user behaviour. Users are only motivated by their ability to complete the task they set out to do, and using this as a ranking signal is probably a better idea.

Unfortunately, activity, conversions, and revenue are also fairly private corporate data. Leaking this data can affect stock prices and clue competitors in to how you're doing.

User frustration & CrUX

Now the goal of using these signals is to measure user frustration. Google Chrome periodically sends user interaction measurements back to their servers, collected as part of the Chrome User Experience Report (CrUX). This includes things like the LCP, FID, and CLS that users actually experienced. In my opinion, it should also include measures like rage clicks, missed and dead clicks, jank while scrolling, CPU busy-ness, battery drain, and so on: metrics that only come into play while a user is interacting with the site, and that affect or reflect how frustrating the experience may be.

It would also need to have buy-in from a few more browsers. Chrome has huge market share, but doesn't reflect the experience of all users. Data from mPulse shows that across websites, Chrome only makes up, on average, 44% of page loads. Edge and Safari (including mobile) also have a sizeable share. Heck, even IE has a 3% share on sites where it's still supported.

In the chart below, each box shows the distribution of a browser's traffic share across sites. The plot includes (in descending order of the number of websites with sizeable traffic for that browser) Chrome, Edge, Mobile Safari, Chrome Mobile, Firefox, Safari, Samsung Internet, Chrome Mobile iOS, Google, IE, and Chrome Mobile WebView.

[Figure: box plot of browser share across websites]

It's unlikely that other browsers would trust Google with this raw information, so there probably needs to be an independent consortium that collects, anonymizes, and summarizes the data, and makes it available to any search provider.

Using something like the Frustration Index is another way to make it hard to fake ranking metrics without also accidentally making the user experience better.

Comparing these metrics with Googlebot's measures could hint at whether the metrics are being gamed, or perhaps even lower the weight given to Googlebot's measures, restricting them to pages that haven't yet received a critical mass of real users.

We need to move the balance of ranking power back to the users whose experience matters!

Friday, August 06, 2021

Safely passing secrets to a RUN command in a Dockerfile

There may be cases where you need to pass secrets to a RUN command in a Dockerfile, and it's very important that these secrets not be leaked into the environment or the image. In particular, they should not be stored in the image (either on disk or in the environment, not even in intermediate layers), and they should not show up when using docker history.

While working on this topic, I found many blog posts that point to pieces that can be used, but nothing that pulls it all together, so I decided to write this post with everything I've found. I'll provide a list of references at the end.

In my case, I needed to temporarily pass a valid odbc.ini file to my Julia code so that I could build a SysImage with the appropriate database query and result parsing functions compiled. I did not want the odbc.ini file available in the image.

Step 0: Make sure you have docker > 18.09

docker version

You most likely have a new enough version of docker, but in the odd chance that you're running a version older than 18.09, please upgrade. My tests were run on 19.03 and 20.10.
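
If you just want the version numbers, docker's --format flag with a Go template is a quick way to check (a minimal sketch; Client.Version and Server.Version are the standard template fields):

docker version --format '{{.Client.Version}} / {{.Server.Version}}'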

Developing the Dockerfile

Step 1: Specify the Dockerfile syntax

At the top of your Dockerfile (this has to be the absolute first line), add the following:

# syntax=docker/dockerfile:1

This tells docker build to use the latest 1.x version of the Dockerfile syntax.

There are various docs that specify using 1.2 or 1.0-experimental. These values were valid when the docs were written, but are dated at this point. Specifying version 1 tells docker build to use whatever is latest on the 1.x tree, so you can still use 1.3, 1.4, etc. Specifying 1.2 restricts it to the 1.2.x tree.
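
For example, the top of a Dockerfile using this directive might look like the following (the base image here is just a placeholder):

# syntax=docker/dockerfile:1
FROM debian:bullseye-slim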

Step 2: Mount a secret file where you need it

At the RUN command where you need a secret, --mount it as follows:

RUN --mount=type=secret,id=mysecret,dst=/path/to/secret.key,uid=1000 your-command-here

There are a few things in here, which I'll explain one by one.

type=secret
This tells docker that we're mounting a secret file from the host (as opposed to a directory or something else).
id=mysecret
This is any string you'd like. It has to match the id passed in on the docker build command line.
dst=/path/to/secret.key
This is where you'd like the secret file to be accessible. Any file already at this location will be temporarily hidden while the secret file is mounted, so it's safe to use a location that your code will expect at run time.
uid=1000
This is the userid that should own the file. It defaults to 0 (root), so this is useful if your command runs as a different user. You can also specify a gid.

The full list of supported parameters for secret mounts is available on the BuildKit GitHub page.

You can add the same --mount at different locations in your Dockerfile, and with different dst and uid values. The file is mounted only for the duration of that RUN command and not persisted to any layers.
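
Putting steps 1 and 2 together, here's a minimal sketch of how this might look for my use case (the base image, user name, secret id, and build script are illustrative placeholders, not my actual setup):

# syntax=docker/dockerfile:1
FROM julia:1.6

# run the build as a non-root user with uid 1000
RUN useradd --uid 1000 --create-home builder
USER builder
WORKDIR /home/builder

COPY --chown=builder:builder . .

# .odbc.ini is only visible at this path for the duration of this RUN command,
# and is never written to a layer of the image
RUN --mount=type=secret,id=odbcini,dst=/home/builder/.odbc.ini,uid=1000 \
    julia --project=. build_sysimage.jl

The id in this sketch is odbcini, so the corresponding docker build command (step 4 below) would need to pass --secret id=odbcini,src=... to match.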

Running docker build

Step 3: Set the environment variable

This step is optional on newer versions of Docker.

Once you're ready to run docker build, tell docker to use BuildKit:

DOCKER_BUILDKIT=1

You can either put this right before running the command, or export it into your shell.
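
For example (a sketch, assuming a POSIX-ish shell; the actual build command is covered in the next step):

DOCKER_BUILDKIT=1 docker build ...

or, for the rest of the shell session:

export DOCKER_BUILDKIT=1
docker build ...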

Step 4: Run docker build with your secret file
docker build --secret id=mysecret,src=/full/path/to/secret.key .

It's important to note that tilde (~) expansion does not work here. You can use an absolute or relative path, but you cannot use expansion.
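
To illustrate, with a hypothetical secret file at /home/me/secrets/secret.key (and a POSIX-ish shell, which expands $HOME before docker ever sees it):

docker build --secret id=mysecret,src=/home/me/secrets/secret.key .   # absolute path: works
docker build --secret id=mysecret,src=./secrets/secret.key .          # relative path: works
docker build --secret id=mysecret,src=$HOME/secrets/secret.key .      # $HOME expanded by the shell: works
docker build --secret id=mysecret,src=~/secrets/secret.key .          # tilde is not expanded: fails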

That's IT!!!

Jenkins

If you run your docker builds through Jenkins, you'll need a few more steps. The bulk of it is documented in this CloudBees CI article about injecting secrets.

Once you've gotten your secret file into Jenkins, and bound it to an environment variable in your Build Environment, you have to update the docker build command to use this variable instead.

For example, if we bound the secret file to a variable called MYSECRETFILE, then we'd change our build command to:

docker build --secret id=mysecret,src=${MYSECRETFILE} .

References

These links were very useful in figuring out this solution.

...===...