The other side of the moon

[philiptellis] /bb|[^b]{2}/
Never stop Grokking


Wednesday, February 17, 2021

Recovering from Big Sur upgrade snafu

Apple recently pushed out a new release of macOS called Big Sur. Unfortunately, the upgrade process is problematic. Specifically, the upgrader does not check for the required disk space before starting the upgrade, and if the target system doesn't have enough free disk space (35GB or so), the upgrade fails partway through, leaving your system in a mostly unusable state.

This is what happened to me.

My environment

  • The system was a 13" MacBook with a 128GB SSD. 128GB is pretty small and doesn't leave much room for too many large files.
  • The system had just a single user.
  • At the start of the upgrade, the system had about 13GB of free disk space (>10%).
  • Desktop, Documents and Photos were backed up to iCloud, but Downloads weren't, and some very large photos & videos had been removed from iCloud to save space there, so they only existed locally.

Prior discussion

Mr. Macintosh has published a very detailed explanation of the issue and various ways to get around it without any data loss. This article got me a long way in my investigation, and I was lucky that the latest updates had been posted just a few hours before I hit the problem myself.

Unfortunately, none of the suggested fixes worked for me.

  1. I couldn't mount the drive in Target Disk Mode as my password wouldn't work (the password still worked when logging in locally, but that took me back to the upgrade loop).
  2. I couldn't start up the system in Recovery Mode as it wanted a password, but again, wouldn't accept the password (the same password that worked when fully booting up).
  3. I couldn't access the disk when booting from an external startup disk because of the same issue.

Many posts I found online seemed to suggest that a firmware password was required, but I'd never set this up.

Single User Mode

Eventually, what showed the most promise was booting into Single User Mode and then fiddling around with all available disk devices.

Password worked for Single User Mode
  1. To start up in Single User Mode, press Cmd+S when starting up until the Apple logo shows up.
  2. The system prompts you for a password, and my password did in fact work in this mode.
  3. After signing in, you're dropped into a unix shell.
  4. There's only a basic file system mounted, which contains a limited number of unix commands and none of your data.

Mount the Data partition

Once in single user mode, I had to mount my data partition. I first used the mount command to see what was already mounted. It showed that the only mounted device was /dev/disk1s1. I assumed that my Data partition would be /dev/disk1s2 and that it would have the same filesystem, and I chose a convenient mount point:

# mount -t apfs /dev/disk1s2 /System/Volumes/Data

Miraculously, this did not ask me for a password, and mounted my Data partition. I was able to look through the files and identify potential targets to remove. I also noticed that the disk was now completely full (0 bytes free). This was due to the Big Sur installer, which itself took up 11GB and then added a few more files, using up the entire 13GB that I had available.

Things were getting a little cumbersome here as most of the unix commands I needed to use were not on the primary partition, but on the mounted partition, so I added the appropriate folders to the unix PATH environment variable:

PATH="$PATH":/System/Volumes/Data/usr/bin

I was starting to see that choosing a three-level-deep path as my mount point perhaps wasn't a great idea. I also learned that while the screen is quite wide, the terminal environment is set to show 80 columns of text, and runs into very weird line-wrapping issues if you type past that. It's even worse if you try tab completion at this point.

Transferring large files

Some of the large files & folders I identified were downloaded packages that could be removed. Unfortunately this only got me 2GB back. To get enough space back, I'd have to remove some photos and videos that weren't stored on iCloud. I figured I'd copy them over to an SD card and then could delete them.

I popped in the SD Card, and the kernel displayed some debug messages on the terminal. They told me that the card was at /dev/disk4, so I tried mounting that at a random empty directory:

# mount -t exfat /dev/disk4 /System/VM

This did not work!

No SD Cards in Single User Mode

By default, SD Cards are formatted with an exFAT file system (the kind used by Windows and all digital cameras). Unfortunately, you cannot mount an exFAT filesystem in Single User Mode because the exfatfs driver isn't compiled into the kernel. It's loaded as a dynamic module when required, which only works when booting in standard mode with a kernel that allows dynamic loading. Single User Mode does not.

Reformat the SD Card

This was a brand new SD Card, so I decided to reformat it with an Apple file system. I used a different MacBook to do this; however, my first attempt didn't work. It isn't sufficient to just format the SD Card; you also need to partition it, and that's where the filesystem is created.

I created a single APFS partition across the entire SD Card and then tried mounting it.

Unfortunately, now it was no longer at /dev/disk4 even though that's what the kernel debug messages said. Looking at /dev/disk* showed me that /dev/disk5s1 was a potential candidate.

# mount -t apfs /dev/disk5s1 /System/VM

Finally, this worked. I was able to copy my files over, and remove them from the Data partition. This freed up about 45GB, which allowed me to continue with the upgrade.

After the upgrade completed, I appear to have 75GB free. I haven't had a chance to check where the extra space came from. I also plan to keep using the 256GB SD Card permanently as an external drive.

Wednesday, November 18, 2020

Understanding Emotion for Happy Users

How does your site make your users feel?

Introduction

So you’ve come here for a post about performance, but here I am talking about emotion… what gives? I hope that if you haven’t already, then as this post progresses, you’ll see that performance and emotion are closely intertwined.

While we may be web builders, our goal is to run a business that provides services or products to real people. The website we build is a means of connecting people to that service or product.

The way things are…

The art and science of measuring the effects of signal latency on real users is now about 250 years old. We now call this Real User Measurement, or RUM for short, and it’s come a long way since Steve Souders’ early work at Yahoo.

Browsers now provide us with many APIs to fetch performance metrics that help site owners make sites faster. Concurrently, the Core Web Vitals initiative from Google helps identify metrics that most affect the user experience.

These metrics, while useful operationally, don't give us a clear picture of the user experience, or why we need to optimise them for our site in particular. They don't answer the business or human questions of "Why should we invest in web performance?" (versus, for example, a feature that customers really want), or even more specifically, "What should we work on first?"

Andy Davies recently published a post about the link between site speed and business outcomes…

Context influences experience,
Experience influences behaviour,
Behaviour influences business outcomes.

All of the metrics we collect and optimise for deal with context, and we spend very little time measuring and optimising the rest of the flow.

Switching Hats

Over the last decade working on boomerang and mPulse, we slowly came to the realisation that we'd been approaching performance metrics from a developer-centric view. We'd been drawing on our experience as developers – users who have browser dev tools shortcuts committed to muscle memory. We were measuring and optimising the metrics that were useful and easy to collect from a developer's point of view.

Once we switched hats to draw on our experiences as consumers of the web, the metrics that really matter became clearer. We started asking better questions...

  • What does it mean that performance improved by 100ms?
  • Are all 100ms the same?
  • Do all users perceive time the same way?
  • Is performance all that matters?

In this post, we’ll talk about measuring user experience and its effects on behaviour, what we can infer from that behaviour, and how it affects business outcomes.

Delight & Frustration

In Group Psychology and the Analysis of the Ego, Freud notes that “Frustration occurs when there is an inhibiting condition that interferes with or stops the realization of a goal.”

Users visit our sites to accomplish a goal. Perhaps they’re doing research to act on later, perhaps they want to buy something, perhaps they’re looking to share an article they read a few days ago.

Anything that slows down or prevents the user from accomplishing this goal can cause frustration. On the other hand, making their goal easy to find and achieve can be delightful.

How a user feels when using our site affects whether they’ll come back and “convert” into customers (however you may define convert).

The Link Between Latency & Frustration

In 2013, Tammy Everts and her team at Radware ran a usability lab experiment. The study hooked participants up to EEG devices and asked them to shop on certain websites. Half the users had an artificial delay added to their browsing experience, and neither group was made aware of the performance changes. They all believed they were testing the usability of the sites. The study showed that...

A 500ms connection speed delay resulted in up to a 26% increase in peak frustration and up to an 8% decrease in engagement.

Similarly in 2015, Ericsson ConsumerLab neuro research studied the effects of delayed web pages on mobile users and found that “Delayed web pages caused a 38% rise in mobile users' heart rates — equivalent to the anxiety of watching a horror movie alone.”

Watching a horror movie alone may not be everyone’s cup of tea, but the real implication is that users make a conscious or unconscious decision about whether to stick around, return, or leave the site.

Cognitive Bias

Various cognitive biases influence how individual experiences affect perception and behaviour. Understanding these biases, and intervening when an experience trends negative, can improve the overall experience.

Perceptual Dissonance

Also known as Sensory Dissonance, Perceptual Dissonance results from unexpected outcomes of common actions.

The brain’s predictive coding is what helps you do things like figure out whether a car coming down the road is going slowly enough for you to cross safely. A violation of this predictive coding is useful in that it helps us learn new things, but if that violation breaks long-standing “truths”, or if violations are inconsistent, it makes learning impossible and leads to psychological stress and frustration.

On the web, users expect websites to behave in a certain way. Links should be clickable, sites should in general scroll vertically, and so on. Things like jank while scrolling, nothing happening when a user clicks a link (dead clicks), or a click target moving as the user attempts to click on it (layout shift) cause perceptual dissonance and frustration.

If these bad experiences are consistent, then users come to expect them. Our data shows that users from geographies where the internet is slower than average tend to be more patient with web page loads.

Survivorship Bias

We only measure users who can reach our site. For some users, a very slow experience is better than an unreachable site.

In 2012, after YouTube made their site lighter, Chris Zacharias found that aggregate performance had gotten worse. On delving into the data, they found that new users who were previously unable to access the site were now coming in at the long tail. The site appeared slower in aggregate, but the number of users who could use it had gone up.

Negativity Bias

Users are more likely to remember and talk to their friends about their bad experiences with a site than they are about the good ones. We need only run a Twitter search for “$BRAND_NAME slow” to see complaints about bad experiences.

Bad experiences are also perceived to be far more intense than equivalent good experiences. To end up with a neutral overall experience, bad experiences need to be balanced by more intense good ones. A single bad experience over the course of a session makes it much harder for that session to end in overall delight.

Active Listening

Research shows that practicing Active Listening can have a big impact on countering Negativity Bias. Simply acknowledging when you’ve screwed up and didn’t meet the user’s expectations can alleviate negative perception. If we detect, via JavaScript, that the page is taking too long to transition between loading states, we could perhaps display a message that acknowledges and apologizes for things going slower than expected.

Hey, we realise that it’s taking a little longer than expected to get to what you want. You deserve better. We’re sorry and hope you’ll stick around a bit.

Users will be more forgiving if their pain is acknowledged.
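
As a rough illustration, here's a minimal TypeScript sketch of what that kind of client-side active listening might look like. The 4-second threshold, the "slow-apology" element id, and the message copy are all assumptions for illustration; a real implementation would tune the threshold against your own loading milestones.

// Minimal active-listening sketch. Assumptions: an element with id
// "slow-apology" exists in the page (initially hidden), and 4000ms is
// "too slow" for this particular site.
const SLOW_THRESHOLD_MS = 4000;

const apologyTimer = window.setTimeout(() => {
  const el = document.getElementById('slow-apology');
  if (el) {
    el.textContent = 'Sorry, this is taking longer than expected. Thanks for sticking around.';
    el.hidden = false;
  }
}, SLOW_THRESHOLD_MS);

// If the page finishes loading before the threshold, never show the apology.
window.addEventListener('load', () => window.clearTimeout(apologyTimer));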

Measuring Emotion

There are many ways we could measure the emotional state of users on our site. These range from requiring active engagement to being completely creepy. Naturally, not all of them are applicable to websites...

  • Use affective computing (facial analysis, EEGs, pulse tracking, etc.)
  • Ask the user via a survey popover
  • Business outcomes of behaviour
  • Behavioural analysis

Affective Computing

For website owners, affective computing isn’t really in play. Things like eye tracking, wireless brain interfaces, and other affective computing methodologies are too intrusive. They work well in a lab environment where users consent to this kind of tracking and can be hooked up to measurement devices, but on the web this is both inconvenient and creepy.

Ask the user

Asking the user can be effective, as shown by a recent study from Wikipedia. The study used a very simple Yes/No/No Comment style dialog with randomized option order. They found that users’ perceived quality of experience is inversely proportional to median load time. A temporary 4% improvement in page load time resulted in an equally temporary 1% increase in satisfied users.

Area chart of two timeseries: Median loadEventEnd, and Satisfaction Ratio (positive/total). Time axis covers 1 year from Oct 2019 to Oct 2020. More details in the text preceding this image.

This method requires active engagement by the user and suffers from selection bias and the Hawthorne effect.

It’s hard to quantify what kinds of experiences result in users choosing to answer the survey, or how you’d design the popover to increase participation and so reduce the effects of self-selection.

The Hawthorne effect, on the other hand, suggests that individuals change the way they react to stimuli if they know they’re being measured or observed.

Business Outcomes

Measuring business outcomes is necessary but it can be hard to identify what context resulted in an outcome. One needs to first understand the intermediate steps of experience and behaviour. Did a user bounce because the experience was bad, or did they just drop in to do some research and will return later to complete a purchase?

Behavioural analysis

Applying the results of lab-based research to users actively using a website can help tie experience to behaviour. We first need to introduce some new terms, which we’ll define in the paragraphs that follow.

Rage Clicks, Wild Mouse, Scrandom, and Backtracking are behavioural signals we can use. In conjunction with when in a page’s life cycle users typically expect different events to take place, they can paint a picture of user expectations and behaviour.

Correlating these metrics with contextual metrics like Core Web Vitals on one hand, and business outcomes on the other can help us tell a more complete story of which performance metrics we should care about and why.

Rage, Frustration & Confusion

To measure Rage, Frustration & Confusion, we look at Rage Clicks, Wild Mouse and Backtracking.

Rage Clicks

Rage Clicks occur when users rapid-fire click on your site. They are the digital equivalent of cursing to release frustration. We’ve probably all caught ourselves rage clicking at some point: click once, nothing happens, click again, still nothing, and on and on. This could be a result of interaction delays, or of users expecting something to be clickable when it isn't.

Rage clicks can be measured easily and non-intrusively, and are easy to analyse.
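
For illustration, here's a minimal TypeScript sketch of one way to detect rapid-fire clicks on roughly the same spot. The thresholds (three clicks within a second, within 30 pixels) and the beacon URL are assumptions, not the definition used by any particular analytics product.

// Hypothetical rage-click detector: N clicks close together in time and space.
const CLICK_WINDOW_MS = 1000;
const CLICK_RADIUS_PX = 30;
const RAGE_THRESHOLD = 3;

let recentClicks: { t: number; x: number; y: number }[] = [];

document.addEventListener('click', (e: MouseEvent) => {
  const now = performance.now();
  // Keep only clicks that are recent and near the current one.
  recentClicks = recentClicks.filter(
    (c) => now - c.t < CLICK_WINDOW_MS &&
           Math.hypot(c.x - e.clientX, c.y - e.clientY) < CLICK_RADIUS_PX
  );
  recentClicks.push({ t: now, x: e.clientX, y: e.clientY });

  if (recentClicks.length >= RAGE_THRESHOLD) {
    // Report it to a (hypothetical) analytics endpoint and reset.
    navigator.sendBeacon('/beacon/rage', JSON.stringify({ clicks: recentClicks.length }));
    recentClicks = [];
  }
});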

Fullstory has some great resources around Rage Clicks.

Wild Mouse

Research shows that people who are angry are more likely to use the mouse in a jerky and sudden, but surprisingly slow fashion.

People who feel frustrated, confused or sad are less precise in their mouse movements and move it at different speeds.

There are several expected mouse movements while a user traverses a website. Horizontal and vertical reading patterns are expected and suggest that the user is engaged with your content.

On the other hand, random patterns, or jumping between options in a form can suggest confusion, doubt, and frustration. See Churruca, 2011 for the full study.

The JavaScript library Dawdle.js can help classify these mouse patterns.

Scrandom

Scrandom is the act of randomly scrolling the page up and down with no particular scroll target. This can indicate that a user is unsure of the content, that the page is too long, or that they are waiting for something to happen and checking that the page is still responsive without accidentally clicking anything.
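
Here's a sketch of how one might pick up this signal, assuming we treat frequent scroll-direction reversals within a short window as the indicator; the thresholds are purely illustrative.

// Hypothetical scrandom detector: count scroll-direction reversals in a window.
const SCROLL_WINDOW_MS = 5000;
const REVERSAL_THRESHOLD = 6;

let lastY = window.scrollY;
let lastDir = 0;
let reversals: number[] = [];

window.addEventListener('scroll', () => {
  const now = performance.now();
  const dir = Math.sign(window.scrollY - lastY);
  if (dir !== 0 && lastDir !== 0 && dir !== lastDir) {
    reversals.push(now); // direction changed: up->down or down->up
  }
  lastY = window.scrollY;
  if (dir !== 0) { lastDir = dir; }

  reversals = reversals.filter((t) => now - t < SCROLL_WINDOW_MS);
  if (reversals.length >= REVERSAL_THRESHOLD) {
    console.log('possible scrandom behaviour'); // or beacon it to analytics
    reversals = [];
  }
});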

Backtracking

Backtracking is the process of hitting the back button on the web. Users who are confused or lost on your site may hit the back button often to get back to a safe space. This behaviour may manifest itself in different ways, but can often be identified with very long sessions that appear to loop.
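
One low-effort way to approximate this signal, assuming we count pages in a session that were reached via the browser's back/forward buttons; the sessionStorage key and beacon URL are made up for illustration.

// Count back/forward navigations in this session using Navigation Timing Level 2.
const nav = performance.getEntriesByType('navigation')[0] as
  PerformanceNavigationTiming | undefined;

if (nav && nav.type === 'back_forward') {
  const count = Number(sessionStorage.getItem('backtracks') || '0') + 1;
  sessionStorage.setItem('backtracks', String(count));
  // Report the running count to a (hypothetical) analytics endpoint.
  navigator.sendBeacon('/beacon/backtrack', JSON.stringify({ count }));
}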

Tie this into the Page Load Timeline

In his post on Web Page Usability, Addy Osmani states that loading a page is a progressive journey with four key moments to it: Is it happening? Is it useful? Is it usable? and Is it delightful? And he includes this handy graphic to explain it:

When did the user feel they could interact? When could they interact? Speed metrics illustrate First Paint, First Contentful Paint, Time to Interactive for a page

The first three are fairly objective. With only minor differences between browsers, it’s straightforward to pull this information out of standard APIs, and possibly supplement it with custom marks via the User Timing API.
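
As a small illustration of pulling these milestones from the standard APIs, here's a TypeScript sketch using PerformanceObserver for paint and LCP entries, plus a custom User Timing mark; the 'app-usable' mark name is an assumption for whenever your own code decides the page has become usable.

// Observe paint and Largest Contentful Paint entries as they occur.
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log(entry.name || entry.entryType, Math.round(entry.startTime));
  }
}).observe({ entryTypes: ['paint', 'largest-contentful-paint'] });

// Custom User Timing mark for an app-specific "usable" moment (hypothetical:
// call this once your main interactive component has mounted).
performance.mark('app-usable');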

We’ve found that over 65% of users expect a site to be usable after elements have started becoming visible but before it is actually Interactive. Contrast that with 30% who will wait until after the onload event has fired.

Correlating Rage with Loading Events

Comparing the points in time when users rage click with the loading timeline above, we see some patterns.

Relative time series showing the intensity of rage clicks tied to when users first interact with a page relative to page load. We also include the First Input Delay as a separate series, and show 25th-75th percentile bands for the First Paint, Largest Contentful Paint, Visually Ready, and Interactive times relative to Page Load.
The horizontal axis on this chart is time as a relative percent of the full page load time. -50 indicates half of the page load time while +50 is 1.5x the page load time. The vertical axis indicates intensity of rage while point radius indicates probability of rage clicks at that time point. The coloured bars indicate 25th to 75th percentile ranges for the particular timer relative to full page load with the line going through indicating the median.

We see a large amount of rage between content becoming visible and the page becoming interactive. Users expect to be able to interact with the page soon after content becomes visible, and if that expectation isn’t met, it results in rage clicking.

We also see a small stream of rage clicks after the page has completed loading, caused by interaction delays.

There’s a small gap just before the onload event fires. The onload event is when many JavaScript event handlers run, which in turn results in Long Tasks and increased Interaction Delays. What we’re seeing here is not the absence of interaction, but survivorship bias: the interactions that happen at that time aren’t captured until later.

The horizontal axis on this chart is relative time along the page load timeline. We looked at various combinations of absolute and relative time across multiple timers, and it was clear that relative time is the stronger model, which brings us to a new metric based on relative timers...

Frustration Index

The Frustration Index, developed by Tim Vereecke, is a measure based on the relationship between loading phases. We’ve seen that once one event occurs, users expect the next to happen within a certain amount of time. If we miss that expectation, the user's perception is that something is stopping or inhibiting their ability to complete their task, resulting in frustration.

The Frustration Index encapsulates that relationship. The formula we use is constantly under development as research brings new things to light, but it’s helpful to visit the website to understand exactly how it works and see some examples.

So how do we know that this is a good metric to study?

Correlating Rage & Frustration

It turns out that there is a strong correlation (ρ=0.91) between the intensity of rage (vertical axis) that a user expresses and the calculated frustration index (horizontal axis) of the page.

Scatter plot showing Frustration Index on the horizontal axis and intensity of rage clicks on the vertical axis. The two variables have a Pearson's correlation coefficient of 0.91.

Rather than looking at individual timers in isolation for optimization, it is better to consider all timers in relation to one another. Improving one of them changes the user’s expectation of when other events should happen, and missing that expectation results in frustration.

Further, the formula is something we can apply client-side to determine whether we’re meeting expectations, and practice active listening if we’re not.

Correlating Frustration & Business Outcomes

Looking at the correlation between Frustration Index and certain business metrics also shows a pattern.

Double Scatter Plot showing Frustration Index on the horizontal axis and bounce rate on the first vertical axis and average session duration in minutes on the second.
  • Bounce Rate rises with the frustration index, with a sharp incline around what we call the LD50 point (for this particular site). ρb=0.65
  • Average time spent on the site goes down as frustration increases, again sharply at first and then tapering off. ρt=-0.49

LD50

The LD50, or Median Lethal Dose, is a term borrowed from the biological sciences. Buddy Brewer first applied the term to web performance in 2012, and we’ve been using it ever since.

In biology, it’s the dosage of a toxin that kills off 50% of the sample, be it tumour cells or mice.

On the web, we think of it more in terms of the point at which 50% of users decide not to move on in their journey. We could apply it to bounce rate, or retention rate, or any other rate that’s important to your site, and the “dose” may be a timer value, a frustration index, or anything else. Depending on the range of the metric in question, we may also use a percentile other than the median, for example LD25 or LD75.
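
To make the idea concrete, here's an illustrative TypeScript sketch of estimating an LD-style point from aggregated data: given (dose, rate) pairs sorted by dose, find where the rate first crosses the chosen threshold. The function name and the example numbers are made up.

interface DosePoint {
  dose: number; // e.g. a timer value in ms, or a frustration index
  rate: number; // e.g. bounce rate, between 0 and 1
}

// Linear interpolation to find the dose at which the rate crosses `target`
// (0.5 for LD50, 0.25 for LD25, ...). Assumes points are sorted by dose and
// the rate is roughly monotonic.
function ldPoint(points: DosePoint[], target = 0.5): number | undefined {
  for (let i = 1; i < points.length; i++) {
    const a = points[i - 1];
    const b = points[i];
    if (a.rate <= target && target <= b.rate) {
      const f = (target - a.rate) / (b.rate - a.rate || 1);
      return a.dose + f * (b.dose - a.dose);
    }
  }
  return undefined; // the rate never crosses the target in this data
}

// Example with made-up numbers: LD50 falls somewhere between dose 20 and 30.
console.log(ldPoint([{ dose: 10, rate: 0.2 }, { dose: 20, rate: 0.4 }, { dose: 30, rate: 0.7 }]));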

This isn’t a single magic number that works for all websites. It isn’t even a single number that works for all pages on a site or for all users. Different pages and sites have different levels of importance to a user, and a user’s emotional state, or even the state of their device (e.g., low battery) when they visit your site, can affect how patient they are.

Column chart showing the LD25 frustration index value for users from different Geos: US:26, Germany:10, Japan:18, Australia:42, Canada:44.

Patience is also a Cultural Thing

People from different parts of the world have a different threshold for frustration.

Many of our customers have international audiences and they have separate sites customized for each locale. We find that users from different global regions have different expectations of how fast a site should be.

In this chart, looking at 5 high GDP countries (that we have data for), we see a wide distribution in LD25 value across them, ranging from a value of 10 for Germany to the 40s for Australia and Canada. It’s not shown in this chart, but the difference is even wider when we look at LD50, with Germany at 14 and Canada at 100.

So how fast should our site be?

We’ve heard a lot about how our site’s performance affects the user experience, and consequently how people feel when using our site. We’ve seen how the “feel” of a site can affect the business, but what does all of that tell us about how to build our sites?

  • How fast should we be to reduce frustration?
  • What should we be considering in our performance budgets?
  • How do we leave our users feeling happy?

I think these may be secondary questions…

A better question to start with, is:

Will adding a new feature delight or frustrate the user?

Acknowledgements

Thanks to Andy Davies, Nic Jansma, Paul Calvano, Tim Vereecke, and Cliff Crocker for feedback on an earlier draft of this post.

Thanks also to the innumerable practitioners whose research I've built upon to get here including Addy Osmani, Andy Davies, Gilles Dubuc, Lara Hogan, Nicole Sullivan, Silvana Churruca, Simon Hearne, Tammy Everts, Tim Kadlec, Tim Vereecke, the folks from Fullstory, and many others that I'm sure I've missed.

Monday, October 07, 2019

Implementing Spearman's Rank Correlation in SQL

In my last post, I showed how to implement Pearson's Correlation as an SQL window function with window frame support. In this post, I'll follow up with implementing Spearman's Rank correlation coefficient in SQL.

While Pearson's correlation looks for linear relationships between two vectors (i.e., you wouldn't use it for exponential relationships), Spearman's rank correlation looks for monotonicity, or in plain English: do the two values go up and down together?

So here's the really cool part: Spearman's Rank correlation coefficient is simply the Pearson's correlation coefficient of the ranks of the two vectors. We already know how to calculate Pearson's correlation coefficient, so what we need to do here is first calculate the ranks of our vectors.

We can do this using the SQL RANK function, which also works as a window function with window frame support:

RANK() OVER (PARTITION BY <partition cols> ORDER BY x ASC) as R_X,

RANK() OVER (PARTITION BY <partition cols> ORDER BY y ASC) as R_Y,

The two important things to note here are, first, that RANK() does not take a parameter; instead, you specify what you want to rank on in the ORDER BY clause. Second, make sure both rankings are ordered in the same direction, either ASC or DESC.

Now, even though the RANK() function supports window frames, you don't want to use them here. This is because with sliding windows, each row will have a different rank depending on the window it falls into, and we won't be able to correlate the ranks in an outer window.

Once we have the ranks in an inner query, we can run either the standard CORR function or the windowed CORR that we developed in the previous post on these derived columns:

SELECT CORR(R_X, R_Y) FROM (
    SELECT
        RANK() OVER (PARTITION BY <partition cols> ORDER BY x ASC) as R_X,

        RANK() OVER (PARTITION BY <partition cols> ORDER BY y ASC) as R_Y
      FROM ...
)

If you're implementing this as a window function, use R_X and R_Y as the inputs to the SUM() functions, with an additional nested query.

I hope this was helpful, leave a comment or tweet @bluesmoon if you'd like to chat.
