The other side of the moon

[philiptellis] /bb|[^b]{2}/
Never stop Grokking


Wednesday, May 14, 2025

Ansible: Extracting multiple attributes from a list of dicts

I've been writing a bunch of Ansible playbooks, and in one case I had to transform a list of dicts, extracting two attributes from each dict to build a new list of dicts. I.e., given a list like this:

entities:
  - id: 123
    label: Label 1
    type: foo
    status: enabled
  - id: 234
    label: Label 2
    type: foo
    status: enabled
  - id: 345
    label: Label 3
    type: bar
    status: enabled

I need to transform it into this:

entities:
  - id: 123
    type: foo
  - id: 234
    type: foo
  - id: 345
    type: bar

I found the examples in the Ansible docs to be very limited. In most cases there are no examples showing the use of additional parameters to filters, and I definitely couldn't find anything that would extract two attributes from a list of dicts. There are examples that extract a single attribute using map(attribute='xxx'), but nothing that extracts more than one, so I had to come up with something of my own.

I ended up with two possible solutions depending on how much flexibility you have in your playbook.

1. Using loop

The easier option is to use loop and construct the new list one element at a time. You can do this if you're able to run a set_fact task separately from where you need to use the variable.

- set_fact:
    entities_transformed: '{{ entities_transformed|d([]) + [{"id": item.id, "type": item.type}] }}'
  loop: '{{ entities }}'

This set_fact block builds a new fact called entities_transformed by appending one transformed element per loop iteration (the d([]) default seeds it with an empty list on the first pass).
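
For context, here's a minimal, self-contained playbook that exercises this approach; the play boilerplate and the debug task are mine, just for illustration:

- hosts: localhost
  gather_facts: false
  vars:
    entities:
      - { id: 123, label: "Label 1", type: foo, status: enabled }
      - { id: 234, label: "Label 2", type: foo, status: enabled }
      - { id: 345, label: "Label 3", type: bar, status: enabled }
  tasks:
    # Build the new list one element at a time.
    - set_fact:
        entities_transformed: '{{ entities_transformed|d([]) + [{"id": item.id, "type": item.type}] }}'
      loop: '{{ entities }}'

    # Show the result.
    - debug:
        var: entities_transformed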

2. As a one-liner

If you need to write it all as a one-liner without a separate set_fact, then this second approach works for you.

  entities_transformed: '{{ entities
                            | map("dict2items")
                            | map("selectattr", "key", "in", ["id", "type"])
                            | map("items2dict") }}'

This works in multiple steps, so I'll explain each one along with what the output looks like at that stage.

map("dict2items")

This transforms the entities list into the following:

  - - key: id
      value: 123
    - key: label
      value: Label 1
    - key: type
      value: foo
    - key: status
      value: enabled
  - - key: id
      value: 234
    - key: label
      value: Label 2
    - key: type
      value: foo
    - key: status
      value: enabled
  - - key: id
      value: 345
    - key: label
      value: Label 3
    - key: type
      value: bar
    - key: status
      value: enabled
map("selectattr", "key", "in", ["id", "type"])

This strips down to the required keys:

  - - key: id
      value: 123
    - key: type
      value: foo
  - - key: id
      value: 234
    - key: type
      value: foo
  - - key: id
      value: 345
    - key: type
      value: bar
map("items2dict")

This reverses the first step, giving us the following:

  - id: 123
    type: foo
  - id: 234
    type: foo
  - id: 345
    type: bar
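
Putting it together, here's how I'd drop the one-liner into a task (the debug task is just for illustration). Depending on your Ansible/Jinja2 version, map() may hand back a generator rather than a list, so it doesn't hurt to finish with an explicit | list:

- set_fact:
    entities_transformed: '{{ entities
                              | map("dict2items")
                              | map("selectattr", "key", "in", ["id", "type"])
                              | map("items2dict")
                              | list }}'

- debug:
    var: entities_transformed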

Both options work equally well, but I prefer the second because it avoids creating additional facts, and it works in places where a loop isn't an option.

Tuesday, April 22, 2025

Fixing a system without enough RAM for a text editor

Someone on Quora asked why people still use editors like emacs and vim when more modern alternatives exist.

There were so many great answers that I didn't need to answer the original question; I couldn't possibly add more to the case for emacs or vim. Instead, I was reminded of an experience where even emacs and vim weren't options.

Sometime in the mid-to-late '90s, I visited my sister at university. She was in a biology lab, and they had a single 80386 PC running DOS and Windows 3.1. The computer wouldn't start Windows, they didn't know why, and they asked me if I could do anything.

Since I love debugging obscure problems like this, I decided to take a look. It turned out to be a simple case of there not being enough available RAM to start Windows. The box did, however, have 4MB of RAM, which should have been more than enough to start Windows... except this was a 386, and for backwards compatibility with the original 8086-based PCs, RAM was split into conventional memory (the first 640KB), the upper memory area reserved for system ROM and devices (640KB-1MB), and extended memory (everything above 1MB). This box wasn't configured to use extended memory (remember HIMEM.SYS?).

To make matters worse, AUTOEXEC.BAT loaded a bunch of programs at startup that ate into conventional memory, which meant that I couldn't even start the basic EDIT program to fix AUTOEXEC.BAT or CONFIG.SYS.

My only option at that point was to fall back to the absolute basics.

COPY CON AUTOEXEC.BAT
COPY CON CONFIG.SYS

The equivalent on unix would be cat > filename. COPY CON on MS-DOS means COPY from the CONsole device (the keyboard in this case) to the destination file, overwriting it if it exists.

(See What is copy con? for details)
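
From memory, a session looked something like the following, though I can no longer remember the actual lines I typed, so the CONFIG.SYS contents here are just a typical illustration. You type the file's contents line by line, then press F6 (or Ctrl+Z, which echoes as ^Z) followed by Enter to write the file:

C:\> COPY CON CONFIG.SYS
DEVICE=C:\DOS\HIMEM.SYS
DOS=HIGH
FILES=30
^Z
        1 file(s) copied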

And I had to be really careful with what I typed, because getting it wrong could leave the system unable to boot at all, and I didn't have a boot disk on me (remember, I was just visiting, with no plan of actually fixing a computer). Then again, if I'd had a boot disk, none of this would have been necessary.

Anyway, I managed to build a very basic AUTOEXEC.BAT and CONFIG.SYS from memory (though I cannot remember now what I put into them), which let me reboot the machine with enough RAM to start EDIT. That in turn let me refine the files until the machine could reboot with enough RAM to start Windows.

What I learnt from this is that no matter how good a system you have access to, you need to be prepared to work with the absolute minimum of available tools. On DOS this was COPY CON. On unix over a slow or lossy network, you might actually have to edit a file by sending single lines of sed. And to be prepared for that, you need to practise it a lot. It turns out that vim and emacs are really just one step above sed (well, technically one step above ed, which is half a step above sed). They can be extended to have all the features of Eclipse or Visual Studio if you like, but even without those extensions, they are far more powerful.

Even while working with Eclipse, I find there are times when I need to quit Eclipse, open my files in Vim, run a few commands that would take me ages in Eclipse, and then return to Eclipse. I use Eclipse because that's what our dev team has standardized on, and it makes screen sharing with other devs easier.

If you liked this post, there's a far more fun video of how the JPL team debugged and fixed an issue 15 billion miles away on Voyager 1.

Monday, March 17, 2025

On Migrating Character Encodings

Several discussions I've had with friends and colleagues recently reminded me of an incident we faced several years ago at Yahoo!

Now Yahoo! as a company was made up of many different local offices around the world, each responsible for content in its locale. Since there was a lot of user generated content, users in a particular locale could easily enter content (blog posts, restaurant reviews, etc.) in their local language and script.

Everyone was happy!

From about 2005 onwards, the company was looking to unify some of the platforms used around the world. For example, we had something like 4 or 5 different platforms for ratings and reviews, and it didn't make sense to have different architectures, database layouts, BCP setups, and a separate team managing each of them, so we started unifying. Building a common architecture was the easy part; I worked on several of these projects. Getting front end teams to migrate was also not terribly hard. Migrating content, though, was tough, because each region's content was in its own locale's encoding, and MySQL didn't let you set multiple character encodings on a single text column.

So the i18n team started working with teams across Y! to move everything to UTF-8. The easy part was changing HTTP headers and <meta> tags. Content was a little harder, but doable with iconv(1), since in most cases we knew the source character encoding and the destination was always UTF-8. In some cases we had to guess, but it generally worked...

...until at one point we also decided to do it for authentication.

One of the things that was localized was authentication, because it allowed users in, for example, South Korea, to use Hangul characters in their passwords. Usernames were always restricted to just alphanumeric characters and underscores (if I remember correctly).

Passwords are stored, as they should be, salted and hashed, so the character encoding of the database column was always us-ascii, which is compatible with UTF-8, so no biggie... except that the character encoding the browser used for input came from the HTTP headers or META tags of the page, and the encoding of the submitted form data followed the page (or the login FORM's accept-charset).

Prior to this move, these were all set to a character encoding that made sense locally, so Korea used EUC-KR and Taiwan used Big5, and the hashed passwords were built from the byte sequences that resulted from treating the input as one of these encodings.

After the move, the user would still type in the same password, but when we converted it to bytes we used UTF-8, which produced a different byte sequence than the original encoding had. Hashing this new sequence of bytes produced a different hash, and users could no longer log in. Well, only users who had non-ASCII characters in their passwords.
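
To make the byte-level problem concrete, here's a quick sketch in Python. I don't know Yahoo!'s actual hashing scheme, so the salted SHA-256 below is purely a stand-in; the point is only that the same characters encode to different bytes, and different bytes hash to different values:

import hashlib

password = "비밀번호1"  # "password1" in Hangul

# The same characters produce different byte sequences in each encoding...
euc_kr_bytes = password.encode("euc-kr")
utf8_bytes = password.encode("utf-8")
print(euc_kr_bytes.hex())
print(utf8_bytes.hex())

# ...so the salted hashes no longer match.
salt = b"some-salt"  # illustrative salt
print(hashlib.sha256(salt + euc_kr_bytes).hexdigest())
print(hashlib.sha256(salt + utf8_bytes).hexdigest())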

I forget what the actual fix was, but there were several options on the table. One was to revert the character encoding changes on the login page and re-encode every password after a successful login. Another was to generate two hashes, one from the UTF-8 bytes and another from the pre-migration character encoding for the region, and to allow a match on either to go through.
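
For illustration, that dual-hash option might have looked roughly like this; the function, the SHA-256 stand-in, and the parameter names are all my own reconstruction, not Yahoo!'s actual code:

import hashlib

def check_password(typed: str, stored_hash: str, salt: bytes, legacy_encoding: str) -> bool:
    # Try the UTF-8 bytes first, then the pre-migration regional encoding.
    # (Real code would also handle characters the legacy encoding can't represent.)
    for encoding in ("utf-8", legacy_encoding):
        candidate = hashlib.sha256(salt + typed.encode(encoding)).hexdigest()
        if candidate == stored_hash:
            return True
    return False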
