Using iTerm Automatic Profile Switching to Make Fewer Mistakes In Production

Today I will tell you some stories of how I made mistakes in our production environment, and how I am trying to help prevent future mistakes using iTerm.

Mistakes were made

At work we are mid-journey to having more automation around our deployments, provisioning, backups, monitoring, and so forth. But at the moment, some things are still typically done manually. Within recent memory, I was SSHed into our QA (staging) box and for some reason wanted to rename the database. A few minutes later, someone came down and said “production’s down!”[1] (Production is the end-user visible environment, the one thing that we don’t want to be down.) I was thinking, “hmm, we haven’t changed anything recentl… wait, was I actually on the QA box?” Sure enough, what I had renamed was the production database in the production environment! A minute later service was restored, but at a handful of minutes this was our longest daytime outage of the quarter.

As part of our postmortem on this issue, we identified that switching my terminal profile whenever I thought I would be in a production-like environment would be useful. For example, if I am going to be SSHing into a QA box, I might create a new profile that has a different background color. This would help disambiguate the two environments.

The other day after hours, I was switching back and forth between QA and production SSH sessions to try to debug a problem on the QA side. Once again I thought I had SSHed into the QA environment, but I didn’t read my SSH command carefully enough when cycling between the two (Ctrl+r in the terminal searches your previous commands[2]). I turned off the production load balancer. Fortunately it was after hours, so I could easily revert it, but I needed a better solution.

Enough is enough

There are two problems with the profile switching approach: I need to remember to switch profiles when I am SSHing, and I need to be SSHing into the right environment for the given profile. These are error-prone enough that I don’t think the manual profile switching approach is workable long-term. Again, in a perfect world, we would have everything already automated and some way of making all of our changes through well-tested or peer-reviewed means. But there has to be a stopgap solution.

I had read a bit about automatic profile switching in iTerm after the database rename debacle. This iTerm feature provides the ability to know when we have changed servers and change the profile accordingly. At first, it seems to require shell integration, which means that you curl a script to each of your boxes to be able to use it. This seemed both potentially insecure and cumbersome as we add more servers to our environment, so I didn’t want to use it.

Triggers and automatic profile switching

Digging a bit deeper, it seems that you can also use triggers and automatic profile switching to mostly accomplish the same thing. There are two components we can work with to make this happen.

The first is a trigger. Triggers look at your terminal output and run actions when the output matches a given regular expression. There are a variety of interesting actions you can take based on a trigger, but we’ll use them to set iTerm’s internal variables for username and hostname. iTerm keeps track of these values, and automatic profile switching can act on them when they change.

The second is automatic profile switching. Each profile can say which hostnames or usernames it should be used for, and iTerm activates the matching profile when its hostname or username changes. If we change to a production host, then we should activate the production iTerm profile. Of course, when we exit out of that, we’d like to return to the default profile.

An example setup

Here’s a high level view of what we want to do. When we recognize something that means we are on:

  • QA box, we switch to the QA profile (dark blue background)
  • production box, we switch to the production profile (dark red background)
  • localhost, we switch to the default profile (black background)

I set up the following profiles, with rationale:

Default

Triggers
  1. Set the iTerm username and host for either QA or production when we see it in an SSH prompt. The regex matches a prompt of the form username@host-name:directory_name $. For that prompt, this trigger would set username to username and host to host-name (\1 and \2 refer to the first and second match groups of the regex). Typically you’d have qa-web or prod-web or something like that as your hostname. The hosts reported by the next two triggers should match those hostnames, since the QA and production profiles are selected by hostname (see below).

    • Regular Expression: ^(\w+)@([\w\-]+):.*\$
    • Action: “Report User & Host”
    • Parameters: \1@\2
    • Instant: yes (explanation in its own section below)
  2. Set the iTerm host to a QA-host when we recognize that it is a QA Rails prompt:

    • Regular Expression: ^Loading qa environment \(Rails [\d\.]+\)$
    • Action: “Report User & Host”
    • Parameters: @some-qa-host
    • Instant: not needed
  3. Set the iTerm host to a production host when we recognize that it is a production Rails prompt. Similar to the previous trigger, but substitute production for instances of QA.
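Before pasting the trigger regexes into iTerm, it can help to try them against a few sample lines. The following is a minimal sketch, assuming Python's `re` module behaves closely enough to iTerm's regex engine for these simple patterns. The sample prompts and the `report_user_host` helper are hypothetical and only for illustration.

```python
import re

# Trigger 1 from the list above: captures user and host from an SSH prompt.
PROMPT_RE = re.compile(r"^(\w+)@([\w\-]+):.*\$")
# Trigger 2: the banner printed when a Rails console loads the qa environment.
QA_RAILS_RE = re.compile(r"^Loading qa environment \(Rails [\d\.]+\)$")


def report_user_host(line):
    """Return (user, host) the way the trigger parameter would report them, or None."""
    match = PROMPT_RE.search(line)
    if match is None:
        return None
    # Mirrors the trigger parameter \1@\2: group 1 is the user, group 2 the host.
    user, host = match.expand(r"\1@\2").split("@", 1)
    return user, host


if __name__ == "__main__":
    # Hypothetical prompt lines; substitute what your own servers print.
    samples = [
        "deploy@qa-web:~/app $",
        "deploy@prod-web:/var/www $",
        "deploy@qa-web app $",  # no colon after the host, so trigger 1 does not match
    ]
    for line in samples:
        print(repr(line), "->", report_user_host(line))
```

If a prompt you expect to match comes back as `None`, adjust the regex (or the prompt) before debugging anything inside iTerm.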

Automatic Profile Switching

Automatically switch to this profile when the hostname changes to our local host (hydration, in the case of my computer).[3]

QA / Production

These are basically identical to each other, except for the automatic profile switching hostname. I copied these from the default profile and then changed the background color and name. The specific colors you use are not important as long as you can clearly differentiate the colors between environments and the production color strikes some sort of fear into you when you see it.

Trigger

When you see my special local prompt character (♦), set the iTerm host to the local machine name (hydration), since we want to switch back to the default profile at that point.

  • Regular Expression:
  • Action: “Report User & Host”
  • Parameters: @hydration
  • Instant: yes (explanation in its own section below)

Note: Having some sort of special local prompt is important to being able to use this approach. My guess is that you have customized your local prompt in some way so that you can either see the hostname in it or have some characters or patterns that are not typically encountered.

Automatic Profile Switching

Automatically switch to this profile when the iTerm hostname changes to the environment that we want. We would use qa-web for the QA profile, or prod-web for the production profile.
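To see how the triggers and the per-profile hostnames fit together, here is a toy model of the setup described above. It is not how iTerm implements profile switching. The profile names are made up, and the fallback for an unknown host (keep the current profile) is an assumption of this sketch.

```python
# Hostnames from this article mapped to example profile names.
PROFILE_FOR_HOST = {
    "hydration": "Default",
    "qa-web": "QA",
    "prod-web": "Production",
}


def next_profile(current_profile, reported_host):
    """Pick the profile to use after a trigger reports a host.

    Assumption: a host with no matching rule leaves the current profile unchanged.
    """
    return PROFILE_FOR_HOST.get(reported_host, current_profile)


if __name__ == "__main__":
    profile = "Default"
    # Simulated session: SSH into QA, exit, SSH into production, exit.
    for host in ["qa-web", "hydration", "prod-web", "hydration"]:
        profile = next_profile(profile, host)
        print(host, "->", profile)
```

The point of the model is that every transition depends on a trigger reporting a host. That is why the QA and production profiles need the local-prompt trigger to get back to the default.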

Testing

I usually work slowly and try to get one environment working first, and then try to get switching back to my default environment after that. You’ll know when you have things hooked up correctly when the colors change.

At first I tested by actually SSHing into the boxes, but this was slower than it needed to be. Since iTerm matches against your terminal output, you can just echo a test string and watch the profile change (or flash briefly, if you have switching back to the default profile configured).
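The same idea works with a small script instead of hand-typed echo commands. This is only a sketch: the sample lines are hypothetical and need to be replaced with text that your own triggers are written to match. The pause is there so you have time to see each color before the next line (or your local prompt) appears.

```python
import time

# Hypothetical sample lines; replace them with output your triggers expect.
SAMPLE_LINES = [
    "deploy@qa-web:~/app $",
    "Loading qa environment (Rails 4.2.0)",
]


def emit(lines, pause_seconds=0.0):
    """Print each line on its own row, pausing between lines; return the count."""
    count = 0
    for line in lines:
        print(line, flush=True)  # flush so the terminal sees the line right away
        time.sleep(pause_seconds)
        count += 1
    return count


if __name__ == "__main__":
    emit(SAMPLE_LINES, pause_seconds=1.0)
```

Run it in an iTerm tab that uses the profile holding the triggers you want to exercise.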

Instant or not?

“Instant” in the trigger definition refers to whether iTerm will wait for a newline before checking the output or not. Generally if something is in an interactive prompt, you probably want instant. If you don’t have instant enabled, then your profile won’t change until the second time the prompt is loaded because a newline won’t be provided until you press return/enter to finish inputting your command. I’d imagine that using instant is slightly slower since it constantly looks at the output, so I’d recommend not using it unless you are in an interactive prompt situation.
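The difference is easier to see in a toy simulation. The sketch below is a simplified model of the behavior described above, not iTerm's actual implementation. Output arrives in chunks: a non-instant trigger checks a line only once its newline arrives, while an instant trigger also checks the unfinished line after every chunk. The chunks and the `first_match_chunk` helper are hypothetical.

```python
import re

# The SSH prompt regex from the first trigger in this article.
PROMPT_RE = re.compile(r"^(\w+)@([\w\-]+):.*\$")


def first_match_chunk(chunks, pattern, instant):
    """Return the index of the chunk at which the trigger first fires, or None."""
    line = ""
    for index, chunk in enumerate(chunks):
        for char in chunk:
            if char == "\n":
                # A completed line is always checked, instant or not.
                if pattern.search(line):
                    return index
                line = ""
            else:
                line += char
        # Instant triggers also check the partial line after each chunk.
        if instant and pattern.search(line):
            return index
    return None


if __name__ == "__main__":
    # Simulated session: login banner, prompt (no newline yet), then the user
    # types a command and presses return.
    chunks = ["Welcome\n", "deploy@qa-web:~ $ ", "ls\n", "app\n", "deploy@qa-web:~ $ "]
    print("instant:", first_match_chunk(chunks, PROMPT_RE, instant=True))
    print("not instant:", first_match_chunk(chunks, PROMPT_RE, instant=False))
```

In this model the instant trigger fires as soon as the prompt is drawn (chunk 1), while the non-instant one fires only after the first command has been entered (chunk 2). For a prompt trigger, that is one command too late.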

Wrapping up

I think that the iTerm documentation is not yet perfect for this feature, so setting this up for my environment took a little time. But now that it’s written up, hopefully you can see how a setup like this works and can customize it for your environment with less effort. It’s not a perfect solution, but it has already been helpful. Also, it’s just cool to see your background color change when you run a command. I’d say the fifteen-minute investment is worth it to avoid doing something silly on a live server.


  1. See earlier note about having insufficient monitoring. If someone physically tells you your service is down or broken before you know about it, you don’t have enough monitoring in place! 

  2. Searching through previous history is especially awesome with fzf. I highly recommend it. 

  3. It subtly reminds me to drink more water. 

