Amazon introduced Elastic Compute Cloud (EC2) into the world on August 25, 2006. As of this writing, that’s more than 16 years ago. And I’ve been working with EC2 instances from almost the beginning - most often as a Developer, sometimes as a DevOps Engineer, and occasionally as a Data Scientist.
When it was introduced, it was a tremendous innovation. Suddenly, you could create a server on demand - seemingly out of thin air. And you paid only for the time you were using it. Terms like “cloud bursting” became common: you spun up extra resources when you needed them, then spun them down when you didn’t.
But while it’s been an enormous success for AWS, I’ve always found the actual experience of using EC2 instances to be rather clunky. Improvements have been made over the past 16 years, but there are certain aspects of the experience that I consider to be broken.
How I (would like to) use EC2 #
Keep in mind, of course, that the way I use EC2 is very much in a non-production capacity. Let me explain how I generally work, and how I would like to be able to use EC2 instances.
Like everyone else, I switch between various workflows during the day. I may spend time in meetings, making work plans, drawing up architectural diagrams, researching a topic, responding to emails and chats, or writing a blog article. Or I may be iterating rapidly on some tech stack. And at some of those times, I need an EC2 instance. I often find one useful for CI/CD pipeline development, where I am trying to work out some automation against a pristine environment. But I also use it for integration tests when I’m working on a SaaS component, and for exploring ‘big’ data sets. Sometimes I even use it for plain old development - though that has gotten less frequent thanks to Docker.
In other words, I would like to spin up an EC2 instance in an ad hoc manner, whenever my workflow directs me there. And that’s great (or at least it should be), because this is “cloud bursting” - and that’s what the cloud is really good for!
What’s the problem? #
Now, the problem with EC2 instances that are brought up ad hoc is that they often end up running even when not in use. And since AWS doesn’t know (or care to know) whether you’re actively using an EC2 instance, you end up paying for those idle cycles.
But “what’s the problem?” you might say. “Just turn off the EC2 instance when you’re done using it, and turn it back on when you need it again.”
Well, it’s not so easy.
First - there’s the human fallibility aspect. I’ve personally come across instances that I thought I had shut down but that had been running for days. When you’re switching between multiple workflows, it’s easy to lose track even of which instances you’ve brought up.
Second - there is the fact that once you shut it down, you need to start it up when you need it again. And there is unnecessary friction in starting EC2 instances. (Nothing irks people like me more than unnecessary friction…) You have to log in to the console in your web browser (and of course your Ops folks have mandated MFA on your account), navigate to the EC2 panel, find your instance, hit ‘start’, and wait while it picks up an IP address. That may not sound like a lot, but it always felt like 30+ seconds of work that should only take 3 seconds. And that feeling gets worse every time you have to do it.
Finally - there is the whole cost management aspect. Even if you are good about shutting down EC2 instances, and don’t mind the tedium of spinning them back up - you are still left in the dark as to just how much you have “cloud bursted”. It’s hard to stay within a budget when you’re not clear on just how much you’re spending. Yes, it’s possible to figure out how much you’ve spent on ad hoc EC2 instances - but AWS doesn’t make it easy.
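To give a flavor of what “possible but not easy” looks like: one workable approach is to tag your ad hoc instances and query Cost Explorer for spend carrying that tag. Here’s a sketch of the request you’d hand to boto3’s `get_cost_and_usage` - the `purpose=adhoc` tag is a hypothetical convention I’m making up for illustration, not anything AWS defines:

```python
from datetime import date


def adhoc_ec2_cost_params(start: date, end: date,
                          tag_key: str = "purpose",
                          tag_value: str = "adhoc") -> dict:
    """Build a Cost Explorer GetCostAndUsage request for tagged EC2 spend.

    The purpose=adhoc tag is a hypothetical convention - substitute
    whatever tag your team applies to ad hoc instances.
    """
    return {
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
        "Filter": {
            "And": [
                # Restrict to EC2 compute charges...
                {"Dimensions": {
                    "Key": "SERVICE",
                    "Values": ["Amazon Elastic Compute Cloud - Compute"]}},
                # ...on instances carrying the ad hoc tag.
                {"Tags": {"Key": tag_key, "Values": [tag_value]}},
            ]
        },
    }


# Pass the result to the Cost Explorer client, e.g.:
#   ce = boto3.client("ce")
#   resp = ce.get_cost_and_usage(
#       **adhoc_ec2_cost_params(date(2022, 9, 1), date(2022, 9, 30)))
```

Even then you still have to sum the daily results yourself - which is my point about AWS not making it easy.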
What can DevOps do? #
If you’re a DevOps person, and you’ve read this far, you may be thinking you have solved these problems. A DevOps solution might go something like this:
- Work with Developers and Data Scientists to come up with EC2 instance “templates” (i.e., base image plus initialization script). Build and maintain these with IaC.
- Create a set of IAM users (for all the Developers and Data Scientists) and attach a restricted policy allowing access to only the above EC2 instances.
- Train the Developers and Data Scientists to access the console, update their passwords, and use the EC2 dashboard.
- Create a Cost Explorer report that focuses on the above EC2 instances - and provide access to the IAM users.
- (Optionally) Create AWS Budgets covering the above EC2 instances to notify the above IAM users when the budget is forecast to be exceeded.
- (Optionally) Additionally train the Developers and Data Scientists to create/download access keys, install/configure the AWS command line client, and use these to stop/start EC2 instances.
- (Optionally) Shut down instances on some schedule, or when some (CloudWatch) metric falls below a threshold.
I’ve done all of these to various extents. I think it’s a lot of work (both upfront and ongoing), and far from perfect.
Update — April 2026 #
A few things have happened since I wrote this in 2022.
I built that Slack App. I called it Savi, and it still exists today (install link — it’s serverless and runs free, so I just leave it up). I no longer think a Slack App is the right shape for this problem, though. The friction I was trying to eliminate — clicks through the EC2 console — just got displaced into setup. You have to install Savi into your workspace, hand it an IAM role, configure which channels can use it. For what Savi ultimately does (start and stop EC2 instances from chat), that’s too much friction up front.
I’ve also automated this for a real client using the AWS Instance Scheduler — an ophthalmology practice that needed their imaging server up Mon–Fri 7am–6pm Saskatchewan time and off otherwise. That works great for predictable schedules. It does not work for the ad-hoc “I just need a box for the next two hours” case I originally described. And it requires someone like me to set up, which means it doesn’t generalize to every developer who wants to spin up a quick instance.
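For the curious, the per-instance side of the Instance Scheduler setup is mostly a tag: the solution watches for its configured tag key (`Schedule` by default) and starts/stops instances per the named schedule. A sketch of attaching it via boto3’s `create_tags` - the `office-hours` schedule name is an assumption, standing in for a schedule you’d define in the scheduler’s config (Mon–Fri 07:00–18:00, timezone America/Regina for the Saskatchewan case):

```python
def scheduler_tag_params(instance_id: str,
                         schedule_name: str = "office-hours") -> dict:
    """Build EC2 CreateTags kwargs that attach an Instance Scheduler schedule.

    "Schedule" is the solution's default tag key; "office-hours" is a
    hypothetical schedule assumed to be defined in the scheduler's own
    config (Mon-Fri 07:00-18:00, America/Regina).
    """
    return {
        "Resources": [instance_id],
        "Tags": [{"Key": "Schedule", "Value": schedule_name}],
    }


# boto3.client("ec2").create_tags(**scheduler_tag_params("i-0123456789abcdef0"))
```

Which illustrates the limitation: the schedule lives in config someone like me maintains, and the instance merely opts in.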
Where I am now: I think the actual answer in 2026 is a chat interface — but with an AI agent rather than slash commands. Something like:
“Spin up that GPU instance. Shut it down in two hours if you don’t hear from me.”
“What’s still running from yesterday?”
“Stop everything I’m not actively using.”
The pieces all exist now: AWS APIs, LLMs that handle intent reliably, a small persistent memory of which instances belong to whom. Stitch them together and the original friction problem goes away on both sides — no console, no IaC change for ad-hoc work, and (critically) no human-in-the-loop required to keep costs sane.
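To make the shape concrete: the agent side is just a small registry of tools wrapping the EC2 API, which the LLM chooses between based on intent. A minimal sketch of that dispatch layer - the tool names and the `owner` tag convention are my own assumptions, not any particular framework’s, and `ec2_client` is expected to be a boto3 EC2 client:

```python
def ec2_tools(ec2_client):
    """Return the tool functions an LLM agent would be handed.

    ec2_client is assumed to be a boto3 EC2 client; the "owner" tag is
    an assumed convention for remembering whose instances are whose.
    """
    def list_running(owner: str) -> list:
        # Answers "what's still running from yesterday?"
        resp = ec2_client.describe_instances(Filters=[
            {"Name": "instance-state-name", "Values": ["running"]},
            {"Name": "tag:owner", "Values": [owner]},
        ])
        return [i["InstanceId"]
                for r in resp["Reservations"] for i in r["Instances"]]

    def start(instance_id: str) -> None:
        ec2_client.start_instances(InstanceIds=[instance_id])

    def stop(instance_id: str) -> None:
        ec2_client.stop_instances(InstanceIds=[instance_id])

    # "Shut it down in two hours" becomes a delayed call to stop(),
    # scheduled however you like (EventBridge Scheduler, a cron, etc.).
    return {"list_running": list_running, "start": start, "stop": stop}
```

The LLM handles the messy part (mapping “stop everything I’m not actively using” onto these calls); the tools themselves stay boring.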
I’ll write more about this when I have something to show.