Amazon introduced Elastic Compute Cloud (EC2) into the world on August 25, 2006. As of this writing, that’s more than 16 years ago. And I’ve been working with EC2 instances from almost the beginning - most often as a Developer, sometimes as a DevOps Engineer, and occasionally as a Data Scientist.
When it was introduced, it was a tremendous innovation. Suddenly, you could create a server on demand - seemingly out of thin air. And you paid only for the time you were using it. Terms like “cloud bursting” became common: you spun up extra resources when you needed them, then spun them down when you didn’t.
But while it’s been an enormous success for AWS, I’ve always found the actual experience of using EC2 instances to be rather clunky. Improvements have been made over the past 16 years, but there are certain aspects of the experience that I consider to be broken.
How I (would like to) use EC2 #
Keep in mind, of course, that the way I use EC2 is very much in a non-production capacity. Let me explain how I generally work, and how I would like to be able to use EC2 instances.
Like everyone else, I switch between various workflows during the day. I may spend time in meetings, making work plans, drawing up architectural diagrams, researching a topic, responding to emails and chats, or writing a blog article. Or I may be iterating rapidly on some tech stack. And at some of those times, I need an EC2 instance. I often find one useful for CI/CD pipeline development, where I am trying to work out some automation against a pristine environment. But I also use it for integration tests when I’m working on a SaaS component, and for exploring ‘big’ data sets. Sometimes I even use it for plain old development - though that has gotten less frequent thanks to Docker.
In other words, I would like to spin up an EC2 instance in an ad hoc manner, whenever my workflow directs me there. And that’s great (or at least it should be), because this is “cloud bursting” - and that’s what the cloud is really good for!
What’s the problem? #
Now, the problem with EC2 instances that are brought up ad hoc is that they often end up running even when not in use. And since AWS doesn’t know (or care to know) whether you’re actively using an EC2 instance, you end up paying for those idle cycles.
But “what’s the problem?” you might say. “Just turn off the EC2 instance when you’re done using it, and turn it back on when you need it again.”
Well, it’s not so easy.
First - there’s the human fallibility aspect. I’ve personally come across instances that I thought I had shut down but that had been running for days. When you’re switching between multiple workflows, it’s easy to lose track even of which instances you’ve brought up.
Second - there is the fact that once you shut it down, you need to start it up when you need it again. And there is unnecessary friction in starting EC2 instances. (Nothing irks people like me more than unnecessary friction…) You have to log in to the console in your web browser (and of course your Ops folks have mandated MFA on your account), navigate to the EC2 panel, find your instance, hit ‘start’, and wait while it picks up an IP address. That may not sound like a lot, but it always felt like 30+ seconds of work that should only take 3 seconds. And that feeling gets worse every time you have to do it.
Finally - there is the whole cost management aspect. Even if you are good about shutting down EC2 instances, and don’t mind the tedium of spinning them back up - you are still left in the dark as to just how much you have “cloud bursted”. It’s hard to stay within a budget when you’re not clear on just how much you’re spending. Yes, it’s possible to figure out how much you’ve spent on ad hoc EC2 instances - but AWS doesn’t make it easy.
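To give a flavor of what “possible but not easy” looks like: one workable approach is to tag your ad hoc instances and query Cost Explorer for spend carrying that tag. Here’s a sketch of the request you’d hand to boto3’s `get_cost_and_usage` - the `purpose=adhoc` tag is a hypothetical convention I’m making up for illustration, not anything AWS defines:

```python
from datetime import date


def adhoc_ec2_cost_params(start: date, end: date,
                          tag_key: str = "purpose",
                          tag_value: str = "adhoc") -> dict:
    """Build a Cost Explorer GetCostAndUsage request for tagged EC2 spend.

    The purpose=adhoc tag is a hypothetical convention - substitute
    whatever tag your team applies to ad hoc instances.
    """
    return {
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
        "Filter": {
            "And": [
                # Restrict to EC2 compute charges...
                {"Dimensions": {
                    "Key": "SERVICE",
                    "Values": ["Amazon Elastic Compute Cloud - Compute"]}},
                # ...on instances carrying the ad hoc tag.
                {"Tags": {"Key": tag_key, "Values": [tag_value]}},
            ]
        },
    }


# Pass the result to the Cost Explorer client, e.g.:
#   ce = boto3.client("ce")
#   resp = ce.get_cost_and_usage(
#       **adhoc_ec2_cost_params(date(2022, 9, 1), date(2022, 9, 30)))
```

Even then you still have to sum the daily results yourself - which is my point about AWS not making it easy.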
What can DevOps do? #
If you’re a DevOps person, and you’ve read this far, you may be thinking you have solved these problems. A DevOps solution might go something like this:
- Work with Developers and Data Scientists to come up with EC2 instance “templates” (i.e., base image plus initialization script). Build and maintain these with IaC.
- Create a set of IAM users (for all the Developers and Data Scientists) and attach a restricted policy allowing access to only the above EC2 instances.
- Train the Developers and Data Scientists to access the console, update their passwords, and use the EC2 dashboard.
- Create a Cost Explorer report that focuses on the above EC2 instances - and provide access to the IAM users.
- (Optionally) Create AWS Budgets covering the above EC2 instances to notify the above IAM users when the budget is forecast to be exceeded.
- (Optionally) Additionally train the Developers and Data Scientists to create/download access keys, install/configure the AWS command line client, and use these to stop/start EC2 instances.
- (Optionally) Shut down instances on some schedule, or when some (CloudWatch) metric falls below a threshold.
I’ve done all of these to various extents. I think it’s a lot of work (both upfront and ongoing), and far from perfect.
Update — April 2026 #
A few things have happened since I wrote this in 2022.
I built that Slack App. I called it Savi, and it still exists today (install link — it’s serverless and runs free, so I just leave it up). I no longer think a Slack App is the right shape for this problem, though. The friction I was trying to eliminate — clicks through the EC2 console — just got displaced into setup. You have to install Savi into your workspace, hand it an IAM role, configure which channels can use it. For what Savi ultimately does (start and stop EC2 instances from chat), that’s too much friction up front.
I’ve also automated this for a real client using the AWS Instance Scheduler — an ophthalmology practice that needed their imaging server up Mon–Fri 7am–6pm Saskatchewan time and off otherwise. That works great for predictable schedules. It does not work for the ad-hoc “I just need a box for the next two hours” case I originally described. And it requires someone like me to set up, which means it doesn’t generalize to every developer who wants to spin up a quick instance.
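For the curious, the per-instance side of the Instance Scheduler setup is mostly a tag: the solution watches for its configured tag key (`Schedule` by default) and starts/stops instances per the named schedule. A sketch of attaching it via boto3’s `create_tags` - the `office-hours` schedule name is an assumption, standing in for a schedule you’d define in the scheduler’s config (Mon–Fri 07:00–18:00, timezone America/Regina for the Saskatchewan case):

```python
def scheduler_tag_params(instance_id: str,
                         schedule_name: str = "office-hours") -> dict:
    """Build EC2 CreateTags kwargs that attach an Instance Scheduler schedule.

    "Schedule" is the solution's default tag key; "office-hours" is a
    hypothetical schedule assumed to be defined in the scheduler's own
    config (Mon-Fri 07:00-18:00, America/Regina).
    """
    return {
        "Resources": [instance_id],
        "Tags": [{"Key": "Schedule", "Value": schedule_name}],
    }


# boto3.client("ec2").create_tags(**scheduler_tag_params("i-0123456789abcdef0"))
```

Which illustrates the limitation: the schedule lives in config someone like me maintains, and the instance merely opts in.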
Where I am now: I think the actual answer in 2026 is a chat interface — but with an AI agent rather than slash commands. Something like:
“Spin up that GPU instance. Shut it down in two hours if you don’t hear from me.”
“What’s still running from yesterday?”
“Stop everything I’m not actively using.”
The pieces all exist now: AWS APIs, LLMs that handle intent reliably, a small persistent memory of which instances belong to whom. Stitch them together and the original friction problem goes away on both sides — no console, no IaC change for ad-hoc work, and (critically) no human-in-the-loop required to keep costs sane.
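To make the shape concrete: the agent side is just a small registry of tools wrapping the EC2 API, which the LLM chooses between based on intent. A minimal sketch of that dispatch layer - the tool names and the `owner` tag convention are my own assumptions, not any particular framework’s, and `ec2_client` is expected to be a boto3 EC2 client:

```python
def ec2_tools(ec2_client):
    """Return the tool functions an LLM agent would be handed.

    ec2_client is assumed to be a boto3 EC2 client; the "owner" tag is
    an assumed convention for remembering whose instances are whose.
    """
    def list_running(owner: str) -> list:
        # Answers "what's still running from yesterday?"
        resp = ec2_client.describe_instances(Filters=[
            {"Name": "instance-state-name", "Values": ["running"]},
            {"Name": "tag:owner", "Values": [owner]},
        ])
        return [i["InstanceId"]
                for r in resp["Reservations"] for i in r["Instances"]]

    def start(instance_id: str) -> None:
        ec2_client.start_instances(InstanceIds=[instance_id])

    def stop(instance_id: str) -> None:
        ec2_client.stop_instances(InstanceIds=[instance_id])

    # "Shut it down in two hours" becomes a delayed call to stop(),
    # scheduled however you like (EventBridge Scheduler, a cron, etc.).
    return {"list_running": list_running, "start": start, "stop": stop}
```

The LLM handles the messy part (mapping “stop everything I’m not actively using” onto these calls); the tools themselves stay boring.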
I’ll write more about this when I have something to show.