Moving away from AWS lambda/SQS/SNS/Aurora, worth it?
Given an opportunity that has come up, I am assessing the cost of doing on-premise deployments and maybe partially or fully moving away from AWS.
Our current stack is EC2/Lambda/SQS/SNS/Aurora/S3 plus some minor networking setup. AWS deployments are made via Serverless Framework v3. All the code is in Node.js. NGINX is used for routing.
The main benefits I see in moving away from AWS are: 1. Ease of deployment: simply rsync to all servers and run migrations on the sharded database. 2. No vendor lock-in. 3. Cost savings: our costs are minor for now, but the bill is steadily increasing.
And my main fears are:
1. Managed services: SQS/SNS/lambda/Aurora are managed for autoscaling. From experience, is it really necessary or does a bigger server do the trick? 2. Actual migration effort: we are a lean team but we found that migrating away from other services (Cognito, DynamoDB) was easier than expected. 3. Worse service: can SQS/SNS/lambda easily be replaced without feature loss? I am looking at RabbitMQ.
If anyone has done a similar migration, how did it go for you? Also, I am talking about on-premise, but is it the best solution to mitigate the risk of service interruption?
The problem I have seen with on-premises is the inability to scale up. E.g. you have a server room and you don't have enough power, AC, or space to add another server. At some point you hit a problem where the cost is no longer linear - suddenly you need a new building or some piece of super expensive equipment. Since you may not be making money when this happens, the company might get very reluctant to blow the cash to expand just to handle 5 more users, and will start suggesting that the software needs some complicated change to fit the available hardware. This might turn the software into a pile of complicated crap.
In your case your customer probably thinks they have limited requirements, so running it themselves is going to be no problem. It's going to be essentially a testing problem for you. Forever afterwards you will need a way to check that your system has not grown any hard dependencies on AWS and will still work in the client's setup.
Every update to the system is going to require some special consideration for that customer. You won't have their data on hand to check that your migration procedure is perfect. They may decide that they don't want your latest upgrade but that means that one day there will be a big, scary jump and you won't be able to monitor it and check that it went ok.
I think it's worth trying to ensure that you're solving the correct problem - if you simply made the effort to test your code on an alternate cloud service all the time to show that you have a recovery plan ... would that be enough to make the sale?
All your points are valid and can be included in the contract:
- We'll be the ones to choose the cloud provider
- We'll take servers that are big enough
- The client should upgrade their server according to our specs
- Upgrades are mandatory
Any idea what other (simpler) recovery plans we can have besides on-premise?
Firstly, I'd test what level of on-prem they are really asking for, because there are various shades of this:
- servers actually in their office
- their own servers in a datacenter (DC) near their office
- someone else's servers of which they are the only users, in a DC anywhere
- deployment into their own AWS account
Which one they actually need has an effect on the kind of options available to you.
> From experience, is it really necessary or does a bigger server do the trick?
In our experience big servers do indeed do the trick. And if your costs on Lambda/SQS/etc are minor, then I imagine any new server is going to have 10x-100x the capacity you actually need.
> 3. Worse service: can SQS/SNS/lambda easily be replaced without feature loss?
Broadly speaking: yes. SQS can go to Redis/Valkey queues (ideally), or something heavier if needed. SNS also has good options, depending on the features you are using (FanOut -> Redis Streams, or Kafka; Email -> pick your provider). Lambda I would just replace with Kubernetes pods, but I'm a Kubernetes guy so hey.
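To make the SQS swap concrete, here's a rough in-process sketch of an SQS-style interface (the class and method names are made up; in production you'd back the same interface with Redis/Valkey, e.g. LPUSH to enqueue and BRPOP to dequeue via a client like ioredis, rather than an in-memory array):

```javascript
// In-process sketch of an SQS-style queue interface. The names here are
// illustrative; in production the same interface would be backed by
// Redis/Valkey (LPUSH to enqueue, BRPOP to dequeue, e.g. via ioredis)
// rather than an in-memory array.
class SimpleQueue {
  constructor() {
    this.messages = [];
    this.nextId = 1;
  }

  // Mirrors SQS sendMessage: enqueue one message body.
  sendMessage(body) {
    this.messages.push({ id: String(this.nextId++), body });
  }

  // Mirrors SQS receiveMessage: dequeue up to `max` messages.
  receiveMessage(max = 1) {
    return this.messages.splice(0, max);
  }
}

const q = new SimpleQueue();
q.sendMessage(JSON.stringify({ orderId: 42 }));
const [msg] = q.receiveMessage();
console.log(msg.body);
```

The point is that if your producers/consumers only touch a thin interface like this, swapping the backing store is a small, contained change.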
> Also I am talking about on-premise but is it the best solution to mitigate the risk of service interruption?
I think it depends on what you are deploying. Most likely it will be easier for you to centralise more. What we (business hat on now) would suggest is a Kubernetes-managed cluster of physical servers, with the ability to isolate clients to their own server(s). This runs your enterprise clients separately from your public deployment, makes it easy to scale to new enterprise customers, easy to scale when individual customers grow, and provides easy migration of clients between servers in case of hardware failure. You could also offer your customers an HA option which runs your software over three servers, with HA failover of services. Our company would then run it all for you, but you'd still have full access. (Removes business hat)
[1]: https://lithus.eu
I should clarify that the on-premise deployments will be into the clients' own AWS accounts. We can manage the resources ourselves.
We have a few fanouts that can be refactored. So Redis/Valkey for SQS is OK; hopefully it can also cover our SNS use case.
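The fanout shape we'd need to replicate is roughly this (an in-process sketch with hypothetical names; with Redis it would map to Streams with one consumer group per subscriber, or plain pub/sub):

```javascript
// In-process sketch of SNS-style fanout: every subscriber receives every
// published message. With Redis this maps to Streams (XADD plus one
// consumer group per subscriber) or plain pub/sub; the names here are
// illustrative only.
class Topic {
  constructor() {
    this.subscribers = [];
  }
  subscribe(fn) {
    this.subscribers.push(fn);
  }
  publish(message) {
    for (const fn of this.subscribers) fn(message);
  }
}

const orders = new Topic();
const received = [];
orders.subscribe((m) => received.push(["billing", m.orderId]));
orders.subscribe((m) => received.push(["email", m.orderId]));
orders.publish({ orderId: 42 });
console.log(received);
```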
I am afraid Kubernetes is overkill for our lambda needs.
If we manage to bundle our whole app on one server and have only 1-2 clients on-premise, do you still suggest Kubernetes, or is a simple rsync to all servers enough?
Also, should we have a separate database instance for each client, or a sharded Postgres cluster? The latter seems more manageable.
rsync is nice and simple. Personally I'd say at least use Ansible, with its built-in rsync support. Then you can do more than just copy files.
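For example, a minimal (hypothetical) playbook that syncs the code and restarts the service might look like this - the host group, paths, and unit name are placeholders:

```yaml
# Hypothetical playbook: sync the app to every server, then restart it.
- hosts: app_servers
  tasks:
    - name: Sync application code (wraps rsync)
      ansible.posix.synchronize:
        src: ./dist/
        dest: /opt/app/
    - name: Restart the app service
      ansible.builtin.systemd:
        name: app
        state: restarted
```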
> Also, should we have a separate database instance for each client, or a sharded Postgres cluster? The latter seems more manageable.
Depends on the size. Run Postgres separately for your on-prem clients. For your cloud clients, I'd say keep them on the same server until you start to get over 100GB-1TB of data, then start to think about sharding. RDS gets super expensive, so sharding too early may be uneconomical.
> I am afraid Kubernetes is overkill for our lambda needs.
For just Lambda, I agree. But if you're running everything outside of AWS (i.e. racking servers) then it shines, because then you have your app, Postgres, Valkey, etc. all balanced between servers.