Hello everyone. I have a small service I've been developing as a side gig for the last year, and I want to self-host it: I got some new hardware and want to dive deeper into self-hosting and DevOps (I'm a dev).
For now, I have it set up on a couple of EC2 instances, but things are getting expensive. I also got some workstations from a friend who was decommissioning them (they're moving to a new place and have a bigger budget for a data center): 2 Lenovo workstations, each with a Ryzen 9 5900X, 32 GB RAM, and a 1 TB NVMe drive.
I want to set up HA for this service and maybe add another computer as a NAS. So my question is: how should I go about this? I was thinking of running Proxmox on both nodes and setting up HA, but I think that needs a common data store (I'll probably use the NAS for this). How should I set up the HTTP server? I have some stuff that also runs in Docker containers, so I was thinking of having one HTTP server to manage all that traffic, with DNS and so on. And what about monitoring? Any help is much appreciated.
Think about backup solutions and separate storage for all your data as well.
Forgive my ignorance. What is “HA” in this context?
HA (high availability) involves many factors: service uptime, link uptime, DB uptime, etc. I'd probably put a reverse proxy in front and use the servers as upstreams. Web servers tend to be more reliable than the apps behind them, so in your case a single proxy instance ought to suffice.
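To make the "reverse proxy in front, servers as upstreams" idea concrete, here is a minimal Python sketch of the decision a proxy makes per request: rotate through the upstream list and skip any node whose health check fails. It is only a toy model; real proxies such as nginx or HAProxy do this internally through their own upstream/backend configuration. The hostnames `node1`/`node2`, port 8080, and the `UpstreamPool` class are hypothetical.

```python
import itertools
from typing import Callable, Iterable, Optional


class UpstreamPool:
    """Round-robin over upstream servers, skipping ones that fail a health check.

    Hypothetical toy model of a reverse proxy's upstream selection.
    """

    def __init__(self, upstreams: Iterable[str], is_healthy: Callable[[str], bool]):
        self.upstreams = list(upstreams)
        self.is_healthy = is_healthy
        self._cycle = itertools.cycle(self.upstreams)

    def pick(self) -> Optional[str]:
        # Try each upstream at most once per request; None means every node is down.
        for _ in range(len(self.upstreams)):
            candidate = next(self._cycle)
            if self.is_healthy(candidate):
                return candidate
        return None


if __name__ == "__main__":
    down = {"http://node1:8080"}  # pretend node1 is stuck
    pool = UpstreamPool(["http://node1:8080", "http://node2:8080"],
                        lambda url: url not in down)
    print(pool.pick())  # node2 keeps serving while node1 is down
```

The point of this design is that one stuck node (like a VM that ran out of RAM) only removes itself from rotation; traffic keeps flowing to the other one, as long as the data layer behind both nodes is still consistent.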
Aside from actual HA tools, your most important assets at this stage are an uptime check service that pings your server every n seconds, a reliable backup/restore procedure, and a one-button deployment strategy.
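As an illustration of the "ping your server every n seconds" part, here is a minimal Python sketch of the core loop an uptime checker runs. Existing tools already do this (with notifications built in), so treat this as an explanation rather than something to deploy. It assumes the app exposes some HTTP health endpoint; the URL in the usage line below is hypothetical, and `print` stands in for a real alert channel.

```python
import http.client
import time
import urllib.request
from typing import Callable, Optional


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with a non-error status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (4xx/5xx), timeouts and refused connections are OSError subclasses.
        return False


def watch(probe: Callable[[], bool],
          alert: Callable[[str], None],
          interval_s: float = 30.0,
          fail_threshold: int = 3,
          max_checks: Optional[int] = None,
          sleep: Callable[[float], None] = time.sleep) -> None:
    """Run `probe` every `interval_s` seconds; alert once after `fail_threshold`
    consecutive failures, and again when the service recovers."""
    failures = 0
    alerted = False
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe():
            if alerted:
                alert("RECOVERED")
            failures, alerted = 0, False
        else:
            failures += 1
            if failures >= fail_threshold and not alerted:
                alert(f"DOWN after {failures} consecutive failed checks")
                alerted = True
        sleep(interval_s)
```

Usage would look like `watch(lambda: http_probe("http://node1:8080/health"), print)`. Requiring several consecutive failures before alerting avoids being paged for a single dropped request, and alerting only once per outage keeps the noise down.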
Shit can and probably will happen. What are you going to do when it does? And how fast can you respond? I say this because you most likely won't get HA right the first, second, or third time unless you already have tons of experience behind you. Embrace failure and plan accordingly.
True. In my current setup on EC2 I have two Postgres DBs, one of which is just a replica, but I had an incident where a bug I wrote in my Spring app ate all the available RAM, the VM got stuck, and I lost about 6 hours' worth of user data. That's why I'm thinking HA could help: if a VM on one node gets stuck or blocked, the hypervisor would spin up another one on a different node. Or am I wrong here?
If you had monitoring, you wouldn’t have taken 6 hours to catch it.
I'd say learn HA anyway because it's a good skill, but that doesn't stop you from also having the other parts I mentioned. I say this because, again, unless you are experienced with HA, there will be edge cases where it won't do what you thought it would, and your service will be down all the same. Monitoring/alerting and a one-click install (or install shell script) will be much more valuable in the short to mid term.
True, I did have Uptime Kuma for simple monitoring, but the 6-hour downtime was on me: I was unreachable. I was inside a factory doing some troubleshooting with no cell service and forgot to ask for the Wi-Fi credentials 🤦‍♂️ (I was stressed that day). As for rollback and install, I run those through GitHub Actions for CI/CD.
but I think that needs a common Data store (probably will be using the NAS for this)
Note that this setup could be a single point of failure. For true HA, you might want to consider deploying two storage boxes with replication (SAN or NAS), or configuring a hyper-converged infrastructure using solutions like VMware vSAN, Ceph, or StarWind VSAN.
True, I was thinking about using TrueNAS for this, but I don't know if TrueNAS has any way of doing replication with another node.
Can your service run as HA without hypervisor HA? Like, is it a web service with a database backend?
TrueNAS Enterprise, with its cluster mirroring, serving HA iSCSI for your hypervisor storage?
I'm getting out of the mom-and-pop web and mail hosting business and doing a very slow, long shutdown myself by moving out of co-location and into my basement. So it's just TrueNAS CORE, plus ESXi with vSphere Essentials to do the VMware snapshots for that end of the business.
https://geek-cookbook.funkypenguin.co.nz/ (it has chapters on distributed data stores, etc.)
Thanks, much appreciated!