If it’s for a single low-frequency workflow, that GPU is enough, but you’ll be limited to small models, which are mostly useless unless fine-tuned for your use case. If it’s serving users across the entire company with a model big enough to be useful, you’d need 192-384 GB of VRAM, so a server in the $20k-$40k range.
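As a rough back-of-the-envelope check (the multipliers here are assumptions, not measured numbers): weights take roughly params × bytes-per-param, plus some headroom for KV cache and activations.

```python
def vram_gb(params_b: float, bytes_per_param: float = 2.0, overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GB: weight memory times an assumed
    ~20% overhead for KV cache and activations."""
    return params_b * bytes_per_param * overhead

# A 70B-parameter model at FP16 (2 bytes/param):
print(round(vram_gb(70), 1))   # ~168 GB, i.e. multi-GPU territory

# An 8B model quantized to ~1 byte/param fits on a single consumer card:
print(round(vram_gb(8, bytes_per_param=1.0), 1))
```

Quantization (e.g. 4-bit) shrinks the weight term, but long contexts and many concurrent users inflate the KV-cache overhead well past the 20% assumed here.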
The server will also require ongoing maintenance, and somebody will have to develop the workflow and the integration with your data.
It’s also important to know what they actually want to do: a basic embedding model for semantic search would work fine; agents, not so much.
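To illustrate why the embedding use case is cheap: semantic search is just nearest-neighbor lookup over vectors, which a small model can produce. A toy sketch (the 3-dimensional vectors below are made up for illustration; a real setup would get hundreds of dimensions from an actual embedding model):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical document embeddings (in practice: output of an embedding model).
docs = {
    "invoice approval policy": [0.9, 0.1, 0.0],
    "vacation request policy": [0.1, 0.9, 0.1],
}

query = [0.8, 0.2, 0.1]  # embedding of the user's search query
best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)  # -> "invoice approval policy"
```

The whole pipeline is one forward pass per document plus a similarity ranking, so it runs comfortably on modest hardware; agents, by contrast, need a large model making many sequential generation calls per task.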