Hello internet users. I have tried GPT4All and like it, but it is very slow on my laptop. I was wondering if anyone here knows of a solution I could run on my server (Debian 12, AMD CPU, Intel Arc A380 GPU) and use through a web interface. Has anyone found a good way to do this?
There is an easy way with Open WebUI, but LLMs are mostly accelerated with CUDA or ROCm. Running them on the CPU is slow, but you can try it.
kobold.cpp is easy to use and fast, and I like it.
If you’re interested in more relevant Lemmy communities:
(another option: text-generation-webui has several backends bundled. Maybe one of those works for you.)
From what I’ve seen, text-generation-webui is kind of the standard way to run these with a web UI, but the VRAM comments here are accurate. Text LLMs require an insane amount of VRAM to keep a conversation going.
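To put a rough number on the "keep a conversation going" part: besides the weights, the model holds a key/value cache that grows with every token of context. Here is a back-of-the-envelope sketch, assuming a Llama-2-7B-shaped model (32 layers, 32 KV heads, head dimension 128) and a 16-bit cache. The function name is made up for illustration, and real backends add their own overhead on top.

```python
GIB = 2**30

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Approximate KV cache size: one key and one value vector per layer, per token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value  # 2 = key + value
    return per_token * context_tokens

# Assumed shape: Llama-2-7B (32 layers, 32 KV heads, head_dim 128), 16-bit cache.
for context in (512, 2048, 4096):
    size = kv_cache_bytes(32, 32, 128, context)
    print(f"{context:>5} tokens of context: about {size / GIB:.2f} GiB of cache")
```

So the cache alone lands at roughly 2 GiB for a full 4096-token context with this shape, on top of whatever the weights take.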
I tried Hugging Face TGI yesterday, but all of the reasonable models need at least 16 GB of VRAM. The only model I got working (on a desktop machine with an AMD 6700 XT GPU) was Microsoft phi-2.
Thanks to this post and the other comments in here, I’ve discovered that the ultimate UI for AI models may well be
https://github.com/ParisNeo/lollms-webui
and on Hugging Face (that name is awful: to me it is the creepy, horrible facehugger from the movie Alien, which I saw so many decades ago) TheBloke has some models which are smaller
https://huggingface.co/TheBloke/
so you can choose a model that will actually work on your hardware.
I think Llama-2 for brainstorming and CodeLlama-instruct for learning from programming examples seem to be the cleanest pair, from what I’ve read, and he’s got GGUF versions with different quantizations, so you can choose what will actually fit on your hardware.
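For picking a quantization, a rough rule is parameters times bits per weight, divided by 8, gives the bytes needed for the weights. A minimal sketch of that arithmetic follows. The bits-per-weight figures are approximations (actual GGUF file sizes vary by quant variant), the 1 GiB headroom is a guess for context and buffers, and the 6 GiB card is just an example value, so treat the output as a first filter and not a guarantee.

```python
GIB = 2**30

def weights_gib(params_billions, bits_per_weight):
    """Approximate size of the model weights in GiB."""
    return params_billions * 1e9 * bits_per_weight / 8 / GIB

def fits(params_billions, bits_per_weight, vram_gib, headroom_gib=1.0):
    """True if the weights plus an assumed flat headroom fit in the given VRAM."""
    return weights_gib(params_billions, bits_per_weight) + headroom_gib <= vram_gib

VRAM_GIB = 6.0  # example value; substitute your own card's memory

for name, params in (("7B", 7), ("13B", 13)):
    for label, bits in (("16-bit", 16), ("8-bit", 8), ("~4.5-bit quant", 4.5)):
        size = weights_gib(params, bits)
        verdict = "fits" if fits(params, bits, VRAM_GIB) else "too big"
        print(f"{name} at {label}: about {size:.1f} GiB of weights, {verdict} in {VRAM_GIB:.0f} GiB")
```

This also lines up with the 16 GB comment above: an unquantized 16-bit 7B model is already around 13 GiB of weights before any context is loaded.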
There are other models on Hugging Face which seem very useful, like
- whisper-large-v3 for speech-to-text,
- whisperspeech for text-to-speech,
- sdxl-turbo for image-making (for some copyright-free subjects to practice drawing with), and so on…
Some models require a GPU, but not all.
Damn things moved fast!