Self hosted LLM

HumanPerson@sh.itjust.works · 9 months ago


Morethanevil@lemmy.fedifriends.social · 9 months ago

There is an easy way with OpenWebUI, but LLMs are mostly accelerated with CUDA or ROCm. Running them on the CPU is slow, but you can try it.
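If you are not sure which of those stacks your machine has, a rough check is to look for the vendor command-line tools. This is only a sketch: finding `nvidia-smi` or `rocminfo` on the PATH is a heuristic and does not prove that a given LLM backend will actually use the GPU. The function name `guess_backend` is made up for this example.

```python
import shutil


def guess_backend(which=shutil.which):
    """Guess the accelerator stack from vendor CLI tools found on PATH.

    Heuristic only: a tool being installed does not guarantee a working GPU setup.
    """
    if which("nvidia-smi"):   # ships with the NVIDIA driver
        return "cuda"
    if which("rocminfo"):     # ships with AMD ROCm
        return "rocm"
    return "cpu"              # fall back to (slow) CPU inference


print(guess_backend())
```

The `which` parameter is only there so the lookup can be swapped out when trying the function on a machine without either tool.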

h3ndrik@feddit.de · edit-2 9 months ago

kobold.cpp is easy to use, fast and I like it.

If you’re interested in more relevant Lemmy communities:

(another option: text-generation-webui has several backends bundled. Maybe one of those works for you.)

Scrubbles@poptalk.scrubbles.tech · 9 months ago

text-generation-webui is kind of the standard, from what I’ve seen, for running a model with a web UI, but the VRAM stuff here is accurate. Text LLMs require an insane amount of VRAM to keep a conversation going.
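One reason a long conversation eats VRAM is the key/value cache, which grows with every token of context. A back-of-the-envelope sketch, assuming a Llama-2-7B-like shape (32 layers, 32 KV heads, head dimension 128) and 16-bit cache values; these numbers are assumptions for illustration, and other models or cache settings will differ:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_value=2):
    """Approximate KV cache size: one key and one value vector per layer per token."""
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_value


# Assumed Llama-2-7B-like dimensions (illustrative, not read from a model file).
per_token = kv_cache_bytes(32, 32, 128, n_tokens=1)
full_context = kv_cache_bytes(32, 32, 128, n_tokens=4096)

print(per_token / 2**20, "MiB per token")          # 0.5 MiB per token
print(full_context / 2**30, "GiB at 4096 tokens")  # 2.0 GiB at 4096 tokens
```

That cache sits on top of the model weights themselves, so a full context window costs a couple of extra GiB under these assumptions.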

passepartout@feddit.de · 9 months ago

I tried Hugging Face TGI yesterday, but all of the reasonable models need at least 16 GB of VRAM. The only model I got working (on a desktop machine with an AMD 6700 XT GPU) was Microsoft phi-2.
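A rough way to see why: unquantized 16-bit weights take about two bytes per parameter. The sketch below only counts weights (no KV cache, no runtime overhead), and the parameter counts are round figures chosen for illustration.

```python
def weight_gib(n_params, bits_per_weight=16):
    """Approximate size of the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


# Round parameter counts, used here only as examples.
for name, n_params in [("phi-2 (~2.7B)", 2.7e9), ("7B model", 7e9), ("13B model", 13e9)]:
    print(f"{name}: {weight_gib(n_params):.1f} GiB at 16-bit")
```

With these numbers, a model around 2.7B parameters fits comfortably in the 12 GB of a 6700 XT at 16-bit, while a 7B model already needs about 13 GiB for the weights alone.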

Paragone@lemmy.world · 9 months ago

Thanks to this post, and the other comments in here, I’ve discovered that the ultimate ui for ai-models may well be

https://github.com/ParisNeo/lollms-webui

and on HuggingFace ( that name is awful: to me it is the creepy-horrible FaceHugger, from the movie Alien, that I saw so many decades ago ) TheBloke has some models which are smaller

https://huggingface.co/TheBloke/

so you can choose a model that will actually-work on your hardware.

I think Llama-2 for brainstorming & CodeLlama-instruct for learning programming examples seem to be the cleanest pair, from what I’ve read, and he’s got GGUF versions with different quantizations, so you can choose what will actually-fit on your hardware.
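Choosing a quantization mostly comes down to arithmetic: file size is roughly parameters times bits per weight. A minimal sketch, assuming the block-based llama.cpp formats `Q4_0`, `Q5_0` and `Q8_0` (4.5, 5.5 and 8.5 bits per weight once the per-block scale is included, as I understand them). The K-quant variants are left out because their effective sizes vary, and the `headroom_gib` value for context and runtime overhead is a guess, not a measured figure.

```python
# Approximate bits per weight, including per-block scale factors (assumed values).
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q5_0": 5.5, "Q4_0": 4.5}


def pick_quant(n_params, budget_gib, headroom_gib=1.5):
    """Return (name, size_gib) of the largest format that fits, or None.

    headroom_gib is a rough allowance for KV cache and runtime overhead.
    """
    by_quality = sorted(BITS_PER_WEIGHT.items(), key=lambda item: -item[1])
    for name, bits in by_quality:
        size_gib = n_params * bits / 8 / 2**30
        if size_gib + headroom_gib <= budget_gib:
            return name, size_gib
    return None


print(pick_quant(7e9, budget_gib=8))   # a 7B model with 8 GiB available
print(pick_quant(7e9, budget_gib=12))  # the same model with 12 GiB available
```

The real file sizes are listed on each model page, so treat this as a sanity check before downloading rather than a replacement for reading them.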

There are other models on huggingface which seem very useful, like

whisper-large-v3 for speech-to-text,
whisperspeech for text-to-speech,
sdxl-turbo for image-making ( for some copyright-free subjects to practice drawing with ), and so-on…

Some models require a GPU, not all.

Damn, things moved fast!