South Korea says it's uncovered evidence that DeepSeek has secretly been sharing data with ByteDance, the parent company of popular social media app TikTok.
This is the web chat client/app, just like OpenAI sharing data with Microsoft, or Copilot doing the same. If you self-host these LLMs, your data stays within your LAN.
You can’t practically self-host DeepSeek R1.
Look, I use the 32B distill on my 3090 every day, but it is not the same thing as full R1. And people need to stop conflating the two.
And (theoretically) API usage through one of many R1 providers is private.
I don’t really use LLMs, so I didn’t even realize there were versions with different weights and stuff. I was using 7b, but found it pretty useless. Pretty sure I’m not going to be able to run 32B on my rig. lmao.
Guess I’ll continue being an LLM-less pleb.
There are plenty of free LLM APIs you can use with something like Open Web UI, on any machine. I still use them myself.
Have you got any recs? I’ve got a 3080 in my machine atm
Locally? Arcee 14B and the 14B DeepSeek distill are currently the best models that will fit.
I’d recommend hosting them with TabbyAPI instead of Ollama, as they will be much faster and more VRAM-efficient. But this is more fuss.
Honestly, I would just try free APIs like Gemini, Groq, and such through Open Web UI, or use really cheap APIs like OpenRouter. Newer 14B models are okay, but they’re definitely lacking that “encyclopedic intelligence” larger models have.
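If you’d rather script against those hosted APIs than go through Open Web UI, most of them speak the OpenAI wire format. A minimal sketch, assuming the `openai` Python package is installed and your OpenRouter key is in an `OPENROUTER_API_KEY` environment variable (the variable name and the model ID are just examples; check the provider’s catalog for current model names and pricing):

```python
# Minimal sketch of calling an OpenAI-compatible hosted API (OpenRouter here).
# Assumptions: `pip install openai`, and OPENROUTER_API_KEY set in your shell.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # OpenRouter's OpenAI-compatible endpoint
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="deepseek/deepseek-r1",  # example model ID; pick whatever you actually want to pay for
    messages=[{"role": "user", "content": "In two sentences, what is a distilled LLM?"}],
)
print(resp.choices[0].message.content)
```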
I’m not @brucethemoose@lemmy.world, but here’s a pretty barebones how-to to get you started. Just know it can be as complicated as you like. For starters you may want to stick to the 7B and 14B models like mistral:7b and phi4:14b, as they’ll fit easily on your card and will allow you to test the waters. (There’s also a quick sanity-check snippet after the links below.)
If you’re on Windows https://doncharisma.org/2024/11/23/self-hosting-ollama-with-open-webui-on-windows-a-step-by-step-guide/
If you’re using Linux https://linuxtldr.com/setup-ollama-and-open-webui-on-linux/
If you want a container https://github.com/open-webui/open-webui/blob/main/docker-compose.yaml
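Once Ollama is up (whichever guide you followed) and you’ve pulled a model with `ollama pull mistral:7b`, here’s a quick way to confirm it’s serving before you bother with Open WebUI. This is a sketch assuming the daemon is on its default port, 11434:

```python
# Sanity check against a local Ollama install's REST API.
# Assumes `ollama pull mistral:7b` has already been run and the daemon
# is listening on localhost:11434 (the default).
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "mistral:7b",
        "messages": [{"role": "user", "content": "Say hello in one short sentence."}],
        "stream": False,  # return a single JSON object instead of a stream of chunks
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```

If that prints a reply, Open WebUI pointed at the same host will work too.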
I use the 32B and the 671B side by side. The performance hit is around 20%, and I keep all my data local. I am not conflating the two; however, self-hosting works for me just fine. Your use case is your own, certainly. However, I’d rather take the performance hit for the added data privacy.
Also, it’s nice to be able to set my own weights and further distill R1.
I have a local Python expert and a local Golang expert, both have access to my local GitLab repository, and I’ve tied their respective Ollama keys to my VSCode IDE.
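I don’t know exactly how you’ve wired yours into VS Code, but for anyone curious about the “local specialist” idea: Ollama also exposes an OpenAI-compatible endpoint at `/v1`, so anything that speaks that protocol (editor plugins, scripts) can target a local model with a role-specific system prompt. A hedged sketch, where the model tag and prompt are placeholders, not the setup described above:

```python
# Hypothetical "local Golang expert": point an OpenAI-compatible client at
# Ollama's /v1 endpoint and pin a role-specific system prompt.
# Assumes Ollama is running locally and a coder-tuned model has been pulled;
# the API key is required by the client but ignored by Ollama.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="qwen2.5-coder:14b",  # placeholder tag; swap in whichever local model you use
    messages=[
        {"role": "system", "content": "You are a senior Go engineer. Answer with idiomatic Go and short explanations."},
        {"role": "user", "content": "Show a minimal HTTP handler that returns JSON."},
    ],
)
print(resp.choices[0].message.content)
```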
Depends for sure. I usually try the 32B first, but give really “hard” queries to some API model.
With the distilled models I have, I’ve been able to build and troubleshoot pretty complicated apps in Golang and Python. However, these distilled models are very specialized and will not do things like write me a story about a duck made out of duct tape or properly summarize articles. There are absolutely limits to my workflow and setup. But I’m pretty happy with it.
I’m moving to self-host all my streaming stuff: switching from local-only Plex to self-hosting all my media (Spotify, Google Photos, LLMs) and tools behind a reverse proxy so I can access them outside my home. It’s pretty sweet, and a good learning experience using reverse proxies.
Edit: Plus fuck these technofeudal lords who enclose access to markets, information, and culture.
!selfhosted@lemmy.world for the win!
Love to see it.
Edit: replied to the wrong comment