The Huggingface page has examples of how to use it: https://huggingface.co/ibm-granite/granite-8b-code-instruct
My point is that using “grokking” in ML is not a Musk/Twitter/whatever-his-AI-company-is-named invention; it predates their use.
Yes, the original researchers reused a pre-existing meaning that had been around on the internet for a while before. I did not know it came from Heinlein, and I did not know its full meaning. I remember first seeing it, more than a decade ago, in a text that stated, without any explanation, that an isolated unknown word can easily be grokked from context, thereby demonstrating the point immediately. To me (and I guess to those researchers) “grok” means “understand from context”, which is particularly appropriate here.
BTW, Elon was not the only one to reuse this word. Another company, named Groq, totally unrelated to Musk as far as I know, designs AI acceleration chips.
Grokking is actually a concept in ML: the phenomenon where a model’s loss suddenly starts to drop long after the model is considered to have overfit. That notion was named by researchers; I’ll let people decide whether it is aptly named, but Elon likely just took it from there.
I really want this Lemmy community to grow and thrive, but I thought this was too important not to post on the biggest community out there, so I made a post on /r/localllama to incite a collective response. Feel free to collaborate or cross-post/copy the message here: https://old.reddit.com/r/LocalLLaMA/comments/1b7iwxi/we_should_make_a_collective_rlocallama_answer_for/
I read the questions asked there, and it is clear that they come from people who have done their homework and are already positive about open models. Answering their questions in enough depth is pretty involved and would probably take me 1-2 days to bring up citations and articles.
It could be interesting to make a collaborative answer.
I don’t understand how we are supposed to file a comment?
Note that he did not confirm it was mistral-medium. He says it’s a retrained llama2-70B model, but hints that it is not the fully trained one. Sounds a bit like damage control, but it is not a 100% confirmation of the claim.
Nice! It feels like a direct answer to Karpathy’s comment on Mistral, where he said it is nice to call it “open weights” but not “open source”, because we still don’t know the dataset and the training code. LLM360 seems to be fully open source by that definition and even releases the checkpoints!
Performance-wise it is lagging a bit (below a Llama 2 of the same size), but all the tools are there to improve it!
Does Walmart have a monopoly on Kinder chocolate? The idea is to have several distributors, each with as complete a catalog as possible. Having offers this fragmented between platforms makes them very uncompetitive against any piracy solution.
It would probably be more effective to put an explicit mention in the system prompt: “Your interlocutor is a <gendered term> and will be greatly offended to be referred to as a boy or a man.”
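A minimal sketch of what that could look like as an OpenAI-style chat message list (the `build_messages` helper and the default term are illustrative, not any specific API):

```python
def build_messages(user_text: str, gendered_term: str = "woman") -> list[dict]:
    """Return a chat message list with the explicit instruction baked into
    the system prompt, so every turn carries it."""
    system_prompt = (
        f"Your interlocutor is a {gendered_term} and will be greatly "
        "offended to be referred to as a boy or a man."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]

# The resulting list can be passed to any chat-style completion endpoint.
messages = build_messages("Hey, can you help me debug this?")
print(messages[0]["content"])
```

The point is simply that a direct, unambiguous statement in the system prompt tends to steer the model more reliably than hoping it infers the right terms from context.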