LLMs

Memory Considerations

Since co-occurrence matrices are square, they grow quadratically with the number of entities being embedded. For 50k entities and a 32-bit data format, a dense matrix will already be at 10GB. 100k entities puts it at 40GB.
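
The figures above follow from storing n × n values at 4 bytes each. A minimal sketch of that arithmetic, assuming decimal gigabytes (10^9 bytes); the function name and the `bytes_per_value` parameter are illustrative and not from the original post:

```python
def dense_matrix_gb(n_entities: int, bytes_per_value: int = 4) -> float:
    """Size in decimal GB of a dense n x n matrix (4 bytes = 32-bit values)."""
    total_bytes = n_entities * n_entities * bytes_per_value
    return total_bytes / 1e9


if __name__ == "__main__":
    for n in (50_000, 100_000):
        print(f"{n} entities: {dense_matrix_gb(n)} GB")
```

Doubling the number of entities quadruples the memory, which is why the jump from 50k to 100k entities goes from 10GB to 40GB.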

If you are trying to embed even more entities than that or have limited RAM available, you may need to use a...

What I've Learned Building Interactive Embedding Visualizations

Query the RAG anyway and let the LLM itself choose whether to use the RAG context or its built-in knowledge

Query the RAG but only provide the result to the LLM if it meets some level of relevancy (i.e., embedding distance) to the question

Run the LLM both on its own and with the RAG response, then use a heuristic (or another LLM) to pick the best answer
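
The second option, gating the retrieved context on embedding distance, could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the `(text, vector)` chunk format, the prompt layout, and the 0.3 cosine-distance threshold are all hypothetical, and vectors are assumed to be non-zero.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def pick_context(query_vec, chunks, max_distance=0.3):
    """Return the closest chunk's text, or None if nothing is relevant enough.

    chunks is a list of (text, vector) pairs; max_distance is an
    assumed threshold that would need tuning on real data.
    """
    if not chunks:
        return None
    text, vec = min(chunks, key=lambda c: cosine_distance(query_vec, c[1]))
    if cosine_distance(query_vec, vec) <= max_distance:
        return text
    return None


def build_prompt(question, context):
    """Only include retrieved context when it passed the relevancy gate."""
    if context is None:
        return question
    return f"Context:\n{context}\n\nQuestion: {question}"
```

When no chunk is close enough, the prompt is just the bare question, so the model falls back on its built-in knowledge.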

r/LocalLLaMA - Reddit

Humans are bad at coming up with search queries. Humans are good at incrementally narrowing down options with a series of filters, and pointing where they want to go next. This seems obvious, but we keep building interfaces for finding information that look more like Google Search and less like a map.

All information tools have to give users some wa...

thesephist.com • Navigate, don't search

The quality of the dataset is 95% of everything. The remaining 5% is not ruining it with bad parameters.

After 500+ LoRAs made, here is the secret

Jailbroken & Offline Appliances: It’s becoming increasingly clear that we’ll be able to interact with everyday appliances and devices with natural language. As locally run LLMs become more efficient and powerful, the prospects of having a conversation with your coffee machine in the morning aren’t unreasonable. After all, who wants to tinker with...

Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]

How can we make interacting with conversational models feel more natural?

Every conversational interface to a language model adopts the same pattern:

A chat history sidebar, with each conversation lasting just a few turns

New sessions always begin in a brand-new thread

Every user query must always elicit exactly one response

None of these assumptions ar...

Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]

📦 Service Deployment - Ray Serve (https://lnkd.in/eAV-Y6RN)

🧰 Data Transformation - Ray Data (https://lnkd.in/e7wYmenc)

🔌 LLM Integration - AIConfig (https://lnkd.in/esvH5NQa)

🗄 Vector Database - Weaviate (https://weaviate.io/)

📚 Supervised LLM Fine-Tuning - HuggingFace TRL (https://lnkd.in/e8_QYF-P)

📈 LLM Observability - Weights & Biases Tra...

linkedin.com • feed updates

Why is Discord such a good GTM for AI applications?

Text interface. Most users are just generating images, videos, and audio in these Discord servers. Prompts are easily expressible in simple text commands. It’s why we’ve seen image generation strategies like Midjourney (all-in-one) flourish in Discord while more raw diffusion models haven’t grown a...

Shortwave — rajhesh.panchanadhan@gmail.com [Gmail alternative]

One thing that is still confusing to me is that we've been building products with machine learning pretty heavily for a decade now, and somehow we abandoned everything we learned about the process now that we're building "AI".

The biggest thing any ML practitioner realizes when they step out of a research setting is that for most tasks accuracy has ...

Ask HN: What are some actual use cases of AI Agents right now? | Hacker News

You are assuming that the probability of failure is independent, which couldn't be further from the truth. If a digit recogniser can recognise one of your "hard" handwritten digits, such as a 4 or a 9, it will likely be able to recognise all of them.

The same happens with AI agents. They are not good at some tasks, but really, really good at others.
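
To see why the independence assumption matters, here is a small sketch comparing two extremes for a multi-step agent task. The 95% per-step success rate and the 10 steps are purely illustrative numbers, not from the thread, and "fully correlated" is a simplifying assumption in which a single shared draw (can the agent handle this kind of task at all?) decides every step.

```python
def chain_success_independent(p_step: float, n_steps: int) -> float:
    """Every step fails independently, so the success rates multiply."""
    return p_step ** n_steps


def chain_success_fully_correlated(p_step: float, n_steps: int) -> float:
    """One shared draw decides all steps: if the agent can do one, it can do all."""
    return p_step if n_steps > 0 else 1.0


if __name__ == "__main__":
    p, n = 0.95, 10  # illustrative values only
    print(f"independent:      {chain_success_independent(p, n):.3f}")
    print(f"fully correlated: {chain_success_fully_correlated(p, n):.3f}")
```

Under independence the chain succeeds only about 60% of the time, while under full correlation it stays at 95%. Real agents presumably sit somewhere between the two, which is the commenter's point: multiplying per-step failure rates overstates how often a task the agent is good at will fail.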