Announcement and slides. Did you miss the talk? Check out the write-up.
At ODSC East 2024 I talked about RAG: how it works, how it fails, and how to evaluate its performance objectively. I gave an overview of some useful open-source tools for RAG evaluation and how to use them with Haystack, and then offered some ideas on how to expand your RAG architecture beyond a simple two-step retrieve-then-generate process.
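As a rough illustration of that two-step baseline, here is a minimal retrieve-then-generate pipeline sketched with Haystack 2.x. The toy documents, prompt template, and model choice are placeholders (and it assumes an `OPENAI_API_KEY` in the environment); in practice you would swap in your own document store, retriever, and generator.

```python
# Minimal two-step RAG sketch: retrieve documents, then generate an answer.
# Assumes Haystack 2.x and OPENAI_API_KEY set in the environment.
from haystack import Document, Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Index a couple of toy documents in an in-memory store (illustrative only).
document_store = InMemoryDocumentStore()
document_store.write_documents([
    Document(content="Haystack is an open-source framework for building LLM applications."),
    Document(content="RAG combines a retriever with a generator to ground answers in documents."),
])

prompt_template = """
Answer the question using only the context below.

Context:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}

Question: {{ question }}
Answer:
"""

# Step 1: retrieve. Step 2: build a prompt from the hits and generate.
rag = Pipeline()
rag.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
rag.add_component("prompt_builder", PromptBuilder(template=prompt_template))
rag.add_component("generator", OpenAIGenerator(model="gpt-3.5-turbo"))
rag.connect("retriever.documents", "prompt_builder.documents")
rag.connect("prompt_builder.prompt", "generator.prompt")

question = "What is RAG?"
result = rag.run({
    "retriever": {"query": question},
    "prompt_builder": {"question": question},
})
print(result["generator"]["replies"][0])
```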
Some resources mentioned in the talk:
- Haystack: open-source LLM framework for RAG and beyond: https://haystack.deepset.ai/
- Build and evaluate RAG with Haystack: https://haystack.deepset.ai/tutorials/35_model_based_evaluation_of_rag_pipelines
- Evaluating LLMs with UpTrain: https://docs.uptrain.ai/getting-started/introduction
- Evaluating RAG end-to-end with RAGAS: https://docs.ragas.io/en/latest/
- Semantic Answer Similarity (SAS) metric: https://docs.ragas.io/en/latest/concepts/metrics/semantic_similarity.html
- Answer Correctness metric: https://docs.ragas.io/en/latest/concepts/metrics/answer_correctness.html
- Perplexity.ai: https://www.perplexity.ai/
Plus, a shout-out to a very interesting LLM evaluation library I discovered at ODSC: continuous-eval. It's worth checking out, especially if SAS or answer correctness are too vague and high-level for your domain.
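For reference, here is a minimal sketch of scoring a single question/answer pair with the RAGAS implementations of the two metrics linked above, semantic answer similarity and answer correctness. The column names assume ragas 0.1.x, the sample row is made up, and answer correctness uses an LLM judge by default, so an `OPENAI_API_KEY` is expected.

```python
# Minimal ragas sketch: score one RAG answer with semantic answer similarity
# and answer correctness. Assumes ragas 0.1.x, the `datasets` package, and
# OPENAI_API_KEY set for the LLM-judged metric. The example row is invented.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_correctness, answer_similarity

eval_data = Dataset.from_dict({
    "question": ["What is RAG?"],
    "answer": ["RAG retrieves documents and feeds them to an LLM to ground the answer."],
    "contexts": [["RAG combines a retriever with a generator to ground answers in documents."]],
    "ground_truth": ["RAG grounds LLM answers in retrieved documents."],
})

# Both metrics return scores in [0, 1]; higher is better.
scores = evaluate(eval_data, metrics=[answer_similarity, answer_correctness])
print(scores)
```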