Announcement and slides. Did you miss the talk? Check out the write-up.
At ODSC East 2024 I talked about RAG: how it works, how it fails, and how to evaluate its performance objectively. I gave an overview of some useful open-source tools for RAG evaluation and how to use them with Haystack, and then offered some ideas on how to expand your RAG architecture beyond a simple two-step retrieve-then-generate process.
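As a rough illustration of that two-step baseline, here is a minimal retrieve-then-generate pipeline sketched with Haystack 2.x. The toy documents, prompt template, and model choice are placeholders (and it assumes an `OPENAI_API_KEY` in the environment); in practice you would swap in your own document store, retriever, and generator.

```python
# Minimal two-step RAG sketch: retrieve documents, then generate an answer.
# Assumes Haystack 2.x and OPENAI_API_KEY set in the environment.
from haystack import Document, Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Index a couple of toy documents in an in-memory store (illustrative only).
document_store = InMemoryDocumentStore()
document_store.write_documents([
    Document(content="Haystack is an open-source framework for building LLM applications."),
    Document(content="RAG combines a retriever with a generator to ground answers in documents."),
])

prompt_template = """
Answer the question using only the context below.

Context:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}

Question: {{ question }}
Answer:
"""

# Step 1: retrieve. Step 2: build a prompt from the hits and generate.
rag = Pipeline()
rag.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
rag.add_component("prompt_builder", PromptBuilder(template=prompt_template))
rag.add_component("generator", OpenAIGenerator(model="gpt-3.5-turbo"))
rag.connect("retriever.documents", "prompt_builder.documents")
rag.connect("prompt_builder.prompt", "generator.prompt")

question = "What is RAG?"
result = rag.run({
    "retriever": {"query": question},
    "prompt_builder": {"question": question},
})
print(result["generator"]["replies"][0])
```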
Some resources mentioned in the talk:
- Haystack: open-source LLM framework for RAG and beyond: https://haystack.deepset.ai/
- Build and evaluate RAG with Haystack: https://haystack.deepset.ai/tutorials/35_model_based_evaluation_of_rag_pipelines
- Evaluating LLMs with UpTrain: https://docs.uptrain.ai/getting-started/introduction
- Evaluating RAG end-to-end with RAGAS: https://docs.ragas.io/en/latest/
- Semantic Answer Similarity (SAS) metric: https://docs.ragas.io/en/latest/concepts/metrics/semantic_similarity.html
- Answer Correctness metric: https://docs.ragas.io/en/latest/concepts/metrics/answer_correctness.html
- Perplexity.ai: https://www.perplexity.ai/
Plus, a shout-out to a very interesting LLM evaluation library I discovered at ODSC: continuous-eval. It's worth checking out, especially if SAS or answer correctness are too vague and high-level for your domain.
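For reference, here is a minimal sketch of scoring a single question/answer pair with the RAGAS implementations of the two metrics linked above, semantic answer similarity and answer correctness. The column names assume ragas 0.1.x, the sample row is made up, and answer correctness uses an LLM judge by default, so an `OPENAI_API_KEY` is expected.

```python
# Minimal ragas sketch: score one RAG answer with semantic answer similarity
# and answer correctness. Assumes ragas 0.1.x, the `datasets` package, and
# OPENAI_API_KEY set for the LLM-judged metric. The example row is invented.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_correctness, answer_similarity

eval_data = Dataset.from_dict({
    "question": ["What is RAG?"],
    "answer": ["RAG retrieves documents and feeds them to an LLM to ground the answer."],
    "contexts": [["RAG combines a retriever with a generator to ground answers in documents."]],
    "ground_truth": ["RAG grounds LLM answers in retrieved documents."],
})

# Both metrics return scores in [0, 1]; higher is better.
scores = evaluate(eval_data, metrics=[answer_similarity, answer_correctness])
print(scores)
```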