Name: Embeddings, Honestly

Key Takeaways

Embedding sources in this appendix are grouped by the job they do: representation, retrieval, evaluation, security, multimodal work, and governance.

The source map is not decorative bibliography. It is the evidence trail behind the book's production claims.

Use the appendix to trace each chapter back to the exact local and external sources already cited in the manuscript.

Read this alongside "why RAG pipelines fail in month three," "Retrieval That Survives Contact," and the "Embeddings, Honestly" overview when you turn a chapter into a production retrieval review.

This edition intentionally uses chapter-specific research bases rather than repeating the same source list everywhere. The source map exists so a reader can trace the book's claims back to the materials already used in the chapters, without pretending every chapter depends on every citation equally.

The representation sources explain what embeddings and vector indexes actually do. OpenAI Embeddings and Sentence Transformers ground the model side. Pinecone Semantic, Pinecone VectorDB, Faiss, HNSW, and ANN Benchmarks ground the index side. They support the book's repeated distinction between a learned representation and a production decision.
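To make the representation side concrete, here is a minimal sketch of exact nearest-neighbor search by cosine similarity. The vectors are hand-written toy values rather than output from any real embedding model, and the names `cosine_similarity` and `exact_top_k` are illustrative assumptions, not an API from the cited libraries. Approximate indexes such as HNSW exist to avoid this exhaustive scan at scale, usually at some cost in recall.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as similar to nothing.
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Score every stored vector against the query and return the k best IDs."""
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in vectors.items()]
    # Highest similarity first; ties broken by ID so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


# Toy two-dimensional "embeddings" (hypothetical values for illustration only).
toy_vectors = {
    "a": [0.9, 0.1],
    "b": [0.0, 1.0],
    "c": [0.7, 0.7],
}
nearest = exact_top_k([1.0, 0.0], toy_vectors, k=2)
```

The similarity score is the learned representation at work. Whether `k` is 2 or 20, and whether an approximate index is acceptable, is the production decision.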

The retrieval sources explain why dense search alone is rarely enough. Weaviate Hybrid, Weaviate Docs Hybrid, BM25, Lucene OpenAI, and Elastic Rerank are the basis for the book's dense, sparse, hybrid, fusion, and reranking chapters. The practical claim is simple: retrieval quality comes from combining signals, not from worshiping one signal.
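One common way to combine dense and sparse signals is reciprocal rank fusion (RRF). The sketch below assumes that technique for illustration. It is not necessarily the fusion method the chapters use, and the function name, document IDs, and the constant `k=60` (a conventional default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by more than one retriever rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a dense retriever and a BM25-style retriever.
dense_results = ["doc_a", "doc_b", "doc_c"]
sparse_results = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([dense_results, sparse_results])
```

RRF uses only rank positions, so it sidesteps the problem of putting cosine similarities and BM25 scores on a shared scale. A reranker can then be applied to the fused list.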

The RAG and evaluation sources support the chapters on chunking, lost evidence, evaluation, and failure analysis. RAG Survey, Lost Middle, RAGAS, RAG Eval Survey, ERAG, RAGDB, and Semantic Recall are cited where the manuscript needs measurement, not adjectives. They keep the book honest about recall, context position, cost, and whether the expected evidence was actually found.
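The question of whether the expected evidence was actually found can be written down as recall at k. The following is a minimal sketch, assuming each query has a hand-labeled set of expected chunk IDs. The function name and the sample IDs are hypothetical, and the cited evaluation frameworks define richer metrics than this one.

```python
def recall_at_k(retrieved, expected, k):
    """Fraction of expected evidence IDs that appear in the top-k results."""
    expected_ids = set(expected)
    if not expected_ids:
        raise ValueError("expected must contain at least one evidence ID")
    top_k = set(retrieved[:k])
    return len(expected_ids & top_k) / len(expected_ids)


# Hypothetical retrieval output and labeled evidence for one query.
retrieved_ids = ["c1", "c7", "c3", "c9"]
expected_ids = ["c3", "c4"]

recall_top2 = recall_at_k(retrieved_ids, expected_ids, k=2)
recall_top3 = recall_at_k(retrieved_ids, expected_ids, k=3)
```

Here neither expected chunk is in the top two, and one of the two appears by position three. Reporting the number at several values of `k` shows how far down the list the evidence sits, which is what a context-position analysis needs.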

The security and governance sources support the retrieval-boundary chapters. OWASP LLM Top 10, OWASP Prompt Injection, and NIST AI RMF are used when the book talks about permissions, tenant boundaries, source policy, prompt injection, and risk management. They are not there to decorate the argument. They are there because retrieval can fail before generation begins.
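Because retrieval can fail before generation begins, permission checks belong on the retrieval side. This minimal sketch drops candidate chunks before any prompt is assembled. The `Chunk` fields, the role names, and the `authorized_chunks` function are all hypothetical, and a production system would typically push this filter into the index query itself rather than applying it afterward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    tenant_id: str
    allowed_roles: frozenset
    text: str


def authorized_chunks(chunks, tenant_id, user_roles):
    """Keep only chunks from the caller's tenant that one of their roles may read."""
    roles = set(user_roles)
    return [
        chunk for chunk in chunks
        if chunk.tenant_id == tenant_id and roles & chunk.allowed_roles
    ]


# Hypothetical candidates returned by a retriever, before any prompt is built.
candidates = [
    Chunk("c1", "acme", frozenset({"support"}), "Refund policy"),
    Chunk("c2", "acme", frozenset({"finance"}), "Payroll export"),
    Chunk("c3", "globex", frozenset({"support"}), "Other tenant's notes"),
]
visible = authorized_chunks(candidates, tenant_id="acme", user_roles=["support"])
```

Only `c1` survives: `c2` fails the role check and `c3` fails the tenant check. A chunk that never reaches the prompt cannot leak through the generated answer or deliver injected instructions to the model.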

The relationship and multimodal sources support the chapters where vector neighborhoods are not enough. CLIP grounds the multimodal discussion. GraphRAG and HybridRAG ground the argument that relationships, paths, and graph structure sometimes carry the evidence that vectors flatten.

Local context also matters. The site research pass used the existing chapter manuscripts in src/content/book-chapters/embeddings-deep/, the book overview at /books/embeddings-deep/, the RAG cluster links in alpeshseo/context/internal-links-map.md, and the voice/SEO guardrails in alpeshseo/context/brand-voice.md, style-guide.md, seo-guidelines.md, and target-keywords.md. No new live citations were added during this pass. The appendix only organizes sources already present in the book directory or local context.

Appendix A: Source Map