The baseline pipeline above is correct but not production-hardened. Three additions matter most: namespace enforcement to prevent cross-tenant data leakage, index statistics monitoring to catch ingestion failures, and a sidecar audit log.
**Namespace enforcement.** If your application is multi-tenant, make the namespace a required parameter and assert it is set before every `index.query()` call. The most common production bug in multi-tenant RAG is a forgotten namespace parameter that exposes Tenant A's documents to Tenant B's queries. Pinecone does not reject an omitted or empty namespace: passing `namespace=""` targets the shared default namespace, so isolation exists only if every call names the tenant's own namespace.
```python
def safe_retrieve(
    query: str,
    index,
    tenant_id: str,  # required, no default
    top_k: int = 5,
    filter: dict | None = None,
) -> list[dict]:
    """Retrieve with mandatory namespace isolation."""
    # `not tenant_id` already rejects both None and the empty string.
    if not tenant_id:
        raise ValueError("tenant_id must be a non-empty string for namespace isolation")
    namespace = f"tenant:{tenant_id}"
    # `retrieve` is the baseline retrieval function defined earlier.
    return retrieve(query, index, namespace=namespace, top_k=top_k, filter=filter)
```
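One way to make the "assert before every `index.query()` call" rule structural rather than a convention is to bind the namespace once, at construction time. The sketch below is illustrative only: `TenantScopedIndex` is a hypothetical wrapper, not part of the Pinecone client, and it assumes the wrapped object exposes a `query()` method that accepts a `namespace` keyword.

```python
from typing import Any


class TenantScopedIndex:
    """Hypothetical wrapper that pins every query to one tenant's namespace."""

    def __init__(self, index: Any, tenant_id: str) -> None:
        if not isinstance(tenant_id, str) or not tenant_id.strip():
            raise ValueError("tenant_id must be a non-empty string")
        self._index = index
        # Same naming convention as safe_retrieve: "tenant:<id>".
        self.namespace = f"tenant:{tenant_id}"

    def query(self, **kwargs: Any) -> Any:
        # Refuse call-site overrides so the bound namespace cannot be bypassed.
        if "namespace" in kwargs:
            raise TypeError("namespace is fixed by TenantScopedIndex")
        return self._index.query(namespace=self.namespace, **kwargs)
```

Handing request handlers a `TenantScopedIndex` instead of the raw index means a forgotten namespace becomes impossible rather than merely checked.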
**Index statistics.** Pinecone's `describe_index_stats()` returns the vector count for each namespace. Run it after every ingestion batch and alert if the actual count falls short of the expected count. Pinecone upserts are eventually consistent, so a count check immediately after an upsert may undercount by a few vectors; in tests, wait 1-2 seconds before counting.
```python
def check_index_health(index, expected_namespace_counts: dict[str, int]) -> None:
    stats = index.describe_index_stats()
    for namespace, expected in expected_namespace_counts.items():
        # A namespace missing from the stats is treated as holding 0 vectors.
        actual = stats.namespaces.get(namespace, {}).get("vector_count", 0)
        if actual < expected:
            print(f"WARNING: namespace '{namespace}' has {actual} vectors, expected {expected}")
        else:
            print(f"OK: namespace '{namespace}' has {actual} vectors")
```
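Because of that eventual consistency, a fixed sleep in tests can be either too short or wastefully long. A bounded polling loop is one alternative. The helper below is a minimal sketch under stated assumptions: `wait_for_vector_count` and its parameters are hypothetical names, and `get_count` is any zero-argument callable you supply, for example one that wraps the per-namespace lookup used in `check_index_health`.

```python
import time
from typing import Callable


def wait_for_vector_count(
    get_count: Callable[[], int],
    expected: int,
    timeout_s: float = 10.0,
    poll_interval_s: float = 0.5,
) -> int:
    """Poll get_count() until it reaches `expected` or the timeout elapses.

    Returns the last observed count; the caller decides whether to alert.
    """
    deadline = time.monotonic() + timeout_s
    actual = get_count()
    while actual < expected and time.monotonic() < deadline:
        time.sleep(poll_interval_s)
        actual = get_count()
    return actual
```

The function returns the count instead of raising, so the same helper can back both a test assertion and a production alert.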
**Hybrid retrieval option.** For workloads where exact keyword matching matters (code snippets, product SKUs, proper nouns), augment the Pinecone dense retrieval with a BM25 pass. As of June 2026, Pinecone does not natively support sparse vectors on the serverless tier (sparse+dense hybrid is available on pod-based indexes). For serverless hybrid, run a separate BM25 index (Elasticsearch, Typesense, or the in-memory `rank_bm25` Python library) and fuse the two result lists with Reciprocal Rank Fusion (RRF). See the hybrid search (BM25 + dense) tutorial for the complete RRF implementation.
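To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over ranked lists of document IDs. It is not the tutorial's implementation: the function name, the `top_n` parameter, and the alphabetical tie-break are assumptions made for illustration. `k=60` is the smoothing constant commonly used for RRF, and each input list is assumed to contain unique IDs ordered best-first.

```python
from __future__ import annotations

from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]],
    k: int = 60,
    top_n: int | None = None,
) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank_of_d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; documents absent from a list contribute nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_n] if top_n is not None else fused


dense_ids = ["a", "b", "c"]  # e.g. IDs from the Pinecone query, best first
bm25_ids = ["b", "d", "a"]   # e.g. IDs from the BM25 index, best first
fused = reciprocal_rank_fusion([dense_ids, bm25_ids])
```

RRF uses only rank positions, so the dense similarity scores and BM25 scores never need to be normalized onto a common scale.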