By The DDH Team · Digital Dashboard Hub

LangSmith Trace Quotas 2026: Plans, Limits, Pricing, and How to Stay Under Budget

If you're evaluating observability platforms for a new agent project, start with our agent observability 2026 state of the market overview — it covers the full landscape of LLM tracing tools (LangSmith, Langfuse, Helicone, Arize, Weights & Biases Weave, and others) with head-to-head comparisons on pricing, integration depth, and enterprise feature sets. This page goes deep on LangSmith specifically, with exact quota numbers and strategies for operating within each plan's limits.

LangSmith — documented at https://docs.smith.langchain.com/ — is the observability and evaluation platform built by the LangChain team. It integrates most deeply with the LangChain and LangGraph ecosystems, offering automatic tracing when you instrument your chains and agents with the LangSmith tracer. Every LLM call, tool invocation, chain execution, and retrieval step is captured as a nested trace, giving you a detailed view of what your agent did and why. That observability capability is exactly what makes LangSmith traces accumulate fast — which is why understanding the quota system is critical before you go to production.

The rest of this page covers the Developer, Plus, and Enterprise plan limits in detail, trace size constraints, strategies for reducing trace volume, and a competitive comparison with Langfuse and Helicone. For cost modeling on the LLM calls your agents are making, see also our agent framework decision matrix 2026, our Claude API cost calculator, and our Claude API rate limits 2026 reference.

LangSmith plan limits — June 2026

| Plan | Traces/month | Retention | Seat price |
| --- | --- | --- | --- |
| Developer (free) | 5,000 traces/month | 14-day retention | $0/seat |
| Plus (paid) | 50,000 traces/month included | 90-day retention | $39/seat/month |
| Plus overage | Pay-as-you-go | Same 90-day retention | $0.50 per 1,000 traces over 50k |
| Enterprise | Custom | Up to 1-year+ retention | Negotiated per seat |

| Feature | Limit | Notes |
| --- | --- | --- |
| Max trace size (all plans) | 20 MB per trace | Hard limit; exceeding it drops the trace |
| Max run size within trace | 10 MB per run | Each nested run step is individually capped at 10 MB |
| Trace ingestion rate limit | ~100 traces/second per org | Soft limit; contact LangSmith for higher throughput |
| Feedback storage | Indefinitely on paid plans | Human feedback and scores stored without additional retention cap |
| Dataset size | No hard limit | UI Explorer caps display at 500 rows; API access unlimited |
| Human annotation queue | Included on Plus/Enterprise | Not available on Developer free plan |
| LLM-as-judge evaluators | Plus and Enterprise only | Custom evaluators powered by LLM; not available on Developer |
| API rate limit (read) | 10 req/s Developer, 100 req/s Plus | LangSmith API read rate limits for dashboard and export calls |

Sources, fetched 2026-06-21: https://docs.smith.langchain.com/, https://www.langchain.com/langsmith (pricing page). Verify current plan limits at smith.langchain.com/settings/plans.

LangSmith's tracing model: what counts as a trace

The most important concept for managing LangSmith quota is understanding what constitutes a single trace. **A trace is a root run — one top-level chain execution, agent invocation, or LLM call — along with all the nested child runs it spawns.** A complex agent run that makes 10 LLM calls, 5 tool invocations, 3 vector store retrievals, and 2 code executions is still counted as ONE trace against your monthly quota. This is fundamentally different from counting individual LLM calls — the trace is the conversation-level unit, not the model-call unit.

This distinction has major practical implications for quota management. If you're running a document Q&A application where each user query triggers one agent invocation (which internally makes 3 LLM calls for retrieval-augmented generation), each user query consumes one trace — not three. A developer running automated test suites that invoke 100 agent chains per test run generates 100 traces per run, regardless of how many LLM calls each chain makes internally. **Understanding this 1-trace-per-root-run accounting is essential for predicting your monthly consumption accurately.**

What inflates trace counts is the creation of new root runs rather than nested child runs. LangChain's map/reduce patterns — where a chain fans out to process multiple documents in parallel — can create separate root runs for each document if not configured correctly. A map chain processing 50 documents where each document gets its own root run generates 50 traces. The same operation where the parallel steps are nested children of a single root run generates 1 trace. **Review your chain architecture for any patterns that spawn separate root runs unintentionally** — these are the hidden trace multipliers that blow through Developer plan quotas in days.
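The difference is easy to see in a toy model. The sketch below is not the LangSmith SDK: `Run`, `count_traces`, and `count_runs` are hypothetical names used only to illustrate the accounting described above, where quota is charged per root run and nested children do not add to the count.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """Toy stand-in for a traced run; not the LangSmith SDK's run type."""
    name: str
    children: list["Run"] = field(default_factory=list)


def count_traces(root_runs: list[Run]) -> int:
    """Quota is consumed per root run, however deep each tree is."""
    return len(root_runs)


def count_runs(run: Run) -> int:
    """Total runs in one tree: the root plus all descendants."""
    return 1 + sum(count_runs(child) for child in run.children)


docs = [f"doc-{i}" for i in range(50)]

# Fan-out with a separate root run per document.
separate_roots = [Run(name=f"summarize:{d}") for d in docs]

# Fan-out with one root run and 50 nested children.
nested = [
    Run(name="summarize-batch",
        children=[Run(name=f"summarize:{d}") for d in docs])
]

print(count_traces(separate_roots))  # 50
print(count_traces(nested))          # 1
print(count_runs(nested[0]))         # 51
```

Both shapes do the same work and record the same per-document steps; only the number of roots, and therefore the quota charge, differs.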

The LangSmith tracer auto-instruments LangChain components — LLMChain, AgentExecutor, RunnableSequence, LangGraph StateGraph, and their constituent steps — by default. This automatic instrumentation is convenient but means that every invocation of a traced component, even during development and testing, counts as a trace. **Disable tracing in development scripts that run repeatedly** (test harnesses, batch processing scripts, prompt iteration loops) to preserve your monthly quota for the production monitoring where it actually matters.

Traces in LangSmith are organized by project (see https://docs.smith.langchain.com/), and each project reports its own trace count. **Creating separate projects for production, staging, and development is a best practice**: it lets you isolate quota consumption by environment, pause tracing in non-production environments during bulk operations, and get clean production-only analytics without dev/test noise contaminating your dashboards.

Trace metadata (tags, metadata fields, feedback scores) does not directly count toward your trace quota — it's stored as attributes on the trace record. However, adding rich metadata to traces does increase the trace's stored size, which matters for the 20 MB per-trace size limit. Large metadata objects (embedding vectors, raw retrieved documents, full API responses stored as metadata) can push traces toward the size limit. Store references (IDs, URLs, keys) in metadata rather than full objects wherever possible.


Developer plan: 5,000 traces/month and why it runs out fast

LangSmith's Developer plan gives 5,000 traces per month with 14-day retention at no cost. The math is straightforward: 5,000 traces ÷ 30 days = ~167 traces per day. **A developer actively iterating on a multi-step agent chain who runs 10 test invocations per hour for a workday (8 hours × 10 = 80 traces/day) uses at most 80 × 30 = 2,400 traces in a month, under half the quota; at that pace it would take about 62 days to consume 5,000 traces.** That sounds comfortable, but one busy week of automated testing or prompt regression runs can burn through weeks of quota.
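To sanity-check your own numbers, a small calculator helps. This is a sketch using the 5,000-trace figure from this page; the 500 traces/day burst rate is an invented example of a heavy automated-testing week, not a measurement.

```python
DEVELOPER_QUOTA = 5_000  # traces/month, per the plan table on this page


def monthly_usage(traces_per_day: float, days: int = 30) -> float:
    """Traces consumed in one billing month at a steady daily rate."""
    return traces_per_day * days


def days_to_exhaust(quota: int, traces_per_day: float) -> float:
    """Days of steady usage needed to consume the whole quota."""
    return quota / traces_per_day


manual = monthly_usage(80)  # 8 hours x 10 invocations per hour
print(manual, manual / DEVELOPER_QUOTA)       # 2400 0.48
print(days_to_exhaust(DEVELOPER_QUOTA, 80))   # 62.5
print(days_to_exhaust(DEVELOPER_QUOTA, 500))  # 10.0 (hypothetical CI burst)
```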

The 14-day retention window is the Developer plan's most limiting characteristic for practical debugging. If you trace a production issue and don't investigate it within 14 days, the trace data is gone. For real applications with intermittent bugs (errors that only surface under specific conditions, once or twice a week), 14-day retention is often insufficient. **The ability to go back 30, 60, or 90 days to find the trace for a specific user complaint is one of the most compelling reasons to upgrade from Developer to Plus.**

The Developer plan also lacks LLM-as-judge evaluators — the capability to automatically score your agent's outputs using another LLM (e.g., using Claude Sonnet 4.6 or GPT-5.4 as an evaluation judge). This means Developer plan users can only do manual annotation or rule-based evaluation. For any team building an application with quality requirements beyond 'I eyeballed it', the absence of automated LLM evaluators on the free plan is a significant capability gap.

Human annotation queues are also absent on the Developer plan. This means there's no structured workflow for reviewing flagged traces, assigning them to team members, and recording pass/fail judgments. For solo developers prototyping alone, this is fine. For any team with more than one person responsible for quality monitoring, the human annotation queue in Plus becomes a necessary collaboration tool.

**Upgrade trigger signals for the Developer plan**: you're hitting the 5,000-trace ceiling before month-end; you've needed trace data older than 14 days to debug a production issue; you want to run automated LLM-as-judge evaluations on a dataset; or you have more than one person on the team who needs to review traces. Any one of these signals indicates the Developer plan is no longer appropriate. The $39/month Plus plan cost is typically less than one hour of developer time spent debugging a production issue without adequate trace history.


Plus plan: 50,000 traces and the overage math

The LangSmith Plus plan provides 50,000 traces per month per seat at $39/seat/month, with 90-day retention and access to LLM-as-judge evaluators and human annotation queues. Overage beyond 50,000 traces is billed at $0.50 per 1,000 additional traces — a relatively low per-trace rate that makes Plus a reasonable starting point for production deployments.

**Overage math for a realistic production team**: suppose you have 3 engineers on Plus (3 × $39 = $117/month) running an agent application with 10 production agents, each generating 300 traces per day. That's 3,000 traces/day × 30 days = 90,000 traces/month. This example assumes the 50,000 included traces are a single allowance for the organization rather than multiplied per seat. The included allowance covers the first 50,000; the remaining 40,000 traces cost 40 × $0.50 = $20 in overage. Total monthly cost: $117 + $20 = $137. For production monitoring at that volume, that's excellent value. **The Plus plan scales reasonably well into moderate production workloads before overage costs become punishing.**
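The same arithmetic can be wrapped in a function so you can plug in your own volumes. A minimal sketch, assuming the prices quoted on this page and assuming the 50,000 included traces are one pooled allowance (not multiplied per seat); `plus_monthly_cost` is a hypothetical helper, not a LangSmith API.

```python
def plus_monthly_cost(
    seats: int,
    monthly_traces: int,
    seat_price: float = 39.0,      # USD per seat per month (from this page)
    included: int = 50_000,        # included traces, assumed pooled
    overage_per_1k: float = 0.50,  # USD per 1,000 traces over the allowance
) -> float:
    """Seat fees plus pay-as-you-go overage for one month."""
    overage_traces = max(0, monthly_traces - included)
    return seats * seat_price + overage_traces / 1000 * overage_per_1k


# 3 seats, 3,000 traces/day for 30 days.
print(plus_monthly_cost(seats=3, monthly_traces=3_000 * 30))  # 137.0
```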

Where overage costs can surprise teams is in evaluation workflows. Running LLM-as-judge evaluations on a dataset of 1,000 examples generates 1,000+ traces (one per evaluation call). If your team runs evaluations daily as part of a CI/CD pipeline, that's 30,000 evaluation traces per month — a meaningful fraction of your 50,000 included quota dedicated to evals rather than production monitoring. **Create a separate LangSmith project for evaluation runs** and track its trace consumption separately from your production project to understand the eval-vs-production split.

The 90-day retention on Plus is a substantial upgrade from the Developer plan's 14 days. 90 days covers most realistic debugging windows — a customer complaint from 2 months ago can still be traced. For longer-lived applications (annual subscription customers, B2B SaaS with multi-month contracts), even 90 days may not be sufficient, which is where Enterprise's custom retention (up to 1 year or more) becomes necessary.

LLM-as-judge evaluators on Plus use your LangSmith account's underlying LLM API keys to run evaluations. If you configure Claude Sonnet 4.6 as your evaluation judge ($3/M input, $15/M output), each evaluation call is billed against your Anthropic account separately from your LangSmith plan cost. **Budget both the LangSmith trace overage AND the LLM evaluation API costs** when projecting total observability spend. A 1,000-example eval suite with 500-token prompts and 200-token verdicts costs approximately $4.50/run on Sonnet 4.6 (500,000 input tokens × $3/M = $1.50, plus 200,000 output tokens × $15/M = $3.00). Add that to your monthly LangSmith cost projection.
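The per-run judge cost follows directly from token counts and list prices. A minimal sketch using the example figures from the paragraph above (1,000 examples, 500 input and 200 output tokens each, $3/M and $15/M); with those inputs it comes to $4.50 per run. `judge_cost_usd` is a hypothetical helper, and real judge prompts may be longer once the output being graded is included.

```python
def judge_cost_usd(
    examples: int,
    input_tokens_each: int,
    output_tokens_each: int,
    input_usd_per_m: float,
    output_usd_per_m: float,
) -> float:
    """LLM API cost of one evaluation run, excluding LangSmith trace charges."""
    input_cost = examples * input_tokens_each * input_usd_per_m / 1_000_000
    output_cost = examples * output_tokens_each * output_usd_per_m / 1_000_000
    return input_cost + output_cost


print(judge_cost_usd(1_000, 500, 200, 3.0, 15.0))  # 4.5
```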

The Plus plan's API read rate limit (100 requests/second vs the Developer plan's 10 req/s) matters for teams building custom dashboards, running large data exports, or integrating LangSmith data into their own analytics pipelines. The 10x higher read rate limit on Plus enables automated reporting and data pipeline integrations that simply aren't feasible at the Developer plan's 10 req/s limit. See https://docs.smith.langchain.com/ for the full API reference and rate limit documentation.


Enterprise plan: custom retention, SSO, and compliance

The LangSmith Enterprise plan targets organizations with compliance requirements, large team sizes, or trace volumes that make per-seat pricing economically challenging. Core Enterprise differentiators: custom trace quotas (negotiated based on expected volume), retention periods up to 1 year or longer, SSO/SAML integration, audit logs for compliance, SOC 2 certification, and dedicated support. **For regulated industries (healthcare, finance, legal) where data retention policies are mandated, Enterprise is the only viable option — the 90-day Plus retention cap may violate compliance requirements.**

The practical threshold for an Enterprise conversation with LangSmith is typically one of: monthly LangSmith spend approaching $5,000+ (at which point custom pricing often yields better unit economics than per-seat Plus rates); a compliance requirement for data residency or retention beyond 90 days; a need for on-premises or private cloud deployment; or SSO requirements from your IT/security team. Enterprise pricing is custom and not publicly listed — initiate contact through https://www.langchain.com/langsmith.

On-premises deployment is available for Enterprise customers through LangGraph Platform, LangChain's on-premises agent infrastructure product. This means the trace data never leaves your cloud environment — important for customers handling sensitive data (PHI, PII, financial records) who cannot send that data to LangSmith's cloud. The on-premises option requires running your own LangSmith instance, which adds infrastructure overhead but enables full data sovereignty.

Data residency options are an important Enterprise feature for international organizations. LangSmith's cloud infrastructure is US-based by default. Enterprise customers can negotiate data residency in the EU (for GDPR compliance) or other regions. If your users are in Europe and your application processes personal data (which virtually all production applications do), EU data residency is likely a requirement — and one that only Enterprise can satisfy.

Enterprise also includes priority support and dedicated customer success management. For production deployments where a LangSmith outage or data issue would materially impact your application, having a direct escalation path with SLA guarantees is worth the premium. The Developer and Plus plans use self-service support with community Slack and documentation as the primary support channels — fine for prototyping, insufficient for production environments with uptime requirements.


Trace size limits: 20 MB per trace and how to stay under

Every trace in LangSmith has a hard 20 MB size limit, with individual runs nested within the trace capped at 10 MB each. **Exceeding these limits causes the trace — or the specific run that exceeds 10 MB — to be silently dropped.** You won't get an error in your application; you'll simply notice that certain traces aren't appearing in the LangSmith dashboard. This silent failure mode makes the 20 MB limit one of the most important constraints to actively monitor.

Large language model context windows are the primary driver of traces approaching the 20 MB limit. A single Claude Opus 4.7 or GPT-5.5 call with a 100,000-token input prompt stores all of that token content in the trace by default. At approximately 4 characters per token, 100,000 tokens = 400,000 characters = ~400 KB just for the prompt content. Five such calls in a single agent trace totals 2 MB from prompt content alone — still well under 20 MB, but adding tool results, retrieved documents, and output text can push complex traces toward the ceiling.

The highest-risk traces are those involving large tool results or bulk document retrieval. A file_search-style tool that returns 50 full document chunks at 2,000 tokens each returns 100,000 tokens of tool result content. Stored in the trace as raw text, that's ~400 KB per tool invocation. A multi-step agent that makes 5 such retrieval calls stores 2 MB of raw retrieved content in the trace. **The fix is to store document IDs and metadata in tool results rather than full document text** — your application still has the full text (it's in your vector store), but the trace stays small.
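A rough illustration of how much this saves, using plain JSON size as a proxy for stored trace size. The chunk shape (`id`, `source`, `text`) and the `to_references` helper are assumptions made for this sketch; LangSmith's own size accounting may differ.

```python
import json


def to_references(chunks: list[dict]) -> list[dict]:
    """Keep only identifiers and a length hint; drop the full text."""
    return [
        {"id": c["id"], "source": c["source"], "chars": len(c["text"])}
        for c in chunks
    ]


# 50 chunks of about 2,000 tokens each, at roughly 4 characters per token.
chunks = [
    {"id": f"chunk-{i}", "source": f"kb/doc-{i}.md", "text": "x" * 8_000}
    for i in range(50)
]

full_size = len(json.dumps(chunks).encode("utf-8"))
ref_size = len(json.dumps(to_references(chunks)).encode("utf-8"))

print(f"full text:  {full_size:,} bytes")
print(f"references: {ref_size:,} bytes")
```

The agent still passes the full text to the model; only what gets recorded as the tool result changes.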

LangSmith provides output_keys filtering — the ability to configure which keys from your chain outputs get stored in the trace and which are excluded. For chains that return large intermediate objects (full retrieved documents, embedding vectors, raw database results), configure output_keys to exclude those large fields and store only the final answer and key metadata. This is the most targeted way to reduce trace size without changing your application logic.

Sampling is a complementary strategy to trace size reduction. At high production volumes, tracing only 10–20% of requests (chosen randomly or by sampling on error status) dramatically reduces both trace count and total trace size storage. Configure sampling in LangChain using the LANGCHAIN_TRACING_SAMPLING_RATE environment variable, or by overriding the callbacks on specific chain invocations with .with_config({'run_name': None, 'tags': [], 'callbacks': []}). Verify in a test project that the override actually suppresses the trace in your SDK version, since tracing enabled through environment variables may attach its own tracer. Always trace errors and exceptions regardless of sampling rate: the traces you care most about are the ones where something went wrong, not the successful runs.

For production deployments, implement trace size monitoring as part of your observability pipeline. Log the estimated size of each trace (you can estimate from the size of the inputs/outputs being stored) and alert when any trace approaches 15 MB, which gives you headroom to investigate before traces are silently dropped. The LangSmith API at https://docs.smith.langchain.com/ includes endpoints for retrieving run details that you can use to build this monitoring.
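A minimal sketch of the estimate-and-alert step, done entirely on the application side. It assumes serialized JSON length is a reasonable proxy for stored size and treats 1 MB as 1,000,000 bytes (the conservative reading; the source does not say which definition LangSmith uses). `estimate_bytes` and `classify` are hypothetical names.

```python
import json

WARN_BYTES = 15_000_000  # alert threshold suggested above
HARD_BYTES = 20_000_000  # per-trace hard limit from the plan table


def estimate_bytes(payload: object) -> int:
    """Approximate stored size as the UTF-8 length of the JSON encoding."""
    return len(json.dumps(payload, default=str).encode("utf-8"))


def classify(size_bytes: int) -> str:
    """Bucket a trace size into ok / warn / over_limit."""
    if size_bytes >= HARD_BYTES:
        return "over_limit"
    if size_bytes >= WARN_BYTES:
        return "warn"
    return "ok"


trace_payload = {"inputs": {"question": "hi"}, "outputs": {"answer": "hello"}}
size = estimate_bytes(trace_payload)
print(size, classify(size))
```

Emitting the size and bucket as a structured log line lets your existing alerting pick up `warn` events without any LangSmith-specific tooling.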


Reducing trace volume: sampling, filtering, and selective tracing

Managing trace volume is the difference between the LangSmith Developer plan lasting a month or a week. The good news: with intentional configuration, you can capture all the observability value you need while consuming a fraction of the raw trace count. The key strategies are sampling (only trace a percentage of calls), environment filtering (don't trace dev/test), selective error-only tracing, and project isolation.

**Sampling in production** is the most impactful lever. For a stable, well-tested application, you don't need to trace every call — you need to trace enough calls to catch anomalies, and to trace all errors. A 10% sampling rate on successful calls + 100% tracing on errors captures the distribution of normal behavior and the full error tail. LangChain supports this via the LANGCHAIN_TRACING_SAMPLING_RATE environment variable (values 0.0–1.0). Pair sampling with an error-always rule using a custom callback that re-enables full tracing when an exception is detected.
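The "sample successes, keep all errors" rule can be expressed as a small decision function. This is an application-level sketch, not a LangSmith feature: it decides after a call finishes whether its trace is worth keeping, and how you act on that decision (which callback or client you use) is left out. The 1% error rate and the seed are invented for the simulation.

```python
import random


def keep_trace(had_error: bool, sample_rate: float, rng: random.Random) -> bool:
    """Keep every failed call; keep successful calls with probability sample_rate."""
    if had_error:
        return True
    return rng.random() < sample_rate


rng = random.Random(42)  # seeded so the simulation is repeatable
calls = [{"error": i % 100 == 0} for i in range(10_000)]  # synthetic 1% errors

kept = [c for c in calls if keep_trace(c["error"], 0.10, rng)]
errors_kept = sum(c["error"] for c in kept)

print(f"kept {len(kept)} of {len(calls)} calls, including {errors_kept} errors")
```

In this simulation all 100 failing calls are retained while roughly a tenth of the successful ones are, which is the shape of coverage the paragraph above describes.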

Environment filtering is the simplest quick win. If you're running automated tests, prompt evaluation scripts, or development scripts locally, disable tracing by setting LANGCHAIN_TRACING_V2=false in your .env.local or test configuration. These development-time invocations produce traces with no diagnostic value for production monitoring — every trace they consume is wasted quota. Reserve tracing for production and staging environments where real user behavior is being captured.
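If you prefer to set the flag in code rather than in a .env file, a small helper keeps the rule in one place. A sketch only: `APP_ENV` and the helper names are hypothetical, and the only LangSmith-related name used is the LANGCHAIN_TRACING_V2 variable mentioned above.

```python
import os

TRACED_ENVS = {"production", "staging"}


def tracing_flag(app_env: str) -> str:
    """Return 'true' only for environments that should send traces."""
    return "true" if app_env in TRACED_ENVS else "false"


def configure_tracing(environ: dict, app_env: str) -> None:
    """Write the tracing flag into the given environment mapping."""
    environ["LANGCHAIN_TRACING_V2"] = tracing_flag(app_env)


# Demo on a copy so this example does not modify the real process environment.
env = dict(os.environ)
configure_tracing(env, env.get("APP_ENV", "development"))
print(env["LANGCHAIN_TRACING_V2"])
```

In a real application you would pass `os.environ` itself and call this before importing or constructing any traced chains.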

Project isolation allows you to budget independently for different environments and use cases. Create separate LangSmith projects for production (full-rate tracing for real user sessions), staging (full-rate tracing for QA), development (disabled or minimal), and evaluation (LLM-as-judge eval runs). Monitor each project's trace consumption separately. If your evaluation project is consuming 60% of your monthly quota, that's a signal to reduce eval frequency or move eval traces to a lower-priority project that you're willing to let go over quota.
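Once you have per-project counts (read from the dashboard or exported however you prefer), the share check is a few lines. The usage numbers below are made up, and `quota_shares` is a hypothetical helper.

```python
def quota_shares(usage_by_project: dict[str, int], monthly_quota: int) -> dict[str, float]:
    """Fraction of the monthly quota consumed by each project."""
    return {name: count / monthly_quota for name, count in usage_by_project.items()}


usage = {"production": 18_000, "staging": 4_000, "evaluations": 32_000}  # invented
shares = quota_shares(usage, monthly_quota=50_000)

# Flag any project using more than 60% of the quota on its own.
flagged = [name for name, share in shares.items() if share > 0.60]
print(flagged)  # ['evaluations']
```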

The 'trace on error only' pattern is appropriate for very high-volume, very stable production pipelines where the success rate is very high (>99.5%) and you're primarily interested in debugging failures. In this pattern, you disable the LangSmith tracer by default and enable it only when an exception is caught, by re-running the chain with the tracer attached using .with_config({'callbacks': [langsmith_tracer]}). This approach requires capturing the failing input, re-running it (which adds latency), and hoping the error is deterministic. It works well for batch processing jobs; less well for real-time applications where re-running the input is expensive or not reproducible.

For very high-volume applications (millions of daily agent calls), consider a tiered tracing strategy: use a lightweight local trace store (Langfuse self-hosted, or a custom structured logging pipeline) for raw high-volume telemetry, and forward only the most interesting traces (errors, slow calls, flagged outputs) to LangSmith for in-depth analysis and human annotation. This hybrid approach captures volume telemetry cheaply while preserving LangSmith quota for the traces where LangSmith's analysis features add the most value.


LangSmith vs Langfuse vs Helicone for teams on tight quotas

LangSmith is not the only observability platform for LLM applications — and for teams on tight budgets or with specific technical requirements, Langfuse or Helicone may be better fits. The comparison across these three tools covers integration depth, free tier generosity, self-hosting options, and feature breadth.

**LangSmith** (https://docs.smith.langchain.com/): deepest integration with LangChain and LangGraph, with automatic tracing for all LangChain components. LangSmith's evaluation framework (datasets, LLM-as-judge, human annotation queues) is the most mature in the market for LangChain-based applications. The Developer free tier is relatively stingy (5,000 traces/month with 14-day retention) compared to alternatives. No self-hosting option for the observability backend outside of Enterprise. Best fit: teams already committed to the LangChain/LangGraph ecosystem who need production-grade evaluation workflows.

**Langfuse** (https://langfuse.com/docs): open-source, self-hostable, and available on a generous cloud free tier. The Langfuse Cloud free tier includes 50,000 events per month (note: events, not traces — complex traces generate multiple events) with 14-day retention. The open-source version is free to self-host with no trace limits — if you can run a Docker container, you can have unlimited traces with unlimited retention. Langfuse has first-class integrations for LangChain, OpenAI SDK, Anthropic SDK, and a growing list of frameworks. Its evaluation features are less mature than LangSmith's but advancing rapidly. **Best fit: teams that want unlimited traces via self-hosting, or teams working with multiple LLM providers (not exclusively LangChain).**

**Helicone** focuses on request-level logging for OpenAI and Anthropic API calls, with a simpler UX targeted at developers who want a quick setup rather than a full observability platform. The Helicone free tier includes 10,000 requests per month with no retention limit on the free tier. Helicone's strength is dead-simple integration (a proxy URL swap, no SDK changes), cost tracking, and rate-limit monitoring. It does not have deep support for complex multi-step agent traces — it logs individual LLM calls, not the full agent execution tree. **Best fit: teams that primarily want cost tracking and rate-limit monitoring for direct LLM API calls, with minimal setup friction.**

**Recommendation matrix by team situation**: (1) If you're using LangGraph or complex LangChain pipelines and need evaluation workflows: LangSmith Plus. (2) If you're cost-constrained and technically capable of running Docker: Langfuse self-hosted (unlimited traces, free). (3) If you're on multiple LLM providers and want the most generous cloud free tier: Langfuse Cloud (50k events/month free). (4) If you're building a simple RAG app and just want cost tracking: Helicone free tier. (5) If you're in a regulated industry needing data residency: LangSmith Enterprise or Langfuse self-hosted in your own cloud.

The operational cost of self-hosting Langfuse is modest: a small Postgres database ($20–50/month on Railway, Render, or Supabase), a Langfuse server container (can run on a $10/month Fly.io or Railway instance), and some one-time setup time. For a team generating 500,000+ traces per month where LangSmith Plus overage would cost $225+/month, self-hosted Langfuse pays for itself in the first month. The tradeoff is operational overhead — you own the infrastructure, backups, and upgrades.
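The break-even point can be worked out from the figures above. A sketch that compares only Plus overage against self-hosting infrastructure cost; it ignores seat fees and engineering time, and both helper names are hypothetical.

```python
def plus_overage(monthly_traces: int, included: int = 50_000, per_1k: float = 0.50) -> float:
    """Overage charge in USD for traces beyond the included allowance."""
    return max(0, monthly_traces - included) / 1000 * per_1k


def break_even_traces(self_host_monthly_usd: float, included: int = 50_000,
                      per_1k: float = 0.50) -> float:
    """Monthly trace volume at which overage equals the self-hosting bill."""
    return included + self_host_monthly_usd / per_1k * 1000


print(plus_overage(500_000))    # 225.0
print(break_even_traces(30.0))  # 110000.0  ($20 Postgres + $10 server)
print(break_even_traces(60.0))  # 170000.0  ($50 Postgres + $10 server)
```

On these assumptions the infrastructure cost alone is recovered somewhere between 110,000 and 170,000 traces per month; the operational overhead mentioned above pushes the practical break-even higher.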

One nuance in the Langfuse vs LangSmith comparison: **LangSmith's human annotation queue and its tight integration with LangGraph's agent checkpointing are genuinely differentiated features** that Langfuse doesn't match. If your team's workflow involves product managers or domain experts annotating agent outputs to build golden datasets, LangSmith's collaboration features are worth the premium. If observability is purely an engineering function and annotation is done programmatically via LLM-as-judge, the case for Langfuse's cost advantage is stronger.


Using LangSmith's evaluations and datasets without blowing quota

LangSmith's evaluation system — datasets, LLM-as-judge evaluators, and human annotation workflows — is one of its strongest features and a key reason teams choose Plus over alternatives. But eval runs are traces too, and an active evaluation program can consume a substantial fraction of your monthly quota if not managed carefully.

**The core math**: each example in a dataset evaluation run generates at least one trace — the chain execution against that example. If you also run an LLM-as-judge evaluator, that judge call generates an additional trace. A 200-example dataset evaluated with LLM-as-judge = 400 traces per evaluation run. Running this evaluation daily for a month = 12,000 evaluation traces. On the Developer plan's 5,000-trace monthly budget, that's impossible. On the Plus plan's 50,000-trace budget, it consumes 24% of your monthly quota — significant, but manageable if you're intentional about it.

Strategy 1: **run evaluations in a dedicated project** and track its quota usage separately from production. This gives you visibility into the eval-vs-production split and lets you throttle eval frequency when the eval project is consuming too much quota. LangSmith's Projects feature makes this straightforward — create an 'evaluations' project and direct all eval runs there.

Strategy 2: **schedule evaluations weekly, not daily.** Unless you're making daily code changes that could affect output quality, daily evaluations are overkill. Weekly evaluations on a 200-example dataset consume 400 traces × 4 weeks = 1,600 traces/month — versus 400 × 30 = 12,000 for daily evaluations. This 7.5x reduction in eval trace consumption leaves far more quota for production monitoring. Use CI/CD-triggered evaluations (evaluate on every merge to main, not every commit) rather than scheduled evaluations for even better quota efficiency.
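The cadence comparison is simple enough to script when you want to try other dataset sizes. A sketch using this page's accounting of one trace for the chain run plus one for the judge call per example; `eval_traces_per_month` is a hypothetical helper.

```python
def eval_traces_per_month(examples: int, runs_per_month: int, with_judge: bool = True) -> int:
    """Traces consumed by evaluation: chain run (+ judge call) per example, per run."""
    traces_per_example = 2 if with_judge else 1
    return examples * traces_per_example * runs_per_month


daily = eval_traces_per_month(200, runs_per_month=30)
weekly = eval_traces_per_month(200, runs_per_month=4)

print(daily, weekly)   # 12000 1600
print(daily / weekly)  # 7.5
print(daily / 50_000)  # 0.24 of the Plus allowance
```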

Strategy 3: **cache LLM-as-judge responses for stable prompts.** If your evaluation prompt template and your reference dataset haven't changed since the last eval run, many of the LLM-as-judge verdicts will be identical to the previous run. Implement a simple cache: hash the (eval prompt + dataset example + model output) tuple, look up the hash in a local cache before calling the evaluator, and only make a fresh LLM call when the hash is new. This can reduce evaluator LLM API costs by 60–80% on stable datasets while providing accurate evaluation scores.
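One way to sketch the hash-and-lookup cache is below. It keeps verdicts in an in-memory dict for brevity; a real setup would need to persist the cache between runs (a file or key-value store), which is not shown. `JudgeCache`, `cache_key`, and `fake_judge` are hypothetical, and the judge is passed in as a plain callable rather than any specific LangSmith evaluator.

```python
import hashlib
import json
from typing import Callable


def cache_key(eval_prompt: str, example: dict, model_output: str) -> str:
    """Stable hash of the (eval prompt, dataset example, model output) tuple."""
    payload = json.dumps([eval_prompt, example, model_output], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class JudgeCache:
    """Calls the judge only for tuples that have not been scored before."""

    def __init__(self, judge: Callable[[str, dict, str], float]) -> None:
        self._judge = judge
        self._store: dict[str, float] = {}
        self.judge_calls = 0

    def score(self, eval_prompt: str, example: dict, model_output: str) -> float:
        key = cache_key(eval_prompt, example, model_output)
        if key not in self._store:
            self.judge_calls += 1
            self._store[key] = self._judge(eval_prompt, example, model_output)
        return self._store[key]


def fake_judge(eval_prompt: str, example: dict, model_output: str) -> float:
    """Stand-in for an LLM call: 1.0 if the expected answer appears in the output."""
    return 1.0 if example["expected"] in model_output else 0.0


cache = JudgeCache(fake_judge)
example = {"question": "Capital of France?", "expected": "Paris"}
cache.score("Grade the answer.", example, "It is Paris.")
cache.score("Grade the answer.", example, "It is Paris.")  # served from cache
print(cache.judge_calls)  # 1
```

Because the model output is part of the key, any change in your application's answer for an example produces a new hash and a fresh judge call.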

Strategy 4: **use smaller representative datasets.** A 50-example golden dataset that's carefully curated to represent the distribution of real user inputs often gives you more reliable quality signal than a 500-example dataset with noisy or redundant examples. 50 examples with LLM-as-judge = 100 traces per eval run; 500 examples with LLM-as-judge = 1,000 traces per run. The 10x smaller dataset is 10x cheaper to evaluate and faster to iterate on. Invest time in dataset curation rather than dataset size: the goal is representative coverage, not raw volume.

The integration of LangSmith evaluations into CI/CD pipelines is documented at https://docs.smith.langchain.com/ and supports gating on evaluation scores before merging. The typical pattern: run a 50-example evaluation on every PR, use an LLM-as-judge evaluator to score outputs, and fail the CI check if the average score drops below a threshold. This keeps quality gates automated and cheap (about 100 traces per PR check by the accounting above: 50 chain runs plus 50 judge calls) while catching regressions before they reach production. Configure the CI check to evaluate only when LLM-related code changes are detected (not for documentation or infrastructure changes) to further reduce unnecessary evaluation runs.
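The gating step itself is independent of how the scores are produced. A minimal sketch, assuming you already have a list of per-example judge scores between 0 and 1; `ci_exit_code`, the 0.8 threshold, and the sample scores are all hypothetical.

```python
from statistics import mean


def ci_exit_code(scores: list[float], threshold: float) -> int:
    """Return 0 (pass) if the average score meets the threshold, else 1 (fail)."""
    if not scores:
        return 1  # no scores means the eval did not run; fail closed
    return 0 if mean(scores) >= threshold else 1


scores = [0.9, 0.85, 0.7, 0.95]  # made-up judge scores
print(ci_exit_code(scores, threshold=0.8))  # 0
```

In a CI job you would pass the result to `sys.exit(...)` so the pipeline step fails when quality regresses.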

Setting up LangSmith for production without hitting quota limits

  1. Instrument with sampling from day one

    Set LANGCHAIN_TRACING_SAMPLING_RATE to 0.1 (10%) in your production environment configuration from the first deployment. This means 1 in 10 agent invocations gets traced, giving you a representative sample of normal behavior across your user base without consuming quota on every call. Always trace errors regardless of sampling rate by wrapping your chain execution in a try/except block that re-enables the LangSmith tracer when an exception is raised. This 'sample happy paths, trace all errors' pattern is the most cost-efficient observability strategy for stable production applications. Adjust the sampling rate up to 20–30% if you're seeing rare anomalies that aren't being captured at 10%.

  2. Configure output filtering to exclude large intermediate objects

    Before deploying to production, audit what your chains are storing in LangSmith traces by reviewing a few development traces and checking the size of stored inputs/outputs. Identify any large objects: full retrieved documents, embedding vectors, raw database query results, API responses with extensive metadata. Configure output_keys filtering for each chain to exclude these large fields and store only the final answer, key decision points, and metadata like user_id, session_id, and tool_call_counts. This typically cuts average trace size by 60–80%, both reducing storage burden and keeping you well under the 20 MB per-trace hard limit. Implement output filtering before your first production traffic, not after — retrofitting it is harder than starting clean.

  3. Separate projects by environment

    Create at minimum three LangSmith projects: production (LANGSMITH_PROJECT=production), staging (LANGSMITH_PROJECT=staging), and evaluation (LANGSMITH_PROJECT=evaluations). Set LANGCHAIN_TRACING_V2=false in your local development environment and in test configuration files to prevent dev/test invocations from consuming quota. Each environment's project gets its own trace count in the LangSmith dashboard, so you can see exactly how much of your monthly quota is production monitoring versus staging versus evaluations. If your evaluation project is eating more than 30% of your monthly quota, that's a signal to reduce evaluation frequency or dataset size.

  4. Set up evaluation on a budget

    Create a golden dataset of 50 representative examples, not 500. Curate these examples to cover the critical paths through your application: the most common user query types, the most error-prone edge cases, the highest-stakes outputs. Configure an LLM-as-judge evaluator using Claude Sonnet 4.6 ($3/M input, $15/M output) with a concise evaluation prompt (under 300 tokens) that scores outputs on your most important quality dimension. Run this evaluation about once a week, ideally triggered by a merge to main rather than by a fixed timer. At roughly four runs a month, this generates 50 × 2 = 100 traces per evaluation run × 4 runs/month = 400 evaluation traces/month, which is less than 1% of your Plus plan's 50,000-trace quota, while giving you a weekly quality signal to catch regressions.

  5.

    Monitor your monthly usage proactively

    Set a recurring calendar reminder for the 25th of each month to check smith.langchain.com/settings/usage. At 25 days into the month, you have about 5 days remaining, which is enough time to pause non-critical tracing (staging environment, scheduled evaluations) if you're approaching your plan's included quota. If you're regularly hitting 80%+ of your monthly quota before the 25th, it's time to either reduce trace volume (more aggressive sampling, fewer evaluations) or upgrade to a higher plan. Don't wait until you hit the ceiling: on the Developer plan, traces start failing silently when the quota is exhausted, which is worse than having no observability at all. Add a Slack or PagerDuty alert to notify your team when usage crosses 70% of monthly quota.
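
The monthly check in step 5 can be reduced to a small calculation. The sketch below is illustrative only: the function names are hypothetical, the trace count is assumed to be read by hand from the usage page (no LangSmith API call is made), and the month-end projection is a simple linear extrapolation. The 70% alert threshold is the one suggested above.

```python
import calendar
from datetime import date


def project_month_end_usage(traces_used: int, today: date) -> float:
    """Linearly extrapolate the traces used so far to the end of the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return traces_used / today.day * days_in_month


def usage_status(traces_used: int, quota: int, today: date,
                 alert_ratio: float = 0.70) -> str:
    """Classify current usage against the plan quota.

    Returns one of: "exhausted", "alert", "on_track_to_exceed", "ok".
    """
    used_ratio = traces_used / quota
    if used_ratio >= 1.0:
        return "exhausted"
    if used_ratio >= alert_ratio:
        return "alert"
    # Below the alert line today, but the current pace would overshoot.
    if project_month_end_usage(traces_used, today) > quota:
        return "on_track_to_exceed"
    return "ok"


if __name__ == "__main__":
    # Hypothetical numbers: 36,000 of 50,000 traces used on the 25th.
    print(usage_status(36_000, 50_000, date(2026, 3, 25)))
```

A status of "alert" or "on_track_to_exceed" is the cue to pause staging tracing or scheduled evaluations; wiring the result to Slack or PagerDuty is left out because it depends on your alerting stack.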

Frequently Asked Questions

How many traces does LangSmith's free plan include?

LangSmith's Developer (free) plan includes 5,000 traces per month with 14-day retention. There is no overage on the free plan — once you hit 5,000 traces, additional traces are silently dropped until the month resets. This limit is lower than most competitors: Langfuse Cloud's free tier provides 50,000 events/month, and Helicone's free tier provides 10,000 requests/month. The LangSmith Developer plan is designed for initial prototyping and small-scale development work. Production applications generating more than 167 traces per day will hit the ceiling before month-end and should plan for the Plus plan ($39/seat/month) from the start.

What counts as a trace in LangSmith?

A trace in LangSmith is one root run — a single top-level chain execution, agent invocation, or LLM call — plus all the nested child runs it generates. A complex agent that makes 10 LLM calls, invokes 5 tools, and performs 3 retrievals in a single execution is counted as ONE trace, not 18. This is fundamentally different from counting individual LLM API calls. What inflates trace counts is the creation of separate root runs: map/reduce patterns that spawn parallel root runs (instead of nested children), test scripts that invoke chains independently rather than as sub-chains, and evaluation runs (each dataset example generates its own root run). Review your chain architecture for unintentional root run proliferation — this is the most common cause of unexpectedly fast quota consumption.
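
To make the counting rule concrete, here is a minimal sketch that models runs as plain records and counts only the ones without a parent. The `Run` dataclass and helper functions are hypothetical stand-ins, not the LangSmith SDK's types; they only illustrate why 18 nested steps cost one trace while 18 independent invocations cost 18.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    run_id: str
    parent_id: Optional[str]  # None marks a root run


def count_traces(runs: list[Run]) -> int:
    """A trace is billed per root run, regardless of how many children it has."""
    return sum(1 for run in runs if run.parent_id is None)


def nested_agent_runs() -> list[Run]:
    """One agent invocation: 10 LLM calls, 5 tool calls, 3 retrievals as children."""
    runs = [Run("agent", None)]
    for kind, n in (("llm", 10), ("tool", 5), ("retrieval", 3)):
        runs += [Run(f"{kind}-{i}", "agent") for i in range(n)]
    return runs


def fan_out_runs(n: int) -> list[Run]:
    """The anti-pattern: each step launched as its own top-level run."""
    return [Run(f"step-{i}", None) for i in range(n)]


if __name__ == "__main__":
    print(count_traces(nested_agent_runs()))  # nested children share one root
    print(count_traces(fan_out_runs(18)))     # every run is its own root
```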

What is the maximum trace size in LangSmith?

LangSmith enforces a 20 MB hard limit per trace and a 10 MB limit per individual run nested within a trace. Exceeding either limit causes the trace or run to be silently dropped — your application continues running but the trace doesn't appear in LangSmith. The most common causes of large traces are: storing full retrieved documents as tool results instead of document IDs and metadata; capturing raw LLM prompts with 100,000-token context windows; and retaining embedding vectors in trace metadata. Fix: use output_keys filtering to exclude large intermediate objects from storage, store references (IDs, URLs) instead of full content in tool results, and configure your LangSmith tracer to exclude embedding vector outputs from all retrieval steps.
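
The fixes above amount to trimming payloads before they are recorded. The sketch below shows that idea in plain Python under stated assumptions: the helper names are hypothetical, the limits are taken as binary megabytes (the exact byte definition is not stated here), and the size estimate uses JSON serialization as a rough proxy for what the tracer would store. How you attach such a filter to your tracer depends on your SDK version, so that wiring is omitted.

```python
import json

# Limits quoted above; treating "MB" as 1024 * 1024 bytes is an assumption.
MAX_TRACE_BYTES = 20 * 1024 * 1024
MAX_RUN_BYTES = 10 * 1024 * 1024


def payload_bytes(payload: dict) -> int:
    """Approximate stored size as the UTF-8 length of the JSON encoding."""
    return len(json.dumps(payload, default=str).encode("utf-8"))


def slim_outputs(outputs: dict, keep_keys: set[str]) -> dict:
    """Keep only the fields worth storing (final answer, IDs, counters)."""
    return {key: value for key, value in outputs.items() if key in keep_keys}


def to_reference(doc: dict) -> dict:
    """Replace a full retrieved document with its ID and source."""
    return {"id": doc["id"], "source": doc.get("source")}


if __name__ == "__main__":
    raw = {
        "answer": "Refunds are processed within 5 days.",
        "user_id": "u-123",
        "documents": [{"id": "doc-1", "source": "kb", "text": "x" * 200_000}],
        "embedding": [0.1] * 1536,
    }
    slim = slim_outputs(raw, {"answer", "user_id"})
    slim["documents"] = [to_reference(d) for d in raw["documents"]]
    print(payload_bytes(raw), "->", payload_bytes(slim))
    print("run within limit:", payload_bytes(slim) < MAX_RUN_BYTES)
```

Running the size check against a handful of development traces before launch shows which fields dominate and whether any single run is near the per-run limit.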

How much does LangSmith Plus cost?

LangSmith Plus costs $39 per seat per month. Each seat includes 50,000 traces per month with 90-day retention, access to LLM-as-judge automated evaluators, and human annotation queues. Traces beyond 50,000 per month are billed at $0.50 per 1,000 additional traces — so a team generating 70,000 traces in a month pays $39 + $10 (for the 20,000 overage traces) = $49 that month. Multi-seat billing: a 5-person team on Plus pays $195/month for the combined 250,000-trace included quota (pooled or per-seat, confirm with LangSmith sales). Check current pricing at smith.langchain.com/settings/plans — pricing has evolved as LangSmith has grown and may have changed since this publication date.
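
The billing arithmetic above can be written as a short function. This is a sketch using the prices quoted in this answer, with two assumptions flagged in the code: included traces are pooled across seats, and overage is prorated rather than billed per started block of 1,000. Confirm both against current LangSmith pricing before relying on the output.

```python
def plus_monthly_cost(
    seats: int,
    traces: int,
    seat_price: float = 39.0,        # $/seat/month, as quoted above
    included_per_seat: int = 50_000,  # traces included per seat
    overage_per_1k: float = 0.50,     # $ per 1,000 traces over the allowance
) -> float:
    """Estimate a month's Plus bill in dollars.

    Assumptions (not confirmed by the pricing page): the included
    allowance is pooled across seats, and overage is prorated.
    """
    included = seats * included_per_seat
    overage_traces = max(0, traces - included)
    return seats * seat_price + overage_traces / 1_000 * overage_per_1k


if __name__ == "__main__":
    print(plus_monthly_cost(seats=1, traces=70_000))   # one seat, 20,000 over
    print(plus_monthly_cost(seats=5, traces=200_000))  # five seats, under quota
```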

Can I self-host LangSmith to avoid trace limits?

LangSmith does not offer a fully open-source self-hostable version, unlike Langfuse, whose core is MIT-licensed and can be self-hosted with no trace limits on your own infrastructure. LangSmith's on-premises deployment is available only through the Enterprise plan (negotiated pricing, typically $5,000+/month) via LangGraph Platform. For teams that want unlimited traces via self-hosting, Langfuse (https://langfuse.com/docs) is the open-source alternative: it runs on a Postgres database and a lightweight server container (Langfuse v3 also requires ClickHouse, Redis, and object storage), costs $20–50/month in infrastructure, and imposes no trace or retention limits. The tradeoff is that Langfuse has less deep integration with LangGraph's agent checkpointing and a less mature evaluation workflow compared to LangSmith.

How does LangSmith compare to Langfuse on pricing?

LangSmith's free Developer plan provides 5,000 traces/month with 14-day retention. Langfuse's open-source self-hosted version is free with unlimited traces and unlimited retention if you provide your own infrastructure (typically $20–50/month). Langfuse Cloud's free tier provides 50,000 events/month. Events and traces are not identical: one complex LangSmith trace may be logged as multiple Langfuse events, so the two quotas are not directly comparable. LangSmith's advantage is not price but its deeper LangChain/LangGraph integration and more mature human annotation workflows. Langfuse's pricing edge is the open-source self-hosting option and the more generous cloud free tier. For teams not tied to the LangChain ecosystem, Langfuse's cost structure is typically more favorable. For teams using LangGraph for complex agent orchestration, LangSmith's integration depth often justifies the premium.

What retention period does LangSmith Plus offer?

LangSmith Plus offers 90-day trace retention — a significant improvement over the Developer plan's 14-day window. 90 days covers most practical debugging scenarios: a customer complaint from 2 months ago can still be traced, regression bugs that surface intermittently over weeks can be investigated historically, and A/B test traces from the past quarter can be compared. For longer retention requirements — annual customer contracts, compliance mandates for 1+ year retention, audit log requirements — the Enterprise plan is required. Enterprise offers custom retention periods up to 1 year and beyond, negotiated as part of the contract. Retention clocks reset with activity: accessing or annotating a trace refreshes its 90-day window on Plus.

Does running LangSmith evaluations count toward my trace quota?

Yes. Each evaluation run in LangSmith is a trace, and each example in a dataset evaluation generates at least one trace. If you run an LLM-as-judge evaluator on each example, that judge call generates an additional trace. A 200-example dataset with LLM-as-judge scoring = 400 traces per evaluation run. Running this daily for a month = 12,000 evaluation traces — 24% of the Plus plan's 50,000-trace monthly quota. To manage evaluation quota impact: create a dedicated 'evaluations' project to track consumption separately; reduce evaluation frequency to weekly (triggered on main branch merges rather than scheduled daily); use smaller representative datasets (50 well-chosen examples vs 500 noisy ones); and cache LLM-as-judge verdicts for examples that haven't changed since the last evaluation run.
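
The quota impact of an evaluation schedule is easy to estimate before committing to it. The sketch below reproduces the arithmetic above; the function names are hypothetical, and it assumes exactly one application trace plus one judge trace per example, which may undercount if your evaluator makes several judge calls.

```python
def eval_traces_per_month(examples: int, runs_per_month: int,
                          llm_judge: bool = True) -> int:
    """Traces consumed by evaluations in a month.

    Assumes one trace per example, plus one more per example
    when an LLM-as-judge evaluator scores it.
    """
    traces_per_example = 2 if llm_judge else 1
    return examples * traces_per_example * runs_per_month


def quota_share(traces: int, quota: int = 50_000) -> float:
    """Fraction of the monthly quota consumed (default: Plus, one seat)."""
    return traces / quota


if __name__ == "__main__":
    daily = eval_traces_per_month(examples=200, runs_per_month=30)
    weekly = eval_traces_per_month(examples=50, runs_per_month=4)
    print(daily, f"{quota_share(daily):.1%}")    # 200 examples, run daily
    print(weekly, f"{quota_share(weekly):.1%}")  # 50 examples, run weekly
```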
