By The DDH Team · Digital Dashboard Hub

Qdrant Cloud Quotas and Limits (2026): Free, Standard, Hybrid Cloud, and Enterprise

Qdrant is an open-source vector database written in Rust, and Qdrant Cloud is its managed offering. Unlike Pinecone's unit-based billing model, Qdrant Cloud prices on cluster resources — primarily RAM — which makes capacity planning feel more like provisioning a traditional database than buying API credits. The implication is that your primary planning input is not query volume but total vector footprint: how much RAM is needed to hold your vectors and their associated payload (metadata) data in memory for fast retrieval. This page documents every published limit and cost figure for Qdrant Cloud as of June 2026.

The most important number to understand before choosing a Qdrant Cloud tier is how much RAM your vector collection will require. A 768-dimensional float32 vector occupies approximately 3 KB of raw storage. One million such vectors require roughly 3 GB of RAM for the vector data alone, not counting payload storage or index overhead. At 1 billion vectors, the requirement is approximately 3 TB of RAM, a figure that moves the workload into Enterprise territory or, more practically, into quantization strategies that reduce the vector memory footprint by 4-32x. The vector DB cost calculator can model Qdrant Cloud cluster costs alongside Pinecone serverless costs for the same workload.

Qdrant Cloud's managed offering is one of three deployment models available for Qdrant: fully managed Cloud clusters, Hybrid Cloud (data on your infrastructure, control plane managed by Qdrant), and self-hosted (you manage everything). Each model makes different trade-offs on control, cost, and operational overhead. This page covers all three, with particular attention to the limits that affect production RAG systems. For comparisons against Pinecone and Weaviate, see the Pinecone vs Weaviate vs Qdrant comparison. For parallel Pinecone limit documentation, see Pinecone quota tiers. All numbers here are sourced from qdrant.tech/pricing and qdrant.tech/documentation/cloud/ and should be verified before procurement.

Qdrant Cloud Tier Comparison (June 2026)

| Feature / Dimension | Free | Starter | Standard | Hybrid Cloud | Enterprise / Private Cloud |
| --- | --- | --- | --- | --- | --- |
| RAM | 1 GB | ~2 GB (entry) | 2–64 GB+ on-demand | Customer infra (unlimited) | Custom / negotiated |
| vCPU | 0.5 | ~1 | Scales with cluster | Customer infra | Custom |
| Nodes | 1 | 1 | Multi-node available | Multi-node on customer infra | Multi-node, HA |
| Regions | Single region | Single region | Multiple regions available | Customer-defined | Customer-defined |
| Inactivity policy | Paused after 7 days | No inactivity pause | No inactivity pause | No inactivity pause | No inactivity pause |
| Support | Community only | Email support | Standard SLA + ticket | Dedicated support | Enterprise SLA + dedicated |
| SLA (uptime) | None | Best effort | Standard SLA | Negotiated | Custom SLA + remedies |
| Data location | Qdrant cloud | Qdrant cloud | Qdrant cloud | Customer cloud/on-prem | Customer infra (air-gapped) |
| RBAC | No | No | Basic API key | Full RBAC | Full RBAC |
| Entry pricing | Free | ~$25-35/mo | $25-100+/mo | From ~$499/mo | Custom contract |

Sources: qdrant.tech/pricing, qdrant.tech/documentation/cloud/. Prices in USD, approximate — Qdrant's configurator generates exact pricing based on RAM/CPU/node selections. All figures as of June 2026; verify before procurement as cloud pricing changes frequently.

Free Tier Mechanics: 1GB RAM, 0.5 vCPU, and the 7-Day Inactivity Pause

Qdrant Cloud's free tier provides a single-node cluster with 1 GB of RAM and 0.5 vCPU, deployed in a single region. This is a meaningful resource allocation for a vector database: substantially more than a toy experiment needs, but not enough for a production workload of any size. At 768 dimensions and float32 precision, the raw vector data alone for roughly 300,000–350,000 vectors fills 1 GB (1 GB / 3 KB per vector). Once HNSW index overhead and payload are counted, using the roughly 8.5 GB per million vectors estimated later on this page, the practical ceiling is closer to 100,000 unquantized vectors. That is sufficient for proof-of-concept work, tutorials, and small internal tools.

Like Pinecone's Starter tier, Qdrant Cloud's free cluster is paused after seven consecutive days of inactivity. A paused cluster does not respond to API requests until it is manually resumed through the Qdrant Cloud console. Your collection data, vectors, and payload fields are preserved — pausing is not deletion. The cluster typically resumes within 30-60 seconds of the resume action. Importantly, the definition of 'activity' in Qdrant Cloud's inactivity timer appears to include both read (search) and write (upsert/update) operations. A passive cluster that receives no API traffic will pause regardless of whether it was recently created.

The prevention strategy is the same as for Pinecone: a lightweight keep-alive cron job that issues a simple search request to your collection on a schedule shorter than seven days. A five-day interval provides comfortable margin. The cost is negligible: a single Qdrant search with limit=1 against a small collection completes in milliseconds. The free tier is appropriate for development environments, personal projects, and initial prototyping. Any application serving real users should be on a paid tier, not primarily because of the inactivity pause, but because the free tier has no SLA and its single-node, single-region architecture is unsuitable for reliability-sensitive workloads.
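A keep-alive of this kind can be sketched with the Python standard library alone. The cluster URL, collection name, and environment variable names below are placeholders rather than values from Qdrant's documentation, and the sketch assumes (as the text above does) that a search request counts as activity. It uses Qdrant's REST search endpoint with the `api-key` header.

```python
import json
import os
import random
import urllib.request

# Placeholder settings (assumptions): substitute your own cluster URL and collection.
CLUSTER_URL = os.environ.get("QDRANT_URL", "https://YOUR-CLUSTER.example:6333")
API_KEY = os.environ.get("QDRANT_API_KEY", "")
COLLECTION = "docs"   # hypothetical collection name
DIMENSIONS = 768      # must match the collection's vector size


def build_keepalive_request(url, collection, dims, api_key):
    """Build a limit=1 search request; the results themselves are irrelevant."""
    body = {
        "vector": [random.uniform(-1.0, 1.0) for _ in range(dims)],
        "limit": 1,
    }
    return urllib.request.Request(
        url=f"{url.rstrip('/')}/collections/{collection}/points/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


def send_keepalive(timeout=10.0):
    """Send the request and return the HTTP status code."""
    req = build_keepalive_request(CLUSTER_URL, COLLECTION, DIMENSIONS, API_KEY)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Calling `send_keepalive()` from a scheduled job (for example a cron entry such as `0 6 */5 * *`) keeps the gap between requests at five days or less.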


RAM-Based Capacity Planning: How Much Memory Does Your Collection Need?

Because Qdrant Cloud bills on cluster RAM rather than query volume or storage GB, the foundational planning question is: how much RAM will my collection require? The core calculation starts with vector dimensions and precision. A float32 vector stores each dimension as a 4-byte floating-point number. A 768-dimensional float32 vector therefore occupies 768 x 4 = 3,072 bytes, approximately 3 KB. One million such vectors require approximately 3 GB of RAM for raw vector data. However, the real memory footprint is higher once you add HNSW graph index overhead (typically 1.5-2x the raw vector size for dense HNSW indexes) and payload (metadata) storage.

A realistic memory estimate for one million 768-dimensional float32 vectors with typical RAG metadata (document ID, chunk index, short text snippet, categorical filters totaling a few hundred bytes per vector) is 8-12 GB of RAM including index overhead. This puts the workload at the upper edge of the Standard entry tier or into a Standard multi-GB configuration. For one billion vectors of the same type, the naive estimate is approximately 3 TB of RAM — a number that makes quantization not optional but necessary. Qdrant's documentation provides more precise memory formulas accounting for HNSW parameters (m and ef_construction) that affect the graph index size, and it is worth running those calculations against your specific embedding model dimensions before cluster sizing.
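The arithmetic above can be sketched as a small estimator. This is a rough model built only from this page's figures (4 bytes per dimension, index overhead of 1.5-2x raw vector size, a few hundred bytes of payload per vector); the function name, the 500-byte payload default, and the use of decimal gigabytes are illustrative assumptions, not Qdrant's official formula.

```python
from dataclasses import dataclass

BYTES_PER_FLOAT32 = 4
GB = 1_000_000_000  # decimal gigabytes, matching the page's round numbers


@dataclass
class RamEstimate:
    raw_gb: float      # raw float32 vector data
    payload_gb: float  # metadata stored alongside vectors
    index_gb: float    # HNSW graph overhead

    @property
    def total_gb(self) -> float:
        return self.raw_gb + self.payload_gb + self.index_gb


def estimate_ram(vector_count, dimensions, payload_bytes_per_vector=500,
                 index_overhead_factor=1.5):
    """Estimate RAM using this page's rule of thumb (an assumption, not a guarantee)."""
    raw = vector_count * dimensions * BYTES_PER_FLOAT32
    payload = vector_count * payload_bytes_per_vector
    index = raw * index_overhead_factor
    return RamEstimate(raw / GB, payload / GB, index / GB)


# One million 768-dimensional vectors at both ends of the overhead range.
for factor in (1.5, 2.0):
    est = estimate_ram(1_000_000, 768, index_overhead_factor=factor)
    print(f"overhead x{factor}: total {est.total_gb:.1f} GB, "
          f"with 25% headroom {est.total_gb * 1.25:.1f} GB")
```

Both ends of the overhead range land inside the 8-12 GB band quoted above before headroom is added. Treat the output as a starting point for sizing rather than a measured figure.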

Payload storage (Qdrant's term for metadata) does not have a hard per-vector byte limit in the same way Pinecone does, but it contributes to total RAM consumption and therefore to cluster sizing. Teams storing large payloads — full document text, HTML, structured JSON blobs — will find their RAM needs dominated by payload rather than vector data. The recommended architecture is the same as for Pinecone: store only identifiers and short filterable fields in Qdrant payload, and retrieve full content from an external document store. The build RAG with Pinecone tutorial covers this pattern in detail; the architecture applies equally to Qdrant.
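A minimal sketch of the lean-payload pattern follows. The in-memory dictionary stands in for whatever external document store you use, and the field names (`doc_id`, `chunk_index`, `source`) are hypothetical. The point dictionary follows the id/vector/payload shape that Qdrant's upsert endpoint accepts.

```python
import json

# Hypothetical external document store; in practice this could be object storage or a SQL table.
DOCUMENT_STORE = {
    "doc-42#3": "Full chunk text lives here, outside the vector database. " * 40,
}


def make_point(point_id, vector, doc_id, chunk_index, source):
    """Return a point with only identifiers and short filterable fields in its payload."""
    return {
        "id": point_id,
        "vector": vector,
        "payload": {"doc_id": doc_id, "chunk_index": chunk_index, "source": source},
    }


def fetch_text(payload):
    """Resolve the full text from the external store using identifiers in the payload."""
    return DOCUMENT_STORE[f"{payload['doc_id']}#{payload['chunk_index']}"]


point = make_point(1, [0.0] * 768, "doc-42", 3, "handbook")
lean_bytes = len(json.dumps(point["payload"]).encode("utf-8"))
full_bytes = len(fetch_text(point["payload"]).encode("utf-8"))
print(f"payload kept in Qdrant: {lean_bytes} bytes; text kept externally: {full_bytes} bytes")
```

After a search returns matching points, the application looks up the full text by identifier, so the cluster's RAM is spent on vectors and filter fields only.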


Quantization Tiers: Scalar, Product, and Binary — Memory Savings and Quality Trade-offs

Qdrant supports three quantization strategies that reduce the RAM footprint of stored vectors in exchange for a manageable reduction in recall quality. Scalar quantization (SQ, also called int8 quantization) reduces each float32 value (4 bytes) to an int8 value (1 byte), cutting vector storage memory to 25% of the unquantized size — a 4x reduction. Recall quality at int8 quantization is generally excellent, with most benchmarks showing less than 1-2% loss on standard datasets like BEIR when combined with Qdrant's rescoring mechanism (which re-ranks the approximate results using full-precision vectors for a small set of top candidates).
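As an illustration, the request bodies for enabling int8 scalar quantization and for searching with rescoring might look like the sketch below, written as plain Python dictionaries for Qdrant's REST API. The `quantile` and `oversampling` values are illustrative choices, not recommendations from this page, and the helper function names are hypothetical.

```python
import json


def collection_config(dimensions, distance="Cosine"):
    """Body for PUT /collections/{name}: float32 vectors plus int8 scalar quantization."""
    return {
        "vectors": {"size": dimensions, "distance": distance},
        "quantization_config": {
            "scalar": {
                "type": "int8",
                "quantile": 0.99,    # clip outliers before mapping to int8 (illustrative value)
                "always_ram": True,  # keep the quantized vectors in RAM
            }
        },
    }


def search_body(query_vector, limit=10, oversampling=2.0):
    """Body for POST /collections/{name}/points/search with rescoring enabled."""
    return {
        "vector": query_vector,
        "limit": limit,
        "params": {"quantization": {"rescore": True, "oversampling": oversampling}},
    }


print(json.dumps(collection_config(768), indent=2))
print(json.dumps(search_body([0.1] * 4, limit=5), indent=2))
```

Oversampling asks the quantized index for more candidates than `limit`, and rescoring then re-ranks them with the full-precision vectors. Measure recall on your own queries before settling on either value.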

Product quantization (PQ) segments each vector into subvectors and encodes each subvector as a centroid index. The compression ratio depends on the number of segments and centroid bits, but PQ can achieve 8-16x memory reduction. The trade-off is a larger recall degradation than scalar quantization — typically 5-15% depending on dataset characteristics, compression ratio, and whether rescoring is enabled. PQ is appropriate when you have reached the limits of scalar quantization and need further memory reduction, and when your application can tolerate slightly lower recall in exchange for a significantly cheaper cluster.

Binary quantization encodes each vector dimension as a single bit, achieving a 32x reduction from float32 (from 4 bytes to 0.125 bytes per dimension). This is the highest compression ratio available in Qdrant and comes with the most significant quality impact — recall losses of 10-20% or more on general text embedding models. However, binary quantization has shown surprisingly good results on specific high-dimensional embedding models, particularly those from Cohere and some OpenAI models, where the bit-level structure preserves semantic similarity more reliably. Qdrant's documentation recommends testing binary quantization against your specific embedding model and dataset before relying on general benchmark figures. As a rough guide: use scalar quantization by default, evaluate product quantization when a 4x memory reduction is insufficient, and treat binary quantization as an experimental option for specific high-dimensional models. The Pinecone vs Weaviate vs Qdrant comparison includes a section on quantization support across these three databases.
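The compression ratios for scalar and binary quantization follow directly from the bits used per dimension, as this short calculation shows. Product quantization is left out because its ratio depends on the segment and centroid settings. The figures cover the quantized copy only; the full-precision vectors used for rescoring still have to be stored somewhere.

```python
def bytes_per_vector(dimensions, bits_per_dimension):
    """Size of one stored vector at the given precision."""
    return dimensions * bits_per_dimension / 8


TIERS = {"float32": 32, "scalar int8": 8, "binary": 1}
DIMENSIONS = 768
VECTORS = 1_000_000

for name, bits in TIERS.items():
    per_vector = bytes_per_vector(DIMENSIONS, bits)
    total_gb = per_vector * VECTORS / 1e9   # decimal gigabytes
    ratio = 32 // bits                      # reduction relative to float32
    print(f"{name:12s} {per_vector:7.0f} B/vector  {total_gb:6.3f} GB per 1M vectors  {ratio}x")
```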


Hybrid Cloud Architecture: Your Data, Qdrant's Control Plane

Qdrant Hybrid Cloud is the deployment model that sits between fully managed cloud clusters and self-hosted Qdrant. In Hybrid Cloud, you run the Qdrant data plane (the actual vector database nodes) on your own infrastructure — AWS, GCP, Azure, or bare-metal Kubernetes — while Qdrant's cloud control plane handles cluster provisioning, monitoring, and management operations through a secure connection to your environment. This means your vector data never leaves your infrastructure, but you retain the managed service experience for operational tasks.

The Hybrid Cloud model is compelling for organizations with data residency requirements, compliance mandates, or existing cloud commitments that make running the data tier on Qdrant's shared infrastructure problematic. A healthcare company with PHI in their RAG corpus, a financial services firm with regulatory restrictions on off-premises data storage, or an enterprise with significant AWS reserved capacity they want to utilize — all of these organizations might find Hybrid Cloud more appropriate than standard managed cloud clusters. The entry price for Hybrid Cloud is approximately $499/month as of June 2026, though this figure should be verified with Qdrant's sales team as it reflects a minimum commitment rather than a per-resource price.

From the developer's perspective, Hybrid Cloud works identically to the standard managed offering. You create collections, run searches, and manage payloads through the same Qdrant Cloud UI, SDK, and REST API. The difference is that the Kubernetes workloads running those operations are in your VPC, not Qdrant's. This means your infrastructure team needs to maintain the Kubernetes environment where Qdrant runs: cluster health, node capacity, and network policies. Qdrant provides a Helm chart and deployment documentation for the data plane components. Teams that lack Kubernetes operational experience should weigh whether the compliance benefit of Hybrid Cloud outweighs the added operational complexity relative to the fully managed standard tiers.


Enterprise / Private Cloud: Air-Gapped, Full RBAC, and Custom SLA

Qdrant's Enterprise tier, also described in their documentation as Private Cloud, extends Hybrid Cloud with the option for fully air-gapped deployments. In a fully air-gapped configuration, there is no connection to Qdrant's control plane at all — the deployment is entirely customer-managed, including orchestration, upgrades, and monitoring. This is the appropriate model for government classified workloads, financial trading systems with strict network isolation requirements, and other environments where any external network dependency is unacceptable.

Enterprise includes full RBAC (role-based access control), which allows fine-grained permission assignment at the collection or API key level. This matters for multi-team organizations where different teams own different collections and should not have read or write access to each other's data. Standard and Starter tiers use API key-based access control without collection-level permission scoping. Enterprise SLA terms include uptime guarantees with financial remedies, dedicated customer success and support contacts, and custom contract terms including negotiated data processing agreements for GDPR compliance.

Enterprise pricing is entirely custom and negotiated. Qdrant does not publish Enterprise pricing, and third-party estimates vary widely. Organizations evaluating Enterprise should engage Qdrant's sales team with a clear picture of their vector count, query volume, deployment environment, and compliance requirements before requesting pricing — the contract structure depends heavily on these inputs. As a general observation, fully managed Enterprise vector database contracts across the industry (Pinecone, Qdrant, Weaviate) typically start in the low five figures per year for the smallest deployments and scale from there.


Self-Hosted Qdrant vs Qdrant Cloud: The Real Trade-offs

Because Qdrant is fully open source (Apache 2.0 license), self-hosting is a genuine option for teams with infrastructure capacity. The self-hosted path gives you unlimited collections, unlimited RAM (bounded only by your hardware), and no inactivity policies. For teams with existing Kubernetes infrastructure or cloud spend they want to direct toward their own resources rather than a managed service, self-hosted Qdrant on a properly sized EC2 instance or GKE node pool can be significantly cheaper than Qdrant Cloud at medium to large scale. The catch is operational overhead: you are responsible for upgrades, backups, monitoring, and availability.

The self-hosted versus managed trade-off converges on two questions: how much operational bandwidth does your team have, and what does an unplanned outage cost? Qdrant Cloud managed tiers abstract away cluster management, provide automated backups, handle version upgrades, and include monitoring. For early-stage teams where engineering time is the scarce resource, the managed service cost is almost always worth it, even if the raw compute cost is higher than self-hosting. For larger organizations with dedicated platform engineering teams, self-hosting on a Kubernetes cluster they already operate can reduce costs substantially at scale.

A concrete comparison at 16 GB RAM: a Qdrant Cloud Standard cluster at that size runs approximately $150-250/month. An AWS EC2 r6g.xlarge (32 GB RAM, 4 vCPU, so twice the memory of the managed cluster) runs approximately $130-160/month reserved, but this is compute cost only. Add storage (EBS gp3 for snapshots and payload persistence), networking (data transfer within and across AZs), monitoring (CloudWatch or a third-party APM), and the engineering hours for upgrades and incident response, and the self-hosted option's actual cost easily equals or exceeds the managed option at this scale. Self-hosting at 16 GB RAM saves money only if you already operate Kubernetes and have the capacity to absorb Qdrant maintenance without dedicated headcount. See the RAG architecture decision tree for guidance on when to choose self-hosted vs managed as part of a broader system design.

A staged approach that many teams use during growth: start on Qdrant Cloud free tier for development, graduate to Qdrant Cloud Standard for early production, and evaluate the self-hosted or Hybrid Cloud option when monthly Qdrant Cloud bills reach a threshold where the infrastructure investment pays off in under 12 months. This deferral strategy avoids premature infrastructure investment without locking you into the managed service indefinitely. The migration path from Qdrant Cloud to self-hosted is straightforward because the Qdrant API and data format are identical between the two, so there is no proprietary lock-in at the data layer.
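The 12-month payback test can be written down as a one-line calculation. The dollar amounts in the example are hypothetical inputs chosen for illustration, not quotes from Qdrant or any cloud provider, and the self-hosted monthly figure should include engineering time as well as infrastructure.

```python
def payback_months(upfront_cost, managed_monthly, self_hosted_monthly):
    """Months until the upfront migration cost is recovered.

    Returns None when self-hosting is not cheaper per month, since the
    investment then never pays off.
    """
    monthly_saving = managed_monthly - self_hosted_monthly
    if monthly_saving <= 0:
        return None
    return upfront_cost / monthly_saving


# Hypothetical figures: $5,000 of migration effort, $1,200/month managed bill,
# $700/month all-in self-hosted cost.
months = payback_months(5_000, 1_200, 700)
print(f"payback in {months:.1f} months; worth evaluating: {months is not None and months < 12}")
```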


Sourcing, Verification, and Live Data Caveats

All figures in this document are sourced from qdrant.tech/pricing and qdrant.tech/documentation/cloud/, last verified in June 2026. Qdrant Cloud's pricing is configured through an interactive cluster configurator that shows exact monthly costs based on RAM, CPU, and node count selections — the figures presented here for Starter and Standard tiers are illustrative ranges, not fixed prices, because the actual price depends on the exact cluster configuration you select.

The RAM-to-vector capacity estimates (3 KB per 768-dim float32 vector, 3 GB RAM per 1M vectors) are derived from Qdrant's official documentation and memory calculation guides. These are starting estimates, not guaranteed numbers — actual memory consumption depends on HNSW parameters, payload size, quantization settings, and Qdrant version. Teams making sizing decisions for production clusters should use Qdrant's memory calculator tool rather than relying solely on this document's estimates.

Qdrant's changelog at github.com/qdrant/qdrant and their blog at qdrant.tech/blog are the best sources for limit changes and pricing updates. The quantization options (scalar, product, binary) and their characteristics have been stable across recent Qdrant versions, but performance characteristics improve with each release. The Hybrid Cloud entry price of approximately $499/month was cited in Qdrant's sales materials as of June 2026 but is a minimum commitment figure subject to change. Enterprise pricing is not published, so this document deliberately gives no specific dollar figure for it.

A note on how Qdrant's limit documentation differs from Pinecone's: Pinecone publishes explicit per-plan limits as a structured table at docs.pinecone.io/reference/quotas-and-limits — number of indexes, namespaces per index, metadata byte cap, all enumerated. Qdrant's limits are primarily emergent from resource allocation rather than policy caps. There is no 'max collections per cluster' in a policy document because Qdrant doesn't enforce one — your practical limit is RAM. This makes Qdrant's documentation harder to survey but the limits more predictable in practice: you know exactly why you hit a constraint (RAM) and exactly how to resolve it (scale up or quantize). For a side-by-side documentation comparison with Pinecone's explicit limit table, see Pinecone quota tiers. For a broader comparison including Weaviate's limit model, see the Pinecone vs Weaviate vs Qdrant comparison.

Sizing a Qdrant Cloud Cluster for Your RAG Workload

  1. Estimate RAM from vector count and embedding dimensions

    Start with your expected vector count and embedding model. Calculate raw vector memory: vector_count x (dimensions x 4 bytes). Add estimated payload memory (metadata per vector x vector count). Add HNSW index overhead: multiply raw vector memory by 1.5 to 2x for a conservative estimate. Sum these three figures to get your base RAM requirement. Add 20-30% headroom for growth. For a worked example: 1 million vectors at 768 dimensions = 3 GB raw vectors + ~0.5 GB payload + ~5 GB HNSW overhead = ~8.5 GB total RAM before headroom. Select a Qdrant Cloud cluster with at least 10-12 GB RAM for this workload.

  2. Choose the right quantization level for your workload

    If your RAM estimate exceeds your target cluster tier's RAM, evaluate quantization before scaling to a larger cluster. Start with scalar quantization (int8): it reduces vector RAM by 4x with minimal recall loss and is appropriate as a default for most RAG workloads. Enable Qdrant's rescoring feature alongside scalar quantization to recover most of the recall loss. If scalar quantization is insufficient, evaluate product quantization with rescoring. Reserve binary quantization for high-dimensional models (1536+ dimensions) where you have validated it preserves recall quality for your specific embedding model. Always measure recall on a representative sample of your actual queries before deploying quantization changes to production.

  3. Pick the right deployment model: Cloud, Hybrid Cloud, or self-hosted

    Choose Qdrant Cloud managed for most workloads: zero operational overhead, simple pricing, no Kubernetes expertise required. Choose Hybrid Cloud if you have data residency requirements (PHI, financial data, GDPR restrictions on cross-border transfers), an existing cloud commitment you want to leverage, or a compliance audit that requires data to stay within your infrastructure. Choose self-hosted if you have platform engineering capacity, are at a scale where managed costs are materially higher than infrastructure costs, and can absorb the operational responsibility for upgrades and availability. Do not choose self-hosted to avoid Qdrant Cloud costs during early development: the time cost of cluster management typically exceeds the money saved until you are at substantial scale.

  4. Configure inactivity prevention on the free tier

    If you are using Qdrant Cloud's free tier for development, set up a keep-alive mechanism before the first week of usage. Create a scheduled task (cron, GitHub Actions, or a cloud function) that runs every five days and issues a simple search request to your collection. A search with limit=1 and a random query vector is sufficient — the goal is to register API activity, not to retrieve meaningful results. Store the keep-alive script in your repository alongside your application code so it is not forgotten when teammates inherit the project. Document the free tier's inactivity policy in your project README so the team understands why the keep-alive exists and does not delete it.

  5. Migrate from self-hosted Qdrant to Qdrant Cloud

    Migrating from a self-hosted Qdrant instance to Qdrant Cloud is straightforward because the API and data format are identical. The recommended approach: (1) create a Qdrant Cloud cluster with matching configuration (same dimensions, distance metric, and HNSW parameters as your self-hosted collection); (2) snapshot your self-hosted collection using Qdrant's snapshot API; (3) upload the snapshot to Qdrant Cloud using the snapshot restore endpoint; (4) validate collection counts and run sample queries against both instances to confirm consistency; (5) update your application connection strings; (6) decommission the self-hosted instance after a validation window. For collections over a few million vectors, the snapshot transfer can take significant time depending on network bandwidth between your self-hosted environment and Qdrant Cloud's ingest endpoint.
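The snapshot-based migration in the last step can be outlined as a sequence of REST calls. The sketch below only builds the plan and a count comparison, without sending anything. The host names are placeholders, and `snapshot_name` stands for the name returned by the snapshot-creation call. The endpoint paths are those of Qdrant's snapshot and count APIs; confirm them against the API reference for your Qdrant version.

```python
def migration_steps(source_url, target_url, collection, snapshot_name):
    """Return (method, url, body) tuples describing the migration, in order."""
    src = source_url.rstrip("/")
    dst = target_url.rstrip("/")
    base_src = f"{src}/collections/{collection}"
    base_dst = f"{dst}/collections/{collection}"
    return [
        # 1. Create a snapshot on the self-hosted instance; the response names the file.
        ("POST", f"{base_src}/snapshots", None),
        # 2. Download that snapshot file.
        ("GET", f"{base_src}/snapshots/{snapshot_name}", None),
        # 3. Upload it to the cloud cluster (multipart form upload of the file).
        ("POST", f"{base_dst}/snapshots/upload", "<snapshot file>"),
        # 4. Count points on both sides for validation.
        ("POST", f"{base_src}/points/count", {"exact": True}),
        ("POST", f"{base_dst}/points/count", {"exact": True}),
    ]


def counts_match(source_response, target_response):
    """Compare the parsed JSON responses of the two count calls."""
    return source_response["result"]["count"] == target_response["result"]["count"]


for method, url, _body in migration_steps(
    "http://localhost:6333", "https://YOUR-CLUSTER.example:6333", "docs", "SNAPSHOT_NAME"
):
    print(method, url)
```

Matching counts are a necessary check, not a sufficient one: the sample-query comparison described in the step still applies before switching connection strings.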

Frequently Asked Questions

How many collections can I create in a Qdrant Cloud cluster?

Qdrant does not enforce a hard limit on the number of collections per cluster. The practical limit is determined by available RAM: each collection carries memory overhead for its HNSW index and payload storage, independent of how many vectors it contains. A 4 GB cluster that already holds one large collection has little margin for additional collections, while the same cluster can host many small ones. For development and testing, many teams create dozens of collections on a single cluster without issues. For production, the recommended practice is one collection per logical workload, partitioned internally by a payload field (for example a tenant or group ID) used as a filter, since Qdrant has no namespace construct like Pinecone's.

What is the maximum payload (metadata) size per vector in Qdrant?

Qdrant does not publish a hard per-vector payload size limit equivalent to Pinecone's 40KB cap. Payload fields are stored alongside vectors and contribute to total RAM consumption. Practically, very large payloads (multi-kilobyte text blobs per vector) will inflate your RAM requirements substantially and are discouraged for the same reason as in other vector databases: store identifiers in the vector database and retrieve full content from a separate document store. Qdrant's payload supports arbitrary JSON fields including nested objects and arrays, but overly large payloads will degrade performance as they increase memory pressure and serialization overhead.

What happens if my Qdrant Cloud free cluster is paused?

A paused free cluster returns errors for all API requests. Your data is not deleted. Resume the cluster from the Qdrant Cloud console — the Resume button appears on the cluster detail page when the cluster is in a paused state. Resume typically takes under a minute. After resuming, the inactivity timer resets to zero, and the cluster will pause again after another seven days of no activity. If you need your cluster always available, upgrade to any paid tier — inactivity pausing applies only to the free tier.

How does scalar quantization affect recall quality in Qdrant?

Scalar quantization compresses each float32 value (4 bytes) to int8 (1 byte), reducing vector memory by 4x. Without rescoring, recall@10 typically drops by 2-5% depending on the embedding model and dataset. With Qdrant's rescoring feature enabled — which re-ranks approximate quantized results using full-precision vectors for the top candidates — recall quality is usually within 1% of unquantized retrieval. Rescoring adds a small amount of latency but is almost always worth enabling. Qdrant's documentation includes benchmark figures for scalar quantization with and without rescoring on several standard datasets.

How is Hybrid Cloud different from Enterprise Private Cloud?

In Hybrid Cloud, Qdrant's control plane connects to your infrastructure to manage the data plane. There is a network connection between Qdrant's management systems and your cluster, which means Hybrid Cloud does not satisfy truly air-gapped requirements. In Enterprise Private Cloud (fully air-gapped), there is no connection to Qdrant's control plane, so you manage cluster operations entirely through internal tooling. Private Cloud requires more operational investment because Qdrant's managed service cannot reach your environment to assist with upgrades or issue diagnosis. The choice depends on your security requirements: Hybrid Cloud satisfies data residency, while full network isolation requires Private Cloud.

Is Qdrant's RAM-based pricing cheaper than Pinecone's serverless pricing?

It depends strongly on your query volume and vector count. Qdrant Cloud charges for allocated RAM regardless of query volume — a 4 GB cluster costs roughly the same whether you run 100 queries or 1 million queries per month. Pinecone serverless charges per read unit and per write unit, which is cheaper at low query volumes but higher at high query volumes. For a workload with 1 million vectors and 10,000 daily queries, Pinecone serverless read unit costs are very modest, making Pinecone potentially cheaper. For a workload with 1 million vectors and 500,000 daily queries, Pinecone's read unit costs may exceed a comparable Qdrant RAM allocation. Use the vector DB cost calculator to model your specific scenario.
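The crossover described here can be modeled with a flat monthly price on one side and a per-query price on the other. Every price in the example is a hypothetical placeholder chosen to show the shape of the comparison, not a published Qdrant or Pinecone rate. Replace them with figures from each vendor's pricing page.

```python
def monthly_costs(daily_queries, flat_cluster_monthly, usage_cost_per_1k_queries,
                  usage_fixed_monthly=0.0, days=30):
    """Return (flat_cost, usage_based_cost) for one month."""
    usage = usage_fixed_monthly + daily_queries * days * usage_cost_per_1k_queries / 1000
    return flat_cluster_monthly, usage


def break_even_daily_queries(flat_cluster_monthly, usage_cost_per_1k_queries,
                             usage_fixed_monthly=0.0, days=30):
    """Daily query volume above which the flat-priced cluster becomes cheaper."""
    variable_budget = flat_cluster_monthly - usage_fixed_monthly
    if variable_budget <= 0:
        return 0.0
    return variable_budget * 1000 / (usage_cost_per_1k_queries * days)


# Hypothetical prices: $100/month cluster vs $10/month storage plus $0.01 per 1,000 queries.
for daily in (10_000, 500_000):
    flat, usage = monthly_costs(daily, 100.0, 0.01, usage_fixed_monthly=10.0)
    cheaper = "usage-based" if usage < flat else "flat cluster"
    print(f"{daily:>7} queries/day: flat ${flat:.0f}, usage-based ${usage:.0f} -> {cheaper}")
```

With these placeholder prices the break-even sits between the two example volumes, which mirrors the qualitative pattern in the answer above; real prices will move the crossover point.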

Can I resize a Qdrant Cloud cluster without data loss?

Qdrant Cloud Standard tier clusters support on-demand resizing. You can increase RAM and CPU allocation through the cluster configuration UI without recreating the cluster or re-upserting data. Downsize operations that would reduce RAM below the cluster's current usage are rejected in order to protect data integrity. The free tier cluster cannot be resized: it is a fixed 1 GB RAM, 0.5 vCPU configuration. To increase capacity from the free tier, you upgrade to a paid plan and select a new cluster configuration, which requires creating a new cluster and migrating your data.

Estimate Your Qdrant Cloud Cluster Size Before You Deploy

Use the vector DB cost calculator to compare Qdrant Cloud RAM-based pricing against Pinecone serverless read/write unit pricing for your actual workload. Enter your vector count, dimensions, and expected daily query volume to get a side-by-side monthly cost estimate.
