ScrapingTest

Apify

Marketplace breadth wins. Amazon at an 80 percent success rate (SR) and a $7.04/1K blended cost are the trade-offs.

Published 2026-06-05 · 2627-word independent review · ScrapingTest Research

────────────────────────────────────────────────────────────────────────────────────────────────────

Verdict

Grade: B+. Excellent breadth and AI-agent fit. Amazon weakness, high latency, and compute-unit unpredictability hold back an A.

Best for

  • AI agent and RAG pipelines (hosted MCP at mcp.apify.com, RAG Web Browser, Website Content Crawler with LLM-friendly markdown)
  • Long-tail and niche targets where a pre-built community Actor already exists in the 37,223-Actor marketplace
  • Teams that want to deploy custom scrapers on managed infrastructure via the open-source Crawlee SDK
  • Workflows that need scheduling, datasets, key-value storage, and webhooks bundled with the scraper
  • Buyers willing to trade per-request predictability for per-Actor flexibility (pay-per-event or pay-per-result)

Avoid if

  • You need a single flat per-request URL endpoint with deterministic cost. Compute-unit pricing is hard to forecast.
  • Your workload is Amazon-heavy at sustained load. We measured 80 percent SR on amazon.com with 14.5s avg and 21.8s p90 response times.
  • You need low-latency scraping (sub-3s p50). Apify's blended avg response is 19.5s across our 20 domains.
  • You want pay-only-for-success billing. On most Actors, Apify charges compute units regardless of HTTP outcome.

What we found in the lab

Apify finished #4 overall in our 5-pass audit with a 95.25 percent success rate averaged across 20 domains (97.14 percent under the original audit's calculation). Dedicated marketplace Actors gave it structured-data coverage on every one of the 20 targets. The platform returned data on all 20 domains, with no genuine (unrecoverable) failures recorded after lock-in. Only a handful of other providers (Scrape.do, Bright Data, and a couple of infrastructure-only players) achieved zero unrecoverable failures. The reason is structural. Apify's 37,223-Actor marketplace (per the apify.com/store header at the time of writing) means almost every benchmark domain has at least one pre-built scraper, often more than one, and authors are incentivized to maintain them because they earn rental fees.

The numbers degrade when you go domain by domain. Six of 20 targets fell below 100 percent SR: amazon.com (80 percent, 16/20 trials), g2.com (75 percent), walmart.com (80 percent), trustpilot.com (85 percent), youtube.com (95 percent), and idealista.com (90 percent). Apify's average response time across all 20 domains was 19.5 seconds, the highest of any top-5 provider in the audit. Long-tail Actor runs drove that number: booking.com (52.4s avg, 78.6s p90), tripadvisor.com (54.4s avg, 81.5s p90), and walmart.com (56.0s avg, 84.0s p90). Cost per thousand requests varied wildly, from $0.47/1K on trustpilot.com to $44/1K on reddit.com. The 20-domain blended average was $7.04/1K, roughly 6 to 30 times what flat-rate competitors charge.
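The 95.25 percent headline is an unweighted mean of per-domain SR, so it can be reproduced from the six sub-100 domains above, assuming (as that list implies) that the other 14 domains scored 100 percent. A quick Python check:

```python
# Per-domain success rates (percent) for the six domains that fell below 100%.
# Assumption: the remaining 14 benchmark domains all scored 100%.
below_100 = {
    "amazon.com": 80.0,
    "g2.com": 75.0,
    "walmart.com": 80.0,
    "trustpilot.com": 85.0,
    "youtube.com": 95.0,
    "idealista.com": 90.0,
}
TOTAL_DOMAINS = 20
perfect_domains = TOTAL_DOMAINS - len(below_100)  # 14 domains at 100%


def blended_success_rate(sub_100: dict, n_perfect: int) -> float:
    """Unweighted mean of per-domain success rates: every domain counts once."""
    rates = list(sub_100.values()) + [100.0] * n_perfect
    return sum(rates) / len(rates)


print(f"{blended_success_rate(below_100, perfect_domains):.2f}%")  # 95.25%
```

Because each domain counts once regardless of traffic, a workload concentrated on the weak domains (Amazon, Walmart, G2) will see a lower effective SR than this blended figure.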

Apify wins on coverage and AI-agent ergonomics. It loses on latency and cost predictability. If you're building a price-monitoring pipeline that needs sub-5s response, you're paying a 4 to 10 times latency tax for the marketplace breadth. If you're feeding an LLM agent that just needs structured JSON back from a niche site, Apify is often the only place that has a ready-made Actor.

Per-domain breakdown

Domain | Success rate | Notes
--- | --- | ---
amazon.com | 80% (16/20) | Weakest of the e-commerce set. 14.5s avg, 21.8s p90: slow even when it works. $5/1K via the marketplace Amazon Actor. Apify did not make the top 5 of our 500-trial Amazon stress test.
g2.com | 75% (15/20) | The DataDome wall hits Apify Actors on G2 the same way it hits flat-API providers. $13.75/1K is Apify's second-highest per-domain cost. Community Actors charge a premium for hard targets.
reddit.com | 100% (20/20) | Works perfectly, but at $44/1K it is the most expensive single domain we measured for Apify. Reddit Actor authors charge a premium because the target's API restrictions push them to browser rendering.
trustpilot.com | 85% (17/20) | Anti-bot tightening. $0.47/1K is Apify's cheapest domain. That combination (cheap and flaky) suggests an HTTP-only Actor that struggles when Trustpilot escalates.
walmart.com | 80% (16/20) | 56.0s avg, 84.0s p90: the longest response times we measured for Apify, narrowly ahead of tripadvisor.com (54.4s avg). Apify's Walmart Actor is doing a full browser pass to clear the bot challenge.
indeed.com | 100% (20/20) | Works at $13.25/1K, the third-most expensive domain. 20.8s avg response: long but reliable. Jobs Actors are well maintained on Apify.
instagram.com | 100% (20/20) | $2.60/1K, 6.6s avg. One of Apify's strongest wins versus competitors that block Instagram entirely (ScraperAPI, Firecrawl). A real differentiator.
linkedin.com | 100% (20/20) | $4/1K, 8.9s avg. Zyte returns 451 on LinkedIn (account policy); Firecrawl returns 403. Apify Actors handle it cleanly. Another marketplace-breadth win.

Amazon 500-trial stress test

Apify did not qualify for our 500-trial Amazon stress test. The stress benchmark targeted only the five providers that hit 100 percent SR on amazon.com in the original 30-trial audit (Scrape.do, ScrapingBee, ScrapingDog, Decodo, Zyte). Apify scored 80 percent (16/20 trials) on amazon.com in the audit, which dropped it out of the top-5 stress tier. The audit numbers tell the story: a 14,542ms average response time and a 21,813ms p90 mean that even on successful runs, Apify is roughly 4 to 7 times slower than Scrape.do (3.5s p50) or Decodo (3.1s p50), and you're paying $5/1K versus their $0.11 to $0.50/1K. Per the amazon-stress REPORT.md, Amazon's new defense (the Akamai bot-manager `bm-ver` interstitial, a 2.2KB shell page with a meta-refresh) hit Zyte's and ScrapingDog's search endpoints. Apify Actors that egress through datacenter ASNs are equally exposed. The residential-backed marketplace Actors evade it but cost considerably more per result. Amazon review pages (`/product-reviews/B07FZ8S74R`) are now login-walled across all providers, including Apify, per the same report. For Amazon-heavy workloads, our data says go elsewhere.
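If the 2.2KB shell comes back as an ordinary response, a status-code check alone may count it as a success, so it is worth validating bodies before tallying SR on Amazon through any provider. The sketch below is a hypothetical heuristic, not an Akamai or Apify signature: the 4 KB size threshold and the assumption that the `bm-ver` marker shows up in the page body are ours.

```python
import re

# Assumption: the interstitial we logged was ~2.2 KB, so bodies under 4 KB
# are treated as candidate shell pages. Tune this against your own captures.
MAX_SHELL_BYTES = 4096

# Matches <meta http-equiv="refresh" ...> in any case, with or without quotes.
META_REFRESH = re.compile(r"""<meta[^>]+http-equiv\s*=\s*["']?refresh""", re.IGNORECASE)


def looks_like_bm_interstitial(body: bytes) -> bool:
    """Hypothetical heuristic: flag small pages that carry a meta-refresh
    or mention the `bm-ver` marker instead of real product content.
    This is not an official Akamai or Apify signature."""
    if len(body) > MAX_SHELL_BYTES:
        return False  # assumed: real product pages are far larger than the shell
    text = body.decode("utf-8", errors="replace")
    return bool(META_REFRESH.search(text)) or "bm-ver" in text


# Usage: count a response as a success only if it is not the shell page.
shell = b'<html><head><meta http-equiv="refresh" content="5"></head><body></body></html>'
product = b"<html><body>" + b"<div>listing</div>" * 2000 + b"</body></html>"
print(looks_like_bm_interstitial(shell))    # True
print(looks_like_bm_interstitial(product))  # False
```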

Pricing deep dive

Apify's pricing has three stacked layers that compound. First, the platform subscription tier (Free $0, Starter $29/mo, Scale $199/mo, Business $999/mo) sets your monthly platform credit allowance, RAM ceiling, and concurrency limit. Second, compute-unit consumption is metered as memory times runtime times CPU per Actor run, billed at $0.20/CU on Free/Starter, $0.16/CU on Scale, and $0.13/CU on Business. Third, per-Actor rental or pay-per-result fees are set independently by each marketplace author (typically $20/mo rental or $8 per 1,000 results). On top of that you pay separately for proxies (residential starts at $8/GB on Free/Starter and steps down to $7.50/GB on Scale and $7/GB on Business, a roughly 12.5 percent volume discount at the top tier; SERPs cost $2.50/1K, dropping to $1.70/1K on Business; datacenter IPs cost $0.60 to $1.00 each after the included pool), storage ($1.00 per 1,000 GB-hours on Free/Starter), and external data transfer ($0.20/GB). The headline 'starts at $29/month' leaves a lot out: $29 buys $29 of platform credits and a 32 GB / 32-concurrent-run ceiling. Anyone running a serious scraping workload will burn through that before a third of the billing month has passed. Earlier ScrapingTest content listed Starter at $39/mo; the live pricing page showed $29/mo at the time of writing. We flag the divergence for the record.

Plans

Plan | Price | Included credits | Concurrency | What unlocks
--- | --- | --- | --- | ---
Free | $0 | $5/mo platform credits | 25 concurrent runs | 8 GB RAM max, community support, 5 datacenter proxy IPs included
Starter | $29/mo (live pricing page; earlier ScrapingTest content listed $39) | $29 in platform credits + pay-as-you-go beyond | 32 concurrent runs | 32 GB RAM max, $0.20/CU compute, chat support, 30 datacenter IPs included then $1/IP, residential $8/GB
Scale | $199/mo | $199 in platform credits + PAYG | 128 concurrent runs | 128 GB RAM max, $0.16/CU compute (-20%), priority chat support, 200 datacenter IPs then $0.80/IP, residential $7.50/GB, SERPs at $2/1K
Business | $999/mo | $999 in platform credits + PAYG | 256 concurrent runs | 256 GB RAM max, $0.13/CU compute (-35%), dedicated account manager, 500 datacenter IPs then $0.60/IP, residential $7/GB, SERPs $1.70/1K
Enterprise | Custom | Custom credit allowance | Custom (>256) | Custom SLA, volume compute discount, dedicated infrastructure available on request

Cost multipliers

Apify's effective cost per request stacks five meters simultaneously:

  • Compute unit: memory_GB times runtime_hours times cpu_share, billed at $0.20/CU (Free/Starter), $0.16/CU (Scale), $0.13/CU (Business). 1 CU equals 1 GB RAM times 1 hour.
  • Residential proxy: starts at $8/GB on Free/Starter, drops to $7.50/GB on Scale and $7/GB on Business, a roughly 12.5 percent volume discount at the top tier. That's smaller than competitors like Bright Data offer at equivalent commitments.
  • Datacenter proxy: an included pool, then $1/IP (Starter), $0.80 (Scale), or $0.60 (Business) per additional IP.
  • SERPs proxy: $2.50/1K (Free/Starter), $2.00/1K (Scale), $1.70/1K (Business).
  • Third-party Actor fees: set independently per Actor by community authors. Typical rental is $20/mo flat; typical pay-per-result is $8 per 1,000 results. Outliers range from free (Apify-built native Actors) to $50+/mo for premium niche Actors.

Add-ons stack on top: concurrent runs cost $5/run beyond the plan ceiling, extra Actor RAM costs $1/GB, the priority support add-on is $100, and personal training is $150/hour. Storage charges layer separately at $1.00 per 1,000 GB-hours on Free/Starter (less on higher tiers), with external data transfer at $0.20/GB.
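To see how the meters combine for a single run, here is a rough Python estimator using the tier rates listed above. It assumes the "1 CU equals 1 GB RAM times 1 hour" definition (it omits the cpu_share factor), and it ignores storage, data transfer, flat rentals, and add-ons. `RunProfile`, `run_cost`, and `cost_per_1k_results` are illustrative names, not part of any Apify SDK.

```python
from dataclasses import dataclass

# Tier rates as quoted in this review (USD).
CU_RATE = {"free": 0.20, "starter": 0.20, "scale": 0.16, "business": 0.13}
RESIDENTIAL_PER_GB = {"free": 8.00, "starter": 8.00, "scale": 7.50, "business": 7.00}


@dataclass
class RunProfile:
    memory_gb: float                  # RAM allocated to the Actor run
    runtime_hours: float              # how long the run lasted
    results: int                      # items written to the dataset
    residential_gb: float = 0.0       # residential proxy traffic consumed
    per_1k_results_fee: float = 0.0   # author's pay-per-result price, USD per 1,000


def run_cost(p: RunProfile, tier: str) -> float:
    """Rough USD cost of one run: compute + residential proxy + Actor fee.
    Uses 1 CU = 1 GB RAM x 1 hour; ignores storage, transfer, rentals, add-ons."""
    compute_units = p.memory_gb * p.runtime_hours
    compute = compute_units * CU_RATE[tier]
    proxy = p.residential_gb * RESIDENTIAL_PER_GB[tier]
    actor_fee = p.results / 1000 * p.per_1k_results_fee
    return compute + proxy + actor_fee


def cost_per_1k_results(p: RunProfile, tier: str) -> float:
    """Normalize a run's cost to the $/1K figure used throughout this review."""
    return run_cost(p, tier) / p.results * 1000


# Hypothetical run: 4 GB for 30 minutes (2 CU), 0.25 GB residential, 1,000 results at $8/1K.
example = RunProfile(memory_gb=4, runtime_hours=0.5, results=1000,
                     residential_gb=0.25, per_1k_results_fee=8.0)
for tier in ("starter", "business"):
    print(tier, round(run_cost(example, tier), 2))
```

In this hypothetical run the totals come to about $10.40 on Starter and $10.01 on Business: the $8/1K Actor fee dominates, so the tier discounts on compute and proxy barely move the bill.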

Hidden costs (not on the pricing card)

Effective cost per workload

Amazon product monitoring, 100K requests/month
Real cost: ~$500/mo (100K times $5/1K from our audit) + Starter $29 base = $529/mo headline. Reality is higher because 20 percent of trials fail and consume CUs without returning data, so cost-per-successful-result is closer to $6.25/1K = $625/mo
Why: Our audit measured Apify Amazon at $5/1K and 80 percent SR. Compute consumed on the 20 percent failed trials still bills, inflating effective cost by about 25 percent. Going to a residential-backed Actor would push proxy cost up significantly.
Reddit thread scraping, 50K requests/month
Real cost: ~$2,200/mo on Apify (50K times $44/1K) versus ~$25/mo on a flat-rate provider that supports Reddit
Why: Reddit is the worst-case domain for Apify in our data. $44/1K is the headline number. The Reddit Actor is browser-rendering each thread, which compounds memory times runtime to roughly 20 CUs per 100 pages. For Reddit-heavy workloads, Apify is the wrong tool.
Long-tail niche site scraping, 50 distinct domains, 2K pages each = 100K req/mo
Real cost: ~$700/mo blended (~$7/1K average) + Starter $29 = ~$729/mo
Why: This is Apify's sweet spot. The 20-domain blended average of $7.04/1K applies. No competitor can cover 50 distinct niche sites without custom code, so the per-request premium buys real productivity. Cheaper than building 50 custom scrapers in-house.
RAG pipeline ingesting 200K pages/month via Website Content Crawler
Real cost: ~$40 to $200/mo just for the crawler on the raw-HTTP tier ($0.20 to $1.00/1K pages), rising to roughly $100 to $1,000/mo if every page needs the headless-browser tier ($0.50 to $5/1K per Apify docs), plus the plan base
Why: Apify ships this Actor specifically for LLM ingestion. Markdown output, JSON-LD, semantic structure preserved. At raw-HTTP $0.20/1K pages it's price-competitive with Firecrawl ($1/1K) and cheaper than Bright Data SERPs. Real cost lives in the headless-browser tier when pages need JS.
AI agent making 10K MCP tool calls/month via mcp.apify.com
Real cost: Pricing not specified on the MCP docs page. Billed against your platform credit allowance, so effective cost depends entirely on which Actors the agent ends up calling.
Why: Per docs.apify.com/platform/integrations/mcp: 'Monitor your API usage through Apify Console to stay within your plan limits.' The MCP server itself is free. The underlying Actor runs consume CUs and any per-result fees. Agent costs are hard to forecast. Apify's transparency here is weaker than competitors who quote flat per-call rates.
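The dollar estimates in these workloads share one formula. A minimal Python sketch, assuming (as the Amazon example does) that failed attempts bill at the same per-1K price as successful ones, and using the Website Content Crawler price bands quoted above; the function names are illustrative.

```python
def cost_per_successful_1k(list_price_per_1k: float, success_rate: float) -> float:
    """If failed attempts bill like successful ones, each success costs price / SR."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return list_price_per_1k / success_rate


def monthly_cost(successes_needed: int, list_price_per_1k: float,
                 success_rate: float = 1.0, plan_base: float = 0.0) -> float:
    """Monthly spend to obtain `successes_needed` results, plus an optional plan base."""
    per_1k = cost_per_successful_1k(list_price_per_1k, success_rate)
    return successes_needed / 1000 * per_1k + plan_base


# Website Content Crawler price bands quoted in this review (USD per 1,000 pages).
CRAWLER_TIERS = {"raw_http": (0.20, 1.00), "headless_browser": (0.50, 5.00)}


def crawler_monthly_range(pages: int, tier: str) -> tuple:
    """Low/high monthly crawler cost for a given page volume and rendering tier."""
    low, high = CRAWLER_TIERS[tier]
    return pages / 1000 * low, pages / 1000 * high


print(cost_per_successful_1k(5.0, 0.80))           # 6.25   (Amazon, $5/1K at 80% SR)
print(monthly_cost(100_000, 5.0, 0.80))            # 625.0
print(monthly_cost(50_000, 44.0))                  # 2200.0 (Reddit)
print(monthly_cost(100_000, 7.0, plan_base=29.0))  # 729.0  (long-tail + Starter)
print(crawler_monthly_range(200_000, "raw_http"))
print(crawler_monthly_range(200_000, "headless_browser"))
```

Note that the Amazon figure of $625/mo excludes the plan base; passing `plan_base=29.0` for Starter brings it to about $654/mo.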

Features deep dive

Core features

Actor Marketplace (37,223 Actors)

Containerized serverless programs running on Apify's cloud. The store at apify.com/store currently shows 'Browse 37,223 Actors' in its header (up from the ~22,000 cited in older content). Categories span social media, AI, agents, lead generation, e-commerce, and SEO tools. Both Apify-built native Actors and third-party community Actors. Source: apify.com/store

Our take: Breadth is the product. Our audit measured Apify at 20/20 dedicated-endpoint coverage on the benchmark. Only Bright Data matched it. Where flat-API providers fall back to a generic /scrape endpoint, Apify usually has a domain-specific Actor with already-encoded selectors. Quality is uneven (Actor authors set their own maintenance cadence), but the floor is higher than starting from scratch.
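In practice, calling a marketplace Actor can be a single HTTP request. The sketch below targets Apify API v2's synchronous run-and-fetch endpoint (`run-sync-get-dataset-items`) as we understand it from the API reference; verify the path, auth header, and timeout limits there before relying on it. The `startUrls` input field is an assumption about the Website Content Crawler's input schema (each Actor defines its own), and the helper names are ours. Only the standard library is used.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_sync_request(actor: str, run_input: dict, token: str) -> urllib.request.Request:
    """Prepare a POST that runs an Actor and returns its dataset items in one call.
    `actor` is written "owner/name"; API paths use "owner~name" instead."""
    actor_path = actor.replace("/", "~")
    url = f"{API_BASE}/acts/{actor_path}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def run_actor(actor: str, run_input: dict, token: str, timeout: float = 300.0) -> list:
    """Send the request and decode the JSON array of dataset items (live network call)."""
    req = build_run_sync_request(actor, run_input, token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Building the request is offline; only run_actor() touches the network.
req = build_run_sync_request(
    "apify/website-content-crawler",
    {"startUrls": [{"url": "https://example.com"}]},  # assumed input field
    token="YOUR_APIFY_TOKEN",
)
print(req.get_method(), req.full_url)
```

Calling `run_actor(...)` with a real token makes the live run; compute units and any per-result fees bill exactly as they would for a run started from the Console.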

Crawlee SDK (open source)

JavaScript and Python crawler framework. 'Build reliable web scrapers. Fast.' (crawlee.dev tagline). Provides HTTP-based crawling, Cheerio DOM parsing, Puppeteer/Playwright browser automation, JSDOM/LinkeDOM, plus an AI-powered StagehandCrawler. Includes proxy management, session handling, request queueing, and data storage. Deployable on AWS Lambda, GCP Cloud Run, and the Apify Platform. Source: crawlee.dev

Our take: Crawlee is useful even if you never touch Apify's cloud. It's the de facto modern Node.js scraping framework, replacing aged libraries like Apify SDK v2 and Scrapy-for-JS. Free, Apache-2.0-licensed, no lock-in. A team can build a custom scraper in Crawlee, deploy it on its own infrastructure, and push to Apify's cloud only when it needs the integrations.

Hosted MCP Server (mcp.apify.com)

Apify's Model Context Protocol server exposes Actor discovery and invocation as tools to AI agents. Supports Claude Desktop, Cursor, VS Code Copilot agent mode, local stdio via Node.js. Two auth modes: OAuth browser flow and bearer token. SSE transport deprecation announced for April 1, 2026. Source: docs.apify.com/platform/integrations/mcp

Our take: Apify is one of the few scraping platforms that ships a real hosted MCP server. Decodo has an open-source one on GitHub. Scrape.do, Bright Data, and most others have nothing. For agentic workflows where the LLM picks which Actor to call based on the user query, this is the path of least resistance. Pricing is the weak spot: the MCP docs don't quote per-call rates, only 'monitor your usage'.

Storage primitives (datasets, key-value, request queue)

Three storage types: Datasets (append-only structured records, browsable and exportable to CSV/JSON/XLSX), Key-Value Stores (arbitrary binary blobs), and Request Queue (URL queue with deduplication). Storage charged at $1.00 per 1,000 GB-hours on Free/Starter, less on higher tiers. Source: docs.apify.com/platform/storage

Our take: More plumbing than a flat-API competitor gives you. The dataset is where Actor output lands by default and where downstream consumers (webhooks, integrations, SDKs) read from. Useful if you want to decouple scrape-time from consume-time. Wasteful if you just want a synchronous request/response.
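The Request Queue's deduplication is what lets a crawler enqueue every link it discovers without refetching pages. A toy in-memory version shows the idea. The normalization rules (lowercase scheme and host, drop the fragment) are our assumption, not Apify's documented behavior, and `DedupRequestQueue` is a hypothetical name.

```python
from collections import deque
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def unique_key(url: str) -> str:
    """Assumed normalization: lowercase scheme and host, default empty path to '/',
    drop the #fragment (it never reaches the server)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))


class DedupRequestQueue:
    """Hypothetical in-memory FIFO that refuses URLs it has already seen."""

    def __init__(self) -> None:
        self._pending = deque()
        self._seen = set()

    def add(self, url: str) -> bool:
        key = unique_key(url)
        if key in self._seen:
            return False          # duplicate: already queued or already handled
        self._seen.add(key)
        self._pending.append(url)
        return True

    def fetch_next(self) -> Optional[str]:
        return self._pending.popleft() if self._pending else None

    def __len__(self) -> int:
        return len(self._pending)


q = DedupRequestQueue()
for link in ["https://Example.com/a#top", "https://example.com/a", "https://example.com/b"]:
    q.add(link)
print(len(q))  # 2
```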

Schedules and webhooks

Cron-style scheduler for periodic Actor runs. Webhooks fire on Actor lifecycle events (run started, succeeded, failed). Plus integrations into Zapier, Make, n8n, LangChain, LlamaIndex, Pinecone, Milvus, OpenAI Assistants. Source: docs.apify.com/platform/schedules and docs.apify.com/platform/integrations

Our take: Standard for a managed platform. Worth noting because most flat-API competitors (ScraperAPI, ScrapingBee, ZenRows, Scrape.do at the base tier) leave scheduling to you. Apify's scheduler was reliable in our experience. We never lost a run during the audit period.

Proxy pool

Apify operates three proxy categories: Datacenter, Residential, and Google SERPs Proxy. Apify publishes no total IP count for any of them.

Per docs.apify.com/platform/proxy/datacenter-proxy, datacenter proxies use intelligent rotation: 'For each HTTP/S request, the proxy takes the list of all available IP addresses and selects the one used the longest time ago for the specific hostname.' Persistent sessions are supported with a 26-hour lifespan that resets on each use, so a session accessed at least daily effectively never expires. Datacenter IPs are billed as included pools per plan tier (5 IPs on Free, 30 on Starter, 200 on Scale, 500 on Business) plus $0.60 to $1.00 per additional IP, with country-level geo-targeting (no specific country list published).

Residential proxies use a 1-minute sticky-session model: 'A single IP address is assigned to the session ID provided after you make the first request' (docs.apify.com/platform/proxy/residential-proxy), and that binding persists for 60 seconds, with the timer resetting on each request. Country-level targeting is available via ISO 3166-1 alpha-2 codes, and US-only state-level targeting via ISO 3166-2:US. No city-level targeting is documented. Residential pricing starts at $8/GB on Free/Starter and steps down to $7.50/GB on Scale and $7/GB on Business, the same roughly 12.5 percent top-tier discount noted above.

The Google SERPs proxy is purpose-built for Google search result extraction with localization, priced at $2.50/1K on Free/Starter and dropping to $1.70/1K on Business.

No mobile or ISP proxies are advertised in the docs; the stack covers datacenter, residential, and a SERP-specific pool. Bright Data, by contrast, advertises mobile and ISP separately. This is a real product gap versus Bright Data and Decodo, since the AI/agent buyer Apify is courting often needs ISP or mobile IPs for mobile-app-style targets.

Sticky-session granularity also differs sharply between datacenter (26 hours, useful for multi-step crawls that maintain cart state) and residential (1 minute, useful for single-page anti-bot evasion).
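The two documented behaviors, least-recently-used IP selection per hostname and sticky sessions whose timer resets on each use, can be modeled in a few lines. This is an illustrative sketch of the behavior as the docs describe it, not Apify's implementation; the class names and the injectable clock are ours.

```python
import time
from typing import Callable, Dict, List, Tuple


class LeastRecentlyUsedRotator:
    """Model of the documented datacenter rule: for each request, choose the IP
    that was used longest ago for the target hostname."""

    def __init__(self, ips: List[str], clock: Callable[[], float] = time.monotonic):
        if not ips:
            raise ValueError("need at least one IP")
        self.ips = list(ips)
        self.clock = clock
        # (hostname, ip) -> time of last use; a missing entry means never used for that host
        self.last_used: Dict[Tuple[str, str], float] = {}

    def pick(self, hostname: str) -> str:
        # float("-inf") ranks never-used IPs ahead of every used one;
        # min() breaks ties by list order.
        ip = min(self.ips, key=lambda c: self.last_used.get((hostname, c), float("-inf")))
        self.last_used[(hostname, ip)] = self.clock()
        return ip


class StickySessions:
    """Session ID -> IP binding that lapses after `ttl_seconds` without use.
    Every use resets the timer (26 h for datacenter, 60 s for residential per the docs)."""

    def __init__(self, ttl_seconds: float, assign_ip: Callable[[], str],
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.assign_ip = assign_ip
        self.clock = clock
        self.sessions: Dict[str, Tuple[str, float]] = {}

    def ip_for(self, session_id: str) -> str:
        now = self.clock()
        entry = self.sessions.get(session_id)
        if entry is None or now - entry[1] > self.ttl:
            ip = self.assign_ip()          # new or expired session: bind a fresh IP
        else:
            ip = entry[0]                  # live session: keep the same IP
        self.sessions[session_id] = (ip, now)  # reset the inactivity timer
        return ip


DATACENTER_TTL = 26 * 3600  # seconds
RESIDENTIAL_TTL = 60        # seconds
```

With `DATACENTER_TTL`, a session touched every 24 hours never crosses the 26-hour threshold, which is why a daily-accessed datacenter session effectively never expires; with `RESIDENTIAL_TTL`, any gap longer than a minute yields a new IP.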

Structured endpoints

SDKs and integrations

AI capabilities

Apify is one of the most AI-fluent providers in the scraping API category. Where most competitors bolt on a single LLM-extraction endpoint, Apify ships a layered AI stack: a hosted MCP server at mcp.apify.com (one of the few production-grade MCP endpoints in scraping); AI-purpose-built Actors (RAG Web Browser for agent search, Website Content Crawler for LLM-friendly markdown ingestion at $0.20 to $5 per 1,000 pages depending on rendering tier); the open-source Crawlee SDK with an AI-powered StagehandCrawler crawler type; and first-party integrations into LangChain, LlamaIndex, Pinecone, Milvus, Qdrant, and OpenAI Assistants. The bet is clear: Apify wants to be the data layer underneath every AI agent that touches the web. Pricing transparency is the weak link. The MCP docs say 'Monitor your API usage through Apify Console to stay within your plan limits' rather than quoting per-call rates, and underlying Actor compute-unit costs cascade unpredictably.

Feature inventory

Hosted MCP Server (mcp.apify.com) · GA

Production MCP server letting AI agents discover and invoke any Apify Actor as a tool. Supports OAuth and bearer-token auth. Compatible with Claude Desktop, Cursor, and VS Code Copilot's agent mode out of the box. URL parameters let you scope which tools are exposed. Source: docs.apify.com/platform/integrations/mcp

Pricing: Free to call. Billed via underlying Actor compute units and per-result fees against your plan credits. SSE transport deprecation date: 2026-04-01.

RAG Web Browser Actor (apify/rag-web-browser) · GA

'Queries Google Search, scrapes the top N pages using a full web browser, and returns their content as clean Markdown for further processing by an LLM.' Designed specifically for agent search-and-retrieve. Supports JS-heavy and static sites. Integrates with OpenAI Assistants, GPTs, Claude, RAG pipelines. Source: apify.com/apify/rag-web-browser

Pricing: Compute-based. 1 CU = 1 GB memory times 1 hour at $0.13 to $0.20/CU. No flat per-result pricing.

Website Content Crawler (apify/website-content-crawler) · GA

Deep crawler that outputs plain text, Markdown, or HTML formatted for LLM ingestion. Integrates with LangChain, LlamaIndex, Pinecone, Qdrant for direct vector DB pipelines. Apify's flagship 'feed a website to your LLM' Actor. Source: apify.com/apify/website-content-crawler

Pricing: Approximately $0.50 to $5 per 1,000 pages (headless-browser tier) or $0.20 per 1,000 pages (raw-HTTP tier), per Apify's own estimate in the Actor docs.

LangChain + LlamaIndex integrations · GA

First-party loaders that expose Actor output as LangChain document objects or LlamaIndex documents. Lets developers wire a scraper into a RAG pipeline without writing transformation glue. Source: docs.apify.com/platform/integrations

Pricing: Free integration. Only the underlying Actor runs cost CUs.

Crawlee StagehandCrawler (AI-powered crawling) · Beta

A crawler type within the Crawlee framework that uses Stagehand (browser AI agent library) to navigate and extract content with LLM guidance rather than hand-coded selectors. Source: crawlee.dev

Pricing: Free in Crawlee. LLM inference cost depends on which model the user wires in (OpenAI/Anthropic API charged to user's account separately).

OpenAI Assistants / Custom GPT integration · GA

Apify Actors callable as Custom GPT actions and OpenAI Assistants tools. The RAG Web Browser is the canonical example. Drop it into a GPT and it gains web search plus content retrieval. Source: apify.com/apify/rag-web-browser

Pricing: No additional integration fee. Actor compute charges apply normally.

Vector DB integrations (Pinecone, Milvus, Qdrant) · GA

Direct connectors from an Apify dataset to a vector DB. They automate the embedding-and-upsert step of a RAG pipeline rather than leaving you to write it. Source: docs.apify.com/platform/integrations

Pricing: Free integration. Embedding costs (OpenAI text-embedding-3, etc.) charged to user's own provider account.

Our assessment

This is the most differentiated AI story in scraping APIs right now. Apify is one of maybe three providers (alongside Decodo with its open-source MCP and Bright Data with its various AI scrapers) where the AI features are real product offerings rather than marketing veneer. The hosted MCP server alone is a real reason to pick Apify for an agent-first build, because most competitors haven't shipped MCP at all and the ones that have offer only self-hosted options. The RAG Web Browser and Website Content Crawler are well designed for their stated purpose. They output the formats LLMs actually want (clean Markdown, semantic structure preserved) rather than dumping raw HTML and asking the LLM to parse it. The pricing model undermines the story. An AI agent doesn't know in advance how many CUs its Actor calls will consume, and the MCP docs explicitly punt on quoting per-call rates. For a workflow where the LLM itself has to weigh cost, this is a real problem. Competitors quoting a flat $1/1K or $5/1K per request let the agent reason about cost; Apify's compute units force the agent to either over-provision or accept budget surprises. The features work. The pricing transparency around them is the gap.
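One way an agent can live with unpredictable per-call cost is to over-provision explicitly: reserve the worst-case compute for each Actor call before making it, then settle with the actual charge. A hypothetical sketch, assuming compute is bounded by the run's memory allocation times its timeout at the Starter rate of $0.20/CU; it ignores proxy traffic and rentals, and `AgentBudget` is not an Apify API.

```python
from dataclasses import dataclass


@dataclass
class AgentBudget:
    """Hypothetical guard an agent could wrap around each Actor call.
    It reserves a worst-case cost up front and settles afterwards."""
    limit_usd: float
    cu_rate_usd: float = 0.20   # Free/Starter rate quoted in this review
    reserved_usd: float = 0.0
    spent_usd: float = 0.0

    def worst_case_cost(self, memory_gb: float, timeout_hours: float,
                        max_results: int = 0, per_1k_results_fee: float = 0.0) -> float:
        # Assumed bound: compute cannot exceed memory x timeout (1 CU = 1 GB x 1 h);
        # per-result fees are capped by the most results the call may return.
        compute = memory_gb * timeout_hours * self.cu_rate_usd
        return compute + max_results / 1000 * per_1k_results_fee

    def try_reserve(self, amount: float) -> bool:
        """Reserve `amount` if it fits in the remaining budget; refuse otherwise."""
        if self.spent_usd + self.reserved_usd + amount > self.limit_usd:
            return False
        self.reserved_usd += amount
        return True

    def settle(self, reserved: float, actual: float) -> None:
        """Release the reservation and record what the run really cost."""
        self.reserved_usd -= reserved
        self.spent_usd += actual


budget = AgentBudget(limit_usd=1.00)
cap = budget.worst_case_cost(memory_gb=4, timeout_hours=0.5)  # 2 CU ceiling
print(budget.try_reserve(cap), budget.try_reserve(cap), budget.try_reserve(cap))
```

The gap between the reserved cap and the settled cost is the over-provisioning described above; with a flat per-call price, the reservation and the charge would simply be the same number.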

Where it excels

Where it falls short