Apify
Marketplace breadth wins. Amazon at 80 percent SR and a $7.04/1K blended cost are the trade-offs.
Published 2026-06-05 · 2627-word independent review · ScrapingTest Research
Verdict
Grade: B+. Excellent breadth and AI-agent fit. Amazon weakness, high latency, and compute-unit unpredictability hold back an A.
Best for
- AI agent and RAG pipelines (hosted MCP at mcp.apify.com, RAG Web Browser, Website Content Crawler with LLM-friendly markdown)
- Long-tail and niche targets where a pre-built community Actor already exists in the 37,223-Actor marketplace
- Teams that want to deploy custom scrapers on managed infrastructure via the open-source Crawlee SDK
- Workflows that need scheduling, datasets, key-value storage, and webhooks bundled with the scraper
- Buyers willing to trade per-request predictability for per-Actor flexibility (pay-per-event or pay-per-result)
Avoid if
- You need a single flat per-request URL endpoint with deterministic cost. Compute-unit pricing is hard to forecast.
- Your workload is Amazon-heavy at sustained load. We measured 80 percent SR on amazon.com with 14.5s avg and 21.8s p90 response times.
- You need low-latency scraping (sub-3s p50). Apify's blended avg response is 19.5s across our 20 domains.
- You want pay-only-for-success billing. Apify charges compute-units regardless of HTTP outcome on most Actors.
What we found in the lab
Apify finished #4 overall in our 5-pass audit with a 95.25 percent success rate averaged across 20 domains (97.14 percent under the original audit's calculation method). Structured-data coverage hit every one of the 20 targets through dedicated marketplace Actors. The platform was operational on 20/20 domains with no GENUINE FAILs recorded after lock-in. Only a handful of other providers (Scrape.do, Bright Data, and a couple of infrastructure-only players) achieved zero unrecoverable failures. The reason is structural. Apify's 37,223-Actor marketplace (per the apify.com/store header at time of writing) means almost every benchmark domain has at least one pre-built scraper, often more than one, and authors have an incentive to maintain them because they earn rental fees.
The numbers degrade when you go domain by domain. Six of 20 targets fell below 100 percent SR: amazon.com (80 percent, 16/20 trials), g2.com (75 percent), walmart.com (80 percent), trustpilot.com (85 percent), youtube.com (95 percent), and idealista.com (90 percent). Apify's average response time across all 20 domains was 19.5 seconds, the highest of any top-5 provider in the audit. Long-tail Actor runs drove that number: booking.com (52.4s avg, 78.6s p90), tripadvisor.com (54.4s avg, 81.5s p90), and walmart.com (56.0s avg, 84.0s p90). Cost-per-thousand varied wildly. trustpilot.com came in at $0.47/1K. reddit.com hit $44/1K. The 20-domain blended average was $7.04/1K, roughly 6 to 30 times what flat-rate competitors charge.
Apify wins on coverage and AI-agent ergonomics. It loses on latency and cost predictability. If you're building a price-monitoring pipeline that needs sub-5s response, you're paying a 4 to 10 times latency tax for the marketplace breadth. If you're feeding an LLM agent that just needs structured JSON back from a niche site, Apify is often the only place that has a ready-made Actor.
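The headline 95.25 percent is an unweighted mean of per-domain success rates, so it can be reproduced from the six sub-100 domains alone. The minimal check below assumes the other 14 domains each scored 100 percent, which is what "six of 20 fell below 100 percent" implies. It also computes the cost spread between the cheapest domain (trustpilot.com) and the priciest (reddit.com).

```python
# Per-domain success rates (percent) for the six domains that fell below 100 percent.
# The other 14 of the 20 benchmark domains scored 100 percent.
BELOW_100 = {
    "amazon.com": 80.0,
    "g2.com": 75.0,
    "walmart.com": 80.0,
    "trustpilot.com": 85.0,
    "youtube.com": 95.0,
    "idealista.com": 90.0,
}
TOTAL_DOMAINS = 20


def mean_success_rate(partial: dict[str, float], total: int) -> float:
    """Unweighted mean SR across `total` domains; domains not listed count as 100 percent."""
    perfect = total - len(partial)
    return (sum(partial.values()) + 100.0 * perfect) / total


def cost_spread(cheapest_per_1k: float, priciest_per_1k: float) -> float:
    """How many times more the most expensive domain costs than the cheapest."""
    return priciest_per_1k / cheapest_per_1k


print(mean_success_rate(BELOW_100, TOTAL_DOMAINS))  # 95.25
print(round(cost_spread(0.47, 44.0), 1))            # trustpilot.com vs reddit.com
```

An unweighted mean treats every domain equally. A workload dominated by amazon.com or g2.com would therefore see a lower effective success rate than the headline figure.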
Per-domain breakdown
| Domain | SR | Notes |
|---|---|---|
| amazon.com | 80% (16/20) | Tied with walmart.com as the weakest of the e-commerce set. 14.5s avg, 21.8s p90: slow even when it works. $5/1K via marketplace Amazon Actor. Apify did not make the top-5 of our 500-trial Amazon stress test. |
| g2.com | 75% (15/20) | The DataDome wall hits Apify Actors on G2 the same way it hits flat-API providers. $13.75/1K is the second-highest cost in the Apify row. Community Actors charge a premium for hard targets. |
| reddit.com | 100% (20/20) | Works perfectly but at $44/1K, the most expensive single domain we measured for Apify. Reddit Actor authors charge a premium because the target's API restrictions push them to browser rendering. |
| trustpilot.com | 85% (17/20) | Anti-bot tightening. $0.47/1K is Apify's cheapest domain. That combination (cheap and flaky) suggests an HTTP-only Actor that struggles when Trustpilot escalates. |
| walmart.com | 80% (16/20) | 56.0s avg, 84.0s p90, the longest response time we measured for Apify, narrowly ahead of tripadvisor.com (54.4s avg) and booking.com (52.4s avg). Apify's Walmart Actor is likely doing a full browser pass to clear the bot challenge. |
| indeed.com | 100% (20/20) | Works at $13.25/1K, third-most expensive domain. 20.8s avg response, long but reliable. Jobs Actors are well-maintained on Apify. |
| instagram.com | 100% (20/20) | 100 percent at $2.60/1K, 6.6s avg. One of Apify's strongest wins versus competitors that block Instagram entirely (ScraperAPI, Firecrawl). A real differentiator. |
| linkedin.com | 100% (20/20) | 100 percent at $4/1K, 8.9s avg. Zyte returns 451 on LinkedIn (account policy). Firecrawl returns 403. Apify Actors handle it cleanly. Another marketplace-breadth win. |
Amazon 500-trial stress test
Apify did not qualify for our 500-trial Amazon stress test. The stress benchmark targeted only the five providers that hit 100 percent SR on amazon.com in the original audit (Scrape.do, ScrapingBee, ScrapingDog, Decodo, Zyte). Apify scored 80 percent (16/20 trials) on amazon.com in the audit, which kept it out of the top-5 stress tier. The audit numbers tell the story. A 14,542ms average response time and a 21,813ms p90 mean that even on successful runs, Apify is roughly 4 to 7 times slower than Scrape.do (3.5s p50) or Decodo (3.1s p50), and you're paying $5/1K versus their $0.11 to $0.50/1K. Per the amazon-stress REPORT.md, Amazon's new defense (the Akamai bot-manager `bm-ver` interstitial, a 2.2KB shell page with meta-refresh) hit Zyte's and ScrapingDog's search endpoints. Apify Actors that egress through datacenter ASNs are likely just as exposed, though we did not stress-test them. The residential-backed marketplace Actors evade it but cost considerably more per result. Review pages such as `/product-reviews/B07FZ8S74R` are now login-walled across all providers including Apify, per the same report. For Amazon-heavy workloads, our data says go elsewhere.
Pricing deep dive
Apify's pricing has three stacked layers that compound. First, the platform subscription tier (Free $0, Starter $29/mo, Scale $199/mo, Business $999/mo) sets your monthly platform credit allowance, RAM ceiling, and concurrency limit. Second, compute-unit consumption is metered per Actor run as allocated memory times runtime (1 CU = 1 GB of memory for one hour), billed at $0.20/CU on Free/Starter, $0.16/CU on Scale, and $0.13/CU on Business. Third, per-Actor rental or pay-per-result fees are set independently by each marketplace author (typically $20/mo rental or $8 per 1,000 results). On top of that you pay separately for proxies (residential starts at $8/GB on Free/Starter, steps down to $7.50/GB on Scale and $7/GB on Business, a roughly 12.5 percent volume discount at the top tier; SERPs $2.50/1K dropping to $1.70/1K at Business; datacenter $0.60 to $1.00 per IP after the included pool), storage ($1.00 per 1,000 GB-hours on Free/Starter), and external data transfer ($0.20/GB). The headline 'starts at $29/month' leaves a lot out: $29 buys $29 of platform credits and a 32 GB / 32-concurrent-run ceiling. A serious scraping workload will likely burn through that before a third of the monthly cycle has passed. Existing scrapingtest content listed Starter at $39/mo, while the live pricing page showed $29/mo at time of writing. We flag the divergence for the record.
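A rough per-run cost model makes the stacking concrete. This is a sketch under stated assumptions: one CU is 1 GB of memory for one hour, the per-CU and residential rates are the ones quoted above, and the Actor's per-result fee and proxy traffic are inputs you would read off the Actor page and your own usage. `RunProfile` and `run_cost` are our own hypothetical names, not part of any Apify SDK. Storage and data transfer are left out for brevity.

```python
from dataclasses import dataclass

# Per-CU compute rates and residential proxy $/GB by plan, from the pricing above.
CU_RATE = {"free": 0.20, "starter": 0.20, "scale": 0.16, "business": 0.13}
RESIDENTIAL_PER_GB = {"free": 8.00, "starter": 8.00, "scale": 7.50, "business": 7.00}


@dataclass
class RunProfile:
    """Hypothetical description of one Actor run (our own structure)."""
    memory_gb: float           # memory allocated to the run
    runtime_hours: float       # wall-clock runtime
    results: int               # items written to the dataset
    fee_per_1k_results: float  # author's pay-per-result price (0 if none)
    residential_gb: float      # residential proxy traffic consumed


def run_cost(run: RunProfile, plan: str) -> dict[str, float]:
    """Break a single run's cost into the stacked layers described above."""
    compute_units = run.memory_gb * run.runtime_hours  # 1 CU = 1 GB x 1 hour
    layers = {
        "compute": compute_units * CU_RATE[plan],
        "actor_fee": run.results / 1000 * run.fee_per_1k_results,
        "proxy": run.residential_gb * RESIDENTIAL_PER_GB[plan],
    }
    layers["total"] = sum(layers.values())
    return layers


# Illustrative run: a 4 GB browser Actor running 30 minutes, 1,000 results at $8/1K,
# pulling 2 GB of residential traffic.
example = RunProfile(memory_gb=4, runtime_hours=0.5, results=1000,
                     fee_per_1k_results=8.0, residential_gb=2.0)
print(run_cost(example, "starter"))
print(run_cost(example, "business"))
```

In this example the residential traffic and the author's result fee dominate, while compute stays well under a dollar even at Starter rates. That is why the per-CU discount on higher tiers moves the total less than the pricing card suggests.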
Plans
| Plan | Price | Volume | Concurrency | What unlocks |
|---|---|---|---|---|
| Free | $0 | $5/mo platform credits | 25 concurrent runs | 8 GB RAM max, community support, 5 datacenter proxy IPs included |
| Starter | $29/mo (pricing page; existing scrapingtest content showed $39, flagging divergence) | $29 in platform credits + pay-as-you-go beyond | 32 concurrent runs | 32 GB RAM max, $0.20/CU compute, chat support, 30 datacenter IPs included then $1/IP, residential $8/GB |
| Scale | $199/mo | $199 in platform credits + PAYG | 128 concurrent runs | 128 GB RAM max, $0.16/CU compute (-20%), priority chat support, 200 datacenter IPs then $0.80/IP, residential $7.50/GB, SERPs at $2/1K |
| Business | $999/mo | $999 in platform credits + PAYG | 256 concurrent runs | 256 GB RAM max, $0.13/CU compute (-35%), dedicated account manager, 500 datacenter IPs then $0.60/IP, residential $7/GB, SERPs $1.70/1K |
| Enterprise | Custom | Custom credit allowance | Custom (>256) | Custom SLA, volume compute discount, dedicated infrastructure available on request |
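Because each paid plan's fee comes back as an equal amount of platform credits, one simple way to compare tiers is: monthly bill ≈ max(plan price, usage priced at that tier's rates). The sketch below applies that rule to compute alone. It assumes credits are consumed by CU usage and that overage is billed pay-as-you-go at the tier's own CU rate; it ignores proxies, storage, Actor fees, and concurrency add-ons, so treat its output as a lower bound.

```python
# Paid plans: (monthly price in USD, compute rate in USD per CU), from the table above.
PLANS = {
    "Starter": (29.0, 0.20),
    "Scale": (199.0, 0.16),
    "Business": (999.0, 0.13),
}


def monthly_compute_bill(plan: str, compute_units: float) -> float:
    """Assumes the plan fee is fully returned as credits and overage is pay-as-you-go."""
    price, rate = PLANS[plan]
    return max(price, compute_units * rate)


def cheapest_plan(compute_units: float) -> tuple[str, float]:
    """Pick the paid plan with the lowest compute-only bill for a monthly CU volume."""
    bills = {name: monthly_compute_bill(name, compute_units) for name in PLANS}
    best = min(bills, key=bills.get)
    return best, bills[best]


for cus in (100, 2_000, 8_000):
    print(cus, cheapest_plan(cus))
```

Under these assumptions, Scale overtakes Starter just under 1,000 CUs a month and Business overtakes Scale at roughly 6,250 CUs. Adding the tier-dependent proxy rates and Actor fees shifts both crossover points.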
Cost multipliers
Hidden costs (not on the pricing card)
- Third-party marketplace Actor rental fees, typically $20/month per Actor or $8 per 1,000 results, charged on top of platform compute units. Not included in any subscription tier. Source: apify.com/pricing add-ons section and the per-Actor pages in apify.com/store.
- Residential proxy bandwidth starts at $8/GB on Free/Starter and steps down to $7.50/GB on Scale and $7/GB on Business, a roughly 12.5 percent volume discount at the top tier. A single browser-rendered page on a JS-heavy site (Walmart, Booking) can consume 5 to 15 MB of residential traffic, putting effective per-page residential cost at $0.035 to $0.12 before any compute charge. Source: apify.com/pricing residential row.
- Concurrent run overage at $5 per month for each additional concurrent run beyond the plan ceiling. A team needing 40 parallel Actor runs on the Starter (32-cap) plan pays an extra $40/month for concurrency alone, separate from compute. Source: apify.com/pricing add-ons.
- External data transfer at $0.20/GB on top of compute and proxy. Large dataset exports (CSV/JSON to S3 or webhook destinations) accumulate this charge silently. Source: apify.com/pricing data transfer row.
- Storage at $1.00 per 1,000 GB-hours on Free/Starter. A 50 GB dataset kept for a month costs about $36 in storage alone, independent of compute or proxy. Source: apify.com/pricing storage row.
- Compute-unit math is opaque before you run an Actor. The default memory allocation is chosen by the Actor author, and runtime depends on the target and the proxy, so the buyer cannot see CU consumption in advance. A cheap Actor on a slow proxy can consume 4 to 5 times the CUs of a fast one for the same page. Source: docs.apify.com/platform/actors compute-units explanation.
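The dollar figures in the list above follow from straightforward arithmetic. The small checker below reproduces them, assuming decimal units for bandwidth (1 GB = 1,000 MB) and a 30-day, 720-hour month. The function names are ours.

```python
HOURS_PER_MONTH = 720  # assumes a 30-day month
MB_PER_GB = 1000       # assumes decimal units, as bandwidth is usually billed


def residential_cost_per_page(page_mb: float, usd_per_gb: float) -> float:
    """Residential proxy cost of one page, before any compute charge."""
    return page_mb / MB_PER_GB * usd_per_gb


def concurrency_overage(needed_runs: int, plan_cap: int, usd_per_run: float = 5.0) -> float:
    """Monthly add-on cost for concurrent runs beyond the plan ceiling."""
    return max(0, needed_runs - plan_cap) * usd_per_run


def storage_cost(gb: float, hours: float, usd_per_1k_gb_hours: float = 1.00) -> float:
    """Storage billed per 1,000 GB-hours (Free/Starter rate by default)."""
    return gb * hours / 1000 * usd_per_1k_gb_hours


print(residential_cost_per_page(5, 7.00))   # low end: 5 MB at the Business rate
print(residential_cost_per_page(15, 8.00))  # high end: 15 MB at the Starter rate
print(concurrency_overage(40, 32))          # Starter caps at 32 concurrent runs
print(storage_cost(50, HOURS_PER_MONTH))    # 50 GB kept for a month
```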
Effective cost per workload
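The per-1,000-page costs we measured in the audit are enough to price a mixed workload. The sketch below assumes cost scales linearly with page volume at the rates we observed and covers only the seven domains whose cost appears in our per-domain table. The monthly page counts are a made-up mix for illustration.

```python
# Measured Apify cost per 1,000 pages (USD) for domains listed in our per-domain table.
MEASURED_PER_1K = {
    "amazon.com": 5.00,
    "g2.com": 13.75,
    "reddit.com": 44.00,
    "trustpilot.com": 0.47,
    "indeed.com": 13.25,
    "instagram.com": 2.60,
    "linkedin.com": 4.00,
}


def workload_cost(pages_per_domain: dict[str, int]) -> float:
    """Total USD for a workload, assuming cost scales linearly at the measured rates."""
    return sum(MEASURED_PER_1K[d] * n / 1000 for d, n in pages_per_domain.items())


def blended_per_1k(pages_per_domain: dict[str, int]) -> float:
    """Volume-weighted cost per 1,000 pages for this particular mix."""
    total_pages = sum(pages_per_domain.values())
    return workload_cost(pages_per_domain) / total_pages * 1000


# Hypothetical monthly mix: mostly cheap review pages plus some Reddit threads.
mix = {"trustpilot.com": 50_000, "linkedin.com": 20_000, "reddit.com": 5_000}
print(round(workload_cost(mix), 2))
print(round(blended_per_1k(mix), 2))
```

In this mix, 5,000 Reddit pages (under 7 percent of volume) account for about two-thirds of the bill. A blended figure like $7.04/1K depends heavily on which domains dominate your traffic.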
Features deep dive
Core features
Actors are containerized serverless programs running on Apify's cloud. The store at apify.com/store currently shows 'Browse 37,223 Actors' in its header (up from the ~22,000 cited in older content). Categories span social media, AI, agents, lead generation, e-commerce, and SEO tools, and the catalog mixes Apify-built native Actors with third-party community Actors. Source: apify.com/store
Our take: Breadth is the product. Our audit measured Apify at 20/20 dedicated-endpoint coverage on the benchmark. Only Bright Data matched it. Where flat-API providers fall back to a generic /scrape endpoint, Apify usually has a domain-specific Actor with already-encoded selectors. Quality is uneven (Actor authors set their own maintenance cadence), but the floor is higher than starting from scratch.
Crawlee is a JavaScript and Python crawler framework: 'Build reliable web scrapers. Fast.' (crawlee.dev tagline). It provides HTTP-based crawling, Cheerio DOM parsing, Puppeteer/Playwright browser automation, JSDOM/LinkeDOM parsing, plus an AI-powered StagehandCrawler, and it includes proxy management, session handling, request queueing, and data storage. It is deployable on AWS Lambda, GCP Cloud Run, and the Apify Platform. Source: crawlee.dev
Our take: Crawlee is useful even if you never touch Apify's cloud. It's the de facto modern Node.js scraping framework, replacing aged libraries like Apify SDK v2 and Scrapy-for-JS. Free, MIT-licensed, no lock-in. A maintainer can build a custom scraper in Crawlee, deploy on their own infra, and only push to Apify cloud when they need the integrations.
Apify's Model Context Protocol server exposes Actor discovery and invocation as tools to AI agents. Supports Claude Desktop, Cursor, VS Code Copilot agent mode, local stdio via Node.js. Two auth modes: OAuth browser flow and bearer token. SSE transport deprecation announced for April 1, 2026. Source: docs.apify.com/platform/integrations/mcp
Our take: Apify is one of the only scraping platforms that ships a real hosted MCP server. Decodo has an open-source one on GitHub. Scrape.do, Bright Data, and most others have nothing. For agentic workflows where the LLM picks which Actor to call based on the user query, this is the path of least resistance. Pricing is the weak spot: MCP docs don't quote per-call rates, just 'monitor your usage'.
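Under the hood, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request. The sketch below only builds such a request for Apify's hosted endpoint using the bearer-token auth mode; it does not send it. The endpoint path should be confirmed in Apify's MCP docs. The tool name and arguments are hypothetical placeholders, since real tool names come from the server's `tools/list` response, and a real client performs the MCP `initialize` handshake first. In production, use an MCP client library rather than this.

```python
import json
import urllib.request

MCP_URL = "https://mcp.apify.com"  # hosted endpoint named above; confirm the exact path in Apify's docs


def build_tool_call(token: str, tool_name: str, arguments: dict,
                    request_id: int = 1) -> urllib.request.Request:
    """Build (but do not send) an MCP JSON-RPC 2.0 tools/call request."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # bearer-token auth mode
            "Content-Type": "application/json",
            # Streamable HTTP servers may reply with plain JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
    )


# Hypothetical tool name and arguments, for illustration only.
req = build_tool_call("YOUR_APIFY_TOKEN", "example-rag-web-browser", {"query": "web scraping MCP"})
print(req.get_method(), req.full_url)
```

Building a `Request` without sending it keeps the sketch testable offline. Pass it to `urllib.request.urlopen` once you have a real token and tool name.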
Three storage types: Datasets (append-only structured records, browsable and exportable to CSV/JSON/XLSX), Key-Value Stores (arbitrary binary blobs), and Request Queue (URL queue with deduplication). Storage charged at $1.00 per 1,000 GB-hours on Free/Starter, less on higher tiers. Source: docs.apify.com/platform/storage
Our take: More plumbing than a flat-API competitor gives you. The dataset is where Actor output lands by default and where downstream consumers (webhooks, integrations, SDKs) read from. Useful if you want to decouple scrape-time from consume-time. Wasteful if you just want a synchronous request/response.
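The Request Queue's deduplication is what keeps a crawler from fetching the same URL twice. In Apify's model each request carries a unique key derived from its URL by default. The sketch below is a minimal in-memory model of that idea, assuming a simplified normalization rule (lowercase scheme and host, drop the fragment). Apify's actual key derivation, persistence, and handled-request tracking differ, so treat this as illustrative only.

```python
from collections import deque
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def unique_key(url: str) -> str:
    """Simplified dedup key: lowercase scheme and host, empty path becomes '/', fragment dropped."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, ""))


class RequestQueue:
    """In-memory FIFO queue that skips URLs whose dedup key was already enqueued."""

    def __init__(self) -> None:
        self._pending: deque = deque()
        self._seen: set = set()

    def add(self, url: str) -> bool:
        """Enqueue `url` unless an equivalent URL was added before; return True if enqueued."""
        key = unique_key(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        self._pending.append(url)
        return True

    def fetch_next(self) -> Optional[str]:
        """Pop the oldest pending URL, or None when the queue is empty."""
        return self._pending.popleft() if self._pending else None

    def __len__(self) -> int:
        return len(self._pending)


queue = RequestQueue()
for link in ["https://Example.com/a#top", "https://example.com/a", "https://example.com/b"]:
    queue.add(link)
print(len(queue))  # the second URL duplicates the first once normalized
```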
A cron-style scheduler handles periodic Actor runs, and webhooks fire on Actor lifecycle events (run started, succeeded, failed). Integrations cover Zapier, Make, n8n, LangChain, LlamaIndex, Pinecone, Milvus, and OpenAI Assistants. Source: docs.apify.com/platform/schedules and docs.apify.com/platform/integrations
Our take: Standard for a managed platform. Worth noting because most flat-API competitors (ScraperAPI, ScrapingBee, ZenRows, Scrape.do at the base tier) leave scheduling to you. Apify's scheduler was reliable in our experience. We never lost a run during the audit period.
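On the receiving end, webhook handling usually reduces to dispatching on the event type. The sketch assumes a JSON body with an `eventType` field such as `ACTOR.RUN.SUCCEEDED` or `ACTOR.RUN.FAILED` and a `resource` object describing the run. That matches Apify's default webhook payload as we understand it, but check the payload template configured on your webhook before relying on these field names. The run and dataset IDs in the sample are made up.

```python
import json
from typing import Callable


def on_succeeded(run: dict) -> str:
    """Next step after a successful run: pull its default dataset."""
    return f"fetch dataset {run.get('defaultDatasetId')}"


def on_failed(run: dict) -> str:
    """Next step after a failed or timed-out run: raise an alert."""
    return f"alert: run {run.get('id')} ended with {run.get('status')}"


# Handlers keyed by event type (names as we understand Apify's Actor-run events).
HANDLERS: dict[str, Callable[[dict], str]] = {
    "ACTOR.RUN.SUCCEEDED": on_succeeded,
    "ACTOR.RUN.FAILED": on_failed,
    "ACTOR.RUN.TIMED_OUT": on_failed,
}


def handle_webhook(body: bytes) -> str:
    """Parse a webhook body and route it to the matching handler; other events are ignored."""
    event = json.loads(body)
    handler = HANDLERS.get(event.get("eventType", ""))
    if handler is None:
        return "ignored"
    return handler(event.get("resource", {}))


sample = json.dumps({
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {"id": "run123", "status": "SUCCEEDED", "defaultDatasetId": "ds456"},
}).encode()
print(handle_webhook(sample))
```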
Proxy pool
Apify operates three proxy categories: Datacenter, Residential, and Google SERPs Proxy. Apify does not publish a total IP count for any of them. Per docs.apify.com/platform/proxy/datacenter-proxy, datacenter proxies use intelligent rotation: 'For each HTTP/S request, the proxy takes the list of all available IP addresses and selects the one used the longest time ago for the specific hostname.' Persistent sessions are supported with a 26-hour lifespan that resets on each use, so a session accessed at least daily effectively never expires. Datacenter IPs are billed as included pools per plan tier (5 IPs on Free, 30 on Starter, 200 on Scale, 500 on Business) plus $0.60 to $1.00 per additional IP, with country-level geo-targeting (no specific country list published).
Residential proxies use a 1-minute sticky session model: 'A single IP address is assigned to the session ID provided after you make the first request', and the assignment persists for 60 seconds, with the timer reset on each request (docs.apify.com/platform/proxy/residential-proxy). Country-level targeting is available via ISO 3166-1 alpha-2 codes, and US-only state-level targeting via ISO 3166-2:US. No city-level targeting is documented. Residential pricing starts at $8/GB on Free/Starter and steps down to $7.50/GB on Scale and $7/GB on Business, a roughly 12.5 percent volume discount at the top tier. That discount is smaller than what competitors like Bright Data offer at equivalent commitments.
The Google SERPs proxy is purpose-built for Google search result extraction with localization, priced at $2.50/1K on Free/Starter and dropping to $1.70/1K on Business. The docs advertise no mobile or ISP proxies; the stack stops at datacenter, residential, and a SERP-specific pool. Bright Data, by contrast, advertises mobile and ISP pools separately. This is a real product gap versus Bright Data and Decodo, since the AI/agent buyer Apify is courting often needs ISP or mobile IPs for mobile-app-style targets.
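The quoted datacenter rotation rule (pick the IP used longest ago for this hostname) is a per-host least-recently-used policy. The sketch below models that documented behavior, not Apify's implementation. The IP list is a placeholder, and the logical clock stands in for real timestamps.

```python
from collections import defaultdict
from itertools import count


class PerHostLRURotator:
    """For each hostname, select the IP whose last use for that host is oldest."""

    def __init__(self, ips: list[str]) -> None:
        self.ips = list(ips)
        self._clock = count(1)  # logical clock; 0 means "never used for this host"
        self._last_used: dict[str, dict[str, int]] = defaultdict(dict)

    def pick(self, hostname: str) -> str:
        history = self._last_used[hostname]
        # min() keeps list order on ties, so never-used IPs are tried first, in order.
        ip = min(self.ips, key=lambda candidate: history.get(candidate, 0))
        history[ip] = next(self._clock)
        return ip


rotator = PerHostLRURotator(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([rotator.pick("example.com") for _ in range(4)])
print(rotator.pick("other.org"))  # a new hostname starts from a fresh history
```

Because history is tracked per hostname, one IP can be rested for example.com while still being first in line for another site. That is the practical benefit of the host-scoped rule over global round-robin.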
Sticky-session granularity also differs sharply between datacenter (26 hours, useful for multi-step crawls maintaining cart state) and residential (1 minute, useful for single-page anti-bot evasion).
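The two session models differ mainly in idle timeout: a datacenter session survives up to 26 hours between uses, a residential one 60 seconds, and each use resets the timer. The sketch below models that sliding expiry with explicit timestamps. It also builds a proxy URL using the `groups-`, `session-`, and `country-` username parameters and the `proxy.apify.com:8000` host as we understand Apify Proxy's connection format. Treat that format as an assumption to confirm against docs.apify.com/platform/proxy.

```python
from dataclasses import dataclass
from typing import Optional

DATACENTER_TTL_S = 26 * 3600  # 26-hour idle lifespan, reset on each use
RESIDENTIAL_TTL_S = 60        # 1-minute idle lifespan, reset on each use


@dataclass
class StickySession:
    session_id: str
    ttl_s: int
    last_used: Optional[float] = None

    def use(self, now: float) -> bool:
        """Record a request at `now` (seconds); return False if the previous session had expired."""
        alive = self.last_used is None or now - self.last_used <= self.ttl_s
        # In this model, an expired ID simply starts a new session (likely a new IP).
        self.last_used = now
        return alive


def proxy_url(password: str, group: str, session_id: str, country: Optional[str] = None) -> str:
    """Assumed Apify Proxy URL format; confirm host, port, and parameters in the docs."""
    params = [f"groups-{group}", f"session-{session_id}"]
    if country:
        params.append(f"country-{country}")  # ISO 3166-1 alpha-2 code, e.g. "US"
    return f"http://{','.join(params)}:{password}@proxy.apify.com:8000"


res = StickySession("cart1", RESIDENTIAL_TTL_S)
print(res.use(0), res.use(45), res.use(150))  # third call is 105 s after the second
print(proxy_url("PASSWORD", "RESIDENTIAL", "cart1", "US"))
```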
Structured endpoints
Coverage is the real differentiator versus flat-API competitors. Dedicated Actors include:
- Amazon (multiple Actors: apify/amazon-product-scraper, junglee/amazon-scraper, others)
- Walmart (multiple community Actors)
- eBay (multiple community Actors)
- Google Maps (apify/google-maps-extractor)
- Google Search (apify/google-search-scraper)
- Instagram (multiple Actors covering profiles, posts, hashtags, reels)
- TikTok (clockworks/free-tiktok-scraper and others)
- LinkedIn (multiple paid Actors)
- YouTube (channel, video, and search Actors)
- Booking.com (apify/booking-scraper and community variants)
- TripAdvisor (multiple Actors)
- Zillow, Idealista, and other real-estate listings
- Reddit (apify/reddit-scraper-lite and others)
- Indeed and LinkedIn Jobs (jobs-vertical Actors)
- X / Twitter (multiple Actors)
- Long-tail niche sites across the rest of the 37,000+ Actor catalog
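Each of these Actors can be invoked the same way over HTTP. The stdlib sketch below builds a synchronous run request against Apify's REST API endpoint `run-sync-get-dataset-items`, which runs the Actor and returns its dataset items in one call; it only builds the request rather than sending it. The input field shown is a hypothetical example, since every Actor defines its own input schema on its store page.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.apify.com/v2"


def actor_path_id(actor: str) -> str:
    """Store names like 'apify/google-search-scraper' use '~' instead of '/' in API paths."""
    return quote(actor.replace("/", "~"), safe="~")


def build_sync_run(actor: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build (not send) a run-sync-get-dataset-items request for one Actor."""
    url = f"{API_BASE}/acts/{actor_path_id(actor)}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


# Input fields are hypothetical; read the Actor's input schema on its store page.
req = build_sync_run("apify/google-search-scraper", "YOUR_APIFY_TOKEN", {"queries": "apify mcp"})
print(req.full_url)
# To send: urllib.request.urlopen(req, timeout=300) returns the run's dataset items as JSON.
```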
SDKs and integrations
- JavaScript/TypeScript SDK (apify-client-js)
- Python SDK (apify-client-python)
- Crawlee framework (JS + Python, open-source, crawlee.dev)
- MCP server (mcp.apify.com) for Claude Desktop, Cursor, VS Code Copilot agent mode
- LangChain integration (Actor to LangChain tools, vector store loaders)
- LlamaIndex integration (document loaders backed by Actors)
- Pinecone, Milvus, Qdrant integrations for direct vector DB ingestion
- OpenAI Assistants and Custom GPTs (RAG Web Browser exposed as a callable tool)
- Zapier, Make.com, n8n connectors for no-code workflow automation
- Webhooks (Actor lifecycle events), CLI (apify-cli), Docker base images for custom Actors
AI capabilities
Apify is one of the most AI-fluent providers in the scraping API category. Where most competitors bolt on a single LLM-extraction endpoint, Apify ships a layered AI stack: a hosted MCP server at mcp.apify.com (one of the only production-grade MCP endpoints in scraping); AI-purpose-built Actors (RAG Web Browser for agent search, and Website Content Crawler for LLM-friendly Markdown ingestion at $0.20 to $5 per 1,000 pages depending on rendering tier); the open-source Crawlee SDK with its AI-powered StagehandCrawler; and first-party integrations into LangChain, LlamaIndex, Pinecone, Milvus, Qdrant, and OpenAI Assistants. The bet is clear: Apify wants to be the data layer underneath every AI agent that touches the web. Pricing transparency is the weak link. The MCP docs say 'Monitor your API usage through Apify Console to stay within your plan limits' rather than quoting per-call rates, and underlying Actor compute-unit costs cascade unpredictably.
Feature inventory
The Apify MCP server is a production MCP server that lets AI agents discover and invoke any Apify Actor as a tool. It supports OAuth and bearer-token auth and works with Claude Desktop, Cursor, and VS Code Copilot's agent mode out of the box. URL parameters let you scope which tools are exposed. Source: docs.apify.com/platform/integrations/mcp
Pricing: Free to call. Billed via underlying Actor compute-units and per-result fees against your plan credits. SSE transport deprecating 2026-04-01.
RAG Web Browser: 'Queries Google Search, scrapes the top N pages using a full web browser, and returns their content as clean Markdown for further processing by an LLM.' It is designed specifically for agent search-and-retrieve, supports JS-heavy and static sites, and integrates with OpenAI Assistants, GPTs, Claude, and RAG pipelines. Source: apify.com/apify/rag-web-browser
Pricing: Compute-based. 1 CU = 1 GB memory times 1 hour at $0.13 to $0.20/CU. No flat per-result pricing.
Website Content Crawler is a deep crawler that outputs plain text, Markdown, or HTML formatted for LLM ingestion. It integrates with LangChain, LlamaIndex, Pinecone, and Qdrant for direct vector DB pipelines and is Apify's flagship 'feed a website to your LLM' Actor. Source: apify.com/apify/website-content-crawler
Pricing: Approximately $0.50 to $5 per 1,000 pages (headless browser tier) or $0.20 per 1,000 pages (raw HTTP tier), per Apify's own estimate in the Actor docs.
LangChain and LlamaIndex loaders: first-party loaders that expose Actor output as LangChain document objects or LlamaIndex documents, letting developers wire a scraper into a RAG pipeline without writing transformation glue. Source: docs.apify.com/platform/integrations
Pricing: Free integration. Only the underlying Actor runs cost CUs.
StagehandCrawler is a crawler type within the Crawlee framework that uses Stagehand (a browser AI agent library) to navigate and extract content with LLM guidance rather than hand-coded selectors. Source: crawlee.dev
Pricing: Free in Crawlee. LLM inference cost depends on which model the user wires in (OpenAI/Anthropic API charged to user's account separately).
Apify Actors callable as Custom GPT actions and OpenAI Assistants tools. The RAG Web Browser is the canonical example. Drop it into a GPT and it gains web search plus content retrieval. Source: apify.com/apify/rag-web-browser
Pricing: No additional integration fee. Actor compute charges apply normally.
Vector DB connectors (Pinecone, Milvus, Qdrant) push Apify dataset output directly into a vector database, so you don't have to write the embedding-and-upsert step of your RAG pipeline yourself. Source: docs.apify.com/platform/integrations
Pricing: Free integration. Embedding costs (OpenAI text-embedding-3, etc.) charged to user's own provider account.
Our assessment
This is the most differentiated AI story in scraping APIs right now. Apify is one of maybe three providers (alongside Decodo with its open-source MCP and Bright Data with its various AI scrapers) where the AI features are real product offerings rather than marketing veneer. The hosted MCP server alone is a real reason to pick Apify for an agent-first build, because most competitors haven't shipped MCP at all and the ones that have offer self-hosted-only options. The RAG Web Browser and Website Content Crawler are well-designed for their stated purpose. They output the formats LLMs actually want (clean Markdown, semantic structure preserved) rather than dumping raw HTML and asking the LLM to parse it. The pricing model undermines the story. An AI agent doesn't know in advance how many CUs its Actor calls will consume, and the MCP docs explicitly punt on quoting per-call rates. For workflows where the LLM itself weighs cost before each call, this is a real problem. Competitors quoting a flat $1/1K or $5/1K per request let the agent reason about cost. Apify's compute units force the agent to either over-provision or accept budget surprises. The features work. The pricing transparency around them is the gap.
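One way an agent can avoid the over-provision-or-surprise trap is to bound the worst case before each call. With 1 CU = 1 GB-hour, a run capped at a given memory and timeout can consume at most memory (GB) × timeout (hours) CUs. The guard below is our own hypothetical helper, not an Apify feature. It assumes memory is expressed in MB, as Actor run settings usually do, and it covers compute only, so per-result fees and proxy traffic still need their own caps.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical per-session spending cap an agent checks before each Actor call."""
    remaining_usd: float
    usd_per_cu: float = 0.20  # Free/Starter rate; 0.16 on Scale, 0.13 on Business

    def worst_case_compute(self, memory_mb: int, timeout_s: int) -> float:
        """Upper bound on compute cost: memory (GB) x timeout (hours) x rate."""
        return (memory_mb / 1024) * (timeout_s / 3600) * self.usd_per_cu

    def try_reserve(self, memory_mb: int, timeout_s: int) -> bool:
        """Reserve the worst-case cost if it fits in the budget; refuse the call otherwise."""
        cost = self.worst_case_compute(memory_mb, timeout_s)
        if cost > self.remaining_usd:
            return False
        self.remaining_usd -= cost
        return True


budget = Budget(remaining_usd=1.00)
print(budget.try_reserve(memory_mb=4096, timeout_s=3600))  # 4 CU worst case = $0.80, fits
print(budget.try_reserve(memory_mb=4096, timeout_s=3600))  # another $0.80 exceeds what is left
```

Reserving the worst case is deliberately pessimistic. A fuller version would release the unused reservation once the run reports its actual usage.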
Where it excels
- AI agent and MCP workflows where the agent discovers and invokes scrapers dynamically. The hosted mcp.apify.com server is the most mature production MCP endpoint in scraping.
- RAG and vector DB ingestion pipelines. Website Content Crawler outputs LLM-ready Markdown at $0.20 to $5 per 1,000 pages with direct Pinecone/Milvus/Qdrant connectors.
- Long-tail and niche-site scraping. 37,223 Actors in the marketplace cover domains no flat-API competitor has dedicated parsers for.
- Cross-platform scraping where one workflow needs Instagram, LinkedIn, TikTok, Reddit, and niche e-commerce. Apify hits 100 percent on all of these in our data, where ScraperAPI and Firecrawl have ToS denylists.
- Teams building custom scrapers. Crawlee (open source, MIT) is the de facto modern Node.js scraping framework, deployable to Apify cloud or anywhere else.
Where it falls short
- Amazon-heavy workloads at sustained load. 80 percent SR (16/20 trials), 14.5s avg and 21.8s p90 response times, and Apify did not qualify for the top-5 tier of our 500-trial Amazon stress test.
- Latency-sensitive workflows under 5s p50. Apify's blended avg response across 20 domains is 19.5s, with several domains exceeding 50s avg (booking, tripadvisor, walmart).
- Cost-predictable workloads. Compute units, Actor-author pricing, proxy bandwidth, storage, and data transfer stack on top of one another, which makes per-request cost forecasting hard. The 20-domain blended average is $7.04/1K, with a roughly 90 times spread between the cheapest and most expensive domain.
- Pay-only-for-success billing. Most Actors charge compute regardless of HTTP outcome, so failed trials still bill. That's a real cost on the roughly 5 percent of Apify trials that failed our verification.
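When failed attempts bill like successful ones, the cost per useful result is the per-attempt cost divided by the success rate. The one-function sketch below assumes every attempt costs the same whether or not it succeeds and that the audit's $/1K figures are per attempt. It ignores the extra latency of retries.

```python
def cost_per_1k_successes(cost_per_1k_attempts: float, success_rate: float) -> float:
    """Effective cost per 1,000 successful results when failures are billed too."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_1k_attempts / success_rate


# amazon.com in our audit: $5/1K at 80 percent SR.
print(cost_per_1k_successes(5.00, 0.80))
# g2.com: $13.75/1K at 75 percent SR.
print(round(cost_per_1k_successes(13.75, 0.75), 2))
```

Under those assumptions, amazon.com's effective price rises from $5 to about $6.25 per 1,000 usable pages, and g2.com's from $13.75 to about $18.33.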