In late 2024, Reddit rolled out the Responsible Builder Policy, and it changed the calculus for anyone building tools that ingest Reddit data. The old workflow, where you could create API credentials in seconds and start fetching data, no longer exists. Now you submit an application, describe your use case, wait 2 to 4 weeks, and often get rejected without explanation. A number of developers I spoke with never received approval at all.
This left me with a question: if API approval is unreliable, what are the alternatives? I found that Reddit publishes JSON data on public endpoints, accessible to anyone with an HTTP client and a User-Agent. ScopeScrape takes that approach. This post explains what that means, what data I can actually get, and the concrete tradeoffs you make when you go this route.
What does Reddit's public JSON API actually expose?
Most people don't realize that nearly any Reddit URL can be made to return JSON by appending .json to the path. Try it yourself:
https://reddit.com/r/saas/top/.json?t=month&limit=10
That request returns a JSON object with a list of the top 10 posts from r/saas in the past month. The structure is nested and sometimes baroque, but it's valid JSON.
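Here is a minimal sketch of that request in Python using the requests library. The User-Agent string and function names are illustrative, not ScopeScrape's actual code; the listing structure (data.children[].data) is the one the endpoint returns.

```python
import requests

# Illustrative User-Agent; Reddit blocks the default python-requests UA quickly.
HEADERS = {"User-Agent": "demo-scraper/0.1 (research; contact@example.com)"}

def extract_posts(listing: dict) -> list[dict]:
    """Unwrap Reddit's Listing envelope: data.children[].data holds each post."""
    return [child["data"] for child in listing["data"]["children"]]

def fetch_top(subreddit: str, t: str = "month", limit: int = 10) -> list[dict]:
    """Fetch top posts for a subreddit via the public .json endpoint."""
    url = f"https://www.reddit.com/r/{subreddit}/top/.json"
    resp = requests.get(url, params={"t": t, "limit": limit},
                        headers=HEADERS, timeout=10)
    resp.raise_for_status()
    return extract_posts(resp.json())

if __name__ == "__main__":
    for post in fetch_top("saas"):
        print(post["score"], post["title"])
```

The descriptive User-Agent is not optional: requests without one (or with a generic library default) get throttled or blocked almost immediately.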
What does each post object contain? The same fields PRAW returns, because PRAW is ultimately a wrapper around these same listing endpoints, just authenticated over OAuth. I documented those fields in detail in my previous post on the Reddit API reference, so I won't repeat them here. The key insight: appending .json gives you the same shape of data as the official API.
But there are gaps. The public JSON endpoint does not return actual vote counts, only the net score (upvotes minus downvotes as calculated by Reddit's algorithm). It does not expose moderation history, private subreddit access, direct messages, or user inbox. If you need those features, you need the official API.
| Feature | Public JSON Endpoint | Official Reddit API |
|---|---|---|
| Submissions | Yes | Yes |
| Comments | Yes | Yes |
| Post metadata (score, created_utc, author, etc.) | Yes | Yes |
| Net vote count (score) | Yes | Yes |
| Actual upvote and downvote counts | No | No (Reddit hides this from all clients) |
| Moderation history | No | Yes (if you're a mod) |
| Private subreddit access | No | Yes |
| User DMs | No | Yes |
| Write operations (post, comment, award) | No | Yes |
Why did I go this route?
The honest answer: I could not get API approval in a reasonable timeframe. I submitted my application to Reddit in December 2024, describing ScopeScrape's use case (detecting pain points in Reddit discussions for market research). I included specifics: which subreddits, expected request volume, data retention policy. I heard nothing for 8 weeks. No response, no feedback, no indication of progress. At that point, building against the public JSON endpoint made sense as an interim solution.
I'm not alone. Developers on platforms like Apify have built tools that scrape Reddit at scale using headless browsers and proxy rotation. The Free Reddit Scraper Pro on Apify (https://apify.com/trudax/reddit-scraper) demonstrates that the public endpoints can handle serious volume. That tool uses a headless browser (Puppeteer or Playwright) to load Reddit pages and extract JSON, paired with proxy rotation to avoid IP blocking. It works. People use it.
ScopeScrape's approach is simpler but less efficient. Instead of a headless browser, I make plain HTTP requests with a proper User-Agent and reasonable delays between requests. No JavaScript rendering, no proxy pools. The tradeoff is throughput: I can do maybe 30 requests per minute without getting rate-limited or blocked. The Apify tool, with proper infrastructure, can do more. But for my use case (monitoring pain points in a few dozen subreddits), 30 requests per minute is sufficient.
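The pacing logic is simple enough to sketch. Assuming a 2-second floor between requests (which works out to the ~30 requests per minute figure above; the names here are illustrative, not ScopeScrape's actual code):

```python
import time
import requests

HEADERS = {"User-Agent": "demo-scraper/0.1 (research; contact@example.com)"}  # illustrative
MIN_INTERVAL = 2.0  # seconds between requests, i.e. ~30 requests per minute

def seconds_to_wait(last: float, now: float, min_interval: float = MIN_INTERVAL) -> float:
    """How long to sleep so consecutive requests stay at least min_interval apart."""
    return max(0.0, min_interval - (now - last))

_last = 0.0

def throttled_get(url: str, **kwargs) -> requests.Response:
    """Plain HTTP GET that stays under the informal rate limit."""
    global _last
    time.sleep(seconds_to_wait(_last, time.monotonic()))
    _last = time.monotonic()
    return requests.get(url, headers=HEADERS, timeout=10, **kwargs)
```

No browser, no proxies: the throttle trades throughput for simplicity, which is the whole point of this approach.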
What are the tradeoffs?
Using the public JSON endpoints instead of the official API means accepting several constraints.
First, rate limits are informal. Reddit does not publish a rate limit for the public endpoints. In practice, you can make roughly 30 requests per minute before triggering IP-level blocks. Exceed that and Reddit responds with HTTP 429 (Too Many Requests) and may temporarily block your IP, with no way to know in advance how long the block will last. The official API, by contrast, has explicit rate limits (60 requests per 60 seconds) that you can query and respect programmatically.
Second, Reddit can remove these endpoints at any time. They have not promised to keep reddit.com/r/foo/top.json available forever. If Reddit decides to turn off public JSON access, ScopeScrape breaks. The official API comes with a terms of service and some implicit commitment to stability. Public endpoints are best-effort.
Third, you cannot expand "MoreComments" stubs without additional requests. When Reddit returns a comment thread, deep replies are often truncated into a MoreComments object. To get those replies, you need a separate call per batch. With the public endpoint, that means another HTTP request each time. PRAW has the same problem, but its replace_more() helper at least batches those follow-up calls for you. ScopeScrape's plain HTTP approach makes this slightly more cumbersome.
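To gauge how much is hidden, you can walk a comment tree and collect the IDs behind the "more" stubs; each batch costs at least one follow-up request. A sketch, assuming the public JSON structure (nodes of kind "more" carry the truncated child IDs, and an empty replies field is the empty string rather than a dict):

```python
def collect_more_ids(comment_listing: dict) -> list[str]:
    """Walk a public-JSON comment tree and return the comment IDs
    hidden behind 'more' stubs (each batch needs its own request)."""
    stubs: list[str] = []

    def walk(node: dict) -> None:
        for child in node.get("data", {}).get("children", []):
            if child.get("kind") == "more":
                stubs.extend(child["data"].get("children", []))
            else:
                replies = child.get("data", {}).get("replies")
                if isinstance(replies, dict):  # replies is "" when empty
                    walk(replies)

    walk(comment_listing)
    return stubs
```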
Fourth, you have no access to write operations. You can read posts and comments, but you cannot post, comment, award, or moderate. For data analysis, this is fine. For bots or tools that need to interact with Reddit, this is a dealbreaker. You would need the official API.
Public JSON vs Official API vs Browser Scraping
To make the decision clearer, here is a side-by-side comparison of three approaches to getting Reddit data:
| Aspect | Public JSON Endpoint | Official Reddit API (PRAW) | Browser Scraping (Apify-style) |
|---|---|---|---|
| Authentication required | No | Yes (OAuth2 + pre-approval) | No |
| Rate limits (explicit) | No (informal, ~30/min) | Yes (60/60s per OAuth client) | No (depends on proxies + backoff) |
| Public data access | Yes | Yes | Yes |
| Private data access | No | Yes | No |
| Write operations | No | Yes | No |
| Setup time | Minutes (just make HTTP requests) | Weeks (application review) | Hours (configure proxies + browser) |
| Infrastructure complexity | Low (plain HTTP) | Low (Python + PRAW) | High (proxies, headless browser, retry logic) |
| Compliance risk | Medium (no ToS, best-effort) | Low (official, ToS-protected) | High (violates Terms of Service) |
| Cost | Minimal (free) | Free (rate limits apply) | Significant (proxies, compute) |
When should you use the official API instead?
If you need any of these things, skip the public JSON endpoint and wait for API approval or use an alternative data source:
- You need write access (creating posts, comments, awards, or moderation actions). The public endpoint is read-only. PRAW lets you modify Reddit state.
- You need higher throughput. If you are collecting data from hundreds of subreddits or need real-time updates, 30 requests per minute will bottleneck you. The official API gives you 60 requests per 60 seconds per client. With multiple clients, you can scale further.
- You are accessing private subreddits. Public JSON endpoints do not work on private communities. OAuth authentication (official API) is required.
- You need compliance guarantees. If you are building a commercial product or handling sensitive data, the public endpoint's best-effort nature is a liability. The official API comes with terms of service and support.
- You need moderation or admin features. Only the official API exposes mod logs, ban lists, and moderation queues.
Here is the thing that matters for ScopeScrape: I can easily swap in PRAW later. When my API application finally gets approved, or if Reddit shuts down the public endpoints, I simply change the adapter. ScopeScrape's code is structured around a data source abstraction. Swapping from the public JSON adapter to the PRAW adapter is a one-file change in the abstraction layer. The rest of the pain point detection pipeline does not care where the data comes from.
This is why the adapter pattern matters for long-term projects. You can start with the fastest option to ship, then switch to the more reliable option when circumstances change. Not every project needs this flexibility, but if you are building something that might outlive your current data source, designing for it upfront saves rewrites later.
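A minimal sketch of that abstraction — the interface and class names here are illustrative, not ScopeScrape's actual code:

```python
from typing import Protocol

class RedditSource(Protocol):
    """The only interface the pipeline depends on."""
    def top_posts(self, subreddit: str, limit: int) -> list[dict]: ...

class PublicJsonSource:
    """Adapter for the public .json endpoints (read-only, ~30 req/min)."""
    def top_posts(self, subreddit: str, limit: int) -> list[dict]:
        raise NotImplementedError  # plain HTTP GET against /r/{subreddit}/top/.json

class PrawSource:
    """Adapter for the official API via PRAW (requires OAuth approval)."""
    def top_posts(self, subreddit: str, limit: int) -> list[dict]:
        raise NotImplementedError  # praw.Reddit(...).subreddit(subreddit).top(limit=limit)

def detect_pain_points(source: RedditSource, subreddits: list[str]) -> list[dict]:
    """The pipeline never touches HTTP directly; swapping adapters is one line."""
    posts: list[dict] = []
    for sub in subreddits:
        posts.extend(source.top_posts(sub, limit=25))
    return posts
```

Because the pipeline accepts anything satisfying the protocol, tests can inject a fake source and never hit the network, and the real swap from public JSON to PRAW touches only the adapter.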