I spent two weeks building a Reddit data pipeline for ScopeScrape and found myself answering the same questions over and over: what fields does a Submission actually return? How do I get all comments without hitting rate limits? Can I still access historical data now that Pushshift is gone? This post is the reference I wish existed when I started.
How do I get Reddit API access in 2026?
This changed significantly in late 2024. Reddit ended self-service API key generation under their new Responsible Builder Policy. You can no longer just go to reddit.com/prefs/apps and create credentials immediately.
The current process:
- Read Reddit's Responsible Builder Policy
- Submit an application via Reddit's Developer Support form describing your use case, which subreddits you'll access, and expected request volume
- Wait for approval (typically 2-4 weeks)
- Once approved, generate your OAuth2 credentials at reddit.com/prefs/apps
Having an established Reddit account with positive karma improves your chances. Accounts that look like they were created just for API access tend to get rejected.
What are the different app types and which should I pick?
| App Type | Use Case | OAuth Flow | Has client_secret? |
|---|---|---|---|
| Script | Personal scripts, bots, data pipelines running on your own machine | Password grant (requires your Reddit username and password) | Yes |
| Web app | Web services that act on behalf of other Reddit users | Authorization code grant (redirect-based) | Yes |
| Installed app | Mobile apps, desktop apps (runs on user's device) | Implicit grant | No (cannot keep a secret) |
For data collection and analysis, pick script. It's the simplest: you provide your client_id (the 14+ character string under "personal use script"), client_secret (the 27+ character string next to "secret"), your Reddit username, and your password. PRAW handles the rest.
```python
import praw

reddit = praw.Reddit(
    client_id="your_14char_client_id",
    client_secret="your_27char_client_secret",
    user_agent="scopescrape/0.1 by u/your_username",
    username="your_username",
    password="your_password",
)
```
Script apps only have access to accounts registered as "developers" of the app. If you need to act on behalf of arbitrary users, you need a web app.
What fields does a Submission object return?
PRAW dynamically provides attributes based on what Reddit's API returns. Reddit can add or remove fields without notice, and PRAW does not document specific attributes for this reason. That said, in practice these are the fields you can rely on as of March 2026:
| Field | Type | Notes |
|---|---|---|
| id | string (base36) | Unique identifier, e.g. "abc123de" |
| title | string | Post title. Plain text, max 300 characters. |
| selftext | string | Post body in Markdown. Empty string for link posts. |
| score | int | Net upvotes (upvotes minus downvotes). Not actual vote counts. |
| upvote_ratio | float | Proportion of upvotes to total votes. 0.5 means 50/50 split. |
| num_comments | int | Total comment count. May not match PRAW's extracted count exactly. |
| created_utc | float | Unix timestamp in seconds since epoch. |
| subreddit | Subreddit | Subreddit object. Use str(submission.subreddit) for the name. |
| author | Redditor or None | None if the account was deleted. Check before accessing. |
| permalink | string | Relative URL path, e.g. "/r/saas/comments/abc123de/..." |
| url | string | For link posts, the external URL. For self posts, the full Reddit URL. |
| is_self | bool | True for text posts, False for link posts. |
| link_flair_text | string or None | Post flair label. Subreddit-specific. None if no flair. |
| over_18 | bool | NSFW flag. |
| locked | bool | True if moderators locked comments. |
| stickied | bool | True if pinned by moderators. |
| distinguished | string or None | "moderator", "admin", or None. |
| is_original_content | bool | OC flag. Set by the post author. |
| spoiler | bool | True if marked as spoiler. |
| name | string | Fullname with type prefix, e.g. "t3_abc123de". |
PRAW objects are lazy: attributes are populated from whatever JSON Reddit returns (the full object isn't fetched until you first access a field), and fields can disappear without notice. Always use defensive access in production code:

```python
# Safe: returns 0 if the field is missing or deprecated
score = getattr(submission, "score", 0)

# Unsafe: raises AttributeError if the field is removed
score = submission.score
```
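Putting the field table and defensive access together, here's a sketch of an extraction helper. The function name and field selection are mine, not part of PRAW; it works on any object exposing these attributes:

```python
def submission_to_record(submission):
    """Pull the analysis-relevant fields into a plain dict, defensively."""
    fields = ["id", "title", "selftext", "score", "upvote_ratio",
              "num_comments", "created_utc", "is_self", "over_18"]
    record = {f: getattr(submission, f, None) for f in fields}
    # Normalize object-valued fields to strings; author can be None (deleted).
    record["subreddit"] = str(getattr(submission, "subreddit", "") or "")
    author = getattr(submission, "author", None)
    record["author"] = str(author) if author is not None else None
    return record
```

A dict like this serializes cleanly to JSON or a database row, which PRAW's lazy objects do not.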
What fields does a Comment object return?
| Field | Type | Notes |
|---|---|---|
| id | string (base36) | Unique identifier. |
| body | string | Comment text in Markdown. "[deleted]" if removed. |
| score | int | Net votes. Reddit rounds negative scores. |
| created_utc | float | Unix timestamp. |
| author | Redditor or None | None if account was deleted. |
| parent_id | string | Fullname of parent. "t3_xyz" = parent is a post. "t1_xyz" = parent is a comment. |
| replies | CommentForest | Child comments. May contain MoreComments stubs. |
| is_submitter | bool | True if comment author is the original post author (OP). |
| stickied | bool | True if pinned by moderators (usually automod or mod comments). |
| distinguished | string or None | "moderator", "admin", or None. |
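The parent_id prefix makes it easy to separate top-level comments from replies once you're working with a flattened comment list. A small helper (the names are mine):

```python
def is_top_level(parent_id):
    """True if the comment's parent is the submission itself (t3_ prefix)."""
    return parent_id.startswith("t3_")

def parent_short_id(parent_id):
    """Strip the type prefix: 't3_abc123de' -> 'abc123de'."""
    return parent_id.split("_", 1)[1]
```

For example, `[c for c in all_comments if is_top_level(c.parent_id)]` keeps only direct replies to the post.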
How do I get all comments from a post?
This is the single most common source of bugs I've seen. When you access submission.comments, you get a CommentForest object, not a flat list. The forest contains a mix of Comment objects and MoreComments stubs. If you iterate directly, you will hit AttributeError: 'MoreComments' object has no attribute 'body'.
The correct pattern:
```python
# Replace MoreComments stubs with actual comments.
# limit=None replaces ALL stubs (most thorough, most API calls).
# limit=32 is the default (replaces up to 32, drops the rest).
# limit=0 removes all stubs without fetching (fastest, loses deep comments).
submission.comments.replace_more(limit=32)

# Flatten the tree into a single list
all_comments = submission.comments.list()
for comment in all_comments:
    print(comment.body)
```
Each MoreComments stub replacement costs one API call that returns roughly 30-40 comments. On a popular post with 500+ comments, replacing all stubs can consume 15-20 API calls just for one post.
| replace_more() Parameter | Behavior | API Cost | When to Use |
|---|---|---|---|
| limit=None | Replace every stub | High (1 call per stub) | Deep analysis of a single post |
| limit=32 (default) | Replace up to 32, discard the rest | Moderate | Balanced data collection |
| limit=0 | Remove all stubs, keep only already-loaded comments | Zero | Speed-critical bulk scanning |
There is also a threshold parameter (default 0) that controls the minimum number of comments a stub must represent before it gets replaced. Setting threshold=10 skips small stubs that only contain a few replies.
Comment sort order
You can control how comments are sorted before fetching:
| Sort Value | Behavior |
|---|---|
| confidence | Reddit's default ranking algorithm (surfaces relevant, high-quality comments) |
| top | Highest score first |
| new | Most recent first |
| controversial | Highest disagreement (lots of both upvotes and downvotes) |
| old | Oldest first |
| qa | Q&A mode: original poster's replies surface first |
```python
# Set the sort BEFORE first accessing submission.comments --
# it determines how the comment tree is fetched.
submission.comment_sort = "new"
submission.comments.replace_more(limit=8)
comments = submission.comments.list()
```
What are the rate limits?
| Access Level | Rate Limit | Notes |
|---|---|---|
| Unauthenticated | 10 requests/minute | Essentially unusable for data collection |
| OAuth authenticated (free tier) | 100 requests/minute | Sufficient for small projects, tight for bulk scanning |
| Premium tier ($12,000+/year) | Higher limits (negotiated) | Contact Reddit's enterprise sales |
PRAW handles rate limiting automatically. It reads Reddit's response headers (X-Ratelimit-Remaining, X-Ratelimit-Reset) and sleeps when necessary. You do not need to add your own sleep calls.
If you exceed the limit, PRAW raises prawcore.exceptions.TooManyRequests. In practice this rarely happens because PRAW self-throttles, but it can occur if you're running multiple PRAW instances or if Reddit's headers are inconsistent.
Practical math for a scanning pipeline: at 100 req/min, you can fetch roughly 100 submissions per minute (one API call each). But if each submission has comments you want to expand, the budget shrinks fast. A post with 8 MoreComments stubs costs 9 total calls (1 for the post + 8 for stubs). Scanning 10 such posts would consume 90 of your 100 per-minute budget.
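That arithmetic generalizes into a quick budget check before launching a scan. A back-of-the-envelope helper (mine, not part of PRAW):

```python
def scan_minutes(num_posts, stubs_per_post, rate_limit=100):
    """Estimate minutes of API budget: each post costs 1 call plus 1 per stub."""
    total_calls = num_posts * (1 + stubs_per_post)
    return total_calls / rate_limit

# 10 posts with 8 stubs each = 90 calls, i.e. 0.9 of one minute's budget
minutes = scan_minutes(10, 8)
```

Running the estimate up front tells you whether a scan finishes in minutes or hours before you burn the quota finding out.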
What data does the API not expose?
These are the fields people commonly expect to exist but do not.
| Missing Data | What You Get Instead | Workaround |
|---|---|---|
| Individual upvote and downvote counts | score (net) and upvote_ratio (proportion) | Estimate: upvotes = score × upvote_ratio / (2 × upvote_ratio − 1). Breaks when upvote_ratio is 0.5. |
| View counts | Nothing. Removed years ago. | Use num_comments as a rough engagement proxy. |
| Edit history | edited (timestamp of last edit, or False) | None. You only see the current version. |
| Deleted content | "[deleted]" for both body and author | None via the API. Pushshift archives may have the original. |
| Private subreddit posts | 403 error | Must be an approved member of the subreddit. |
| Quarantined subreddit posts | 403 unless opted in | reddit.subreddit("name").quaran.opt_in() |
The vote count limitation is the most consequential for analysis. You're working with a survivorship-biased dataset (only non-deleted content) and approximate engagement metrics (net score, not absolute vote volume). Any analysis should acknowledge these constraints.
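If you do need rough absolute counts, they can be backed out from the two fields the API does give you. From score = up − down and ratio = up / (up + down), it follows that up = score × ratio / (2 × ratio − 1). A sketch (the helper is mine; these are estimates, and the system is underdetermined at a 50/50 ratio):

```python
def estimate_votes(score, upvote_ratio):
    """Estimate (upvotes, downvotes) from net score and upvote ratio.

    Derivation: score = up - down and ratio = up / (up + down)
    imply up = score * ratio / (2 * ratio - 1).
    Returns None when ratio == 0.5 (division by zero).
    """
    if abs(2 * upvote_ratio - 1) < 1e-9:
        return None
    up = round(score * upvote_ratio / (2 * upvote_ratio - 1))
    return up, up - score
```

For the example post below (score 847, ratio 0.92), this yields roughly 928 upvotes and 81 downvotes; treat those numbers as order-of-magnitude, since Reddit fuzzes scores.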
PRAW vs Async PRAW vs raw HTTP: which should I use?
| Approach | Best For | Rate Limiting | Thread Safety |
|---|---|---|---|
| PRAW (praw 7.7.1) | Scripts, data pipelines, CLIs | Automatic (reads response headers) | Not thread-safe. Do not share instances across threads. |
| Async PRAW (asyncpraw) | Discord bots, async web apps, asyncio pipelines | Automatic | Async-safe. Uses aiohttp internally. |
| Raw HTTP (requests/httpx) | Maximum control, non-Python languages | Manual (read headers yourself, implement backoff) | Depends on your HTTP client. |
PRAW is not thread-safe because it is built on requests.Session, which is not thread-safe. If you need parallelism, either use Async PRAW or create a separate PRAW instance per thread. Note that instances authenticated with the same credentials share the same rate limit, so parallelism does not buy you more throughput.
For ScopeScrape, I chose synchronous PRAW because the CLI tool runs single-threaded and PRAW's automatic rate limiting means I never have to think about backoff logic.
Can I still access historical Reddit data?
This is the question that comes up most in the post-2023 landscape. The short answer: it's complicated.
| Source | Status (March 2026) | Data Range | Cost |
|---|---|---|---|
| Reddit's official API | Active, requires approval | Real-time and recent (search endpoint is limited) | Free tier: 100 req/min |
| Pushshift (pushshift.io) | Real-time ingestion stopped in 2023. Historical archives remain. | 2005 to mid-2023 | Free (archives/dumps) |
| PullPush (pullpush.io) | Community-maintained Pushshift alternative | Varies. Check their endpoints. | Free API |
| Academic Torrents / BigQuery | Static dumps available | Pushshift monthly dumps through 2023 | Free (bulk download) |
Reddit's own search endpoint (subreddit.search()) is limited. It returns at most 250 results per query, does not support date range filtering natively, and results are sorted by relevance rather than time. For any serious historical analysis, you need Pushshift archives or PullPush.
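Since the endpoint won't filter by date natively, the practical workaround for recent windows is fetching a listing and filtering client-side on created_utc. A sketch (the helper name is mine; it works on anything exposing a created_utc attribute):

```python
import time

def within_days(items, days):
    """Keep only items whose created_utc falls inside the last `days` days."""
    cutoff = time.time() - days * 86400
    return [item for item in items if getattr(item, "created_utc", 0) >= cutoff]
```

Usage would look like `within_days(reddit.subreddit("saas").new(limit=100), 7)`, assuming an authenticated reddit instance; the listing endpoints return newest-first, so you can stop paginating once you cross the cutoff.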
For a tool like ScopeScrape that needs recent data (past week, past month), Reddit's API is sufficient. The search limitations only matter if you're trying to build a comprehensive historical dataset.
What does a real Submission look like as JSON?
```json
{
  "id": "abc123de",
  "title": "I wish there was a way to track customer churn reasons automatically",
  "selftext": "We lose ~15% of customers annually and nobody knows why...",
  "score": 847,
  "upvote_ratio": 0.92,
  "num_comments": 124,
  "created_utc": 1742726400.0,
  "subreddit": "saas",
  "author": "founder_jane",
  "permalink": "/r/saas/comments/abc123de/i_wish_there_was_a_way/",
  "url": "https://www.reddit.com/r/saas/comments/abc123de/i_wish_there_was_a_way/",
  "is_self": true,
  "link_flair_text": "Question",
  "over_18": false,
  "locked": false,
  "stickied": false,
  "distinguished": null,
  "is_original_content": false,
  "spoiler": false,
  "name": "t3_abc123de"
}
```
Note: this is a representative example, not a live API response. The actual JSON from Reddit includes additional internal fields (subreddit_id, domain, media, gilded, etc.) that are less commonly used in data analysis.
Common PRAW errors and how to fix them
| Error | Cause | Fix |
|---|---|---|
| AttributeError: 'MoreComments' object has no attribute 'body' | Iterating over CommentForest without calling replace_more() first | Call submission.comments.replace_more(limit=N), then .list() |
| prawcore.exceptions.OAuthException | Invalid client_id, client_secret, or Reddit credentials | Double-check all four values. client_id is the short string, client_secret is the long one. |
| prawcore.exceptions.ResponseException: 403 | Accessing a private or quarantined subreddit without permission | For quarantined: reddit.subreddit("name").quaran.opt_in(). For private: must be an approved member. |
| prawcore.exceptions.TooManyRequests | Exceeded 100 req/min (rare with PRAW, common with raw HTTP) | PRAW auto-throttles. If this fires, you may have multiple instances running. |
| praw.exceptions.RedditAPIException | Various Reddit-side errors (post not found, subreddit banned, etc.) | Catch and inspect exception.items for the specific error code. |
| AttributeError on a submission field | Reddit removed or renamed the field; PRAW fetches lazily | Use getattr(obj, 'field', default) for all field access in production. |
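For the RedditAPIException row, the catch-and-inspect pattern can be factored into a small helper. The function is mine; it only assumes the exception exposes an items list (as PRAW's RedditAPIException does) and falls back to the string form for anything else:

```python
def describe_api_error(exc):
    """Return human-readable error strings from a Reddit API exception."""
    items = getattr(exc, "items", None)
    if items:
        # Each RedditAPIException item carries error_type, message, and field.
        return [f"{getattr(i, 'error_type', '?')}: {getattr(i, 'message', '')}"
                for i in items]
    return [str(exc)]
```

Wrap your fetch calls in try/except, pass the caught exception through this, and log the result; that turns opaque failures into actionable error codes like SUBREDDIT_NOEXIST.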
How I use this data for pain point detection
Given these API constraints, here is the approach ScopeScrape takes. I scan three text fields per post: title, selftext, and each comment's body. I match against 60+ signal phrases organized into four tiers:
| Tier | Signal Type | Example Phrases |
|---|---|---|
| 1 (strongest) | Explicit pain | "I'm frustrated with", "this is broken", "driving me crazy", "we struggle with" |
| 2 | Seeking solutions | "is there a tool for", "anyone know how to", "looking for a way to" |
| 3 | Workarounds in use | "what I do instead", "my hack for this", "I ended up building" |
| 4 (weakest) | Implicit signals | "would be nice if", "thinking about switching", "looking for alternatives" |
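As a sketch of the matching step: the phrase dictionary below is a tiny illustrative subset of the 60+ signals, and the function name is mine, not ScopeScrape's actual API.

```python
TIER_PHRASES = {
    1: ["i'm frustrated with", "this is broken", "driving me crazy"],
    2: ["is there a tool for", "anyone know how to"],
    3: ["what i do instead", "i ended up building"],
    4: ["would be nice if", "looking for alternatives"],
}

def strongest_tier(text):
    """Return the strongest (lowest-numbered) tier matched in text, or None."""
    lowered = text.lower()
    for tier in sorted(TIER_PHRASES):
        if any(phrase in lowered for phrase in TIER_PHRASES[tier]):
            return tier
    return None
```

In the pipeline this runs over all three text fields per post: title, selftext, and each comment body.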
Each detected signal gets scored across four dimensions (frequency, intensity, specificity via NER, recency via time decay) with configurable YAML weights. The scoring framework is documented in a separate post.
Since the API does not expose individual vote counts, I use comment score distribution as a proxy for community consensus: if multiple comments expressing the same pain point have high scores, the community is validating that pain. High variance in scores suggests disagreement rather than consensus.
```python
def estimate_consensus(pain_comments):
    scores = [c.score for c in pain_comments]
    if not scores:
        return {"mean": 0, "median": 0, "agreement": "none"}
    mean_score = sum(scores) / len(scores)
    sorted_scores = sorted(scores)
    median_score = sorted_scores[len(scores) // 2]
    variance = sum((s - mean_score) ** 2 for s in scores) / len(scores)
    # Low variance + high mean = strong agreement
    # High variance = contentious topic
    agreement = "strong" if variance < 100 and mean_score > 10 else "weak"
    return {"mean": mean_score, "median": median_score, "agreement": agreement}
```
The full source code for ScopeScrape's Reddit adapter is on GitHub.