ScopeScrape is not a general-purpose scraper. It finds pain points in community discussions. The quality of what you get out depends entirely on how well you define what you're looking for. If you give it a vague query, you get noise. If you give it a specific one, you get signals. This post walks through what makes a problem statement effective and how to build one. As the tool evolves, this guide will evolve with it.

What makes a good problem statement for ScopeScrape?

A good problem statement has three parts: a target audience, a domain, and a hypothesis.

The target audience is where you look. This can be a subreddit (r/saas, r/webdev, r/smallbusiness), a Hacker News category, or a specific platform you specify. The domain is what product category or workflow you care about. The hypothesis is your prediction about what kind of pain you expect to find. Together, these three dimensions shape the search space and determine what the signal detector will match against.
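The three parts can be captured as a small structure. Here is a minimal sketch in Python; the class name, fields, and the `is_complete` check are illustrative assumptions for this post, not ScopeScrape's actual API:

```python
from dataclasses import dataclass

@dataclass
class ProblemStatement:
    """Hypothetical container for the three parts of a query."""
    audience: str    # where to look, e.g. "r/saas"
    domain: str      # product category or workflow, e.g. "project management"
    hypothesis: str  # predicted pain, e.g. "frustration with task syncing"

    def is_complete(self) -> bool:
        # A weak statement is missing one or more of the three parts.
        return all(part.strip() for part in (self.audience, self.domain, self.hypothesis))

strong = ProblemStatement("r/saas", "project management", "frustration with task syncing")
weak = ProblemStatement("r/all", "", "")
print(strong.is_complete(), weak.is_complete())  # True False
```

Thinking of a query this way makes the weak statements in the table below easy to diagnose: each one leaves at least one field blank.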

Here is a table showing the difference between weak and strong problem statements:

Weak statement: "Scrape r/all for everything"
Strong statement: "Scan r/saas for project management frustrations"

Weak statement: "Find all complaints"
Strong statement: "Find users switching from Notion due to database performance issues"

Weak statement: "Analyze the market"
Strong statement: "Identify recurring CRM pain points in r/smallbusiness over the last month"

The weak statements lack one or more of the three parts. "Scrape r/all for everything" has an audience but no domain or hypothesis. "Find all complaints" has neither an audience nor a specific domain. "Analyze the market" is so broad it says nothing about where to look or what pain you expect to find.

The strong statements are concrete. They specify a community, a product area, and a specific kind of friction. This is the shape the signal detector is built to search for.

Why does specificity matter?

ScopeScrape's signal detector contains 60+ phrases across 4 tiers, organized from explicit pain (Tier 1) down to implicit signals (Tier 4). Each phrase is weighted, and the detector scores matches on frequency, intensity, specificity (via named entity recognition), and recency. The broader your query, the more tiers it has to activate to find anything, and the more noise you get.
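The tiered matching can be sketched as a weighted phrase table. The phrases and weights below are invented placeholders (the actual 60+ phrases are not listed in this guide), and the scoring is simplified to a sum of tier weights over matched phrases:

```python
# Illustrative tier table: placeholder phrases and weights only;
# ScopeScrape's real phrase list and scoring are more involved.
TIERS = {
    1: {"weight": 1.0, "phrases": ["i'm so frustrated with", "i hate that"]},           # explicit pain
    2: {"weight": 0.7, "phrases": ["looking for an alternative to", "any tool that"]},  # seeking solutions
    3: {"weight": 0.4, "phrases": ["wish it could", "it would be nice if"]},            # workarounds
    4: {"weight": 0.2, "phrases": ["problem", "issue"]},                                # implicit signals
}

def score(post: str) -> float:
    """Sum the weights of every tier phrase that appears in the post."""
    text = post.lower()
    return sum(
        tier["weight"]
        for tier in TIERS.values()
        for phrase in tier["phrases"]
        if phrase in text
    )

print(score("I'm so frustrated with my tracker, looking for an alternative to it"))
```

A post that trips only Tier 4 ("we had a problem") scores far lower than one that trips Tiers 1 and 2, which is why vague queries that can only match the bottom tier produce noise.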

Compare two queries. First, run keywords "frustrated" and "project management" on r/saas. The detector will activate Tier 1 (explicit pain) and Tier 2 (seeking solutions) because the audience is relevant and the domain is narrow. You get high-signal results about real frustrations with project management tools. Second, run a single keyword "problem" on r/all. The detector activates Tier 4 (implicit signals) because the keyword is weak and the audience is universal. You get thousands of posts about every problem under the sun. The first query is actionable. The second is noise.

Specificity also affects recency weighting. If your domain is narrow and your audience is focused, recent posts matter more because they reflect current market conditions. A post from six months ago about Notion database performance in r/saas is less relevant than one from last week. But if you query r/all with a generic keyword, the tool cannot distinguish between a relevant old post and an irrelevant new one.
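Time decay of this sort is commonly implemented as exponential decay on post age. A sketch under an assumed 30-day half-life; the curve and parameter ScopeScrape actually uses are not specified here:

```python
def recency_weight(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: a post loses half its weight every half_life_days.
    The 30-day half-life is an assumed illustration, not ScopeScrape's setting."""
    return 0.5 ** (age_days / half_life_days)

print(round(recency_weight(0), 2))    # posted today: full weight
print(round(recency_weight(30), 2))   # one half-life old: half weight
print(round(recency_weight(180), 3))  # six months old: heavily discounted
```

Under this assumption, the six-month-old Notion post above would carry a small fraction of the weight of last week's post, regardless of how relevant its content is.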

What are the three blindspots to watch for?

Even with a strong problem statement, you can miss real patterns. These three biases are baked into how community discussions work:

Survivorship bias: You only see posts from people who bothered to post; silent churn is invisible. If 1,000 users quietly switch from Tool A to Tool B but only 10 post about it online, you will discover the signal for those 10. The other 990 are gone.

Recency bias: The default time decay weights recent posts higher, so a problem that peaked months ago might not surface. If a major bug in your competitor's product was discussed heavily in January but is fixed now, searching for it in March will rank it lower even if it shaped customer decisions.

Platform bias: Reddit skews toward developer and technical users, so B2B enterprise pain points might live elsewhere. If you sell to enterprises, the pain points discussed in r/webdev may not match the pain points discussed on LinkedIn, G2 reviews, or Gartner forums. You need to search multiple platforms.

These are not flaws in ScopeScrape. They are properties of how people reveal pain publicly. Knowing them shapes how you interpret what you find.

How do you iterate on your problem statement?

Start broad. Run a query with a loose hypothesis and review the top 10 results. Did the signal detector match what you expected? If the results are too noisy, tighten the keywords. If too few results appear, broaden the domain or audience.

Here is a progression of refinement:

Version 1: Keywords: "frustrated", "Notion". Audience: r/productivity. Result: 47 matches, mostly generic frustrations with note-taking. Signal is too weak.

Version 2: Keywords: "switching from Notion", "Notion database", "Notion performance". Audience: r/productivity. Result: 12 matches. Better. You see real posts about Notion's database limitations and users comparing alternatives. But some posts are about pricing, not performance.

Version 3: Keywords: "Notion database", "Notion slow", "Notion performance". Audience: r/saas, r/productivity. Result: 8 matches, all focused on database speed and scaling. This is actionable.

Each iteration narrows the search space and strengthens the signal. The cost is reduced coverage: go too narrow and you may miss real pain points. Finding that balance is the art of it.
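The refinement loop above can be simulated against a handful of toy posts. The corpus and keyword sets are invented for illustration, and matching here is a plain substring check rather than ScopeScrape's detector:

```python
# Toy corpus standing in for scraped posts (invented for illustration).
posts = [
    "So frustrated with Notion, my notes are a mess",
    "Notion database gets slow past 5k rows, considering switching",
    "Notion performance is terrible on large workspaces",
    "Notion pricing went up again, annoyed",
]

def matches(keywords: list[str]) -> list[str]:
    """Return posts containing any keyword (case-insensitive substring match)."""
    return [p for p in posts if any(k.lower() in p.lower() for k in keywords)]

v1 = matches(["frustrated", "Notion"])                                   # Version 1: broad
v3 = matches(["Notion database", "Notion slow", "Notion performance"])   # Version 3: narrow
print(len(v1), len(v3))  # the narrow set is a small subset of the broad one
```

The broad keyword set matches every post, including the pricing complaint that is off-hypothesis; the narrow set keeps only the database and performance posts, mirroring the Version 1 to Version 3 progression.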

What can ScopeScrape not tell you?

Be honest about the limits. ScopeScrape finds signals. It does not tell you market size, willingness to pay, whether a solution already exists for the pain, or whether the pain is big enough to build a business around. It finds what people are saying publicly. Your job is to interpret it.

If you find 30 posts about users struggling with a specific workflow in a domain, that is a signal. It does not tell you whether 1,000 people have that pain or 1 million. It does not tell you if they would pay to fix it. It does not tell you if three competitors already solved it. You need to answer those questions separately.

Use ScopeScrape to discover what is being discussed. Use other methods to validate whether it matters. Do interviews. Check if solutions exist. Look at pricing and adoption of existing products in the space. The tool is the beginning of the story, not the end.

Next steps

This guide reflects the tool as it exists today. Future versions of ScopeScrape will include LLM-powered analysis of detected signals, support for more platforms beyond Reddit and Hacker News, and time-series aggregation to track how pain points evolve. When those features ship, the guidance here will be updated. For now, focus on writing tight problem statements, iterating on keywords, and reading the results carefully rather than assuming the numbers tell you everything.