I ran the tool against live Reddit. The command was straightforward: `python -m scopescrape scan --subreddits saas --keywords "frustrated,alternative to" --output json`
160 posts fetched from r/saas in 11 seconds. 119 scored above the 5.0 threshold. Here is what the data actually showed, including what went wrong.
What did the score distribution look like?
| Range | Count | Share |
|---|---|---|
| 9.0+ | 6 | 5% |
| 8.0-8.9 | 32 | 27% |
| 7.0-7.9 | 69 | 58% |
| 6.0-6.9 | 11 | 9% |
| 5.0-5.9 | 1 | 1% |
The average composite score was 7.71, and the distribution peaks squarely in the 7.0-7.9 range. The 6 posts scoring 9.0 or above were the strongest signals, and they were visibly different from the rest when I read them.
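The share column is just each bucket's count divided by the posts above threshold. A minimal sketch of the bucketing (the range labels mirror the table; the function itself is not part of the tool):

```python
from collections import Counter

def score_distribution(scores: list[float]) -> dict[str, tuple[int, int]]:
    """Bucket composite scores into the ranges used above: (count, share %)."""
    buckets: Counter = Counter()
    for s in scores:
        if s >= 9.0:
            buckets["9.0+"] += 1
        else:
            lo = int(s)  # e.g. 7.3 falls in the "7.0-7.9" bucket
            buckets[f"{lo}.0-{lo}.9"] += 1
    total = len(scores)
    return {rng: (n, round(100 * n / total)) for rng, n in buckets.items()}
```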
What kinds of signals dominated?
The system uses four signal tiers. Here is what showed up:
| Tier | Matches | Share |
|---|---|---|
| COMPARISON | 114 | 37.6% |
| PAIN_POINT | 97 | 32.0% |
| ASK | 67 | 22.1% |
| EMOTIONAL | 25 | 8.2% |
The surprise: COMPARISON signals outnumbered explicit PAIN_POINT signals by 17 matches. The r/saas community is actively evaluating and switching tools, not just venting. The tool_eval category led all signal categories with 102 matches.
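The tier counts above come from phrase matching. A minimal sketch of how tier detection could work; the phrase lists are illustrative guesses based on the posts quoted in this write-up, not the tool's actual lists:

```python
# Illustrative phrase lists per tier -- the tool's real lists are not shown
# in this write-up, so these are guesses.
TIER_PHRASES = {
    "PAIN_POINT": ["frustrated", "frustrating", "deal breaker", "struggling to", "broken"],
    "COMPARISON": ["alternative to", "compared to", "switched from", "instead of"],
    "ASK": ["how do you", "any recommendations", "what do you use"],
    "EMOTIONAL": ["honestly disappointed", "terrible", "hate"],
}

def detect_signals(text: str) -> dict[str, int]:
    """Count phrase matches from each tier in the post text."""
    lowered = text.lower()
    return {tier: sum(lowered.count(p) for p in phrases)
            for tier, phrases in TIER_PHRASES.items()}
```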
What were the top posts?
The top 5 posts tell a story about what the system actually detects:
Post 1 [9.34]: "Switched from brainstorming startup ideas to collecting real user frustrations." It matched 4 signals with a sentiment of -0.95. The post describes the exact workflow ScopeScrape automates.
Post 4 [9.25]: "Twitter search is a goldmine for startup leads." This one hit 10 signals, the highest signal count in the entire dataset. Pain point, comparison, and help-seeking signals all triggered. The system correctly identified that a post about sourcing leads is relevant to market research.
Post 7 [8.96]: "Honestly disappointed with Railway." A genuine product complaint. The post contains "frustrating", "deal breaker", and "compared to." No ambiguity here.
Post 10 [8.66]: "I built an API automation tool for 8 months before talking to a single customer." This is a founder post-mortem with 8 signals, including "struggling to" appearing twice and "terrible". The post covers the cost of skipping user research.
Post 14 [8.51]: "I spent 20 hours manually researching Reddit for SaaS ideas." This post literally describes the manual process ScopeScrape replaces.
What tools are people talking about?
The entity extraction picked up brand names, product categories, and infrastructure names. After light cleaning:
| Entity | Mentions |
|---|---|
| Reddit | 15 |
| LinkedIn | 14 |
| Twitter | 12 |
| SEO | 12 |
| Slack | 10 |
| MRR | 10 |
| | 9 |
| Stripe | 8 |
| Product Hunt | 7 |
| GitHub | 7 |
| CRM | 7 |
| Claude | 6 |
The top three entities (Reddit, LinkedIn, Twitter) are all platforms people use to source user feedback. Reddit dominates because that is where the data came from; LinkedIn and Twitter follow because those are where market research happens. The infrastructure tools (Slack, Stripe, GitHub) confirm that the r/saas community skews toward technical founders.
What went wrong?
I need to be direct about the blind spots.
The frequency scorer assigned 10.0 to nearly every post. This happened because BM25 compared posts against themselves. Every post was fetched by the same query, so the scoring algorithm said "yes, they are all relevant." The frequency dimension added no discriminative value. To fix this, I need to either change the baseline corpus that BM25 compares against (use a large corpus of random Reddit text as the negative set) or replace BM25 with intra-corpus term overlap (how often does "frustrated" appear relative to the size of the post).
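The second fix, intra-corpus term overlap, could look like the sketch below. The `scale` constant and the 0-10 cap are assumptions for illustration, not the tool's actual parameters:

```python
def keyword_density_score(text: str, keywords: list[str], scale: float = 200.0) -> float:
    """Score a post by query-keyword density rather than BM25 against the
    fetched corpus itself. Because every fetched post matches the query by
    construction, density within the post discriminates where BM25 could not.
    `scale` is a tuning constant (an assumption)."""
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(text.lower().count(kw) for kw in keywords)
    return min(10.0, scale * hits / len(words))
```

With this, a short keyword-dense post outscores a long post that mentions a keyword once, which is exactly the discrimination the BM25 dimension failed to provide.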
The entity extractor without spaCy is noisy. The regex matches any capitalized word, so common words like "You", "They", "Happy", and "Curious" got tagged as entities. The stopword filter was not aggressive enough. spaCy's NER model would fix this, but it does not support Python 3.14 yet.
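Until spaCy is available, a stricter fallback could require a candidate to also appear capitalized mid-sentence, which filters out ordinary words like "You" and "Curious" that are only capitalized by position. A sketch, with an illustrative stopword list (not the tool's actual filter):

```python
import re
from collections import Counter

# Expanded stopword list -- illustrative; the original filter missed
# sentence-initial words like "You" and "Curious".
STOPWORDS = {"you", "they", "happy", "curious", "the", "this", "that",
             "when", "what", "how", "why", "honestly", "we", "it", "my"}

def extract_entities(text: str) -> Counter:
    """Regex fallback for when spaCy NER is unavailable: collect capitalized
    tokens, drop stopwords, and keep only words that also appear capitalized
    mid-sentence (i.e. capitalized by choice, not by position)."""
    candidates = re.findall(r"\b[A-Z][a-zA-Z]+\b", text)
    mid_sentence = set(re.findall(r"(?<=[a-z,] )[A-Z][a-zA-Z]+", text))
    return Counter(w for w in candidates
                   if w.lower() not in STOPWORDS and w in mid_sentence)
```

The mid-sentence check is a heuristic: it drops real entities that only ever open a sentence, so it trades recall for the precision the current regex lacks.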
The word "broken" triggered a pain point signal in promotional posts. One post asked for beta feedback ("tell me what's broken") and the detector flagged it as a pain point. The system has no concept of negation or context beyond a 50-character window.
All three issues are now in the product backlog as B-001, B-002, and B-003.
What would I do differently on the next run?
Add more subreddits to compare pain patterns. The next run will include r/startups, r/smallbusiness, and r/Entrepreneur. This gives me a broader view of whether pain themes are consistent across founder communities or unique to the SaaS space.
Fetch comments, not just posts. The richest pain signals live in replies. A post might be innocuous, but the comments reveal what people actually struggled with.
Lower min_score to 3.0 on the next run. I want to see what the bottom of the distribution looks like: which posts fall below the threshold, and why, so I can understand where the scoring fails.
Run the same query weekly and track whether pain themes shift over time. Are people complaining about payment processing in March but infrastructure costs in June? Those shifts tell a story.
What was the takeaway?
The tool works. It found real signals in real data. The top-scored posts are genuinely useful for someone doing market research. A founder trying to understand what keeps SaaS operators up at night would save hours by starting with the 9.0+ tier.
But the scoring has blind spots. The frequency dimension needs a better baseline. The entity extraction needs spaCy. The pain point detector needs negation awareness. The backlog is open at the project repository, and each issue maps to a real limitation that affects output quality.