When I ran ScopeScrape against 3 subreddits in January, it found 1,459 posts containing some form of pain signal. 38 of them actually mattered. The scoring framework is what separates the 38 from the other 1,421.
## What I learned from Reddit_Scrapper's approach
Before building my own system, I studied an existing project called Reddit_Scrapper that uses a five-dimension scoring model. Their approach weights relevance (0.25), emotion (0.10), pain (0.20), implementability (0.15), and technical_depth (0.20). They batch posts to an LLM and get structured scores back as JSON.
The design taught me something important: implementability and technical_depth depend on LLM judgment, with no reliable way to compute them from the text alone. I dropped both and restructured around four dimensions I could score deterministically.
| Dimension | Reddit_Scrapper | ScopeScrape | Notes |
|---|---|---|---|
| Frequency | Bundled with relevance | 0.25 weight | Standalone dimension, measurable via clustering |
| Intensity | 0.10 weight | 0.20 weight | Measured via VADER sentiment, increased importance |
| Specificity | Part of relevance | 0.30 weight (highest) | Extracted via NER, made explicit and weighted highest |
| Recency | Not modeled | 0.25 weight | Time-decay function, new addition |
| Implementability | 0.15 weight | Dropped | Requires LLM judgment, not measurable locally |
| Technical_depth | 0.20 weight | Dropped | LLM-dependent, conflicts with transparency goal |
I kept the concept of configurable weights. That's the part that matters: letting users tune the framework for their use case, not locking them into my defaults.
## The four dimensions
| Dimension | Weight | Computed Via | What It Measures |
|---|---|---|---|
| Frequency | 0.25 | BM25 similarity clustering | How many times is this pain mentioned |
| Intensity | 0.20 | VADER sentiment analysis | How strongly is the pain expressed |
| Specificity | 0.30 | spaCy named entity recognition | Is this concrete or a vague complaint |
| Recency | 0.25 | Exponential time-decay function | How fresh is this signal |
Specificity gets the highest weight (0.30) because specific pain points are actionable. "CRM integrations are broken" tells you what to fix. "Software sucks" does not. A vague complaint with high frequency and intensity still costs you engineering time if you can't identify the actual problem.
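The composite score itself is just a weighted sum over the four normalized dimensions. A minimal sketch in Python (the function name is mine, not the tool's actual API):

```python
# Each dimension is pre-normalized to a 0-10 scale before weighting.
WEIGHTS = {"frequency": 0.25, "intensity": 0.20,
           "specificity": 0.30, "recency": 0.25}

def pain_score(dims: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of normalized (0-10) dimension scores."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    return sum(dims[name] * w for name, w in weights.items())
```

The sum-to-1.0 check keeps the output on the same 0-10 scale as the inputs, so the threshold stays interpretable no matter how you retune the weights.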
## A worked example
Suppose a hypothetical pain point, "API rate limits are killing us," appears 47 times across Reddit in the past two weeks, carries 3 named entities (AWS, API, rate limits), and has a VADER compound score of -0.72:
| Dimension | Raw Value | Normalized (0-10) | Weight | Contribution |
|---|---|---|---|---|
| Frequency | 47 mentions | 9.4 | 0.25 | 2.35 |
| Intensity | -0.72 (VADER) | 7.2 | 0.20 | 1.44 |
| Specificity | 3 entities | 8.5 | 0.30 | 2.55 |
| Recency | 2 days old | 9.8 | 0.25 | 2.45 |
| Final Score | | | | 8.79 / 10 |
It clears the default threshold of 7.0 and shows up in the report. A similar pain mentioned 3 times six months ago might score 5.1 and get filtered out.
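Two of the normalizations in the table are easy to reproduce with small functions. The constants below are illustrative choices that match the worked example, not necessarily the tool's shipped defaults:

```python
def intensity_norm(vader_compound: float) -> float:
    # VADER's compound score lives in [-1, 1]; take magnitude * 10
    # so strongly negative text maps to high intensity.
    return min(abs(vader_compound) * 10.0, 10.0)

def recency_norm(age_days: float, half_life_days: float = 68.0) -> float:
    # Exponential time decay: a signal loses half its weight
    # every `half_life_days` (68 here reproduces the table's 9.8).
    return 10.0 * 0.5 ** (age_days / half_life_days)
```

With these, a compound score of -0.72 normalizes to 7.2 and a 2-day-old signal to roughly 9.8, matching the table above.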
## YAML configuration
Different use cases need different weights, so I made them configurable via a simple YAML file:
```yaml
scoring:
  weights:
    frequency: 0.25
    intensity: 0.20
    specificity: 0.30
    recency: 0.25
  threshold: 7.0
  hard_filters:
    specificity_min: 3.0
    recency_days_max: 90
```
A product manager looking for feature ideas might boost specificity to 0.40 and drop frequency to 0.15. A founder validating market size bumps frequency to 0.40. A startup optimizing retention emphasizes intensity at 0.35. Each dimension adjusts independently, and the tool validates that weights sum to 1.0.
Hard filters act as gates. A pain point scoring 8.8 but with specificity below 3.0 gets dropped. This enforces data quality without touching the formula.
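Putting the pieces together, here is a sketch of how weight validation and hard-filter gating could interact (the `passes` helper is hypothetical, and in practice the `config` dict would come from loading the YAML with PyYAML):

```python
# Mirrors the YAML config above as a plain dict.
config = {
    "scoring": {
        "weights": {"frequency": 0.25, "intensity": 0.20,
                    "specificity": 0.30, "recency": 0.25},
        "threshold": 7.0,
        "hard_filters": {"specificity_min": 3.0, "recency_days_max": 90},
    }
}

def passes(dims: dict, age_days: float, cfg: dict) -> bool:
    s = cfg["scoring"]
    if abs(sum(s["weights"].values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    hf = s["hard_filters"]
    if dims["specificity"] < hf["specificity_min"]:
        return False  # gate: too vague, regardless of score
    if age_days > hf["recency_days_max"]:
        return False  # gate: too stale, regardless of score
    score = sum(dims[d] * w for d, w in s["weights"].items())
    return score >= s["threshold"]
```

Note the gates run before the formula: a high score never rescues a pain point that fails a hard filter.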
## The optional LLM layer
For users who want deeper analysis, I built an optional two-stage LLM pipeline on top of the base scoring. It is completely opt-in: you only pay for API calls if you enable it.
Stage 1: Batch pre-filter (~0.001 USD per post). Classify each high-scoring pain point as relevant, irrelevant, or tangential. Reduces noise, focuses expensive analysis on real signals.
Stage 2: Insight extraction (~0.01 USD per post). Synthesize a 1-2 paragraph brief for each pain point: what the problem is, who is affected, and why it matters.
Budget caps prevent surprise bills. Set a max spend: "Extract insights for top 20 pain points, cap at 0.50 USD." The tool stops when you hit the limit.
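The cap is simple accounting: track spend in integer cents (to avoid float drift) and stop before the call that would exceed the budget. A sketch with hypothetical names, where the real LLM call is stubbed out:

```python
COST_PER_INSIGHT_CENTS = 1  # ~0.01 USD per post for Stage 2

def extract_top_insights(pain_points: list, budget_usd: float, top_n: int = 20):
    """Run insight extraction on the top-scoring points until the
    budget or the top-N limit is reached, whichever comes first."""
    budget_cents = round(budget_usd * 100)
    spent_cents = 0
    briefs = []
    ranked = sorted(pain_points, key=lambda p: p["score"], reverse=True)
    for point in ranked[:top_n]:
        if spent_cents + COST_PER_INSIGHT_CENTS > budget_cents:
            break  # stop *before* the call that would blow the budget
        briefs.append(point["id"])  # stand-in for the real LLM call
        spent_cents += COST_PER_INSIGHT_CENTS
    return briefs, spent_cents / 100
```

So "top 20, cap at 0.50 USD" processes all 20 (only 0.20 USD), while a 0.05 USD cap stops after 5 posts.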
The base tool works without LLMs. The LLM layer amplifies it. You are never locked into expensive APIs to get value.
## Why not a black-box model
A gradient-boosted ensemble trained on enough labeled examples could probably score pain points more accurately than my static weights.
I did not build one. You cannot debug a black box. If a model says a pain point scores 8.2, you do not know if it is frequency, intensity, or some learned interaction between them. You cannot change your use case without retraining. You cannot understand it, so you cannot trust it.
My YAML weights mean you see exactly why a pain point scored 8.2 versus 6.1. You run the formula by hand. You argue about weights with your team. You adjust for your specific use case. Most importantly, you know what you are optimizing for.
Transparency beats accuracy when accuracy does not tell you why. For an open-source tool meant to be debuggable and configurable, static weights with YAML config are the right tradeoff.