When I ran ScopeScrape against 3 subreddits in January, it found 1,459 posts containing some form of pain signal. 38 of them actually mattered. The scoring framework is what separates the 38 from the other 1,421.

What I learned from Reddit_Scrapper's approach

Before building my own system, I studied an existing project called Reddit_Scrapper that uses a five-dimension scoring model. Their approach weights relevance (0.25), emotion (0.10), pain (0.20), implementability (0.15), and technical_depth (0.20). They batch posts to an LLM and get structured scores back as JSON.

The design taught me something important: implementability and technical_depth are too subjective to measure reliably without an LLM. I dropped both and restructured around four dimensions I could compute deterministically from the text itself.

| Dimension | Reddit_Scrapper | ScopeScrape | Notes |
|---|---|---|---|
| Frequency | Bundled with relevance | 0.25 weight | Standalone dimension, measurable via clustering |
| Intensity | 0.10 weight | 0.20 weight | Measured via VADER sentiment, increased importance |
| Specificity | Part of relevance | 0.30 weight (highest) | Extracted via NER, made explicit and weighted highest |
| Recency | Not modeled | 0.25 weight | Time-decay function, new addition |
| Implementability | 0.15 weight | Dropped | Requires LLM judgment, not measurable locally |
| Technical_depth | 0.20 weight | Dropped | LLM-dependent, conflicts with transparency goal |

I kept the concept of configurable weights. That's the part that matters: letting users tune the framework for their use case, not locking them into my defaults.

The four dimensions

| Dimension | Weight | Computed Via | What It Measures |
|---|---|---|---|
| Frequency | 0.25 | BM25 similarity clustering | How many times this pain is mentioned |
| Intensity | 0.20 | VADER sentiment analysis | How strongly the pain is expressed |
| Specificity | 0.30 | spaCy named entity recognition | Whether this is concrete or a vague complaint |
| Recency | 0.25 | Exponential time-decay function | How fresh the signal is |
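Recency is the only dimension driven by an explicit formula rather than an NLP library. A minimal sketch of an exponential time-decay mapped onto the 0-10 scale, assuming a hypothetical 14-day half-life (ScopeScrape's actual decay constant may differ):

```python
import math

def recency_score(age_days: float, half_life_days: float = 14.0) -> float:
    """Exponential time-decay normalized to 0-10.

    half_life_days is a hypothetical parameter: a signal this old
    scores half of a brand-new one.
    """
    return 10.0 * math.exp(-math.log(2) * age_days / half_life_days)

print(round(recency_score(0), 1))    # brand-new signal, maximum score
print(round(recency_score(14), 1))   # one half-life old
print(round(recency_score(180), 1))  # six months old, effectively zero
```

The half-life framing makes the decay rate easy to reason about when tuning: double the half-life and old threads fade half as fast.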

Specificity gets the highest weight (0.30) because specific pain points are actionable. "CRM integrations are broken" tells you what to fix. "Software sucks" does not. A vague complaint with high frequency and intensity still costs you engineering time if you can't identify the actual problem.
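The composite score is a plain weighted sum of the four normalized dimensions. A sketch using the default weights (the helper name and input shape are mine, not ScopeScrape's actual API):

```python
DEFAULT_WEIGHTS = {
    "frequency": 0.25,
    "intensity": 0.20,
    "specificity": 0.30,  # highest: specific pain is actionable
    "recency": 0.25,
}

def pain_score(normalized: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of per-dimension scores, each already normalized to 0-10."""
    return sum(weights[dim] * normalized[dim] for dim in weights)

# A concrete-but-hypothetical pain point: specific, strong, reasonably fresh.
crm_pain = {"frequency": 6.0, "intensity": 8.0, "specificity": 9.0, "recency": 7.0}
print(round(pain_score(crm_pain), 2))  # 7.55
```

Because every term is visible, you can attribute any final score back to its dimensions by hand.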

A worked example

A hypothetical pain point mentioning "API rate limits are killing us" appears 47 times across Reddit in the past two weeks with 3 named entities (AWS, API, rate limits) and a VADER compound score of -0.72:

| Dimension | Raw Value | Normalized (0-10) | Weight | Contribution |
|---|---|---|---|---|
| Frequency | 47 mentions | 9.4 | 0.25 | 2.35 |
| Intensity | -0.72 (VADER) | 7.2 | 0.20 | 1.44 |
| Specificity | 3 entities | 8.5 | 0.30 | 2.55 |
| Recency | 2 days old | 9.8 | 0.25 | 2.45 |
| **Final Score** | | | | **8.79 / 10** |

It clears the default threshold of 7.0 and shows up in the report. A similar pain mentioned 3 times six months ago might score 5.1 and get filtered out.
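The threshold check itself is trivial. A sketch, assuming scored pain points arrive as (label, score) pairs:

```python
def above_threshold(scored, threshold=7.0):
    """Keep only pain points whose composite score clears the bar."""
    return [(label, score) for label, score in scored if score >= threshold]

scored = [
    ("API rate limits are killing us", 8.79),   # 47 mentions, 2 days old
    ("similar pain, 3 mentions, 6 months old", 5.1),
]
print(above_threshold(scored))  # only the 8.79 entry survives
```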

YAML configuration

Different use cases need different weights, so I made them configurable via simple YAML:

```yaml
scoring:
  weights:
    frequency: 0.25
    intensity: 0.20
    specificity: 0.30
    recency: 0.25

  threshold: 7.0

  hard_filters:
    specificity_min: 3.0
    recency_days_max: 90
```

A product manager looking for feature ideas might boost specificity to 0.40 and drop frequency to 0.15. A founder validating market size bumps frequency to 0.40. A startup optimizing retention emphasizes intensity at 0.35. Each dimension adjusts independently, and the tool validates that weights sum to 1.0.
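The weight-sum validation can be sketched in a few lines (the function name and tolerance are mine, not ScopeScrape's):

```python
def validate_weights(weights: dict, tolerance: float = 1e-6) -> None:
    """Reject weight sets that don't sum to 1.0 within floating-point slack."""
    total = sum(weights.values())
    if abs(total - 1.0) > tolerance:
        raise ValueError(f"weights sum to {total:.4f}, expected 1.0")

# The defaults pass:
validate_weights({"frequency": 0.25, "intensity": 0.20,
                  "specificity": 0.30, "recency": 0.25})

# Boosting frequency without rebalancing the others fails loudly:
try:
    validate_weights({"frequency": 0.40, "intensity": 0.20,
                      "specificity": 0.30, "recency": 0.25})
except ValueError as e:
    print(e)  # weights sum to 1.1500, expected 1.0
```

Failing fast here matters: a weight set summing to 1.15 silently inflates every score and shifts what clears the threshold.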

Hard filters act as gates. A pain point scoring 8.8 but with specificity below 3.0 gets dropped. This enforces data quality without touching the formula.
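The gate logic sits in front of the weighted formula. A sketch, assuming each candidate carries its raw specificity score and age (field names are illustrative):

```python
def passes_hard_filters(point: dict,
                        specificity_min: float = 3.0,
                        recency_days_max: int = 90) -> bool:
    """Gate candidates before scoring: failing either filter drops the point."""
    return (point["specificity"] >= specificity_min
            and point["age_days"] <= recency_days_max)

# Scores 8.8 on the weighted formula but too vague: dropped regardless.
vague = {"specificity": 2.1, "age_days": 5}
print(passes_hard_filters(vague))  # False
```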

The optional LLM layer

For users who want deeper analysis, I built an optional two-stage LLM pipeline on top of the base scoring. It is completely opt-in; you only pay if you choose it.

Stage 1: Batch pre-filter (~0.001 USD per post). Classify each high-scoring pain point as relevant, irrelevant, or tangential. This reduces noise and focuses the more expensive second stage on real signals.

Stage 2: Insight extraction (~0.01 USD per post). Synthesize a 1-2 paragraph brief for each pain point: what the problem is, who is affected, and why it matters.

Budget caps prevent surprise bills. Set a max spend: "Extract insights for top 20 pain points, cap at 0.50 USD." The tool stops when you hit the limit.
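The cap is a running-total check around the per-post LLM call. A sketch with the hypothetical costs above (the real pipeline's accounting may differ):

```python
def extract_with_cap(pain_points, cost_per_post=0.01, max_spend=0.50):
    """Process pain points in rank order until the spend cap is hit."""
    spent = 0.0
    processed = []
    for point in pain_points:
        if spent + cost_per_post > max_spend:
            break  # cap reached: stop before the bill grows
        # ... call the LLM for an insight brief here (omitted) ...
        spent += cost_per_post
        processed.append(point)
    return processed, round(spent, 2)

top_20 = [f"pain_{i}" for i in range(20)]
briefs, spent = extract_with_cap(top_20)
print(len(briefs), spent)  # all 20 fit: 20 * $0.01 = $0.20, under the $0.50 cap
```

Checking the projected cost before the call, rather than the actual cost after it, guarantees the cap is never exceeded.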

The base tool works without LLMs. The LLM layer amplifies it. You are never locked into expensive APIs to get value.

Why not a black-box model

A gradient-boosted ensemble trained on labeled outcomes could probably score pain points more accurately than my static weights.

I did not build one. You cannot debug a black box. If a model says a pain point scores 8.2, you do not know if it is frequency, intensity, or some learned interaction between them. You cannot change your use case without retraining. You cannot understand it, so you cannot trust it.

My YAML weights mean you see exactly why a pain point scored 8.2 versus 6.1. You run the formula by hand. You argue about weights with your team. You adjust for your specific use case. Most importantly, you know what you are optimizing for.

Transparency beats accuracy when accuracy does not tell you why. For an open-source tool meant to be debuggable and configurable, static weights with YAML config are the right tradeoff.