Project Structure

📁 scopescrape/                Project root
  📁 scopescrape/              Core package
    📄 __init__.py             Package init
    📄 cli.py                  Click CLI entrypoint
    📄 config.py               YAML config loader
    📄 scoring.py              Pain point scoring engine
    📄 signals.py              Signal phrase detection
    📄 models.py               Data models (Post, PainPoint)
    📁 adapters/               Platform adapters
      📄 __init__.py
      📄 base.py               Abstract base adapter
      📄 reddit.py             Reddit (PRAW)
      📄 hackernews.py         Hacker News (Algolia)
      📄 github.py             GitHub Discussions (GraphQL)
      📄 producthunt.py        Product Hunt (GraphQL)
    📁 analysis/               AI analysis layer
      📄 __init__.py
      📄 pipeline.py           Two-stage LLM pipeline
      📄 providers.py          OpenAI / Anthropic abstraction
    📁 export/                 Export handlers
      📄 __init__.py
      📄 json_export.py
      📄 csv_export.py
      📄 parquet_export.py
  📁 tests/                    Test suite
    📄 test_scoring.py
    📄 test_signals.py
    📁 test_adapters/
  📁 dashboard/                Streamlit dashboard
    📄 app.py
  📄 pyproject.toml            Poetry config
  📄 scopescrape.yaml.example  Config template
  📄 README.md

Key Modules

⚙️ cli.py

Click-based CLI with scan, analyze, export, and config commands. Settings are loaded from a YAML config file, and command-line flags override them.
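The "YAML config plus flag overrides" behavior can be sketched as a merge step that runs after the config file is parsed. This is a stdlib-only illustration, not cli.py itself. Several details are assumptions: the helper name `merge_config`, the nested-dict layout, and the convention that an unset flag arrives as `None`. (`None` is what Click passes for a value option declared without a default.)

```python
from typing import Any, Mapping


def merge_config(file_config: Mapping[str, Any], overrides: Mapping[str, Any]) -> dict[str, Any]:
    """Overlay CLI flag values onto a config dict loaded from YAML.

    Hypothetical helper: flags the user did not pass are assumed to arrive
    as None, so they never clobber values from the config file. Nested
    dicts are merged recursively; the input mappings are not mutated.
    """
    merged: dict[str, Any] = dict(file_config)
    for key, value in overrides.items():
        if value is None:
            # Flag not supplied on the command line: keep the file's value.
            continue
        current = merged.get(key)
        if isinstance(current, Mapping) and isinstance(value, Mapping):
            merged[key] = merge_config(current, value)
        else:
            merged[key] = value
    return merged


if __name__ == "__main__":
    from_yaml = {"platforms": ["reddit"], "limit": 100, "scoring": {"min_score": 5.0}}
    from_flags = {"limit": 25, "platforms": None, "scoring": {"min_score": 7.5}}
    print(merge_config(from_yaml, from_flags))
    # → {'platforms': ['reddit'], 'limit': 25, 'scoring': {'min_score': 7.5}}
```

The sketch skips only `None`, not every falsy value, so a hypothetical `--limit 0` still overrides the file. Click can also receive config values as defaults through the context's `default_map`, which is an alternative design.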

🔌 adapters/base.py

Abstract base class defining the adapter interface: the fetch(), normalize(), and rate_limit() methods that each platform adapter provides.
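Below is a minimal sketch of such an interface using Python's `abc` module. Only the three method names come from the description above. Everything else is hypothetical, chosen so the contract runs without network access: the `Post` fields, the `min_interval` throttling policy, the `scan()` helper, and the `InMemoryAdapter` toy subclass.

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Iterator


@dataclass
class Post:
    """Hypothetical normalized record; the real model lives in models.py."""

    platform: str
    post_id: str
    body: str
    created_utc: float


class BaseAdapter(ABC):
    """Sketch of the adapter contract: fetch, normalize, rate_limit."""

    platform = "base"

    def __init__(self, min_interval: float = 1.0) -> None:
        # Assumed throttling policy: at least min_interval seconds per request.
        self.min_interval = min_interval
        self._last_request = float("-inf")

    @abstractmethod
    def fetch(self, query: str, limit: int) -> Iterator[dict[str, Any]]:
        """Yield up to `limit` raw platform records matching `query`."""

    @abstractmethod
    def normalize(self, raw: dict[str, Any]) -> Post:
        """Convert one raw record into the shared Post model."""

    def rate_limit(self) -> None:
        """Sleep just long enough to keep min_interval between requests."""
        wait = self.min_interval - (time.monotonic() - self._last_request)
        if wait > 0:
            time.sleep(wait)
        self._last_request = time.monotonic()

    def scan(self, query: str, limit: int) -> list[Post]:
        """Hypothetical convenience method: fetch, then normalize each record."""
        return [self.normalize(raw) for raw in self.fetch(query, limit)]


class InMemoryAdapter(BaseAdapter):
    """Toy adapter over a list of dicts, standing in for Reddit, HN, etc."""

    platform = "memory"

    def __init__(self, records: list[dict[str, Any]]) -> None:
        super().__init__(min_interval=0.0)
        self.records = records

    def fetch(self, query: str, limit: int) -> Iterator[dict[str, Any]]:
        yielded = 0
        for record in self.records:
            if yielded >= limit:
                return
            if query.lower() in record["text"].lower():
                # A real adapter would call rate_limit() before each API request.
                self.rate_limit()
                yielded += 1
                yield record

    def normalize(self, raw: dict[str, Any]) -> Post:
        return Post(
            platform=self.platform,
            post_id=str(raw["id"]),
            body=raw["text"],
            created_utc=float(raw["ts"]),
        )
```

Because `fetch()` and `normalize()` are abstract, instantiating `BaseAdapter` directly raises `TypeError`, so every platform adapter must implement both. `rate_limit()` is concrete here on the assumption that a shared default is useful; the real base class may leave it abstract.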

🎯 signals.py

More than 60 signal phrases grouped into 4 tiers. Matching uses case-insensitive regular expressions and extracts a context window around each match.
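One way to implement tiered, case-insensitive matching with context windows is to compile every phrase once with `re.IGNORECASE` and then slice a fixed number of characters around each hit. The phrase list, the tier ordering (tier 1 strongest), and the character-based window are assumptions for illustration. signals.py may define its window differently.

```python
import re
from dataclasses import dataclass

# Hypothetical sample: the real signals.py defines 60+ phrases across 4 tiers.
# Tier 1 is assumed to be the strongest signal, tier 4 the weakest.
SIGNAL_TIERS: dict[int, list[str]] = {
    1: ["i would pay for", "is there a tool"],
    2: ["so frustrating", "hate having to"],
    3: ["wish there was", "workaround"],
    4: ["annoying", "tedious"],
}


@dataclass
class SignalMatch:
    tier: int
    phrase: str   # the matched text as written in the post
    start: int    # character offset of the match
    context: str  # surrounding window of text


def compile_signals(tiers: dict[int, list[str]]) -> list[tuple[int, re.Pattern[str]]]:
    """Compile each phrase into a case-insensitive, word-bounded regex."""
    compiled = []
    for tier, phrases in tiers.items():
        for phrase in phrases:
            pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
            compiled.append((tier, pattern))
    return compiled


def find_signals(
    text: str, compiled: list[tuple[int, re.Pattern[str]]], window: int = 60
) -> list[SignalMatch]:
    """Return all matches, each with up to `window` characters of context per side."""
    matches = []
    for tier, pattern in compiled:
        for m in pattern.finditer(text):
            lo = max(0, m.start() - window)
            hi = min(len(text), m.end() + window)
            matches.append(SignalMatch(tier, m.group(0), m.start(), text[lo:hi]))
    # Strongest tier first, then in reading order.
    return sorted(matches, key=lambda s: (s.tier, s.start))
```

`re.escape` keeps any punctuation in a phrase literal. The `\b` anchors stop a phrase such as "workaround" from matching inside "workarounds".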

📊 scoring.py

Multi-dimensional scoring across frequency, intensity, specificity, and recency, with per-dimension weights configurable in YAML. Hard filters are also supported.
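The scoring described above can be sketched as a weighted average plus a separate hard-filter gate. The dimension names come from the text. The following are assumptions: the 0-10 normalization, the example weights, and reading "hard filter" as per-dimension minimums applied before ranking.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed example weights; in scopescrape these would come from the YAML config.
DEFAULT_WEIGHTS: Mapping[str, float] = {
    "frequency": 0.3,
    "intensity": 0.3,
    "specificity": 0.2,
    "recency": 0.2,
}


@dataclass
class Dimensions:
    """Per-item dimension scores, assumed already normalized to 0-10."""

    frequency: float
    intensity: float
    specificity: float
    recency: float


def weighted_score(dims: Dimensions, weights: Optional[Mapping[str, float]] = None) -> float:
    """Weighted average of the dimensions; weights need not sum to 1."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(getattr(dims, name) * w for name, w in weights.items()) / total


def passes_hard_filters(dims: Dimensions, minimums: Mapping[str, float]) -> bool:
    """Hard filter: reject if any listed dimension is below its floor,
    no matter how high the weighted score is."""
    return all(getattr(dims, name) >= floor for name, floor in minimums.items())


def rank(
    items: Mapping[str, Dimensions],
    weights: Optional[Mapping[str, float]] = None,
    minimums: Optional[Mapping[str, float]] = None,
) -> list[tuple[str, float]]:
    """Apply hard filters first, then sort survivors by weighted score (desc)."""
    minimums = minimums or {}
    kept = [
        (name, weighted_score(dims, weights))
        for name, dims in items.items()
        if passes_hard_filters(dims, minimums)
    ]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Keeping the hard filters separate from the weights means a very high frequency cannot compensate for, say, zero specificity.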

🧠 analysis/pipeline.py

Two-stage LLM pipeline. Stage 1 is a cheap batched pre-filter. Stage 2 runs the more expensive insight extraction on the items that pass Stage 1. Budget caps limit total spend.
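Below is a sketch of the two-stage pattern with a spend cap, using stand-in callables instead of real OpenAI or Anthropic clients. The provider signatures, per-call cost estimates, batch size, and the `Budget` class are hypothetical. The sketch is meant to show the control flow:

1. Run every post through the cheap stage in batches.
2. Spend expensive calls only on the survivors.
3. Stop before the cap would be crossed.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical provider signatures (providers.py wraps OpenAI/Anthropic; these
# stand-ins just return results plus the USD cost of the call).
PreFilter = Callable[[Sequence[str]], tuple[list[bool], float]]
Extractor = Callable[[str], tuple[str, float]]


@dataclass
class Budget:
    cap_usd: float
    spent_usd: float = 0.0

    def can_afford(self, estimate_usd: float) -> bool:
        return self.spent_usd + estimate_usd <= self.cap_usd

    def record(self, cost_usd: float) -> None:
        self.spent_usd += cost_usd


def run_pipeline(
    posts: Sequence[str],
    prefilter: PreFilter,
    extract: Extractor,
    budget: Budget,
    batch_size: int = 20,
    prefilter_estimate: float = 0.001,
    extract_estimate: float = 0.02,
) -> list[str]:
    """Stage 1 screens posts in batches; Stage 2 runs only on survivors.

    Stops issuing calls once the next call's estimated cost would exceed the cap.
    """
    # Stage 1: cheap model classifies a whole batch per call.
    candidates: list[str] = []
    for i in range(0, len(posts), batch_size):
        if not budget.can_afford(prefilter_estimate):
            break
        batch = posts[i : i + batch_size]
        keep, cost = prefilter(batch)
        budget.record(cost)
        candidates.extend(post for post, k in zip(batch, keep) if k)

    # Stage 2: expensive model extracts insights one post at a time.
    insights: list[str] = []
    for post in candidates:
        if not budget.can_afford(extract_estimate):
            break
        insight, cost = extract(post)
        budget.record(cost)
        insights.append(insight)
    return insights
```

The cap check relies on an estimate made before each call. If a call costs more than its estimate, total spend can end slightly above the cap.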

💾 export/

Exports to JSON, CSV, Parquet, and SQLite with a consistent schema across all formats, plus streaming export for large datasets.
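Below is a stdlib-only sketch of streaming export with one shared schema. JSON Lines stands in for streaming JSON, and CSV and SQLite share the same column list. The `SCHEMA` columns, the table name, and the chunk size are hypothetical. Parquet is left out because it needs a third-party library; pyarrow's `ParquetWriter`, for example, can append one table per `write_table()` call.

```python
import csv
import io
import json
import sqlite3
from typing import Any, Iterable, TextIO

# Hypothetical shared schema; the real column list lives in scopescrape's export package.
SCHEMA: list[str] = ["platform", "post_id", "score", "summary"]


def to_row(record: dict[str, Any]) -> dict[str, Any]:
    """Project a record onto SCHEMA: extra keys dropped, missing keys become None."""
    return {col: record.get(col) for col in SCHEMA}


def stream_jsonl(records: Iterable[dict[str, Any]], out: TextIO) -> int:
    """Write one JSON object per line, holding only one record in memory at a time."""
    count = 0
    for record in records:
        out.write(json.dumps(to_row(record)) + "\n")
        count += 1
    return count


def stream_csv(records: Iterable[dict[str, Any]], out: TextIO) -> int:
    """Write a header row, then one CSV row per record (None becomes an empty cell)."""
    writer = csv.DictWriter(out, fieldnames=SCHEMA)
    writer.writeheader()
    count = 0
    for record in records:
        writer.writerow(to_row(record))
        count += 1
    return count


def stream_sqlite(
    records: Iterable[dict[str, Any]],
    conn: sqlite3.Connection,
    table: str = "pain_points",
    chunk_size: int = 500,
) -> int:
    """Insert rows in fixed-size chunks so memory stays bounded."""
    # `table` is interpolated into SQL, so it must be a trusted constant.
    cols = ", ".join(SCHEMA)
    marks = ", ".join("?" for _ in SCHEMA)
    insert = f"INSERT INTO {table} ({cols}) VALUES ({marks})"
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
    chunk: list[tuple[Any, ...]] = []
    count = 0
    for record in records:
        row = to_row(record)
        chunk.append(tuple(row[col] for col in SCHEMA))
        if len(chunk) >= chunk_size:
            conn.executemany(insert, chunk)
            count += len(chunk)
            chunk.clear()
    if chunk:
        conn.executemany(insert, chunk)
        count += len(chunk)
    conn.commit()
    return count


if __name__ == "__main__":
    rows = [{"platform": "reddit", "post_id": "a1", "score": 7.5, "summary": "slow exports"}]
    buf = io.StringIO()
    stream_csv(rows, buf)
    print(buf.getvalue())
```

Each function consumes an iterable, so a generator of scored pain points can flow straight to disk without first building a full list.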