Methodology
OpenGov combines official government data with machine analysis to make congressional activity accessible. This page explains exactly where the data comes from, what we do to it, and where the gaps are.
Where the Data Comes From
Vote totals, sponsor lists, and bill status come from congress.gov and GovInfo; campaign contribution totals come from FEC.gov; executive actions come from the Federal Register API.
What We Cover
119th Congress (2025-2026): Full coverage — all bills, votes, and members.
118th Congress (2023-2024): Partial — bills and votes loaded, analysis coverage varies.
Scope: Federal only. State legislatures are on the roadmap but not yet available.
How Often It Updates
Bill data is synced daily via GovInfo sitemaps to capture new and amended legislation; executive actions come from the Federal Register API (see the classification pipeline below for details).
What's Machine-Generated vs Raw
Some data on OpenGov comes directly from government sources with no processing. Other data is generated by AI models. This table shows exactly which is which.
| Data Type | Source | Method |
|---|---|---|
| Vote totals, sponsor lists, bill status | congress.gov / GovInfo | Raw -- no AI processing |
| FEC contribution totals | FEC.gov | Raw -- no AI processing |
| Plain-English bill summaries | LLM-generated | Claude (Haiku) via Claude CLI. ~98% of 119th Congress bills covered. |
| Stance classification (expand/restrict/etc.) | LLM-generated | Claude (Haiku). 15-verb constrained vocabulary. ~98% coverage. |
| Policy direction labels | LLM-generated | Claude (Haiku). Binary expand/restrict classification. ~94% coverage. |
| Position synthesis and themes | LLM-generated | Claude (Opus). Aggregated from member voting + sponsorship patterns. |
| Bill similarity scores | Vector embeddings | all-MiniLM-L6-v2 (384d), cosine similarity via Neo4j vector index |
| Bill connection paths | Graph traversal | Neo4j shortestPath algorithm, no AI |
| Social post issue tagging | LLM-generated | Claude (Haiku) with keyword fallback |
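To make the similarity and path rows concrete, here is a minimal sketch using the Neo4j Python driver. The index name `bill_embeddings`, the `Bill` label, the `bill_id` property, and the connection details are assumptions, not the production schema:

```python
from neo4j import GraphDatabase

# Connection details are placeholders.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Top-k nearest bills by cosine similarity, via a Neo4j 5.x vector index.
# 'bill_embeddings' is a hypothetical index name over 384-d MiniLM vectors.
SIMILAR_QUERY = """
CALL db.index.vector.queryNodes('bill_embeddings', $k, $embedding)
YIELD node, score
RETURN node.bill_id AS bill, score
ORDER BY score DESC
"""

# Shortest connection path between two bills: pure graph traversal, no AI.
PATH_QUERY = """
MATCH (a:Bill {bill_id: $src}), (b:Bill {bill_id: $dst})
MATCH p = shortestPath((a)-[*..6]-(b))
RETURN [n IN nodes(p) | n.bill_id] AS path
"""

def similar_bills(embedding: list[float], k: int = 10):
    with driver.session() as session:
        return session.run(SIMILAR_QUERY, k=k, embedding=embedding).data()

def connection_path(src: str, dst: str):
    with driver.session() as session:
        return session.run(PATH_QUERY, src=src, dst=dst).data()
```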
How Bills Are Classified
Bills flow through a 5-stage pipeline that combines official government taxonomy with semantic search and quality validation.
Official Data Ingestion
Bill text, metadata, and legislative subjects are downloaded from GovInfo. Executive actions come from the Federal Register API. Data is synced daily via sitemaps to capture new and amended legislation.
12,300+ bills from the 119th Congress across all 8 bill types (H.R., S., H.J.Res, S.J.Res, H.Con.Res, S.Con.Res, H.Res, S.Res).
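For illustration, a minimal sketch of a sitemap-driven daily sync. The sitemap URL and the change-detection logic are assumptions, not the production pipeline:

```python
import requests
import xml.etree.ElementTree as ET

# Hypothetical sitemap URL -- the real GovInfo layout may differ.
SITEMAP_URL = "https://www.govinfo.gov/sitemap/BILLS_119_sitemap.xml"
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def changed_since(last_sync: str) -> list[str]:
    """Return package URLs whose <lastmod> is newer than the last sync date.

    ISO-8601 date strings compare correctly as plain strings.
    """
    root = ET.fromstring(requests.get(SITEMAP_URL, timeout=30).content)
    return [
        url.findtext("sm:loc", namespaces=NS)
        for url in root.findall("sm:url", NS)
        if url.findtext("sm:lastmod", default="", namespaces=NS) > last_sync
    ]

# Run daily: fetch only bills added or amended since the previous sync.
new_or_amended = changed_since("2026-04-08")
```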
CRS Classification
Each bill is classified using its Congressional Research Service (CRS) policy area -- the same taxonomy used by the Library of Congress. CRS policy areas are mapped to our 42 tracked issues through a hand-curated configuration with disambiguation rules for overlapping areas.
33 CRS policy areas mapped to 11 themes and 42 issues. When a CRS area covers multiple issues (e.g., "Crime and Law Enforcement" spans gun rights, public safety, and criminal justice), legislative subjects and title keywords disambiguate the classification.
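A toy fragment of what such a hand-curated mapping can look like, using the "Crime and Law Enforcement" example. The issue slugs and keywords below are illustrative, not the real 42-issue configuration:

```python
# Illustrative fragment of a CRS-to-issue mapping with disambiguation rules.
# Issue names and keywords are made up; the real config covers 33 CRS areas.
CRS_MAP = {
    "Agriculture and Food": {"default": "agriculture"},  # unambiguous: 1-to-1
    "Crime and Law Enforcement": {                       # ambiguous: needs rules
        "default": "criminal-justice",
        "disambiguate": {
            "gun-rights": ["firearm", "second amendment", "concealed carry"],
            "public-safety": ["police funding", "emergency response"],
        },
    },
}

def classify(policy_area: str, subjects: list[str], title: str) -> str:
    """Map a bill's CRS policy area to a tracked issue.

    Legislative subjects and title keywords break ties when one CRS
    area spans several issues.
    """
    entry = CRS_MAP.get(policy_area)
    if entry is None:
        return "unclassified"
    text = " ".join(subjects + [title]).lower()
    for issue, keywords in entry.get("disambiguate", {}).items():
        if any(kw in text for kw in keywords):
            return issue
    return entry["default"]
```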
Semantic Search & Ranking
Bill text is chunked and embedded using sentence-transformers (all-MiniLM-L6-v2). For each issue, a hybrid search combines vector similarity with keyword matching, scoped to bills in the relevant CRS policy areas.
384-dimensional embeddings, top-100 chunk retrieval per issue, keyword boosting for domain-specific terms.
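A simplified sketch of the hybrid scoring, assuming chunks are pre-filtered to the issue's CRS policy areas and carry precomputed embeddings; the +0.1-per-term keyword boost is an illustrative value:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings

def hybrid_search(issue_query: str, chunks: list[dict], boost_terms: list[str],
                  top_k: int = 100) -> list[dict]:
    """Score bill-text chunks by vector similarity plus keyword boosting.

    Each chunk dict is assumed to carry 'text' and a precomputed
    'embedding' tensor, already scoped to the issue's CRS policy areas.
    """
    q = model.encode(issue_query, convert_to_tensor=True)
    for chunk in chunks:
        sim = util.cos_sim(q, chunk["embedding"]).item()
        boost = 0.1 * sum(term in chunk["text"].lower() for term in boost_terms)
        chunk["score"] = sim + boost
    return sorted(chunks, key=lambda c: c["score"], reverse=True)[:top_k]
```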
Heat Score & Coverage Bypass
A heat score (0-15) measures each bill's legislative momentum -- factoring in cosponsor count, committee advancement, floor votes, and enactment. High-heat bills that semantic search missed are injected into results, ensuring legislatively important bills are never overlooked.
4-stage retrieval: CRS graph scoping, vector search, heat-score bypass (CRS-scoped), cross-CRS keyword bypass. Simple resolutions (commemorative/symbolic) are excluded from bypass to maintain precision.
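The four input signals and the 0-15 range come from the description above; the point values in this sketch are assumptions, shown only to illustrate the shape of the computation:

```python
def heat_score(bill: dict) -> int:
    """Assemble a 0-15 legislative-momentum score.

    The weights below are illustrative assumptions; only the range and
    the four signals (cosponsors, committee, votes, enactment) are from
    the pipeline description.
    """
    score = 0
    score += min(bill.get("cosponsors", 0) // 25, 4)        # up to 4 pts
    score += 3 if bill.get("reported_by_committee") else 0  # committee advancement
    score += 4 if bill.get("floor_votes", 0) > 0 else 0     # reached the floor
    score += 4 if bill.get("enacted") else 0                # signed into law
    return min(score, 15)
```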
Validation Against Golden Sets
Retrieval quality is measured against curated golden sets -- hand-verified lists of bills that must appear (recall) and must not appear (precision) for each issue. This provides objective, reproducible quality metrics.
42 golden sets with 5-18 must-find bills each. Overall: 94% recall, 92% rejection rate. 23 issues achieve 100% recall.
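Evaluating a retrieval run against a golden set reduces to two set operations. A minimal sketch with made-up bill IDs:

```python
def evaluate(retrieved: set[str], must_find: set[str], must_reject: set[str]) -> dict:
    """Recall over must-find bills, rejection rate over must-not-find bills."""
    recall = len(retrieved & must_find) / len(must_find)
    rejection = len(must_reject - retrieved) / len(must_reject)
    return {"recall": recall, "rejection_rate": rejection}

# Example: a golden set with 5 must-find and 4 must-reject bills.
metrics = evaluate(
    retrieved={"hr-1", "hr-2", "s-3", "s-9"},
    must_find={"hr-1", "hr-2", "s-3", "s-4", "s-5"},
    must_reject={"hr-7", "hr-8", "s-9", "s-10"},
)
# -> {'recall': 0.6, 'rejection_rate': 0.75}
```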
Neutrality
All generated text follows a strict neutrality policy: numbers and comparisons only, never judgment adjectives. We do not say a bill is "good" or "bad" -- we show what it does and let you decide.
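One way to enforce such a policy mechanically is a lint pass over generated text before publication. This sketch uses a small hypothetical banned-word list, not the actual policy rules:

```python
# Hypothetical neutrality lint: flag judgment adjectives in generated text.
# The banned list is a small illustrative sample, not the real policy.
BANNED = {"good", "bad", "dangerous", "landmark", "radical", "commonsense"}

def neutrality_violations(text: str) -> list[str]:
    """Return any judgment adjectives found in a generated summary."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    return sorted(words & BANNED)

assert neutrality_violations("H.R. 1 raises the cap from $10k to $20k.") == []
```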
How We Order Things
The order in which candidates and issues appear on screen can imply preference or importance, even unintentionally. We take this seriously. Here's how we handle it.
Candidates
Candidate lists are randomized per session. Each time you visit, candidates appear in a different order. No candidate gets persistent top placement. We use a session-based seed so the order stays consistent while you browse (no jarring re-shuffles), but changes when you return later.
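A minimal sketch of session-seeded ordering; the session identifier source is an assumption, and any stable per-session value would work:

```python
import random

def session_order(candidates: list[str], session_id: str) -> list[str]:
    """Shuffle candidates deterministically within one session.

    Seeding with the session ID keeps the order stable while the user
    browses, but yields a fresh order on the next visit.
    """
    rng = random.Random(session_id)   # same session -> same order
    shuffled = candidates[:]
    rng.shuffle(shuffled)
    return shuffled

# Same session, same order; a new session re-randomizes.
assert session_order(["A", "B", "C"], "sess-1") == session_order(["A", "B", "C"], "sess-1")
```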
Issues
Issue lists are ordered by Congressional activity — how many bills and votes are happening on that topic right now. If you've selected demographics (like "renter" or "parent"), issues that affect people like you are shown first. We never editorially pick which issues are more important.
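A sketch of that two-level ordering; the field names (`bills`, `votes`, `affects`) are assumptions:

```python
def order_issues(issues: list[dict], user_demographics: set[str]) -> list[dict]:
    """Sort issues by congressional activity, demographic matches first.

    Each issue dict is assumed to carry 'bills' and 'votes' counts plus
    an 'affects' set of demographic tags (e.g., {"renter", "parent"}).
    """
    def key(issue: dict):
        matches_user = bool(issue["affects"] & user_demographics)
        activity = issue["bills"] + issue["votes"]
        return (not matches_user, -activity)  # matches first, then most active

    return sorted(issues, key=key)
```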
What We Don't Do
We don't sort candidates by party, fundraising, poll numbers, or incumbency status. We don't prioritize issues by controversy or newsworthiness. We don't use engagement metrics or clicks to reorder content. Every ordering decision is either randomized, data-driven, or personalized by your own selections.
Types of Evidence
When we show what a candidate has done on an issue, each piece of evidence is labeled by its source type. Not all evidence carries the same weight — a floor vote is an official action, while a campaign statement is a promise. We show you the difference so you can judge for yourself.
| Source Type | What It Means | Strength |
|---|---|---|
| Floor Vote | The candidate voted YES or NO on a bill in the Senate or House. This is an official, recorded action. | Official record |
| Bill Sponsored | The candidate sponsored or co-sponsored a bill. Sponsorship indicates active support for the legislation. | Official record |
| Campaign Site | A statement from the candidate's official campaign website, extracted verbatim. | Candidate's own words |
| Social Media | A post from the candidate's verified social media accounts (Bluesky, YouTube, press releases). | Candidate's own words |
When we have no evidence for a candidate on a topic, we say so explicitly: "No public record found." This does not mean the candidate has no position — it means we haven't found evidence in our sources yet.
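Internally this amounts to evidence records labeled by source type. A sketch of the shape, with names that are assumptions rather than the real schema:

```python
from dataclasses import dataclass
from enum import Enum

class SourceType(Enum):
    FLOOR_VOTE = "floor_vote"
    BILL_SPONSORED = "bill_sponsored"
    CAMPAIGN_SITE = "campaign_site"
    SOCIAL_MEDIA = "social_media"

# Each source type maps to the strength label shown in the table above.
STRENGTH = {
    SourceType.FLOOR_VOTE: "official record",
    SourceType.BILL_SPONSORED: "official record",
    SourceType.CAMPAIGN_SITE: "candidate's own words",
    SourceType.SOCIAL_MEDIA: "candidate's own words",
}

@dataclass
class Evidence:
    candidate_id: str
    issue: str
    source_type: SourceType
    excerpt: str  # verbatim statement or vote description

    @property
    def strength(self) -> str:
        return STRENGTH[self.source_type]

# Absence of evidence is stated explicitly, never inferred as a position.
def render(evidence: list[Evidence]) -> str:
    return "No public record found." if not evidence else f"{len(evidence)} items"
```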
How Suggested Topics Are Generated
When you see suggested topics on a race page, those are generated from what's actually happening in Congress — issues with the most bill introductions, committee activity, and floor votes in the current session.
We display topics as neutral labels (e.g., "Immigration" not "Border crisis") to avoid framing bias. The labels come from our standardized issue taxonomy, not editorial choices.
If you've selected demographics, suggested topics are re-ranked to show issues that affect people like you first. This changes the order, not the content — you can still see all topics by scrolling.
Known Limitations
- **Bill gap (Jan 2 - Apr 8, 2026):** Bills introduced between January 2 and April 8, 2026 are being backfilled due to a sync pipeline issue. Everything before January 2 is complete.
- **Member bios:** Not yet populated for all members. Coming soon.
- **Graph features:** "Bridge senators" and "voting blocs" are in development. Current versions have methodological issues and are not publicly surfaced.
- **X/Twitter integration:** Not yet available due to API cost constraints. Bluesky, YouTube, and Senate press releases are covered.
- **State legislatures:** Federal only. State coverage is on the roadmap.
How to Report a Data Issue
If you find incorrect data -- a wrong vote count, a misclassified bill, a broken link -- please open an issue on our GitHub repository or email [email protected]. We take data accuracy seriously and will investigate every report.