
Methodology

OpenGov combines official government data with machine analysis to make congressional activity accessible. This page explains exactly where the data comes from, what we do to it, and where the gaps are.

Where the Data Comes From

Dataset | Source | Method | Frequency | Provider
Congressional Bills & Votes | GovInfo (govinfo.gov) | XML bulk download + sitemap-based incremental sync | Daily | U.S. Government Publishing Office
Member Information | congress.gov / bioguide.congress.gov | Member directory sync | Weekly | Library of Congress
Campaign Finance | FEC.gov | Direct contributions from PAS2 bulk files, independent expenditures from Schedule E API | Weekly | Federal Election Commission
Endorsements | Public endorsement lists + FEC data | Curated from public endorsement lists and FEC independent expenditure filings | As available | Multiple public sources
Social Media | Bluesky API, YouTube RSS, Senate press releases | Direct API (Bluesky), RSS feeds (YouTube, press releases) | Every 2-6 hours | Official member accounts
Executive Actions | Federal Register API | Executive orders, memoranda, proclamations | Daily | National Archives
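The "sitemap-based incremental sync" in the table above can be sketched as a filter on last-modified timestamps: only packages changed since the previous successful sync are re-downloaded. The entry shape and field names below are illustrative, not GovInfo's actual schema.

```python
from datetime import datetime, timezone

def entries_to_sync(sitemap_entries, last_sync):
    """Return sitemap entries modified after the last successful sync."""
    return [e for e in sitemap_entries
            if datetime.fromisoformat(e["lastmod"]) > last_sync]

# Toy entries; a real sitemap lists one <url> element per bill package.
entries = [
    {"loc": "BILLS-119hr1ih", "lastmod": "2026-04-01T06:00:00+00:00"},
    {"loc": "BILLS-119s200is", "lastmod": "2026-03-01T06:00:00+00:00"},
]
last_sync = datetime(2026, 3, 15, tzinfo=timezone.utc)
todo = entries_to_sync(entries, last_sync)  # only the newer package
```

This is why the daily sync scales: unchanged bills cost nothing beyond reading the sitemap.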

What We Cover

• 12,337 bills tracked (119th Congress, 2025-2026)
• 1,021 roll-call votes (119th Congress)
• 548 members of Congress (Senate + House)

119th Congress (2025-2026): Full coverage — all bills, votes, and members.

118th Congress (2023-2024): Partial — bills and votes loaded, analysis coverage varies.

Scope: Federal only. State legislatures are on the roadmap but not yet available.

How Often It Updates

Data | Frequency | Detail
Bills and votes | Daily (6:00 UTC) | Automated sync from GovInfo
Social media posts | Every 2-6 hours | Bluesky (2h), YouTube + press releases (6h)
News heat scores | 3x daily | Aggregated from public news APIs
Member profiles and analysis | Weekly | Recomputed and cached in Redis (7-day TTL)
Issue classification | Continuous | New bills classified within 24 hours of ingestion
Campaign finance | Weekly | FEC bulk files and Schedule E API
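The weekly "recomputed and cached in Redis (7-day TTL)" pattern is a standard cache-aside store. The sketch below emulates it in pure Python (Redis SETEX plays this role in production); the class and key names are illustrative.

```python
import time

WEEK = 7 * 24 * 3600  # 7-day TTL, matching the profile cache

class TTLCache:
    """Minimal cache-aside store with per-key expiry."""
    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl=WEEK, now=None):
        now = time.time() if now is None else now
        self._store[key] = (value, now + ttl)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        hit = self._store.get(key)
        if hit and hit[1] > now:
            return hit[0]
        return None  # expired or missing -> caller recomputes

cache = TTLCache()
cache.set("member:S001", {"votes": 412}, now=0)
fresh = cache.get("member:S001", now=WEEK - 1)  # within TTL: served from cache
stale = cache.get("member:S001", now=WEEK + 1)  # past TTL: recompute
```

A miss after expiry triggers the weekly recomputation; a hit avoids re-running the analysis.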

What's Machine-Generated vs Raw

Some data on OpenGov comes directly from government sources with no processing. Other data is generated by AI models. This table shows exactly which is which.

Data Type | Source | Method
Vote totals, sponsor lists, bill status | congress.gov / GovInfo | Raw -- no AI processing
FEC contribution totals | FEC.gov | Raw -- no AI processing
Plain-English bill summaries | LLM-generated | Claude (Haiku) via Claude CLI. ~98% of 119th bills covered.
Stance classification (expand/restrict/etc.) | LLM-generated | Claude (Haiku). 15-verb constrained vocabulary. ~98% coverage.
Policy direction labels | LLM-generated | Claude (Haiku). Binary expand/restrict classification. ~94% coverage.
Position synthesis and themes | LLM-generated | Claude (Opus). Aggregated from member voting + sponsorship patterns.
Bill similarity scores | Vector embeddings | all-MiniLM-L6-v2 (384d), cosine similarity via Neo4j vector index
Bill connection paths | Graph traversal | Neo4j shortestPath algorithm, no AI
Social post issue tagging | LLM-generated | Claude (Haiku) with keyword fallback
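The bill similarity scores above are plain cosine similarity between embedding vectors, with no AI in the comparison step itself. A minimal sketch on toy 4-dimensional vectors (standing in for the 384-dimensional all-MiniLM-L6-v2 embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product of the vectors over the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy embeddings: bill_a and bill_b point in similar directions, bill_c does not.
bill_a = [0.1, 0.8, 0.1, 0.0]
bill_b = [0.1, 0.7, 0.2, 0.0]
bill_c = [0.9, 0.0, 0.0, 0.1]

sim_ab = cosine(bill_a, bill_b)
sim_ac = cosine(bill_a, bill_c)
```

In production the Neo4j vector index performs this comparison; the math is identical.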

How Bills Are Classified

Bills flow through a 5-stage pipeline that combines official government taxonomy with semantic search and quality validation.

1

Official Data Ingestion

Bill text, metadata, and legislative subjects are downloaded from GovInfo. Executive actions come from the Federal Register API. Data is synced daily via sitemaps to capture new and amended legislation.

12,300+ bills from the 119th Congress across all 8 bill types (H.R., S., H.J.Res, S.J.Res, H.Con.Res, S.Con.Res, H.Res, S.Res).

2

CRS Classification

Each bill is classified using its Congressional Research Service (CRS) policy area -- the same taxonomy used by the Library of Congress. CRS policy areas are mapped to our 42 tracked issues through a hand-curated configuration with disambiguation rules for overlapping areas.

33 CRS policy areas mapped to 11 themes and 42 issues. When a CRS area covers multiple issues (e.g., "Crime and Law Enforcement" spans gun rights, public safety, and criminal justice), legislative subjects and title keywords disambiguate the classification.
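The disambiguation step can be sketched as a lookup with keyword rules. The mapping below is a hypothetical slice, not the real hand-curated configuration; issue slugs and keyword lists are invented for illustration.

```python
# Hypothetical fragment of the CRS -> issue mapping with disambiguation rules.
CRS_MAP = {
    "Crime and Law Enforcement": {
        "default": "public-safety",
        "keywords": {
            "gun-rights": ("firearm", "ammunition", "second amendment"),
            "criminal-justice": ("sentencing", "parole", "incarceration"),
        },
    },
}

def classify(policy_area, title):
    """Pick an issue for a bill: keyword match first, CRS default otherwise."""
    rule = CRS_MAP.get(policy_area)
    if rule is None:
        return None
    lowered = title.lower()
    for issue, terms in rule["keywords"].items():
        if any(term in lowered for term in terms):
            return issue
    return rule["default"]

issue = classify("Crime and Law Enforcement",
                 "A bill to reform federal sentencing guidelines")
```

Title keywords pull the bill into the more specific issue; bills with no matching keyword fall back to the CRS area's default.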

3

Semantic Search & Ranking

Bill text is chunked and embedded using sentence-transformers (all-MiniLM-L6-v2). For each issue, a hybrid search combines vector similarity with keyword matching, scoped to bills in the relevant CRS policy areas.

384-dimensional embeddings, top-100 chunk retrieval per issue, keyword boosting for domain-specific terms.
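The hybrid scoring can be sketched as a vector-similarity score plus a flat boost per matched domain keyword. The boost weight and example terms below are assumptions, not the production values.

```python
def hybrid_score(vector_sim, text, boost_terms, boost=0.1):
    """Combine cosine similarity with a keyword boost (weights are illustrative)."""
    lowered = text.lower()
    hits = sum(1 for term in boost_terms if term in lowered)
    return vector_sim + boost * hits

# Two chunks with near-equal vector similarity; keywords break the tie.
chunks = [
    ("Prohibits new drilling leases on federal land", 0.62),
    ("Amends the tax code for small businesses", 0.64),
]
terms = ("drilling", "federal land", "leases")
ranked = sorted(chunks, key=lambda c: hybrid_score(c[1], c[0], terms),
                reverse=True)
```

Keyword boosting lets domain-specific phrasing outrank a chunk that is only generically similar in embedding space.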

4

Heat Score & Coverage Bypass

A heat score (0-15) measures each bill's legislative momentum -- factoring in cosponsor count, committee advancement, floor votes, and enactment. High-heat bills that semantic search missed are injected into results, ensuring legislatively important bills are never overlooked.

4-stage retrieval: CRS graph scoping, vector search, heat-score bypass (CRS-scoped), cross-CRS keyword bypass. Simple resolutions (commemorative/symbolic) are excluded from bypass to maintain precision.
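A heat score combining the factors named above might look like the sketch below. The point weights are invented for illustration; the real internal weights may differ.

```python
def heat_score(cosponsors, advanced_committee, floor_votes, enacted):
    """Toy 0-15 momentum score; weights are illustrative, not the real formula."""
    score = min(cosponsors // 20, 5)          # up to 5 points for cosponsor breadth
    score += 3 if advanced_committee else 0   # committee advancement
    score += min(floor_votes * 2, 4)          # up to 4 points for floor activity
    score += 3 if enacted else 0              # enactment
    return min(score, 15)

hot = heat_score(cosponsors=120, advanced_committee=True,
                 floor_votes=2, enacted=True)
quiet = heat_score(cosponsors=3, advanced_committee=False,
                   floor_votes=0, enacted=False)
```

A high-heat bill like `hot` is injected into issue results even if semantic search missed it; a `quiet` bill never triggers the bypass.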

5

Validation Against Golden Sets

Retrieval quality is measured against curated golden sets -- hand-verified lists of bills that must appear (recall) and must not appear (precision) for each issue. This provides objective, reproducible quality metrics.

42 golden sets with 5-18 must-find bills each. Overall: 94% recall, 92% rejection rate. 23 issues achieve 100% recall.
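The two metrics reduce to set arithmetic: recall is the fraction of must-find bills retrieved, and the rejection rate is the fraction of must-reject bills correctly excluded. A minimal sketch with hypothetical bill IDs:

```python
def golden_metrics(retrieved, must_find, must_reject):
    """Recall over must-find bills, rejection rate over must-reject bills."""
    retrieved = set(retrieved)
    recall = len(retrieved & set(must_find)) / len(must_find)
    rejection = len(set(must_reject) - retrieved) / len(must_reject)
    return recall, rejection

recall, rejection = golden_metrics(
    retrieved=["hr1", "hr2", "s9"],
    must_find=["hr1", "hr2"],      # hand-verified: must appear for this issue
    must_reject=["s9", "s10"],     # hand-verified: must not appear
)
```

Here both must-find bills were retrieved (recall 1.0), but one must-reject bill leaked through (rejection 0.5), which is exactly the kind of regression the golden sets catch.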

Neutrality

All generated text follows a strict neutrality policy: numbers and comparisons only, never judgment adjectives. We do not say a bill is "good" or "bad" -- we show what it does and let you decide.

• We don't editorialize -- bills are classified by CRS taxonomy, not editorial judgment
• We don't predict outcomes -- heat scores measure momentum, not probability of passage
• We don't use partisan data sources -- all data comes from official government repositories
• We don't hide our methodology -- quality metrics are measured and disclosed on this page

How We Order Things

The order in which candidates and issues appear on screen can imply preference or importance, even unintentionally. We take this seriously. Here's how we handle it.

Candidates

Candidate lists are randomized per session. Each time you visit, candidates appear in a different order. No candidate gets persistent top placement. We use a session-based seed so the order stays consistent while you browse (no jarring re-shuffles), but changes when you return later.
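The session-seeded shuffle described above can be sketched with a seeded random generator: the same session ID always yields the same order, while a new session reshuffles. Function and variable names are illustrative.

```python
import random

def session_order(candidates, session_id):
    """Deterministic per-session shuffle: stable while browsing, new on return."""
    rng = random.Random(session_id)  # seed derived from the visitor's session
    shuffled = list(candidates)
    rng.shuffle(shuffled)
    return shuffled

names = ["Alvarez", "Baker", "Chen", "Diaz"]
first = session_order(names, "sess-abc123")
again = session_order(names, "sess-abc123")  # same session -> identical order
other = session_order(names, "sess-xyz789")  # new session -> reshuffled
```

Because the shuffle is a permutation, every candidate always appears exactly once; only the order changes between sessions.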

Issues

Issue lists are ordered by Congressional activity — how many bills and votes are happening on that topic right now. If you've selected demographics (like "renter" or "parent"), issues that affect people like you are shown first. We never editorially pick which issues are more important.

What We Don't Do

We don't sort candidates by party, fundraising, poll numbers, or incumbency status. We don't prioritize issues by controversy or newsworthiness. We don't use engagement metrics or clicks to reorder content. Every ordering decision is either randomized, data-driven, or personalized by your own selections.

Types of Evidence

When we show what a candidate has done on an issue, each piece of evidence is labeled by its source type. Not all evidence carries the same weight — a floor vote is an official action, while a campaign statement is a promise. We show you the difference so you can judge for yourself.

Source Type | What It Means | Strength
Floor Vote | The candidate voted YES or NO on a bill in the Senate or House. This is an official, recorded action. | Official record
Bill Sponsored | The candidate sponsored or co-sponsored a bill. Sponsorship indicates active support for the legislation. | Official record
Campaign Site | A statement from the candidate's official campaign website, extracted verbatim. | Candidate's own words
Social Media | A post from the candidate's verified social media accounts (Bluesky, YouTube, press releases). | Candidate's own words

When we have no evidence for a candidate on a topic, we say so explicitly: "No public record found." This does not mean the candidate has no position — it means we haven't found evidence in our sources yet.

How Suggested Topics Are Generated

When you see suggested topics on a race page, those are generated from what's actually happening in Congress — issues with the most bill introductions, committee activity, and floor votes in the current session.

We display topics as neutral labels (e.g., "Immigration" not "Border crisis") to avoid framing bias. The labels come from our standardized issue taxonomy, not editorial choices.

If you've selected demographics, suggested topics are re-ranked to show issues that affect people like you first. This changes the order, not the content — you can still see all topics by scrolling.
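The re-ranking described above amounts to a stable sort: topics tagged with the visitor's demographic move to the front, everything else keeps its activity-based order, and nothing is dropped. The topic shape and tags below are invented for illustration.

```python
def rerank(topics, selected_demo):
    """Move topics affecting the visitor's demographic first; stable sort
    preserves the activity-based order within each group, drops nothing."""
    return sorted(topics,
                  key=lambda t: 0 if selected_demo in t["affects"] else 1)

# Topics in their default activity order, with hypothetical demographic tags.
topics = [
    {"name": "Taxes", "affects": {"parent", "retiree"}},
    {"name": "Housing", "affects": {"renter"}},
    {"name": "Childcare", "affects": {"parent"}},
]
ordered = rerank(topics, "renter")
```

For a visitor who selected "renter", Housing moves first while Taxes and Childcare keep their relative order behind it.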

Known Limitations

  • Bill gap (Jan 2 - Apr 8, 2026): Bills introduced between January 2 and April 8, 2026 are being backfilled due to a sync pipeline issue. Everything before January 2 is complete.
  • Member bios: Not yet populated for all members. Coming soon.
  • Graph features: "Bridge senators" and "voting blocs" are in development. Current versions have methodological issues and are not publicly surfaced.
  • X/Twitter integration: Not yet available due to API cost constraints. Bluesky, YouTube, and Senate press releases are covered.
  • State legislatures: Federal only. State coverage is on the roadmap.

How to Report a Data Issue

If you find incorrect data -- a wrong vote count, a misclassified bill, a broken link -- please open an issue on our GitHub repository or email [email protected]. We take data accuracy seriously and will investigate every report.