How We Collect & Analyze Tech Job Data
Our methodology for building a comprehensive longitudinal dataset of tech hiring trends from Hacker News job postings.
Data Source
Remote Inclusive's primary data source is the monthly "Who is Hiring?" threads on Hacker News (HN). These threads, posted on the first of every month, have been a staple of the tech hiring ecosystem since 2011. Each top-level comment represents a single job posting from a company.
We analyze every individual posting from January 2020 onward — covering 75+ consecutive months and more than 36,000 job observations. This makes our dataset one of the most comprehensive longitudinal views of tech hiring sentiment available.
The HN threads skew toward startups, growth-stage companies, and tech-first employers — exactly the segment of the market most responsive to remote work, salary transparency, and AI-driven changes. This makes the dataset particularly valuable for tracking leading indicators of broader tech hiring trends.
Classification Methods
Each job observation is automatically classified across multiple dimensions:
Remote Classification
Each posting is classified as remote, hybrid, or onsite by keyword and pattern matching, using the strongest signal present. Signals include "remote", "work from home", "WFH", "distributed", "hybrid", and geographic flexibility indicators.
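A minimal sketch of this kind of keyword classification follows. The pattern names, the exact keyword list, and the precedence rule (a "hybrid" mention outranks a "remote" mention) are assumptions made for illustration, not the production rules.

```python
import re

# Hypothetical patterns; the real keyword list is broader than this.
HYBRID = re.compile(r"\bhybrid\b", re.IGNORECASE)
REMOTE = re.compile(r"\b(?:remote|wfh|work from home|distributed)\b", re.IGNORECASE)


def classify_work_mode(text: str) -> str:
    """Return 'remote', 'hybrid', or 'onsite' for one posting."""
    # Assumption: an explicit hybrid mention is treated as the stronger signal.
    if HYBRID.search(text):
        return "hybrid"
    if REMOTE.search(text):
        return "remote"
    # No remote or hybrid signal found, so default to onsite.
    return "onsite"
```

For example, `classify_work_mode("Acme | Senior Engineer | Remote (US)")` returns `"remote"`. A sketch this simple also misfires: "distributed systems" or "no remote" would both be read as remote signals, which is the kind of false positive noted under Limitations.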
Salary Detection
We detect dollar patterns ($NNk, $NNN,NNN), salary/compensation range mentions, total compensation (TC) references, and hourly rate disclosures. A posting is marked as disclosing salary if any of these patterns is found. Detected salary ranges are extracted and normalized to annual figures.
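The dollar-pattern part of this step can be sketched as below. The regular expression, the 2,080 hours-per-year conversion, and the plausibility bounds are all assumptions chosen for the example; they are not the values used in the actual pipeline.

```python
import re

AMOUNT = re.compile(
    r"\$\s?(\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?)"  # $150,000 or $120 or $62.50
    r"\s?(k\b)?"                                  # optional thousands suffix
    r"(\s?(?:/|per\s)\s?(?:hr|hour))?",           # optional hourly marker
    re.IGNORECASE,
)

HOURS_PER_YEAR = 2080     # assumption: 40 hours/week * 52 weeks
MIN_ANNUAL = 20_000       # assumption: drop implausibly small amounts
MAX_ANNUAL = 1_000_000    # assumption: drop implausibly large amounts


def extract_annual_salaries(text: str) -> list[int]:
    """Return plausible annualized dollar figures found in a posting."""
    results = []
    for match in AMOUNT.finditer(text):
        value = float(match.group(1).replace(",", ""))
        if match.group(2):          # "k" suffix: thousands
            value *= 1000
        if match.group(3):          # hourly rate: convert to annual
            value *= HOURS_PER_YEAR
        if MIN_ANNUAL <= value <= MAX_ANNUAL:
            results.append(round(value))
    return results


def discloses_salary(text: str) -> bool:
    """True if at least one plausible salary figure was detected."""
    return bool(extract_annual_salaries(text))
```

The plausibility bounds are there to filter out dollar amounts that are not salaries, such as "$20M" in a funding announcement. The sketch does not handle ranges written with a single dollar sign ("$120-160k") or non-dollar currencies.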
Entry-Level Detection
Keywords include: junior, entry-level, new grad, recent grad, 0–2 years experience, jr., intern, and internship. Any match flags the posting as entry-level.
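As an illustration, the keyword list above could be matched with a single case-insensitive pattern. The pattern below is a sketch under that assumption; the word boundaries and plural handling are choices made for the example.

```python
import re

# Word boundaries keep "intern" from matching "internal" or "international".
ENTRY_LEVEL = re.compile(
    r"\b(?:junior|jr|entry[- ]level|new grads?|recent grads?"
    r"|internships?|interns?)\b"
    r"|\b0\s?[-–]\s?2\s+years",   # "0-2 years" with a hyphen or en dash
    re.IGNORECASE,
)


def is_entry_level(text: str) -> bool:
    """True if any entry-level keyword appears in the posting."""
    return ENTRY_LEVEL.search(text) is not None
```

Because any single match flags the posting, a phrase such as "mentor junior engineers" in a senior role would also be flagged here; that trade-off comes with plain keyword matching.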
Company Resolution
Company names are normalized and deduplicated across months. Aliases, subsidiaries, and name variations are resolved to canonical company entities, enabling longitudinal tracking per company.
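One simple way to approach this is to reduce each name to a normalized key and then look that key up in an alias table. The sketch below assumes that approach; the suffix list and the `ALIASES` entries are hypothetical placeholders, and real alias and subsidiary resolution would need a curated mapping.

```python
import re

# Hypothetical list of legal suffixes to strip.
LEGAL_SUFFIXES = re.compile(r"\b(?:inc|llc|ltd|corp|gmbh)\b\.?", re.IGNORECASE)

# Hypothetical alias table: normalized key -> canonical key.
ALIASES = {"acme labs": "acme"}


def canonical_company(name: str) -> str:
    """Map a raw company name to a canonical lowercase key."""
    key = name.lower()
    key = re.sub(r"\(.*?\)", " ", key)             # drop "(YC S21)" style notes
    key = LEGAL_SUFFIXES.sub(" ", key)             # drop "Inc.", "LLC", ...
    key = re.sub(r"[^a-z0-9]+", " ", key).strip()  # collapse punctuation/spaces
    return ALIASES.get(key, key)
```

With this sketch, "Acme, Inc.", "ACME Inc", "Acme (YC S21)", and "Acme Labs" all resolve to the key `"acme"`, so postings under any of those spellings can be tracked as one company across months.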
Aggregate Metrics
Beyond per-posting classification, we compute aggregate metrics at both the market and company level:
- Ghost Rate — Percentage of listings showing signs of being stale or inactive (90-day window)
- Repost Rate — Percentage of postings that are repeated from prior months without significant changes (30-day window)
- Stale Rate — Percentage of active postings that have been open beyond typical filling time (30-day window)
- Remote Share — Percentage of a company's active jobs classified as remote
- Compensation Coverage Rate — Percentage of postings that include salary information
- Job Velocity — Rate of new job postings over 7-day and 30-day periods
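Three of the metrics above (Remote Share, Compensation Coverage Rate, and Job Velocity) can be sketched from already-classified postings. The `Posting` record and function names are hypothetical, and Job Velocity is interpreted here as a simple count of postings inside the window, which is an assumption about how the rate is expressed.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Posting:
    company: str
    posted_on: date
    work_mode: str     # "remote", "hybrid", or "onsite"
    has_salary: bool


def share(postings, predicate) -> float:
    """Percentage of postings satisfying predicate; 0.0 for an empty list."""
    if not postings:
        return 0.0
    return 100.0 * sum(1 for p in postings if predicate(p)) / len(postings)


def remote_share(postings) -> float:
    return share(postings, lambda p: p.work_mode == "remote")


def compensation_coverage_rate(postings) -> float:
    return share(postings, lambda p: p.has_salary)


def job_velocity(postings, as_of: date, window_days: int) -> int:
    """Count postings dated within the window_days ending on as_of."""
    start = as_of - timedelta(days=window_days)
    return sum(1 for p in postings if start < p.posted_on <= as_of)
```

Passing one company's postings gives the company-level figure; passing all postings gives the market-level figure. The ghost, repost, and stale rates need rules for deciding when a listing counts as inactive or repeated, which this sketch does not attempt.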
Limitations
No dataset is perfect. Key limitations to keep in mind:
- HN threads skew toward startups and tech-first companies — large enterprises are underrepresented
- The data reflects job postings (hiring intent), not actual hires or employment
- Keyword-based classification has inherent false-positive and false-negative rates
- Salary data reflects disclosed ranges only — actual compensation may differ
- Monthly thread timing creates a consistent but discrete sampling cadence
Frequently Asked Questions
Where does Remote Inclusive get its data?
Our primary data source is the monthly "Who is Hiring?" threads on Hacker News, which have been posted on the first of every month since 2011. We analyze every individual job posting (top-level comment) in these threads from January 2020 onward — over 36,000 job observations across 75+ months.
How do you detect remote jobs?
We use keyword and pattern matching to identify remote job postings. This includes explicit mentions of "remote", "work from home", "WFH", "distributed", and geographic flexibility indicators. Postings are classified as remote, hybrid, or onsite based on the strongest signal present.
What is ghost rate and how is it calculated?
Ghost rate measures the percentage of a company's job postings that appear to be inactive or "ghost" listings. It is calculated over a 90-day window as the share of postings that show signs of being stale, are frequently reposted without changes, or otherwise do not appear to be genuinely open positions.
How do you detect salary information in postings?
Salary detection uses pattern matching for: dollar amount patterns ($NNk, $NNN,NNN), explicit salary/compensation/pay range mentions, total compensation (TC) references, and hourly rate disclosures. A posting is marked as disclosing salary if any of these patterns are found.
How often is the data updated?
New HN "Who is Hiring?" threads are published on the first of each month and ingested shortly after. Company aggregate metrics (ghost rate, compensation coverage rate, remote share) are recomputed daily. Weekly market reports are generated every Monday.
How do you identify entry-level positions?
Entry-level detection looks for mentions of: junior, entry-level, new grad, recent grad, 0–2 years experience, jr., intern, and internship. Postings containing any of these keywords are classified as entry-level opportunities.