February 2026
We built a dataset of 338 Hacker News (HN) discussion threads covering 57 IPOs (2011–2024) and tested whether this technically sophisticated crowd can predict stock performance. The direction of HN sentiment—bullish vs. bearish—predicts nothing. But companies where the crowd reaches clear consensus—whether bullish or bearish—underperform companies that generate mixed reactions. The 28 consensus-labeled IPOs trail the 29 mixed-labeled ones by 27pp at one year (p = 0.033) and 60pp at two years (p = 0.009; permutation p = 0.008; multiple-testing adjusted p = 0.063). The consensus group is 86% bearish; a bearish-only test confirms the two-year result (p = 0.010). The finding is ordinal—it depends on the LLM’s categorical label, not the numeric score—and exploratory.
We used a multi-pass LLM pipeline to go from raw HN threads to structured sentiment:
Collect. Searched Algolia for IPO/S-1 discussions → 4,838 stories.
Filter. Claude Haiku 4.5 classified story relevance → 596 IPO stories.
Extract. Fetched 48K comments; Haiku tagged each as financially insightful or noise.
Score. Claude Sonnet 4.5 read all insightful comments per thread and rated sentiment on six dimensions (growth, profitability, moat, valuation, market opportunity, risk), producing a composite score on [−1, +1] via s = (d̄ − 5.5)/4.5 → 338 scored threads.
Match. Merged with stock prices (Yahoo Finance); excess returns = stock buy-and-hold return minus S&P 500 return over the same calendar window, from IPO-date close → 57 companies with full data.
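The scoring and matching steps reduce to two arithmetic transforms. A minimal sketch (function names and inputs are illustrative, not the actual pipeline code; dimension scores are assumed to be on a 1–10 scale, which is consistent with the s = (d̄ − 5.5)/4.5 mapping):

```python
# Illustrative sketch of the scoring (step 4) and return-matching (step 5)
# arithmetic. Names are hypothetical; dimension scores are assumed 1-10.

def composite_score(dims):
    """Map the mean of the six 1-10 dimension scores onto [-1, +1]."""
    d_bar = sum(dims) / len(dims)
    return (d_bar - 5.5) / 4.5

def excess_return(stock_bh_return, spx_bh_return):
    """Buy-and-hold excess return vs. the S&P 500 over the same window."""
    return stock_bh_return - spx_bh_return

# Uniformly bearish scores of 3 across all six dimensions map to about -0.56
print(composite_score([3, 3, 3, 3, 3, 3]))
```

All-10 scores map to exactly +1 and all-1 scores to exactly −1, so the composite uses the full [−1, +1] range.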
The two-pass design matters: Haiku is cheap and fast enough to filter tens of thousands of comments; Sonnet handles the harder synthesis task. This approach handles sarcasm and technical jargon far better than keyword counting.
58% of threads are classified bearish. Only 3% are bullish. The mean company-level composite sentiment score is −0.20. HN commenters are, on average, deeply skeptical of companies going public—consistent with the tech community’s tendency to view IPOs as insider liquidity events.
Does it matter whether HN is bullish or bearish? No.
| Horizon | Spearman ρ | p | n |
|---|---|---|---|
| 30-day | 0.16 | 0.25 | 57 |
| 90-day | 0.04 | 0.79 | 57 |
| 180-day | 0.06 | 0.68 | 57 |
| 1-year | 0.05 | 0.69 | 57 |
| 2-year | 0.11 | 0.40 | 57 |
An OLS regression of 1-year excess returns on sentiment yields R² < 0.001. Logistic regression predicting whether a company beats the index: pseudo-R² = 0.011. The directional signal is dead.
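The horizon-by-horizon tests above are standard rank and linear correlations. A minimal SciPy sketch (the arrays are simulated placeholders for the 57 company-level sentiment scores and excess returns, not the study's data):

```python
import numpy as np
from scipy import stats

# Placeholder data standing in for the 57 company-level observations
rng = np.random.default_rng(0)
sentiment = rng.uniform(-1.0, 0.2, 57)   # composite scores, skewed bearish
excess_ret = rng.normal(0.0, 0.4, 57)    # 1-yr excess returns vs. S&P 500

# Spearman rank correlation, computed once per return horizon
rho, p = stats.spearmanr(sentiment, excess_ret)

# OLS of excess returns on sentiment; R^2 near zero means no linear signal
slope, intercept, r, p_ols, stderr = stats.linregress(sentiment, excess_ret)
r_squared = r ** 2
```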
We classify each company by its dominant LLM sentiment label—bearish or bullish (“consensus”) vs. mixed (“ambiguous”). In practice, all 29 ambiguous companies have dominant label “mixed”; no company has “neutral” as its dominant label. Both consensus extremes underperform the mixed middle:
| Label | n | Mean 1-yr | Med. 1-yr | Beat Mkt |
|---|---|---|---|---|
| Bullish | 4 | −22% | −44% | 25% |
| Mixed | 29 | +15% | −5% | 45% |
| Bearish | 24 | −11% | −32% | 29% |
Mann-Whitney tests comparing the 29 mixed-labeled companies against the 28 consensus-labeled ones:
| Horizon | Mean Spread (Mixed − Consensus) | p | n |
|---|---|---|---|
| 1-year | +27pp | 0.033 | 29 vs. 28 |
| 2-year | +60pp | 0.009 | 29 vs. 28 |
This finding is ordinal: it depends on the LLM’s categorical judgment, not the numeric score. The continuous Spearman correlation between |sᵢ| and returns is non-significant (ρ = −0.07, p = 0.62 at 2 years) under six-dimensional scoring. The signal lives in the qualitative distinction between consensus and ambiguity, not in the precise numeric magnitude.
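The group comparison is a two-sample rank test. A sketch with SciPy (returns are simulated placeholders, not the study's data):

```python
import numpy as np
from scipy import stats

# Placeholder excess returns for the 29 mixed- and 28 consensus-labeled IPOs
rng = np.random.default_rng(1)
mixed_ret = rng.normal(0.15, 0.6, 29)
consensus_ret = rng.normal(-0.12, 0.4, 28)

# Mann-Whitney U: do the two return distributions differ?
u_stat, p_value = stats.mannwhitneyu(mixed_ret, consensus_ret,
                                     alternative="two-sided")
```

A one-sided `alternative="greater"` would correspond to the directional hypothesis that mixed-labeled IPOs outperform.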
Composition. The consensus group is 24 bearish + 4 bullish. Testing bearish-only (n = 24) vs. mixed (n = 29): p = 0.051 at 1yr, p = 0.010 at 2yr. The 4 bullish companies all underperform (mean 1yr: −22%) but are too few for standalone inference. The consensus penalty is most defensibly a bearish-consensus effect at 1 year and a general consensus effect at 2 years.
Permutation placebo. We shuffle the mapping between labels and returns 100,000 times. The observed U = 556 (2yr) exceeds 99.2% of the null distribution (emp. p = 0.008). The result strengthens when restricted to the 27 multi-story companies (p = 0.014).
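The permutation placebo can be sketched as follows: hold the returns fixed, reshuffle which companies carry the “mixed” label, and recompute the U statistic each time to build an empirical null (simulated placeholder data; the paper runs 100,000 shuffles):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
returns = rng.normal(0.0, 0.5, 57)       # placeholder 2-yr excess returns
labels = np.array([1] * 29 + [0] * 28)   # 1 = mixed, 0 = consensus

# U statistic for the mixed group, derived from its rank sum
ranks = stats.rankdata(returns)
n1 = 29

def u_for(lab):
    return ranks[lab == 1].sum() - n1 * (n1 + 1) / 2

observed_u = u_for(labels)

n_perm = 20_000  # the paper uses 100,000
null_u = np.array([u_for(rng.permutation(labels)) for _ in range(n_perm)])
emp_p = (null_u >= observed_u).mean()  # one-sided empirical p-value
```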
Not just means. The 1-year mean spread (+27pp) is sensitive to Peloton (+269%). But the median spread—+28pp at 1yr, +47pp at 2yr—barely moves when you remove the top 3 winners (+26pp, +40pp). At 2 years, the consensus group’s 75th percentile (−22%) is below the mixed group’s median (−15%): almost the entire distribution shifts. Only 18% of consensus-labeled companies beat the market at 2 years, vs. 48% of mixed.
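The distributional checks above are simple order statistics; a sketch with placeholder data:

```python
import numpy as np

rng = np.random.default_rng(3)
mixed = rng.normal(0.10, 0.8, 29)       # placeholder 2-yr excess returns
consensus = rng.normal(-0.40, 0.5, 28)

median_spread = np.median(mixed) - np.median(consensus)
q75_consensus = np.quantile(consensus, 0.75)  # compare to np.median(mixed)
beat_mkt_mixed = (mixed > 0).mean()           # share beating the index
beat_mkt_consensus = (consensus > 0).mean()
```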
A better framing than “consensus predicts underperformance”: disagreement predicts outcome dispersion and upside. The mixed group has wider return variance and higher mean returns. Three candidate mechanisms, not mutually exclusive:
Uncertainty pricing. IPOs where informed observers disagree represent genuinely unresolved uncertainty. Returns are bounded at −100% but unbounded above, so unresolved uncertainty carries positive skew. Miller (1977) predicts that disagreement among investors leads to overpricing, but our crowd isn’t setting prices—their disagreement may proxy for fundamental ambiguity that the market hasn’t fully priced.
Attention. Consensus-generating IPOs are, by definition, attention-grabbing. High-attention IPOs may be more efficiently priced, leaving less room for outperformance (Barber and Odean, 2008; Da, Engelberg, and Gao, 2011). Disagreement proxies for lower attention.
Narrative simplicity. Clear consensus correlates with simple stories. Companies with complex prospects—the ones that generate mixed reactions—may be where pricing errors persist.
n = 57 is small. The 2-year result (p = 0.009; permutation p = 0.008) is significant, but under Benjamini-Hochberg correction across 14 independent tests, the adjusted p = 0.063—borderline. The 1-year result (p = 0.033) does not survive correction. Bootstrap CIs for the mean difference include zero ([−0.07, +0.63] at 1yr; [−0.17, +1.33] at 2yr).
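The Benjamini-Hochberg step-up adjustment used above can be sketched in a few lines (a generic implementation; the paper applies it across its 14 tests):

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up, monotone-enforced)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from largest p to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Small p-values inflate roughly by m / rank; with m = 14 a raw p of 0.009
# can land near the 0.05 boundary, as in the 2-year result.
print(benjamini_hochberg([0.005, 0.01, 0.03, 0.04]))
```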
The consensus finding is exploratory—it emerged from the data, not a pre-registered hypothesis.
The consensus group is 86% bearish. The “consensus regardless of direction” claim rests on 4 bullish companies.
The sample is 86% tech companies. We can’t say anything about non-tech IPOs.
Sentiment labels come from Claude Sonnet 4.5. A different LLM could produce different labels and different results.
Requiring 1-year stock data excludes companies that were acquired or delisted quickly—precisely the extreme outcomes most relevant to our hypothesis.
This is not a trading signal.
We propose an out-of-sample test: freeze the pipeline, collect prospectively on new IPOs from March 2026, pre-register the hypothesis (mixed labels outperform consensus), and evaluate after 30+ new IPOs have 1-year return data.
Data & code: https://github.com/no-way-labs/
Antweiler, W. and Frank, M.Z. (2004). Is all that talk just noise? The Journal of Finance, 59(3):1259–1294.
Anthropic. (2025). Claude model family. Technical documentation.
Baker, M. and Wurgler, J. (2007). Investor sentiment in the stock market. Journal of Economic Perspectives, 21(2):129–152.
Barber, B.M. and Odean, T. (2008). All that glitters: The effect of attention and news on the buying behavior of individual and institutional investors. The Review of Financial Studies, 21(2):785–818.
Chen, H., De, P., Hu, Y.J., and Hwang, B.H. (2014). Wisdom of crowds: The value of stock opinions transmitted through social media. The Review of Financial Studies, 27(5):1367–1403.
Da, Z., Engelberg, J., and Gao, P. (2011). In search of attention. The Journal of Finance, 66(5):1461–1499.
Miller, E.M. (1977). Risk, uncertainty, and divergence of opinion. The Journal of Finance, 32(4):1151–1168.
Ritter, J.R. (1991). The long-run performance of initial public offerings. The Journal of Finance, 46(1):3–27.