Update · June 2026

Creative Work — the $7 Billion Missing Line Item

This year, an estimated $200 billion will be raised through IPOs by the world's largest developers of AI.

SpaceX prices the largest listing in history, $75 billion at a $1.77 trillion valuation, with xAI folded in since February (Fortune). Anthropic filed for its listing days after a funding round valued it at $965 billion (CNBC). OpenAI followed on 8 June, targeting up to $1 trillion (TechCrunch).

Every model these companies sell was trained on creative work. Books alone made up roughly 16% of GPT-3's training mix by sampling weight (Brown et al., 2020). Most of the rest came from the open web: Common Crawl, a public scrape of the internet, supplied more than 80% of the raw tokens used to train GPT-3 (Mozilla Foundation). Anthropic held more than seven million pirated books in its training library (NPR).

So what have creators been paid?

Anthropic has agreed to pay $1.5 billion to settle claims that it pirated around 500,000 books (The Guardian). It is the largest copyright settlement in history (Authors Guild). It works out at roughly $3,000 per book, paid once. Set against the company's $965 billion valuation, it amounts to roughly 0.16%, or less than twelve days of revenue at the current run rate (CNBC). Apply the settlement's own $3,000 per book to all seven million books in the library and the bill comes to $21 billion, fourteen times what was paid.
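The settlement arithmetic above can be reproduced in a few lines. This is a minimal sketch using only the figures quoted in the text; the variable names are illustrative, and the inputs are reported or estimated numbers rather than audited ones.

```python
# Figures as quoted in the text (reported or estimated, not audited).
SETTLEMENT_USD = 1.5e9      # Anthropic settlement
BOOKS_COVERED = 500_000     # books covered by the settlement
VALUATION_USD = 965e9       # valuation at the latest funding round
RUN_RATE_USD = 47e9         # annualised revenue run rate
LIBRARY_BOOKS = 7_000_000   # pirated books held in the training library

# Payment per book covered by the settlement.
per_book = SETTLEMENT_USD / BOOKS_COVERED

# Settlement as a fraction of the company's valuation.
share_of_valuation = SETTLEMENT_USD / VALUATION_USD

# Settlement expressed as days of revenue at the current run rate.
days_of_revenue = SETTLEMENT_USD / (RUN_RATE_USD / 365)

# Bill if the same per-book rate applied to the whole library.
full_library_bill = LIBRARY_BOOKS * per_book
multiple_of_settlement = full_library_bill / SETTLEMENT_USD

print(f"Per book:            ${per_book:,.0f}")
print(f"Share of valuation:  {share_of_valuation:.3%}")
print(f"Days of revenue:     {days_of_revenue:.1f}")
print(f"Full-library bill:   ${full_library_bill / 1e9:,.0f}bn")
print(f"Multiple of payment: {multiple_of_settlement:.0f}x")
```

Changing any single input, such as the run rate, shows how sensitive the "days of revenue" comparison is to the figure chosen.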

OpenAI's News Corp deal is reported at $250 million over five years (WSJ). The terms of its partnership with The Atlantic are undisclosed (OpenAI). Google pays Reddit a reported $60 million a year to train on posts that Reddit's users wrote (CBS News). One widely cited tracker puts the entire disclosed AI content licensing market at roughly $800 million a year, concentrated in a handful of large rights holders (Media & the Machine). The individual creator receives effectively nothing.

What should creators be paid?

Spotify pays around 70% of revenue to rights holders, because the content is the product (Spotify). A traditional publisher pays its author a royalty of 10 to 15% of the hardcover price (The Bindery Agency). American radio stations pay a few per cent of revenue for the songs they play. BMI's blanket licence alone runs at 2.2% of gross revenue (Music Business Worldwide).

AI sits between those examples. The output is transformed but the model is built from the work, and its output competes with creators in their own markets. A defensible band is 5 to 15% of revenue. Take 10% as the central case.

The only AI developer to propose a number is Mistral. In March its founders proposed a revenue-based levy on every AI provider operating in Europe, paid into a central cultural fund in exchange for legal certainty (FT), and the company put the rate at 1 to 1.5% of revenue (AFP).

OpenAI and Anthropic will turn over around $72 billion this year between them at current run rates: roughly $25 billion annualised at OpenAI (Reuters) and $47 billion at Anthropic (CNBC). A 10% royalty pool is $7.2 billion a year: nine times the entire current licensing market, and enough to fund the Anthropic settlement nearly five times over, every year. Even at 5%, the most conservative end of the band, the pool is $3.6 billion, more than four times what the whole industry pays today.
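The pool sizes across the 5 to 15% band can be checked with a short calculation. This is a sketch under the article's own assumptions: the run rates, the roughly $800 million licensing market and the $1.5 billion settlement are the figures quoted in the text, and `royalty_pool` is a hypothetical helper, not an established formula.

```python
# Figures as quoted in the text (annualised, reported or estimated).
OPENAI_RUN_RATE_USD = 25e9
ANTHROPIC_RUN_RATE_USD = 47e9
LICENSING_MARKET_USD = 0.8e9   # disclosed licensing market per year
SETTLEMENT_USD = 1.5e9         # Anthropic settlement, paid once


def royalty_pool(revenue: float, rate: float) -> float:
    """Annual pool if `rate` of `revenue` is set aside for creators."""
    return revenue * rate


combined = OPENAI_RUN_RATE_USD + ANTHROPIC_RUN_RATE_USD

# Low end, central case and high end of the band proposed in the text.
for rate in (0.05, 0.10, 0.15):
    pool = royalty_pool(combined, rate)
    print(
        f"{rate:.0%} royalty: ${pool / 1e9:.1f}bn a year, "
        f"{pool / LICENSING_MARKET_USD:.1f}x today's licensing market, "
        f"{pool / SETTLEMENT_USD:.1f}x the settlement"
    )
```

The central case is the 10% row; the 5% and 15% rows bound it.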

These valuations also price in growth. For the multiples to compress to a mature ten times revenue, combined revenue at xAI, OpenAI and Anthropic would have to reach $200 billion a year (assuming a combined valuation of $2 trillion for the three). At that point a 10% pool is $20 billion a year. For scale, ALCS, the UK's collecting society for writers, has distributed £750 million to its 130,000 members in total since 1977 (ALCS).
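The growth case follows from dividing the assumed valuation by the target multiple. A minimal sketch, assuming the article's $2 trillion combined valuation and a ten times revenue multiple; `implied_revenue` is a hypothetical helper name.

```python
def implied_revenue(valuation: float, multiple: float) -> float:
    """Annual revenue needed for `valuation` to equal `multiple` x revenue."""
    return valuation / multiple


ASSUMED_VALUATION_USD = 2e12   # the article's assumption for the three companies
MATURE_MULTIPLE = 10           # "a mature ten times revenue"
ROYALTY_RATE = 0.10            # central case of the 5 to 15% band

revenue = implied_revenue(ASSUMED_VALUATION_USD, MATURE_MULTIPLE)
pool = revenue * ROYALTY_RATE

print(f"Implied revenue: ${revenue / 1e9:,.0f}bn a year")
print(f"10% pool:        ${pool / 1e9:,.0f}bn a year")
```

Both inputs are assumptions, so the result is best read as an order of magnitude rather than a forecast.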

Who gets what?

A pool is only half of the system; the other half is the split. Treat each creative work as an input whose presence, weight and influence in a model can be measured. Attribution research can trace how much each piece of training data improved a model, and it has produced a striking finding: a well-written book makes a model measurably better even when the model never reproduces a line of it (Wang et al., 2024). A small amount of strong writing teaches a model more than a mountain of mediocre text, which is exactly why the labs went after books and news archives. A creator's share of the pool should be their work's measured share of that contribution.

Collecting societies solved this problem for music. They sample usage statistically and distribute revenue proportionally. AI needs the same rails at AI scale, paying creators at training for what is ingested and at inference for what is used. The European Parliament voted in March for itemised disclosure of every work used in training and fair remuneration for creators (European Parliament). The infrastructure to deliver either does not yet exist.
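The proportional split described in this section can be sketched as a pro-rata distribution. This is an illustration only: the work names and contribution scores below are hypothetical, the function `split_pool` is not drawn from any collecting society's rules, and the sketch does not implement an attribution method. It assumes scores have already been measured and are non-negative.

```python
def split_pool(pool: float, contributions: dict[str, float]) -> dict[str, float]:
    """Split `pool` pro rata by each work's measured contribution score."""
    if any(score < 0 for score in contributions.values()):
        raise ValueError("contribution scores must be non-negative")
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("total contribution must be positive")
    return {work: pool * score / total for work, score in contributions.items()}


# Hypothetical scores; real values would come from a data-attribution
# method applied to a trained model.
scores = {"novel_a": 6.0, "news_archive_b": 3.0, "forum_posts_c": 1.0}

# $7.2bn is the article's central-case annual pool.
payouts = split_pool(7.2e9, scores)

for work, amount in payouts.items():
    print(f"{work}: ${amount / 1e9:.2f}bn")
```

A real system would also need to decide how training-time and inference-time usage are weighted against each other, which this sketch leaves out.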

The Creator Economy for AI

Paying creators fairly, as part of a new creator economy for AI, is a virtuous circle. A functioning and flourishing creator economy is AI's supply chain.

AI rests on three pillars: compute, talent and data. Data has been treated as an externality, and this year's IPOs make that literal. Hundreds of billions will be raised for compute and infrastructure, while the creative work that the products are made from appears nowhere on the books.

Note: Please contact me directly if you dispute any of the figures or references in this work.

Anthropic's Claude was used for research, reference sourcing, calculations and drafting support.

References

  1. SpaceX IPO pricing (Fortune): fortune.com
  2. Anthropic IPO filing and revenue run rate (CNBC): cnbc.com
  3. OpenAI IPO filing (TechCrunch): techcrunch.com
  4. GPT-3 training data composition (Brown et al., 2020): arxiv.org/abs/2005.14165
  5. Common Crawl's share of training data (Mozilla Foundation, 2024): mozillafoundation.org
  6. Copyright protection by default (US Copyright Office): copyright.gov
  7. Reddit User Agreement (users retain copyright, grant licence): redditinc.com
  8. Google and Reddit licensing deal (CBS News): cbsnews.com
  9. Anthropic settlement (The Guardian): theguardian.com
  10. Settlement status and terms (Authors Guild): authorsguild.org
  11. Alsup ruling: more than seven million pirated books downloaded (NPR): npr.org
  12. Settlement as share of valuation and days of revenue (calculation): $1.5bn ÷ $965bn = 0.155%; $1.5bn ÷ ($47bn ÷ 365 days) = 11.6 days. Valuation and run rate per CNBC.
  13. Full-library bill (calculation): 7,000,000 books × $3,000 per book = $21bn; $21bn ÷ $1.5bn settlement = 14.
  14. OpenAI and News Corp (WSJ): wsj.com
  15. OpenAI and The Atlantic (OpenAI): openai.com
  16. Sora shutdown and Disney withdrawal (Variety): variety.com
  17. AI content licensing market tracker (Media & the Machine): mediaandthemachine.substack.com
  18. OpenAI revenue (Reuters): finance.yahoo.com
  19. Data attribution without memorisation (Wang et al., 2024): arxiv.org/abs/2406.11011
  20. Author royalty rates, traditional publishing (The Bindery Agency): thebinderyagency.com
  21. US radio music royalty rates, BMI blanket licence (Music Business Worldwide): musicbusinessworldwide.com
  22. Music streaming payouts (Spotify Loud & Clear): loudandclear.byspotify.com
  23. ALCS distributions since 1977: prnewswire.co.uk
  24. Mistral levy proposal (FT): ft.com
  25. Mistral levy rate of 1 to 1.5% (AFP, via Killeen Daily Herald): kdhnews.com
  26. European Parliament resolution on copyright and generative AI, 10 March 2026: europarl.europa.eu