This year, an estimated $200 billion will be raised through IPOs by the world's largest developers of AI.
SpaceX prices the largest listing in history, $75 billion at a $1.77 trillion valuation, with xAI folded in since February (Fortune). Anthropic filed for its listing days after a funding round valued it at $965 billion (CNBC). OpenAI followed on 8 June, targeting up to $1 trillion (TechCrunch).
Every model these companies sell was trained on creative work. Books alone made up roughly 16% of GPT-3's training mix (Brown et al., 2020), and the rest came from the open web, with Common Crawl, a public scrape of the internet, supplying more than 80% of the raw tokens used to train GPT-3 (Mozilla Foundation). Anthropic held more than seven million pirated books in its training library (NPR).
So what have creators been paid?
Anthropic has agreed to pay $1.5 billion to settle claims that it pirated around 500,000 books (The Guardian). It is the largest copyright settlement in history (Authors Guild). It works out at roughly $3,000 per book, paid once. Set against the company's valuation, it amounts to 0.15%, or less than twelve days of revenue at the current run rate (CNBC). Apply the settlement's own $3,000 per book to all seven million books in the library and the bill comes to $21 billion, fourteen times what was paid.
OpenAI's News Corp deal is reported at $250 million over five years (WSJ). The terms of its partnership with The Atlantic are undisclosed (OpenAI). Google pays Reddit a reported $60 million a year to train on posts that Reddit's users wrote (CBS News). One widely cited tracker puts the entire disclosed AI content licensing market at roughly $800 million a year, concentrated in a handful of large rights holders (Media & the Machine). The individual creator receives effectively nothing.
What should creators be paid?
Spotify pays around 70% of revenue to rights holders, because the content is the product (Spotify). A traditional publisher pays its author a royalty of 10 to 15% of the hardcover price (The Bindery Agency). American radio stations pay a few per cent of revenue for the songs they play. BMI's blanket licence alone runs at 2.2% of gross revenue (Music Business Worldwide).
AI sits between those examples. The output is transformed but the model is built from the work, and its output competes with creators in their own markets. A defensible band is 5 to 15% of revenue. Take 10% as the central case.
Note the only AI developer to propose a number is Mistral. In March its founders proposed a revenue-based levy on every AI provider operating in Europe, paid into a central cultural fund in exchange for legal certainty (FT), and the company put the rate at 1 to 1.5% of revenue (AFP).
OpenAI and Anthropic will turn over around $72 billion this year between them at current run rates. Roughly $25 billion annualised at OpenAI (Reuters) and $47 billion at Anthropic (CNBC). A 10% royalty pool is $7.2 billion a year: nine times the entire current licensing market, and enough to fund the Anthropic settlement nearly five times over, every year. Even at 5%, the most conservative end of the band, the pool is $3.6 billion, more than four times what the whole industry pays today.
These valuations also price in growth. For the multiples to compress to a mature ten times revenue, income across the three would have to reach $200 billion a year (assuming a $2 trillion valuation for xAI, OpenAI and Anthropic). At that point a 10% pool is $20 billion a year. For scale, ALCS, the UK's collecting society for writers, has distributed £750 million to its 130,000 members in total since 1977 (ALCS).
Who gets what?
A pool is only half of the system, the other half is the split. Treat each creative work as an input whose presence, weight and influence in a model can be measured. Attribution research can trace how much each piece of training data improved a model, and it has produced a striking finding: a well-written book makes a model measurably better even when the model never reproduces a line of it (Wang et al., 2024). A small amount of strong writing teaches a model more than a mountain of mediocre text, which is exactly why the labs went after books and news archives. A creator's share of the pool should be their work's measured share of that contribution.
Collecting societies solved this problem for music. They sample usage statistically and distribute revenue proportionally. AI needs the same rails at AI scale, paying creators at training for what is ingested and at inference for what is used. The European Parliament voted in March for itemised disclosure of every work used in training and fair remuneration for creators (European Parliament). The infrastructure to deliver either does not yet exist.
The Creator Economy for AI
Paying creators fairly, as part of a new creator economy for AI, is a virtuous circle. A functioning and flourishing creator economy is AI's supply chain.
AI rests on three pillars: compute, talent and data, and data has been treated as an externality. This year's IPOs make that literal. Hundreds of billions will be raised for compute and infrastructure, while the creative work that the products are made from appears nowhere on the books.
Note: Please contact me directly if you dispute any of the figures or references in this work.
Anthropic's Claude was used for research, reference sourcing, calculations and drafting support.