I Built a System That Reads the Fine Print of 969 Corporate Legal Documents. Here's What They Say.
I teach law and I practice it. One of the things that’s always bothered me is that companies can change the legal terms that bind their users — hundreds of millions of people — and nobody notices. The old version vanishes. No record, no diff, no way to prove what it used to say. Private lawmaking is everywhere, but it lacks the basic infrastructure we expect in public lawmaking — notice, stable publication, and an intelligible record of what changed and when.
So I built TOS Tracker. It monitors Terms of Service, Privacy Policies, and other legal documents from nearly a thousand companies, government agencies, and organizations. Every few hours it checks each one for changes, computes a SHA-256 hash for verification, archives each version to the Internet Archive, and extracts specific legal clauses for cross-company comparison.
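For the technically curious, the core loop is simple. Here is a minimal sketch in Python of how a tracker like this can detect and archive a change; the function name and control flow are illustrative, not TOS Tracker's actual code, and the Save Page Now call is simplified (the real endpoint has rate limits and an authenticated API).

```python
import hashlib

import requests

# Wayback Machine "Save Page Now" endpoint; a GET requests a fresh capture.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def check_document(url: str, last_hash: str | None) -> str | None:
    """Fetch a tracked document; return its SHA-256 hex digest if it changed."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    digest = hashlib.sha256(resp.content).hexdigest()
    if digest == last_hash:
        return None  # unchanged since the last check
    # New version: ask the Internet Archive to snapshot it before it vanishes.
    requests.get(SAVE_ENDPOINT + url, timeout=120)
    return digest
```

In practice you would hash a normalized extraction of the legal text rather than the raw HTML; otherwise rotating ad markup, timestamps, and session tokens would register as a "change" on every fetch.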
I’m now sitting on a dataset of 32 types of legal clauses extracted across 969 active documents. Here’s what’s actually in all that fine print.
What We’re Looking At
TOS Tracker watches:
- 684 Terms of Service from tech companies, banks, hospitals, government agencies, and more
- 241 Privacy Policies from the same organizations
- 44 other documents — EULAs, Acceptable Use Policies, DMCA policies, cookie policies, community guidelines
The coverage spans Big Tech, finance, healthcare, retail, telecom, gaming, government, and SaaS. And none of these documents are static: every organization in the database reserves the right to rewrite its terms whenever it wants, and 58 say so in explicit unilateral modification clauses (more on those below).
What “I Agree” Actually Means
When you click “I Agree,” you’re consenting to specific legal provisions — provisions that most people never read and wouldn’t understand if they did. Here’s what the clause extraction data shows.
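A quick note on method before the numbers. The extraction that produces these counts is automated; as a purely illustrative sketch (these patterns and names are mine, not the tracker's rules), even a naive keyword pass catches the loud clauses:

```python
import re

# Illustrative regexes for three of the 32 clause types tracked.
CLAUSE_PATTERNS = {
    "mandatory_arbitration": re.compile(
        r"(?:binding|mandatory)\s+arbitration|resolved\s+by\s+arbitration", re.I),
    "class_action_waiver": re.compile(
        r"class\s+action\s+waiver|not\s+bring\b.{0,60}\bclass\b", re.I | re.S),
    "ai_training": re.compile(
        r"train\b.{0,40}\b(?:AI|artificial intelligence|machine learning)\s+models?",
        re.I | re.S),
}


def extract_clauses(text: str, window: int = 280) -> dict[str, str]:
    """Return a snippet of surrounding text for each clause type detected."""
    hits = {}
    for clause, pattern in CLAUSE_PATTERNS.items():
        match = pattern.search(text)
        if match:
            start = max(match.start() - window // 3, 0)
            hits[clause] = text[start:start + window]
    return hits
```

A production system needs far more care than this (negations, opt-outs, and commitments not to do something all use the same vocabulary), but the shape of the problem is the same: find the signature language, pull the surrounding text, and line it up across companies.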
Mandatory Arbitration Is Everywhere
66 companies in the dataset require mandatory arbitration. If you have a dispute with them, you don’t get to go to court. You go to a private arbitrator — with limited discovery, no jury, and restricted appeal.
Some are at least upfront about it:
CVS: “THIS AGREEMENT INCLUDES AN ARBITRATION PROVISION, JURY TRIAL WAIVER, AND A CLASS ACTION WAIVER THAT AFFECT YOUR RIGHTS. IN ARBITRATION, THERE IS NO JUDGE OR JURY, AND THERE IS LESS DISCOVERY AND APPELLATE REVIEW THAN IN COURT.”
Cloudflare: “THIS AGREEMENT CONTAINS PROVISIONS REQUIRING THAT YOU AGREE TO THE USE OF ARBITRATION TO RESOLVE ANY DISPUTES ARISING UNDER THIS AGREEMENT RATHER THAN A JURY TRIAL OR ANY OTHER COURT PROCEEDINGS.”
Others bury it. Amazon puts it plainly enough — “Any dispute or claim relating in any way to your use of any Amazon Service will be resolved by binding arbitration, rather than in court” — but you’d have to scroll past thousands of words to find it. Bluesky at least puts a flag at the top: “Important Note: These Terms contain an agreement to resolve most disputes between us through arbitration.”
Sixty-six companies. Same basic clause. Billions of users bound by it.
32 Companies Block Class Actions
32 companies go further, adding explicit class action waivers. Even if a company does the same thing to millions of users, each user has to bring their claim alone.
The list includes Adobe, Microsoft, Netflix, Amazon, Discord, Dropbox, Epic Games, Etsy, Lyft, The New York Times, Coinbase, and CVS.
Dropbox is blunt about it:
“You may only resolve disputes with us on an individual basis, and may not bring a claim as a plaintiff or a class member in a class, consolidated, or representative action.”
Here’s one that caught my eye: Epic Games lets you opt out of arbitration within 30 days. But you cannot opt out of the class action waiver. Read that again. Even if you exercise your right to reject arbitration, you still can’t join a class action. The waiver survives.
The AI Training Land Grab
This is the one I’d tell my students to watch closely. A handful of major companies have written explicit AI training rights into their terms, and the approaches are all over the map.
Companies that say they’ll train on your stuff:
HubSpot doesn’t mince words:
“We may use Customer Data to develop, support, and improve HubSpot AI features and functionality. We may use Customer Data to train our AI models and similar products and services that rely on machine learning. You instruct us to use Customer Data to train HubSpot AI models.”
Catch that last sentence? “You instruct us.” By accepting their terms, you’re supposedly giving them an affirmative instruction to feed your data into their models.
Canva is training on your private uploads:
“We use content and media in user’s private accounts (such as photos, videos and audio) to train our models to apply machine learning to new unseen media.”
Meta frames it as a public good — using AI “so that people can use our Products safely regardless of physical ability or geographic location.” Their Instagram privacy policy is less diplomatic: “Develop and improve AI at Meta for Meta Products and for third parties.”
Snap and Shopify both describe using user data for machine learning, though with varying degrees of specificity about what that means.
Companies that say you can’t train on their stuff:
“You may not use the AI services, or data from the AI services, to create, train, or improve (directly or indirectly) any AI technology.”
“[You may not] develop any products or services that compete with our Services, including to develop or train any artificial intelligence or machine learning algorithms or models.”
So: they can train on your data. You can’t train on their outputs. I’ll leave it to the reader to decide how they feel about that arrangement.
One company worth noting: Zoom explicitly commits to not training on your content: “Zoom does not use any of your audio, video, chat, screen sharing, attachments or other communications-like Customer Content to train Zoom or third-party artificial intelligence models.” That’s a meaningful commitment in a landscape where most companies are going the other direction.
They Can Change Everything, Anytime
58 companies include unilateral modification clauses — they can rewrite their terms whenever they want, and your continued use counts as agreement.
AT&T’s is typical:
“AT&T may change or modify the Terms from time-to-time without notice other than posting the amended Terms on the Site. The amended Terms will automatically be effective when posted on our Site.”
No email. No notice. They post the new terms on a page nobody visits, and the changes take effect immediately. That's the standard approach across the industry.
This is exactly why I built the tracker. If every company reserves the right to change the rules at any time, somebody should be watching when they do.
Your Data Gets Around
187 companies have clauses about sharing your data with third parties. 110 address data sale or transfer. The language is worth reading carefully.
The American Bar Association — which represents lawyers — is at least honest about it:
“We may sell your general data and industry and business data with third parties to offer you products and services that may be of interest to you.”
Compare that with Anthropic:
“Anthropic does not ‘sell’ your personal data as that term is defined by applicable laws and regulations.”
See that phrase — “as that term is defined”? The legal definition of “sale” under the CCPA is specific enough that a lot of data-sharing arrangements don’t technically count. The word “sell” in a privacy policy and the word “sell” in ordinary English are doing different things.
Government Data Requests
22 companies spell out what happens when the government comes knocking. Dropbox makes one of the stronger commitments I’ve seen: “No matter how the Services change, we won’t share your content with others, including law enforcement, for any purpose unless you direct us to.”
Most companies just say they’ll comply with legal process. Few make affirmative promises about when they’ll push back.
California Runs the Table
Governing law clauses are dominated by California, followed by Washington (Amazon, Microsoft), New York (financial services), then Texas, Virginia, and Delaware. If you use the internet, there’s a good chance your legal rights are governed by California law, regardless of where you live.
Why I Built This
I got tired of having to take companies at their word about what their terms said last month. Every version in TOS Tracker gets a SHA-256 hash, an Internet Archive submission, and a permanent URL. If someone cites a specific version of Google’s privacy policy in a law review article, that citation should still work in ten years.
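Those hashes make the record checkable, not just archived. If a citation records a version's SHA-256, anyone can later re-derive it from the stored copy; a minimal sketch, assuming the hash was computed over the same canonical bytes that were archived (the function name is hypothetical):

```python
import hashlib

import requests


def verify_citation(archived_url: str, cited_sha256: str) -> bool:
    """Check an archived version against the hash a citation recorded."""
    body = requests.get(archived_url, timeout=30).content
    return hashlib.sha256(body).hexdigest() == cited_sha256
```

If the digests match, the document you are reading is byte-for-byte the one that was cited; if they do not, something changed, and that fact is itself evidence.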
I also wanted to see the patterns. It’s one thing to know that Amazon has an arbitration clause. It’s another to see that 66 companies use essentially the same language, or that a growing number of companies are writing AI training rights into their terms while simultaneously prohibiting you from doing the same thing with their outputs.
This data should be public and it should be free. So it is.
See It Yourself
- Compare clauses across companies — Arbitration, data sharing, AI training, and 29 other clause types, side by side
- Browse recent changes — A running feed of document changes as they happen
- Search everything — Full-text search with Boolean operators across every tracked document
- Trends — Aggregate data and patterns
- For researchers — Citation tools, methodology, and API access
If you work in this area — academic research, journalism, regulation, compliance — I’d like to hear from you. andrew@leahey.org.
All clause excerpts are verbatim from the companies’ own published legal documents, extracted by TOS Tracker’s automated clause analysis system. Data as of February 6, 2026. Nothing in this post is legal advice.