

They are. They record the data, stealing it. They search it (or characteristics of it), and reprint it (in whole or in part) upon request.
Viewing it as something creative, or as anything other than a glorified remixing machine, is the problem. It’s a search engine for creative works they’ve stolen and reproduce parts of.
They search the data-space of what they’re “trained” on (our content, the content of human beings), and reproduce statistically defined elements of it.
They’re search engines that have stolen what they’re “trained on”, and reproduce it as “results” (be that images or written text, it has to come from our collective data, data we created). It’s theft. It’s copyright fraud. Same as Google stealing books (which they had to be sued over digitizing, and enter into rights agreements over).
Searching and reproducing content they’ve already recorded (aka stolen without permission) is absolutely part of what they are. Part of what they do.
Don’t stan for them or pretend they’re creative, intelligent, or doing anything original.
The real lie is that it’s “training data”. It’s not. It’s the internet, and it’s not training - it’s theft, it’s stealing and copying (violating copyright). Digital stealing, and processing into a “data set”, a representation or repackaging of our original works.
“Data gathering” and “training data” are just what they’ve tricked you into calling it (just like they tried to trick people into calling it an “intelligence”).
It’s not data gathering, it’s stealing. It’s not training data, it’s our original work.
It’s not creating anything, it’s searching and selectively remixing the human creative work of the internet.