The Chinese Communist Party’s (CCP’s) national internet censor has announced that all AI-generated content will be required to carry labels that its audience can explicitly see or hear, as well as labels embedded in metadata. The Cyberspace Administration of China (CAC) just released the transcript of the media questions and answers (akin to an FAQ) on its Measures for the Identification of Artificial Intelligence Generated and Synthetic Content [machine translated]. We saw the first signs of this policy move last September, when the CAC’s draft plans emerged.
This regulation takes effect on September 1, 2025, and will compel all service providers (i.e., generative AI services) to “add explicit labels to generated and synthesized content.” The directive includes all types of data: text, images, videos, audio, and even virtual scenes. It also orders app stores to verify that the apps they host follow the regulations.
Users will still be able to request unlabeled AI-generated content for “social concerns and industrial needs.” However, the generating app must reiterate the labeling requirement to the user and log the request to make the content easier to trace. The responsibility for adding the AI-generated label and metadata then falls on that end user, whether a person or an entity.
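For a concrete sense of the metadata half of the requirement, here is a minimal sketch of embedding a label in a PNG file with Pillow. The field names are hypothetical, chosen for illustration; the CAC measures define their own metadata schema, which isn’t reproduced here.

```python
from PIL import Image
from PIL.PngImagePlugin import PngInfo

# Hypothetical field names for illustration only; the CAC measures
# define their own metadata schema.
meta = PngInfo()
meta.add_text("AIGC-Label", "AI-generated")
meta.add_text("AIGC-Provider", "example-service")

img = Image.open("generated.png")
img.save("generated_labeled.png", pnginfo=meta)
```

Reading the file back (`Image.open("generated_labeled.png").text`) shows the embedded fields. Note that a simple re-encode or screenshot strips such metadata, which is part of why enforcing this on individuals is hard.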
…I’m…
In full agreement with this*
*with the provision that there are ways to ensure this isn’t weaponized, i.e., that dissident or oppositional speech/photos/art isn’t falsely flagged as AI so it can be filtered out.
Or that propaganda without the label is taken as legitimate.
So, in short, you disagree, which is reasonable given the circumstances.
Besides, wouldn’t it make much more sense to verify and mark genuine content rather than the slop that is becoming the majority of content?
I like that approach better. Just like I’d rather know what doesn’t cause cancer in the state of California at this point.
There are no ways of ensuring that. Wanting this is suicide for anyone but authoritarians.
Then it should not be done. I laid out my conditions for it not being terrible.
rare chinese government w
For all the human rights abuses, one has to admit that China is at least ruthlessly efficient.
Not really. Sure, China can push through unpopular decisions more easily than democracies can, but that makes them inefficient in other directions, e.g., building high-speed rail in areas where it isn’t needed while greatly lacking freight trains. Or their housing bubble.
China no doubt has its problems. It’s just crazy to think how fast the country has progressed in the last 50 years.
Not really rare these days when you compare them to America
deleted by creator
you don’t think this will be exploited to hurt people?
Meanwhile, the best we can do in America is hide tracking dots in every color printer.
Not a bad law if applied to companies and public figures. Complete wishful thinking if applied to individuals.
For companies it’s actually enforceable but for individuals it’s basically impossible and even if you do catch someone uploading AI-generated stuff: Who cares. It’s the intent that matters when it comes to individuals.
Were they trying to besmirch someone’s reputation by uploading false images of that person in compromising situations? That’s clear bad intent.
Were they trying to incite a riot or intentionally spreading disinformation? Again, clear bad intent.
Were they showing off something cool they made with AI generation? It is of no consequence and should be treated as such.
Would applying a watermark to all the training images force the AI to add a watermark?
Nope. In fact, if you generate a lot of images with AI you’ll sometimes notice something resembling a watermark in the output, demonstrating that the images used to train the model did indeed have watermarks.
Removing such imaginary watermarks is trivial in image2image tools though (it’s just a quick extra step after generation).
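For anyone curious, that “quick extra step” is roughly a low-strength image2image pass. A sketch with the diffusers library; the model id and strength value here are illustrative, not a recommendation:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Any Stable Diffusion checkpoint works; this model id is illustrative.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init = Image.open("generated_with_artifact.png").convert("RGB")

# Low strength regenerates fine detail (including phantom watermarks)
# while preserving the overall composition.
out = pipe("same scene, clean background, no watermark",
           image=init, strength=0.3).images[0]
out.save("cleaned.png")
```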
I meant having all training images with a watermark, not only some of them
Anyone praising this doesn’t understand that this requirement is basically impossible to meet and is merely posturing.
I’m a developer and I work a lot with LLM data. The only practical way to detect LLM text is through watermarks, where some words or expressions are statistically preferred over others. That means detection is only effective on large bodies of text that haven’t been modified further.
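Here’s the statistical idea as a toy sketch: a hash-based “green list,” loosely in the spirit of academic LLM watermarking proposals, not any specific deployed system.

```python
import hashlib

GREEN_FRACTION = 0.5  # share of the vocabulary "preferred" at each step (toy value)

def is_green(prev_token: str, token: str) -> bool:
    # Pseudo-randomly partition the vocabulary based on the previous token;
    # a watermarking generator would nudge sampling toward "green" tokens.
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] < 256 * GREEN_FRACTION

def green_rate(tokens: list[str]) -> float:
    # Plain text hovers near GREEN_FRACTION; watermarked text sits
    # measurably above it, but only over long, unedited spans.
    hits = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return hits / max(1, len(tokens) - 1)

print(green_rate("the quick brown fox jumps over the lazy dog".split()))
```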
If you take LLM content and remix it using traditional natural language processing, then it’s done: the content is indistinguishable and untraceable, and it takes like 50 lines of Python code and a few milliseconds of computing.
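A toy version of that remix step, with a hardcoded synonym table standing in for a real thesaurus or paraphrase model:

```python
import random
import re

# Tiny stand-in for a real thesaurus/paraphrase step (e.g., WordNet or spaCy).
SYNONYMS = {
    "required": ["obliged", "mandated"],
    "content": ["material", "output"],
    "generated": ["produced", "created"],
}

def remix(text: str, rate: float = 0.5) -> str:
    # Randomly swap known words for synonyms, perturbing exactly the
    # token statistics a watermark detector depends on.
    def swap(m: re.Match) -> str:
        options = SYNONYMS.get(m.group(0).lower())
        if options and random.random() < rate:
            return random.choice(options)
        return m.group(0)
    return re.sub(r"[A-Za-z]+", swap, text)

print(remix("All generated content is required to carry labels."))
```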
deleted by creator
100% that’s exactly what’s happening, and I can’t believe people are so blinded by AI-generator hate that they praise stuff like that.
It’s impossible because the text these LLM-based models produce would be impractical to watermark reliably.
Huh?
What about photos and video and audio?! Why are you asking?
What about photos, videos, and audio? You should check what the second L in LLM stands for before you go at it.
The directive includes all types of data: text, images, videos, audio, and even virtual scenes.
LLMs are only one aspect of this, but yeah, probably the most difficult to discern, at least at the moment.
You should read some of the content you’re commenting on before posting a critique.
They don’t want to pollute their training data.