The Chinese Communist Party’s (CCP’s) national internet censor just announced that all AI-generated content will be required to have labels that are explicitly seen or heard by its audience and embedded in metadata. The Cyberspace Administration of China (CAC) just released the transcript for the media questions and answers (akin to an FAQ) on its Measures for the Identification of Artificial Intelligence Generated and Synthetic Content [machine translated]. We saw the first signs of this policy move last September when the CAC’s draft plans emerged.

This regulation takes effect on September 1, 2025, and will compel all service providers (e.g., operators of AI LLMs) to “add explicit labels to generated and synthesized content.” The directive includes all types of data: text, images, videos, audio, and even virtual scenes. It also orders app stores to verify that the apps they host follow the regulations.

Users will still be able to ask for unlabeled AI-generated content for “social concerns and industrial needs.” However, the generating app must reiterate the labeling requirement to the user and also log the information to make it easier to trace. The responsibility for adding the AI-generated label and metadata then falls on that end user, whether a person or an entity.
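The requirements described above (a visible label, embedded metadata, and a logged opt-out) could be sketched roughly as follows. This is only an illustration under stated assumptions: `label_text`, `LabeledContent`, the label wording, and every metadata field name are hypothetical and do not come from the CAC measures. Whether metadata stays attached when a user opts out of the visible label is not clear from the article; the sketch keeps it, purely as an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical label wording; the regulation's required wording is not given in the article.
EXPLICIT_LABEL = "[AI-generated]"

# Hypothetical reminder shown to users who request unlabeled output.
REMINDER = "You are responsible for labeling this AI-generated content before sharing it."


@dataclass
class LabeledContent:
    body: str        # what the audience sees
    metadata: dict   # the embedded (implicit) label
    notice: str = "" # reminder text for the opt-out path


def label_text(text: str, provider: str, explicit: bool = True,
               audit_log: Optional[list] = None,
               user_id: Optional[str] = None) -> LabeledContent:
    """Attach a visible label and metadata to generated text (illustrative only)."""
    now = datetime.now(timezone.utc).isoformat()
    # Assumed metadata fields; not taken from any specification.
    metadata = {"ai_generated": True, "provider": provider, "generated_at": now}

    if explicit:
        return LabeledContent(body=f"{EXPLICIT_LABEL} {text}", metadata=metadata)

    # Opt-out path: the request must be traceable, so refuse if it cannot be logged.
    if audit_log is None or user_id is None:
        raise ValueError("unlabeled output requires an audit log and a user id")
    audit_log.append({"user_id": user_id, "requested_at": now, "unlabeled": True})
    return LabeledContent(body=text, metadata=metadata, notice=REMINDER)
```

A caller would pass a persistent log store instead of a plain list; the list here only keeps the sketch self-contained.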

  • Singletona082@lemmy.world
    29 days ago

    …I’m…

    In full agreement with this*

    *with the provision that there are ways to ensure this isn’t weaponized so that dissident or oppositional speech/photos/art isn’t flagged as AI so that it can be filtered out.

    • CosmoNova@lemmy.world
      29 days ago

      So in short, you disagree, which is reasonable given the circumstances.

      Besides, wouldn’t it make much more sense to verify and mark genuine content rather than the slop that is becoming the majority of content?

      • Imgonnatrythis@sh.itjust.works
        29 days ago

        I like that approach better. Just like I’d rather know what doesn’t cause cancer in the state of California at this point.

  • Riskable@programming.dev
    29 days ago

    Not a bad law if applied to companies and public figures. Complete wishful thinking if applied to individuals.

    For companies it’s actually enforceable, but for individuals it’s basically impossible, and even if you do catch someone uploading AI-generated stuff, who cares? It’s the intent that matters when it comes to individuals.

    Were they trying to besmirch someone’s reputation by uploading false images of that person in compromising situations? That’s clear bad intent.

    Were they trying to incite a riot or intentionally spreading disinformation? Again, clear bad intent.

    Were they showing off something cool they made with AI generation? It is of no consequence and should be treated as such.

      • Riskable@programming.dev
        28 days ago

        Nope. In fact, if you generate a lot of images with AI, you’ll sometimes notice something resembling a watermark in the output, which demonstrates that the images used to train the model did indeed have watermarks.

        Removing such imaginary watermarks is trivial in image2image tools though (it’s just a quick extra step after generation).

  • Dr. Moose@lemmy.world
    28 days ago

    Anyone praising this doesn’t understand that this request is basically impossible and is merely posturing.

    I’m a developer and I work a lot with LLM data and the only way to detect LLM text is through watermarks where some words or expressions are statistically preferred over others. This means it’s only effective on large bodies of text that are not modified further.

    If you take LLM content and remix it using traditional natural language processing, then it’s done: the content is indistinguishable and untraceable, and it takes like 50 lines of Python code and a few milliseconds of computing.
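To make the point above concrete, here is a minimal sketch of one published style of statistical watermark, a simplified "green list" scheme. It is not necessarily what the commenter works with. Each word is pseudo-randomly assigned to a "green" half of the vocabulary based on the word before it, a watermarking generator prefers green words, and a detector checks whether green words show up more often than chance. All function names, the synthetic vocabulary, and the toy `remix` (which stands in for synonym substitution) are assumptions made for illustration.

```python
import hashlib
import math


def is_green(prev_word: str, word: str) -> bool:
    """Assign `word` to the green half of the vocabulary, seeded by the previous word."""
    digest = hashlib.sha256(f"{prev_word}|{word}".encode("utf-8")).digest()
    return digest[0] % 2 == 0


def watermark_z_score(text: str) -> float:
    """Return how many standard deviations the green-word count sits above chance."""
    words = text.lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    n = len(pairs)
    green = sum(is_green(prev, word) for prev, word in pairs)
    # Without a watermark, each pair is green with probability 0.5.
    return (green - 0.5 * n) / math.sqrt(0.25 * n)


def generate_watermarked(length: int, choices: int = 8) -> str:
    """Toy generator: at each step, pick the first green candidate if one exists."""
    words = ["start"]
    for i in range(length):
        candidates = [f"w{i}_{j}" for j in range(choices)]  # synthetic vocabulary
        preferred = next((c for c in candidates if is_green(words[-1], c)), candidates[0])
        words.append(preferred)
    return " ".join(words)


def remix(text: str) -> str:
    """Toy stand-in for synonym substitution: replace every second word."""
    words = text.split()
    return " ".join(w + "-syn" if i % 2 else w for i, w in enumerate(words))


if __name__ == "__main__":
    marked = generate_watermarked(200)
    print("watermarked z-score:", round(watermark_z_score(marked), 2))
    print("remixed z-score:    ", round(watermark_z_score(remix(marked)), 2))
```

In this toy setup the watermarked text scores far above chance, while replacing every second word changes every adjacent word pair and pushes the score back toward zero. The z-score also grows with the square root of the text length, which is consistent with the claim that the method only works on large, unmodified bodies of text.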

      • Dr. Moose@lemmy.world
        27 days ago

        100%, that’s exactly what’s happening, and I can’t believe people are so blinded by AI generator hate that they praise stuff like that.

    • qarbone@lemmy.world
      28 days ago

      It’s impossible because the text these LLM-based models produce would be obtuse to watermark.

      Huh?

      What about photos and video and audio?! Why are you asking?

      • Dr. Moose@lemmy.world
        28 days ago

        What about photos, videos and audio? You should see what the second L means in the LLM before you go at it

        • NιƙƙιDιɱҽʂ@lemmy.world
          28 days ago

          “The directive includes all types of data: text, images, videos, audio, and even virtual scenes.”

          LLMs are only one aspect of this, but yeah, probably the most difficult to discern, at least at the moment.

        • qarbone@lemmy.world
          28 days ago

          You should read some of the content you’re commenting on before posting a critique.