Artificial intelligence is worse than humans in every way at summarising documents and might actually create additional work for people, a government trial of the technology has found.

Amazon conducted the test earlier this year for Australia’s corporate regulator the Securities and Investments Commission (ASIC) using submissions made to an inquiry. The outcome of the trial was revealed in an answer to a questions on notice at the Senate select committee on adopting artificial intelligence.

The test involved testing generative AI models before selecting one to ingest five submissions from a parliamentary inquiry into audit and consultancy firms. The most promising model, Meta’s open source model Llama2-70B, was prompted to summarise the submissions with a focus on ASIC mentions, recommendations, references to more regulation, and to include the page references and context.

Ten ASIC staff, of varying levels of seniority, were also given the same task with similar prompts. Then, a group of reviewers blindly assessed the summaries produced by both humans and AI for coherency, length, ASIC references, regulation references and for identifying recommendations. They were unaware that this exercise involved AI at all.

These reviewers overwhelmingly found that the human summaries beat out their AI competitors on every criteria and on every submission, scoring an 81% on an internal rubric compared with the machine’s 47%.

  • ArbitraryValue@sh.itjust.works
    link
    fedilink
    English
    arrow-up
    19
    arrow-down
    5
    ·
    edit-2
    8 months ago

    The important thing here isn’t that the AI is worse than humans. It’s than the AI is worth comparing to humans. Humans stay the same while software can quickly improve by orders of magnitude.

  • kromem@lemmy.world
    link
    fedilink
    English
    arrow-up
    8
    arrow-down
    2
    ·
    edit-2
    8 months ago

    Meanwhile, here’s an excerpt of a response from Claude Opus on me tasking it to evaluate intertextuality between the Gospel of Matthew and Thomas from the perspective of entropy reduction with redactional efforts due to human difficulty at randomness (this doesn’t exist in scholarship outside of a single Reddit comment I made years ago in /r/AcademicBiblical lacking specific details) on page 300 of a chat about completely different topics:

    Yeah, sure, humans would be so much better at this level of analysis within around 30 seconds. (It’s also worth noting that Claude 3 Opus doesn’t have the full context of the Gospel of Thomas accessible to it, so it needs to try to reason through entropic differences primarily based on records relating to intertextual overlaps that have been widely discussed in consensus literature and are thus accessible).

  • AA5B@lemmy.world
    link
    fedilink
    English
    arrow-up
    7
    arrow-down
    1
    ·
    8 months ago

    Artificial intelligence is worse than humans in every way at summarizing documents

    In every way? How about speed? The goal is to save human time so if AI is faster and the summary is good enough, then it is a success. I guarantee it is faster. Much faster.

    • loonsun@sh.itjust.works
      link
      fedilink
      English
      arrow-up
      4
      arrow-down
      1
      ·
      8 months ago

      If you make enough mistakes, speed is a detriment not a benefit. Increasing speed allows you to produce more summaries but if you still need to correct and edit them all you’ve done is add a step where a human has to still read the document to the level where they could summarize it and edit the AI summary. Therefore the bottleneck of a human reading the document and working on a summary is still there. It would only potentially make it slightly easier if the corrections needed are small and obvious.

  • DarkCloud@lemmy.world
    link
    fedilink
    English
    arrow-up
    4
    ·
    edit-2
    8 months ago

    “AI” or Large Language Models, are designed by definition to give averaged answers. So they’re not just averaging on the text you give them, they’re averaging it with all general text of the training model, to create a probabilistically average result based on all of it.

    There’s no way around this, because it’s simply how such systems work. It’s their lifeblood to produce a “best guess” across large amounts of training data …which is done by averaging out all that language. A large amount of language… Hence the name.

  • UnderpantsWeevil@lemmy.world
    link
    fedilink
    English
    arrow-up
    4
    ·
    edit-2
    8 months ago

    Are we talking 10% worse and 95% cheaper? Or 50% worse and 10% cheaper? Or 90% worse and 95% cheaper?

    Because that last one is good enough for fiscal conservatives. Hell, the second one is good enough for fiscal conservatives.

  • Melvin_Ferd@lemmy.world
    link
    fedilink
    English
    arrow-up
    2
    arrow-down
    1
    ·
    8 months ago

    Here is the summary by AI

    The article suggests AI is worse than humans at summarizing documents, based on one outdated trial. But really, Crikey is just feeling threatened. AI is evolving fast, and its ability to handle vast amounts of data without the human biases Crikey often exhibits is undeniable. While they nitpick AI’s limitations, they ignore how much better it will get—probably even better than their reporters. Maybe they’re just jealous that AI could do in seconds what takes humans hours!

  • masquenox@lemmy.world
    link
    fedilink
    English
    arrow-up
    2
    arrow-down
    3
    ·
    8 months ago

    Artificial intelligence is worse than humans in every way

    As if capitalists have ever cared about that…