ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic

Lifecoach5000@lemmy.world · 21 days ago

FMT99@lemmy.world · 21 days ago

Did the author think ChatGPT is in fact an AGI? It’s a chatbot. Why would it be good at chess? It’s like saying an Atari 2600 running a dedicated chess program can beat Google Maps at chess.

snooggums@lemmy.world · 21 days ago

AI including ChatGPT is being marketed as super awesome at everything, which is why that and similar AI is being forced into absolutely everything and being sold as a replacement for people.

Something marketed as AGI should be treated as AGI when proving it isn’t AGI.

pelespirit@sh.itjust.works · 21 days ago

Not to help the AI companies, but why don’t they program them to look up math programs and outsource chess to other programs when they’re asked for that stuff? It’s obvious they’re shit at it, why do they answer anyway? It’s because they’re programmed by know-it-all programmers, isn’t it.

rebelsimile@sh.itjust.works · 21 days ago

Because they’re fucking terrible at designing tools to solve problems, they are obviously less and less good at pretending this is an omnitool that can do everything with perfect coherency (and if it isn’t working right it’s because you’re not believing or paying hard enough)

MrJgyFly@lemmy.world · 21 days ago

Or they keep telling you that you just have to wait it out. It’s going to get better and better!

ImplyingImplications@lemmy.ca · 20 days ago

why don’t they program them

AI models aren’t programmed traditionally. They’re generated by machine learning. Essentially the model is given test prompts and then given a rating on its answer. The model’s calculations will be adjusted so that its answer to the test prompt will be closer to the expected answer. You repeat this a few billion times with a few billion prompts and you will have generated a model that scores very high on all test prompts.
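The loop described above (answer, get rated, adjust, repeat) can be sketched with a toy example. This is only an illustration under heavy assumptions: the "model" is a single number, and the data, learning rate, and step count are made up. Real LLM training is vastly larger and uses different machinery.

```python
# Toy illustration only: a one-parameter "model" fitted by gradient descent.
# The examples, learning rate, and step count are made-up assumptions.

def train(examples, steps=2000, learning_rate=0.01):
    """Fit y = weight * x by repeatedly nudging weight toward the expected answers."""
    weight = 0.0
    for _ in range(steps):
        for x, expected in examples:
            predicted = weight * x               # the model's current answer
            error = predicted - expected         # the "rating": how far off it was
            weight -= learning_rate * error * x  # adjust to be closer next time
    return weight

# "Test prompts" paired with expected answers (the hidden rule here is y = 3x).
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
weight = train(examples)
print(round(weight, 3))
```

The model only ever gets better at the examples it was scored on, which is the point the comment is making.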

Then someone asks it how many R’s are in strawberry and it gets the wrong answer. The only way to fix this is to add that as a test prompt and redo the machine learning process which takes an enormous amount of time and computational power each time it’s done, only for people to once again quickly find some kind of prompt it doesn’t answer well.

There are already AI models that play chess incredibly well. Using machine learning to solve a complex problem isn’t the issue. It’s trying to get one model to be good at absolutely everything.

NoiseColor @lemmy.world · 21 days ago

…or a simple counter to count the r in strawberry. Because that’s more difficult than one might think and they are starting to do this now.
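For comparison, here is roughly what such a counter looks like as ordinary code. The function name is hypothetical, and this says nothing about how any vendor actually wires tools into a chatbot. The counting itself is trivial. The hard part for a language model is that it works on multi-character tokens rather than individual letters.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "r"))  # 3
```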

NobodyElse@sh.itjust.works · 21 days ago

Because the LLMs are now being used to vibe code themselves.

CileTheSane@lemmy.ca · 20 days ago

why don’t they program them to look up math programs and outsource chess to other programs when they’re asked for that stuff?

Because the AI doesn’t know what it’s being asked, it’s just an algorithm guessing what the next word in a reply is. It has no understanding of what the words mean.

“Why doesn’t the man in the Chinese room just use a calculator for math questions?”

MajorasMaskForever@lemmy.world · 20 days ago

From a technology standpoint, nothing is stopping them. From a business standpoint: hubris.

To put time and effort into creating traditional logic-based algorithms to compensate for this generic math model would be to admit what mathematicians and scientists have known for centuries: models are good at finding patterns, but they do not explain why a relationship exists (if it exists at all). The technology is fundamentally flawed for the use cases that OpenAI is trying to claim it can be used in, and programming around it would be to acknowledge that.

NoiseColor @lemmy.world · 21 days ago

I don’t think AI is being marketed as awesome at everything. It’s got obvious flaws. Right now it’s not good for stuff like chess, probably not even tic-tac-toe. It’s a language model; it’s hard for it to calculate the playing field. But AI is in development, and it might not need much to start playing chess.

vinnymac@lemmy.world · 21 days ago

What the tech is being marketed as and what it’s capable of are not the same, and likely never will be. In fact all things are very rarely marketed how they truly behave, intentionally.

Everyone is still trying to figure out what these Large Reasoning Models and Large Language Models are even capable of; Apple, one of the largest companies in the world, just released a white paper this past week describing the “illusion of reasoning”. If it takes a scientific paper to understand what these models are and are not capable of, I assure you they’ll be selling snake oil for years after we fully understand every nuance of their capabilities.

TL;DR Rich folks want them to be everything, so they’ll be sold as capable of everything until we repeatedly refute they are able to do so.

NoiseColor @lemmy.world · 21 days ago

I think in many cases people intentionally or unintentionally disregard the time component here. AI is in development. I think what is being marketed here, just like in the stock market, is a piece of the future. I don’t expect the models I use to be perfect and not make mistakes, so I use them accordingly. They are useful for what I use them for and I wouldn’t use them for chess. I don’t expect laundry detergent to be as perfect as it is in the commercial either.

BassTurd@lemmy.world · 21 days ago

Marketing does not mean functionality. AI is absolutely being sold to the public and enterprises as something that can solve everything. Obviously it can’t, but it’s being sold that way. I would bet the average person would be surprised by this headline solely on what they’ve heard about the capabilities of AI.

NoiseColor @lemmy.world · 21 days ago

I don’t think anyone is so stupid as to believe current AI can solve everything.

And honestly, I didn’t see any marketing material that would claim that.

BassTurd@lemmy.world · 21 days ago

You are both completely overestimating the intelligence level of “anyone” and not living in the same AI-marketed universe as the rest of us. People are stupid. Really stupid.

NoiseColor @lemmy.world · 20 days ago

I don’t understand why this is so important. Marketing is all about exaggerating, so why expect something different here?

BassTurd@lemmy.world · 20 days ago

It’s not important. You said AI isn’t being marketed to be able to do everything. I said yes it is. That’s it.

petrol_sniff_king@lemmy.blahaj.zone · 21 days ago

The Zoom CEO, that is the video calling software, wanted to train AIs on your work emails and chat messages to create AI personalities you could send to the meetings you’re paid to sit through while you drink Corona on the beach and receive a “summary” later.

The Zoom CEO, that is the video calling software, seems like a pretty stupid guy?

Yeah. Yeah, he really does. Really… fuckin’… dumb.

jubilationtcornpone@sh.itjust.works · 20 days ago

Same genius who forced all his own employees back into the office. An incomprehensibly stupid maneuver by an organization that literally owes its success to people working from home.

iAvicenna@lemmy.world · 20 days ago

well so much hype has been generated around chatgpt being close to AGI that now it makes sense to ask questions like “can chatgpt prove the Riemann hypothesis”

x00z@lemmy.world · 21 days ago

In all fairness, machine learning in chess engines is actually pretty strong.

AlphaZero was developed by the artificial intelligence and research company DeepMind, which was acquired by Google. It is a computer program that reached a virtually unthinkable level of play using only reinforcement learning and self-play in order to train its neural networks. In other words, it was only given the rules of the game and then played against itself many millions of times (44 million games in the first nine hours, according to DeepMind).

https://www.chess.com/terms/alphazero-chess-engine

FMT99@lemmy.world · 20 days ago

Oh absolutely you can apply machine learning to game strategy. But you can’t expect a generalized chatbot to do well at strategic decision making for a specific game.

jeeva@lemmy.world · 20 days ago

Sure, but machine learning like that is very different to how LLMs are trained and their output.

saltesc@lemmy.world · 21 days ago

I like referring to LLMs as VI (Virtual Intelligence from Mass Effect) since they merely give the impression of intelligence but are little more than search engines. In the end all one is doing is displaying expected results based on a popularity algorithm. However they do this inconsistently due to bad data in and limited caching.

NeilBrü@lemmy.world · edit-2 20 days ago

An LLM is a poor computational/predictive paradigm for playing chess.

surph_ninja@lemmy.world · 20 days ago

This just in: a hammer makes a poor screwdriver.

WhyJiffie@sh.itjust.works · 20 days ago

LLMs are more like a leaf blower though

Takapapatapaka@lemmy.world · 20 days ago

Actually, a very specific model (gpt-3.5-turbo-instruct) was pretty good at chess (around 1700 Elo, if I remember correctly).

NeilBrü@lemmy.world · 20 days ago

I’m impressed, if that’s true! In general, an LLM’s training cost vs. an LSTM, RNN, or some other more appropriate DNN algorithm suitable for the ruleset is laughably high.

Takapapatapaka@lemmy.world · 20 days ago

Oh yes, the cost of training is ofc a great loss here; it’s not optimized at all, and it’s stuck at an average level.

Interestingly, I believe some people did research on it and found some parameters in the model that seemed to represent the state of the chess board (as in, they seem to reflect the current state of the board, and when they are artificially modified, the model takes the modification into account in its playing). It was used by a French YouTuber to show how LLMs can somehow have a kind of representation of the world. I can try to get the sources back if you’re interested.

NeilBrü@lemmy.world · edit-2 20 days ago

Absolutely interested. Thank you for your time to share that.

My career path in neural networks began as a researcher for cancerous tissue object detection in medical diagnostic imaging. Now it has switched to generative models for CAD (architecture, product design, game assets, etc.). I don’t really mess about with fine-tuning LLMs.

However, I do self-host my own LLMs as code assistants. Thus, I’m only tangentially involved with the current LLM craze.

But it does interest me, nonetheless!

Takapapatapaka@lemmy.world · 19 days ago

Here is the main blog post that I remembered: it has a follow-up, a more scientific version, and uses two other articles as a basis, so you might want to dig around what they mention in the introduction.

It is indeed quite a technical discovery, and it still lacks a complete and wider analysis, but it is very interesting because it kinda invalidates the common gut feeling that LLMs are pure random luck.

Bleys@lemmy.world · 20 days ago

The underlying neural network tech is the same as what the best chess AIs (AlphaZero, Leela) use. The problem is, as you said, that ChatGPT is designed specifically as an LLM so it’s been optimized strictly to write semi-coherent text first, and then any problem solving beyond that is ancillary. Which should say a lot about how inconsistent ChatGPT is at solving problems, given that it’s not actually optimized for any specific use cases.

NeilBrü@lemmy.world · edit-2 19 days ago

Yes, I agree wholeheartedly with your clarification.

My career path, as I stated in a different comment in regards to neural networks, is focused on generative DNNs for CAD applications and parametric 3D modeling. Before that, I began as a researcher in cancerous tissue classification and object detection in medical diagnostic imaging.

Thus, large language models are well out of my area of expertise in terms of the architecture of their models.

However, fundamentally it boils down to the fact that the specific large language model used was designed to predict text and not necessarily solve problems/play games to “win”/“survive”.

(I admit that I’m just parroting what you stated and maybe rehashing what I stated even before that, but I like repeating and refining in simple terms to practice explaining to laymen and, dare I say, clients. It helps me feel as if I don’t come off too pompously when talking about this subject to others; forgive my tedium.)

sugar_in_your_tea@sh.itjust.works · edit-2 20 days ago

Yeah, a lot of them hallucinate illegal moves.

AlecSadler@sh.itjust.works · 20 days ago

ChatGPT has been, hands down, the worst AI coding assistant I’ve ever used.

It regularly suggests code that doesn’t compile or isn’t even for the language.

It generally suggests autocompletions that are just a copy of the lines I just wrote.

Sometimes it likes to suggest setting the same property like 5 times.

It is absolute garbage and I do not recommend it to anyone.

arc99@lemmy.world · 20 days ago

All AIs are the same. They’re just scraping content from GitHub, Stack Overflow, etc. with a bunch of guardrails slapped on to spew out sentences that conform to their training data, but there is no intelligence. They’re super handy for basic code snippets but anyone using them for anything remotely complex or nuanced will regret it.

NateNate60@lemmy.world · 20 days ago

One of my mates generated an entire website using Gemini. It was a React web app that tracks inventory for trading card dealers. It actually did come out functional and well-polished. That being said, the AI really struggled with several aspects of the project that humans would not:

- It left database secrets in the code
- The design of the website meant that it was impossible to operate securely
- The quality of the code itself was hot garbage: unreadable and undocumented nonsense that somehow still worked
- It did not break the code into multiple files. It piled everything into a single file
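On the first point, the usual fix is to keep secrets out of the source entirely and read them at runtime. A minimal sketch of the idea follows. It is in Python rather than the React/JavaScript stack the project used, and `load_database_url`, `DATABASE_URL`, and the example connection string are all hypothetical placeholders, not anything from the actual app.

```python
import os

def load_database_url(env=os.environ):
    """Read the connection string from the environment instead of the source code."""
    url = env.get("DATABASE_URL")  # hypothetical variable name
    if not url:
        raise RuntimeError("DATABASE_URL is not set")
    return url

# Usage with a stand-in environment; a real deployment would set the variable
# outside the repository (shell, CI settings, or a secrets manager).
print(load_database_url({"DATABASE_URL": "postgres://example.invalid/inventory"}))
```

Failing loudly when the variable is missing is a deliberate choice here, so that a misconfigured deployment stops instead of quietly falling back to a hardcoded value.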

AlecSadler@sh.itjust.works · 20 days ago

I’ve used agents for implementing entire APIs and front-ends from the ground up with my own customizations and nuances.

I will say that, for my pedantic needs, it typically only gets about 80-90% of the way there so I still have to put fingers to code, but it definitely saves a boatload of time in those instances.

ILikeBoobies@lemmy.ca · 20 days ago

I’ve had success with splitting a function into 2 and planning out an overview, though that’s more like talking to myself

I wouldn’t use it to generate stuff though

floofloof@lemmy.ca · edit-2 21 days ago

I suppose it’s an interesting experiment, but it’s not that surprising that a word prediction machine can’t play chess.

otp@sh.itjust.works · 21 days ago

Because people want to feel superior because they ~~don’t know how to use a ChatBot~~ can count the number of "r"s in the word “strawberry”, lol

electricyarn@lemmy.world · 21 days ago

Yeah, just because I can’t count the number of r’s in the word strawberry doesn’t mean I shouldn’t be put in charge of the US nuclear arsenal!

otp@sh.itjust.works · 21 days ago

That is more a failure of the person who made that decision than a failing of ChatBots, lol

petrol_sniff_king@lemmy.blahaj.zone · 21 days ago

Anyone who puts a chatbot anywhere is definitely a failure, yeah.

anubis119@lemmy.world · 21 days ago

A strange game. How about a nice game of Global Thermonuclear War?

Lifecoach5000@lemmy.world · 21 days ago

Lmao! 🤣 that made me spit!!

MadMadBunny@lemmy.ca · 21 days ago

Frak off, toaster

Xanthobilly@lemmy.world · 21 days ago

Furbag@lemmy.world · 20 days ago

Can ChatGPT actually play chess now? Last I checked, it couldn’t remember more than 5 moves of history so it wouldn’t be able to see the true board state and would make illegal moves, take its own pieces, materialize pieces out of thin air, etc.

skisnow@lemmy.ca · edit-2 20 days ago

It can’t, but that didn’t stop a bunch of gushing articles a while back about how it had an Elo of 2400 and other such nonsense. Turns out you could get it to have an Elo of 2400 under a very, very specific set of circumstances that include correcting it every time it hallucinated pieces or attempted to make illegal moves.

cley_faye@lemmy.world · 20 days ago

Ah, you used logic. That’s the issue. They don’t do that.

arc99@lemmy.world · 20 days ago

Hardly surprising. LLMs aren’t -thinking-, they’re just shitting out the next token for any given input of tokens.

Steve Dice@sh.itjust.works · 19 days ago

That’s exactly what thinking is, though.

arc99@lemmy.world · edit-2 17 days ago

An LLM is an ordered series of parameterized / weighted nodes which are fed a bunch of tokens and, millions of calculations later, generate the next token, which is appended to the input before the process repeats. It’s like turning a handle on some complex Babbage-esque machine. LLMs use a tiny bit of randomness (“temperature”) when choosing the next token so the responses are not identical each time.

But it is not thinking. Not even remotely so. It’s a simulacrum. If you want to see this, run ollama with the temperature set to 0 e.g.

ollama run gemma3:4b
>>> /set parameter temperature 0
>>> what is a leaf

You will get the same answer every single time.
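Why temperature 0 gives repeatable output can be seen in a small sketch of temperature sampling. This is a simplified illustration, not ollama’s actual code: `sample_next_token` and the scores are made up, and treating temperature 0 as “always pick the top score” is an assumption about how samplers commonly handle that case.

```python
import math
import random

def sample_next_token(logits, temperature, rng=random):
    """Pick a token index from raw scores; temperature 0 always takes the top score."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
print([sample_next_token(logits, 0) for _ in range(5)])    # [0, 0, 0, 0, 0]
print([sample_next_token(logits, 1.0) for _ in range(5)])  # varies from run to run
```

With the randomness removed, the same input scores always lead to the same choice, which is the handle-turning behaviour described above.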

Steve Dice@sh.itjust.works · edit-2 14 days ago

I know what an LLM is doing. You don’t know what your brain is doing.

oni ᓚᘏᗢ@lemmy.world · 21 days ago

This made my day

hogmomma@lemmy.world · 21 days ago

Get your booty on the floor tonight.

finitebanjo@lemmy.world · 20 days ago

All these comments asking “why don’t they just have chatgpt go and look up the correct answer”.

That’s not how it works, you buffoons, it trains off of datasets long before it releases. It doesn’t think. It doesn’t learn after release, it won’t remember things you try to teach it.

Really lowering my faith in humanity when even the AI skeptics don’t understand that it generates statistical representations of an answer based on answers given in the past.

seven_phone@lemmy.world · 21 days ago

You say you produce good oranges but my machine for testing apples gave your oranges a very low score.

Nurse_Robot@lemmy.world · 21 days ago

I’m often impressed at how good chatGPT is at generating text, but I’ll admit it’s hilariously terrible at chess. It loves to manifest pieces out of thin air, or make absurd illegal moves, like jumping its king halfway across the board and claiming checkmate

Blaster M@lemmy.world · 21 days ago

ChatGPT is playing Anarchy Chess

Lifecoach5000@lemmy.world · 21 days ago

Yeah! I’ve loved watching GothamChess’ videos on these. Always have been good for a laugh.

Sidhean@lemmy.blahaj.zone · 20 days ago

Can i fistfight ChatGPT next? I bet I could kick its ass, too :p

Steve Dice@sh.itjust.works · 19 days ago

2025 Mazda MX-5 Miata ‘got absolutely wrecked’ by Inflatable Boat in beginner’s boat racing match — Mazda’s newest model bamboozled by 1930s technology.

vane@lemmy.world · 20 days ago

It’s not that hard to beat a dumb 6-year-old whose only purpose is to mine your privacy to sell you ads or product-place some shit for you in the future.