I’m rather curious to see how the EU’s privacy laws are going to handle this.
(Original article is from Fortune, but Yahoo Finance doesn’t have a paywall)
it’s crazy that “it’s too hard :(” has become an acceptable justification for just ignoring the law within tech circles
I’m not an AI expert, and I wouldn’t say it’s too hard, but I believe removing a specific piece of data from a model is like trying to remove excess salt from a stew: you can add things to make the stew less salty, but you can’t really take the salt back out.
The alternative, which is a lot of effort but boo-hoo for big tech, is to throw out the model and start over without the data in question. These companies would do well to start with models built on public or royalty free data and then add more risky data on top of that (so you only have to rebake starting from the “public” version).
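Rough sketch of what I mean, assuming a PyTorch-style setup (TinyNet, base_data, and risky_data are made-up stand-ins, not anyone’s real pipeline): train on the public data first, checkpoint that, and treat the risky data as a fine-tune you can redo from the checkpoint.

    import torch
    import torch.nn as nn

    class TinyNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(4, 1)

        def forward(self, x):
            return self.fc(x)

    def train(model, data, epochs=10):
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            for x, y in data:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()

    # Phase 1: train only on public / royalty-free data, then checkpoint.
    base_data = [(torch.randn(4), torch.randn(1)) for _ in range(32)]
    model = TinyNet()
    train(model, base_data)
    torch.save(model.state_dict(), "public_base.pt")

    # Phase 2: fine-tune on riskier data. If any of it later has to be
    # removed, reload the checkpoint and redo only this phase, instead
    # of rebaking the whole stew from scratch.
    risky_data = [(torch.randn(4), torch.randn(1)) for _ in range(8)]
    train(model, risky_data)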
sounds like big tech shouldn’t have spent the last decade investing in a kitchen refit so that they could make stew really well but nothing else
If there’s something illegal in your dish, you throw it out. It’s not a question. I don’t care that you spent a lot of time and money on it. “I spent a lot of time preparing the circumstances leading to this crime” is not an excuse, neither is “if I have to face consequences for committing this crime, I might lose money”.
Replace salt with poison or an allergenic substance and it fully holds. If a batch has been contaminated, then yes, you should try again.
But now that the cat is out of the bag, other companies are less willing to let anything be scrapeable, given how valuable that data can be.
I think big tech knew this: that these models could only be built on unfiltered data before the AI craze.
It’s actually a pretty normal thing in law. Laws are created with common sense and compromises in mind.
Currently, EU law does not cover generative AI, so the EU needs to decide how to deal with it: treat a model as a “lossy compressed database” and enforce a variation of the GDPR with added fuzziness, or do something else entirely.
Always has been. The laws are there to incentivize good behavior, but when the cost of complying is larger than the projected cost of not complying, companies will ignore them and deal with the consequences. We regular folk generally can’t afford not to comply (except for all the low-stakes laws you break on a day-to-day basis), but when you have money to burn and a lot is at stake, the decision becomes more complicated.
The tech part of that is that we don’t really know whether removing data from these sorts of models is even possible in the first place. The only way to remove it is to throw away the old model and make a new one without the offending data, i.e., retrain it. This is similar to how you can’t get a person to forget something without some really drastic measures. Even then, how do you know they forgot it? That information may still inform their decisions; they might just not be aware of it, or they might feign ignorance. The only real way to be sure is to scrap the person. Given how insanely costly retraining a model can be, the laws start looking like “necessary operating costs” instead of absolute rules.
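To make the “no way to remove it” point concrete, here’s a toy sketch under the same assumed PyTorch setup: there is no operation that subtracts one record’s influence from trained weights, so an exact deletion means filtering the dataset and paying the full training cost again. All the names (train_fresh, the fake user records) are illustrative.

    import torch
    import torch.nn as nn

    def train_fresh(dataset, epochs=10):
        # Exact "unlearning" means training a fresh model; there is no
        # way to edit the old weights to forget one record.
        model = nn.Linear(4, 1)
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            for x, y in dataset:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
        return model

    records = {f"user{i}": (torch.randn(4), torch.randn(1)) for i in range(16)}
    model = train_fresh(list(records.values()))

    # A deletion request arrives for "user3". You can't edit `model`
    # in place; you drop the record and pay the full training cost again.
    del records["user3"]
    model = train_fresh(list(records.values()))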
It’s more like the law is saying you must draw seven red lines, all of them strictly perpendicular, some with green ink and some with transparent ink.
It’s not “virtually” impossible, it’s literally impossible. If the law requires that it be possible then it’s the law that must change. Otherwise it’s simply a more complicated way of banning AI entirely, which means that some other jurisdiction will become the world leader in such things.
No, it’s more like the law is saying you have to draw seven red lines and you’re saying, “well I can’t do that with indigo, because indigo creates purple ink, therefore the law must change!” No, you just can’t use indigo. Find a different resource.
There’s nothing that says AI has to exist in a form created from harvesting massive user data in a way that can’t be reversed or retracted. It’s not technically impossible to do that at all, we just haven’t done it because it’s inconvenient and more work.
The law sometimes makes things illegal because they should be illegal. It’s not like you run around saying we need to change murder laws because you can’t kill your annoying neighbor without going to prison.
No it’s not; AI is way broader than this. There are tons of forms of AI besides ones that consume raw existing data. And there are ways you could harvest only data you could later “untrain”; it’s just more work.
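One concrete way to do that extra work, sketched under toy assumptions (this is roughly the idea behind SISA-style machine unlearning, not a production recipe): keep provenance on every record, shard the training set by source, and train one sub-model per shard. A deletion request for a source then means at worst retraining one shard; here it’s just dropped.

    import torch
    import torch.nn as nn

    def train_shard(records, epochs=10):
        # One small sub-model per shard; a real system would use real models.
        model = nn.Linear(4, 1)
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            for x, y in records:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
        return model

    # Harvest with provenance: every record stays tagged with its source.
    shards = {
        "source_a": [(torch.randn(4), torch.randn(1)) for _ in range(8)],
        "source_b": [(torch.randn(4), torch.randn(1)) for _ in range(8)],
    }
    models = {src: train_shard(recs) for src, recs in shards.items()}

    def predict(x):
        # Ensemble the sub-models by averaging their outputs.
        return torch.stack([m(x) for m in models.values()]).mean(dim=0)

    print(predict(torch.randn(4)))

    # "source_b" demands deletion: drop its shard and its sub-model.
    # No other sub-model ever saw that data, so nothing else retrains.
    del shards["source_b"], models["source_b"]
    print(predict(torch.randn(4)))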
Some things, like user privacy, are actually worth protecting.
ok i guess you don’t get to use private data in your models too bad so sad
why does the capitalistic urge to become “the world leader” in whatever technology-of-the-month is popular right now supersede a basic human right to privacy?