- cross-posted to:
- cybersecurity@sh.itjust.works
Okay, enough is enough. The Internet Archive is both essential infrastructure and an irreplaceable historical record; it cannot be allowed to fall. Rather than just hoping the Archive can defend itself, I say it's time to hunt down and counterattack the scum perpetrating this!
Where are Anonymous and the 4chan autists? They should go after these assholes. Attacking the Internet Archive is like kicking a kitten. Everyone will hate you for it.
Why are people fucking with the Internet Archive? Who benefits?
Well, right wingers want to ban books, and services like IA make that harder, since they provide easy access to download or digitally borrow those books. It's harder to deny people access to a book when they can find it online. Of course, there are other ways people can still obtain those books, IA isn't the only one, but it's the easiest and the most convenient.
I’ll give you my opinion though you haven’t asked for it:
Some right wingers (mostly libertarians) don't want to ban books; in fact, they want books to be reliably available, and having one centralized Internet Archive store all of them is not reliable.
(By the same logic: for humanity to stay knowledgeable and resistant to propaganda, treating sources' availability as a given is harmful to that goal — naive people can believe wrong things.)
See the Babylon 5 example of kicking the ant hill again and again toward some well-meaning goal, of the evolutionary kind.
Mind that I don’t think these people have such an intent.
It's just that in my childhood someone gaslighted me into trying to be optimistic in such cases. Like, "if someone is digging a grave for you, just wait till they're done, you'll get a nice pond." It's the same with a precedent: it's created with one intent and one interpretation, but it works for all possible intents and interpretations, because it's a real-world event.
So, gaslighting aside, real effects are real. Including positive ones, like all of us right now realizing that a centralized IA is unacceptable. We need something like "IA@home", with a degree of forkability that doesn't duplicate the data, so that someone who hijacked the private key (or whatever identifies the new IA's authority) couldn't harm existing versions, and forks wouldn't require much more storage.
Shit, I can't stop thinking about that "common network, identities, and metadata exchange, but data storage shared per community one joins, Freenet-like" idea, but I don't even remotely know where to start developing it and doubt I ever will.
4 years ago (the best number I can find, since IA's blog pages are down), IA used about 50 petabytes, on servers that each have 250 terabytes of storage and a 2 Gbps network connection.
From this, we can conclude that 1 TB of storage requires 8 Mbps of network speed.
Let's just say the average residential broadband connection has 8 Mbps of spare symmetrical bandwidth.
We would need 50,000 volunteers to cover the absolute minimum.
Probably 100k to 200k to have any sort of reliability, considering it's all residential networking and commodity hardware.
In the last 4 years, I imagine IA has increased their storage requirements significantly.
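The back-of-envelope math above can be sketched in a few lines of Python. The 50 PB figure and the 250 TB / 2 Gbps server spec are the numbers quoted above; everything else follows from them, assuming each volunteer donates 1 TB and a matching 8 Mbps:

```python
# Back-of-envelope: how many 1 TB volunteers would it take to mirror IA?
total_storage_tb = 50_000          # ~50 PB, IA's usage ~4 years ago
server_storage_tb = 250            # storage per IA server
server_bandwidth_mbps = 2_000      # 2 Gbps network link per server

# Bandwidth IA provisions per TB of hosted data
mbps_per_tb = server_bandwidth_mbps / server_storage_tb   # -> 8.0

# A volunteer donating 1 TB plus 8 Mbps covers exactly one "slot"
volunteers_minimum = total_storage_tb                     # -> 50,000

# With a replication factor of 2-4 for reliability on flaky
# residential connections, the count grows accordingly
volunteers_reliable = (2 * volunteers_minimum, 4 * volunteers_minimum)

print(mbps_per_tb, volunteers_minimum, volunteers_reliable)
```

The 100k-200k range above is just the 50,000 minimum times a replication factor of 2 to 4.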
And all of that would need to be coordinated, so some shards don't get over-replicated.

This seems to confirm my critique of the "manual" solutions with torrents and such offered in other comments, resulting in the idea shortly described in the comment you were answering.
Yes, this would require a lot of people, but some would contribute more and some less, just like with other public P2P solutions.
From my POV, the biggest problem is synchronizing the indexes of such a storage (similar to a superblock, maybe) and balancing replication based on them, in a decentralized way. It would seem that those indexes would not be small by themselves.
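One common building block for the "balancing replication in a decentralized way" part is consistent hashing with a fixed replication factor: every node can compute, from the index alone, which nodes are responsible for which shard, with no central coordinator. A minimal sketch under that assumption (node and shard names are hypothetical; a real system would add capacity weighting, failure detection, and re-balancing):

```python
import hashlib
from bisect import bisect_right

def _h(key: str) -> int:
    """Map a string to a point on the hash ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

def build_ring(nodes, vnodes=64):
    """Place each node at several virtual points for smoother balancing."""
    return sorted((_h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))

def replicas_for(ring, shard_id, k=3):
    """Walk clockwise from the shard's hash, collecting k distinct nodes."""
    k = min(k, len({n for _, n in ring}))   # can't have more replicas than nodes
    points = [p for p, _ in ring]
    idx = bisect_right(points, _h(shard_id)) % len(ring)
    chosen = []
    while len(chosen) < k:
        node = ring[idx][1]
        if node not in chosen:
            chosen.append(node)
        idx = (idx + 1) % len(ring)
    return chosen

ring = build_ring(["node-a", "node-b", "node-c", "node-d"])
print(replicas_for(ring, "shard-0042"))  # same answer on every peer
```

The useful property here is that adding or removing a node only moves a small fraction of shards to new owners, which matters when volunteers churn.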
There should also be all the usual stuff with controlling data integrity.
I think it's realistic to attract many volunteers if the thing in question is also the user client, socially similar to Freenet and torrents, and contributing more storage lets you fetch the things you access most often faster, as a cache. But then balancing that against storing necessary but unpopular parts of the space is an open question.
I think I need to read up.
This again??
This time, once archive.org is back online… is it possible to get torrents of some of their popular collections? For example, I wouldn't imagine their catalog of books with expired copyright to be very big. I'd love a community way to keep the data alive if something even worse happens in the future (and their track record isn't looking good right now).
Like this idea
Yep, that seems like the ideal decentralized solution. If all the info can be distributed via torrent, anyone with spare disk space can help back up the data and anyone with spare bandwidth can help serve it.
There's an issue with torrents: only the most popular ones get replicated, and the process is manual/social.
Something like Freenet is needed, which automatically "spreads" data over the machines contributing storage. But Freenet is unreliable storage, basically a cache where older and unwanted stuff gets erased.
So it should be something like Freenet, but possibly with "clusters" or "communities", where a central (cryptography-enabled) authority for each can determine the state of some collection of data as a whole and set priorities. My layman's understanding is that this would be something between Freenet and Ceph, LOL. More like a cluster filesystem spread over many nodes than like a cache.
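The "authority determines the state of a collection as a whole" part could be as simple as the authority publishing a Merkle root over the item hashes: any node holding the data can then verify the whole collection against one small value. A sketch using only the standard library (the signing step is omitted; a real design would have the community's authority sign the root with its private key, e.g. Ed25519, so nodes can also verify *who* published it):

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaf_hashes):
    """Pairwise-hash each level until a single root remains."""
    level = list(leaf_hashes)
    if not level:
        return sha256(b"")
    while len(level) > 1:
        if len(level) % 2:                  # duplicate last hash on odd levels
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical collection: the authority hashes every item it vouches for
items = [b"book-001 contents", b"book-002 contents", b"book-003 contents"]
root = merkle_root([sha256(it) for it in items])

# Any node with the same items derives the same root and checks it against
# the authority's published (and, in a real system, signed) value.
print(root.hex())
```

Hijacking the key would only let an attacker publish *new* roots; versions already verified against old roots stay intact, which is the forkability property mentioned earlier in the thread.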
You have more knowledge on this than I do. I enjoyed reading about Freenet and Ceph. I have dealt with cloud stuff, but not as much at the technical-underpinnings level. My first Freenet impression from reading some articles gives me 90s internet vibes, based on the common use cases they listed.
I remember Ceph because I once ended up building it from the AUR on my weak little personal laptop: it got dropped from some repository or whatever but was still flagged to stay installed. I could have saved myself an hours-long build if I had read the release notes.
My first freenet impression from reading some articles gives me 90s internet vibes based on the common use cases they listed.
That’s correct, I meant the way it works.
Wtf
Hope they had a backup