cross-posted to: https://lemmy.world/post/2499861
As I said, I made a lossy reformat of the database and a lossless one at 6.0 GiB (6,477,905,920 bytes), compared to ~26 GiB from Reddit, where the fields are almost intentionally anti-compressed to take up more room.
If there is somewhere I can host it, let me know.
Also, I couldn't figure this out: do SQLite databases store any information about the creator or editor of a file?
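As far as I can tell, the file format itself only keeps a few numeric header values (readable via pragmas) and nothing that identifies a person; happy to be corrected. A quick way to look at them with Python's built-in sqlite3 module (the filename is just a placeholder):

```python
# Check what per-file metadata SQLite actually keeps.
# None of these values identify a creator or editor.
import sqlite3

con = sqlite3.connect("place_2023.sqlite")   # placeholder filename
for pragma in ("application_id", "user_version", "schema_version"):
    value = con.execute(f"PRAGMA {pragma}").fetchone()[0]
    print(pragma, "=", value)
con.close()
```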
why it's lossy
It's missing a large table of base64-encoded random data (urandom) that would technically be required to recreate the original file exactly.
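Roughly what that means in practice (table and column names below are made up, not the actual schema): the lossless layout keeps a lookup table mapping each small integer id back to its original base64 token, and the lossy one drops that table, so everything still joins up internally but the source data can't be rebuilt byte-for-byte.

```python
# Made-up schema, for illustration only.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Lossless layout: every placement references a small integer user id,
    -- and this table keeps the original random base64 token for each id.
    CREATE TABLE user_tokens (
        user_id INTEGER PRIMARY KEY,
        token   TEXT NOT NULL UNIQUE
    );
    CREATE TABLE placements (
        ts      INTEGER,
        user_id INTEGER REFERENCES user_tokens(user_id),
        x       INTEGER,
        y       INTEGER,
        color   INTEGER
    );
""")

# The lossy variant is the same database without user_tokens: the integer ids
# stay consistent across placements, but the original strings are gone, so the
# source data can't be reconstructed exactly.
con.execute("DROP TABLE user_tokens")
```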
!datahoarder@lemmy.ml looks active and seems like a good place for it
Thanks, how do I crosspost/move this one?
Here are a few options that I've seen but never actually used myself.
- contact https://wiki.archiveteam.org/; they have some reddit related archival activities going on
- more academic options that will be very helpful for researchers:
- https://zenodo.org/
- https://figshare.com/ (I've used this one; it's very easy)
- https://datadryad.org/
- https://academictorrents.com/ (pushshift archives are also on here)
- https://socialmediaarchive.org/
Your data doesn't seem massive compared to the kinds of datasets people store on these, so I don't think size will be an issue. Plus, if you deposit it in one archivist place plus one research place, it may reach more people. Don't forget about licenses, btw.
EDIT: added https://socialmediaarchive.org/ to the list, just found out about that.
Is this derived directly from the data reddit stored/created or is it a reconstruction of some kind from observing the r/place output? I’m tempted to look at the table structures but not tempted enough to download 4 gigs of it just yet.
Rebuilt from Reddit's official sources. Still messing with optimizations; is adding a color definitions table worth it?
Edit: YES, only 32 unique colors ever.
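For anyone curious, here's a minimal sketch of what I mean (names are illustrative, not the final schema): with only 32 distinct colors, each placement row can store a small integer instead of repeating a hex string, and the hex values live exactly once in a tiny lookup table.

```python
# Minimal sketch of a color definitions table (illustrative names only).
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE colors (
        id  INTEGER PRIMARY KEY,       -- 32 rows total
        hex TEXT NOT NULL UNIQUE       -- e.g. '#FF4500'
    );
    CREATE TABLE placements (
        ts       INTEGER,
        x        INTEGER,
        y        INTEGER,
        color_id INTEGER NOT NULL REFERENCES colors(id)
    );
""")

def color_id(hex_code: str) -> int:
    """Insert the color on first sight and return its small integer id."""
    con.execute("INSERT OR IGNORE INTO colors(hex) VALUES (?)", (hex_code,))
    return con.execute("SELECT id FROM colors WHERE hex = ?",
                       (hex_code,)).fetchone()[0]

# Each placement row now stores a 1-2 byte integer instead of a 7-character string.
con.execute("INSERT INTO placements VALUES (?, ?, ?, ?)",
            (1689858026088, -12, 345, color_id("#FF4500")))
```

The same trick applies to any column with a small, fixed set of values.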
Last I heard with large projects like this, people usually upload them to DEEZ