Data Exports/GDPR


Some thoughts on how easy it is to get and parse GDPR/data exports from different services. A lot of these I did just because I was curious what information/context I could glean into the past about

Repo with links to export data

If not mentioned, code to parse this stuff is probably located at HPI

  • Amazon - Absolutely horrendous. Gives me a mix of PDFs, CSVs, and JSON files in 61 individual zip files, which have conflicting file names. Takes days to deliver, and doesn’t have tons of useful information. Probably better to just use finance-dl to grab metadata on Amazon orders
  • Reddit - only recently exported using their data request. Seems pretty complete, but I have yet to parse it. I currently use a combination of rexport and pushshift, and rarely use Reddit anymore, so it’s unlikely I will do much more here.
  • Google Takeout - annoying, but since it has so much data, it is sort of worth the hassle. Recently they’ve started adding JSON endpoints where previously it would only be HTML, which was a pain to parse. The export process is semi-manual and tends to break, so I typically end up going to the page and doing it manually every few months… It was parsed by a couple of files in my HPI module before, but is now handled by this module
  • Spotify export - the GDPR/data export at least gives me playlists, but doesn’t give me any history; I believe I’d have to do overlapping API exports pretty regularly to handle that. I don’t use Spotify anymore, so this is just some nice context
  • Blizzard games - gives me a giant HTML file, which is quite annoying to parse. Has some useful information, like when I purchased packs in Hearthstone, the first time I played games, and some IP addresses w/ timestamps I can convert to locations. Not planning to do this periodically, it just offers some context into the past
  • Apple - is below average. Gives some nice semantic location history and some of my old notes/calendar info (which I’ve yet to parse), but in an annoying XML format. Since I barely use any Apple devices/services anymore, I probably won’t do multiple exports
  • chess.com - pretty trivial to do with the public unauthenticated API, great overall
  • Discord - Regarding the data itself, if you’re a Discord user, the export has tons of information in there: thousands of ‘Activity’ blobs which include which device/IP/ISP etc. you were using. Could be used for lifelogging/context in a lot of places. Also, just being able to search the tens of thousands of messages I’ve sent is very nice. Exporting is manual, and you have to download/unzip the data, but once you have code to parse it, it’s pretty useful
  • Facebook data export - has a decent amount of data, some spam from random companies etc… Depending on how much you use Facebook, this might be more worth it. Personally, it gives me access to a couple dozen messages and some old IPs which I can convert to locations
  • League of Legends - is API based, using lolexport, but you have to refresh a token daily, so it’s a somewhat manual process. The amount of information (like champion kills, objectives; game data) is pretty insane though; could do some analysis, but personally I’m not that interested. Gives me a couple hundred games going back a few years; cool to be able to see when I was playing League of Legends games
  • Skype export - not great, maybe it’s just been too long since I used Skype? Could only grab a few timestamps (not even the messages?) from when I was last using this…
  • Steam - there doesn’t seem to be a simple way to download your data using the GDPR site, wish there was… In the meantime, I threw together a scraper to download the information I was interested in
  • Trakt (movies/tv shows) - wrote an automatic OAuth/API based exporter, relatively new but seems to be working well enough, gives me tons of history info which is very nice. Also gives site IDs, so I’m able to connect these to tmdb/moviedb etc - example of using that data to fetch images on my feed
  • Twitch (see here) - Incredibly manual process. It feels like a chat with someone working there, who gave me notifications that it was in process etc. However, the data here for lifelogging purposes is pretty cool: it has timestamps of every time I was watching a stream and for how long, when I followed/unfollowed channels, and pageviews (i.e. browser history for Twitch), all going back to 2015. Sadly no chat logs; you could probably get a subset of those by downloading the OverRustle archive and extracting the info from there…
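
For exports that arrive as many zip files with conflicting file names (like the Amazon one above), one way to avoid files overwriting each other is to unpack each archive into its own directory. This is just a minimal sketch using the Python standard library, not the code used in HPI; the function name extract_all and the directory layout are my own assumptions.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_all(zip_dir: Path, out_dir: Path) -> List[Path]:
    """Extract every .zip in zip_dir into its own subdirectory of out_dir.

    Each archive gets a directory named after the zip file (minus the
    extension), so two archives that both contain e.g. 'data.csv' do not
    overwrite each other. (Hypothetical helper, for illustration only.)
    """
    extracted = []
    for zpath in sorted(zip_dir.glob("*.zip")):
        target = out_dir / zpath.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(zpath) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

Usage would be something like extract_all(Path("downloads"), Path("unpacked")), where both paths are placeholders for wherever the zips were saved and where the unpacked data should go.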