⚠️ The intention here is not to point fingers at products or an organization.
TL;DR: I tried to "cleanse" my Gmail inbox, and gave up. But I discovered some of Python's built-in support for this kind of work along the way. Respect for that. 🙏
If you are someone from my generation, I am sure that, statistically speaking, there are around 50k unread emails in your inbox. I do not want to explain why that is so. It is a fun exercise to work out how our inboxes ended up in such a dire state. But for this blog post, I guess it is enough to sum it up by stating that most of us are victims of capitalism and the communication revolution.
I set out on the journey of reducing my Google One storage footprint. Somewhere in one of Google One's web pages, we can see the storage usage per product (Photos, Gmail, Drive, etc.). Gmail wasn't the culprit here. Gmail held a sizable chunk of data, but Photos was taking 7x more than that.
Even so, I tried to figure out what I could cull from my Gmail inbox. There are a lot of things I need to keep - like bank statements, receipts, personal communications with people in my friends circle, etc. There are a lot of things I can discard - all of those Facebook notification emails, Quora email digests, etc.
At this stage, there is no point in wondering why the inbox is flooded with these. At one point I was active on social media, and therefore I never considered those emails junk. But now they are! Oh, how the times have changed!!
I assumed that there would be a way to estimate what I am about to delete, and then actually delete it. In other words, I wanted to delete only what I had deemed unnecessary, and nothing by accident.
My plan was to do a kind of data analysis on the emails. In simpler words, find out who is spamming me. Then gather the data, and write a Google Apps Script to delete the emails. But I was in for a shock. Highlighting two important points below:
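The "find out who is spamming me" step can be sketched roughly as below. This is only a minimal illustration, not the exact analysis I ran: the `top_senders` helper and the sample `From` headers are made up for the example, and it assumes the `From` header of every mail has already been collected into a list.

```python
from collections import Counter
from email.utils import parseaddr


def top_senders(from_headers, n=10):
    """Count messages per sender address, most frequent first.

    `from_headers` is an iterable of raw From header strings (hypothetical input).
    """
    counts = Counter()
    for header in from_headers:
        # parseaddr splits 'Name <addr>' into (name, addr); addr is '' if it cannot parse
        _, address = parseaddr(header or "")
        if address:
            counts[address.lower()] += 1
    return counts.most_common(n)


# Made-up sample headers, only to show the shape of the result
sample = [
    "Facebook <notification@example.com>",
    "Quora Digest <digest@example.org>",
    "notification@example.com",
]
print(top_senders(sample))
```

Grouping by the bare address rather than the display name keeps the same sender together even when the name part of the header changes between mails.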
1. Many mail items had a received time, but the times were not timezone normalized. In a few cases even the formats themselves varied. There is no reason why one should have to expect data that way.
2. You cannot trace a mail item to a thread. Gmail servers organize emails as conversation threads. Each thread has a unique identifier. I know this since I have played around with Gmail inboxes via Google Apps Script. There looks to be an identifier in the data, but I could not find any correlation with what is on the Gmail servers.
These two reasons make it absolutely impossible for me to decide what to delete before I delete.
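For the timezone part, a best-effort normalization could look something like the sketch below. It assumes the timestamp is available as an RFC 2822 style `Date` header string, which may not hold for every mail item; the `to_utc` helper is a name made up for this example, and it simply returns `None` for anything it cannot parse rather than pretending to fix the varying formats.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def to_utc(date_header):
    """Parse a Date header and return an aware UTC datetime, or None if unparseable."""
    try:
        parsed = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Missing header or a format the standard parser does not understand
        return None
    if parsed.tzinfo is None:
        # The header carried no usable offset (e.g. "-0000").
        # Treating it as UTC is an assumption of this sketch.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


print(to_utc("Mon, 02 Jan 2017 10:30:00 +0530"))
print(to_utc("not a date"))
```

Counting how many mails come back as `None` gives a quick measure of how much of the data cannot be placed on a common timeline.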
I used Jupyter notebooks and pandas to help me out with this exercise. I was surprised that Python has built-in support for working with mbox files. My choice to use a programming language such as Python is personal. The language and its ecosystem are quite mature for data analysis. I am sure other language ecosystems exist. That said, I am not trying to evangelize the usage of one product or another.
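The built-in support mentioned here is the standard library's `mailbox` module. Below is a minimal sketch of reading an mbox file into rows that pandas can consume. The `mbox_to_rows` helper, the choice of headers, and the file name in the comment are assumptions for illustration, not my actual notebook code.

```python
import mailbox


def mbox_to_rows(path):
    """Read an mbox file and return one dict of selected headers per message."""
    rows = []
    # create=False makes a wrong path fail loudly instead of creating an empty mbox
    for message in mailbox.mbox(path, create=False):
        rows.append({
            "from": message.get("From"),
            "date": message.get("Date"),
            "subject": message.get("Subject"),
        })
    return rows


# Hypothetical usage, with a placeholder file name:
# rows = mbox_to_rows("my_export.mbox")
```

The resulting list of dicts can be passed straight to `pandas.DataFrame(rows)` for the grouping and counting. Only headers are kept here, since message bodies are not needed to decide who the noisy senders are.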
It has been more than 10 years since Google Takeout was released. It is natural for a bloke like me to be shocked. Why is there no support for the way I want to use it? I think it is probably because it takes effort to design an archival system that mirrors your email server. And my use case happens to be niche. People probably think more about backing up their data, not about finding a convoluted way to "cleanse" it.
However, I am not discouraging others from walking this path. In fact I encourage it. Who knows, a fresh set of eyes might discover something I could not. If it comes to that, just be happy to let people know.