Saturday, September 13, 2025

[Rant] A Gmail Google Takeout frustration

 ⚠️ The intention here is not to point fingers at products or an organization. 

TL;DR: I tried to "cleanse" my Gmail inbox and gave up. But I discovered some handy built-in Python support along the way. Respect for that. 🙏

If you are someone from my generation, I am sure that, statistically speaking, there must be around 50k unread emails in your inbox. I do not want to explain why that is so. It is a fun exercise to try to understand why such inboxes end up in this dire state. But in this blog post, I guess it is enough to sum it up by stating that most of us are victims of capitalism and the communication revolution.

I set out on the journey of reducing my Google One storage footprint. Somewhere in one of Google One's web pages, we can see the product-wise usage of the storage (between Photos, Gmail, Drive, etc.). Gmail wasn't the main culprit here. Gmail had a sizable chunk of data, but Photos was taking 7x more than that.

Still, I figured out what I could cull from my Gmail inbox. There are a lot of things I need to keep - like bank statements, receipts, personal communications with people in my friends circle, etc. There are a lot of things I can discard - all of those Facebook notification emails, Quora email digests, etc.

At this stage, there is no point in wondering why the inbox is flooded with these. At one point I was active on social media, and therefore I never considered those emails junk. But now they are! Oh, how the times have changed!!

I assumed that there would be a way to estimate what I am about to delete, and then actually delete it. In other words, I wanted to delete only what I deemed unnecessary, and not something else by accident.

My plan was to do a kind of data analysis on the Google Takeout export. In simpler words, find out who is spamming me. Then gather the data, and write a Google Apps Script to delete the emails. But I was in for a shock. Highlighting two important points below:

1. Many mail items had a received time, but the times are not timezone normalized. In a few cases the formats themselves varied. There is no reason why one should expect data that way.

2. You cannot trace a mail item to a thread. Gmail servers organize emails as conversation threads. Each thread has a unique identifier. I know this since I have played around with Gmail inboxes via Google Apps Script. There looks to be an identifier in the export, but I could not correlate it with what is on the Gmail servers.

These two reasons make it absolutely impossible for me to decide what to delete before I delete.
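For the first point, here is a minimal Python sketch of how such timestamps could be normalized, assuming the received time is available as a standard `Date`-style header string. The function name `to_utc` is my own, and treating a timestamp without a usable offset as UTC is an assumption, not something the export guarantees.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def to_utc(date_header):
    """Return a timezone-aware UTC datetime, or None if parsing fails."""
    if not date_header:
        return None
    try:
        parsed = parsedate_to_datetime(str(date_header))
    except (TypeError, ValueError, IndexError):
        # Malformed or non-standard date formats end up here.
        return None
    if parsed is None:
        return None
    if parsed.tzinfo is None:
        # No usable offset in the header (e.g. "-0000").
        # Assumption: treat such timestamps as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

Anything that comes back as `None` is a header that would still need manual inspection, which is roughly where the varying formats showed up for me.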

I used Jupyter notebooks and pandas to help me out with this exercise. I was surprised that Python has built-in support for working with mbox files. My choice to use a programming language such as Python is personal. The language and its ecosystem are quite mature for data analysis. I am sure other language ecosystems exist. Having said all this, I am not trying to evangelize the usage of one product or another.
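To give an idea of what that built-in support looks like, here is a rough sketch of the "who is spamming me" step using the standard library `mailbox` module. The function name `count_senders` and the file path in the usage note are hypothetical. This is not the exact notebook code I ran, just the general shape of it.

```python
import mailbox
from collections import Counter
from email.utils import parseaddr


def count_senders(mbox_path):
    """Count messages per sender address in an mbox file."""
    counts = Counter()
    # create=False: fail loudly instead of creating an empty mailbox
    # when the path is wrong.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            # The From header may be missing or oddly encoded,
            # so coerce it to a plain string before parsing.
            _, address = parseaddr(str(message.get("From", "")))
            counts[address.lower() or "(unknown)"] += 1
    finally:
        box.close()
    return counts
```

Something like `count_senders("takeout.mbox").most_common(20)` then lists the top twenty senders, and the resulting counter can be turned into a pandas DataFrame for further slicing.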

It has been more than 10 years since Google Takeout was released. It is natural for a bloke like me to be shocked. Why is there no support the way you want it to be? I think it is probably because it takes effort to design an archival system that mirrors how your email server works. And my use case happens to be niche. People probably think more about backing up their data than about finding a convoluted way to "cleanse" it.

However, I am hardly discouraging others from walking this path. In fact I encourage it. Who knows, a fresh set of eyes might discover something I could not. Just be happy to let people know if it comes to that.

Friday, June 13, 2025

My 2 paisa (cents) on digipin

It is a geo-coded addressing framework. However, to appreciate the concept and the acceptance of the digipin, you need to contrast it with the concept of India Post's PINCODE. I am purposefully omitting all that in this post. 

I feel this addressing system helps a lot of "machine readable" systems. This is a very broad term; too esoteric for the common man to understand. Ultimately, I think "machine readability" is what enables the common man to shop online, order food, get turn-by-turn directions while navigating, etc. It is the whole reason why you are able to read this blog post today. However, let's not go there either.

Again, the next few lines are perhaps what a common man would not appreciate all that much... the crux of this addressing framework is a system for encoding/decoding GPS coordinates, mainly latitude and longitude. Somebody, or a handful of people, thought of it and fought to make it relevant. Kudos for that.
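To make the encoding/decoding idea concrete, here is a toy Python sketch of a hierarchical grid code: split a bounding box into a 4x4 grid, pick the cell containing the point, emit one symbol, and repeat inside that cell. To be clear, this is NOT the official digipin algorithm. The bounding box, the symbol alphabet, the cell ordering and the number of levels below are all made up for illustration.

```python
# Illustrative values only; none of these come from the digipin specification.
SYMBOLS = "0123456789ABCDEF"       # hypothetical 16-symbol alphabet (4x4 grid)
BOUNDS = (0.0, 40.0, 60.0, 100.0)  # hypothetical lat_min, lat_max, lon_min, lon_max


def encode(lat, lon, levels=10):
    """Encode a coordinate as a string of nested 4x4 grid cell symbols."""
    lat_min, lat_max, lon_min, lon_max = BOUNDS
    if not (lat_min <= lat < lat_max and lon_min <= lon < lon_max):
        raise ValueError("coordinate outside the bounding box")
    code = []
    for _ in range(levels):
        lat_step = (lat_max - lat_min) / 4
        lon_step = (lon_max - lon_min) / 4
        # min(..., 3) guards against floating point rounding at cell edges.
        row = min(int((lat - lat_min) / lat_step), 3)
        col = min(int((lon - lon_min) / lon_step), 3)
        code.append(SYMBOLS[row * 4 + col])
        # Zoom into the chosen cell for the next level.
        lat_min += row * lat_step
        lat_max = lat_min + lat_step
        lon_min += col * lon_step
        lon_max = lon_min + lon_step
    return "".join(code)


def decode(code):
    """Return the centre (lat, lon) of the cell that a code refers to."""
    lat_min, lat_max, lon_min, lon_max = BOUNDS
    for symbol in code:
        row, col = divmod(SYMBOLS.index(symbol), 4)
        lat_step = (lat_max - lat_min) / 4
        lon_step = (lon_max - lon_min) / 4
        lat_min += row * lat_step
        lat_max = lat_min + lat_step
        lon_min += col * lon_step
        lon_max = lon_min + lon_step
    return (lat_min + lat_max) / 2, (lon_min + lon_max) / 2
```

The nice property of such a scheme is that every extra symbol shrinks the cell by a factor of four per side, so a short prefix of a code still points at the same general area, only less precisely.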

Every Indian would want to tout this as a novel idea; it's relatively novel. There is already a manifestation of an addressing framework in Google Maps. (Good luck finding what it is, much less using it). There are many others - like What3Words. (Car enthusiasts might know). You can find more if you look up geocoding on Wikipedia.

A common man can find more information on digipin. (If he or she is ⚠️determined). I found a GitHub repo which gives neat technical documentation. There is also DHRUVA. It is an acronym. That document details the idea behind digipin. All of these can be found on the website of India Post after a little internet search.

Final Note

All this is good innovation. But don't forget we are still a savage species. Some individuals in this species can give birth to increase its relative population. The mechanism concerned has been the same for millions of years. And it is going to be the same for millions more. Of course, today there are technologies like IVF, C-sections, etc. (But that is beside the point).

A subset of that population gets to witness these kinds of innovations over their lifetimes. In one sense, that is a blessing. But do not forget what "these individuals" did over the years to get there. They have lied, rescued, murdered, fought, escaped, eaten food, polluted, cheated, innovated, raped, invented, travelled, fake news-ed, voted, blogged, judged, punished, etc. They've done a myriad of things both good and bad.

Realize that humanity continues to do these same things even today.

And oh yes, there is digipin for places in the Indian/Arabian oceans too.🌊