In this post I discuss the path I took to enable TrackAbout to react more quickly to failures in our application’s email delivery. I wrote a Node.js service that relays email delivery failure events from SendGrid to a HipChat chat room that our support staff monitors. The project is open source and available on GitHub as sendgrid-webhook-server.
Live to Code, Code to Live
I’ve written previously about our policy of giving our developers self-directed time each sprint to scratch an itch, sharpen the saw, study, etc. I recently took some self-directed time myself to scratch my own itch.
As CTO of TrackAbout, I don’t get to write code like I used to. In the early years of TrackAbout, I would spend 90% of my conscious time coding, from rolling out of bed until the time I passed out around 2AM (minus meals, bio breaks and an occasional trip to the gym). Programming is my first love, but other duties now keep me from it. Yet every so often I find an opportunity to flex my atrophying coding muscle.
Delivering Email is Hard
Our web application, like most, sends email. It sends email to employees at TrackAbout, to our users, and to people of our users’ choosing.
Getting email delivered is significantly harder than you might think, largely because you have zero control over the whims of the receiving email server. One misconfigured, overzealous “network security device” on the other end, and your email gets eaten and your IP address lands on some global blacklist, hurting deliverability to all your other customers (I’m looking at you, Barracuda).
About a year and a half ago, we decided to outsource the sending of email to a company that was doing it right: SendGrid. Before that, we’d been sending email through a basic SMTP mailer service. It provided little visibility into drops and bounces and did nothing to help us learn whether we were on any blacklists. It also wouldn’t DKIM-sign our email, and DKIM signing is an important ingredient in ensuring successful delivery (see also: SPF DNS records). We could have built up capabilities around our existing SMTP service or chosen a new email server, but we’re not in the business of writing email tools or managing email servers. SendGrid is.
Even with SendGrid, the mail doesn’t always get through. When it fails, SendGrid provides detailed reports that help us determine the cause. Often, a user has fat-fingered an email address and it’s simply not routable. That mail will never get there.
We recently had a situation where a TrackAbout employee configured a demo instance with a fake but routable email address, which caused us to send gobs of email to a nonexistent mailbox at an unsuspecting company. It’s a great way to get blacklisted for bad behavior. Don’t recommend it.
We’ve also had users set up email alerts in our application that fire when some event happens to their tracked assets. When the intended recipient leaves the company, the email starts to fail. Again, pounding an innocent mail server can get you blacklisted.
Having a member of our Support staff monitor SendGrid reports is one way to stay on top of these kinds of email failures. But a push notification system would be much nicer.
Here comes my itch.
Building a Better Mousetrap
We learned that SendGrid, in addition to having a comprehensive REST-based web API, has a wonderful feature called a “web hook”. The gist is that SendGrid will post event data to a URL of your choosing whenever an email is processed, dropped, deferred, delivered, or bounced, and whenever a recipient opens it, clicks a link in it, unsubscribes, or marks it as spam.
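To make the web hook concrete, here is a minimal sketch of the first thing a receiver has to do: pick the failure events out of whatever SendGrid posts. It assumes the POST body is a JSON array of event objects, each carrying at least `event` and `email` fields (the shape SendGrid’s Event Webhook uses today; older webhook versions may differ). The `FAILURE_EVENTS` set and the `extractFailures` name are illustrative choices, not code from sendgrid-webhook-server.

```javascript
// Minimal sketch, assuming the POST body is a JSON array of SendGrid event
// objects such as { "email": "...", "event": "bounce", "reason": "..." }.

// Event types treated as failures here. This set is an assumption; adjust it
// to whatever the support staff wants to hear about.
const FAILURE_EVENTS = new Set(["bounce", "dropped", "deferred"]);

// Parses a raw request body and returns only the failure events.
function extractFailures(rawBody) {
  let payload;
  try {
    payload = JSON.parse(rawBody);
  } catch (err) {
    return []; // Malformed body: nothing to relay.
  }
  // Tolerate a single event object as well as a batch.
  const events = Array.isArray(payload) ? payload : [payload];
  return events.filter(
    (e) => e !== null && typeof e === "object" && FAILURE_EVENTS.has(e.event)
  );
}
```

Deferred events are usually temporary, so a quieter room might drop `"deferred"` from the set and alert only on bounces and drops.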
We use HipChat as our chat room solution at TrackAbout and we love it. We’re in there all day. How nice would it be to have a chat room for being notified of SendGrid mail failure events? Very nice indeed.
So what I wanted was a lightweight service to receive web hook calls from SendGrid and transform the posted data into a HipChat message to the room of my choosing.
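The transform step can be sketched as a pure function from one failure event to one chat message. The output object’s fields (`room`, `from`, `message`, `color`) are hypothetical, loosely modeled on what a HipChat room notification needs; mapping them onto node-hipchat or any other client is deliberately left out. The `reason` and `response` fields are assumed from SendGrid’s current event format. Because the message is rendered as HTML, values taken from the event are escaped first.

```javascript
// Minimal sketch: turn one SendGrid failure event into a chat message.
// The returned shape ({ room, from, message, color }) is hypothetical.

// Escapes characters that would otherwise be interpreted as HTML markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Builds a short, human-readable notification for the support room.
function toChatMessage(event, roomName) {
  // Assumption: bounce/dropped events carry `reason`, deferred ones `response`.
  const detail = event.reason || event.response || "no reason given";
  return {
    room: roomName, // hypothetical: target room name or ID
    from: "SendGrid", // sender label shown in the room
    message:
      `<b>${escapeHtml(event.event)}</b> for ` +
      `${escapeHtml(event.email)}: ${escapeHtml(detail)}`,
    // Temporary deferrals get a milder color than hard failures.
    color: event.event === "deferred" ? "yellow" : "red",
  };
}
```

Keeping the transform free of network calls means it can be unit-tested on its own, with the HipChat call isolated in a thin wrapper.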
There are some cool new companies out there that hook various cloud services together. The two big ones I know about are If This Then That and Zapier. Neither supported the exact scenario I wanted, and besides, this isn’t a terribly hard problem to solve by rolling your own. And I had this itch.
The result of eight highly interrupted hours of effort is now live and working. The project, sendgrid-webhook-server, is available on GitHub and open sourced under the MIT license.
I began by modeling my service on the HTTP server application laid out in Manuel Kiessling’s The Node Beginner Book. I ultimately bought a bundle that included his book and Hands-On Node.js by Pedro Teixeira. Once I felt I understood the basic HTTP server model and had my solution working, I gutted it and rewrote it using the Express framework. A ton of code fell away.
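For a sense of what the plain-Node model involves, here is a minimal sketch of a handler built on Node’s built-in `http` module that collects the POST body, parses it as JSON, and passes the events to a callback. The `/sendgrid` path, `makeWebhookHandler`, `startServer`, and the `relay` callback are hypothetical names for illustration, not the project’s actual code.

```javascript
// Minimal sketch, assuming SendGrid POSTs a JSON body to /sendgrid.
// makeWebhookHandler, startServer, and the relay callback are hypothetical.
const http = require("http");

// Returns a request handler that hands parsed events to `relay`.
function makeWebhookHandler(relay) {
  return function handleWebhook(req, res) {
    // Compare only the path, ignoring any query string.
    const path = new URL(req.url, "http://localhost").pathname;
    if (req.method !== "POST" || path !== "/sendgrid") {
      res.writeHead(404, { "Content-Type": "text/plain" });
      res.end("Not found");
      return;
    }

    // The body arrives in chunks; collect them until the request ends.
    const chunks = [];
    req.on("data", (chunk) => chunks.push(chunk));
    req.on("end", () => {
      let payload;
      try {
        payload = JSON.parse(Buffer.concat(chunks).toString("utf8"));
      } catch (err) {
        res.writeHead(400, { "Content-Type": "text/plain" });
        res.end("Invalid JSON");
        return;
      }
      // Normalize to an array so relay always receives a list of events.
      relay(Array.isArray(payload) ? payload : [payload]);
      // A 2xx response tells SendGrid the POST was received.
      res.writeHead(200, { "Content-Type": "text/plain" });
      res.end("OK");
    });
  };
}

// Creates and starts the HTTP server on the given port.
function startServer(port, relay) {
  return http.createServer(makeWebhookHandler(relay)).listen(port);
}
```

To run it, call something like `startServer(process.env.PORT || 3000, (events) => console.log(events))`; Heroku supplies the port in the `PORT` environment variable. A production version would also cap the body size. In Express, the manual chunk collection is handled by body-parsing middleware (`express.json()` in Express 4.16 and later), which accounts for much of the code that falls away.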
I used the Chrome Advanced REST client for testing the service.
Huge thanks to the brilliant Nate Kohari for his node-hipchat package, which did most of my work for me.
Although we use Mercurial (Hg) internally as our DVCS, I put the code on GitHub so I could easily deploy the live service to Heroku and, eventually, open-source the project. If Heroku ever starts costing real money, I can run Node on one of our existing TrackAbout servers easily enough. It was fun learning how to quickly deploy code from GitHub to Heroku.
The code isn’t perfect, and I make no warranties or representations about it. Unit tests are sorely needed. Pull requests are welcome. See the README for the to-do list. As time permits, I’ll extend and improve the project further. Our support manager already wants us to funnel the events into our trouble ticketing system so that we can track each issue to resolution.
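As a hypothetical starting point for the ticketing idea, repeated failures for the same recipient could be collapsed into a single issue before anything is filed, so one bad address yields one ticket rather than one per failed message. This sketch assumes each event has `email`, `event`, and a Unix-seconds `timestamp` field; `groupByRecipient` is an illustrative name, not existing project code.

```javascript
// Hypothetical sketch: collapse failure events into one issue per recipient.
// Assumes events shaped like { email, event, timestamp } (timestamp in seconds).
function groupByRecipient(events) {
  const issues = new Map();
  for (const e of events) {
    // Normalize case so Bob@ and bob@ collapse together (a simplification:
    // the local part of an address is technically case-sensitive).
    const key = String(e.email || "").trim().toLowerCase();
    if (!key) continue; // Skip events with no recipient.
    const issue =
      issues.get(key) || { email: key, count: 0, eventTypes: new Set(), lastSeen: 0 };
    issue.count += 1;
    issue.eventTypes.add(e.event);
    issue.lastSeen = Math.max(issue.lastSeen, Number(e.timestamp) || 0);
    issues.set(key, issue);
  }
  return Array.from(issues.values());
}
```

A ticketing integration could then open or update one ticket per returned issue, using `lastSeen` to decide whether a resolved address has started failing again.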
Until the next itch…
Chief Technology Officer