The GCE outage on June 2, 2019

I happened to notice the GCE outage on June 2 for an odd reason. I have a number of motion-activated cameras that continually stream to a small Raspberry Pi cluster (where TensorFlow does some nifty stuff). This cluster pushes some more serious processing onto GCE. Just as a fail-safe, I also have the system generate an email when it notices an anomaly, some unexplained movement, and so on.

And on June 2nd, this all went dark for a while, and I wasn’t quite sure why. Digging around later, I realized that the issue was that I relied on GCE for the cloud infrastructure and Gmail for the email. So when GCE had an outage, the whole thing came apart. There’s no resiliency if you have a single point of failure (SPOF), and GCE was my SPOF.

While I was receiving mobile alerts that there was motion, I got no notifications explaining the cause. The expected behavior was that I would receive alerts on my mobile device and explanations by email. For example, the alert would read “Motion detected, camera-5 <time>”. The explanation would be something like “NORMAL: camera-5 motion detected at <time> – timer activated light change”, “NORMAL: camera-3 motion detected at <time> – garage door closed”, or “WARNING: camera-4 motion detected at <time> – unknown pattern”.

I now realize that both the email notification and the pattern detection relied on GCE, so that one SPOF delayed both the processing and the email notification. OK, so I fixed my error, and I now use Office 365 for email generation, so at least I’ll get a warning email.
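Switching email providers removes this particular SPOF. A slightly more general version of the same fix is to try several notification channels in order, so that an alert goes out as long as any one of them is reachable. The sketch below assumes each channel is wrapped in a hypothetical `(subject, body)` sender function that raises on failure; the channel names are placeholders, not the setup described above. The fallbacks only help if they do not share a failure domain with the primary.

```python
"""Send an alert through the first notification channel that works.

A minimal sketch: the channel names and sender functions are hypothetical
stand-ins (for example, one backed by Gmail and one by Office 365).
"""
from typing import Callable, List, Sequence, Tuple

# A sender takes (subject, body) and raises an exception if delivery fails.
Sender = Callable[[str, str], None]


def notify_with_fallback(channels: Sequence[Tuple[str, Sender]],
                         subject: str, body: str) -> str:
    """Try each (name, sender) pair in order; return the name of the first that succeeds."""
    errors: List[str] = []
    for name, send in channels:
        try:
            send(subject, body)
            return name
        except Exception as exc:  # any failure: record it and move on to the next channel
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all notification channels failed: " + "; ".join(errors))
```

For example, `notify_with_fallback([("gmail", send_gmail), ("office365", send_o365)], "WARNING", "camera-4 unknown pattern")` returns `"office365"` when the first hypothetical sender raises and the second one succeeds.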

But I’m puzzled by Google’s blog post about this outage. The summary of that post is that a configuration change intended for a small number of servers ended up going to other servers, shit happened, and the cleanup took longer because the network used for troubleshooting was the same as the affected network.

So, just as I had a SPOF, Google appears to have had a SPOF. But why do we still have issues where a configuration change intended for a small number of servers ends up going to a large number of servers?

Wasn’t this the same kind of thing that caused the 2017 Amazon S3 outage?

At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.

Shouldn’t there be a better way to detect the intended scope of a change, and to verify that the actual scope matches it? This seems like an opportunity for a different kind of check and balance.
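One shape such a check and balance could take: the operator declares how many servers a change is meant to touch, the tooling expands the target selector against the real inventory, and the change is refused if the two disagree or if the scope exceeds a fleet-wide cap. This is a sketch only; the function names, the glob-style selectors, and the 10% cap are illustrative assumptions, not how Google or AWS actually gate changes.

```python
"""Refuse a change whose resolved scope differs from what its author declared.

A sketch: function names, glob selectors, and the 10% cap are illustrative.
"""
from fnmatch import fnmatch
from typing import Iterable, List


class ScopeError(Exception):
    """Raised when a change would touch a different or larger set of servers than intended."""


def resolve_targets(selector: str, fleet: Iterable[str]) -> List[str]:
    """Expand a glob-style selector (e.g. 'billing-*') against the server inventory."""
    return [host for host in fleet if fnmatch(host, selector)]


def check_scope(selector: str, fleet: List[str], declared_count: int,
                max_fraction: float = 0.10) -> List[str]:
    """Return the targets only if they match the declared count and stay under the cap."""
    targets = resolve_targets(selector, fleet)
    # Check 1: the selector must expand to exactly as many servers as the operator said.
    if len(targets) != declared_count:
        raise ScopeError(f"{selector!r} matches {len(targets)} servers, "
                         f"but the change declared {declared_count}")
    # Check 2: even a correctly declared change may not touch too much of the fleet at once.
    if len(targets) > max_fraction * len(fleet):
        raise ScopeError(f"{len(targets)} of {len(fleet)} servers exceeds the "
                         f"{max_fraction:.0%} limit; needs a second approver")
    return targets
```

In this sketch, a mistyped input like `"*"` where `"billing-*"` was meant fails the count check before anything is touched, instead of being discovered afterwards.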

Building completely redundant systems sounds like a simple solution, but at some point the cost becomes exorbitant. So building completely independent control and user networks may seem like the obvious solution, but is it cost-effective?
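A back-of-the-envelope way to see both sides of that trade-off: if each of n replicas is independently available a fraction a of the time, the system is down only when all of them are down. Independence is the key assumption here, and it is exactly the one a shared troubleshooting network breaks.

```latex
\begin{aligned}
A_n &= 1 - (1 - a)^n \\
a = 0.999:\quad 1 - A_1 &= 10^{-3} \approx 8.8\ \text{hours of downtime per year} \\
1 - A_2 &= 10^{-6} \approx 32\ \text{seconds of downtime per year}
\end{aligned}
```

Doubling the hardware roughly doubles the cost but, in this example, cuts expected downtime by a factor of 1,000. The catch is that any dependency the replicas share makes them fail together, and then the formula no longer applies.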

Try this DIY Neutral Density Filter for Long Exposure Photos

I have heard of the trick of using welder’s glass as a cheap ND filter. But from my childhood experience of arc welding, I was not sure how one would deal with the fact that welder’s glass is not exactly precision optics.

This article at least addresses the issue of coloration, and offers some nice tips on adjusting color balance in general.

https://digital-photography-school.com/diy-neutral-density-filter/

Automate everything

I like things to be automated, everything: coffee in the morning, bill payment, cycling the cable modem when it goes wonky, everything. The adage used to be “if you do something twice, automate it.” I think it should be “if you do anything, automate it; you will likely have to do it one more time.”

I used to automate stuff like converting DOCX to PDF and PPTX to PDF on Windows all the time. But in the two years since I moved to a Mac, this is one thing I’ve not been able to automate, and it bugged me, a lot.

No longer.

I had to make a presentation that went with a descriptive document, and I wanted to submit the whole thing as a PDF. Try as I might, PowerPoint and Word on the Mac would not make this easy.

It is disgusting that I had to resort to AppleScript + Automator to do this.

I found this, and this.

It is a horrible way to do it, but yes, it works.

Now, before the Mac purists flame me for using Microsoft Word and Microsoft PowerPoint, let me point out that the default Mac tools don’t make it any easier. Apple Keynote does not appear to offer a solution either; you have to resort to Automator for this too.

So, eventually, I resorted to automation based on those two links to make two PDFs, and then used this to combine them into a single PDF.

This is shitty and horrible, and I am using it now. But do you know of some other solution, using simple Python, that doesn’t require installing LibreOffice or a handful of other tools? Isn’t this a solved problem? If not, I wonder why.
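A partial answer to that question, assuming the third-party pypdf package (a single pure-Python `pip install pypdf`) is acceptable: it covers only the combining step, so the DOCX/PPTX to PDF export still has to come from Word, PowerPoint, or Automator. The helper `pdf_name_for` and its same-folder, same-name output convention are hypothetical, not part of any Office or macOS tool.

```python
"""Combine already-exported PDFs into a single PDF.

A sketch assuming the third-party pypdf package; it does not convert
DOCX/PPTX itself. pdf_name_for is a hypothetical naming helper.
"""
from pathlib import Path
from typing import List


def pdf_name_for(source: Path) -> Path:
    """Map a DOCX/PPTX path to the PDF path it is assumed to be exported to."""
    if source.suffix.lower() not in {".docx", ".pptx"}:
        raise ValueError(f"not a Word or PowerPoint file: {source}")
    return source.with_suffix(".pdf")


def combine_pdfs(inputs: List[Path], output: Path) -> None:
    """Append every page of each input PDF, in order, to one output PDF."""
    from pypdf import PdfWriter  # imported here so pdf_name_for works without pypdf installed

    writer = PdfWriter()
    for pdf in inputs:
        writer.append(str(pdf))  # copies all pages of this file onto the end
    with open(output, "wb") as fh:
        writer.write(fh)
```

For example, after exporting `deck.pptx` and `notes.docx`, `combine_pdfs([pdf_name_for(Path("deck.pptx")), pdf_name_for(Path("notes.docx"))], Path("combined.pdf"))` would produce one PDF with the slides first and the document after.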