STAGGERING CHEF CLIENT RUNS

One of the new tools I’ve discovered is Chef, which I use to manage the configuration and software on Storehouse’s fleet of virtual machines. Chef makes it really handy to update and track config changes, since everything can be tracked using Git or similar. One issue we ran into was chef-client running at the same time on multiple machines. This issue is kinda subtle, but makes a lot of sense when you think about it.
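To make the idea of staggering concrete, here is a minimal Python sketch that derives a stable per-machine delay from the hostname, so machines on the same schedule do not all start at the same moment. This is an illustration only, not necessarily the approach taken in the full post; the function name, the 1800-second interval, and the example hostnames are all assumptions. chef-client also has its own `splay` setting, which adds a random delay before each run.

```python
import hashlib


def splay_seconds(hostname: str, interval: int = 1800) -> int:
    """Map a hostname to a stable offset in the range [0, interval).

    The same hostname always gets the same offset, while different
    hostnames are spread roughly evenly across the interval.
    """
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % interval


# Hypothetical hostnames, purely for illustration.
for host in ("web1.example.com", "web2.example.com", "db1.example.com"):
    print(host, splay_seconds(host))
```

A hash-based offset is deterministic, which makes run times predictable per machine; a random delay spreads load just as well but changes on every run.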

Read more

MAKE A SITE PRIVATE BUT ALLOW LET’S ENCRYPT

This is a pretty straightforward thing I’ve wanted to do for some time. Basically, I have a number of sites that I use internally and for which I wanted to get certificates via Let’s Encrypt, but I also wanted to keep them restricted to only a few IP addresses. The solution is quite simple and works perfectly. We accomplish this with two .htaccess files: one at the site root to restrict the IP addresses that can access the site, and a second to disable that restriction on the directory where the Let’s Encrypt challenge is stored.
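The access rule that the two files express together can be modelled in a few lines of Python. This is only a sketch of the decision logic, not the .htaccess files themselves. The allowed network is a hypothetical documentation range, and the challenge directory is assumed to be the standard HTTP-01 location, `/.well-known/acme-challenge/`.

```python
import ipaddress

# Hypothetical allow list; a real site would list its own addresses.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]

# Assumed challenge directory (the standard HTTP-01 path).
CHALLENGE_PREFIX = "/.well-known/acme-challenge/"


def is_allowed(client_ip: str, path: str) -> bool:
    """Return True if a request from client_ip for path should be served."""
    # Second rule: the challenge directory is open to everyone.
    if path.startswith(CHALLENGE_PREFIX):
        return True
    # First rule: everything else is limited to the allow list.
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)
```

The exception has to apply only to the challenge directory; everything else on the site stays behind the IP restriction.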

Read more

MONITORING A MOUNT POINT WITH ZABBIX

A subtle issue I ran into was that Proxmox VE would sometimes unmount a GlusterFS volume and then fail to back up. This was a bit sneaky, though: since the PVE backup program wouldn’t execute, it wouldn’t send an email notifying me of the failure. The backups would fail silently for some time, until I happened to log in and see the errors in the cluster’s log.
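A check for this kind of failure can be very small. The Python sketch below reports 1 when a path is currently a mount point and 0 when it is not, a numeric value that a monitoring system can alert on. The function name is an assumption, and how the full post wires the check into Zabbix is not shown in this excerpt.

```python
import os


def mount_status(path: str) -> int:
    """Return 1 if path is currently a mount point, otherwise 0.

    A path that does not exist also yields 0, so a missing directory
    is reported the same way as an unmounted volume.
    """
    return 1 if os.path.ismount(path) else 0


# The root filesystem is always mounted, so it makes a handy sanity check.
print(mount_status("/"))
```

One way to use it, under those assumptions, is to have the monitoring agent run the script on a schedule and trigger an alert whenever the reported value is 0.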

Read more

A MEMORY LEAK VISUALIZED

Graph of free memory on a node with a leaking piece of software.

OUTAGES FEB 16-18 2017

So I’m a human, and I have outages. My goal is to be more transparent, not only with my customers but also with myself, about why an outage occurred and what I can do to keep it from happening again. From February 16 to 18, Storehouse had a few intermittent outages that lasted anywhere from 1 to 3 hours. This post is long overdue; heck, it’s even March now. I don’t have a good excuse for the delay: I knew what caused the outages and had taken corrective action, but I simply put off writing this.

Read more

MYSQL (MARIADB) GALERA CLUSTER RESTART

This is a scary problem when you’re recovering from an outage of your database machines. If you’re running a Galera cluster and they all go offline, you’ll need to do a bit of work to restart the cluster and make it safe. Galera relies on the fact that there’s at least one node running in your cluster at all times. If your entire cluster goes offline, you won’t be able to start it again, even with the --wsrep-new-cluster option.
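One part of that work is usually deciding which node holds the most recent data. As a hedged sketch (this excerpt does not show the procedure the full post uses), the Python below reads the `seqno` field from each node’s grastate.dat contents and picks the node with the highest value. The node names and file contents in the example are made up.

```python
def parse_grastate(text: str) -> dict:
    """Parse the 'key: value' lines of a grastate.dat file into a dict."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def pick_bootstrap_node(states: dict) -> str:
    """Given node name -> grastate.dat text, return the most advanced node."""
    def seqno(node: str) -> int:
        # A missing seqno is treated as -1, i.e. "unknown".
        return int(parse_grastate(states[node]).get("seqno", "-1"))
    return max(states, key=seqno)


# Made-up example contents for three hypothetical nodes.
example = {
    "db1": "# GALERA saved state\nversion: 2.1\nuuid: aaaa\nseqno: 1041\n",
    "db2": "# GALERA saved state\nversion: 2.1\nuuid: aaaa\nseqno: 1057\n",
    "db3": "# GALERA saved state\nversion: 2.1\nuuid: aaaa\nseqno: -1\n",
}
print(pick_bootstrap_node(example))
```

A `seqno` of -1 means the position was not saved cleanly; in that case it has to be recovered first (for example with the `--wsrep-recover` option) before any comparison is meaningful.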

Read more

ZABBIX MYSQL (MARIADB) MONITORING

This is another one of those things that is pretty straightforward, but requires pulling together information from several different sources in order to get things up and running. The goal here is to get Zabbix to monitor our MariaDB server’s status (MariaDB is a drop-in replacement for MySQL; I’ll refer to either as MariaDB here). There’s a built-in template, but a few other files and settings need to be set up before you can get the juicy data flowing.

Read more

PROXMOX 3 TO 4 UPGRADE NETWORK ISSUE

This is a problem that showed itself when upgrading our Proxmox 3.2 nodes to Proxmox 4. About halfway through the upgrade, our network adapters suddenly stopped being able to communicate with any local addresses, but could still ping outside addresses. The cause was a minor config change that gets introduced pretty stealthily. When this happens, simply add the following line to the bridge config in /etc/network/interfaces:

bridge_vlan_aware yes

To make the entire config section resemble:

Read more

MOUNTAINS TO METROS

A scenario for Locomotion, that old train game from the same guy who made RollerCoaster Tycoon. Your goal is to create a transportation network moving valuable resources out of mountains into nearby cities and industries. It’s not great, but I like playing it. Mountains to Metros

MY THOUGHTS ON GOOGLE’S PAGE SPEED INSIGHTS

Google’s Page Speed measure is a tool that gives developers feedback on how their web page is performing. It rates pages on a scale of 0 to 100, with 100 being “perfect.” In my opinion, this system is very flawed: it creates an ambiguous number that encourages developers and clients to waste time and money chasing after unobtainable goals. Obligatory Disclaimer: These opinions don’t reflect those of my employer at all.

Read more