Friday, December 22, 2006

BASH Hacks: Using 2 Comma Delimited Lists to Replicate Simple 2D Array

As we have stated, the data structures in BASH are extremely limited. BASH has only recently introduced a single-dimensional array data structure, and it is tricky and hard to use. Multi-dimensional arrays have yet to be implemented, so we are forced into a lot of trickery to replicate their functionality. A very simple type of two-dimensional array that is highly useful for writing test scripts is one with one dimension holding any type of data, and a second dimension relating a boolean value to each value in the first dimension. Say, for example, we have a bunch of IPs that represent webservers in a cluster, and we're writing a script to test whether the webservers are up or down. It would come in handy to know whether the servers were up or down the last time through the loop, so we might want an array like this:

192.168.1.1    192.168.1.2    192.168.1.3
     1              1              0


Unfortunately, with BASH this isn't possible. To replicate a simple two-dimensional array of this type I have developed a little bit of code that makes extensive use of the comma delimited list code that I wrote about in a previous post: BASH Hacks: The Comma Delimited List. For this solution we implement 2 of these lists. The first list holds all of the variables that you are going to run a series of actions on. The second list is populated with values from the first list that fail those actions or tests; each item on this list is equivalent to having a value of "0" in the boolean dimension of the simple 2D array, and conversely the absence of a value from this list is equivalent to a "1" in the boolean dimension. This is a template of the code that I use:

#!/bin/bash
###################################
## 2commas.sh
##
## A template for implementing 2 comma
## delimited lists to replicate the functionality
## of a simple 2D array
###################################

#######VARIABLE DEFINITIONS########
LIST=1,2,3,4,5

# We just initialize this variable here; it won't be used until much later
DOWNLIST=''
###################################

# loop continuously
while true
do

# At the beginning of every cycle, tack a trailing comma to the end of the list
LOOPVAR=${LIST},

# Loop through the contents of the comma delimited list
while echo $LOOPVAR | grep \, &> /dev/null
do

# grab the first server name out of the comma-delimited list
LOOPTEMP=${LOOPVAR%%\,*}

# remove the first server name from the LOOPVAR comma-delimited list
LOOPVAR=${LOOPVAR#*\,}

if <your test>
then

## possibly some actions here
## possibly not

# if it's on the downlist, take it off and do some stuff
if echo $DOWNLIST | grep $LOOPTEMP &> /dev/null
then

## Some actions
## maybe no actions
## who knows?

# Initialize the NEWDOWNLIST variable,
# which is a temp var that will hold the downlist minus the now up server
NEWDOWNLIST=''

# This loop will take the now up server off the downlist
while echo $DOWNLIST | grep \, &> /dev/null
do

# grab each value in the downlist from the comma delimited list
DOWNLOOPTEMP=${DOWNLIST%%\,*}

# trim the value we just grabbed from the downlist
DOWNLIST=${DOWNLIST#*\,}

# This variable will be empty if the DOWNLOOPTEMP
# variable is different than the LOOPTEMP variable;
# if they are the same the variable will contain the value of DOWNLOOPTEMP
CONDITION=$(echo ${DOWNLOOPTEMP} | grep ${LOOPTEMP})

# If the variable is empty the value stays on the downlist
# and we add the server to the new list
if [ -z "$CONDITION" ]
then

# add the server to the new list
NEWDOWNLIST=${NEWDOWNLIST}${DOWNLOOPTEMP},

fi

done

# Set the downlist variable to the value of the New downlist variable
DOWNLIST=${NEWDOWNLIST}

fi

# If the test fails then
else

# This variable will be empty if the DOWNLIST variable
# doesn't contain the LOOPTEMP variable
# otherwise the value of the variable will be the value of the DOWNLIST variable
CONDITION=$(echo ${DOWNLIST} | grep ${LOOPTEMP})

# If the above variable is empty, the value needs to be added to the DOWNLIST
# and some actions performed, otherwise we already noted that it's down, you might
# still want to do some actions here though
if [ -z "$CONDITION" ]
then

# Add the value to the downlist
DOWNLIST=${DOWNLIST}${LOOPTEMP},

## maybe do some more actions here if you want

fi

fi

done

done

exit 0
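If the nesting above is hard to follow, here's a stripped-down, runnable sketch of just the downlist bookkeeping: a server that was on the downlist comes back up, and the inner loop rebuilds the list without it. The IPs are placeholders, obviously.

```shell
#!/bin/bash
# Demo: remove one now-up server from a comma delimited downlist
DOWNLIST=192.168.1.1,192.168.1.3,
LOOPTEMP=192.168.1.3

# temp list that will hold the downlist minus the now-up server
NEWDOWNLIST=''

while echo $DOWNLIST | grep \, &> /dev/null
do
    # grab the first entry and trim it (plus its comma) off the list
    DOWNLOOPTEMP=${DOWNLIST%%\,*}
    DOWNLIST=${DOWNLIST#*\,}

    # keep the entry only if it isn't the server that came back up
    CONDITION=$(echo ${DOWNLOOPTEMP} | grep ${LOOPTEMP})
    if [ -z "$CONDITION" ]
    then
        NEWDOWNLIST=${NEWDOWNLIST}${DOWNLOOPTEMP},
    fi
done

DOWNLIST=${NEWDOWNLIST}
echo $DOWNLIST   # 192.168.1.1,
```

Run it and you'll see 192.168.1.3 has fallen off the downlist while 192.168.1.1 stays on it.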


For some reason Blogger doesn't respect any attempt on my part to use tabbing, so this code looks like crap; apologies for this. Unless I use the pre tags it won't work, so until I get an elegant solution for tabbing with Blogger my script code is going to look pretty nasty. Apologies.

Monday, December 18, 2006

BASH Hacks: The Comma Delimited List

BASH is by no means a complete language, and for all but a very few things it's not the right one to go with. But for scripts that make intense use of external programs it's really the best, and so this will be the first article in a series of many that deals with simple BASH hacks to make your scripting life easier.

One of the biggest problems with BASH is that the data structures are fairly primitive, and so writing a script that expands with your infrastructure is oftentimes very hard. You end up having to do massive hacks to the script every time you add a new server, or customize the script per server, etc. I've developed a bit of code that I use in any script that deals with potential areas for growth, using a comma delimited list to perform a set of actions on each member of that list. This makes the code much more portable and transparent, and as your infrastructure grows or shrinks all you have to do is edit a variable definition at the beginning of one file, and the script is ready to go again. Here's the basic template:


#!/bin/bash
#################################
## commadelimited.sh
##
## A template for implementing a comma
## delimited list in BASH
#################################

######VARIABLE DEFINITIONS#######
LIST=1,2,3,4,5
#################################

# Add a trailing comma to the list variable
LOOPVAR=${LIST},

# Loop as long as there is a comma in the variable
while echo $LOOPVAR | grep \, &> /dev/null
do

# Grab one item out of the list
LOOPTEMP=${LOOPVAR%%\,*}

# Remove the item we just grabbed from the list,
# as well as the trailing comma
LOOPVAR=${LOOPVAR#*\,}

# some action with your variable
#
# echo $LOOPTEMP
#
# for example

done

exit 0


So what exactly does this do . . . well, here we go in detail. After defining the comma delimited list in a variable:


LIST=1,2,3,4,5


We have to add a comma to the end of this list. The loop greps for a comma, and each pass trims an item together with the comma that follows it, so without the trailing comma the loop would terminate before processing the last item in the list:


LOOPVAR=${LIST},


Now we're ready to rock. If you haven't had the pleasure of using an if grep or while grep statement, then I would suggest you start to use it; this is one of the more useful features of BASH:


while echo $LOOPVAR | grep \, &> /dev/null
do



This will loop for as long as the LOOPVAR variable contains a comma.


LOOPTEMP=${LOOPVAR%%\,*}


This grabs the first item from what's left of the LOOPVAR list. The way this works is that the % in the ${} indicates that we want to trim the string from the end. The fact that there are two of them (%%) tells it to trim greedy, and match as much as possible. The pattern that follows the %% (a glob pattern, not a regular expression) therefore chops everything from the first comma out to the end of the string, leaving just the first item.


LOOPVAR=${LOOPVAR#*\,}


This statement is similar to the last one, but this one takes the first value and its trailing comma off the front, shortening the list by one. The # tells it to trim the string from the beginning, and a single # tells it to be non-greedy, matching only up through the first comma from the left, as defined by the pattern that follows.

So as we loop, the list gets shorter by one item per iteration, and eventually cycles down to an empty string, where the grep \, fails and the loop breaks.
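You can watch the two expansions work from an interactive shell. This little sketch just peels the first item off the list and shows what's left:

```shell
LIST=1,2,3,4,5
LOOPVAR=${LIST},

# %% trims greedily from the end: everything from the first comma on
FIRST=${LOOPVAR%%\,*}
echo $FIRST     # 1

# a single # trims non-greedily from the front: the first item and its comma
REST=${LOOPVAR#*\,}
echo $REST      # 2,3,4,5,
```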

The following script is an application of the comma delimited list idea that I use in my everyday work:


#!/bin/bash
################################
## sendadminmail.sh
##
## This script is called by monitoring scripts
## to send mail to all of the administrators
## of our company should something go
## wrong. Which given my skills, should
## never happen.
################################

#######VARIABLE DEFINITIONS########
#
# Comma Delimited List of Administrative E-mail addresses
ADMINLIST=test@test.com,test@test2.com,test@test3.com
#
###################################

# initialize the loop variable, the trailing comma is
# so that the last entry will be honored
LOOPVAR=${ADMINLIST},

# this loop loops through each individual entry on the list
while echo ${LOOPVAR} | grep \, &> /dev/null
do

# Grab the first e-mail address out of the comma-delimited list
LOOPTEMP=${LOOPVAR%%\,*}

# Remove the first e-mail address from the
# LOOPVAR comma-delimited list
LOOPVAR=${LOOPVAR#*\,}

# send the e-mail
echo "$1" | mail -s "$2" "$LOOPTEMP"

done

exit 0
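To smoke-test the loop without actually sending anything, you can swap the mail line for an echo (same placeholder addresses as above):

```shell
#!/bin/bash
# Dry-run version of sendadminmail.sh: echoes instead of mailing
ADMINLIST=test@test.com,test@test2.com,test@test3.com

# trailing comma so the last entry is honored
LOOPVAR=${ADMINLIST},

while echo ${LOOPVAR} | grep \, &> /dev/null
do
    LOOPTEMP=${LOOPVAR%%\,*}
    LOOPVAR=${LOOPVAR#*\,}

    # stand-in for: echo "$1" | mail -s "$2" "$LOOPTEMP"
    echo "would mail: ${LOOPTEMP}"
done
```

You should see one "would mail:" line per address, which confirms the loop hits every entry including the last one.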

A Hate Letter to MySQL

Normally I just throw up code snippets and cool stuff, but having to work with MySQL on a daily basis compels me to throw up a rant to the MySQL developers here. I was strongly in the PostgreSQL camp, as that's what I learned on, and it had nice features that MySQL lacked . . . like referential data integrity, for example. But with MySQL 5.0.x they announced that MySQL had matured and was ready to be a real SQL server with the purchase of the InnoDB storage engine.

So I switched to MySQL for 2 reasons: first, MySQL 5.0.x seemed to be up to snuff and ready to compete with the big boys (namely Oracle and PostgreSQL), and second, unfortunately it's called the LAMP stack and not the LAPP stack. Every web app out there has MySQL support, but only a few have support for PostgreSQL, and as my job forced me to deal with these programs, I was forced to migrate. So here I will bring my complaints against MySQL, and suggest directions for improvement.

1. Kill Off MyISAM

It would be a mercy killing. For fuck's sake, this thing doesn't adhere to much of anything in the SQL-92 or SQL:1999 standard, so stop claiming that it's a storage engine. Despite the fact that InnoDB has nice features like referential data integrity and transaction handling, the majority of database designers out there (if we may even call them that) don't take the time to write InnoDB databases. Apparently that ENGINE=INNODB is just too much strain on the wrist to type, and in addition to this, the default storage engine in the my.cnf file is . . . you guessed it: MyISAM.

MyISAM is a disgrace to SQL. It doesn't support FOREIGN KEY constraints, which is, by the way, like the key feature of SQL, and it doesn't support transactions, which means if you write a web app with a complicated series of inserts, and the clown end user hits the stop button a half second into this, your data set is completely fucked. I realize that most web developers don't put transaction support in their code even when it's available, but let that be their mistake; don't make the lowly DBA . . . me . . . tell my hot shot developer that he can't put transaction support in because we're running MyISAM. You can of course specify a FOREIGN KEY in MyISAM, but as it notes in the manual, this does not enforce the constraint, it merely serves as a mental note to the database designer. ARE YOU KIDDING ME?! It's time to put this baby to bed . . . permanently. I'll tell you what you do, MySQL: you take the only good feature of this thing, full text indexing, fold it into InnoDB, and turn out the lights.

2. Kill phpMyAdmin

If anyone intelligent is reading this, you can imagine my disgust when I walked into my first day on the job and saw my first developer using phpMyAdmin to interact with the database. The idea behind this is pretty cool, I'll admit, but it's this nasty little program that led to the rise of the MyISAM storage engine, and we should put it to bed along with its progeny.

Aside from the fact that this program is the single most notorious security hole since portmapper, and putting it on the front of your DB is like giving a stranger the keys to your car and hoping he doesn't steal it, this program sucks. If you know MySQL, it's about a hundred times faster to use the mysql CLI client program to build your database and query it. This thing is terribly slow and terribly insecure. I have no problem with the novice database designer using it for playing around, or for putting up a personal website, but I have personally seen this thing used all over the place in production level deployments. Luckily enough I killed it at my current job, but I can assure you that the University of Chicago uses it for its mega-database, so if you want to change your grade . . . well, get inside the UofC network and find the phpMyAdmin folder, and do what you want.

This program is the number one reason why there are so many crappy databases around, and why they are mostly for MySQL. This thing allows a novice database designer to design a MySQL database without knowing anything about SQL at all. If you want to design a database, pick up the PostgreSQL book by Bruce Momjian, or read the MySQL 5.0.x documentation from cover to cover . . . I've read both, and I assure you they are compelling reads. You'll never make the clown mistakes that these phpMyAdmin designers make, and you'll code your database in 10% of the time.

3. MySQL Cluster Is Not, I repeat NOT Ready For Deployment

So stop claiming that it is. First of all, the NDB storage engine is a big pile of trash, and they need to do a complete rewrite. NDB has several problems, the first of which is that it is an in-memory storage system. Yes, this does lead to faster query execution, but it comes at a costly price. Firstly, scaling this thing takes a lot of money once your data set gets over about 5 gigs, which can happen very quickly indeed. Secondly, the system works by storing stuff in memory and periodically writing down to the hard drive, which means if you have a hard crash in between those two points your data is fucked, and have fun recovering from memory . . . I assure you that this is no picnic.

The next problem that comes from this is that limits on memory usage are hardcoded in the my.cnf file; in order to make changes to this, you are required to do a complete restart of the cluster . . . which means downtime, and avoiding downtime is the sole reason you went with the cluster in the first place. And there are no utilities that easily track what kind of space you're using, so you'd better be a master of Perl or Python, because you have some nasty scripting ahead of you to keep track of your data set . . . otherwise you'll max your memory usage and the database will refuse further updates to the data set . . . oh yes, it's fun.

In addition to this, NDB seems to fail for a lot of the reasons that MyISAM fails, and more. NDB doesn't support foreign key constraints, nor does it support transaction handling, and in addition to this it doesn't support multi-column unique indexes, or multiple auto_increments on a table. NDB also requires each table to have a primary key, which isn't so bad, but it's ridiculous that this is required by the engine itself. NDB does not support full text indexing either, so as the bastard child of MyISAM it even lacks the one feature that is good about MyISAM. Please, for the love of God, do a rewrite: make it a non-memory-based storage engine with transaction handling, referential data integrity, and multi-column unique indices. I'm so tired of reading all these articles basking in the glow of MySQL Cluster; these articles convinced me to deploy it, and really, for all its faults, this is not nearly as good a solution as MySQL circular replication, which is sad.

4. InnoDB's Future

The reason for switching to MySQL in the first place is InnoDB, but this baby has a long way to go before I'm pleased with it. Firstly . . . the defining of a foreign key constraint is so unbelievably annoying it's absurd. The only way to define a foreign key is to do it at the end of a table definition with FOREIGN KEY (id) REFERENCES table(id). This kills me; allow us to do it the way PostgreSQL and Oracle and even MSSQL do it . . . id INTEGER NOT NULL REFERENCES table(id). See how nice that is . . . see how you just do it when you define the column, so that you're not forced to look back and find out what the column name was and write a bunch of ridiculous nonsense at the end. All of us DBAs and architects would give you a big kiss if you would just do this bit of syntactic tomfoolery for us.

Fold in full text indexing and make InnoDB the default storage engine in the my.cnf file. So what . . . an army of terrible database designers will have to learn InnoDB, oh my god, that would be so terrible. Syntactically the basic functionality is no different; it just adds some sweet stuff, so do it, for the love of code. Make anyone who wants to write MyISAM databases strain their wrists with the extra ENGINE=MyISAM; they deserve carpal tunnel if they're going to write databases with that nonsense storage engine anyway.

Finally, beef up PLmySQL; it sucks. For those of you unfamiliar with PLSQL, it means Procedural Language SQL, and it's a language for triggers and such. Triggers are widely considered to be bad practice . . . but at the same time, they are very useful for administrative data collection and for lots of other things too. PostgreSQL has a robust PLpgSQL that works amazingly well and makes coding triggers really easy. PLmySQL sucks and could use a lot of beefing up. Specifically, you need a much better system of passing data into the trigger, as the system right now is almost indecipherable and a lot of data just isn't available to the trigger or procedure.

If you do all of this, you might just have me on your side of the debate because I want to be there . . . as opposed to having to be there because of your marketshare.

BASH Script Background Music

So I was sitting here at work, writing a massive script to handle the increasing complexity of our subversion repository for my co-worker Sean (of seancode.blogspot.com fame), and the script did so much I didn't know what to call it. For the sake of naming it, I just named it the most ridiculously epic sounding thing I could think of . . . ultramega.sh, which, as it happens, is also the name of a PowerMan 5000 song. Sean and I went out for a smoke and I told him that the name of his script was ultramega.sh, after the PowerMan 5000 song, and I joked that I should have my script play the song in the background while it was chugging along . . . as this bad mother of a script takes a long time to do everything that it does. Sean was like . . . AWESOME! Have it start with the script and stop with the script . . . but we need something more epic than UltraMega; have it play "The Final Countdown" by Europe on repeat. Sweet god, I thought . . . that is awesome, let's do it! So I did, with an assist from Sean, who figured out how to put VLC into quiet mode, crank the volume and have it repeat. So if you've ever wanted epic background music for a script, or wanted to fuck with one of your programmers . . . here's how we pulled it off:

#!/bin/bash
##########################################################
## FinalCountdown.sh
##
## A template for adding background music to your script
##########################################################

#########VLC VARIABLES###########
VLCFILE=~/finalcountdown.mp3
VLCCOMMAND="vlc ${VLCFILE} --volume 500 -q -L"
GREPCOMMAND=$(echo "$VLCCOMMAND" | sed 's/\ /\\ /g')
##################################

# Start VLC
$VLCCOMMAND &

# PUT THE CONTENTS OF YOUR SCRIPT HERE
#
# do some crap
#
# some more crap
#
# finally back to VLC

# stop VLC (squeeze padded spaces so the trims below isolate the PID)
GREPRESULT=$(ps aux | grep "$GREPCOMMAND" | tr -s ' ')
GREPRESULT=${GREPRESULT#*\ }
GREPRESULT=${GREPRESULT%%\ *}
kill ${GREPRESULT}

# end script
exit 0

In case you haven't figured it out yet . . . this is for Linux, but it should work equally well on all *nixes. I'll go over the script in detail for those of you not fluent in the language of the gods . . . BASH.

First - the variables:

VLCFILE=~/finalcountdown.mp3

This is the location of the song you want to play with VLC.

VLCCOMMAND="vlc ${VLCFILE} --volume 500 -q -L"

This is the command we want to execute . . . this is here in case we want to use a non-vlc audio program, it makes the code transparent enough such that you just enter in the command of whatever audio program you want and it will work. Specifically for VLC the volume directive tells VLC to put volume to 500%, which is the max VLC volume, the -q tells VLC to not output much to the terminal, and the -L loops the audio output, so that if our script takes longer than the song, we're not left with an awkward silence.

GREPCOMMAND=$(echo "$VLCCOMMAND" | sed 's/\ /\\ /g')

This variable is less obvious than the first two. What it does is backslash-escape every space in the command string. In the grep pattern an escaped space still matches a literal space, so it finds the running vlc process just fine . . . but the grep process's own entry in the ps output contains the literal backslashes, so the pattern conveniently won't match the grep itself when we go hunting for the PID later.
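Here's a quick sketch of what that sed does to the command string (the filename is just a placeholder); note how every space picks up a backslash:

```shell
# escape the spaces in the command string, same as the script does
VLCCOMMAND="vlc song.mp3 --volume 500 -q -L"
GREPCOMMAND=$(echo "$VLCCOMMAND" | sed 's/\ /\\ /g')
echo "$GREPCOMMAND"     # vlc\ song.mp3\ --volume\ 500\ -q\ -L
```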

We then simply start VLC in the background by executing the VLCCOMMAND variable followed by the trailing &.

When we want to kill VLC we use ps aux with our optimized GREPCOMMAND variable to grab the one process that we're looking for . . . after all, you don't want to issue killall vlc and kill all the other VLC processes running too . . . that would suck. We also squeeze the runs of padding spaces in the ps output down to single spaces with tr -s, so the trims that follow land squarely on the PID:

GREPRESULT=$(ps aux | grep "$GREPCOMMAND" | tr -s ' ')

We then need to trim all the extra stuff off the result of this to get just the PID . . . we do that with the following two parameter expansions:


GREPRESULT=${GREPRESULT#*\ }
GREPRESULT=${GREPRESULT%%\ *}


Then kill the process with the PID that we just grabbed:


kill ${GREPRESULT}


Pretty friggin' sweet huh? We think so . . . for a real laugh, put this in a script on a machine in your data center . . . and drive those techs crazy with the seemingly random Europe rips coming from your boxes at all hours.
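One parting thought: the ps parsing above can be thrown off when ps pads the PID column with extra spaces (the %% trim then matches from a leading space and hands kill an empty string). BASH keeps the PID of the most recently backgrounded job in $!, so a simpler variant, sketched here with sleep standing in for the vlc command, skips the grep entirely:

```shell
#!/bin/bash
# sleep stands in for the long-running vlc command
sleep 60 &

# $! holds the PID of the job we just backgrounded
VLCPID=$!

# ... the body of your script goes here ...

# stop the background job directly, no ps/grep required
kill ${VLCPID}
wait ${VLCPID} 2> /dev/null
```

This also sidesteps any risk of grepping up the wrong process, since you captured the exact PID at launch time.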