Monday, December 18, 2006

A Hate Letter to MySQL

Normally I just throw up code snippets and cool stuff, but having to work with MySQL on a daily basis behooves me to throw up a rant to the MySQL developers here. I was strongly in the PostgreSQL camp, as that's what I learned on, and it had nice features that MySQL lacked . . . like referential data integrity for example. But with MySQL 5.0.x they announced that MySQL had matured and was ready to be a real SQL server with the purchase of the InnoDB storage engine.

So I switched to MySQL for 2 reasons: the first being that MySQL 5.0.x seemed to be up to snuff and ready to compete with the big boys (namely Oracle and PostgreSQL), and 2, unfortunatly it's called the LAMP stack and not the LAPP stack, and every web app out there has MySQL support, but only a few have support for PostgreSQL, and as my job forced me to deal with these programs, I was forced to migrate. So here I will bring my complaints against MySQL, and suggest directions for improvement.

1. Kill Off MyISAM

It would be a mercy killing. For fuck's sake, this thing doesn't adhere to much of anything in the SQL 95 or SQL 99 standard, so stop claiming that it's a storage engine. Despite the fact that InnoDB has nice features like referencial data integrity and Transaction handling, the majority of database designers out there (if we may even call them that) don't take the time to write InnoDB databases. Aparently that ENGINE=INNODB is just too much strain on the wrist to type, and in addition to this, the default storage engine in the my.cnf file is . . . you guessed it: MyISAM.

MyISAM is a disgrace to SQL, it doesn't support FOREIGN KEY constraints, which is, by the way, like the key feature of SQL, and it doesn't support transactions, which means if you write a web app with a complicated series of inserts, and the clown end user hits the stop button a half second into this, your data set is completely fucked. I realize that most web developers don't put transaction support in their code even when it's available, but let that be their mistake, don't make the lowly DBA . . . me . . . tell my hot shot developer that he can't put transaction support in because we're running MyISAM. You can of course specify a FOREIGN KEY in MyISAM, but as it notes in the manual, this does not enforce the constraint, it merely serves as a mental note to the database designer. ARE YOU KIDDING ME?! It's time to put this baby to bed . . . permanantly. I'll tell you what you do MySQL, you take the only good feature of this thing: full text indexing, and you fold it into InnoDB and turn out the lights.

2. Kill phpMyAdmin

If anyone intelligent is reading this, you can imagine my disgust when i walk into my first day on the job and see my first developer using phpMyAdmin to interact with the database. The idea behind this is pretty cool I'll admit, but it's this nasty little program that lead to the rise of the MyISAM storage engine, and we should put it to bed along with it's progeny.

Aside from the fact that this program is the single most notorious security hole since portmapper, and putting it on the front of your DB is like giving a stranger the keys to your car and hoping he doesn't steal it, this program sucks. If you know MySQL, it's about a hundred times faster to use the mysql CLI client program to build your database and query it. This thing is terribly slow and terribly insecure. I have no problem with the novice database designer using it for playing around, or for putting up a personal website, but I have personally seen this thing used all over the place in production level deployments. Luckily enough I killed it at my current job, but I can assure you that the University of Chicago uses it for it's mega-database, so if you want to change your grade . . . well, get inside the UofC network and find the phpMyAdmin folder, and do what you want.

This program is the number one reason why there are so many crappy databases around, and why they are mostly for MySQL. This thing allows a novice database designer to design a MySQL database, without knowing anything about SQL at all. If you want to design a database, pick up the PostgreSQL book by Bruce Momijian, or read the MySQL 5.0.x documentation from cover to cover . . . I've read both, and I assure you they are compelling reads, and you'll never make the clown mistakes that these phpMyAdmin designers make, and you'll code your database in 10% of the time.

3. MySQL Cluster Is Not, I repeat NOT Ready For Deployment

So stop claiming that it is. First of all the NDB storage engine is a big pile of trash, and they need to do a complete rewrite. NDB has several problems the first of which being that it is an in memory storage system. Yes, this does lead to faster query execution, but it comes at a costly price. Firstly, to scale this thing takes a lot of money, once your data set gets over about 5 gigs, which can happen very quickly indeed. Secondly, the system works by storing stuf in memory and periodically writing down to the hard drive, which means if you have a hard crash in between those two point your data is fucked, and have fun recovering from memory . . . I assure you that this is no picnic.

The next problem that comes from this are that limits on memory usage are hardcoded in the my.cnf file, in order to make changes to this, you are required to do a complete restart of the cluster . . . which means downtime, and avoiding downtime is the sole purpose you went with the cluster in the first place. And there are no utilities that easily track what kind of space you're using, so you'd better be a master of perl or python, because you have some nasty scripting ahead of you, to keep track of your data set . . . otherwise you'll max your memory usage and the database will refuse further updates to the dataset . . . oh yes it's fun.

In addition to this NDB seems to fail for a lot of the reasons that MyISAM fails, and more. NDB doesn't support foreign key constraints, nor does it support transaction handling, and in addtion to this it doesn't support multi-column unique indexes, or multiple auto_increments on a table. NDB also requires each table to have a primary key, which isn't so bad, but it's rediculous that this is required by the engine itself. NDB does not support full text indexing either, so as the bastard child of MyISAM it even lacks the one feature that is good about MyISAM. Please for the love of God do a rewrite, make it a non-memory based storage engine with transaction handling, and referencial data integrity and multi-column unique indices. I'm so tired of reading all these articles basking in the glow of MySQL cluster, these articles convinced me to deploy it, and really, for all it's faults, this is not nearly as good a solution as MySQL circular replication, which is sad.

4. InnoDB's Future

The reason for switching to MySQL in the first place is InnoDB, but this baby has a long way to go before i'm pleased with it. Firstly . . . the defining of a foreign key constraint is so unbelievably annoying it's absurd. The only way to define a foreign key is to do it at the end of a table definition with FOREIGN KEY (id) REFERENCES table(id). This kills me, allow us to do it the way PostgreSQL and Oracle and even MSSQL does it . . . id INTEGER NOT NULL references table(id). See how nice that is . . . see how you just do it when you define the column, so that you're not forced to look back and find out what the column name was and write a bunch of rediculous nonsense at the end. All of us DBA's and architects would give you a big kiss if you would just do this bit of syntatic tom-foolery for us.

Fold in Full Text indexing and make InnoDB the default storage engine in the my.cnf file. So what . .. an army of terrible database designers will have to learn InnoDB, oh my god that would be so terrible. Syntactically the basic functionality is no different, it just adds some sweet stuff so do it, for the love of code. Make anyone who wants to write MyISAM databases strain their wrists with the extra ENGINE=MyISAM, they deserve carpul tunnel if they're going to write databases with that nonsense storage engine anyway.

Finally, beef up PLmySQL, it sucks. For those of you unfamiliar with PLSQL it means Procedural Language SQL, and it's a language for triggers and such. Triggers are widely considered to be bad practice . . . but at the same time, they are very useful for administrative data collection and for lots of other things too. PostgreSQL has a robust PLpgSQL that works amazingly well and makes coding triggers really easy. PLmySQL sucks and could use a lot of beefing up. Specifically you need a much better system of passing data into the trigger, as the system right now is almost indeciferable and a lot of data just isnt' available to the trigger or procedure.

If you do all of this, you might just have me on your side of the debate because I want to be there . . . as opposed to having to be there because of your marketshare.

No comments: