I have decided to move my development efforts from MySQL to PostgreSQL. Why? There’s a number of reasons, but there’s one main reason:
Schemas.
The concept of the schema is pretty unknown in the MySQL world. I admit I’d pretty much forgotten they existed even though I’ve learnt about them in the past setting up other databases (MS SQL Server – actually a pretty good product). Anyway, in MySQL a schema is nothing but the structure of your database. In PostgreSQL, a schema is a powerful feature for creating multiple “views” into the same database, but with ability to share between them.
Here’s an example. Say you have two applications, which you want to share a Users table but still have their own tables for “local” settings. Here are your options on MySQL:
- Put both applications into the same database, mixing the tables in with each other, perhaps with different prefixes for the tables, and overriding in the case of Users. Make Users a giant catch-all table with preferences for both apps, with a namespace for those fields inside the table. Pros: easy, can join into the shared table. Cons: Security is poor (I want to grant on a per-database level, not per-table), ugly as hell.
- Put each application inside its own database and make a third database for shared tables. Set your app to normally look inside its own database, and connect to the shared database when it needs to access the Users table. Pros: Better security compartmentalisation. Better looking, more intuitively named tables. Possibility of easier scaling since you can host the DBs on different machines. Cons: Loss of ability to join into the shared tables without nasty hacks. Constrains the kind of lookups you can do without severe performance penalties. More complex, loss of a single authorative logfile.
- Like number 2 but replicating the shared tables into and out of both apps by any of a number of means. Pros: solves the problem nicely. Cons: Complex, nasty solution which seems to be asking for trouble.
For the record, I’ve tried all three. I’d settled on number 2 as the better of three evils.
Here’s what you would do on PostgreSQL:
Create a single database with 3 users and three schemas. Name the Users App1, App2 and Shared, and the Schemas likewise, granting access to the matching users. Create the shared tables in the Shared schema, and the App1 and App2 tables in their schemas. Note that as far as the Schemas are concerned, they are in their own little world – no namespace conflicts.
Now set App1 and App2’s search paths to read App1/App2,Shared. There you go – as far as App1 and App2 is concerned, the table is right there – no complexity required. Set your app to use the appropriate schema and you’re done. It’s like editing your path in unix.
This might seem like overkill for such a small issue – but actually I’ve got a number of shared tables and more apps than that. The ability to use Schemas to solve all my problems here is a godsend, one that I wish I’d thought of earlier.
PostgreSQL has some other nice features as well, such as TableSpaces, which allows easy distribution of its storage by table onto different disks: you might want to put your ultra-high-activity Users table on the fast but expensive and small SCSI disk, for example, and the much larger but lover volume CatPictures table on a big, cheap SATA drive. There’s support for millisecond timestamps – MySQL, unbelievably, doesn’t go beyond 1 second accuracy. I’ve mentioned the much more strict SQL syntax requirements below – it’s taken me hours to clean up a lot of the junk MySQL happily allowed me to store (although I’m not going to claim it wasn’t my own fault; it was). And the new native data type of UUID makes me very happy, since I’ve come to believe that basically everything in a database of any important should have a UUID (synchronising two databases on different continents primary keyed on an integer = nightmare, keyed on a UUID = doable). And the backup facilities are far improved – it can easily write out full transaction logs while live, allowing full recoverability – something I’d been pretty worried about with MySQL. And its user rights system seems much more intuitive than MySQL’s.
I’d resisted PgSQL for quite some time, but one by one those reasons have disappeared. For one, it always had a reputation for being slow – now pretty thoroughly disproved. It seemed quite alien and unfamiliar, and I had trouble even getting it running the last time I tried it. Well, either I’ve become more knowledgeable or it’s easier to install, because I had no problems at all this time. And I worried that I didn’t know how to configure it properly – I discarded this reason upon realising I don’t really know jack shit about configuring MySQL properly either, and MySQL has hundreds of opaque options I know next to nothing about. In fact, I’ve had more trouble with MySQL! Even now, I can’t seem to force the MySQL on my local machine here to reliably use UTF8 as its default language in all situations.
I definitely won’t be deleting MySQL or anything like that. MySQL works fine for this blog and a few others. My MediaWiki is installed on it, plus a number of other apps use it. I’m of the “if it ain’t broke, don’t fix it” school when it comes to things like this so I’m just going to run them concurrently for the time being. I have nothing against MySQL, it’s served me extremely well for years – but the Schemas feature was the straw that broke the camel’s back.
I still don’t know for sure if I’ll stick with it – a horrible problem may well emerge, but one thing is for sure: I’ll keep all my data in a portable format from now on. MySQL is extremely permissive (or, dare I say, “lax”) with its enforcement of SQL syntax requirements and 90% of the time it’s taken to migrate has been in ad hoc repairs to tables and data to get them to conform. Now that’s done, I’m going to keep it done, and it’s easy to move back to MySQL at any time should the need arise. A bit of subtle vendor lock-in by MySQL, or simply making it “easier” for developers? Well, my thoughts on violating the standards to make it “easier” are pretty well known (see: any previous rant about Internet Explorer) so I’ll stick with the standards every time.
In conclusion: if you have the time, need and inclination I’d recommend giving PgSQL 8.3 a try.