Posts Tagged ‘postgresql’

UUIDs in Rails redux

Tuesday, April 15th, 2008

I have covered forcing ActiveRecord to respect UUID data types in Migrations before. That helps us create our database - now what about in use? We need to create the UUIDs and store them in the database.

These examples all rely on the uuidtools gem, so install that if you haven’t already (and require it somewhere in environment.rb).

1. Setting a UUID using ActiveRecord callbacks

If you don’t need the UUID in the object upon creation but only want to ensure it’s there upon save, do this. Suggestion initially from this page, changes are mine.

We will use the before_create callback to ask AR to add a UUID of our choosing before the record is saved.

Add this to your lib directory:

# lib/uuid_helper.rb
require 'uuidtools'
 
module UUIDHelper
  # ActiveRecord invokes this callback just before the record is first saved
  def before_create
    self.id = UUID.random_create.to_s
  end
end

And now include this in your models:

class Airframe < ActiveRecord::Base
  include UUIDHelper
 
  #my stuff
 
end
>> Airframe.new
=> #<Airframe id: nil, maker_id: nil>
>> Airframe.create!
=> #<Airframe id: "1a82a408-32e6-480e-941d-073a7e793299", maker_id: nil>
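If you're curious what UUID.random_create is actually producing, here's a gem-free sketch using only the standard library's SecureRandom: an RFC 4122 version 4 (random) UUID, the same 36-character string format uuidtools gives you. This is an illustration, not a replacement for the gem.

```ruby
require 'securerandom'

# Build an RFC 4122 version 4 UUID by hand from 16 random bytes.
def random_uuid
  bytes = SecureRandom.random_bytes(16).unpack("C*")
  bytes[6] = (bytes[6] & 0x0f) | 0x40  # set the version nibble to 4
  bytes[8] = (bytes[8] & 0x3f) | 0x80  # set the variant bits to RFC 4122
  hex = bytes.map { |b| "%02x" % b }.join
  [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join("-")
end

puts random_uuid  # e.g. "1a82a408-32e6-480e-941d-073a7e793299"
```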

2. Initialising a model with a UUID

If you want the UUID in the model before save, i.e. upon initialisation, we have to get a little more fancy:

# lib/uuid_init.rb
require 'uuidtools'
 
module UUIDInit
  def initialize(attrs = {}, &block)
    super
    self['id'] = UUID.random_create.to_s
  end
end

Now include this in your models:

class Flightpath  < ActiveRecord::Base
 
  include UUIDInit
 
  # my stuff
 
end
>> Flightpath.new
=> #<Flightpath created_at: nil, id: "5e5bcd63-070d-4252-8556-2876ddd83b54">

Be aware that it will conflict with any other initialisation you do in there, so you might want to simply copy in the whole method if you need other fields upon initialisation:

class User < ActiveRecord::Base

  def initialize(attrs = {}, &block)
    super
    self['balance'] = 0.0
    self['id'] = UUID.random_create.to_s
  end

end
>> User.new
=> #
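The caveat above can be seen in plain Ruby, without ActiveRecord at all: when a module overrides initialize, calling super is what keeps the rest of the chain working. A minimal sketch, with made-up class and attribute names:

```ruby
# The module sets its field *after* super, so the base class's own
# initialisation still runs with the original arguments.
module UUIDInit
  def initialize(attrs = {})
    super
    @attrs = (@attrs || {}).merge("id" => "fake-uuid")
  end
end

# Stand-in for ActiveRecord::Base: just stores its attributes.
class Record
  attr_reader :attrs
  def initialize(attrs = {})
    @attrs = attrs
  end
end

class Flightpath < Record
  include UUIDInit
end

f = Flightpath.new("maker" => "boeing")
puts f.attrs.inspect  # both the passed-in attribute and the id are present
```

Drop the super call in the module and the base class never sees its arguments; that's exactly the conflict you hit when two pieces of code both want to override initialize.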

3. Sessions

All this is very well for your own models, but what about Rails’ inbuilt sessions? By default, they want an autoincrementing integer primary key.

The good news is it’s easy to override. Your migration should look like this:

create_table "sessions", :id => false, :force => true do |t|
  t.string   "session_id"
  t.text     "data"
  t.datetime "updated_at"
  t.datetime "created_at"
end

Now add this to your environment.rb file:

# config/environment.rb
CGI::Session::ActiveRecordStore::Session.primary_key = 'session_id'

And this to your Application Controller:

# app/controllers/application.rb
class ApplicationController < ActionController::Base

  before_filter :config_session # at the top, if possible

  def config_session
    session.model.id = session.session_id
  end

end

And voila, your session store is using the session_id as its primary key. I don’t see any point in using a UUID for your sessions’ PK, but if you want to you’ll find an example override class in:

actionpack/lib/action_controller/session/active_record_store.rb.

Remember to drop any preexisting sessions table in your database, or it will likely complain of null ids when you switch to session_id as your primary key.

hacking native UUID support into Schema:Dump

Monday, February 11th, 2008

Want to use PostgreSQL’s native UUID datatype but AR won’t let you use it with migrations?

/Library/Ruby/Gems/1.8/gems/activerecord-2.0.2/lib/active_record/connection_adapters/postgresql_adapter.rb:

# insert into simplified_type(field_type), inside its case statement:
  # UUID type
  when /^uuid$/
    :uuid

# insert into the hash returned by native_database_types:
  :uuid      => { :name => "uuid" },

Well, that’ll get your data OUT of the database, but AR will throw a fit when you try to load it back in unless you also add uuid into the range of column types TableDefinition will accept:

in /Library/Ruby/Gems/1.8/gems/activerecord-2.0.2
/lib/active_record/connection_adapters/abstract/schema_definition.rb:

# insert uuid into the existing list of column types, near
# def column(name, type, options = {}):

%w( string text integer float decimal datetime timestamp time date binary boolean uuid ).each do |column_type|

Now you can do this:

    t.uuid     "uuid",   :null => false

It's about the nastiest hack you can do, but it works both ways: dump and load. Here's a patch if you don't want to do it yourself, but no guarantees.

UPDATE:

And don’t forget to write your migrations like this to stop AR from inserting its “helpful” id columns with autoincrementing serials which your DB doesn’t need and can’t use:

  def self.up
    create_table :transactions, :id => false do |t|
      t.uuid     "id",  :null => false
      t.timestamps
    end
  end

UPDATE 2:

I now do not recommend doing this. It's more trouble than it's worth. There is very little to gain by forcing the native UUID type in Postgres, and the complexity, hacks, loss of cross-platform compatibility and general annoyance you face are just not worth it.

Just use a string column for any UUIDs. Of course, the final hint on this page - the no-id switch for migrations - is still useful and you should use that.

Strict databases are great for discipline

Friday, February 8th, 2008

Oh, boy. Nothing exposes one's shitty programming habits like using a new, stricter database like PostgreSQL. All over the place I'm discovering, and fixing, code in which I'd demonstrated a lax attitude to data integrity - from trying to insert invalid data into columns I'd previously specified should not be null (datetimes, mostly) to declaring booleans with 1 or 0 instead of true and false.

It’s annoying, sure, but it’s also quite satisfying. A database that won’t take any shit is great for your programming discipline and must lead to better, more reliable code. I really can’t believe how much MySQL allows you to get away with, and I’m very glad my eyes were opened to the problems with the data I was storing before I had to manually repair million-line tables in production.

I have discovered a few annoyances with PostgreSQL, though - just to be fair. Its sequence system is pretty silly - after importing data you have to then go and explicitly set new sequence numbers for any autoincrement fields (PgSQL calls them “serials”). A useful facility but I think the default should just be to increment the sequence on import. I have of course written a script to automate this but still.
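The sequence-reset script mentioned above boils down to one setval call per imported table. A minimal sketch of the idea - it only builds the SQL, which you'd then run through psql or your adapter, and the table/primary-key names here are placeholders following PostgreSQL's default serial naming:

```ruby
# Build the SQL that bumps a serial's sequence to the highest
# imported primary key, so the next INSERT doesn't collide.
# Assumes the default sequence name: <table>_<pk>_seq.
def reset_sequence_sql(table, pk = "id")
  "SELECT setval('#{table}_#{pk}_seq', (SELECT COALESCE(MAX(#{pk}), 1) FROM #{table}));"
end

puts reset_sequence_sql("users")
# SELECT setval('users_id_seq', (SELECT COALESCE(MAX(id), 1) FROM users));
```

Loop that over your tables after a pg_restore or COPY and the "explicitly set new sequence numbers" chore goes away.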

Another complaint is regarding the security, which if anything seems *too* strict. When you use Schemas to share logical databases between multiple users, any user but the owner of the “foreign” schema must have privileges granted explicitly on not only every table they plan to use in that schema, but on the aforementioned sequence tables too! I can understand the first, kind of, although there should be an option to grant privileges at the schema level, but the second is just silly - if you have rights to add new rows to a table, it is implied you should also have rights to increment the sequence. A needless bit of complexity.

That said, I’m overall delighted with the migration and everything was up and running fine. It’s not now, since I decided to completely excise my nasty multi-database hacks and simplify the data structures, removing all potential conflicts and separating tables into their logical homes. I’m about half way through doing that. And again, I’m really happy that I’m doing this now - what may take a day or two with development-stage databases might take weeks to do with live production servers - not to mention all the code that would have been built on top of the original suboptimal data structure. I’d actually been just about to write a whole lot more of that - something I’d actually been putting off because I knew what a mess the databases were, and was reluctant to dig myself any deeper a hole - but now I’m really looking forward to flying through what should be a much simpler, more intuitive job.

Switching to PostgreSQL

Thursday, February 7th, 2008

I have decided to move my development efforts from MySQL to PostgreSQL. Why? There’s a number of reasons, but there’s one main reason:

Schemas.

The concept of the schema is pretty unknown in the MySQL world. I admit I'd pretty much forgotten they existed, even though I've learnt about them in the past while setting up other databases (MS SQL Server - actually a pretty good product). Anyway, in MySQL a schema is nothing but the structure of your database. In PostgreSQL, a schema is a powerful feature for creating multiple "views" into the same database, but with the ability to share between them.

Here’s an example. Say you have two applications, which you want to share a Users table but still have their own tables for “local” settings. Here are your options on MySQL:

  1. Put both applications into the same database, mixing the tables in with each other, perhaps with different prefixes for the tables, and overriding in the case of Users. Make Users a giant catch-all table with preferences for both apps, with a namespace for those fields inside the table. Pros: easy, can join into the shared table. Cons: Security is poor (I want to grant on a per-database level, not per-table), ugly as hell.
  2. Put each application inside its own database and make a third database for shared tables. Set your app to normally look inside its own database, and connect to the shared database when it needs to access the Users table. Pros: Better security compartmentalisation. Better looking, more intuitively named tables. Possibility of easier scaling since you can host the DBs on different machines. Cons: Loss of ability to join into the shared tables without nasty hacks. Constrains the kind of lookups you can do without severe performance penalties. More complex, loss of a single authoritative logfile.
  3. Like number 2 but replicating the shared tables into and out of both apps by any of a number of means. Pros: solves the problem nicely. Cons: Complex, nasty solution which seems to be asking for trouble.

For the record, I’ve tried all three. I’d settled on number 2 as the better of three evils.

Here’s what you would do on PostgreSQL:

Create a single database with 3 users and three schemas. Name the Users App1, App2 and Shared, and the Schemas likewise, granting access to the matching users. Create the shared tables in the Shared schema, and the App1 and App2 tables in their schemas. Note that as far as the Schemas are concerned, they are in their own little world - no namespace conflicts.

Now set App1 and App2's search paths to read App1/App2,Shared. There you go - as far as App1 and App2 are concerned, the table is right there - no complexity required. Set your app to use the appropriate schema and you're done. It's like editing your path in unix.
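The whole setup is a handful of SQL statements. Here's a sketch that just emits them, using the role and schema names from the example above (run the output through psql as a superuser; adjust names and LOGIN options to taste):

```ruby
# Emit the PostgreSQL statements for the shared-schema layout:
# one role and one schema per app, a shared schema, and a per-role
# search_path so each app sees its own tables plus the shared ones.
def schema_setup_sql(apps, shared = "shared")
  stmts = []
  ([shared] + apps).each do |name|
    stmts << "CREATE ROLE #{name} LOGIN;"
    stmts << "CREATE SCHEMA #{name} AUTHORIZATION #{name};"
  end
  apps.each do |app|
    stmts << "ALTER ROLE #{app} SET search_path = #{app}, #{shared};"
  end
  stmts
end

puts schema_setup_sql(["app1", "app2"])
```

With the search_path set per role, each app's connection resolves unqualified table names against its own schema first, then the shared one - which is exactly the "it's like editing your path in unix" behaviour.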

This might seem like overkill for such a small issue - but actually I’ve got a number of shared tables and more apps than that. The ability to use Schemas to solve all my problems here is a godsend, one that I wish I’d thought of earlier.

PostgreSQL has some other nice features as well, such as TableSpaces, which allow easy distribution of its storage by table onto different disks: you might want to put your ultra-high-activity Users table on the fast but expensive and small SCSI disk, for example, and the much larger but lower-volume CatPictures table on a big, cheap SATA drive. There's support for millisecond timestamps - MySQL, unbelievably, doesn't go beyond 1 second accuracy. I've mentioned the much stricter SQL syntax requirements below - it's taken me hours to clean up a lot of the junk MySQL happily allowed me to store (although I'm not going to claim it wasn't my own fault; it was). And the new native data type of UUID makes me very happy, since I've come to believe that basically everything in a database of any importance should have a UUID (synchronising two databases on different continents primary keyed on an integer = nightmare, keyed on a UUID = doable). And the backup facilities are far improved - it can easily write out full transaction logs while live, allowing full recoverability - something I'd been pretty worried about with MySQL. And its user rights system seems much more intuitive than MySQL's.

I’d resisted PgSQL for quite some time, but one by one those reasons have disappeared. For one, it always had a reputation for being slow - now pretty thoroughly disproved. It seemed quite alien and unfamiliar, and I had trouble even getting it running the last time I tried it. Well, either I’ve become more knowledgeable or it’s easier to install, because I had no problems at all this time. And I worried that I didn’t know how to configure it properly - I discarded this reason upon realising I don’t really know jack shit about configuring MySQL properly either, and MySQL has hundreds of opaque options I know next to nothing about. In fact, I’ve had more trouble with MySQL! Even now, I can’t seem to force the MySQL on my local machine here to reliably use UTF8 as its default language in all situations.

I definitely won’t be deleting MySQL or anything like that. MySQL works fine for this blog and a few others. My MediaWiki is installed on it, plus a number of other apps use it. I’m of the “if it ain’t broke, don’t fix it” school when it comes to things like this so I’m just going to run them concurrently for the time being. I have nothing against MySQL, it’s served me extremely well for years - but the Schemas feature was the straw that broke the camel’s back.

I still don’t know for sure if I’ll stick with it - a horrible problem may well emerge, but one thing is for sure: I’ll keep all my data in a portable format from now on. MySQL is extremely permissive (or, dare I say, “lax”) with its enforcement of SQL syntax requirements and 90% of the time it’s taken to migrate has been in ad hoc repairs to tables and data to get them to conform. Now that’s done, I’m going to keep it done, and it’s easy to move back to MySQL at any time should the need arise. A bit of subtle vendor lock-in by MySQL, or simply making it “easier” for developers? Well, my thoughts on violating the standards to make it “easier” are pretty well known (see: any previous rant about Internet Explorer) so I’ll stick with the standards every time.

In conclusion: if you have the time, need and inclination I’d recommend giving PgSQL 8.3 a try.

Reserved words in PostgreSQL

Wednesday, February 6th, 2008

Trying out PostgreSQL? You might hit some troubles importing your MySQL datasets. MySQL is far more lenient about reserved words; you might find you’ve inadvertently named your columns in a way that’ll make PgSQL scream in pain.

Here’s the hard-to-find list, some obvious (SELECT, WHERE), some not (DESC - this one got me, ORDER - same, CURRENT_USER etc):

CREATE CURRENT_DATE CURRENT_ROLE CURRENT_TIME CURRENT_TIMESTAMP
CURRENT_TRANSFORM_GROUP_FOR_TYPE CURRENT_USER DATE DEFAULT
DEFERRABLE DESC DISTINCT DO ELSE END EXCEPT FALSE FOR FOREIGN
FROM GRANT GROUP HAVING IN INITIALLY INTERSECT INTO IS ISNULL
JOIN LEADING LEFT LIKE LIMIT LOCALTIME LOCALTIMESTAMP NEW NOT
NOTNULL NULL OFF OFFSET OLD ON ONLY OR ORDER OUTER OVERLAPS
PLACING PRIMARY REFERENCES RETURNING RIGHT SELECT
SESSION_USER SIMILAR SOME SYMMETRIC TABLE THEN TO TRAILING
TRUE UNION UNIQUE USER USING VERBOSE WHEN WHERE WITH
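Before importing, you can sweep your column names against the list with a few lines of Ruby. A sketch - the constant below holds only a subset of the full list above, so extend it before relying on it:

```ruby
# A subset of PostgreSQL's reserved words (see the full list above);
# comparison is case-insensitive, as identifiers usually arrive lowercased.
RESERVED = %w(
  create current_date current_user date default desc distinct
  from grant group limit order select table user where
)

# Return the column names that would need quoting or renaming.
def unsafe_columns(names)
  names.select { |n| RESERVED.include?(n.downcase) }
end

puts unsafe_columns(["id", "desc", "order", "created_at"]).inspect
# ["desc", "order"]
```

Run that over each table's columns before the import and you'll catch the DESC/ORDER surprises up front instead of mid-migration.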