CouchDB session model for Rails

Here’s my initial stab at a Rails session model for CouchDB. The marshalling stuff is taken from the example SqlBypass class in the ActiveRecord source.

You’ll need a recent trunk build of CouchDB, probably.

class CouchSession < Hash
  @@db = CouchRest.database!('http://localhost:5984/sessions')
 
  attr_writer :data
 
  def self.find_by_session_id(session_id)
    self.new(@@db.get(session_id))
  rescue
    self.new(:id => session_id)   # not found: start a fresh session
  end
 
  def self.marshal(data)   ActiveSupport::Base64.encode64(Marshal.dump(data)) if data end
  def self.unmarshal(data) Marshal.load(ActiveSupport::Base64.decode64(data)) if data end
 
  def initialize(attributes = {})
    self['_id']            = attributes['_id'] ||= attributes[:id]
    self['marshaled_data'] = attributes['marshaled_data'] ||= attributes[:marshaled_data]
    self['_rev']           = attributes['_rev'] if attributes['_rev']
  end
 
  # lazily unmarshal the session data on first access
  def data
    unless @data
      if self['marshaled_data']
        @data, self['marshaled_data'] = self.class.unmarshal(self['marshaled_data']) || {}, nil
      else
        @data = {}
      end
    end
    @data
  end
 
  def loaded?
    !!@data
  end
 
  def session_id
    self['_id']
  end
 
  def save
    self['marshaled_data'] = self.class.marshal(data)
    self['data']           = data        # raw copy too, for troubleshooting (see note below)
    self['updated_at']     = Time.now
    save_record  = @@db.save(self)
    self['_rev'] = save_record['rev']
  end
 
  def destroy
    @@db.delete(self['_id'])
  end
end
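
For the curious, here’s a quick console sanity check of the class (assuming a CouchDB server on localhost; the session ID is made up):

session = CouchSession.find_by_session_id('some-session-id')   # nothing stored yet, so a fresh session
session.data[:user_id] = 42
session.save
 
reloaded = CouchSession.find_by_session_id('some-session-id')
reloaded.data[:user_id]   # => 42
reloaded.destroy          # clean up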

Nice and short – possibly the shortest Rails session class I have seen. The beauty of CouchRest/CouchDB! And since we descend from Hash, we can just save the object straight – after marshalling, of course. Cool, huh?

Note that I am actually writing the raw data as well as the marshalled data into the saved doc, for troubleshooting/interest purposes. Feel free to remove that.

Not pretty, but it works. Just drop it into app/models like a normal model. You’ll need to put these lines into environment.rb:

config.action_controller.session_store = :active_record_store
CGI::Session::ActiveRecordStore.session_class = CouchSession

Note also that I have ignored any differentiation between the record ID and the session ID, negating the need for any special overrides in ApplicationController. However, the session IDs Rails generates are large and you might find them unattractive in CouchDB – it would be fairly simple to separate them, but then you’d need a new map view and an override. I feel it’s simpler to just use the Session ID as the doc ID and damn the torpedoes. YMMV.
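
If you did want to separate them, it would look something like this – an untested sketch, with the design doc name and view made up:

# A design doc with a view keyed on a separate session_id field:
@@db.save({
  '_id'   => '_design/sessions',
  'views' => {
    'by_session_id' => {
      'map' => "function(doc) { if (doc.session_id) { emit(doc.session_id, doc); } }"
    }
  }
})
 
# find_by_session_id then queries the view instead of fetching by doc ID:
def self.find_by_session_id(session_id)
  rows = @@db.view('sessions/by_session_id', :key => session_id)['rows']
  rows.empty? ? self.new(:id => session_id) : self.new(rows.first['value'])
end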

Improvements? See something wrong with it? Let me know! ;-)


26 Responses to “CouchDB session model for Rails”

  1. Wincent Colaiuta Says:

    What’s performance like?

  2. Sho Says:

    Well, similar to any other CouchDB access I guess. I would imagine any performance issues in the above code arise from the data marshalling, but since that’s integral to Rails’ session handling I didn’t mess with that.

    I don’t know any way of easily benchmarking the entire session creation/access/deletion loop from within Rails, if you know of one I can run it for you.

    Apart from that, access is consistent with any CouchDB interaction – decently fast. For individual accesses, though, CouchDB is much faster at reading than writing. Single-threaded writes of single records don’t perform as well, partly due to HTTP overhead (POST or PUT) but also depending on disk speed – not helped by this laptop’s crappy HD.

    Anyway I realised I’d never actually benchmarked CouchRest so here’s a brief speed test I whipped up. It simulates the approximate record size of a Rails session, ie. 3 fields and about 1K of data.

    On the laptop:

    couchdb/couchrest speed test

    number of docs: 500

    == WRITING ==

    time to write docs: 85.907737
    time per doc: 0.171815474
    docs per second: 5.82019754518734

    == READING ==

    time to read docs: 3.824564
    time per doc: 0.007649128
    docs per second: 130.733856199033

    ## finished ##

    Boy, that was even slower than I thought it would be. Worryingly slow, in fact. I’m going to run this test on the server and see what that gets.

  3. Sho Says:

    BTW i just ran the same test on a machine with a decent disk:

    couchdb/couchrest speed test

    number of docs: 500

    == WRITING ==

    time to write docs: 4.603314
    time per doc: 0.009206628
    docs per second: 108.617400420653

    == READING ==

    time to read docs: 1.314841
    time per doc: 0.002629682
    docs per second: 380.27411679435

    ## finished ##

    Speaks for itself. That’s much more in line with my prior experience. Whew, scared myself for a second there ;-)

    Don’t know what’s wrong with this laptop, the disk is absolutely slow as fuck. Maybe I need to clear some stuff off it and defrag or something…

  4. Sho Says:

    Code I used for the speed test:

    require 'rubygems'
    require 'couchrest'
     
    @db = CouchRest.database!("http://localhost:5984/speed_test")
     
    write_ids = []
    read_ids  = []
    num_docs  = 500
     
    # random uppercase-letter string of the given length
    def random_string(length = 25)
      (0...length).map{65.+(rand(25)).chr}.join
    end
     
    start = Time.now
     
    num_docs.times do
      doc_stub = {}
      doc_stub[:name] = srand
      doc_stub[:foo]  = random_string(120)
      doc_stub[:bar]  = random_string(1500)
      rec = @db.save(doc_stub)
      write_ids << rec['id']
    end
     
    write_time = Time.now - start
     
    puts
    puts 'couchdb/couchrest speed test'
    puts
    puts 'number of docs: ' + num_docs.to_s
    puts
    puts '== WRITING =='
    puts
    puts 'time to write docs: ' + write_time.to_s
    puts 'time per doc: ' + (write_time / num_docs).to_s
    puts 'docs per second: ' + (1.0 / (write_time / num_docs)).to_s
    puts
     
    start = Time.now
     
    while read_ids.length < num_docs
      rec = @db.get(write_ids.pop)
      read_ids << rec['id']
    end
     
    read_time = Time.now - start
     
    puts '== READING =='
    puts
    puts 'time to read docs: ' + read_time.to_s
    puts 'time per doc: ' + (read_time / num_docs).to_s
    puts 'docs per second: ' + (1.0 / (read_time / num_docs)).to_s
    puts
    puts '## finished ##'

  5. Wincent Colaiuta Says:

    I think the way to test the speed of this as a session backend would probably be to set up a brand new rails app with a controller action that does pretty much nothing other than store something in the session (ie. a flash) and then hit that action with ApacheBench. So you’d run two or more tests: one with your Couch backend and others with the standard Rails sessions stores (disk, MySQL, cookies etc).
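
    Something like this, say – a throwaway sketch, with the controller name, route and numbers all made up:

    # app/controllers/bench_controller.rb
    class BenchController < ApplicationController
      def touch
        session[:counter] = (session[:counter] || 0) + 1
        flash[:notice] = 'bench'                    # exercise the FlashHash too
        render :text => session[:counter].to_s
      end
    end
     
    # then, from a shell, once per configured session store:
    #   ab -n 1000 -c 10 http://localhost:3000/bench/touch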

  6. Sho Says:

    If you write it, I’ll run it : )

    I can already tell you the result, though, which is MySQL beating couch by a mile. Individual record read/write performance is not CouchDB’s strength. It is competent, but I doubt it can hold a candle to MySQL. No-one uses files, so I don’t think that is worth testing, and cookies are an exceptional case – the main problem with them is the latency they cause, which will be hard to simulate in-machine.

    Good idea though, I might try it out when I get a chance, foregone conclusions notwithstanding.

  7. Wincent Colaiuta Says:

    Cookies are hardly an exceptional case; I think the cookie-backed session store is the default since Rails 2.0, isn’t it?

  8. Sho Says:

    Exceptional as in not like the others. Cookies are a storage option whose drawbacks only manifest themselves fully over slow internet links. That makes any local “speed test” futile.

  9. Wincent Colaiuta Says:

    Hm, I don’t know about that. My understanding is that the Cookie store was adopted as the default precisely because it’s so much faster than the others. And when the net link is slow enough to impact the tests it would drown out the relative speed differences of the backends to the point where the tests would be meaningless, so I don’t think you should even worry about that scenario.

    So I would expect the speed order to be something like (from slowest to fastest): Couch, filesystem, MySQL, cookies. Of all of those I’d expect cookies to scale the best and filesystem to scale the worst (imagine 100,000 sessions: 100,000 files).

  10. Sho Says:

    We’ve talked about this before, I think. I do not share Rails’ love of cookie sessions.

    As my examples above mention, I find myself writing session files of approximately 1.5K. That is not a lot of data once it’s been through marshalling – a couple of preferences, the flash hash, some bits and pieces. 1.5K, not much for a DB.

    If you store sessions in a cookie, that cookie is sent with *every* request to the page. Every image, CSS, every JS script, every AJAX call. What’s the average size of a normal HTTP header? 200 bytes? Great, you’ve now bloated that by a factor of 8.

    How many resources are called when a user requests your page? Maybe an average of 10? That’s 15K you are forcing the user to upload just to view your page. You can’t tell me that will have *zero* effect on perceived speed to the user!

    I have a fundamental problem with this approach. Dumbly sending the entire session with every single HTTP request regardless of need is inefficient and against any kind of responsible design. Scaling sessions might be a problem for big sites, but throwing them into the cookie is a user-unfriendly blunt-edged anti-solution.

    As you said, if the connection speed is *that* slow, it’s not like the user is having that great an experience anyway – but it can hardly help. The fact that most connections are speed-biased towards downloads just compounds the problem.

    In its favour, I agree that Rails has done a stellar job integrating the system and making it “just work”. Maybe cookie sessions are absolutely fine for the vast majority of developers. However, I strongly dislike the idea – as you might be able to tell! – and so don’t use them.

    I do agree with your scaling & speeds estimates, though. I doubt CouchDB will ever be as fast as straight MySQL for simple things like sessions. However, I would expect it to scale far better than the other server-side techniques.

    UPDATE: the HTTP request for the main page here is 372 bytes for me.

  11. Wincent Colaiuta Says:

    Yeah, I agree with you about the drawbacks of the cookie session store. I personally don’t use it and never have. If your cookies weigh 1.5K then that’s pretty darn wasteful; that’s why they tell you to only store really small things like integers in the session, but I don’t know how big things like integers and strings are when marshalled into the session and encrypted.

    My initial objection to the cookie session store was from a security perspective. Even though it’s encrypted I just don’t like the idea of trusting the client to hold on to the session data. There have been security holes in Rails before, and there will be again, just like with any piece of software; who’s to say that the session is really safe, despite the encryption? That’s an oft-raised complaint about the store and the stock reply is, “don’t store anything really sensitive in the store”. I personally would rather just not use the store at all.

  12. Sho Says:

    The cookie I looked at was actually only 700B; I’ve seen 1.5K as an average elsewhere. It might be wrong.

    700B looks like this:

       "updated_at": "2008-09-19T05:02:41+00:00",
       "data": {
           "user_style": "light",
           "site_format": "mobile",
           "user_id": null,
           "previous_uri": "/entry",
           "user_nickname": "sho",
           "current_uri": "/entry/login",
           "flash": {
           },
           "language_ietf": "en-AU"
       }

    That turns into 700B after marshalling and the addition of the ginormous session ID – by itself 362 bytes! It’s easy to see how chucking a bit of flash text and maybe some nice long previous URLs or “pages you just looked at”, a “remembrall” key or whatever could blow out the size real quick. You’re not just storing the values in a hash, remember – you’re storing the key, the class of the key (and the value) .. a single integer might be a byte of data but the coded indicator that it’s an integer, the key, the class of the key .. you could be looking at 20 bytes overhead. Doesn’t sound much but adds up fast and Rails doesn’t exactly encourage you to be niggardly with the session.
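
    You can see the overhead for yourself with a couple of lines in irb – a rough illustration, nothing more:

    require 'base64'
     
    tiny = { :user_id => 1, :flash => {} }
    raw  = Marshal.dump(tiny)
    puts raw.size                    # roughly 24 bytes, for two near-empty keys
    puts Base64.encode64(raw).size   # Base64 then inflates it by about a third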

    I don’t really know how much it uses in the cookie implementation, will have to check that out. This is sizes from the DB, I presume they’re similar.

    Whatever. I think the whole issue of scaling the session store has been blown out of all proportion anyway. Presumably people’s sites are not completely static but for the sessions – given the lightweight nature of the sessions I would have thought they were the least of anyone’s worries. How many Rails sites are there that are so popular they have actually outgrown, say, a decent dedicated DB sessions server? That could probably do several thousand transactions a second? My guess is, “none”.

    I didn’t change over my sessions management because of scaling concerns, I changed it over because I am a stickler for simplicity and didn’t want to worry about running two types of database. If I ever get big enough that scaling sessions becomes an actual concern then I’ll likely be rich enough to pay someone else to worry about that shit.

  13. Wincent Colaiuta Says:

    Out of curiosity I had a look at the sessions table in my MySQL database. It has an (integer) id column, a VARCHAR(255) column for the session id (which looks to be a 32-character hex-encoded hash), a TEXT blob for the session data itself, as well as two DATETIME columns for “updated at” and “created at”.
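
    From memory, the stock rake db:sessions:create migration is roughly this sketch, which would explain those columns:

    class CreateSessions < ActiveRecord::Migration
      def self.up
        create_table :sessions do |t|
          t.string :session_id, :null => false
          t.text   :data
          t.timestamps
        end
        add_index :sessions, :session_id
        add_index :sessions, :updated_at
      end
     
      def self.down
        drop_table :sessions
      end
    end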

    I have no idea what’s in the TEXT blobs because they look to be Base64 encoded (why aren’t they just binary data in a binary BLOB column?). Don’t know if they’re also encrypted just like the cookie-backed session data are.

    The average size looks to be about 100 characters (bytes), although some longer blobs look as much as 300 or more chars.

    (In any case, think I’d better expire some of the old sessions… the table is starting to get pretty big…)

  14. Wincent Colaiuta Says:

    Funnily enough, clearing my sessions table turned out to be trickier than I thought.

  15. Sho Says:

    Interesting. You’d think Rails would have a built-in rake task that did something a little more nuanced than just dumbly nuking the whole sessions table.

    Your wiki post also reminds me of how much I hate screwing around with datetime fields in the DB. I haven’t done it in sessions, but a nice trick is to include a simple “seconds since epoch” integer field as a supplement to the native datetime. Much easier to manipulate, for me anyway. And you can write a nice rake task to just do

    Session.destroy_all(['updated_at_epoch < ?', 4.weeks.ago.utc.to_i])
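
    Wrapped in a rake task, that might look like this – a sketch, assuming a Session model with that updated_at_epoch integer column:

    namespace :sessions do
      desc 'Purge sessions idle for more than four weeks'
      task :purge => :environment do
        Session.destroy_all(['updated_at_epoch < ?', 4.weeks.ago.utc.to_i])
      end
    end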

    Ah well, sounds like you have the problem well in hand.

    BTW the atom feed on your blog is giving me a 500 error. Can’t tell you how long it’s been like that; I only just noticed, since your blog’s feed is in a folder in Mail.app. I had assumed you were just feeling quiet, but saw some fairly recent entries when visiting your site just now…

  16. Sho Says:

    I have no idea what’s in the TEXT blobs because they look to be Base64 encoded

    I am guessing they would look very similar to the unmarshalled data I posted a few comments up. I don’t think they are encrypted but don’t actually have a recent enough Sessions DB to say – I abandoned the native Rails sessions code some time ago, as mentioned in several previous posts.

    The simple unmarshal code in the post will answer your question, anyway. If it works, they weren’t encrypted ;-)
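
    Something like this in script/console would settle it (assuming the stock sessions table):

    row = ActiveRecord::Base.connection.select_one('SELECT data FROM sessions LIMIT 1')
    puts Marshal.load(ActiveSupport::Base64.decode64(row['data'])).inspect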

    (why aren’t they just binary data in a binary BLOB column?)

    You tell me, pal. And while you’re telling me stuff, I’d also like to know why the hell the Flash system (which is basically a hash) has to be implemented using a special “FlashHash” class just so it can respond to a few “convenience” methods.

    Without this bullshit magic for magic’s sake, the sessions class above would be about 70% shorter. What the hell is wrong with just storing the session as a JSON-encoded hash?

    You might have noticed my ardour for everything Rails has somewhat diminished over the last year or so. This kind of crap is exactly the reason why.

  17. Wincent Colaiuta Says:

    Ah, thanks for letting me know about the 500 error. Hadn’t noticed it myself. Haven’t investigated it yet, but no doubt it will be fall-out from one of the “upgrades” between Rails 2.0 and 2.1.1.

    You know, this kind of breakage is exactly why I now make a conscious, disciplined effort to refer to software “updates” rather than “upgrades”.

  18. Wincent Colaiuta Says:

    Ok, fixed the 500. Thanks for the heads-up.

  19. Wincent Colaiuta Says:

    Yet another reminder that I need more specs… Thing is, I’ve never written specs for XML feeds before. Looks like I’ll have to figure out how.

  20. Sho Says:

    Ah, great. I was missing my twice-weekly dose of “involuntary reboot log” misery ;-)

    Sigh. I also haven’t done any testing for XML. Fucking testing man, it’s a god damn never ending rabbit hole.

    What was the problem, out of interest?

  21. Wincent Colaiuta Says:

    The problem was a change in the way routing worked in 2.1. Off the top of my head it was a side-effect of changing from “:controller” to “:as”.

  22. Wincent Colaiuta Says:

    Resurrecting this old thread… I let the sessions table grow and just tried to purge it. 25 minutes to execute the query, even with the website shut down and the maintenance page up (ie. no contention for, or other connections to, the database)!

    In the light of this I’m actually thinking of switching to the cookie store, even though I’ve never liked it (for the reasons already discussed here and elsewhere).

    More details on the query performance at: http://rails.wincent.com/issues/1142

  23. Sho Says:

    25 minutes does seem like an awfully long time for such a simple operation, even with the quantity of records you describe. Was the table indexed on created_at?

    It might be interesting to recreate a huge table like that, with random timestamps, and test different strategies of going about that task. I would also be interested to test CouchDB against that scenario, I haven’t done that yet.

    Still, you had let stale sessions accumulate for 30 days or more. After the initial big purge, surely the daily cron job wouldn’t take more than a minute or so? I wouldn’t have thought that was such a big deal, certainly less painful than redoing the whole thing with cookies…

  24. Sho Says:

    To further update this post, I’d just like to note that I don’t even use Rails sessions anymore. They required so many hacks to work the way I wanted, and never really worked that well anyway, so I just turned them off and replaced them with a (much simpler) custom system.

    It took a while to get over my fear of messing with the Rails Black Box™ but eventually I just wrote my own, which was very easy, and now I’m much happier. Weird custom classes to store Flash data? Marshalling plain text for unknown reasons? Fuck that shit.

    I can write up a (brief) tutorial of how to do this if anyone is interested.

  25. Wincent Colaiuta Says:

    Yeah, would be interesting to hear more details about what you did.

    And you’re right: 25 minutes is ludicrous. But it seems to be consistently ludicrous. Maybe that kind of query is an insane edge case that triggers a hideously inefficient codepath in MySQL. Who knows?

    Good idea to see what indexes are on that table (don’t know as the table was created by Rails using its defaults) and perhaps play around seeing how long it takes to prune it using different methods.

    On the other hand, switching to cookie-based sessions would be a one-line environment.rb tweak so may just go that way too.

  26. Wincent Colaiuta Says:

    Incidentally, tried updating to the Rails 2.2 RC and everything broke hideously. Still haven’t had time to figure out why. This is why I hate updating Rails.
