couchdb: switching to experimental branch
Current CouchDB development is progressing in an experimental branch named “interface-changes”, and it’s not in trunk yet. As the name implies, the interface is changing, and the changes are very useful.
To switch your svn checkout to this branch, do this:
svn sw http://couchdb.googlecode.com/svn/branches/interfacechanges/
./bootstrap && ./configure --prefix=/opt/local && make clean && make && sudo make install
You can then have a look at the “documentation” to see some of the new changes.
While still alpha, it’s a very interesting project. The prospect of getting something like Map/Reduce capability natively in a database is almost too exciting for words, if you’re a data nerd like me. Well, figuratively speaking, Map’s there, but Reduce isn’t yet - still, it’s great to get your hands on what seems sure to become a Big Thing so early. Oh, and in case you were wondering, COUCH stands for “Cluster of Unreliable Commodity Hardware”.
Imagine GFS and Map/Reduce, baked into a single database, with JSON in/out, pluggable query language, and native REST ... what’s not to love?
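To make the Map/Reduce idea a little more concrete, here is a minimal Python sketch of the two steps run over JSON-style documents. It is purely conceptual and is not CouchDB's API: the documents, the field names and the functions `map_doc`, `build_view` and `reduce_view` are all made up for illustration.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical JSON-style documents; every field name here is an assumption.
docs = [
    {"_id": "1", "type": "post", "author": "alice", "words": 120},
    {"_id": "2", "type": "post", "author": "bob", "words": 80},
    {"_id": "3", "type": "comment", "author": "alice", "words": 15},
    {"_id": "4", "type": "post", "author": "alice", "words": 200},
]


def map_doc(doc):
    """Map step: emit zero or more (key, value) pairs for one document."""
    if doc.get("type") == "post":
        yield doc["author"], doc["words"]


def build_view(documents):
    """Run the map step over every document and group emitted values by key."""
    view = defaultdict(list)
    for doc in documents:
        for key, value in map_doc(doc):
            view[key].append(value)
    return dict(view)


def reduce_view(view):
    """Reduce step: collapse each key's list of values into a single total."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in view.items()}


view = build_view(docs)     # the "Map" half: a precomputed, keyed view
totals = reduce_view(view)  # the "Reduce" half: one aggregate per key
```

The map step only ever looks at one document at a time, which is the property that lets this kind of computation be spread across a cluster.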
Laika’s ThingFish is a Ruby-based competitor. You might think I’d be more interested in that, since it’s in my favourite language! Not so. The very thought of a database programmed in Ruby actually gives me an instant waking nightmare along the lines of running through treacle, Gulliver in Lilliput, 80286SXs with 12MB of RAM, etc. And using the Mongrel server for a database!
UPDATE: it’s been merged to trunk, so just forget that interface-changes stuff.
October 12th, 2007 at 8:08 pm
Gotta admit that I can’t yet see the usefulness of this, at least for me. Of course, it probably is useful for lots of people, I just can’t see it yet.
What exactly are you using this for?
October 13th, 2007 at 2:08 am
Well, nothing as yet. It’s not functional enough to really do anything I’m interested in. For example, the primary interface to a database like this - the precomputed views - is not yet functional for ad hoc queries.
But it’s certainly promising. And the usefulness should be obvious. It’s intended to solve the scaling problems of the RDBMS. As you well know, the DB is the bottleneck for pretty much every modern web application. And it’s not just a bottleneck in production - it’s a negative influence the whole time: since you know the DB is the problem, you design to reduce load on it. So we see silly situations like, for example, Twitter’s experience scaling - I choose that example because they’re a small, web-only company, with a lot of database problems, and less than infinite money to throw at solving them.
Twitter basically had to partition and denormalise the hell out of its DB. Once you’ve denormalised that far, one wonders what the point of using the RDBMS is in the first place. DHH himself has expressed his lack of love for enforcing relationships in the DB, preferring to leave that to the app. So we’re not using 95% of the features of an RDBMS, but we’re still using them, and constraining ourselves to their performance shortcomings.
CouchDB and others (there’s ThingFish, but others also include Hadoop, Amazon’s Dynamo and of course the original GFS with Map/Reduce) are attempts to solve the problem of “how do you store and access a whole lot of semi-structured data in a distributed manner while still retaining database-style access to it”.
The possibilities are amazing, and I fully expect this kind of DB to usurp the traditional RDBMS within the next few years. So, may as well get on board early : )
To get a better idea, maybe check out the wiki.
October 19th, 2007 at 2:25 am
I’d be interested to hear why a “database” (and ThingFish is only very loosely describable thusly) programmed in Ruby “gives [you] an instant waking nightmare”. Is there some fundamentally nightmarish side of Ruby I haven’t discovered in the 7 years I’ve been using it? Also, what makes Mongrel unsuitable as an HTTP server (which is fundamentally what ThingFish really is)?
October 19th, 2007 at 11:14 am
Michael: Simply because Ruby is incredibly, incredibly slow.
Don’t get me wrong, I am a Rails developer and a big fan of Ruby. But it’s slow as hell, and I just don’t want to have to worry about yet another link in the chain suffering from its speed issues.
Another concern I have is with Mongrel’s long-life behaviour and garbage collection. I have to restart my mongrel servers at least every few hours if they’re not in constant use. You’ll find many Mongrel users advising cron jobs to restart the servers every few hours, or “spinner” processes that watch to see if they’re up. Compare this flakiness to Apache, or, for that matter, the highly reliable MySQL - nope, I don’t relish the idea of a Mongrel-based DB server much at all.
However, I’m highly interested in the project and will regardless keep close tabs. I do use Rails, after all, despite the operational problems, and I’m convinced these new DB paradigms will revolutionise web applications as we know them, and I’ve been playing around with designing and building a Ruby distributed database myself for the last year. Take my comments as tongue-in-cheek.
That said, the idea of a query taking 300ms (or more!) because it has to filter through Ruby really does make me shiver.
October 19th, 2007 at 9:17 pm
I’ve yet to try deploying anything with Mongrel, but I find such tales a little bit alarming. The thing is, without direct experience and having seen only anecdotal evidence it’s hard to know what the underlying cause of the problem is.
As an example, check out this post where they talk about memory usage ballooning as a result of “leaks” (really just inefficient resource management) in the Rails app running on Mongrel; and here is the follow-up post by the Mongrel author.
October 19th, 2007 at 9:33 pm
Wincent: thanks for the link.
I agree completely with the mongrel author there. It’s entirely possible that something, anything, in what I (and many, many others) are running is causing these problems. The problem is I/we don’t know what, and it’s very hard to find out. There’s no “best practice” for deploying mongrel yet, that I can find anyway.
The problems I’ve had with mongrel have been weird. On my local machine, it’s a dream - works all the time, no problems. On my server it’s less of a dream - seems poky, uses a LOT of memory, and feels slower. I haven’t benchmarked this, I guess I should before mentioning it. Still working fine though.
The real problems have been with mongrel_cluster, which is, as far as I know, the “best” way to deploy mongrel in a “real” situation, with a front-end of Apache or nginx or what have you. Mongrel cluster is *extremely* flaky and it’s very difficult to find out exactly what is going wrong and where. My sites literally become unresponsive and start timing out and giving 500 errors unless they’re being regularly accessed by clients - I frequently have to restart the clusters, and it happens so often I’ve written scripts to do this. I am actively looking into this problem, between other things, and I haven’t yet solved it. I know that many others have had similar problems.
This is all very new, “cutting edge” stuff compared to the PHP & Apache example, which is basically a solved problem - anyone who can’t deploy a fast, reliable php/apache site is basically incompetent, but it’s nowhere near that simple with the mongrel situation. Maybe it will be in a few years, let’s see.
October 20th, 2007 at 7:51 pm
I have now changed from mongrel_cluster to monit and most of my complaints seem to have gone away. Monit automatically starts and stops, monitors ports for failure, and is really easy to use.
Highly recommended.
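For anyone wondering what “monitors ports for failure” amounts to, the basic idea can be sketched in a few lines of Python. This is only an illustration of a TCP port check and is not how monit itself is implemented; the function names `port_is_up` and `find_failed` are made up for the example.

```python
import socket


def port_is_up(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def find_failed(backends, timeout=2.0):
    """Return the (host, port) pairs that did not accept a connection."""
    return [(host, port) for host, port in backends
            if not port_is_up(host, port, timeout)]
```

A watchdog would call something like `find_failed([("127.0.0.1", 8000), ("127.0.0.1", 8001)])` on a timer (the ports here are hypothetical) and restart whichever backends come back in the list. Note that a process which accepts the connection but never answers the request would still pass this check, so a real monitor would also want to test at the HTTP level.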
October 20th, 2007 at 8:48 pm
The trouble with monit is that it is essentially a very elegant, elaborate, complicated band-aid. It alleviates the symptoms, but doesn’t correct the underlying causes.
Of course, having said that, if you don’t know the causes and therefore don’t know how to address them, then having a band-aid is better than nothing at all.
October 20th, 2007 at 9:34 pm
Right you are, but I’m beginning to think the problem *was* mongrel_cluster. My two main problems were the tendency of the mongrel processes to “fall asleep” and remain unresponsive while Apache tried to proxy requests through them - that seems to have gone away. My other main problem was very slow response on the odd occasion they *were* working (usually right after a restart), which also seems to be completely resolved.
I haven’t been running this system for long enough to know if these problems have gone away for good, but it’s certainly looking good. I admit I have difficulty understanding how using mongrel_cluster as the “management” tool could cause the slow response, but it certainly seems to have improved. The mongrel_rails launch commands are identical, but the monit-launched process shows up as “ruby” while mongrel_cluster’s shows up as “mongrel_rails” in top … no idea what’s going on there. I had no problems with performance at all after manually launching mongrel; monit is simply duplicating the manual commands in an organised and automated fashion. I have no idea what differed when mongrel_cluster performed the launch.
Anyway I’ll be trialling it over the next few days so will report on my findings. I’ve been incredibly frustrated for weeks now about the treacle-like performance of my apps, and that’s now massively improved, somehow. I’d like to get to the bottom of this - there must be some other factors I’ve missed - but for now the band-aid has worked wonders.
November 29th, 2007 at 11:13 am
I’d just like to follow up here and say that a combination of the maturing mongrel software (now up to v. 1.1.1) and my adoption of monit groups and discarding of mongrel_cluster has completely solved any issues I’d had. I still don’t know what the hell m_c was doing that was so deleterious to my experience but whatever, it’s gone now.
Monit groups + mongrel proxied through Apache are a completely different experience and I’d recommend them now as the best way to do rails serving. I can post some reference setup files if anyone’s interested.
I’m so glad this is a “fixed” problem. After wasting endless amounts of time on it, I haven’t had to touch it for over a month - a great relief!