How my Dreamhost cron jobs ran as duplicates, and how their new management feature fixed it

May 12, 2007

I've been hosting my private web operations on Dreamhost. I'm generally really happy with what they provide, but recently something drove me nuts. I wrote Reposter and made it power my Misc Random site. Everything was OK in testing, yet entries kept showing up on the live site as duplicates. For the life of me, I couldn't figure it out.

Then Dreamhost rolled out a feature that lets you manage your cron jobs in their backend, instead of editing the crontab file by hand. Editing by hand works fine too, but the backend interface is simpler to use. See the Unofficial Dreamhost Blog for a quick example.

The backend manager also has a real advantage: it fixed my duplicate entries bug, and in a way that matched my hunch about the cause. It seemed that multiple copies of the Reposter script were running at the same time, and each copy was posting the same entries.
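To make that hunch concrete, here's a tiny sketch of how overlapping runs could produce dupes. It is not Reposter's actual code: `pending`, `publish`, and the entry names are made up for illustration, and it assumes the script first checks what is already on the site and only then posts whatever is missing.

```python
# Hypothetical model of a reposting script; NOT Reposter's real code.

def pending(feed, site):
    """Return the feed entries that are not yet on the site."""
    return [entry for entry in feed if entry not in site]

def publish(entries, site):
    """Append the given entries to the site."""
    site.extend(entries)

feed = ["entry-1", "entry-2"]

# One run at a time: the second run sees the first run's work and posts nothing.
serial_site = []
publish(pending(feed, serial_site), serial_site)
publish(pending(feed, serial_site), serial_site)

# Two overlapping runs: both check the site before either one has published,
# so both decide that every entry is still missing.
overlap_site = []
run_a = pending(feed, overlap_site)
run_b = pending(feed, overlap_site)
publish(run_a, overlap_site)
publish(run_b, overlap_site)

print("serial: ", serial_site)
print("overlap:", overlap_site)
```

In the serial case each entry lands once; in the overlapping case each entry lands twice, which is the kind of duplication I was seeing.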

Now, if you add cron jobs through the Dreamhost backend, they get added to your crontab file in a special section. If you look inside the crontab, you'll see that each command is prefixed with a helper utility called "setlock", which supposedly makes sure that only one copy of your script runs at a time. (Whether or not it gets prefixed is controlled by a checkbox when you're editing the cron job.)
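I haven't looked at how setlock works internally, but the general idea of a lock-file wrapper is easy to sketch. The Python below is my own illustration under stated assumptions, not what Dreamhost runs: `try_lock`, `run_exclusively`, and the lock file name are hypothetical, and it uses a non-blocking `fcntl.flock` so that a second copy gives up right away instead of waiting (Unix only).

```python
import fcntl
import os
import tempfile

def try_lock(path):
    """Try to take an exclusive, non-blocking lock on the file at path.

    Returns the open file object on success (keep it open to hold the lock),
    or None if someone else already holds the lock.
    """
    handle = open(path, "a")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle

def run_exclusively(lock_path, job):
    """Run job() only if the lock is free. Returns True if the job ran."""
    lock = try_lock(lock_path)
    if lock is None:
        return False  # another copy is already running; skip this run
    try:
        job()
        return True
    finally:
        lock.close()  # closing the file releases the lock

# Hypothetical lock file name, chosen just for this example.
demo_lock = os.path.join(tempfile.gettempdir(), "reposter-example.lock")
ran = run_exclusively(demo_lock, lambda: print("reposting..."))
print("job ran:", ran)
```

The point of the design is that the check and the claim happen in one step: a copy either gets the lock or it doesn't, so two copies can never both believe they are the only one running.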

So I enabled the locking option for Reposter, and lo and behold, all dupes gone and me happy. Thanks, Dreamhost. It remains a mystery to me, though, why the script ran in multiple copies in the first place. Isn't the point of crontab that stuff gets run only once? Perhaps it has something to do with their server farm setup: my stuff doesn't live in one place at Dreamhost, since the shell hosts, filesystems and databases are all on different machines, so maybe cron somehow runs across hosts and ends up with parallel invocations... I have no idea. I just know that I migrated my cron jobs to their backend, which adds the setlock prefix, which does the job.