Peter's Blog

Redefining the Impossible

Putting mysql scripts in subversion


I have developed my strategy for putting drupal database dumps into subversion every day. This was slightly more complicated than I though it would be. In principle I could just use mysqldump to dump the sql and put that in subversion but the problem is that mysqldump by default will output the insert statements to reconstruct a table in a single line in the file, even of that line grows to 600k long. If subversion does a line-by-line diff on the files it stores it will end up writing all 600k to it's transaction log.

To minimise the sizes of each diff I dump the mysql in two parts, the data and the structure. For the data dump I do the following:

  • use --skip-opt to ensure that each row is inserted on a seperate line: this limits the line length. The --opt option is the default for my version of mysql (4.1.13) so I have to turn it off.
  • remove comments
  • strip out data for the 'accesslog', 'cache', 'search_index' and 'sessions' tables. I don't think these need backing up.
  • sort the remaining lines into alphabetical order as I am not sure the sql dump is guaranteed to always dump the rows in the same order

The structure is dumped straight, I certainly don't sort the lines in the file!

This creates sql data files that should diff pretty optimally.

Here is the bash script that does this for me:

   1  #!/bin/bash
   2  
   3  function SqlDumpData {
   4      mysqldump -u secret -psecret --no-create-info --skip-opt --comments=0 $1 | \
   5        egrep -v "INSERT INTO \`(accesslog|cache|search_index|sessions)\`" | sort >$2
   6  }
   7  
   8  function SqlDumpStructure {
   9      mysqldump -u secret -psecret --no-data $1 >$2
  10  }
  11  
  12  cd /home/peterc/DatabaseDumps
  13  
  14  SqlDumpData petersblog petersblog_data.sql
  15  
  16  SqlDumpStructure petersblog petersblog_structure.sql
  17  
  18  svn commit -m "daily backup"

This is all done on the server. Following this I can use my standard strategy to backup the subversion repository.

I have tested that the sql dumps can be reimported into mysql and give a functioning website.


Filed under: backup drupal mysql subversion

Uwe Hermann Says:

Hi Peter,

thanks for this post, it's very helpful! I do regular backups of my Drupal sites in svn, too, but was a bit reluctant to include the DB dumps because of the size... I'll try your method tomorrow, that should save lots of disk space.

P.S. I _knew_ it was a good idea to set up Planet Drupal ;)

Uwe.

Neil Drumm Says:

Peter Says:

I could have run the dump through something that looked for the '),(' between the insert statements and added a newline but I thought it was possible to accidently reformat the blog postings.

cat -s is a good tip.

Peter

Peter Says:

I am not sure that the dumps are as optimal as they could be. I could take out the watchdog entries but I don't want to lose them in case they are helpful in phorensics. Also the awtags tag popularity table gets updated regularly and I'm not sure that needs archiving.

Peter

Dave Says:

I like the idea of checking in database dumps. Haven't been doing that but I'll start.

Here's a nice trick if you're using db_prefix for some tables. Let's say you want to dump only the tables that begin with "_shared"; pipe your mysqldump through:

sed -e '/^CREATE TABLE shared_/,/;$/ !d'

And for the data dump pipe through:

sed -e '/shared_/ !d'

Boaz Says:

Hi Peter, Haven't tried this yet (aside from playing a bit) but I have encountered the problems is comes to solve. Looks like a very helpful piece of code/script!

Thanks a lot! Boaz.

Mike Says:

Hope this question doesn't reveal me as a total ignorant -- which I am >;)

Is this a manageable solution for a site with tons of content (ie. over 1gb)? Won't the enitre db be backed up each time?

Thanks for your time, Mike

Peter Says:

I'd say this was too much for a database of that size, although it may depend on how much the data changes between dumps (bearing in mind it is not necessarily worth dumping cache tables, session tables etc).

Personally I wouldn't do it, I'd be bzip2ing big sql dumps like that.

Peter

yhager Says:

This is very helpful, and I've been using it on my subversion based sites.

You might want to add a '--set-charset' option to the mysqldump call in SqlDumpData. Without adding this, data may come out as different charset at the server.

yhager Says:

If you remove the search index from the dump, you also need to remove the variables about the search_index status, otherwise, indexing will not take place in the site after database restore. Add: egrep -v "INSERT INTO \`variable\` VALUES \('node_cron_last"

anon Says:

Shouldn't cache by cache.* to remove all cache tables?

The revised regex would be:

"INSERT INTO \`(accesslog|cache|search_index|sessions)\`"

Without that, cache_content still gets dumped for me...

Have Your Say

I welcome constructive comments or questions but I reserve the right to delete any comments that displease me.

Who are you?

(Optional) If you enter an email address here I might email you back. Your email address will not be sold to spammers or shown anywhere

What do you have to say?