Peter's Blog

Redefining the Impossible

Putting mysql scripts in subversion


I have developed my strategy for putting drupal database dumps into subversion every day. This was slightly more complicated than I though it would be. In principle I could just use mysqldump to dump the sql and put that in subversion but the problem is that mysqldump by default will output the insert statements to reconstruct a table in a single line in the file, even of that line grows to 600k long. If subversion does a line-by-line diff on the files it stores it will end up writing all 600k to it's transaction log.

To minimise the sizes of each diff I dump the mysql in two parts, the data and the structure. For the data dump I do the following:

  • use --skip-opt to ensure that each row is inserted on a seperate line: this limits the line length. The --opt option is the default for my version of mysql (4.1.13) so I have to turn it off.
  • remove comments
  • strip out data for the 'accesslog', 'cache', 'search_index' and 'sessions' tables. I don't think these need backing up.
  • sort the remaining lines into alphabetical order as I am not sure the sql dump is guaranteed to always dump the rows in the same order

The structure is dumped straight, I certainly don't sort the lines in the file!

This creates sql data files that should diff pretty optimally.

Here is the bash script that does this for me:

   1  #!/bin/bash
   2  
   3  function SqlDumpData {
   4      mysqldump -u secret -psecret --no-create-info --skip-opt --comments=0 $1 | \
   5        egrep -v "INSERT INTO \`(accesslog|cache|search_index|sessions)\`" | sort >$2
   6  }
   7  
   8  function SqlDumpStructure {
   9      mysqldump -u secret -psecret --no-data $1 >$2
  10  }
  11  
  12  cd /home/peterc/DatabaseDumps
  13  
  14  SqlDumpData petersblog petersblog_data.sql
  15  
  16  SqlDumpStructure petersblog petersblog_structure.sql
  17  
  18  svn commit -m "daily backup"
Toggle Line Numbers

This is all done on the server. Following this I can use my standard strategy to backup the subversion repository.

I have tested that the sql dumps can be reimported into mysql and give a functioning website.


Filed under: backup drupal mysql subversion

12 Comments

Uwe Hermann Says:

over 4 years ago

Hi Peter,

thanks for this post, it's very helpful! I do regular backups of my Drupal sites in svn, too, but was a bit reluctant to include the DB dumps because of the size... I'll try your method tomorrow, that should save lots of disk space.

P.S. I _knew_ it was a good idea to set up Planet Drupal ;)

Uwe.

Neil Drumm Says:

over 4 years ago

Peter Says:

over 4 years ago

I could have run the dump through something that looked for the '),(' between the insert statements and added a newline but I thought it was possible to accidently reformat the blog postings.

cat -s is a good tip.

Peter

Peter Says:

over 4 years ago

I am not sure that the dumps are as optimal as they could be. I could take out the watchdog entries but I don't want to lose them in case they are helpful in phorensics. Also the awtags tag popularity table gets updated regularly and I'm not sure that needs archiving.

Peter

Dave Says:

over 4 years ago

I like the idea of checking in database dumps. Haven't been doing that but I'll start.

Here's a nice trick if you're using db_prefix for some tables. Let's say you want to dump only the tables that begin with "_shared"; pipe your mysqldump through:

sed -e '/^CREATE TABLE shared_/,/;$/ !d'

And for the data dump pipe through:

sed -e '/shared_/ !d'

Boaz Says:

over 2 years ago

Hi Peter, Haven't tried this yet (aside from playing a bit) but I have encountered the problems is comes to solve. Looks like a very helpful piece of code/script!

Thanks a lot! Boaz.

Mike Says:

over 2 years ago

Hope this question doesn't reveal me as a total ignorant -- which I am >;)

Is this a manageable solution for a site with tons of content (ie. over 1gb)? Won't the enitre db be backed up each time?

Thanks for your time, Mike

Peter Says:

over 2 years ago

I'd say this was too much for a database of that size, although it may depend on how much the data changes between dumps (bearing in mind it is not necessarily worth dumping cache tables, session tables etc).

Personally I wouldn't do it, I'd be bzip2ing big sql dumps like that.

Peter

yhager Says:

over 2 years ago

This is very helpful, and I've been using it on my subversion based sites.

You might want to add a '--set-charset' option to the mysqldump call in SqlDumpData. Without adding this, data may come out as different charset at the server.

yhager Says:

about 1 year ago

If you remove the search index from the dump, you also need to remove the variables about the search_index status, otherwise, indexing will not take place in the site after database restore. Add: egrep -v "INSERT INTO \`variable\` VALUES \('node_cron_last"

anon Says:

about 1 year ago

Shouldn't cache by cache.* to remove all cache tables?

The revised regex would be:

"INSERT INTO \`(accesslog|cache|search_index|sessions)\`"

Without that, cache_content still gets dumped for me...

phil Says:

about 1 year ago

THANKS a LOT! I couldn't use egrep trick and back ticks in my case (mac os X and apple script), so I used a mysql option (--ignore-table). Now it works fine with git on a Mac GUI (see on blog.vocamen.com, not updated for now with ignore table)

Sorry but comments on this post are now closed.