Backup of a WordPress Site into a Docker container

WordPress is publishing software used by many people. This website (http://www.opencloudblog.com) uses WordPress. If you run a website with useful content, it is good practice to back up the data. It is even better practice to VERIFY that the data in a backup can be read and reused.

To verify the data, an installation of a LAMP (Linux, Apache, MySQL, PHP) stack is needed. The Docker container framework is a good tool to implement this LAMP stack. To create a copy of a running WordPress site in a Docker container, the following steps are necessary:

Create a simple LAMP Docker container, which contains everything. Expose the Apache server via port 8080 and SSH via port 22
Copy the data from the WordPress site (MySQL database and the webserver content)
Create a new Docker container, copy the backup data to this container and expose the webserver via http://127.0.0.1:8080 on the local host

Create a LAMP container

The LAMP Docker container is the base for the WordPress installation. The solution shown here is based on the work of https://github.com/tutumcloud/tutum-docker-lamp . The Dockerfile is modified and an SSH daemon is added to the container (I like SSH…). The database is also stored in the LAMP container to reduce the number of external dependencies. The Dockerfile for the LAMP stack is:

# Main source: https://github.com/tutumcloud/tutum-docker-lamp
#
# docker build --no-cache --force-rm=true -t wp/lamp .
#
# VERSION               1.0.0

FROM     ubuntu:14.04
MAINTAINER Ralf Trezeciak

# make sure the package repository is up to date
RUN apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get -y upgrade

RUN DEBIAN_FRONTEND=noninteractive apt-get -y install openssh-server supervisor git apache2 libapache2-mod-php5 mysql-server php5-mysql pwgen php-apc

RUN apt-get clean

RUN mkdir /root/.ssh/
RUN mkdir /var/run/sshd
#
# insert your public ssh key here and remove the comment
#RUN echo "" > /root/.ssh/authorized_keys
RUN sed -i 's/^PermitRootLogin.*/PermitRootLogin yes/' /etc/ssh/sshd_config 

# Add image configuration and scripts
ADD start-apache2.sh /start-apache2.sh
ADD start-mysqld.sh /start-mysqld.sh
ADD start-sshd.sh /start-sshd.sh
ADD run.sh /run.sh
RUN chmod 755 /*.sh
ADD my.cnf /etc/mysql/conf.d/my.cnf
ADD supervisord-apache2.conf /etc/supervisor/conf.d/supervisord-apache2.conf
ADD supervisord-mysqld.conf /etc/supervisor/conf.d/supervisord-mysqld.conf
ADD supervisord-sshd.conf /etc/supervisor/conf.d/supervisord-sshd.conf

# Remove pre-installed database
RUN rm -rf /var/lib/mysql/*

# Add MySQL utils
ADD create_mysql_admin_user.sh /create_mysql_admin_user.sh
ADD import_sql.sh /import_sql.sh
ADD create_db.sh /create_db.sh
RUN chmod 755 /*.sh

# init the databases
RUN mysql_install_db --user=mysql 

# config to enable .htaccess
ADD apache_default /etc/apache2/sites-available/000-default.conf
RUN a2enmod rewrite

# Configure /app folder with sample app
RUN git clone https://github.com/fermayo/hello-world-lamp.git /app
RUN mkdir -p /app && rm -fr /var/www/html && ln -s /app /var/www/html

EXPOSE 22 8080 
CMD ["/run.sh"]

The other files needed in the docker directory are contained in the following tar.gz file: Additional LAMP files

Now it’s time to build the LAMP Docker image using the command

docker build --no-cache --force-rm=true -t wp/lamp .

Copy WordPress data

Copying the WordPress data requires two steps. The first step is to copy the directory containing the WordPress PHP code and the static content (e.g. the media files). The following code (works for Strato WordPress installations)

# assume, that rsync can be used via ssh, copy the wordpress static data
#
rsync -av -e ssh <user>@<yoursite>:<your wordpress directory> ./data/

copies the static content to the local directory ./data/.
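
As a quick sanity check after the rsync run, a small Python 3 sketch like the following could confirm that the local copy looks like a WordPress tree. The function name `missing_wordpress_entries` and the list of expected entries are assumptions made for this illustration and are not part of the original setup. Depending on whether the rsync source path ends with a slash, the tree may sit in a subdirectory of ./data/, so adjust the path accordingly.

```python
from pathlib import Path

# Assumption: a typical WordPress tree contains these entries.
EXPECTED = ("wp-config.php", "wp-content", "wp-includes", "wp-admin")


def missing_wordpress_entries(data_dir, expected=EXPECTED):
    """Return the expected entries that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_wordpress_entries("./data")
    if missing:
        print("backup looks incomplete, missing:", ", ".join(missing))
    else:
        print("all expected entries found")
```

An empty result only says that the files exist. Whether the copy is really usable is checked later, when the site runs in the container.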

Next, the content of the MySQL database is needed. Your provider might offer a web/PHP backup tool, but I prefer a scripted solution, which works for Strato WordPress installations. Strato creates backups of the MySQL database once per hour. The following script copies the latest version of the backup to a local file:

#!/bin/bash
#
USERID='<the user id of your database>'
DBNAME='<the name of the database>'
PASSWORD='<the password of your database>'
#
SSH='ssh -n <your login>@ssh.strato.de '
SCP="scp -p <your login>@ssh.strato.de"
#
# get the name of the last backup
#
LASTDB=$($SSH mysqlbackups ${USERID} | head -1)
#
DATUM=$(date +%s)
#
DUMPFILE="wp-database-backup.sql"
#
# make a copy of the database backup
#
$SSH "mysqldump --add-drop-table -h ${LASTDB} -u ${USERID} -p${PASSWORD} ${DBNAME} > ${DUMPFILE}"
XDIR=.
X=$(${SCP}:${XDIR}/${DUMPFILE} .)
#
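
Before going on, it may be worth checking that the downloaded dump is not empty or truncated. The following Python 3 sketch counts the CREATE TABLE statements in the dump file. The function name `count_tables` is made up for this example, and the check assumes the usual mysqldump layout where each table definition starts with CREATE TABLE at the beginning of a line.

```python
import re


def count_tables(dump_path):
    """Count CREATE TABLE statements in a mysqldump file, reading line by line."""
    pattern = re.compile(r"^CREATE TABLE\b")
    count = 0
    # errors="replace" keeps the scan going even if the dump holds odd bytes
    with open(dump_path, "r", encoding="utf-8", errors="replace") as dump:
        for line in dump:
            if pattern.match(line):
                count += 1
    return count
```

Calling `count_tables("wp-database-backup.sql")` should return a number greater than zero for a WordPress database. A result of 0 hints at a failed mysqldump or scp step.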

Create the directory for the WordPress container

Now you must create the directory for the WordPress Docker container data. This directory contains the Dockerfile and the static data.

Post-process the WordPress data

After you have downloaded the WordPress data from your WordPress site, it is necessary to post-process the data.

Create a tar.gz file of the static content
Change the WordPress "site URL" in the MySQL backup to 127.0.0.1:8080
Get the wp-config.php file from the static data and change the parameters for the MySQL database

The compression of the static data is done by the following script:

#!/bin/bash
#
cd <path to static data>
# fix the permissions
find . -type d -exec chmod 0755 {} \;
find . -type f -exec chmod 0644 {} \;
# create a tar.gz file in the WP container directory
tar -clSzpf wordpress.tar.gz ./
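
Since the WordPress Dockerfile further down expects to find wp-config.php in the unpacked archive, a short check of the tar.gz file can save a failed build. This is a minimal Python 3 sketch using the standard tarfile module. The helper name `archive_contains` is an assumption for this illustration.

```python
import tarfile


def archive_contains(archive_path, member_name):
    """Return True if the gzip-compressed tar archive has a member with that name."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = set()
        for name in archive.getnames():
            # tar was called with "./", so members may be stored as "./name"
            if name.startswith("./"):
                name = name[2:]
            names.add(name)
    return member_name in names
```

For example, `archive_contains("wordpress.tar.gz", "wp-config.php")` should be True before the archive is handed to docker build.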

Changing the WordPress site URL is done using the following script:

#!/bin/bash
cd <path to static data>
# change the sed command and replace www.opencloudblog.com by the url of your site 
sed 's#www\.opencloudblog\.com#127.0.0.1:8080#g' wp-database-backup.sql > wordpress.sql
python <wp docker directory>/sfix.py wordpress.sql

This solution is very dirty!!! I used it because I did not find a simple way to do this from within WordPress. After running sed on the SQL backup file, the database needs to be fixed, because the lengths of strings stored in PHP serialized form might have changed. To do this, I use the Python helper script sfix.py (written for Python 2) from https://gist.github.com/astockwell/6489489:

#!/usr/bin/env python
#
# Source: https://gist.github.com/astockwell/6489489

import os, re

# Regexp to match a PHP serialized string's signature
serialized_token = re.compile(r"s:(\d+)(:\\?\")(.*?)(\\?\";)")

# Raw PHP escape sequences
escape_sequences = (r'\n', r'\r', r'\t', r'\v', r'\"', r'\.')

# Return the serialized string with the corrected string length
def _fix_serialization_instance(matches):
  target_str = matches.group(3)
  ts_len = len(target_str)
  
  # PHP Serialization counts escape sequences as 1 character, so subtract 1 for each escape sequence found
  esc_seq_count = 0
  for sequence in escape_sequences:
      esc_seq_count += target_str.count(sequence)
  ts_len -= esc_seq_count
  
  output = 's:{0}{1}{2}{3}'.format(ts_len, matches.group(2), target_str, matches.group(4))
  return output
    
# Accepts a file or a string
# Iterate over a file in memory-safe way to correct all instances of serialized strings (dumb replacement)
def fix_serialization(file):
  try:
    with open(file,'r') as s:
      d = open(file + "~",'w')
      
      for line in s:
        line = re.sub(serialized_token, _fix_serialization_instance, line)
        d.write(line)
        
      d.close()
      s.close()
      os.remove(file)
      os.rename(file+'~',file)
      print "file serialized"
      return True
  except:
    # Force python to see escape sequences as part of a raw string (NOTE: Python V3 uses `unicode-escape` instead)
    raw_file = file.encode('string-escape')
    
    # Simple input test to see if the user is trying to pass a string directly
    if isinstance(file,str) and re.search(serialized_token, raw_file):
      output = re.sub(serialized_token, _fix_serialization_instance, raw_file)
      print output
      print "string serialized"
      return output
    else:
      print "Error Occurred: Not a valid input?"
      exit()

if __name__ == "__main__":
  import sys
  
  try:
    fix_serialization(sys.argv[1])
  except:
    print "No File specified, use `python serialize_fix.py [filename]`"

Place sfix.py in the wp container directory.
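
To see why the length fix is needed, here is a small worked example in Python 3. PHP stores a serialized string as s:N:"...", where N is the byte length of the string. Replacing the 21 characters of www.opencloudblog.com with the 14 characters of 127.0.0.1:8080 leaves the old N in place, and PHP then refuses to unserialize the value. The sketch below is a simplified illustration and not a replacement for sfix.py: the name `fix_lengths` is made up, and the pattern deliberately ignores the escaped quotes that appear inside a real SQL dump.

```python
import re

# Matches an unescaped PHP serialized string such as s:5:"hello";
# (simplification: no escaped quotes as found inside an SQL dump)
SERIALIZED = re.compile(r's:(\d+):"(.*?)";')


def fix_lengths(text):
    """Rewrite every s:N:"..."; so that N is the UTF-8 byte length of the string."""
    def repl(match):
        value = match.group(2)
        return 's:{0}:"{1}";'.format(len(value.encode("utf-8")), value)
    return SERIALIZED.sub(repl, text)


original = 'a:1:{s:4:"home";s:28:"http://www.opencloudblog.com";}'
replaced = original.replace("www.opencloudblog.com", "127.0.0.1:8080")

print(replaced)               # a:1:{s:4:"home";s:28:"http://127.0.0.1:8080";}  (28 is now wrong)
print(fix_lengths(replaced))  # a:1:{s:4:"home";s:21:"http://127.0.0.1:8080";}
```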

Do not forget to save a copy of the wp-config.php file to the wp container directory and set the following values:

// ** MySQL settings - You can get this info from your web host ** //
/** The name of the database for WordPress */
define('DB_NAME', 'wordpress');
/** MySQL database username */
define('DB_USER', 'root');
/** MySQL database password */
define('DB_PASSWORD', '');
/** MySQL hostname */
define('DB_HOST', 'localhost');
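
If you prefer not to edit wp-config.php by hand, the four values could be set with a few lines of Python 3. This is only a sketch under assumptions: the helper name `set_define` is hypothetical, and the pattern only handles the single-quoted define('NAME', 'value') form shown above, without escaped quotes inside the old value.

```python
import re


def set_define(config_text, name, value):
    """Replace the value of define('NAME', '...') in wp-config.php text."""
    pattern = re.compile(
        r"define\(\s*'" + re.escape(name) + r"'\s*,\s*'[^']*'\s*\)"
    )
    replacement = "define('{0}', '{1}')".format(name, value)
    # a function as replacement avoids backslash processing of the new value
    return pattern.sub(lambda match: replacement, config_text)


# The values used for the local container in this post
LOCAL_SETTINGS = {
    "DB_NAME": "wordpress",
    "DB_USER": "root",
    "DB_PASSWORD": "",
    "DB_HOST": "localhost",
}


def localize_config(config_text, settings=LOCAL_SETTINGS):
    """Apply all local database settings to the config text."""
    for name, value in settings.items():
        config_text = set_define(config_text, name, value)
    return config_text
```

Read the downloaded wp-config.php, pass its content through `localize_config`, and write the result to the copy in the wp container directory. Keep the original file untouched, so that it can still be compared later.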

The WordPress Docker Container

The WordPress Docker container is built using the following Dockerfile:

#
# docker build --no-cache --force-rm=true -t wp/blog .
#
# docker run -d -p 8080:8080 -p 8022:22 --name=wordpresscopy wp/blog
#
FROM wp/lamp:latest
MAINTAINER RTR

# Configure WordPress to connect to local DB
ADD wp-config.php /app/wp-config.php

# save the original config
RUN mv -v /app/wp-config.php /app/wp-config.php.org

# copy the wordpress stuff from strato
ADD wordpress.tar.gz /app/

# save the strato config
RUN mv -v /app/wp-config.php /app/wp-config.php.strato

# restore the original config
RUN cp -v /app/wp-config.php.org /app/wp-config.php

# add the owner of the content, change the ownership and modify permissions to allow plugin upload
RUN useradd -d /var/www -g www-data -M -s /usr/sbin/nologin -c "Owner of the content" www-owner &&\
    chown -R www-data:www-data /app &&\
    find /app -type d -exec chmod 775 {} \; 

# Modify permissions to allow plugin upload
# RUN chmod -R 777 /app/wp-content

# copy the database backup
ADD wordpress.sql /root/

# copy the db import script
ADD import-db.sh /root/
  
RUN  bash /root/import-db.sh

# apache listens to 8080
RUN echo "Listen 8080" > /etc/apache2/ports.conf

EXPOSE 22 8080
CMD ["/run.sh"]

The Dockerfile requires a helper script to import the database:

#!/bin/bash

/usr/bin/mysqld_safe > /dev/null 2>&1 &

RET=1
while [[ $RET -ne 0 ]]; do
    echo "=> Waiting for confirmation of MySQL service startup"
    sleep 5
    mysql -uroot -e "status" > /dev/null 2>&1
    RET=$?
done

mysql -u root -e "create database wordpress;"
mysql -u root wordpress < /root/wordpress.sql
ls -la /var/lib/mysql/

mysqladmin -uroot shutdown
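
The wait loop in this script is a plain "poll until the service answers" pattern. For readers who want to reuse it outside of bash, here is how it might look in Python 3. The names `wait_until` and `mysql_is_up` are assumptions for this sketch. Unlike the shell loop above, it gives up after a maximum number of tries, which is a deliberate difference so that a broken MySQL start does not hang the build forever.

```python
import subprocess
import time


def wait_until(check, interval=5.0, max_tries=60, sleep=time.sleep):
    """Call check() until it returns True; give up after max_tries attempts."""
    for _ in range(max_tries):
        if check():
            return True
        sleep(interval)
    return False


def mysql_is_up():
    """Assumption: the mysql client is on PATH and root needs no password."""
    result = subprocess.run(
        ["mysql", "-uroot", "-e", "status"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Inside the container, `wait_until(mysql_is_up)` would play the role of the while loop, and the import would only start when it returns True.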

To build the image, you need to run

docker build --no-cache --force-rm=true -t wp/blog .

Use the command docker images to check that the image has been created:

# docker images
REPOSITORY             TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
wp/blog                latest              4fffcac92b7d        9 seconds ago       677 MB
ubuntu                 14.04               accae238329d        3 weeks ago         221.1 MB
wp/lamp                latest              932e7a77f9dd        2 hours ago         518.2 MB

Run the container

Now it’s time to run the copy of your WordPress site. Create a container using the command:

docker run -d -p 8080:8080 -p 8022:22 --name=wordpresscopy wp/blog
# check if the container is running
docker ps

Open http://127.0.0.1:8080 in a browser to reach the webserver and access the local copy of the WordPress site. Now you may run updates of WordPress, plugins, themes,… in the container. If the versions of PHP, Apache and MySQL match the versions of your original site, you can use the Docker copy of your WordPress site as a test system.
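
The check in the browser can also be scripted, which is handy if the backup verification should run unattended. The following Python 3 sketch uses only the standard library. The function name `site_responds` and the 10 second timeout are assumptions for this illustration.

```python
import urllib.error
import urllib.request


def site_responds(url, timeout=10):
    """Return True if an HTTP GET on url ends with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # connection refused, timeout, HTTP error status, ...
        return False
```

With the container running, `site_responds("http://127.0.0.1:8080/")` should return True. This only proves that the webserver answers. Looking at a few pages by hand is still the better test of the content.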

You can log in to the container using SSH on port 8022 (ssh -l root -p 8022 127.0.0.1). The docker ps output for the running container looks like this:

CONTAINER ID   IMAGE            COMMAND    CREATED         STATUS          PORTS                                        NAMES
81595c6da3c2   wp/blog:latest   "/run.sh"  29 seconds ago  Up 28 seconds   0.0.0.0:8022->22/tcp, 0.0.0.0:8080->8080/tcp wordpresscopy

You can create an image from your started container to create an archive of snapshots, holding e.g. monthly copies of the original content to review changes:

#
# stop the container
docker stop wordpresscopy
#
# get the ID of the created docker container
ID=$(docker inspect --format="{{ .Id }}" wordpresscopy)
#
# create an image from a container
docker commit ${ID} wordpress/archive-2014-11
#
# list all images
docker images
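
For monthly snapshots, the image name does not have to be typed by hand. A minimal Python 3 sketch that builds a name in the same style as wordpress/archive-2014-11 from a date could look like this. The function name `archive_tag` is hypothetical.

```python
import datetime


def archive_tag(day, repository="wordpress/archive"):
    """Build an image name like wordpress/archive-2014-11 from a date."""
    return "{0}-{1:%Y-%m}".format(repository, day)


# Example with a fixed date
print(archive_tag(datetime.date(2014, 11, 5)))  # wordpress/archive-2014-11
```

The result of `archive_tag(datetime.date.today())` can then be passed as the image name to docker commit.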