Deploying Thinking Sphinx

This is how I like to deploy Thinking Sphinx. In summary:

  1. Install Sphinx on the server.
  2. Decide where you want Sphinx’s PID file and indexes in production.
  3. Ignore Sphinx’s configuration and indexes in development.
  4. Configure Capistrano to work with Thinking Sphinx.
  5. Set up cron on the server to re-index your data regularly.

Instructions

1. Install Sphinx on the server.

On server:


$ curl -O http://www.sphinxsearch.com/downloads/sphinx-0.9.8.1.tar.gz
$ gzip -d sphinx-0.9.8.1.tar.gz 
$ tar xvf sphinx-0.9.8.1.tar 
$ cd sphinx-0.9.8.1
$ ./configure
$ make
$ sudo make install

And to make sure it installed correctly:


$ search

Sphinx 0.9.8.1-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

Usage: search [OPTIONS] 
...
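If you want to double-check the install from Ruby, for example from a provisioning script, here is a minimal sketch that looks for the three Sphinx tools (`indexer`, `searchd` and `search`) on the PATH. The helper names are hypothetical and not part of Sphinx or Thinking Sphinx.

```ruby
# Sketch: check that the Sphinx command-line tools are reachable on PATH.
# Helper names are hypothetical; the tool names are the ones Sphinx installs.

# Return the full path of an executable called +name+ found in one of the
# directories in +path+ (a PATH-style string), or nil if there is none.
def find_executable(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# List the Sphinx tools that could not be found (empty means all present).
def missing_sphinx_tools(path = ENV.fetch("PATH", ""))
  %w[indexer searchd search].reject { |tool| find_executable(tool, path) }
end
```

Calling `p missing_sphinx_tools` on the server should print an empty array once `make install` has put all three tools in a PATH directory such as /usr/local/bin.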

2. Decide where you want Sphinx’s PID file and indexes in production.

I like all my PID files in /var/run/<process>. We also want Sphinx’s indexes to survive deployments, so they go in the app’s shared directory rather than inside each release.

In your app create config/sphinx.yml:


production:
  pid_file: /var/run/sphinx/searchd.pid
  searchd_file_path: /path/to/your/app/shared/db/sphinx
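A misspelled key in sphinx.yml is easy to miss because the setting is simply ignored. This is a small sketch that loads the file and complains if the production section lacks the keys you expect. The expected key list (`pid_file` and `searchd_file_path`, the Thinking Sphinx setting for the index directory) is an assumption; extend it with whatever else you set.

```ruby
require "yaml"

# Sketch: load config/sphinx.yml text and return one environment's settings,
# raising if the section or any expected key is missing.
# EXPECTED_KEYS is an assumption; add any other settings you rely on.
EXPECTED_KEYS = %w[pid_file searchd_file_path].freeze

def sphinx_settings(yaml_text, env = "production")
  settings = YAML.safe_load(yaml_text) || {}
  section = settings.fetch(env) { raise KeyError, "no #{env} section in sphinx.yml" }
  missing = EXPECTED_KEYS - section.keys
  raise KeyError, "missing keys: #{missing.join(', ')}" unless missing.empty?
  section
end
```

For example, `sphinx_settings(File.read("config/sphinx.yml"))` returns the production hash or raises a `KeyError` naming the missing keys.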

On server:

$ sudo mkdir /var/run/sphinx
$ sudo chown deploy:deploy /var/run/sphinx
$ mkdir -p /path/to/your/app/shared/db/sphinx

Adjust the ownership to suit your needs.
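Once the directories exist, a sketch like this one, run as the deploy user, can confirm that searchd and the indexer will be able to write to them. The paths are the placeholders used above and the helper name is hypothetical; the ownership changes themselves still need `sudo`.

```ruby
# Sketch: report which of the given directories are missing or not writable
# by the current user. An empty result means the PID and index files
# should be creatable there.
def unusable_dirs(dirs)
  dirs.reject { |dir| File.directory?(dir) && File.writable?(dir) }
end

# Placeholder paths matching config/sphinx.yml; substitute your own.
SPHINX_DIRS = ["/var/run/sphinx", "/path/to/your/app/shared/db/sphinx"].freeze
```

Running `p unusable_dirs(SPHINX_DIRS)` on the server should print `[]` after the `mkdir` and `chown` commands above.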

3. Ignore Sphinx’s configuration and indexes in development.

This isn’t really a deployment step, but it needs to be done.

Add to .gitignore:


config/development.sphinx.conf
db/sphinx/*
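If you set up several apps this way, a helper that appends the entries only when they are missing saves a little hand-editing. This is a minimal sketch; the method name is hypothetical and it treats .gitignore as plain lines, without interpreting git’s pattern rules.

```ruby
# Sketch: append each entry to an ignore file unless an identical line is
# already present, so running it twice is harmless.
# Returns the entries that were actually added.
def add_ignores(entries, path = ".gitignore")
  content  = File.exist?(path) ? File.read(path) : ""
  existing = content.lines.map(&:chomp)
  to_add   = (entries - existing).uniq
  return [] if to_add.empty?

  File.open(path, "a") do |f|
    # Start on a fresh line if the file lacks a trailing newline.
    f.write("\n") unless content.empty? || content.end_with?("\n")
    to_add.each { |entry| f.puts(entry) }
  end
  to_add
end
```

For example: `add_ignores(["config/development.sphinx.conf", "db/sphinx/*"])`.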

4. Configure Capistrano to work with Thinking Sphinx.

Add this to your config/deploy.rb:


# Thinking Sphinx
namespace :thinking_sphinx do
  task :configure, :roles => [:app] do
    run "cd #{release_path}; rake thinking_sphinx:configure RAILS_ENV=#{rails_env}"
  end
  task :index, :roles => [:app] do
    run "cd #{release_path}; rake thinking_sphinx:index RAILS_ENV=#{rails_env}"
  end
  task :start, :roles => [:app] do
    run "cd #{release_path}; rake thinking_sphinx:start RAILS_ENV=#{rails_env}"
  end
  task :stop, :roles => [:app] do
    run "cd #{current_path}; rake thinking_sphinx:stop RAILS_ENV=#{rails_env}"
  end
  task :restart, :roles => [:app] do
    run "cd #{release_path}; rake thinking_sphinx:restart RAILS_ENV=#{rails_env}"
  end
  task :rebuild, :roles => [:app] do
    run "cd #{release_path}; rake thinking_sphinx:rebuild RAILS_ENV=#{rails_env}"
  end
end

# Thinking Sphinx typing shortcuts
namespace :ts do
  task :conf, :roles => [:app] do
    run "cd #{release_path}; rake thinking_sphinx:configure RAILS_ENV=#{rails_env}"
  end
  task :in, :roles => [:app] do
    run "cd #{release_path}; rake thinking_sphinx:index RAILS_ENV=#{rails_env}"
  end
  task :start, :roles => [:app] do
    run "cd #{release_path}; rake thinking_sphinx:start RAILS_ENV=#{rails_env}"
  end
  task :stop, :roles => [:app] do
    run "cd #{current_path}; rake thinking_sphinx:stop RAILS_ENV=#{rails_env}"
  end
  task :restart, :roles => [:app] do
    run "cd #{release_path}; rake thinking_sphinx:restart RAILS_ENV=#{rails_env}"
  end
  task :rebuild, :roles => [:app] do
    run "cd #{release_path}; rake thinking_sphinx:rebuild RAILS_ENV=#{rails_env}"
  end
end

# http://github.com/jamis/capistrano/blob/master/lib/capistrano/recipes/deploy.rb
# :default -> update, restart
# :update  -> update_code, symlink
namespace :deploy do
  task :before_update_code do
    # Stop Thinking Sphinx before the update so it finds its configuration file.
    thinking_sphinx.stop
  end

  task :after_update_code do
    symlink_sphinx_indexes
    thinking_sphinx.configure
    thinking_sphinx.start
  end

  desc "Link up Sphinx's indexes."
  task :symlink_sphinx_indexes, :roles => [:app] do
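    # Link into release_path: current_path still points at the previous
    # release until deploy:symlink runs.
    run "ln -nfs #{shared_path}/db/sphinx #{release_path}/db/sphinx"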
  end
end
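One subtlety in the `after_update_code` hook: in Capistrano 2, `deploy:update` runs `update_code` and then `symlink`, so while `after_update_code` runs, `current_path` still points at the previous release and the new code lives in `release_path`. This plain-Ruby sketch (no Capistrano needed, release names made up) rebuilds that layout in a directory you pass in, to show where each path points at that moment.

```ruby
require "fileutils"

# Sketch: rebuild a Capistrano 2 style layout (releases/, shared/, current)
# under +root+ and show where things point during deploy:after_update_code.
# Release names are made-up timestamps.
def simulate_after_update_code(root)
  old_release   = File.join(root, "releases", "20090401000000")
  new_release   = File.join(root, "releases", "20090410000000") # release_path
  shared_sphinx = File.join(root, "shared", "db", "sphinx")
  current       = File.join(root, "current")                    # current_path

  [old_release, new_release].each { |r| FileUtils.mkdir_p(File.join(r, "db")) }
  FileUtils.mkdir_p(shared_sphinx)

  # deploy:symlink has not run yet, so current still names the old release.
  File.symlink(old_release, current)

  # Roughly what `ln -nfs #{shared_path}/db/sphinx #{release_path}/db/sphinx`
  # does on a fresh release directory.
  link = File.join(new_release, "db", "sphinx")
  File.symlink(shared_sphinx, link)

  { current_points_to: File.readlink(current),
    new_release_link: File.readlink(link) }
end
```

Try it with `require "tmpdir"` and `Dir.mktmpdir { |d| p simulate_after_update_code(d) }`: `current` resolves to the older release, which is why a link created under `current_path` at this stage would land in the release you are about to leave.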

5. Set up cron on the server to re-index your data regularly.

Edit your cron table (crontab -e) and add something along the lines of:


0 * * * * cd /path/to/your/app/current && /usr/local/bin/rake RAILS_ENV=production thinking_sphinx:index >> /path/to/your/app/current/log/cron.log 2>&1
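Cron starts jobs with a very small environment, so it can help to set SHELL and PATH at the top of the crontab (a reader makes the same point in the comments). A sketch that builds those lines plus the re-index job; the method name and every default (rake location, shell, PATH value) are assumptions to adjust for your server.

```ruby
# Sketch: build crontab lines for the hourly re-index. Cron starts jobs with
# a minimal environment, so SHELL and PATH are set explicitly first.
# Every default below (rake location, shell, PATH value) is an assumption.
def sphinx_cron(app_root:, env: "production", schedule: "0 * * * *",
                rake: "/usr/local/bin/rake", shell: "/bin/bash",
                path: "/usr/local/bin:/usr/bin:/bin")
  current = File.join(app_root, "current")
  log     = File.join(current, "log", "cron.log")
  job = "cd #{current} && #{rake} RAILS_ENV=#{env} thinking_sphinx:index " \
        ">> #{log} 2>&1"
  ["SHELL=#{shell}", "PATH=#{path}", "#{schedule} #{job}"].join("\n") + "\n"
end
```

For example, `puts sphinx_cron(app_root: "/path/to/your/app")` prints three lines you can paste into `crontab -e`.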

Andy Stewart, 10 April 2009

Posted in Databases, Deployment, Rails


  1. Thanks for the great walkthrough. You got me deployed in about 5 minutes.

    The only issue I ran into was that, even with searchd running, rake ts:stop thought it wasn't and would break the deploy. I told TS to index the files instead (which automatically reloads searchd upon completion if searchd is running), and that worked perfectly.

    Cheers!

    galen
    21 April 2009
  2. Hi, thanks for this. Gave me the tips I needed to override the hostname and paths on production. Am now Thinking Sphinx enabled!

    Darren
    25 April 2009
  3. First, your article here was a huge help. I had similar issues with starting and stopping as well. I use Moonshine for deployment and thought this might help others.

    
    config/sphinx.yml
    
      production:
        config_file: /path/to/your/app/shared/config/production.sphinx.conf
        searchd_file_path: /path/to/your/app/shared/db/sphinx/production
        searchd_log_file: /path/to/your/app/shared/log/searchd.log
        query_log_file: /path/to/your/app/shared/log/searchd.query.log
        pid_file: /path/to/your/app/shared/log/searchd.production.pid
    

    and then added this to moonshine_cap.rb:

    
    set :branch, 'master'
    set :scm, :git
    set :git_enable_submodules, 1
    ssh_options[:paranoid] = false
    ssh_options[:forward_agent] = true
    default_run_options[:pty] = true
    set :keep_releases, 2
    set :rails_env, 'production'
    
    after 'deploy:restart', 'deploy:cleanup'
    after 'deploy:symlink', 'app:symlinks:update'
    
    # Load the Moonshine configuration into Capistrano variables
    require 'yaml'
    begin
      hash = YAML.load_file(File.join((ENV['RAILS_ROOT'] || Dir.pwd), 'config', 'moonshine.yml'))
      hash.each do |key, value|
        set(key.to_sym, value)
      end
    rescue Exception
      puts "To use Capistrano with Moonshine, please run 'ruby script/generate moonshine',"
      puts "edit config/moonshine.yml, then re-run capistrano."
      exit(1)
    end
    
    namespace :moonshine do
    
      desc <<-DESC
      Bootstrap a barebones Ubuntu system with Git, Ruby, RubyGems, and Moonshine
      dependencies. Called by deploy:setup.
      DESC
      task :bootstrap do
        begin
          config = YAML.load_file(File.join(Dir.pwd, 'config', 'moonshine.yml'))
          put(YAML.dump(config),"/tmp/moonshine.yml")
        rescue
          puts "Please run 'ruby script/generate moonshine' and configure config/moonshine.yml first"
          exit(0)
        end
        put(File.read(File.join(File.dirname(__FILE__), '..', 'lib', 'moonshine_setup_manifest.rb')),"/tmp/moonshine_setup_manifest.rb")
        put(File.read(File.join(File.dirname(__FILE__), "bootstrap.#{fetch(:ruby, 'ree')}.sh")),"/tmp/bootstrap.sh")
        sudo 'chmod a+x /tmp/bootstrap.sh'
        sudo '/tmp/bootstrap.sh'
        sudo 'rm /tmp/bootstrap.sh'
        sudo "shadow_puppet /tmp/moonshine_setup_manifest.rb"
        sudo 'rm /tmp/moonshine_setup_manifest.rb'
        sudo 'rm /tmp/moonshine.yml'
      end
    
      desc 'Apply the Moonshine manifest for this application'
      task :apply do
        on_rollback do
          run "cd #{current_release} && RAILS_ENV=#{fetch(:rails_env, 'production')} rake --trace environment"
        end
        sudo "RAILS_ROOT=#{current_release} DEPLOY_STAGE=#{ENV['DEPLOY_STAGE']||fetch(:stage,'undefined')} RAILS_ENV=#{fetch(:rails_env, 'production')} shadow_puppet #{current_release}/app/manifests/#{fetch(:moonshine_manifest, 'application_manifest')}.rb"
      end
    
      desc "Update code and then run a console. Useful for debugging deployment."
      task :update_and_console do
        set :moonshine_apply, false
        deploy.update_code
        app.console
      end
    
      desc "Update code and then run 'rake environment'. Useful for debugging deployment."
      task :update_and_rake do
        set :moonshine_apply, false
        deploy.update_code
        run "cd #{current_release} && RAILS_ENV=#{fetch(:rails_env, 'production')} rake --trace environment"
      end
    
      after 'deploy:finalize_update' do
        local_config.upload
        local_config.symlink
      end
    
      before 'deploy:restart' do
        apply if fetch(:moonshine_apply, true) == true
      end
    
    end
    
    namespace :app do
    
      namespace :symlinks do
    
        desc <<-DESC
        Link public directories to shared location.
        DESC
        task :update, :roles => [:app, :web] do
          fetch(:app_symlinks, []).each { |link| run "ln -nfs #{shared_path}/public/#{link} #{current_path}/public/#{link}" }
        end
    
      end
    
      desc "remotely console"
      task :console, :roles => :app, :except => {:no_symlink => true} do
        input = ''
        run "cd #{current_path} && ./script/console #{fetch(:rails_env, "production")}" do |channel, stream, data|
          next if data.chomp == input.chomp || data.chomp == ''
          print data
          channel.send_data(input = $stdin.gets) if data =~ /^(>|\?)>/
        end
      end
    
      desc "Show requests per second"
      task :rps, :roles => :app, :except => {:no_symlink => true} do
        count = 0
        last = Time.now
        run "tail -f #{shared_path}/log/#{fetch(:rails_env, "production")}.log" do |ch, stream, out|
          break if stream == :err
          count += 1 if out =~ /^Completed in/
          if Time.now - last >= 1
            puts "#{ch[:host]}: %2d Requests / Second" % count
            count = 0
            last = Time.now
          end
        end
      end
    
      desc "tail application log file"
      task :log, :roles => :app, :except => {:no_symlink => true} do
        run "tail -f #{shared_path}/log/#{fetch(:rails_env, "production")}.log" do |channel, stream, data|
          puts "#{data}"
          break if stream == :err
        end
      end
    
      desc "tail vmstat"
      task :vmstat, :roles => [:web, :db] do
        run "vmstat 5" do |channel, stream, data|
          puts "[#{channel[:host]}]"
          puts data.gsub(/\s+/, "\t")
          break if stream == :err
        end
      end
    
    end
    
    namespace :local_config do
    
      desc <<-DESC
      Uploads local configuration files to the application's shared directory for
      later symlinking (if necessary). Called if local_config is set.
      DESC
      task :upload do
        fetch(:local_config,[]).each do |file|
          filename = File.split(file).last
          if File.exist?( file )
            put(File.read( file ),"#{shared_path}/config/#{filename}")
          end
        end
      end
    
      desc <<-DESC
      Symlinks uploaded local configurations into the release directory.
      DESC
      task :symlink do
        fetch(:local_config,[]).each do |file|
          filename = File.split(file).last
          run "ls #{current_release}/#{file} 2> /dev/null || ln -nfs #{shared_path}/config/#{filename} #{current_release}/#{file}"
        end
      end
    
    end
    
    # Thinking Sphinx
    namespace :thinking_sphinx do
      task :configure, :roles => [:app] do
        run "cd #{current_path};  rake thinking_sphinx:configure RAILS_ENV=#{rails_env}"
      end
      task :index, :roles => [:app] do
        run "cd #{current_path};  rake thinking_sphinx:index RAILS_ENV=#{rails_env}"
      end
      task :start, :roles => [:app] do
        run "cd #{current_path};  rake thinking_sphinx:start RAILS_ENV=#{rails_env}"
      end
      task :stop, :roles => [:app] do
        run "cd #{current_path};  rake thinking_sphinx:stop RAILS_ENV=#{rails_env}"
      end
      task :restart, :roles => [:app] do
        run "cd #{current_path};  rake thinking_sphinx:restart RAILS_ENV=#{rails_env}"
      end
    end
    
    # Thinking Sphinx typing shortcuts
    namespace :ts do
      task :configure, :roles => [:app] do
        run "cd #{current_path};  rake thinking_sphinx:configure RAILS_ENV=#{rails_env}"
      end
      task :in, :roles => [:app] do
        run "cd #{current_path};  rake thinking_sphinx:index RAILS_ENV=#{rails_env}"
      end
      task :start, :roles => [:app] do
        run "cd #{current_path};  rake thinking_sphinx:start RAILS_ENV=#{rails_env}"
      end
      task :stop, :roles => [:app] do
        run "cd #{current_path};  rake thinking_sphinx:stop RAILS_ENV=#{rails_env}"
      end
      task :restart, :roles => [:app] do
        run "cd #{current_path};  rake thinking_sphinx:restart RAILS_ENV=#{rails_env}"
      end
    end
    
    namespace :deploy do
      desc "Restart the Passenger processes on the app server by touching tmp/restart.txt."
      task :restart, :roles => :app, :except => { :no_release => true } do
        run "touch #{current_path}/tmp/restart.txt"
      end
    
      [:start, :stop].each do |t|
        desc "#{t} task is a no-op with Passenger"
        task t, :roles => :app do ; end
      end
    
      desc <<-DESC
        Prepares one or more servers for deployment. Before you can use any \
        of the Capistrano deployment tasks with your project, you will need to \
        make sure all of your servers have been prepared with `cap deploy:setup'. When \
        you add a new server to your cluster, you can easily run the setup task \
        on just that server by specifying the HOSTS environment variable:
    
          $ cap HOSTS=new.server.com deploy:setup
    
        It is safe to run this task on servers that have already been set up; it \
        will not destroy any deployed revisions or data.
      DESC
      task :setup, :except => { :no_release => true } do
        moonshine.bootstrap
      end
    
      task :before_update do
        # Re-index rather than stop: rake thinking_sphinx:stop could fail to see
        # the running searchd and break the deploy; indexing reloads searchd.
        thinking_sphinx.index
      end
    
      task :after_update do
        symlink_sphinx_indexes
        thinking_sphinx.configure
        thinking_sphinx.start
      end
    
      desc "Link up Sphinx's indexes."
      task :symlink_sphinx_indexes, :roles => [:app] do
        run "ln -nfs #{shared_path}/db/sphinx #{current_path}/db/sphinx"
      end
    end
    

    It's long, but I hope it helps. I spent days trying to get this to work. You do need to have the TS plugin installed, and I had to SSH into my slice and install Sphinx according to the directions above. Bob

    Bob Hanson
    01 May 2009
  4. I think the first line to add to .gitignore should be:

    
    config/development.sphinx.conf
    
    Erik Ostrom
    19 June 2009
  5. If you find that your scheduled Thinking Sphinx task is running but your index isn't updating, you probably need to add PATH and possibly SHELL variables to the top of your crontab file, because cron doesn't load your user's environment.

    Andy Ferra
    13 July 2009
  6. Thanks for the clear concise walk through. Helped me solve a problem.

    Keep up the good work!

    Alastair Brunton
    10 November 2009
