Replacing Drupal search with SOLR

There has been a need to replace Drupal’s core search with Apache SOLR in Svenska YLE for a quite some time. Before I could begin implementation, we needed to decide which Drupal modules we would select to handle the issue. There were really only two options: the Apache SOLR and Search API modules. Search API was already familiar to us and there was better Views support for our purposes, making it the obvious choice from the very beginning. At this point, we haven’t even done any actual comparison between Search API and Apache SOLR.

We already had an Apache SOLR test environment on YLE’s internal network, so we only needed to discuss how to work with the Apache SOLR service on the local environments of the developers. We could either use a local virtual SOLR environment (e.g. VirtualBox) or we could use an external service that could be accessed from anywhere. Using SOLR service within YLE’s internal network was out of the question because the development environment service needs to be functional outside of YLE’s network.

We investigated some of the external SOLR services available, but we finally chose to use local virtual SOLR environments. The main problem with this was how to ensure that all developers would have the exactly the same development environment and how to ensure that the development environment would be similar to what exists in the production environment. After a few trials and errors, Vagrant-box gave us the solution to this problem. I will not go any further into the subject of Vagrant at this point, except to say that Vagrant is the perfect tool for managing environments.

Once the modules and environments were selected, the actual implementation work could begin. We were using SOLR 3.x in both production and test environments so I needed to get a similar environment set up in my local environment. I found a ready-made vagrant-solr-box on github, so I decided to try that first. The environment worked just fine, so I continued the implementation using that.

I installed the Search API and Search API SOLR modules and also the Search API SOLR Overrides module for overriding SOLR server settings in different environments. Configuring Search API in Drupal was already a familiar procedure to me, and everything proceeded very smoothly. I began by configuring the Search API SOLR server and index. I replaced the content listing pages with the help of the Search API Views module, and everything seemed to work nicely on my local environment. We were now ready to move everything to the test environment, where a “real” Apache SOLR environment was waiting for us. All we needed was a new SOLR core for our site.

As I mentioned, everything had proceeded reasonably well, so far, but in the test environment, we started to run into some problems. First, Drupal wasn’t able to connect to the Apache SOLR server. By adjusting the proxy settings, we were able to resolve this issue, but Search API still just wasn’t working with the multicore Apache SOLR on the test environment. Indexing was successful on our local virtual environments, but these had a single-core SOLR server. The configuration that had worked just fine on my local environment didn’t work at all in the test environment, even though both were using the same version of Apache SOLR.

To solve the problem, we started by installing Vanilla Drupal on the test environment with the same modules in use on the actual site. By doing this, we were able to exclude any possible problems that might be caused by our own installation profile and features. Search API was not indexing content on this new test site, either, so we decided to try upgrading SOLR. We upgraded SOLR from version 3.6 to 4.4, and at the same time we updated schema.xml to support the latest Search API and Apache SOLR modules. This resolved the problem, the test site was able to index content to SOLR, so we configured the actual site and indexing started working there, as well.

We were very relieved when this adventure was finally over. A task that initially had seemed easy didn’t turn out to be quite so easy after all, as these things usually go, but there is no greater joy than when everything works out in the end.

With the SOLR index we have been able to replace most of the taxonomy listing pages, and this has meant a reduction on the processor load (on the database server) – especially in views that have depth enabled. The next thing to looking into is to remove the standard Drupal search index, to get a smaller database.

Written by Ari Ruuska
Ari has worked with Drupal developing about 7 years. Most of that time as a consultant at YLE as Drupal Developer and architect. He has also managed Drupal projects and developers team.

Switching Distros on a Running Site

As Mårten explained in an earlier post we decided to redo our installation profile into a more diversified model with a common core set of modules and with a differentiated set of modules for every site that will use the common core.

Splitting the previous Yle – profile that we used as a base for our installation from the beginning was easy enough. Just create two new installation profiles – Syndprofile and Fyndprofile that used most parts of the earlier Yleprofile and the parts that were specific for the two installations into their own sets of repositories, ie synd_modules and fynd_modules and so on. Starting development on the fynd-platform was also very successful and has since launced two successful projects: Kuningaskuluttaja and MOT.

However, on the svenska.yle.fi – platform, the change posed a lot of challenges to overcome, since we had to change installation profiles on a running site. And since a lot of the path settings to modules and themes are set in the database at installation time a lot of file paths would have to be changed. So when we first tried the easiest approach, just replacing the profiles/yleprofile installation folder with a profile/syndprofile we were left with a severely broken site, that couldn’t find any modules or themes, no matter how many times you tried to clear the cache. So it became clear that we had to do the changes directly into the production database.

So what we did was to dump the database into an sql file and doing a search and replace on every occurence of yleprofile to the new synprofile. We opted to slightly change the name to synprofile since there are a lot of serialized data that contains the path in the database as well.

We have around 500 000 nodes in our database at the moment, and a pretty large index, but using sed to do the search and replace the operation only took a few minutes to do, even though the sql file itself is getting close to 5 Gigabytes in size.

$ sed ‘s/yleprofile/synprofile/g’ < svenskaylefi.sql > svenskaylefi_synprofile.sql

This, however, was not enough in our case. We also had a couple of modules and our own theme settings and features that was not going to be used in the other distributions so we had to manually change the location of these. Fortunately most of the file path settings are stored in just a few tables.

system
registry and
registry_file

The paths are also stored in the cache tables so it is advisable to truncate them at the same time, especially

cache
cache_bootstrap and
cache_path

It took a few tries to get every step in this workflow to work out without problems, but when we finally figured out how to do it the migration process went pretty smoothly. Just one final hurdle gave us a bit of cold sweat when the site didn’t even bootstrap even though the migration process had otherwise gone smoothly. But, a manual flush of the memcache server solved that too.