Zend_Search_Lucene: Not enterprise-ready

Zend Framework has been attracting more and more attention from the PHP community lately. It still lacks certain things (like code generation) that other frameworks (like Rails) have implemented to great effect, but Zend Framework 2.0 is slowly taking shape, and it looks like it will be the framework of choice for startups and enterprises alike. (Yes, it will even have code generation.)

But despite having several “enterprise-ready” components, I’ve found that one in particular is not: Zend_Search_Lucene, Zend Framework’s native PHP implementation of Apache Lucene (which is itself written in Java).

Don’t get me wrong; Zend_Search_Lucene is great for a small site or blog. However, in my extensive personal experience, it is not appropriate for a site with a medium or large index. I think this should be noted up front in the documentation.

Against my better judgment, the company I work for migrated our previous search solution to Zend_Search_Lucene. On pretty heavy-duty hardware, indexing a million documents took several hours, and searches were relatively slow. The indexing process consumed vast amounts of memory, and the indexes frequently became corrupted (we were using 1.5.2). A single wildcard search brought the web server to its knees, so we disabled that feature. Memory usage for searches was also very high, which forced us to reduce the number of Apache child processes, and requests per second dropped sharply as a result.

We have since moved to Solr (a Lucene-based Java search server) and the difference is dramatic. Indexing now takes around 10 minutes and searches are lightning fast. What a difference a language makes.

6 comments

  1. We made the same observation. Because PHP parses the source character by character, it takes up to 3 minutes for a 2 MB ASCII file, or 1000 documents. Way too slow.

  2. Why did I not read this 6 months ago :-( We see exactly the same things. For small indexes (up to 100k objects) things work fine. For larger indexes we get corrupted indexes once every 48 hours.

    Even when everything is working it is far from fast.
    We index dates, and when we run a Zend_Search_Lucene_Search_Query_Range the system goes into total hibernation. It only works for very small ranges, such as one or at most two days.

  3. Good to know, thanks for the post and the comments.

    However, if one already has an implementation of Zend_Search_Lucene (considering paying for one), would it be easy to port it to use Solr or Lucene? That is, is there some sort of API-compatible wrapper or pluggable backend?

  4. Colin, you may have some luck with this:
    http://www.supercerebral.com/2011/06/apache-solr-service-for-zend-framework/

    I did not write this and have not used it, but it may be what you’re looking for.
