PHP


12
Jun 08

The future of PHP

Jurriën Stutterheim posted an interesting article on the future of PHP recently (“PHP… what to say?”). In it he argues that PHP shouldn’t try to maintain complete backwards compatibility in the next release, saying,

“PHP 6 will make or break PHP as a language. For PHP to make it, it needs a clear vision of what it wants to be. Trying to maintain compatibility with ancient and mostly poorly written scripts can’t be part of this vision.”

The language can’t advance, he says, if it’s chained to outdated yet popular open-source projects.

But here’s the problem. If you’re intent on being the lingua franca of the Web, backwards compatibility is critical. It’s part of that compatibility-stability-security triumvirate that users love.

You see this conflict elsewhere in the tech industry. Microsoft’s greatest strength and biggest weakness is backwards compatibility. Users love it because decade-old programs still run in Windows. If you’re, say, a medical transcription company that uses an application written in the late ’90s, your company is going to stay with Windows indefinitely, because no external factor forces you to update that application. And Microsoft isn’t going to do anything to jeopardize that.

This is the same wall that PHP is headed toward. I view it as a wall. Others might not; it’s certainly not a bad place to be at the moment. On the plus side, it means PHP developers have more opportunities afforded to them because of the popularity of the language. The downside is that they can look left and right and see niche languages innovating in compelling ways. The more “cosmopolitan” PHP programmers inevitably develop language envy and move on, but these are exactly the kind of developers PHP should fight to retain.

It doesn’t have to be this way. PHP could flourish as a smaller language. This is the tack that Apple has taken, and it seems to work well for the company and its users. Apple is one of the most innovative companies in the entire tech industry owing in no small part to this strategy.

For my part, I’d like to see first-class functions and closures included in the language. An object-oriented API ($file->read(1024)) alongside the procedural functions (fread($resource, 1024)) would also be welcome.

But none of that will happen, because PHP is a language in decline. Not a decline in usage—it will only continue to expand its reach—but in the addition of innovative features from other languages. There will be no need to evolve; most of the agitators for change will have moved on.

There is one hope for PHP, however: Zend Framework. I think if it can gain a foothold among developers, some of these trends can be reversed. Hopefully (for PHP) not everyone will have moved on to Ruby by then.

Like this post? You might also like Coalmine, my centralized error tracking service for your apps. Coalmine captures errors and all kinds of helpful debugging information, notifies you, and makes it all searchable. Check it out!

17
May 08

PHP’s create_function() and closures

A coworker recently asked me what the difference was, functionally, between PHP’s create_function() function and traditional closures that you might find in languages with first-class functions, like Ruby or JavaScript. You can pretty easily illustrate this with a couple of examples.

First, a bit about closures. The idea with closures is that you can cleanly and readably pass around a bit of logic as an object, and any references that that object makes to variables in the surrounding scope must persist until that object is done with them.

So here’s an example in JavaScript:

function getGreeter(name) {
  return function(salutation) {
    alert(salutation + ', ' + name);
  };
}

var greeter = getGreeter('Eddy');
greeter('Hello');   // Hello, Eddy
greeter('Howdy');   // Howdy, Eddy
greeter('Bonjour'); // Bonjour, Eddy

Here’s the closest equivalent in PHP:

$greeter = create_function('$name, $salutation', 'print $salutation . ", " . $name;');
$greeter('Eddy', 'Hello');
// etc.

And that’s a callback, not a closure. In JavaScript the garbage collector reclaims the memory used by the anonymous “greeter” function… but in PHP, functions get declared and stay declared, so every time you call create_function(), you increase the memory usage.

It gets worse. This is basically what PHP does internally:

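function create_function($args, $code) {
    // Illustrative only: pick a unique name (PHP's real naming scheme differs)
    $functionName = 'lambda_' . uniqid();
    // Both strings are pasted straight into the source that eval() compiles
    eval('function ' . $functionName . '(' . $args . ') {' . $code . '}');
    return $functionName;
}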

Yeah, the entire thing is evaluated. So not only does it not get garbage collected, but it has all the traditional problems of eval()—it’s slow, difficult to debug, and uncacheable by bytecode caches like APC. Problems that closures don’t have in other languages.

It’s why you can do something like this (which works on the same principle as SQL injection)…

$code = 'print "I print repeatedly.\n"; } print "I print once.\n"; if (false) {';
$function = create_function('', $code);
call_user_func($function);
call_user_func($function);
call_user_func($function);

// I print once.
// I print repeatedly.
// I print repeatedly.
// I print repeatedly.

…and why you should never use create_function().
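For comparison, here is a rough sketch of what the JavaScript example looks like as a real closure in PHP. It assumes PHP 5.3 or later, which added anonymous functions and the use keyword; the function returns the greeting instead of printing it so the result is easy to check.

```php
<?php
// Assumes PHP 5.3+. getGreeter() mirrors the JavaScript example above.
function getGreeter($name) {
    // `use ($name)` copies $name into the closure, so it outlives this call
    return function ($salutation) use ($name) {
        return $salutation . ', ' . $name;
    };
}

$greeter = getGreeter('Eddy');
echo $greeter('Hello'), "\n";
echo $greeter('Howdy'), "\n";
```

Unlike the create_function() version, nothing here is passed through eval(), and the name is bound once rather than supplied on every call.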


17
May 08

Converting string literals in PHP

In PHP (and most languages), this is false:

'\143\141\164' == "\143\141\164"

No surprise there. One is a 12-byte string of backslashes and digits, and the other is a 3-byte string, written as octal escapes, that spells “cat”. When you use double quotes, PHP transparently converts the escape sequences.

Sometimes it’s convenient to write values in files as string literals that represent characters. Some values simply don’t translate well in their native form, and it’s more explicit to write them out “long hand” in octal or hexadecimal. This is useful if you have to match, say, an exotic series of characters with 100% accuracy.

But what happens when you need to clue PHP in that the string “\143\141\164” (as read from a file) should equal “cat”? As far as I know, there’s no easy way to do this. Presumably, there should be a function, something like str_convert_literals(), which would accept a string and do the conversion itself. But there isn’t, so you must rely on regular expressions.

Here’s the solution I found after trying various other methods (like tokenizing the string):

$string = preg_replace_callback('/\\\\([0-7]{1,3})/', 'convertOctalToCharacter', $string);

function convertOctalToCharacter($octal) {
    return chr(octdec($octal[1]));
}

I’ll run through what’s going on briefly. The regular expression matches anything following a backslash that is a series of up to three digits, 0-7 (octal is base 8, after all). It passes that match to the convertOctalToCharacter() function, which converts the value to decimal and then feeds it to the chr() function (which only accepts decimal values). That in turn converts the integer to its corresponding character value, which is then substituted into the string.
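A quick way to check the snippet is to feed it the single-quoted string from the top of the post. The variable names below are made up for illustration, and the comparison against stripcslashes() is an addition that the post itself does not use: it is a PHP built-in that also decodes C-style escapes, including octal ones.

```php
<?php
function convertOctalToCharacter($octal) {
    // $octal[1] is the run of octal digits captured after the backslash
    return chr(octdec($octal[1]));
}

// Single quotes keep the backslashes literal, as if read from a file
$fromFile = '\143\141\164';

$viaRegex   = preg_replace_callback('/\\\\([0-7]{1,3})/', 'convertOctalToCharacter', $fromFile);
$viaBuiltin = stripcslashes($fromFile);

var_dump(strlen($fromFile));   // length of the unconverted string
var_dump($viaRegex === 'cat');
var_dump($viaBuiltin === 'cat');
```

The regular expression gives you control over exactly which escapes are converted, while stripcslashes() converts every escape it recognizes in one go.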

Based on this, the hexadecimal conversion function isn’t very difficult to guess. To get you started, I’ll give you a not-so-subtle hint: the regular expression is /\\\\x([0-9A-F]{1,2})/i.
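If you would rather skip the guessing, one possible version follows. The callback name convertHexToCharacter() is hypothetical, chosen to match the octal one; the regular expression is the one from the hint.

```php
<?php
// Hypothetical callback name, modeled on convertOctalToCharacter()
function convertHexToCharacter($match) {
    // $match[1] holds the one or two hex digits captured after "\x"
    return chr(hexdec($match[1]));
}

// Single quotes: literal backslashes, as if read from a file
$string = '\x63\x61\x74';
$string = preg_replace_callback('/\\\\x([0-9A-F]{1,2})/i', 'convertHexToCharacter', $string);

echo $string, "\n";
```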

One more thing: if you also translate special characters like \r, consider using lookbehinds in your expression to ensure that valid sequences like \\r aren’t converted twice.
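As a rough sketch of the lookbehind idea, the helper below (a hypothetical name) converts \r only when no backslash comes directly before it. It covers just the simple case described above, not every possible escaping scheme.

```php
<?php
// Hypothetical helper: converts a literal backslash + r into a carriage return
function convertCarriageReturns($string) {
    // (?<!\\) is a negative lookbehind: skip the match when a backslash precedes it
    return preg_replace('/(?<!\\\\)\\\\r/', "\r", $string);
}

$plain   = 'line\r';     // backslash + r, as read from a file
$escaped = 'path\\\\r';  // two backslashes + r: an escaped backslash, then "r"

var_dump(convertCarriageReturns($plain) === "line\r");
var_dump(convertCarriageReturns($escaped) === $escaped);
```

A single lookbehind cannot tell an escaped backslash followed by a real \r (three backslashes in a row) from the case above, so input like that needs a fuller parser or a single pass that consumes \\ and \r together.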


15
Apr 08

Magic format changes; no more magic.mime

The problem with unofficial, de facto standards, like magic.mime? What happened late last month, when the Unix file(1) command development team, led by Christos Zoulas, released version 4.24, a minor revision that changes the entire magic format and no longer generates a magic.mime file.

Many programs rely on the magic format in order to identify a file’s MIME type (for example, returning “video/quicktime” for a QuickTime movie or “image/jpeg” for a JPEG image). With MIME detection being merged into magic.mgc, a compiled file, programs that rely on this functionality must be modified in order to use the latest changes.

According to Christos the new format yields more accurate results:

[N]ow mime detection is more precise as it depends on the full magic specification of each magic type, not just a single magic/offset.
—Christos Zoulas

And indeed, in testing this appears to be the case: MP4 videos, for example, are detected more often than they were in 4.23. But to use these latest changes, many developers must make system-level calls directly to the file command until extensions are updated.

The PHP extension Fileinfo, for example, is a thin wrapper around the library version of file (libmagic), yet it does not understand the new format. In PHP, calling the Unix file(1) command on a fast machine via exec() is about 16 times slower than using Fileinfo (0.128 seconds versus 0.008).
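The two approaches being compared can be sketched roughly as follows. This assumes the Fileinfo extension is loaded and that the installed file(1) accepts the --brief and --mime-type options, as current releases do; the helper names are hypothetical, and the numbers you get will depend on your machine.

```php
<?php
// Sketch only: helper names are hypothetical.

// In-process lookup through the Fileinfo extension (libmagic)
function mimeViaFileinfo($path) {
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    return $finfo->file($path);
}

// Out-of-process lookup: spawns a shell and the file(1) binary on every call
function mimeViaExec($path) {
    // escapeshellarg() guards against shell metacharacters in the path
    return trim((string) exec('file --brief --mime-type ' . escapeshellarg($path)));
}

// Average wall-clock seconds per call over a few iterations
function timeCall($callback, $path, $iterations = 10) {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($callback, $path);
    }
    return (microtime(true) - $start) / $iterations;
}
```

Calling timeCall('mimeViaExec', $path) and timeCall('mimeViaFileinfo', $path) on the same file gives a comparison along the lines of the figures above; most of the difference comes from starting a new process for each lookup.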

Of course, there is a standard specification, but unfortunately neither file(1) nor Fileinfo uses it.


18
Apr 07

Decrypting a Dreamweaver site definition password

I don’t use Dreamweaver, but everyone I work with does. It so happens that whenever I need server connection information, they send it to me in the form of a Dreamweaver site definition (.ste). Naturally, this isn’t terribly useful for someone like me who connects via SSH or SCP most of the time. In the end, I have to waste time asking around to see if anyone actually remembers the password.

So today I finally took a few minutes out of my day and wrote a simple PHP class to parse site definitions. It reads the bare essentials of the connection information and decrypts the password. Because Dreamweaver site definitions are just XML files, if (for some bizarre reason) someone wanted to extend this, it wouldn’t be hard at all.

<?php
/**
 * A Dreamweaver site definition (.ste) reader.
 */

class SteReader {
    /** @var SimpleXMLElement SimpleXML object */
    protected $_xml = null;

    /**
     * Constructor.
     *
     * Parses a site definition file into its SimpleXML equivalent.
     *
     * @param string $file Fully-qualified file path
     */

    public function __construct($file) {
        if (!is_file($file)) {
            throw new Exception('File does not exist');
        }

        $contents = file($file);
        foreach ($contents as $i => $line) {
            // This element is unnecessary, and often contains duplicate
            // attributes that prevent the file from loading correctly
            if (substr($line, 0, 14) == '<appserverinfo') {
                unset($contents[$i]);
            }
        }
        $contents = implode('', $contents);
        try {
            $xml = new SimpleXMLElement($contents, LIBXML_NOWARNING | LIBXML_NOERROR);
        } catch (Exception $e) {
            throw new Exception("File is not a valid Dreamweaver site definition");
        }
        $this->_xml = $xml;
    }

    /**
     * @return string Site name
     */

    public function getSiteName() {
        return (string) $this->_xml->localinfo['sitename'];
    }

    /**
     * @return string Host address
     */

    public function getHost() {
        return (string) $this->_xml->remoteinfo['host'];
    }

    /**
     * @return string Remote root directory
     */

    public function getRemoteRoot() {
        return (string) $this->_xml->remoteinfo['remoteroot'];
    }

    /**
     * @return string Username
     */

    public function getUsername() {
        return (string) $this->_xml->remoteinfo['user'];
    }

    /**
     * @return string|false Decrypted password, or false if none is stored
     */

    public function getPassword() {
        if (!isset($this->_xml->remoteinfo['pw'])) {
            return false;
        }

        $encoded  = (string) $this->_xml->remoteinfo['pw'];
        $literals = str_split($encoded, 2);
        $password = '';
        for ($i = 0; $i < count($literals); $i++) {
            $password .= chr(hexdec($literals[$i]) - $i);
        }

        return $password;
    }
}
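The decryption step in getPassword() is easier to see in isolation: each pair of hex digits is one byte, and the character's position in the string is subtracted from it. The standalone helpers below are hypothetical and not part of SteReader; the encoder is simply the inverse of the decoder, inferred from the code above rather than taken from any Dreamweaver documentation.

```php
<?php
// Hypothetical standalone helpers; names are not part of SteReader.

function decodeStePassword($encoded) {
    if ($encoded === '') {
        return '';
    }
    $password = '';
    // Each pair of hex digits is one byte; subtract the character's index
    foreach (str_split($encoded, 2) as $i => $pair) {
        $password .= chr(hexdec($pair) - $i);
    }
    return $password;
}

// Assumed inverse of the decoder, for illustration only
function encodeStePassword($plain) {
    $encoded = '';
    for ($i = 0; $i < strlen($plain); $i++) {
        $encoded .= sprintf('%02X', ord($plain[$i]) + $i);
    }
    return $encoded;
}

echo encodeStePassword('abc'), "\n"; // a + 0, b + 1, c + 2
echo decodeStePassword('616365'), "\n";
```

With a real site definition you would instead call getPassword() on a SteReader built from the .ste file's path.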

Thanks to Bart Grantham for his Dreamweaver site definition password decryption algorithm!
