Monday, May 27, 2013

Composing Music with PHP

I'm not an expert on probability theory, artificial intelligence and machine learning, and even my Music 201 class from years ago has been long forgotten. But if you'll indulge me for the next 10 minutes, I think you'll find that even just a little knowledge can yield impressive results if creatively woven together into an application. I'd like to share with you how PHP can be taught to compose music.

Here's an example:

generated melody

You're looking at a melody generated by PHP. Sure it's not the most memorable melody, but it's not unpleasant either. And surprisingly, the code to generate the sequence of notes is relatively brief.

So what's going on? In short, musical data is analyzed by the script to "learn" which intervals make up pleasing melodies, and then the script creates a new composition by selecting pitches based on what it knows. Speaking technically, the script calculates a probability map of melodic intervals and applies a Markov process to generate a new sequence.

Standing on Shoulders

Music composition does not happen in a vacuum. Bach was fond of Buxtehude and Vivaldi; Chopin influenced Lizt and Wagner; Mozart and Hayden instructed Beethoven. Even the same melodic phrase can be found in different pieces of work. Orfeo ed Euridice by Gluck and the hymn tune Non Dignus for example both share a common phrase.

similar melody in Orfeo ed Euridice by Gluck and hymn tune Non Dignus

If you ask PHP to compose blindly, the results aren't pretty. To prove this point, here's a melody generated by mapping random values returned by successive calls to rand() to notes on a staff.

random melody

Unless you're keen on twelve-tone, it's better to draw from earlier compositions for inspiration and understanding.

To do just this, I first transcribed the melody of several pieces of music using Scientific Pitch Notation. I didn't concern myself with note durations, instead focusing on the pitches themselves. A middle C on paper was entered as 'C4' (C is the note name, 4 is the octave), a semitone above that was 'C#4', the next semitone 'D4', and so on, until a melody (the first 8 measures of Tantum Ergo by Bottazzo shown here) was encoded like:

first 8 measures of Tantum Ergo by Bottazzo

A4 C5 G4 A4 G4 A#4 D5 A4 B4 A4 C5 D5 C5 G4 B4 B4 C5

I now had a sequence that was easily parsable and on which I could perform some basic analysis. For example, given an instance of A4, what is the next probable note to follow?

A4 C5 G4 A4 G4 A#4 D5 A4 B4 A4 C5 D5 C5 G4 B4 B4 C5

or:

C5 G4 B4 C5

There is a 50% chance that the next note would be C5, a 25% chance it will be C4, and a 25% chance it will be B4.

This process translated warm flowing music into something the computer, understanding things only within the context of cold, unfeeling mathematics, could comprehend and reason about.

Paging Doctor Markov

We're familiar with deterministic systems – systems where the same input will always generate the same output. Addition is a deterministic function, with inputs 2 and 4 always yielding 6. Stochastic systems on the other hand behave with some level of randomness. Identical inputs can result in wildly different outputs, such as with the function array_rand(). There is an element of randomness in composition, or else all compositions starting on F4 would end up the same, making generations of composers irrelevant and filling the coffers of the RIAA. But the randomness is tempered, even if at a subconscious level, by the composer with the knowledge of what combinations are pleasing.

A prime example of a stochastic system, one which is also relevant to the composition script, is a Markov process (named after the mathematician Andrey Markov who not only studied them but also had an amazing beard). As Nick Didkovsky explains:

Markov analysis looks at a sequence of events, and analyzes the tendency of one event to be followed by another. Using this analysis, you can generate a new sequence of random but related events, which will look similar to the original. A Markov process is useful for analyzing dependent random events - that is, events whose likelihood depends on what happened last.

The traditional example used to illustrate the concept is a weather predicting graph. Suppose the day following a sunny day has a 90% chance of also being sunny, and the one following a rainy day has a 50% chance of being rainy. The graph would look like this:

Markov Sunny/Rainy graph

Depending on the mechanism used to resolve probabilities, walking the graph for 5 iterations we might find ourselves transitioning with Sunny the first day, Rainy the next, Sunny after that, then Sunny, and Sunny, or we might find ourselves transitioning Sunny, Sunny, Sunny, Rainy, Rainy another.

Hopefully it's obvious where I'm going with all of this; it's possible to constrain the random process of "next note selection" using the weighted probabilities learned by analyzing melodies for a better sounding result.

A4 graph

This simple process allows us to generate passable melodies in infinitely less time than it would take for a monkey hitting random keys on an organ to play the complete works of Messiaen.

Robot Composers (the singularity is near)

At this point you hopefully have a cursory understanding of the two key concepts I employed to generate music. Even if you don't, you've at least survived my humor and deserve to be rewarded by seeing some code.

<?php
namespace Zaemis;

class Composer
{
    private $pitchProb;

    public function __construct() {
        $this->pitchProb = [];
    }

    public function train($noteData) {
       $numNotes = count($noteData);
       for ($i = 0; $i < $numNotes - 1; $i++) {
           $current = $noteData[$i];
           $next = $noteData[$i + 1];
           $this->pitchProb[$current][] = $next;
       }
   }

   public function compose($note, $numNotes) {
       $melody = [$note];
       while (--$numNotes) {
           $i = array_rand($this->pitchProb[$note], 1);
           $note = $this->pitchProb[$note][$i];
           $melody[] = $note;
       }
       return $melody;
   }
}

<?php
require_once '../include/Zaemis/Composer.php';
use Zaemis\Composer;

$noteData = trim(file_get_contents('../data.txt'));
$noteData = explode(' ', $noteData);

$c = new Composer();
$c->train($noteData);
$melody = $c->compose($_GET['note'], $_GET['count']);

echo '<img src="img/notes/clef.png" alt="Treble Clef">';
foreach ($melody as $note) {
    echo '<img src="img/notes/' . urlencode($note) . '.png" alt="' .
        $note . '">';
}

The learning process takes place in the train() method which accepts an array of training notes (the encoded melody string split on its spaces). This code is simple, quick, and dirty; the notes are pushed to a 2-dimensional array with their probabilities indirectly implied by the quantity of elements themselves.

When populated, the array would look similar to:

array(9) {
  ["A4"]=> array(13) {
    [0]=> string(2) "C5"
    [1]=> string(2) "G4"
    [2]=> string(3) "A#4"
    [3]=> string(2) "C5"
    [4]=> string(2) "B4"
    [5]=> string(3) "A#4"
    [6]=> string(2) "G4"
    [7]=> string(2) "A4"
    [8]=> string(2) "D5"
    [9]=> string(2) "G4"
    [10]=> string(2) "C5"
    [11]=> string(2) "C5"
    [12]=> string(2) "G4"
  }
  ["C5"]=> array(11) {
...

Looking at the data, a randomly selected note to follow A4 has approximately a 31% chance of being C5 since 4 out of the 13 members of the list hold that value. (Yes, I know maintaining probabilities instead of a memory-exhausting list of all of the notes in a composition is the "right way", but I wanted to keep things simple for illustrative purposes.)

The compose() method encapsulates the logic to generate the melodic sequence. A starting note the desired length is given, and the method randomly selects a value for the following note from the array until the desired number of notes has been retrieved.

Of course we humans would rather see the result notated on a staff as opposed to a list of note values, so I created a set of note images to accompany the script. Each image displays a note on the appropriate position on a staff, and the files are named according to the note name. Looping through the melody to emit some IMG elements was an effective rendering method for my needs.

Harder, Better, Faster, Stronger

It is impressive that such simple concepts can be used to create a script capable of emulating a composer. Of course, there is infinitely more that can be done to build and improve. Consider this your first exploration into musical intelligence. David Cope, who has been exploring computer composition since 1981, has this to say:

Simply breaking a musical work into smaller parts and randomly combining them into new orders almost certainly produces gibberish. Effective recombination requires extensive musical analysis and very careful recombination to be effective at even an elemental level.

Beyond the obvious changes, such as changing the pitch matrix to maintain probabilities, how would you improve things? Maybe replace this naive approach with a completely different mechanism for analyzing music? Parse input from MIDI files? What would be needed to identify harmonies? How about chord progressions? Note durations? Composer "signatures"? Could your script learn from itself by analyzing and feeding pleasing melodies it produced back into its knowledge base? In what ways could you recombine samples to form new works?

I look forward to hearing about your own experiments in AI-driven composition in the comments below.

Update 6/10/13: I've tossed some code up on GitHub if anyone's interested.

Saturday, October 20, 2012

PHP Assertions

I stumbled upon assertions in PHP today, though why I didn't know they existed after working with the language for so long and what I was looking for originally when I came across them are both mysteries. And with the increasing focus on software quality in the PHP community, I wondered why I hadn't seen them used by others. I decided to ask around, look into PHP's implementation of assertions, and do some tinkering.

I asked a few friends if they knew about assertions; they did. I asked if they used them; they didn't.

Remi Woler: I think nobody has found a good use case. It weaves tests into code. How are you going to recover from a failed assertion?
Davey Shafik: They kinda suck. For example: assert('mysql_query("")') It's a string of code that gets eval'd.

Indeed, PHP assert did not get stellar endorsements from people whose opinions I respect.

My primary experience with assertions comes from C where they are defined as macros. Its argument must evaluate true otherwise the program terminates with an error. These checks can be stripped at compile time using -dNDEBUG if desired for performance reasons, although there is some disagreement on the wisdom of doing so.

PHP asserts are implemented a bit differently. First, assertions are configurable in php.ini or with assert_options(). A failure doesn't necessarily have to abort the script – you can bail if you want to, or disable them, or convert them to run-time warnings, or even invoke a callback to handle them. This makes them very flexible and much less black-and-white than in C.

The actual assert() function accepts either a string or a Boolean for its condition. So, for example, you can write either:

<?php
assert(is_string($foo));

or:

<?php
assert('is_string($foo)');

In the first instance, the statement is evaluated and then the resulting Boolean is passed to assert(). While perhaps a little more traditional, it may not be as efficient as you will see momentarily.

In the second, the string is passed to assert() directly which eval's it to determine its truthiness. This is a better approach for two reasons:

  1. assert() immediately returns true if assertions are disabled. The code string is not evaluated and performance hit from executing unnecessary statements is minimized.
  2. If the assertion fails, the code string is passed to a callback (if one is used) and can be included in any output or logging.

I'm not convinced Davey's eval concerns are entirely well-founded in this instance because of the above reasons and the fact that it's static code to be evaluated by PHP. It's a controlled environment, not eval($randomUserSuppliedCode).

PHP 5.4 also added a second parameter to assert() – a string description to annotate the test. If present, the string is also passed to the callback.

The PHP manual offers some guidance on using assertions:

Assertions should be used as a debugging feature only. You may use them for sanity-checks that test for conditions that should always be true and that indicate some programming errors if not or to check for the presence of certain features like extension functions or certain system limits and features.

Assertions should not be used for normal runtime operations like input parameter checks. As a rule of thumb your code should always be able to work correctly if assertion checking is not activated.

Both are good pieces of advice, but are contradictory; your code may not work if assertion checking is disabled and you are using them to test system limitations.

Wikipedia explains the difference between assertions and error handling:

Assertions should be used to document logically impossible situations and discover programming errors — if the impossible occurs, then something fundamental is clearly wrong. This is distinct from error handling: most error conditions are possible, although some may be extremely unlikely to occur in practice. Using assertions as a general-purpose error handling mechanism is unwise: assertions do not allow for recovery from errors; an assertion failure will normally halt the program's execution abruptly. Assertions also do not display a user-friendly error message.

I (perhaps foolishly) disregarded the manual's and Wikipedia's advice while I was tinkering with them. PHP assertions don't behave like their C brethren, so perhaps the traditional C way of thinking (asserts as debugging only) might be artificially restrictive? What I found was that PHP assertions, with a bit of creativity, could be used to write readable, quality code.

Consider a naive Active Record implementation. You might have code that resembles:

<?php
class User
{
    protected $id;
    ...

    public function setId($id) {
        if (!is_null($this->id)) {
            throw new BadMethodCallException('ID already set for user.');
        }
        if (!is_int($id) || $id < 1) {
            throw new InvalidArgumentException('ID for user is invalid.');
        }
        $this->id = $id;
    }

    ...
}

It is possible to use assert() to test the $id argument (disregarding the manual's advice) and a callback to throw the exceptions (ignoring Wikipedia).

<?php
assert_options(ASSERT_CALLBACK, function ($file, $line, $code, $desc) {
    list($exClass, $msg) = explode(':', $desc, 2);
    throw new $exClass($msg);
});

class User
{
    protected $id;
    ...

    public function setId($id) {
        assert('is_null($this->id)',
            'BadMethodCallException:ID already set for user.');
        assert('is_int($id) && $id > 1',
            'InvalidArgumentException:ID for user is invalid.');

        $this->id = $id;
    }

    ...
}

No, this isn't how assertions are intended to be used, but it does address Remi's concern about recovery. One doesn't typically recover from an assertion but now the condition has been converted into an exception so recovery is possible to the same extent that recovery from the exception would be.

If assertions have been turned off then the code won't work, so if you wanted to go down this path then it'd be wise to add assert_options(ASSERT_ACTIVE, true) to your bootstrap file.

Now don't get me wrong, I'm not about to doing this in all of my code (at least not yet anyway!). It's fun to play but there's still some questions worth pondering.

If you were to use assert() properly instead of something along the lines of my bastardized exception example, what type of things would be worth asserting?

Assertions are meant to identify program logic/design bugs, not as a run-time error handling mechanism. Isn't this why we do unit testing? Playing devil's advocate, what's wrong with pushing unit tests directly into your code if we have doc comments that are extracted for documentation?

Feel free to let me know your thoughts in the comments section below. Do you constrain yourself to the classical interpretation of assertions, or do you take advantage of the flexibility of PHP's implementation? Where and when do you use them in your code?

Saturday, September 29, 2012

PHP_EOL: Most Worthless Constant?

PHP_EOL may very well be the most worthless general-purpose constant in modern PHP. It's supposed to be helpful for cross-platform developing, for example you could write a PHP-powered shell script that says:

<?php
echo "Operation Successful!" . PHP_EOL;

and then expect the proper newline to terminate the output string based on the platform PHP is running on.

That's all well and good, but the following is functionally equivalent:

<?php
echo "Operation Successful!\n";

Try it out and you'll see. In console output on Windows, Linux, and Mac they all are displayed with the expect newline terminating the output string.

I don't see it being useful for writing data or log output to a file either. If you're writing and reading on the same platform then newline discrepancies won't be an issue, and if you're writing on one platform and reading on another then you'll want to standardize on a newline anyway.

Has PHP_EOL's time come and gone? Do you use it in your code, and if so why?

Tuesday, July 24, 2012

PHP Recursive Directory Traversal

It sounds like a simple enough task: Generate an array that mirrors a directory structure. Directories may have subdirectories (arbitrary nesting), and entries should be alphabetized with directories grouped first. The image below shows what the array should look like given a sample directory.

While not terribly difficult, there are a few snags that can trip you up if you're not careful. For me, the first snag was trying to do it “the right way.”

The RecursiveDirectoryIterator “provides an interface for iterating recursively over filesystem directories” (php.net), so this was my first approach. I hacked together this code after a short while:

<?php
function getDirectoryList($dir) {
    $dirList = [];
    $dirIter = new RecursiveDirectoryIterator($dir,
        FilesystemIterator::SKIP_DOTS);
    $iterIter = new RecursiveIteratorIterator($dirIter);

    foreach ($iterIter as $entry) {
        $path = substr($entry->getPath(), strlen($dir) - 1);
        $keys = "['" . join("']['", explode("/", $path)) . "']";
        eval('$dirList' . $keys . '[]="' . $entry->getFilename() . '";');
    }

    return $dirList;
}

The function gets the nesting right for files, but empty directories are missing and the ordering is wrong. I could have spent some time trying to fix those issues, but the use of eval() bothered me enough to abandon the approach completely. A straight iteration wasn't going to build up the array correctly without it, so I needed to take a true recursive approach.

In addition to doing away with eval(), the recursive approach also afforded me an easy way to implement the necessary sorting. I was able to queue the directory names and file names separately, sort them, and then return their union.

<?php
function getDirectoryList($dir) {
    $dirList = $fileList = [];

    if ($dfp = opendir($dir)) {
        while (($entry = readdir($dfp)) !== false) {
            if ($entry[0] != ".") { // catches dot dirs and hidden files
                $path = "$dir/$entry";
                if (is_file($path)) {
                    $fileList[] = $entry;
                }
                else if (is_dir($path)) {
                    $dirList[$entry] = getDirectoryList($path);
                }
            }
        }
        closedir($dfp);

        uksort($dirList, "strnatcmp");
        natsort($fileList);
    }

    return $dirList + $fileList;
}

Interestingly enough, PHP doesn't have a natksort() function. I had to mock my own implementation using uksort() and strnatcmp().

I ran the solution past a few friends of mine and the response from one was:

you... bring shame to our profession.

His efforts to show me “the right way” again with RecursiveDirectoryIterator were short lived however when he came across the same issues I did and gave up to eat a leftover burrito.

So I guess there are a couple morals to my tale. One, that despite our fancy modern OOP APIs, sometimes the procedural approach is a better fit for the task at hand. We abstract everything so we don't have to re-invent the wheel but then have a mass of code that is too generic to actually do something that should be trivial. Two, we should be careful about being pompous. It's hard to eat a burrito with your foot in your mouth.

Of course, if you can come up with a better way then let me know. I might just buy you a new burrito. :)

Saturday, May 19, 2012

Writing a Minimal PSR-0 Autoloader

An excellent overview of autoloading in PHP and the PSR-0 standard was written by Hari K T over at PHPMaster.com, and it's definitely worth the read. But maybe you don't like some of the bloated, heavier autoloader offerings provided by various PHP frameworks, or maybe you just like to roll your own solutions. Is it possible to roll your own minimal loader and still be compliant?

First, let's look at what PSR-0 mandates, taken directly from the standards document on GitHub:

  • A fully-qualified namespace and class must have the following structure \<Vendor Name>\(<Namespace>\)*<Class Name>
  • Each namespace must have a top-level namespace ("Vendor Name").
  • Each namespace can have as many sub-namespaces as it wishes.
  • Each namespace separator is converted to a DIRECTORY_SEPARATOR when loading from the file system.
  • Each "_" character in the CLASS NAME is converted to a DIRECTORY_SEPARATOR. The "_" character has no special meaning in the namespace.
  • The fully-qualified namespace and class is suffixed with ".php" when loading from the file system. Alphabetic characters in vendor names, namespaces, and class names may be of any combination of lower case and upper case.

The first two and the last points are aimed at module/library authors, and the third point is of little consequence. The remaining three are the important points relevant to writing the autoloading mechanism. Of course standards have to be wordy by their very nature, but if you boil the relevant mandates down they essentially say the following: “replace namespace separators and class-name underscores with a directory separator and append a .php suffix.”

The standard doesn't describe what support functionality must be provided by a PSR-0 compliant autoloader (registration methods, configuration options, etc.). If it can automatically find a class definition in the \<Vendor Name>\(<Namespace>\) pattern, then it's PSR-0 compliant. Furthermore, it doesn't specify the parent directory for <Vendor Name>. The extra “fluff” of most autoloader implementations is convenient if you need to specify the location via code, but most of the times unnecessary if you simply use a directory already within PHP's include path.

With modern namespacing support in in PHP, it's probably not necessary to encapsulate the logic as a class, like most libraries/frameworks do, either. A single function can perform the necessary transformations on a class path and be namespaced properly so it doesn't pollute the global namespace. Instead of creating an instance of an autoloader object and then invoking the instances register() method, one can simply register a function directly with spl_autoload_register().

Or if you want to be even more minimal, you can register an anonymous function with spl_autoload_register(). Put the code in an include file, include that file, and you have no-muss-no-fuss PSR-0 autoloading instantly at your disposal.

<?php
spl_autoload_register(function ($classname) {
    $classname = ltrim($classname, "\\");
    preg_match('/^(.+)?([^\\\\]+)$/U', $classname, $match);
    $classname = str_replace("\\", "/", $match[1])
        . str_replace(["\\", "_"], "/", $match[2])
        . ".php";
    include_once $classname;
});

The magic here is in the regex which splits the incoming name into its constituent parts; the class name will always be in $match[2], and $match[1] the namespace name which may or may not be an empty string. It's necessary to identify the parts because the underscore has no special meaning in the namespace portion making a blind replace on underscores and backslashes incorrect.

Oh, and before you start jumping all over me about DIRECTORY_SEPARATOR, I'd like to point out that a hard-coded slash is equivalent for the purpose here. From the PHP manual:

On Windows, both slash (/) and backslash (\) are used as directory separator character. In other environments, it is the forward slash (/).

So YES, it is possible to write a minimal and elegant PSR-0 compliant autoloader. The only extra requirement is that the <Vendor Name> directories already be in PHP's include path to negate the need for additional path registering functions, which I would argue is good practice anyway.

Perhaps someday the group could sponsor something that mandates the path requirement (and maybe name it PSR-0a)?

Of course, maybe I'm just crazy.

Special thanks to Graham Christensen for his efforts in proofing my concept.

Tuesday, May 8, 2012

Doing a 360 on Grid 960

I've never been a fan of CSS frameworks; They just seem unnecessary to me. Every project can benefit from a reset.css file and maybe basic typography styles, but a whole framework? Meh.

Then I read an excellent argument in favor of grid-layout frameworks in some book which I've since forgotten the name of and changed my mind (a tremendous feat indeed). I decided I'd make use of a grid-layout framework in my next project.

I chose Grid 960 for the project since that was the one mentioned in the book, I had heard about it before, and it seemed to me the most mature and stable. My experiences with Grid 960 weren't bad per se... I mean, it didn't sour me back to my original mindset... but a few points will have me looking for another framework.

  1. The extra markup required is basically reminiscent of tables. Instead of <tr> or <td> though now you've got <div class="container_12"> and <div class="grid_3">.
  2. Borders, margins, and padding will throw your grids off. While it makes sense and is ultimately unavoidable, it highlights the fact grid-systems are not necessarily as intuitive as they claim to be.
  3. I found 960px still a bit wide. More screen-real estate is available than there was a few years ago, but people don't necessarily view sites full screen like they did back in the 800x600 days.
  4. Grid 960 isn't scalable. I'm not talking about "responsive web design" here, rather just using ems or rems instead of pxs so things can scale properly.

Researching beyond 960 I saw there are few fluid and responsive ones. And I saw a 1KB framework which was cool. It lacked push/pull functionality, but would be sufficient for most of my work I think.

So 960 wasn't my cup of tea, but I haven't given up on grid-frameworks yet. Maybe I'll find something more to my liking for my next project... or even roll my own.

Friday, May 4, 2012

Are Coding Standards Futile?

Unless the visual layout of a program's code affects its execution, there will always be programmers who circumvent the established coding standards. I admit, I've done it myself from time to time. There's no scientific survey that such standards really reduce cognitive friction when reading someone else's code as far as I know, and aesthetic matters are generally subjective. Make the argument for tabs over spaces until you're blue in the face; someone will just come along touting the benefits of spaces.

I warned achieving a consensus on PHP Coding Standards as PSR-1 would be difficult and that the group's efforts would be better spent discussing more "meatier" topics, such as object caching. Two months later, the proposal failed to garner enough votes for a simple majority and has now been split.

And let's not forget the "Beat Up on Crockford" festival over bootstrap and JSMin. His comments were a bit harsh, yes... but then again he only made two comments in the entire (quite lengthy) discussion and ended up immortalized in the (admittedly funny) Dangerous Punctuation.

Novelists don't all write in the same style; noting the formatting in a section of code might give a heads up on who wrote it or insight into the coder's way of thinking. Maybe it's a clue as to who we can go to for help when something doesn't work. Weak arguments, sure. But maybe so is "consistency breeds success" when applied to code formatting.

Most coding standards seem to target only low-hanging fruit anyway: capitalize something this way, place your braces in this manner, space something that way, etc. None of that really matters, does it? Standards that enforce good architectural design, specific interoperability concerns, etc. have more merit. After all, standards should help make things work, not squash creativity. And if Joe Programmer's self-expression manifests itself as 5-space indenting, who am I to judge?