msteigerwalt

Easy Footnotes with PHP

Maybe you read my site and think, “Gee, if only I had footnotes like Michelle, maybe I would be popular!” Perhaps you’ve tried to implement your own footnotes a few times, never to find a solution that felt right to you.

Maybe you resigned yourself to the fact that you’re not as awesome as me, and will never be able to have your own footnotes unless I personally walk you through every step of the process.

If so, I have great news. I can’t guarantee they’ll be as clever as my footnotes, but I can show you the technique I used for my article backend.

Designing the Solution

The first step to any solution is to figure out exactly what we’re trying to do. Yes, I know, we want to add footnotes to our text. But that’s a high-level overview. There are several details we must decide on before we start coding.

First, we have to decide on what the optimum syntax will be. How do we format our footnotes so that they’re easy to write, don’t clutter the text while reading the raw format, and can be easily interpreted by the computer?

The intended audience will also affect our choice in format. Will a lot of people be reading the raw format directly, such as in a README file packaged with a software distribution? What formatting system will be used for the rest of the text?

In my case, the primary audience for the footnote functionality is me. Therefore, the footnote syntax doesn’t necessarily have to be simple, but I’m terribly lazy, so I made it as simple as possible.

If I were doing footnotes in a context such as documentation packaged with a project, it might make more sense to move the footnotes to the end of the document and keep only a reference to each footnote next to the annotated items, like so:

 This is some text[1] with footnotes in it[2].

 [1] Very fine text, too!
 [2] Can't think of anything for the second footnote.

You may have seen this footnote style on Usenet.[1] Since it’s a standard syntax, it would be the best format for distribution to a wider audience.

However, since the raw format of my files will most likely be seen by me alone, full footnotes can be placed right next to the text they’re adding extra details to without distracting the reader. Keeping footnote references up-to-date and organized would be too error-prone and tedious, without enough gain to compensate for the drawbacks. I’d be better served by having the footnotes within the text itself, like so:

 This is some text[Very fine text, too!] with footnotes
 in it[Can't think of anything for the second footnote.].

My next consideration is the formatting I use in the rest of my article text, Markdown. Markdown uses the following syntax for links:

[link title](linkurl.com)

I wanted my footnotes to share the same style of formatting with links, since they’re very similar to links. In the end, I chose this as the final formatting style for footnotes:

This is some text{^}(Very fine text, too!) with footnotes
in it{^}(Can't think of anything for the second footnote.).

Of course, you can choose whatever works best for you. Try to be creative and play with multiple options.

Finding the Footnotes

Now we get to do the fun part, and actually get cracking on some code. We’ll start with a regular expression:

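$pattern = '/\{\^\}\((.*?)\)/';

It looks a little scary,[2] but it’s simply the code to look for the footnote syntax described above. The backslashes escape special characters, since ( and ) have special meaning in regular expressions. So does ^: outside a character class it always anchors the match to the start of the text, so an unescaped {^} can never match and the caret needs a backslash as well. { and } only have meaning in certain cases, which do not apply here, but escaping them is harmless and saves you from remembering which cases those are.[3] The pattern is in single quotes so that PHP passes the backslashes through to the regular expression engine untouched.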

PHP’s preg functions use Perl-compatible regular expressions, whose basic syntax is close to JavaScript’s. We’re only actually using one bit of regular expression magic here, (.*?), which tells the engine to extract anything between the parentheses, or nothing at all if the parentheses are empty.

This is pretty much the most basic pattern out there, which most people pick up before actually knowing regular expressions. The period signifies any character except a newline, and the asterisk signifies zero or more repetitions of whatever precedes it. So, together, they stand for ‘any number of any characters’. The question mark after the asterisk makes the match lazy: it stops at the first closing parenthesis instead of running on to the last one in the text.
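To see what the question mark after the asterisk changes, here is a small sketch that runs a lazy and a greedy version of the capture on the same text. The sample string and variable names are made up for illustration, and the caret and braces are escaped so they match literally.

```php
<?php
$txt = "some text{^}(first note) and more{^}(second note).";

// Lazy: .*? stops at the first closing parenthesis it can.
preg_match_all('/\{\^\}\((.*?)\)/', $txt, $lazy);

// Greedy: .* runs on to the last closing parenthesis in the text.
preg_match_all('/\{\^\}\((.*)\)/', $txt, $greedy);

print_r($lazy[1]);   // two captures, one per footnote
print_r($greedy[1]); // one capture that swallows both footnotes
```

The lazy version is the one we want, with one caveat: a footnote that itself contains a closing parenthesis will be cut short at that parenthesis.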

To make this clear to us in the future, let’s add a comment mentioning what we’re looking for:

//Footnote Pattern: {^}(<message>)
$pattern = '/\{\^\}\((.*?)\)/';

Now we’ll find all the footnotes in the document by putting our pattern and text through PHP’s preg_match_all function.

preg_match_all($pattern, $txt, $matches);

The last argument, $matches, is simply the variable in which preg_match_all places the matches it finds; the function’s return value is the number of matches. You’d think it would be $matches = preg_match_all($pattern, $txt), or something similar, but PHP is a pretty messy language sometimes.[4]

We’re also using preg_match_all instead of preg_match since we want to match more than one item.
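A quick side-by-side may help here. This sketch, using a made-up sample string, shows what each function returns and what it places in its third argument.

```php
<?php
$txt     = "one{^}(first) two{^}(second)";
$pattern = '/\{\^\}\((.*?)\)/';

// preg_match stops after the first match and returns 1 (or 0 if none).
$found = preg_match($pattern, $txt, $first);

// preg_match_all keeps going and returns the number of full matches.
$count = preg_match_all($pattern, $txt, $all);

echo $found, "\n";                  // 1
echo $first[1], "\n";               // first
echo $count, "\n";                  // 2
echo implode(", ", $all[1]), "\n";  // first, second
```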

So, what gets placed in the $matches array when we run our example text through the filter? We can find out by doing a print_r on $matches and loading our PHP file in the browser.[5]

The full code would look like this:

$txt = "This is some text{^}(Very fine text, too!) with "
      ."footnotes in it{^}(Can't think of anything for the "
      ."second footnote.).";

//Footnote Pattern: {^}(<message>)
$pattern = '/\{\^\}\((.*?)\)/';
preg_match_all($pattern, $txt, $matches);
print_r($matches);

The output should look something like this:[6]

Array
(
    [0] => Array
        (
            [0] => {^}(Very fine text, too!)
            [1] => {^}(Can't think of anything for the second footnote.)
        )

    [1] => Array
        (
            [0] => Very fine text, too!
            [1] => Can't think of anything for the second footnote.
        )

)

The first element of the array is an array containing the fully matched patterns, and the second element is an array containing the part of the match we requested with the (.*?) syntax. Using both of these arrays, replacing the footnote format with actual footnote code is a simple matter.
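If juggling two parallel arrays feels awkward, preg_match_all also accepts the PREG_SET_ORDER flag, which groups the results per match instead. This is an alternative to the layout shown above, not what the rest of this article uses; the sample string is made up.

```php
<?php
$txt = "one{^}(first) two{^}(second)";

// With PREG_SET_ORDER, each element of $sets describes one match.
preg_match_all('/\{\^\}\((.*?)\)/', $txt, $sets, PREG_SET_ORDER);

foreach ($sets as $n => $set) {
    // $set[0] is the full match, $set[1] the captured footnote text.
    echo ($n + 1), ": ", $set[0], " => ", $set[1], "\n";
}
```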

$notes = Array();
$i = 1;
foreach ($matches[0] as $footnote) {
    $notes[$i] = $matches[1][$i-1];
    $newText   = "<sup class=\"footnote\">"
                 ."<a href=\"#footnote-$i\">$i</a>"
                 ."</sup>";
    $txt = str_replace($footnote, $newText, $txt);
    $i++;
}

This would replace all footnotes with HTML linking to an anchor for the footnote, extracting all footnote text into a numbered array with an explicitly set one-based index.

Go ahead, try it. Make sure it works.
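One caveat with the loop above: str_replace swaps every occurrence of the matched string, so two footnotes with identical text would both end up pointing at the first number. If that matters to you, a possible variation is to do the find and the replace in one pass with preg_replace_callback. The function name numberFootnotes is hypothetical, and this is a sketch rather than the code my site runs.

```php
<?php
// Hypothetical helper: replaces each {^}(...) marker in a single pass
// and returns the rewritten text together with the collected notes.
function numberFootnotes($txt) {
    $notes = array();
    $txt = preg_replace_callback(
        '/\{\^\}\((.*?)\)/',
        function ($m) use (&$notes) {
            $i = count($notes) + 1;   // one-based footnote number
            $notes[$i] = $m[1];       // captured footnote text
            return "<sup class=\"footnote\">"
                  ."<a href=\"#footnote-$i\">$i</a></sup>";
        },
        $txt
    );
    return array($txt, $notes);
}

list($html, $notes) = numberFootnotes("A{^}(same) and B{^}(same).");
echo $html, "\n";
print_r($notes);
```

Because the callback runs once per match, each marker gets its own number even when the footnote text repeats.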

Wrapping It Up

Now that we know exactly how to accomplish our goals, let’s wrap this up in a handy function, like so:

function stripFootnotes(&$txt) {
    //Footnote Pattern: {^}(<message>)
    $pattern = '/\{\^\}\((.*?)\)/';
    preg_match_all($pattern, $txt, $matches);
    $notes = Array();
    $i = 1;
    foreach ($matches[0] as $footnote) {
        $notes[$i] = $matches[1][$i-1];
        $newText   = "<sup class=\"footnote\">"
                    ."<a href=\"#footnote-$i\">$i</a>"
                    ."</sup>";
        $txt = str_replace($footnote, $newText, $txt);
        $i++;
    }
    return $notes;
}

There’s only one fancy thing we’re doing here. Notice that the argument of the function is prefixed with &. Writing &$txt signifies that the argument is passed by reference, so modifications are made directly to the variable passed in.

A simple demonstration is in order. The following code would echo “hello world”:

function uppercase($text) { $text = strtoupper($text); }
$string = "hello world";
uppercase($string);
echo $string;

Adding a & to the first line, like so:

function uppercase(&$text) { $text = strtoupper($text); }

Would cause the same code to output “HELLO WORLD”. Clearly, this wouldn’t be the sort of functionality we’d want to use in such a simple example case. It’s ideal for our formatting code because we want to modify the passed text and return an array of footnotes. For us, the & operator allows us to squeeze a little more power out of our implementation.

However, it’s a strange thing to do, and commenting it would be a good idea:

/***********************************************************
 * Removes footnotes from the source text and returns an 
 * array of the stripped content.
 * NOTE: This function writes to the source text directly.
 **********************************************************/
function stripFootnotes(&$txt) {
    //Footnote Pattern: {^}(<message>)
    $pattern = '/\{\^\}\((.*?)\)/';
    preg_match_all($pattern, $txt, $matches);
    $notes = Array();
    $i = 1;
    foreach ($matches[0] as $footnote) {
        $notes[$i] = $matches[1][$i-1];
        $newText = "<sup class=\"footnote\">".
                   "<a href=\"#footnote-$i\">$i</a></sup>";
        $txt = str_replace($footnote, $newText, $txt);
        $i++;
    }
    return $notes;
}

Also note that we’re putting HTML inside of our PHP, tying our code to a specific output template. This would be daunting for a client-side coder to modify, and is something you would want to remove if you were working as part of a team or had a large code base in which this HTML could theoretically become lost.

However, for my situation, the gains don’t outweigh the problems associated with abstracting the HTML generation.
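If you do want the HTML out of the function, one possible approach is to let the caller supply the markup for each marker. The sketch below assumes a hypothetical variant called stripFootnotesWith that takes a callback; neither the name nor the callback parameter comes from my actual backend.

```php
<?php
// Hypothetical variant: the caller supplies the markup for each marker,
// so the function itself contains no HTML.
// NOTE: Like stripFootnotes, this writes to the source text directly.
function stripFootnotesWith(&$txt, $formatMarker) {
    $notes = array();
    $txt = preg_replace_callback(
        '/\{\^\}\((.*?)\)/',
        function ($m) use (&$notes, $formatMarker) {
            $i = count($notes) + 1;   // one-based footnote number
            $notes[$i] = $m[1];       // captured footnote text
            return call_user_func($formatMarker, $i);
        },
        $txt
    );
    return $notes;
}

$txt = "Text{^}(A note) here.";
$notes = stripFootnotesWith($txt, function ($i) {
    return '[' . $i . ']';   // plain-text marker instead of HTML
});
echo $txt, "\n";   // Text[1] here.
```

The template, or whoever owns the markup, can then pass in a callback that returns the sup and a tags.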

Putting It To Good Use

In the simplest case, we’d put our function to use like so:

<?php
 $txt = "This is some text{^}(Very fine text, too!) with "
       ."footnotes in it{^}(Can't think of anything for "
       ."the second footnote.).";
 $notes = stripFootnotes($txt);
 echo $txt;
?>

<h2>Footnotes:</h2>
<ol>
    <?php foreach ($notes as $i=>$note): ?>
        <li id="footnote-<?=$i?>"><?=$note?></li>
    <?php endforeach; ?>
</ol>

If we were feeling adventurous, we’d probably want to abstract the functionality from the template and create an Article class. The implementation of this would look very similar to the one above, with the logic of article content retrieval and formatting removed:

<?php $a = new Article($id); ?>
<h1><?= $a->title; ?></h1>
<div id="content">
    <?= $a->content; ?>
</div>

<h2>Footnotes:</h2>
<ol>
    <?php foreach ($a->notes as $i=>$note): ?>
        <li id="footnote-<?=$i?>"><?=$note?></li>
    <?php endforeach; ?>
</ol>

I’ll leave the actual implementation of that as an exercise for the reader.[7]
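If you’d like a head start, here is one rough way such a class could look. Everything in it is an assumption: the constructor takes the title and raw text directly, because loading an article by ID depends on where your articles live, and the private method simply repeats the footnote technique from earlier.

```php
<?php
// Hypothetical Article class. Loading by ID is left out on purpose;
// the title and raw text are handed straight to the constructor.
class Article {
    public $title;
    public $content;
    public $notes = array();

    public function __construct($title, $rawText) {
        $this->title   = $title;
        $this->content = $rawText;
        $this->notes   = $this->extractFootnotes($this->content);
    }

    // Rewrites $txt in place and returns the footnote texts
    // with one-based keys.
    private function extractFootnotes(&$txt) {
        preg_match_all('/\{\^\}\((.*?)\)/', $txt, $matches);
        $notes = array();
        foreach ($matches[0] as $k => $footnote) {
            $i = $k + 1;
            $notes[$i] = $matches[1][$k];
            $marker = "<sup class=\"footnote\">"
                     ."<a href=\"#footnote-$i\">$i</a></sup>";
            $txt = str_replace($footnote, $marker, $txt);
        }
        return $notes;
    }
}

$a = new Article("Demo", "Some text{^}(A note).");
echo $a->content, "\n";
print_r($a->notes);
```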


Footnotes:

  1. Or other places where old people hang out.
  2. Like most regular expressions if you're not used to them.
  3. This is sort of why regular expressions are scary.
  4. Sorry, PHP fanboys, but it's true.
  5. Through the webserver, not on our local machines.
  6. Unless you're doing it wrong.
  7. Mostly because confusing you right at the end of my article is good times.