26
Aug
06

First-rest list processing

If you’re a functional programmer (particularly a Lisp or Haskell programmer) — move along. I’m unlikely to say anything you don’t already know.

If you’re not a functional programmer, you might find this of use.

Everyone knows that the right way to traverse a list looks something like this:

int i;
for(i = 0; i < n; i++) {
    do_something_to(list[i]);
}

or maybe this:

node* n;
for(n = list->head; n; n = n->next) {
    do_something_to(n->data);
}

or even, if you’re one of those fancy OO types:

iterator* it;
for(it = list->CreateIterator(); !it->Finished(); it++) {
    do_something_to(*it);
}

You might be sick of writing out a for loop every time you want to munge every element in a list, and prefer to write something along the lines of:

list.map { |elem| do_something_to(elem) }

That’s all well and good, but it’s not the whole story. Sometimes it’s worth processing lists recursively.

Here’s my motivating example: I want to generate HTML. In particular, I want to generate a <p> tag, process the rest of the tokens in a list, and generate a matching </p> tag. Any balanced-action task will do — maybe you want to open and close database connections, or filehandles, or allocate and free memory.

If you’re stuck in a for-loop mentality, you’re going to have a hell of a time remembering to close the things you opened (in the right order, no less!) when you finish off a list. You can create and maintain a stack of actions, remembering to pop off (and close) actions when you get to the end of the list.

Hmm. Stack. Does that suggest recursion, or what?

Here’s how you do it:

process_list(first, rest) {
    open_action(first);
    if(rest != NULL) {
        next = rest.shift; /* pops off first item in rest; rest is destructively shortened */
        process_list(next, rest);
    }
    close_action(first);
}

Isn’t that pleasant? We’ll let the program stack remember where we are, recurse a while, and finish up at the end. Okay, it won’t work if the lists are ridiculously long — we’ll run out of stack space. But for most sane data sets, it’s easier than explicitly maintaining a stack.

First-rest processing works nicely when you’re parsing some data, too. Suppose you have a character string that, unbeknownst to the string itself, is (syntactically) a list of tokens. You can write a full-blown lexer in lex, if you want. You can count the number of tokens, allocate space for them, and then blow through the string, filling the allocated space.

Or you can just process the tokens once at a time:

typedef struct {
    token tok;
    char* rest;
} tok_rest;

tok_rest
next_token(char* rest)
{
    tok_rest tr;
    tr.rest = extract_first_token(rest);
    tr.first = get_token(rest); /* stops before next token */
    return tr;
}

token_list*
tokenize(char* string)
{
    tok_rest tr;
    if(string == NULL) return NULL;
    tr = next_token(string);
    return prepend(tr.tok, tokenize(tr.rest));
}

Okay, I’ve taken a few liberties with string processing and list manipulation, but you get the idea. A first-rest string (or list) processing function should only care about the first element in its argument. It should deal with that element — and let the next recursive call deal with the remainder. Eventually, you get to the end of the list, and everything starts popping back to where you started.


0 Responses to “First-rest list processing”



  1. No Comments Yet

Leave a Reply




anarchocapitalist agitprop

Be advised

I say fuck a lot
Grammar Nazi

Categories

Archives

Statistics FTW