Wednesday, March 28, 2007

XMMS disk_writer plugin patch

This is a (very) small patch I wrote a while ago for the disk_writer plugin of xmms (an open-source media player for the X Window System, modelled after Winamp). The post in the link is pretty self-explanatory; it's taken from the xmms development list where I sent it. Basically, it modifies how the plugin builds the filename it writes to, so that you get an 'auto-rotate' effect instead of overwriting the previous file recorded from the same source.

Unfortunately, xmms was already seeing little active development at the time, and as far as I can tell it is no longer an active project. Nevertheless, the patch is there if you're interested or want to take a look, and it does work (I use it myself).

Sunday, March 25, 2007

RFC-Compliant URI Validation

Recently, as part of another project, I needed some code to validate a URI string against RFC 2396. The goal was to ensure that a URI was RFC compliant, so I decided to use a set of regular expressions modelled directly on the ABNF definitions in the RFC. ABNF is by its nature a very close match for regular expressions in terms of usage, syntax and purpose, so using them seemed like a logical way to build the URI validation code.

I started by creating an expression for the simplest (and first) definitions in the RFC. 'lowalpha' is defined by the ABNF as being one of the characters a-z inclusive, while 'upalpha' is defined as A-Z inclusive. 'alpha' is defined as either a 'lowalpha' or an 'upalpha' character. 'digit' is defined as one of the characters 0-9 inclusive. Lastly, 'alphanum' is defined as being either an 'alpha' or a 'digit' character. Based on these five definitions, I could create five matching regular expressions which would serve the purpose of indicating whether an arbitrary string matches one of these definitions or not.

<?php

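// Direct translations of the character definitions in RFC 2396.
define('LOWALPHA', '[a-z]');
define('UPALPHA', '[A-Z]');

// alpha = lowalpha | upalpha
define('ALPHA', '(?:'.LOWALPHA.'|'.UPALPHA.')');
/// (?:[a-z]|[A-Z])

// Optimized equivalent of ALPHA: a single character class.
define('ALPHA_OPT', '[a-zA-Z]');

define('DIGIT', '[0-9]');

// alphanum = alpha | digit
define('ALPHANUM', '(?:'.ALPHA.'|'.DIGIT.')');
/// (?:(?:[a-z]|[A-Z])|[0-9])

// Optimized equivalent of ALPHANUM: a single character class.
define('ALPHANUM_OPT', '[a-zA-Z0-9]');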

?>
The defined expressions ending in _OPT are optimized versions of the regular expression, i.e. it's more efficient to match against a single character class like [a-zA-Z] than against an alternation of two separate classes such as [a-z]|[A-Z], even though both accept exactly the same characters.
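To check that an _OPT expression really does accept the same strings as its directly translated counterpart, something along these lines can be used. This is only a sketch: the helper matches_whole() and the sample inputs are made up for illustration and are not taken from the validator itself.

```php
<?php
// Repeated here (guarded) so the sketch runs on its own.
if (!defined('ALPHANUM')) {
    define('ALPHANUM', '(?:(?:[a-z]|[A-Z])|[0-9])');
}
if (!defined('ALPHANUM_OPT')) {
    define('ALPHANUM_OPT', '[a-zA-Z0-9]');
}

// Hypothetical helper: true only if $subject, as a whole, matches $expr.
function matches_whole($expr, $subject)
{
    // \A and \z anchor the match to the very start and end of the subject.
    // '/' is safe as the delimiter here only because these particular
    // expressions contain no '/' themselves.
    return preg_match('/\A(?:' . $expr . ')\z/', $subject) === 1;
}

// Both forms should agree on every sample, matching or not.
foreach (array('a', 'Z', '7', '-', 'ab', '') as $sample) {
    printf(
        "%-4s direct=%s optimized=%s\n",
        var_export($sample, true),
        var_export(matches_whole(ALPHANUM, $sample), true),
        var_export(matches_whole(ALPHANUM_OPT, $sample), true)
    );
}
```

Wrapping the expression in a non-capturing group before anchoring matters for the directly translated forms, since otherwise the anchors would bind to only one branch of a top-level alternation.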

Within the final implementation, expressions have been optimized where possible, but for the most part they mirror the ABNF in the document more or less directly. Almost all of the optimization that is present occurs at the lowest level, i.e. in the simplest base expressions from which the more complicated expressions are constructed. This approach seems to work because the base expressions end up embedded many times over in the higher-level ones, so the lower the level at which an optimization is made, the more its benefit is multiplied throughout the final expressions.
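As an illustration of how a higher-level expression can be assembled from the base ones, here is a sketch for the 'scheme' production, which RFC 2396 defines as alpha *( alpha | digit | "+" | "-" | "." ). The constant name SCHEME and the function is_valid_scheme() are chosen just for this example and are not necessarily what the final implementation uses.

```php
<?php
// Base expressions, repeated (guarded) so the sketch runs on its own.
if (!defined('ALPHA_OPT')) {
    define('ALPHA_OPT', '[a-zA-Z]');
}
if (!defined('DIGIT')) {
    define('DIGIT', '[0-9]');
}

// scheme = alpha *( alpha | digit | "+" | "-" | "." )
// Illustrative name; a direct translation of the ABNF using the base parts.
define('SCHEME', ALPHA_OPT . '(?:' . ALPHA_OPT . '|' . DIGIT . '|[+\\-.])*');

// Hypothetical helper: true if $candidate is, in its entirety, a valid scheme.
function is_valid_scheme($candidate)
{
    // SCHEME has no top-level alternation, so it can be anchored directly.
    return preg_match('/\A' . SCHEME . '\z/', $candidate) === 1;
}

var_dump(is_valid_scheme('http'));    // starts with a letter
var_dump(is_valid_scheme('svn+ssh')); // '+' is allowed after the first character
var_dump(is_valid_scheme('1http'));   // starts with a digit
```

Because ALPHA_OPT appears twice in SCHEME, and SCHEME would in turn be embedded in any expression for a complete URI, a cheaper base expression pays off at every one of those points.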

UriValidator

Thursday, March 22, 2007

A New Project

So today I finally decided to start writing a proper blog, after toying with the idea on and off for a little while. Among other things, I want all of the random code I write to be freely available and accessible somewhere online under the GPL, and it doesn't always fit nicely into existing projects/categories/paradigms/etc. I'll still send and/or upload whatever I can where an appropriate project exists somewhere, but this gives me a nice way to keep everything referenced from one place, regardless of what it is. Yes, I know - I could just upload everything into its own project on SourceForge or something, but this also lets me explain everything and talk about stuff in an informal way. It also provides space for me to talk about programming and related topics in general, and gives me somewhere to post random thoughts and opinions on the world of software development. Lastly, I believe the whole idea of open source shouldn't be limited to the actual source itself - the whole process of creating software should be an organic, open, and cooperative activity in most cases, and so I'll be using this to document the life of my various projects, from start to 'release'.