Dangerous Programmer: March 2009

Tuesday, March 31, 2009

365 DoC - W1, D2 - C port of PHP Array functions

So today, I decided to 'port' some of the more useful/relevant PHP builtin array handling functions to what their equivalents in C might look like. Sounds kind of boring, I know, but the end result is both useful and re-usable, and is the kind of thing that people write over and over again... although C doesn't really have true arrays, pointer-pointers do the trick for string arrays (char **'s), etc.

Keep in mind that because of the inherent difference between these two languages, there will be a) some significant differences in the way the resulting function 'ports' operate, and b) there will likely be many more ways to implement a version of the same function in addition to what I've done here, both inside the function itself (different logical implementation) and in the declaration and behaviour of the function as well (ie: destructive or otherwise over-writing functions, vs. versions of the function that return/fill in a new array, etc)...

Problem: create functional counterparts of the following PHP functions in C: array_diff, array_intersect, array_merge, array_reverse, array_search, array_slice, array_splice, array_unique.

Note: there are many more PHP array handling functions than those listed in the problem... some don't apply to C, or aren't suitable for a port for various reasons, some are more complex and will be explored separately on another day (ie: reduce, walk, diff/intersect-type functions with user-specified callbacks, etc), and finally the sorting functions are ignored as we'll ultimately spend whole days dedicated to implementing various sorting algorithms, and so don't need to repeat that stuff here.

Solution: the above functions were implemented in the context of a C function performing the same duties, on string-based "arrays" (ie: char **'s), in a non-destructive manner (ie: array_splice returns a filled-in, separate array, not a modified version of the original, etc.

Note: the pre-processor directive

#include <string.h>

is omitted from the following examples, but they will all need at least that one header to function properly.

array_diff and array_intersect were implemented in the same function, due to the extreme similarities between the logic of each (one returns similarities, one differences), however I'll explain the initial array_diff first, and then how it was modified to do both functions:


int array_diff_str(
  char **array1,
  size_t array1Size,
  char **array2,
  size_t array2Size,
  char **arrayResult)
{
  int count1, count2, found, numFound = 0;

  for ( count1 = 0; count1 < (int)array1Size; count1 ++ ) {
    found = 0;

    for ( count2 = 0; count2 < (int)array2Size; count2 ++ )
      if ( 0 == strcmp(*(array1 + count1), *(array2 + count2)) ) {
        found = 1;

        break;
      }

    if ( !found )
      *(arrayResult + numFound ++) = *(array1 + count1);
  }

  return numFound;
}

Pretty simple... the arguments in order are the first array and it's size, the second array and it's size, and an empty pointer array to hold the result (initialized outside the function to be big enough to hold the largest possible result, ie: size of the first array). The function returns the number of 'elements' in the result array. We basically just loop through the first array, and for each element we loop through the second until we either find it, at which point we set a flag and break, or we don't find it and so add it to the result array.

Now the cool part - to make this work as an array_intersect as well, we just add one argument to flag whether we're doing a diff or not, and then change the

if ( !found )

if ( found ^ diff )

The XOR here will make the expression evaluate to true only if found is false and diff is true (we're calculating a difference), or the other way around (we're calculating an intersection)... if the element was found AND we're doing a diff, it doesn't end up in the result... likewise, if it wasn't found AND we're doing an intersection, it also won't end up in the result. Here's the final function that does both jobs:


int array_diff_intersect_str(
  char **array1,
  size_t array1Size,
  char **array2,
  size_t array2Size,
  char **arrayResult,
  int diff)
{
  int count1, count2, found, numFound = 0;

  for ( count1 = 0; count1 < (int)array1Size; count1 ++ ) {
    found = 0;

    for ( count2 = 0; count2 < (int)array2Size; count2 ++ )
      if ( 0 == strcmp(*(array1 + count1), *(array2 + count2)) ) {
        found = 1;

        break;
      }

    if ( found ^ diff )
      *(arrayResult + numFound ++) = *(array1 + count1);
  }

  return numFound;
}

Next, we have an implementation of array_merge, which we re-name and modify slightly as array_union_str, which instead of simply appending one array onto another, will product an actual union of the values in each array.


int array_union_str(
  char **array1,
  size_t array1Size,
  char **array2,
  size_t array2Size,
  char **arrayResult)
{
  int count, arrayDiffSize;

  char *arrayDiff[array1Size];

  arrayDiffSize = array_diff_intersect_str(array1, array1Size, array2, array2Size, arrayDiff, 1);

  for ( count = 0; count < array2Size; count ++ )
    arrayResult[i] = array2[i];

  for ( count = 0; count < arrayDiffSize; count ++ )
    arrayResult[i + array2Size] = arrayDiff[i];

  return array2Size + arrayDiffSize;
}

First, we use our previously created array_diff_intersect_str() function to produce the diff of the two arrays to be married (cue drum roll). Once we have that, we just 'append' the elements in the diff to the second array. The first for loop adds all of the second array to the result array, and the next for loop adds the elements from the diff result array. As with array_diff_intersect_str, we're passing two arrays and their sizes, along with an 'empty' result array to be filled in, and we're returning the size of the result array.

array_reverse is somewhat simpler than the previous two, and unlike them it does modify the original array (seemed too simple to bother with a result array and copying pointers when we can just move them around in place!)...


void array_reverse_str(
  char **array,
  size_t arraySize)
{
  int count;

  char *tmp;

  for ( count = 0; count < (arraySize - (arraySize % 2)) / 2; count ++ ) {
    tmp = array[count];
    array[count] = array[arraySize - count - 1];
    array[arraySize - count - 1] = tmp;
  }
}

The modulus in the for expression basically causes it to ignore the middle element for an array with an odd number of elements... if the array size is odd, %2 will return 1, and we'll subtract one from the size of the array before dividing it in half to get the number of iterations to perform. Other than that, it's pretty self-explanatory...


int array_search_str(
  char **arrayHaystack,
  size_t haystackSize,
  char *needle)
{
  int count;

  for ( count = 0; count < (int)haystackSize; count ++ )
    if ( 0 == strcmp(arrayHaystack[count], needle) )
      return count;

  return -1;
}

Another pretty simple one: we just enumerate through arrayHaystack and compare each 'element' to needle... if it's a match, we return the index of the element, otherwise we'll end up returning -1 to indicate no match was found.


int array_slice_str(
  char **array,
  size_t arraySize,
  int offset,
  int length,
  char **resultArray)
{
  if ( 0 == length )
    length = arraySize - offset;
  else if ( 0 > length )
    length = arraySize - offset + length;

  resultArray = array + length;

  return length;
}

Because we're working with C, we can take a shortcut here and not bother removing the 'old' elements that originally followed the slice - we just set resultArray to the position in array where the slice begins, and calculate & return the size of the slice (resultArray)... again, because we're writing C here, the calling code can't assume there's anything beyond the X elements we tell it are in resultArray, and so although there are still extra trailing elements, they can safely be left as-is and we don't have to do nearly as much work as we would (because, you know, it's VERY difficult to null-ify a pointer... what a pain that would be!).


int array_splice_str(
  char **array,
  size_t arraySize,
  int offset,
  int length,
  char **replacementArray,
  size_t replaceSize,
  char **resultArray,
  size_t resultSize)
{
  int count, tailSize;

  if ( 0 == length )
    tailSize = arraySize - offset;
  else if ( 0 > length )
    tailSize = -length;
  else
    tailSize = arraySize - offset - length;

  for ( count = 0; count < offset; count ++ )
    resultArray[count] = array[count];

  for ( count = 0; count < replaceSize; count ++ )
    resultArray[offset + count] = replacementArray[count];

  for ( count = 0; count < tailSize; count ++ )
    resultArray[offset + replaceSize + count] = array[arraySize - tailSize + count];

  return offset + tailSize + replaceSize;
}

Since it's the most complicated, array_splice_str() deserves more explanation than the last few... this is roughly a port of array_splice, and lets you insert an array into another (possibly replacing/overwriting a slice in the array we're inserting into). First, we calculate tailSize, which is the length of the of the portion of the original array that falls after the slice we're replacing. If length is zero, then the tailSize is simply the size of the original array minus the value of offset. If it's negative (the slice ends X places from the end of the array), then the tailSize is simply the negation of the length value (ie: a length of -2 means the slice will end 2 places from the end of the array, so the tail size is simply -(-2), or 2). Otherwise, for a normal, positive length value, the tail size is the array size, minus the offset, minus the length of the slice.

Next, we proceed to build the new (spliced) array in three stages: first, we add elements from the original array, up to the offest. Then, we append all the elements from the replacement array. Lastly, we append tailSize elements from the original array, starting with the index equal to the original array size minus the tailSize.

Finally, we return the size of the spliced array: offset plus tailSize plus replaceSize.


int array_unique_str(
  char **array,
  size_t arraySize,
  char **resultArray)
{
  int count, resultSize = 0;

  for ( count = 0; count < arraySize; count ++ )
    if ( 0 > array_search_str(resultArray, resultSize, array[count]) )
      resultArray[resultSize ++] = array[count];

  return resultSize;
}

Our last array-handling PHP function port is array_unique, which returns an array comprised of all unique values in the original. For this implementation, like others, we build on previous ported functions, and make use of array_search_str to do the heavy lifting. That way, all we really need to do now is loop through the elements of array, and check if we have each value in resultArray yet... if not, we can 'append' it and continue.

That's it for today! Stay tuned for another better-late-than never post (day 3 - some JavaScript date/time handling with a few other goodies mixed in!).

Monday, March 30, 2009

365 DoC - Call for Topics/Disclaimer

First off, just wanted to ask any one who's reading this to please send any and all suggestions you have re. topics/excercises/etc to use as one of the 365 days... I have a pretty decent list of stuff to do, things to work on, etc., but it will be good to have as many options as possible and as much material as I can to choose from when deciding on the mission for a given day.

Also, as far as the actual resulting code goes, I in no way guarantee that anything I produce and subsequently post about here will be a) optimized in any way, b) the most appropriate/elegant/efficient/etc solution to a given problem, c) suitable for usage in any real applications, etc. Having said that, I'll obviously do my best to post working, usable code, but at the heart of it the main purpose here is to code every day and hone my skills, etc, and so the blog posts and accompanying code are just a side effect.

Lastly, just because I'm writing code every day, doesn't mean I'll have time to post about it as well (in fact, writing code every day makes it that much more unlikely that I'll have a new post daily)... I will however do my best to try and post, at least briefly, eventually, about every day's code. It may be a few days late, and you'll probably see chunks of 3 or 4 day's worth of posts appear all at once, but I'll try to get them all in here sooner or later.

So, without further ado... on to the next day's code!

Saturday, March 28, 2009

365 DoC - W1, D1 - Subset Permutations

Problem: based on a given finite set of elements, find every possible combination of X elements, where X indicates 1 to X.

Example: in the finite set of elements (A, B, C), all possible combinations of up to three elements would be ((A), (B), (C), (A, B), (A, C), (B, C), (A, B, C)) - this assumes that we would treat (A, B) and (B, A) as the same permutation of a 2-element subset, etc.

Solution: if you think of the set of elements as digits in a number system, then you simply need to enumerate between 'zero' and the max for the number of elements in each permutation, while eliminating duplicates. For example, decimal (base-10) is a number system comprised of a set of 10 possible elements: the digits 0 through 9. So, given this set, if we wanted to find all the permutations of up to 3 elements, we simply count from 0 to 999, and eliminate duplicates/repeats, ie: '999' would be eliminated because we would already have '9' in the list of possible permutations... likewise, '21' would be omitted, since at that point we'd already have '12' in the set of possible permutations of digit combinations ('21' is just a re-arrangement of the digits in '12').

Sounds complicated, but we really just need two things: a function to 'increment' a set of elements (the permutation), and an easy/efficient way to check for duplicates/repetitions.

The first item will end up being a simple implementation of an elementary school place-value lesson... the pseudo-code will look something like this:

possible elements: A, B, C
permutation to 'increment': (A, C)

- set the current 'place' we're working on to zero (the first 'digit', 'C' in this case if we're working right-to-left)
- is the 'value' of the element at the current place the last possible value in the set of possible elements?
yes: set the value of the current place to the first element in the set, move the place up one, and repeat
no: set the value of the current place to the next element in the set

'A, C' would be "incremented" to 'B, A'

This is a actually a good candidate for recursion, although I didn't really plan on talking about recursion in today's excercise... here's a simple implementation of the above 'algorithm' in PHP...


$possibleElements = array('A', 'B', 'C');

function incrementPermutation(&$perm, $place = 0)
{
  global $possibleElements;

  if ( !isset($perm[$place]) )
    $perm[$place] = $possibleElements[0];
  else {
    $valueIndex = array_search($perm[$place], $possibleElements) + 1;

    if ( $valueIndex < count($possibleElements) )
      $perm[$place] = $possibleElements[$valueIndex];
    else {
      $perm[$place] = $possibleElements[0];

      $place ++;

      incrementPermutation($perm, $place);
    }
  }
}

The implementation is slightly incomplete - there's not argument checking (the $perms parameter needs to be an array), no checking of the $possibleElements array, and there's some other updates we could make so it's more general-purpose, but it will work as-is. PHP might optimize it out anyways, but for the sake of writing good code we've kept the $possibleElements declaration/initialization outside of the function body so that it's not re-declared every time we (potentially) recurse.

Instead of commenting the function to explain it's operation, let's dissect it piece by piece:


$possibleElements = array('A', 'B', 'C');

This is just setting up an array to hold the full set of elements from which we'll be generating permutations - if we were working with a real numbering system, this would hold our digits, in order... ie: array('0', '1', ..., '9');


function incrementPermutation(&$perm, $place = 0)
{
  global $possibleElements;

Our function declaration... we're passing $perm, which is an array representing the existing permutation to increment, by reference so we don't need to write clumsy $var = func($var) type statements all over. $place defaults to zero so that outside of the recursive calls, ie: elsewhere in our code, we just call incrementPermutation($perm) without worrying about explicitly telling the function to start at the "least-significant" spot in the permutation.

Again using the example of a numbering system, $place indicates the place-position in the number that we're working with. This corresponds to the 'ones' or 'tens' or 'hundreds' position in a decimal numbering system, if you remember back to the explanation you got in grade school.


if ( !isset($perm[$place]) )
    $perm[$place] = $possibleElements[0];

First we check if the $place we're working with has already been set/defined... if not, incrementing is easy: we just set it to the first element in $possibleElements and we're done.


else {
    $valueIndex = array_search($perm[$place], $possibleElements) + 1;

Otherwise, we'll need to do some real work - first we need to figure out where in the set of possible elements the current value of the element at $place in our permutation resides. We increment that because in both places where we're about to use it we need to increment it anyways, so we might as well do that right off the bat...


if ( $valueIndex < count($possibleElements) )
      $perm[$place] = $possibleElements[$valueIndex];

This expression, if True, indicates that we don't need to "carry-over" anything - we can increment the value at the current place without running out of elements in the original set to use. $valueIndex represents the next possible index/key in the array of possible elements. If it's not less than the size of $possibleElements, that means we're already at the last possible element and need to 'carry-over'. Otherwise, it will 'point' to the next possible element, which is used as the incremented value for this $place in our permutation.


else {
      $perm[$place] = $possibleElements[0];

      $place ++;

      incrementPermutation($perm, $place);
    }

If we do have to carry-over a value, we set the current place's value to the first in the set of possible elements, increment the $place variable, and recurse, passing in the new value of $place, in order to perform the whole increment operation on the next place over.

Next, we need to deal with the task of identifying duplicates/repetitions in our permutations. The solution is relatively simple... we're going to write a function that generates a numerical value for our resulting permutation, based on a bitmask that assigns a different (bit-)value to each possible element in our original set. To generate the value of our permutation, we perform a bitwise OR on the bits corresponding to each 'digit' in the permutation, and then obtain the resulting decimal representation of the number. If this matches the value of a permutation already in our 'found permutations' array, which we're tracking, then we can safely ignore it.

Here's the function:


function permutationValue($perm)
{
    global $possibleElements;

    $value = 0;

    foreach ( $perm as $element )
        $value |= 1 << array_search($element, $possibleElements);

    return $value;
}

It's pretty simple - we're just finding the position (index) of each element in our permutation, and using that as a bitshift operand in our |= operation, which 'adds' bits to $value.

And that's pretty much it - using these two functions, we can enumerate all possible combinations of the elements in a given, arbitrary set, and by keeping track of our permutation 'values', we can check for duplicates/repetition before adding each found permutation to the final list (array) of permutations discovered.

Here's the complete version of the code, with a built-in example... you can see the output here.


// define the members of the entire set
$possibleElements = array('A', 'B', 'C');

// the max size of a permutation is the number of elements in the original set
$maxSetSize = count($possibleElements);

// initialize an array to store the found permutations
$permutations = array();

// initialize an array to store the "values" of each permutation found
$permutationValues = array();

// determine the "value" of a permutation
function permutationValue($perm)
{
    global $possibleElements;

    $value = 0;

    foreach ( $perm as $element )
                    $value |= (1 << array_search($element, $possibleElements));

    return $value;
}

// "increment" a permutation, using each element of the original/whole set as a possible "digit"
// in our arbitrary number system
function incrementPermutation(&$perm, $place = 0)
{
  global $possibleElements;

  if ( !isset($perm[$place]) )
    $perm[$place] = $possibleElements[0];
  else {
    $valueIndex = array_search($perm[$place], $possibleElements) + 1;

    if ( $valueIndex < count($possibleElements) )
      $perm[$place] = $possibleElements[$valueIndex];
    else {
      $perm[$place] = $possibleElements[0];

      $place ++;

      incrementPermutation($perm, $place);
    }
  }
}

$currentPerm = array();

while ( count($currentPerm) <= $maxSetSize ) {
  // "increment" the current permutation
  incrementPermutation($currentPerm);

  // calculate the value of the new permutation
  $currentPermValue = permutationValue($currentPerm);

  // if it's not already in our array of values, add the new
  // permutation to the list of ones we've found
  if ( FALSE === array_search($currentPermValue, $permutationValues)) {
    array_push($permutations, $currentPerm);
    array_push($permutationValues, $currentPermValue);
  }
}

// Dump out the final list of permutations in semi-pretty/human-readable format
print_r($permutations);

365 Days of Code

That's right... one piece of working, usable, non-trivial code (ie: it actually does something useful), every single day, for an entire year.

Ambitious? Maybe. Necessary? I think so... you see, at some unknown point in recent history, I made a conscious decision to strive to be The Best at what I do. And what I do, is Code. It's my Art. I don't do this for a living because it pays big bucks or I'm trying to become a millionaire, I don't do it because I figured computers would be a safe bet for finding employment, or anything silly like that - I have another ridiculous explanation: for some reason, I love programming. Yes, I'm a big geek. And because it's something I love, my goal is simpler, although much harder to achieve: perfection, or as close as I can get, in the Art of Code.

Impossible? Definitely... but that's besides the point, which is that no one ever became a rockstar by sitting on their ass and dreaming about how awesome it would be - they did it by playing their guitar for hours on end every single bloody day until their fingers bled and they were really, really, ridiculously good at it. And then (hopefully), they practice some more, because even though everyone else already knows they're awesome, they still don't think it's good enough.

And so, we come to the real point: I'm not going to become the next Torvalds or Kernighan or Knuth (yes, these are my heroes, that's how much of a nerd I am - there, I said it) by working a lot of overtime, reading, or cranking out a lot of code, but only by making a conscious effort to hone my skills, any and every relevant way possible, every single day. No samurai was ever born on a couch.

I need to turn my Art (which, althogh already pretty awesome, is still just stick people playing kick-ball on a piece of loose-leaf lined paper), into something closer to Code Budo. The Way of the Code Warrior. The Art of the Software Sandbenders.

So. One (at least) piece of working code, every single day, for an entire year.

And here... we... go :)

Dangerous Programmer