Tuesday, March 31, 2009

365 DoC - W1, D2 - C port of PHP Array functions

So today, I decided to 'port' some of the more useful/relevant PHP builtin array handling functions to what their equivalents in C might look like. Sounds kind of boring, I know, but the end result is both useful and re-usable, and is the kind of thing that people write over and over again... although C doesn't really have true arrays, pointer-pointers do the trick for string arrays (char **'s), etc.

Keep in mind that because of the inherent difference between these two languages, there will be a) some significant differences in the way the resulting function 'ports' operate, and b) there will likely be many more ways to implement a version of the same function in addition to what I've done here, both inside the function itself (different logical implementation) and in the declaration and behaviour of the function as well (ie: destructive or otherwise over-writing functions, vs. versions of the function that return/fill in a new array, etc)...

Problem: create functional counterparts of the following PHP functions in C: array_diff, array_intersect, array_merge, array_reverse, array_search, array_slice, array_splice, array_unique.

Note: there are many more PHP array handling functions than those listed in the problem... some don't apply to C, or aren't suitable for a port for various reasons, some are more complex and will be explored separately on another day (ie: reduce, walk, diff/intersect-type functions with user-specified callbacks, etc), and finally the sorting functions are ignored as we'll ultimately spend whole days dedicated to implementing various sorting algorithms, and so don't need to repeat that stuff here.

Solution: the above functions were implemented in the context of a C function performing the same duties, on string-based "arrays" (ie: char **'s), in a non-destructive manner (ie: array_splice returns a filled-in, separate array, not a modified version of the original, etc.

Note: the pre-processor directive

#include <string.h>
is omitted from the following examples, but they will all need at least that one header to function properly.

array_diff and array_intersect were implemented in the same function, due to the extreme similarities between the logic of each (one returns similarities, one differences), however I'll explain the initial array_diff first, and then how it was modified to do both functions:

int array_diff_str(
char **array1,
size_t array1Size,
char **array2,
size_t array2Size,
char **arrayResult)
{
int count1, count2, found, numFound = 0;

for ( count1 = 0; count1 < (int)array1Size; count1 ++ ) {
found = 0;

for ( count2 = 0; count2 < (int)array2Size; count2 ++ )
if ( 0 == strcmp(*(array1 + count1), *(array2 + count2)) ) {
found = 1;

break;
}

if ( !found )
*(arrayResult + numFound ++) = *(array1 + count1);
}

return numFound;
}

Pretty simple... the arguments in order are the first array and it's size, the second array and it's size, and an empty pointer array to hold the result (initialized outside the function to be big enough to hold the largest possible result, ie: size of the first array). The function returns the number of 'elements' in the result array. We basically just loop through the first array, and for each element we loop through the second until we either find it, at which point we set a flag and break, or we don't find it and so add it to the result array.

Now the cool part - to make this work as an array_intersect as well, we just add one argument to flag whether we're doing a diff or not, and then change the
if ( !found )
to
if ( found ^ diff )
The XOR here will make the expression evaluate to true only if found is false and diff is true (we're calculating a difference), or the other way around (we're calculating an intersection)... if the element was found AND we're doing a diff, it doesn't end up in the result... likewise, if it wasn't found AND we're doing an intersection, it also won't end up in the result. Here's the final function that does both jobs:

int array_diff_intersect_str(
char **array1,
size_t array1Size,
char **array2,
size_t array2Size,
char **arrayResult,
int diff)
{
int count1, count2, found, numFound = 0;

for ( count1 = 0; count1 < (int)array1Size; count1 ++ ) {
found = 0;

for ( count2 = 0; count2 < (int)array2Size; count2 ++ )
if ( 0 == strcmp(*(array1 + count1), *(array2 + count2)) ) {
found = 1;

break;
}

if ( found ^ diff )
*(arrayResult + numFound ++) = *(array1 + count1);
}

return numFound;
}

Next, we have an implementation of array_merge, which we re-name and modify slightly as array_union_str, which instead of simply appending one array onto another, will product an actual union of the values in each array.

int array_union_str(
char **array1,
size_t array1Size,
char **array2,
size_t array2Size,
char **arrayResult)
{
int count, arrayDiffSize;

char *arrayDiff[array1Size];

arrayDiffSize = array_diff_intersect_str(array1, array1Size, array2, array2Size, arrayDiff, 1);

for ( count = 0; count < array2Size; count ++ )
arrayResult[i] = array2[i];

for ( count = 0; count < arrayDiffSize; count ++ )
arrayResult[i + array2Size] = arrayDiff[i];

return array2Size + arrayDiffSize;
}

First, we use our previously created array_diff_intersect_str() function to produce the diff of the two arrays to be married (cue drum roll). Once we have that, we just 'append' the elements in the diff to the second array. The first for loop adds all of the second array to the result array, and the next for loop adds the elements from the diff result array. As with array_diff_intersect_str, we're passing two arrays and their sizes, along with an 'empty' result array to be filled in, and we're returning the size of the result array.

array_reverse is somewhat simpler than the previous two, and unlike them it does modify the original array (seemed too simple to bother with a result array and copying pointers when we can just move them around in place!)...

void array_reverse_str(
char **array,
size_t arraySize)
{
int count;

char *tmp;

for ( count = 0; count < (arraySize - (arraySize % 2)) / 2; count ++ ) {
tmp = array[count];
array[count] = array[arraySize - count - 1];
array[arraySize - count - 1] = tmp;
}
}

The modulus in the for expression basically causes it to ignore the middle element for an array with an odd number of elements... if the array size is odd, %2 will return 1, and we'll subtract one from the size of the array before dividing it in half to get the number of iterations to perform. Other than that, it's pretty self-explanatory...

int array_search_str(
char **arrayHaystack,
size_t haystackSize,
char *needle)
{
int count;

for ( count = 0; count < (int)haystackSize; count ++ )
if ( 0 == strcmp(arrayHaystack[count], needle) )
return count;

return -1;
}

Another pretty simple one: we just enumerate through arrayHaystack and compare each 'element' to needle... if it's a match, we return the index of the element, otherwise we'll end up returning -1 to indicate no match was found.

int array_slice_str(
char **array,
size_t arraySize,
int offset,
int length,
char **resultArray)
{
if ( 0 == length )
length = arraySize - offset;
else if ( 0 > length )
length = arraySize - offset + length;

resultArray = array + length;

return length;
}

Because we're working with C, we can take a shortcut here and not bother removing the 'old' elements that originally followed the slice - we just set resultArray to the position in array where the slice begins, and calculate & return the size of the slice (resultArray)... again, because we're writing C here, the calling code can't assume there's anything beyond the X elements we tell it are in resultArray, and so although there are still extra trailing elements, they can safely be left as-is and we don't have to do nearly as much work as we would (because, you know, it's VERY difficult to null-ify a pointer... what a pain that would be!).

int array_splice_str(
char **array,
size_t arraySize,
int offset,
int length,
char **replacementArray,
size_t replaceSize,
char **resultArray,
size_t resultSize)
{
int count, tailSize;

if ( 0 == length )
tailSize = arraySize - offset;
else if ( 0 > length )
tailSize = -length;
else
tailSize = arraySize - offset - length;

for ( count = 0; count < offset; count ++ )
resultArray[count] = array[count];

for ( count = 0; count < replaceSize; count ++ )
resultArray[offset + count] = replacementArray[count];

for ( count = 0; count < tailSize; count ++ )
resultArray[offset + replaceSize + count] = array[arraySize - tailSize + count];

return offset + tailSize + replaceSize;
}

Since it's the most complicated, array_splice_str() deserves more explanation than the last few... this is roughly a port of array_splice, and lets you insert an array into another (possibly replacing/overwriting a slice in the array we're inserting into). First, we calculate tailSize, which is the length of the of the portion of the original array that falls after the slice we're replacing. If length is zero, then the tailSize is simply the size of the original array minus the value of offset. If it's negative (the slice ends X places from the end of the array), then the tailSize is simply the negation of the length value (ie: a length of -2 means the slice will end 2 places from the end of the array, so the tail size is simply -(-2), or 2). Otherwise, for a normal, positive length value, the tail size is the array size, minus the offset, minus the length of the slice.

Next, we proceed to build the new (spliced) array in three stages: first, we add elements from the original array, up to the offest. Then, we append all the elements from the replacement array. Lastly, we append tailSize elements from the original array, starting with the index equal to the original array size minus the tailSize.

Finally, we return the size of the spliced array: offset plus tailSize plus replaceSize.

int array_unique_str(
char **array,
size_t arraySize,
char **resultArray)
{
int count, resultSize = 0;

for ( count = 0; count < arraySize; count ++ )
if ( 0 > array_search_str(resultArray, resultSize, array[count]) )
resultArray[resultSize ++] = array[count];

return resultSize;
}

Our last array-handling PHP function port is array_unique, which returns an array comprised of all unique values in the original. For this implementation, like others, we build on previous ported functions, and make use of array_search_str to do the heavy lifting. That way, all we really need to do now is loop through the elements of array, and check if we have each value in resultArray yet... if not, we can 'append' it and continue.

That's it for today! Stay tuned for another better-late-than never post (day 3 - some JavaScript date/time handling with a few other goodies mixed in!).

13 comments:

  1. Egh, been ages since I've used a compiled language. Interesting stuff though. Wonder how those benchmark.

    ReplyDelete
  2. This comment has been removed by a blog administrator.

    ReplyDelete
  3. This comment has been removed by a blog administrator.

    ReplyDelete
  4. This comment has been removed by a blog administrator.

    ReplyDelete
  5. This comment has been removed by a blog administrator.

    ReplyDelete
  6. This comment has been removed by a blog administrator.

    ReplyDelete
  7. This comment has been removed by a blog administrator.

    ReplyDelete
  8. This comment has been removed by a blog administrator.

    ReplyDelete
  9. This comment has been removed by a blog administrator.

    ReplyDelete
  10. This comment has been removed by a blog administrator.

    ReplyDelete
  11. This comment has been removed by a blog administrator.

    ReplyDelete
  12. This comment has been removed by a blog administrator.

    ReplyDelete
  13. This comment has been removed by a blog administrator.

    ReplyDelete