[Status] Upgrade Delays

I just found out that the Atom CPU and mainboard that I ordered are on backorder and won’t be getting here until the end of the month (why did the site I purchased them from list them as “In Stock” when they weren’t really in stock?). So now we play the waiting game, and hope that this old server is able to keep holding up.

Posted in status | Leave a comment

[Objective-C + Cocoa] UIScrollView and contentSize

So here’s a simple one. For whatever reason, a UIScrollView instance only scrolls correctly if you programmatically set its contentSize. This is fairly silly, because in most cases the contentSize is simply the total size of the UIScrollView’s subview(s). Why the UIScrollView class doesn’t provide at least the option of automatically determining its own contentSize based upon its current subviews is beyond me, but here is some simple code to approximate this behavior:

@interface UIScrollView(auto_size)
- (void) adjustHeightForCurrentSubviews: (int) verticalPadding;
- (void) adjustWidthForCurrentSubviews: (int) horizontalPadding;
- (void) adjustWidth: (bool) changeWidth andHeight: (bool) changeHeight withHorizontalPadding: (int) horizontalPadding andVerticalPadding: (int) verticalPadding;
@end

@implementation UIScrollView(auto_size) 
- (void) adjustWidth: (bool) changeWidth andHeight: (bool) changeHeight withHorizontalPadding: (int) horizontalPadding andVerticalPadding: (int) verticalPadding {
    //start with the requested padding and accumulate the size of every subview
    float contentWidth = horizontalPadding;
    float contentHeight = verticalPadding;
    for (UIView* subview in self.subviews) {
        [subview sizeToFit];
        contentWidth += subview.frame.size.width;
        contentHeight += subview.frame.size.height;
    }
    
    //for any dimension we were not asked to change, fall back to the parent view's size
    contentWidth = changeWidth ? contentWidth : self.superview.frame.size.width;
    contentHeight = changeHeight ? contentHeight : self.superview.frame.size.height;
    
    NSLog(@"Adjusting ScrollView size to %fx%f, verticalPadding=%d, horizontalPadding=%d", contentWidth, contentHeight, verticalPadding, horizontalPadding);
    self.contentSize = CGSizeMake(contentWidth, contentHeight);
}

- (void) adjustHeightForCurrentSubviews: (int) verticalPadding {
    [self adjustWidth:NO andHeight:YES withHorizontalPadding:0 andVerticalPadding:verticalPadding];
}

- (void) adjustWidthForCurrentSubviews: (int) horizontalPadding {
    [self adjustWidth:YES andHeight:NO withHorizontalPadding:horizontalPadding andVerticalPadding:0];
}
@end

This code allows a UIScrollView to internally determine its contentSize based upon its current subviews; all you have to do is call one of the three interface methods at an appropriate time (like from within your parent view-controller’s ‘viewDidLoad‘ implementation). Note that while auto-sizing based upon both width and height is supported, you will only get a correct result for width if all of the UIScrollView’s subviews span the entire height of the view, and you will only get a correct result for height if all of your subviews span the entire width of the view. For instance, if you add a thumbnail image to the UIScrollView and then drag a UILabel next to it, both of them will count towards the computed height even though they are logically on the same row.
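
For reference, the call site is a one-liner. Here is a minimal sketch, assuming the parent view-controller has a ‘scrollView‘ outlet pointing at the UIScrollView in question:

- (void) viewDidLoad {
    [super viewDidLoad];
    //let the scroll view work out its own contentSize, with 10 points of extra vertical padding
    [self.scrollView adjustHeightForCurrentSubviews:10];
}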

You can work around this limitation either by using the ‘…padding’ parameters to adjust the final contentSize, or by adding a UIView that spans the width of the UIScrollView and placing both your thumbnail image and UILabel as subviews of that UIView instead of the UIScrollView. The latter option of using a nested UIView to contain the content of the row is a better/more maintainable way to build an interface anyways (and building a UI in Android basically requires you to follow this pattern, so best to get used to it). I did try various approaches to solve this problem automatically in the code, such as keeping track of the min and max x/y coordinates of every subview in the UIScrollView, but they gave inconsistent results between the initial time the view was displayed and subsequent times.
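
To make the nested-UIView suggestion concrete, here is a rough sketch; the frame values and the ‘thumbnailImageView‘/‘captionLabel‘ variables are placeholders for whatever your row actually contains:

//build one full-width row container and add the row's content to it instead of to the scroll view
UIView* row = [[[UIView alloc] initWithFrame:CGRectMake(0, 0, scrollView.frame.size.width, 60)] autorelease];
[row addSubview:thumbnailImageView];
[row addSubview:captionLabel];
[scrollView addSubview:row];

//each row now spans the full width, so the height computation comes out correct
[scrollView adjustHeightForCurrentSubviews:0];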

Posted in coding, objective-c | Tagged , , | 8 Comments

[Cocoa + iPhone] UITableViewCell: It’s Broken!

I present for your consideration the following screenshot:

UITableViewCell is broken!

It shows a basic table-view, in which each cell has been assigned the same image (using its built-in ‘imageView‘ property). The source image is 20 pixels square, and the imageView’s ‘contentMode‘ property has not been changed (not that changing it makes any difference). The image for each row is also being scaled to 50% and rendered at the orientation stated in the cell text. The code for the table controller is as follows:

#import "UITableViewTestViewController.h"

static NSString* rowNames[8] = {@"UIImageOrientationUp", @"UIImageOrientationDown", @"UIImageOrientationLeft", @"UIImageOrientationRight", 
                                @"UIImageOrientationUpMirrored", @"UIImageOrientationDownMirrored", @"UIImageOrientationLeftMirrored", 
                                @"UIImageOrientationRightMirrored"};

#define IMAGE_NAME @"testImage.png"

@implementation UITableViewTestViewController

- (void)dealloc {
    [super dealloc];
}

- (void)didReceiveMemoryWarning {
    // Releases the view if it doesn't have a superview.
    [super didReceiveMemoryWarning];
}

#pragma mark - Table view data source
- (NSInteger) tableView:(UITableView *)tableView numberOfRowsInSection:(NSInteger)section {
    return 8;  //number of elements in the UIImageOrientation enumeration
}

- (NSInteger) numberOfSectionsInTableView:(UITableView *)tableView {
    return 1;
}

- (UITableViewCell*) tableView:(UITableView *)tableView cellForRowAtIndexPath:(NSIndexPath *)indexPath {
    static NSString* cellIdentifier = @"TestCell";
    
    //return a basic cell with the icon in it and some text
    UITableViewCell* cell = [tableView dequeueReusableCellWithIdentifier:cellIdentifier];
    if (cell == nil) {
        //init cell
        cell = [[[UITableViewCell alloc] initWithStyle:UITableViewCellStyleDefault reuseIdentifier:cellIdentifier] autorelease];
    }
    
    cell.accessoryType = UITableViewCellAccessoryNone;
    cell.textLabel.text = rowNames[indexPath.row];          //enum starts from 0, so indexPath.row matches the orientation that we are going to apply
    cell.textLabel.font = [cell.textLabel.font fontWithSize:12.0];
    cell.textLabel.textColor = [UIColor darkGrayColor];
    cell.imageView.image = [UIImage imageWithCGImage:[UIImage imageNamed:IMAGE_NAME].CGImage scale:0.5 orientation:(UIImageOrientation)indexPath.row];  //the scale operation will be ignored for UIImageOrientationUp, because something is broken
    
    return cell;
}

- (void) tableView:(UITableView *)tableView willDisplayCell:(UITableViewCell *)cell forRowAtIndexPath:(NSIndexPath *)indexPath {
    //it makes no difference if we set the image here
    //cell.imageView.image =  [UIImage imageWithCGImage:[UIImage imageNamed:IMAGE_NAME].CGImage scale:0.5 orientation:indexPath.row];
}
@end

It’s not doing anything all that special, but as you can see in the screenshot the image in the first cell is rendered differently than all the others. More specifically, it is being stretched to the full size of its container so that it just looks kind of sad, and no amount of programmatic scale operations will fix it.

This can be one of the most maddening aspects about working with table-cells and images. If you want an image that is slightly smaller than its container in the table-cell, or that is centered away from the top/side, then the only consistent way to do so is to create a custom table-cell. And while it is not difficult to create a custom table-cell that implements the desired behavior, it needlessly clutters the source-tree with code that replicates functionality that Apple is supposed to be providing out of the box.
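
For completeness, here is a bare-bones sketch of that custom-cell approach; the class name, frame values, and ‘thumbnailView‘ property are all hypothetical:

@interface FixedImageCell : UITableViewCell {
    UIImageView* thumbnailView;
}
@property (nonatomic, readonly) UIImageView* thumbnailView;
@end

@implementation FixedImageCell
@synthesize thumbnailView;

- (id) initWithStyle: (UITableViewCellStyle) style reuseIdentifier: (NSString*) reuseIdentifier {
    if ((self = [super initWithStyle:style reuseIdentifier:reuseIdentifier])) {
        //a fixed 20x20 frame, inset from the top and left, which sidesteps the stretching problem entirely
        thumbnailView = [[UIImageView alloc] initWithFrame:CGRectMake(10, 12, 20, 20)];
        [self.contentView addSubview:thumbnailView];
    }
    return self;
}

- (void) dealloc {
    [thumbnailView release];
    [super dealloc];
}
@end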

The problem, as exposed by this example code, is that when an image is scaled using UIImageOrientationUp (which is what most developers would use, given that they generally store their images in the orientation they want them displayed at) the UITableViewCell completely ignores the scaling operation. I can only speculate as to the reason for this odd behavior; at the very least I would expect the output to be the same no matter which UIImageOrientation is used (i.e. I would think that scaling would either consistently work or consistently not work, but this is manifestly not the case).

In any case, this behavior is very clearly a bug, and a particularly inconvenient one at that. But it does expose a potential workaround that generates less source-clutter than creating a custom table-cell implementation every time you want to have cell images that actually work. Just store your images upside-down (or preprocess them so that they are upside-down prior to adding to the table) and then invert them back to the proper orientation when you scale them to the size you want for your table.
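
In code, the workaround looks something like the following sketch; it assumes the bundled asset has been saved upside-down (the ‘testImage_flipped.png‘ name is a placeholder), and it keeps the same scale value used in the example above:

//the source asset is stored upside-down, so UIImageOrientationDown flips it back while the scale is applied
UIImage* stored = [UIImage imageNamed:@"testImage_flipped.png"];
cell.imageView.image = [UIImage imageWithCGImage:stored.CGImage scale:0.5 orientation:UIImageOrientationDown];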

It’s dodgy as all hell to do it that way, but still arguably better than reimplementing functionality that Apple is supposed to be providing out of the box.

Project source code is available here: http://codethink.no-ip.org/UITableViewTest.zip

Posted in coding, objective-c | Tagged , , , | Leave a comment

[Cocoa + iPhone] Unraveling Apple’s Pagecurl

First off, I encourage anyone who’s unfamiliar with this topic to read through this short but very sweet blog post on the subject (and to take a quick look at his sample code). We’ll be picking up where Steven left off.

In any case, to summarize the current situation: there exists a private and undocumented API in the iPhone SDK which Apple uses to great effect in their iBooks application. The way to interface with this private API has been discovered and even fairly well documented. Using the private API is pretty straightforward but for one small problem: if you use it in your application then Apple will reject your app. For whatever unspecified reason (probably to keep potential iBooks competitors in check), Apple does not want to open up their private API to developers, or to play nice with developers who bend the rules and use it anyway.

So our goal is clear. If Apple isn’t going to play nice and open the API up to developers, then perhaps we can do some digging to figure out how Apple’s implementation actually works and create our own implementation that does the same thing. It’s a pretty standard exercise in reverse-engineering, really. The core of the private page-curl API is used like so:

		filter = [[CAFilter filterWithType:kCAFilterPageCurl] retain];
		[filter setDefaults];
		[filter setValue:[NSNumber numberWithFloat:((NSUInteger)fingerDelta)/100.0] forKey:@"inputTime"];
		
		CGFloat _angleRad = angleBetweenCGPoints(currentPos, lastPos);
		CGFloat _angle = _angleRad*180/M_PI ; // I'm far more comfortable with using degrees ;-)
					
		if (_angle < 180 && _angle > 120) {// here I've limited the results to the right-hand side of the paper. I'm sure there's a better way to do this
			if (fingerVector.y > 0)
				[filter setValue:[NSNumber numberWithFloat:_angleRad] forKey:@"inputAngle"];
			else
				[filter setValue:[NSNumber numberWithFloat:-_angleRad] forKey:@"inputAngle"];

			_internalView.layer.filters = [NSArray arrayWithObject:filter];
		}

This is an excerpt straight out of Steven Troughton-Smith’s example. The example includes additional code related to tracking touch positions and interpolating the angle and distance between them, but this is really the core of the private API right here. All of the heavy-lifting is handled by the CAFilter class (private), which has a type of ‘kCAFilterPageCurl‘ (private constant, just the string @”pageCurl”, other filter types also exist), and which takes just a small number of input parameters (‘inputTime‘ and ‘inputAngle‘) and then works its magic behind the scenes.

So given that CAFilter seems to be doing pretty much all the work, it would follow that by constructing our own class that exposes the same interface as CAFilter we can supplant the private-API class with one of our own making (ah, the joys of reflection and weak-typing), thus interfacing with the underlying platform without breaking any of the rules. But what exactly is a CAFilter? Is it as onerous as a UIView with its hundreds of methods and properties? Does it extend another obscure private-API class that will also need to be reverse-engineered? Well thanks to the ‘printObject:toDepth:‘ routine discussed in a previous post we can see that a CAFilter is exactly:

@interface CAFilter : NSObject {
	unsigned int _type;
	NSString* _name;
	unsigned int _flags;
	void* _attr;
	void* _cache;
}

//Constructors
- (id) initWithType:  (NSString*) arg0;
- (id) initWithName:  (NSString*) arg0;

//NSCoding
- (NSObject*) initWithCoder:  (NSCoder*) arg0;
- (void) encodeWithCoder:  (NSCoder*) arg0;

//NSKeyValueCoding
- (void) setValue: (id) arg0 forKey: (NSString*) arg1;
- (id) valueForKey:  (NSString*) arg0;

//NSCopying and NSMutableCopying
- (NSObject*) mutableCopyWithZone:  (NSZone*) arg0;
- (NSObject*) copyWithZone:  (NSZone*) arg0;

//interface methods
- (void) setDefaults;
- (bool) isEnabled;
- (struct UnknownAtomic*) CA_copyRenderValue;

//memory management (doesn't need to be declared here)
- (void) dealloc;

//property accessors (don't need to be declared here)
- (bool) enabled;
- (void) setEnabled:  (bool) arg0;
- (bool) cachesInputImage;
- (void) setCachesInputImage:  (bool) arg0;
- (NSString*) name;
- (void) setName:  (NSString*) arg0;
- (NSObject*) type;

//properties
@property(nonatomic, readonly) NSString* type;
@property(nonatomic, retain) NSString* name;
@property(nonatomic) bool enabled;
@property(nonatomic) bool cachesInputImage;

@end

Nineteen methods and a handful of fields. Not bad, not bad at all, particularly when many of the methods are simply implementing various publicly-documented protocols such as NSCoding, NSCopying, and NSKeyValueCoding. As an added bonus, the superclass of CAFilter is NSObject, so the problem has now been reduced to the implementation of a single unknown class (which may still be a Herculean task, but at least now there are clearly-defined boundaries).

But the above code includes some methods that do not need to be part of the publicly declared interface. Let’s clean it up, rename it so that it doesn’t conflict with the existing private-API class, and add the proper definition of the ‘…Atomic‘ struct:

#import <Foundation/Foundation.h>

struct RenderValueResult { 
	int (**x1)(); 
	struct MyAtomic { 
		struct { 
			NSInteger x; 
		} _v; 
	} x2; 
} *_filterResult;

@interface MyCAFilter : NSObject<NSCoding, NSCopying, NSMutableCopying> {
	unsigned int _type;
	NSString* _name;
	unsigned int _flags;
	void* _attr;
	void* _cache;
}

//Constructors
- (id) initWithType:  (NSString*) arg0;
- (id) initWithName:  (NSString*) arg0;

//NSKeyValueCoding
- (void) setValue: (id) arg0 forKey: (NSString*) arg1;
- (id) valueForKey:  (NSString*) arg0;

//interface methods
- (void) setDefaults;
- (bool) isEnabled;
- (struct RenderValueResult*) CA_copyRenderValue;

//properties
@property(nonatomic, readonly) NSString* type;
@property(nonatomic, retain) NSString* name;
@property(nonatomic) bool enabled;
@property(nonatomic) bool cachesInputImage;

@end

Looking better already. That ‘RenderValueResult‘ struct will prove to be a nasty one, but more on that later.

Now that we know the interface, and before we go flying off randomly trying to replicate functionality that we still don’t fully understand, let’s take a simpler step. Let’s create a simple class that exposes the CAFilter interface, wraps an actual CAFilter instance, and logs each method call, parameters, and result, like so:

//MyCAFilter.h (modified to include 'delegate' field)
#import <Foundation/Foundation.h>

struct RenderValueResult {
    int (**x1)();
    struct MyAtomic {
        struct {
            NSInteger x;
        } _v;
    } x2;
} *_renderValueResult;

@class CAFilter;  //private-API

@interface MyCAFilter : NSObject<NSCoding, NSCopying, NSMutableCopying> {
    unsigned int _type;
    NSString* _name;
    unsigned int _flags;
    void* _attr;
    void* _cache;
    
    CAFilter* delegate;  //private-API
}

//Constructors
- (id) initWithType:  (NSString*) arg0;
- (id) initWithName:  (NSString*) arg0;

//NSKeyValueCoding
- (void) setValue: (id) arg0 forKey: (NSString*) arg1;
- (id) valueForKey:  (NSString*) arg0;

//interface methods
- (void) setDefaults;
- (bool) isEnabled;
- (struct RenderValueResult*) CA_copyRenderValue;

//properties
@property(nonatomic, readonly) NSString* type;
@property(nonatomic, retain) NSString* name;
@property(nonatomic) bool enabled;
@property(nonatomic) bool cachesInputImage;

@end

//MyCAFilter.m
#import "MyCAFilter.h"

@implementation MyCAFilter

@dynamic name, cachesInputImage, type, enabled;

- (id) initWithType: (NSString*) theType {
	NSLog(@"initWithType: type='%@'", theType);
    if ((self = [super init])) {
        delegate = [[CAFilter alloc] initWithType: theType];    //TODO:  remove delegate
    }
	return self;
}

- (id) initWithName: (NSString*) theName {
	NSLog(@"initWithName: name='%@'", theName);
    if ((self = [super init])) {
        delegate = [[CAFilter alloc] initWithName: theName];    //TODO:  remove delegate
    }
	return self;
}

- (id) initWithCoder: (NSCoder*) coder {
	NSLog(@"initWithCoder: coder=%@", coder);
	if ((self = [super init])) {
        delegate = [[CAFilter alloc] initWithCoder: coder];     //TODO:  remove delegate
    }
    return self;
}

- (void) setDefaults {
	NSLog(@"setDefaults");
	[delegate setDefaults];  //TODO:  remove delegate
}

- (void) encodeWithCoder: (NSCoder*) encoder {
	NSLog(@"encodeWithCoder:  coder=%@", encoder);
	[delegate encodeWithCoder:encoder];  //TODO:  remove delegate
}

- (id) mutableCopyWithZone: (NSZone*) zone {
    id result = [delegate mutableCopyWithZone:zone];   //TODO:  remove delegate
	NSLog(@"mutableCopyWithZone: zone=%p; result=%@", zone, result);
	return result;
}

- (id) copyWithZone: (NSZone*) zone {
    id result = [delegate copyWithZone:zone];  //TODO:  remove delegate
	NSLog(@"copyWithZone:  zone=%p; result=%@", zone, result);
	return result;
}

- (void) setValue: (id) value forKey: (NSString*) key {
	NSLog(@"setValue:  key=%@, value=%@", key, value);
	[delegate setValue:value forKey:key];	//TODO:  remove delegate
}

- (id) valueForKey: (NSString*) key {
    id result = [delegate valueForKey:key];  //TODO:  remove delegate
	NSLog(@"valueForKey:  key=%@; result=%@", key, result);
	return result;
}

- (bool) isEnabled {
    bool result = [delegate isEnabled]; //TODO:  remove delegate
	NSLog(@"isEnabled; result=%d", result);
	return result; 
}

- (void) dealloc {
	NSLog(@"dealloc");
	[delegate release];		//TODO:  remove delegate
	[super dealloc];
}

- (bool) enabled {
    bool result = [delegate enabled];		//TODO:  remove delegate
	NSLog(@"enabled; result=%d", result);
	return result;
}
- (void) setEnabled: (bool) val {
	NSLog(@"setEnabled: value=%d", val);
	[delegate setEnabled:val];		//TODO:  remove delegate
}

- (void) setCachesInputImage: (bool) val {
	NSLog(@"setCachesInputImage: val=%d", val);
	[delegate setCachesInputImage:val];		//TODO:  remove delegate
}
- (bool) cachesInputImage {
    bool result = [delegate cachesInputImage];		//TODO:  remove delegate
	NSLog(@"cachesInputImage; result=%d", result);
	return result;
}

- (id) name {
    id result = [delegate name];		//TODO:  remove delegate
	NSLog(@"name; result=%@", result);
	return result;
}

- (void) setName: (NSString*) name {
	NSLog(@"setName: name='%@'", name);
	[delegate setName: name];		//TODO:  remove delegate
}

- (NSString*) type {
    NSString* result = [delegate type];		//TODO:  remove delegate
	NSLog(@"type; result=%@", result);
	return result;
}

- (struct RenderValueResult*) CA_copyRenderValue {
	struct RenderValueResult* result = [delegate CA_copyRenderValue];	//TODO:  remove delegate
    NSLog(@"CA_copyRenderValue; result=0x%08X, result.x1=0x%08X, result.x2=%d", result, result->x1, result->x2);
	return result;
}

@end

Using this class is a simple matter of editing ‘ReadPdfView.m‘ (working with Steven’s example project) to replace both instances of ‘[[CAFilter filterWithType:kCAFilterPageCurl] retain];‘ with ‘[[MyCAFilter alloc] initWithType: @”pageCurl”];‘. Note that it is also now safe to remove the ‘@class CAFilter;‘ and ‘extern NSString *kCAFilterPageCurl;‘ lines from that file.
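
In other words, the change inside ‘ReadPdfView.m‘ boils down to this (a sketch of the swap described above; it assumes the ‘filter‘ member is re-declared as a MyCAFilter*):

//before (private-API call, guaranteed rejection):
//filter = [[CAFilter filterWithType:kCAFilterPageCurl] retain];

//after (our logging wrapper, initialized with the string behind kCAFilterPageCurl):
filter = [[MyCAFilter alloc] initWithType:@"pageCurl"];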

Now obviously this still won’t fly with Apple, as it continues to use the private-API CAFilter class. But consider what we’ve accomplished: we’ve now inserted our own custom object into the rendering pipeline, and the core-animation framework is none the wiser. If we can now figure out how to get the same results without internally using the CAFilter instance, we will have cracked the page-curl animation.

Moving along, if we run this code through a complete page-curl animation, we see a very simple pattern emerge:

2011-02-09 00:59:11.694 PageCurlDemo[5501:207] initWithType: type='pageCurl'
2011-02-09 00:59:11.707 PageCurlDemo[5501:207] setDefaults
2011-02-09 00:59:11.711 PageCurlDemo[5501:207] setValue:  key=inputTime, value=0
2011-02-09 00:59:11.716 PageCurlDemo[5501:207] setValue:  key=inputAngle, value=-3.141593
2011-02-09 00:59:11.734 PageCurlDemo[5501:207] CA_copyRenderValue; result=0x04AF09E0, result.x1=0x00D96448, result.x2=65538
2011-02-09 00:59:11.767 PageCurlDemo[5501:207] valueForKey:  key=inputTime; result=0
2011-02-09 00:59:11.808 PageCurlDemo[5501:207] dealloc

This sequence of calls is repeated a number of times as the animation runs. None of the other methods that exist on the object are called. Every single one of these calls, with the exception of ‘CA_copyRenderValue‘, originates in the example code, so our task is now constrained to the implementation of a single unknown method. But what a method it is. ‘CA_copyRenderValue‘ returns a pointer to a fairly obtuse structure that has the following definition:

struct RenderValueResult { 
	int (**x1)(); 
	struct MyAtomic { 
		struct { 
			NSInteger x; 
		} _v; 
	} x2; 
} *_renderValueResult;

I’ve changed the name of the structure and its nested structure to avoid any issues with name collisions, but since the order and type of fields matches the private-API version there should be no issues in terms of compatibility between the different declared versions. At runtime this structure should be indistinguishable from the private-API version for all practical purposes (barring reflection, which could detect the difference in the naming).

Anyways, this structure contains two fields: ‘x1‘, which is a pointer to an array of functions that return integers, and ‘x2‘, which (via its nested struct) boils down to a single integer. Interestingly enough, the memory address of the returned data structure never differs by more than 256 bytes between calls, nor do the values of ‘x1‘ or ‘x2‘ themselves change. And here is where things start to get a bit murky. I’m going to forget about ‘x2‘ for a moment, as it is a simple value that never seems to vary. ‘x1‘ is not so easy.

By inspecting the value of ‘x1‘, I’ve determined that it references no more than 11 distinct functions (the 12th element in the result returned by CAFilter is NULL, and I assume that the NULL marks the end of the meaningful data in the array). Moreover, the addresses of the functions returned do not appear to vary, even between independent runs of the application. This implies to me that the result being returned is perhaps simply referencing some pre-existing object in memory.

But this is all speculation on my part. What’s needed here is more digging, so let’s create our own callback functions and see what we can discover about the way this data structure is being used by the core-animation framework. We can do that by adding the following to the MyCAFilter implementation:

int (**originalFuncs)();  //cache for the actual function pointers

//copy/paste this 11 times, incrementing both '0's each time...it's inelegant but it works
int callback0( id firstParam, ... ) {
	int myIndex = 0;
	NSLog(@"callback%d invoked, stack=%@", myIndex, [NSThread callStackSymbols]);
	
	va_list args;
	va_start(args, firstParam);
	int originalResult = originalFuncs[myIndex](firstParam, args);  //pass any params we received on to the original function; not sure if this is the correct way to do this
	
	NSLog(@"callback%d will return result:  %d", myIndex, originalResult);
	
	return originalResult;
}

And then by revising ‘CA_copyRenderValue‘ like so:

void* myCallbacks[11] = {&callback0, &callback1, &callback2, &callback3, &callback4, &callback5, &callback6, &callback7, &callback8, &callback9, &callback10};

- (struct RenderValueResult*) CA_copyRenderValue {
	struct RenderValueResult* result = [delegate CA_copyRenderValue];	//TODO:  remove delegate
	struct RenderValueResult* myResult = malloc(sizeof(struct RenderValueResult));
	myResult->x2 = result->x2;  //just copy the integer component of the result; 65538?
	
	//see how many functions there are before we encounter a NULL
	int funcIndex = 0;
	while (result->x1[funcIndex] != NULL) {
		funcIndex++;
		if (funcIndex >= 11) {
			NSLog(@"CA_copyRenderValue;  NULL sigil not found, assuming max number of functions is 11!");
			break;
		}
	}
	NSLog(@"CA_copyRenderValue;  found %d functions in delegate's result...", funcIndex);
	
	myResult->x1 = malloc(sizeof(int*) * (funcIndex + 1));		//we return this to the CA framework
	originalFuncs = malloc(sizeof(int*) * (funcIndex));			//we keep references to the original functions to use in our callbacks
	for (int index = 0; index < funcIndex ; index++) {
		originalFuncs[index] = result->x1[index];		//cache the original function pointers
		myResult->x1[index] = myCallbacks[index];     //put dummy callbacks into the result
	}
	myResult->x1[funcIndex] = NULL;
	
    NSLog(@"CA_copyRenderValue; result=0x%08X, result.x1=0x%08X, result.x2=%d", result, result->x1, result->x2);
	for (int index = 0; index < funcIndex; index++) {
		NSLog(@"CA_copyRenderValue; result->x1[%d]=0x%08X", index, result->x1[index]);
	}
	return myResult;
}

Now if we run the application, we get the following output:

2011-02-09 23:50:25.508 PageCurlDemo[10453:207] callback3 invoked, stack=(
	0   PageCurlDemo                        0x00006c2b callback3 + 50
	1   QuartzCore                          0x00d63347 CACopyRenderArray + 188
	2   QuartzCore                          0x00cc373e -[CALayer(CALayerPrivate) _copyRenderLayer:layerFlags:commitFlags:] + 1667
	3   QuartzCore                          0x00cc30b4 CALayerCopyRenderLayer + 55
	4   QuartzCore                          0x00cc11d2 _ZN2CA7Context12commit_layerEP8_CALayerjjPv + 122
	5   QuartzCore                          0x00cc10e1 CALayerCommitIfNeeded + 323
	6   QuartzCore                          0x00cc1069 CALayerCommitIfNeeded + 203
	7   QuartzCore                          0x00cc1069 CALayerCommitIfNeeded + 203
	8   QuartzCore                          0x00caf7b9 _ZN2CA7Context18commit_transactionEPNS_11TransactionE + 1395
	9   QuartzCore                          0x00caf0d0 _ZN2CA11Transaction6commitEv + 292
	10  QuartzCore                          0x00cdf7d5 _ZN2CA11Transaction17observer_callbackEP19__CFRunLoopObservermPv + 99
	11  CoreFoundation                      0x00ef8fbb __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ + 27
	12  CoreFoundation                      0x00e8e0e7 __CFRunLoopDoObservers + 295
	13  CoreFoundation                      0x00e56bd7 __CFRunLoopRun + 1575
	14  CoreFoundation                      0x00e56240 CFRunLoopRunSpecific + 208
	15  CoreFoundation                      0x00e56161 CFRunLoopRunInMode + 97
	16  GraphicsServices                    0x0184c268 GSEventRunModal + 217
	17  GraphicsServices                    0x0184c32d GSEventRun + 115
	18  UIKit                               0x002d242e UIApplicationMain + 1160
	19  PageCurlDemo                        0x00002904 main + 102
	20  PageCurlDemo                        0x00002895 start + 53
)
2011-02-09 23:50:25.514 PageCurlDemo[10453:207] callback3 will return result:  9
2011-02-09 23:50:25.519 PageCurlDemo[10453:207] callback3 invoked, stack=(
	0   PageCurlDemo                        0x00006c2b callback3 + 50
	1   QuartzCore                          0x00ce5d12 _ZN2CA6Render7Encoder13encode_objectEPKNS0_6ObjectE + 30
	2   QuartzCore                          0x00ce670d _ZNK2CA6Render5Array6encodeEPNS0_7EncoderE + 113
	3   QuartzCore                          0x00ce5f24 _ZNK2CA6Render5Layer6encodeEPNS0_7EncoderE + 458
	4   QuartzCore                          0x00ce5cdb _ZN2CA6Render17encode_set_objectEPNS0_7EncoderEmjPNS0_6ObjectEj + 91
	5   QuartzCore                          0x00cc1215 _ZN2CA7Context12commit_layerEP8_CALayerjjPv + 189
	6   QuartzCore                          0x00cc10e1 CALayerCommitIfNeeded + 323
	7   QuartzCore                          0x00cc1069 CALayerCommitIfNeeded + 203
	8   QuartzCore                          0x00cc1069 CALayerCommitIfNeeded + 203
	9   QuartzCore                          0x00caf7b9 _ZN2CA7Context18commit_transactionEPNS_11TransactionE + 1395
	10  QuartzCore                          0x00caf0d0 _ZN2CA11Transaction6commitEv + 292
	11  QuartzCore                          0x00cdf7d5 _ZN2CA11Transaction17observer_callbackEP19__CFRunLoopObservermPv + 99
	12  CoreFoundation                      0x00ef8fbb __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ + 27
	13  CoreFoundation                      0x00e8e0e7 __CFRunLoopDoObservers + 295
	14  CoreFoundation                      0x00e56bd7 __CFRunLoopRun + 1575
	15  CoreFoundation                      0x00e56240 CFRunLoopRunSpecific + 208
	16  CoreFoundation                      0x00e56161 CFRunLoopRunInMode + 97
	17  GraphicsServices                    0x0184c268 GSEventRunModal + 217
	18  GraphicsServices                    0x0184c32d GSEventRun + 115
	19  UIKit                               0x002d242e UIApplicationMain + 1160
	20  PageCurlDemo                        0x00002904 main + 102
	21  PageCurlDemo                        0x00002895 start + 53
)
2011-02-09 23:50:25.527 PageCurlDemo[10453:207] callback3 will return result:  9
2011-02-09 23:50:25.536 PageCurlDemo[10453:207] callback3 invoked, stack=(
	0   PageCurlDemo                        0x00006c2b callback3 + 50
	1   QuartzCore                          0x00ce5d34 _ZN2CA6Render7Encoder13encode_objectEPKNS0_6ObjectE + 64
	2   QuartzCore                          0x00ce670d _ZNK2CA6Render5Array6encodeEPNS0_7EncoderE + 113
	3   QuartzCore                          0x00ce5f24 _ZNK2CA6Render5Layer6encodeEPNS0_7EncoderE + 458
	4   QuartzCore                          0x00ce5cdb _ZN2CA6Render17encode_set_objectEPNS0_7EncoderEmjPNS0_6ObjectEj + 91
	5   QuartzCore                          0x00cc1215 _ZN2CA7Context12commit_layerEP8_CALayerjjPv + 189
	6   QuartzCore                          0x00cc10e1 CALayerCommitIfNeeded + 323
	7   QuartzCore                          0x00cc1069 CALayerCommitIfNeeded + 203
	8   QuartzCore                          0x00cc1069 CALayerCommitIfNeeded + 203
	9   QuartzCore                          0x00caf7b9 _ZN2CA7Context18commit_transactionEPNS_11TransactionE + 1395
	10  QuartzCore                          0x00caf0d0 _ZN2CA11Transaction6commitEv + 292
	11  QuartzCore                          0x00cdf7d5 _ZN2CA11Transaction17observer_callbackEP19__CFRunLoopObservermPv + 99
	12  CoreFoundation                      0x00ef8fbb __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ + 27
	13  CoreFoundation                      0x00e8e0e7 __CFRunLoopDoObservers + 295
	14  CoreFoundation                      0x00e56bd7 __CFRunLoopRun + 1575
	15  CoreFoundation                      0x00e56240 CFRunLoopRunSpecific + 208
	16  CoreFoundation                      0x00e56161 CFRunLoopRunInMode + 97
	17  GraphicsServices                    0x0184c268 GSEventRunModal + 217
	18  GraphicsServices                    0x0184c32d GSEventRun + 115
	19  UIKit                               0x002d242e UIApplicationMain + 1160
	20  PageCurlDemo                        0x00002904 main + 102
	21  PageCurlDemo                        0x00002895 start + 53
)
2011-02-09 23:50:25.566 PageCurlDemo[10453:207] callback3 will return result:  9
2011-02-09 23:50:25.578 PageCurlDemo[10453:207] callback4 invoked, stack=(
	0   PageCurlDemo                        0x00006cca callback4 + 50
	1   QuartzCore                          0x00ce670d _ZNK2CA6Render5Array6encodeEPNS0_7EncoderE + 113
	2   QuartzCore                          0x00ce5f24 _ZNK2CA6Render5Layer6encodeEPNS0_7EncoderE + 458
	3   QuartzCore                          0x00ce5cdb _ZN2CA6Render17encode_set_objectEPNS0_7EncoderEmjPNS0_6ObjectEj + 91
	4   QuartzCore                          0x00cc1215 _ZN2CA7Context12commit_layerEP8_CALayerjjPv + 189
	5   QuartzCore                          0x00cc10e1 CALayerCommitIfNeeded + 323
	6   QuartzCore                          0x00cc1069 CALayerCommitIfNeeded + 203
	7   QuartzCore                          0x00cc1069 CALayerCommitIfNeeded + 203
	8   QuartzCore                          0x00caf7b9 _ZN2CA7Context18commit_transactionEPNS_11TransactionE + 1395
	9   QuartzCore                          0x00caf0d0 _ZN2CA11Transaction6commitEv + 292
	10  QuartzCore                          0x00cdf7d5 _ZN2CA11Transaction17observer_callbackEP19__CFRunLoopObservermPv + 99
	11  CoreFoundation                      0x00ef8fbb __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ + 27
	12  CoreFoundation                      0x00e8e0e7 __CFRunLoopDoObservers + 295
	13  CoreFoundation                      0x00e56bd7 __CFRunLoopRun + 1575
	14  CoreFoundation                      0x00e56240 CFRunLoopRunSpecific + 208
	15  CoreFoundation                      0x00e56161 CFRunLoopRunInMode + 97
	16  GraphicsServices                    0x0184c268 GSEventRunModal + 217
	17  GraphicsServices                    0x0184c32d GSEventRun + 115
	18  UIKit                               0x002d242e UIApplicationMain + 1160
	19  PageCurlDemo                        0x00002904 main + 102
	20  PageCurlDemo                        0x00002895 start + 53
)

Followed by a crash. Something in the attempt to forward the call from ‘callback4‘ to the original function causes it to die, probably related to the questionable way that I’m passing arguments along. Not knowing the proper signatures for the callback functions, I’ve made them all accept a variable number of ‘id’ parameters, which should cover most cases. However, the best way to pass these arguments on to the original implementation is not clear.

For what it’s worth, I tried a number of alternate ways to invoke this function, all of which resulted in a crash. Skipping the invocation and just returning a hard-coded value from my callback prevented the crash, but didn’t result in any more callbacks being invoked. Presumably core-animation noticed that my hard-coded return value didn’t match what it was expecting, and decided to abort the rest of its rendering transaction.

And unfortunately, here is where I need to leave this interesting little diversion for now, unless/until I can figure out a way to move it forward. If you have any suggestions please don’t hesitate to let me know. I feel like I’m getting close to the answer here, but it’s still quite a ways away.

Update

If anyone is interested, you can download a complete Xcode project containing the latest revision of my code. If you decide to take a crack at solving this problem, I wish you luck, and please do consider reporting back with your results.

Posted in coding, objective-c | Tagged , , , | 10 Comments

[Site Status] Server Instability

The 10-year-old PC that I’m currently using to host this site seems to be going through an end-of-life crisis. If you’re reading this then it must currently be up (or you accessed Google’s cached version), but there’s no guarantee it will stay that way. So don’t be surprised if this blog is intermittently down for the next few days. If it does go down, Google’s cache of the entire Internet is a good place to look.

And no worries, I’ve already ordered the parts for a replacement box. It will be an Atom D510 based server, with a meager 60 GB SSD, and it should be anywhere from 2-3 times as fast as the current aging server while using less than one-third as much power and running in complete silence. Progress is awesome sometimes.

I had really wanted to wait and build an AMD Brazos based server (it would be about 50% faster than the Atom D510 while using a comparable amount of power), but the replacement needs to be put in place sooner rather than later, and there are currently no Brazos chips/boards available here in Australia. Oh well. I’ll still have a bit of fun assembling, testing, and configuring the Atom box. Maybe I’ll add a quick post documenting the process, as well.

Posted in status | Leave a comment

[Objective-C + Cocoa] NSAutoreleasePool and Threads

If you have a multithreaded iPhone/iPad/Cocoa application, you are probably aware that for each thread you create you need to set up an auto-release pool for that thread. If you don’t do this then you’ll get some nice messages in your debugger log informing you that your app is leaking memory (for shame!). Personally I think that any boilerplate code that must be added to each thread should be handled automatically by the SDK/runtime environment, but that’s completely beside the point here. The point here is that the standard example given for this boilerplate code is generally something along the lines of:

- (void) myThreadEntryPoint {
	NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];  //set up a pool
	// [do work here]
	[pool drain];
}

And this is all well and good for simple use-cases, but often a developer wants to have a thread that runs forever (or for the lifetime of the application), in which case the ‘[do work here]‘ section might look like:

while (! [self shouldTerminate]) {
	//[do some stuff]
	[NSThread sleepForTimeInterval:10.0];  //sleep for a bit
}

If we insert this code into the standard boilerplate example, we get the following:

- (void) myThreadEntryPoint {
	NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];  //set up a pool
	while (! [self shouldTerminate]) {
		//[do some stuff]
		[NSThread sleepForTimeInterval:10.0];  //sleep for a bit
	}
	[pool drain];
}

And now we have a problem, one that’s particularly easy for new developers to create. Technically this code is following the standard example, but it is also creating a slow memory leak, assuming that any amount of non-trivial work is being performed in the ‘[do some stuff]‘ section. The problem is that the auto-release pool is never drained until the thread is ready to terminate, meaning that all the objects that are in the pool do not get released. They simply accumulate in memory. If you write code like what’s shown above, a crash is inevitable; it’s only a question of when.

In an environment as memory-constrained as an iPhone, anything accumulating in memory is a Very Bad Thing™. Doubly so in this case because the issue will not be detected in debugging tools like Leaks or Allocations. You will not get any console messages nagging you about memory leaks. You have an auto-release pool in place, after all, and you’re releasing it properly, so how is the compiler or any other tool to know that there’s an issue (in fact, in order to detect that there is an issue in the above code the compiler would have to be able to solve the halting problem)?

If you code like this, your application will just slowly consume more and more memory until it eventually crashes. And you can’t even count on getting a reliable stack-trace when it does crash, because the allocation that finally brings the thing crashing down might be nowhere near the code associated with the actual leak.

Luckily, this error is simple to avoid. Just change the code so that it’s like this:

- (void)myThreadEntryPoint {	
	while (![self shouldTerminate]) {
		NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
		
		//[do work here]

		[pool drain];  
		[NSThread sleepForTimeInterval:10.0];  //sleep for a bit
	}
}

And problem solved. The auto-release pool is released and reset on each iteration of the loop, as soon as we are done doing our actual work. Objects are released, memory is freed, everyone is happy, and on the next loop iteration the process starts over again. It’s a very simple fix, but the issue that it addresses is easy to overlook, and difficult to track down once overlooked.

Note that this is mentioned in Apple’s official documentation on the subject, which states:

If your application or thread is long-lived and potentially generates a lot of 
autoreleased objects, you should periodically drain and create autorelease pools 
(like the Application Kit does on the main thread); otherwise, autoreleased 
objects accumulate and your memory footprint grows. If, however, your detached 
thread does not make Cocoa calls, you do not need to create an autorelease pool.
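
The “periodically drain and create” advice also applies to tight loops that don’t sleep between iterations; rather than paying for a new pool on every pass, you can recycle the pool every N iterations. A minimal sketch (draining every 100 iterations is an arbitrary choice, not a recommendation):

- (void) myTightLoopEntryPoint {
	int iteration = 0;
	NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
	while (! [self shouldTerminate]) {
		//[do work here]
		if (++iteration % 100 == 0) {
			//recycle the pool periodically so autoreleased objects don't pile up
			[pool drain];
			pool = [[NSAutoreleasePool alloc] init];
		}
	}
	[pool drain];
}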

So if you missed it in the official documentation before, now you know; and hopefully you also know why it’s important to pay attention to this little piece of advice. If you don’t want your application to crash and die randomly, that is.

Posted in coding, objective-c | Tagged , | 1 Comment

[C (and variants)] Pointer Declarations: You’re Doing it Wrong

It happens on occasion, in the programming world, that bad coding conventions become the norm. Consider for instance the following two lines of code:

int* pointer1;
int *pointer2;

Both lines are equivalent as far as the compiler is concerned, but when a human being parses these lines intuitively, they say slightly different things. The first line parses as “create a variable called ‘pointer1‘ that is of type ‘int*‘”, while the second line comes across as “create a variable called ‘*pointer2‘ that is of type ‘int‘”. One of these interpretations matches the semantics of the language and what the compiler actually does when it processes this code, and the other does not. If you are familiar with pointer types, then you know which is which, and if not then I’ll just tell you that it’s the first version. The first line of code is the correct way to declare a pointer.

Sadly, current coding conventions actually favor the second line. The Wikipedia page on this subject even attempts to justify this backwards way of declaring a pointer with the explanation that “when the program dereferences the pointer, it has the type of the object to which it points”. That may hold true in simple cases such as the above example, but now consider the following:

int *functionThatReturnsAPointer();

Am I declaring here a function such that “when the program dereferences [it], it has the type [int]”? No. Setting aside whether dereferencing a function with the unary ‘*‘ operator even means anything useful, that is simply not what I’m doing. I’m declaring a function that has a return type of ‘int*‘. Not a return type of ‘magic value that if I put a star in front of it will give me an int‘. If we move to a language with slightly different syntax, like Objective-C, the problem with the convention becomes even more pronounced. For instance:

- (NSString*) toString;   //this is a valid method declaration in Objective-C
- (NSString) *toString2;  //this will make the Objective-C compiler very, very sad

- (void) functionWithAString: (NSString*) string;   //also valid
- (void) functionWithAString2: (NSString) *string;  //not a chance

Objective-C does an excellent job of highlighting the problem with the standard convention because in Objective-C the return and parameter types in method declarations must be enclosed within parentheses. And the asterisk must be included inside the parentheses with the name of the type, because it is part of the type of the object, and not part of its name as the standard convention tries to imply.

And that’s the other half of the justification used to defend the standard convention: that by “hiding” the pointer type so that it looks like part of the variable name, developers (in particular, novice developers) can use pointer types without having to really understand or think about them. This is bad for a couple of reasons. First off, it twists the semantics of the language in a way that is not accurate. More importantly, it discourages (and/or delays) developers from taking the time to understand what pointer types are and how they work. Yes, pointers can be confusing, but if you’re going to work in a language that includes them then you need to understand how they work. How they really work; not just how a confusing coding convention tries to make it seem like they work.

So at the end of the day, the asterisk character in C and C-like languages is part of the type being declared and not part of its name, and it’s time to start putting it where it belongs; with the type. A bad convention should not be allowed to stand just because it is the convention. Write code that intuitively matches its semantic meaning in the language, not code that is designed to trick people who don’t really understand how pointers work into being able to work with them anyways.
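
If you want to see the “asterisk is part of the type” point made explicit, a typedef does it nicely; this is purely an illustration, not a suggestion to hide pointers behind typedefs:

typedef int* IntPtr;

IntPtr p1;   //unambiguously an int*
int*   p2;   //the same thing, with the asterisk written as part of the type
int   *p3;   //identical as far as the compiler is concerned, but reads as though '*p3' were the variable's name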

Lastly, and only somewhat tangentially, the following are all examples of very bad coding style, and you should not write code like this no matter where you put your asterisks:

int x, y;    //no
int *x, y;   //bad
int* x, y;   //not any better
int *x, *y;  //you get the idea... 
int* x, *y;
int x, *y;

Instead, simply do like so:

int x;
int* y;

It’s not like you pay by the newline when you write code. Limit your variable declarations to one-per-line. Trust me, you’ll write more readable code that way.

Posted in c, coding | Tagged , | 5 Comments

[Objective-C + Cocoa] Object Inspection

I’m a big fan of reflection. Always have been since I was first exposed to it in Java. For those not familiar with the concept, reflection (or introspection as it is alternately called) allows one to inspect and/or access the properties and methods of an object instance at runtime, without needing any specific details about its declared type or fields. Though it may seem like a fairly minor feature, reflection is used to great effect in the Java world by the likes of Spring and EasyMock, to name just a few.

So it’s a bit puzzling to me, then, that reflection seems to have been long forgotten in the realm of Objective-C. Apple’s official documentation on this topic even mildly discourages its use (“You typically do not need to use the Objective-C runtime library directly when programming in Objective-C”). Granted, the performSelector: method sees fairly frequent use in many Cocoa applications, but this is a minor concession in a language where virtually every method-call resolves to a table lookup to find the implementation of the method being called (and you can even swap method implementations around at runtime by mucking with the lookup table).
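
As a taste of that last point, the function for swapping two method implementations is part of the public runtime API; here is a hedged sketch (the ‘Widget‘ class and its ‘-foo‘/‘-bar‘ methods are hypothetical placeholders):

#import <objc/runtime.h>

//swap the implementations of two methods on a hypothetical Widget class at runtime
Method original = class_getInstanceMethod([Widget class], @selector(foo));
Method replacement = class_getInstanceMethod([Widget class], @selector(bar));
method_exchangeImplementations(original, replacement);  //from now on -foo runs -bar's code, and vice versa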

So as a demonstration of some of the neat things that can be done using reflection in Objective-C, I’ve put together some code that will “deconstruct” an arbitrary object. This code will:

  • Print the signature of any methods that exist on an object.
  • Print the name and type of any properties declared on the object.
  • Print the name and type of any instance-level fields declared as members of the object.
  • Optionally recurse through any non-primitive non-nil field/property types.
  • Optionally recurse through the object’s superclasses until the root class (NSObject, typically) is reached.
  • Attempt to track and return the real amount of memory allocated to the object (not fully accurate).

The code is packaged as a category on NSObject, meaning that if you include it in your project you can simply call ‘[obj printObject:obj toDepth:0]‘ in order to print the details of any object you are interested in. Anyways, here is the code that works all the magic:

#import <objc/runtime.h>
#import <malloc/malloc.h>

@implementation NSObject(object_print)

- (NSString*) appendTo: (NSString*) base with: (NSString*) rest {
	return [NSString stringWithFormat:@"%@%@", base, rest];
}

- (int) printObjectInternal:(id)anObject printState: (NSMutableArray*)state friendlyName: (NSString*) objName withIndent: (NSString*)indent fromDepth: (int)currentDepth toDepth: (int)maxDepth {
	if (anObject == nil || anObject == NULL || currentDepth > maxDepth) {
		//nothing to do
		return 0;
	}
	
	[state addObject:anObject];
	
	//process properties for the class and its superclass(es)
	int totalSize = 0;
	int mySuperclassDepth = currentDepth;
	Class processingClass = [anObject class];
	while (processingClass != nil && processingClass != [NSObject class] && mySuperclassDepth <= maxDepth) {
		unsigned int numFields = 0;
		
		//methods
		Method* methods = class_copyMethodList(processingClass, &numFields);
		NSLog(@"[%@] - %@  Printing object:  type=%@ : %@ ...", objName, indent, processingClass, class_getSuperclass(processingClass));
		NSLog(@"[%@] - %@  Printing object methods:  type=%@, numMethods=%d", objName, indent, processingClass, numFields);
		for (int index = 0; index < numFields; index++) {
			unsigned int numArgs = method_getNumberOfArguments(methods[index]);
			const char* name = sel_getName(method_getName(methods[index]));
			NSString* argString = @"";
			char* copyReturnType = method_copyReturnType(methods[index]);
			for (int argIndex = 0; argIndex < numArgs; argIndex++) {
				char* argType = method_copyArgumentType(methods[index], argIndex);
				if (argIndex > 2) {
					argString = [argString stringByAppendingFormat:@" argName%d: (%@) arg%d", argIndex - 2, [self codeToReadableType: argType], argIndex - 2]; 
				}
				else if (argIndex > 1) {
					argString = [argString stringByAppendingFormat:@" (%@) arg%d", [self codeToReadableType: argType], argIndex - 2];
				}
				free(argType);
			}
			
			if (numArgs <= 2) {
				NSLog(@"[%@] - %@ (%@)  - (%@) %s;", objName, indent, processingClass, [self codeToReadableType: copyReturnType], name);
			}
			else {
				NSLog(@"[%@] - %@ (%@)  - (%@) %s %@;", objName, indent, processingClass, [self codeToReadableType: copyReturnType], name, argString);
			}
			free(copyReturnType);
		}
		free(methods);		//free the array returned by class_copyMethodList
		
		//properties (i.e. things declared with '@property')
		objc_property_t* props = class_copyPropertyList(processingClass, &numFields);
		NSLog(@"[%@] - %@  Printing object properties:  type=%@, numFields=%d", objName, indent, processingClass, numFields);
		for (int index = 0; index < numFields; index++) {
			objc_property_t prop = props[index];
			const char* fieldName = property_getName(prop);
			const char* fieldType = property_getAttributes(prop);
			NSLog(@"[%@] - %@ (%@) @property %@ %s;", objName, indent, processingClass, [self codeToReadableType: fieldType], fieldName);
			
			@try {
				id fieldValue = [anObject valueForKey:[NSString stringWithFormat:@"%s", fieldName]];
				totalSize += malloc_size(fieldValue);
				NSString* typeString = [NSString stringWithFormat:@"%s", fieldType];
				NSRange range = [typeString rangeOfString:@"T@\""];
				if (range.location == 0 && fieldValue && ! [state containsObject:fieldValue]) {
					//the field is an object-type, so print its size as well
					NSLog(@"[%@] - %@ (%@)\t  Expanding property [%s]:", objName, indent, processingClass, fieldName);
					totalSize += [self printObjectInternal: fieldValue printState: state friendlyName: objName withIndent: [NSString stringWithFormat:@"%@\t", indent] fromDepth: mySuperclassDepth + 1 toDepth: maxDepth];
				}
			}
			@catch (id ignored) {
				//couldn't get it with objectForKey, so try an alternate way
				void* fieldValue = NULL;
				object_getInstanceVariable(anObject, fieldName, &fieldValue);
				if (fieldValue != NULL && fieldValue != nil) {
					totalSize += malloc_size(fieldValue);
				}
			}
		}
		free(props);		//free the array returned by class_copyPropertyList
		
		//ivars (i.e. declared instance members)
		Ivar* ivars = class_copyIvarList(processingClass, &numFields);
		NSLog(@"[%@] - %@ (%@) Printing object ivars:  type=%@, numFields=%d", objName, indent, processingClass, processingClass, numFields);
		for (int index = 0; index < numFields; index++) {
			Ivar ivar = ivars[index];
			id fieldValue = object_getIvar(anObject, ivar);
			
			const char* fieldName = ivar_getName(ivar);
			const char* fieldType = ivar_getTypeEncoding(ivar);
			
			NSLog(@"[%@] - %@ (%@) %@ %s;", objName, indent, processingClass, [self codeToReadableType: fieldType], fieldName);
			int mSize = malloc_size(fieldValue);
			totalSize += mSize;
			
			@try {
				NSString* typeString = [NSString stringWithFormat:@"%s", fieldType];
				NSRange range = [typeString rangeOfString:@"@"];
				if (range.location == 0 && (! [state containsObject:fieldValue]) && mSize > 0) {
					//the field is an object-type, so print its size as well
					NSLog(@"[%@] - %@ (%@)\t  Expanding ivar [%s]:", objName, indent, processingClass, fieldName);
					totalSize += [self printObjectInternal: fieldValue printState: state friendlyName: objName withIndent: [NSString stringWithFormat:@"%@\t", indent] fromDepth: mySuperclassDepth + 1 toDepth: maxDepth];
					
					//see if it's a countable type, just for fun
					if ([fieldValue respondsToSelector:@selector(count)]) {
						//if we can count it, print the count
						NSLog(@"[%@] - %@ (%@)\t\t  Container Count:  name=%s, type=%s, count=%d", objName, indent, processingClass, fieldName, fieldType, [fieldValue count]);
					}
				}
			}
			@catch (id ignored) {
				//couldn't print it
			}
		}
		free(ivars);		//free the array returned by class_copyIvarList
		
		//process indexed ivars (extra bytes allocated at end of object; no name available, just size)
		void* extraBytes = object_getIndexedIvars(anObject);
		NSLog(@"[%@] - %@ (%@) Printing object indexedIvars:  type=%@, extraBytes=%d", objName, indent, processingClass, processingClass, malloc_size(extraBytes));
		
		//process superclass
		NSLog(@"[%@] - %@ (%@) Superclass of %@ is %@", objName, indent, processingClass, processingClass, class_getSuperclass(processingClass));
		processingClass = class_getSuperclass(processingClass);
		mySuperclassDepth++;
	}
	
	return totalSize;
}

- (int) printObject: (id)anObject toDepth: (int) maxDepth {
	if (! anObject) {
		anObject = self;
	}
	if (maxDepth < 0) {
		maxDepth = 0;
	}
	NSMutableArray* state = [[NSMutableArray alloc] initWithCapacity: 1024];
	int result = [self printObjectInternal:anObject printState: state friendlyName: [[anObject class] description] withIndent: @"" fromDepth: 0 toDepth: maxDepth];
	[state release];
	
	return result;
}

@end

One minor omission from the above code is the codeToReadableType: function. This method simply takes an Objective-C type code (things like “^^f” and “C” and “:”) and parses it back into a human-readable format. It’s rather verbose for what it does, so it’s available at the end of this post.

Anyways, this code imbues any type derived from NSObject with a ‘printObject:toDepth:‘ method which does pretty much what its name implies. For a given object, it will print information about its methods, properties, and fields, and if you specify a depth greater than 0 it will also recurse through any non-primitive property or field and also the object’s superclass(es) to the specified depth limit. Want to know all 327 methods that exist on a UIView instance, including the ones that Apple doesn’t tell you about? Then invoke this method on a UIView instance, or on an instance of anything that extends UIView using a ‘toDepth:‘ of 1 or greater.
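
As a quick usage example (the UIView instance here is just a stand-in for whatever object you want to inspect):

UIView* someView = [[[UIView alloc] initWithFrame:CGRectZero] autorelease];
int approximateBytes = [someView printObject:someView toDepth:1];	//recurse one level into fields and superclasses
NSLog(@"approximate allocation: %d bytes", approximateBytes);		//see the caveat below about accuracy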

Also note that this method is designed to handle circular references in the object hierarchy and avoid getting trapped in a cycle; if you paid close attention to the code you will have noticed the ‘state‘ array, which keeps track of every object instance that has been encountered in the hierarchy. Before recursing through a new object instance the method first checks this array to make sure that instance hasn’t already been encountered, and avoids recursing through the object if it has already been seen. So you don’t have to worry about circular references killing the ‘printObject:toDepth:‘ routine.

All told, this code can be quite fun to play with if you’re curious about your Objective-C runtime environment. It lets you see in a human-readable way what the object hierarchy looks like in memory. Note that while this code also attempts to keep track of the number of bytes allocated by each object it traverses, it does not do a complete job of it, and the returned value shouldn’t be assumed to be an accurate representation of the size of the object instance. It may work for simple types, but you certainly shouldn’t rely on it for anything significant.

Lastly, here is the codeToReadableType: implementation. It could almost certainly be more compactly written using regular expressions and pattern matching, but this will get the job done. Just include it as part of the category, and you’ll be all set.

- (NSString*) codeToReadableType: (const char*) code {
	NSString* codeString = [NSString stringWithFormat:@"%s", code];
	NSString* result = [NSString stringWithString:@""];
	
	bool array = NO;
	NSString* arrayString;
	//note:  we parse our type from left to right, but build our result string from right to left
	for (int index = 0; index < [codeString length]; index++) {
		char nextChar = [codeString characterAtIndex:index];
		switch (nextChar) {
			case 'T':
				//a placeholder code, the actual type will be specified by the next character
				break;
			case ',':
				//used in conjunction with 'T', indicates the end of the data that we care about 
				//we could further process the character(s) after the comma to work out things like 'nonatomic', 'retain', etc., but let's not
				index = [codeString length];
				break;
			case 'i':
				//int or id
				if (index + 1 < [codeString length] && [codeString characterAtIndex:index + 1] == 'd') {
					//id
					result = [self appendTo: (array ? @"id[" : @"id") with: result];
					index++;
				}
				else {
					//int
					result = [self appendTo: (array ? @"int[" : @"int") with: result];
				}
				break;
			case 'I':
				//unsigned int
				result = [self appendTo: (array ? @"unsigned int[" : @"unsigned int") with: result];
				break;
			case 's':
				//short
				result = [self appendTo: (array ? @"short[" : @"short") with: result];
				break;
			case 'S':
				//unsigned short
				result = [self appendTo: (array ? @"unsigned short[" : @"unsigned short") with: result];
				break;
			case 'l':
				//long
				result = [self appendTo: (array ? @"long[" : @"long") with: result];
				break;
			case 'L':
				//unsigned long
				result = [self appendTo: (array ? @"unsigned long[" : @"unsigned long") with: result];
				break;
			case 'q':
				//long long
				result = [self appendTo: (array ? @"long long[" : @"long long") with: result];
				break;
			case 'Q':
				//unsigned long long
				result = [self appendTo: (array ? @"unsigned long long[" : @"unsigned long long") with: result];
				break;
			case 'f':
				//float
				result = [self appendTo: (array ? @"float[" : @"float") with: result];
				break;
			case 'd':
				//double
				result = [self appendTo: (array ? @"double[" : @"double") with: result];
				break;
			case 'B':
				//bool
				result = [self appendTo: (array ? @"bool[" : @"bool") with: result];
				break;
			case 'b':
				//char and BOOL; stored in the encoding as the literal string "bool", so we need to skip the next 3 chars
				result = [self appendTo: (array ? @"BOOL[" : @"BOOL") with: result];
				index += 3;
				break;
			case 'c':
				//char?
				result = [self appendTo: (array ? @"char[" : @"char") with: result];
				break;
			case 'C':
				//unsigned char
				result = [self appendTo: (array ? @"unsigned char[" : @"unsigned char") with: result];
				break;
			case 'v':
				//void
				result = [self appendTo: @"void" with: result];
				break;
			case ':':
				//selector
				result = [self appendTo: @"SEL" with: result];
				break;
			case '^':
				//pointer
				result = [self appendTo: @"*" with: result];
				break;
			case '@': {
				//object instance, may or may not include the type in quotes, like @"NSString"
				if (index + 1 < [codeString length] && [codeString characterAtIndex:index + 1] == '"') {
					//we can get the exact type
					int endIndex = index + 2;
					NSString* theType = @"";
					while (endIndex < [codeString length] && [codeString characterAtIndex:endIndex] != '"') {
						theType = [NSString stringWithFormat:@"%@%c", theType, [codeString characterAtIndex:endIndex]];
						endIndex++;
					}
					theType = [self appendTo: theType with: @"*"];
					result = [self appendTo: theType with: result];
				
					index = endIndex;	//let the for-loop increment step past the closing quote onto the ',' delimiter (if any)
				}
				else {
					//all we know is that it's an object of some kind
					result = [self appendTo: @"NSObject*" with: result];
				}
				break;
			}
			case '{': {
				//struct, we don't fully process these; just echo them
				index++;
				int numBraces = 1;
				NSString* theType = @"{";
				while (numBraces > 0) {
					char next = [codeString characterAtIndex:index];
					theType = [NSString stringWithFormat:@"%@%c", theType, next];
					if (next == '{') {
						numBraces++;
					}
					else if (next == '}') {
						numBraces--;
					}
					
					index++;
				}
				result = [NSString stringWithFormat:@"struct %@%@", theType, result];
				
				index--;
				break;
			}
			case '?':
				//IMP and function pointer
				result = [self appendTo: @"IMP" with: result];
				break;
			case '[':
				//array type
				array = YES;
				arrayString = @"";
				result = [self appendTo: @"]" with: result];
				break;
			case ']':
				//array type
				array = NO;
				break;
			case '0':
			case '1':
			case '2':
			case '3':
			case '4':
			case '5':
			case '6':
			case '7':
			case '8':
			case '9':
				//for a statically-sized array, indicates the number of elements (accumulated in arrayString, though the count is not currently included in the output)
				if (array) {
					arrayString = [NSString stringWithFormat:@"%@%c", arrayString, nextChar];
				}
				break;
			default:
				break;
		}
	}
	
	return result;
}
Posted in coding, objective-c | 3 Comments

Android vs. iPhone: A Developer’s Comparison

So I’ve had a bit of exposure to both the iPhone and Android SDKs, and while my impression of both is generally positive, each has its own unique strengths and weaknesses.

Interface Creation/Editing

To begin I’ll focus on one of the iPhone SDK’s biggest strengths: Interface Builder. This is a very slick and polished tool for building out an interface (or for more complex applications, the basic underpinnings of an interface) and hooking it up to the rest of your code, and it completely blows away Android’s layout-editor. With Interface Builder, it is generally possible to build exactly what you want using simple drag and drop interactions. With Android’s layout editor, often the best you can do is express the general idea of what you want using the graphical editing tools, and then you are forced to manually edit the XML document that the layout editor generates in order to fully realize your idea. Put simply, I have never once had to manually edit the XML output by Interface Builder, and I have never been able to build a non-trivial Android UI without having to do at least some manual editing of its XML output.

Interface Builder is not without its faults, however; its mechanism for configuring referencing outlets for UI elements is counter-intuitive to the newcomer, at best, and the way it integrates with your XCode project seems to happen more by magic than through any discernible coupling. Perhaps you like such behind-the-scenes magic, but personally when I am coding something I like to be able to see how all the tools fit together, and that’s something that you can’t do with XCode and Interface Builder. And with respect to configuring referencing outlets I think Google has the better approach here, generating a single class that any other class can use to directly reference any bundled resource (interface components, images, properties files, etc.). I fault them for calling this class “R” and not something more meaningful like “Resource”, but the overall concept behind it is sound, and I think superior to Apple’s approach of requiring that each reference be manually configured by the developer.
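
For what it’s worth, this is roughly what the Android side looks like in practice; the layout, view ID, and string resource names here are hypothetical, but any resource bundled with the project can be referenced through the generated R class in exactly this way, with no manual wiring step:

        import android.app.Activity;
        import android.os.Bundle;
        import android.widget.TextView;
        
        public class ExampleActivity extends Activity {
                @Override
                protected void onCreate(Bundle savedInstanceState) {
                        super.onCreate(savedInstanceState);
                        //R.layout.main, R.id.title, and R.string.app_name are hypothetical resources;
                        //the R class is generated automatically from whatever is bundled with the project
                        setContentView(R.layout.main);
                        TextView title = (TextView) findViewById(R.id.title);
                        title.setText(R.string.app_name);
                }
        }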

And of course, Apple does have a bit of an unfair advantage in the realm of user-interface editing. They only need to worry about targeting a single device with a single interface resolution (the 2x resolution “retina display” models implement an internal scaling algorithm so that they, too, can be targeted using the original iPhone resolution), while Google’s Android SDK developers need to target a myriad of devices, each one with a potentially unique interface resolution. In essence the UI for an iPhone application can be effectively specified using a fixed/absolute layout, which greatly simplifies things for Interface Builder. On the other hand, the lack of a consistent interface resolution in Android devices means that layouts must be specified in a relative fashion that allows them to change as needed to accommodate different screen resolutions. This makes the task of building an effective interface editor much more complex for the Android guys.

As an aside, I find it interesting to note that both Android and iPhone use very similar XML-based layout systems, so in theory there is nothing preventing Google’s implementation from being just as smooth as Apple’s, apart from a lack of polish. And I would bet that that will come in time.

Platform and Tools

Moving along, and pushing a bit closer to the realm of personal opinion, there is also the difference in development platform, tools, and language to consider. If you are doing iPhone development, then you have very little choice in the matter. You must develop on a Mac (or Hackintosh) running the latest version of OS X; your only real option as far as IDEs are concerned is XCode, and you’ll be writing Objective-C. Android developers have a bit more freedom. You can choose to develop under OS X, Windows, or Linux (or presumably any other operating system that can run Java), although you are still essentially locked into a single IDE: Eclipse (you can technically use other IDEs, but the Android development plugin only works with Eclipse, so you probably won’t want to), and you’ll be writing code in Java.

While the merits of one OS versus another can be debated endlessly without ever reaching a definitive outcome, I think Android comes out the winner by virtue of letting developers choose their desired operating system. Would it be terribly difficult for Apple to do the same? I don’t think so; however, I very much doubt that they will in the near future. With respect to IDEs, I do personally feel that Eclipse is a better product than XCode, particularly if you have multiple projects that you are working on, and doubly so when SCM is introduced. But then, I am sure there are some people, somewhere, who genuinely prefer coding in XCode, no?

And as for Java versus Objective-C, that’s mostly a matter of personal preference. Both languages are reasonable, although nowadays there are probably more developers who are comfortable with Java than with Objective-C, and in some areas Objective-C does show its age, particularly in the realm of memory management. There is also a wider variety of Java-based libraries and build tools available, and using them with an Android project is generally a bit simpler than accomplishing the same task in the Objective-C and iPhone world. But in the grand scheme of things, it’s not enough of a difference to say that one platform is better than the other here; a skilled Java developer should not have much difficulty adapting to Objective-C, or vice-versa.

SDK Architecture

Even further in the realm of personal opinion lie the relative merits of the SDK frameworks themselves. The iPhone SDK is built strictly around the model-view-controller pattern, and the development tools all but force you to structure your applications in this same pattern. You can subvert it if you want to (not that you would), but you really have to go out of your way to do so. The Android SDK, on the other hand, is a bit more free-form. Yes, there are model-view-controller overtones throughout, but the pattern is not as thoroughly pervasive as in the iPhone SDK, and more significantly, the developer can choose to follow some other pattern if they prefer without being penalized by the SDK and its tools.

Of course, this freedom comes at a price; model-view-controller is widely viewed as a very good pattern to follow in many circumstances, and freedom to choose a different pattern means freedom to choose a less appropriate pattern, a bad pattern, or even no pattern at all. Ultimately each SDK is following a different philosophy here, but I don’t think that one approach is inherently any better than the other. The rigidity in Apple’s approach may frustrate beginners and advanced users alike while giving intermediate-level coders some comfort in its uniformity and predictability. And while the extra freedom afforded by the Android SDK may prove useful to skilled developers, it also makes it easier for neophytes to dig themselves into a hole and requires that the intermediate-level developer devote more thought to the overall structure and architecture of their code. Each approach has its own pros and cons.

Persistence Layer

Both platforms start out equal here, with both Android and iPhone providing developers with access to an SQLite database for use within their applications. Apple goes one step further, however, and provides Core Data: an ORM framework that runs on top of the SQLite database and abstracts away the actual database operations. As such, the iPhone SDK enjoys a slight advantage here. While Core Data is a bit crufty in its terminology and somewhat restricted in its functionality, it is still much better in most cases than writing SQL directly.

But even ignoring its pedantic nature and functional limitations, Core Data is not perfect. If you change your data model after your application has been released you will be left with little choice but to go through the Core Data migration process (if you don’t, your app will simply crash the moment it tries to access the old database instance). And while this process is generally adequate for simple migrations using simple schemas populated with a relatively small number of entities, you are in for an onerous time if you have a complex data model populated with several hundred (or more) entities. You may experience crashes caused by memory-management issues internal to the Core Data framework (I suspect that it doesn’t periodically re-fault entities as it migrates them, causing them to “leak” during the migration process), and there is no straightforward way to wrest control away from the framework to try and rescue things yourself.

In this sense, having to write some manual SQL statements to update your data model may well be preferable to Core Data. You have full control over the process yourself, and the ability to ensure that the migration is performed in an efficient way that won’t crash the system or cause the app to fail if it is opened with an outdated schema revision. And it’s worth noting that while Android doesn’t provide any explicit ORM framework for developers to use, it also doesn’t prohibit developers from bundling an ORM framework of their choice with their app. As the platform uses a custom Java SDK and runtime, there are quite a few potential candidates available, though sadly not long-time favorites like Hibernate and other similar tools.

So you can have ORM on Android if you want it; you just have to pay the cost of bundling and configuring it yourself. Overall it’s not a bad solution, but it still lacks the convenience of Apple’s approach with Core Data, which is entirely adequate to cover the needs of most developers, most of the time. As such, the iPhone SDK comes out slightly ahead of the Android SDK here.
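
Coming back to the manual-migration route for a moment: on Android the SDK’s own SQLiteOpenHelper class gives you an explicit hook for upgrading a schema by hand, so the process stays entirely under your control. A minimal sketch (the database, table, and column names are made up for illustration):

        import android.content.Context;
        import android.database.sqlite.SQLiteDatabase;
        import android.database.sqlite.SQLiteOpenHelper;
        
        public class NotesOpenHelper extends SQLiteOpenHelper {
                private static final int DB_VERSION = 2;  //bump this whenever the schema changes
                
                public NotesOpenHelper(Context context) {
                        super(context, "notes.db", null, DB_VERSION);
                }
                
                @Override
                public void onCreate(SQLiteDatabase db) {
                        db.execSQL("CREATE TABLE notes (_id INTEGER PRIMARY KEY, title TEXT, body TEXT)");
                }
                
                @Override
                public void onUpgrade(SQLiteDatabase db, int oldVersion, int newVersion) {
                        //you decide exactly how each older schema version is migrated forward
                        if (oldVersion < 2) {
                                db.execSQL("ALTER TABLE notes ADD COLUMN created_at INTEGER");
                        }
                }
        }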

Device Emulators

Another important part of both the iPhone and the Android SDK is their device simulator/emulator software. And here is another area where the iPhone SDK shows its polish. The iPhone emulator is fast, sleek, and looks and feels very much like an actual iPhone. It has an on-screen keyboard that works just like the on-screen keyboard on the actual device, and the same applies to all the standard built-in navigation components as well. The Android emulator, on the other hand, feels and looks a lot more like a window with an Android UI drawn into it. There is no on-screen keyboard; instead you get a kind of sad-looking grid of keys that sits next to the Android window, and the same applies to the navigation buttons. The Android emulator ends up feeling a lot less like a phone, and a lot more like a tool.

But again, much of this is not Google’s fault. Apple has exactly one device to worry about, and they have complete control over and advance knowledge of its capabilities. Furthermore, they benefit more from having a robust simulator, because they are in the business of manufacturing physical handsets far more than Google is or ever was (Nexus One notwithstanding). Android has literally dozens of different devices to worry about, some of which have physical keyboards, some of which do not, and each of which may implement a unique navigation layout/paradigm. In this light some of the emulator’s limitations start to make sense, and I will say that Google has come up with a good solution to this issue: creating a virtual Android device of any desired configuration is quite simple, and works exactly as it should. In the end, while the iPhone emulator certainly feels a bit nicer to use and delivers an experience that more closely mirrors the actual device that it is simulating, both emulators are acceptable when it comes to actually getting the job of testing and debugging done.

One other thing I feel I must say about the Android emulator, unfortunately, is that it is slow. Painfully so. It’s tempting to blame Java for this shortfall, but I’ve been around Java enough to know that well-designed and optimized Java code is very nearly as fast as compiled binaries. The difference in performance between the Android emulator and the iPhone emulator is an order of magnitude greater than any variance that I would credit to the Android emulator being Java-based where the iPhone emulator is a native binary. Part of the gap is architectural (the Android emulator performs full ARM system emulation, while the iPhone emulator simply runs application code compiled natively for the Mac), but as with some of the other Android development tools, I suspect a lack of polish is also to blame, and that the situation will improve in time.

Device Provisioning

Lastly, deploying a development copy of an application to a physical iPhone for testing is a ridiculously circuitous process (at least initially), requiring multiple round-trips between the developer and Apple to generate, configure, and download the certificates and provisioning profiles necessary to cajole an iPhone into accepting a developer’s application. To further complicate matters, these certificates expire periodically and must be refreshed, and they also impose various other limitations on what actions a developer or development team is allowed to take. All told, it is a very developer-unfriendly process, and probably an offshoot of Apple’s “we control everything that happens on our devices” paranoia.

Contrast this with the Android approach, where all you need to install a development copy of an application is a device and a USB cable (and a custom USB driver, if you are developing on Windows). There is simply no contest here, Android’s approach to application deployment/provisioning is vastly superior to Apple’s from the developer’s point of view. That is, unless you happen to like jumping through a series of hoops for no good reason; in which case Apple has you covered.

Conclusion

So which is better: the iPhone SDK or the Android SDK? Neither, really. While the iPhone SDK is absolutely more polished than the Android SDK, both provide usable tools that make building an iPhone or Android application a relatively straightforward process. What the Android SDK lacks in polish it tends to make up for in flexibility, accessibility/developer-friendliness, and a greater availability of open-source third-party libraries, tools, and plugins. So pick your preference, and start coding!

Posted in banter, coding | 4 Comments

[Java] Defeating CAPTCHA Images

Disclaimer: Depending upon the country you currently reside in, programmatically defeating CAPTCHA images may technically be illegal. Whether or not there is any merit behind such a law I leave as a matter for you to work out with your representatives or equivalent lawmaking body. But suffice to say, the information in this post is intended for educational and informative purposes only, and should not be used in any other context. It should also be noted that the CAPTCHA images that are used in this example are quite old, and were cracked by others long ago.

I’ve always been mildly amused at the continually growing use of CAPTCHA images, or more accurately, at their ever-increasing complexity. It seems that the only truly effective CAPTCHAs are the ones that even human beings can barely decipher. But more interesting to me is the fact that these distorted snippets of letters and numbers have become a sort of de facto Turing test. If you can determine what the characters are, then you are human; otherwise you are not. For whatever reason, these images have become a symbolic line in the sand separating man from machine, and by exploring ways to cross this line we may move ever so slightly closer towards the creation of true artificial intelligence.

So let’s examine a very basic CAPTCHA image, one that was used in a popular online-forum package (phpBB2) before it was cracked long ago:

PHPBB2 CAPTCHA

This CAPTCHA works on the principle of contrast. Human beings can discern distinct regions in an otherwise noisy image so long as each distinct region meets some minimum contrast level above/below that of the background noise. This kind of image can be difficult to decipher computationally, because pulling out coherent regions from amongst the background noise requires contextual understanding of large portions of the image at once, which is generally a difficult thing to accomplish programmatically. That isn’t to say it can’t be done, however.

A human being looking at this image is able to recognize that there is some threshold created by the background noise that has been introduced, above which an element is part of the encoded data, and below which an element is simply part of the background noise and should be discarded. Once that is done, discerning the text becomes a simple matter of discarding everything below the noise threshold and keeping everything above it. So let’s see if we can code it. First, we need a way to determine the noise threshold:

        //init, determine the average color intensity of the image
        int average = 0;
        for (int row = 0; row < image.getHeight(); row++) {
                for (int column = 0; column < image.getWidth(); column++) {
                        int color = image.getRGB(column, row) & 0x000000FF;  //only need the last 8 bits
                        average += color;
                }
        }
        average /= image.getWidth() * image.getHeight();
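
As an aside: this particular CAPTCHA is grayscale, so a single color channel is all the averaging needs to look at. For a colorized CAPTCHA you could run the same averaging over an intensity computed from all three channels; a hypothetical helper (the 0.299/0.587/0.114 weights are the standard luma coefficients, not something taken from the original code):

        //convert a full RGB pixel value into a single 0-255 intensity
        private static int intensityOf(int rgb) {
                int red = (rgb >> 16) & 0xFF;
                int green = (rgb >> 8) & 0xFF;
                int blue = rgb & 0xFF;
                return (int) Math.round(0.299 * red + 0.587 * green + 0.114 * blue);
        }

Every place the snippets below read the blue channel directly, they would instead call intensityOf() on the full pixel value.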

This averaging loop determines the average color intensity of the entire image (216 / 255 in this case); as noted above, the grayscale input means a single color component is enough, while a colorized CAPTCHA would use something like the luminance helper instead. In any case, now we have a basic threshold that we can use for determining which parts of the image contain valuable data, and which parts contain only noise. We can do that like so:

        //first pass, mark all pixels as WHITE or BLACK
        for (int row = 0; row < image.getHeight(); row++) {
                for (int column = 0; column < image.getWidth(); column++) {
                        int color = image.getRGB(column, row) & 0x000000FF;  //only need the last 8 bits
                        if (color <= average * .70 ) {
                                image.setRGB(column, row, BLACK);
                                darkRegion = true;
                        }
                        else if (color < .85 * average && darkRegion && row < image.getHeight() - 1 
                                && (image.getRGB(column, row + 1) & 0x000000FF) < .85 * average) {
                                image.setRGB(column, row, BLACK);
                        }
                        else if (color < .85 * average && ! darkRegion && row < image.getHeight() - 1 && column > 0 
                                && column < image.getWidth() - 1 
                                &&  (((image.getRGB(column, row + 1) & 0x000000FF) < color) 
                                        || ((image.getRGB(column + 1, row) & 0x000000FF) < color) 
                                        || ((image.getRGB(column - 1, row) & 0x000000FF) < color))) {
                                image.setRGB(column, row, BLACK);
                                darkRegion = true;
                        }
                        else {
                                image.setRGB(column, row, WHITE);
                                darkRegion = false;
                        }
                }
        }

Note that this code assumes that darker pixels are part of the data and lighter pixels are part of the background noise, because that is how the input CAPTCHA is set up. A smarter approach would be to look at the number of pixels falling above the noise threshold and the number of pixels falling below, and then keep whichever group is smaller. For a CAPTCHA like this one to be effective, there must be more noise than data, so it follows that the data that you’re looking for will always be in the smaller group of pixels.
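
That adaptive variant might look something like the following; it reuses the same threshold as the first pass, and is only a sketch of the idea rather than code from the original project:

        //count how many pixels fall on each side of the threshold; the smaller
        //group is assumed to be the data, since an effective CAPTCHA of this
        //style must contain more noise than data
        int darkCount = 0;
        int lightCount = 0;
        for (int row = 0; row < image.getHeight(); row++) {
                for (int column = 0; column < image.getWidth(); column++) {
                        int color = image.getRGB(column, row) & 0x000000FF;
                        if (color <= average * .70) {
                                darkCount++;
                        }
                        else {
                                lightCount++;
                        }
                }
        }
        boolean dataIsDark = darkCount <= lightCount;

The first pass would then keep dark pixels when dataIsDark is true and light pixels otherwise, instead of hard-coding the assumption.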

In any case, what the first-pass code does is traverse the image and turn any pixel that appears to be noise white, and any pixel that appears to be data black. Note that it includes some rudimentary region-detection logic, owing to the fact that we expect our data pixels to be tightly clustered together in distinct regions. So when the code encounters a pixel that it considers to be part of the data, it also lowers the selection criteria for the next pixel, because there is a strong possibility that the next pixel will also be data. This helps prevent false negatives from erroneously dropping out valuable pieces of data. Let’s take a peek at what our CAPTCHA image looks like at this point:

PHPBB2 CAPTCHA, after first pass

It’s not perfect, but it is definitely improved. We have successfully removed all of the background noise from the image, but unfortunately we have also removed some pieces of the actual data. The data that is left is all in the right place, however, so perhaps we can amplify and/or reconstruct it:

                //second pass, eliminate horizontal gaps
                for (int row = 0; row < image.getHeight(); row++) {
                        for (int column = 0; column < image.getWidth(); column++) {
                                int color = image.getRGB(column, row) & 0x000000FF;  //only need the last 8 bits
                                if (color == 255) {
                                        consecutiveWhite++;
                                }
                                else {
                                        if (consecutiveWhite < 3 && column > consecutiveWhite) {  
                                                for (int col = column - consecutiveWhite; col < column; col++) {
                                                        image.setRGB(col, row, BLACK);
                                                }
                                        }
                                        consecutiveWhite = 0;
                                }
                        }
                }
                consecutiveWhite = 0;
                
                //third pass, eliminate vertical gaps
                for (int column = 0; column < image.getWidth(); column++) {
                        for (int row = 0; row < image.getHeight(); row++) {
                                int color = image.getRGB(column, row) & 0x000000FF;  //only need the last 8 bits
                                if (color == 255) {
                                        consecutiveWhite++;
                                }
                                else {
                                        if (consecutiveWhite < 2 && row > consecutiveWhite) {
                                                for (int r = row - consecutiveWhite; r < row; r++) {
                                                        image.setRGB(column, r, BLACK);
                                                }
                                        }
                                        consecutiveWhite = 0;
                                }
                        }
                }

This code fills in any small vertical and horizontal runs of white pixels with black pixels, the rationale being that any small group of white pixels that is surrounded on either end by black pixels is virtually guaranteed to be part of the data that was erroneously discarded. Again we can take a peek at our result:

PHPBB2 CAPTCHA, after third pass

Getting better, but we’re not quite there yet. Our characters are much more distinct, but there is still some missing data. A fair bit of the missing data is now contained in small regions of white pixels that are actually encapsulated within our characters. Filling them in is a relatively simple matter:

                //fourth pass, attempt to fill regions
                for (int row = 0; row < image.getHeight(); row++) {
                        for (int column = 0; column < image.getWidth(); column++) {
                                if (image.getRGB(column, row) == WHITE) {
                                        int height = countVerticalWhite(image, column, row);
                                        int width = countHorizontalWhite(image, column, row);
                                        int area = width * height;
                                        if ((area <= 12) || (width == 1) || (height == 1)){
                                                image.setRGB(column, row, BLACK);
                                        }
                                }
                        }
                }
                
                //fifth pass repeats the fourth
                for (int row = 0; row < image.getHeight(); row++) {
                        for (int column = 0; column < image.getWidth(); column++) {
                                if (image.getRGB(column, row) == WHITE) {
                                        int height = countVerticalWhite(image, column, row);
                                        int width = countHorizontalWhite(image, column, row);
                                        int area = width * height;
                                        if ((area <= 12) || (width == 1) || (height == 1)){
                                                image.setRGB(column, row, BLACK);
                                        }
                                }
                        }
                }
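
Both of these passes lean on countVerticalWhite and countHorizontalWhite, which aren’t shown in the post. Based on how they are used, they presumably look something like this (a sketch only; the actual implementations are in the downloadable project):

                //count consecutive white pixels downward from (column, row)
                private static int countVerticalWhite(BufferedImage image, int column, int row) {
                        int count = 0;
                        while (row + count < image.getHeight() 
                                && (image.getRGB(column, row + count) & 0x000000FF) == 0xFF) {
                                count++;
                        }
                        return count;
                }
                
                //count consecutive white pixels rightward from (column, row)
                private static int countHorizontalWhite(BufferedImage image, int column, int row) {
                        int count = 0;
                        while (column + count < image.getWidth() 
                                && (image.getRGB(column + count, row) & 0x000000FF) == 0xFF) {
                                count++;
                        }
                        return count;
                }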

In these two passes we check, for each white pixel, how many adjacent white pixels exist both vertically and horizontally. This gives us a rough estimate of the size of the current region of white pixels. If the size is too small, then the code assumes that the white pixel is actually supposed to be part of the data, and turns it black. Note that the algorithm is methodical in its approach: when it detects a small region of white pixels, it toggles only the initial pixel that it tested in that region. This toggling reduces the region size reported for any adjacent white pixels, increasing the likelihood that they will be toggled as well on the next iteration, which is why two passes of the same algorithm are applied. And yes, I know having the same code repeated twice is poor coding style, but for illustrative purposes it gets the job done. Anyways, we now have:

PHPBB2 CAPTCHA, after fifth pass

Many of the gaps are now filled in, and the text is starting to look fairly legible. There are now, however, a few spurious black pixels that have cropped up along the edges of the characters. We could go back and refine the previous step, but instead let’s just prune out these outliers:

                //sixth pass, clear any false-positive
                for (int row = 0; row < image.getHeight(); row++) {
                        for (int column = 0; column < image.getWidth(); column++) {
                                if (image.getRGB(column, row) != WHITE) {
                                        if (countBlackNeighbors(image, column, row) < 3) {
                                                image.setRGB(column, row, WHITE);
                                        }
                                }
                        }
                }
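
The countBlackNeighbors helper is also elided; a plausible version simply counts how many of the eight surrounding pixels are black (again, a guess based on usage rather than the project’s actual code):

                //count how many of the 8 pixels surrounding (column, row) are black
                private static int countBlackNeighbors(BufferedImage image, int column, int row) {
                        int count = 0;
                        for (int dy = -1; dy <= 1; dy++) {
                                for (int dx = -1; dx <= 1; dx++) {
                                        if (dx == 0 && dy == 0) {
                                                continue;  //skip the pixel itself
                                        }
                                        int x = column + dx;
                                        int y = row + dy;
                                        if (x >= 0 && x < image.getWidth() && y >= 0 && y < image.getHeight() 
                                                && (image.getRGB(x, y) & 0x000000FF) == 0x00) {
                                                count++;
                                        }
                                }
                        }
                        return count;
                }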

This pruning step removes any black pixel that has fewer than three black neighbors. This is a fairly strict threshold, and will have the effect of smoothing/rounding out corners (i.e. some legitimate data will be discarded), but it will also clear out any spurious black pixels that exist in the image. Now our image looks like so:

PHPBB2 CAPTCHA, after sixth pass

The letters have taken on a softer, more rounded quality. They also happen to look vaguely reminiscent of what you might get if you were to scan a text document using an older scanner. Which is worth mentioning because we will eventually be feeding our cleaned-up CAPTCHA image to an optical-character-recognition program that is designed to process just this sort of data. First, however, our characters are all misaligned. We’ve come this far, so we might as well fix the alignment issue while we’re at it:

                //now find the characters
                List<CharacterBox> characters = new ArrayList<CharacterBox>();
                int totalCharWidth = 10;
                int maxCharHeight = 0;
                for (int column = 0; column < image.getWidth(); column++) {
                        int highestBlack = countVerticalWhite(image, column, 0);
                        if (highestBlack < image.getHeight()) {
                                totalCharWidth += 5; //5 px spacing in between chars
                                CharacterBox box = new CharacterBox();
                                box.setX(column);
                                while (column < image.getWidth() && countVerticalWhite(image, column, 0) < image.getHeight()) {
                                        int currentBlack = countVerticalWhite(image, column, 0);
                                        if (currentBlack < highestBlack) {
                                                highestBlack = currentBlack;
                                        }
                                        column++;
                                }
                                box.setWidth(column - box.getX());
                                box.setY(highestBlack - 5);
                                box.setHeight(image.getHeight() - highestBlack + 5); //can trim this later
                                if (box.getHeight() > maxCharHeight) {
                                        maxCharHeight = box.getHeight();
                                }
                                totalCharWidth += box.getWidth();
                                characters.add(box);
                        }
                }
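
CharacterBox isn’t shown either; judging from the accessors used above it is just a small value class along these lines (the real one ships with the downloadable project):

                //the bounding box of a single character within the image
                public class CharacterBox {
                        private int x;
                        private int y;
                        private int width;
                        private int height;
                        
                        public int getX() { return x; }
                        public void setX(int x) { this.x = x; }
                        public int getY() { return y; }
                        public void setY(int y) { this.y = y; }
                        public int getWidth() { return width; }
                        public void setWidth(int width) { this.width = width; }
                        public int getHeight() { return height; }
                        public void setHeight(int height) { this.height = height; }
                }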

In the character-finding loop above we simply compute a bounding box for each distinct region of black pixels (i.e. each character), plus some additional padding so that our output image will draw nicely. Speaking of the output image, we can now create it by positioning our characters in correct alignment with each other in a new image, like so:

                //output a new image with aligned characters
                BufferedImage dst = new BufferedImage (totalCharWidth, maxCharHeight,
                                                           BufferedImage.TYPE_INT_BGR);
                for (int column = 0; column < dst.getWidth(); column++) {
                        for (int row = 0; row < dst.getHeight(); row++) {
                                dst.setRGB(column, row, WHITE);
                        }
                }
                int xPos = 5;
                int yPos = 0;
                for (CharacterBox box : characters) {
                        for (int oldY = box.getY(); oldY < box.getY() + box.getHeight(); oldY++) {
                                for (int oldX = box.getX(); oldX < box.getX() + box.getWidth(); oldX++) {
                                        dst.setRGB(xPos + (oldX - box.getX()), yPos + (oldY - box.getY()), image.getRGB(oldX, oldY));
                                }
                        }
                        xPos += box.getWidth() + 5;
                }
                ImageIO.write(dst, "png", new File(OUTPUT));

Now we have the following:

PHPBB2 CAPTCHA, fully processed

The characters are nicely aligned and uniformly spaced. We now have something that is suitable for sending into a character-recognition program. For this example we use tesseract, a free and open-source OCR program that provides a good level of accuracy. We can send our output to tesseract like so:

                Process tesseractProc = Runtime.getRuntime().exec(TESSERACT_BIN + " " + OUTPUT + " " + TESSERACT_OUTPUT);
                tesseractProc.waitFor();

This invokes tesseract on our output image, and it writes its results to a text file located at ‘TESSERACT_OUTPUT‘. In this case, the text file contains the following:

IKEECL

…which is 100% correct.

Using a handful of very simple image-filtering loops, modeled on how a human being would approach the image, plus some existing OCR software, the CAPTCHA has been defeated. Of course, this only works for this one specific style of CAPTCHA, but the basic approach of reducing noise, amplifying data, and isolating characters should be broadly applicable to a wide range of different CAPTCHA styles. The challenge lies not in breaking any single CAPTCHA, but in devising an algorithm that can attempt to break any number of different CAPTCHA styles dynamically and with a success rate comparable to that of a human being. Such an algorithm needs a way to determine, from the CAPTCHA image itself, what kind of noise exists and how it can best be removed. That is the real challenge, and it’s beyond the scope of this article.

Note that for the sake of preserving some sense of brevity I’ve left out the real implementations of some minor utility functions, variable declarations, and the like (the sketches above are only approximations based on how those helpers are used). In general, you can assume that a function (or variable) does what its name implies. If, however, you would like a complete copy of the source-code used, you can download it using this link (zipped Eclipse project).

Note that in order to get it to run you will also need to install tesseract on your system, and edit the values at the start of the Java code to point at your local tesseract installation.

Posted in coding, java | 30 Comments