The Amazing Adventures of NSArray

Let's start with something absurd.

[[NSArray alloc] isKindOfClass:[NSMutableArray class]];
[[NSMutableArray alloc] isKindOfClass:[NSArray class]];

It turns out that both of these lines are true. Feel free to try it out but have a bucket ready nearby in case your brain starts leaking; it's more than kindOfCrazy.

Peeking into Foundation

Compared to many other languages, Objective-C is just plain funky. The syntax is actually quite small, which leads to a lot of patterns being implemented via trickery (reread those two lines above). But by reading, poking, prodding, probing, and exploring the classes that Apple has written, it is easy to gain insight into some of the more common patterns as well as some of the tricks that at first seem quite bizarre.

Looking at just NSArray, NSNumber, NSString, NSDictionary, and NSObject (or Foundation in general) will reveal tons of tricks and patterns that are intriguing, educational, and a healthy dose of fun. For this post we'll focus on the insanity that lies within NSArray. Here's some of what will be explored:

  • Literal Syntax
  • Subscripting
  • Class Clusters
  • Toll-free Bridging
  • Indexed Ivars

Literals

Literals are the easiest "magic" surrounding NSArray to digest. They're relatively new to the language, added in Clang 3.1 (shipped by Apple as Apple LLVM 4.0). The syntax is quite simple:

NSArray *a = @[ foo, bar, baz ];

and is just a literal translation (pun intended) into the following array construction. (Clang documents the expansion as a call to +[NSArray arrayWithObjects:count:].)

id objects[] = { foo, bar, baz };
NSArray *a = [NSArray arrayWithObjects:objects count:3];

Unfortunately there are no hooks to tap into such syntax. It's only available for NSArray and NSDictionary and so there isn't much more to do with it.
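The dictionary flavor works the same way. Here is a small sketch (the function name is made up for illustration, and ARC is assumed): Clang's documentation describes @{ ... } as expanding to +[NSDictionary dictionaryWithObjects:forKeys:count:], so the two dictionaries built below should compare equal.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds the same dictionary twice, once with the
// literal syntax and once with the call the literal is documented to expand to.
static BOOL DictionaryLiteralMatchesExplicitCall(void) {
    NSDictionary *literal = @{ @"one" : @1, @"two" : @2 };

    // Keys and objects are passed as two parallel C arrays.
    id keys[] = { @"one", @"two" };
    id objects[] = { @1, @2 };
    NSDictionary *spelledOut = [NSDictionary dictionaryWithObjects:objects
                                                           forKeys:keys
                                                             count:2];

    return [literal isEqualToDictionary:spelledOut];
}
```

Note that in the literal the key comes first, while in the method call the objects come first.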

Subscripting

The same Clang release also added support for using square brackets for subscripting.

IndexedContainer *ic = ...;
ic[3] = @"foo";
id obj0 = ic[5];

KeyedContainer *kc = ...;
kc[@"qux"] = @"bar";
id obj1 = kc[@"bazzle"];

It's super simple: the compiler translates each expression into a message send, chosen by whether the subscript is an integral type (indexed) or an object (keyed).

IndexedContainer *ic = ...;
[ic setObject:@"foo" atIndexedSubscript:3];
id obj0 = [ic objectAtIndexedSubscript:5];

KeyedContainer *kc = ...;
[kc setObject:@"bar" forKeyedSubscript:@"qux"];
id obj1 = [kc objectForKeyedSubscript:@"bazzle"];

This code is equivalent to the previous block; the subscripted version is just easier to read. Thankfully NSArray and NSDictionary implement these methods as we'd expect and suddenly things feel just a little nicer.
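Unlike literals, subscripting is open to any class: the compiler only cares that the methods exist. Here is a minimal sketch of what the hypothetical IndexedContainer used above might look like, assuming ARC. The NSNull padding is an invented design choice for this example, not something Foundation does.

```objc
#import <Foundation/Foundation.h>

// Hypothetical container. Declaring these two methods is all it takes to
// make `ic[3]` compile; no protocol or special base class is required.
@interface IndexedContainer : NSObject
- (id)objectAtIndexedSubscript:(NSUInteger)index;
- (void)setObject:(id)object atIndexedSubscript:(NSUInteger)index;
@end

@implementation IndexedContainer {
    NSMutableArray *_storage;
}

- (instancetype)init {
    if ((self = [super init])) {
        _storage = [[NSMutableArray alloc] init];
    }
    return self;
}

// Called for reads such as `id obj = ic[5];`.
// Out-of-range reads return nil instead of raising.
- (id)objectAtIndexedSubscript:(NSUInteger)index {
    return index < _storage.count ? _storage[index] : nil;
}

// Called for writes such as `ic[3] = @"foo";`. `object` must not be nil.
- (void)setObject:(id)object atIndexedSubscript:(NSUInteger)index {
    // Pad with NSNull so that writing past the end succeeds.
    while (_storage.count <= index) {
        [_storage addObject:[NSNull null]];
    }
    _storage[index] = object;
}

@end
```

With this in place, ic[3] = @"foo" on a fresh container stores the string at index 3 and leaves NSNull in slots 0 through 2. A keyed container is the same idea with objectForKeyedSubscript: and setObject:forKeyedSubscript:.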

Inspecting the Memory of a Constant Array

Let's construct a simple, constant array. (Note that everything that follows is being compiled and run on a 64-bit architecture - it doesn't matter if it's iOS or OS X).

NSArray *array = @[ @"one", @"two", @"three" ];

The object that we get is really just a pointer to a region of memory (note the asterisk). Typically we acknowledge the pointer-ness of Objective-C objects as nothing more than an implementation detail, but it is fun to wonder what the memory region that it points to looks like. Let's start by finding out how large the memory region that we are looking at is.

#import <objc/runtime.h>

NSLog(@"%lu", class_getInstanceSize([array class]));

> 16

Alright, so there are 16 bytes in there. The first 8 are the isa pointer, as they are for every object, so that leaves only 8 more bytes to actually look at. Within those 8 bytes we hope to find some semblance of the structure of an array (such as a length and an actual id*, or something similar).

Let's look at the 8 bytes following the isa pointer and see if those look like a pointer, perhaps to some internal data structure or just a simple buffer that holds the contents. (If you aren't sure why the bridge-cast was needed it might be worth reading my writeup on Objective-C ARC.)

char *bytes = (char *)(__bridge void *)array;
    
NSLog(@"%llx", *(uint64_t *)(bytes + 8));

> 3

That looks promising. The length of the array we used was 3, but this might just be a coincidence. Let's try a few more lengths.

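// Keep each array in a variable so ARC doesn't release it before we read its memory.
NSArray *arrayLen4 = @[ @1, @2, @3, @4 ];
char *bytesArrayLen4 = (char *)(__bridge void *)arrayLen4;
NSLog(@"%llx", *(uint64_t *)(bytesArrayLen4 + 8));

NSArray *arrayLen7 = @[ @1, @2, @3, @4, @5, @6, @7 ];
char *bytesArrayLen7 = (char *)(__bridge void *)arrayLen7;
NSLog(@"%llx", *(uint64_t *)(bytesArrayLen7 + 8));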

> 4
> 7

Sweet, we found the length! But it's possible that the length is not actually stored in all 8 bytes and is really only in the first 4. So let's see what happens if we look at bytes 12-15 as a 4-byte integer.

NSLog(@"%x", *(uint32_t *)(bytes + 12));

> 0

Well darn. Either those bytes are not necessarily in use, represent an all-zero set of flags, or the length really is stored in a 64-bit quantity.

Hunting for the Rest of the Object

Regardless of what's going on, we have found a serious problem. The 16 bytes available contain the isa, the length, and not much more. Where on Earth are the actual elements?!

Perhaps there is a global map somewhere that maps array addresses to array contents. However if that is the case then I am just going to retire and give up on this insanity. Luckily that isn't the case. If you recall, we used class_getInstanceSize to find the size of the instance, but what if somehow the chunk of memory that we are looking at is actually bigger? Let's look at how big it is instead of how big we expect it to be.

#import <malloc/malloc.h>

NSLog(@"%lu", malloc_size(bytes));

> 48

Ummm what? Instances of this class are only supposed to be 16 bytes. Where did the other 32 come from? Let's skip that question for now and look inside the next 32 bytes that we apparently have.

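// Each slot holds an 8-byte object pointer, so read it as a pointer and bridge it back to id.
NSLog(@"%@", (__bridge id)*(void **)(bytes + 16));
NSLog(@"%@", (__bridge id)*(void **)(bytes + 24));
NSLog(@"%@", (__bridge id)*(void **)(bytes + 32));
NSLog(@"%@", (__bridge id)*(void **)(bytes + 40));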

> one
> two
> three
> (null)

We found the contents. How fascinating, they are actually stored directly inline within the memory region that the array variable is pointing to. It's super easy to check that making an array of length 4 occupies that last slot. An array of length 5 or 6 will have a region of memory that is 64 bytes long, 7 and 8 is 80 bytes, and so on. The malloc-size is the smallest multiple of 16 that is large enough to hold the 8-byte isa, the 8-byte length, and then a pointer to each object in the array.
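That sizing rule is easy to write down. The function below is just the arithmetic from the paragraph above, assuming 8-byte pointers and an allocator that rounds small blocks up to a multiple of 16 (which is what we observed, not something Foundation promises). The function name is made up for illustration.

```objc
#import <Foundation/Foundation.h>

// Predicted malloc_size for an inline array holding `count` objects.
// Assumptions (from the observations above, not from any documentation):
// 64-bit pointers and malloc blocks rounded up to a multiple of 16 bytes.
static size_t ExpectedArrayBlockSize(size_t count) {
    const size_t header = 8 /* isa */ + 8 /* length */;
    const size_t needed = header + count * 8 /* one pointer per element */;
    const size_t quantum = 16;
    // Round `needed` up to the next multiple of `quantum`.
    return (needed + quantum - 1) / quantum * quantum;
}
```

Plugging in the lengths from above gives 48 for 3 and 4 elements, 64 for 5 and 6, and 80 for 7 and 8, matching what malloc_size reported.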

Here is a struct that has equivalent layout to a constant NSArray.

struct MyArray {
  Class isa;
  uint64_t length;
  id object0;
  id object1;
  id object2;
  ...
};

 

Allocating the Memory for an Instance

Now that we understand the layout we need to figure out how the memory became that large in the first place. Whoever allocated the memory clearly didn't just use class_getInstanceSize. There was a chunk of bytes added on at the end. Time to go crawl through <objc/runtime.h>, my favorite header. 

There appears to be only one function for creating an instance of a class.

id class_createInstance(Class cls, size_t extraBytes)

And it allows you to specify a number of bytes to throw onto the end. Setting a symbolic breakpoint on this function in LLDB shows that the extraBytes parameter is always n*sizeof(id) and the mystery is solved. There is also a function for getting the address of the chunk of extra bytes which we can use to see exactly what we'd expect at this point.

Compiled with MRR (-fno-objc-arc)

// object_getIndexedIvars is not available in ARC.

NSArray *array = @[ @"one", @"two", @"three" ];

void *indexedIvars = object_getIndexedIvars(array);

NSLog(@"%@", *(id *)(indexedIvars + 0));
NSLog(@"%@", *(id *)(indexedIvars + 8));
NSLog(@"%@", *(id *)(indexedIvars + 16));

> one
> two
> three
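Nothing stops us from playing the same game with a class of our own. Below is a rough sketch of a hypothetical InlineArray that asks class_createInstance for extra bytes and keeps its elements in the indexed ivars. It has to be compiled with MRR (-fno-objc-arc) like the snippet above, and it only illustrates the technique; it is not a reconstruction of what Foundation does internally.

```objc
#import <Foundation/Foundation.h>
#import <objc/runtime.h>

// Hypothetical class. Build with -fno-objc-arc.
@interface InlineArray : NSObject
// "new" prefix: the caller owns the result and must release it.
+ (instancetype)newWithObjects:(const id [])objects count:(NSUInteger)count;
- (NSUInteger)count;
- (id)objectAtIndex:(NSUInteger)index;
@end

@implementation InlineArray {
    NSUInteger _count;   // the only declared ivar; elements live after it
}

+ (instancetype)newWithObjects:(const id [])objects count:(NSUInteger)count {
    // Ask for room for `count` object pointers past the declared ivars.
    InlineArray *array = class_createInstance(self, count * sizeof(id));
    id *slots = (id *)object_getIndexedIvars(array);
    array->_count = count;
    for (NSUInteger i = 0; i < count; i++) {
        slots[i] = [objects[i] retain];
    }
    return array;
}

- (NSUInteger)count {
    return _count;
}

- (id)objectAtIndex:(NSUInteger)index {
    NSParameterAssert(index < _count);
    id *slots = (id *)object_getIndexedIvars(self);
    return slots[index];
}

- (void)dealloc {
    // Balance the retains taken in +newWithObjects:count:.
    id *slots = (id *)object_getIndexedIvars(self);
    for (NSUInteger i = 0; i < _count; i++) {
        [slots[i] release];
    }
    [super dealloc];
}

@end
```

Creating one with three strings and calling malloc_size on it is a nice way to check the size arithmetic from earlier against a class whose layout we fully control.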

Suddenly the world is starting to make sense and we have unraveled the memory layout of a constant NSArray. It might seem like there are no surprises left but think carefully about when class_getInstanceSize could be called.

Let's use alloc-init directly instead of the container literal syntax so that we can get a grasp of what is happening.

id objects[] = { @"one", @"two", @"three" };
NSArray *a = [[NSArray alloc] initWithObjects:objects count:3];

The allocation, a call to calloc using the size returned from class_getInstanceSize, would be expected to occur in the call to alloc in the above snippet. But how could alloc possibly know what size to allocate? The information about how big the array is doesn't appear until the init instance method is called on the already allocated region.

It is indeed possible that the init method releases the memory and allocates the correct size (see the discussion on init consuming self to understand why this is semantically acceptable) except that doing so would be incredibly wasteful.

To start with, let's inspect the object returned from the call to alloc. We have to return to MRR again because ARC will vomit if you call alloc but not init.

Compiled with MRR (-fno-objc-arc)

NSArray *array = [NSArray alloc];
NSLog(@"%lu", malloc_size(array));

id objects[] = { @"one", @"two", @"three" };
array = [array initWithObjects:objects count:3];

NSLog(@"%lu", malloc_size(array));

> 16
> 48

There's no arguing that. Clearly a new piece of memory is being returned from the init method. How wasteful... :'( Although perhaps we could return something intelligent from alloc so that we don't have to waste it.

Class Clusters and Returning a Factory from Alloc

A sneaky thing to do would be to implement the alloc method to return a factory (which could easily be a global singleton if we wanted it to). Then implement the init methods on that factory to return new objects, allocated exactly as you need. This is exactly what NSArray does. Every time you call alloc on the NSArray class it returns a global instance of NSPlaceholderArray, the magical array factory.

This trick of returning a factory from alloc is actually quite common. It allows you to defer any allocation until the init method when you have all the information to construct exactly what you need. This allows such tricks as appending extra bytes to the instance or choosing a specific subclass to allocate.

The pattern of using a factory to instantiate a specific subclass is called a class cluster. The interface of an abstract class is used but in such a way that you could be dealing with any specific subclass without having to worry about it.

  • NSString is a cluster that creates instances best suited for the underlying characters (such as ASCII or UTF8).
  • NSNumber is a cluster that returns global singletons (such as for @YES and @NO), tagged pointers, or CFNumbers (good ol' toll-free bridging).
  • NSDictionary and the other collection classes also serve as clusters.
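To make the pattern concrete, here is a small sketch of a homemade cluster, assuming ARC. Every name in it (Shape, Triangle, Polygon, ShapePlaceholder) is invented for illustration, and the real Foundation placeholders are certainly more involved. Allocating the abstract class hands back a shared placeholder, and the placeholder's init method decides which concrete subclass to create.

```objc
#import <Foundation/Foundation.h>

// Abstract public interface of the hypothetical cluster.
@interface Shape : NSObject
@property (nonatomic, readonly) NSUInteger sides;
- (instancetype)initWithSides:(NSUInteger)sides;
@end

// Concrete subclasses that callers never name directly.
@interface Triangle : Shape
@end

@interface Polygon : Shape
@end

// The factory: the only thing +[Shape alloc] ever returns.
@interface ShapePlaceholder : Shape
@end

@implementation Shape

+ (instancetype)allocWithZone:(struct _NSZone *)zone {
    if (self == [Shape class]) {
        // Allocating the abstract class yields one shared placeholder.
        static ShapePlaceholder *placeholder;
        static dispatch_once_t onceToken;
        dispatch_once(&onceToken, ^{
            placeholder = [ShapePlaceholder allocWithZone:zone];
        });
        return placeholder;
    }
    // Subclasses (including the placeholder itself) allocate normally.
    return [super allocWithZone:zone];
}

- (instancetype)initWithSides:(NSUInteger)sides {
    if ((self = [super init])) {
        _sides = sides;
    }
    return self;
}

@end

@implementation Triangle
@end

@implementation Polygon
@end

@implementation ShapePlaceholder

// `self` is the shared placeholder. Only now, with the arguments in hand,
// do we pick a concrete class and allocate the real object.
- (instancetype)initWithSides:(NSUInteger)sides {
    Class concrete = (sides == 3) ? [Triangle class] : [Polygon class];
    return [[concrete alloc] initWithSides:sides];
}

@end
```

Calling [[Shape alloc] initWithSides:3] returns a Triangle even though the caller only ever mentioned Shape, and [Shape alloc] returns the same placeholder pointer every time. A real cluster could use the same hook to request extra bytes for inline storage.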

One of the primary concepts of a cluster is that you never actually have an instance of the abstract base class, but instead of some private subclass. In the case of the example that has been used so far, it is __NSArrayI, the "in-place" NSArray subclass.

NSLog(@"%@", [@[ @"one", @"two", @"three" ] class]);

> __NSArrayI

There are actually more than 20 subclasses of NSArray in Foundation alone. That's an impressive cluster. :D

NSPlaceholderArray

Now that we know that NSArray's alloc method returns a factory, it's worthwhile to examine the object in a little more detail. The object clearly has all of the init methods (since it uses them to create instances) but what exactly is it?

Well, it definitely should be an NSArray. Returning something from alloc that wasn't one sounds completely nonsensical although only in the most pedantic of reasoning. Thus it's fair to assume that NSPlaceholderArray is a subclass of NSArray, but this causes some subtle trickery.

It turns out that for all the same reasons explained so far the alloc method on NSMutableArray also returns an NSPlaceholderArray singleton (not the same one though). This means that NSPlaceholderArray must also be a subclass of NSMutableArray, and it is. The inheritance hierarchy is thus:

NSArray ← NSMutableArray ← NSPlaceholderArray
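You don't have to take my word for the hierarchy. A few lines of runtime code will print the superclass chain of any class; the helper name here is made up for illustration.

```objc
#import <Foundation/Foundation.h>
#import <objc/runtime.h>

// Walks from `cls` up to the root class, collecting class names along the way.
static NSArray *SuperclassChain(Class cls) {
    NSMutableArray *names = [NSMutableArray array];
    for (Class current = cls; current != Nil; current = class_getSuperclass(current)) {
        [names addObject:NSStringFromClass(current)];
    }
    return names;
}
```

For a public class such as NSMutableArray this yields NSMutableArray, NSArray, NSObject. Passing object_getClass([NSArray alloc]) instead shows where the placeholder sits, although the exact private class names depend on the OS version.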

Which leads to the silliness of both of these lines being true:

[[NSArray alloc] isKindOfClass:[NSMutableArray class]];
[[NSMutableArray alloc] isKindOfClass:[NSArray class]];

And so are both of these lines:

[[NSNumber alloc] isKindOfClass:[NSValue class]];
[[NSValue alloc] isKindOfClass:[NSNumber class]];

Since NSNumber and NSValue are both class clusters that return a factory from alloc and have the inheritance hierarchy:

NSValue ← NSNumber ← NSPlaceholderValue ← NSPlaceholderNumber

If you think carefully about this you'll see that the object returned from NSValue's alloc method is an instance of a subclass of NSNumber. So, can we get NSPlaceholderValue to vend an NSNumber for us?

NSLog(@"%@", [(id)[NSValue alloc] initWithInt:8]);

> *** Terminating app due to uncaught exception 'NSInvalidArgumentException',
reason: '*** initialization method -initWithInt: cannot be sent to an abstract
object of class NSPlaceholderValue: Create a concrete instance!'

Darn, it's too smart for us.

The Adventure Ends

Hopefully you enjoyed this somewhat unusual look into the internals of an Objective-C class. Looking into Apple's classes reveals a plethora of common and uncommon tricks. Learning about them is not only fun but helps understand how impressive and expressive the language actually is. If any of this actually changes any of the code that you are writing then I would love to know what on Earth you are doing. But at least now you know.