Posts Tagged performance
Improving application performance: Object Pool (C#) with automatic object collection
Introduction:
As you know, creating and destroying objects is a costly operation, in terms of performance. In C++ the allocation of objects is not cheap (first the application looking for available memory, validating it and only then allocating). In .NET the allocation of objects is actually very fast, due to GC memory optimizations in each collection – but this is exactly the reason why destroying objects is expensive: it makes pressure on Garbage Collector, making it work harder and thus affecting application performance.
The trick of object pool is to pay for creation and destruction of the object only once: after the object created (on application startup or on demand) it not destroyed, but cashed for the future use. This way the object is reused multiple times during the whole application lifetime.
The object pool can do very noticeable boost to application performance, especially when the creating of object is time consuming operation, and when the rate of objects creation is high.
The classic examples for object pools are ThreadPool (.NET) and connection pool: both creation of thread or DB connection is time and resource consuming and object pool used here to improve performance. By the way, talking about ThreadPool – on application startup 500 (!!!) threads created to be used during application lifetime.
Design and Fundamental Ideas
The object pool is nothing more than a collection of objects. On application startup we creating objects and storing them inside that collection. If someone requesting object – we removing it from collection and passing it out; when the object is being returned, we just adding it back to the collection. Now, the big question is: who responsible to return object back to collection?
I am not asking about who getting object from collection, because it is easy: the one who creating these objects, having object pool and instead of creating object using ‘new’ keyword, asking pool to get it. However, the one who receiving object doesn’t know (it shouldn’t know!) where this object belongs to. From architectural point of view, the receivers of the object (could be third party code) should not know the origin of the object. Allowing them this knowledge will introduce unnecessary code dependencies. Consider the following example: you generating the same type of object in 20 different places (creators). All created objects are passed to one single place (receiver), maybe for some kind of processing. After the object has been processed, it has to be returned back to object pool. The receiver doesn’t know which object pool (if any!) is an owner of this object. So who does know where this object belongs to? The answer is the object itself.
When the object created first time, he can receive information about the creator, and use this information to return back to home, when necessary. How we can know if it necessary? Very easy – when no one holds reference to that object and we want to ‘destroy’ it – instead of being destroyed it simply returned back to object pool. Here I use the idea of Object Resurrection.
The Object Resurrection
The idea of object resurrection described the best by Jeffrey Richter in his book ‘CLR via C#’. In short, when Garbage Collector thinks it is a time to destroy an object and calling to its Finalize method (in case it has one), the object can ‘escape from death’. Inside the Finalize method, object can return to pool, thus obtaining back reference to itself and preventing garbage collection. As you can see, the receiver of the object doesn’t have to do anything special: it uses object and forgetting about it – the object will return ‘automatically’ back to pool.
Implementation
First, let’s examine the BasePoolObject object – this is object designed to “find its way back to home”.
The Init method:
public void Init(ObjectHandler handler)
{
_handler = handler;
}
ObjectHandler is the delegate to the ReturnObject method of the ObjectPool class. The reason I not storing reference to the ObjectPool object itself, is because I want to keep ReturnObject method private, to avoid direct use of it. This way I am emphasizing the object’s exclusive responsibility to return itself back to pool.
The Finalize method:
if (_handler != null)
{
_handler(this);
GC.ReRegisterForFinalize(this);
}
When this method being called by GC, first object returns back to pool, and second: I re – registering object back to finalize queue, otherwise the Finalize method will never be called again.
However as you know, unlike C++ when you removing any reference to object and going out of variable scope, its Finalize method doesn’t called immediately. Instead, GC starts garbage collection when it thinks it should be done (when generation zero is full). So when the object creation rate is very high, it can introduce kind of memory leak: objects will be created faster than they die, thus making memory constantly grow. By the way, you will experience this problem with any consuming resources object. That’s why in .NET special interface defined: IDisposable. If object has to be returned to pool immediately, calling to its Dispose method will do that. Notice, in Dispose method I am not calling to GC.ReRegisterForFinalize. That’s because not the GC who calling to Dispose method, and object remains registered to Finalize queue after Dispose was called. Also, as you can see there is no virtual protected Dispose method. This is because I want to ensure, the resources is not going to be released while object is still in pool. Instead, release all resources inside the Terminate method.
The Reset method:
One of the pifalls when pooling objects, is their state. If someone before us used the object, and put into some data – probably when I resuing object I expecting to get it ‘clean’ – like if I just created it first time. The Reset method called by pool each time object requested. It is a developer responsibility to reset everithing that should be reseted inside this method.
The Terminate method:
This method called when the object is actually die (not going to be ressurected). Tipically, this will happen when the object pool destroyed.
Inside this method, developer responsible to release all resources holded by this object – close all handles, release unmanaged resources, etc.
Now the ObjectPool object:
As I said in introduction, the object pool is simply collection of objects with small additions:
• Object creation: each created object receiving delegate to ReturnObject method.
• Multithreading: obvious, we have to consider in our implementation thread safety: most of the time will be involved at least two threads – yours and the Finalizer thread. Thread safety can be achieved by using simple queue and synchronizing it by critical sections (just as ThreadPool implemented). However, using locks can potentially introduce dead locks. If this will happen, it will be a disaster – the garbage collection thread will be infinitely blocked – say goodbye to memory reclaiming. Therefore, I using there my ConcurrentLinkedQueue – it is thread safe queue and it has no locks (you can download its source code from here).
• Getting objects: the policy I choose is as the following – if there are no more available objects in the pool, perform garbage collection to get back all not used anymore objects. If even after garbage collection we have no available objects, this means all objects in use, and we have to regrow our pool – I growing it twice each time. Of cause, this is a place for many possible alternatives. For instance, you could try first to perform collection of generation zero, if didn’t helped then generation one and finally generation two. The problem I see there is the collection of each generation also collects all previous. For example, collection of generation one also collects generation zero. I guess (didn’t proved this) in average it faster to perform collection of 3 generations at once (0 + 1 + 2), rather than in worst case perform 6 collections (0 + 0,1 + 0,1,2). However, if your objects lifetime is short – I suggest always try to collect zero generation first.
• Another possible tweak: using weak references instead of strong – it is possible to use weak references from pool to objects, to possibly reduce amount of used memory. Personally, I think this will miss the whole idea of the object pool: if object will be destroyed due to garbage collection, I will have later to create it again – thus loosing all the benefits of the object pool. However, I pointing on this to give you another idea for possible customization to match best your specific needs.
The full source code can be downloaded from here (hosted by RapidShare). As always, I will glad to receive any feedbacks, bugs and everything you like.
6 comments October 8, 2008
Volatile and Interlocked.CompareExchange
Recently, when I tried to CAS volatile variable, I got this warning: a reference to a volatile field will not be treated as volatile. So the question is: does CAS succeed or not? I consulted with Jeffrey Richter on this subject, and here is his explanation: when passing volatile variable by ref to method, it is always treated as non volatile. Compiler just don’t distinguish between CompareExchange and all other methods, so it warnings in any case. The CAS operation will succeed, even with volatile variable (or more correct is to say: it will not failure because of volatile variable). Well, I think that make sense, since the only difference between volatile and non volatile is volatile can’t be optimized and stored in shared memory rather than in CPU cache.
In addition he recommended to use Thread.VolatileRead/Write methods instead of declaring variable as volatile for better performance. This is contrasting to conclusion from this article, so I don’t decided which direction is better (despite my personal respect to Jeffrey).
If we talking about multithreading, here are couple bullet points for thinking in spare time:
-
According to .NET specifications, reads and writes to/from variables (even non volatile) are atomic (when memory is properly aligned). However, in contrast with volatile variables, other CPUs not guaranteed to see changes “in real time”. Here is nice article explaining .NET 2.0 memory model and phenomenons such as movement of reads and writes in time.
-
Operations such as increament are not atomic: to perform it CPU has to follow these steps: 1- Load a value from an instance variable into a register, 2- Increment (or decrement) value, and 3 - Store the value in the instance variable. The same is true for volatile variable.
-
It is possible to create concurrent (thread safe) objects and collections without using even single lock (just using volatile and CAS)! Say goodbye to deadlocks, priority inversions, and performance bottlenecks! There are many research on this subject, here are some nice links (I sorted them from basics theoretic to advanced practical):
-
Lock-free and wait-free algorithms - basic idea and definition from wikipedia
-
Software transactional memory - another interesting idea: access and modify shared objects by splitting operations to separate transactions (something like in DB)
-
Here one guy explaining idea of synchronization using CAS operation.
-
Multi Processor Progamming cource from Tel Aviv University - very cool slides with detailed explanations and examples on Java
-
University of Cambridge - including Lock-free library (based on STM idea, implemented on Linux C++). Also I highly recommending to read Concurrent programming without locks paper – extremely interesting!
-
Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms- by Maged M. Michael and Michael L. Scott
-
SXM- C# Software Transactional Memory (STM) by Tim Harris. Unfortunately the licence is not permitting commercial use (yet).
-
NSTM - C# STM by Rulf Suderbucher (Open Source, New BSD License)
-
DSTM2 - Java Software Transactional Memory.
-
java.util.concurrent namespace.A Java library developed by Lea Doug, which is including everything you may need in concurrent programming: free lock hash table, lists, sets, java implementation of M.M. Michael queue and many many more. The full JDK code could be downloaded from here. I am not a lawyer, but as much I understand, this open source code could be modified but is not allowed to be ported to any other language but Java
-
- Is it just me or .NET has a HUGE gap to fill for concurrency support compared to Java? Last point: Microsoft preparing to release in .NET 4.0 library for parallel computations. It will provide API, which will allow to execute separate calculations on different processors. Couple month ago they released preview (not for commercial use yet). The problem is when these calculations performed on shared memory, you still have to synchronize it – and here creating performance bottleneck. Anyone beside me see how to solve that problem?
- Update (11.03.08): Here is one additional product: Intel Threading Building Blocks (TBB), which includes concurrent collections and parallelism API (Microsoft doing now something similar). Language – C++. Available in commercial and Open Source versions.
Add comment March 7, 2008
Another thoughts about hash tables.
As you probably understand from my previous posts, I working a lot with hash tables. During my work on the current project, I saw people doing so badly things with them, so I decided to write something on this topic.
In my atricle about hash table keys I discussed how is bad to use value types as keys (this is applied to non generic hash table). Today I saw another stupid thing, which is worth to mention. Consider the following code:
if(!_hashTable.ContainsKey(someKey))
{
hashTable.Add(someKey, someValue);
}
You see, each time you calling to ContainsKey, it has to call GetHashCode(), find proper bucket in the hash table and then, in case of collisions, go over all entities of that bucket. After this you adding your value, and during this the same is doing again. Much cheaper is just to add value in try/catch block, and catch ArgumentException in case of duplication. Even more, you can use indexer – in this case the duplicate will be overwritten, without exception being thrown.
The same with removing values:
//Bad Code!!
if(_hashTable.ContainsKey(someKey))
{
_hashTable.Remove(someKey);
}
In any case, no exception is thrown. Remove method returns true or false according to success of operation. The same with getting value from hash table. Instead of calling to ContainsKey method, much cheaper to do that in try/catch block, and catch KeyNotFoundException. You should use ContainsKey only when you have to do some action based on the result, but this action should not be done on hash table (or else you will pay twice).
Another point: are you sure you need to use hash table? Because if you adding and removing objects in high rate, probably you should consider to use another data structure. For many people I know, hash table looks as best data structure because of O(1) add/search/remove operations. However, this O(1) can take significant time. Each access to hashtable requires hash computation, which is relative slow operation (even if it done in constant number of steps). In case you have collisions, you will have to go over all entries of the bucket. And in case you reaching hash table capacity, you have to rebuild and rehash the whole table. Therefore, when the number of objects is not so big (I didn’t tested for exact limit) is better use other data structure, such as balanced tree (if n is not big, log(n) can be faster than O(1) – think about this). Anyway, if you going to use hashtable, make it BIG (at least twice bigger than you need) to avoid rebuilds and collisions as much as possible. By default, the size of hashtable (if I remember correctly) is 17.
Important remark: from people responses, I did a conclusion that I have to explain better the main point: you can actually improve performance by using try/catch block, only when GetHashCode is expensive operation. For example, GetHashCode from integer (in default implementation) is very fast – the hash code is the integer itself. In this case using try/catch block is bad option, it will perform slower. However, when you calling GetHashCode on reference type key (espessially on string) – it is better to use trycatch than call to ContainsKey. Today, when I tryed to explain that to some guy, I did test with string key (10-12 chars). Implementation with try/catch block performed 25% faster. This leads us to important conclusions: to achieve best performance you have to understand whats going on under the hood, and for each rule the is an exception.
Add comment March 4, 2008
Revealing synchronization magic of ArayList.Synchronized collection
Today I looked (and still looking) for some kind of synchronized non blocking container. When I saying “non blocking”, I mean while threads are waiting to take access over a container, they continue to do their work – performance improvement (on multi processors machine only, of cause).
I always prefer to find some existing, working and well tested solution, instead of reinventing wheel. One my colleague suggested to use ArrayList.Synchronized method, which is creates synchronized collection from requested type (ArrayList or any other collection implementing IList interface). He claimed that synchronization of ArrayList.Synchronized collection is implemented better than an ordinary lock, and its performance is better as well. I decided to test this collection, and used Reflector to dig inside it.
After locating Synchronized method inside the ArrayList class, I noticed that actually SyncArrayList wrapper created around provided collection (ArrayList ot IList), and looks as the following:
public static ArrayList Synchronized(ArrayList list)
{
if (list == null)
{
throw new ArgumentNullException(“list”);
}
return new SyncArrayList(list);
}
Ok, now the question is: how Add method implemented inside SyncArrayList? Here is the answer:
public override int Add(object value)
{
lock (this._root)
{
return this._list.Add(value);
}
}
SyncArrayList actually using the same lock we all know! Therefore, there is nothing special about ArrayList.Synchronized synchronization, performance is exactly the same. Moreover, if you have to add 200 items to your synchronized collection, you going to lock and unlock it 200 times! On the other hand, if you are the one who controlling synchronization (manually locking) – then you will lock and unlock only once – just before and after adding all the 200 items.
From the all stated above, the conclusion is obvious: use ArrayList.Synchronized collection only when you want to guarantee its synchronization, even by cost of performance.
P.S. As I said, I still looking solution to my problem. I will appreciate any help on this.
Add comment February 14, 2008
Key’s Equality in Hashtables (or who equal to who?)
Consider the following simple class:
public class SomeClass
{
uint _id = 125;public SomeClass(uint id)
{
_id = id;
}
public override int GetHashCode()
{
return (int)_id;
}
}
We want to add instance of this class to hashtable, where the key is object’s ID, and the value is object itself. If we doing the following, it works:
SomeClass someClass = new SomeClass(125);
//1.
//========
hashTable[someClass.GetHashCode()] = someClass;
object returnValue = hashTable[125];
The returnValue is exactly our someClass instance. However, using value types as keys as bad, due to boxing each time key is passed to the hashtable. How we can improve this? Well, as known, inside hashtable ‘GetHashCode’ method is being called to determine the key. Therefore we can pass as a key the object itself, and override the ‘GetHashCode’ method (which is exactly what we did). Let’s try this code:
//2.
//===========
hashTable[someClass] = someClass;
SomeClass key = new SomeClass(125);
returnValue = hashTable[key];
If you will run this code, you will see the returnValue is null. So what is wrong? When you trying to retrieve your object from hashtable and passing the key, hashtable using ‘Equals’ method, to find the right key. By default, after confirming the hashcode is the same, it comparing references, so in the code above the result is false. Obvious, the solution is to override the ‘Equals’ method (here is the one possible solution):
public override bool Equals(object obj)
{
return (_id == obj.GetHashCode());
}
If we run this code, that works ok. Now, we want to retrieve the object by passing the ID directly, without creating wrapper around it (as I said before, this is bad because of boxing – don’t do that in production code):
hashTable[someClass] = someClass;
returnValue = hashTable[125];
We overrided the Equals method, so this will work? No, it will not. You see, the hashtable comparing the passed key to the stored keys, and not storing keys to passed one. As a result, the ‘Equals’ method is invoked on the passed key – in our example it’s an integer – and it never equal to SomeClass, ofc. The conlcusion is: if you using your custom classes as hashtable keys, always override ‘GetHashCode’ and ‘Equals’ methods. And remember to use only your classes for the lookup inside the hashtable.
2 comments January 15, 2008
Fast rendering images in c#
I will start this article with example: let say, we have some kind of GPS application. Our goal is to draw object’s current position on map. We represent object by symbol and maybe by some string near it. The symbol and its related text not changing, the only thing changing is position of the object on the map. We have to update object position very frequently, let say once at 500 ms.
Obvious, this is waste of CPU time to draw the symbol and text again and again each 500 ms. One way to improve performance is to store our object representation image (symbol or/and text, whatever) in memory, and each time copy its bits to different place on the map. At this point, everyone who claims to be Windows programmer should flash in his head word “BitBlt”. If you didn’t, I suggest you to review Windows programming basics, start from reading ‘Programming Windows’ by Charles Petzold. In short, BitBlt is Windows API function, which is copying bits from one Device Context (DC) to other. The common use of this function is implementation of Double Buffering – well known techniques to draw without “flickering”.
In demo application I prepared (download link here – hosted by RapidShare), I drawing text “Test string” using 3 ways:
- Graphics.DrawString
- Graphics.DrawImageUnscaled (which is fastest method in DrawImageXXX series)
- Calling directly to unmanaged BitBlt function
To see differences in performance, I call each way 10000 times, and mesuring start and end times. Here are results (Athlon 64 bit dual core):
- Calling Graphics.DrawString – 3984.375 ms
- Calling Graphics.DrawImageUnscaled – 7765.625 ms
- Calling BitBlt – 1015.ms
Copying bits from memory is x8 times faster than drawing image directly!!! And x4 times faster rendering text. In additional, you have in your desposal all raster operations, which you can use inside BitBlt method, such as invert colors – very useful staff. By the way, .NET framework do provide one method which is wraps BitBlt – Graphics.CopyFromSceen method. As expected, inside it Microsoft guys calling BitBlt with 2 DCs: current graphics and Desktop hwnd (hwnd = NULL).
As far I know, this is the best you can do using GDI. Still, you can notice that my application even using BitBlt consuming very high CPU (on my machine – 50%). In case you need better performance, consider to use XNA (managed DirectX).
2 comments January 5, 2008
Fast serialization from .NET to non .NET applications.
In .NET, in case you need perform data exchange between .NET and non .NET applications, the only built in way do do this, is to serialize data using SOAP serialization. However, creating and sending XML is too slow for time critical applications. Therefore you need to find some alternate solution to this problem. No matter to where you sanding data (network or whatever…), somehow you need to create byte buffer, populated with your data. And, for best performance you need to do this as fast as possible. I aware about 2 ways to create byte array with data populated into it:
- Create MemoryStream and BinaryWriter. Write into steam all data, and at the end call MemoryStream.ToArray().
- Use BitConverter class for each value type you have. Use Encoding.GetBytes for strings. Combine all together using Buffer.BlockCopy().
I presenting here my way, which is faster 10-20 times than first solution and faster 3-5 times than second one. It is very simple: I creating buffer in advance with desired size (i calculating the whole size before serialization). Creating pointer to it, and using cast to copy values inside. After each copy, I moving pointer forward for the amount of copied data. For example, if I copied boolean value, I should move pointer 4 positions forward, because of the size of boolean type (4 bytes). The same for each type, consult MSDN if you need to know size of each type.
Example – serialization of double value:
I know, that the size of double is 8 bytes. Therefore, I creating buffer in same size:
byte[] buffer = new byte[8];
fixed(byte* pBuffer = buffer) //must use fixed keyword – avoids GC move buffer in memory
{
byte* pBufferWriter = pBuffer; //must create another pointer, because pBuffer is readonly
*((double*)pBufferWriter) = yourDecimalValue; //copy value
pBufferWriter+=8; //move pointer forward 8 bytes.
}
Example – deserialization of double value:
…. //take here pointer pBufferReader to byte[] with data
double result = *((double*)pBufferReader);
pBufferReader+=8;
Same is for any value primitive(int, short, bool. etc..).
The real challenge is serialization of strings. To serialize string, first step is to take pointer to it.
fixed(char* pStr = yourStringValue)
from this point, you just need to copy all bytes from pStr to your buffer. Remember, that .NET uses Unicode, and each char is represented by 2 bytes. I suggesting here function I using to copy strings – its performance is much better that coping byte by byte:
public sealed class SerializationHelper
{
/// <summary>
/// The StringCopy optimized to copy chars.
/// </summary>
/// <param name=”dmem”>Pointer to destination</param>
/// <param name=”smem”>Pointer to source.</param>
/// <param name=”charCount”>Count of chars to copy.</param>
public static unsafe void StringCopy(char* dmem, char* smem, int charCount)
{
if (charCount > 0)
{
if ((((int)dmem) & 2) != 0)
{
dmem[0] = smem[0];
dmem++;
smem++;
charCount–;
}
while (charCount >= 8) //it is eight here, not smile )
{
*((int*)dmem) = *((int*)smem);
*((int*)(dmem + 2)) = *((int*)(smem + 2));
*((int*)(dmem + 4)) = *((int*)(smem + 4));
*((int*)(dmem + 6)) = *((int*)(smem + 6));
dmem += 8;
smem += 8;
charCount -= 8;
}
if ((charCount & 4) != 0)
{
*((int*)dmem) = *((int*)smem);
*((int*)(dmem + 2)) = *((int*)(smem + 2));
dmem += 4;
smem += 4;
}
if ((charCount & 2) != 0)
{
*((int*)dmem) = *((int*)smem);
dmem += 2;
smem += 2;
}
if ((charCount & 1) != 0)
{
dmem[0] = smem[0];
}
}
}
}
After you finishing copying, don’t forget to advance buffer pointer .
Deserialization of string very simple:
string myString = new string((char*)pBuffer); // pBuffer is pointer to your buffer
Suppose, you have need copy block of memory to your buffer (maybe it is internal class, serialized before) – you can use the following method:
public static unsafe void ByteCopy(byte* ps, byte* pd, int count)
{
// Loop over the count in blocks of 4 bytes, copying an
// integer (4 bytes) at a time:
for (int n = 0; n < count / 4; n++)
{
*((int*)pd) = *((int*)ps);
pd += 4;
ps += 4;
}
// Complete the copy by moving any bytes that weren’t
// moved in blocks of 4:
for (int n = 0; n < count % 4; n++)
{
*pd = *ps;
pd++;
ps++;
}
}
As you can test in your applications, the performance is awesome. However, performance always must cost something. In this case, it costs readability of code and errors prone. It is very easy to do mistake here, for example forwarding pointer too many or too less. You must be very careful using this, and double check every line of code. You may ask: why not create helper functions to encapsulate serialization of every type. The answer is performance penalty. In my tests, calling to function works 2 times slower than performing the same code “on the fly”. However, I leave freedom to you to choose what is better for your needs. By the way, calling static function is faster than calling instance one.
Add comment December 24, 2007
Making UI fast and responsive
Your application looks “heavy”? It takes time to move child window inside it, rearrange buttons, and overall performance is bad? Probably one of the reasons you using wrong threads for UI update. Here Ian Griffiths explains the right way to do this much better than I will could to.
Add comment December 23, 2007
Is using memcpy worth it in .NET ?
Recently, I decided to boost my .NET / MFC serialization. Instead of copying object fields field by field, I would like use memcpy to speedup this process. However, it is not trivial at all to use memcpy in .NET, because the code is managed, and you have no control over the managed types. You can however, override this by using many tricks, such as converting reference types to value, by generating special structures and so on. However this makes code very hard to maintenance, and difficult to debug. So the main question is: Is it worth to do all this just to use memcpy, instead of copying field by field?
To answer this, I created very simple application. I copying struct with 5 fields by 2 ways: using memcpy, and second is coping field by field. I repeating coping in loop, for big number of iterations (in my example – 300000000 iterations). I measuring time in each way. Here is average results that I got (Windows XP, AMD athlon 64 dual core):
Compiled in debug mode: Coping using memcpy: 9500ms , Coping field by field: 1750ms.
Compiled in release mode (disabling optimizations) – same as in debug mode.
Compiled in release mode (optimize: minimize size): Coping using memcpy: 7050ms, Coping field by field: 0ms
Compiled in release mode (optimize: maximize speed): Coping using memcpy: 0ms, Coping field by field: 0ms. I suspect that because of optimization, it not entering my loop at all.
Compiled in release mode (optimize: full optimization) – Same as optimize: maximize speed.
When my friend tried the same on Intel 32 bit machine, he said memcpy performed faster in about 300 ms than coping field by field.
Conclusion: as you can see, the difference in performance is very minor (in such a high rate of coping). Therefore, unless you know some nice and clear way to use memcpy in .NET, it NOT worth the overhead to use it.
3 comments December 21, 2007