The mean is a horrible measure for profiling/benchmarking. When the computation chokes, it generally results in a peak execution time. The mean as measure is not good when you have extreme outliers like that. The median is much more robust in such cases.
So: Don't average over your execution time when you benchmark an algorithm, take the median!