Shortly after my post about speeding up Python with Cython, I was contacted by Mark Dufour, the creator of ShedSkin, a Python-to-C++ compiler, who wanted to try my code with his compiler. I had heard of ShedSkin before, but had chalked it up as something to try later, or something too hard to try (C++ is not my forte).
After Mark contacted me, I decided to give it a go on the code from that post, and, to my great surprise, it performed a bit better than Cython with no changes to my code. ShedSkin does require that you program in a restricted subset of Python, but most of my scientific code is written in that style anyway (it's not really that restrictive). Since then, I have used ShedSkin for all my other assignments, and now I'm writing about it.
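To give a rough idea of what the restricted subset means in practice: ShedSkin infers static types for your code, so containers have to stay homogeneous. This is a simplified illustration, not an exhaustive statement of the rules:

```python
# ShedSkin-friendly style (roughly): every container holds a single type,
# so the compiler can infer a concrete C++ type for it.
values = [1.0, 2.5, 4.0]  # a list of floats -- inferable
total = sum(values)

# A mixed container such as [1, "a", 2.0] defeats type inference
# and falls outside the restricted subset.
```

Plain numeric code with loops over homogeneous lists, like the SVM below, fits this style naturally.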
A few days ago I had a bioinformatics assignment whose goal was to predict the location of proteins from their structure. I wrote an SVM to classify the proteins, compiled it with ShedSkin and ran it. Here is a sample of the Python code, followed by the same code modified for ShedSkin.
Before:
def train_adatron(kernel_matrix, label_matrix, h, c):
    tolerance = 0.5
    alphas = [([0.0] * len(kernel_matrix)) for _ in range(len(label_matrix[0]))]
    betas = [([0.0] * len(kernel_matrix)) for _ in range(len(label_matrix[0]))]
    bias = [0.0] * len(label_matrix[0])
    labelalphas = [0.0] * len(kernel_matrix)
    max_differences = [(0.0, 0)] * len(label_matrix[0])
    for iteration in range(10 * len(kernel_matrix)):
        if not iteration % 100:
            print "Starting iteration %s..." % iteration
        for klass in range(len(label_matrix[0])):
            max_differences[klass] = (0.0, 0)
            for elem in range(len(kernel_matrix)):
                labelalphas[elem] = label_matrix[elem][klass] * alphas[klass][elem]
            for col_counter in range(len(kernel_matrix)):
                prediction = 0.0
                for row_counter in range(len(kernel_matrix)):
                    prediction += kernel_matrix[col_counter][row_counter] * \
                                  labelalphas[row_counter]
                g = 1.0 - ((prediction + bias[klass]) * label_matrix[col_counter][klass])
                betas[klass][col_counter] = min(max((alphas[klass][col_counter] + h * g), 0.0), c)
                difference = abs(alphas[klass][col_counter] - betas[klass][col_counter])
                if difference > max_differences[klass][0]:
                    max_differences[klass] = (difference, col_counter)
After:
def train_adatron(kernel_matrix, label_matrix, h, c):
    tolerance = 0.5
    alphas = [([0.0] * len(kernel_matrix)) for _ in range(len(label_matrix[0]))]
    betas = [([0.0] * len(kernel_matrix)) for _ in range(len(label_matrix[0]))]
    bias = [0.0] * len(label_matrix[0])
    labelalphas = [0.0] * len(kernel_matrix)
    max_differences = [(0.0, 0)] * len(label_matrix[0])
    for iteration in range(10 * len(kernel_matrix)):
        if not iteration % 100:
            print "Starting iteration %s..." % iteration
        for klass in range(len(label_matrix[0])):
            max_differences[klass] = (0.0, 0)
            for elem in range(len(kernel_matrix)):
                labelalphas[elem] = label_matrix[elem][klass] * alphas[klass][elem]
            for col_counter in range(len(kernel_matrix)):
                prediction = 0.0
                for row_counter in range(len(kernel_matrix)):
                    prediction += kernel_matrix[col_counter][row_counter] * \
                                  labelalphas[row_counter]
                g = 1.0 - ((prediction + bias[klass]) * label_matrix[col_counter][klass])
                betas[klass][col_counter] = min(max((alphas[klass][col_counter] + h * g), 0.0), c)
                difference = abs(alphas[klass][col_counter] - betas[klass][col_counter])
                if difference > max_differences[klass][0]:
                    max_differences[klass] = (difference, col_counter)
You might notice that the two snippets are identical. That’s how awesome ShedSkin is. It didn’t need a single change, and on top of that, it gave me compile-time errors when I messed up my code.
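For readers who want to run the routine themselves, here is a minimal sketch of how its inputs might be built. The data here is a hypothetical toy set, not the assignment's protein features, and linear_kernel_matrix is a helper name I'm introducing for illustration:

```python
def linear_kernel_matrix(samples):
    # K[i][j] = dot(samples[i], samples[j]) -- a plain linear kernel
    n = len(samples)
    return [[sum(a * b for a, b in zip(samples[i], samples[j]))
             for j in range(n)] for i in range(n)]

samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# label_matrix[i][k] is 1 if sample i belongs to class k, else -1
label_matrix = [[1, -1], [-1, 1], [1, 1]]

kernel_matrix = linear_kernel_matrix(samples)
```

The kernel matrix is precomputed once, which is why the training loop above only ever indexes into it.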
The timings of the pure Python and the ShedSkin-compiled code are:

    python          shedskin
    -------------   ------------
    4841.94 sec     103.30 sec
You can find my code in the ShedSkin repository.
That is a 47x speedup (not 47%, 47 times), just by running two commands to compile my code to C++ and the C++ to machine code. Needless to say, I will be using ShedSkin a lot more in the future.
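For the curious, the two commands look roughly like this (assuming the script is saved as adatron.py, a filename I'm making up here; ShedSkin emits C++ plus a Makefile, and make does the rest):

```shell
# Translate the restricted-Python source to C++ and generate a Makefile.
shedskin adatron.py

# Compile the generated C++ into a native executable.
make
```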