Shortly after my post about speeding up Python with Cython, I was contacted by Mark Dufour, the creator of ShedSkin, a Python-to-C++ compiler, who wanted to try my code with his compiler. I had heard of ShedSkin before, but had chalked it up as something to try later, or something too hard to try (C++ is not my forte).
After Mark contacted me, I decided to give it a go on the code from that post, and, to my great surprise, it performed a bit better than Cython with no changes to my code. ShedSkin does require that you program in a restricted subset of Python, but most of my scientific code is written in that style anyway (it's not really that restrictive). Since then, I have used ShedSkin for all my other assignments, and now I'm writing about it.
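To give a rough idea of what the restricted subset means in practice: ShedSkin infers static types for your code, so containers have to stay homogeneous. This is a simplified illustration, not an exhaustive statement of the rules:

```python
# ShedSkin-friendly style (roughly): every container holds a single type,
# so the compiler can infer a concrete C++ type for it.
values = [1.0, 2.5, 4.0]  # a list of floats -- inferable
total = sum(values)

# A mixed container such as [1, "a", 2.0] defeats type inference
# and falls outside the restricted subset.
```

Plain numeric code with loops over homogeneous lists, like the SVM below, fits this style naturally.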
A few days ago I had a bioinformatics assignment whose goal was to predict the location of proteins from their structure. I wrote an SVM to classify the proteins, compiled it with ShedSkin and ran it. Here is a sample of the Python code, followed by the same code modified for ShedSkin.
Before:
def train_adatron(kernel_matrix, label_matrix, h, c):
    tolerance = 0.5
    alphas = [([0.0] * len(kernel_matrix)) for _ in range(len(label_matrix[0]))]
    betas = [([0.0] * len(kernel_matrix)) for _ in range(len(label_matrix[0]))]
    bias = [0.0] * len(label_matrix[0])
    labelalphas = [0.0] * len(kernel_matrix)
    max_differences = [(0.0, 0)] * len(label_matrix[0])
    for iteration in range(10 * len(kernel_matrix)):
        if not iteration % 100:
            print "Starting iteration %s..." % iteration
        for klass in range(len(label_matrix[0])):
            max_differences[klass] = (0.0, 0)
            for elem in range(len(kernel_matrix)):
                labelalphas[elem] = label_matrix[elem][klass] * alphas[klass][elem]
            for col_counter in range(len(kernel_matrix)):
                prediction = 0.0
                for row_counter in range(len(kernel_matrix)):
                    prediction += kernel_matrix[col_counter][row_counter] * \
                                  labelalphas[row_counter]
                g = 1.0 - ((prediction + bias[klass]) * label_matrix[col_counter][klass])
                betas[klass][col_counter] = min(max((alphas[klass][col_counter] + h * g), 0.0), c)
                difference = abs(alphas[klass][col_counter] - betas[klass][col_counter])
                if difference > max_differences[klass][0]:
                    max_differences[klass] = (difference, col_counter)
After:
def train_adatron(kernel_matrix, label_matrix, h, c):
    tolerance = 0.5
    alphas = [([0.0] * len(kernel_matrix)) for _ in range(len(label_matrix[0]))]
    betas = [([0.0] * len(kernel_matrix)) for _ in range(len(label_matrix[0]))]
    bias = [0.0] * len(label_matrix[0])
    labelalphas = [0.0] * len(kernel_matrix)
    max_differences = [(0.0, 0)] * len(label_matrix[0])
    for iteration in range(10 * len(kernel_matrix)):
        if not iteration % 100:
            print "Starting iteration %s..." % iteration
        for klass in range(len(label_matrix[0])):
            max_differences[klass] = (0.0, 0)
            for elem in range(len(kernel_matrix)):
                labelalphas[elem] = label_matrix[elem][klass] * alphas[klass][elem]
            for col_counter in range(len(kernel_matrix)):
                prediction = 0.0
                for row_counter in range(len(kernel_matrix)):
                    prediction += kernel_matrix[col_counter][row_counter] * \
                                  labelalphas[row_counter]
                g = 1.0 - ((prediction + bias[klass]) * label_matrix[col_counter][klass])
                betas[klass][col_counter] = min(max((alphas[klass][col_counter] + h * g), 0.0), c)
                difference = abs(alphas[klass][col_counter] - betas[klass][col_counter])
                if difference > max_differences[klass][0]:
                    max_differences[klass] = (difference, col_counter)
You might notice that the two snippets are identical. That’s how awesome ShedSkin is. It didn’t need a single change, and on top of that, it gave me compile-time errors when I messed up my code.
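For readers who want to run the routine themselves, here is a minimal sketch of how its inputs might be built. The data here is a hypothetical toy set, not the assignment's protein features, and linear_kernel_matrix is a helper name I'm introducing for illustration:

```python
def linear_kernel_matrix(samples):
    # K[i][j] = dot(samples[i], samples[j]) -- a plain linear kernel
    n = len(samples)
    return [[sum(a * b for a, b in zip(samples[i], samples[j]))
             for j in range(n)] for i in range(n)]

samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# label_matrix[i][k] is 1 if sample i belongs to class k, else -1
label_matrix = [[1, -1], [-1, 1], [1, 1]]

kernel_matrix = linear_kernel_matrix(samples)
```

The kernel matrix is precomputed once, which is why the training loop above only ever indexes into it.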
The timings of the pure Python and the ShedSkin-compiled code are:

    python          shedskin
    -------------   ------------
    4841.94 sec     103.30 sec
You can find my code in the ShedSkin repository.
That is a 47x speedup (not 47%, 47 times), just by running two commands to compile my code to C++ and the C++ to machine code. Needless to say, I will be using ShedSkin a lot more in the future.
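For the curious, the two commands look roughly like this (assuming the script is saved as adatron.py, a filename I'm making up here; ShedSkin emits C++ plus a Makefile, and make does the rest):

```shell
# Translate the restricted-Python source to C++ and generate a Makefile.
shedskin adatron.py

# Compile the generated C++ into a native executable.
make
```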