Revisiting Perl and Python’s Speed

I was really surprised to see the discussion generated by my previous post comparing the speed of Python and Perl.  Many people much wiser than me posted a lot of valuable comments and suggestions, and two people were kind enough to post total rewrites of my routines, which (to nobody’s surprise) were much faster than the codes I wrote.

A few people (both here on the blog and through other discussion) raised legitimate points:

  1. My Python code was recompiling the regex every loop iteration because I was confused by how regex compilation and regex match objects work.  Fixing this problem alone increased speed by 10%-25%.
  2. The timings I posted were sub-second and someone suggested that startup overhead may have been hurting Python.  To address this, I used a more “real-life” input file that was 3750 MB rather than the 8.588 MB input file I used earlier.
  3. The style of Perl I was using was archaic, and the style of Python I was using wasn’t terribly Pythonic.  I live in a programming bubble; I learned both of these languages from their respective O’Reilly books and that’s it.  I don’t know anyone who knows either Perl or Python in real life, and I have never seen anyone else’s code in either language.  But as it turns out, poorly written Perl and poorly written Python follow the same trends as well-written Perl and Python (see below).
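
To make the first point concrete, here is a minimal sketch of the mistake and the fix. The pattern and the sample lines are hypothetical stand-ins, since the actual input format isn't shown here; only the placement of `re.compile` matters.

```python
import re

# Hypothetical pattern and sample lines; not the real input format.
PATTERN = r"^(\w+)\s+(-?\d+\.\d+)"
LINES = ["Si 1.250", "O -0.625", "# comment", "Si 2.500"]

def extract_slow(lines):
    """Calls re.compile inside the loop, as in the mistake described above."""
    values = []
    for line in lines:
        regex = re.compile(PATTERN)  # repeated on every iteration
        match = regex.match(line)
        if match:
            values.append(float(match.group(2)))
    return values

def extract_fast(lines):
    """Compiles the pattern once, then reuses the compiled object."""
    regex = re.compile(PATTERN)  # done a single time, outside the loop
    values = []
    for line in lines:
        match = regex.match(line)
        if match:
            values.append(float(match.group(2)))
    return values
```

Both functions return the same values; the second just skips the per-line call to `re.compile`. Python's `re` module keeps a cache of recently compiled patterns, so the extra cost in the first version is probably mostly call and lookup overhead rather than a full recompile, but over millions of lines that still adds up.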

So as to be a little more scientific about this (since I am a scientist and all), here are my starting parameters:

  • Software
    • Ubuntu Server 10.04 LTS
    • Python 2.6.5 provided by the distribution
    • Perl 5.10.1 provided by the distribution
    • data resides on an ext4 filesystem on LVM
  • Hardware
    • HP DL360 G7
    • 2x Xeon X5672, 3200 MHz
    • 24GB DDR3 RAM
    • data resides on 6Gbit SAS RAID5
  • Codes

Methodology: I ran each code on the same 3750 MB input file five times in succession.  Each execution was timed using the `time` builtin provided by the bash 4.1.5(1) included with Ubuntu 10.04.  stdout was redirected straight to /dev/null.
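
For anyone who wants to repeat this without bash, the same procedure can be sketched in Python. This is not what I ran (I used `time`), and the helper name `time_command` is made up for illustration; it also uses `time.perf_counter`, which needs Python 3 rather than the 2.6.5 listed above.

```python
import os
import subprocess
import time

def time_command(argv, trials=5):
    """Run argv `trials` times in succession; return wall-clock seconds per run."""
    walltimes = []
    # Discard the child's stdout, like redirecting to /dev/null in the shell.
    with open(os.devnull, "w") as devnull:
        for _ in range(trials):
            start = time.perf_counter()
            subprocess.check_call(argv, stdout=devnull)
            walltimes.append(time.perf_counter() - start)
    return walltimes
```

A call would look something like `time_command(["perl", "new.pl", "input.dat"])`, where the script and file names are placeholders. Unlike `time`, this measures only wall-clock time and includes a little overhead from launching the process through Python.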

Wall time in seconds; the Walltime column is the mean of the five trials.

Code        Walltime  Trial 1  Trial 2  Trial 3  Trial 4  Trial 5
Old Python   309.032  310.971  308.228  311.331  307.170  307.461
New Python   176.880  178.099  174.742  175.463  178.235  177.863
Old Perl     167.051  166.916  165.911  167.361  168.735  166.333
New Perl     126.860  125.913  124.709  130.125  127.809  125.746

So even the cleaner code is markedly faster in Perl: the new Python version takes about 39% longer than the new Perl version (176.9 s versus 126.9 s on average), which is not far off from the 50% slowdown I noted with my two crummier versions of the code.  Furthermore, it seems easier for a relative novice like myself to write inefficient Python code than inefficient Perl code.  Of course, it’s also easier to write Perl code that doesn’t do what you expect, and trying to understand someone else’s code is a crapshoot.
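
For anyone who wants to check the arithmetic, here is a small sketch that recomputes the averages and the relative slowdown from the per-trial numbers in the table above. The function names are just for illustration.

```python
# Per-trial wall times in seconds, copied from the table above.
TRIALS = {
    "Old Python": [310.971, 308.228, 311.331, 307.170, 307.461],
    "New Python": [178.099, 174.742, 175.463, 178.235, 177.863],
    "Old Perl": [166.916, 165.911, 167.361, 168.735, 166.333],
    "New Perl": [125.913, 124.709, 130.125, 127.809, 125.746],
}

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def extra_time_percent(slower, faster):
    """How much longer `slower` takes than `faster` on average, in percent."""
    return (mean(slower) / mean(faster) - 1.0) * 100.0

MEANS = {name: mean(times) for name, times in TRIALS.items()}
NEW_GAP = extra_time_percent(TRIALS["New Python"], TRIALS["New Perl"])
OLD_GAP = extra_time_percent(TRIALS["Old Python"], TRIALS["Old Perl"])
```

Note that "X% longer" and "X% faster" are not the same number: a code that takes 39% longer corresponds to the other one finishing in about 28% less time, so it pays to say which ratio is meant.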

Judging by what others have told me and what some of the comments have pointed out, though, Python just isn’t optimized for “practical extraction and reporting.”  Maybe someday I’ll find a use for Python in my work.

In case the links to the codes I used ever go bad, here they are on pastebin:

I’d post the input files I used, but I don’t have anywhere I can anonymously host 3.7 GB (or even 8 MB) files.  If you’re interested in the input data, let me know and I can send a private link.

About vitreousilica

I am a researcher in computational materials science. I study problems related to materials chemistry and chemical physics and do a fair amount of scientific programming as a result.