<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Silica in Silico</title>
	<atom:link href="http://silicainsilico.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://silicainsilico.wordpress.com</link>
	<description>Computations, glass, and research</description>
	<lastBuildDate>Sun, 19 May 2013 19:25:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='silicainsilico.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/c73d3e7af2712fc2850a2c7eba513ba0?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>Silica in Silico</title>
		<link>http://silicainsilico.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://silicainsilico.wordpress.com/osd.xml" title="Silica in Silico" />
	<atom:link rel='hub' href='http://silicainsilico.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Job Qualifications and Ph.D. Prospects</title>
		<link>http://silicainsilico.wordpress.com/2012/04/12/job-qualifications-and-ph-d-prospects/</link>
		<comments>http://silicainsilico.wordpress.com/2012/04/12/job-qualifications-and-ph-d-prospects/#comments</comments>
		<pubDate>Thu, 12 Apr 2012 21:10:33 +0000</pubDate>
		<dc:creator>vitreousilica</dc:creator>
				<category><![CDATA[Personal]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[frustration]]></category>
		<category><![CDATA[molecular simulation]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[science]]></category>

		<guid isPermaLink="false">http://silicainsilico.wordpress.com/?p=90</guid>
		<description><![CDATA[Being near the end of my graduate studies, I&#8217;ve started looking more critically at the jobs other people have.  Whereas I used to think in terms of &#8220;oh, that&#8217;s interesting,&#8221; I now find myself wondering “am I &#8230; <a href="http://silicainsilico.wordpress.com/2012/04/12/job-qualifications-and-ph-d-prospects/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Being near the end of my graduate studies, I&#8217;ve started looking more critically at the jobs other people have.  Whereas I used to think in terms of &#8220;oh, that&#8217;s interesting,&#8221; I now find myself wondering “am I qualified to do that?”  More often than not, the answer is &#8220;no,&#8221; and my prospects have left me feeling quite depressed lately.  I&#8217;ll have spent five full years in graduate school by the time I get my Ph.D., but beyond the letters after my name, what have I really gotten out of it besides debt?</p>
<p>Let&#8217;s take inventory of the qualifications I have developed.</p>
<table style="font-size:smaller;" border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td valign="top"><strong>Skill</strong></td>
<td valign="top"><strong>Qualification</strong></td>
<td valign="top"><strong>Comments</strong></td>
</tr>
<tr>
<td colspan="3" valign="top">
<p align="center"><strong>Computational Research Qualifications</strong></p>
</td>
</tr>
<tr style="background-color:#ffffcc;">
<td valign="top">general molecular simulation</td>
<td valign="top">intermediate</td>
<td valign="top">I&#8217;ve spent the better part of a decade doing MD simulations.  I know quite a bit about them, but that means I know enough to realize how much I don&#8217;t know.</td>
</tr>
<tr style="background-color:#ffffcc;">
<td valign="top">electronegativity equalization</td>
<td valign="top">intermediate/high</td>
<td valign="top">I understand how EEM works, which is to say I realize how much nonsense it is as a semi-empirical theory.  I spent a year developing a new EEM-based model and hated every minute of it.</td>
</tr>
<tr style="background-color:#ccffcc;">
<td valign="top">potential development</td>
<td valign="top">intermediate/high</td>
<td valign="top">I can tune and develop potentials, but this work is extremely tedious and decidedly un-fun.</td>
</tr>
<tr style="background-color:#ffffcc;">
<td valign="top">algorithms</td>
<td valign="top">intermediate</td>
<td valign="top">I know the basic integrators but don&#8217;t fully understand any of the more technical methods (e.g., Nosé-Hoover, Parrinello-Rahman, SHAKE/RATTLE, Ewald summation).  I can write a fully functional MD code using basic algorithms (e.g., velocity rescaling, Verlet integration, and the Berendsen barostat) but have difficulty implementing extended-system algorithms.</td>
</tr>
<tr style="background-color:#ffcccc;">
<td valign="top">off-the-shelf MD simulation packages</td>
<td valign="top">low</td>
<td valign="top">I&#8217;ve never used any off-the-shelf simulation packages other than LAMMPS and GULP, and I haven&#8217;t used either very extensively.  My group has always used its own code.</td>
</tr>
<tr style="background-color:#ffcccc;">
<td valign="top">bio/pharma molecular sim</td>
<td valign="top">low</td>
<td valign="top">I know next to nothing about protein folding, docking, bio-centric models (CHARMM, etc.), bonded interactions, SHAKE, etc.</td>
</tr>
<tr style="background-color:#ffcccc;">
<td valign="top">quantum simulation</td>
<td valign="top">low</td>
<td valign="top">I know virtually nothing about quantum calculations.  I don&#8217;t have any understanding of basis sets, dispersion forces, DFT, Møller-Plesset, path integral, Car-Parrinello, etc.</td>
</tr>
<tr style="background-color:#ffcccc;">
<td valign="top">continuum simulation</td>
<td valign="top">low</td>
<td valign="top">I know nothing about phase-field methods, finite element/finite difference, etc.</td>
</tr>
<tr>
<td colspan="3" valign="top">
<p align="center"><strong>Other Research Qualifications</strong></p>
</td>
</tr>
<tr style="background-color:#99ff99;">
<td valign="top">data analysis</td>
<td valign="top">high</td>
<td valign="top">I have a strong grasp of many computational tools useful in efficiently analyzing large data sets and correlating data, and I am quite good at leveraging those tools to extract meaningful data and complex relationships.  Some of the tools that I regularly use are Perl, Python, Maple, Mathematica, sh, and awk.</td>
</tr>
<tr style="background-color:#ccffcc;">
<td valign="top">technical writing</td>
<td valign="top">intermediate/high</td>
<td valign="top">I have a strong grasp of the English language and can put together sensible manuscripts handily.  None of my three first-author papers published to date has come back from peer review requiring major revisions.  Roughly 80-90% of the text in these manuscripts is in my own words.</td>
</tr>
<tr style="background-color:#ccffcc;">
<td valign="top">technical presentation</td>
<td valign="top">intermediate</td>
<td valign="top">I&#8217;m not a bad speaker and can assemble presentations that follow a logical path.  I tailor presentations to specific audiences so that they are neither overbearingly technical nor superficial.  I&#8217;ve won a poster award and spoken at several international conferences.</td>
</tr>
<tr style="background-color:#ff9999;">
<td valign="top">laboratory work</td>
<td valign="top">none/low</td>
<td valign="top">I&#8217;m not very good with my hands, which is why I&#8217;ve stayed out of experimental labs.  I know lab safety but am afraid of dangerous machinery (machine shops, furnaces) and chemicals (highly caustic, toxic, etc) due to lack of experience.</td>
</tr>
<tr>
<td colspan="3" valign="top">
<p align="center"><strong>Scientific Knowledge</strong></p>
</td>
</tr>
<tr style="background-color:#ccffcc;">
<td valign="top">ceramics</td>
<td valign="top">intermediate/high</td>
<td valign="top">I know about ceramics, crystal structures, point defects, grain boundaries, processing, microstructure, etc.  I don&#8217;t know much about specific technical ceramics.</td>
</tr>
<tr style="background-color:#ccffcc;">
<td valign="top">glasses</td>
<td valign="top">intermediate/high</td>
<td valign="top">I know a lot about silica: its atomic structure, general properties, and mechanical behavior.  My knowledge drops as additives and exotic processing come into play, and I know less about specific modern silicates (mesoporous, etc.).</td>
</tr>
<tr style="background-color:#ccffcc;">
<td valign="top">physics (general)</td>
<td valign="top">intermediate/high</td>
<td valign="top">I have a pretty strong understanding of general physics and why things happen.  I&#8217;ve also got an aptitude for solving analytical problems.  I could teach undergraduate-level physics pretty adeptly.</td>
</tr>
<tr style="background-color:#ffffcc;">
<td valign="top">physics (mechanics)</td>
<td valign="top">low/intermediate</td>
<td valign="top">I know enough to know that I don&#8217;t know very much.  I do not have a strong background in Lagrangian/Hamiltonian formalisms (which is to say nobody ever told me they existed).  I am teaching myself this material, though.</td>
</tr>
<tr style="background-color:#ffcccc;">
<td valign="top">physics (quantum)</td>
<td valign="top">low</td>
<td valign="top">I know the basics, but I haven&#8217;t solved a differential equation in half a decade.  I have no real experience working in modern physics outside of a classroom.</td>
</tr>
<tr style="background-color:#ffffcc;">
<td valign="top">physics (thermo/stat mech)</td>
<td valign="top">intermediate</td>
<td valign="top">I&#8217;ve taken thermodynamics three or four times and have a fair grasp of it.  My limited knowledge of mathematics prevents me from fully grasping more complicated formalisms (e.g., n-dimensional space)</td>
</tr>
<tr style="background-color:#ffffcc;">
<td valign="top">physics (chemical)</td>
<td valign="top">intermediate</td>
<td valign="top">I&#8217;m familiar with many chemico-physical processes, reaction pathways, energetics, etc.</td>
</tr>
<tr>
<td colspan="3" valign="top">
<p align="center"><strong>Technical Computing</strong></p>
</td>
</tr>
<tr style="background-color:#ccffcc;">
<td valign="top">architecture</td>
<td valign="top">intermediate/high</td>
<td valign="top">I have a reasonably good understanding of what makes computers fast.  I understand memory and cache layouts, pipelining, SIMD/vectorization, bandwidth, data locality, out-of-order execution, registers, and how to program efficiently with these features in mind.  I do not know x86 assembly.</td>
</tr>
<tr style="background-color:#ccffcc;">
<td valign="top">programming</td>
<td valign="top">intermediate/high</td>
<td valign="top">I have a good sense of proper programming, program structure, and good practices.  I have years of experience in C, Fortran 77, Perl, and bash/sh.  I have some experience with Python, awk, C++, and Fortran 90.</td>
</tr>
<tr style="background-color:#ffffcc;">
<td valign="top">SMP parallel programming</td>
<td valign="top">low/intermediate</td>
<td valign="top">I am reasonably comfortable with OpenMP.  I have basic familiarity with pthreads.  I have never applied either of these to a real project.</td>
</tr>
<tr style="background-color:#ffffcc;">
<td valign="top">distributed parallel programming</td>
<td valign="top">low/intermediate</td>
<td valign="top">I am familiar with MPI, but I have not used it very extensively.  I am familiar with the concepts and considerations of distributed computing.  I have no experience with fault tolerance or with scaling to large numbers of nodes.</td>
</tr>
<tr style="background-color:#ff9999;">
<td valign="top">GPGPU programming</td>
<td valign="top">low</td>
<td valign="top">I am familiar with the basics of CUDA.  I can write basic kernels, but have no experience using CUDA for research.</td>
</tr>
<tr>
<td colspan="3" valign="top">
<p align="center"><strong>Systems Administration</strong></p>
</td>
</tr>
<tr style="background-color:#ffffcc;">
<td valign="top">Linux administration</td>
<td valign="top">intermediate</td>
<td valign="top">I run a few general-purpose Linux servers.  I am comfortable compiling from source (e.g., Apache, PHP), implementing basic security measures (firewalls, quotas, IDS), managing user accounts, working with LVM, etc.  I do not have much experience with SAN, clustered systems, advanced networking, automatic deployment, PXE, virtualization, packaging, etc.</td>
</tr>
<tr style="background-color:#ffffcc;">
<td valign="top">UNIX administration</td>
<td valign="top">intermediate</td>
<td valign="top">I&#8217;ve run a lot of Solaris servers and am comfortable with Solaris 10&#8242;s way of doing things.  I am experienced with Sun hardware, ZFS, and general administration.  I am unfamiliar with the details of SMF and dtrace.  I have intermediate familiarity with HP-UX 11i, IRIX 6.5, and AIX 5.</td>
</tr>
<tr style="background-color:#ffffcc;">
<td valign="top">cluster administration</td>
<td valign="top">low/intermediate</td>
<td valign="top">I use clusters and can configure a basic one, but I lack experience with diskless nodes, InfiniBand, and low-level tuning.</td>
</tr>
<tr style="background-color:#ccffcc;">
<td valign="top">hardware</td>
<td valign="top">intermediate/high</td>
<td valign="top">I have a lot of experience debugging hardware ranging from workstations to enterprise devices.  I have advised on purchasing decisions for cluster hardware, assembled clusters, performed upgrades and troubleshooting, and managed inventory.</td>
</tr>
</tbody>
</table>
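<p>For readers who haven&#8217;t met the &#8220;basic algorithms&#8221; from the algorithms row above, here is a minimal Python 3 sketch of velocity Verlet integration with simple velocity rescaling.  It assumes a toy system of independent 1D harmonic oscillators in reduced units (unit mass, k<sub>B</sub> = 1); the function names are hypothetical, the Berendsen barostat is left out, and this is an illustration rather than code from my group&#8217;s simulation package.</p>

```python
import math


def forces(x, k=1.0):
    # Toy system (an assumption for illustration): independent 1D harmonic
    # oscillators with spring constant k, so F = -k x for each particle
    return [-k * xi for xi in x]


def kinetic_temperature(v, m=1.0):
    # Equipartition in 1D with k_B = 1: (1/2) m <v^2> = (1/2) T, so T = m <v^2>
    return m * sum(vi * vi for vi in v) / len(v)


def velocity_verlet(x, v, dt, nsteps, target_T=None, m=1.0):
    """Integrate nsteps of velocity Verlet; rescale velocities if target_T is set."""
    f = forces(x)
    for _ in range(nsteps):
        # half-step velocity update ("kick") with the current forces
        v = [vi + 0.5 * dt * fi / m for vi, fi in zip(v, f)]
        # full-step position update ("drift")
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        # recompute forces at the new positions, then finish the velocity update
        f = forces(x)
        v = [vi + 0.5 * dt * fi / m for vi, fi in zip(v, f)]
        if target_T is not None:
            # simple velocity rescaling: scale every velocity by one factor so
            # the instantaneous kinetic temperature equals target_T exactly
            T = kinetic_temperature(v, m)
            if T > 0:
                scale = math.sqrt(target_T / T)
                v = [vi * scale for vi in v]
    return x, v


if __name__ == "__main__":
    x, v = velocity_verlet([1.0, -0.5, 0.25], [0.0, 0.3, -0.7], dt=0.01,
                           nsteps=1000, target_T=0.5)
    print(kinetic_temperature(v))
```

<p>Without rescaling, velocity Verlet keeps the total energy of this toy system close to its initial value; rescaling deliberately overrides that to pin the temperature, which is also why it suppresses the kinetic-energy fluctuations a true canonical ensemble would show.</p>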
<p>Granted, even where I rated myself low, I&#8217;m probably more qualified than the average Joe off the street, who doesn&#8217;t even know such things exist.  Furthermore, listing a skill here means I know it&#8217;s a shortfall and am willing to bone up on it if given the time and opportunity.</p>
<p>With that being said, where do my qualifications leave me?  I&#8217;m better equipped to do molecular simulation than most other researchers since I have intimate knowledge of simulation code, algorithms, and theory, but I also know there&#8217;s a lot of detail I don&#8217;t understand.  Do most graduate students know this sort of stuff by the time they finish?  My postdoc coworker wrote a Nosé-Hoover + Parrinello-Rahman thermostat-barostat integrator routine for our group&#8217;s simulation code back when he was a graduate student.  I&#8217;m almost done with my degree, and I really have no idea how to do this.  Granted, his undergraduate degree was in physics while mine was in &#8220;ceramic engineering.&#8221;</p>
<p>This sort of thing makes me feel like my education is holding me back.  I know very few simulation people who are in materials science.  The vast majority of molecular modelers are</p>
<ol>
<li>in physics and understand the things I wish I understood (e.g., the statistical mechanical implications of various modifications to the Lagrangian)</li>
<li>in chemistry and also understand things I wish I understood (e.g., potentials of mean force, free energy of reactions, quantum chemical aspects)</li>
<li>in biology, and understand ??? (I suspect the bio people are using black-box code and don&#8217;t really understand or care about the nitty gritty)</li>
</ol>
<p>I went to graduate school so I wouldn&#8217;t have to get silicosis in some <a href="http://gpi.org/glassresources/education/manufacturing/section-32-batch-house.html">batch house</a> or <a href="http://mpresya.com/fusetech/ceramic-welding/">rebuild furnaces</a> for a living, but I&#8217;m just not seeing where to go from here.  I could slide into a vanilla postdoc position and spend the next 2-6 years of my life bouncing between short-term appointments, pumping out papers about obscure scientific problems nobody cares about, and floating around technical conferences I hate attending.  That&#8217;s a miserable existence, but it seems to be the one for which I am most qualified.</p>
<p>Contrary to what I expected going into this business, professional science isn&#8217;t all that great.  There are a few superstar scientists who make clear and evident breakthroughs, and that&#8217;s really exciting.  But the majority of scientific progress comes in painfully slow baby steps.  Publications address some tiny facet of some tiny problem that only a tiny group of other scientists care about, and even then, they rarely stand on their own.  Nobody will believe a result until enough of these tiny, mutually consistent findings pile up.</p>
<p>And when that happens?</p>
<p>There are still only a dozen people on the planet who care.</p>
<p>There&#8217;s no real fulfillment in publishing these obscure results.  Even to the layperson, it&#8217;s not like curing cancer.  Nobody really benefits from most of the science that gets published today.  As I often tell people, being a janitor would be more fulfilling to me.  At least then I&#8217;d be able to go home at night knowing that I made the toilets cleaner than they were when I started the day.</p>
]]></content:encoded>
			<wfw:commentRss>http://silicainsilico.wordpress.com/2012/04/12/job-qualifications-and-ph-d-prospects/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/9aa4dd4848a91bbb9e1857d5063cadcf?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vitreousilica</media:title>
		</media:content>
	</item>
		<item>
		<title>Revisiting Perl and Python&#8217;s Speed</title>
		<link>http://silicainsilico.wordpress.com/2012/04/02/revisiting-perl-and-pythons-speed/</link>
		<comments>http://silicainsilico.wordpress.com/2012/04/02/revisiting-perl-and-pythons-speed/#comments</comments>
		<pubDate>Mon, 02 Apr 2012 21:59:56 +0000</pubDate>
		<dc:creator>vitreousilica</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://silicainsilico.wordpress.com/?p=73</guid>
		<description><![CDATA[I was really surprised by the discussion generated by my previous post comparing the speed of Python and Perl.  Many people much wiser than me posted valuable comments and suggestions, and &#8230; <a href="http://silicainsilico.wordpress.com/2012/04/02/revisiting-perl-and-pythons-speed/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I was really surprised by the discussion generated by my previous post <a title="Switching from Perl to Python: Speed" href="http://silicainsilico.wordpress.com/2012/03/26/switching-from-perl-to-python-speed/">comparing the speed of Python and Perl</a>.  Many people much wiser than me posted valuable comments and suggestions, and two people were kind enough to post complete rewrites of my routines, which (to nobody&#8217;s surprise) were much faster than mine.</p>
<p>A few people (both here on the blog and through other discussion) raised legitimate points:</p>
<ol>
<li>My Python code called <code>re.match(RE_LINE, line)</code> on every loop iteration rather than <code>RE_LINE.match(line)</code>, which adds a trip through the <code>re</code> module&#8217;s pattern cache to every call; I was confused by how regex compilation and regex match objects work.  Fixing this problem alone increased speed by 10%-25%.</li>
<li>The timings I posted were sub-second, and someone suggested that interpreter startup overhead might have been penalizing Python.  To address this, I used a more &#8220;real-life&#8221; input file of 3750 MB rather than the 8.588 MB file I used earlier.</li>
<li>The style of Perl I was using was archaic, and the style of Python I was using wasn&#8217;t terribly Pythonic.  I live in a programming bubble; I learned both of these languages from their respective O&#8217;Reilly books and that&#8217;s it.  I don&#8217;t know anyone who knows either Perl or Python in real life, and I have never seen anyone else&#8217;s code in either language.  But as it turns out, poorly written Perl and poorly written Python follow the same trends as well-written Perl and Python (see below).</li>
</ol>
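<p>To make the regex point concrete, here is a minimal Python 3 sketch (not the code I benchmarked) contrasting the module-level <code>re.match()</code> called with a precompiled pattern against the compiled pattern&#8217;s own <code>.match()</code> method.  <code>SAMPLE</code> is a made-up line in the six-column format my regex expects, and the timings it prints will depend on the machine and Python version.</p>

```python
import re
import timeit

# The line pattern from my original script ("\." inside a character class
# is just a literal dot, so it is written as "." here)
RE_LINE = re.compile(r'\s*(\d+)\s+([\d\w]+)\s+\d+\s+[\w.]+\s+[\w.]+\s+[\w.]+\s*$')

# A made-up line in the six-column format the pattern expects (an assumption)
SAMPLE = "   12  SiO4   3   0.1   0.2   0.3\n"


def via_module(line):
    # re.match() accepts a compiled pattern, but in CPython each call still
    # goes through the re module's internal pattern-cache check first
    return re.match(RE_LINE, line)


def via_method(line):
    # calling the compiled pattern's own method skips that extra step
    return RE_LINE.match(line)


if __name__ == "__main__":
    for fn in (via_module, via_method):
        seconds = timeit.timeit(lambda: fn(SAMPLE), number=200000)
        print("%-10s %.3f s" % (fn.__name__, seconds))
```

<p>Both functions return the same match object contents; the only difference is the per-call overhead, which adds up over millions of lines.</p>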
<p>To be a little more scientific about this (I am a scientist, after all), here is my setup:</p>
<ul>
<li>Software
<ul>
<li>Ubuntu Server 10.04 LTS</li>
<li>Python 2.6.5 provided by the distribution</li>
<li>Perl 5.10.1 provided by the distribution</li>
<li>data resides on an ext4 filesystem on LVM</li>
</ul>
</li>
<li>Hardware
<ul>
<li>HP DL360 G7</li>
<li>2x Xeon X5672, 3200 MHz</li>
<li>24GB DDR3 RAM</li>
<li>data resides on a 6 Gbit/s SAS RAID 5 array</li>
</ul>
</li>
<li>Codes
<ul>
<li>&#8220;Old Perl&#8221; code is the code shown in <a title="Switching from Perl to Python: Speed" href="http://silicainsilico.wordpress.com/2012/03/26/switching-from-perl-to-python-speed/">my previous post</a>.</li>
<li>&#8220;Old Python&#8221; code is also shown in <a title="Switching from Perl to Python: Speed" href="http://silicainsilico.wordpress.com/2012/03/26/switching-from-perl-to-python-speed/">my previous post</a>.</li>
<li>&#8220;New Perl&#8221; code is the code <a href="http://silicainsilico.wordpress.com/2012/03/26/switching-from-perl-to-python-speed/#comment-12">written by gnustavo</a>.</li>
<li>&#8220;New Python&#8221; code is <a href="http://silicainsilico.wordpress.com/2012/03/26/switching-from-perl-to-python-speed/#comment-15">the code</a> written by <a href="http://gravatar.com/davispj">Paul Davis</a>.</li>
</ul>
</li>
</ul>
<p><span style="text-decoration:underline;"><strong>Methodology</strong></span>: I ran each code on the same 3750 MB input file five times in succession.  Each execution was timed using the <code>time</code> builtin provided by bash 4.1.5(1), as included with Ubuntu 10.04, and stdout was redirected to /dev/null.  All times below are wall-clock seconds; the first numeric column is the mean of the five trials.</p>
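<p>The same procedure can be sketched in Python 3 for anyone without bash handy.  This is a hypothetical harness, not what I used: it records only wall-clock time (bash&#8217;s <code>time</code> also reports user and system time), and each measurement includes the small cost of spawning the process.</p>

```python
import subprocess
import sys
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times in series; return each run's wall-clock seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        # discard the program's standard output, as in the bash runs
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__":
    # hypothetical usage: python time_runs.py perl new_perl.pl input.out
    for seconds in time_command(sys.argv[1:]):
        print("%.3f" % seconds)
```

<p>Running the trials back to back, as here, also means later runs may benefit from the operating system&#8217;s file cache.</p>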
<table style="font-size:smaller;vertical-align:middle;text-align:center;">
<tbody>
<tr style="font-weight:bold;">
<td>Code</td>
<td>Mean walltime (s)</td>
<td>Trial 1</td>
<td>Trial 2</td>
<td>Trial 3</td>
<td>Trial 4</td>
<td>Trial 5</td>
</tr>
<tr>
<td>Old Python</td>
<td>309.032</td>
<td>310.971</td>
<td>308.228</td>
<td>311.331</td>
<td>307.170</td>
<td>307.461</td>
</tr>
<tr>
<td>New Python</td>
<td>176.880</td>
<td>178.099</td>
<td>174.742</td>
<td>175.463</td>
<td>178.235</td>
<td>177.863</td>
</tr>
<tr>
<td>Old Perl</td>
<td>167.051</td>
<td>166.916</td>
<td>165.911</td>
<td>167.361</td>
<td>168.735</td>
<td>166.333</td>
</tr>
<tr>
<td>New Perl</td>
<td>126.860</td>
<td>125.913</td>
<td>124.709</td>
<td>130.125</td>
<td>127.809</td>
<td>125.746</td>
</tr>
</tbody>
</table>
<p>So even with cleaner code, Perl runs nearly 40% faster than Python (126.860 s vs. 176.880 s), which is not far off from the almost 50% slowdown I noted in my previous post with my two crummier versions of the code.  Furthermore, it seems easier for a relative novice like me to write inefficient code in Python than in Perl: the rewrites cut the Python runtime by about 43% but the Perl runtime by only about 24%.  Of course, it&#8217;s also easier to write Perl code that doesn&#8217;t do what you expect, and trying to understand someone else&#8217;s Perl code is a crapshoot.</p>
<p>Judging by what others have told me, and as some commenters pointed out, Python just isn&#8217;t optimized for &#8220;practical extraction and reporting.&#8221;  Maybe someday I&#8217;ll find a use for Python in my work.</p>
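<p>The comparisons above can be rechecked from the trial data with a few lines of Python 3.  This sketch just recomputes the mean and sample standard deviation of each row of the table and the relative slowdown between two codes; the function names are my own.</p>

```python
from statistics import mean, stdev

# Per-trial walltimes in seconds, copied from the table above
TRIALS = {
    "Old Python": [310.971, 308.228, 311.331, 307.170, 307.461],
    "New Python": [178.099, 174.742, 175.463, 178.235, 177.863],
    "Old Perl": [166.916, 165.911, 167.361, 168.735, 166.333],
    "New Perl": [125.913, 124.709, 130.125, 127.809, 125.746],
}


def summarize(trials):
    """Map each code to (mean, sample standard deviation) of its trial times."""
    return {name: (mean(times), stdev(times)) for name, times in trials.items()}


def pct_longer(slow, fast):
    """Percentage by which the `slow` time exceeds the `fast` time."""
    return 100.0 * (slow - fast) / fast


if __name__ == "__main__":
    stats = summarize(TRIALS)
    for name, (avg, sd) in stats.items():
        print("%-10s %8.3f +/- %.3f s" % (name, avg, sd))
    print("New Python vs. New Perl: %.1f%% longer"
          % pct_longer(stats["New Python"][0], stats["New Perl"][0]))
```

<p>Run as a script, the last line reports that New Python takes about 39.4% longer than New Perl, and the trial-to-trial spread is small compared with the gaps between codes.</p>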
<p>In case the links to the codes I used ever go bad, here they are on pastebin:</p>
<ul>
<li><a href="http://pastebin.com/Kuek6Tyy">New Perl Code</a></li>
<li><a href="http://pastebin.com/VAZWGU3t">Old Perl Code</a></li>
<li><a href="http://pastebin.com/v1r8NMXJ">New Python Code</a></li>
<li><a href="http://pastebin.com/0BwqDnib">Old Python Code</a></li>
</ul>
<p>I&#8217;d post the input files I used, but I don&#8217;t have anywhere I can anonymously host 3.7 GB (or even 8 MB) files.  If you&#8217;re interested in the input data, let me know and I can send a private link.</p>
]]></content:encoded>
			<wfw:commentRss>http://silicainsilico.wordpress.com/2012/04/02/revisiting-perl-and-pythons-speed/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/9aa4dd4848a91bbb9e1857d5063cadcf?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vitreousilica</media:title>
		</media:content>
	</item>
		<item>
		<title>Switching from Perl to Python: Speed</title>
		<link>http://silicainsilico.wordpress.com/2012/03/26/switching-from-perl-to-python-speed/</link>
		<comments>http://silicainsilico.wordpress.com/2012/03/26/switching-from-perl-to-python-speed/#comments</comments>
		<pubDate>Tue, 27 Mar 2012 02:43:15 +0000</pubDate>
		<dc:creator>vitreousilica</dc:creator>
				<category><![CDATA[Computations]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://silicainsilico.wordpress.com/?p=64</guid>
		<description><![CDATA[The job listings in scientific computing these days seem to show a mild preference for applicants with backgrounds in Python over Perl. Python has high-profile (or just highly visible?) packages for scientific computing like NumPy and MPI bindings, and &#8230; <a href="http://silicainsilico.wordpress.com/2012/03/26/switching-from-perl-to-python-speed/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>The job listings in scientific computing these days seem to show a mild preference for applicants with backgrounds in Python over Perl. Python has high-profile (or just highly visible?) packages for scientific computing like NumPy and MPI bindings, and some molecular dynamics packages (e.g., LAMMPS) include analysis routines written in Python. Although I&#8217;ve invested a few years in Perl, I&#8217;ve decided not to pigeonhole myself and to start picking up Python. After all, Perl is unintelligible once it&#8217;s been written, and its odd quirks are sometimes frustrating to deal with.</p>
<p>To this end, I reimplemented one of my most-used Perl analysis routines in Python.  Here is my Perl version, written back in 2009:</p>
<pre>#!/usr/bin/perl

@show = qw/ Siloxane SiO4 Si3O SiO3 SiO2 SiO1 NBO FreeOH H2O H3O SiOH SiOH2 Si2OH/;

printf("\n%-8.8s ", "ird");
foreach $specie ( @show )
{
  printf("%8.8s ", $specie);
}
print "\n";

$current = 0;
$isave = 0;
while ( $line = &lt;&gt; )
{
  chomp($line);
  $line =~ s/^\s+//g;
  @arg = split(/\s+/, $line);
  next unless $line =~ m/^\d+\s+[\d\w]+\s+\d+\s+[\w\.]+\s+[\w\.]+\s+[\w\.]+\s*$/ ;
  if ( $current == 0 )
  {
    $current = $arg[0];
    $isave = $current;
  }
  if ( $arg[0] != $current )
  {
    &amp;printargs();
    $current = $arg[0];
    $isave++;
  }
  $type{$arg[1]}++;
}
&amp;printargs();

sub printargs( )
{
  printf("%-8s ", $isave);
  foreach $specie ( @show )
  {
    printf("%8d ", $type{$specie});
  }
  print "\n";
  foreach $i ( keys(%type) )
  {
    $type{$i} = 0;
  }
}</pre>
<p>And here is the Python version I cooked up today:</p>
<pre>
#!/usr/bin/env python2

import fileinput
import re

show = [ "Siloxane", "SiO4", "Si3O", "SiO3", \
         "SiO2", "SiO1", "NBO", "FreeOH", \
         "H2O", "H3O", "SiOH", "SiOH2", "Si2OH" ]

def printargs( counts, isave ):
  print "%-8s" % isave,
  for s in show:
    print "%8d" % counts[s],
    counts[s] = 0
  print "\n",

print "%-8s" % "ird",
counts = {};
for s in show:
  counts[s] = 0
  print "%8s" % s,
print "\n",

isave = 0;
current = 0;

RE_LINE = \
  re.compile(r'\s*(\d+)\s+([\d\w]+)\s+\d+\s+[\w\.]+\s+[\w\.]+\s+[\w\.]+\s*$')

# method #1:
# for line in fileinput.input():

# method #2:
# for line in file('coord.out'):

# method #3:
contents = file('coord.out').readlines()
for line in contents:
  match = re.match(RE_LINE, line)
  if not match: continue

  specie = match.group(2)
  icur = int(match.group(1))

  if current == 0:
    current = icur
    isave = current
  elif current != icur:
    printargs(counts, isave)
    current = icur
    isave += 1

  if show.count(specie) &gt; 0:
    counts[specie] += 1;

printargs(counts,isave)
</pre>
<p>In Python, there are several ways to read through a file, and I tried three of them.  Method #1 is closest to the Perl behavior: I can specify multiple input files on the command line and have all of them parsed sequentially.  Method #2 is the one the Python documentation seems to advocate most.  Method #3 loads the whole file into memory and works from there.</p>
<p>Unfortunately, in all three cases, Python seems to be slower than Perl.  Average execution times for a typical input file (8.588 MB) are:</p>
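<p>For reference, here is a rough Python 3 sketch of the same three reading strategies (Python 3 has no <code>file()</code> builtin, so <code>open()</code> takes its place).  The <code>species()</code> generator and the sample data in <code>demo()</code> are made up for illustration; the regex is the one from my script.</p>

```python
import fileinput
import os
import re
import tempfile

RE_LINE = re.compile(r'\s*(\d+)\s+([\d\w]+)\s+\d+\s+[\w.]+\s+[\w.]+\s+[\w.]+\s*$')


def species(lines):
    """Yield (frame, species) for every line that matches RE_LINE."""
    for line in lines:
        m = RE_LINE.match(line)
        if m:
            yield int(m.group(1)), m.group(2)


def method1(paths):
    # Method #1: like Perl's <>, read each named file in turn
    with fileinput.input(files=paths) as f:
        return list(species(f))


def method2(path):
    # Method #2: iterate over the file object, one line at a time
    with open(path) as f:
        return list(species(f))


def method3(path):
    # Method #3: read every line into memory first, then iterate
    with open(path) as f:
        contents = f.readlines()
    return list(species(contents))


def demo():
    # Write a small made-up input file and check all three methods agree
    sample = ("1 SiO4 4 0.1 0.2 0.3\n"
              "1 NBO 1 0.4 0.5 0.6\n"
              "header line that should be skipped\n"
              "2 SiO4 4 0.7 0.8 0.9\n")
    with tempfile.NamedTemporaryFile("w", suffix=".out", delete=False) as tmp:
        tmp.write(sample)
        path = tmp.name
    try:
        results = [method1([path]), method2(path), method3(path)]
    finally:
        os.remove(path)
    assert results[0] == results[1] == results[2]
    return results[0]


if __name__ == "__main__":
    print(demo())
```

<p>Methods #1 and #2 stream the input line by line, so memory use stays flat; method #3 holds the entire file in memory, which gets costly for multi-gigabyte inputs.</p>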
<p>Python Method #1: 0.794 seconds<br />
Python Method #2: 0.692 seconds<br />
Python Method #3: 0.686 seconds<br />
Perl: 0.469 seconds</p>
<p>Maybe there&#8217;s something I&#8217;m missing in the Python version, but the Perl version isn&#8217;t exactly a shining example of simplicity itself.  What gives here?  For a language that&#8217;s venerated in the scientific computing world, Python isn&#8217;t shining at basic text parsing of large files.  Even at its best (method #3, 0.686 s vs. 0.469 s), it&#8217;s almost 50% slower than Perl.</p>
]]></content:encoded>
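<p>One way to restructure the counting loop in Python 3 is sketched below; it is not the code I timed, and the function names are hypothetical.  It keeps a similar output layout and the same sequential frame labels, but tallies each frame with a <code>collections.Counter</code> and uses <code>None</code> rather than 0 as the &#8220;no frame yet&#8221; marker, so an input whose first frame index is 0 wouldn&#8217;t be merged into the next frame.</p>

```python
import re
import sys
from collections import Counter

SHOW = ["Siloxane", "SiO4", "Si3O", "SiO3", "SiO2", "SiO1", "NBO",
        "FreeOH", "H2O", "H3O", "SiOH", "SiOH2", "Si2OH"]

RE_LINE = re.compile(r'\s*(\d+)\s+([\d\w]+)\s+\d+\s+[\w.]+\s+[\w.]+\s+[\w.]+\s*$')


def count_frames(lines):
    """Yield (label, counts) per frame; labels count up from the first frame index."""
    current = label = None
    counts = Counter()
    for line in lines:
        m = RE_LINE.match(line)
        if not m:
            continue
        frame = int(m.group(1))
        if current is None:
            current = label = frame
        elif frame != current:
            # a new frame index starts: emit the finished frame's tally
            yield label, counts
            current = frame
            label += 1
            counts = Counter()
        counts[m.group(2)] += 1
    if current is not None:
        yield label, counts


def format_row(label, counts):
    # a Counter returns 0 for species that never appeared in this frame
    return "%-8s " % label + " ".join("%8d" % counts[s] for s in SHOW)


if __name__ == "__main__":
    print("%-8s " % "ird" + " ".join("%8s" % s for s in SHOW))
    for label, counts in count_frames(sys.stdin):
        print(format_row(label, counts))
```

<p>Saved under a hypothetical name such as <code>count_species.py</code>, it would be run as <code>python count_species.py &lt; coord.out</code>.</p>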
			<wfw:commentRss>http://silicainsilico.wordpress.com/2012/03/26/switching-from-perl-to-python-speed/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/9aa4dd4848a91bbb9e1857d5063cadcf?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vitreousilica</media:title>
		</media:content>
	</item>
		<item>
		<title>The Sad State of our Storage Situation</title>
		<link>http://silicainsilico.wordpress.com/2012/03/20/the-sad-state-of-our-storage-situation/</link>
		<comments>http://silicainsilico.wordpress.com/2012/03/20/the-sad-state-of-our-storage-situation/#comments</comments>
		<pubDate>Tue, 20 Mar 2012 04:45:18 +0000</pubDate>
		<dc:creator>vitreousilica</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[frustration]]></category>
		<category><![CDATA[storage]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://silicainsilico.wordpress.com/?p=59</guid>
		<description><![CDATA[Although my research group has been in the field of technical computing for over thirty years now, its level of technological sophistication has been stagnant for somewhere around twenty years.  Take for example our user authentication: We use NIS for &#8230; <a href="http://silicainsilico.wordpress.com/2012/03/20/the-sad-state-of-our-storage-situation/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Although my research group has been in the field of technical computing for over thirty years now, its level of technological sophistication has been stagnant for somewhere around twenty years.  Take for example our user authentication:</p>
<ol>
<li>We use NIS for user authentication.  Period.  No Kerberos.</li>
<li>Our NIS server is an SGI Indigo2 that is over fifteen years old now&#8230;</li>
<li>&#8230;and fifteen years ago, single DES was considered good enough for password hashes.</li>
</ol>
<p>So our entire password database and all password hashes are fully exposed to the network, and the hashes are in an extremely insecure format.  Oh, and did I mention that our sole NIS server is running on a fifteen-year-old machine that is still using its factory-installed hard drive?  One of these days it&#8217;s going to blow out, and we don&#8217;t have any sort of drop-in replacement or backups for when that happens.</p>
<p>In fact, the matter of not having backups has been a major issue for as long as I&#8217;ve worked here.  When I started here as an undergraduate researcher, the entire lab&#8217;s backup strategy was to tar up directories and sftp them to a 250GB external USB drive connected to our already-old Power Mac G4.  When that filled up, we started just backing up data to whatever disks had free space.  One of the researchers started copying his data into the /usr partition on the compute nodes of our cluster since they had around 30GB free per node.  Another copied backups to various workstations that weren&#8217;t being used at the time.  And the third full-time researcher simply didn&#8217;t back up his data at all.  The cluster had automatic tape backup, after all, so why waste the effort?</p>
<p>On the day of my graduation, my department threw a luncheon for the new graduates where faculty could meet the parents of the graduating students.  My research advisor (under whom I am now finishing my Ph.D.) introduced himself to my parents, said a number of congratulatory and flattering things, and finished by turning to me and saying &#8220;Oh, and the cluster went down yesterday.  All the data is gone.  When are you going to be back in the lab?&#8221;</p>
<p>The following Monday I was back in the lab, and a lot of data had been lost.  The cluster did have a tape drive with amanda installed, but nobody knew if the tape backups were ever actually running.  Nobody knew how to examine the contents of the tape, and nobody had rotated tapes recently.  In fact, although the tape drive was sold with two DLT tapes, the second one was never even unwrapped, much less rotated in.  I&#8217;m fairly confident that even if amanda had been doing regular tape backups, the tape had seen more writes than is safe for a DLT cartridge.</p>
<p>This story isn&#8217;t particularly interesting; the internet is full of similar anecdotal backup horror stories.  But it really sucks when it happens to you, and after returning to my group some time later to do graduate work, I took it upon myself to establish data redundancy and automated backups to make sure I never lost my data again.</p>
<p>Unfortunately, the technological sophistication of my group never went beyond buying an external USB hard drive, plugging it into a Mac, and letting OS X magically set it up so that it can be written to.  And for some reason, purchasing decisions were continually made by those with perhaps the least qualifications to be making them.  The end result was our entire storage infrastructure being plugged into the USB ports of a Power Mac G5.</p>
<p>Our situation remained this way for some years despite the fact that the USB to SATA bridges used in external Seagate disks seem to fail under high throughput, and OS X does not handle failed drives gracefully at all.  I finally got fed up with the constant outages and failures, voided the warranties on our bigger USB disks, ripped them out of their enclosures, and installed them properly into whatever workstations had the drive cage space and SATA channels to support them.</p>
<p>The end result is a bit of a mess:</p>
<p><a href="http://silicainsilico.files.wordpress.com/2012/03/anonymousbackupmap.png"><img src="http://silicainsilico.files.wordpress.com/2012/03/anonymousbackupmap.png?w=600&#038;h=450" alt="Backup and storage layout" width="600" height="450" /></a></p>
<p>Some systems are automatically backed up, some are not.  Some have RAID1, others do not.  And none of the backup disks have any redundancy, so if one goes, the backups on it are gone.  It would please me to no end to replace this mess with a single storage solution; even something simple like a dozen terabytes of NAS would be a huge improvement over the spiderweb of small disks we&#8217;re currently using.</p>
<p>Unfortunately, the cost of a semi-serious storage solution (on the order of a few thousand dollars) is hard to sell to my boss.  We have backup disks, they can store data, and nobody&#8217;s lost anything important since that cluster failed many years ago.  Something must be going right, so why spend the money on storage when we can burn it on more inkjet printers to replace those whose cartridges have gone empty, or to hire more undergraduates who aren&#8217;t qualified to touch a UNIX workstation?</p>
<p>As backup space becomes a little tighter, I am tempted to halt the automated backups of my coworkers&#8217; data and back up only my own.  After all, they have all been told to do their own backups, and I&#8217;ve told them not to trust that the automatic backups are actually working.  Yet none of them have done a manual backup in at least a year, and nobody but me has been checking to see that the automated backup system is even working.</p>
<p>I struggle to convince my boss to spend more on storage to accommodate my backup needs, so maybe I should just use what I&#8217;ve got available to me and let the others worry about their own data.  After all, it isn&#8217;t my job to keep their data backed up.  I am not a system administrator; I don&#8217;t even have administrator privileges on any of the clusters I am automatically backing up.  I just don&#8217;t want to lose any more of my data.</p>
]]></content:encoded>
			<wfw:commentRss>http://silicainsilico.wordpress.com/2012/03/20/the-sad-state-of-our-storage-situation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/9aa4dd4848a91bbb9e1857d5063cadcf?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vitreousilica</media:title>
		</media:content>

		<media:content url="http://silicainsilico.files.wordpress.com/2012/03/anonymousbackupmap.png?w=950" medium="image">
			<media:title type="html">Backup and storage layout</media:title>
		</media:content>
	</item>
		<item>
		<title>Wolf&#8217;s approximation of Madelung potentials</title>
		<link>http://silicainsilico.wordpress.com/2012/03/15/wolfs-approximation-of-madelung-potentials/</link>
		<comments>http://silicainsilico.wordpress.com/2012/03/15/wolfs-approximation-of-madelung-potentials/#comments</comments>
		<pubDate>Thu, 15 Mar 2012 16:30:02 +0000</pubDate>
		<dc:creator>vitreousilica</dc:creator>
				<category><![CDATA[Science]]></category>
		<category><![CDATA[algorithms]]></category>
		<category><![CDATA[frustration]]></category>
		<category><![CDATA[silica]]></category>
		<category><![CDATA[water]]></category>

		<guid isPermaLink="false">http://silicainsilico.wordpress.com/2012/03/15/wolfs-approximation-of-madelung-potentials/</guid>
		<description><![CDATA[Wolf (Wolf, Keblinski, Phillpot, and Eggebrecht, J. Chem. Phys. 110 (1999) 8254) came up with a very clever way to approximate the Madelung potential in infinite solids that is much less expensive to calculate than the traditional reciprocal-space-based Ewald methods, &#8230; <a href="http://silicainsilico.wordpress.com/2012/03/15/wolfs-approximation-of-madelung-potentials/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Wolf (<a href="http://dx.doi.org/10.1063/1.478738">Wolf, Keblinski, Phillpot, and Eggebrecht, J. Chem. Phys. <strong>110</strong> (1999) 8254</a>) came up with a very clever way to approximate the Madelung potential in infinite solids that is much less expensive to calculate than the traditional reciprocal-space-based <a href="http://dx.doi.org/10.1002/andp.19213690304">Ewald methods</a>, and in his seminal paper, Wolf showed that his approach works wonderfully for both crystalline and amorphous solids like NaCl and MgO.</p>
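<p>For reference, here is the damped, charge-neutralized energy as it is usually quoted from the Wolf et al. paper. It uses Gaussian units, with the damping parameter α = 1/β and the pair sum restricted to r<sub>ij</sub> ≤ r<sub>c</sub>:</p>
<p>\[ E_{\mathrm{Wolf}} = \frac{1}{2}\sum_{i}\sum_{j\neq i,\; r_{ij}\le r_c} q_i q_j\left[\frac{\operatorname{erfc}(\alpha r_{ij})}{r_{ij}} - \frac{\operatorname{erfc}(\alpha r_c)}{r_c}\right] - \left[\frac{\operatorname{erfc}(\alpha r_c)}{2 r_c} + \frac{\alpha}{\sqrt{\pi}}\right]\sum_{i} q_i^2 \]</p>
<p>Writing E<sub>Wolf</sub> = ½ Σ<sub>i</sub> q<sub>i</sub>φ<sub>i</sub> gives the corresponding per-ion Madelung potential:</p>
<p>\[ \phi_i = \sum_{j\neq i,\; r_{ij}\le r_c} q_j\left[\frac{\operatorname{erfc}(\alpha r_{ij})}{r_{ij}} - \frac{\operatorname{erfc}(\alpha r_c)}{r_c}\right] - q_i\left[\frac{\operatorname{erfc}(\alpha r_c)}{r_c} + \frac{2\alpha}{\sqrt{\pi}}\right] \]</p>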
<p>Implementing this so-called Wolf sum isn&#8217;t hard; there are only two parameters to choose (the cutoff radius r<sub>c</sub> and the damping parameter β = 1/α), and picking good values for them is quite straightforward:</p>
<div class="wp-caption aligncenter" style="width: 618px"><a href="http://silicainsilico.files.wordpress.com/2012/03/wolf-halite.png"><img class=" wp-image " src="http://silicainsilico.files.wordpress.com/2012/03/wolf-halite.png?w=608&#038;h=441" alt="Wolf-approximated Madelung potential for halite" width="608" height="441" /></a><p class="wp-caption-text">Wolf-approximated Madelung potential for halite as a function of the two empirical parameters rc and beta (=1/alpha)</p></div>
<p>In the case of halite, if you want a cutoff of 10 Å, it looks like β = 3.46 Å is a good choice; the Madelung potential appears fully converged, and unlike the β = 2.46 Å case, there is no systematic error due to overdamping. Life is good, right?</p>
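<p>A minimal Python sketch of the per-ion Wolf sum for the rock-salt case makes the two parameters concrete. It assumes an ideal rock-salt lattice of unit point charges with a halite nearest-neighbour distance of 2.82 Å. It uses α = 1/β as in the figure caption and returns the dimensionless Madelung constant, whose exact value for rock salt is about 1.7476. The function name and defaults are illustrative only, not the code used to produce the figures:</p>
<pre>
import math


def wolf_madelung_rocksalt(rc, beta, d=2.82):
    """Madelung constant of an ideal rock-salt lattice from the Wolf sum.

    rc   -- cutoff radius (same length unit as d, e.g. angstrom)
    beta -- damping length; alpha = 1 / beta, as in the figure caption
    d    -- nearest-neighbour distance (assumed 2.82 angstrom for halite)

    Ions sit on a simple-cubic grid of spacing d with charges alternating in
    sign with the parity of x + y + z; the reference ion at the origin is a
    cation with q = +1.
    """
    alpha = 1.0 / beta
    shift = math.erfc(alpha * rc) / rc          # erfc(alpha r)/r evaluated at r = rc
    nmax = int(rc / d)                          # grid half-width covering the sphere
    inside = range(1, int((rc / d) ** 2) + 1)   # allowed x*x + y*y + z*z values
    phi = 0.0
    for x in range(-nmax, nmax + 1):
        for y in range(-nmax, nmax + 1):
            for z in range(-nmax, nmax + 1):
                n2 = x * x + y * y + z * z
                if n2 not in inside:            # skip the origin and ions beyond rc
                    continue
                q = -1.0 if (x + y + z) % 2 else 1.0
                r = d * math.sqrt(n2)
                phi += q * (math.erfc(alpha * r) / r - shift)
    # Self term for the reference ion (q_i = +1).
    phi -= shift + 2.0 * alpha / math.sqrt(math.pi)
    # At a cation site phi = -M / d, so M = -phi * d (dimensionless).
    return -phi * d


if __name__ == "__main__":
    # Scan the damping length at a fixed 10 angstrom cutoff.
    for beta in (2.46, 3.46, 4.46):
        print("beta = %.2f  M = %.5f" % (beta, wolf_madelung_rocksalt(10.0, beta)))
</pre>
<p>Because every ion in this lattice has integer grid coordinates, the cutoff test reduces to comparing the integer x² + y² + z² against (r<sub>c</sub>/d)². Crystals like alumina, or a silica glass, would need real coordinates, charges, and periodic images, which this sketch does not attempt.</p>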
<p>As it turns out, applying the Wolf summation method to slightly more complicated crystals isn&#8217;t as nice. Take, for example, alumina (Al<sub>2</sub>O<sub>3</sub>):</p>
<div class="wp-caption aligncenter" style="width: 618px"><a href="http://silicainsilico.files.wordpress.com/2012/03/wolf-alumina.png"><img class="wp-image " src="http://silicainsilico.files.wordpress.com/2012/03/wolf-alumina.png?w=608&#038;h=441" alt="Madelung potential for alumina" width="608" height="441" /></a><p class="wp-caption-text">Wolf-approximated Madelung potential for alumina.</p></div>
<p>Suddenly the Madelung potential doesn&#8217;t oscillate nicely around the true value; rather, the converged value decreases monotonically with increasing damping. This offers no indication of what the true converged Madelung energy for this crystal is. What about other relevant materials that aren&#8217;t a simple 1:1 stoichiometry?</p>
<div class="wp-caption aligncenter" style="width: 618px"><a href="http://silicainsilico.files.wordpress.com/2012/03/wolf-water-qq.png"><img class="wp-image " src="http://silicainsilico.files.wordpress.com/2012/03/wolf-water-qq.png?w=608&#038;h=441" alt="Madelung potential for water" width="608" height="441" /></a><p class="wp-caption-text">Madelung potential for water assuming partial charges</p></div>
<p>Water looks a lot like alumina. The Wolf sum isn&#8217;t working very well here.</p>
<div class="wp-caption aligncenter" style="width: 618px"><a href="http://silicainsilico.files.wordpress.com/2012/03/wolf-water-gg.png"><img class="wp-image " src="http://silicainsilico.files.wordpress.com/2012/03/wolf-water-gg.png?w=608&#038;h=441" alt="Madelung potential for water" width="608" height="441" /></a><p class="wp-caption-text">Madelung potential for water assuming partial charges with diffuse character</p></div>
<p>Using a more realistic (but still empirical) treatment of the Coulombic nature of water makes things worse.</p>
<div class="wp-caption aligncenter" style="width: 618px"><a href="http://silicainsilico.files.wordpress.com/2012/03/wolf-silica.png"><img class="wp-image " src="http://silicainsilico.files.wordpress.com/2012/03/wolf-silica.png?w=608&#038;h=441" alt="Madelung potential for amorphous silica" width="608" height="441" /></a><p class="wp-caption-text">Madelung potential for amorphous silica assuming partial charges with diffuse character</p></div>
<p>&#8230;and amorphous silica also doesn&#8217;t work.</p>
<p>So what&#8217;s going on here? This method seems scientifically sound, and I&#8217;m reasonably sure my implementation of it is correct since I can match the results of other codes, but the only systems with which it seems to work reliably are electronically very simple.</p>
<p>What frustrates me about this is that my work for the last five years <em>has</em> been using this Wolf method for both water and silica. The parameters were published before I started, and I had assumed that they were derived using some sort of sensible procedure. Unfortunately, I can&#8217;t figure out what that method was because re-parameterizing the Wolf method myself has revealed the process to be nothing but fragile and murky.</p>
<p>I can&#8217;t say that this is unexpected given how my research has almost invariably panned out for the last five years, but it is frustrating nonetheless. Consequently, I will probably spend the next few days fighting code and methods rather than doing real science that can contribute to my dissertation. But such is the nature of graduate work.</p>
]]></content:encoded>
			<wfw:commentRss>http://silicainsilico.wordpress.com/2012/03/15/wolfs-approximation-of-madelung-potentials/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/9aa4dd4848a91bbb9e1857d5063cadcf?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vitreousilica</media:title>
		</media:content>

		<media:content url="http://silicainsilico.files.wordpress.com/2012/03/wolf-halite.png?w=1014" medium="image">
			<media:title type="html">Wolf-approximated Madelung potential for halite</media:title>
		</media:content>

		<media:content url="http://silicainsilico.files.wordpress.com/2012/03/wolf-alumina.png?w=1014" medium="image">
			<media:title type="html">Madelung potential for alumina</media:title>
		</media:content>

		<media:content url="http://silicainsilico.files.wordpress.com/2012/03/wolf-water-qq.png?w=1014" medium="image">
			<media:title type="html">Madelung potential for water</media:title>
		</media:content>

		<media:content url="http://silicainsilico.files.wordpress.com/2012/03/wolf-water-gg.png?w=1014" medium="image">
			<media:title type="html">Madelung potential for water</media:title>
		</media:content>

		<media:content url="http://silicainsilico.files.wordpress.com/2012/03/wolf-silica.png?w=1014" medium="image">
			<media:title type="html">Madelung potential for amorphous silica</media:title>
		</media:content>
	</item>
	</channel>
</rss>
