Matt Platts

Web and app development since 1998 using
HTML5, CSS3 (inc Sass, Less), JavaScript, jQuery, MooTools, Node, PHP, Perl, Linux, Apache

https://www.linkedin.com/in/mattplatts | https://github.com/matt-platts



Perl vs PHP - my take on the 'debate'

Actually, there is no debate - or at least very little. Or perhaps, not enough.

Once upon a time there was the Unix/Linux OS, with its collection of tools such as grep, sed and awk. And then along came Perl, merging all of this functionality with a stunning regular expression engine, and everybody loved it. Perl compatible regular expressions (PCRE) became a thing in their own right, outdoing the POSIX standard, and even the p in PHP's 'preg_replace' function stands for perl (perl compatible regular expression replace). As the computer world grew and grew so did perl, and when the web arrived the CGI module was written for perl to make writing CGI scripts easier. But it wasn't enough for some people, who decided to reinvent the wheel. Was perl so bad it needed a rewrite?

PHP appeared in the mid '90s and had gained some popularity by 1998 when it hit version 3, which is when I first used it. It was quick and easy to learn. But, coming from a background of using a well developed, tried and tested language, it was horrible and immature, a bit like playing with a toy language.

My initial finding was that they'd basically rewritten Perl, and very badly. It was like a subset of perl with the real power non-existent. For what reason I couldn't work out, but darn it was quick and easy to get up and running, and it didn't really matter if you could program or not. By renaming your .html file to .php, you could suddenly get to pre-process the page. You were invited to open <?php tags and start writing your page on the fly, mixing back end logic with front end all the way. It was another back end language in a world of back end languages - Perl, ColdFusion, JSP, ASP etc. And it was attractive to front end developers as all the 'difficult' configuration stuff that you needed in the back end wasn't there. It was easily installed on your web server and you were off. This little thing, the ease of getting into it, was, IMHO, why it had such a big uptake. It felt powerful, like suddenly all the back end wizardry was opened up and at your fingertips. The problem was, it really wasn't, and all that lay within was not gold.

What made perl so difficult then? Little oddities from a pre-online world. Files had to be transferred via FTP in ASCII mode, not binary - otherwise they didn't compile, and you wouldn't get a message to tell you why they didn't work or what you had done wrong - you just had to know this. All you saw in the browser was "500 - Internal server error". If you were on a decent server you'd know where the error logs were and how to read them, but this wasn't always the case in the late '90s and I wasn't Linux proficient at that time. Also, files needed their permissions set correctly (chmod 755) to allow world execute permissions, and additionally they had to be owned by a valid user/group. This tripped me up countless times in the beginning, sometimes wasting hours trying to work out why things weren't working correctly. You also needed to specify the path to the perl interpreter at the top of the file, leaving me wondering why my code ran fine on one server and not on another. Sheesh.

Other than little things like this however, I found perl to be a great language, and it took learning something else for me to realise just how great it was. In comparison, I thought PHP was horrific, and indeed I still do.

Why oh why did a well established, functioning language get overtaken in the web development world by what is essentially a cheap and lacking imitation?

Probably the ease of entry to the language. However, at risk of showing some bias(!), I've decided to put my thoughts down in long form, and I'll start by covering some PHP pet hates. This was my personal experience when I started looking at PHP.


PHP is long to write.

In the true spirit of good laziness, I loved perl's short function names. Why in PHP I had to type 'preg_replace' instead of 's' I don't know, it was such a waste of letters. Give me function names like y, m and s any day. My PHP code always felt bloated. And of the many missing 'shortcuts', I missed the =~ operator badly, as I had to write out variables extra times because you couldn't just change them with one operator.

$myVar =~ s/x/y/g;
is so much nicer than
$myVar = preg_replace("/x/","y",$myVar);
Why should I have to write the variable name twice to operate on it once??!! 18 characters vs 40 - that's over twice the length! What a waste of bytes and typing time! I would also argue that the former is more readable than the latter, despite Perl's reputation of often being unreadable. I mean, you've learned the language so you know what it does, right? Did the beginner programmers really need long explanatory words? Did this somehow make it easier?
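
For anyone who hasn't met the binding operator, here is a minimal sketch in perl of what it does. The variable names and sample strings are mine, purely for illustration, and the /r flag shown at the end assumes perl 5.14 or later.

```perl
use strict;
use warnings;

# In-place substitution: =~ binds s/// to $text and modifies it directly.
my $text = "xylophone box";
$text =~ s/x/y/g;

# Non-destructive form: the /r flag (perl 5.14+) returns the changed copy
# and leaves the original untouched.
my $original = "xylophone box";
my $copy     = $original =~ s/x/y/gr;

print "$text\n";      # yylophone boy
print "$original\n";  # xylophone box
print "$copy\n";      # yylophone boy
```

Either way the variable being changed is only named once on the line that changes it.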


Too many functions in a language peppered by inconsistencies

I'm covering two things at once here. Perl achieved more with fewer than 300 function names in the core, plus some clever operators, flags and switches, combined with the glorious community contributed CPAN modules, than PHP did with many thousands of functions (and counting). When I originally wrote this, it was around 3,500, now I think it's close to 7,000. Who on earth can remember thousands of functions? And coming to the second part, what makes it worse is when there is no standard formatting for these thousands of names.

To underscore or not to underscore? We can write functions with underscores separating the words of a function name, or without. PHP has given us strip_tags and stripslashes, str_word_count and strcasecmp, htmlentities and html_entity_decode. There is absolutely no consistency in this language. And how about the word order in those function names being verb->object or object->verb inconsistently? PHP has given us var_dump and create_function, base64_decode and strip_tags etc. So not only are there thousands of functions to remember, they aren't second guessable, resulting in a lot of referring to the manual even for seasoned developers. I've programmed a great deal of PHP for over ten years, and am still constantly referring to the thing.

On top of this mess, in over 10 years I've never been able to remember the order of vars into some of these functions. Is it function($haystack,$needle) or function($needle,$haystack)? It's back to the manual again every time.

Summing up, the language is extremely poorly written, and these are very basic inconsistencies. My head is still full of remembered trivialities rather than that brain power being put to use on some good algorithms.

Useless functions?

Returning to the sheer number of functions again, PHP is positively peppered with functions that in perl you would include a module for if you needed it. Do we really need functions like hebrev (convert logical Hebrew text to visual text) in the core of a language? I wonder what percentage of PHP apps use this function? Surely this is a candidate for a module. Others are more debatable. Perl for example doesn't include a trim function. Shock horror! The perl thinking is that it's so basic, you use a regex in the form of $var =~ s/(^\s+|\s+$)//g. Isn't it better to be thrown into regexes early on? They're things you're going to have to learn one day, right? And then there are fewer functions to remember. Arguably you're not going to forget something as simple as trim, but I expect that with regexes and the substitute operator you'd hack down the number of PHP functions considerably. Perhaps this was PHP's thinking - simple things to start and bury the complexity for later on in the learning curve. Perhaps this is why its uptake was so vast.
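
As a rough sketch of the trim-by-regex idea, here it is wrapped in a small subroutine. The name trim and the sample string are my own choices for illustration; following the point above, no built-in trim is assumed.

```perl
use strict;
use warnings;

# Hypothetical helper: strips whitespace from both ends with one s///.
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+|\s+$//g;   # leading whitespace, or trailing whitespace
    return $string;
}

my $padded  = "   hello world \t\n";
my $trimmed = trim($padded);

print "[$trimmed]\n";   # [hello world]
```

The subroutine works on a copy of its argument, so the caller's original string is left as it was.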

So what about functions such as strcmp, strncmp, strcasecmp, strncasecmp, strnatcmp and strnatcasecmp? (To perl programmers reading this - yes this is for real!!) I'm going to go into these in the security bit below too. But when you already have if ($a==$b) and if ($a===$b) from basic equality testing, do you really need this lot? The case comparisons can easily be mimicked using strtolower($a)==strtolower($b) (perl has lc for this, with eq for string equality) - PHP is peppered with case sensitive and insensitive versions of the same thing in different function names - the concept of an 'i' flag is lost on these people. The strncmp function (comparing substrings of strings) can be mimicked using substr and basic comparison (don't forget, PHP also gives you the equally unnecessary substr_compare too!). Strnatcmp and strnatcasecmp can be rolled into one by lower-casing as well, as can a number of others - strstr and stristr for example. However, these comparison functions are crazy - perl's cmp and <=> should suffice! I'm not even going to start on ereg, eregi, ereg_replace and eregi_replace - thankfully these were deprecated in PHP 5.3 and removed in PHP 7 - perhaps they are finally starting to learn!

It seriously boggles my mind that there are so many function names where one or two would suffice.
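
To make the comparison point concrete, here is a small perl sketch of how eq, lc, substr, cmp and <=> cover the ground of the PHP functions listed above. All variable names and sample values are made up for the example.

```perl
use strict;
use warnings;

my ($first, $second) = ("Apple", "apple");

# String equality uses eq; lc folds case, so no separate
# case-insensitive function is needed.
my $same_exact  = ($first eq $second)         ? 1 : 0;   # 0
my $same_nocase = (lc($first) eq lc($second)) ? 1 : 0;   # 1

# cmp compares strings and <=> compares numbers; both return -1, 0 or 1.
my $string_order = "abc" cmp "abd";   # -1
my $number_order = 10 <=> 9;          # 1

# Comparing only the first n characters (the strncmp job) via substr.
my $prefix_match = (substr("foobar", 0, 3) eq substr("foobaz", 0, 3)) ? 1 : 0;   # 1

# The same operators drive sort, with $a and $b supplied by perl.
my @words   = sort { lc($a) cmp lc($b) } qw(pear Apple banana);
my @numbers = sort { $a <=> $b } (10, 9, 100);

print "@words\n";     # Apple banana pear
print "@numbers\n";   # 9 10 100
```

Two operators plus lc and substr, rather than a function name per combination.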

PHP's equally thankfully deprecated 'Magic Quotes' feature is not even deserving of a mention.. or maybe this quote from the manual at php.net/manual/en/security.magicquotes.php is:

The very reason magic quotes are deprecated is that a one-size-fits-all approach to escaping/quoting is wrongheaded and downright dangerous. Different types of content have different special chars and different ways of escaping them, and what works in one tends to have side effects elsewhere. Any sample code, here or anywhere else, that pretends to work like magic quotes --or does a similar conversion for HTML, SQL, or anything else for that matter -- is similarly wrongheaded and similarly dangerous. Magic quotes are not for security. They never have been. It's a convenience thing -- they exist so a PHP noob can fumble along and eventually write some mysql queries that kinda work, without having to learn about escaping/quoting data properly. They prevent a few accidental syntax errors, as is their job. But they won't stop a malicious and semi-knowledgeable attacker from trashing the PHP noob's database. And that poor noob may never even know how or why his database is now gone, because magic quotes (or his spiffy "i'm gonna escape everything" function) gave him a false sense of security. He never had to learn how to really handle untrusted input. Data should be escaped where you need it escaped, and for the domain in which it will be used. (mysql_real_escape_string -- NOT addslashes! -- for MySQL (and that's only unless you have a clue and use prepared statements), htmlentities or htmlspecialchars for HTML, etc.) Anything else is doomed to failure.

This is the sort of rubbish we've had to put up with in PHP for a long, long time now. It also brings us nicely into the security section..


Security

Arguably, one of the worst things about PHP, which was thankfully switched off by default early on (and later removed altogether), was the variable injection known as register_globals. In short, adding a query string to your url in the manner of www.domain.com/script.php?foo=bar would pre-inject your script with the variable $foo with a value of bar. Together with the fact that PHP didn't force any strictness on declaring variables upfront, this made it quite a sport to try and guess variable names in people's scripts and set them in the query strings in the hope of breaking their program. http://www.domain.com/script.php?user=1&id=1&userid=1928&userId=18276&user_id=928767&i=6&x=10 etc. etc.

And what did they do to fix this? Stuck them in global arrays with really long names instead: $HTTP_POST_VARS and $HTTP_GET_VARS. Eventually they came up with a shorthand version - $_GET and $_POST. But what is it with PHP's love of using lots of letters? Was it a response to Perl's 'unreadable' status as it was so short?

But still, the fact that they actually considered global variable injection at all sincerely worried me, and several others, contributing to its status as a 'toy language' in the perl community.

I can go on and on about points like this. The peculiar 'strcmp' function for example, which returns 0 if two strings match. I think this is meant to be used as part of a sorting function similar to cmp in Perl, but the PHP manual gives an example of a basic equality comparison instead, with no mention of sorting functions.. hmm! W3schools does the same. Double hmm! (As a sideline, w3schools tends to focus on very basic things and lacks real analysis when it is required - again it's focussed on being easy at the expense of being thorough when you need it).

Anyway, a well known PHP bug involves sending a get variable in using array syntax (for example script.php?input[]=x). What happens if you compare an expected input string to the actual input? Well, PHP has created an array from the input, resulting in a comparison between a string and an array - which throws a warning to the error log, but the script continues to execute and strcmp returns null! Got it yet? Because null == 0 is true in a loose comparison, this means that if (strcmp($testfor,$input)==0){ // run code here} will execute despite no match being made. Now imagine if the input string was a security key for changing a password...

There are loads of little things like this in PHP which make it extremely easy to make hard to spot mistakes.

Oh, and don't forget the several THOUSAND functions that PHP has in its core... yep you really do need to know all of these inside out!


'Missing' operators

To me, there were heaps of really simple little things just missing. Maybe you wouldn't notice this from a more old school programming background. For example, the 'Or equals' operator: $x ||= $y. This is an example of what I loved about perl - short ways of doing common simple things - another nice mixture of a test and an assignment operator, similar to =~. If $x is set (or, technically, 'truthy'), it remains as it is; if not, it becomes equal to $y. I learned it and don't find it hard to read, in fact I find it quite obvious. Take $x = 0 for example - a falsy value - followed by 'Or equals $y'. It's easy to see what the expression does, isn't it? In PHP you'd need to do something like $x = isset($x) ? $x : $y. I fail to see why one is more readable than the other. If you weren't a programmer, both would look equally bizarre I'm sure.

Finally on version 7, PHP has added the 'Null Coalesce' operator: $foo = $foo ?? 'bar'; - that only took 15 years! And, it's still long as you have to put the variable in twice! Is the mixture of test and assignment in one operator too much for people? Does it contribute to unreadability?

Never quite sure what this one is called, I've heard it referred to as the boolean quasi-operator however: !! - which returns 1 (truth value) or an empty string (false value). Eg. print !!$x (Just noticed, PHP now has this).

$x = $y unless $z; - a short way of writing if (!$z){ $x = $y }; - nb: both languages have the ternary, so you could also write $x = $z ? $x : $y.
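
Here is a short perl sketch pulling these operators together. The variable names and values are invented for the example, and the //= line (defined-or, which assumes perl 5.10 or later) is my own addition as the closest perl relative of PHP's ??.

```perl
use strict;
use warnings;

# ||= assigns only when the left-hand side is falsy (0, '', undef...).
my $count = 0;
$count ||= 5;            # 0 is falsy, so $count becomes 5

my $name = "perl";
$name ||= "php";         # already truthy, so $name stays "perl"

# //= assigns only when the left-hand side is undefined.
my $zero = 0;
$zero //= 5;             # 0 is defined, so $zero stays 0

# !! collapses any value to perl's canonical true (1) or false ('').
my $truthy = !!"some text";
my $falsy  = !!0;

# Statement modifier: the assignment runs only when $skip is false.
my ($value, $fallback, $skip) = ("old", "new", 0);
$value = $fallback unless $skip;

print "$count $name $zero [$truthy] [$falsy] $value\n";
# prints: 5 perl 0 [1] [] new
```

Note the difference between ||= and //= on the value 0, which is exactly the sort of case where a truthiness test and an isset-style test disagree.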


Default variables

"These are simply wonderful" - perl programmers.
"Unreadable line noise" - php programmers.

Let's loop through an array in PHP:

foreach ($array as $item){
	print $item;
} 
and in perl..
foreach (@array){
	print $_;
}
You can actually write the perl code in the same style - it would go 'foreach $item (@array)'. But in the example above, $_ is the default scalar variable which perl inserts when it is left out. There is also a default array (@_). The matching hash name (%_) exists too, though as far as I know perl never fills it in for you (for now, a hash is an associative array). There are several places in perl where these defaults just appear, and this is one of them.
And again in perl..
foreach (@array){
	print;
}
The default variable is actually 'so default' that you don't even need to write it - print acts on the default if there is nothing else to print!

In perl, all values are passed into a subroutine (function) as a list. You can access these from @_ (no need to declare the incoming variable names early on). The perl version of array_shift is simply shift (short!), and inside a subroutine it works on @_ when you give it no argument. So to print the first argument to a subroutine you can simply write print shift; (shift removes the first value from @_ and returns it, and print prints it).

You don't have to use this shorthand of course - you can give everything names if you like in the name of readability. But as a perl programmer it becomes very obvious immediately what is intended, and saves you bytes and keystrokes everywhere.
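
A minimal sketch of both defaults at work, using made-up data and a hypothetical subroutine name (first_argument) purely for illustration:

```perl
use strict;
use warnings;

my @array = ("a", "b", "c");

# foreach with no loop variable aliases each element to $_.
my $joined = "";
foreach (@array) {
    $joined .= uc;        # uc with no argument works on $_
}

# Arguments arrive in @_; shift with no argument takes from @_ inside a sub.
sub first_argument {
    my $first = shift;    # same as shift(@_)
    return $first;
}

my $got = first_argument("x", "y", "z");

print "$joined\n";   # ABC
print "$got\n";      # x
```

Neither $_ nor @_ is ever written out in the loop body or in the shift call, yet it is clear to a perl programmer what each line operates on.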


The 'difficult' bit of Perl

I covered the getting up and running issues early on. Perhaps the other difficult bit about perl was understanding its underlying data types, knowing when to use references and when to de-reference, and the odd way the syntax sometimes changes when accessing arrays and hashes by reference instead of directly. PHP now has references. People complain about them! It confused a lot of people up front, but once learned and used it's really very simple. I cover all the strange 'barriers to entry' in my quick start guide here


Perl community response

It's not just me. Several articles from known names in the perl community surfaced around the turn of the millennium, calling out PHP's shortcomings. http://tnx.nl/php.html is possibly the most famous article on this subject, which was more recently followed up with the PHP5 version at http://tnx.nl/php5.html.


Unreadable Code

Perl has often been called a 'write-only' language (generally by non-perl programmers). True, you can write unreadable code in Perl, as you can in any language should you choose. Good programmers don't write unreadable code; they write well laid out, maintainable code with comments as required that can be read and understood quickly and easily by other programmers. But they do write code that non-programmers can't read, and may be scared of.

Did PHP somehow achieve a win here for non-programmers? Well, PHP, with its horrible language structure and long names, did, if you spaced your code out well and indented properly, make it sort of more readable to non-programmers at a first glance, and less a collection of sigils and characters. This very likely had something else to do with the uptake of PHP. Sadly though, it meant that the PHP programmer was doomed to write more bytes all the time forever.

Perl isn't difficult to learn, and once you've learned it it's not unreadable and it's fast to write. I've often wondered what the issue is here.

Frameworks

I'm not even going to start on taking a hugely bloated language and writing hugely bloated frameworks in it which you have to learn alongside the language itself... some of these frameworks are good, most are very time consuming to get your head around. I think the 'quick and easy' bit of PHP has come full circle and eaten its own tail, and it's time for something new..

Some kind of conclusion?

As the internet became very big very quickly, there was a big shortage of programmers. People had to be trained. They had to make it easy for people. Somehow, I think PHP made it very easy to start, and in doing that it made people who didn't think they could be programmers realise that they could be programmers. You still had to learn plenty, but later on, in a cross that bridge when you come to it kind of way. Well, if that's what it took, and it worked, then yeah, as long as you don't mind remembering exactly how thousands of functions work and have your security head on, um.. it's really not so bad. It just... I don't know.. could have been a whole lot better? NB: PHP has a reputation as the most insecure of the web languages, with the most reported vulnerabilities.

I guess that security was pretty low down the agenda for a lot of people. Check the massive increase in penetration testing companies springing up. Guess what industry I've been working in for the last year?

Personally, I see a bit of a VHS/Betamax thing going on here, where the inferior format gained the larger market share. It actually does show how marketing and perception can change things drastically.

My conclusion - don't let ease of use and low barrier to entry dictate your technology without understanding what's going on behind the scenes. All that glisters is not gold..

Different cultures

It's interesting to see just how different the cultures are between Perl and PHP.

With Perl you need to learn a bit more first to get to grips with the language. But once you have, I think it makes you a better programmer, with a better eye for detail and a better understanding of what you're getting into. PHP is about getting started quickly. I very much feel that PHP is just putting off the learning curve until later however, by which time it is actually harder, and a greater security risk.

Perl, whilst it doesn't have the uptake in CGI any more, is certainly no less popular on the command line, for sysadmin kind of work.

The philosophy of laziness, impatience and hubris still pervades throughout the Perl culture. Most PHP programmers (that I have encountered) just laugh it off as some kind of joke.

PHP won for the web because it was easy to start, but it does have a couple of points well in its favour. Perl's attempt to run as an apache module to speed things up - mod_perl - was pretty horrible. Persistent variables across different sessions if you didn't package everything was a pain for example, and having to package main with its own name was a pain. It meant you couldn't take a bunch of previously written scripts and just run them under mod_perl. Little things like that became quite annoying. PHP got rid of all these, in favour of other annoyances. If only there was a nice shorthand version of the language - the 'missing' perl operators really could have made things much easier.

Moving onwards

PHP was designed for the web and not as a command line tool - although you can use it as this too - and a few little things went a long way in generating a massive uptake of users. Specifically these are things like not having to worry about printing headers, paths to interpreters, ascii vs binary upload types, an easy way to read in variables, etc.

The latest entry to back end languages - Node - is very cool. The frameworks such as React are also very cool. It's got a nice package manager - npm - a bit like Perl has with CPAN. Interestingly, it's focussed around some stuff that Perl could do decades ago like async and writing servers ;) This does actually feel like a step forward, when to me, PHP never did. Like PHP, JavaScript was written for the web, and it's great that we can use it everywhere. One language to learn for all tasks is going to be a great benefit.

Many of my arguments against PHP still stand however. Maybe the readability vs long function names just needs to stay, the web still needs more people and a good uptake of people, and if the 'explosion in an ascii factory' look of perl is too much for some people well so be it.

Perl will always have its place as the Swiss Army chainsaw of scripting languages.

Now, if only I could find a perl to javascript compiler..
