Matt Platts

Web and app development since 1998 using
HTML5, CSS3 (inc. Sass, Less), JavaScript, jQuery, MooTools, Node, PHP, Perl, Linux, Apache

https://www.linkedin.com/in/mattplatts | https://github.com/matt-platts

The (missing) Perl quick start guide

All the reasons for all things that make you want to tear your hair out, in one place. Designed for newcomers, and people who thought they could get away without understanding the difference between arrays and lists, and what a reference is.

Table of contents:

  1. The five fundamentals for running CGI scripts under Linux/Apache
  2. How to understand a 500 - Internal Server Error
  3. Quickly understanding the types of variables ($,@,%)
  4. References and Complex data structures (arrays of arrays of hashes of arrays of hashes etc.) - ie all the complicated stuff.

The five fundamentals for running CGI scripts under Linux/Apache

  1. Perl programs must be plain text with Unix line endings in order to run. If you have uploaded your script via FTP in binary mode from a Windows machine, the carriage returns at the end of each line are kept, the shebang line no longer points to a valid interpreter, and you will receive an Error 500 - Internal Server Error message in your browser. Upload Perl scripts in ASCII transfer mode, which converts the line endings for you. See the FTP settings in your FTP program - you may need to add .cgi to your list of files to transfer in ASCII mode.
  2. Perl programs must have the correct file permissions in order to run. Unlike scripts that are merely read by a server module (PHP under mod_php, for example), your CGI scripts must be both readable and executable. chmod 755 filename normally does the trick. I have known some servers run code set at 777 and others reject it. This depends on the server setup: wrappers such as suEXEC refuse to run scripts that are writable by anyone other than the owner. Stick with 755.
  3. Perl programs must have the correct file ownership and be in the correct user group in order to run. E.g. if you have placed a script in a directory as the root user, you may find that the web server is not allowed to execute it. chown username filename will change the owner of the file, and chgrp groupname filename will change the group. On most Linux systems, including CentOS, chown username:groupname filename does both operations in one (the older username.groupname form is usually accepted as well).
  4. The first line of the file should be the 'shebang' line, which points to the path of the perl interpreter on the server. Normally, this looks like this:
    #!/usr/bin/perl
    Note that there is no semicolon at the end of this line. The exact location of perl depends on your system. Other places I have seen it are /usr/local/bin/perl, and even /usr/sbin/perl. You can find where it is by running 'which perl' in the shell.
  5. You need to print the HTTP header. As part of the HTTP protocol, a response has a header and a body. Purpose-built server-side languages take care of this for you. Perl was not built from the ground up for the web, so you need to send both the header and the body yourself. This is actually really simple - the separator between the header and body is a blank line, i.e. two newlines in a row. So simply include this in your file before you print any other output: print "Content-type:text/html\n\n"; Each "\n" is a newline: the first ends the header line and the second produces the blank line that separates the header from the body. The header is simply "Content-type:text/html" - the most basic header a web browser requires. And yes, if you're going to supply a JPEG in binary format, you'd use a content type of image/jpeg, and so on.
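Putting fundamentals 4 and 5 together, a complete CGI script can be sketched roughly as follows. This is a minimal illustration, not the only way to do it: cgi_response is a made-up helper name, and the shebang path is an assumption that must match your own server. The space after the colon in the header is optional.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the full HTTP response: header line, blank line, then the body.
# cgi_response() is a hypothetical helper name used only for this sketch.
sub cgi_response {
    my ($content_type, $body) = @_;
    return "Content-type: $content_type\n\n" . $body;
}

# The header must be the first thing the script prints.
print cgi_response("text/html", "<html><body><p>Hello from Perl</p></body></html>\n");
```

Saved as, say, hello.cgi, uploaded in ASCII mode and set to 755, a script of this shape should show the paragraph in a browser rather than an Error 500.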

How to understand a 500 - Internal Server Error

There are several ways to get a more informative error message.


  1. Look in the server log files. Under the Apache web server the error log is normally somewhere under /var/log, for example /var/log/apache2/error.log on Debian and Ubuntu or /var/log/httpd/error_log on Red Hat and CentOS. If you are on a shared host, the provider should be able to tell you where the logs are. E.g. I have a server running Plesk for system administration, and by default the logs for each web site are in /var/www/vhosts/sitename.com/statistics/log.
    The best way to read the log files is to run tail -f error_log. If you don't know where to look, go as far up the directory tree as you are allowed and run find . -name 'error*log'.
  2. Run the code on the command line. You can do this in the shell by running perl filename, or if you are in the Vim editor you can type :!perl %. You should see the full error message printed out in the shell. If you don't get an error message on the command line, the issue is likely to do with one of the five fundamentals above.
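  3. Turn on warnings: use the -w flag on the shebang line (#!/usr/bin/perl -w), or put use warnings; on the line under the shebang line. Adding use strict; as well catches misspelt variable names. Perl then reports many mistakes that it would otherwise ignore silently.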

If you're still getting an Error 500 and don't know why, speak to your hosting provider / systems administrator.
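While debugging, it can also help to have the script report its own runtime errors in the browser. The sketch below is one possible approach, not a standard recipe: run_safely is a made-up helper, and it uses a code reference (sub { ... }), which is a reference to a piece of code rather than to data. It only catches errors raised while the script runs. A syntax error or a broken shebang line will still produce a bare Error 500.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# run_safely() is a hypothetical helper: it runs a piece of code inside
# eval {} and returns its output with an HTML header, or a plain-text
# error report if the code died.
sub run_safely {
    my ($code) = @_;
    my $output;
    my $ok = eval { $output = $code->(); 1 };
    return "Content-type: text/plain\n\nScript error: $@" unless $ok;
    return "Content-type: text/html\n\n" . $output;
}

print run_safely(sub {
    my %settings = (title => "Test page");
    die "No title set\n" unless $settings{title};
    return "<h1>$settings{title}</h1>\n";
});
```

Remove this kind of reporting once the script works, as error text can reveal details about your server.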




Quickly understanding the types of variables ($,@,%), and looping/retrieving from complex data trees.

Many languages use the $ to signify a variable (e.g. PHP). Some use no prefix at all, just a bare word (JavaScript).
Perl uses the $ to signify a scalar variable, a @ to represent an array and a % to represent a hash (similar to an associative array). These signs are known as 'sigils'.

Setting variables

$myVar = "A scalar Variable";
$myVar = 4; # quotes not required for numbers

@myArray = ("first item", "second item");
@myArray = (1,2,3,4,5); # same thing, illustrating that you don't need to quote numbers.
@myArray = qw(one two three four five); # the very useful qw function means 'quote words' - works for single words only.

%myHash = (item => "value", item2 => "value2");

Getting variables

Rule 1: To access the value of a hash key, use curly braces after the variable name (myHash{item}).
To access an index of an array, use square brackets after the variable name (myArray[indexNo]).

Rule 2: The sigil used to get a variable, including an element out of an array or hash, is not the sigil for the array or hash itself, but the type of data you are trying to get out of it. To get a scalar out of a %hash, use $hash{'item'}, and to get a scalar out of an array use $array[arrayIndex]. Do not refer to the container type when accessing its elements.

print $myHash{"item"}; # prints the word 'value'. Use curly braces {} to access individual elements of a hash.
print $myArray[1]; # prints the second item. Use square brackets [] to access the array by index number.

We do NOT use print @myArray[1] as we are not retrieving an array: the item itself is a scalar variable, and it is the item we are referring to with the sigil, NOT the container. Likewise, in more complex data structures, if the second item in the array is a reference to a HASH, %myArray[1] is not the way to get at that hash. The whole hash is written %{$myArray[1]} (dereferencing is covered further down).

This differs from languages where an element keeps the same prefix or suffix as its container (e.g. string arrays in BASIC). However, you can sometimes use the array sigil in Perl without error: print @myArray[3] takes the fourth item of myArray as a one-element array slice (as you specified the @ sign) and prints it. It looks like you've done it right, and with warnings switched on Perl will only tell you that the scalar value is better written as $myArray[3]. However, treating scalars as arrays will only lead to problems further down the road. As we're about to see, things get very confusing very quickly after a mistake such as this.
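The two rules can be seen side by side in a short sketch. The variable names here are invented for illustration. The point is that the sigil on the left of each access says what you are getting back (one scalar, or a list of several), while the brackets say what kind of container you are reading from.

```perl
use strict;
use warnings;

my @colours = ("red", "green", "blue");
my %ages    = (alice => 30, bob => 25);

my $second = $colours[1];             # one element: scalar sigil, square brackets
my $age    = $ages{alice};            # one value: scalar sigil, curly braces
my @pair   = @colours[0, 2];          # several elements (a slice): array sigil
my @both   = @ages{"alice", "bob"};   # several hash values (a hash slice)
my $count  = scalar @colours;         # the whole array in scalar context gives its length

print "$second $age @pair @both $count\n";   # prints: green 30 red blue 30 25 3
```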

References, and Complex data structures (arrays of arrays of hashes of arrays of hashes etc.)

There is of course official documentation on this sort of thing: see perlreftut, perlref and perldsc (for example, run perldoc perlreftut).

Time for a bit of complexity. What I'm about to cover is a very common area of mistakes, which should be understood from the off.

Perl data structures - all of them - are one dimensional internally. An element can actually hold only three things - a string, a number or a reference. What's actually going on when you have hashes of hashes of arrays of hashes etc. is that a lot of references are stored - references to other arrays and hashes. And this is why it is so important to use the correct sigils for how you want to retrieve the data back. To put an array or hash inside a scalar you store a reference to the underlying structure, which you must then access differently.

That's the technical explanation. Here's a quick table of what you need to know:

\ - The backslash means 'reference to'. Technically, you can't store an array or hash as an element in an array or hash, but you can store a reference to it by preceding the sigil with a backslash.

[1,2,3] - The square brackets create a reference to a new, unnamed (anonymous) array holding the list, as opposed to the list itself.

-> - The arrow operator means follow the reference into the referenced structure.

You cannot have an array of arrays literally as perl does not work like that internally. But you can store references to other arrays and hashes somewhere in a data tree. What you are actually storing is a reference to another array or hash or another data tree, and you can continue to traverse this from where you are, just the syntax changes a little - the arrow operator is required.
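The three symbols in the table can be tried out in a few lines. This is a small sketch with invented variable names, showing that both ways of making a reference are followed with the same arrow.

```perl
use strict;
use warnings;

my @numbers     = (10, 20, 30);
my $by_slash    = \@numbers;       # backslash: reference to an existing array
my $by_brackets = [10, 20, 30];    # square brackets: reference to a new, unnamed array

print $by_slash->[1], "\n";        # the arrow follows the reference: prints 20
print $by_brackets->[2], "\n";     # prints 30

# $by_slash points at @numbers itself, so a change made through the
# reference is visible in the original array.
$by_slash->[0] = 99;
print $numbers[0], "\n";           # prints 99

# An "array of arrays" is really an array of references.
my @table = ($by_slash, $by_brackets);
print $table[1]->[0], "\n";        # prints 10
```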

The correct way to add hashes and arrays to hashes and arrays, and how to retrieve these values

%hash = (
	one => "value of one", two => "value of two"
);
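$hash{'one'}={}; # replace the string in 'one' with a reference to an empty hash before nesting under it
$hash{'one'}{'two'}{'three'}=123; # the inner hashes are created as needed
$hash{'two'}={}; # same again for 'two', which also held a string
$hash{'two'}{'two'}=22;
$hash{'three'}{'two'}=32; # just add keys as you feel - no need to define anything beforehand
$hash{'four'}="four";
$hash{'five'}=\(9,2,3,4,5); # careful: a backslash before a bracketed list makes a reference to each item, and only the last of those (a reference to the number 5) is stored. Use [] as on the next line.
$hash{'six'}=[1,2,3,4,5,6] ; # square brackets automatically make a reference to a new array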

@array=(1,2,3,4,5,6,7);
$hash{'seven'}=\@array; # backslash denotes a reference, this time it's a reference to an array we've previously created.

%newHash = ( item1 => "Item one", item2 => "item two", item3 => "item three" );
$hash{'eight'}=\%newHash; # backslash denotes a reference - this time it's to a hash.

%anotherHash = ("another" => "Another item one", "anothertwo" => "Another item two");
$hash{'eight'}->{'item4'}=\%anotherHash;

@anotherArray = (1,2,3,4,\%anotherHash,6,7); # store a hash by reference as an item in an array
$hash{'eight'}->{'item5'}=\@anotherArray; # put this entire array into a hash key by reference.

We have now built a fairly complex data structure. Here's how to traverse it and get these values out:

print $hash{'six'}->[3]; # print the fourth item of the array stored in $hash{'six'};
print $hash{'seven'}->[5]; # print the sixth item of the array stored in $hash{'seven'};
print $hash{'eight'}->{'item2'}; # print the hash key value for item2. 
print $hash{'eight'}->{'item4'}->{'anothertwo'}; # two references 
print $hash{'eight'}->{'item5'}->[4]->{'another'}; # three references including one array in the middle
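For comparison, here is a cut-down version of the same kind of structure written to run under use strict and use warnings, which is how most real scripts are written. It is a sketch with shortened, invented names rather than a replacement for the listing above. It also uses curly braces on the right-hand side of an assignment, which create a reference to a new unnamed hash in the same way that square brackets do for arrays.

```perl
use strict;
use warnings;

my %another = (another => "Another item one", anothertwo => "Another item two");
my @list    = (1, 2, 3, 4, \%another, 6, 7);

my %hash;
$hash{one}{two}{three} = 123;            # nested hashes appear as needed
$hash{six}   = [1, 2, 3, 4, 5, 6];       # reference to a new anonymous array
$hash{seven} = \@list;                   # reference to an existing array
$hash{eight} = { item1 => "Item one" };  # reference to a new anonymous hash
$hash{eight}->{item4} = \%another;

print $hash{one}->{two}->{three}, "\n";           # prints 123
print $hash{six}->[3], "\n";                      # prints 4
print $hash{seven}->[4]->{another}, "\n";         # prints Another item one
print $hash{eight}->{item4}->{anothertwo}, "\n";  # prints Another item two
print $hash{eight}{item4}{anothertwo}, "\n";      # same: arrows between subscripts are optional
```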

To test what a reference is a reference to - good for traversing unknown data trees

The ref function - examples below:

print ref($hash{'item'});
print ref($hash{'item'}->{'otherItem'}->[3]->[4]->{'yetanother'});
Each call returns the type of thing the item refers to. Return values include HASH, ARRAY, SCALAR and CODE (see the official documentation for more) if the item is a reference. If it is not a reference, ref returns an empty string.
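As an illustration of using ref on a tree whose shape you don't know in advance, the sketch below walks a structure and describes each level. The helper name describe and the sample data are made up for this example, the //= operator needs Perl 5.10 or later, and only HASH and ARRAY references are followed. Anything else is printed as it stands.

```perl
use strict;
use warnings;

# describe() is a hypothetical helper: it walks a structure of unknown shape,
# using ref() at each level to decide how to descend.
sub describe {
    my ($item, $depth) = @_;
    $depth //= 0;
    my $indent = "  " x $depth;
    my $type   = ref($item);
    my $text   = "";

    if ($type eq "HASH") {
        for my $key (sort keys %$item) {
            $text .= $indent . $key . ":\n" . describe($item->{$key}, $depth + 1);
        }
    }
    elsif ($type eq "ARRAY") {
        for my $i (0 .. $#$item) {
            $text .= $indent . "[$i]:\n" . describe($item->[$i], $depth + 1);
        }
    }
    else {
        # an empty string from ref() means a plain string or number
        $text .= $indent . $item . "\n";
    }
    return $text;
}

my %tree = (name => "demo", list => [1, { deep => "x" }]);
print describe(\%tree);
```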

+ a few quick reminders..

The default variables $_ (the default scalar variable) and @_ (the array holding the arguments passed to a subroutine) are populated automatically: $_ is set and read in many places if no variable is specified, and @_ is filled whenever a subroutine is called. Several of the examples below are shown with and without the default variable.
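A short sketch of both default variables at work, using invented names: $_ is filled in by foreach when no loop variable is given, and @_ holds whatever was passed to a subroutine.

```perl
use strict;
use warnings;

my @words = ("apple", "banana", "cherry");

# With no loop variable, foreach puts each element into $_ in turn,
# and length() reads $_ when it is given no argument.
my @lengths;
foreach (@words) {
    push @lengths, length;
}
print "@lengths\n";            # prints: 5 6 6

# @_ holds the arguments passed to a subroutine.
# shout() is a hypothetical helper for this sketch.
sub shout {
    my ($text, $times) = @_;
    return uc($text) x $times;
}
print shout("hi", 2), "\n";    # prints: HIHI
```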

Control structures

Loop through array

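foreach $item (@array){
	print $item;
}
foreach (@array){ # no loop variable given, so each item lands in $_
	print; # print with no argument prints $_
}
foreach $index (0..$#array){ # loop by index instead; $#array is the last index
	print $array[$index];
}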

Loop through hash keys

foreach $key (keys %hash){
	print $hash{$key};
}
foreach (keys %hash){
	print $hash{$_};
}

Read a text file

NB: < before the filename means open in read mode. > and >> are for writing and appending respectively.

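open (FILE, "<", "myfile.txt") || die ("Read error: $!"); # the brackets must close around open's arguments for || die to work
while (<FILE>){
	chomp; # remove newline
	print;
}
close (FILE);
open (my $fh, "<", "file.txt") || die ("Read error: $!"); # same thing with a filehandle held in a variable
while (<$fh>){
	print;
}
close ($fh);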
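The sketch below shows the same read loop as a self-contained script that can be run anywhere. It writes a small scratch file first so that there is something to read. The filename is invented for the example and the file is deleted at the end. It uses the lower-precedence or in place of ||, which means die runs whenever open fails, with or without brackets.

```perl
use strict;
use warnings;

my $filename = "example_$$.txt";    # hypothetical scratch file; $$ is the process ID

# Write a small file first so the sketch is self-contained.
open(my $out, ">", $filename) or die "Write error: $!";
print $out "first line\nsecond line\n";
close($out);

# Mode and filename are passed as separate arguments to open.
open(my $in, "<", $filename) or die "Read error: $!";
my @lines;
while (my $line = <$in>) {
    chomp $line;                    # remove the trailing newline
    push @lines, $line;
}
close($in);
unlink $filename;                   # tidy up the scratch file

print scalar(@lines), " lines read\n";    # prints: 2 lines read
```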

Common mistakes:

@array=(0,1,2,3,4,(5,6,7)); # no, we haven't made item 6 an array going 5,6,7, $array[5] is simply 5. You've made one big array.
@secondArray=(8,9,10);
push (@array,@secondArray); # We've just made an even bigger array
print @array; # prints 012345678910

OK, an explanation - (0,1,2,3,4.. etc) is NOT an array. It's a LIST. We've assigned that list to an array using the @ sigil. We could, actually, have assigned that list to something using the $ sigil too; we'll come to that. For now we need to understand that a LIST is just a temporary sequence of values that exists in our code. It is not a variable, it cannot be changed, and it has no structure of its own. And we can assign a list to ANYTHING.

The behaviour of lists needs to be understood. Lists flatten: if you put a list, or several lists or arrays, inside another list, you end up with one long list, and that is what goes into the variable.

Our 'push' function made one big array and did not make an array of arrays by pushing one array onto the end of the other as a single element.

print $array; # prints nothing, without error. $array is a separate scalar variable that has never been set, and has nothing to do with @array.
print $array[2]; # prints the scalar value 2
print @array[2]; # prints the scalar value 2 even though you've specified it as an array.
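The flattening behaviour, and the difference between a list and an array, can be checked with a few lines. This is a sketch using invented names, run under strict and warnings.

```perl
use strict;
use warnings;

my @first  = (0, 1, 2, (3, 4));     # the inner brackets vanish: five elements
my @second = (5, 6);
push @first, @second;               # @second is flattened onto the end

my $count = @first;                 # an ARRAY in scalar context gives its length
my $last  = (7, 8, 9)[-1];          # a LIST can be subscripted, but it is not a variable
my ($a1, $b1) = @first;             # list assignment copies element by element

print "$count $last $a1 $b1\n";     # prints: 7 9 0 1
```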

Arrays of Arrays

When creating an array inside an array we use square brackets for the inner array. The square brackets mean 'a reference to another array'. Thus we are adding a reference - which *is* a single value - as a single element in the array. This reference refers to the actual array.

@array=(0,1,2,3,4,[5,6,7]); # now, we HAVE made item 6 an array going 5,6,7: $array[5] is a reference to an array.
print $array[5]; # prints the reference - something like ARRAY(0x8ff2c28) - because item 5 now holds a reference to an array and we've requested it as a scalar.
print $array[5][1]; # prints 6. The second item in the inner array (at position 1 - don't forget we index from 0).
print $array; # prints nothing without error (again) - $array is a separate, unset scalar, not the array.
if ($array){ print "boolean true";} else {print "boolean false";} # just to test: $array didn't print, and as an unset scalar it is a boolean false.
print @array; # prints something like 01234ARRAY(0x8f2ac28) - the first values and the last value which is a reference to an array
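A typical use of an array of arrays is a grid of rows and columns. The sketch below uses invented data. It builds one, adds a row with push, and then walks every cell, using @{ } around each row reference to get at its elements.

```perl
use strict;
use warnings;

my @grid = (
    [1, 2, 3],
    [4, 5, 6],
);

push @grid, [7, 8, 9];              # add a third row as a single element

print $grid[1][2], "\n";            # row 1, column 2: prints 6
print scalar(@grid), "\n";          # three rows: prints 3

# Walk every row, dereferencing each reference with @{ } to get its elements.
my $total = 0;
for my $row (@grid) {
    for my $cell (@{$row}) {
        $total += $cell;
    }
}
print "$total\n";                   # prints 45
```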

A common mistake - arrays of arrays using square brackets [] all the way through.

@array=[0,1,2,3,4,[5,6,7]];

OK, what we've actually done is create an array with only one item, as signified by the outer []. That one item is a reference to an array, and item 5 of that inner array is in turn a reference to an array of 3 items. But $array[5] won't work as the outer brackets put the whole thing into an inner array - we need to use $array[0][5] to get to here. It's as if the whole thing was surrounded by () brackets as well - Perl treats the single reference as a one-item list because we assigned it to something with the @ sign. Some examples may clear this up:

print "\$array[0]" . $array[0] . "\n"; # prints $array[0]ARRAY(0x8f2cac28) or similar - because the first item of the array IS a reference to an array, as we used []
print "\$array[1]" . $array[1] . "\n"; # prints just $array[1] - our array only has one item and it is itself the whole array we just set up.
print $array[0][5][2]; # prints 7 - yes we really are 3 levels deep.
print @array[0][5]; # finally, an error message - we have broken it. This is a syntax error, so the script will not even compile - as if that's what we were trying to do.

The point is that things can get quite confusing down here if you're not careful.
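One way to keep this straight is to compare the mistaken form with the two forms that do what was intended. The names below are invented for the sketch. Note how the accidental extra level shows up as an extra [0] in the subscripts.

```perl
use strict;
use warnings;

my @wrong = [0, 1, 2, [5, 6, 7]];   # one element: a reference to the whole thing
my @right = (0, 1, 2, [5, 6, 7]);   # four elements, the last one a reference
my $ref   = [0, 1, 2, [5, 6, 7]];   # a scalar holding a reference is also fine

print scalar(@wrong), "\n";         # prints 1
print scalar(@right), "\n";         # prints 4
print $wrong[0][3][1], "\n";        # prints 6, one level deeper than expected
print $right[3][1], "\n";           # prints 6
print $ref->[3][1], "\n";           # prints 6
```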

I'll cover the same sort of thing with hashes, and then explain how to get the values out.

More about Hashes, and Hashes of hashes

%myHash = (
"item1" => "value 1",
"item2" => "value 2",
"item3" => "value 3",
"item4" => "value 4",
);

print %myHash;

The first thing to note is that by printing a hash directly, you get a text printout of the whole hash - keys and values. But, probably in the wrong order. These are not like PHP associative arrays, a hash has no internal order - it isn't an array. A loop won't help either:

foreach $key (keys %myHash){
    print $key . " - " . $myHash{$key} . "\n";
}

There is no order that can be guaranteed. So first up, if you want ordering, use arrays, or sort the keys as you loop through them. Hashes are FAST to work with - very fast - and giving up ordering is part of the price. Modules that preserve insertion order, such as Tie::IxHash, are available from CPAN, but tied hashes come with a noticeable performance cost.
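When you only need a predictable order at the point of reading, sorting the keys is usually enough. The sketch below uses invented data. It sorts once by key and once by value. $a and $b are the special variables that sort hands to the comparison block.

```perl
use strict;
use warnings;

my %stock = (pears => 12, apples => 7, plums => 30);

# Sort by key for a stable alphabetical order.
my @by_name = sort keys %stock;
print "@by_name\n";                 # prints: apples pears plums

# Sort by value, largest first, using a comparison block.
my @by_count = sort { $stock{$b} <=> $stock{$a} } keys %stock;
print "@by_count\n";                # prints: plums pears apples

for my $name (@by_name) {
    print "$name - $stock{$name}\n";
}
```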

Creating a hash of hashes is easiest with items put in singly:

$hash{'one'}{'two'}{'three'}=123;
$hash{'two'}{'three'}=23;
$hash{'three'}="three";
$hash{'four'}=(9,2,3,4,5); # incorrect - a list cannot be stored in a single hash value; only the last item, 5, ends up in the hash
$hash{'five'}=\(9,2,3,4,5); # also incorrect - a backslash before a bracketed list makes a reference to each item, and only the last of those (a reference to the number 5) is stored
$hash{'six'}=[1,2,3,4]; # correct - we've used square brackets to create a reference to a new array, and have now stored the reference.

or you can take an existing structure and add a reference:

@array=(1,2,3,4,5);
$hash{'seven'}=\@array; # correct - we've referenced an array.

Here we've taken an existing array, and added the reference to the array as the value of the hash key - this is the correct way to do it.

Dereferencing, or getting the values out of referenced hashes and arrays

Now we've got a sufficiently complicated data structure, we need to read it. When you're working your way down a data tree and come across a reference to an array instead of an actual array, the syntax changes from accessing an array index or hash key directly to accessing it through the reference using the arrow operator (->).

$arrayRef->[arrayIndex];
$hashRef->{'key'};

The code above points to the item at arrayIndex in the array that we've referenced, or to the value of the key in the hash via its reference.

Here's our nice complicated data structure again:

$hash{'one'}{'two'}{'three'}=123;
$hash{'two'}{'two'}=22;
$hash{'three'}{'two'}=32;
$hash{'four'}="four";
$hash{'five'}=\(9,2,3,4,5); # careful: a backslash before a bracketed list makes a reference to each item, and only the last of those (a reference to the number 5) is stored.
$hash{'six'}=[1,2,3,4,5,6] ; # square brackets automatically make a reference to a new array

@array=(1,2,3,4,5,6,7);
$hash{'seven'}=\@array; # backslash denotes a reference, this time it's a reference to an array we've previously created.

%newHash = ( item1 => "Item one", item2 => "item two", item3 => "item three" );
$hash{'eight'}=\%newHash; # backslash denotes a reference.

The first four values are straightforward, and we can access these simply using $hash{'two'}{'two'}. But what about the references in keys six, seven and eight?

print $hash{'six'}->[3]; # print the fourth item of the array stored in $hash{'six'};
print $hash{'seven'}->[5]; # print the sixth item of the array stored in $hash{'seven'};
print $hash{'eight'}->{'item2'}; # print the hash key value for item2. 

Now let's mix it up a bit:

%anotherHash = ("another" => "Another item one", "anothertwo" => "Another item two");
$hash{'eight'}->{'item4'}=\%anotherHash; # assign anotherHash to a hash key itself in a hash
@anotherArray = (1,2,3,4,\%anotherHash,6,7); # create a new array and assign anotherHash to part of it as well
$hash{'eight'}->{'item5'}=\@anotherArray; # add the new array to a hash key within the original hash

And get these values out..

print $hash{'eight'}->{'item4'}->{'anothertwo'}; # two references 
print $hash{'eight'}->{'item5'}->[4]->{'another'}; # three references including one array in the middle

Dereferencing using multiple sigils

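$$hashRef{'key'}; # same as $hashRef->{'key'}
$$arrayRef[1]; # same as $arrayRef->[1]
@$arrayRef; # the whole array behind the reference
%$hashRef; # the whole hash behind the reference

The code above will also dereference: put the sigil for what you want to get back in front of the reference.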
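As a hedged sketch of how the sigil forms line up with the arrow form (variable names invented for illustration): a single element takes a $ in front of the reference, and the whole array or hash takes @ or %. Braces around the reference are optional for a simple variable, but they make the intent clearer.

```perl
use strict;
use warnings;

my $array_ref = [10, 20, 30];
my $hash_ref  = { one => 1, two => 2 };

# Equivalent ways to reach one element through a reference.
print $array_ref->[1], "\n";        # arrow form: prints 20
print $$array_ref[1], "\n";         # extra sigil form: prints 20
print ${$array_ref}[1], "\n";       # same with explicit braces: prints 20
print $$hash_ref{two}, "\n";        # hash version: prints 2

# To get the whole array or hash back, put its own sigil in front.
my @copy = @$array_ref;
my @keys = sort keys %$hash_ref;
print scalar(@copy), " @keys\n";    # prints: 3 one two

for my $n (@{$array_ref}) {         # loop over a referenced array
    print $n * 2, "\n";             # prints 20, 40 and 60 on separate lines
}
```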


© Matt Platts 2016.