Quickly understanding the types of variables ($,@,%), and looping/retrieving from complex data trees.
Many languages use the $ to signify a variable (e.g. PHP). Some use no marker at all, just a bare word (JavaScript).
Perl uses the $ to signify a scalar variable, a @ to represent an array and a % to represent a hash (similar to an associative array). These signs are known as 'sigils'.
Setting variables
$myVar = "A scalar Variable";
$myVar = 4; # quotes not required for numbers
@myArray = ("first item", "second item");
@myArray = (1,2,3,4,5); # same thing, illustrating that you don't need to quote numbers.
@myArray = qw(one two three four five); # the very useful qw function means 'quote words' - works for single words only.
%myHash = (item => "value", item2 => "value2");
Getting variables
Rule 1: To access a value of a hash key, use curly braces after the variable name ($myHash{item}).
To access an index of an array, use square brackets after the variable name ($myArray[indexNo]).
Rule 2: The sigil used to get a variable, including an element out of an array or hash, is not the sigil for the array or hash itself, but the type of data you are trying to get out of it. To get a scalar out of a %hash, use $hash{'item'}, and to get a scalar out of an array use $array[arrayIndex]. Do not refer to the container type when accessing its elements.
print $myHash{"item"}; # prints the word 'value'. Use curly braces {} to access individual elements of a hash.
print $myArray[1]; # prints the second item. Use square brackets [] to access the array by index number.
We do NOT use print @myArray[1], as we are not retrieving an array: the item itself is a scalar, and it is the item we are referring to with the sigil, NOT the container. In more complex data structures, where the second item in the array is itself a reference to a hash, $myArray[1] is still a scalar (the reference); to get the whole hash back from it, dereference it with %{ $myArray[1] }.
This is different to languages in which one marker is used for both the array and its elements. However, you can sometimes write the array sigil in Perl without error: print @myArray[3] takes a one-element slice of myArray (the fourth item, since indexes start at 0) and prints that list of one item. It looks like you've done it right, and Perl only complains ("Scalar value @myArray[3] better written as $myArray[3]") if warnings are enabled. However, treating scalars as arrays will only lead to problems further down the road. As we're about to see, things get very confusing very quickly after a mistake such as this.
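A minimal sketch of the two rules, assuming strict and warnings are switched on and variables are declared with my (the snippets in this page run without them). The variable names are illustrative only. It also shows the one case where the @ sigil is right for element access: asking for several elements at once.

```perl
use strict;
use warnings;

my @myArray = qw(one two three four five);
my %myHash  = (item => "value", item2 => "value2");

# One element: scalar sigil, whatever the container is.
my $second = $myArray[1];                    # second item of the array
my $value  = $myHash{item};                  # value stored under 'item'

# Several elements at once (a slice): array sigil, because a list comes back.
my @firstTwo = @myArray[0, 1];
my @both     = @myHash{qw(item item2)};

print "$second $value\n";                    # prints: two value
print "@firstTwo | @both\n";                 # prints: one two | value value2
```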
References, and Complex data structures (arrays of arrays of hashes of arrays of hashes etc.)
There is of course official documentation on this sort of thing; the perlreftut, perlref and perldsc manual pages are the usual starting points.
Time for a bit of complexity. What I'm about to cover is a very common area of mistakes, which should be understood from the off.
Perl data structures - all of them - are one dimensional internally. You can actually store three things - a string, a number or a reference. What's actually going on when you have hashes of hashes of arrays of hashes etc. is that a lot of references are stored - references to other arrays and hashes, which in turn hold strings, numbers and more references. And this is why it is so important to use the correct sigils for how you want to retrieve the data back. Storing an array or hash in a scalar means storing a reference to the underlying structure, which you must then access differently.
That's the technical explanation. Here's a quick table of what you need to know:
\ - The backslash means 'reference to'. Technically, you can't store an array or hash as an element in an array or hash, but you can store a reference to it by preceding the sigil with a backslash.
[1,2,3] - The square brackets create a reference to a new (anonymous) array, as opposed to the list being an actual list.
-> - The arrow operator means follow the reference into the referenced structure.
You cannot have an array of arrays literally as perl does not work like that internally. But you can store references to other arrays and hashes somewhere in a data tree. What you are actually storing is a reference to another array or hash or another data tree, and you can continue to traverse this from where you are, just the syntax changes a little - the arrow operator is required.
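The three pieces of syntax in the table can be tried out in a few lines. This is only a sketch: the variable names are made up for illustration, and it assumes strict and warnings are enabled.

```perl
use strict;
use warnings;

my @colours = ("red", "green", "blue");
my %ages    = (alice => 30, bob => 25);

my $arrayRef = \@colours;          # backslash: reference to an existing array
my $hashRef  = \%ages;             # backslash: reference to an existing hash
my $anonRef  = [1, 2, 3];          # square brackets: reference to a new anonymous array

# The arrow follows the reference to the data behind it.
print $arrayRef->[1], "\n";        # prints: green
print $hashRef->{bob}, "\n";       # prints: 25
print $anonRef->[2], "\n";         # prints: 3

# A reference points at the original, so a change made through it shows up there too.
$arrayRef->[0] = "crimson";
print $colours[0], "\n";           # prints: crimson
```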
The correct way to add hashes and arrays to hashes and arrays, and how to retrieve these values
%hash = (
one => "value of one", two => "value of two"
);
$hash{'one'}{'two'}{'three'}=123; # overwrite a hash key which did contain a scalar, with a further hash.
$hash{'two'}{'two'}=22;
$hash{'three'}{'two'}=32; # just add keys as you feel - no need to define anything beforehand
$hash{'four'}="four";
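$hash{'five'}=[9,2,3,4,5]; # square brackets create a reference to a new array, and we store that reference in the hash. (Note that \(9,2,3,4,5) would NOT do this: a backslash in front of a bracketed list makes a reference to each item, and only the last one would be stored.)
$hash{'six'}=[1,2,3,4,5,6] ; # the same again: square brackets automatically make a reference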
@array=(1,2,3,4,5,6,7);
$hash{'seven'}=\@array; # backslash denotes a reference, this time it's a reference to an array we've previously created.
%newHash = ( item1 => "Item one", item2 => "item two", item3 => "item three" );
$hash{'eight'}=\%newHash; # backslash denotes a reference - this time it's to a hash.
%anotherHash = ("another" => "Another item one", "anothertwo" => "Another item two");
$hash{'eight'}->{'item4'}=\%anotherHash;
@anotherArray = (1,2,3,4,\%anotherHash,6,7); # store a hash by reference as an item in an array
$hash{'eight'}->{'item5'}=\@anotherArray; # put this entire array into a hash key by reference.
We have now built a fairly complex data structure. Here's how to traverse it, and get these values out..
print $hash{'six'}->[3]; # print the fourth item of the array stored in $hash{'six'};
print $hash{'seven'}->[5]; # print the sixth item of the array stored in $hash{'seven'};
print $hash{'eight'}->{'item2'}; # print the hash key value for item2.
print $hash{'eight'}->{'item4'}->{'anothertwo'}; # two references
print $hash{'eight'}->{'item5'}->[4]->{'another'}; # three references including one array in the middle
To test what a reference is a reference to - good for traversing unknown data trees
The ref function - examples below:
print ref($hash{'item'});
print ref($hash{'item'}->{'otherItem'}->[3]->[4]->{'yetanother'});
ref returns the type of thing the reference points to. Return values include HASH, SCALAR and ARRAY (see the official documentation for more). If the value is not a reference, it returns an empty string.
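As a rough illustration of using ref on an unknown data tree, the sketch below reports what each top-level value is. The describe subroutine and the %tree data are hypothetical, written only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: say what kind of value we have been given, using ref().
sub describe {
    my ($thing) = @_;
    my $type = ref($thing);
    return "plain scalar"                                 if $type eq "";
    return "array of " . scalar(@$thing) . " items"       if $type eq "ARRAY";
    return "hash with " . scalar(keys %$thing) . " keys"  if $type eq "HASH";
    return "other reference ($type)";
}

my %tree = (
    name  => "root",
    list  => [1, 2, 3],
    child => { a => 1, b => 2 },
);

for my $key (sort keys %tree) {
    print "$key: ", describe($tree{$key}), "\n";
}
# prints:
# child: hash with 2 keys
# list: array of 3 items
# name: plain scalar
```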
+ a few quick reminders..
The default variables $_ (the default scalar, filled in by constructs such as foreach and while (<FILE>) when no variable is specified) and @_ (the array holding the arguments passed to a subroutine) are populated automatically. Many of the examples below are shown with and without the default variable.
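A short sketch of both default variables at work, assuming strict and warnings; the names @words and total are invented for the example.

```perl
use strict;
use warnings;

my @words = qw(apple banana cherry);

# foreach with no loop variable puts each element in $_.
my @lengths;
foreach (@words) {
    push @lengths, length;         # length with no argument works on $_
}
print "@lengths\n";                # prints: 5 6 6

# map also sets $_ for each element in turn.
my @upper = map { uc($_) } @words;
print "@upper\n";                  # prints: APPLE BANANA CHERRY

# @_ holds the arguments passed to the current subroutine.
sub total {
    my $sum = 0;
    $sum += $_ foreach @_;
    return $sum;
}
print total(1, 2, 3), "\n";        # prints: 6
```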
Control structures
Loop through array
foreach $item (@array){
print $item;
}
foreach (@array){
print;
}
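while (<@array>){ # works for simple words only: <@array> is really a file glob, so items containing spaces or wildcard characters will not come back as they went in. Prefer foreach.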
print;
}
Loop through hash keys
foreach $key (keys %hash){
print $hash{$key};
}
foreach (keys %hash){
print $hash{$_};
}
Read a text file
NB: < before the filename means open in read mode. > and >> are for writing and appending respectively.
open (FILE, "<myfile.txt") || die ("Read error: $!");
while (<FILE>){
chomp; # remove newline
print;
}
open (my $fh, "<", "file.txt") || die ("Read error: $!"); # the same with a lexical filehandle, and the mode passed as a separate argument
while (<$fh>){
print;
}
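For something that can be run as it stands, here is a sketch that writes a small temporary file and reads it back into an array. It assumes the core File::Temp module is available; the file contents and variable names are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small temporary file so the example is self-contained.
my ($out, $filename) = tempfile(UNLINK => 1);
print {$out} "first line\nsecond line\n";
close($out) or die "Close error: $!";

# Three-argument open: handle, mode, filename. "or die" reports a failure.
open(my $fh, "<", $filename) or die "Read error: $!";
my @lines;
while (my $line = <$fh>) {
    chomp $line;                   # remove the trailing newline
    push @lines, $line;
}
close($fh);

print scalar(@lines), " lines, last is '$lines[-1]'\n";   # prints: 2 lines, last is 'second line'
```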
Common mistakes:
@array=(0,1,2,3,4,(5,6,7)); # no, we haven't made item 6 an array going 5,6,7, $array[5] is simply 5. You've made one big array.
@secondArray=(8,9,10);
push (@array,@secondArray); # We've just made an even bigger array
print @array; # prints 012345678910
OK, an explanation: (0,1,2,3,4.. etc) is NOT an array. It's a LIST. We've assigned that list to an array using the @ sigil. We could, actually, have assigned it to something using the $ sigil too; we'll come to that. For now we need to understand that a LIST is not a variable: it exists only in our code as a sequence of values, it cannot be modified, and it has no underlying container of its own. What we assign it to decides how it is stored.
The behaviour of lists needs to be understood. If you assign a list, or several lists, to an array, they are flattened: every item ends up in that array as one flat list.
Our 'push' function likewise made one big array; it did not make an array of arrays by pushing one array onto the end of the other as a single element.
print $array; # prints nothing, without error: $array is a separate scalar variable which has never been set, and has nothing to do with @array.
print $array[2]; # prints the scalar value 2
print @array[2]; # prints the scalar value 2 even though you've specified it as an array.
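The flattening described above is easy to confirm by counting elements. A small sketch, assuming strict and warnings, with illustrative variable names; the last two lines show that pushing a reference, by contrast, adds exactly one element.

```perl
use strict;
use warnings;

my @first  = (0, 1, 2);
my @second = (8, 9, 10);

# Lists flatten: nested parentheses and arrays inside a list just contribute their items.
my @flat = (@first, (3, 4), @second);
print scalar(@flat), "\n";         # prints: 8
print "@flat\n";                   # prints: 0 1 2 3 4 8 9 10

# Pushing a reference instead adds exactly one element.
push @first, \@second;
print scalar(@first), "\n";        # prints: 4
print $first[3][1], "\n";          # prints: 9
```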
Arrays of Arrays
When creating an array inside an array we use square brackets for the inner array. The square brackets mean 'a reference to another array'. Thus we are adding a reference - which *is* a single value - as a single element in the array. This reference refers to the actual array.
@array=(0,1,2,3,4,[5,6,7]); # now, we HAVE made item 6 an array going 5,6,7: $array[5] is a reference to that array.
print $array[5]; # prints the reference - something like ARRAY(0x8ff2c28) - because item 5 is now an array reference and we've requested it as a scalar.
print $array[5][1]; # prints 6. The second item in the inner array (at position 1 - don't forget we index from 0).
print $array; # prints nothing without error (again);
if ($array){ print "boolean true";} else {print "boolean false";} # just to test, $array didn't print. It returns a boolean false..
print @array; # prints something like 01234ARRAY(0x8f2ac28) - the first values and the last value which is an array
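Looping over an array of arrays follows from the same idea: each outer element is a reference, and @$row turns it back into the inner array. A sketch with a made-up @grid, assuming strict and warnings:

```perl
use strict;
use warnings;

my @grid = (
    [1, 2, 3],
    [4, 5, 6],
);

# Each element of @grid is an array reference; @$row dereferences it to the whole inner array.
my @rowTotals;
for my $row (@grid) {
    my $total = 0;
    $total += $_ for @$row;
    push @rowTotals, $total;
}
print "@rowTotals\n";                # prints: 6 15
print $grid[1][2], "\n";             # prints: 6 (second row, third item)
print scalar(@{ $grid[0] }), "\n";   # prints: 3, the number of items in the first row
```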
A common mistake - arrays of arrays using square brackets [] all the way through.
@array=[0,1,2,3,4,[5,6,7]];
OK, what we've actually done is create an array with only one item: a reference to the array built by the outer []. Item 5 of that inner array is an array of 3 items. But $array[5] won't work, as the outer brackets put the whole thing into an inner array - we need to use $array[0][5] to get there. It's as if the whole thing was surrounded by () brackets as well: assigning a single reference to @array gives an array of one item. Some examples may clear this up:
print "\$array[0]" . $array[0] . "\n"; # prints something like ARRAY(0x8f2cac28) after the label - because the first item of the array IS an array reference, as we used []
print "\$array[1]" . $array[1] . "\n"; # prints nothing after the label - our array only has one item and it is itself the whole array we just set up.
print $array[0][5][2]; # prints 7 - yes we really are 3 levels deep.
print @array[0][5]; # finally, an error message - we have broken it. This cannot be printed, perl is confused at last - as if that's what we were trying to do.
The point is that things can get quite confusing down here if you're not careful.
I'll cover the same sort of thing with hashes, and then explain how to get the values out.
More about Hashes, and Hashes of hashes
%myHash = (
"item1" => "value 1",
"item2" => "value 2",
"item3" => "value 3",
"item4" => "value 4",
);
print %myHash;
The first thing to note is that by printing a hash directly, you get a text printout of the whole hash - keys and values run together. But probably not in the order you wrote them. These are not like PHP associative arrays: a hash has no internal order - it isn't an array. A plain loop over the keys won't help either:
foreach $key (keys %myHash){
print $key . " - " . $myHash{$key} . "\n";
}
There is no order that can be guaranteed. So first up, if you want ordering, use arrays. Hashes are FAST to work with - very fast - and giving up ordering is part of that trade-off. Modules such as Tie::IxHash are available from CPAN to preserve insertion order, but they come with a performance cost.
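If all that is needed is a predictable order when reading the hash, rather than the order of insertion, one common approach is to sort the keys as you loop. A minimal sketch, assuming strict and warnings; @report is just a name chosen here to collect the output lines.

```perl
use strict;
use warnings;

my %myHash = (
    "item3" => "value 3",
    "item1" => "value 1",
    "item2" => "value 2",
);

# sort returns the keys in string order, whatever order the hash holds them in.
my @report;
foreach my $key (sort keys %myHash) {
    push @report, "$key - $myHash{$key}";
}
print "$_\n" for @report;
# prints:
# item1 - value 1
# item2 - value 2
# item3 - value 3
```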
A hash of hashes can be created by putting items in singly:
$hash{'one'}{'two'}{'three'}=123;
$hash{'two'}{'three'}=23;
$hash{'three'}="three";
$hash{'four'}=(9,2,3,4,5); # incorrect - we can't add an array as a hash key value directly; only the last item, 5, is stored.
$hash{'five'}=\(9,2,3,4,5); # also incorrect - a backslash in front of a bracketed list makes a reference to each item, and only the last one (a reference to the scalar 5) is stored.
$hash{'six'}=[1,2,3,4]; # correct - we've used square brackets to create a reference to an array, and have now stored the reference.
or you can take an existing structure and add a reference:
@array=(1,2,3,4,5);
$hash{'seven'}=\@array; # correct - we've referenced an array.
Here we've taken an existing array, and added the reference to the array as the value of the hash key - this is the correct way to do it.
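A nested structure can also be written out in one go, using curly braces for anonymous hashes and square brackets for anonymous arrays. The %config data below is invented purely for illustration, and the sketch assumes strict and warnings.

```perl
use strict;
use warnings;

# Curly braces build an anonymous hash reference, square brackets an anonymous array reference.
my %config = (
    server => {
        host  => "localhost",
        ports => [80, 443],
    },
    name => "example",
);

print $config{server}{host}, "\n";        # prints: localhost
print $config{server}{ports}[1], "\n";    # prints: 443
print ref($config{server}), "\n";         # prints: HASH
```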
Dereferencing, or getting the values out of referenced hashes and arrays
Now we've got a sufficiently complicated data structure, we need to read it. When you're up in the top of a data tree and have come across a reference to an array instead of an actual array, the syntax changes from accessing an array index or hash key directly to accessing it through the reference using the arrow operator (->).
$arrayRef->[arrayIndex];
$hashRef->{'key'};
The code above points to the item at arrayIndex in the array that we've referenced, or to the value of a key of the hash via its reference.
Here's our nice complicated data structure again:
$hash{'one'}{'two'}{'three'}=123;
$hash{'two'}{'two'}=22;
$hash{'three'}{'two'}=32;
$hash{'four'}="four";
$hash{'five'}=[9,2,3,4,5]; # square brackets create a reference to a new array, and we store that reference in the hash.
$hash{'six'}=[1,2,3,4,5,6] ; # the same again: square brackets automatically make a reference
@array=(1,2,3,4,5,6,7);
$hash{'seven'}=\@array; # backslash denotes a reference, this time it's a reference to an array we've previously created.
%newHash = ( item1 => "Item one", item2 => "item two", item3 => "item three" );
$hash{'eight'}=\%newHash; # backslash denotes a reference.
The first four values are straightforward, and we can access these simply using $hash{'two'}{'two'}. But what about keys five, six and seven?
print $hash{'six'}->[3]; # print the fourth item of the array stored in $hash{'six'};
print $hash{'seven'}->[5]; # print the sixth item of the array stored in $hash{'seven'};
print $hash{'eight'}->{'item2'}; # print the hash key value for item2.
now let's mix it up a bit
%anotherHash = ("another" => "Another item one", "anothertwo" => "Another item two");
$hash{'eight'}->{'item4'}=\%anotherHash; # assign anotherHash to a hash key itself in a hash
@anotherArray = (1,2,3,4,\%anotherHash,6,7); # create a new array and assign anotherHash to part of it as well
$hash{'eight'}->{'item5'}=\@anotherArray; # add the new array to a hash key within the original hash
And get these values out..
print $hash{'eight'}->{'item4'}->{'anothertwo'}; # two references
print $hash{'eight'}->{'item5'}->[4]->{'another'}; # three references including one array in the middle
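Beyond picking out single values, the same structure can be looped over by dereferencing a whole inner hash or array. The sketch below rebuilds just the 'eight' branch in one literal so that it runs on its own; it assumes strict and warnings, and @innerKeys and @seen are names chosen for the example.

```perl
use strict;
use warnings;

my %anotherHash = (another => "Another item one", anothertwo => "Another item two");
my %hash = (
    eight => {
        item2 => "item two",
        item4 => \%anotherHash,
        item5 => [1, 2, 3, 4, \%anotherHash, 6, 7],
    },
);

# Loop over a referenced hash: %{ ... } turns the reference back into a hash.
my @innerKeys = sort keys %{ $hash{eight}{item4} };
print "@innerKeys\n";              # prints: another anothertwo

# Loop over a referenced array, using ref() to spot the nested hash.
my @seen;
for my $item (@{ $hash{eight}{item5} }) {
    push @seen, ref($item) eq "HASH" ? $item->{another} : $item;
}
print join(", ", @seen), "\n";     # prints: 1, 2, 3, 4, Another item one, 6, 7
```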
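Dereferencing using multiple sigils
$$hashRef{'key'};
$$arrayRef[1];
The code above will also dereference: an extra $ in front of the reference does the same job as the arrow, so $$hashRef{'key'} is the same as $hashRef->{'key'} and $$arrayRef[1] is the same as $arrayRef->[1]. Do not combine the two styles: $$hashRef->{'key'} treats $hashRef as a reference to a reference, and @$arrayRef->[1] is an error in current versions of Perl. To get the whole structure back, use @$arrayRef or %$hashRef.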
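A quick sketch comparing the arrow form with the sigil-prefix forms of dereferencing. It assumes strict and warnings; $arrayRef and $hashRef are set up here only for the example.

```perl
use strict;
use warnings;

my $arrayRef = [10, 20, 30];
my $hashRef  = { key => "value" };

# Three ways to reach one array element; all are equivalent.
my $a1 = $arrayRef->[1];
my $a2 = $$arrayRef[1];
my $a3 = ${$arrayRef}[1];
print "$a1 $a2 $a3\n";             # prints: 20 20 20

# Two ways to reach one hash value.
my $h1 = $hashRef->{key};
my $h2 = $$hashRef{key};
print "$h1 $h2\n";                 # prints: value value

# Whole-structure dereference: @ or % in front of the reference.
my @copy = @$arrayRef;
my @keys = keys %$hashRef;
print scalar(@copy), " @keys\n";   # prints: 3 key
```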