perl でハッシュを使って探索を高速化する。

要素数が大量にある配列に対して線形探索を行うと、とても時間がかかることがある。その場合、各要素をハッシュの key に入れておくと、探索の際にかなり高速化できる。以下のコードでは、配列をgrepする場合、配列をforeachで回す場合、(今回取り上げている)ハッシュから検索する場合にかかる時間を計測している。
ちなみに、今回の場合、ハッシュの value は何でも良いので、適当に 0 を入れている。(★の箇所)

use strict;
use warnings;
use Time::HiRes "gettimeofday";

# Searching 10000000 from 1~10000000. It's the worst case for linear search.
my $minNum = 1;
my $maxNum = 10000000;
my $numToBeFound = $maxNum;

# Setting numbers to an array and a hash.
my @numArray;
foreach my $num ($minNum .. $maxNum) {
    push(@numArray, $num);
}
my %numHash;
foreach my $num (@numArray) {
    $numHash{$num} = 0; # ★
}

# Case 1.
# Grep the array.
my $startTime = getCurrentTime();
if (grep {$_ == $numToBeFound} @numArray) {
    print "Exist.\n";
}
my $endTime = getCurrentTime();
my $elapsedTime = $endTime - $startTime;
print "$elapsedTime about grepping the array.\n";

# Case 2.
# Foreach the array.
$startTime = getCurrentTime();
foreach my $num (@numArray) {
    if ($num == $numToBeFound) {
        print "Exist.\n";
        last;
    }
}
$endTime = getCurrentTime();
$elapsedTime = $endTime - $startTime;
print "$elapsedTime about foreach the array.\n";

# Case 3.
# Search the hash.
$startTime = getCurrentTime();
if (exists($numHash{$numToBeFound})) {
    print "Exist.\n";
}
$endTime = getCurrentTime();
$elapsedTime = $endTime - $startTime;
print "$elapsedTime about searching the hash.\n";


sub getCurrentTime {
    my ($epochSec, $microSec) = gettimeofday();
    return $epochSec + $microSec/1000000;
}

これを実行すると、以下のようになる。

0.305840015411377 about grepping the array.
0.317622900009155 about foreach the array.
0.000194072723388672 about searching the hash.

配列に対して grep した場合と foreach で回した場合にかかる時間はだいたい同じ(約0.3秒)。一方で、ハッシュに対して探索を行った時は圧倒的にかかる時間が少ない(約0.0002秒)。たぶん、ハッシュを使った場合、perl が内部で効果的な探索を行ってくれるのだろう。