study
study - optimize input data for repeated searches
study SCALAR
study
Takes extra time to study SCALAR (
$_
if unspecified) in anticipation of
doing many pattern matches on the string before it is next modified.
This may or may not save time, depending on the nature and number of
patterns you are searching on, and on the distribution of character
frequencies in the string to be searched--you probably want to compare
runtimes with and without it to see which runs faster. Those loops
which scan for many short constant strings (including the constant
parts of more complex patterns) will benefit most. You may have only
one study active at a time--if you study a different scalar the first
is ``unstudied''. (The way study works is this: a linked list of every
character in the string to be searched is made, so we know, for
example, where all the 'k' characters are. From each search string,
the rarest character is selected, based on some static frequency tables
constructed from some C programs and English text. Only those places
that contain this ``rarest'' character are examined.)
For example, here is a loop which inserts index producing entries
before any line containing a certain pattern:
while (<>) {
study;
print ".IX foo\n" if /\bfoo\b/;
print ".IX bar\n" if /\bbar\b/;
print ".IX blurfl\n" if /\bblurfl\b/;
...
print;
}
In searching for /\bfoo\b/, only those locations in
$_
that contain ``f''
will be looked at, because ``f'' is rarer than ``o''. In general, this is
a big win except in pathological cases. The only question is whether
it saves you more time than it took to build the linked list in the
first place.
Note that if you have to look for strings that you don't know till
runtime, you can build an entire loop as a string and eval that to
avoid recompiling all your patterns all the time. Together with
undefining $/ to input entire files as one record, this can be very
fast, often faster than specialized programs like fgrep(1). The following
scans a list of files (@files) for a list of words (@words), and prints
out the names of those files that contain a match:
$search = 'while (<>) { study;';
foreach $word (@words) {
$search .= "++\$seen{\$ARGV} if /\\b$word\\b/;\n";
}
$search .= "}";
@ARGV = @files;
undef $/;
eval $search; # this screams
$/ = "\n"; # put back to normal input delim
foreach $file (sort keys(%seen)) {
print $file, "\n";
}