Pro Perl Parsing
by Christopher M. Frenz
Apress, 2005
ISBN 1590595041

Reviewed by Peter Scott

Apress continues to take risks in the Perl book market; a relatively scholarly book about parsing using Perl would not generally be recognized as filling a yawning niche, yet my copy was delivered in hardcover. Overall, this book reads more like a Comp. Sci. textbook than most Perl books. ("Higher-Order Perl" ventures into equally complex territory, but Mark Jason Dominus uses such clear and down-to-earth language that you hardly notice.) That emphasis is evidently deliberate and a product of the author's background (he is a bioinformaticist, and the more serious examples are from that field), although whether it's part of an Apress strategy to dip a toe in the textbook market waters is not clear.

After a brief introduction to the concepts of parsing and lexing, Frenz covers the relatively little-used Parse::Lex before moving on to regular expressions. That ordering gives you a hint of the academic mindset in the book, which is cemented when regexes are explained in terms of state machines. Then there is a useful digression on Regexp::Common.

The Comp. Sci. approach really takes hold in the next chapter on grammars, as Frenz explains the Chomsky hierarchy of type 0, 1, 2, and 3 grammars, BNF notation, production trees, and natural language parsing issues. The continued exercise of these grammars on trivial examples starts to grate after a while, and the Perl content is light. The next chapter on parsing is just as academic, starting with bottom-up parsing (if you've wondered what "shift/reduce conflicts" meant when you've seen bison or yacc run, you'll find out here). There is an interesting module-less Perl parser that reads in grammar rules and applies them. This is then followed by an entire chapter on Parse::Yapp, which is surely the most attention that module has ever received. The same trivial examples are exercised in that chapter.

Of course you'd expect Parse::RecDescent to be covered, and there is a chapter on it. This may be the most useful part of the whole book, as Frenz goes into considerable detail and depth, although the examples are still trivial. The chapter on HTML::TreeBuilder shows more practical examples, and the chapter on XML neatly covers DOM and SAX and the rare topic of validation with both DTDs and Schemas. It then segues to RPC::XML and SOAP::Lite, and a concise example of using XML::SAX::ParserFactory, XML::Handler::YAWriter, and XML::Filter::Sort that should be thought-provoking to the XML programmer. The miscellaneous topics are an odd bunch of Text::Balanced, Date::Parse, XML::RSS::Parser, and Math::Expression, and an example of using Parse::RecDescent to parse command-line options, of all things.

Finally there is a brave chapter on data mining, a topic I have previously not associated with Perl, and Frenz opened my eyes to the presence of at least one module (Data::Mining::AssociationRules) on that topic on CPAN. This probably isn't data mining in the sense that most businesses understand it, but that's due to a relative lack of Perl modules for doing it. So Frenz covers Statistics::Descriptive and Statistics::Regression (it's very useful to see how easy trend analysis can be with Perl), and then gives a module-less example of a neural network as conclusive evidence that he knows what he's talking about (complete with the equations for learning by back propagation). But there is no mention of AI::Perceptron or AI::NeuralNet::* or AI::NNFlex (although that last module likely came out after galley time). Unusually for a book this academic, there is no bibliography.

The author hasn't appeared in the Perl community before, and his previous book was "Visual Basic and Visual Basic .NET for Scientists and Engineers." The technical reviewer, Teodor Zlatanov, however, has some modules on CPAN and other presence in the Perl community. Notice that I said *the* technical reviewer. Apress continues to think that it needs only one for a book, and I've noted in other reviews how I believe this has hurt the company.

If you desire a strong theoretical underpinning to your parsing, this book delivers; there's no arguing with the "Pro" part of the title. Since I am in that category, I like this approach; however, I suspect I am in a minority compared with the people who are more focussed on the pragmatic aspects of Perl programming. This book's main competition is Dave Cross's "Data Munging with Perl" (Manning, ISBN 1930110006), and presenting so much theory appears to be the chosen means of distinguishing between them. There is considerable work behind this book, and since its practical examples are in bioinformatics, I suspect it will resonate with that crowd most. Frenz and Apress deserve kudos for pitching a book to promote Perl to an academic market that often looks down its nose at us.

© Pacific Systems Design Technologies
Revised 11/8/2005