Text Mining Application Programming teaches
software developers how to mine the vast amounts of
information available on the Web, internal networks, and
desktop files and turn it into usable data. The book
helps developers understand the problems associated with
managing unstructured text, and explains how to build
their own mining tools using standard statistical methods
from information theory, artificial intelligence, and
operations research. Each topic is thoroughly explained
and then followed by a practical implementation. The book
begins with a brief overview of
text data, where it can be found, and the typical search
engines and tools used to search and gather this text.
It details how to build tools for extracting and using
the text, and covers the mathematics behind many of the
algorithms used in building these tools. From there
you'll learn how to build tokens from text, construct
indexes, and detect patterns in text. You'll also find
methods to extract the names of people, places, and
organizations from an email, a news article, or a Web
page. The next portion of the book teaches you how to
find information on the Web, describes the structure of
the Web, and shows how to build spiders to crawl it. Text
categorization is also described in the context of
managing email. The final part of the book covers
information monitoring, summarization, and a simple
Question & Answer (Q&A) system. The code used in
the book is written in Perl, but knowledge of Perl is
not necessary to run the software. Developers with an
intermediate level of experience with Perl can customize
the software. Although the book is about programming,
methods are explained with English-like pseudocode and
the source code is provided on the CD-ROM. After reading
this book, you'll be ready to tap into the wealth of
information available online in ways you never thought
possible.