The LWP modules provide the core functionality for web programming in Perl. They contain the foundations for networking applications, protocol implementations, media type definitions, and debugging facilities.
The modules LWP::Simple and LWP::UserAgent define client applications that implement network connections, send requests, and receive response data from servers. LWP::RobotUA is another client application that is used to build automated web searchers following a specified set of guidelines.
LWP::UserAgent is the primary module used in applications built with LWP. With it, you can build your own robust web client. It is also the foundation for the Simple and RobotUA modules: LWP::RobotUA is a subclass of it, and LWP::Simple is built on top of it. These two modules provide a specialized set of functions for creating clients.
Additional LWP modules provide the building blocks required for web communications, but you often don't need to use them directly in your applications. LWP::Protocol implements the actual socket connections with the appropriate protocol. The most common protocol is HTTP, but mail protocols (like SMTP), FTP for file transfers, and others can be used across networks.
The following sections describe the RobotUA, Simple, and UserAgent modules of LWP.
The Robot User Agent (LWP::RobotUA) is a subclass of LWP::UserAgent, and is used to create robot client applications. A robot application requests resources in an automated fashion. Robots perform such activities as searching, mirroring, and surveying. Some robots collect statistics, while others wander the Web and summarize their findings for a search engine.
The LWP::RobotUA module defines methods to help program robot applications and observes the Robot Exclusion Standard, which lets web server administrators keep robots away from certain (or all) areas of their sites.
The constructor for an LWP::RobotUA object looks like this:

    $rob = LWP::RobotUA->new(agent_name, email, [$rules]);

The first parameter, agent_name, is the user agent identifier that is used for the value of the User-Agent header in the request. The second parameter is the email address of the person using the robot, and the optional third parameter is a reference to a WWW::RobotRules object, which is used to store the robot rules for a server. If you omit the third parameter, the LWP::RobotUA module requests the robots.txt file from every server it contacts, and then generates its own WWW::RobotRules object.
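As a rough illustration of the constructor and of the rules object it accepts, the following sketch builds a WWW::RobotRules object, feeds it a robots.txt body by hand, and passes it to LWP::RobotUA->new. The agent name, email address, host, and robots.txt content are all made up for the example, and parsing the rules from a string (rather than letting the robot fetch them) is only done here so the sketch runs without a network connection.

```perl
use strict;
use warnings;
use LWP::RobotUA;
use WWW::RobotRules;

# Hypothetical identity for the robot (not from the original text).
my $agent = 'ExampleBot/0.1';
my $email = 'webmaster@example.com';

# A made-up robots.txt body, parsed by hand instead of fetched.
my $robots_txt = <<'END';
User-agent: *
Disallow: /private/
END

my $rules = WWW::RobotRules->new($agent);
$rules->parse('http://www.example.com/robots.txt', $robots_txt);

# Ask the rules object which URLs the robot may visit.
for my $url ('http://www.example.com/index.html',
             'http://www.example.com/private/data.html') {
    printf "%s: %s\n", $url, $rules->allowed($url) ? 'allowed' : 'disallowed';
}

# Pass the prepared rules as the optional third parameter.
my $rob = LWP::RobotUA->new($agent, $email, $rules);
print "User-Agent: ", $rob->agent, "\n";
print "From: ", $rob->from, "\n";
```

In ordinary use you would likely omit the third parameter and let the robot retrieve robots.txt from each server itself, as described above.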
Since LWP::RobotUA is a subclass of LWP::UserAgent, the LWP::UserAgent methods are used to perform the basic client activities. The following methods are defined by LWP::RobotUA for robot-related functionality:
LWP::Simple provides an easy-to-use interface for creating a web client, although it is only capable of performing basic retrieving functions. An object constructor is not used for this class; it defines functions to retrieve information from a specified URL and interpret the status codes from the requests.
This module isn't named Simple for nothing. The following lines show how to use it to get a web page and save it to a file:

    use LWP::Simple;

    $homepage = 'oreilly_com.html';
    # Fetch the page, save it to the file, and return the HTTP status code
    $status = getstore('http://www.oreilly.com/', $homepage);
    print("hooray") if is_success($status);

The retrieving functions get and head return the URL's contents and header contents respectively. The other retrieving functions return the HTTP status code of the request. The status codes are returned as the constants from the HTTP::Status module, which is also where the is_success and is_error functions are obtained. See Section 17.3.4, "HTTP::Status" later in this chapter for a listing of the response codes.
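To make the status-code handling a little more concrete, here is a small sketch that classifies two codes with the HTTP::Status helpers. It makes no request; the codes are supplied directly through the RC_OK and RC_NOT_FOUND constants, and the describe function is a hypothetical helper written only for this example.

```perl
use strict;
use warnings;
use HTTP::Status qw(is_success is_error status_message RC_OK RC_NOT_FOUND);

# Hypothetical helper: turn a status code into a short description.
sub describe {
    my ($code) = @_;
    my $kind = is_success($code) ? 'success'
             : is_error($code)   ? 'error'
             :                     'other';
    return sprintf "%d %s (%s)", $code, status_message($code), $kind;
}

# The same tests apply to a code returned by getstore or similar functions.
print describe($_), "\n" for (RC_OK, RC_NOT_FOUND);
```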
The user-agent identifier produced by LWP::Simple is LWP::Simple/n.nn, where n.nn is the version number of LWP being used.
The following list describes the functions exported by LWP::Simple:
An LWP::UserAgent object is created with its constructor:

    $ua = LWP::UserAgent->new;

You give the object a request, which it uses to contact the server, and the information you requested is returned. The most often used method in this module is request, which contacts a server and returns the result of your query. Other methods in this module change the way request behaves. You can change the timeout value, customize the value of the User-Agent header, or use a proxy server.
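The following is a minimal sketch of that flow: create the user agent, adjust its timeout and User-Agent value, and hand it an HTTP::Request. So that it runs without network access, it requests a file: URL pointing at a temporary file created on the spot; that choice, the agent name, and the proxy address in the comment are assumptions made for the example. In real use you would substitute an http:// URL.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use File::Temp qw(tempfile);
use URI::file;

# Create a small local file to stand in for a remote resource.
my ($fh, $path) = tempfile(SUFFIX => '.txt', UNLINK => 1);
print {$fh} "hello from LWP\n";
close $fh;
my $url = URI::file->new_abs($path);    # replace with an http:// URL in real use

my $ua = LWP::UserAgent->new;
$ua->timeout(10);                       # give up after 10 seconds
$ua->agent('ExampleClient/0.1');        # hypothetical User-Agent value
# To route HTTP requests through a proxy (hypothetical address):
# $ua->proxy('http', 'http://proxy.example.com:8080/');

# Build the request and let the user agent carry it out.
my $request  = HTTP::Request->new(GET => $url);
my $response = $ua->request($request);

if ($response->is_success) {
    print $response->content;
}
else {
    print "Request failed: ", $response->status_line, "\n";
}
```

The request method returns a response object in both the success and failure cases, so checking is_success before using the content is a reasonable habit.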
The following methods are supplied by LWP::UserAgent: