
11.2. Web Server (HTTP)

The Falcot Corp administrators decided to use the Apache HTTP server, included in Debian Wheezy at version 2.2.22.

11.2.1. Installing Apache

By default, installing the apache2 package causes the apache2-mpm-worker version of Apache to be installed too. The apache2 package is an empty shell, and it only serves to ensure that one of the Apache versions is actually installed.
The differences between the variants of Apache 2 are concentrated in the policy used to handle parallel processing of many requests; this policy is implemented by an MPM (short for Multi-Processing Module). Among the available MPMs, apache2-mpm-worker uses threads (lightweight processes), whereas apache2-mpm-prefork uses a pool of processes created in advance (the traditional way, and the only one available in Apache 1.3). apache2-mpm-event also uses threads, but a thread is released earlier, as soon as the incoming connection is only kept open by the HTTP keep-alive feature.
The Falcot administrators also install libapache2-mod-php5 to add PHP support to Apache. This causes apache2-mpm-worker to be removed, and apache2-mpm-prefork to be installed instead, since this PHP module only works with that particular MPM.
Apache is a modular server, and many features are implemented by external modules that the main program loads during its initialization. The default configuration only enables the most common modules, but enabling new modules is a simple matter of running a2enmod module; to disable a module, the command is a2dismod module. These programs actually only create (or delete) symbolic links in /etc/apache2/mods-enabled/, pointing at the actual files (stored in /etc/apache2/mods-available/).
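To make the mechanism concrete, the following Python sketch models what a2enmod and a2dismod do. It assumes that each module is described by a module.load file and an optional module.conf file in mods-available/, and that the links are relative; the enable_module and disable_module functions are hypothetical, and the real scripts perform additional checks that are not modelled here.

```python
import os
import tempfile
from typing import List

MODULE_SUFFIXES = (".load", ".conf")  # assumed per-module file layout


def enable_module(apache_dir: str, module: str) -> List[str]:
    """Hypothetical model of a2enmod: link a module's files into mods-enabled/."""
    available = os.path.join(apache_dir, "mods-available")
    enabled = os.path.join(apache_dir, "mods-enabled")
    created = []
    for suffix in MODULE_SUFFIXES:
        name = module + suffix
        if not os.path.exists(os.path.join(available, name)):
            continue  # e.g. a module that ships only a .load file
        link = os.path.join(enabled, name)
        if not os.path.lexists(link):
            # Relative target: ../mods-available/<name>
            os.symlink(os.path.join("..", "mods-available", name), link)
            created.append(name)
    return created


def disable_module(apache_dir: str, module: str) -> None:
    """Hypothetical model of a2dismod: delete the links, keep the real files."""
    enabled = os.path.join(apache_dir, "mods-enabled")
    for suffix in MODULE_SUFFIXES:
        link = os.path.join(enabled, module + suffix)
        if os.path.islink(link):
            os.unlink(link)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for sub in ("mods-available", "mods-enabled"):
            os.makedirs(os.path.join(root, sub))
        for name in ("rewrite.load", "ssl.load", "ssl.conf"):
            open(os.path.join(root, "mods-available", name), "w").close()
        print(enable_module(root, "ssl"))  # ['ssl.load', 'ssl.conf']
        disable_module(root, "ssl")
        print(os.listdir(os.path.join(root, "mods-enabled")))  # []
```

Removing a link never touches the file in mods-available/, which is why a module can be disabled and re-enabled at will.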
With its default configuration, the web server listens on port 80 (as configured in /etc/apache2/ports.conf), and serves pages from the /var/www/ directory (as configured in /etc/apache2/sites-enabled/000-default).

11.2.2. Configuring Virtual Hosts

A virtual host is an extra identity for the web server.
Apache considers two different kinds of virtual hosts: those that are based on the IP address (or the port), and those that rely on the domain name of the web server. The first method requires allocating a different IP address (or port) for each site, whereas the second one can work on a single IP address (and port), and the sites are differentiated by the hostname sent by the HTTP client in the Host header (which is only guaranteed with version 1.1 of the HTTP protocol, where this header is mandatory; fortunately, that version is old enough that all clients already use it).
The (increasing) scarcity of IPv4 addresses usually favors the second method; however, it is made more complex if the virtual hosts need to provide HTTPS too, since the SSL protocol hasn't always provided for name-based virtual hosting: the SSL handshake, during which the server presents its certificate, takes place before the client sends its HTTP request and the hostname it contains. The SNI extension (Server Name Indication), which lets the client announce the hostname during the handshake and thus allows such a combination, is not handled by all browsers. When several HTTPS sites need to run on the same server, they will therefore usually be differentiated either by running on a different port or on a different IP address (IPv6 can help there).
The default configuration for Apache 2 enables name-based virtual hosts (with the NameVirtualHost *:80 directive in the /etc/apache2/ports.conf file). In addition, a default virtual host is defined in the /etc/apache2/sites-enabled/000-default file; this virtual host will be used if no host matching the request sent by the client is found.
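The selection rule for name-based virtual hosts can be sketched as follows. This Python model is a simplification under stated assumptions: the Host header is compared case-insensitively with each ServerName and ServerAlias after dropping any :port suffix, and the first virtual host in the list stands for the default one (the role played by 000-default). The VirtualHost class, the select_vhost function and the default.invalid name are hypothetical; Apache's actual matching (wildcards in ServerAlias, per-address lists) is richer.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VirtualHost:
    """Hypothetical stand-in for a <VirtualHost> block."""
    server_name: str
    document_root: str
    aliases: List[str] = field(default_factory=list)

    def answers_to(self, host: str) -> bool:
        return host in {n.lower() for n in [self.server_name, *self.aliases]}


def strip_port(host: str) -> str:
    """Remove an optional ':port' suffix from a Host header value."""
    if host.startswith("["):  # IPv6 literal, e.g. [2001:db8::1]:80
        return host.split("]", 1)[0] + "]"
    return host.split(":", 1)[0]


def select_vhost(vhosts: List[VirtualHost], host_header: Optional[str]) -> VirtualHost:
    """Return the matching vhost; the first one in the list is the default."""
    if host_header:
        host = strip_port(host_header.strip()).lower()
        for vhost in vhosts:
            if vhost.answers_to(host):
                return vhost
    return vhosts[0]  # no Host header or no match: use the default vhost


if __name__ == "__main__":
    vhosts = [
        VirtualHost("default.invalid", "/var/www"),  # placeholder name
        VirtualHost("www.falcot.org", "/srv/www/www.falcot.org", ["falcot.org"]),
    ]
    print(select_vhost(vhosts, "falcot.org:80").document_root)  # /srv/www/www.falcot.org
    print(select_vhost(vhosts, "unknown.example").document_root)  # /var/www
```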
Each extra virtual host is then described by a file stored in /etc/apache2/sites-available/. Setting up a website for the falcot.org domain is therefore a simple matter of creating the following file, then enabling the virtual host with a2ensite www.falcot.org.

Example 11.16. The /etc/apache2/sites-available/www.falcot.org file

<VirtualHost *:80>
  ServerName www.falcot.org
  ServerAlias falcot.org
  DocumentRoot /srv/www/www.falcot.org
</VirtualHost>
The Apache server, as configured so far, uses the same log files for all virtual hosts (although this could be changed by adding CustomLog directives in the definitions of the virtual hosts). It therefore makes good sense to customize the format of this log file to have it include the name of the virtual host. This can be done by creating a /etc/apache2/conf.d/customlog file that defines a new format for all log files (with the LogFormat directive). The CustomLog line must also be removed (or commented out) from the /etc/apache2/sites-available/default file; otherwise, requests handled by the default virtual host would still be logged in the previous format.

Example 11.17. The /etc/apache2/conf.d/customlog file

# New log format including (virtual) host name
LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" vhost

# Now let's use this "vhost" format by default
CustomLog /var/log/apache2/access.log vhost
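Since %v now starts every line, a single shared access.log can still be split per site. The Python sketch below parses lines written in the vhost format above and counts requests per virtual host; the regular expression and the parse_vhost_line and hits_per_vhost names are our own assumptions, not part of Apache, and it assumes that double quotes inside quoted fields are escaped as \", as Apache does.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# Mirrors: %v %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
QUOTED = r'"((?:[^"\\]|\\.)*)"'  # quoted field, allowing \" escapes
VHOST_LINE = re.compile(
    r"(\S+) (\S+) (\S+) (\S+) \[([^\]]+)\] "
    + QUOTED + r" (\d{3}) (\d+|-) " + QUOTED + " " + QUOTED
)
FIELDS = ("vhost", "host", "ident", "user", "time", "request",
          "status", "bytes", "referer", "agent")


def parse_vhost_line(line: str) -> Optional[Dict[str, object]]:
    """Split one log line in the "vhost" format into named fields."""
    match = VHOST_LINE.match(line)
    if match is None:
        return None
    entry: Dict[str, object] = dict(zip(FIELDS, match.groups()))
    entry["status"] = int(match.group(7))
    # %b logs "-" instead of 0 when no body was sent
    entry["bytes"] = 0 if match.group(8) == "-" else int(match.group(8))
    return entry


def hits_per_vhost(lines: Iterable[str]) -> Counter:
    """Count requests per virtual host, skipping lines that do not parse."""
    counts: Counter = Counter()
    for line in lines:
        entry = parse_vhost_line(line)
        if entry is not None:
            counts[entry["vhost"]] += 1
    return counts
```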

11.2.3. Common Directives

This section briefly reviews some of the commonly used Apache configuration directives.
The main configuration file usually includes several Directory blocks; they allow specifying different behaviors for the server depending on the location of the file being served. Such a block commonly includes Options and AllowOverride directives.

Example 11.18. Directory block

<Directory /var/www>
  Options Includes FollowSymLinks
  AllowOverride All
  DirectoryIndex index.php index.html index.htm
</Directory>
The DirectoryIndex directive contains a list of files to try when the client request matches a directory. The first existing file in the list is used and sent as a response.
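The lookup performed by DirectoryIndex can be sketched in a few lines of Python. The resolve_index function is hypothetical and treats every candidate as a plain file in the requested directory, which is a simplification of how Apache actually resolves index entries.

```python
import os
import tempfile
from typing import Optional, Sequence


def resolve_index(directory: str, index_names: Sequence[str]) -> Optional[str]:
    """Return the first DirectoryIndex candidate present in directory, or None."""
    for name in index_names:  # order matters: earlier names win
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None  # no index file: Apache may list the directory (Options Indexes)


if __name__ == "__main__":
    names = ["index.php", "index.html", "index.htm"]  # from Example 11.18
    with tempfile.TemporaryDirectory() as docroot:
        for name in ("index.htm", "index.html"):
            open(os.path.join(docroot, name), "w").close()
        # index.php is missing, so index.html is picked before index.htm
        print(os.path.basename(resolve_index(docroot, names)))  # index.html
```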
The Options directive is followed by a list of options to enable. The None value disables all options; correspondingly, All enables them all except MultiViews. Available options include:
  • ExecCGI indicates that CGI scripts can be executed.
  • FollowSymLinks tells the server that symbolic links can be followed, and that the response should contain the contents of the target of such links.
  • SymLinksIfOwnerMatch also tells the server to follow symbolic links, but only when the link and its target have the same owner.
  • Includes enables Server Side Includes (SSI for short). These are directives embedded in HTML pages and executed on the fly for each request.
  • Indexes tells the server to list the contents of a directory if the HTTP request sent by the client points at a directory without an index file (i.e., when none of the files listed in the DirectoryIndex directive exists in this directory).
  • MultiViews enables content negotiation; this can be used by the server to return a web page matching the preferred language as configured in the browser.
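The difference between FollowSymLinks and SymLinksIfOwnerMatch can be illustrated with a short Python sketch. The may_follow function is hypothetical, and it only examines the final component of the path; the ownership test compares the owner of the link itself (given by lstat) with the owner of its target (given by stat).

```python
import os
import tempfile
from typing import Set


def may_follow(path: str, options: Set[str]) -> bool:
    """Sketch of the symlink check, for the final component of path only."""
    if not os.path.islink(path):
        return True  # not a symbolic link: neither option is involved
    if "FollowSymLinks" in options:
        return True
    if "SymLinksIfOwnerMatch" in options:
        try:
            # lstat() describes the link itself, stat() the file it points to
            return os.lstat(path).st_uid == os.stat(path).st_uid
        except OSError:
            return False  # dangling link: there is no target to compare with
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "page.html")
        link = os.path.join(d, "alias.html")
        open(target, "w").close()
        os.symlink(target, link)
        print(may_follow(link, {"SymLinksIfOwnerMatch"}))  # True: same owner here
        print(may_follow(link, set()))  # False
```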
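The AllowOverride directive lists the categories of directives (such as AuthConfig, Indexes or Options) that can be enabled or disabled by way of a .htaccess file; All allows every category, while None makes the server ignore .htaccess files entirely. A common use of this directive is to restrict ExecCGI, so that the administrator chooses which users are allowed to run programs under the web server's identity (the www-data user).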

11.2.3.1. Requiring Authentication

In some circumstances, access to part of a website needs to be restricted, so only legitimate users who provide a username and a password are granted access to the contents.

Example 11.19. .htaccess file requiring authentication

Require valid-user
AuthName "Private directory"
AuthType Basic
AuthUserFile /etc/apache2/authfiles/htpasswd-private
The /etc/apache2/authfiles/htpasswd-private file contains a list of users and passwords; it is commonly manipulated with the htpasswd command. For example, the following command is used to add a user or change their password (when the file does not exist yet, the -c option must be added to create it):
# htpasswd /etc/apache2/authfiles/htpasswd-private user
New password:
Re-type new password:
Adding password for user user
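With AuthType Basic, the browser sends the username and password with each request in an Authorization header, and that value is only base64-encoded, not encrypted; anyone able to read the traffic can decode it, which is why such areas are best served over HTTPS. The following Python sketch (with hypothetical basic_auth_header and decode_basic_auth helpers) shows both directions.

```python
import base64
from typing import Optional, Tuple


def basic_auth_header(user: str, password: str) -> str:
    """Build the Authorization header value a browser sends for AuthType Basic."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def decode_basic_auth(header: str) -> Optional[Tuple[str, str]]:
    """Recover (user, password) from such a header, as any eavesdropper could."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    # Only the first ":" separates the user name from the password
    user, sep, password = base64.b64decode(token).decode("utf-8").partition(":")
    return (user, password) if sep else None


if __name__ == "__main__":
    header = basic_auth_header("user", "secret")
    print(header)  # Basic dXNlcjpzZWNyZXQ=
    print(decode_basic_auth(header))  # ('user', 'secret')
```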

11.2.3.2. Restricting Access

The Allow from and Deny from directives control access restrictions for a directory (and its subdirectories, recursively).
The Order directive tells the server in which order the Allow from and Deny from directives are applied; the last one that matches takes precedence. In concrete terms, Order deny,allow allows access if no Deny from directive applies, or if an Allow from directive does. Conversely, Order allow,deny rejects access if no Allow from directive matches (or if a Deny from directive applies).
The Allow from and Deny from directives can be followed by an IP address, a network (such as 192.168.0.0/255.255.255.0, 192.168.0.0/24 or even 192.168.0), a hostname or a domain name, or the all keyword, designating everyone.

Example 11.20. Reject by default but allow from the local network

Order deny,allow
Allow from 192.168.0.0/16
Deny from all
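The Order semantics described above can be checked with a small Python model. It is a sketch under assumptions: only IP-based arguments are handled (the all keyword, single addresses, CIDR or netmask networks, and partial addresses such as 192.168.0), hostnames and domain names are left out, and the spec_matches and is_allowed functions are hypothetical.

```python
import ipaddress
from typing import Iterable, Union

Address = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def spec_matches(client: Address, spec: str) -> bool:
    """Match one Allow/Deny argument (hostnames are not supported here)."""
    if spec == "all":
        return True
    if "/" not in spec and ":" not in spec and spec.count(".") < 3:
        # Partial IPv4 address such as 192.168.0 -> 192.168.0.0/24
        octets = spec.split(".")
        spec = ".".join(octets + ["0"] * (4 - len(octets))) + "/%d" % (8 * len(octets))
    # ip_network() also accepts the 192.168.0.0/255.255.255.0 netmask form
    return client in ipaddress.ip_network(spec, strict=False)


def is_allowed(client_ip: str, order: str, allow: Iterable[str], deny: Iterable[str]) -> bool:
    """Evaluate Apache 2.2-style Order/Allow/Deny for one client address."""
    client = ipaddress.ip_address(client_ip)
    allowed = any(spec_matches(client, s) for s in allow)
    denied = any(spec_matches(client, s) for s in deny)
    order = order.replace(" ", "").lower()
    if order == "deny,allow":
        return allowed or not denied  # default allow; Allow overrides Deny
    if order == "allow,deny":
        return allowed and not denied  # default deny; Deny overrides Allow
    raise ValueError("unsupported Order: " + order)


if __name__ == "__main__":
    # Example 11.20: Order deny,allow / Allow from 192.168.0.0/16 / Deny from all
    print(is_allowed("192.168.1.20", "deny,allow", ["192.168.0.0/16"], ["all"]))  # True
    print(is_allowed("10.0.0.5", "deny,allow", ["192.168.0.0/16"], ["all"]))  # False
```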

11.2.4. Log Analyzers

A log analyzer is frequently installed on a web server, since it gives the administrators a precise idea of how the server is used.
The Falcot Corp administrators selected AWStats (Advanced Web Statistics) to analyze their Apache log files.
The first configuration step is the customization of the /etc/awstats/awstats.conf file. The Falcot administrators keep it unchanged apart from the following parameters:
LogFile="/var/log/apache2/access.log"
LogFormat = "%virtualname %host %other %logname %time1 %methodurl %code %bytesd %refererquot %uaquot"
SiteDomain="www.falcot.com"
HostAliases="falcot.com REGEX[^.*\.falcot\.com$]"
DNSLookup=1
LoadPlugin="tooltips"
All these parameters are documented by comments in the template file. In particular, the LogFile and LogFormat parameters describe the location and format of the log file and the information it contains; SiteDomain and HostAliases list the various names under which the main web site is known.
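How SiteDomain and HostAliases combine can be sketched in Python, for instance to check whether a given hostname counts as one of the site's names. The is_site_host function is hypothetical; it assumes that aliases are separated by spaces, that entries of the form REGEX[...] are regular expressions, and that matching ignores case. It illustrates the configuration values, not AWStats' own code.

```python
import re


def is_site_host(host: str, site_domain: str, host_aliases: str) -> bool:
    """Hypothetical check: is host one of the names given by SiteDomain/HostAliases?"""
    host = host.lower()
    if host == site_domain.lower():
        return True
    for alias in host_aliases.split():  # aliases are assumed to be space-separated
        if alias.startswith("REGEX[") and alias.endswith("]"):
            # Strip the REGEX[ ... ] wrapper and treat the rest as a pattern
            if re.search(alias[len("REGEX["):-1], host, re.IGNORECASE):
                return True
        elif host == alias.lower():
            return True
    return False


if __name__ == "__main__":
    aliases = r"falcot.com REGEX[^.*\.falcot\.com$]"  # value from awstats.conf above
    for name in ("falcot.com", "intranet.falcot.com", "www.falcot.org"):
        print(name, is_site_host(name, "www.falcot.com", aliases))
```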
For high-traffic sites, DNSLookup should usually not be set to 1, since a reverse DNS lookup for every client address slows down the analysis considerably; for smaller sites, such as the Falcot one described above, this setting allows getting more readable reports that include full machine names instead of raw IP addresses.
AWStats will also be enabled for other virtual hosts; each virtual host needs its own configuration file, such as /etc/awstats/awstats.www.falcot.org.conf.

Example 11.21. AWStats configuration file for a virtual host

Include "/etc/awstats/awstats.conf"
SiteDomain="www.falcot.org"
HostAliases="falcot.org"
AWStats uses many icons stored in the /usr/share/awstats/icon/ directory. In order for these icons to be available on the web site, the Apache configuration needs to be adapted to include the following directive:
Alias /awstats-icon/ /usr/share/awstats/icon/
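An Alias directive replaces the matching URL prefix with a filesystem path, bypassing the DocumentRoot. When the URL prefix ends with a slash, as here, Apache only applies the alias to requests that include that slash, so /awstats-icon alone is not mapped. The hypothetical apply_alias function below sketches this substitution in Python; it simply refuses .. segments, whereas Apache normalizes URLs before mapping them.

```python
from typing import Optional


def apply_alias(url_path: str, url_prefix: str, directory: str) -> Optional[str]:
    """Map a URL path onto the filesystem the way an Alias directive would."""
    if not url_path.startswith(url_prefix):
        return None  # not covered by this Alias: normal DocumentRoot handling applies
    rest = url_path[len(url_prefix):]
    if ".." in rest.split("/"):
        return None  # Apache normalizes URLs first; this sketch simply refuses ".."
    # Literal substitution of the prefix keeps the result inside directory
    return directory + rest


if __name__ == "__main__":
    print(apply_alias("/awstats-icon/other/button.png",
                      "/awstats-icon/", "/usr/share/awstats/icon/"))
    # /usr/share/awstats/icon/other/button.png
```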
After a few minutes (and once the script has been run a few times), the results are available online: