iX 06/1996, page 122: World Wide Web
Since March of this year Apache has been the world's most widely used WWW server, according to a survey by the British company Netcraft, just ahead of the NCSA daemon. Configuring Apache is nowhere near as much work as configuring the now legendary sendmail, but a few small points need attention.
The Native American name of the most popular Web daemon originally described what it was: a bug-fixed ("a patchy") NCSA server. That was long ago; since at least version 0.8.x the number of Apache users has been hard to count. A whole range of well-known, and above all heavily visited, Web sites such as MIT's Artificial Intelligence Laboratory and HotWired use Apache, and the commercial Linux distribution from Caldera (see iX 2/96) ships version 1.0.
Apache is interesting, among other reasons, because Ben Laurie has written a freely available SSL (Secure Sockets Layer) implementation for server version 1.0.3 (see also the article Schlüsselfertig in this issue). In addition, a port of Apache to OS/2 is under way: a beta release of 1.0.3 is available on the Apache Group's Web pages.
It all starts with the source: www.apache.org is the central server where all the important information about Apache can be found. There are mirrors of this machine all over the world, just not in Germany. Users who "just" want the source code can point their browser (or their shell) at Regensburg, where the source is available for download. In addition, the ancient version 0.6.5 and the newer version 1.0.2 are available on the iX-Abo-CD-ROM.
Mirroring affects how up to date the files are. At the end of April the Apache Group's own server offered not only daemon version 1.0.5 but also a beta release for 1.2b. In Regensburg, on the other hand, the current version at that time was reported to be 1.0.3. If you really must have the latest version, make sure you fetch it from www.apache.org. Anyone who can't keep checking back there can join the appropriate mailing list: send a message to majordomo@apache.org, leaving the subject field blank and putting subscribe apache-announce in the message body.
Update, June 4, 1996
Although there wasn't a German Apache mirror at the time of writing, there are now at least two (not yet counting the June 5 update): one in Darmstadt (http://apache.zit.th-darmstadt.de) at the Technische Hochschule and one in Dreieich (http://apache.www.nacamar.de/) at Nacamar [thanks to Michael Beckmann and Peter Gramlich].
Update, June 6, 1996
As Udo Steinegger reports, the Regensburg Apache mirror was not working properly recently. It is working again now.
A weekly digest of the developers' mailing list is available at UKWeb (www.ukweb.com/support/apacheweek/); alternatively, a message to majordomo@mail.ukweb.com with the message body subscribe apacheweek puts you on their mailing list.
At about 200 KB (tar.gz), depending on the version, Apache takes up no more room than a typical public-domain tool. Only the secure SSL Apache pushes the binary to over a megabyte. No problem there, then. The same goes for compiling, because the source tree has only two branches: src and cgi-src. Both contain makefiles, with whose help httpd should be compiled quickly. The configuration file in the src directory is called, not unexpectedly, Configuration and contains hints for a whole range of Unix variants (Solaris, NeXT, Linux, SCO, HP-UX).
In its newest version, Apache comes as a configurable package: individual modules can be included or left out before compilation as required. Because the server is compatible with NCSA httpd 1.3, it contains a module that takes care of this. There is also a component that supports Netscape cookies (they let CGI scripts keep state information independently of individual documents).
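As a rough sketch of how modules are switched on and off before compiling, an excerpt from src/Configuration might look like this. The module and object file names are assumptions based on the 1.0-era layout and may differ in your release, so check them against your own copy of the file.

```apache
# Excerpt from src/Configuration (names are assumptions, compare with your copy)
Module mime_module         mod_mime.o
Module access_module       mod_access.o
Module auth_module         mod_auth.o
Module userdir_module      mod_userdir.o
# Optional modules: remove the leading # to compile them in
# Module cookies_module    mod_cookies.o
# Module dbm_auth_module   mod_auth_dbm.o
```

A module that is commented out here is simply not linked into the httpd binary, and its directives are then unknown to the server.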
After the usual adjustments to the Configuration file (choice of the C compiler and so on), the daemon is quickly compiled and the real work can begin: configuring the server. The Apache Group has prepared the ground: the conf directory contains four files that only have to be modified. Anyone who knows the NCSA daemon won't have problems here, since its original files are already contained in this distribution. And readers who want more reading material after this article can consult our literature references (chapter 3 in [1], chapter 18 in [2]).
To install a Web server, the novice Webmaster has to create a directory for the binaries and configuration files. Here /usr/local/www is used as the base directory, though any other can be chosen. Within this directory it is advisable to create further directories for documents, icons, configuration files, executable programs (CGI) and log files. It also makes sense to adopt the layout provided in the distribution.
The directory for the configuration files contains three files that control the actual operation of Apache:
| File | Configures |
| httpd.conf | Daemon |
| access.conf | Access control |
| srm.conf | Appearance |
A fourth file, mime.types, is also in conf, but configuring it isn't mandatory.
The first decision in httpd.conf is one that the Webmaster can no longer make alone: the WWW daemon either is started by inetd or runs standalone. In the latter case, if the daemon is to start whenever the computer is rebooted, a script such as Standalone in /etc/init.d with a link in /etc/rc3.d (SVR4: Solaris 2.x) or an appropriate entry in /etc/rc.local (BSD systems) is needed. In both cases the system administrator (root) has to be involved (unless Webmaster and root are one and the same person). The standalone variant has become the norm because a daemon that is already running doesn't have to read the configuration files on every access.
After the test phase it is usual to set the port to 80. Only then can the port number be omitted from the URL, unlike in http://www999.ix.de:999/ (a URL that doesn't exist).
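A minimal sketch of these two settings in httpd.conf follows. Port 8080 is only a hypothetical test port and does not come from the distribution.

```apache
# httpd.conf: how the daemon runs
# "standalone" keeps one daemon running; "inetd" starts it per connection
ServerType standalone

# During the test phase an unprivileged port such as 8080 (an assumption) will do:
# Port 8080
# For production use the standard HTTP port, which requires a start as root
Port 80
```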
Some of the settings in httpd.conf are perfectly fine just as the authors supplied them. The path names for the error and log files, for example, are right. The location of the file holding the process ID makes sense too; the Webmaster needs it the moment he wants the server to reread its configuration using kill -HUP, or to be more precise: kill -HUP `cat /usr/local/www/logs/httpd.pid`.
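For orientation, the entries in question could look roughly like this. It is a sketch that assumes the distribution defaults and a ServerRoot of /usr/local/www, so that the PidFile path matches the kill command above.

```apache
# httpd.conf: paths are relative to ServerRoot unless they start with /
ErrorLog logs/error_log
TransferLog logs/access_log
# With ServerRoot /usr/local/www this becomes /usr/local/www/logs/httpd.pid
PidFile logs/httpd.pid
```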
Other entries should be changed urgently: after all, email about WWW-related problems shouldn't really be sent to you@your.server. The same applies to the server name, which nobody should simply invent. It can be the host name itself, fully qualified (that is, with the domain name), or an alias. In the latter case the alias also has to be entered in the responsible name service: in /etc/hosts, in the Domain Name System (DNS) or in the Network Information Service (NIS). If you're unsure, it's worth having a chat with the network administrator.
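The two entries might be filled in as follows. Both values are placeholders for illustration and have to be replaced by an address and a name that exist in your own domain.

```apache
# httpd.conf: who receives mail about problems (placeholder address)
ServerAdmin webmaster@www.my.domain

# Fully qualified host name or an alias known to the name service (placeholder)
ServerName www.my.domain
```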
Besides the items already mentioned, one path is of central importance: ServerRoot must be the base path for all the files that matter to the server (above all conf and htdocs). Here, as mentioned above, /usr/local/www is used. The settings further down can stay as they are. If the server is to serve several IP addresses, the last directive (<VirtualHost>) has to be considered.
Covering this aspect would go too far for an introductory article, but Apache does support what several commercial users need: so-called multi-homing (offering several domains on a single server).
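Without going into detail, a hedged sketch of what such a setup could look like in httpd.conf: www.second.domain and the paths inside the block are hypothetical, and the name must resolve to an additional IP address configured on the same machine.

```apache
# httpd.conf: base directory for conf, logs and so on
ServerRoot /usr/local/www

# One block per additional IP address; www.second.domain is a placeholder
<VirtualHost www.second.domain>
ServerAdmin webmaster@www.second.domain
DocumentRoot /usr/local/www/htdocs/second
ServerName www.second.domain
ErrorLog logs/second-error_log
TransferLog logs/second-access_log
</VirtualHost>
```

Requests arriving at the second address are then answered from the separate document tree and logged separately.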
The second configuration file (srm.conf) specifies which documents are shown and where they are to be found. In the listing the variable DocumentRoot takes care of this: /usr/local/www/htdocs is the name of the directory. It is the entry point to the public files that the surfer reaches with http://www.my_server.com/. In addition, everyone should set the two following variables themselves, or consciously decide not to.
Not every server administrator likes a Web server accessing files outside the path set in ServerRoot. Some therefore do without UserDir directories in the users' respective $HOME entirely (the default is public_html) and instead provide a directory below DocumentRoot, called home for example. That directory cannot yet be specified as UserDir, however; this will only be possible from version 1.1.
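The corresponding entries in srm.conf might look like this sketch. The keyword for switching user directories off is an assumption and should be verified against the mod_userdir documentation of your release.

```apache
# srm.conf: entry point for http://www.my_server.com/
DocumentRoot /usr/local/www/htdocs

# Default: ~user requests are served from $HOME/public_html
UserDir public_html

# To switch user directories off instead (assumed keyword, please verify):
# UserDir disabled
```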
DirectoryIndex names the files that the server delivers when a user requests not an individual document but just a directory. The entry index.html means that when someone calls up http://www.heise.de/ix/dieser_artikel/ (just an example), Apache looks in the /usr/local/www/htdocs/ix/dieser_artikel directory for a file called index.html and sends it to the client. If index.html doesn't exist, the user is shown a clickable list of the files and subdirectories.
That can be avoided in three ways. Either the Webmaster makes sure that there is a file called index.html in every directory, or he uses DirectoryIndex to define several names for the document to be shown, for example:
DirectoryIndex index.html welcome.html default.html default.shtml
- where the last suffix (along with the file name) is needed if server-side includes are used (see also the box "On the server side"). These are directives that the server evaluates when the page is called up, for example to insert the date of the last change automatically:
<p>Last update:<!--#echo var="LAST_MODIFIED"--></p>
Such features are sometimes controversial, because they mean that an Internet connection has to be established whenever a page is called up, even if the page is in the local cache or on a proxy.
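For the server to evaluate such directives at all, includes have to be enabled. A minimal sketch, assuming the 1.0-style MIME type for server-parsed files and the document tree used in this article:

```apache
# srm.conf: treat .shtml files as server-parsed HTML (assumed 1.0-style type)
AddType text/x-server-parsed-html .shtml

# access.conf: permit includes below the document tree
# Note: an Options line replaces the options set before, it does not add to them
<Directory /usr/local/www/htdocs>
Options Includes
</Directory>
```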
The third way is to forbid the directory listing for a directory altogether by leaving Indexes out of its Options (see below and the start of listing 4).
Of the remaining items, two are interesting in that the Webmaster needs to attend to them, whereas others such as AddType can be left alone for a first configuration. AccessFileName determines the name of the file that contains access restrictions for a directory. Under the line "Self-made error reports" the listing gives three variants showing how ErrorDocuments can be created: a short text included directly in srm.conf, a reference to a script or another local document, or a redirect to an external server.
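The three variants could be sketched as follows. The status codes, the message text and all targets are hypothetical and only illustrate the syntax.

```apache
# srm.conf: name of the per-directory access file
AccessFileName .htaccess

# Self-made error reports, three variants (all targets are hypothetical)
# 1. Short text; the leading " marks it as text and is not sent to the client
ErrorDocument 500 "The server has a problem. Please try again later.
# 2. Local script or document
ErrorDocument 404 /cgi-bin/missing.cgi
# 3. Redirect to an external server
ErrorDocument 402 http://other.server.example/subscribe.html
```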
Especially when a server is publicly accessible, there are often files or directories that are intended only for internal use or for selected external users. With the <Location> directive Apache can restrict access to an individual document from version 1.1 onwards (see the box "Still in the beta stage"). Directories, on the other hand, can already be protected from unwanted access in the current release, version 1.0.5, using the <Directory> directive.
It makes sense to set the permissions for DocumentRoot (recursively) in access.conf, as can be seen in the listing. Further restrictions for other directories, including those below DocumentRoot, can be set here too. Alternatively, the file .htaccess is intended for that purpose; it has to be placed in the directory concerned (its name can be chosen with AccessFileName).
In .htaccess the Webmaster usually states whether symbolic links should be followed in the individual directory and whether the directory listing may be shown if no index.html or similar file (see above) is available. In addition, the Web administrator can use access.conf to determine which options the .htaccess files in the various directories are allowed to override (AllowOverride). See also the box "Options for Options".
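How access.conf and a .htaccess file interact can be sketched like this. The choice of options is an assumption made for illustration, not the article's own listing.

```apache
# access.conf: defaults for the whole document tree
<Directory /usr/local/www/htdocs>
# Follow symbolic links; Indexes is left out, so directory listings stay off
Options FollowSymLinks
# .htaccess files may change Options and authentication settings only
AllowOverride Options AuthConfig
</Directory>

# .htaccess in a single directory whose listing may be shown after all
# (this line belongs in that directory's .htaccess, not in access.conf)
# Options Indexes FollowSymLinks
```

With AllowOverride None in access.conf, the .htaccess files would be ignored altogether.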
If an access rule is required for a whole document tree, it can be set up within the <Directory> directive with the help of <Limit>. This can be seen in the bottom part of the access.conf listing. There, however, it doesn't restrict anything but grants access to everyone (allow from all). If you want control over who is allowed access, you can block access for everyone and then reinstate it for those who are allowed to read the documents:
<Limit GET>
order deny,allow
deny from all
allow from .my.domain 105.23.34.144 105.23.33.
</Limit>
deny from all locks everyone out. If the fictitious network 105.23 existed, all the computers in the local network (those in the domain my.domain) would be allowed to access the files, as would machines in the network 105.23.33 and the computer with the IP address 105.23.34.144.
One of the big sites that uses Apache is the online version of Wired: HotWired. As you may know, you have to register for a password there before you gain full access to its content.
The support directory of the Apache distribution contains two tools, htpasswd and dbmmanage, that help to administer users. In addition, the Webmaster can create a group file that can be queried. The password file (and the group file) shouldn't be located under DocumentRoot, and definitely not in the directory to be protected; conf is surely a good choice. A password file is created with a command such as the following:
htpasswd -c /path/to/password/file anton
- where htpasswd prompts for the password (bcez8r, say), which is typed in plain text, and -c is only necessary when creating the file...
Whether a user is asked for a password or not is determined in the <Directory> section, above the <Limit> section, in access.conf or in the .htaccess file of the directory concerned. For example, this:
<Directory /usr/local/www/htdocs/Mine>
AuthName My personal Stuff
AuthType Basic
AuthUserFile /usr/local/www/conf/passwd
AuthGroupFile /usr/local/www/conf/group
<Limit GET>
require user anton bertha christoph
require group my_department
</Limit>
</Directory>
allows only members of the group my_department and the users anton, bertha and christoph to read files in the directory called Mine, provided they enter their passwords correctly.
dbmmanage in the distribution's support directory performs the same function as htpasswd but uses the rudimentary DBM database format familiar from NIS, creating the .pag and .dir files. Instead of AuthUserFile, the example above then has to use AuthDBMUserFile; the same applies to the group.
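Rewritten for DBM files, the example above might look like the following sketch. It assumes that the DBM authentication module has been compiled into the server and that the files were created with dbmmanage under the same base names.

```apache
<Directory /usr/local/www/htdocs/Mine>
AuthName My personal Stuff
AuthType Basic
# DBM files created with dbmmanage; give the base name without .pag or .dir
AuthDBMUserFile /usr/local/www/conf/passwd
AuthDBMGroupFile /usr/local/www/conf/group
<Limit GET>
require user anton bertha christoph
require group my_department
</Limit>
</Directory>
```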
The developments to follow in release 1.1 are shown in the box "Still in the beta stage". Some of the coming modules suggest that the Apache Group intends to do a few things that will make Webmasters happier: the ability to restrict access to individual documents and a status module that tells those responsible about the state of the server are just two. All that is missing is a menu-driven configuration.
Literature
[1] Lincoln D. Stein; How to Set Up and Maintain a World Wide Web Site; Reading, MA (Addison-Wesley) 1995
[2] Cricket Liu, Jerry Peek, Russ Jones, Bryan Buus, Adrian Nye; Managing Internet Information Services; Sebastopol, CA (O'Reilly & Associates) 1995