Difference between revisions of "Apache"

From CSE330 Wiki
 
(50 intermediate revisions by 6 users not shown)
Line 1: Line 1:
This page describes how to set up a web server on a Linux machine.  If you are unfamiliar with using Linux from the command line, you should read [[Linux|the Linux guide]] first.
+
''Apache'' is the industry standard web server for Linux distributions.  It is highly configurable and has a wide range of modules ready for different needs.
 +
 
 +
{{XKCD
 +
|name=permanence
 +
|id=910
 +
}}
  
 
== What is a web server? ==
 
== What is a web server? ==
Line 7: Line 12:
 
For example, when you visit this wiki, you are sending a request over the internet to some machine that is probably located somewhere in EIT (the user seldom knows exactly where the machine is located).  The web server receives your request, and it processes the data you sent.  Finally, the server prepares a response (the web page), and sends it back to you.
 
For example, when you visit this wiki, you are sending a request over the internet to some machine that is probably located somewhere in EIT (the user seldom knows exactly where the machine is located).  The web server receives your request, and it processes the data you sent.  Finally, the server prepares a response (the web page), and sends it back to you.
  
== SSH ==
+
== Installing Apache ==
 
 
When connecting as an administrator to your machine over the internet or intranet, you will most likely be using ''ssh'' (secure shell).  SSH access requires that the ''sshd'' daemon is running on your machine.
 
  
By default, SSH is preinstalled on your EC2 instance.  If you are not using an EC2 instance, simply install it from yum or apt.
+
{{RequiredInstructions|content=
  
=== SSH Keys ===
+
In yum, Apache is distributed under the package name '''httpd''' (for ''hypertext transfer protocol daemon'').  Use the package manager associated with your distribution to install Apache.  (For more information on how to use yum, see [[Linux#Repository-Based Package Managers|the Linux guide]].)
  
Normally, you can SSH into your machine in one of two ways: you can use traditional username/password authentication, or you can use a public/private key pair.  A public/private key pair is generally considered to be more secure, but it requires that you always have access to your private key file when you want to log into your remote machine.  By default, EC2 instances allow only public/private key pair authentication.  You can enable password-based authentication by changing the PasswordAuthentication option in ''/etc/ssh/sshd_config'' to '''yes''':
+
For example:
 
 
PasswordAuthentication yes
 
 
 
If possible, however, you should restrict yourself to using private and public keys.
 
 
 
=== SSH Server Configuration ===
 
 
 
The configuration files for SSH are in ''/etc/ssh''. You can modify the files to affect SSH permissions, among other things. For example, it is always a good idea to disable root access over ssh. This could be done by editing ''/etc/ssh/sshd_config'' and setting 
 
 
 
PermitRootLogin no
 
 
 
For more detail on editing files on the command line, see [[Linux#File Editors|the Linux guide]].
 
 
 
You will need to restart the server for changes to take effect:
 
  
 
<source lang="bash">
 
<source lang="bash">
$ sudo service ssh restart  # if that doesn't work, try: sudo /etc/init.d/sshd restart
+
$ sudo yum update
ssh stop/waiting
+
$ sudo yum install httpd
ssh start/running, process 1443
 
$  
 
 
</source>
 
</source>
  
'''Caution:''' Disabling root access over SSH for your EC2 instance should only be done after setting up an additional user account and adding that account to the sudoers list.
+
You'll need to run this command to add Apache as a startup item:
 
 
==== Adding SSH Users ====
 
 
 
If you want to log into your EC2 instance using your own account rather than the default '''ec2-user''' or '''ubuntu''' account that Amazon made for you, you need to do some additional configurations.
 
 
 
If you are using keys, you need to associate your private key with the user on the server.  To do this, run the following commands:
 
  
 
<source lang="bash">
 
<source lang="bash">
$ sudo cd /home/myUserName/
+
$ sudo /sbin/chkconfig --levels 235 httpd on
$ sudo mkdir .ssh
 
$ sudo cp /home/originalUserName/.ssh/authorized_keys .ssh  # originalUserName should be ec2-user in the Amazon AMI or ubuntu in Ubuntu 12.04 LTS
 
$ sudo chmod 700 .ssh
 
$ sudo chmod 600 .ssh/authorized_keys
 
$ sudo chown -R myUserName:myUserGroup .ssh  # myUserGroup is probably the same as myUserName; for example, if the username were alice, you could run: sudo chown -R alice:alice .ssh
 
$
 
 
</source>
 
</source>
  
You should now be able to log into your server using your custom username with SSH keys!
+
In RHEL, most Apache configurations are stored in ''/etc/httpd/conf/httpd.conf'' and others are located in the directory ''/etc/httpd/conf.d/''.
 
 
'''IMPORTANT:''' From here on out it is recommended that you always login as your user account instead of '''ec2-user''' or '''ubuntu'''.  Previous versions of the Amazon EC2 cloud created an account named "root", instead of the default account '''ec2-user''' or '''ubuntu'''. If you really want to login as root, you can set the password for that account by typing: sudo passwd root.  As explained below, it is not recommended to use the root account or allow ssh connections with that username.  It is advisable to use the account you created instead of the root account (or '''ec2-user''' or '''ubuntu''') for security reasons. Additionally, the act of requiring you to type "sudo" in order to run commands as root serves as a reminder that the command you are typing in should be examined carefully.
 
 
 
=== SSH Client Configuration ===
 
 
 
==== Unix-Based Systems (including Mac OS X) ====
 
 
 
Mac OS X is based on BSD, a flavor of Unix.  As such, Mac OS X comes pre-built with all the tools you need to use SSH!  Simply fire up Terminal and enter the command
 
 
 
ssh username@hostname
 
 
 
To use SSH with a key pair, use the command
 
 
 
ssh -i /path/to/key.pem username@hostname
 
 
 
==== Non-Unix-Based Systems (including Microsoft Windows) ====
 
 
 
Unfortunately, using SSH with Windows is more complicated. Amazon provides a great tutorial on how to connect to a virtual machine from Windows (follow this link if you are in the Urbauer Lab) [http://docs.amazonwebservices.com/AWSEC2/latest/GettingStartedGuide/ConnectToInstanceLinux.html#connect-from-windows-machine]. It is necessary to install an SSH client to support the connections.  A widely used SSH client for Windows is PuTTY.  You can download PuTTY from http://www.chiark.greenend.org.uk/~sgtatham/putty/
 
 
 
PuTTY is fairly simple and straightforward with one caveat: Amazon's ''.pem'' key pair files are not compatible with PuTTY keys. In order to convert ''.pem'' keys to a PuTTY ''.ppk'' private key file, you should use the puttygen.exe utility available from the same page [http://www.chiark.greenend.org.uk/~sgtatham/putty/] as PuTTY.
 
Next, select Import under the Conversions menu, load the Amazon ''.pem'' key file, and press the Save private key button. Be sure to save the file in the directory where PuTTY looks for its keys.
 
 
 
Copy and paste works similarly to the X Window System in Unix. You use the left mouse button to select text in the PuTTY window. The act of selection automatically copies the text to the clipboard: there is no need to press Ctrl-Ins or Ctrl-C or anything else. In fact, pressing Ctrl-C will send a Ctrl-C character to the other end of your connection (just like it does the rest of the time), which may have unpleasant effects. The only thing you need to do, to copy text to the clipboard, is to select it.
 
 
 
To paste the clipboard contents into a PuTTY window, by default you click the right mouse button. If you have a three-button mouse and are used to X applications, you can configure pasting to be done by the middle button instead, but this is not the default because most Windows users don't have a middle button at all.  
 
  
Also, here is a good PuTTY tutorial that you might find useful to get started: http://kb.mediatemple.net/questions/1595/Using+SSH+in+Putty+%28Windows%29
+
At this point, Apache has been installed, but is not yet running. To start the webserver, run this command:
 
 
=== SSHFS ===
 
 
 
SSHFS is a filesystem client which allows secure mounting of remote file systems. While there are other ways to mount remote file systems, SSHFS has the advantage of being able to mount a file system located on any host that has an SSH daemon running without any host side installation or configuration. This means that you can easily access and edit your files using all of your local applications including IDEs.
 
 
 
As you may have inferred from the name, the underlying implementation utilizes SSH File Transfer Protocol in combination with FUSE, a package now included in the kernel that allows unprivileged users to easily create their own file systems in userspace (see the wikipedia entry for more information [http://en.wikipedia.org/wiki/Filesystem_in_Userspace]).
 
 
 
To mount a share using password based authentication, the command is
 
sshfs user@domain:/path/to/remote/directory /path/to/local/mountpoint
 
e.g. To mount the directory /home/joe/myfiles in the user ''joe'''s home directory for a machine with the domain schmoesfiles.org using SSHFS you would enter the command
 
sshfs joe@www.schmoesfiles.org:myfiles /path/to/local/mountpoint
 
 
 
Note that if you are using public key authentication, the command to mount the remote share is slightly different
 
sshfs -o IdentityFile=/path/to/private/key user@domain:/path/to/remote/directory /path/to/local/mountpoint
 
 
 
To unmount the filesystem you can use the following command
 
fusermount -u /path/to/local/mountpoint
 
 
 
=== SFTP ===
 
 
 
Any server running an SSH server is also compatible with '''SFTP''' or Secure File Transfer Protocol.  (Compare to FTP, or File Transfer Protocol.)
 
 
 
You can use SFTP from the command line, or you can use any GUI file transfer client.  All FTP clients I have seen also support SFTP.  One popular FTP client is [http://www.filezilla-project.org/ Filezilla].
 
 
 
== Apache ==
 
 
 
''Apache'' is the industry standard web server for Linux distributions.  It is highly configurable and has a wide range of modules ready for different needs.
 
 
 
=== Installing Apache ===
 
 
 
In yum, Apache is distributed under the package name '''httpd''' (for ''hypertext transfer protocol daemon'').  In apt, it is distributed under the name '''apache2'''.  Use the package manager associated with your distribution to install Apache.  (For more information on how to use yum and apt, see [[Linux#Repository-Based Package Managers|the Linux guide]].)
 
 
 
When Apache is installed through apt, the HTTP Daemon is automatically added as a startup item.  If you are using the Amazon AMI, you need to run this command to add Apache as a startup item:
 
  
 
<source lang="bash">
 
<source lang="bash">
$ sudo /sbin/chkconfig --levels 235 httpd on
+
$ sudo /usr/sbin/apachectl start
$
 
 
</source>
 
</source>
  
In RHEL, all Apache configurations are stored in ''/etc/httpd/httpd.conf''.  Debian takes a more modular approach, having separate directories for each type of configuration, all located in ''/etc/apache2/''.  For more detail on Debian's approach, see http://www.control-escape.com/web/configuring-apache2-debian.html
+
}}
  
=== Apache Directives ===
+
== Apache Directives ==
  
 
You define your settings for Apache using ''directives''.  Some of the directives you will likely encounter include:
 
You define your settings for Apache using ''directives''.  Some of the directives you will likely encounter include:
Line 140: Line 56:
 
* '''Alias:''' Map a directory URL to some other location on your filesystem.  Requires that the ''Alias'' module be loaded.
 
* '''Alias:''' Map a directory URL to some other location on your filesystem.  Requires that the ''Alias'' module be loaded.
  
==== .htaccess Files ====
+
=== .htaccess Files ===
  
 
You can also specify some Apache configurations without delving into the master configuration file.  To do this, put a file named ''.htaccess'' in any directory that Apache is serving.  All directives in it will be interpreted as if they were in a Directory directive in the master configuration file.
 
You can also specify some Apache configurations without delving into the master configuration file.  To do this, put a file named ''.htaccess'' in any directory that Apache is serving.  All directives in it will be interpreted as if they were in a Directory directive in the master configuration file.
  
'''VERY IMPORTANT:''' The directory containing ''.htaccess'' must not have the '''AllowOverride None''' directive in the master configuration file in order for '''.htaccess''' to be read.  In Debian, '''AllowOverride None''' is enabled by default!  The Apache configuration file you need to edit in Debian is ''/etc/apache2/sites-available/default'' (remove the '''AllowOverride None''' located on line 11).
+
'''VERY IMPORTANT:''' The directory containing ''.htaccess'' must not have the '''AllowOverride None''' directive in the master configuration file in order for '''.htaccess''' to be read.
 
+
=== Directory Directive ===
==== Directory Directive ====
 
  
 
Use the Directory directive to assign other directives to a specific directory.  For example:
 
Use the Directory directive to assign other directives to a specific directory.  For example:
Line 169: Line 84:
 
Note that this directory is actually the root directory of the web server.
 
Note that this directory is actually the root directory of the web server.
  
=== Enabling the UserDir Module ===
+
== The UserDir Module ==
 +
 
 +
{{RequiredInstructions|content=
  
 
The UserDir module lets you access files for any user on the server with a ~, e.g., http://ec2-xxx-xxx-xxx-xx.compute-1.amazonaws.com/~paul/
 
The UserDir module lets you access files for any user on the server with a ~, e.g., http://ec2-xxx-xxx-xxx-xx.compute-1.amazonaws.com/~paul/
Line 175: Line 92:
 
This module comes installed, but not activated by default.
 
This module comes installed, but not activated by default.
  
==== Enabling UserDir in Debian ====
+
=== Enabling UserDir in RHEL ===
  
If you are using a Debian-based distribution for your server (like Ubuntu 12.04 LTS), simply run the following command to enable the module:
+
If you are using an RHEL-based distribution for your server (like the Amazon AMI), you need to edit the UserDir configuration file.
  
<source lang="bash">
+
Open ''/etc/httpd/conf.d/userdir.conf'' in your favorite text editor.  For more information on command-line text editors, refer to [[Linux#File Editors|the Linux guide]].
$ sudo a2enmod userdir
 
Enabling module userdir.
 
To activate the new configuration, you need to run:
 
  service apache2 restart
 
$
 
</source>
 
  
Then restart Apache for the changes to take effect.  Now, all users will be able to store their own personal web site in public_html inside their home directory.
+
Find the line that says
  
To make sure everything is working, create a test file in your home directory under public_html, and then point your browser to it: http://ec2-xxx-xx-xx-xxx.compute-1.amazonaws.com/~yourUserName/hello.txt
+
<source lang="apache">UserDir disabled</source>
  
'''Note:''' The UserDir configuration file, which already has all the correct settings, is located at ''/etc/apache2/mods-available/userdir.conf''
+
and change it to
  
==== Enabling UserDir in RHEL ====
+
<source lang="apache">UserDir disabled root</source>
  
If you are using an RHEL-based distribution for your server (like the Amazon AMI), you need to edit the master Apache configuration file.
+
Additionally, find the line that says
  
Open ''/etc/httpd/conf/httpd.conf'' in your favorite text editor.  For more information on command-line text editors, refer to [[Linux#File Editors|the Linux guide]].
+
<source lang="apache">#UserDir public_html</source>
  
Find the line that says
+
and uncomment it; that is, remove the # so that you have
UserDir disabled
 
and change it to
 
UserDir disabled root
 
  
Additionally, find the line that says
+
<source lang="apache">UserDir public_html</source>
#UserDir public_html
 
and uncomment it; that is, remove the # so that you have
 
UserDir public_html
 
  
 
This tells Apache that the directory containing each user's html files is a subdirectory of their home directory called public_html.
 
This tells Apache that the directory containing each user's html files is a subdirectory of their home directory called public_html.
  
To make sure everything is working, create a test file in your home directory under public_html, and then point your browser to it: http://ec2-xxx-xx-xx-xxx.compute-1.amazonaws.com/~yourUserName/hello.txt
+
'''Note:''' Users need to manually create public_html.  It is not created automatically.
 +
 
 +
You may also need to change the permissions of your user directory and your public_html directory to allow Apache to read and execute inside them.  To do this, run the following commands:
 +
 
 +
<source lang="bash">
 +
$ sudo chmod o+x /home/<myusername>
 +
$ sudo chmod o+rx /home/<myusername>/public_html
 +
$
 +
</source>
 +
 
 +
Finally, [[#Restarting Apache and Testing|restart Apache]].
 +
 
 +
}}
  
==== Remapping UserDir ====
+
=== Remapping UserDir ===
  
If you want to change the name of the UserDir web server root from '''public_html''' to something else like '''.html''', follow these instructions.
+
If you want to change the name of the UserDir web server root from '''public_html''' to something else like '''.html''', follow these instructions. (You do not need to do this for the purposes of CSE 330; this serves as a reference.)
  
 
# Rename your '''public_html''' to '''.html'''
 
# Rename your '''public_html''' to '''.html'''
 
#* The "." in front of the directory name means that the directory is a hidden one.  You will not see it with the normal "ls" command.  Use "ls -a" to see hidden files as well.
 
#* The "." in front of the directory name means that the directory is a hidden one.  You will not see it with the normal "ls" command.  Use "ls -a" to see hidden files as well.
 
# Edit the UserDir configuration file.
 
# Edit the UserDir configuration file.
#* In RHEL, edit the master Apache configuration file at ''/etc/httpd/conf/httpd.conf''
+
#* In RHEL, edit the UserDir configuration file at ''/etc/httpd/conf.d/userdir.conf''
#* In Debian, edit the UserDir configuration file at ''/etc/apache2/mods-available/userdir.conf''
 
 
# Find the line that reads <code>UserDir public_html</code> and change it to <code>UserDir .html/</code>
 
# Find the line that reads <code>UserDir public_html</code> and change it to <code>UserDir .html/</code>
# Restart Apache.
+
# [[#Restarting Apache and Testing|Restart Apache]].
 +
 
 +
== Apache Logs ==
  
To make sure everything is still working, point your browser to your test file again, and you should still see it: http://ec2-xxx-xx-xx-xxx.compute-1.amazonaws.com/~yourUserName/hello.txt
+
Apache creates two log files: one for all access attempts to your server, and one for errors.  The locations of these log files are:
  
=== Apache Logs ===
+
* '''Access Log:''' /var/log/httpd/access_log
 +
* '''Error Log:''' /var/log/httpd/error_log
  
Apache records all access attempts and errors associated with your server in log files.  It is useful to check your access logs to ensure that things are running smoothly and that, for example, you aren't experiencing any denial-of-service-like attacks on your server.
+
You might find it helpful to see access and errors appear "live" in your terminal window as they are created.  To do this, you can use the [[Linux#Command Reference|tail -f]] command.  Since the log files have strict permissions, you will need to also use sudo. Example:
  
In RHEL, the Apache logs are located in ''/var/log/httpd''.  In Debian, the Apache logs are located in ''/var/log/apache2''.
+
<source lang="bash">
 +
$ sudo tail -f /var/log/httpd/access_log
 +
</source>
  
=== Virtual Hosts ===
+
== Virtual Hosts ==
  
 
'''Virtual Hosts''' are used to run multiple Apache web servers from the same machine.  Virtual hosts can listen for connections on different ports and/or different hostnames, serving completely different web sites to each.  For example:
 
'''Virtual Hosts''' are used to run multiple Apache web servers from the same machine.  Virtual hosts can listen for connections on different ports and/or different hostnames, serving completely different web sites to each.  For example:
Line 249: Line 170:
 
</source>
 
</source>
  
With this configuration, any request that uses a host name of ''cse330.dyndns.org'' will use ''/home/www/cse330'' as the root document directory.  Make sure that the DocumentRoot directory exists and is readable by the httpd process.  In RHEL, Apache runs as the '''apache''' user.  In Debian, it runs as the '''www-data''' user.
+
With this configuration, any request that uses a host name of ''cse330.dyndns.org'' will use ''/home/www/cse330'' as the root document directory.  Make sure that the DocumentRoot directory exists and is readable by the httpd process.  In RHEL, Apache runs as the '''apache''' user.
  
It is good practice to put raw server configuration files in ''/etc/httpd/sites-available'' in RHEL or ''/etc/apache2/sites-available'' in Debian.  To activate a site, create a symlink from the configuration in ''sites-available'' to a sibling directory called ''sites-enabled''.  In Debian, these directories are already set up for you, and Debian Apache even provides the '''a2ensite''' and '''a2dissite''' commands to create or destroy the symlinks!  In RHEL, you have to do this by hand.
+
It is good practice to put raw server configuration files in ''/etc/httpd/sites-available''.  To activate a site, create a symlink from the configuration in ''sites-available'' to a sibling directory called ''sites-enabled''.
  
=== Restarting Apache ===
+
== Restarting Apache and Testing ==
  
Whenever you make changes to the Apache configuration files, you will need to restart Apache for the changes to take effect.  There are several different ways to restart Apache; they all do the same thing, so choose your favorite:
+
Whenever you make changes to the Apache configuration files, you will need to restart Apache for the changes to take effect.  There are several different ways to restart Apache; they all functionally do (almost) the same thing, so choose your favorite:
  
 
<source lang="bash">
 
<source lang="bash">
$ /etc/init.d/httpd restart
+
$ sudo /usr/sbin/apachectl restart
$ /sbin/service httpd restart
+
$ sudo apachectl restart    # if /usr/sbin is in your PATH (which it is *not* by default in RHEL)
$ service httpd restart    # if /sbin is in your PATH
+
$ sudo /etc/init.d/httpd restart
$ /usr/sbin/apachectl restart
+
$ sudo /sbin/service httpd restart
$ apachectl restart   # if /usr/sbin is in your PATH
+
$ sudo service httpd restart
 
</source>
 
</source>
  
'''Note:''' ''restart'' performs a hard restart of Apache.  To perform a soft restart, use ''graceful'' instead (e.g. <code>apachectl graceful</code>).  To only reload the configuration files but not restart the server, use ''reload'' (e.g. <code>/etc/init.d/httpd reload</code>).
+
If you're torn on which version is the "best" to use, the commands involving '''apachectl''' (think "Apache Control") are written by the Apache folks themselves.  This has a couple of advantages:
 +
* They show you errors in the startup process (if there are any)
 +
* They give you the option to perform a "soft" restart; that is, a restart that allows any pending connections to complete.  To perform a soft restart, use ''graceful'': <code>sudo apachectl graceful</code>
 +
In my anecdotal experience, the '''apachectl''' commands are also faster than the '''service''' or '''init.d''' commands.
  
== Subversion ==
+
To make sure everything is working, create a test file in your home directory under public_html, and then point your browser to it: http://ec2-xxx-xx-xx-xxx.compute-1.amazonaws.com/~yourUserName/hello.txt
 
 
'''Subversion''' (svn) provides an easy way to store all of your files that you create in this course on a server backed up at WashU.  It also allows you to easily create and edit text files on your local computer and then transfer them to the Amazon cloud machine using the svn "checkout" command.
 
 
 
=== Setting Up SVN on your Server ===
 
 
 
First, install the package using '''apt''' or '''yum''', depending on whether you chose Debian or RHEL for your Linux distribution.  For more information on installing packages, refer to [[Linux#Repository-Based Package Managers|the Linux guide]].  The package name is '''svn''' in yum and '''subversion''' in apt.
 
 
 
Next, navigate to your home directory and ''checkout'' your repository:
 
 
 
<source lang="bash">
 
$ cd
 
$ svn co https://hostname.wustl.edu/path/to/repository
 
...
 
$
 
</source>
 
 
 
You will need to authenticate using your WUSTL Key.  The path to this semester's repository is listed on [[Module 1]].
 
 
 
After running the above commands, an empty directory should have been created.  To make sure that SVN is working, cd into the directory, create a text file, and tell SVN to start version control on it.  Finally, commit the file to the remote repository:
 
 
 
<source lang="bash">
 
$ cd Lastname-studentid
 
$ touch helloFromTheCloud.txt
 
$ svn add helloFromTheCloud.txt
 
A        helloFromTheCloud.txt
 
$ svn commit -m 'Added first file'
 
...
 
Committed revision 1.
 
$
 
</source>
 
 
 
=== Setting Up SVN on your Desktop ===
 
 
 
If you've taken CSE 131 or 132, you should already have SVN installed in Eclipse.  If not, refer to the tutorial here: http://www.cs.wustl.edu/~cytron/cse131/HelpDocs/Subversive/subversive.htm
 
 
 
When you're ready, check out the repository into Eclipse.  The path to this semester's repository is listed on [[Module 1]].
 
 
 
Do you see the helloFromTheCloud.txt file that you made earlier?  (You should.)
 
 
 
Create another text file, helloFromMyDesktop.txt.  Commit that file to the remote repository
 
 
 
Back on your server, issue <code>svn update</code>.  The helloFromMyDesktop.txt should be downloaded:
 
 
 
<source lang="bash">
 
$ svn update
 
A    HelloFromDemo.txt
 
Updated to revision 89.
 
$
 
</source>
 
 
 
=== SVN Resources ===
 
 
 
* http://xahlee.info/UnixResource_dir/svn.html
 
* http://www.yolinux.com/TUTORIALS/Subversion.html
 
  
[[Category:Module 1]]
+
[[Category:Module 2]]

Latest revision as of 05:44, 10 January 2023

Apache is the industry standard web server for Linux distributions. It is highly configurable and has a wide range of modules ready for different needs.

XKCD Comic: permanence

What is a web server?

A web server is software that listens for connections to your machine, and when a connection is received, processes the request and responds with the appropriate information. Most web servers listen on port 80, which is reserved for the purpose, and use the HTTP protocol.

For example, when you visit this wiki, you are sending a request over the internet to some machine that is probably located somewhere in EIT (the user seldom knows exactly where the machine is located). The web server receives your request, and it processes the data you sent. Finally, the server prepares a response (the web page), and sends it back to you.

Installing Apache

In yum, Apache is distributed under the package name httpd (for hypertext transfer protocol daemon). Use the package manager associated with your distribution to install Apache. (For more information on how to use yum, see the Linux guide.)

For example:

$ sudo yum update
$ sudo yum install httpd

You'll need to run this command to add Apache as a startup item:

$ sudo /sbin/chkconfig --levels 235 httpd on

In RHEL, most Apache configurations are stored in /etc/httpd/conf/httpd.conf and others are located in the directory /etc/httpd/conf.d/.

At this point, Apache has been installed, but is not yet running. To start the webserver, run this command:

$ sudo /usr/sbin/apachectl start

Apache Directives

You define your settings for Apache using directives. Some of the directives you will likely encounter include:

  • DocumentRoot: The path to the directory where the top level web files are going to be stored.
  • IfModule: The following block would be included if specified module exists.
  • User: Which user apache2 will run as.
  • Group: Which group will have group access to default web files.
  • AccessFileName: The name of the access file (that specifies user names/passwords and other limitations to files/directories).
  • ErrorLog: Where any errors will be written.
  • Include: Include some other files.
  • LogFormat: How to write a log message.
  • ErrorDocument: Files to display for some HTTP errors (500, 404, 402 etc.).
  • Alias: Map a directory URL to some other location on your filesystem. Requires that the Alias module be loaded.

.htaccess Files

You can also specify some Apache configurations without delving into the master configuration file. To do this, put a file named .htaccess in any directory that Apache is serving. All directives in it will be interpreted as if they were in a Directory directive in the master configuration file.

VERY IMPORTANT: The directory containing .htaccess must not have the AllowOverride None directive in the master configuration file in order for .htaccess to be read.
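
One way to check whether an AllowOverride None setting is getting in the way is to search the configuration files for the directive. The sketch below wraps grep in a small helper; the function name find_allowoverride is made up for this example, and the paths in the usage note assume the RHEL layout described above.

```shell
# Hypothetical helper: list every AllowOverride line, with file name and
# line number, found in the given configuration files or directories.
find_allowoverride() {
    # -r: search directories recursively
    # -n: show line numbers, -H: always show the file name
    # -i: ignore case, since Apache directive names are case-insensitive
    grep -rniH 'AllowOverride' "$@"
}
```

For example, find_allowoverride /etc/httpd/conf /etc/httpd/conf.d prints one line per match, so you can see which Directory block each setting belongs to before changing anything.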

Directory Directive

Use the Directory directive to assign other directives to a specific directory. For example:

<Directory /var/www/>
	Options Indexes FollowSymLinks 
	AllowOverride None
	Order allow,deny
	allow from all
	RedirectMatch ^/$ /apache2-default/
</Directory>

This sets options for the /var/www directory.

  1. The Options directive says that:
    1. If no index page is present in a directory, display a directory index page instead
    2. Apache will follow symbolic links in the directory
  2. AllowOverride None says that .htaccess files cannot alter the Apache options in this directory and all sub-directories
  3. Order allow,deny and Allow from all together specify that anybody is allowed to access this server via HTTP. (This is Apache 2.2 syntax; Apache 2.4 expresses the same thing as Require all granted.)

Note that this directory is actually the root directory of the web server.

The UserDir Module

The UserDir module lets you access files for any user on the server with a ~, e.g., http://ec2-xxx-xxx-xxx-xx.compute-1.amazonaws.com/~paul/

This module comes installed, but not activated by default.

Enabling UserDir in RHEL

If you are using an RHEL-based distribution for your server (like the Amazon AMI), you need to edit the UserDir configuration file.

Open /etc/httpd/conf.d/userdir.conf in your favorite text editor. For more information on command-line text editors, refer to the Linux guide.

Find the line that says

UserDir disabled

and change it to

UserDir disabled root

Additionally, find the line that says

#UserDir public_html

and uncomment it; that is, remove the # so that you have

UserDir public_html

This tells Apache that the directory containing each user's html files is a subdirectory of their home directory called public_html.
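
If you would rather script the two edits than make them by hand, something like the following could work. This is only a sketch: the function name enable_userdir is hypothetical, and it assumes the two lines appear in the file exactly as shown above (optionally indented). It keeps a copy of the original file with a .bak suffix.

```shell
# Hypothetical helper: apply the two UserDir edits to a configuration file.
# A copy of the original is kept next to it with a .bak suffix.
enable_userdir() {
    conf="$1"
    # First expression:  "UserDir disabled"     -> "UserDir disabled root"
    # Second expression: "#UserDir public_html" -> "UserDir public_html"
    # Leading indentation is captured and preserved in both cases.
    sed -i.bak \
        -e 's/^\([[:space:]]*\)UserDir disabled[[:space:]]*$/\1UserDir disabled root/' \
        -e 's/^\([[:space:]]*\)#[[:space:]]*UserDir public_html[[:space:]]*$/\1UserDir public_html/' \
        "$conf"
}
```

It is safest to try this on a copy first (for example, copy userdir.conf to /tmp, run enable_userdir on the copy, and compare the result against the .bak file with diff) before touching the real configuration.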

Note: Users need to manually create public_html. It is not created automatically.

You may also need to change the permissions of your user directory and your public_html directory to allow Apache to read and execute inside them. To do this, run the following commands:

$ sudo chmod o+x /home/<myusername>
$ sudo chmod o+rx /home/<myusername>/public_html
$
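
The directory creation, test file, and permission changes can be bundled together. The helper below is a sketch under the assumption that you run it as the owner of the home directory (so sudo is not needed); the name make_public_html and the contents of hello.txt are made up for illustration.

```shell
# Hypothetical helper: create public_html with a test page inside the given
# home directory and open up just enough permissions for Apache.
make_public_html() {
    home_dir="$1"
    mkdir -p "$home_dir/public_html"
    echo "Hello from Apache" > "$home_dir/public_html/hello.txt"
    chmod o+x "$home_dir"                        # let Apache pass through the home directory
    chmod o+rx "$home_dir/public_html"           # let Apache list and enter public_html
    chmod o+r "$home_dir/public_html/hello.txt"  # let Apache read the test file
}
```

Calling make_public_html "$HOME" sets up your own account; the home directory itself only gains the execute bit for others, so its contents still cannot be listed by other users.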

Finally, restart Apache.

Remapping UserDir

If you want to change the name of the UserDir web server root from public_html to something else like .html, follow these instructions. (You do not need to do this for the purposes of CSE 330; this serves as a reference.)

  1. Rename your public_html to .html
    • The "." in front of the directory name means that the directory is a hidden one. You will not see it with the normal "ls" command. Use "ls -a" to see hidden files as well.
  2. Edit the UserDir configuration file.
    • In RHEL, edit the UserDir configuration file at /etc/httpd/conf.d/userdir.conf
  3. Find the line that reads UserDir public_html and change it to UserDir .html/
  4. Restart Apache.

Apache Logs

Apache creates two log files: one for all access attempts to your server, and one for errors. The locations of these log files are:

  • Access Log: /var/log/httpd/access_log
  • Error Log: /var/log/httpd/error_log

You might find it helpful to see access and errors appear "live" in your terminal window as they are created. To do this, you can use the tail -f command. Since the log files have strict permissions, you will need to also use sudo. Example:

$ sudo tail -f /var/log/httpd/access_log
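
Besides watching the log live, it can be handy to summarize it. The sketch below counts requests per HTTP status code. It assumes the access log uses Apache's "common" or "combined" format, in which the status code is the ninth whitespace-separated field; the function name count_status is hypothetical.

```shell
# Hypothetical helper: print "status count" pairs for an access log.
# Assumes the status code is the 9th whitespace-separated field, as it is
# in Apache's "common" and "combined" log formats.
# Reads the files given as arguments, or standard input if there are none.
count_status() {
    awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' "$@" | sort
}
```

Because the log files need elevated permissions, one way to use it is sudo cat /var/log/httpd/access_log | count_status. A sudden jump in 404 or 500 responses is usually worth a closer look in the error log.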

Virtual Hosts

Virtual Hosts are used to run multiple Apache web servers from the same machine. Virtual hosts can listen for connections on different ports and/or different hostnames, serving completely different web sites to each. For example:

<VirtualHost cse330.dyndns.org>
	ServerAdmin webmaster@localhost
	ServerName cse330.dyndns.org
	DocumentRoot /home/www/cse330/
	ErrorLog /var/log/httpd/error_log
	LogLevel warn
	CustomLog /var/log/httpd/access_log combined
	ServerSignature On
</VirtualHost>

With this configuration, any request that uses a host name of cse330.dyndns.org will use /home/www/cse330 as the root document directory. Make sure that the DocumentRoot directory exists and is readable by the httpd process. In RHEL, Apache runs as the apache user.

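It is good practice to put raw server configuration files in /etc/httpd/sites-available. To activate a site, create a symlink from the configuration in sites-available to a sibling directory called sites-enabled. RHEL does not set up these directories for you, so you have to create them by hand and make sure the main configuration file includes the files in sites-enabled.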
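
The symlink step might be scripted roughly as follows. This is a sketch, not a standard tool: the function name enable_site and the file name cse330.conf are hypothetical, and Apache will only pick up the link if the main configuration includes the sites-enabled directory (for instance with a line such as IncludeOptional sites-enabled/*.conf in httpd.conf, which is an assumption about your setup).

```shell
# Hypothetical helper: activate a site by symlinking its configuration from
# sites-available into sites-enabled under the given Apache directory.
enable_site() {
    base="$1"   # e.g. /etc/httpd
    site="$2"   # configuration file name, e.g. cse330.conf
    if [ ! -f "$base/sites-available/$site" ]; then
        echo "no such site: $site" >&2
        return 1
    fi
    mkdir -p "$base/sites-enabled"
    # A relative target keeps the link valid if the base directory is moved.
    ln -sf "../sites-available/$site" "$base/sites-enabled/$site"
}
```

On a real server the directories belong to root, so the function would have to be run from a root shell, e.g. enable_site /etc/httpd cse330.conf. Deactivating the site is then just a matter of removing the link in sites-enabled; the file in sites-available stays untouched.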

Restarting Apache and Testing

Whenever you make changes to the Apache configuration files, you will need to restart Apache for the changes to take effect. There are several different ways to restart Apache; they all functionally do (almost) the same thing, so choose your favorite:

$ sudo /usr/sbin/apachectl restart
$ sudo apachectl restart    # if /usr/sbin is in your PATH (which it is *not* by default in RHEL)
$ sudo /etc/init.d/httpd restart
$ sudo /sbin/service httpd restart
$ sudo service httpd restart

If you're torn on which version is the "best" to use, the commands involving apachectl (think "Apache Control") are written by the Apache folks themselves. This has a couple of advantages:

  • They show you errors in the startup process (if there are any)
  • They give you the option to perform a "soft" restart; that is, a restart that allows any pending connections to complete. To perform a soft restart, use graceful: sudo apachectl graceful

In my anecdotal experience, the apachectl commands are also faster than the service or init.d commands.
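
Since apachectl can also check the configuration files for syntax errors (apachectl configtest), a cautious habit is to test first and only restart if the test passes. The wrapper below is a minimal sketch of that idea; the function name safe_graceful and the APACHECTL and SUDO variables are conventions invented for this example.

```shell
# Hypothetical helper: check the configuration and, only if it is valid,
# restart Apache gracefully.
# APACHECTL overrides the path to apachectl; setting SUDO="" skips sudo
# (for example when already running as root).
safe_graceful() {
    apachectl_bin="${APACHECTL:-/usr/sbin/apachectl}"
    # configtest parses the configuration files and reports syntax errors
    ${SUDO-sudo} "$apachectl_bin" configtest || return 1
    # graceful lets pending connections finish before workers are replaced
    ${SUDO-sudo} "$apachectl_bin" graceful
}
```

Running safe_graceful after every configuration change means a typo in a configuration file produces an error message instead of a restart attempt with a broken configuration.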

To make sure everything is working, create a test file in your home directory under public_html, and then point your browser to it: http://ec2-xxx-xx-xx-xxx.compute-1.amazonaws.com/~yourUserName/hello.txt