Linux is an open-source operating system based on UNIX. Linux is highly versatile and is used in a wide range of applications. Desktop Linux is Linux with a GUI (like Microsoft Windows or Mac OS X); Desktop Linux is popular in niche markets, and it is used widely in developing countries.
Linux is the most widely used operating system for web servers. In CSE330, we will be interacting with Linux from the command line. This article covers the tools you need to make the best use of Linux.
- 1 Linux Distributions
- 2 Files and Permissions
- 3 Bash
- 3.1 Displaying a Value
- 3.2 Working Directory
- 3.3 Variables
- 3.4 Running Programs
- 3.5 Shell Scripting
- 4 Networking
- 5 Synchronizing Date and Time
- 6 Installing Software
- 7 Command Reference
- 8 Linux Resources
Linux Distributions
The open-source community is responsible for the development of many different distributions of Linux. Distributions, or distros, are different "flavors" of the Linux operating system with different objectives.
The sections below discuss two of these branches and give recommendations of Linux distributions suitable for a web server. The CSE330 wiki provides instructions for both Debian-based and RHEL-based Linux distributions, so the choice of which to use is up to you.
Debian
Debian was first introduced in 1993. Debian has a passionate following, and its repositories contain more software packages than any other mainstream Linux distribution. There are hundreds of derivatives of Debian. Debian's most popular desktop derivative is Ubuntu.
Base Debian has a slow upgrade schedule, and because of this, it is an extremely stable operating system, making it well-suited for servers. Ubuntu Server is also an excellent choice for a Debian-based web server.
Red Hat Enterprise Linux (RHEL)
Red Hat Enterprise Linux, or RHEL, descends from Red Hat Linux, which was first introduced in 1994. RHEL is known for being a good choice for enterprises that wish to use Linux as their primary OS. RHEL has an abundance of administration tools.
The Amazon EC2 Linux AMI is a distribution of Linux based on RHEL. The Linux Lab in Lopata Hall uses Fedora Linux, the community desktop distribution from which RHEL is derived. CentOS is a popular RHEL derivative that is widely used in web servers.
Special Note: Linux Kernel and Modules
What separates Linux from other Unix variants is its kernel. The kernel is the most important component of the operating system and is responsible for scheduling processes, providing access to the hardware devices, allocating memory to the programs, and so on.
The Linux kernel uses both monolithic and modular approaches. A monolithic kernel is a single program that contains all the code, so any addition to the kernel (such as a driver for a new device) requires recompiling it. A monolithic kernel is usually a little faster and can be smaller, since only the absolutely necessary code is included. A modular kernel, on the other hand, enables dynamic loading and unloading of pieces of kernel code called modules. Typical modules include device drivers. Thanks to this modular approach, Linux seldom requires a reboot after installing a new device.
Files and Permissions
At the core of a Unix-based operating system is a directory structure with files and permissions.
The root directory of Linux contains a dozen or so subdirectories, each with a specific purpose:
- /bin contains binaries used by all users
- /sbin contains system binaries typically used only by the system administrator
- /lib contains libraries for the binaries found in /bin and /sbin
- /etc contains configuration files
  - /etc/yum.conf Configuration file for yum
  - /etc/yum.repos.d Directory containing .repo files for online repositories
  - /etc/crontab System-wide crontab file
  - /etc/fstab Information about default partitions to be mounted
  - /etc/group List of groups in the system
  - /etc/hosts List of IP addresses with their names
  - /etc/inittab What to do at each run-level
  - /etc/inetd.conf Configuration file for some internet services (replaced by xinetd.* in most systems)
  - /etc/modules.conf Module information for the boot
  - /etc/motd Message of the day, shown to users when they log in
  - /etc/passwd User information
  - /etc/profile System level initial file for sh and its derivatives
  - /etc/shadow User passwords
- /dev contains device files
- /proc contains information on currently running processes
- /var contains files whose contents are expected to change
  - /var/log contains system log files
    - /var/log/messages System/Kernel messages
    - /var/log/syslog System log (mostly for daemons)
    - /var/log/wtmp User access log (binary)
    - /var/log/dmesg Boot-up messages
    - /var/log/auth.log Authorization logs
  - /var/lib contains packages and database files
  - /var/spool contains print queues
- /tmp contains temporary files that are deleted at system reboot
- /usr contains user programs
  - /usr/bin contains binaries for user programs
  - /usr/sbin contains binaries for system administrators
  - /usr/lib contains libraries for /usr/bin and /usr/sbin
  - /usr/local contains programs that you install from source
- /home contains users' home directories
- /root is root's home directory
- /boot contains boot loader files (do not touch unless you know what you are doing!)
- /opt contains optional add-on applications
- /mnt is where system administrators can mount filesystems
- /media contains mount points for removable media devices (for example, CDs)
- /srv contains site-specific data which are served by the system
For more information, see the Wikipedia article on the Filesystem Hierarchy Standard.
Every file in Linux has permissions that define which users can Read, Write, and Execute it. Every file has an owner and a group. The permissions for a file are set on three levels: User (owner), Group, and Other.
When you view the permissions of a file in Linux, they will most often be displayed in symbolic notation. Symbolic notation consists of 10 characters: the first defines the file type, and then there are three characters each for User, Group, and Other permissions.
-r--r--r-- is a normal file that is readable by all users but writable or executable by no one.
-rwxr-xr-x is a normal file that is readable and executable by everyone but only writable by User (the file's owner). This is the most common permission set.
Viewing File Permissions
To view the permissions of all files in a certain directory, run the binary ls -l in Bash:
$ ls -l # displays a list of all files in a directory with their permissions in symbolic notation
total 16
lrwxr-xr-x 1 sffc wheel  6 Aug 9 09:13 link -> myfile.txt
-rwxr--r-- 1 sffc wheel 12 Aug 9 09:13 myfile.txt
$ ls -l myfile.txt # displays the permissions of only myfile.txt
-rwxr--r-- 1 sffc wheel 12 Aug 9 09:13 myfile.txt
$
Setting File Permissions
Linux comes with several useful binaries for setting file permissions.
- chmod is used for setting permissions
- chown is used for setting a file's owner
- chgrp is used for setting a file's group
Some examples are shown below.
$ chmod a+x myfile.txt # turns on the Execute option for all users
$ chmod o-w myfile.txt # turns off the Write option for Other users
$ chmod u+wx-r myfile.txt # turns on the Write and Execute options for User (the file's owner) and also turns off the Read option for User
$ chown todd myfile.txt # sets the owner of myfile.txt to the user todd. Note: First comes the user, then comes the filename: not the other way around!
$ chgrp staff myfile.txt # sets the group of myfile.txt to usergroup staff
$
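To see how chmod changes the symbolic notation, here is a minimal sketch that works on a throwaway file. The file name demo.txt and the helper show_perms are hypothetical names chosen for illustration, and the script assumes mktemp is available.

```shell
# Work in a temporary directory so no real files are modified.
workdir=$(mktemp -d)
demo="$workdir/demo.txt"          # hypothetical file name
echo "Hello World" > "$demo"

# Helper (hypothetical): print only the 10-character symbolic notation.
show_perms() { ls -l "$1" | cut -c1-10; }

chmod u=rw,g=r,o=r "$demo"        # start from a known state
before=$(show_perms "$demo")      # -rw-r--r--

chmod a+x "$demo"                 # turn on Execute for User, Group, and Other
after_ax=$(show_perms "$demo")    # -rwxr-xr-x

chmod o-rx "$demo"                # turn off Read and Execute for Other
after_o=$(show_perms "$demo")     # -rwxr-x---

echo "$before"
echo "$after_ax"
echo "$after_o"
rm -r "$workdir"
```

Each chmod call only touches the levels and options it names, which is why the User permissions survive the later calls unchanged.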
For more information, see http://www.tuxfiles.org/linuxhelp/filepermissions.html
The . and .. Directories
The . directory is a reference to the current directory. The .. directory is a reference to the parent directory, one level up in the filesystem.
A symbolic link, or symlink, is basically a link from one spot in the filesystem to another. You can think of symlinks like aliases in Mac OS X. To create a symlink, use the ln -s command:
$ ln -s /path/to/file.txt /path/to/link # creates a symlink to file.txt at /path/to/link
# Example:
$ ln -s /home/todd/instructions.doc /var/www/public_html/classes/instructions.doc # creates a symlink in the web server to instructions.doc
$ vi /var/www/public_html/classes/instructions.doc # changes to the symbolic link will be reflected in the original file
$
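The claim that changes made through a symlink land in the original file can be checked with a small sketch. The file names original.txt and link.txt are hypothetical, and everything happens in a temporary directory.

```shell
workdir=$(mktemp -d)
echo "first line" > "$workdir/original.txt"

# Create the symlink: the target comes first, then the link name.
ln -s "$workdir/original.txt" "$workdir/link.txt"

# Append through the link; the data ends up in the original file.
echo "second line" >> "$workdir/link.txt"

target=$(readlink "$workdir/link.txt")              # where the link points
line_count=$(grep -c '' "$workdir/original.txt")    # number of lines in the original

echo "link points to: $target"
echo "lines in original: $line_count"
rm -r "$workdir"
```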
Bash
Bash is the default shell environment in Linux; that is, it is the interface through which you will be interacting with your Linux server. Bash is a derivative of sh, one of the first shells. Other popular shells include csh and tcsh, which use a C-like syntax for scripting, and zsh, a Bash-like shell that focuses on extending the capabilities of the shell environment.
Displaying a Value
To display a value at the shell prompt, use the command echo.
$ echo "Hello World" # displays Hello World
Hello World
$
Note: In examples, code written at the prompt is conventionally denoted by a line starting with a currency symbol. Lines without a currency symbol represent output.
Seeing the Contents of a File
If you want to see the contents of a file, use the cat command.
$ cat myfile.txt
Hello World
$
cat is one of a number of useful Linux command-line binaries, the rest of which we will see later.
Working Directory
Whenever you are interacting with the shell, you will be executing commands from a working directory. To see the current working directory, run the command pwd (print working directory). To change the working directory, run the command cd (change directory).
$ pwd
/home/todd
$ cd projects
$ pwd
/home/todd/projects
$ cd ./ # recall that . is the current directory
$ pwd
/home/todd/projects
$ cd ../ # recall that .. is the next directory up in the filesystem
$ pwd
/home/todd
$
If you run commands that interact with the filesystem (e.g. ones that create or edit files) and give them a relative path, the path is resolved against your current working directory, so new files are saved there.
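As a rough illustration of how relative paths follow the working directory, the sketch below creates a file with a relative name and then finds it again from the parent directory. The directory name projects matches the example above; notes.txt is a hypothetical file name.

```shell
start_dir=$(pwd)                  # remember where we started
workdir=$(mktemp -d)
mkdir "$workdir/projects"

cd "$workdir/projects"
inside=$(pwd)                     # the new working directory
touch notes.txt                   # relative path: created inside projects

cd ..                             # move one level up
if [ -f projects/notes.txt ]; then found=yes; else found=no; fi

echo "was in: $inside"
echo "notes.txt found under projects: $found"

cd "$start_dir"                   # go back before cleaning up
rm -r "$workdir"
```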
Variables
Bash supports the use of variables. There are system-defined variables, and you can also define your own custom variables.
Defining and Accessing Variables
$ MYVARIABLE="Hello World" # assigns the value Hello World to the variable MYVARIABLE
$ echo $MYVARIABLE # notice that you need to put a currency symbol in front of the variable in order to access its value
Hello World
$ export MYVARIABLE # allows MYVARIABLE to be accessed in child processes (e.g., in a program you call from the shell); note that export takes the name without a currency symbol
$ export MYVARIABLE="Hello Moon" # a shortcut for defining a variable and exporting it to subprocesses
$ set # displays a list of all currently set variables
MYVARIABLE='Hello Moon'
$
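The effect of export is easiest to see from a child process. This sketch, which assumes sh is available as a child shell, reads the variable from a child before and after exporting it.

```shell
unset MYVARIABLE                  # make sure nothing is inherited from the environment
MYVARIABLE="Hello World"          # a plain shell variable, not yet exported

# The child shell cannot see an unexported variable.
child_before=$(sh -c 'echo "${MYVARIABLE:-not set}"')

export MYVARIABLE                 # note: the name is given without a currency symbol

# After export, the child inherits the variable.
child_after=$(sh -c 'echo "${MYVARIABLE:-not set}"')

echo "before export: $child_before"   # not set
echo "after export:  $child_after"    # Hello World
```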
Bash comes pre-loaded with certain environment variables. Some of the variables with which you may find yourself interacting include:
- PATH: search path for the commands
- PWD: name of the current directory
- SHELL: type of shell
- TERM: type of the terminal
- USER: the account name
- HOME: the user's home directory
- PS1: the prompt at command line
- $$: the process id of current shell
- $RANDOM: a random value
- $?: the return value of the last command
- $_: the last argument of the previous command
- $1, $2, ...: the value of the 1st, 2nd, ... argument ($# itself holds the number of arguments)
- IFS: internal field separator
Try echoing some of the system variables to examine your current environment.
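A few of the special variables only make sense inside a running command, so here is a small sketch of them in action. The function name describe_args is hypothetical; inside it, $# is the number of arguments and $1 and $2 are the first two arguments.

```shell
# Hypothetical helper that reports its own arguments.
describe_args() {
    echo "count=$# first=$1 second=$2"
}
args_line=$(describe_args apple banana)

# $? holds the return value of the last command: 0 means success.
true && status_ok=$?
false || status_fail=$?

shell_pid=$$                      # process id of the current shell

echo "$args_line"                 # count=2 first=apple second=banana
echo "true returned $status_ok, false returned $status_fail"
echo "this shell is process $shell_pid"
```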
Running Programs
To run an executable file, simply enter the path to the file at the shell prompt:
$ /usr/bin/perl -v # runs the binary executable located at /usr/bin/perl with the flag -v
This is perl 5, version 12, subversion 3 (v5.12.3)
$ ../mydir/myprogram # runs the binary located one level up in the file system, then in mydir/myprogram
You just ran myprogram!
$
Programs in your PATH
Many commonly-used executable binaries are located in /bin, /usr/bin, and similar directories. In order to avoid typing paths to these directories every time you want to execute a command, you define these directories in your PATH system variable:
$ echo $PATH # displays the current value of the PATH variable
/opt/local/bin:/opt/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin
$ PATH=$PATH:/my/favorite/bin # adds a directory to your PATH variable
$ echo $PATH
/opt/local/bin:/opt/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/my/favorite/bin
$
Notice that the different PATH directories are separated by colons. Now, when you execute a command, Bash will search the directories in your PATH variable in order and run the first match it finds. To see the path to the binary that Bash found, use the which command.
$ perl -v
This is perl 5, version 12, subversion 3 (v5.12.3)
$ which perl
/usr/bin/perl
$
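The PATH lookup can be tried without touching any system directory. In this sketch, hello-demo is a hypothetical program created in a temporary directory, and the shell built-in command -v is used in place of which because it is available even on minimal systems.

```shell
bindir=$(mktemp -d)

# Create a tiny executable (hypothetical name: hello-demo).
cat > "$bindir/hello-demo" <<'EOF'
#!/bin/sh
echo "You just ran hello-demo!"
EOF
chmod +x "$bindir/hello-demo"

PATH="$PATH:$bindir"              # append the directory to the search path

found=$(command -v hello-demo)    # path the shell resolved, like which
output=$(hello-demo)              # run it by name, without typing the path

echo "$found"
echo "$output"
rm -r "$bindir"
```

Appending to the end of PATH means existing commands with the same name still win, since directories are searched from left to right.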
Note: it is unwise to have . in your PATH. Instead, if you want to run an executable in the current directory, do so by calling ./myprogram:
$ ./myprogram
You just ran myprogram!
$ myprogram
-bash: myprogram: command not found
$
Foreground and Background Processes
By default, a program runs in the foreground (unless it detaches itself from the terminal). You can run a program in the background by adding & at the end of the command (after the arguments). In this case, the shell forks a process for that program and returns the command prompt for input. At any time, the jobs command can be used to see the processes running in the background. The fg command brings the specified process back to the foreground. A program running in the foreground can be stopped by typing ctrl-c in most cases. Typing ctrl-z suspends a program running in the foreground. A suspended program will not continue executing until it is resumed. It can be brought back to the foreground with fg, or sent to the background with bg.
$ ./myprogram
You just ran myprogram!
I'm taking a long time to run.
^C
$ jobs
$ ./myprogram
You just ran myprogram!
I'm taking a long time to run.
^Z
[1]+ Stopped ./myprogram
$ jobs
[1]+ Stopped ./myprogram
$ bg
[1]+ ./myprogram &
$ jobs
[1]+ Running ./myprogram &
$ fg
^C
$ jobs
$ ./myprogram &
[1] 64741
$ jobs
[1]+ Running ./myprogram &
$
A process can be killed by passing its process number to the kill command, which sends the TERM signal by default.
In some cases the TERM signal can be ignored by the program, so it may be necessary to force kill the program by sending the KILL signal, which cannot be ignored:
kill -9 process-number
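The following sketch starts a background process from a script, checks that it is running, and then kills it. It uses sleep 30 as a stand-in for a long-running program; $! (the process number of the most recent background command) and kill -0 (which only tests whether a process exists) are standard shell features that the text above does not cover.

```shell
sleep 30 &                        # stand-in for a long-running program
pid=$!                            # process number of the background job

# kill -0 sends no signal; it only checks that the process exists.
if kill -0 "$pid" 2>/dev/null; then running_before=yes; else running_before=no; fi

kill "$pid"                       # send the default TERM signal

# wait reports how the process ended: 128 + 15 (TERM) = 143.
status=0
wait "$pid" 2>/dev/null || status=$?

if kill -0 "$pid" 2>/dev/null; then running_after=yes; else running_after=no; fi

echo "running before kill: $running_before"
echo "exit status: $status"
echo "running after kill: $running_after"
```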
The current processes can be listed using the ps command.
$ ps # list currently running processes in the current shell
  PID TTY        TIME CMD
19107 ttys000 0:00.75 -bash
 1873 ttys001 0:00.05 -bash
57267 ttys002 0:00.20 -bash
50721 ttys003 0:00.55 -bash
$ ps -eaf # list all currently running processes
UID   PID  PPID C STIME   TTY        TIME CMD
  0     1     0 0 31Dec00 ??      3:24.45 /sbin/launchd
  0 19106   327 0 1Aug12  ttys000 0:00.03 login -pfl sffc /bin/bash -c exec -la bash /bin/bash
501 19107 19106 0 1Aug12  ttys000 0:00.75 -bash
  0  1872   327 0 31Jul12 ttys001 0:00.02 login -pfl sffc /bin/bash -c exec -la bash /bin/bash
501  1873  1872 0 31Jul12 ttys001 0:00.05 -bash
  0 57266   327 0 Mon05AM ttys002 0:00.08 login -pfl sffc /bin/bash -c exec -la bash /bin/bash
501 57267 57266 0 Mon05AM ttys002 0:00.20 -bash
  0 64747 57267 0 9:58AM  ttys002 0:00.00 ps -eaf
  0 50720   327 0 Fri12AM ttys003 0:00.03 login -pfl sffc /bin/bash -c exec -la bash /bin/bash
501 50721 50720 0 Fri12AM ttys003 0:00.55 -bash
$
A program's standard output can be sent to a file by typing >filename at the end of the command. Similarly, >> appends to a file. In Linux, there are three default file descriptors: standard input (STDIN), standard output (STDOUT), and standard error (STDERR). STDIN has file descriptor number 0, STDOUT has number 1, and STDERR has number 2. In Bash, you can direct either output descriptor to a file. You can also redirect one file descriptor to another.
$ ./myprogram >filename.txt # redirects the standard output to filename.txt
$ cat filename.txt
You just ran myprogram!
$ ./myprogram >>filename.txt # appends the output to filename.txt
$ cat filename.txt
You just ran myprogram!
You just ran myprogram!
$ ./myprogram 1>filename.txt # redirects the standard output to filename.txt
$ cat filename.txt
You just ran myprogram!
$ ./myprogram 2>filename.txt # redirects the error output to filename.txt
You just ran myprogram!
$ ./myprogram 2>&1 # STDERR is redirected to STDOUT
You just ran myprogram!
$
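Since myprogram above never writes to STDERR, the difference between the two output streams is hard to see there. The sketch below uses a hypothetical function, report, that writes one line to each stream, and sends them to separate files in a temporary directory.

```shell
workdir=$(mktemp -d)

# Hypothetical program: one line on STDOUT, one line on STDERR.
report() {
    echo "normal output"
    echo "error output" >&2
}

report >"$workdir/out.txt" 2>"$workdir/err.txt"   # each stream to its own file
report >"$workdir/both.txt" 2>&1                  # STDERR follows STDOUT into one file
report >>"$workdir/out.txt" 2>/dev/null           # append STDOUT, discard STDERR

out_lines=$(grep -c '' "$workdir/out.txt")        # 2: one write plus one append
err_text=$(cat "$workdir/err.txt")                # error output
both_lines=$(grep -c '' "$workdir/both.txt")      # 2: both streams in one file

echo "out.txt lines: $out_lines"
echo "err.txt: $err_text"
echo "both.txt lines: $both_lines"
rm -r "$workdir"
```

The order matters in the second call: 2>&1 must come after >file, because it copies wherever STDOUT points at that moment.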
Output of one program can be redirected to the input of another program using pipes.
$ ./program1 | ./program2 # send program1's output as an input to program2
You just ran program2 with the input: You just ran program1!
$
Redirection is possible for STDIN too. A program can get its input by redirecting STDIN using <
$ ./myprogram < inputfile.txt
You just ran myprogram with input from inputfile.txt!
$
Finally, ` (a backtick) can be used to capture the output of a program, and use it as a string such as in setting a variable
$ MYVARIABLE=`./myprogram`
$ echo $MYVARIABLE
You just ran myprogram!
$
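Pipes, STDIN redirection, and backticks are often combined. As a small sketch with standard tools standing in for myprogram, the example below sorts a hypothetical file fruit.txt and captures the results in variables.

```shell
workdir=$(mktemp -d)
printf 'banana\napple\ncherry\n' > "$workdir/fruit.txt"

# sort reads the file on STDIN; its output is piped into head.
first_sorted=$(sort < "$workdir/fruit.txt" | head -n 1)

# Backticks capture a program's output as a string.
count=`grep -c an "$workdir/fruit.txt"`

echo "first in sorted order: $first_sorted"   # apple
echo "lines containing an: $count"            # 1
rm -r "$workdir"
```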
Some commands require root privileges to run. In order to run a command as root without logging in as root, use sudo.
$ yum install lynx
You need to be root to perform this command.
$ sudo yum install lynx
[sudo] password:
.....
Complete!
$
Automatically Running Programs
You will often find it useful for binaries to be executed at predefined intervals, certain days of the week, or at startup. Linux provides you with the tools you need to make these configurations.
Scheduled Programs in Cron
Cron is a system service that will run programs in a periodic manner. For more details on how to configure cron, see the Cron guide.
Programs at Startup
When a Linux system boots, a series of scripts is called to start up system processes, daemons, and other programs (such as SSH servers, web servers, database programs, etc). The simplest way to add something to the boot process is to add it to /etc/rc.local, which is a script that is called automatically at the very end of the boot process. Simply write a script that does what you want and then call it from within /etc/rc.local to ensure that your script is called at the end of the boot process.
You can also add scripts which run at different times during the boot process. The way to do this varies by Linux distribution. For Fedora, see http://www.yolinux.com/TUTORIALS/LinuxTutorialInitProcess.html (specifically the section entitled Init Script Activation).
Shell Scripting
Programs can be scripted using Bash. For more information, see Shell Scripting.
Networking
In Linux, you can see your network information by typing ifconfig. This command shows the status information of each network interface, including the IP address you will need to remotely connect to your instance. The interface lo is the special loopback interface with IP address 127.0.0.1. This refers to your local machine and any connection from your machine to your machine goes through this pseudo-interface. Typical network interfaces include eth0, eth1,..., wlan0, etc. Ethernet cards are represented with ethX. In the past, most wireless cards showed up as wlanX, but it is also common now for them to be represented with ethX names. ifconfig also gives information such as hardware address (MAC), netmask, and broadcast addresses.
You can start or stop networking by calling the /etc/init.d/networking script. As with most /etc/init.d scripts, this script takes several options, such as start, stop, and restart. Note that even if you stop networking, you will still have your lo interface. You can look at the code of the script to find out what it actually does. You can also stop or start individual interfaces by using the ifup and ifdown commands.
The network configuration files are stored in /etc/network. /etc/network/interfaces contains the defaults for each interface. For example, you could specify a static IP, netmask, network, broadcast, and default gateway for an interface here, but you should not need to edit these files in general. These default options can be changed with the ifconfig command. The /etc/network/if-down.d and /etc/network/if-up.d directories contain the scripts that are executed when an interface is turned off or on, respectively. Of course, most modern Linux distributions have GUI tools for doing network configuration more easily, and you shouldn't need to change anything for the purposes of this course.
Synchronizing Date and Time
In order to avoid setting your system's time manually, you can keep it synchronized with a Network Time Server via the Network Time Protocol (NTP).
The NTP Daemon comes pre-installed on EC2 AMI instances. To install it on Debian, install the ntp package from aptitude.
Your server is probably not set to the correct timezone by default.
The timezone files are in the directory /usr/share/zoneinfo. They are further organized within subdirectories grouped by region. For instance, Rome's time zone file is /usr/share/zoneinfo/Europe/Rome.
In order to set the time zone, simply copy the desired time zone file to the /etc directory as a new file named "localtime". For example, to set the machine's system time to Rome's time zone, we would enter the command
sudo cp /usr/share/zoneinfo/Europe/Rome /etc/localtime
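Before copying anything into /etc, it can help to confirm that the zone file exists and to preview the time it would produce. This is only a sketch: it relies on the standard TZ environment variable, which the text above does not mention, and it changes nothing on the system.

```shell
zone="Europe/Rome"                        # region directory, then city file
zonefile="/usr/share/zoneinfo/$zone"

if [ -f "$zonefile" ]; then
    zone_status="found"
    # TZ overrides the time zone for this one command only.
    TZ="$zone" date
else
    zone_status="missing"
    echo "no zone file at $zonefile" >&2
fi

echo "zone file: $zone_status"
```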
ntp uses the /etc/ntp.conf configuration file to find the hostnames of remote time servers. The defaults here are probably fine.
Installing Software
The package management tool in Red Hat Enterprise Linux (and therefore also your Amazon EC2 instance) is rpm. (In Debian, it is dpkg.) If you have an rpm package, you can install it by
$ rpm -i somepackage.rpm
$
This requires that somepackage.rpm be in your current directory, which means you will have to download the file yourself (or create it). It also requires you to manually install any dependencies the package has.
Repository-Based Package Managers
A better alternative is to use a repository-based package manager. In RHEL, this is yum; in Debian, it is APT, which is used through front ends such as apt-get and aptitude.
Before you install new software, you need to ensure that your local list of available packages is up-to-date. Run one of the following commands to perform this operation:
- In RHEL: yum check-update
- In Debian: apt-get update
After you have ensured that your package list is synced with the remote repository, you can start installing packages. To install a package, use one of the following commands:
- In RHEL: yum install package-name
- In Debian: apt-get install package-name
For example, this is how you would install lynx, a command-line web browser, in your RHEL Linux distribution:
$ lynx --version # is lynx installed?
-bash: lynx: command not found
$ sudo yum check-update # sync package lists with the remote repositories
$ sudo yum install lynx # install the lynx package
Downloading Packages:
lynx-2.8.6-27.6.amzn1.i686.rpm | 1.8 MB 00:00
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Installing : lynx-2.8.6-27.6.amzn1.i686 1/1
Installed:
lynx.i686 0:2.8.6-27.6.amzn1
Complete!
$ lynx --version # test again to see if we have lynx installed
Lynx Version 2.8.6rel.5 (09 May 2007)
$
You can also search for available packages by name or by the name of a file that they install.
$ yum search lynx # search for packages whose name contains lynx
======= N/S Matched: lynx =======
lynx.i686 : A text-based Web browser
$ yum provides lynx # search for packages that install a file or command named lynx
lynx-2.8.6-27.6.amzn1.i686 : A text-based Web browser
Repo : installed
Matched from:
Other : Provides-match: lynx
$
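Because the install command differs between RHEL-based and Debian-based systems, a script that must work on both can check which tool is present. The function name install_command is hypothetical, and this sketch only prints the command rather than running it.

```shell
# Hypothetical helper: print the install command for the local package manager.
install_command() {
    if command -v yum >/dev/null 2>&1; then
        echo "sudo yum install $1"          # RHEL-based systems
    elif command -v apt-get >/dev/null 2>&1; then
        echo "sudo apt-get install $1"      # Debian-based systems
    else
        echo "no supported package manager found" >&2
        return 1
    fi
}

# Capture the suggestion; stays empty if neither tool is installed.
suggestion=$(install_command lynx) || suggestion=""
echo "suggested command: $suggestion"
```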
The list of repositories that yum searches is located at /etc/yum.conf. The list of repositories that aptitude searches is located at
Command Reference
Earlier, you saw that cat is a command that shows the contents of a file. Below is a list of other useful commands in Linux.
- ls List file(s) in current working directory
- ll Shortcut to ls -l. List files with more details than ls. Only available in certain distributions
- cd Change working directory. Note: cd called without any arguments moves you to your home directory
- cp Copy a file
- mv Move or rename a file
- rm Remove a file
- rm -r Remove a directory and all files in it
- ln -s Create a symlink to a file
- mkdir Create a directory
- rmdir Remove a directory (directory must be empty; if it's not, use rm -r)
- cat Display the contents of a file
- less Display the contents of a file, wait for the user at each page
- tail Display the last 10 lines of a file
- tail -f Display the last 10 lines of a file and then wait for changes, displaying them as they occur. Useful for monitoring log files.
- chown Change the owner of a file
- chgrp Change the group of a file
- chmod Change the security permissions of a file
- grep Display the lines of a file matching a user specified string
- diff Display the difference between two files
- df Display free disk space
- du Display disk usage
- free Display memory usage information
- date Display current time and date
- top Display the CPU and Memory usages of current processes
- ps Display current processes
- kill Terminate a running process
- killall Terminate the running processes that match a user-specified name
- ping hostname Ping a host
- host Get the IP address of a host
- passwd Change the user password
- su user Switch to the privileges of another user
- shutdown Power off the computer
- reboot Reboot the computer
- clear Clear the terminal
- ifconfig Display/Configure a network device
- file Show the file type
- lsmod Display loaded kernel modules
- insmod Insert a kernel module into the running kernel
- modprobe Load a kernel module (also load the dependencies)
- adduser Add a new user
- exit Exit from a shell
- lpr Print a file
- head Display lines at the beginning of a file
- tail Display lines at the end of a file
- pwd Display the name of the current working directory
- lsof Open files in the system
- netstat Statistics related to open sockets
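Several of these commands are easiest to understand by running them on a small file. The sketch below builds a hypothetical 12-line file, log.txt, in a temporary directory and inspects it with head, tail, and grep.

```shell
workdir=$(mktemp -d)

# printf repeats its format for each argument, producing "line 1" to "line 12".
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > "$workdir/log.txt"

first=$(head -n 1 "$workdir/log.txt")                 # first line of the file
tail_lines=$(tail "$workdir/log.txt" | grep -c '')    # tail prints 10 lines by default
matches=$(grep -c 'line 1' "$workdir/log.txt")        # line 1, 10, 11, and 12

echo "first line: $first"
echo "lines printed by tail: $tail_lines"
echo "lines matching 'line 1': $matches"
rm -r "$workdir"
```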
It is sometimes convenient to edit files using the command line. Three widely-used command line text editors are vi, emacs, and nano.
To edit a file using Vi, use the command vi. You will see something like this:
~
~
~
~
~
~
~
~
~
~
~
~
"myfile.txt" [New File]
To insert text into the file, press i once, then type away:
Hello World
~
~
~
~
~
~
~
~
~
~
~
~
-- INSERT --
To leave insert mode, press ESC.
To save your file, type the command :w and press Enter.
Hello World
~
~
~
~
~
~
~
~
~
~
~
~
"myfile.txt" [New] 1L, 12C written
To save and then close your file, type the command :wq. (To close the file without saving, use the command :q!)
For more vi commands, see http://ss64.com/vi.html
To edit a file using emacs, use the command emacs. You can start typing immediately:
Hello World
-uuu:**-F1 myfile.txt All L1 (Text)----------
To save a file, type C-x C-s (that means Control-X, then Control-S):
Hello World
-uuu:---F1 myfile.txt All L1 (Text)----------
Wrote /home/todd/myfile.txt
To quit emacs, type C-x C-c. (It will ask you whether or not to save the file if you've made changes.)
For more emacs commands, see http://souptonuts.sourceforge.net/chirico/emacs_ref.html
To edit a file using nano, use the command nano. You can start editing the file immediately:
GNU nano 2.0.6          File: myfile.txt

Hello World

^G Get Help  ^O WriteOut  ^R Read File  ^Y Prev Page  ^K Cut Text    ^C Cur Pos
^X Exit      ^J Justify   ^W Where Is   ^V Next Page  ^U UnCut Text  ^T To Spell
Nano lists its most common commands at the bottom of the screen (^X means Control-X), so you don't have to keep looking them up as you might with Vi and Emacs.
For more detail on Nano commands, see http://www.nano-editor.org/dist/v2.2/nano.html