Linux

From CSE330 Wiki
Revision as of 04:50, 10 August 2012 by Shane (talk | contribs)

Linux is an open-source operating system based on UNIX. Linux is highly versatile and is used in a wide range of applications. Desktop Linux is Linux with a GUI (like Microsoft Windows or Mac OS X); Desktop Linux is popular in niche markets, and it is used widely in developing countries.

Linux is the most widely used operating system for web servers. In CSE330, we will be interacting with Linux from the command line. This article covers the tools you need to make the best use of Linux.

Linux Distributions

The open-source community is responsible for the development of many different distributions of Linux. Distributions, or distros, are different "flavors" of the Linux operating system with different objectives.

There are hundreds of distributions of Linux. Three of the main branches are Debian, SuSE (based on Slackware), and Red Hat Enterprise Linux (RHEL). The Linux Lab in Lopata Hall uses Fedora Linux, a desktop distribution based on RHEL.

The Amazon EC2 Linux AMI is a distribution of Linux based on RHEL. The CSE330 wiki assumes that you have an instance running the Amazon EC2 Linux AMI, although you may use any distribution that you want.

Special Note: Linux Kernel and Modules

What separates Linux from other Unix variants is its kernel. The kernel is the most important component of the operating system and is responsible for scheduling processes, providing access to the hardware devices, allocating memory to the programs, and so on.

The Linux kernel uses both monolithic and modular approaches. A monolithic kernel is a single program that contains all the code, so any addition to the kernel (such as a new device driver) requires recompiling it. A monolithic kernel is usually a little faster and can be smaller, since it contains only the code that is strictly necessary. A modular kernel, on the other hand, supports dynamic loading and unloading of kernel code in units called modules. Typical modules include device drivers. Thanks to this modular approach, Linux seldom requires a reboot after installing a new device.
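
As a rough illustration of the modular side, the sketch below counts the modules currently loaded into the running kernel. It assumes a Linux system that exposes the list at /proc/modules (the same information that lsmod prints); on any other system it simply reports that the list is unavailable. The variable name module_count is made up for this example.

```shell
# /proc/modules has one line per loaded kernel module
# (assumption: a Linux system with /proc mounted).
if [ -r /proc/modules ]; then
    module_count=$(wc -l < /proc/modules | tr -d ' ')
    echo "Loaded kernel modules: $module_count"
else
    module_count=unavailable
    echo "No /proc/modules here; this does not look like a Linux system."
fi
```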

Files and Permissions

At the core of a Unix-based operating system is a directory structure with files and permissions.

Filesystem Hierarchy

The root directory of Linux contains a dozen or so subdirectories, each with a specific purpose:

  • /bin contains binaries used by all users
  • /sbin contains system binaries typically used only by the system administrator
  • /lib contains libraries for the binaries found in /bin and /sbin
  • /etc contains configuration files
    • /etc/yum.conf Configuration file for yum
    • /etc/yum.repos.d Directory containing .repo files for online repositories
    • /etc/crontab System-wide crontab file
    • /etc/fstab Information about default partitions to be mounted
    • /etc/group List of groups in the system
    • /etc/hosts List of IP addresses with their names
    • /etc/inittab What to do at each run-level
    • /etc/inetd.conf Configuration file for some internet services (replaced by xinetd.* in most systems)
    • /etc/modules.conf Module information for the boot
    • /etc/motd Message to be seen at the login prompt
    • /etc/passwd User information
    • /etc/profile System level initial file for sh and its derivatives
    • /etc/shadow User passwords
  • /dev contains device files
  • /proc contains information on currently running processes
  • /var contains files whose contents are expected to change
    • /var/log contains system log files
      • /var/log/messages System/Kernel messages
      • /var/log/syslog System log (mostly for Daemons)
      • /var/log/wtmp User access log (binary)
      • /var/log/dmesg Boot-up messages
      • /var/log/auth.log Authorization logs
    • /var/lib contains packages and database files
    • /var/spool contains print queues
  • /tmp contains temporary files that are deleted at system reboot
  • /usr contains user programs
    • /usr/bin contains binaries for user programs
    • /usr/sbin contains binaries for system administrators
    • /usr/lib contains libraries for /usr/bin and /usr/sbin
    • /usr/local contains programs that you install from source
  • /home contains users' home directories
  • /root is root's home directory
  • /boot contains boot loader files (do not touch unless you know what you are doing!)
  • /opt contains optional add-on applications
  • /mnt is where system administrators can mount filesystems
  • /media contains links to removable media devices (for example, CDs)
  • /srv contains site-specific data which are served by the system


For more information, see the Wikipedia article on the Filesystem Hierarchy Standard.

File Permissions

Every file in Linux has permissions that define which users can Read, Write, and Execute it. Every file has an owner and a group. The permissions for a file are set on three levels: User (owner), Group, and Other.

Symbolic Notation

When you view the permissions of a file in Linux, they will most often be displayed in symbolic notation. Symbolic notation consists of 10 characters: the first defines the file type, and then there are three characters each for User, Group, and Other permissions.

  • -r--r--r-- is a normal file that is readable by all users but writable or executable by no one.
  • -rwxr-xr-x is a normal file that is readable and executable by everyone but only writable by User (the file's owner). This is the most common permission set.

Viewing File Permissions

To view the permissions of all files in a certain directory, run the binary ls -l in Bash:

$ ls -l   # displays a list of all files in a directory with their permissions in symbolic notation
total 16
lrwxr-xr-x  1 sffc  wheel   6 Aug  9 09:13 link -> myfile.txt
-rwxr--r--  1 sffc  wheel  12 Aug  9 09:13 myfile.txt
$ ls -l myfile.txt   # displays the permissions of only myfile.txt
-rwxr--r--  1 sffc  wheel  12 Aug  9 09:13 myfile.txt
$

Setting File Permissions

Linux comes with several useful binaries for setting file permissions.

  • chmod is used for setting permissions
  • chown is used for setting a file's owner
  • chgrp is used for setting a file's group

Some examples are shown below.

$ chmod a+x myfile.txt   # turns on the Execute option for all users
$ chmod o-w myfile.txt   # turns off the Write option for Other users
$ chmod u+wx-r myfile.txt   # turns on the Write and Execute options for User (the file's owner) and also turns off the Read option for User
$ chown todd myfile.txt   # sets the owner of myfile.txt to the user todd.  Note: First comes the user, then comes the filename: not the other way around!
$ chgrp staff myfile.txt   # sets the group of myfile.txt to usergroup staff
$

For more information, see http://www.tuxfiles.org/linuxhelp/filepermissions.html
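
The commands above can be tried safely on a scratch file. The following is a minimal sketch, assuming a system with mktemp available; the file name and the variables before and after are invented for the example. It records the symbolic notation before and after two chmod calls.

```shell
# Work in a throwaway directory so nothing important is touched.
workdir=$(mktemp -d)
demo="$workdir/myfile.txt"
echo "Hello World" > "$demo"

chmod u=rw,g=r,o=r "$demo"                  # start from a known state: -rw-r--r--
before=$(ls -l "$demo" | cut -c1-10)        # keep only the 10 permission characters

chmod a+x "$demo"                           # turn on Execute for User, Group, and Other
chmod o-x "$demo"                           # turn Execute back off for Other only
after=$(ls -l "$demo" | cut -c1-10)

echo "before: $before"
echo "after:  $after"

rm -r "$workdir"                            # clean up the scratch directory
```

Running it should show the file going from -rw-r--r-- to -rwxr-xr--.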

The . and .. Directories

The . directory is a reference to the current directory. The .. directory brings you one level up in the filesystem.

Symbolic Links

A symbolic link, or symlink, is a link from one spot in the filesystem to another. You can think of symlinks as being like aliases in Mac OS X. To create a symlink, use the ln -s command:

$ ln -s /path/to/file.txt /path/to/link   # creates a symlink to file.txt at /path/to/link

# Example:
$ ln -s /home/todd/instructions.doc /var/www/public_html/classes/instructions.doc   # creates a symlink in the web server to instructions.doc
$ vi /var/www/public_html/classes/instructions.doc   # changes to the symbolic link will be reflected in the original file
$
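
To see that edits made through a symlink really land in the original file, here is a small sketch that works entirely in a temporary directory. The file names original.txt and link.txt are hypothetical.

```shell
workdir=$(mktemp -d)                               # scratch directory for the demo
echo "version 1" > "$workdir/original.txt"

ln -s "$workdir/original.txt" "$workdir/link.txt"  # create the symlink
echo "version 2" > "$workdir/link.txt"             # write through the link

content=$(cat "$workdir/original.txt")             # the original file has changed
target=$(readlink "$workdir/link.txt")             # readlink prints where the link points

echo "original.txt now contains: $content"
echo "link.txt points to: $target"

rm -r "$workdir"
```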

Bash

Bash is the default shell environment in Linux; that is, it is the interface through which you will be interacting with your Linux server. Bash is a derivative of sh, one of the first shells. Other popular shells include csh and tcsh, which use a C-like syntax for scripting, and zsh, a Bash-like shell that focuses on extending the capabilities of the shell environment.

Displaying a Value

To display a value at the shell prompt, use the command echo.

$ echo "Hello World" # displays Hello World
Hello World
$

Note: In examples, code written at the prompt is conventionally denoted by a line starting with a dollar sign ($). Lines without a dollar sign represent output.

Seeing the contents of a file

If you want to see the contents of a file, use the cat command.

$ cat myfile.txt
Hello World
$

cat is one of a number of useful Linux command-line binaries, the rest of which we will see later.

Working Directory

Whenever you are interacting with the shell, you will be executing commands from a working directory. To see the current working directory, run the command pwd (print working directory). To change the working directory, run the command cd (change directory).

$ pwd
/home/todd
$ cd projects
$ pwd
/home/todd/projects
$ cd ./   # recall that . is the current directory
$ pwd
/home/todd/projects
$ cd ../   # recall that .. is the next directory up in the filesystem
$ pwd
/home/todd
$

If you run commands that interact with the filesystem (e.g. ones that create or edit files) and give them a relative path or a bare filename, the path is resolved relative to your current working directory.
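
The following sketch shows a relative filename ending up in whatever directory is current at the time. It assumes mktemp is available; the directory name projects matches the transcript above, while notes.txt and the variable names are made up.

```shell
start_dir=$PWD                    # remember where we started
workdir=$(mktemp -d)              # throwaway directory for the demo
mkdir "$workdir/projects"

cd "$workdir/projects"
echo "some notes" > notes.txt     # relative path: created inside projects/
cd ..                             # back up one level, into $workdir

if [ -f projects/notes.txt ]; then created_in=projects; else created_in=elsewhere; fi
echo "notes.txt was created in: $created_in"

cd "$start_dir"                   # return to the original directory
rm -r "$workdir"
```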

Variables

Bash supports the use of variables. There are system-defined variables, and you can also define your own custom variables.

Defining and Accessing Variables

$ MYVARIABLE="Hello World"    # assigns the value Hello World to the variable MYVARIABLE
$ echo $MYVARIABLE     # notice that you need to put a dollar sign in front of the variable in order to access its value
Hello World
$ export MYVARIABLE     # allows MYVARIABLE to be accessed in child processes (e.g., in a program you call from the shell); note: no dollar sign here
$ export MYVARIABLE="Hello Moon"     # a shortcut for defining a variable and exporting it to subprocesses
$ set     # displays a list of all currently set variables (only the relevant line is shown here)
MYVARIABLE='Hello Moon'
$
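
The difference that export makes can be checked by asking a child process what it sees. This is a minimal sketch; LOCAL_ONLY and SHARED are hypothetical variable names, and sh -c stands in for any program started from the shell.

```shell
LOCAL_ONLY="Hello World"          # defined, but not exported
export SHARED="Hello Moon"        # defined and exported in one step

# Single quotes keep the parent shell from expanding the variables,
# so the child process does the lookup itself.
child_local=$(sh -c 'echo "${LOCAL_ONLY:-unset}"')
child_shared=$(sh -c 'echo "${SHARED:-unset}"')

echo "child sees LOCAL_ONLY as: $child_local"
echo "child sees SHARED as:     $child_shared"
```

The child reports LOCAL_ONLY as unset, because only exported variables are copied into the environment of child processes.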

System Variables

Bash comes pre-loaded with certain environment variables. Some of the variables with which you may find yourself interacting include:

  • PATH: search path for the commands
  • PWD: name of the current directory
  • SHELL: type of shell
  • TERM: type of the terminal
  • USER: the account name
  • HOME: the user's home directory
  • PS1: the prompt at command line
  • $$: the process id of current shell
  • $RANDOM: a random value
  • $?: the return value of the last command
  • $_: the last argument of the previous command
  • $1, $2, ...: the value of the first, second, ... argument passed to a script or function ($# itself holds the number of arguments)
  • IFS: internal field separator

Try echoing some of the system variables to examine your current environment.
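
A short sketch of a few of the special variables in action. The function show_args and the variable names are invented for the example; it uses $1 for the first argument, $# for the argument count, $? for the last exit status, and $$ for the shell's process ID.

```shell
show_args() {
    arg_count=$#        # number of arguments passed to the function
    first_arg=$1        # value of the first argument
}
show_args alpha beta gamma

# Run a command that fails with status 3, then capture $? right away.
sh -c 'exit 3' || last_status=$?

echo "shell PID: $$"
echo "arguments: $arg_count, first: $first_arg, last exit status: $last_status"
```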

Running Programs

To run an executable file, enter the path to it at the shell prompt:

$ /usr/bin/perl -v   # runs the binary executable located at /usr/bin/perl with the flag -v
This is perl 5, version 12, subversion 3 (v5.12.3)
$ ../mydir/myprogram   # runs the binary located one level up in the file system, then in mydir/myprogram
You just ran myprogram!
$

Programs in your PATH

Many commonly-used executable binaries are located in /bin, /usr/bin, and similar directories. In order to avoid typing paths to these directories every time you want to execute a command, you define these directories in your PATH system variable:

$ echo $PATH   # displays the current value of the PATH variable
/opt/local/bin:/opt/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin
$ PATH=$PATH:/my/favorite/bin   # adds a directory to your PATH variable
$ echo $PATH
/opt/local/bin:/opt/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/my/favorite/bin
$

Notice that the different PATH directories are separated by colons. Now, when you execute a command, Bash will search the directories in your PATH variable in order and run the first match it finds. To see the path to the binary that Bash found, use the which command.

$ perl -v
This is perl 5, version 12, subversion 3 (v5.12.3)
$ which perl
/usr/bin/perl
$
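
The sketch below puts the pieces together: it creates a tiny executable in a scratch directory, appends that directory to PATH, and then runs the program by name. Everything here is hypothetical (the directory, the program name myprogram, and its message), and it assumes no other myprogram is already on your PATH. It uses the shell builtin command -v, which reports the same kind of information as which.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/bin"

# Create a two-line script and make it executable.
printf '#!/bin/sh\necho "You just ran myprogram!"\n' > "$workdir/bin/myprogram"
chmod +x "$workdir/bin/myprogram"

PATH="$PATH:$workdir/bin"            # append the new directory to the search path

found_at=$(command -v myprogram)     # where did the shell find it?
output=$(myprogram)                  # run it by name, without typing the path

echo "found at: $found_at"
echo "output:   $output"

rm -r "$workdir"
```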

Note: it is unwise to have . in your PATH. Instead, if you want to run an executable in the current directory, do so by calling ./myprogram:

$ ./myprogram
You just ran myprogram!
$ myprogram
-bash: myprogram: command not found
$

Foreground and Background Processes

By default, a program runs in the foreground (unless it detaches itself from the terminal). You can run a program in the background by adding & at the end of the command (after the arguments). In this case, the shell forks a process for the program and immediately returns you to the command prompt. At any time, the jobs command lists the programs running in the background, and the fg command brings the specified job back to the foreground. A program running in the foreground can be terminated by typing ctrl-c in most cases. Typing ctrl-z suspends (stops) a program running in the foreground. A suspended program does not continue executing until it is resumed: it can be brought back to the foreground with fg, or sent to the background with bg.

$ ./myprogram
You just ran myprogram!
I'm taking a long time to run.
^C
$ jobs
$ ./myprogram
You just ran myprogram!
I'm taking a long time to run.
^Z
[1]+  Stopped                 ./myprogram
$ jobs
[1]+  Stopped                 ./myprogram
$ bg
[1]+ ./myprogram &
$ jobs
[1]+  Running                 ./myprogram &
$ fg
^C
$ jobs
$ ./myprogram &
[1] 64741
$ jobs
[1]+  Running                 ./myprogram &
$
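
Inside a script there is no keyboard for ctrl-z, but & works the same way. The following sketch starts a stand-in for a slow program in the background, carries on, and later collects its exit status with the shell builtin wait. The subshell that sleeps and exits with status 7 is purely hypothetical.

```shell
# Hypothetical stand-in for a slow program: sleeps one second, then exits with status 7.
(sleep 1; exit 7) > /dev/null 2>&1 &
bg_pid=$!                          # $! is the PID of the most recent background command
echo "Started background process $bg_pid; the shell is free to do other work."

wait "$bg_pid" || bg_status=$?     # block until it finishes and keep its exit status
echo "Background process exited with status $bg_status"
```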

Killing Processes

A process can be killed by passing its process ID (PID) to the kill command: kill PID

By default, kill sends the TERM signal, which a program can catch or ignore, so it may be necessary to force the program to exit by sending the KILL signal, which cannot be ignored: kill -9 PID

The current processes can be listed using the ps command.

$ ps   # list currently running processes in the current shell
  PID TTY           TIME CMD
19107 ttys000    0:00.75 -bash
 1873 ttys001    0:00.05 -bash
57267 ttys002    0:00.20 -bash
50721 ttys003    0:00.55 -bash
$ ps -eaf   # list all currently running processes
  UID   PID  PPID   C STIME   TTY           TIME CMD
    0     1     0   0 31Dec00 ??         3:24.45 /sbin/launchd
    0 19106   327   0  1Aug12 ttys000    0:00.03 login -pfl sffc /bin/bash -c exec -la bash /bin/bash
  501 19107 19106   0  1Aug12 ttys000    0:00.75 -bash
    0  1872   327   0 31Jul12 ttys001    0:00.02 login -pfl sffc /bin/bash -c exec -la bash /bin/bash
  501  1873  1872   0 31Jul12 ttys001    0:00.05 -bash
    0 57266   327   0 Mon05AM ttys002    0:00.08 login -pfl sffc /bin/bash -c exec -la bash /bin/bash
  501 57267 57266   0 Mon05AM ttys002    0:00.20 -bash
    0 64747 57267   0  9:58AM ttys002    0:00.00 ps -eaf
    0 50720   327   0 Fri12AM ttys003    0:00.03 login -pfl sffc /bin/bash -c exec -la bash /bin/bash
  501 50721 50720   0 Fri12AM ttys003    0:00.55 -bash
$
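
As a sketch of both forms of kill, the script below starts two background sleep commands (stand-ins for real programs) and ends one with the default TERM signal and the other with KILL. The shell reports a process ended by a signal as 128 plus the signal number, so the two statuses differ. The variable names are made up.

```shell
sleep 30 > /dev/null 2>&1 &
pid_term=$!
kill "$pid_term"                                # default signal: TERM (15)
wait "$pid_term" 2>/dev/null || status_term=$?

sleep 30 > /dev/null 2>&1 &
pid_kill=$!
kill -9 "$pid_kill"                             # KILL (9) cannot be caught or ignored
wait "$pid_kill" 2>/dev/null || status_kill=$?

echo "after kill:    exit status $status_term"  # 128 + 15
echo "after kill -9: exit status $status_kill"  # 128 + 9
```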

Directing Output

A program's standard output can be sent to a file by typing >filename at the end of the command. Similarly, >>filename appends to a file. In Linux, every program starts with three default file descriptors: standard input (STDIN), standard output (STDOUT), and standard error (STDERR). STDIN is file descriptor 0, STDOUT is 1, and STDERR is 2. In Bash, you can direct either output descriptor to a file. You can also redirect one file descriptor to another.

$ ./myprogram >filename.txt   # redirects standard output to filename.txt, overwriting the file
$ cat filename.txt
You just ran myprogram!
$ ./myprogram >>filename.txt   # appends the output to filename.txt
$ cat filename.txt
You just ran myprogram!
You just ran myprogram!
$ ./myprogram 1>filename.txt   # redirects the standard output to filename.txt
$ cat filename.txt
You just ran myprogram!
$ ./myprogram 2>filename.txt   # redirects the error output to filename.txt; standard output still goes to the terminal
You just ran myprogram!
$ ./myprogram 2>&1   # STDERR is redirected to STDOUT
You just ran myprogram!
$
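
The examples above use a program that only writes to standard output. To see the two output streams separated, here is a sketch with a shell function standing in for myprogram; the function, its messages, and the file names are all hypothetical.

```shell
workdir=$(mktemp -d)

myprogram() {                       # stand-in that writes to both streams
    echo "normal output"            # goes to STDOUT (descriptor 1)
    echo "error output" >&2         # goes to STDERR (descriptor 2)
}

myprogram > "$workdir/out.txt" 2> "$workdir/err.txt"   # one file per stream
myprogram >> "$workdir/out.txt" 2> /dev/null           # append STDOUT, discard STDERR
myprogram > "$workdir/both.txt" 2>&1                   # STDERR follows STDOUT into the file

out_count=$(grep -c "normal output" "$workdir/out.txt")  # written once, appended once
err_content=$(cat "$workdir/err.txt")
both_count=$(grep -c "output" "$workdir/both.txt")       # both lines landed in one file

echo "out.txt lines: $out_count, err.txt: $err_content, both.txt lines: $both_count"
rm -r "$workdir"
```

Note the order in the last redirection: 2>&1 must come after >file, because redirections are processed left to right.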

Output of one program can be redirected to the input of another program using pipes.

$ ./program1 | ./program2   # send program1's output as an input to program2
You just ran program2 with the input: You just ran program1!
$

Redirection is possible for STDIN too. A program can get its input by redirecting STDIN using <

$ ./myprogram < inputfile.txt
You just ran myprogram with input from inputfile.txt!
$

Finally, a pair of backticks (`) can be used to capture the output of a program and use it as a string, for example when setting a variable:

$ MYVARIABLE=`./myprogram`
$ echo $MYVARIABLE
You just ran myprogram!
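
Pipes, input redirection, and output capture are often combined. The sketch below uses standard utilities (sort, head, grep) on a hypothetical scratch file instead of the placeholder programs above. It also shows $(...), which Bash accepts as an equivalent to backticks and which is easier to nest.

```shell
workdir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$workdir/fruit.txt"

# sort reads the file on STDIN; its output is piped into head.
first_sorted=$(sort < "$workdir/fruit.txt" | head -n 1)

line_count=`grep -c "" "$workdir/fruit.txt"`      # backticks capture the output
same_count=$(grep -c "" "$workdir/fruit.txt")     # $(...) does the same thing

echo "first in sorted order: $first_sorted"
echo "lines in file: $line_count (backticks), $same_count (dollar-parentheses)"
rm -r "$workdir"
```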

SUDO

Some commands require root privileges to run. In order to run a command as root without logging in as root, use sudo.

$ yum install lynx
You need to be root to perform this command.
$ sudo yum install lynx
[sudo] password: 
.....
Complete!
$

Automatically Running Programs

You will often find it useful for binaries to be executed at predefined intervals, certain days of the week, or at startup. Linux provides you with the tools you need to make these configurations.

Scheduled Programs in Cron

Cron is a system service that will run programs in a periodic manner. For more details on how to configure cron, see the Cron guide.

Programs at Startup

When a Linux system boots, a series of scripts is called to start up system processes, daemons, and other programs (such as SSH servers, web servers, database programs, etc). The simplest way to add something to the boot process is to add it to /etc/rc.local, which is a script that is called automatically at the very end of the boot process. Simply write a script that does what you want and then call it from within /etc/rc.local to ensure that your script is called at the end of the boot process.
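
A minimal sketch of what the addition to /etc/rc.local might look like. The script path /usr/local/bin/my-startup.sh and the log file name are hypothetical, chosen only for illustration; substitute your own. The check makes these lines harmless if the script is missing or not executable.

```shell
# Lines to append to /etc/rc.local (which runs as root at the end of boot).
# Both paths below are hypothetical examples.
startup_script=/usr/local/bin/my-startup.sh
if [ -x "$startup_script" ]; then
    # Keep the script's output so that boot-time problems can be diagnosed later.
    "$startup_script" >> /var/log/my-startup.log 2>&1
fi
```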

You can also add scripts which run at different times during the boot process. The way to do this varies by Linux distribution. For Fedora, see http://www.yolinux.com/TUTORIALS/LinuxTutorialInitProcess.html (specifically the section entitled Init Script Activation).

Shell Scripting

Programs can be scripted using Bash. For more information, see Shell Scripting.

Networking

In Linux, you can see your network information by typing ifconfig. This command shows the status information of each network interface, including the IP address you will need to remotely connect to your instance. The interface lo is the special loopback interface with IP address 127.0.0.1. This refers to your local machine and any connection from your machine to your machine goes through this pseudo-interface. Typical network interfaces include eth0, eth1,..., wlan0, etc. Ethernet cards are represented with ethX. In the past, most wireless cards showed up as wlanX, but it is also common now for them to be represented with ethX names. ifconfig also gives information such as hardware address (MAC), netmask, and broadcast addresses.

You can start or stop networking by calling the /etc/init.d/networking script (this is the Debian name; on RHEL-based systems such as the Amazon EC2 Linux AMI, the script is /etc/init.d/network). As with most /etc/init.d scripts, this script takes several options, such as start, stop, and restart. Note that even if you stop networking, you will still have your lo interface. You can look at the code of the script to find out what it actually does. You can also stop or start individual interfaces by using the ifup and ifdown commands.

On Debian-based systems, the network configuration files are stored in /etc/network (RHEL-based systems keep them in /etc/sysconfig/network-scripts instead). /etc/network/interfaces contains the defaults for each interface. For example, you could specify a static IP address, netmask, network, broadcast address, and default gateway for an interface here, but in general you should not need to edit this file. These defaults can be overridden on a running system with the ifconfig command. The /etc/network/if-up.d and /etc/network/if-down.d directories contain the scripts that are executed when an interface is turned on or off, respectively. Of course, most modern Linux distributions have GUI tools for doing network configuration more easily, and you shouldn't need to change anything for the purposes of this course.

Installing Software

The package management tool in Red Hat Enterprise Linux (and therefore also your Amazon EC2 instance) is rpm. (In Debian, it is dpkg.) If you have an rpm package, you can install it by running:

$ rpm -i somepackage.rpm
$

This requires that you already have somepackage.rpm (here, in your current directory), which means you will have to download the file yourself (or create it). It also requires you to manually install any dependencies the package has.

Repository-Based Package Managers

A better alternative is to use a repository-based package manager, which downloads a package and its dependencies for you. In RHEL, this is yum; in Debian, it is apt (used through the apt-get or aptitude commands). To install a package, simply run yum install package-name or apt-get install package-name. For example, this is how you would install lynx, a command-line web browser, in your RHEL Linux distribution:

$ lynx --version   # is lynx installed?
-bash: lynx: command not found
$ sudo yum install lynx    # looks like we need to install it
Downloading Packages:
lynx-2.8.6-27.6.amzn1.i686.rpm        | 1.8 MB     00:00     
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : lynx-2.8.6-27.6.amzn1.i686             1/1 

Installed:
  lynx.i686 0:2.8.6-27.6.amzn1                                                                                                                                                                                                                

Complete!
$ lynx --version   # test again to see if we have it
Lynx Version 2.8.6rel.5 (09 May 2007)
$

You can also search for available packages by name or by the name of a file that they install.

$ yum search lynx   # search for packages whose name contains lynx
======= N/S Matched: lynx =======
lynx.i686 : A text-based Web browser
$ yum provides lynx   # search for packages that install a file or command named lynx
lynx-2.8.6-27.6.amzn1.i686 : A text-based Web browser
Repo        : installed
Matched from:
Other       : Provides-match: lynx
$

The repositories that yum searches are configured in /etc/yum.conf and in the .repo files under /etc/yum.repos.d. The repositories that apt-get and aptitude search are listed in /etc/apt/sources.list.
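
If you are not sure which family your distribution belongs to, a script can check which package manager is present. This is only a sketch: it looks for yum and apt-get by name and prints the install command you would run, without actually installing anything. The variable names are invented for the example.

```shell
package=lynx    # example package from the text above

if command -v yum > /dev/null 2>&1; then
    manager=yum
    install_cmd="sudo yum install $package"
elif command -v apt-get > /dev/null 2>&1; then
    manager=apt-get
    install_cmd="sudo apt-get install $package"
else
    manager=none
    install_cmd=""
fi

echo "Package manager: $manager"
echo "Install command: ${install_cmd:-(no supported package manager found)}"
```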

Command Reference

Earlier, you saw that cat is a command that shows the contents of a file. Below is a list of other useful commands in Linux.

Navigation and File Management

  • ls List file(s) in current working directory
    • ll Shortcut to ls -l. List files with more details than ls. Only available in certain distributions
  • cd Change working directory. Note: cd called without any arguments moves you to your home directory
  • cp Copy a file
  • mv Move or rename a file
  • rm Remove a file
    • rm -r Remove a directory and all files in it
  • ln -s Create a symlink to a file
  • mkdir Create a directory
  • rmdir Remove a directory (directory must be empty; if it's not, use rm -r)
  • cat Display the contents of a file
  • less Display the contents of a file, wait for the user at each page
  • tail Display the last 10 lines of a file
    • tail -f Display the last 10 lines of a file and then wait for changes, displaying them as they occur. Useful for monitoring log files.
  • chown Change the owner of a file
  • chgrp Change the group of a file
  • chmod Change the security permissions of a file
  • grep Display the lines of a file matching a user specified string
  • diff Display the difference between two files
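
Several of these commands are most useful in combination. The sketch below exercises cp, head, tail, grep, and diff on two hypothetical scratch files; the file names and variable names are made up for the example.

```shell
workdir=$(mktemp -d)

# printf repeats its format for each argument, giving 12 lines: "line 1" ... "line 12".
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > "$workdir/a.txt"
cp "$workdir/a.txt" "$workdir/b.txt"
echo "an extra line" >> "$workdir/b.txt"

first_line=$(head -n 1 "$workdir/a.txt")                    # first line of the file
second_to_last=$(tail -n 2 "$workdir/a.txt" | head -n 1)    # first of the last two lines
matches=$(grep -c "line 1" "$workdir/a.txt")                # matches lines 1, 10, 11, and 12

# diff exits with status 0 when the files are identical and 1 when they differ.
if diff "$workdir/a.txt" "$workdir/b.txt" > /dev/null; then same=yes; else same=no; fi

echo "$first_line / $second_to_last / $matches matches / identical: $same"
rm -r "$workdir"
```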

System Administration

  • df Display free diskspace
  • du Display disk usage
  • free Display memory usage information
  • date Display current time and date
  • top Display the CPU and Memory usages of current processes
  • ps Display current processes
  • kill Terminate a running process
  • killall Terminate all running processes that match a given name
  • ping hostname Ping a host
  • host Get the IP address of a host
  • passwd Change the user password
  • su user Switch to the privileges of another user
  • shutdown Power off the computer
  • reboot Reboot the computer
  • clear Clear the terminal
  • ifconfig Display/Configure a network device
  • file Show the file type
  • lsmod Display loaded kernel modules
  • insmod Install a kernel module
  • modprobe Load a kernel module (also load the dependencies)
  • adduser Add a new user
  • exit Exit from a shell
  • lpr Print a file
  • head Display lines at the beginning of a file
  • tail Display lines at the end of a file
  • pwd Display the name of the current working directory
  • lsof List open files in the system
  • netstat Display statistics related to open sockets

File Editors

It is sometimes convenient to edit files using the command line. Three widely-used command line text editors are vi, emacs, and nano.

Vi

To edit a file using Vi, use the command vi. You will see something like this:

<source>

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~