This document is an outline meant to keep the lecture on track. I, jim, may expand it to be self-standing, complete as a read. I welcome all criticisms, suggestions, questions, and other remarks. Throwing food is acceptable.
Class begins at 8:30 AM. There will be at least
one mid-morning break. Lunch will be 45 minutes.
There will be at least one mid-afternoon break.
Class ends at 5:30 PM.
Please be prompt.
Each attendee will be provided one Knoppix 4.02 CD ROM disk and one 1.4M floppy disk.
...But in order to understand everything else you must understand the one thing.
Therefore...
This class introduces everything at a high level then
hopes to circle through it all repeatedly with increasing
detail.
A note about boredom. In reading technical books or sitting in technical classes, there are two kinds of boredom:
One is that you know everything and you just can't force your mind to stay on the subject.
The other is that you know so little that you're swamped and you just can't keep your mind on the subject.
You can try to break your boredom by asking questions.
There are many shells. The Bourne shell was one of the earliest UNIX command-line shells (Stephen Bourne wrote it). Bill Joy later wrote the C shell. Other people have written still other shells. Many shell scripts supplied with Linux systems are Bourne shell scripts.
This seminar focuses on the BASH (Bourne Again SHell) shell because it is a superset of Bourne shell functionality and because the BASH shell arguably has the greatest amount of supporting documentation (and probably use).
The kernel is a traffic manager. All data goes through the kernel. The design is highly consistent: all data is seen to come from a file and is sent to a file, regardless of the nature of the hardware, keyboard, mouse, display, storage, and so on. The rubric for this file source and sink concept is "everything is a file".
What's Beyond Scope
As to kernel features, both hardware I/O and networking
are beyond the scope of this seminar.
The init process runs a set of child processes that set up the machine for use. Just what the init process does is determined by a configuration file, /etc/inittab.
There are file types special to a filesystem, notably directories. There is some kind of database for storage media that keeps track of file locations and attributes.
There are many kinds of filesystems: MS-DOS floppies use a FAT filesystem; Windows 95 uses a VFAT filesystem; Windows NT uses both filesystems as well as NTFS.
The UNIX filesystem design, which Linux uses, consists of a superblock, inode tables, and directories.
A directory is a special kind of file that stores a list of files, and for each file, a name and an inode number.
A file's inode number is an index into an inode table that stores the file's attribute information as well as the location of the bytes on the storage medium.
The superblock stores meta data about the inode tables.
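The link between directory entries and inode numbers can be seen from the command line. The following is a minimal sketch, assuming a filesystem that supports hard links; the names report and report-alias are made up for illustration. The -i option of ls prints a file's inode number, and ln adds a second directory entry for the same inode.

```shell
# Work in a scratch directory so nothing important is touched.
workdir=$(mktemp -d)

echo "some bytes" > "$workdir/report"             # hypothetical file name
ln "$workdir/report" "$workdir/report-alias"      # second name, same inode

# ls -i prints "inode-number name"; awk keeps only the number.
inode_a=$(ls -i "$workdir/report" | awk '{print $1}')
inode_b=$(ls -i "$workdir/report-alias" | awk '{print $1}')

echo "report inode:       $inode_a"
echo "report-alias inode: $inode_b"
rm -rf "$workdir"
```

Both names should report the same number, because each directory entry is only a name paired with an index into the inode table.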
Linux uses ext2 and ext3 filesystems primarily, although there are many other filesystem types available to Linux, depending upon what the particular kernel version is designed to implement. The ext2 and ext3 filesystems are identical other than that ext3 supports journaling, which allows better recovery of corrupted or otherwise lost files.
Typically Linux installations span multiple hard drives, each with multiple partitions, and can easily include storage on other machines. Each partition supports a single filesystem. The filesystem on one partition need not be the same as the filesystem on another partition.
An argument for few partitions is efficiency. An argument for many partitions is ease of backup.
At installation time, one of the first tasks done is to partition the storage media, then put filesystems on each partition, then mount some portion of the directory structure onto each partition.
Typically the /boot/ directory is mounted on the first partition. The (all-important) swap partition (which has a swap filesystem) is in the middle of the drive. The /usr/ and /var/ directories are separately mounted. Sometimes the /home/ and /var/log/ and /var/www/ directories are separately mounted. The / directory (everything else) is mounted, usually on an early partition (a low-numbered partition, near the outer edge of the storage medium).
A note about the / character. The / character is a normal character. The shell does not treat it differently from alphanumerics. Its significance is that it is passed to the kernel, which interprets it as a parent/child file separator. The rule is that whatever is to the left of the / character must be the name of a directory. Whatever is to the right is a filename of either a normal file or a directory. Thus thock/thuck guarantees that thock/ is a directory; thuck may or may not be a directory. Note that / explicitly refers to a directory with no name (nothing on the left), which is the root directory of the filesystem.
/            The root directory of the boot disk and the top of the filesystem namespace
/bin         Stores essential utility programs, available to all run levels
/dev         Stores device drivers
/etc         Stores configuration files and shell scripts having to do with system startup and user login
/home        Stores users' home directories (also may be /u or /users or some other name)
/lib         Stores library files needed by programs that reside in the /bin and /sbin directories
/mnt         An empty directory used to mount filesystems for other storage devices such as the floppy disk and CD-ROM drives
/sbin        Stores utility programs to be used for system configuration and maintenance, available only to the root user (system administrator)
/tmp         Stores temporary files, often located on a dedicated partition
/usr         Stores many directories that organize various programs and other files having to do with users' use of the system, typically in read-only permission mode
/usr/bin     Stores utility programs designed for user maintenance
/usr/doc     Stores documentation for much of the system
/usr/info    Stores compressed files (*.gz) with more documentation
/usr/lib     Stores library files needed by /usr/bin and /usr/sbin programs
/usr/local   Stores programs created at the particular system site
/usr/man     Stores man pages
/usr/sbin    Stores utilities for maintaining system services
/var         Stores many directories that organize primarily data files that are generated by or for users or user services
/var/spool   Stores directories dedicated to dynamic datafiles created for print, mail, cron, and other services
/var/www     Stores directories dedicated to the apache (httpd) web server
Important UNIX File Types
regular (-), directory (d), character device driver (c), block device driver (b), hard link (-), softlink (l)
The file type is one attribute among many; all attributes are stored in the inode table record for each file.
You can see some file attributes by using the ls -l command.
$ ls -l
drwxr-xr-x 2 knoppix knoppix 240 Nov 13 20:30 Desktop
...
The first field has ten characters. The first character shows the file type. The remaining nine characters are grouped in three sets of three characters each, the first set for the user, the second set for the group, the third set for all others (the unwashed).
The ls -l command shows nine fields:
Use the chmod command to change permissions of a file.
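As a small sketch of chmod in action, the commands below change the permissions of a throwaway file and read back the ten-character first field of ls -l. The file is a temporary one created just for the experiment, and the chosen modes are arbitrary examples.

```shell
scratch=$(mktemp)                 # temporary file to experiment on

chmod 640 "$scratch"              # octal form: rw- for user, r-- for group, --- for others
perms=$(ls -l "$scratch" | cut -c1-10)
echo "$perms"

chmod u+x,g-r "$scratch"          # symbolic form: add execute for user, drop group read
perms_after=$(ls -l "$scratch" | cut -c1-10)
echo "$perms_after"

rm -f "$scratch"
```

The octal form sets all nine permission characters at once; the symbolic form adjusts only the ones named.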
Names of files include alphanumerics and a very few other text characters, including neither the / character (used to separate parent_directory/child_file relationships) nor any of the shell's special characters.
The . character (period) bears mention: within a filename it has no significance to the kernel beyond its place in ASCII sort order. When used as the first character of a filename, however, it hides the file from the default output of the ls command and from the shell's * wildcard, keeping such files out of sight and out of the way of casual commands.
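The effect of a leading period is easy to check. This is a minimal sketch using a scratch directory and two made-up file names, visible and .hidden.

```shell
demo=$(mktemp -d)
touch "$demo/visible" "$demo/.hidden"   # hypothetical file names

plain_listing=$(ls "$demo")             # default listing skips names that start with .
full_listing=$(ls -a "$demo")           # -a includes them, along with . and ..

echo "ls shows:"
echo "$plain_listing"
echo "ls -a shows:"
echo "$full_listing"
rm -rf "$demo"
```

The file is hidden only by convention: anyone who names it explicitly, or uses ls -a, can still see and open it.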
The justification Ken Thompson gave for his original UNIX (late sixties, early seventies) was that UNIX makes an excellent platform for documentation. Many of the most-used commands have to do with text processing.
The editors you use to create shell scripts and to edit configuration files should be text editors not word processors.
Following the old teletype model, typefaces used in command-line shells are non-proportional typefaces, rather than the proportional typefaces normally used in word processing and HTML documents. Traditionally text on a terminal followed an 80x24 matrix design, 24 lines each up to 80 characters in length. Modern terminal windows may allow more than 80 characters. Be aware of line wraps. A line of text is defined as everything between two newline characters (the character generated by the Enter key).
Think of a command line as a record with one or more fields.
Fields are typically delimited by whitespace characters: the space, tab, and newline (end of the record or command).
The IFS environment variable specifies which characters separate fields on a command line (Internal Field Separators). Change this value at your peril.
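A safe way to experiment with IFS is to change it for a single command only. The sketch below splits a made-up, colon-delimited record (in the style of a line from /etc/passwd) into fields with the read built-in; the record and the variable names are illustrative assumptions.

```shell
# A made-up record in the colon-delimited style of /etc/passwd.
record='jim:x:1000:1000:Jim:/home/jim:/bin/bash'

# Putting IFS=: in front of read changes the separator for that one
# command; the shell's own IFS is left untouched afterwards.
IFS=: read -r user pw uid gid gecos home shell <<EOF
$record
EOF

echo "user=$user uid=$uid shell=$shell"
```

Scoping the change to one command avoids the peril of a permanently altered IFS, which would change how every later command line is split.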
The shell is designed to respond to some special characters such as quotation and punctuation marks.
The term "ASCII sort order" refers to the effect of sorting (usually filenames) with respect to the ASCII character codes.
0x20 is (space)
0x2E is .
0x30 is 0
0x39 is 9
0x41 is A
0x42 is B
0x61 is a
0x62 is b
Regular text is that which obeys some pattern rules. The most common is that of fields on a line (a record). The space character and the colon character are the two most common field delimiters.
Configuration files are largely text files, and largely in the /etc directory. Most common field delimiters are the : and ' ' (space) characters. Occasionally configuration files have restrictions for using the tab character. Rarely a final space character on a line or a final blank line in a file will cause problems.
To read a text file, use the more command rather than the cat command. Use the cat command within shell scripts. Of course, you can load a text editor to read a file too.
Some commands are designed to manipulate characters on a line (the cut command). Some commands are designed to work with a complete line of text (the grep command). Some commands are designed to work with many lines of text (the uniq and sort commands).
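The three kinds of command can be combined on one set of data. The following sketch runs them over a few made-up records of the form user:command; the data is invented purely for illustration.

```shell
# Made-up log lines, one "user:command" record per line.
log='ann:ls
bob:grep
ann:sort
cy:ls
bob:ls'

# cut works within each line (field 1, colon-delimited);
# sort and uniq then work across the whole set of lines.
all_users=$(printf '%s\n' "$log" | cut -d: -f1 | sort | uniq)

# grep works on whole lines; -c counts the lines that match.
ls_count=$(printf '%s\n' "$log" | grep -c ':ls$')

echo "users:"
echo "$all_users"
echo "lines that ran ls: $ls_count"
```

Note that uniq only removes adjacent duplicates, which is why sort comes first.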
$ abc
bash: abc: command not found
Note the syntax of the error message. There are three fields, each separated by the colon character. The first field shows the name of the program issuing the error, the second field indicates the thing that is in error, the final field is the text of the error message. In the example, "I, bash, am complaining about abc because I can't find a command so named."
The key is a variable name.
The value is all text to the right of the = character (be sure to quote strings that contain field separators).
Some keys are properly called "environment variables" and are available to all shells. Some keys are properly called "shell variables" and differ from shell to shell. You can make your own variables; the syntax for doing so varies among shells. By default, your variables are local variables, with scope limited to the current shell.
Use the env command to see environment variables. Use the set command to see all variables, including shell variables, environment variables, and other variables.
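The difference between the two listings shows up as soon as you make a variable of your own. In this sketch SEMINAR_NOTE is a hypothetical variable name; export is the shell built-in that promotes a local variable to the environment.

```shell
SEMINAR_NOTE=draft                      # hypothetical local variable

# grep -c prints the number of matching lines; "|| true" ignores the
# non-zero exit status grep gives when that number is 0.
in_set=$(set | grep -c '^SEMINAR_NOTE=' || true)
in_env_before=$(env | grep -c '^SEMINAR_NOTE=' || true)

export SEMINAR_NOTE                     # promote it to the environment
in_env_after=$(env | grep -c '^SEMINAR_NOTE=' || true)

echo "in set: $in_set"
echo "in env before export: $in_env_before"
echo "in env after export:  $in_env_after"
```

A local variable appears in the output of set but reaches env only after it has been exported.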
The appendix lists some commonly used environment variables as well as some commonly used shell variables.
$ THIS=that
$ echo $THIS
that
$ echo $WHAT

$ MINE=${MINE:-mine}
$ echo $MINE
mine
$ YOURS=$MINE
$ echo $YOURS
mine
$ : ${OURS:=yours}
$ echo $OURS
yours
\t - time
\d - date
\n - newline
\s - Shell name
\W - The last component (basename) of the current working directory
\w - The full path of the current working directory.
\u - The user name
\h - Hostname
\# - The command number of this command.
\! - The history number of the current command
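The codes above are backslash escapes that bash expands inside its prompt string, which is held in the PS1 shell variable. The following is a small sketch, assuming an interactive bash session; the particular layout is only an illustration.

```shell
# Single quotes keep the backslashes literal, so bash itself interprets
# the escapes each time it draws the prompt.
PS1='\u@\h:\w> '

# Show the raw setting; the expanded form appears only at an interactive prompt.
printf '%s\n' "$PS1"
```

With this setting an interactive prompt would show the user name, host name, and full working directory, in that order.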
$ let "z = 8 * 9" ; echo $z
72
$ w=$((4 + 8)) ; echo $w
12
The expr command prints 1 for true and 0 for false.
$ echo ' 4.5 * 3 ' | bc
13.5
$ seq 1 3
1
2
3
$ echo {M,N,O}{Q,R,S}
MQ MR MS NQ NR NS OQ OR OS
(date; w; pwd; ls) > ~/curstats
The above command combines the output of all four commands and redirects it to a file named curstats in the user's home directory.
sleep 3
sleep 3 &
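The trailing & runs a command in the background, so the shell returns to its prompt at once instead of waiting. Here is a minimal sketch of how a script can keep track of such a job; the one-second sleep is an arbitrary stand-in for real work.

```shell
sleep 1 &                 # trailing & runs the command in the background
bg_pid=$!                 # $! holds the PID of the most recent background job
echo "started sleep as PID $bg_pid; the shell is free to do other work"

wait "$bg_pid"            # block until that job finishes
bg_status=$?              # wait hands back the job's exit code
echo "background sleep exited with status $bg_status"
```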
Typically a command either reports state or changes state. For example,
The pwd command reports what is the current working
directory.
The date command reports date and time according to
the system clock.
The cp command creates a new file that is a copy of
an existing file; the new file retains the attributes
of the original.
The mkdir command creates a new directory.
Command argument requirements depend upon the design of the command.
Default behavior is often a matter of using the command with no options (switches). Variant behavior is triggered by using options.
Generally command usage is extremely regular, trustable. The same usage rules apply to nearly every command.
There are exceptions.
For example, most commands are designed to recognize the - character as beginning a string of options. There is no significance to the order in which the option characters appear.
$ ls -lat
$ ls -alt # identical to the command above
But the ps command recognizes its options as a simple set of characters:
$ ps aux
Although it will run if the - character precedes the option string:
$ ps -aux
However, the ps command issues a Warning statement.
$ ps -aux 2> pswarn
$ more pswarn
Warning: bad ps syntax, perhaps a bogus '-'? See http://procps.sf.net/faq.html
The child shell immediately execs, overlaying itself with the command, which keeps the supplied environment that the child shell inherited.
The new (external) command has a unique PID.
Given expected circumstances, the command terminates normally, with an exit code of 0, which is returned to the waiting parent shell. You can verify the return value by using the echo $? command.
Normally, especially with respect to shell scripts, a program falls through from beginning to end and at the end dies. But it's possible to send a signal to a program (even a shell script) and direct the program to terminate itself at that time.
A signal is a code that comes to a process from outside the process. An example of a signal is SIGINT (signal 2), which is sent when you type CTRL-C.
A trap is a routine within the process that captures and handles a signal.
Each process running is a program written by some human according to some design (hopefully). It is the designer's choice to let the program respond to signals. The designer can choose that the program respond to no signals, all signals, or some signals.
With one exception: the rude signal 9 (SIGKILL). A SIGKILL is actually a call to the kernel, not to the process, that tells the kernel to terminate a particular process specified by its PID. Termination is abrupt, giving the process no opportunity to gracefully clean up.
In a shell script you can use the trap command (one of the built-ins) and specify particular behavior to perform upon receipt of a particular signal (or group of signals).
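The following sketch shows the trap command handling one signal. It writes a small, hypothetical script to a temporary file, and that script sends SIGTERM to itself with kill so the example needs no second terminal; the message text is made up.

```shell
script=$(mktemp)

# The quoted 'EOF' keeps $$ literal so the script, not this shell, expands it.
cat > "$script" <<'EOF'
#!/bin/sh
# Hypothetical script: handle SIGTERM (15) instead of dying silently.
trap 'echo "caught SIGTERM, cleaning up"; exit 0' TERM
kill -TERM $$      # send the signal to this very script
sleep 1            # the trap ends the script, so nothing follows
EOF

trap_output=$(sh "$script")
rm -f "$script"
echo "$trap_output"
```

Replacing TERM with KILL in the trap line would have no effect, since signal 9 never reaches the process.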
The kill command and CTRL-C both have to do with traps and signals.
Supplied external commands are stored in the /bin and /sbin directories as well as in the /usr/bin and /usr/sbin directories.
The value used to reflect normal exit is zero.
A non-zero value reflects a problem with termination.
Exit codes are integers ranging from 0 to 255.
This behavior has implications with respect to logic
at the shell level. Unlike nearly all other
programming languages, zero is used for TRUE and
non-zero is used for FALSE.
Note that the expr command uses 1 for TRUE and
0 for FALSE and that the context of using the
expr command is different from using an exit status.
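The two conventions can be compared side by side. This is a minimal sketch; true and false are standard commands that do nothing except exit with status 0 and 1 respectively, and the comparisons given to expr are arbitrary.

```shell
# Exit status: 0 means success (TRUE to the shell), non-zero means failure.
true
status_true=$?
false || status_false=$?        # $? still holds the status left by false

if true; then branch="then-branch taken"; else branch="else-branch taken"; fi

# expr prints its verdict on standard output: 1 for a true comparison,
# 0 for a false one. It also exits non-zero when it prints 0.
expr_true=$(expr 5 \> 3)
expr_false=$(expr 5 \< 3 || true)

echo "exit status of true: $status_true, of false: $status_false"
echo "if true: $branch"
echo "expr 5 > 3 prints $expr_true; expr 5 < 3 prints $expr_false"
```

The if statement looks only at the exit status, never at what a command prints.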
When a shell terminates, it returns an exit status to whatever parent program called it into existence.
There are man pages, info pages, and help information on line. The man pages and info pages are available from the GUI shell as well as the command-line shell.
For newcomers the most troubling aspect is the notation, which (fortunately) is consistent across all three information sets. The writing style is similarly opaque, but also (thankfully) consistent. The trick is to get used to things; this section purports to help you in that.
For those who don't keep internal and external commands clearly distinct, there is the frustration of not being able to find man pages for internal commands (such as the help, let, trap, type, ulimit, umask, and other commands). The man page for the internal commands is the man page for the bash shell itself.
$ man bash
This is one big man page.
The appendix has a list of helpful books and web sites. These are generally preferable to reading the man page for even the simplest command-line shell. For dozers, the floppy disk supplied with this seminar has a (hard-won) copy of the bash man page.
$ /bin/pwd --help
Usage: /bin/pwd [OPTION]
Print the full filename of the current working directory.
  --help     display this help and exit
  --version  output version information and exit
Report bugs to <bug-coreutils@gnu.org>.
The shell loads the program /bin/pwd, which examines its tail to find exactly
--help
The /bin/pwd command therefore dumps this help page.
--help --version
A similarly simple way to get started is to use the shell's built-in help command to get information about the built-in pwd command:
$ help pwd
pwd: pwd [-PL]
    Print the current working directory. With the -P option, pwd prints the physical directory, without any symbolic links; the -L option makes pwd follow symbolic links.
No loading is required. This pwd command is part of the bash shell and its help is dumped from the shell's data area in RAM. The format of the shell's help pages is a little different from that of the GNU toolkit external commands.
You might ask why there is an external pwd command as well as an internal pwd command. A quick answer is that there are many other shells, and some of them may not have an internal pwd command. There are rare occasions (usually in shell scripts or for testing purposes) that it's important to get the result by forcing a load (child shell that execs). And the behavior of the external command is guaranteed, independent of which shell is running.
The test and echo commands also have internal and external counterparts for similar reasons.
To force the use of an external command, use the full path specification, which overrides the shell's normal load search order.
$ /bin/pwd
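A short sketch comparing the two forms follows. It assumes /bin/pwd exists, as it does on the seminar system; type is the shell built-in that reports how the shell would resolve a name, and the -P option asks both versions for the physical directory so their answers are comparable.

```shell
# Ask the shell how it would resolve the bare name.
type pwd                        # typically reports that pwd is a shell builtin

builtin_answer=$(pwd -P)        # handled inside the shell itself
external_answer=$(/bin/pwd -P)  # the full path forces the external program to load

echo "built-in: $builtin_answer"
echo "external: $external_answer"
```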
The ls command has a large set of options and its help page reflects most of the commonly used notation.
$ ls --help
Usage: ls [OPTION]... [FILE]...
List information about the FILEs (the current directory by default).
Sort entries alphabetically if none of -cftuSUX nor --sort.
...
man mv
Using the GUI shell, click the K button (lower left corner) to get the K menu and choose help. In the KDE help center window, scroll down the list in the left frame and choose UNIX manual pages.
ATT Viii (BSD) Focus
1 (1) User commands and applications
2 (2) System calls and kernel errors
3 (3) Library calls
4 (5) Standard file formats
5 (7) Miscellaneous files
6 (6) Games and other miscellaneous
7 (4) Device Drivers, network protocols
(8) System administration
Man is short for "manual", which is short for "Help Manual".
The UNIX online help documentation, the man pages, is famous for the helpless feeling most people get when they try to read them. Many books have been successful because the man pages are so obscure. You should know that man pages exist, but you shouldn't feel bad if you can't figure out what they mean.
One of the biggest drawbacks of the man pages is that the man pages rarely show examples.
Each man page is divided into headings, the most important of which is SYNOPSIS, which shows you how to use the command on a shell's command line.
To get a feel for man pages, try the following commands.
$ man pwd   # This is a very small man page and easy to digest
$ man ls    # This page is long because the ls command is complex
$ man sh    # This very long page is all about the shell
$ man man   # Learn about the man facility itself
The writing style of the man pages is dry, assuming the reader has deep system experience. Don't feel bad if you can't make much sense out of man pages. Most experienced UNIX users can't, either.
What the man pages are good for is to check exactly how a command may be implemented on the particular system you're using. You should know that there are many different versions of UNIX, including many different Linux versions. They all look and work the same except for small details, but when you get stuck, you'll have to check the details, and then you should refer to the man page for the particular command that's got you stumped, because the man pages should explain that version exactly.
In a few cases the same name applies to an entry in more than one section; for instance, the name of a user command may also be the name of a system call. At the command-line you can force a particular entry by specifying its section number as one of the arguments:
man 1 crontab
man 5 crontab
In a terminal window use the info command to read the information page for a command:
info mv
Using the GUI shell, click the K button (lower left corner) to get the K menu and choose help. In the KDE help center window, scroll down the list in the left frame and choose Browse Info Pages.
$ man <cmd>
$ info <cmd>
$ <cmd> --help
$ man bash
$ info bash
$ help
$ help <built-in>
$ ls
...
$ echo $?
0
$ abc
bash: abc: command not found
$ echo $?
127
$ mkdir 2> currenterr
$ more currenterr
mkdir: too few arguments
Try `mkdir --help' for more information.
The kernel treats the characters you type on the keyboard as data coming from a file. The kernel sends data to the display as it would send data to any normal file. The kernel treats the mouse, system beeps, the clock, modems, and printers, all these devices just as ordinary files.
In order to manage getting and sending data between the correct files, the kernel uses numbers to identify the files. These numbers are called "file descriptors." The keyboard has a special file descriptor, number 0; the display has its special file descriptor, number 1. These two numbers have special names, standard input (stdin) and standard output (stdout). When you type, the kernel acts as though it's getting data from a file with file descriptor 0. When the kernel sends data to the display, it simply sends data to file descriptor 1.
For example, the cat program by default gets its input from the keyboard (stdin) and sends its output to the display (stdout). Programs that are designed in this manner are called "filters." If you just type cat on the command line, it will appear that nothing has happened. If you type some more characters, you'll see your output on the screen whenever you hit the Enter key. To quit, you have to send the cat program the EOF character, by typing CTRL-D.
If you open a file, the kernel uses the name you give it, gets its inode number (not the file handle), uses the inode number as an index in the inode table for the correct inode data structure, and discovers where on the disk the contents of the file are stored. Then the kernel creates a file descriptor to represent this open file.
For example, you may want to look at a shell script you're working on. To do so you issue the more command.
$ more chex
The shell examines the command, sees the text string chex is a parameter in the tail and that more is the head of the command line. The shell forks to make a child copy of itself, which execs to overlay the more program. The string chex is in the tail of the command line. The more program asks the kernel to find and open the file named chex. The kernel gets the inode number for chex, looks in the inode table, gets the data from the inode data structure for chex, and finds the place on the hard disk where the contents of chex are stored. The kernel then creates a file descriptor and passes that number to the more command (which is loaded and running in memory).
The more command gets the data from the kernel, using whatever file descriptor number the kernel assigned, and the more command, by default, sends up to 25 lines of the incoming data out to file descriptor 1, the display. The more command continues sending subsequent sets of 25 lines to the display until all the input data has been sent.
When a program has finished working with a file, the kernel closes the file, writing any changes to the file. When another program makes another request for that file, the kernel goes through the same process, but creating a new file descriptor number.
Note that the inode number of a file remains the same, as it is an index into the inode table for information about that file. The file descriptor is a number the kernel uses to keep track of a file that it has opened on behalf of some process.
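File descriptors can be handled directly from the shell. The sketch below uses the exec built-in to open a scratch file on descriptor 3 (an arbitrary choice, assumed to be unused), writes through that descriptor, and closes it again.

```shell
notes=$(mktemp)               # scratch file standing in for any regular file

exec 3> "$notes"              # open the file for writing as descriptor 3
echo "first line"  >&3        # write through descriptor 3 instead of 1
echo "second line" >&3
exec 3>&-                     # close descriptor 3

fd_contents=$(cat "$notes")
echo "$fd_contents"
rm -f "$notes"
```

The file keeps one inode number throughout, while the descriptor exists only between the two exec lines.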
When a user logs in, the pseudo terminal or mingetty program calls the /usr/bin/login program.
After successful username and password identification, the login program calls whatever shell is named in the /etc/passwd file for that user.
The shell is designed to support three always-open file descriptors:
It's possible to capture normal output to a file, although in doing so you can't also read it on the display. (There is the tee command, out of scope for this seminar.)
To redirect output to a file rather than to the standard output, use the following notation:
> >> < 2> >&
When you use redirection, the shell switches file descriptors. For instance, by default the echo command sends the command line tail to the display.
$ echo "hello there" hello there $
The echo command sends its output to a file with file descriptor 1.
If you use redirection, you send the output of a command to a different file descriptor instead of file descriptor 1.
$ echo "hello there" > howdy $
The shell sees the > character after the command and followed by howdy. The rules of syntax dictate that howdy is the name of a file. The shell requests that the kernel either open or create a file named howdy. The kernel does this and passes the file descriptor (for example, file descriptor 568) for howdy to the shell. When the shell gets the file descriptor for howdy, the shell uses it instead of the default, which is file descriptor 1. In other words, the shell substitutes 568 where 1 normally goes, so the output of the command goes to the file howdy instead of to the display.
Note that the > character directs the shell to overwrite any existing information in the file howdy. In other words, the output of the command begins at the first byte of the file, and the file is closed after the last byte of the command is sent.
If you use double redirection, you can append the output of a command to the end of a file, and thus save rather than overwrite what's already there.
$ echo "More greetings" >> howdy
$
This double redirection adds "More greetings" to howdy so that the total contents of howdy are now
hello there
More greetings
The explanation for this is that when the kernel opens a file, the kernel uses a file pointer to point to the place where it should write data.
If you use the > redirect operator, you're telling the kernel to put the file pointer at the beginning of the file and then start writing.
If you use the >> redirect operator, you're telling the kernel to put the file pointer at the end of the file and then start writing.
Wherever the file pointer is, the kernel will begin writing at that location.
When finished writing, the kernel records the position of the last byte written as being the position of the end of the file.
Most commands report errors (such as invalid argument lists in the command-line tail) to standard error (stderr, file descriptor 2), which by default is also the display. You can capture errors to files.
$ mkdir
mkdir: too few arguments
Try 'mkdir --help' for more information
$ :
$ mkdir 2> errs
$
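Normal output and error output can be captured separately or together. The following sketch asks ls about one file that exists and one that does not; the names present, missing, out.txt, err.txt, and both.txt are made up, and the exact wording of the error message varies from system to system.

```shell
demo=$(mktemp -d)
touch "$demo/present"                 # exists; "$demo/missing" does not

# Normal output goes to out.txt (fd 1) and the complaint to err.txt (fd 2).
# "|| true" ignores the failure status ls reports for the missing file.
ls "$demo/present" "$demo/missing" > "$demo/out.txt" 2> "$demo/err.txt" || true

# 2>&1 sends fd 2 wherever fd 1 currently points, so both land in one file.
ls "$demo/present" "$demo/missing" > "$demo/both.txt" 2>&1 || true

out_lines=$(wc -l < "$demo/out.txt")
err_lines=$(wc -l < "$demo/err.txt")
both_lines=$(wc -l < "$demo/both.txt")
echo "stdout lines: $out_lines, stderr lines: $err_lines, combined: $both_lines"
rm -rf "$demo"
```

The order matters in the second command: the > redirection must come before 2>&1, because 2>&1 copies wherever descriptor 1 points at that moment.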
Some commands are designed to get input from the standard input and send their output to the standard output (in from fd 0 and out to fd 1). Programs so designed are called "filters".
To pipe is to connect the output of one process to the input of another, assuming the first process sends its output to stdout and the second gets its input from stdin.
The notation uses the | character.
ls /usr/bin | more
The pipe lets you connect the output of one command, which would normally go to the display, as the input of another command, which would normally get its input from the keyboard.
A good example of using a pipe is in using the set command to dump the Environment to the display. The default output of the set command is the display (stdout). But often, the Environment is so big that the first lines are lost, pushed out of the display area by the last lines. Because the set command uses the standard output device by default, you can pipe its output to the more command (which by default gets its input from the standard input device, the keyboard). Here's the command.
$ set | more
You'll see the first twenty-five (or so) lines of the environment with the more program's --More-- prompt at the bottom left of the screen. Use the <space> character to see the next set of lines. Use the q character to quit.
The shell connects the output of the set command (file descriptor 1) as the input (file descriptor 0) of the more command.
The more command is a filter because it gets its input from stdin and sends its output to stdout. The filtering effect of the more command is to receive all incoming data but release only a few lines at a time, depending on the size of the terminal window and the user's interactivity.
Fortunately, Ken Thompson was squeaky clean. The fundamentals of his design persist today in modern Unix-like operating systems.
One of the foundation principles of the design is that things are done on behalf of a user. A user ID is a necessary attribute for all activities, including files and processes.
Throughout the filesystem namespace, all files have permissions attributes, shown in the last nine characters in the first field.
Access may be restricted to a particular user or to a particular group, depending on how the owner used the chmod, chown, and chgrp commands.
The user who has permissions to open subdirectories can make files, and each is made with that user as the owner.
The permissions of newly made files are determined by the user's default permissions, set by the umask command.
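The effect of umask can be checked on a freshly created file. This is a minimal sketch; the mask value 027 is an arbitrary example, and the work is done in a subshell (parentheses) so the changed mask does not leak into the rest of the session.

```shell
demo=$(mktemp -d)

# The subshell keeps the umask change local to these two commands.
(
    umask 027                         # mask off group write and all "other" bits
    touch "$demo/newfile"             # touch requests rw-rw-rw- (666)
)

new_perms=$(ls -l "$demo/newfile" | cut -c1-10)
echo "$new_perms"
rm -rf "$demo"
```

The mask removes bits from what a program requests: 666 with 027 masked off leaves 640.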
Because a new process begins life having been forked from a parent process, the new process inherits whatever files the parent has had open. Hopefully the parent has the good manners to wait until the child dies before accessing its open files, otherwise there could be contention.
Daemon processes run with respect to their own (weirdly named) users (for example httpd runs on behalf of a user named apache).
A child shell inherits stdin, stdout, and stderr from its parent, and therefore works with the user's I/O.
A child shell inherits the user ID of its parent.
$ echo $SHLVL
2
$ sh
$ echo $SHLVL
3
$ exit
$ echo $SHLVL
2
Try making a local variable and testing its existence in a child shell.
$ echo $SHLVL
2
$ TRIAL=guilty
$ echo $TRIAL
guilty
$ sh
$ echo $SHLVL
3
$ echo $TRIAL

$ exit
$ echo $SHLVL
2
$ echo $TRIAL
guilty
You can force a child shell to inherit a local variable through use of the export command, one of the shell's built-in commands.
$ export
...
The shell maintains an export table, the contents of which you can see when you use the export command with no arguments.
Assume the previous local variable TRIAL exists. Here's how to export it for child shell use.
$ echo $SHLVL
2
$ echo $TRIAL
guilty
$ export TRIAL
$ export
...
declare -x TRIAL="guilty"
...
$ sh
$ echo $SHLVL
3
$ echo $TRIAL
guilty
$ TRIAL=innocent
$ echo $TRIAL
innocent
$ exit
$ echo $SHLVL
2
$ echo $TRIAL
guilty
The TRIAL variable has a value of guilty as seen by the echo $TRIAL command and in the export table. In the child shell you can change the value of the TRIAL variable to innocent, but after terminating the child shell you see the value of TRIAL in the parent shell remains guilty.
Inheritance goes down but not up.
family/
gramps/papa/{sonny,baby}
gramma/mama/{lily,millie}
Use the mkdir command to make a directory tree.
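One possible way to build the family tree is sketched below. It works in a scratch directory (an assumption made so the example cleans up after itself) and relies on the -p option of mkdir, which creates any missing parent directories along each path.

```shell
base=$(mktemp -d)                     # scratch location; any directory will do

# -p creates missing parent directories along each path.
mkdir -p "$base/family/gramps/papa/sonny" \
         "$base/family/gramps/papa/baby" \
         "$base/family/gramma/mama/lily" \
         "$base/family/gramma/mama/millie"

# List every directory that now exists under family.
tree_listing=$(cd "$base" && find family -type d | sort)
echo "$tree_listing"
rm -rf "$base"
```

In bash, brace expansion can shorten the four paths to a single argument: mkdir -p family/{gramps/papa/{sonny,baby},gramma/mama/{lily,millie}}.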
Each line of a shell script must conform to the rules of the shell for which it is written. In short, in learning to create shell scripts you are learning UNIX commands, the rules of the Bourne shell, and the essentials of procedural programming.
To create reasonable shell scripts you must know the basics of how a computer works, the commands and other executable programs you can use in your shell script, and the rules of the shell for which you are designing your shell script.
Set up a work area. Make the subdirectories /home/knoppix/work and /home/knoppix/bin, and perhaps /home/knoppix/trash.
Make /home/knoppix/work the current directory when you are creating and testing your shell scripts.
If you're disgusted with a shell script, move it to /home/knoppix/trash for safe keeping. If you use the rm command to delete it, it'll be gone forever.
After you've completed your shell script, move it to the /home/knoppix/bin directory, which is your storage area for your own commands. If you're working in a Unix environment, if the community etiquette permits, snoop around in other users' home directories to inspect their bin/ holdings. Again, if this is acceptable to others, you can learn a lot.
Your primary tool is a text editor. For this seminar use KWrite (in the GUI, KDE menu>Editors>KWrite) or the vi editor (in a terminal window--a quick vi how-to document is on the floppy disk).
The best coders--the geniuses among us--tend to write extremely simple, clear code. Neatness and good form really help you write bug-free code. Here is a stub:
#!/bin/sh
# name-of-this-shell-script date yourname
# brief explanation of the purpose of this program
# no claim of genius is made here
echo "your commands go here"
# End of Script
The above script uses the # character as the comment delimiter. When the shell sees the # character, it ignores that character and everything to the right of it on that line.
There is an art to commenting. Generally fewer comments are better. Be sure to include the date you wrote the shell script, the date you modified it, your name, and a brief explanation of the purpose of the script. Be sure to comment any "magic lines" (lines that are complex or that use little-known commands or notation).
#!/bin/sh
# falling 20051109 jim
# exemplifies fallthrough: the first command runs first
# the second second, and so on. After the last command
# runs, the shell script dies, returning to its parent.
echo "I am falling"
echo "your current directory is"
pwd
echo "the files in the current directory are"
ls
echo "goodbye"
# end of script
You can create your own variables.
There is a notational issue to spelling variable names. To some degree the spelling conventions are specific to a particular community, so check with other engineers and managers to determine conventions of your enterprise.
Use the set | more command and notice that for Knoppix the environment and shell variables are all uppercase. The function names are all lower case, and most of them begin with the _ character.
#!/bin/bash
# nowyouseeum 20051109 jim
# exercises variable scope
# code should be clean and simple
myvar=jim
clear
echo "
Your user ID is $UID
Your logname is $LOGNAME
You are using the $HOSTNAME machine
and $BASH version $BASH_VERSION
The current working directory is $PWD
"
echo "Is your name myvar or \$myvar?"
echo "
Is your name $myvar or something else?
"
echo "goodbye again"
# end of script
You can use positional parameters in your shell script. For example, when the shell loads your script to run, it assigns $0 to have the value of the name of the script. The shell assigns $1 the value of the next parameter you type on the command line when you run your script. The $2 positional parameter gets assigned the value of the next parameter you entered, and so on.
You can design your shell script to use those positional parameters, for example:
#!/bin/sh
# greet 20051112 jim
# inane exercise of positional parameters
echo "Hi $1 and $2"
Run it.
$ ./greet me you
Hi me and you
The shell loads the greet shell script, assigns the value of ./greet for $0, me for $1, and you for $2.
$ ./greet bud marina
Hi bud and marina
Change the echo statement in your greet script
echo "Hi $2 and $1"
Typing ./greet me you at the command prompt produces
Hi you and me
Notation for positional parameters includes
$0 which stores the name of your shell script
$1...$9 which store command-line arguments
$* all arguments in the tail (quoted as "$*", they are joined into one string)
$@ all arguments in the tail (quoted as "$@", each stays a separate word)
$# the number of arguments in the tail
The shift built-in command discards the first argument from the entire command-line tail with the effect that the former second argument is now the first, the third becomes the second, and so on.
#!/bin/bash
# shifty 20051112 jim
# demonstrates the effect of the shift built-in command
echo There are $# arguments $*
echo "Press the Enter key"
read
shift
echo There are $# arguments $*
echo "Press the Enter key"
read
shift
echo There are $# arguments $*
echo "Press the Enter key"
read
shift
echo There are $# arguments $*
echo "Press the Enter key"
read
Type it in, use the chmod 755 shifty command to make it executable, and run it:
$ chmod 755 shifty
$ ./shifty a b c d e
While you're developing your shell script, include an echo command that dumps the command line tail.
echo "the tail is $* "
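The difference between $* and $@ only shows when they are quoted, and it is easy to see by counting. The following is a sketch; count_args and show_tail are made-up helper functions, not part of the seminar scripts, and functions receive their own positional parameters just as scripts do.

```shell
# Sketch: count_args is a hypothetical helper that reports how many
# arguments it received.
count_args() {
    echo $#
}
show_tail() {
    star=$(count_args "$*")   # "$*" joins the whole tail into one word
    at=$(count_args "$@")     # "$@" keeps each argument as its own word
    shift                     # discard the first argument
    rest="$*"
    echo "star=$star at=$at rest=$rest"
}
result=$(show_tail "a b" c d)
echo "$result"
```

Called with the three arguments "a b", c, and d, the sketch reports star=1 at=3 rest=c d: one word from "$*", three from "$@", and two arguments left after the shift.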
The shell supports two branching keywords
if case
The shell supports three looping keywords
for while until
The shell supports two testing commands
test [
Many shell scripts use the test command, many use the [ command, and because the [ command is a little more cryptic (and possibly more common, at least in supplied system shell scripts), this discussion on testing focuses on the [ command. The discussion for the test command is identical.
The [ command is a command, and like all other commands follows the same command-line rules. Your job is to learn usage for the [ command, i.e. what kinds of arguments it allows.
The [ command has options and some defined arguments that are operators used in logical expressions. Look at a simple use of the [ command.
[ "this" = "that" ]
Count the parameters in the above expression. There are five, each separated by a space character.
The expression is a command. Parameter 0, the head of the command-line, is the [ command, which has four arguments in its tail:
"this" = "that" ]
The command reads something like this: "Hey [ command, wake up, your tail is a string followed by the = comparison operator, followed by a string; the final ] character tells you that's the end of your work."
The [ command sees an initial string and therefore expects the second argument to be some kind of operator. The = operator fills the bill and dictates that there must be another string following, which is the case. Because the ] termination character follows the second string, the expression that the [ command has to evaluate is whether the first string is identical to the second.
If the two strings are identical, the [ command exits, returning an exit status of 0 (which has the meaning of TRUE or "no problems, boss").
If the two strings are at all different, the [ command returns a non-zero value (which has the meaning of FALSE, or "hey, something's up!").
In the above example, the strings are different, so the resolution of the expression is FALSE.
In general use, the [ command is used to test if the value of some variable is identical to some string:
[ "$LOGNAME" = "knoppix" ]
[ "$SHELL" = "/bin/bash" ]
[ "$#" = "0" ]
[ "$1" != "" ]
Note the final example above uses the != not-equals operator. If the strings are at all different, the [ command returns TRUE; if the strings are identical, the [ command returns FALSE.
The [ command takes a set of options (use the help [ and help test commands to see).
[ -z "$1" ]
In the above expression, the [ command returns TRUE if there is no first argument in the shell script's command-line tail. The [ command sees its first argument as -z which dictates that the subsequent argument must be a string and that these two arguments are tightly bound and must be immediately resolved.
$ [ -z "this" ]
$ echo $?
1
$ [ -z "" ]
$ echo $?
0
The [ command has other options, many of which have to do with testing files.
[ -f "$1" ]
This very commonly used -f option must be followed by a string; the [ command returns TRUE if the string is the name of an existing regular file. This use tests that the first argument in the shell script's tail is the name of such a file.
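Here is a sketch of the -f test tried against three kinds of names. The scratch directory and the name notes are made up for the demo, and the sketch leans on the if keyword, which this outline covers further on.

```shell
# Sketch: -f is TRUE only when the string names an existing regular file.
scratch=$(mktemp -d)          # scratch directory, made up for this demo
touch "$scratch/notes"        # an ordinary file
if [ -f "$scratch/notes" ] ; then isfile=yes ; else isfile=no ; fi
if [ -f "$scratch" ] ; then dirisfile=yes ; else dirisfile=no ; fi
if [ -f "$scratch/missing" ] ; then missingisfile=yes ; else missingisfile=no ; fi
echo "notes: $isfile, directory: $dirisfile, missing name: $missingisfile"
rm -r "$scratch"
```

Only the ordinary file passes; a directory and a name that does not exist both test FALSE.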
The [ command's options include logical AND and OR operators.
[ -z "$1" -a -z "$2" ]
How many arguments does the [ command receive above? The answer is six. The first (on the left) -z followed by "$1" must be resolved immediately (TRUE if there is no first argument on the shell script's command-line tail). The -a option indicates that there is a subsequent expression to the right. The -z "$2" expression must likewise be resolved to either TRUE or FALSE.
If both the left and right expressions are TRUE, then the [ command returns TRUE. If either is FALSE, then the [ command returns FALSE.
Note the following commonly used performance rules of resolution:
For an AND expression (which has a left and right expression to test), if the left expression tests FALSE, do not bother to resolve the right expression.
For an OR expression (also with a left and right expression to test), if the left expression tests TRUE, do not bother to resolve the right expression.
In the above example, if there is no first argument, there cannot be a second argument. That there is a first argument is no guarantee that the user typed in a second argument.
[ "$UID" = "0" -o "$LOGNAME" = "knoppix" ]
(Count the arguments above.)
The above OR expression has as its left expression "$UID" = "0" which is TRUE if the current user is the system root user. If so, there's no need to bother resolving the right expression. If not, then evaluate the right expression "$LOGNAME" = "knoppix" which is TRUE if the current user has the username of knoppix.
If either is TRUE, return TRUE. If neither is TRUE, then return FALSE.
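To try the AND and OR options without depending on who you are logged in as, here is a sketch that uses two made-up variables, uid and logname, in place of $UID and $LOGNAME. It also uses the if keyword, which is explained next.

```shell
# Sketch: -a needs both sides TRUE, -o needs only one.
uid=1000            # made-up values so the result does not depend
logname=knoppix     # on who runs the demo
if [ "$uid" = "0" -o "$logname" = "knoppix" ] ; then either=TRUE ; else either=FALSE ; fi
if [ "$uid" = "0" -a "$logname" = "knoppix" ] ; then both=TRUE ; else both=FALSE ; fi
echo "OR gives $either, AND gives $both"
```

With these values the OR test is TRUE and the AND test is FALSE. Change uid to 0 and both become TRUE.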
The if keyword has a set of rules. The if keyword must be followed by some complete command that returns an exit status, which must be followed by the then keyword, which must be followed by at least one complete command or more, which must be followed by either the fi keyword or the elif keyword or the else keyword (more to come on elif and else, ignore for now).
$ if [ -z "" ] ; then echo $? ; fi
0
Note that the ; semicolon character is a statement (command) terminator. The above is the same as
$ if [ -z "" ]
> then echo $?
> fi
The most common notational style in shell scripts is as follows:
if [ -z "" ]; then
    echo $?
    echo "goombye"
fi
The elif keyword allows subsequent branching. The else keyword allows one final, usually default, branch.
if [ "$UID" = "0" ] ; then
    echo "all hail"
    exit 0
elif [ "$LOGNAME" = "knoppix" ] ; then
    echo "what do you want"
    exit 0
else
    echo "leave me alone"
    exit 1
fi
An if statement with no else or elif constructs executes the commands between the then and fi keywords if the test is TRUE. The default action is to do nothing. In other words, the default action occurs if the test is FALSE.
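To watch all three branches fire without logging in as three different users, you can wrap the construct in a function and feed it names. This is a sketch; welcome is a made-up function name and the exit commands are left out so that the demo keeps running.

```shell
# Sketch: welcome is a hypothetical function that branches on a name.
welcome() {
    if [ "$1" = "root" ] ; then
        echo "all hail"
    elif [ "$1" = "knoppix" ] ; then
        echo "what do you want"
    else
        echo "leave me alone"
    fi
}
welcome root
welcome knoppix
welcome stranger
```

Each call takes exactly one branch: the first test that is TRUE wins, and the else branch catches everything that is left.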
filetime.sh ---------------------------------------
#!/bin/sh
# filetime.sh 20031015 jim
# report timestamp of file named in arg 1
if [ "$1" = "" ]; then
    echo "Usage: $0 FILENAME"
    exit 1
fi
if [ -f "$1" ]; then
    # the byte columns depend on the layout of ls -l on your system
    ftime=`ls -l "$1" | cut -b 42-55`
    echo "The file $1 was last modified on $ftime"
else
    echo "Cannot find $1"
fi
# End of filetime.sh
The above script tests if there is at least one
argument in the shell script's tail. If not, the
shell script issues a usage statement and exits.
If so, the script tests if the argument is a
filename, and if so, it captures the time the
file was last modified. If the argument is not
a filename, the script default is to say so.
Modify the above shell script to test for exactly one argument.
The reason for using the if keyword for branching is its flexibility of tests. Any of the examples in the Testing section can be used after the if keyword. Other programs can also be used:
if cmp "$1" "$2" ; then
    echo "$1 and $2 have identical contents"
fi
Use the help if command to see the syntax for the if keyword.
The case keyword has its own set of rules and its own appropriate purpose.
As to the rules for the case keyword, the following explanation is simplified.
The case keyword must be followed by a value (WORD in the on-line syntax explanation), which must be followed by the in keyword. After the in keyword comes a set of values, each of which is associated with one or more commands followed by the double ;; semicolon. The optional * ) case lets you specify one or more commands as a default. If you do not create such a case, there is no default action. The entire case construct is terminated with the esac keyword.
case $LOGNAME in
    "root" )
        echo "you should not work as root"
        ~/bin/callpolice
        exit 1
        ;;
    "knoppix" )
        echo "hi there"
        ~/bin/callmom
        exit 0
        ;;
    * )
        echo "who are you?"
        ~/bin/callpolice
        ~/bin/stallthem
        ;;
esac
Use the case branching construct when you're testing
possible values of a variable. Common advice is to
design your shell script to use the case construct
if you can for the reason that it's faster. It's also
more likely to be bug-free and easier to maintain.
case $LOGNAME in
    "root" | "knoppix" )
        echo "I know you"
        exit 0
        ;;
    * )
        echo "Unknown user: $LOGNAME" >> /var/log/mylog
        exit 1
        ;;
esac
The above construct tests two cases: one case is if
the user is either root or knoppix; the other case is
that the user is neither. The example exercises the
use of the | or operator to allow multiple values as
a single case.
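The same idea can be tried on values you choose yourself. In this sketch classify is a made-up function, and the empty string is added as a case of its own to show that a case can match nothing at all.

```shell
# Sketch: classify is a hypothetical function; the case construct
# matches its first argument against each list of values in turn.
classify() {
    case "$1" in
        "root" | "knoppix" )
            echo "known"
            ;;
        "" )
            echo "empty"
            ;;
        * )
            echo "unknown"
            ;;
    esac
}
classify root
classify knoppix
classify ""
classify apache
```

The four calls print known, known, empty, and unknown. Only the first matching case runs; the * ) case is reached only when nothing above it matched.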
The shell has three looping keywords, for, while, and until. The while and until keywords are identical in their use except in how they respond to the test: the while loop continues while the test is TRUE; the until loop continues until the test is TRUE.
The rules for the for keyword require that it be followed by an identifier to be used in the loop body as the loop variable, which is followed by the in keyword, followed by a list of WORDS, followed by the do keyword, then a set of one or more commands, and finally the done keyword.
The test is that the list of words between the in keyword and the do keyword is not empty.
for i in A B C D E
do
    echo $i
done
Generally the list is not exactly known to the programmer at the time of writing the script but is generated at runtime. Probably the most common list is the files in some directory.
for i in $SOURCE_DIR/*
do
    if [ -x "$i" ] ; then
        mv "$i" $TARGET_EXES
    elif [ "`echo $i | cut -d'.' -f2`" = "config" ] ; then
        mv "$i" /etc
    elif [ -d "$i" ] ; then
        mv "$i" /usr/share
    else
        mkdir -p ./Weirdos
        echo "Got a weird file: $i" | tee -a /var/log/mylog
        mv "$i" ./Weirdos
    fi
done
Here's a for loop for all files in the current directory:
for i in *
do
    echo "yow"
done
The list might consist of all arguments in the shell script's tail, as in the following inane example:
for i in $*
do
    if [ -f "$i" ] ; then
        echo "$i is a file"
    elif [ -d "$i" ] ; then
        echo "$i is a directory"
    else
        echo "who knows?"
    fi
done
Use the for construct for a predictable (at runtime) number of items that can be listed.
The while keyword (and its near-twin until) is used with tests that generate unpredictable TRUE or FALSE results or that require elaborate expressions. Note that any program's exit status can be used as the test condition.
num=0
limit=9
while [ $num -lt $limit ]
do
    echo $num
    num=`expr $num + 1`
done
echo "boo, all done"
The above while loop tests that the value of num is less than limit. The body of the loop adds 1 to the value of num. When the value of num is the same as the value of limit, the test fails and the while loop stops, falling through to whatever commands are below the done keyword.
If counting is all that's needed, it may be easier, certainly more appropriate, to use a for loop:
# hardcode the list
for i in 1 2 3 4 5 6 7 8
do
    echo $i
done
# or use the output of the seq command
for i in `seq 1 8`
do
    echo $i
done
As a general rule, try to design your shell scripts to use for loops rather than while loops (don't kill yourself jumping through hoops, either).
It's easy to create tests that fail all the time, in which case the loop body never executes. It's also easy to create tests that never fail, in which case the loop body continues to execute eternally (or until someone notices and uses CTL-C to kill it).
The until keyword is identical to the while keyword except that it runs the loop body while its test is FALSE.
you=notgoofy
until [ "$you" = "goofy" ]
do
    echo -n "Who are you?"
    read you
done
echo "hi $you"
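Because any command's exit status can serve as the loop test, a while loop is a natural fit for reading input until it runs out. The following is a sketch: the three input lines are made up and fed to the loop from a here-document so that the demo needs no keyboard.

```shell
# Sketch: read returns a non-zero exit status at end of input,
# which is what stops the while loop.
count=0
while read line
do
    count=`expr $count + 1`
    echo "line $count: $line"
done <<EOF
first
second
third
EOF
echo "read $count lines"
```

The loop body runs once per line of input and then falls through to the final echo, which reports three lines.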
Environment (and shell and your own local
inherited) variables include
$LOGNAME $UID $SHELL $BASH and more.
Use the set | more command to see them all.
To read the contents of a file into a variable, assign the output of the cat command to the variable
indat=`cat ./indat`
To capture the report of a command, assign the output of the command to a variable.
thisdir=`ls`
You can use the output of the command as the list for a for loop
for i in `ls`
do
    echo $i
done
You can prompt the user for input, but you'll have the problem of working with typos and other user confusion.
Use the echo command to tell the user what to type in.
Use the read command to capture the user's input to one or more variables. Best not to use more than one variable with the read command.
whoME.sh -------------------------------------------
#!/bin/sh
# whoME.sh 20031015 jim
# interactive script compares input with value of LOGNAME
echo "$0 running "
ME="misterblister"
until [ "$ME" = "$LOGNAME" ]
do
    echo -n "who are you? "
    read ME
done
echo "hi, $ME"
echo "Good bye!"
# End of whoME.sh
Modify the above script to allow only up to four attempts.
Modify the above script to get both the user's first and last names only. Try using the
read firstname lastname
command and see what trouble you can get into.
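One kind of trouble shows up as soon as the user types more words than there are variables. This sketch feeds read a made-up three-word name from a here-document in place of the keyboard.

```shell
# Sketch: with two variables, read gives the first word to the first
# variable and everything left over to the last one.
read firstname lastname <<EOF
Mary Ann Smith
EOF
echo "first=[$firstname] last=[$lastname]"
```

The result is first=[Mary] last=[Ann Smith]: the middle name lands in lastname. If the user types only one word, lastname ends up empty.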
You can create functions for your environment or within your shell script. You can create a shell script that has only functions, those that you've written (or stolen) and that can be loaded by those of your shell scripts that need them.
#!/bin/bash
# boo.sh 20051109 jim
# silly shell script that exercises functions
function announce ()
{
echo $*
return 81
}
announce presenting the incredible announcement
if [ "$?" = 81 ] ;
then
echo "It's alive!"
else
echo "Hmmph"
fi
# end of script
Your functions have to be defined at the top of
your shell script above the code that calls them.
A function takes a variable number of arguments. Within the code block of the function you access inbound parameters with the same positional parameter notation as used in the shell scripts themselves.
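Functions use the return built-in command to return their exit status. Be careful in choosing exit status values: a status must fit in the range 0 to 255, and the shell itself uses some values (for example 126, 127, and values above 128 for commands killed by signals).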
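Because a function returns an exit status, it can stand directly after the if keyword like any other command. Here is a sketch; is_big and the threshold of 100 are made up for the demo.

```shell
# Sketch: is_big is a hypothetical function. Inside it, $1 is the
# function's own first argument, not the script's.
is_big() {
    if [ "$1" -gt 100 ] ; then
        return 0      # TRUE
    else
        return 1      # FALSE
    fi
}
if is_big 250 ; then first=big ; else first=small ; fi
if is_big 7 ; then second=big ; else second=small ; fi
echo "250 is $first, 7 is $second"
```

The sketch reports that 250 is big and 7 is small. Sticking to 0 for TRUE and a small non-zero value for FALSE keeps clear of the values the shell uses for itself.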
Your shell script can capture signals and determine response behavior.
#!/bin/bash
# catchme.sh 20051109 jim
# yass that exercises traps
echo "It's me, $0"
while [ "1" = "1" ]
do
    echo "...in the loop..."
    read
    trap 'echo "no way"' 2
done
# end of catchme
The above trap disallows using CTL-C to terminate the shell script as it runs.
Use a separate terminal window, run the ps aux command,
note the PID for the shell script catchme, and issue
the kill -15 command with that PID.
The above script has problems, though, as the trap
construct generates warning messages. Functions can
solve the problem.
Rewrite the catchme shell script so that it defines
a function at the top of the script, the body of
which has the command
echo "no way"
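As a sketch of the general shape (not the finished catchme script), the following shows a trap whose action is a function. The names refuse and report are made up, and the USR1 signal stands in for CTL-C (signal 2) so that the demo can signal itself and needs no keyboard.

```shell
# Sketch: the child shell traps USR1 with a function, signals itself,
# and carries on. USR1 stands in for CTL-C (signal 2) so the demo
# needs no keyboard.
report=$(sh -c '
    refuse() {
        echo "no way"
    }
    trap refuse USR1
    kill -s USR1 $$
    echo "still running"
')
echo "$report"
```

The child prints the function's message when the signal arrives and then continues with its next command. Note that the trap is set once, before the work begins, rather than inside a loop.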
Appendix: Quick Reference
concepts and terms
commonly-used built-in commands:
cd echo exit export help kill pwd set ulimit umask unset
flow control built-in commands:
break case echo for if read shift test until wait while
commonly-used external commands in /bin :
basename cat chgrp chmod chown cp cut date
grep hostname kill ln ls mkdir more mv ping
ps rm rmdir sh sleep sort su
Categories of /bin commands:
shells
ash bash bsd-csh csh sh tcsh
filesystem
chgrp chmod chown cp cpio dd df ln ls mkdir
mknod mount mv pwd rm rmdir sync touch umount
file
cat dd gunzip gzip tar uncompress
text manipulation
cat echo ed egrep elvis* fgrep grep more sed vi
process
kernelversion kill ps sleep
network
dnsdomainname hostname ip netstat ping
system management
arch date dmesg stty su uname
commonly-used external commands in /usr/bin :
du wc cmp who diff expr find head tail uniq
clear which tar gzip and gunzip vi
process management commands:
ps kill nice jobs fg bg
filesystem commands:
ls cd pwd cp mv mkdir rmdir rm ln du df sync
Text manipulation commands:
more cat wc head tail cut grep cmp diff sort uniq vi sed
commonly-used environment variables:
HISTSIZE, HOME, HOSTNAME, IFS,
LOGNAME, MAIL, PATH, PWD,
PS1 to 4, SHELL, SHLVL, TERM, USER, _
commonly-used shell variables:
BASH, BASH_VERSION, COLUMNS, EUID, HISTFILE, HOSTTYPE,
LINES, MACHTYPE, PPID, OSTYPE, UID
special characters:
comment delimiter: #
command-line terminators: \n ; | & && || > >> < 2> &>
quoting characters: " ' \
wildcards: * ?
lists {M,N,O}{a,b,c}
tilde expansion: ~ (the current user)
variable expansion: $