[Sig] Presentations on Shell Scripting

Randy Kramer rhkramer@fast.net
Thu, 24 May 2001 11:07:45 -0400


This started out by trying to summarize what has been offered or
requested so far.  Then I went off on a tangent, attempting to show the
Concept --> Design --> Pseudocode --> Program process for a script.  It
is not finished, as I am in the midst of the Design --> Pseudocode steps
and am seeking comments from my colleagues (everyone on the list) and
the customer (Dan).

It is written pretty roughly; consider it typed on the back of a napkin,
because we are in the "back of a napkin" phase of Design.

After some feedback, I may continue (or someone else may pick up the
ball).

At least two volunteers:

Brian Hechinger
Paul Ryan

And several suggested topics:

-Bourne (bash?) scripts (Brian?)
-Korn scripts (Paul, slides from 3 day course)
-Perl scripts (Paul?)
-Vi (sort of unrelated) (Paul)
-How to Write a Script (requested, no volunteer)
-Explain selected scripts (rc.d, etc.) (suggested, no volunteer)
-Write requested scripts, possibly in more than one shell for comparison
(suggested, no volunteer)
-(Paul has written one script in response to Dan's suggestion.  Dan --
do you see that as meeting your "needs", or were you looking for a script
that extracts the file names from an ls of the directory and the song
names from the separate file, and rewrites the file names?  Can that be
done?  I think so, maybe with Perl, but we need to know more about how
the song names are stored in the file: are they in order by track
number, and do they include the track number?)

<beginning of Major Aside and Digression -- look for end marker near end
of email>

Discussing this leads into "How to Write a Script" -- Dan has put forth
a Concept (or desire) and provided some information about the available
inputs and desired outputs.

Paul has provided a script that confirms there is one way to do at least
part of the task.

I have questioned (in the list above) whether that is the entire task,
and raised a few questions that require more knowledge of the available
inputs and more knowledge of the desired results -- we are engaged in
Design and Paul has provided more than Pseudocode to address a portion
(or all) of the task.  (But, in another sense it is pseudocode -- if we
eventually need to switch to Perl to do the entire task, Paul's ksh code
would have to be rewritten in Perl, or incorporated in the Perl
somehow.)

Inventing a concept (or desire, or requirement) is either the toughest
or easiest part of the task, and I'm not going to talk about that in
this note -- in this case we were handed a requirement, request, or
concept.

Once you have a concept (or requirement), I try to gather information
about the available inputs, the desired outputs, and the available
tools.  The more I program in a given language, the more I know about
the available tools, so that, after a while, I don't have to spend a lot
of time investigating the available tools but know many of them from
past experience (still, I want to watch for new tools that might be
helpful).

Some concepts or requirements might be simple enough that you go from
the inputs to the outputs in one step (one command, or one set of
related commands).  An example -- say I want to know what files are in a
given directory -- if I know the name of that directory, I can get the
list in one step using ls.  
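As a small sketch of that one-step case (the directory here is a throwaway stand-in for a real album directory, and the file names are the hypothetical ones from Dan's example), listing the files, or saving the list to a file for later steps, is one command each:

```shell
#!/bin/bash
# A temporary directory stands in for the (hypothetical) album directory.
album_dir=$(mktemp -d)
touch "$album_dir/ledzeplin track1 blackdog .MP3" \
      "$album_dir/ledzeplin track2 blackdog .MP3"

# One step: list the files in the directory.
ls "$album_dir"

# Also one step: save the list to a file (one name per line, because the
# output is not a terminal).  The list is written outside the directory
# so that it does not show up in its own listing.
ls "$album_dir" > "$album_dir.list"
```

The shell creates the output file before ls runs, which is why writing the list inside the same directory would make it appear in the listing.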

If I can't see how to achieve the result in one step, I have to think
about what steps I could use to get from the inputs to the desired
outputs.  There is always more than one possible set of steps to
accomplish this (especially if you consider more than one programming
language).  

Your experience (or your experience plus the experience of your
colleagues) and ancillary requirements (like how much computer power is
available, or whether there are limits on the time, memory, bandwidth, or
whatever else the task may use) will influence how much time and effort
you put into thinking of and considering alternate approaches.  If you
know there is an ls command, you have a computer that runs the ls
command, the outputs of the ls command meet your needs, and it works
fast enough, it is probably not worth looking for an alternative.  (If
somebody tells you there is an alternative, and it is faster, easier,
the output is nicer, or whatever, you might or might not investigate it
further, depending on your available time, or whether you or your
"customer" is satisfied with the current results.)

Taking Dan's request as an example, I start making notes.  I list the
inputs available and the outputs desired.  My first set of notes may be
very general (and ugly); then I start to fill in the blanks.

Inputs: 
1. (directory of) mp3 files named: "ledzeplin track1 blackdog .MP3"
2. File containing songtitle for each track

Outputs:
1. (directory of) renamed mp3 files like
"ledzeplin__blackdog_songtitle_track#.mp3"

What do I know, what don't I know, what do I think I can do
(brainstorming):
1. Hmm, I can get a list of the mp3 file names using ls
2. Hmm, I could even put that list in a file if helpful, using ls > file
3. Hmm, if I have the file containing songtitles, how do I know which
songtitle goes with which track?  I need to ask Dan or get a sample
file.
4. Hmm, suppose I knew which songtitle went with which track, do I know
how to convert "ledzeplin track1 blackdog .MP3" plus "songtitle" to
"ledzeplin__blackdog_songtitle_track#.mp3"?  Well, my colleague Paul
Ryan jumped in at this point, and demonstrated one way to do it.  Do I
need to consider other ways?  Well, for now I won't, but, as with
everything else, we may recognize reasons to reconsider later.
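Paul's actual code isn't included in this note, so here is a stand-in sketch of step 4, assuming (hypothetically) that every old name is three space-separated words followed by " .MP3", and that spaces in the songtitle should become underscores:

```shell
#!/bin/bash
# new_name OLD_NAME SONGTITLE
# Hypothetical helper, not Paul's script.  Assumes names shaped like
# "ledzeplin track1 blackdog .MP3" and builds
# "ledzeplin__blackdog_<songtitle>_track1.mp3".
new_name() {
    local old=$1 title=$2
    local artist track album rest
    # Split the old name on whitespace into its words.
    read -r artist track album rest <<< "$old"
    # Replace spaces in the song title with underscores.
    title=${title// /_}
    printf '%s__%s_%s_%s.mp3\n' "$artist" "$album" "$title" "$track"
}

new_name "ledzeplin track1 blackdog .MP3" "Black Dog"
# prints: ledzeplin__blackdog_Black_Dog_track1.mp3
```

If Dan's real names don't follow this word layout, only the read line and the printf format need to change.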

At this point, I might wait for information from Dan, or I might make
some assumptions just to keep moving.  (That effort could be totally
wasted, or it could be useful if the assumptions turn out to be good, or
simply as a way to develop options.  If you have other things to do, it
is probably appropriate to do those other things and come back when you
have more information.  It is also good to let your subconscious mind
have a chance to work on the problem.)

Let's assume Dan comes back and says either of two things:

1. The file is a list of songtitles, in order by track number, one
songtitle per line (ASCII text file, etc. etc.)  (We might also need to
tell Dan that, if that's all the information in the file, and that file
is in the wrong order, we can do nothing but give the wrong results.  We
might do some error checking to the extent we can -- if we find 11 mp3
files in a directory, but some number other than eleven song titles in
the file, we know something is wrong.)
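The count check could be sketched like this, assuming (hypothetically) that the mp3 files end in ".MP3" and the songtitle file has one title per line:

```shell
#!/bin/bash
# check_counts DIR TITLES_FILE
# Sanity check: the number of .MP3 files in DIR should equal the number
# of lines (song titles) in TITLES_FILE.  Names are hypothetical.
check_counts() {
    local dir=$1 titles=$2
    local files=("$dir"/*.MP3)
    local n_files=0 n_titles
    # If nothing matched, the glob stays literal; only count real files.
    [ -e "${files[0]}" ] && n_files=${#files[@]}
    n_titles=$(wc -l < "$titles")
    if [ "$n_files" -ne "$n_titles" ]; then
        echo "mismatch: $n_files mp3 files but $n_titles song titles" >&2
        return 1
    fi
}

# Example run on throwaway test data.
demo=$(mktemp -d)
touch "$demo/ledzeplin track1 blackdog .MP3" "$demo/ledzeplin track2 blackdog .MP3"
printf 'Black Dog\nRock and Roll\n' > "$demo/titles.txt"
check_counts "$demo" "$demo/titles.txt" && echo "counts match"
```

Note that wc -l counts newline characters, so a last title with no trailing newline would not be counted.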

(We have similar potential problems and can do similar error checks with
the next file format.  We might also ask Dan how critical this is,
should it be checked somehow, how can it be checked.  Maybe this file is
produced by another computer program and there is something like an md5
checksum that can be checked to confirm that the file produced by the
first program is the file we are working with (but of course, how do we
know the first program worked correctly?).  Maybe the outputs of our
script should be presented to the operator so he can visually confirm or
correct them?
songtitles.  If the program is for flying an airplane or a space
shuttle, it is probably underkill.)
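If the songtitle file did come from another program, the checksum idea might look like this with GNU md5sum (as shipped on Linux).  It assumes, hypothetically, that the producing program also writes a titles.txt.md5 file next to the titles:

```shell
#!/bin/bash
# Throwaway working directory with a sample title file.
work=$(mktemp -d)
cd "$work" || exit 1
printf 'Black Dog\nRock and Roll\n' > titles.txt

# Producer side (hypothetical): record the checksum alongside the file.
md5sum titles.txt > titles.txt.md5

# Consumer side: md5sum -c exits non-zero if the file has changed.
if md5sum -c --quiet titles.txt.md5; then
    echo "titles.txt verified"
else
    echo "titles.txt does not match its checksum" >&2
fi
```

As noted above, this only shows that we have the same file the first program wrote, not that the first program got it right.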

2. The file contains all the songtitles, one songtitle per line, with
the track number followed by a period, followed by one or more spaces,
followed by the songtitle (ASCII text file, etc.), but not necessarily
in order by track number.

For best results, we really need to know what is in the file, but we can
imagine that these are probably two realistic possibilities, and we
might realize that we can convert from one to the other if we need to. 
For example, if we get the file with the track numbers, I am sure we can
develop a script to sort the file in order by the track numbers, and
then read each line of the file, strip off the leading number and period
and whitespace, write each songtitle to a new file, one songtitle per
line, songtitles in the proper order (first to last), then rename the
new file to the old file (if necessary).  (I know I can do this in
Pascal, C, Fortran, or any of many other languages.  I suspect I can
find script "commands" that would be easier to use to accomplish the
same results, possibly slower on the computer, but with less effort in
writing the program.)
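For the second file format, the "easier script commands" could be sort and sed.  This sketch assumes each line looks like "10. Ten Years Gone" (number, period, whitespace, title); the sample titles are made up:

```shell
#!/bin/bash
work=$(mktemp -d)
cd "$work" || exit 1
# Sample input in the second format, deliberately out of order.
printf '%s\n' '10. Ten Years Gone' '2.  Rock and Roll' '1. Black Dog' > titles.txt

# Sort numerically on the part before the first period, strip the number,
# the period and any following whitespace, write to a new file, then
# rename the new file over the old one.
sort -t. -k1,1n titles.txt |
    sed 's/^[0-9][0-9]*\.[[:space:]]*//' > titles.tmp &&
    mv titles.tmp titles.txt
cat titles.txt
```

The temporary file is needed because redirecting straight back onto titles.txt would make the shell truncate the input before sort reads it.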

So, let's continue with the assumption that we have a file of
songtitles, one songtitle per line, sorted by track number.

Now, what about the filenames?  Well, I assume the files are in a
directory and know that I can get the names into a file using ls > file
(if that is useful).  (Since the track number is included in the
filename, I know I can sort the file names in order by track number also,
although I'll ignore the details of how to do that for now.  It would
require that the sort routine look at the track number, which is not the
first thing on the line, and the artist names are not likely to all be
the same length, so I need to figure out how to get my sort routine to
sort based on the proper portion of the file name.  And if there can be
more than 9 tracks (i.e., two-digit numbers), is track one labeled "1" or
"01" (or even "0" or "00")?  Until I know the details, and know that I
need to sort the file, I won't worry about it; I'm sure I can do it if
necessary.)  (Is it useful to put the filenames into a file?  I don't
know -- for now, let's move on.)
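One way to handle both problems (the track number sitting mid-name, and "1" versus "01") is to tag each line with its track number, sort on the tag, then drop it.  This sketch assumes, hypothetically, that every name contains "track" followed by digits:

```shell
#!/bin/bash
# sort_by_track: read file names on stdin, print them sorted by the
# number after "track".  Assumes each name contains "track<digits>".
sort_by_track() {
    awk '{
        # Find "track<digits>" and keep just the digits, as a number,
        # so "01" and "1" compare equal.
        n = 0
        if (match($0, /track[0-9]+/)) n = substr($0, RSTART + 5, RLENGTH - 5) + 0
        print n "\t" $0
    }' |
        sort -t $'\t' -k1,1n |   # sort numerically on the tag only
        cut -f2-                 # drop the tag again
}

printf '%s\n' 'ledzeplin track10 blackdog .MP3' \
              'ledzeplin track2 blackdog .MP3' \
              'ledzeplin track01 blackdog .MP3' | sort_by_track
```

A plain alphabetical sort (which is what ls gives you) would put track10 before track2, since it compares the names character by character.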

Ok, so now what else do I know or need to think about?  The end result
is that we want to rename the mp3 files (wherever they are) with new
names that include the song title.  I know that there are shell commands
that let me rename a file.  (In Linux, "mv" is the usual one; it is a
separate program rather than part of bash, so it works from any shell.)
What does the syntax of mv look like?

man would say: mv [OPTION]... SOURCE DEST
(and some alternatives)

To serve my own thinking process, I'll rewrite that as:
mv <old_file_name> <new_file_name>
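Because Dan's file names contain spaces, the quoting matters.  A minimal sketch on a throwaway file (the new name is the hypothetical format used elsewhere in this note):

```shell
#!/bin/bash
work=$(mktemp -d)
cd "$work" || exit 1
touch "ledzeplin track1 blackdog .MP3"

# The quotes keep each name as one argument.  Unquoted, the shell would
# split the old name into four separate arguments.  "--" marks the end
# of options, in case a name ever starts with "-".
mv -- "ledzeplin track1 blackdog .MP3" "ledzeplin__blackdog_Black_Dog_track1.mp3"
ls
```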

So, where do I go from here?

Well, I've convinced myself that there is probably a way to do each of
the little bits that might be required to do the whole process.  

Questions I'll ask myself now (or anytime):

-Can I outline a sequence of steps to do the whole process, using the
little bits?
-Do I see any roadblocks -- I mean, do I think one of these little bits
will be a problem: too difficult, too slow, too something else?  Do I
already recognize a need to develop alternatives to one of these little
bits, or to flesh it out in more detail to think about whether it will
be a problem?

In this case, I'll probably think about developing a sequence of steps:

I may want to:
1. 
2.
3. Move to the directory where the mp3 files are stored.
4. Set up a loop or something to work on each file, one after the
other.  (After I rename the files, will I put them back in this
directory or move them, even temporarily, to another directory?  If I
move them, I know I'm done when there are no more files in this
directory, assuming the directory contains only tracks from one "album".)
5. Get one file's name.  (Not sure how I do that yet, but I suspect
there is a fairly simple way.)
6. Get the track number from that file's name.  (Not sure how I do that
yet, but I'm sure it can be done.)
7. Open the file containing the songtitles, and get the songtitle for
this track.  (Not too hard; I need to decide whether to delete the
songtitle or keep it in the file.  If I were doing the tracks in order,
I might delete the songtitles as I use them, treating the file as a
queue.  But if I'm going to search the file for the nth title, I don't
want to delete songtitles, because that would make it more confusing: if
I removed the 3rd songtitle first, I'd have to remember that the
songtitles for the first and second tracks are still first and second in
the file, but the songtitles for the fourth track and up are now third,
fourth, etc.  So, one thing I want to think about is whether I can read
the mp3 file names in order by track number, or maybe this is a reason
to write the track numbers to a file, sort the file, and then process
the files slightly differently, as I'll describe below.)
8. Create the new file name, by parsing the old file name and adding the
songtitle and the underscores as required (Paul's code).
9. Rename this file, with something like:
mv "old file name.MP3" newdir/old_file_songtitle_name.MP3
(This combines renaming the file with moving it to a different
directory.)
10. Repeat from step 5 (or 6) if there is another file.
11. If there are no more files, I'm almost done -- I probably want to
move the newly named files from the new directory back to the old
directory, and then remove the new directory.  I can do this with a
"group" move command (in other words, using wildcards such as *, so I
don't need a loop to do one file at a time) and then rmdir, which
removes an empty directory.
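The first procedure could be sketched in bash roughly as follows.  Everything about the data is assumed for illustration: names shaped like "ledzeplin track1 blackdog .MP3", a titles.txt kept outside the album directory with one title per line in track order, and the same hypothetical new-name format (not Paul's code).  Steps 1 and 2 stay blank, as above; the lines before step 3 only create test data.

```shell
#!/bin/bash
# Test data standing in for Dan's files (hypothetical names and layout).
work=$(mktemp -d)
titles="$work/titles.txt"          # one title per line, in track order
mkdir "$work/album"
touch "$work/album/ledzeplin track1 blackdog .MP3" \
      "$work/album/ledzeplin track2 blackdog .MP3"
printf 'Black Dog\nRock and Roll\n' > "$titles"

# 3. Move to the directory where the mp3 files are stored.
cd "$work/album" || exit 1
mkdir newdir

# 4-5. The glob hands the loop one file name at a time.
for old in *.MP3; do
    [ -e "$old" ] || continue      # no matches: the glob stays literal
    # Split "ledzeplin track1 blackdog .MP3" into its words.
    read -r artist track album _ <<< "$old"
    # 6. Track number: "track1" -> "1"; 10# makes "01" count as 1.
    n=$((10#${track#track}))
    # 7. Songtitle for this track: line n of the file, left intact.
    title=$(sed -n "${n}p" "$titles")
    # 8. New name (hypothetical format, not Paul's code).
    new="${artist}__${album}_${title// /_}_${track}.mp3"
    # 9. Rename and move in one step.
    mv -- "$old" "newdir/$new"
done   # 10. The for loop repeats until every file has been handled.

# 11. Group move back with a wildcard, then remove the empty directory.
mv newdir/* . && rmdir newdir
ls
```

Reading line n with sed -n "${n}p" leaves the title file unchanged, so the delete-or-keep question from step 7 doesn't come up in this sketch.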


Notice that I left steps 1 and 2 blank.  I know that there is some
initialization I have to do, I just didn't want to deal with it yet.  In
this case, some of the initialization is getting the songtitle file
organized and sorted properly.

After going through the above, I sort of think (without knowing exactly
why -- maybe because I'm uncomfortable with several of the steps listed
above) that I might prefer the approach of working with a file of mp3
file names, so I decide to try to develop another sequence of steps
using that approach:

<still leaving reminders for initializations that I might recognize
later>
1.
2.
<some initializations I know I need to do>
3. Organize the songtitle file (sort in track number order, strip
numbers (maybe))
4. Create a file of mp3 file names, using ls > file, and then sort it
by track number.
<outline the process>
5. Set up a loop (somehow) 
6. Read an mp3 file name from the file of mp3 filenames.  (In this case,
I can delete the file name from the file -- I'll keep processing until
the file is empty.)
7. Get the songtitle for that track from the songtitle file.  (Again, I
can delete the songtitle from this file -- for the next track I'll just
get the first songtitle in the file.)
8. Create the new filename for this file by parsing and combining the
original name and the songtitle (Paul's code)
9. Use the mv command to rename the file and move it to a new directory
(as in the previous procedure):
mv "old file name.MP3" newdir/old_file_songtitle_name.MP3
(This combines renaming the file with moving it to a different
directory.)
10. Repeat from step 6 unless there are no more files.
11. Cleanup, like previous procedure (cut and paste is so helpful --
saves my writing hand):
If there are no more files, I'm almost done -- I probably want to move
the newly named files from the new directory back to the old directory,
and then remove the new directory.  I can do this with a "group" move
command (in other words, using wildcards such as *, so I don't need a
loop to do one file at a time) and then rmdir, which removes an empty
directory.
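A sketch of the second procedure, under the same hypothetical assumptions about names and the title file.  One swap from the outline: instead of deleting lines as they are used (steps 6 and 7), it reads the two files in step, one line from each per pass, which gives the same pairing without modifying either file.

```shell
#!/bin/bash
# Test data (hypothetical): the album directory holds only the mp3 files,
# and both lists are kept outside it.
work=$(mktemp -d)
mkdir "$work/album"
cd "$work/album" || exit 1
touch "ledzeplin track1 blackdog .MP3" "ledzeplin track2 blackdog .MP3"

# 3. Songtitle file, already in track order with the numbers stripped.
printf 'Black Dog\nRock and Roll\n' > "$work/titles.txt"
# 4. File of mp3 names.  Plain ls order matches track order here only
#    because there are fewer than ten tracks.
ls > "$work/names.txt"

mkdir newdir
# 5-7, 10. Each pass reads the next name (fd 3) and the next title (fd 4);
# the loop stops when either file runs out of lines.
while IFS= read -r old <&3 && IFS= read -r title <&4; do
    read -r artist track album _ <<< "$old"
    # 8. New name (hypothetical format, not Paul's code).
    new="${artist}__${album}_${title// /_}_${track}.mp3"
    # 9. Rename and move in one step.
    mv -- "$old" "newdir/$new"
done 3< "$work/names.txt" 4< "$work/titles.txt"

# 11. Cleanup: group move back, then remove the empty directory.
mv newdir/* . && rmdir newdir
```

Opening the two lists on separate file descriptors (3 and 4) keeps standard input free, which matters if a later version asks the user questions inside the loop.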

For good or bad reasons, I like the second procedure a little better --
it feels more comfortable and logical.

What I have above is some sort of pseudocode for the process, consisting
of some snippets of real code (Paul's) and some pseudocode at various
levels of vagueness.

If I were working by myself, I'd probably take a break, let this sit
overnight, come back to it, and work out more of the detail.  Since I
have colleagues in the LVLUG, I'm sending this to them to see what
comments they have and what details Dan has.  Paul and others who've
worked longer in Linux may see areas where I've headed in a direction
influenced too much by my past experience in DOS / Windows / Pascal (or
lack of experience).

Dan (as sort of the customer) may have comments on whether he sees a
need to make the process more reliable: should the output be presented
to the "user" for review and confirmation?  Did I make some wrong
assumptions?
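If Dan does want a review step, one possible sketch is to have the renaming loop write its planned renames to a plan file instead of calling mv right away, then show the plan and ask once.  The plan-file format (old name, a tab, new name) and the function name are hypothetical:

```shell
#!/bin/bash
# review_and_rename PLAN_FILE
# Hypothetical review step: PLAN_FILE holds one planned rename per line
# as "old name<TAB>new name".  Show every line, ask once, then run mv.
# The answer is read from standard input (normally the terminal).
review_and_rename() {
    local plan=$1 answer old new
    # Show the plan so the user can check it before anything changes.
    while IFS=$'\t' read -r old new; do
        printf '%s  ->  %s\n' "$old" "$new"
    done < "$plan"
    read -r -p "Rename these files? [y/N] " answer
    [ "$answer" = y ] || { echo "nothing renamed"; return 1; }
    while IFS=$'\t' read -r old new; do
        mv -- "$old" "$new"
    done < "$plan"
}
```

Usage: inside the loop, replace the mv with something like printf '%s\t%s\n' "$old" "$new" >> plan.txt, then run review_and_rename plan.txt once the loop finishes.  Whether this is worth it for mp3 songtitles is exactly the question for Dan.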

<end of Major Aside and Digression>

The material mentioned so far can easily fill more than one monthly
presentation -- a meeting at the library usually starts at 6:30, we
should be out of there by 8:45 (they close at 9:00) -- with other
discussion we should figure on 1 1/2 to 2 hours of presentation time at
a meeting. (??) 

My bigger interest is bash, because that's the shell I'm currently using
(by default).  But, if bash is the wrong choice, or not the best choice,
I'd like to find that out.  (Well, I guess I don't think it's the wrong
choice because, AFAIK, it's the default for Mandrake and RedHat  -- any
Linux user would at least have to know enough about it to switch to his
preferred shell.)

Does it make sense to present one scripting language at one meeting, and
then another at the next meeting, highlighting the differences?

What is anybody else looking for?  What can anybody else volunteer for?

Brian and Paul?  Other volunteers?

Just a brainstorming document -- comments?
Randy Kramer