[Lvlug] Writing an awk or Ruby script
rhkramer at gmail.com
Sun Jan 7 18:35:10 EST 2007
This was mentioned in another thread. I was going to start by converting
my existing nedit macro and then extending it, but I decided to keep things
simpler: I'll keep the existing nedit macro and use it for the (one-time)
partial conversion. The following steps will be required "continuously" in
the future (i.e., not just for the one-time conversion).
As mentioned in the other post, I'll consider either an awk or a Ruby approach
to doing this.
Given a file that looks much like an mbox file, that is, with a header for
each record something like this:
    From rhk Wed Dec 27 00:05:00 2006
    Date: 27 Dec 2006 00:00:00 -0500
    Subject: *00000001* A record title
    UniqFN: "2006 12 27 00:00:00 *00000001*.aml"
    <some other (non-email-like) headers>
    <record content (text)>
Break the file apart so that each record is in a separate file (complete with
the "From " record separator).
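
As a rough Ruby sketch of the splitting step: the helper name split_records
is made up for illustration, and it assumes that no line inside a record's
content starts with "From " (mbox-style files usually escape such lines, but
that is not confirmed for this file format). Each returned string starts with
its own "From " separator line.

```ruby
# Split mbox-like text into records. A new record begins at every line
# that starts with "From "; the separator line stays with its record.
# Any text before the first "From " line becomes a record of its own.
def split_records(text)
  text.lines
      .slice_before { |line| line.start_with?("From ") }
      .map(&:join)
end
```

Each element of the result can then be written to its own file once a
filename has been chosen for it.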
The name of the file should be either:
The content of the UniqFN "field", e.g.: "2006 12 27 00:00:00 *00000001*.aml"
The text from the Subject: "field" plus a .aml extension, e.g.: "A record
In the case of the UniqFN filename, there are no duplicates (at least during
this initial mass conversion step)--I will have to be prepared to deal with
possible duplicates in the future (see the procedure outlined below for the
In the case of the Subject filename, there is a reasonable chance there will
be duplicates, and I want a good/user friendly way of handling them, perhaps
something like this:
* Check the target directory to see if there is already a file with that
  name.
* If so, warn the user and display the proposed filename in an edit box
  which the user can edit to create a different file name.
* Check the directory again to confirm the edited filename is not a
  duplicate, and repeat the previous step if necessary.
* [Ideally, the Subject: field in the record should be changed to reflect
  the edited title.]
* Write the file to the directory with that (non-duplicate) filename.
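
The steps above could be sketched in Ruby roughly as follows. All three
method names are hypothetical, and the "edit box" is represented by a block
that receives the clashing name and returns the user's edited name, so any
UI (or a plain terminal prompt) can be plugged in. The sketch assumes the
Subject: line carries a leading "*00000001*"-style serial as in the example
header; the serial is left out of the filename but kept in the header when
the title is rewritten. It does not sanitize characters such as "/" in
titles, and the check-then-write sequence is not atomic.

```ruby
# Mnemonic filename from the Subject: header, without the leading serial.
def subject_filename(record)
  subject = record[/^Subject:[ \t]*(.*)$/, 1]
  return nil if subject.nil?
  title = subject.sub(/\A\*\d+\*[ \t]*/, "").strip
  title.empty? ? nil : "#{title}.aml"
end

# Keep asking (via the block) until the name is not already taken in dir.
def resolve_name(dir, name)
  while File.exist?(File.join(dir, name))
    warn "#{name} already exists in #{dir}"
    name = yield(name)
  end
  name
end

# Write one record under a non-duplicate name; returns the name used.
def write_with_prompt(dir, record, &prompt)
  name = subject_filename(record)
  raise ArgumentError, "record has no usable Subject: header" if name.nil?
  final = resolve_name(dir, name, &prompt)
  if final != name
    # Reflect the edited title in the record, keeping any serial prefix.
    new_title = File.basename(final, ".aml")
    record = record.sub(/^(Subject:[ \t]*(?:\*\d+\*[ \t]*)?).*$/) { "#{$1}#{new_title}" }
  end
  File.write(File.join(dir, final), record)
  final
end
```

For a terminal session the block could be as simple as
{ |clash| print "New name for #{clash}: "; $stdin.gets.chomp }.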
An alternative would be to let the script append a (serial) number to any
duplicate filename to make it unique--this eliminates the need for operator
intervention. As above, the Subject: field should be modified to reflect the
new title.
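
A minimal Ruby sketch of the serial-number variant follows. The method names
and the " (2)", " (3)", ... suffix format are assumptions chosen for
illustration; any other numbering scheme would work the same way. As in the
example header, a leading "*00000001*"-style serial on the Subject: line is
assumed and preserved when the title is rewritten.

```ruby
# Return name itself if it is free in dir, otherwise the first free
# variant of the form "base (2).ext", "base (3).ext", ...
def serial_name(dir, name)
  ext = File.extname(name)
  base = File.basename(name, ext)
  candidate = name
  n = 1
  while File.exist?(File.join(dir, candidate))
    n += 1
    candidate = "#{base} (#{n})#{ext}"
  end
  candidate
end

# Replace the title on the first Subject: line, keeping any serial prefix.
def retitle(record, new_title)
  record.sub(/^(Subject:[ \t]*(?:\*\d+\*[ \t]*)?).*$/) { "#{$1}#{new_title}" }
end
```

A caller would pick the name with serial_name, pass
File.basename(chosen, ".aml") to retitle, and then write the record.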
PS: I don't think it matters here, but my current plan is that these files
will have multiple filenames created via hardlinks, so any particular file
will have a UniqFN and one or more mnemonic file names (from one or more
titles). I'll start with one filename per file, and don't care too much
which. (I'll always have a way of finding either from the other.)
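
For the hardlink idea, a small Ruby sketch using File.link and
File.identical? might look like this. The method names are hypothetical, and
scanning the directory for files that share an inode is just one possible
way of finding the other names of a file, not necessarily the lookup
intended above. It assumes all names for a file live in the same directory
on a filesystem that supports hardlinks.

```ruby
# Give the file stored under its UniqFN an additional mnemonic name.
# File.link raises Errno::EEXIST if the mnemonic name is already taken.
def add_mnemonic_link(dir, uniq_name, mnemonic_name)
  File.link(File.join(dir, uniq_name), File.join(dir, mnemonic_name))
end

# List the other names in dir that refer to the same underlying file.
def other_names(dir, name)
  target = File.join(dir, name)
  Dir.children(dir)
     .reject { |entry| entry == name }
     .select { |entry| File.identical?(File.join(dir, entry), target) }
     .sort
end
```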