[Lvlug] Writing an awk or Ruby script

Randy Kramer rhkramer at gmail.com
Sun Jan 7 18:35:10 EST 2007


This came up in another thread. I was going to start by converting my 
existing nedit macro and then extending it, but I decided to keep things 
simpler: I'll keep the existing nedit macro and use it for the (one-time) 
partial conversion.  The steps below, by contrast, will be needed on an 
ongoing basis (i.e., not just for the one-time conversion).

As mentioned in the other post, I'll consider either an awk or a Ruby approach 
to doing this.

Given a file that looks much like an mbox file, that is, one with a header 
for each record something like this:

<example record>
From rhk Wed Dec 27 00:05:00 2006
Date: 27 Dec 2006 00:00:00 -0500
From: rhk
To: rhk
Subject: *00000001* A record title

UniqFN: "2006 12 27 00:00:00 *00000001*.aml"
<some other (non-email-like) headers>

<record content (text)>

</example record>

Break the file apart so that each record is in a separate file (complete with 
the "From " record separator).
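
A minimal Ruby sketch of the splitting step. It assumes that every line 
beginning with "From " starts a new record, so a body line that begins that 
way would need to be quoted, as mbox writers usually do with ">From ". The 
method name split_records is made up for illustration.

```ruby
# Split mbox-like text into records. Each record keeps its leading
# "From " separator line. Assumption: any line starting with "From "
# begins a new record.
def split_records(text)
  records = []
  text.each_line do |line|
    # Start a new record at each separator. Any text before the first
    # separator is kept as its own leading chunk.
    records << String.new if line.start_with?("From ") || records.empty?
    records.last << line
  end
  records
end
```

Joining the returned strings gives back the original text unchanged, so 
nothing is lost in the split.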

The name of the file should be either:

The content of the UniqFN "field", e.g.: "2006 12 27 00:00:00 *00000001*.aml"

or:

The text from the Subject: "field" plus a .aml extension, e.g.: "A record 
title.aml"
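
A hedged Ruby sketch of the two naming rules. It assumes that the UniqFN 
value is always wrapped in double quotes, and that the leading *NNNNNNNN* 
serial is dropped from the Subject (as the example above suggests). The 
method names are hypothetical.

```ruby
# Filename taken from the UniqFN "field", without the surrounding quotes.
# Returns nil when the record has no UniqFN line.
def uniqfn_name(record)
  m = record.match(/^UniqFN:[ \t]*"([^"]+)"/)
  m && m[1]
end

# Filename built from the Subject: text plus ".aml". A leading serial such
# as *00000001* is stripped (an assumption based on the example).
# Returns nil when the record has no Subject: line.
def subject_name(record)
  m = record.match(/^Subject:[ \t]*(?:\*\d+\*[ \t]*)?(.+?)[ \t]*$/)
  m && "#{m[1]}.aml"
end
```

A Subject containing "/" would not be usable as a filename as-is, so a real 
script would probably need to replace such characters first.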

In the case of the UniqFN filename, there are no duplicates (at least during 
this initial mass conversion step)--I will have to be prepared to deal with 
possible duplicates in the future (see the procedure outlined below for the 
Subject filename).

In the case of the Subject filename, there is a reasonable chance there will 
be duplicates, and I want a good, user-friendly way of handling them, perhaps 
something like this:

   * Check the target directory to see whether there is already a file 
     with that name.
   * If so, warn the user and display the proposed filename in an edit box, 
     where the user can change it to a different filename.
   * Check the directory again to confirm the edited filename is not a 
     duplicate, and repeat the previous step if necessary.
   * [Ideally, the Subject: field in the record should be changed to reflect 
     the edited title.]
   * Write the file to the directory with that (non-duplicate) filename.
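
The procedure above might be sketched in Ruby roughly as follows. This is 
only an illustration: a terminal prompt stands in for the edit box, and the 
names prompt_for_name, choose_filename, and retitle are invented here. The 
prompt is passed in as a callable so that a GUI dialog could replace it.

```ruby
# Stand-in for the edit box: ask on the terminal for a replacement name.
# Returns nil at end of input.
def prompt_for_name(name)
  print "New filename (was #{name}): "
  line = $stdin.gets
  line && line.chomp
end

# Keep asking until the name does not clash with a file already in dir.
def choose_filename(dir, proposed, ask: method(:prompt_for_name))
  name = proposed
  while File.exist?(File.join(dir, name))
    warn "Warning: #{name} already exists in #{dir}"
    answer = ask.call(name)
    raise "no filename supplied" if answer.nil?
    name = answer unless answer.empty?   # empty reply: ask again
  end
  name
end

# Rewrite the Subject: line so its title matches the chosen filename,
# keeping any leading *NNNNNNNN* serial.
def retitle(record, filename)
  title = File.basename(filename, ".aml")
  record.sub(/^(Subject:[ \t]*(?:\*\d+\*[ \t]*)?).*$/) { "#{$1}#{title}" }
end
```

The record would then be written with something like 
File.write(File.join(dir, name), retitle(record, name)). Another process 
could still create the same file between the check and the write, which may 
or may not matter for a single-user script.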

An alternative would be to let the script append a (serial) number to any 
duplicate filename to make it unique, which eliminates the need for 
operator intervention.  As above, the Subject: field should be modified to 
reflect the new title.
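
A small Ruby sketch of the serial-number alternative. The " (2)", " (3)" 
suffix format and the method name unique_filename are arbitrary choices for 
illustration, not something settled above.

```ruby
# Return proposed if it is free in dir; otherwise insert " (2)", " (3)", ...
# before the extension until an unused name is found.
def unique_filename(dir, proposed)
  return proposed unless File.exist?(File.join(dir, proposed))
  ext  = File.extname(proposed)          # ".aml"
  base = File.basename(proposed, ext)    # name without the extension
  n = 2
  n += 1 while File.exist?(File.join(dir, "#{base} (#{n})#{ext}"))
  "#{base} (#{n})#{ext}"
end
```

With "A record title.aml" already present, this yields 
"A record title (2).aml". The caller would still have to rewrite the 
Subject: line from the returned name before writing the record.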

Randy Kramer

PS: I don't think it matters here, but my current plan is that these files 
will have multiple filenames created via hardlinks, so any particular file 
will have a UniqFN and one or more mnemonic file names (from one or more 
titles).  I'll start with one filename per file, and don't care too much 
which.  (I'll always have a way of finding either from the other.)
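
For the hardlink plan, Ruby's File.link creates an additional name for an 
existing file. A minimal sketch, assuming both names live in the same 
directory (hard links cannot cross filesystems); add_name is a made-up 
helper name.

```ruby
# Give an existing file a second name via a hard link.
# Raises Errno::EEXIST if extra_name is already taken.
def add_name(dir, existing_name, extra_name)
  File.link(File.join(dir, existing_name), File.join(dir, extra_name))
end
```

For example, add_name(dir, "2006 12 27 00:00:00 *00000001*.aml", 
"A record title.aml") would make the same content reachable under both the 
UniqFN and the mnemonic name.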








