[Lvlug] Scripting help, Parsing text
Mark
mstanley at technologist.com
Wed Oct 18 23:01:20 EDT 2006
On Wednesday 18 October 2006 10:20 pm, Scott Phelan wrote:
> So I want to
> wget www.toothpastefordinner.com
>
> And that saves index.html to my drive.
> then I
>
> grep "Today's comic:" index.html > today.txt
>
> That leaves me with the one line that contains the link to today's comic.
>
> Next I want to parse today.txt to pull out the link so I can wget the
> .gif image and save it as today.gif.
>
> I know I'm looking to parse out everything between the <a href=" and the
> first > after that, but I don't know what commands I would use.
Don't bother; wget can do the parsing for you. Save the following in a text file
and make it executable (chmod +x filename).
Line 1: Always the same; it tells the system to run the file with bash.
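Line 2: Changes to a directory called toothpastefordinner in your home
directory (~). Create it first with mkdir ~/toothpastefordinner.
Line 3: Deletes all the old files in that directory, so use a directory that
holds nothing but these downloads.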
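Line 4: Same as your first step; it downloads the front page as index.html.
Line 5: Same as your grep, except it matches "comic:" and saves the line to
comic.html.
Line 6: -i comic.html tells wget to read its URLs from that file, and
--force-html makes it treat the file as HTML, so it downloads whatever link is
in there. If that link is relative, you may also need
--base=http://www.toothpastefordinner.com/ so wget can resolve it.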
Line 7: Renames the downloaded .gif file (there should be only one) to today.gif.
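Line 7 depends on there being exactly one .gif. With two or more, mv *gif today.gif stops with an error because today.gif is not a directory; with none, bash passes the literal pattern to mv, which also fails. A guarded version could look like this sketch, assuming bash; rename_single_gif is a hypothetical helper name, not a standard command.

```shell
# Guarded version of the final rename step (sketch; assumes bash).
# rename_single_gif is a hypothetical helper, not part of wget or bash.
rename_single_gif() {
    local f found="" count=0
    for f in *.gif; do
        [ -e "$f" ] || continue   # no match: bash leaves the literal "*.gif"
        count=$((count + 1))
        found=$f
    done
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one .gif, found $count" >&2
        return 1
    fi
    [ "$found" = today.gif ] && return 0   # already has the right name
    mv -- "$found" today.gif
}
```

Define the function near the top of the script and call rename_single_gif in place of the mv line; it refuses to guess when the download produced zero or several images.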
#!/bin/bash
cd ~/toothpastefordinner || exit 1   # stop here rather than run rm in the wrong directory
rm -f ./*
wget www.toothpastefordinner.com
grep "comic:" index.html > comic.html
wget --force-html -i comic.html
mv *.gif today.gif
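If you would still rather pull the link out yourself, as you first planned, grep -o can do it. The URL ends at the closing quote, not at the first >; cutting at > would keep the quote and any other attributes. This sketch assumes the link is written as <a href="..."> with double quotes and that the first such link on the line is the comic. extract_href and the sample URL are hypothetical, and grep -o is a GNU/BSD extension rather than POSIX.

```shell
# Print the URL from the first <a href="..."> found on stdin (sketch).
# extract_href is a hypothetical helper name.
extract_href() {
    grep -o '<a href="[^"]*"' |   # each match on its own line: <a href="URL"
        head -n 1 |               # keep only the first link
        cut -d'"' -f2             # the URL is the text between the quotes
}

# Hypothetical sample line; the real page markup may differ.
sample="Today's comic: <a href=\"http://example.com/comics/sample.gif\">today</a>"
printf '%s\n' "$sample" | extract_href   # prints http://example.com/comics/sample.gif
```

On the real page you would feed it the line from your grep step, e.g. url=$(grep "Today's comic:" index.html | extract_href), then fetch it with wget -O today.gif "$url". A relative link would need the site address prefixed first.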
-Mark