The Maben / Amaus HOLOCAUST
This is the log of my effort to mirror the infamous S100 archive, called MABEN / Amaus.org.
I already have most of it on my disk; it was mirrored some time ago. Since then something was added, and I want to download only what's missing.
There is no rsync, and wget probes every single fucking file, which is very annoying.
So:
File listing
I download allfiles.txt:
wget amaus.org/static/S100/allfiles.txt
Check for existing files
Get rid of the first and last lines; they are descriptions, not file entries:
asbesto@rover:~$ head -1 allfiles.txt
All files from S100 directory listed here at Thu Oct 11 16:00:01 BST 2018
asbesto@rover:~$ tail -1 allfiles.txt
All files listing completed at Thu Oct 11 16:13:18 BST 2018
asbesto@rover:~$
sed '$d' < allfiles.txt | sed "1d" > allfiles2.txt
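The same trimming can be done in a single sed pass. This is only a minimal sketch: strip_banner is a name I made up for illustration, and the demo file is a tiny stand-in for the real allfiles.txt.

```shell
# Drop the first and the last line (the two description lines) in one pass.
# strip_banner is a hypothetical helper name, not part of the original log.
strip_banner() {
    sed '1d;$d' "$1"
}

# Tiny stand-in for allfiles.txt so the sketch runs without the real listing.
demo=$(mktemp)
printf '%s\n' \
    'All files from S100 directory listed here at Thu Oct 11 16:00:01 BST 2018' \
    '-rwxrwxrwx 1 root root 2346222 May 1 2009 /data/static/S100/avo/Avometer Model 8 Mk II Working Instructions.pdf' \
    'All files listing completed at Thu Oct 11 16:13:18 BST 2018' > "$demo"

# Prints only the middle line.
strip_banner "$demo"
rm -f "$demo"
```

On the real file this would be: strip_banner allfiles.txt > allfiles2.txt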
Filter and shape the file
Format of the file is:
-rwxrwxrwx 1 root root 2346222 May 1 2009 /data/static/S100/avo/Avometer Model 8 Mk II Working Instructions.pdf
We want something like
wget -options /data/static/S100/avo/Avometer Model 8 Mk II Working Instructions.pdf
but only for FILES, not directories! Those will be created later.
Get rid of directories:
grep -v '^d' allfiles2.txt > allfiles3.txt
Blank out the first 8 columns to leave only the file name (awk leaves 8 leading spaces, one per emptied column):
awk '{$1=$2=$3=$4=$5=$6=$7=$8=""; print}' allfiles3.txt > allfiles4.txt
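One caveat with the awk approach: assigning to a field makes awk rebuild the line with single spaces, so a file name containing two consecutive spaces would be silently changed. A possible alternative, sketched here under the assumption that the permission, owner, size and date columns never contain a "/", is to delete everything before the first slash with sed. path_only is a made-up helper name, and the second demo line is an invented file name used only to show the double space surviving.

```shell
# Keep only the path: delete everything before the first "/".
# Assumption: the metadata columns (mode, owner, size, date) never contain "/".
# path_only is a hypothetical helper name.
path_only() {
    sed 's|^[^/]*||' "$1"
}

demo=$(mktemp)
# The second entry is invented; it has two consecutive spaces on purpose.
printf '%s\n' \
    '-rwxrwxrwx 1 root root 2346222 May 1 2009 /data/static/S100/avo/Avometer Model 8 Mk II Working Instructions.pdf' \
    '-rwxrwxrwx 1 root root 10 May 1 2009 /data/static/S100/demo/two  spaces.txt' > "$demo"

path_only "$demo"
rm -f "$demo"
```

Note that this variant leaves no leading spaces, so the editor step further down would have to match a leading "/ instead of the 8 spaces.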
Now shape the filenames etc.
Add double quotes around each line:
sed -i 's/.*/"&"/' allfiles4.txt
and use an editor to transform
" (8 spaces) /   into   COMANDO "
(the leading slash is dropped, so the paths become relative).
Now we have lines like these:
COMANDO "data/static/S100/extensys/photos/Extensys RM64 64K RAM.txt"
COMANDO "data/static/S100/microdesign/photos/Microdesign MR 8 RAM PROM card.jpg"
COMANDO "data/static/S100/kontron/systems/z80a-ecb-e1_kontron_ger_bwr.pdf"
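For the record, the whole chain from the raw listing to these COMANDO lines could probably be done in one sed call, with no editor involved. This is just a sketch, not what I actually ran: make_commands is a made-up name, and it assumes every regular file entry starts with "-" and contains an absolute path. Anything else (the two description lines, directories, symlinks) is simply not printed.

```shell
# Turn the raw listing into COMANDO "relative/path" lines in one pass.
# Only lines starting with "-" (regular files) are printed; everything up to
# and including the first "/" is removed and the rest is wrapped in quotes.
# make_commands is a hypothetical helper name.
make_commands() {
    sed -n 's|^-[^/]*/\(.*\)$|COMANDO "\1"|p' "$1"
}

# Small stand-in for allfiles.txt; the directory line is an invented example.
demo=$(mktemp)
printf '%s\n' \
    'All files from S100 directory listed here at Thu Oct 11 16:00:01 BST 2018' \
    'drwxrwxrwx 2 root root 4096 May 1 2009 /data/static/S100/avo' \
    '-rwxrwxrwx 1 root root 2346222 May 1 2009 /data/static/S100/avo/Avometer Model 8 Mk II Working Instructions.pdf' \
    'All files listing completed at Thu Oct 11 16:13:18 BST 2018' > "$demo"

make_commands "$demo"
rm -f "$demo"
```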
This is the base file for downloading those FUCKING PIECES OF SHIT.
CHECK FOR DUPES
COMANDO must now check whether the file already exists in our backup. So:
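A minimal sketch of such a check, reading the COMANDO file line by line instead of executing it. Everything named here is an assumption of mine: list_missing is a made-up function, and I assume the local backup directory has the same layout as data/static/S100/ on the server.

```shell
# Print every listed path that does NOT exist in the local backup.
#   $1  file with lines of the form: COMANDO "data/static/S100/..."
#   $2  local directory assumed to mirror data/static/S100/ on the server
# list_missing and both arguments are hypothetical names.
list_missing() {
    list=$1
    mirror=$2
    prefix='COMANDO "'
    while IFS= read -r line; do
        path=${line#"$prefix"}          # drop the leading COMANDO "
        path=${path%\"}                 # drop the closing quote
        rel=${path#data/static/S100/}   # path relative to the mirror root
        [ -e "$mirror/$rel" ] || printf '%s\n' "$path"
    done < "$list"
}

# Demo with a throwaway mirror that holds only one of the two listed files.
tmp=$(mktemp -d)
mkdir -p "$tmp/mirror/kontron/systems"
: > "$tmp/mirror/kontron/systems/z80a-ecb-e1_kontron_ger_bwr.pdf"
printf '%s\n' \
    'COMANDO "data/static/S100/kontron/systems/z80a-ecb-e1_kontron_ger_bwr.pdf"' \
    'COMANDO "data/static/S100/extensys/photos/Extensys RM64 64K RAM.txt"' > "$tmp/list.txt"

list_missing "$tmp/list.txt" "$tmp/mirror"
rm -rf "$tmp"
```

Reading the file with a loop, rather than defining COMANDO as a shell function and sourcing the list, avoids the shell interpreting any $ or backtick that might appear inside a file name. The output is the list of paths still to be fetched.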
What the FUCK to use to download
We try wget now that we DON'T have any FUCKING