The Maben / Amaus HOLOCAUST

This is the log of my effort to mirror the infamous S100 archive, called MABEN / Amaus.org.

I have most of it on my disk; it was mirrored some time ago. Since then something was added, and I want to download only what's missing.

No rsync, and wget probes every fucking single file, which is very annoying.

So:

File listing

I download allfiles.txt:

wget amaus.org/static/S100/allfiles.txt

Check for existing files

Get rid of the first and last lines; those are descriptions:

asbesto@rover:~$ head -1 allfiles.txt
All files from S100 directory listed here at Thu Oct 11 16:00:01 BST 2018
asbesto@rover:~$ tail -1 allfiles.txt
All files listing completed at Thu Oct 11 16:13:18 BST 2018
asbesto@rover:~$

sed '$d' < allfiles.txt | sed "1d" > allfiles2.txt
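The same trimming can be done in a single sed pass. A minimal sketch; strip_banner is a name I made up for illustration, it is not part of the original log:

```shell
# Drop the first and the last line of a file in one sed invocation.
# strip_banner is a hypothetical helper name.
strip_banner() {
    sed '1d;$d' "$1"
}

# Usage: strip_banner allfiles.txt > allfiles2.txt
```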

Filter and shape the file

Format of the file is:

-rwxrwxrwx 1 root root 2346222 May  1  2009 /data/static/S100/avo/Avometer Model 8 Mk II Working Instructions.pdf

We want something like

wget -options /data/static/S100/avo/Avometer Model 8 Mk II Working Instructions.pdf

Only for FILES, no directories! Those will be created later.

Get rid of directories:

grep -v '^d' allfiles2.txt > allfiles3.txt

Cut out first 8 columns to leave only the filename:

awk '{$1=$2=$3=$4=$5=$6=$7=$8=""; print}' allfiles3.txt > allfiles4.txt
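One caveat with the awk approach: assigning to a field makes awk rebuild the line with single spaces between fields, so a file name containing two or more consecutive spaces comes out altered. A sketch of an alternative that cuts the eight leading columns with sed and leaves the path untouched. The helper name strip_ls_columns is hypothetical, and the output has no leading spaces, so any later step that expects the eight leading spaces would not apply to it as written:

```shell
# Remove the first eight space-separated columns of an `ls -l` style line,
# keeping the rest of the line (the path) exactly as it is.
# strip_ls_columns is a hypothetical helper name.
strip_ls_columns() {
    sed -E 's/^([^ ]+ +){8}//' "$1"
}

# Usage: strip_ls_columns allfiles3.txt > allfiles4.txt
```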

Now shape the filenames etc.

Add double quotes around each line:

sed -i 's/.*/"&"/' allfiles4.txt

and use an editor to transform the start of each line,

" (8 spaces) /

into

COMANDO  "

Now we have lines like these:

COMANDO  "data/static/S100/extensys/photos/Extensys RM64 64K RAM.txt"
COMANDO  "data/static/S100/microdesign/photos/Microdesign MR 8 RAM PROM card.jpg"
COMANDO  "data/static/S100/kontron/systems/z80a-ecb-e1_kontron_ger_bwr.pdf"
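The editor step can also be scripted. A minimal sketch, assuming each line starts with a double quote, exactly eight spaces and a slash, as produced by the awk and quoting steps; add_command_prefix is a hypothetical name:

```shell
# Replace the leading quote + eight spaces + slash with `COMANDO  "`.
# add_command_prefix is a hypothetical helper name.
add_command_prefix() {
    sed 's|^" \{8\}/|COMANDO  "|' "$1"
}

# Usage: add_command_prefix allfiles4.txt > allfiles5.txt
```

Lines that do not match the assumed prefix pass through unchanged, so it is worth grepping the result for anything that does not begin with COMANDO.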

This is the basic file for downloading those FUCKING PIECES OF SHIT.

CHECK FOR DUPES

Now COMANDO must be something that checks whether the file already exists in our backup. So:
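One possible way to do that check, as a rough sketch and not necessarily what ended up being used here. It assumes a plain list with one server path per line (no quotes, no COMANDO prefix), and that the local backup mirrors the tree below some prefix. The function name list_missing, its three parameters and the example paths are all made up for illustration:

```shell
# list_missing LISTING PREFIX BACKUP_ROOT
# Reads one server path per line from LISTING, strips PREFIX from the front,
# and prints every path whose counterpart under BACKUP_ROOT does not exist.
# The function name and its parameters are hypothetical.
list_missing() {
    listing=$1 prefix=$2 backup=$3
    while IFS= read -r path; do
        rel=${path#"$prefix"}
        [ -e "$backup/$rel" ] || printf '%s\n' "$path"
    done < "$listing"
}

# Example (file names and directories are assumptions):
# list_missing allpaths.txt /data/static/S100/ "$HOME/S100" > missing.txt
```

The result is a list of only the paths that still need fetching, which keeps the existence test local instead of asking the server about every file.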

What the FUCK to use to download

We try wget now that we DON'T have any FUCKING