The Maben / Amaus HOLOCAUST
This is the log of my effort to mirror the infamous S100 archive, called MABEN / Amaus.org.
I have most of it on my disk; it was mirrored some time ago. Since then, new material has been added, and I want to download only what's missing.
There's no rsync, and wget probes every fucking single file, which is very annoying.
So:
File listing
I download allfiles.txt:
wget amaus.org/static/S100/allfiles.txt
Check for existing files
Get rid of the first and last lines; they are descriptions, not files:
asbesto@rover:~$ head -1 allfiles.txt
All files from S100 directory listed here at Thu Oct 11 16:00:01 BST 2018
asbesto@rover:~$ tail -1 allfiles.txt
All files listing completed at Thu Oct 11 16:13:18 BST 2018
asbesto@rover:~$
sed '$d' < allfiles.txt | sed "1d" > allfiles2.txt
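The two sed invocations can also be merged into a single pass. A minimal sketch in bash, wrapped in a hypothetical helper named strip_header_footer; in sed, the address 1 is the first line and $ is the last one:

```bash
# Delete the listing header (line 1) and footer (last line) in one sed pass.
strip_header_footer() {
  sed '1d;$d'
}

# Usage: strip_header_footer < allfiles.txt > allfiles2.txt
```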
Filter and shape the file
Each line of the file looks like this:
-rwxrwxrwx 1 root root 2346222 May 1 2009 /data/static/S100/avo/Avometer Model 8 Mk II Working Instructions.pdf
We want something like:
wget -options /data/static/S100/avo/Avometer Model 8 Mk II Working Instructions.pdf
only for FILES, not directories! Those will be created later.
Get rid of directories:
grep -v drwx allfiles2.txt > allfiles3.txt
Cut out first 8 columns to leave only the filename:
awk '{$1=$2=$3=$4=$5=$6=$7=$8=""; print}' allfiles3.txt > allfiles4.txt
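A possible alternative to the grep + awk pair, sketched under two assumptions: the listing is plain ls -l output with exactly 8 fields before the name, and only regular files (lines starting with "-") are wanted. Two details motivate it: assigning to $1..$8 in awk rebuilds the line with single spaces, so a filename containing two consecutive spaces comes out altered; and grep -v drwx lets symlink lines ("lrwx... name -> target") through. The helper name files_only_paths is hypothetical, and its output starts directly with /data/ instead of 8 spaces:

```bash
# Keep only regular files (ls -l lines starting with "-"), then delete the
# first 8 whitespace-separated fields: mode, links, owner, group, size,
# month, day, year/time. Everything after them (the filename) is left
# untouched, including runs of spaces, and no leading blanks are added.
files_only_paths() {
  grep '^-' | sed -E 's/^([^[:space:]]+[[:space:]]+){8}//'
}

# Usage: files_only_paths < allfiles2.txt > allfiles4.txt
```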
Now shape the filenames etc.
Wrap each line in double quotes:
cp allfiles4.txt allfiles5.txt
sed -i 's/.*/"&"/' allfiles5.txt
and use an editor to replace the opening quote followed by 8 spaces and a slash (" (8 spaces) /) with "./ ; also, "data/" MUST GO, so that every line starts with "./static/.
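The editor step can probably be scripted as well. A hedged sketch (quote_relative is a hypothetical name) that strips any leading blanks plus the /data/ prefix and then adds the quotes, assuming every path in the listing starts with /data/static/ as in the example above:

```bash
# "        /data/static/S100/x y.pdf"  ->  "./static/S100/x y.pdf" (quoted)
# 1st expression: drop leading whitespace and the /data/ prefix, add "./"
# 2nd expression: wrap the whole line in double quotes
quote_relative() {
  sed -E 's|^[[:space:]]*/data/|./|; s/.*/"&"/'
}

# Usage: quote_relative < allfiles4.txt > allfiles5.txt
```

It works both on the awk output (8 leading spaces) and on input with no leading blanks at all.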
Now we have allfiles5.txt, with lines like these:
"./static/S100/extensys/photos/Extensys RM64 64K RAM.txt"
"./static/S100/microdesign/photos/Microdesign MR 8 RAM PROM card.jpg"
"./static/S100/kontron/systems/z80a-ecb-e1_kontron_ger_bwr.pdf"
CHECK FOR DUPES
NOTE: some filenames contain $. Find them and escape them: in joe, replace "$" with "\\\$". To do the same substitution in vi, press : and type
%s/\$/\\$/g
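The same substitution can be done without opening an editor. A minimal sketch, with escape_dollars as a hypothetical name. One caveat worth checking: in a loop like while read -r line; do ls "$line"; done, a $ stored in $line is not expanded again by the shell, so the escape only matters if the list is pasted into a script or passed through eval; in that loop, an escaped name would be looked up with a literal backslash in it.

```bash
# Replace every "$" with "\$", like the vi command %s/\$/\\$/g.
# In the regex, \$ matches a literal dollar sign; in the replacement,
# \\ yields a literal backslash, followed by a literal "$".
escape_dollars() {
  sed 's/\$/\\$/g'
}

# Usage (file names are placeholders): escape_dollars < in.txt > out.txt
```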
Now we have allfiles8.txt. Copy it into the root of the local mirror and run:
cd /media/asbesto/BALAZZO/amaus.org
while read -r line; do ls "$line"; done < allfiles8.txt 1>outo 2>erro
outo will list the files that exist; erro will get an ls error message for each missing one.
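Instead of parsing ls output, the existence check could use the shell's test -e directly, so that both outo and erro end up holding bare paths, one per line. A sketch in bash, assuming one path per line in allfiles8.txt, optionally wrapped in double quotes (stripped here); check_paths is a hypothetical name. IFS= keeps leading and trailing spaces in filenames, which plain read -r would trim:

```bash
# Read paths from stdin; write existing ones to file $1, missing ones to $2.
check_paths() {
  local line path
  while IFS= read -r line; do
    path=${line#\"}   # drop a leading double quote, if any
    path=${path%\"}   # drop a trailing double quote, if any
    if [ -e "$path" ]; then
      printf '%s\n' "$path" >&3
    else
      printf '%s\n' "$path" >&4
    fi
  done 3>"$1" 4>"$2"   # fd 3 -> present list, fd 4 -> missing list
}

# Usage, from the mirror root: check_paths outo erro < allfiles8.txt
```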
The line counts of outo and erro must add up to the line count of allfiles8.txt (word and byte counts will differ, since erro holds whole error messages):
root@rover:/media/asbesto/BALAZZO/amaus.org# wc outo erro
  80156  278481  5208866 outo
 123829 1325067 14779570 erro
 203985 1603548 19988436 total
root@rover:/media/asbesto/BALAZZO/amaus.org# wc allfiles8.txt
 203985  612916 14165878 allfiles8.txt
root@rover:/media/asbesto/BALAZZO/amaus.org#
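The same check (80156 + 123829 = 203985) can be scripted. A small sketch with a hypothetical helper, counts_match, that compares line counts only:

```bash
# Succeed only if lines(present) + lines(missing) == lines(input),
# i.e. every input line landed in exactly one of the two output files.
counts_match() {
  local a b total
  a=$(wc -l < "$2")       # $2: list of existing files (outo)
  b=$(wc -l < "$3")       # $3: list of missing files (erro)
  total=$(wc -l < "$1")   # $1: the input list (allfiles8.txt)
  (( a + b == total ))
}

# Usage: counts_match allfiles8.txt outo erro && echo OK || echo MISMATCH
```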
That's it!
Now we can use "erro" to download what we FUCKING NEED.
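To feed wget, the error lines in erro have to be turned back into URLs. A sketch under several assumptions: erro comes from GNU ls and its lines look like ls: cannot access '...': No such file or directory; the paths are relative to the mirror root; and the remote layout matches the local one under http://amaus.org/ (inferred from the allfiles.txt URL above, not verified). The helper name erro_to_urls and the BASE_URL variable are hypothetical. Lines in any other format (for example names that GNU ls quotes differently because they contain a single quote) are silently skipped, so compare the line counts afterwards:

```bash
BASE_URL='http://amaus.org/'   # assumption: adjust if the site needs https

# 1st sed: keep only "cannot access" lines, extract the quoted path.
# 2nd sed: drop surrounding double quotes, undo \$ escapes, drop "./".
# 3rd sed: prepend the base URL.
erro_to_urls() {
  sed -n "s/^ls: cannot access '\(.*\)': No such file or directory\$/\1/p" |
    sed -e 's/^"//' -e 's/"$//' -e 's/\\\$/$/g' -e 's|^\./||' |
    sed "s|^|$BASE_URL|"
}

# Usage: erro_to_urls < erro > urls.txt
```

Then, from the mirror root, something like wget -x -nH -i urls.txt should save each file under static/S100/... (-x forces the directory hierarchy, -nH drops the amaus.org host directory). Try it on a few lines first, especially names with spaces or $.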
What the FUCK to use to download
We try wget now that we DON'T have any FUCKING