= The Maben / Amaus HOLOCAUST =

This is the log of my effort to mirror the infamous S100 archive, called MABEN / Amaus.org.

I have most of it on my disk, mirrored some time ago. Now new files have been added and I want to download only what's missing. There is no rsync, and wget probes every fucking single file, which is very annoying. So:

== File listing ==

I download allfiles.txt:

{{{
wget amaus.org/static/S100/allfiles.txt
}}}

== Check for existing files ==

=== Get rid of first and last line, those are descriptions ===

{{{
asbesto@rover:~$ head -1 allfiles.txt
All files from S100 directory listed here at Thu Oct 11 16:00:01 BST 2018
asbesto@rover:~$ tail -1 allfiles.txt
All files listing completed at Thu Oct 11 16:13:18 BST 2018
asbesto@rover:~$
}}}

{{{
sed '$d' < allfiles.txt | sed "1d" > allfiles2.txt
}}}

=== Filter and shape the file ===

The format of the file is:

{{{
-rwxrwxrwx 1 root root 2346222 May 1 2009 /data/static/S100/avo/Avometer Model 8 Mk II Working Instructions.pdf
}}}

We want something like

{{{
wget -options /data/static/S100/avo/Avometer Model 8 Mk II Working Instructions.pdf
}}}

only for FILES, no directories! They will be created later.

Get rid of directories:

{{{
grep -v drwx allfiles2.txt > allfiles3.txt
}}}

Cut out the first 8 columns to leave only the filename:

{{{
awk '{$1=$2=$3=$4=$5=$6=$7=$8=""; print}' allfiles3.txt > allfiles4.txt
}}}

Now shape the filenames. Wrap each line in double quotes:

{{{
cp allfiles4.txt allfiles5.txt
sed -i 's/.*/"&"/' allfiles5.txt
}}}

and use an editor to transform

{{{
" (8 spaces) /
into
"./
Also, "data" MUST GO.
}}}

Now we have allfiles5.txt with lines like these:

{{{
"./static/S100/extensys/photos/Extensys RM64 64K RAM.txt"
"./static/S100/microdesign/photos/Microdesign MR 8 RAM PROM card.jpg"
"./static/S100/kontron/systems/z80a-ecb-e1_kontron_ger_bwr.pdf"
}}}

== CHECK FOR DUPES ==

'''NOTE: some filenames contain $. Find them and, in joe, replace "$" with "\\\$" to escape them!'''

To do the same substitution in vi, use

{{{
press : and %s/\$/\\$/g
}}}

Now we have allfiles8.txt. Copy it into the mirror directory and run:

{{{
cd /media/asbesto/BALAZZO/amaus.org
while read -r line; do ls "$line"; done < allfiles8.txt 1>outo 2>erro
}}}

outo will list the existing files, erro the missing ones (as ls error messages). The line counts of outo and erro must add up to the line count of allfiles8.txt:

{{{
root@rover:/media/asbesto/BALAZZO/amaus.org# wc outo erro
   80156   278481  5208866 outo
  123829  1325067 14779570 erro
  203985  1603548 19988436 total
root@rover:/media/asbesto/BALAZZO/amaus.org# wc allfiles8.txt
  203985   612916 14165878 allfiles8.txt
root@rover:/media/asbesto/BALAZZO/amaus.org#
}}}

That's it! Now we can use "erro" to download what we FUCKING NEED.

== What the FUCK to use to download ==

Prepend the site URL to every path, so that you have

{{{
'https://amaus.org/static/S100/zilog/z80/older/The Z80 microcomputer handbook cover.PDF'
'https://amaus.org/static/S100/zilog/z80/older/The Z80 microcomputer handbook William Barden.PDF'
'https://amaus.org/static/S100/zilog/z80/The Z80 microcomputer handbook William Barden.PDF'
'https://amaus.org/static/S100/zilog/z80/Z80 Assembly subroutines Leventhal.pdf'
}}}

for all files, and you need to add wget -r -c -np in front of every fucking line, because this SHIT down here doesn't work!

{{{
while read -r line; do wget -r -c -np $line ; done < missingfiles1.txt
}}}
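
The loop above fails because $line is unquoted and the list file contains literal quote characters: the shell splits every line at its spaces and does not re-read the stored quotes as quoting, so wget receives fragments of a URL. With `read -r` and a quoted "$line", neither the quotes in the file nor the $ escaping are needed. What follows is only a sketch of a shorter route from allfiles.txt to a URL list, not what was actually run here. It assumes that "/data/" in the listing maps to https://amaus.org/, that the mirror directory holds static/S100/... directly, and that only spaces need percent-encoding (other characters might too). The function name missing_urls and the file missing_urls.txt are made up for the example. Cutting at "/data/" with sed also keeps runs of spaces inside filenames, which the awk field assignment collapses.

```shell
#!/bin/sh
# Sketch only. Assumptions, not taken from the log above:
#  - "/data/" in allfiles.txt corresponds to https://amaus.org/
#  - the mirror root holds static/S100/... directly
#  - only spaces need percent-encoding (other characters may too)

# missing_urls LISTING MIRROR_ROOT   (hypothetical helper name)
# Prints one URL per listed file that does not exist under MIRROR_ROOT.
missing_urls() {
    sed '1d;$d' "$1" |                  # drop the two description lines
    grep -v '^d' |                      # drop directories
    sed -n 's|^[^/]*/data/||p' |        # keep only the path after /data/
    while IFS= read -r rel; do          # IFS= and -r keep the name intact
        [ -e "$2/$rel" ] && continue    # already mirrored, skip it
        printf 'https://amaus.org/%s\n' "$rel" | sed 's/ /%20/g'
    done
}

# Hypothetical usage (needs the network, so it is left commented out):
#   missing_urls allfiles.txt /media/asbesto/BALAZZO/amaus.org > missing_urls.txt
#   cd /media/asbesto/BALAZZO/amaus.org
#   wget -c -x -nH -i missing_urls.txt
```

In the usage lines, `-i` makes wget read the URLs from the file, `-x` recreates the directory tree, `-nH` leaves out the amaus.org host directory, and `-c` resumes partial downloads. `-r` and `-np` are dropped because they only matter for recursive crawling, which is not needed when every file is named explicitly.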