The Maben / Amaus HOLOCAUST

This is the log of my effort to mirror the infamous S100 archive, called MABEN / Amaus.org.

I have most of it on my disk; it was mirrored some time ago. Now something has been added and I want to download only what's missing.

There's no rsync, and wget probes every fucking single file, which is very annoying.

So:

File listing

I download allfiles.txt:

wget amaus.org/static/S100/allfiles.txt

Check for existing files

Get rid of the first and last lines; they are just descriptions:

asbesto@rover:~$ head -1 allfiles.txt
All files from S100 directory listed here at Thu Oct 11 16:00:01 BST 2018
asbesto@rover:~$ tail -1 allfiles.txt
All files listing completed at Thu Oct 11 16:13:18 BST 2018
asbesto@rover:~$

sed '$d' < allfiles.txt | sed "1d" > allfiles2.txt

Filter and shape the file

Format of the file is:

-rwxrwxrwx 1 root root 2346222 May  1  2009 /data/static/S100/avo/Avometer Model 8 Mk II Working Instructions.pdf

We want something like

wget -options /data/static/S100/avo/Avometer Model 8 Mk II Working Instructions.pdf

only for FILES, no directories! They will be created later.

Get rid of directories:

grep -v drwx allfiles2.txt  > allfiles3.txt

Cut out the first 8 columns to leave only the filename:

awk '{$1=$2=$3=$4=$5=$6=$7=$8=""; print}' allfiles3.txt > allfiles4.txt

Now shape the filenames etc.

Add quotes around each filename:

cp allfiles4.txt allfiles5.txt
sed -i 's/.*/"&"/' allfiles5.txt

and use an editor to transform

" (8 spaces) / 

into 

"./ 

Also, the leading "data" directory MUST GO.
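Both manual fixes (turning the quote + 8 spaces + / into "./ and dropping "data") can also be done non-interactively; a sketch with sed, demoed here on a sample line:

```shell
# Sketch of the editor step with sed: collapse the quote + 8 spaces +
# /data/ prefix into "./ in one pass. Demo on a sample line; on the
# real file run: sed -i 's|^"        /data/|"./|' allfiles5.txt
printf '%s\n' '"        /data/static/S100/avo/file.pdf"' \
  | sed 's|^"        /data/|"./|'
```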

Now we have allfiles5.txt with lines like those:

"./static/S100/extensys/photos/Extensys RM64 64K RAM.txt"
"./static/S100/microdesign/photos/Microdesign MR 8 RAM PROM card.jpg"
"./static/S100/kontron/systems/z80a-ecb-e1_kontron_ger_bwr.pdf"

CHECK FOR DUPES
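One quick way to actually check (a sketch, demoed on sample data; run sort allfiles5.txt | uniq -d on the real list):

```shell
# Sort and print only repeated lines; any output means dupes exist.
# Demo on sample lines; on the real list: sort allfiles5.txt | uniq -d
printf '%s\n' '"./a.pdf"' '"./b.pdf"' '"./a.pdf"' | sort | uniq -d
```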

NOTE: some filenames contain $. So find them and replace, in joe, "$" with "\\\$" to escape them! To substitute $ in vi:

press : and type
%s/\$/\\$/g
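The same substitution without opening an editor, as a sketch (demoed on a sample line; use sed -i on the real file):

```shell
# Escape every $ as \$, same as the vi %s/\$/\\$/g above.
# On the real file run: sed -i 's/\$/\\$/g' allfiles5.txt
printf '%s\n' '"./static/S100/foo$bar.pdf"' | sed 's/\$/\\$/g'
```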

Now we have allfiles8.txt; copy it into the correct dir and do:

cd /media/asbesto/BALAZZO/amaus.org
while read -r line; do ls "$line"; done < allfiles8.txt 1>outo 2>erro

outo will contain existing files, erro the missing files.

The wc counts of outo and erro must sum to the wc count of allfiles8.txt:

root@rover:/media/asbesto/BALAZZO/amaus.org# wc outo erro
   80156   278481  5208866 outo
  123829  1325067 14779570 erro
  203985  1603548 19988436 total
root@rover:/media/asbesto/BALAZZO/amaus.org# wc allfiles8.txt
  203985   612916 14165878 allfiles8.txt
root@rover:/media/asbesto/BALAZZO/amaus.org#

That's it!

Now we can use "erro" to download what we FUCKING NEED.

What the FUCK to use for downloading

So prepend the site URL to every line; you must end up with

'https://amaus.org/static/S100/zilog/z80/older/The Z80 microcomputer handbook cover.PDF'
'https://amaus.org/static/S100/zilog/z80/older/The Z80 microcomputer handbook William Barden.PDF'
'https://amaus.org/static/S100/zilog/z80/The Z80 microcomputer handbook William Barden.PDF'
'https://amaus.org/static/S100/zilog/z80/Z80 Assembly subroutines Leventhal.pdf'

for all files, and you need to add wget -r -c -np in front of every fucking line, because this SHIT down here doesn't work!

 while read -r line; do wget -r -c -np $line ; done < missingfiles1.txt
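A sketch of a variant that may work instead, assuming erro still holds lines in the "./... format and the files live under https://amaus.org/ at the same path: the quotes in the file are literal characters, and the unquoted $line above also splits on the spaces in the filenames. Building bare URLs with sed and double-quoting the variable avoids both problems:

```shell
# Sample stand-in for erro, in the format produced by the steps above:
printf '%s\n' '"./static/S100/zilog/z80/Z80 Assembly subroutines Leventhal.pdf"' > erro

# Strip the literal quotes and the leading ./, prepend the site URL
# (assumed to be https://amaus.org/ from the examples above):
sed 's|^"\./|https://amaus.org/|; s|"$||' erro > missingfiles1.txt

# Download; -x recreates the directory tree, -c resumes partial files.
# Shown as a dry run via printf; drop the printf wrapper to really fetch:
while read -r url; do printf 'wget -x -c %s\n' "$url"; done < missingfiles1.txt
```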

MabenAmausHolocaust (last edited 2018-10-12 20:47:28 by asbesto)