
The Maben / Amaus HOLOCAUST

This is the log of my effort to mirror the infamous S100 archive, called MABEN / Amaus.org.

I have most of it on my disk; it was mirrored some time ago. Since then new files have been added, and I want to download only what's missing.

There is no rsync, and wget probes every fucking single file, which is very annoying.

So:

File listing

I download allfiles.txt:

wget amaus.org/static/S100/allfiles.txt

Check for existing files

Get rid of the first and last lines; they are descriptions, not files:

asbesto@rover:~$ head -1 allfiles.txt
All files from S100 directory listed here at Thu Oct 11 16:00:01 BST 2018
asbesto@rover:~$ tail -1 allfiles.txt
All files listing completed at Thu Oct 11 16:13:18 BST 2018
asbesto@rover:~$

sed '$d' < allfiles.txt | sed "1d" > allfiles2.txt
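The two sed calls can also be folded into a single pass. A minimal sketch, assuming bash and any POSIX sed; strip_listing_header is a hypothetical helper name, and the check that both lines start with "All files" is based only on the two description lines shown above:

```shell
# Hypothetical helper: strip the description lines from allfiles.txt.
strip_listing_header() {
    local in="$1" out="$2"
    # Refuse to cut anything unless both lines look like the descriptions
    # shown above ("All files from S100 ..." / "All files listing completed ...").
    if ! head -n 1 "$in" | grep -q '^All files' ||
       ! tail -n 1 "$in" | grep -q '^All files'; then
        echo "unexpected first/last line in $in" >&2
        return 1
    fi
    # 1d deletes the first line, $d deletes the last one, in one sed run.
    sed '1d;$d' "$in" > "$out"
}

# Usage: strip_listing_header allfiles.txt allfiles2.txt
```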

Filter and shape the file

Format of the file is:

-rwxrwxrwx 1 root root 2346222 May  1  2009 /data/static/S100/avo/Avometer Model 8 Mk II Working Instructions.pdf

We want something like

wget -options /data/static/S100/avo/Avometer Model 8 Mk II Working Instructions.pdf

only for FILES, not directories! Those will be created later.

Get rid of directories (their lines start with d):

grep -v '^d' allfiles2.txt > allfiles3.txt

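Blank out the first 8 columns (permissions, links, owner, group, size, month, day, year/time) to leave only the filename. Beware: awk rebuilds the line with single spaces, so every name is left with 8 leading spaces, and any run of several spaces inside a name is squeezed to one: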

awk '{$1=$2=$3=$4=$5=$6=$7=$8=""; print}' allfiles3.txt > allfiles4.txt

Now shape the filenames etc.

Put double quotes around each line:

cp allfiles4.txt allfiles5.txt
sed -i 's/.*/"&"/' allfiles5.txt

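Then use an editor to transform

"        /

that is, a double quote, the 8 spaces left over from the awk step and a slash, into

"./

Also, "data/" MUST GO, so that /data/static/... ends up as ./static/...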
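The directory filter, the awk step, the quoting and the editor steps could probably be collapsed into one sed command. A sketch, assuming every file line has the 8-column ls -l layout shown above and a sed that supports -E (GNU and BSD sed both do); shape_listing is a hypothetical name. Unlike the awk step it strips the columns with a regex, so runs of spaces inside a filename survive. Symlink lines (name -> target) are not handled, and $ is not escaped.

```shell
# Hypothetical helper: ls -l lines -> "./static/..." paths, one per line.
shape_listing() {
    # /^d/d               drop directory lines
    # s/^([^ ]+ +){8}//   strip perms, links, owner, group, size, month,
    #                     day, year/time; spaces inside the name are kept
    # s|^/data/|./|       /data/static/... -> ./static/...
    # s/.*/"&"/           wrap in double quotes, like allfiles5.txt
    sed -E '/^d/d; s/^([^ ]+ +){8}//; s|^/data/|./|; s/.*/"&"/' "$1"
}

# Usage: shape_listing allfiles2.txt > allfiles5.txt
```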

Now we have allfiles5.txt, with lines like these:

"./static/S100/extensys/photos/Extensys RM64 64K RAM.txt"
"./static/S100/microdesign/photos/Microdesign MR 8 RAM PROM card.jpg"
"./static/S100/kontron/systems/z80a-ecb-e1_kontron_ger_bwr.pdf"

CHECK FOR DUPES
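The page does not say how the duplicate check was done. One simple way, sketched under that caveat: sort the list and print every line that occurs more than once. find_dupes is a hypothetical name; LC_ALL=C makes sort compare plain bytes.

```shell
# Hypothetical helper: print every line that appears more than once.
find_dupes() {
    # Sorting puts identical lines next to each other; uniq -d prints
    # one copy of each line that is repeated.
    LC_ALL=C sort "$1" | uniq -d
}

# Usage: find_dupes allfiles5.txt   (no output means no duplicates)
```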

NOTE: some file names contain $. Find them and escape them: in joe, replace "$" with "\\\$". To do the same substitution in vi,

press : and enter
%s/\$/\\$/g
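Whether the escaping is needed depends on how the list is consumed. If it is only read by a while IFS= read -r line loop and used as "$line" (in double quotes), the shell does not expand $ inside a variable's value, so unescaped names should be found as they are; escaping matters when the lines are pasted into a script or go through eval. A small demo in a throwaway directory, with a hypothetical exists_count helper:

```shell
# Hypothetical helper: count how many paths listed in a file exist.
exists_count() {
    local list="$1" n=0 line
    # IFS= keeps leading/trailing spaces, -r keeps backslashes literal.
    while IFS= read -r line; do
        [ -e "$line" ] && n=$((n + 1))
    done < "$list"
    echo "$n"
}

demo_dir=$(mktemp -d)
(
    cd "$demo_dir" || exit 1
    touch 'price$1 list.txt'                     # literal dollar sign in the name
    printf '%s\n' './price$1 list.txt' > list.txt
    exists_count list.txt                        # prints 1, no escaping needed
)
```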

Now we have allfiles8.txt. Copy it into the mirror directory and run:

cd /media/asbesto/BALAZZO/amaus.org
while read -r line; do ls "$line"; done < allfiles8.txt 1>outo 2>erro

outo will contain the files we already have, erro an ls error message for every missing file.
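ls does the job, but parsing its output and error messages is fragile (the messages depend on the locale, and ls quotes the names). A sketch of the same split with test -e, assuming one unquoted relative path per line (e.g. ./static/S100/...) and that it runs from the mirror root; split_present_missing, present.txt and missing.txt are hypothetical names. IFS= keeps leading and trailing spaces in names, which a plain read -r would strip.

```shell
# Hypothetical helper: split a list of paths into existing and missing ones.
split_present_missing() {
    local list="$1" present="$2" missing="$3" line
    while IFS= read -r line; do
        if [ -e "$line" ]; then
            printf '%s\n' "$line" >&3     # already mirrored
        else
            printf '%s\n' "$line" >&4     # still to download
        fi
    done < "$list" 3> "$present" 4> "$missing"
}

# Usage: cd /media/asbesto/BALAZZO/amaus.org
#        split_present_missing allfiles8.txt present.txt missing.txt
```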

The line counts of outo and erro must add up to the line count of allfiles8.txt (here 80156 + 123829 = 203985):

root@rover:/media/asbesto/BALAZZO/amaus.org# wc outo erro
   80156   278481  5208866 outo
  123829  1325067 14779570 erro
  203985  1603548 19988436 total
root@rover:/media/asbesto/BALAZZO/amaus.org# wc allfiles8.txt
  203985   612916 14165878 allfiles8.txt
root@rover:/media/asbesto/BALAZZO/amaus.org#
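The same sanity check as a small function. A sketch: check_counts is a hypothetical name, and it compares line counts only, since the word and byte columns of erro are inflated by the ls error text and are not expected to match.

```shell
# Hypothetical helper: do the two result files account for every listed path?
check_counts() {
    local list="$1" out="$2" err="$3" a b total
    a=$(wc -l < "$out")
    b=$(wc -l < "$err")
    total=$(wc -l < "$list")
    if (( a + b == total )); then
        echo "OK: $a present + $b missing = $total"
    else
        echo "MISMATCH: $a + $b != $total" >&2
        return 1
    fi
}

# Usage: check_counts allfiles8.txt outo erro
```

With the numbers above, 80156 + 123829 = 203985, so the check passes.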

That's it!

Now we can use "erro" to download what we FUCKING NEED.
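erro holds ls error messages rather than bare paths. A sketch of turning them into URLs, assuming GNU ls messages in English (run the ls loop with LC_ALL=C) of the form ls: cannot access './static/...': No such file or directory; erro_to_urls is a hypothetical helper. It prints bare URLs without quotes, because quotes inside a list file are taken literally by read. Names containing a single quote or unprintable characters may be quoted differently by ls and will not match the pattern, so they are skipped.

```shell
# Hypothetical helper: "ls: cannot access './X': No such file or directory"
#                   -> "https://amaus.org/X"
erro_to_urls() {
    # -n plus the p flag: print only the lines the substitution matched.
    sed -n "s|^ls: cannot access '\./\(.*\)': No such file or directory\$|https://amaus.org/\1|p" "$1"
}

# Usage: erro_to_urls erro > missingfiles1.txt
```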

What the FUCK to use for downloading

So strip the ls error text from every line of erro and put the site's address in front of the path. You must end up with lines like

'https://amaus.org/static/S100/zilog/z80/older/The Z80 microcomputer handbook cover.PDF'
'https://amaus.org/static/S100/zilog/z80/older/The Z80 microcomputer handbook William Barden.PDF'
'https://amaus.org/static/S100/zilog/z80/The Z80 microcomputer handbook William Barden.PDF'
'https://amaus.org/static/S100/zilog/z80/Z80 Assembly subroutines Leventhal.pdf'

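for all files. Then you need to put wget -r -c -np in front of every fucking line (turning the list into a shell script), because this SHIT down here doesn't work: read keeps the single quotes as literal characters, and the unquoted $line is split at every space, so wget only gets fragments of each URL!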

 while read -r line; do wget -r -c -np $line ; done < missingfiles1.txt
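A loop that should cope with the spaces, as a sketch: it assumes missingfiles1.txt has one URL per line (surrounding single quotes are stripped if present) and that it runs from /media/asbesto/BALAZZO, so that -x (--force-directories) recreates amaus.org/static/S100/... inside the existing mirror. download_missing is a hypothetical name. The important changes are IFS= read -r and the double quotes around "$url"; -r and -np are dropped because they are not needed to fetch single files.

```shell
# Hypothetical helper: fetch every URL listed in a file, one per line.
download_missing() {
    local list="$1" url
    while IFS= read -r url; do
        [ -n "$url" ] || continue              # skip blank lines
        url=${url#\'}; url=${url%\'}           # drop surrounding ' if present
        # "$url" stays one argument even with spaces in it.
        # -x recreates host/path directories, -c resumes partial files.
        wget -c -x "$url" || echo "FAILED: $url" >&2
    done < "$list"
}

# Usage: cd /media/asbesto/BALAZZO && download_missing missingfiles1.txt
```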

MabenAmausHolocaust (last edited 2018-10-12 20:47:28 by asbesto)