The Maben / Amaus HOLOCAUST

This is the log of my effort to mirror the infamous S100 archive, called MABEN / Amaus.org.

I have most of it on my disk; it was mirrored some time ago. Now something has been added and I want to download only what's missing.

There's no rsync, and wget probes every fucking single file, which is very annoying.

So:

File listing

I download allfiles.txt:

wget amaus.org/static/S100/allfiles.txt

Check for existing files

Get rid of the first and last lines; they are just descriptions:

asbesto@rover:~$ head -1 allfiles.txt
All files from S100 directory listed here at Thu Oct 11 16:00:01 BST 2018
asbesto@rover:~$ tail -1 allfiles.txt
All files listing completed at Thu Oct 11 16:13:18 BST 2018
asbesto@rover:~$

sed '$d' < allfiles.txt | sed "1d" > allfiles2.txt

Filter and shape the file

Format of the file is:

-rwxrwxrwx 1 root root 2346222 May  1  2009 /data/static/S100/avo/Avometer Model 8 Mk II Working Instructions.pdf

We want something like

wget -options /data/static/S100/avo/Avometer Model 8 Mk II Working Instructions.pdf

only for FILES, no directories! They will be created later.

Get rid of directories:

grep -v drwx allfiles2.txt  > allfiles3.txt

Cut out the first 8 columns to leave only the filename:

awk '{$1=$2=$3=$4=$5=$6=$7=$8=""; print}' allfiles3.txt > allfiles4.txt

Now shape the filenames etc.

Add quotes around each filename:

cp allfiles4.txt allfiles5.txt
sed -i 's/.*/"&"/' allfiles5.txt

and use an editor to transform

" (8 spaces) / 

into 

"./ 

Also, the leading "data" directory MUST GO.
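Both manual fixes (turning the quote + 8 spaces + / into "./ and dropping "data") can also be done non-interactively; a sketch with sed, demoed here on a sample line:

```shell
# Sketch of the editor step with sed: collapse the quote + 8 spaces +
# /data/ prefix into "./ in one pass. Demo on a sample line; on the
# real file run: sed -i 's|^"        /data/|"./|' allfiles5.txt
printf '%s\n' '"        /data/static/S100/avo/file.pdf"' \
  | sed 's|^"        /data/|"./|'
```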

Now we have allfiles5.txt with lines like those:

"./static/S100/extensys/photos/Extensys RM64 64K RAM.txt"
"./static/S100/microdesign/photos/Microdesign MR 8 RAM PROM card.jpg"
"./static/S100/kontron/systems/z80a-ecb-e1_kontron_ger_bwr.pdf"

CHECK FOR DUPES
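One quick way to actually check (a sketch, demoed on sample data; run sort allfiles5.txt | uniq -d on the real list):

```shell
# Sort and print only repeated lines; any output means dupes exist.
# Demo on sample lines; on the real list: sort allfiles5.txt | uniq -d
printf '%s\n' '"./a.pdf"' '"./b.pdf"' '"./a.pdf"' | sort | uniq -d
```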

NOTE: some filenames contain $. So find them and replace, in joe, "$" with "\\\$" to escape them! To substitute $ in vi:

press : and type
%s/\$/\\$/g
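The same substitution without opening an editor, as a sketch (demoed on a sample line; use sed -i on the real file):

```shell
# Escape every $ as \$, same as the vi %s/\$/\\$/g above.
# On the real file run: sed -i 's/\$/\\$/g' allfiles5.txt
printf '%s\n' '"./static/S100/foo$bar.pdf"' | sed 's/\$/\\$/g'
```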

Now we have allfiles8.txt; copy it into the correct dir and do:

cd /media/asbesto/BALAZZO/amaus.org
while read -r line; do ls "$line"; done < allfiles8.txt 1>outo 2>erro

outo will contain existing files, erro the missing files.

The wc counts of outo and erro must sum to the wc count of allfiles8.txt:

root@rover:/media/asbesto/BALAZZO/amaus.org# wc outo erro
   80156   278481  5208866 outo
  123829  1325067 14779570 erro
  203985  1603548 19988436 total
root@rover:/media/asbesto/BALAZZO/amaus.org# wc allfiles8.txt
  203985   612916 14165878 allfiles8.txt
root@rover:/media/asbesto/BALAZZO/amaus.org#

That's it!

Now we can use "erro" to download what we FUCKING NEED.

What the FUCK to use for downloading

So prepend the site URL to every line; you must end up with

'https://amaus.org/static/S100/zilog/z80/older/The Z80 microcomputer handbook cover.PDF'
'https://amaus.org/static/S100/zilog/z80/older/The Z80 microcomputer handbook William Barden.PDF'
'https://amaus.org/static/S100/zilog/z80/The Z80 microcomputer handbook William Barden.PDF'
'https://amaus.org/static/S100/zilog/z80/Z80 Assembly subroutines Leventhal.pdf'

for all files, and you need to add wget -r -c -np in front of every fucking line, because this SHIT down here doesn't work!

 while read -r line; do wget -r -c -np $line ; done < missingfiles1.txt
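A sketch of a variant that may work instead, assuming erro still holds lines in the "./... format and the files live under https://amaus.org/ at the same path: the quotes in the file are literal characters, and the unquoted $line above also splits on the spaces in the filenames. Building bare URLs with sed and double-quoting the variable avoids both problems:

```shell
# Sample stand-in for erro, in the format produced by the steps above:
printf '%s\n' '"./static/S100/zilog/z80/Z80 Assembly subroutines Leventhal.pdf"' > erro

# Strip the literal quotes and the leading ./, prepend the site URL
# (assumed to be https://amaus.org/ from the examples above):
sed 's|^"\./|https://amaus.org/|; s|"$||' erro > missingfiles1.txt

# Download; -x recreates the directory tree, -c resumes partial files.
# Shown as a dry run via printf; drop the printf wrapper to really fetch:
while read -r url; do printf 'wget -x -c %s\n' "$url"; done < missingfiles1.txt
```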

MabenAmausHolocaust (last edited 2018-10-12 20:47:28 by asbesto)