Differences between revisions 1 and 4 (spanning 3 versions)
Revision 1 as of 2018-10-12 16:52:47
Size: 2180
Editor: asbesto
Comment:
Revision 4 as of 2018-10-12 20:24:06
Size: 2972
Editor: asbesto
Comment:
The Maben / Amaus HOLOCAUST

This is the log of my effort to mirror the infamous S100 archive, called MABEN / Amaus.org.

I have most of it on my disk; it was mirrored some time ago. Since then something was added, and I want to download only what's missing.

There is no rsync, and wget probes every fucking single file, which is very annoying.

So:

File listing

I download allfiles.txt:

wget amaus.org/static/S100/allfiles.txt

Check for existing files

Get rid of the first and last lines; they are just descriptions:

asbesto@rover:~$ head -1 allfiles.txt
All files from S100 directory listed here at Thu Oct 11 16:00:01 BST 2018
asbesto@rover:~$ tail -1 allfiles.txt
All files listing completed at Thu Oct 11 16:13:18 BST 2018
asbesto@rover:~$

sed '$d' < allfiles.txt | sed "1d" > allfiles2.txt
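The two sed calls can also be folded into a single one. This is only a minimal sketch: strip_banner is a made-up helper name and the demo input is dummy data, not the real listing.

```shell
# strip_banner is a hypothetical helper name (not part of the original log).
# It reads a listing on stdin and drops its first and last line.
strip_banner() {
    sed '1d;$d'
}

# Demo on dummy data: only the two middle lines are printed.
printf '%s\n' 'header line' 'first entry' 'second entry' 'footer line' | strip_banner
```

On the real file it would be used as: strip_banner < allfiles.txt > allfiles2.txt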

Filter and shape the file

Format of the file is:

-rwxrwxrwx 1 root root 2346222 May  1  2009 /data/static/S100/avo/Avometer Model 8 Mk II Working Instructions.pdf

We want something like

wget -options /data/static/S100/avo/Avometer Model 8 Mk II Working Instructions.pdf

This is only for FILES, not directories! Those will be created later.

Get rid of directories:

grep -v drwx allfiles2.txt  > allfiles3.txt
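A slightly stricter variant looks only at the first character of each line instead of searching for "drwx" anywhere in it. It is a sketch under the assumption that every regular file line starts with "-", as in ls -l output; files_only is a made-up name and the sample lines are invented for the demo.

```shell
# files_only is a hypothetical helper name.
# Keep only lines whose ls -l type character is "-" (regular files),
# so a file that merely has "drwx" in its name is not thrown away.
files_only() {
    grep '^-'
}

# Demo on invented sample lines: the directory line is dropped,
# the file called "drwx notes.txt" is kept.
printf '%s\n' \
    'drwxrwxrwx 2 root root 4096 May  1  2009 /data/static/S100/avo' \
    '-rwxrwxrwx 1 root root 10 May  1  2009 /data/static/S100/avo/drwx notes.txt' |
    files_only
```

Note that this also drops symlinks and any other non-file lines, which may or may not be wanted.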

Cut out the first 8 columns to leave only the filename (the awk line below leaves 8 leading spaces behind, one per blanked column):

awk '{$1=$2=$3=$4=$5=$6=$7=$8=""; print}' allfiles3.txt > allfiles4.txt
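One thing to watch: when awk rebuilds the line it joins the fields with single spaces, so a filename containing two spaces in a row gets squeezed. A possible alternative, sketched under the assumption that none of the metadata columns contains a "/", is to cut everything before the first slash. path_only is a made-up name.

```shell
# path_only is a hypothetical helper name.
# Delete everything before the first "/" on each line, leaving the path
# untouched (runs of spaces inside the filename are preserved).
path_only() {
    sed 's|^[^/]*||'
}

# Demo on the sample line from the listing.
printf '%s\n' \
    '-rwxrwxrwx 1 root root 2346222 May  1  2009 /data/static/S100/avo/Avometer Model 8 Mk II Working Instructions.pdf' |
    path_only
```

Unlike the awk version, this leaves no leading spaces in front of the path.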

Now shape the filenames etc.

Add quotes:

cp allfiles4.txt allfiles5.txt
sed -i 's/.*/"&"/' allfiles5.txt

and use an editor to transform

" (8 spaces) / 

into 

"./ 

Also, "data" MUST GO.
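The editor step could also be done with sed. A minimal sketch, assuming every line starts with a quote, some spaces and then /data/; fix_prefix is a made-up name.

```shell
# fix_prefix is a hypothetical helper name.
# Turn the leading quote + spaces + "/data/" into quote + "./",
# which removes both the padding and the "data" directory.
fix_prefix() {
    sed 's|^" */data/|"./|'
}

# Demo on one quoted line with 8 leading spaces.
printf '%s\n' '"        /data/static/S100/kontron/systems/z80a-ecb-e1_kontron_ger_bwr.pdf"' |
    fix_prefix
```

It would be run as: fix_prefix < allfiles5.txt > some_new_file.txt (the output name is up to you).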

Now we have allfiles5.txt with lines like these:

"./static/S100/extensys/photos/Extensys RM64 64K RAM.txt"
"./static/S100/microdesign/photos/Microdesign MR 8 RAM PROM card.jpg"
"./static/S100/kontron/systems/z80a-ecb-e1_kontron_ger_bwr.pdf"

CHECK FOR DUPES

NOTE: some filenames contain $. So find them and replace, in joe, every "$" with "\\\$" to escape them! To substitute $ in vi, use

press : and
%s/\$/\\$/g
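The same substitution can be done without opening an editor. This is just a sketch with a made-up helper name; whether the escaping is needed at all depends on how the list is consumed afterwards.

```shell
# escape_dollars is a hypothetical helper name.
# Put a backslash in front of every "$" read from stdin.
escape_dollars() {
    sed 's/\$/\\$/g'
}

# Demo on an invented filename containing a dollar sign.
printf '%s\n' './static/S100/some$file.txt' | escape_dollars
```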

Now we have allfiles8.txt, so copy it into the correct dir and do:

cd /media/asbesto/BALAZZO/amaus.org
while read -r line; do ls "$line"; done < allfiles8.txt 1>outo 2>erro

outo will contain existing files, erro the missing files.
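A variant that writes bare paths into both files, instead of ls output and ls error messages, could look like this. It is a sketch that assumes the list holds one plain path per line, with no surrounding quotes and no backslash escapes; split_existing and its three arguments are made-up names.

```shell
# split_existing is a hypothetical helper name.
# $1 = list of paths, $2 = output file for existing paths,
# $3 = output file for missing paths.
split_existing() {
    list=$1 found=$2 missing=$3
    : > "$found"      # start with empty output files
    : > "$missing"
    # IFS= and -r keep leading spaces and backslashes exactly as written.
    while IFS= read -r path; do
        if [ -e "$path" ]; then
            printf '%s\n' "$path" >> "$found"
        else
            printf '%s\n' "$path" >> "$missing"
        fi
    done < "$list"
}
```

It would be called from the mirror directory, for example: split_existing allfiles8.txt outo erro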

The wc line counts of outo and erro must add up to the wc line count of allfiles8.txt:

root@rover:/media/asbesto/BALAZZO/amaus.org# wc outo erro
   80156   278481  5208866 outo
  123829  1325067 14779570 erro
  203985  1603548 19988436 total
root@rover:/media/asbesto/BALAZZO/amaus.org# wc allfiles8.txt
  203985   612916 14165878 allfiles8.txt
root@rover:/media/asbesto/BALAZZO/amaus.org#
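Here 80156 + 123829 = 203985, which is exactly the line count of allfiles8.txt. The comparison can be scripted instead of eyeballed; a minimal sketch with a made-up helper name:

```shell
# counts_match is a hypothetical helper name.
# $1 = full list, $2 = existing files, $3 = missing files.
# Succeeds (exit status 0) when the line counts of $2 and $3
# add up to the line count of $1.
counts_match() {
    total=$(wc -l < "$1")
    found=$(wc -l < "$2")
    missing=$(wc -l < "$3")
    [ $((found + missing)) -eq $((total)) ]
}
```

For example: counts_match allfiles8.txt outo erro && echo OK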

That's it!

Now we can use "erro" to download what we FUCKING NEED.
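To feed a downloader, the missing paths have to become URLs first. The sketch below assumes two things that are not confirmed in this log: that a local path ./static/S100/... corresponds to http://amaus.org/static/S100/... (guessed from the allfiles.txt URL above), and that the input is a list of bare paths, one per line (erro itself holds ls error messages, which would have to be reduced to paths first). to_urls is a made-up name.

```shell
# to_urls is a hypothetical helper name.
# Replace the leading "./" of each path with the site address.
# The http://amaus.org/ prefix is an assumption, see the text above.
to_urls() {
    sed 's|^\./|http://amaus.org/|'
}

# Demo on one of the sample paths.
printf '%s\n' './static/S100/kontron/systems/z80a-ecb-e1_kontron_ger_bwr.pdf' | to_urls
```

Spaces and other special characters in the filenames may still need URL-encoding before the list is usable.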

What the FUCK to use to download

We try wget now that we DON'T have any FUCKING

MabenAmausHolocaust (last edited 2018-10-12 20:47:28 by asbesto)