Storage size vs. number of documents

Post by tinkerphone » Fri Oct 10, 2014 10:50 am

Hi,
I run the server tinkerphone_srv0. It holds around 3.4 million documents and 7 million DHT words, which takes up more than 66 GB. The wiki (German FAQ) says that "with 10 million web pages, an index size of 20 GB is not unusual": http://www.yacy-websuche.de/wiki/index.php/De:FAQ#Speicherplatz. So my index should take no more than about 20 GB. Why does YaCy take 66 GB?
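To put the question in numbers, here is a minimal sketch that scales the wiki's figure linearly to the document count above. Linear scaling and reading GB as 10**9 bytes are assumptions made only for illustration; the function name is hypothetical.

```python
# Figures quoted in the post above.
WIKI_DOCS = 10_000_000      # wiki example: 10 million web pages
WIKI_SIZE_GB = 20.0         # wiki example: 20 GB index
MY_DOCS = 3_400_000         # documents on tinkerphone_srv0
MY_SIZE_GB = 66.0           # observed disk usage


def expected_size_gb(docs, ref_docs=WIKI_DOCS, ref_size_gb=WIKI_SIZE_GB):
    """Scale the reference index size linearly with the document count (assumption)."""
    return ref_size_gb * docs / ref_docs


expected = expected_size_gb(MY_DOCS)
ratio = MY_SIZE_GB / expected
print(f"expected ~{expected:.1f} GB, observed {MY_SIZE_GB:.0f} GB ({ratio:.1f}x)")
```

By that rough measure the index is close to ten times larger than the wiki's example would suggest.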
tinkerphone
 
Posts: 26
Joined: Fri Oct 10, 2014 10:38 am

Re: Storage size vs. number of documents

Post by sixcooler » Fri Oct 10, 2014 1:29 pm

Hello tinkerphone,

I'm sorry, but it is not possible to estimate the disk space needed from the number of documents in the index.
Indexes may hold documents with more or fewer words per document.
Some indexes use the citation reference index, some use the webgraph search index.
Some indexes have a very high number of references per DHT word, while others limit the reference count.
These are some of the factors that affect your disk usage.

cu, sixcooler.
sixcooler
 
Posts: 494
Joined: Thu Aug 14, 2008 5:22 pm

Re: Storage size vs. number of documents

Post by tinkerphone » Fri Oct 10, 2014 1:44 pm

Hi sixcooler,
thanks for that clarification. I could not find a good "newbie" setup for YaCy. The defaults don't seem to be good for a start, at least if you don't have a real clue about search engines (like me).
It would be great if the wiki had some example setups that make the "what if..." cases clearer.

I started experimenting with YaCy because I like the concept of a free search engine built on the P2P idea. I am not interested in the depths of search engine algorithms. To a layman like me, 3 million links for 66 GB doesn't look like a success.

May I ask you about your setup? You have quite a lot of documents on your server. How many GB do they take?
tinkerphone
 
Posts: 26
Joined: Fri Oct 10, 2014 10:38 am

Re: Storage size vs. number of documents

Post by sixcooler » Tue Oct 14, 2014 1:50 pm

Hello tinkerphone,

For my index of 48 million documents, YaCy uses about 100 GB.
But my setup is not representative, because I restrict my index heavily:

I limit the number of references per word to 10,000, which is very low, but keeping a large number of RWIs uses a lot of RAM.
(/IndexControlRWIs_p.html)

I don't use any Web Structure Index.
Although this is a cool feature of YaCy, it takes a lot of resources. For me the benefit does not outweigh the cost, but perhaps I should give it another try.
(/IndexFederated_p.html)

I limit the token count in the Solr schema.
Doing so reduces the space used by the index at the cost of losing the full information of the documents.

As you can see, I gave up a lot to get my index that compact.
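
For comparison, the average per-document footprint of the two indexes mentioned in this thread can be worked out from the quoted figures. This is only a rough sketch: it reads GB as 10**9 bytes and KB as 10**3 bytes (an assumption), and the dictionary labels are just illustrative names for the two setups.

```python
def kb_per_doc(size_gb, docs):
    """Average on-disk size per document in KB, taking 1 GB = 10**9 bytes (assumption)."""
    return size_gb * 1e9 / docs / 1e3


# (disk usage in GB, number of documents) as quoted in the thread
indexes = {
    "tinkerphone_srv0 (defaults)": (66.0, 3_400_000),
    "sixcooler (heavily limited)": (100.0, 48_000_000),
}

for name, (size_gb, docs) in indexes.items():
    print(f"{name}: {kb_per_doc(size_gb, docs):.1f} KB per document")
```

The averages say nothing about which of the limits above saves the most space; they only show how far two configurations can differ.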

cu, sixcooler.
sixcooler
 
Posts: 494
Joined: Thu Aug 14, 2008 5:22 pm

Re: Storage size vs. number of documents

Post by tinkerphone » Tue Oct 14, 2014 2:20 pm

Hi again :)
Thanks! That's some nice info. I will tinker on the basis of that!
tinkerphone
 
Posts: 26
Joined: Fri Oct 10, 2014 10:38 am

