Post by ktplulo » Thu Jul 17, 2014 11:49 am

On the Status.html page, there are numbers: "Documents" and "DHT Words". What exactly do they mean?

How necessary are the things counted? How safe are they to delete? Is one of them used only for ranking?

At IndexControlURLs_p.html, if I click "Generate Statistics", I can delete things for a domain. What do I lose then? I've deleted all domains, and now sending at http://yacy.local:8090/IndexControlRWIs_p.html sends "0" words, while all URLs are "not found". How do I then delete all the URLs without words, or all words without URLs?

At IndexControlRWIs_p.html, I can delete a word. There are two checkboxes, one of which says it's safe to delete the URLs, although it will produce something unresolved. The other one says it's very extensive. What does that mean?
Post by sixcooler » Thu Jul 17, 2014 2:53 pm

Hello ktplulo,

By "Documents" we mean the information about pages, pictures, files, etc. that were indexed. Sometimes we use "URLs" as a synonym for that.
By "DHT Words" we mean database entries for (hashed) words from documents, together with where to find them in the document database. That is why it is also called RWI (reverse word index).
These counts are purely informational for people looking at them :-)

The more documents you have, the more entries can be found on your local machine.
The DHT words are primarily used for distribution in the P2P network, in order to concentrate the documents for a word on particular machines in the network.

YaCy is robust against deleting one or the other, but deleted DHT words can't be distributed, and deleted documents can't be found or distributed.
Since we switched to storing documents in Solr, every document can also be found, even if there is no DHT word pointing at it.
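
To illustrate the relationship between the two stores, here is a minimal Python sketch. It is a toy model and not YaCy's code: the class ToyIndex, its method names, and the SHA-1 based word_hash are assumptions made up for this example, and the full-text scan only stands in for a Solr query.

```python
import hashlib


def word_hash(word: str) -> str:
    # Illustrative hash only; YaCy uses its own word-hash scheme.
    return hashlib.sha1(word.lower().encode("utf-8")).hexdigest()[:12]


class ToyIndex:
    """Hypothetical two-part index: a document store plus a reverse word index."""

    def __init__(self):
        self.documents = {}  # url -> text (stand-in for the Solr document store)
        self.rwi = {}        # word hash -> set of urls (stand-in for the DHT words)

    def add(self, url: str, text: str) -> None:
        self.documents[url] = text
        for word in text.split():
            self.rwi.setdefault(word_hash(word), set()).add(url)

    def search_rwi(self, word: str) -> list:
        # Look up via the reverse word index; skip references to deleted documents.
        urls = self.rwi.get(word_hash(word), set())
        return sorted(u for u in urls if u in self.documents)

    def search_fulltext(self, word: str) -> list:
        # Scan the document store directly; works without any RWI entry.
        w = word.lower()
        return sorted(u for u, t in self.documents.items() if w in t.lower().split())

    def delete_word(self, word: str) -> None:
        # Removes only the RWI entry; the documents stay in the store.
        self.rwi.pop(word_hash(word), None)

    def delete_document(self, url: str) -> None:
        # Removes only the document; RWI references to it are left dangling.
        self.documents.pop(url, None)

    def unresolved(self) -> list:
        # URLs still referenced by some word entry but missing from the store.
        return sorted({u for urls in self.rwi.values() for u in urls
                       if u not in self.documents})


index = ToyIndex()
index.add("http://example.org/a", "peer to peer search")
index.add("http://example.org/b", "search engine")

index.delete_word("search")
print(index.search_rwi("search"))       # no word entry left to follow
print(index.search_fulltext("search"))  # documents are still found

index.delete_document("http://example.org/b")
print(index.unresolved())               # "engine" still points at the deleted URL
```

In this model, deleting a word only removes something that could have been distributed to other peers, while deleting a document leaves word entries behind that no longer resolve to anything.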

If you delete things for a domain, you delete the documents for that domain.
If you deleted all domains, your index should be very small now :-)

The delete options for words may be a little outdated. If you really want to delete a word, it is OK to do it without any options.
If you also want to delete the documents for a word, use that option.

cu, sixcooler.
Post by ktplulo » Thu Jul 17, 2014 4:06 pm

Thanks. It might be helpful to add this to the wiki.
