meanings of "DOCUMENTS" and "DHT WORDS", "URLs" and "RWIs"

Post by ktplulo » Thu Jul 17, 2014 11:49 am

On the Status.html page there are two counters, "Documents" and "DHT Words". What exactly do they mean?

How necessary are the things counted? How safe are they to delete? Is one of them used only for ranking?

At IndexControlURLs_p.html, if I click "Generate Statistics", I can delete the entries for a domain. What do I lose then? I've deleted all domains, and now the transfer at http://yacy.local:8090/IndexControlRWIs_p.html sends "0" words, while all URLs are "not found". How do I then delete all the URLs that have no words, or all the words that have no URLs?

At IndexControlRWIs_p.html, I can delete a word. There are two checkboxes: one says it is safe to delete the URLs, although doing so will leave something unresolved; the other says it is very extensive. What does that mean?
Posts: 18
Registered: Thu Mar 01, 2012 11:27 am

Re: meanings of "DOCUMENTS" and "DHT WORDS", "URLs" and "RWIs"

Post by sixcooler » Thu Jul 17, 2014 2:53 pm

Hello ktplulo,

By "Documents" we mean the information about pages, pictures, files, etc. that were indexed. Sometimes we use "URLs" as a synonym for that.
By "DHT Words" we mean database entries of (hashed) words from the documents, together with where to find them in the documents database. That's why it is also called RWI (reverse word index).
Counting these values is just information for the people looking at them :-)
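To make the two counters concrete, here is a toy Python sketch of a reverse word index. It is not YaCy's code (YaCy is written in Java and uses its own word-hash scheme); `word_hash` is a hypothetical stand-in built on SHA-1, and the documents are made-up sample data. Each word hash maps to the set of documents that contain the word, so the "DHT words" count here is simply the number of distinct word hashes.

```python
import hashlib
from collections import defaultdict

def word_hash(word: str) -> str:
    # Hypothetical word hash: SHA-1 truncated to 12 hex chars, NOT YaCy's real scheme.
    return hashlib.sha1(word.lower().encode("utf-8")).hexdigest()[:12]

# "Documents": the indexed pages, keyed by URL (toy data).
documents = {
    "http://example.org/a": "yacy is a peer to peer search engine",
    "http://example.org/b": "a search engine needs an index",
}

# RWI ("DHT words"): word hash -> set of URLs of the documents containing that word.
rwi: dict[str, set[str]] = defaultdict(set)
for url, text in documents.items():
    for word in text.split():
        rwi[word_hash(word)].add(url)

print("Documents:", len(documents))  # Documents: 2
print("DHT words:", len(rwi))        # DHT words: 10 (distinct words in this toy data)
```

Looking up a word then means hashing it and reading the URL set, for example `rwi[word_hash("search")]` returns both URLs.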

The more documents you have, the more entries can be found on your local machine.
The DHT words are primarily used for distribution in the P2P network, so that the documents for each word are concentrated on certain machines in the network.
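The idea of "concentrating" a word on certain machines can be illustrated with a generic consistent-hashing sketch in Python. This is an assumption for illustration only: YaCy's real peer selection for DHT transfer works differently in detail, and `position`, `responsible_peer` and the peer names are hypothetical. Both words and peers are hashed onto the same ring, and each word goes to the first peer at or after its position.

```python
import hashlib

def position(key: str) -> int:
    # Map a string onto a ring of size 2**32 (toy scheme, not YaCy's).
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:4], "big")

def responsible_peer(word: str, peers: list[str]) -> str:
    # Walk the ring clockwise from the word's position; wrap to the first peer if needed.
    w = position(word)
    ring = sorted(peers, key=position)
    for peer in ring:
        if position(peer) >= w:
            return peer
    return ring[0]

peers = ["peerA", "peerB", "peerC", "peerD"]
for word in ["search", "engine", "index"]:
    print(word, "->", responsible_peer(word, peers))
```

Because every peer computes the same positions, all references for one word end up on the same peer, no matter which peer indexed the document.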

YaCy is robust against deleting one or the other, but deleted DHT words can't be distributed, and deleted documents can't be found or distributed.
Since we switched to storing documents in Solr, every document can also be found, even if there is no DHT word pointing to it.
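A toy Python sketch of why a document can still be found without an RWI entry: the document store (standing in for Solr here) can be searched directly. The functions `search_rwi` and `search_documents` are hypothetical, and plain words are used instead of word hashes to keep it short.

```python
# Toy stores: documents (stand-in for the Solr document index) and RWI entries.
documents = {
    "http://example.org/a": "peer to peer search",
    "http://example.org/b": "distributed index",
}
rwi = {"search": {"http://example.org/a"}}  # no RWI entry exists for "index"

def search_rwi(word):
    # Lookup through the reverse word index only.
    return sorted(rwi.get(word, set()))

def search_documents(word):
    # Scan the document store directly; no RWI entry is needed.
    return sorted(url for url, text in documents.items() if word in text.split())

print(search_rwi("index"))        # []
print(search_documents("index"))  # ['http://example.org/b']
```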

If you delete things for a domain, you delete the documents for that domain.
If you deleted all domains, your index should be very small now :-)
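The "not found" URLs from the question can be pictured with a toy Python model, assuming the word entries are not removed together with the documents of a domain. `delete_domain` and `resolve` are hypothetical helpers, not YaCy functions: after deleting a domain, the word entries still point to URLs whose documents are gone.

```python
from urllib.parse import urlsplit

documents = {
    "http://a.example/1": "search engine",
    "http://a.example/2": "peer network",
    "http://b.example/1": "search index",
}
rwi = {
    "search": {"http://a.example/1", "http://b.example/1"},
    "peer": {"http://a.example/2"},
}

def delete_domain(host):
    # Remove every document whose URL belongs to the given host; RWI entries are left alone.
    for url in [u for u in documents if urlsplit(u).hostname == host]:
        del documents[url]

def resolve(word):
    # Split the word's references into URLs that still have a document and those "not found".
    refs = rwi.get(word, set())
    found = sorted(u for u in refs if u in documents)
    missing = sorted(u for u in refs if u not in documents)
    return found, missing

delete_domain("a.example")
print(resolve("search"))  # (['http://b.example/1'], ['http://a.example/1'])
print(resolve("peer"))    # ([], ['http://a.example/2'])
```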

The delete options for words may be a little outdated. If you really want to delete a word, it is OK to do it without any options.
If you also want to delete the documents for a word, use that option.
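The difference between the two ways of deleting a word can be sketched in Python on toy data. The function `delete_word` and its flag `also_delete_documents` are hypothetical stand-ins for the page and its checkbox, not YaCy's actual API: without the flag only the word entry goes away; with it, the referenced documents are removed too.

```python
def delete_word(word, rwi, documents, also_delete_documents=False):
    """Remove a word's RWI entry; optionally remove the documents it pointed to as well."""
    urls = rwi.pop(word, set())
    if also_delete_documents:
        for url in urls:
            documents.pop(url, None)  # ignore documents that are already gone
    return urls

documents = {"http://x.example/1": "foo bar", "http://x.example/2": "bar"}
rwi = {
    "foo": {"http://x.example/1"},
    "bar": {"http://x.example/1", "http://x.example/2"},
}

delete_word("foo", rwi, documents)                              # word only
print(len(documents))  # 2: the documents are untouched
delete_word("bar", rwi, documents, also_delete_documents=True)  # word and its documents
print(len(documents))  # 0
print(rwi)             # {}
```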

cu, sixcooler.
Posts: 495
Registered: Thu Aug 14, 2008 5:22 pm

Re: meanings of "DOCUMENTS" and "DHT WORDS", "URLs" and "RWIs"

Post by ktplulo » Thu Jul 17, 2014 4:06 pm

Thanks. It might be helpful to add this to the wiki.
Posts: 18
Registered: Thu Mar 01, 2012 11:27 am
