language recognition and language restriction

Ideas and suggestions are welcome.

Post by streetfighter » Sat Jan 03, 2009 2:26 pm

It would be great to have these options:
- limit crawl results to German, English, etc. only
Even better would be:
- limit the crawler to specified languages, so the spider indexes only German, English, etc. pages
streetfighter
 
Posts: 37
Joined: Sat Jan 03, 2009 9:40 am

Re: language recognition and language restriction

Post by Low012 » Sat Jan 03, 2009 4:33 pm

You can add a parameter to select the language of your search results, but for several reasons it does not work very well yet.

As you can see, http://4o4.dyndns.org:8080/yacysearch.h ... lr=lang_de and http://4o4.dyndns.org:8080/yacysearch.h ... lr=lang_en give you different results and work fairly well, but only some of the results for http://4o4.dyndns.org:8080/yacysearch.h ... lr=lang_pl are in Polish.

We are working on it...

Limiting the crawler to a specific language is only possible after the page has already been loaded. It would probably make more sense to limit the crawler to certain domains, which has been requested before.
Low012
 
Posts: 2214
Joined: Wed Jun 27, 2007 12:11 pm
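
As a small illustration of the `lr=lang_de` / `lr=lang_en` / `lr=lang_pl` parameter used in the links above, here is a minimal sketch that sets `lr` on an existing search URL. The host, path, and `q` parameter in the example are made up for illustration and are not taken from the truncated links; only the `lr=lang_<code>` form comes from the post.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def with_language(search_url, lang_code):
    """Return search_url with its lr parameter set to lang_<lang_code>."""
    parts = urlsplit(search_url)
    # Drop any existing lr value so the parameter appears exactly once.
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != "lr"]
    params.append(("lr", f"lang_{lang_code}"))
    return urlunsplit(parts._replace(query=urlencode(params)))

# Hypothetical peer address and query string, for illustration only.
print(with_language("http://peer.example:8080/search?q=wetter", "de"))
```

This only builds the request URL. How well the results match the requested language still depends on the peer, as described in the post above.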

Re: language recognition and language restriction

Post by streetfighter » Sat Jan 03, 2009 5:52 pm

You are right, the mechanism is not working well yet. I will wait for the final version.


I was asking about limiting the crawler to a specified language because I was thinking about indexing, let's say, the whole .de domain (no problem) but only pages in German 8-)
streetfighter
 
Posts: 37
Joined: Sat Jan 03, 2009 9:40 am
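
The two ideas discussed in this thread (restrict the crawler by domain before fetching, then keep only pages in the wanted language once they are loaded) could be combined roughly as below. This is only a sketch under stated assumptions, not how the crawler actually works: all function names are hypothetical, and the stopword count is a deliberately naive stand-in for a real language detector.

```python
from urllib.parse import urlsplit

# Tiny stopword lists (an assumption for this sketch); a real detector
# would use far more evidence than ten words per language.
STOPWORDS = {
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine", "mit", "für"},
    "en": {"the", "and", "is", "not", "a", "of", "to", "in", "with", "for"},
}

def host_allowed(url, allowed_suffixes):
    """Stage 1: decide from the URL alone, before anything is downloaded."""
    host = urlsplit(url).hostname or ""
    return any(host.endswith(suffix) for suffix in allowed_suffixes)

def guess_language(text):
    """Stage 2 helper: pick the language whose stopwords occur most often."""
    words = text.lower().split()
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def should_index(url, page_text, allowed_suffixes, allowed_languages):
    """Index only if the host matches and the loaded text looks right."""
    if not host_allowed(url, allowed_suffixes):
        return False
    return guess_language(page_text) in allowed_languages
```

For the ".de domain, German pages only" case, the call would look like `should_index(url, text, (".de",), {"de"})`. The domain check costs nothing, while the language check needs the page text, which is why it can only run after the download.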

