Filtering urls before domain crawling

Post by Cyrille37 » Mon Apr 04, 2016 9:59 am

Hi,

It's possible to define regular expressions to filter proxied URLs (Blacklist_p.html) and to clean the index (IndexDeletion_p.html), but I cannot find a filter for the crawler. Did I miss something?

The use case: when crawling an open MediaWiki site, all links get indexed, such as "action=edit section=1", "action=edit section=2", and so on for every section of every page ...
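To make the use case concrete, here is a minimal sketch in plain Python of the kind of filter meant. It is not the crawler's own filter syntax; the pattern, the `should_crawl` helper, and the `wiki.example.org` URLs are all hypothetical, and it assumes the edit links carry `action=edit` as an ordinary query parameter.

```python
import re

# Hypothetical pattern: a query parameter that is exactly "action=edit".
EDIT_LINK = re.compile(r"[?&]action=edit(?:&|$)")


def should_crawl(url: str) -> bool:
    """Return False for MediaWiki edit links, True for everything else."""
    return EDIT_LINK.search(url) is None


# Made-up example URLs, only for illustration.
urls = [
    "http://wiki.example.org/index.php?title=Main_Page",
    "http://wiki.example.org/index.php?title=Main_Page&action=edit&section=1",
    "http://wiki.example.org/index.php?title=Main_Page&action=edit&section=2",
]

# Keep only the URLs that are not edit links.
kept = [u for u in urls if should_crawl(u)]
```

With these sample URLs only the plain page link stays in `kept`; the per-section edit links are dropped before they would reach the crawler.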

Thanks & Cheers
Cyrille37
 
Posts: 7
Joined: Sun Apr 03, 2016 2:12 pm

Re: Filtering urls before domain crawling

Post by sixcooler » Mon Apr 04, 2016 7:10 pm

Hi Cyrille37,

On Blacklist_p.html, on the right side, you can find checkboxes to choose which use cases the filter should apply to.

Cu, sixcooler.
sixcooler
 
Posts: 479
Joined: Thu Aug 14, 2008 5:22 pm

Re: Filtering urls before domain crawling

Post by Cyrille37 » Tue Apr 05, 2016 1:06 pm

sixcooler wrote: On Blacklist_p.html, on the right side, you can find checkboxes to choose which use cases the filter should apply to.

Thanks a lot!

I was misled by the sentence at the top of the page: "This function provides an URL filter to the proxy; ..."
Cyrille37
 
Posts: 7
Joined: Sun Apr 03, 2016 2:12 pm

