Increase the priority of the 'Re-Crawl Index Documents'


Posted by LA_FORGE » Sat Nov 05, 2016 4:19 pm

Hi,

I want to increase the priority of the new 'Re-Crawl Index Documents' feature found at the bottom of the 'IndexReIndexMonitor_p.html' page. I love this feature and would like to give its thread a higher priority, since no other crawls are currently running on my main peer.


Greetings from Germany

LA_FORGE

Re: Increase the priority of the 'Re-Crawl Index Documents'

Posted by luc » Sun Nov 06, 2016 10:17 am

Hello LA_FORGE,
You can configure the ReCrawl job performance settings under "System administration > Performance Settings of Busy Queues" (/PerformanceQueues_p.html). I guess the "Maximum of System-Load" and "Delay between busy loops" settings will fit your needs.

Please note the "ReCrawl" job line only appears in the table AFTER the job has been launched from /IndexReIndexMonitor_p.html.
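
To illustrate how those two settings interact, here is a minimal sketch of a job loop that skips its work while the system load is above a limit and sleeps for a fixed delay between iterations. It is not YaCy code: the class BusyLoopDemo, its method names, and its parameters are hypothetical, and the real busy threads may well behave differently. Under these assumptions, a higher load limit and a shorter delay let the job run more often.

```java
import java.lang.management.ManagementFactory;

// Hypothetical sketch of a load-limited job loop; not the actual YaCy implementation.
class BusyLoopDemo {

    /** Returns true if the job may run. A negative load means "not available" and is allowed. */
    static boolean loadAllowsRun(double systemLoad, double maxSystemLoad) {
        return systemLoad < 0 || systemLoad <= maxSystemLoad;
    }

    /**
     * Runs the job up to 'loops' times, skipping iterations while the load is too high,
     * and sleeping busyDelayMillis after every iteration. Returns how often the job ran.
     */
    static int runLoops(Runnable job, int loops, double maxSystemLoad, long busyDelayMillis) {
        int executed = 0;
        for (int i = 0; i < loops; i++) {
            // getSystemLoadAverage() returns a negative value where the platform does not report it.
            double load = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
            if (loadAllowsRun(load, maxSystemLoad)) {
                job.run();
                executed++;
            }
            try {
                Thread.sleep(busyDelayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop looping
                break;
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        int runs = runLoops(() -> System.out.println("job step"), 3, 4.0, 100);
        System.out.println("executed " + runs + " of 3 loops");
    }
}
```

How many of the three loops actually execute in main depends on the load of the machine at that moment.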

Best regards
Luc

Re: Increase the priority of the 'Re-Crawl Index Documents'

Posted by LA_FORGE » Sun Nov 06, 2016 4:41 pm

Thank you very much. I applied custom values on that page and saved them, but shortly afterwards they were reset to their defaults. I'm looking for something in the Java code to change the thread priority. Back when we migrated the built-in index to Solr, there was a procedure that migrated the old index to Solr and ran at low priority in the background. I changed the value of a variable in the Java code to influence the thread priority, and the migration then ran much faster. Is this also possible in the context I described above? If so, in which class should I make the change, and where exactly is the corresponding code?

Re: Increase the priority of the 'Re-Crawl Index Documents'

Posted by luc » Mon Nov 07, 2016 8:02 pm

OK LA_FORGE, I haven't played much with this feature, so it is possible that something needs fixing so that the performance settings for this task are not lost so easily...

By the way, if you wish to experiment with the thread priority property, you can modify the RecrawlBusyThread class: just change the line
Code:
this.setPriority(Thread.MIN_PRIORITY);
to
Code:
this.setPriority(Thread.MAX_PRIORITY);
or
Code:
this.setPriority(value);
with a value between 1 (Thread.MIN_PRIORITY) and 10 (Thread.MAX_PRIORITY).

But please note that the only code involved is in the RecrawlBusyThread.job() method, which performs a Solr request to select the documents to recrawl (RecrawlBusyThread.processSingleQuery()) and feeds the URLs to the local crawler (RecrawlBusyThread.feedToCrawler()). After that, the local crawler performance settings apply.

If you really wish, you can also modify the local crawler Thread priority when it is created, in the Switchboard.
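
For anyone who wants to try the setPriority call in isolation first, here is a small standalone sketch. PriorityDemo and its methods are hypothetical names made up for this example; it is not the RecrawlBusyThread code. It only shows that Java accepts priorities from 1 to 10 and that the value is set before the thread is started. Keep in mind that the priority is only a hint to the scheduler, so the actual effect depends on the JVM and the operating system.

```java
// Hypothetical stand-in for a background job thread; not the actual YaCy class.
class PriorityDemo {

    /** Clamps a requested priority into the range Java accepts (1..10). */
    static int clampPriority(int requested) {
        return Math.max(Thread.MIN_PRIORITY, Math.min(Thread.MAX_PRIORITY, requested));
    }

    /** Creates (but does not start) a worker thread with the given priority. */
    static Thread newWorker(Runnable job, int requestedPriority) {
        Thread worker = new Thread(job, "demo-worker");
        // setPriority throws IllegalArgumentException outside 1..10, hence the clamp.
        worker.setPriority(clampPriority(requestedPriority));
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = newWorker(() -> System.out.println("job running"), 7);
        System.out.println("priority = " + worker.getPriority());
        worker.start();
        worker.join();
    }
}
```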

Happy hacking!

Re: Increase the priority of the 'Re-Crawl Index Documents'

Posted by LA_FORGE » Tue Nov 08, 2016 12:07 pm

Thank you very, very much. That's exactly what I was looking for. Since I have only basic skills in programming and Java, it's a great exercise to play with.

Re: Increase the priority of the 'Re-Crawl Index Documents'

Posted by LA_FORGE » Fri Jan 27, 2017 11:32 am

Hi Luc,

You helped me a lot last time to find what I was looking for. Now I'm looking for the code that produces this:

Code:
I 2017/01/27 11:31:03 CollectionConfiguration convergence step 1 for host www.midnighttrader.com ...
I 2017/01/27 11:31:03 CollectionConfiguration convergence for host www.midnighttrader.com after 1 steps


Is this related to the postprocessing? If so, can you help me to locate the code?

Thank you very much in advance


Greetings

LA_FORGE

Re: Increase the priority of the 'Re-Crawl Index Documents'

Posted by luc » Mon Jan 30, 2017 9:31 am

Hi LA_FORGE,
You are right, the log traces you mention come from the postprocessing task.
More precisely, the related code as of version 1.92 is in the private method CollectionConfiguration.createRankingMap(), which is called by the postprocessing method: see the first trace and the next.

Have a nice day

Re: Increase the priority of the 'Re-Crawl Index Documents'

Posted by LA_FORGE » Mon Jan 30, 2017 1:05 pm

Thank you very much!!

