Thank you very much

Post by LA_FORGE » Sun Sep 24, 2017 1:23 pm

Hi,

Code:
Postprocessing Progress 
busy:postprocessed 34300 from 106327778 collection documents; 1426 ppm; 74521 minutes remaining


I would like to thank all the devs involved in refactoring the postprocessing routines. The procedure now runs in a fully satisfactory way! The estimated time to complete decreased from over 700 years (before the routines were refactored) to 52 days.
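For anyone who wants to check the 52-day figure, here is a minimal sketch of the arithmetic, assuming "ppm" in the status line means documents postprocessed per minute. The function name `remaining_minutes` is only illustrative and is not part of YaCy.

```python
# Numbers taken from the status line above.
DONE = 34_300            # documents already postprocessed
TOTAL = 106_327_778      # documents in the collection
RATE_PPM = 1426          # assumed to mean documents per minute

MINUTES_PER_DAY = 60 * 24


def remaining_minutes(done: int, total: int, docs_per_minute: float) -> float:
    """Estimate the minutes left at a constant processing rate."""
    return (total - done) / docs_per_minute


minutes = remaining_minutes(DONE, TOTAL, RATE_PPM)
days = minutes / MINUTES_PER_DAY
print(f"{minutes:.0f} minutes remaining, about {days:.1f} days")
```

The result is slightly higher than the 74521 minutes YaCy reports, presumably because the displayed rate is rounded, but it lands on roughly 52 days either way.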

Outstanding work! Thank you very much

LA_FORGE
LA_FORGE
 
Posts: 559
Joined: Sat Oct 11, 2008 5:24 pm

Re: Thank you very much

Post by luc » Mon Sep 25, 2017 6:21 am

Hi LA_FORGE,
good to know this task is starting to become useful within the bounds of a human life ;)

Do you run YaCy with the very latest sources from GitHub? (I wonder to what extent the latest Solr upgrades also contributed to this improvement in post-processing performance...)
luc
 
Posts: 305
Joined: Wed Aug 26, 2015 1:04 am

Re: Thank you very much

Post by LA_FORGE » Mon Sep 25, 2017 11:25 am

Hi Luc,

Exactly, I just pulled the newest commit with the command

git clone https://github.com/yacy/yacy_search_server.git

and then made a few hacks because of my giant index size of 200 million documents. But I didn't touch any code related to the postprocessing procedures, because I lack the Java skills. Then I just compiled the sources with the command

ant clean all

I additionally added the switches -XX:+UseParallelGC -XX:+UseNUMA to the startup script. In multiprocessor environments these switches increase performance a bit.

Yes, you're right, I think the integration of the latest Solr version is also partly responsible for the performance gain.
LA_FORGE
 
Posts: 559
Joined: Sat Oct 11, 2008 5:24 pm

Re: Thank you very much

Post by LA_FORGE » Fri Oct 06, 2017 7:25 am

After a few days the rate dropped to 160 ppm, so the process would now take over a year to complete again :-(

Question: When I crawl some sites on another peer and export that index via the XML export feature (Rich and full-text Solr data), has the postprocessing procedure already been run at that point? In other words, does the data dump already contain the postprocessing data, or does it need to be computed again?
LA_FORGE
 
Posts: 559
Joined: Sat Oct 11, 2008 5:24 pm

Re: Thank you very much

Post by luc » Thu Oct 12, 2017 8:47 am

Hi LA_FORGE, sorry for the delayed answer, but as far as I know:
- post-processing runs only once all crawls have finished (see the conditional check)
- once post-processed and committed, the related Solr fields are indeed exported with the XML export feature, so they do not need to be computed again.

A few complementary remarks on export/import, however:
- the webgraph collection is not exported, so obviously you also lose any post-processing computation on webgraph collection fields when exporting
- the computation of some post-processed fields depends on the local peer data: for example, references post-processing uses the citation index, and possibly the webgraph collection if enabled. So to my mind, to be truly accurate, these values should be computed again when importing to another peer with a larger or a different index. But that won't be done automatically after import, as the fields marking that post-processing is needed (process_sxt and harvestkey_s) are cleaned up after a successful post-processing...
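To see whether imported documents are still marked for post-processing, one could query the index for documents where process_sxt is set. The sketch below only builds such a query URL. The base URL (a local peer on port 8090 with a `/solr/select` endpoint) and the helper name `pending_postprocessing_url` are assumptions for illustration; adjust them to your own setup.

```python
from urllib.parse import urlencode

# Assumption: the peer's Solr select endpoint is reachable at this address.
ASSUMED_SOLR_SELECT = "http://localhost:8090/solr/select"


def pending_postprocessing_url(base_url: str = ASSUMED_SOLR_SELECT) -> str:
    """Build a query URL matching documents that still have process_sxt set.

    rows=0 asks only for the match count, not for the documents themselves.
    """
    params = {
        "q": "process_sxt:[* TO *]",  # any document where the field has a value
        "rows": 0,
    }
    return f"{base_url}?{urlencode(params)}"


print(pending_postprocessing_url())
```

If the reported number of matches is zero after an import, the imported documents carry no post-processing marker, which would fit the behaviour described above.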
luc
 
Beiträge: 305
Registriert: Mi Aug 26, 2015 1:04 am

Re: Thank you very much

Post by LA_FORGE » Fri Oct 13, 2017 9:33 am

Great! Thank you very much
LA_FORGE
 
Posts: 559
Joined: Sat Oct 11, 2008 5:24 pm

