Postprocessing: Looking for the code

Post by LA_FORGE » Sat Apr 23, 2016 10:33 am

Hi,

I'm searching for the Java code behind this status message:

Code:
Postprocessing Progress 
busy:postprocessed 219800 from 100556956 collection documents; 4 ppm; 21380799 minutes remaining


What's the name of the associated Java class?


Greetings

LA_FORGE
LA_FORGE
 
Posts: 538
Joined: Sat Oct 11, 2008 5:24 pm

Re: Postprocessing: Looking for the code

Post by luc » Tue Apr 26, 2016 7:57 am

Hi, as this message is displayed on the /Crawler_p.html page, you can easily find the Java class behind it: Crawler_p.java (https://github.com/yacy/yacy_search_ser ... ler_p.java).
But I guess you are looking for how "postprocessing_status" (https://github.com/yacy/yacy_search_ser ... .html#L147) is filled...
You will see that it is not fed directly by Crawler_p.java, but rather by the JavaScript file Crawler.js (https://github.com/yacy/yacy_search_ser ... er.js#L110), which gets it from /api/status_p.xml (https://github.com/yacy/yacy_search_ser ... _p.xml#L83), which in turn is fed by the status_p.java class (https://github.com/yacy/yacy_search_ser ... .java#L155) :)
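
To make the middle of that chain more concrete, here is a minimal sketch of the "read a value out of the status XML" step, written in Java instead of JavaScript. The sample document, the element names (root, postprocessing, status) and the class name StatusXmlSketch are assumptions for illustration only; the real layout is in the status_p.xml file linked above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class StatusXmlSketch {

    /** Returns the trimmed text of the first element named tagName, or "" if there is none. */
    static String firstElementText(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tagName);
            return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
        } catch (Exception e) {
            // Malformed XML or parser setup problems end up here
            throw new IllegalArgumentException("could not parse status XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical response body; the real /api/status_p.xml may be structured differently
        String sample = "<root><postprocessing>"
                + "<status>busy:collecting counts</status>"
                + "</postprocessing></root>";
        System.out.println(firstElementText(sample, "status"));
    }
}
```

Crawler.js does the equivalent in the browser and writes the extracted text into the "postprocessing_status" element of the page.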
luc
 
Posts: 245
Joined: Wed Aug 26, 2015 1:04 am

Re: Postprocessing: Looking for the code

Post by LA_FORGE » Sun May 29, 2016 2:21 pm

Hi,

Thank you very much for this detailed explanation. In which file is the Java code that actually does the postprocessing? Since I'm a beginner in programming, you're welcome to post the snippet of code that does the postprocessing.
LA_FORGE
 
Posts: 538
Joined: Sat Oct 11, 2008 5:24 pm

Re: Postprocessing: Looking for the code

Post by luc » Mon May 30, 2016 7:15 am

OK, so we can now look for references to the static properties postprocessingRunning and postprocessingActivity of CollectionConfiguration. If you use the Eclipse IDE, you can do so by selecting the property and then using the menu Search > References > Workspace (shortcut: Ctrl + Shift + G).
For postprocessingRunning, we get references in the status_p.java and CollectionConfiguration.java classes. In status_p, the property is only read, but we are looking for the place where it is assigned a value. That is quite simple in this case: everything is done in the CollectionConfiguration.postprocessing method.
At the beginning of the method, the postprocessingRunning property is set to true:
Code:
        // calculate the number of documents to be processed
        String collection1query = collection1query(segment, harvestkey);
        String webgraphquery = webgraphquery(segment, harvestkey);
        postprocessingRunning = true;


And at the end of all processing, it is set back to false:
Code:
        postprocessingWebgraphCount = 0;
        postprocessingActivity = "postprocessing terminated";
        ConcurrentLog.info("CollectionConfiguration", postprocessingActivity);
        postprocessingRunning = false;
        return allcount.get();
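
The two snippets above follow a common pattern: a long-running method raises a shared flag when it starts and lowers it when it finishes, so that another class (here status_p) can read the current state. Below is a minimal standalone sketch of that pattern. The class JobStatusSketch, its fields and the runJob method are made up for illustration, and the try/finally is an addition in this sketch to guarantee the reset; it is not a claim about how CollectionConfiguration.postprocessing is written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class JobStatusSketch {
    // Shared status; volatile so a reader on another thread sees the latest values
    static volatile boolean running = false;
    static volatile String activity = "";

    /** Runs the named steps in map iteration order and returns how many completed. */
    static int runJob(Map<String, Runnable> steps) {
        running = true;
        int done = 0;
        try {
            for (Map.Entry<String, Runnable> step : steps.entrySet()) {
                activity = step.getKey();   // publish which part is running now
                step.getValue().run();
                done++;
            }
            activity = "terminated";
            return done;
        } finally {
            running = false;                // reset the flag even if a step throws
        }
    }

    public static void main(String[] args) {
        Map<String, Runnable> steps = new LinkedHashMap<>();
        steps.put("collecting counts", () -> System.out.println(running + " / " + activity));
        steps.put("create ranking map", () -> System.out.println(running + " / " + activity));
        int done = runJob(steps);
        System.out.println(done + " steps done, running=" + running + ", activity=" + activity);
    }
}
```

A status page only ever reads running and activity; it never writes them, which is the same division of roles as between status_p and CollectionConfiguration.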


The postprocessing method itself has a few hundred lines of code, which I will not detail now. I don't know if you wish to understand the whole process, but at least I can tell you how to identify its different parts: look for the lines where the postprocessingActivity property is set. For example, the first parts:

Code:
postprocessingActivity = "collecting counts";

...
Code:
postprocessingActivity = "collecting host facets for collection";

...
Code:
postprocessingActivity = "create ranking map";


And so on...
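
If you are not using Eclipse, the same list of stages can be pulled out of the source text with a small program. The sketch below scans Java source for assignments of a plain string literal to postprocessingActivity and collects the literals. The class name ActivityScanSketch and the regular expression are illustrative choices; messages built by string concatenation, or literals containing escaped quotes, would be missed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ActivityScanSketch {
    // Matches lines such as: postprocessingActivity = "some text";
    private static final Pattern ASSIGNMENT =
            Pattern.compile("postprocessingActivity\\s*=\\s*\"([^\"]*)\"\\s*;");

    /** Returns the string literals assigned to postprocessingActivity, in source order. */
    static List<String> findActivities(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = ASSIGNMENT.matcher(source);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of CollectionConfiguration.java
        String source = String.join("\n",
                "postprocessingActivity = \"collecting counts\";",
                "int unrelated = 0;",
                "postprocessingActivity = \"collecting host facets for collection\";",
                "postprocessingActivity = \"create ranking map\";");
        for (String activity : findActivities(source)) {
            System.out.println(activity);
        }
    }
}
```

To run it against the real file, read the file contents into the source string instead of using the inline sample.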
luc
 
Posts: 245
Joined: Wed Aug 26, 2015 1:04 am

Re: Postprocessing: Looking for the code

Post by LA_FORGE » Sat Sep 10, 2016 4:55 pm

Thank you very much for the detailed explanation.
LA_FORGE
 
Posts: 538
Joined: Sat Oct 11, 2008 5:24 pm

