
Save crawler config

Posted: Mon May 18, 2015 5:54 pm
by davide
Is it possible to save the crawler configuration to a file?
Ideally, the whole crawler config would be saved (domains, filters, ...), and it could then be restored into YaCy to start a new crawl session based on the saved configuration.

Right now, once you start a crawler, its configuration is no longer accessible or changeable.

Re: Save crawler config

Posted: Mon May 18, 2015 6:08 pm
by Orbiter
Everything you want is possible!

- click on 'Process Scheduler'
- all crawl start actions are listed; you can set a scheduler time there
- you can check the checkbox and start them again with "Execute Selected Actions"
- you can click on the clone button (document -> document) and the crawl details are written to the start servlet, so you can edit and repeat them slightly differently
- you can copy the whole scheduler database to another peer; just copy DATA/WORK/api.bheap (see the sketch below)
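
A minimal sketch of that last copy step, assuming a tarball installation under ~/yacy on both peers and a target host named peer2.example (both names are assumptions for this example); stopping YaCy first avoids copying the database while it is being written:

    # stop YaCy on both peers so api.bheap is not modified mid-copy
    ~/yacy/stopYACY.sh

    # copy the scheduler database to the second peer (hostname is a placeholder)
    scp ~/yacy/DATA/WORK/api.bheap peer2.example:~/yacy/DATA/WORK/api.bheap

    # restart; the copied actions appear in the Process Scheduler of the target peer
    ~/yacy/startYACY.sh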

Re: Save crawler config

Posted: Mon May 18, 2015 6:22 pm
by davide
Cool :)
It works like a sort of "macro recorder" for all the GET requests it receives from the admin.

Re: Save crawler config

Posted: Mon May 18, 2015 9:22 pm
by Orbiter
Yes; not all of them, but most requests that manipulate the index are recorded, including deletion requests. You can also copy-paste those GET requests and use them outside of YaCy (e.g. with wget or curl) to start processes, for example from a cronjob.
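
A minimal sketch of that external use, assuming a peer on localhost:8090 with digest authentication; the servlet name, parameters, and credentials below are placeholders rather than a documented call, so copy the exact URL shown in your own Process Scheduler list instead:

    # replay a recorded scheduler action with curl; the URL here is
    # illustrative only -- substitute the one from your scheduler list.
    # YaCy may require digest auth for admin pages, hence --digest.
    curl -s --digest --user admin:secret \
      "http://localhost:8090/Crawler_p.html?crawlingstart=1&crawlingURL=http://example.org/"

    # the same request run nightly at 02:00 from a cronjob (crontab -e):
    # 0 2 * * * curl -s --digest --user admin:secret "http://localhost:8090/Crawler_p.html?crawlingstart=1&crawlingURL=http://example.org/" >/dev/null 2>&1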