Where is the file that makes Table_API_p.html , please


Post by xioc752 » Fri Jan 02, 2015 8:34 pm

Hello,
We want to copy the entire file that holds the data displayed on the page
Table_API_p.html
and then insert it into a new server.
Imagine there are 900 valuable crawling instructions in http://zxc.asd.e.wt:8090/Table_API_p.html
A URL on zxc.asd.e.wt:8090 such as the following will display the entire list:
/Table_API_p.html?startRecord=1&maximumRecords=900&inline=false&filter=.*
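
For anyone who wants to generate that listing URL rather than type it, here is a minimal sketch. The parameter names and values are taken from the line above; the function name `build_table_api_url` is made up for illustration, and the host is simply the one quoted in this post.

```python
from urllib.parse import urlencode


def build_table_api_url(base_url: str, max_records: int = 900) -> str:
    """Return a Table_API_p.html URL that lists up to max_records entries."""
    params = {
        "startRecord": 1,
        "maximumRecords": max_records,
        "inline": "false",
        "filter": ".*",
    }
    # safe="*" keeps the filter readable as ".*" instead of ".%2A".
    query = urlencode(params, safe="*")
    return base_url.rstrip("/") + "/Table_API_p.html?" + query


if __name__ == "__main__":
    print(build_table_api_url("http://zxc.asd.e.wt:8090"))
```

Raising `max_records` is the only change needed when the list grows past 900 entries.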

These URLs, RSS feeds, etc. are in the list entitled "Recorded Actions".
If possible, we would like to prepare the list by removing most of the housekeeping entries.

We wish to extract only the crawling data and the crawl frequency of each instruction.
The common technique is to copy all the instructions, but this becomes impractical as the number of instructions grows and the crawl frequencies become difficult to manage.
Doing this the "old" way is extremely manual and time-consuming. It assumes that each instruction and its frequency will be manually re-entered on another server.

Alternatively, you can place the data in a spreadsheet and use selectors to remove the housekeeping instructions.
Still, this is very manual.
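
The spreadsheet filtering could also be scripted. The following is only a sketch under assumptions: it supposes the recorded actions have already been exported as a plain list of URL strings, and that crawl starts and RSS loads can be recognised by the page names `Crawler_p.html` and `Load_RSS_p.html`. Both patterns and the sample entries are illustrative guesses, not taken from this thread, so adjust them to whatever your own list actually contains.

```python
import re

# Assumed patterns for the entries worth keeping; edit to match your list.
KEEP_PATTERNS = [r"Crawler_p\.html", r"Load_RSS_p\.html"]


def keep_crawl_entries(urls, patterns=KEEP_PATTERNS):
    """Return only the URLs that match at least one of the keep patterns."""
    compiled = [re.compile(p) for p in patterns]
    return [u for u in urls if any(c.search(u) for c in compiled)]


if __name__ == "__main__":
    # Made-up sample entries, for illustration only.
    recorded = [
        "/Crawler_p.html?example=1",
        "/Load_RSS_p.html?example=2",
        "/ConfigBasic.html?example=3",
    ]
    for entry in keep_crawl_entries(recorded):
        print(entry)
```

Everything that matches none of the patterns is treated as housekeeping and dropped, so it is worth checking the discarded entries once by hand.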

After removing all server housekeeping instructions, we wish to reinsert the pages to crawl and RSS feeds into the same file format, but in a fresh server.
Then we will insert the prepared file into the new server - replacing the generic file.
Why do this?
Our experience indicates that doing an extraction of URLs in RSS format does not capture all the URLs for some unknown reason.

Where is the file that holds the data that generates /Table_API_p.html, please?
Many thanks
xioc752
 
Posts: 68
Joined: Mon Jul 28, 2014 5:01 pm

Re: Where is the file that makes Table_API_p.html , please

Post by Orbiter » Sun Jan 11, 2015 11:46 pm

That's easy, and it is actually intended to work this way so that you can clone a YaCy configuration!
You just need to copy the file DATA/WORK/api.bheap from the source peer to the target peer (while both are not running).
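
A minimal sketch of that copy step, assuming both DATA folders are reachable from one machine (for example via a mounted disk or a copied archive) and that both peers are stopped. The function name `clone_api_table` and the `.bak` backup are additions for illustration, not part of YaCy.

```python
import shutil
from pathlib import Path


def clone_api_table(source_data: Path, target_data: Path) -> Path:
    """Copy WORK/api.bheap from the source DATA folder to the target one."""
    src = source_data / "WORK" / "api.bheap"
    dst = target_data / "WORK" / "api.bheap"
    if not src.is_file():
        raise FileNotFoundError(src)
    dst.parent.mkdir(parents=True, exist_ok=True)
    if dst.exists():
        # Keep the target's previous table in case the copy has to be undone.
        shutil.copy2(dst, dst.with_name(dst.name + ".bak"))
    shutil.copy2(src, dst)
    return dst
```

Usage would look like `clone_api_table(Path("/old/yacy/DATA"), Path("/new/yacy/DATA"))`, where both paths are placeholders for your own installations.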
Orbiter
 
Posts: 5787
Joined: Tue Jun 26, 2007 10:58 pm
Location: Frankfurt am Main

Re: Where is the file that makes Table_API_p.html , please

Post by xioc752 » Wed Jan 21, 2015 5:35 pm

Great...Thank you very much!

Re: Where is the file that makes Table_API_p.html , please

Post by xioc752 » Mon May 04, 2015 3:43 pm

Hello + Thank you,
I have copied the indicated api.bheap file from the original machine into the WORK folder of the second, new machine.
I had tested the new machine with an RSS feed before doing this.

The new machine contains the segments and the data appears to be normally accessible.
Hence, the new machine contains the segments of the old machine...but not the entire DATA folder, as it was damaged.
However, the process scheduler shows an empty environment with no instructions.

It is the instructions we need, of course. There are hundreds of API-based RSS-type instructions, plus individual URLs, that need to be transferred intact, please.

Both machines were off when the copy and insert took place.
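
One check that may help rule out a damaged copy is to confirm that the file on the new machine is byte-identical to the saved original. This is a sketch only; the helper names are made up, and it says nothing about why the scheduler appears empty.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_content(a: Path, b: Path) -> bool:
    """True if both files have the same size and the same SHA-256 digest."""
    return a.stat().st_size == b.stat().st_size and sha256_of(a) == sha256_of(b)
```

If `same_file_content` returns False for the saved api.bheap and the one in the new machine's WORK folder, the copy itself is the first thing to repeat.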

This file comes from a previous machine that could not see the outside and is the subject of other tickets where we tried to make it see the outside, again, normally. Hence, it has not crawled for several months. Understandably, the current api.bheap file is identical in size to the one we saved when the problem began.
reference:
viewtopic.php?f=23&t=5471

To help you remember the situation, the original server still shows the following when the status URL is accessed (n.b., its Ubuntu and YaCy have been updated to the currently available versions as of today):

HTTP ERROR: 403
Problem accessing /Status.html. Reason:
proxy use not allowed (see System Administration -> Advanced Settings -> Proxy Access Settings -> Transparent Proxy; switched off).
Powered by Jetty://



Thank you for your advice.

