Rescue Data

Post by xioc752 » Fri Feb 20, 2015 5:26 pm

Hi,
Problem:
We have 2 Robinson-type cloud servers that must be replaced.
Saving the harvested data from each crawler is CRITICAL.
The data in them is different; the crawlers had different tasks.

    Functionally, they are not working properly and cannot be easily repaired (many attempted repairs, without success).
    Additionally, it is not useful to simply copy the DATA folder sets, as they contain errors that persist when the DATA set is inserted into a fresh crawler installation.
    Each of them is out of space for crawling. They cannot be expanded; the available cloud space is fixed in size.
    They must be permanently shut down very soon because the hosting agreement is ending.
    We have already rescued the lists of what to crawl and the crawl frequencies from each (thanks!)
    There is no space on either of them to make Solr backups of the mountains (GBs) of harvested data (see the sketch after this list for how we imagine working around that).
    Due to many problems, it is not realistic to recrawl all the data; there is too much, and much of it is time-specific.
    The installations are generic Ubuntu with a crawler installed in the VM.
    The crawlers use both internal Solr cores (including webgraph edges) and do not write to any external Solr DBs.
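
Since there is no room on either box for a local dump, our rough idea is to stream an archive of whichever folders matter straight to the new machine instead of writing it to local disk first. A minimal sketch of what we have in mind, assuming a stock install under /home/yacy/yacy and that DATA/INDEX is the folder to save (both are only our guesses, please correct us):

Code: Select all
# stream_yacy_data.py -- stream a gzip'd tar of selected YaCy DATA folders to
# stdout so no local disk space is needed; run it only while YaCy is stopped.
# Intended use (our assumption):
#   python3 stream_yacy_data.py | ssh user@newhost 'cat > yacy-data.tar.gz'
import sys
import tarfile
from pathlib import Path

DATA_DIR = Path("/home/yacy/yacy/DATA")   # placeholder, adjust to the real path
FOLDERS = ["INDEX"]                        # our guess at what needs saving

def main() -> None:
    # "w|gz" opens a non-seekable gzip stream, which is what a pipe needs.
    with tarfile.open(fileobj=sys.stdout.buffer, mode="w|gz") as tar:
        for name in FOLDERS:
            src = DATA_DIR / name
            if src.is_dir():
                # Store paths as DATA/INDEX/... so the archive unpacks in place.
                tar.add(str(src), arcname=f"DATA/{name}")
            else:
                # stdout carries the archive, so diagnostics go to stderr.
                print(f"skipping missing folder: {src}", file=sys.stderr)

if __name__ == "__main__":
    main()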

We need to recover the folders with the harvested data. This is essential.
1. What folders do we need to "save", and
2. Where do we move them to, please?
We can make new generic crawlers in another cloud space.
Our goal is to add this harvested data to a private [P2P/DHT environment].
However, at this time all the new servers are Robinson servers that read each other.

Many thanks
xioc752
 
Posts: 68
Registered: Mon Jul 28, 2014 5:01 pm

Re: Rescue Data

Post by bauhaus05 » Sun Feb 22, 2015 12:21 pm

I'm only a beginner. On my Ubuntu the data is stored in the folder /home/[username]/yacy/DATA/INDEX/[networkname]/SEGMENTS/

Baobab is a helpful tool for analysing structures!
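
If you cannot run Baobab on a headless server, a tiny script can give you the same overview of which DATA subfolders are big. This is just a sketch; the base path is how it looks on my machine, so adjust it:

Code: Select all
# yacy_sizes.py -- print the size of each first-level folder under DATA,
# a quick command-line substitute for Baobab on a server without a desktop.
import os
from pathlib import Path

DATA_DIR = Path("/home/yacy/yacy/DATA")  # adjust to your own installation

def folder_size(path: Path) -> int:
    """Sum the sizes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):
                total += os.path.getsize(fp)
    return total

for entry in sorted(DATA_DIR.iterdir()):
    if entry.is_dir():
        print(f"{folder_size(entry) / 1024**3:8.2f} GB  {entry.name}")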
bauhaus05
 
Posts: 19
Registered: Sun Feb 08, 2015 7:50 pm

Re: Rescue Data

Post by xioc752 » Sun Feb 22, 2015 11:28 pm

Thank you.
Questions, please:
1. What needs to be done to reuse it elsewhere?
2. Must it stay separate and intact forever in a new location, or
3. Can it be the basis for a new crawler that will grow, i.e. a fresh YaCy?
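
For (3), here is roughly what we would try, based on the SEGMENTS path you mentioned: stop the new YaCy, put the rescued SEGMENTS folder into its DATA/INDEX/[networkname]/ directory, and start it again. A sketch with placeholder paths and network name (please tell us if this is the wrong approach):

Code: Select all
# restore_yacy_index.py -- copy a rescued SEGMENTS folder into a fresh YaCy
# installation; run this only while the new YaCy instance is stopped.
import shutil
from pathlib import Path

# Placeholders -- adjust to the real paths and network name of the new install.
RESCUED_SEGMENTS = Path("/backup/DATA/INDEX/freeworld/SEGMENTS")
NEW_INDEX_DIR = Path("/home/yacy/yacy/DATA/INDEX/freeworld")

target = NEW_INDEX_DIR / "SEGMENTS"
if target.exists():
    # Keep whatever the fresh installation created, just in case.
    shutil.move(str(target), str(target) + ".empty-backup")

shutil.copytree(RESCUED_SEGMENTS, target)
print(f"copied {RESCUED_SEGMENTS} -> {target}")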
Many thanks!
xioc752
 
Posts: 68
Registered: Mon Jul 28, 2014 5:01 pm

