Remote Crawler

This is where YaCy users can find help when something does not work, or works differently than expected. Please report obvious bugs directly in the bug tracker (http://bugs.yacy.net).
Forum rules
This forum is for usage problems and requests for help. If a bug is identified along the way, the thread is moved to the bug section for handling. So if you posted a thread here and cannot find it, you will most likely find it in the bug section.

Post by nstaudt » Wed Sep 22, 2010 7:14 am

I've noticed that since upgrading to SVN 7170+, my remote crawl queue is always empty (even though there are active peers with URLs available for remote crawl). Has anyone else noticed this? Is this normal? (My peer: http://abbot.is-a-geek.net:8080)
nstaudt
 
Posts: 73
Joined: Fri Aug 13, 2010 10:54 am

Re: Remote Crawler

Post by Orbiter » Wed Sep 22, 2010 8:24 am

It works faster now, which may be why you see no entries when you look at the queue. If it is working correctly, you should see the crawl results in /CrawlResults.html?process=6
Is there any entry?
Orbiter
 
Posts: 5793
Joined: Tue Jun 26, 2007 10:58 pm
Location: Frankfurt am Main

Re: Remote Crawler

Post by nstaudt » Wed Sep 22, 2010 3:21 pm

"The stack is empty"
nstaudt
 
Posts: 73
Joined: Fri Aug 13, 2010 10:54 am

Re: Remote Crawler

Post by Quix0r » Wed Sep 22, 2010 3:40 pm

It is the same on my node, http://free-search.yacy. I have already turned it off and back online; DHT-in/out is also enabled, currently at 600 PPM.

The remote crawler has been paused for several hours, but no entries have been added. In earlier versions this was always possible, and the size of the pool did not exceed ~115 entries. But since it was moved to http://localhost:8080/RemoteCrawl_p.html, it doesn't work.
Quix0r
 
Posts: 1345
Joined: Tue Jul 31, 2007 9:22 am
Location: Krefeld

Re: Remote Crawler

Post by Quix0r » Tue Feb 01, 2011 8:13 am

In CrawlQueues.java, the method remoteCrawlLoaderJob() contains this check:
Code:
    if (coreCrawlJobSize() > 0 /*&& sb.indexingStorageProcessor.queueSize() > 0*/) {
        if (this.log.isFine()) log.logFine("remoteCrawlLoaderJob: a local crawl is running, omitting processing");
        return false;
    }

This if() statement ensures that no remote crawl is done while the "core" (better: 'local') crawler has at least one entry, which is usually the case. Remote crawling means that your peer fetches an entry from another peer's "limit" (better: 'global') queue, crawls it, indexes it for its own index, and sends the receipt back to the other peer.
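
To make the effect of that guard concrete, here is a minimal, self-contained sketch. It is not YaCy code: the class `RemoteCrawlGuardSketch`, its two queues, and the example URLs are hypothetical stand-ins, and only the shape of the check is taken from the snippet above.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model of a peer; these are NOT YaCy's real data structures.
class RemoteCrawlGuardSketch {
    // Stand-in for the "core" (local) crawl queue.
    final Queue<String> coreQueue = new ArrayDeque<>();
    // Stand-in for the queue of URLs accepted from other peers.
    final Queue<String> remoteQueue = new ArrayDeque<>();

    // Mirrors the quoted guard: skip remote work while any local entry exists.
    boolean remoteCrawlLoaderJob(String urlOfferedByOtherPeer) {
        if (!coreQueue.isEmpty()) {
            // A local crawl is running, so nothing is taken from other peers.
            return false;
        }
        remoteQueue.add(urlOfferedByOtherPeer);
        return true;
    }

    public static void main(String[] args) {
        RemoteCrawlGuardSketch peer = new RemoteCrawlGuardSketch();

        // One pending local URL is enough to block remote crawling.
        peer.coreQueue.add("http://example.org/local-start-url");
        System.out.println(peer.remoteCrawlLoaderJob("http://example.net/offered")); // false

        // Only once the local queue is drained is a remote entry accepted.
        peer.coreQueue.clear();
        System.out.println(peer.remoteCrawlLoaderJob("http://example.net/offered")); // true
    }
}
```

Under these assumptions, a peer that always has at least one local entry never accepts remote work, which matches the permanently empty remote queue reported earlier in the thread.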

According to Orbiter, removing this if() block would put a heavy load on the whole network, which you also don't want on your peer. I have now rewritten this small part:
Code:
    // Determine local/remote ratio; if it is above 1000, do not do any remote jobs
    if (!isLocalRemoteRatioReached()) {
        Log.logFine(LoggerNames.LOGGER_CRAWL_QUEUES, "remoteCrawlLoaderJob: ratio for remote-triggered crawl not reached.");
        return false;
    }

And the missing method:
Code:
    /**
     * Determines whether remote crawling may run or is omitted because of too many local crawls
     *
     * @return true if remote crawling may proceed, false if it is omitted
     */
    private boolean isLocalRemoteRatioReached () {
        if (remoteCrawlJobSize() == 0) {
            // No entries in remote queue
            return true;
        }

        // Determine ratio (cast first so that integer sizes are not divided with truncation)
        float ratio = (float) localCrawlJobSize() / remoteCrawlJobSize();

        // Debug message
        Log.logInfo(LoggerNames.LOGGER_CRAWL_QUEUES,
            "isLocalRemoteRatioReached: local.size() = " + localCrawlJobSize() +
            ", global.size() = " + globalCrawlJobSize() +
            ", remote.size() = " + remoteCrawlJobSize() +
            ", ratio = " + ratio
        );

        // Check whether the local/remote ratio is at most 1000
        return (ratio <= 1000);
    }
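
One detail worth checking in a ratio test like this is the division itself: in Java, `/` applied to two `int` operands truncates before the result is widened to `float`. The following sketch assumes the queue sizes are `int` values. `CrawlRatioSketch`, its constant, and its parameters are hypothetical names used only for illustration, and the sizes are passed in as arguments so the check can be run outside YaCy.

```java
// Hypothetical stand-alone version of the ratio check; not part of the patch.
class CrawlRatioSketch {
    // Threshold taken from the method above.
    static final float MAX_LOCAL_REMOTE_RATIO = 1000f;

    // Returns true if remote crawling may proceed for the given queue sizes.
    static boolean isLocalRemoteRatioReached(int localSize, int remoteSize) {
        if (remoteSize == 0) {
            // No entries in the remote queue; this also avoids dividing by zero.
            return true;
        }
        // Cast before dividing so the fractional part is kept.
        float ratio = (float) localSize / remoteSize;
        return ratio <= MAX_LOCAL_REMOTE_RATIO;
    }

    public static void main(String[] args) {
        System.out.println(2001 / 2);                           // 1000 (integer division)
        System.out.println((float) 2001 / 2);                   // 1000.5
        System.out.println(isLocalRemoteRatioReached(2000, 2)); // true
        System.out.println(isLocalRemoteRatioReached(2001, 2)); // false
    }
}
```

With queue sizes 2001 and 2, integer division would yield 1000 and pass the check, while the floating-point ratio is 1000.5 and does not.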

This change should make sure that remote crawls are not always performed. I would like to hear your feedback on this (apart from the missing class). If you need the full patch, try this:
http://free-search.yacy/repository/yacy ... .patch.bz2

Alternatively:
http://www.mxchange.org/downloads/yacy/ ... .patch.bz2

You may need the class LoggerNames, which you can find in this patch:
http://free-search.yacy/repository/yacy ... .patch.bz2

It is an attempt to rewrite all loggers as statically called methods, without any logger instances.
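
As a rough illustration of what statically called logging without instances can look like, here is a sketch built on `java.util.logging`. It is not taken from the patch: `LoggerNamesSketch`, `StaticLogSketch`, and the logger name string are assumptions, and the real LoggerNames and Log classes may be organized differently.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical holder for logger names, in the spirit of a LoggerNames class.
final class LoggerNamesSketch {
    static final String LOGGER_CRAWL_QUEUES = "CRAWL_QUEUES"; // assumed name

    private LoggerNamesSketch() {}
}

// Hypothetical static facade: callers pass a logger name instead of holding an instance.
final class StaticLogSketch {
    private StaticLogSketch() {}

    static void logFine(String loggerName, String message) {
        Logger logger = Logger.getLogger(loggerName);
        // Checking the level first corresponds to the isFine() test in the original guard.
        if (logger.isLoggable(Level.FINE)) {
            logger.fine(message);
        }
    }

    static void logInfo(String loggerName, String message) {
        Logger.getLogger(loggerName).info(message);
    }

    public static void main(String[] args) {
        logInfo(LoggerNamesSketch.LOGGER_CRAWL_QUEUES, "remote crawl check executed");
        logFine(LoggerNamesSketch.LOGGER_CRAWL_QUEUES, "only emitted if FINE is enabled");
    }
}
```

The trade-off in this sketch is one logger lookup by name per call in exchange for not keeping a logger field in every class.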
Quix0r
 
Posts: 1345
Joined: Tue Jul 31, 2007 9:22 am
Location: Krefeld

