SOLR error auto-optimization Max Disk I/O


Post by Guims » Mon May 19, 2014 2:50 pm

Hello,

Since an update, I get an error with YaCy / Solr several times a day.
It happens when YaCy runs the Solr auto-optimization.
After 10 minutes of optimization, the hard disks of my RAID 1 are saturated at 50 MB of I/O.
I am then forced to run killYACY.sh and restart YaCy.
The Solr index currently weighs 85 GB for 8,462,504 URLs.

Here is an excerpt of the log before saturation of the disk

Code:
I 2014/05/19 12:14:06 RESOURCE OBSERVER resources ok
I 2014/05/19 12:14:06 SWITCHBOARD cleanup post-processed 0 documents
I 2014/05/19 12:14:06 NoticedURL CLEARING ALL STACKS
I 2014/05/19 12:14:06 SWITCHBOARD Solr auto-optimization: idleSearch=8214933, idleAdmin=8204933, deltaOptimize=8204933, proccount=0
I 2014/05/19 12:14:06 SWITCHBOARD Solr auto-optimization: running solr.optimize(1)
S 2014/05/19 12:26:35 BusyThread Thread 'BusyThread net.yacy.contentcontrol.SMWListSyncThread.run' runs high load cycle. current: 10.15 max.: 10.0
S 2014/05/19 12:26:35 BusyThread Thread 'BusyThread net.yacy.contentcontrol.ContentControlFilterUpdateThread.run' runs high load cycle. current: 10.15 max.: 10.0
S 2014/05/19 12:26:36 BusyThread Thread 'BusyThread net.yacy.crawler.data.CrawlQueues.remoteTriggeredCrawlJob' runs high load cycle. current: 10.15 max.: 10.0
S 2014/05/19 12:26:38 BusyThread Thread 'BusyThread net.yacy.contentcontrol.SMWListSyncThread.run' runs high load cycle. current: 10.15 max.: 10.0
S 2014/05/19 12:26:38 BusyThread Thread 'BusyThread net.yacy.contentcontrol.ContentControlFilterUpdateThread.run' runs high load cycle. current: 10.15 max.: 10.0
S 2014/05/19 12:26:38 BusyThread Thread 'BusyThread net.yacy.crawler.data.CrawlQueues.remoteTriggeredCrawlJob' runs high load cycle. current: 10.15 max.: 10.0
S 2014/05/19 12:26:39 BusyThread Thread 'BusyThread net.yacy.peers.Network.peerPing' runs high load cycle. current: 10.15 max.: 10.0
S 2014/05/19 12:26:39 BusyThread Thread 'BusyThread net.yacy.crawler.data.CrawlQueues.remoteCrawlLoaderJob' runs high load cycle. current: 10.15 max.: 10.0
S 2014/05/19 12:26:40 BusyThread Thread 'BusyThread net.yacy.crawler.data.CrawlQueues.remoteTriggeredCrawlJob' runs high load cycle. current: 10.22 max.: 10.0
S 2014/05/19 12:26:41 BusyThread Thread 'BusyThread net.yacy.contentcontrol.SMWListSyncThread.run' runs high load cycle. current: 10.22 max.: 10.0
S 2014/05/19 12:26:41 BusyThread Thread 'BusyThread net.yacy.contentcontrol.ContentControlFilterUpdateThread.run' runs high load cycle. current: 10.22 max.: 10.0
S 2014/05/19 12:26:42 BusyThread Thread 'BusyThread net.yacy.crawler.data.CrawlQueues.remoteTriggeredCrawlJob' runs high load cycle. current: 10.22 max.: 10.0
S 2014/05/19 12:26:44 BusyThread Thread 'BusyThread net.yacy.contentcontrol.SMWListSyncThread.run' runs high load cycle. current: 10.22 max.: 10.0
S 2014/05/19 12:26:44 BusyThread Thread 'BusyThread net.yacy.contentcontrol.ContentControlFilterUpdateThread.run' runs high load cycle. current: 10.22 max.: 10.0
S 2014/05/19 12:26:44 BusyThread Thread 'BusyThread net.yacy.crawler.data.CrawlQueues.remoteTriggeredCrawlJob' runs high load cycle. current: 10.22 max.: 10.0
S 2014/05/19 12:26:46 BusyThread Thread 'BusyThread net.yacy.crawler.data.CrawlQueues.remoteTriggeredCrawlJob' runs high load cycle. current: 10.44 max.: 10.0
S 2014/05/19 12:26:47 BusyThread Thread 'BusyThread net.yacy.contentcontrol.SMWListSyncThread.run' runs high load cycle. current: 10.44 max.: 10.0
S 2014/05/19 12:26:47 BusyThread Thread 'BusyThread net.yacy.contentcontrol.ContentControlFilterUpdateThread.run' runs high load cycle. current: 10.44 max.: 10.0
S 2014/05/19 12:26:48 BusyThread Thread 'BusyThread net.yacy.crawler.data.CrawlQueues.remoteTriggeredCrawlJob' runs high load cycle. current: 10.44 max.: 10.0
S 2014/05/19 12:26:49 BusyThread Thread 'BusyThread net.yacy.search.Switchboard.surrogateProcess' runs high load cycle. current: 10.44 max.: 10.0
S 2014/05/19 12:26:49 BusyThread Thread 'BusyThread net.yacy.crawler.data.CrawlQueues.remoteCrawlLoaderJob' runs high load cycle. current: 10.44 max.: 10.0
S 2014/05/19 12:26:50 BusyThread Thread 'BusyThread net.yacy.contentcontrol.SMWListSyncThread.run' runs high load cycle. current: 10.49 max.: 10.0
S 2014/05/19 12:26:50 BusyThread Thread 'BusyThread net.yacy.contentcontrol.ContentControlFilterUpdateThread.run' runs high load cycle. current: 10.49 max.: 10.0
S 2014/05/19 12:26:50 BusyThread Thread 'BusyThread net.yacy.crawler.data.CrawlQueues.remoteTriggeredCrawlJob' runs high load cycle. current: 10.49 max.: 10.0
S 2014/05/19 12:26:52 BusyThread Thread 'BusyThread net.yacy.crawler.data.CrawlQueues.remoteTriggeredCrawlJob' runs high load cycle. current: 10.49 max.: 10.0
S 2014/05/19 12:26:53 BusyThread Thread 'BusyThread net.yacy.contentcontrol.ContentControlFilterUpdateThread.run' runs high load cycle. current: 10.49 max.: 10.0
S 2014/05/19 12:26:53 BusyThread Thread 'BusyThread net.yacy.contentcontrol.SMWListSyncThread.run' runs high load cycle. current: 10.49 max.: 10.0
S 2014/05/19 12:26:54 BusyThread Thread 'BusyThread net.yacy.crawler.data.CrawlQueues.remoteTriggeredCrawlJob' runs high load cycle. current: 10.49 max.: 10.0
S 2014/05/19 12:26:56 BusyThread Thread 'BusyThread net.yacy.contentcontrol.SMWListSyncThread.run' runs high load cycle. current: 10.21 max.: 10.0
S 2014/05/19 12:26:56 BusyThread Thread 'BusyThread net.yacy.contentcontrol.ContentControlFilterUpdateThread.run' runs high load cycle. current: 10.21 max.: 10.0
S 2014/05/19 12:26:56 BusyThread Thread 'BusyThread net.yacy.crawler.data.CrawlQueues.remoteTriggeredCrawlJob' runs high load cycle. current: 10.21 max.: 10.0
S 2014/05/19 12:26:58 BusyThread Thread 'BusyThread net.yacy.crawler.data.CrawlQueues.remoteTriggeredCrawlJob' runs high load cycle. current: 10.21 max.: 10.0
S 2014/05/19 12:26:59 BusyThread Thread 'BusyThread net.yacy.contentcontrol.SMWListSyncThread.run' runs high load cycle. current: 10.21 max.: 10.0
S 2014/05/19 12:26:59 BusyThread Thread 'BusyThread net.yacy.contentcontrol.ContentControlFilterUpdateThread.run' runs high load cycle. current: 10.21 max.: 10.0
S 2014/05/19 12:26:59 BusyThread Thread 'BusyThread net.yacy.search.Switchboard.dhtTransferJob' runs high load cycle. current: 10.21 max.: 10.0
S 2014/05/19 12:26:59 BusyThread Thread 'BusyThread net.yacy.crawler.data.CrawlQueues.remoteCrawlLoaderJob' runs high load cycle. current: 10.21 max.: 10.0
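
A long excerpt like the one above is easier to read once the repeated "high load cycle" warnings are counted per job. The following is only a rough sketch: the regular expression is derived from the line format shown in this excerpt, and the function name `summarize` is made up for illustration; neither is part of YaCy.

```python
import re
from collections import Counter

# Pattern inferred from the log lines in this thread (an assumption,
# not an official YaCy log format specification).
LOAD_LINE = re.compile(
    r"BusyThread Thread 'BusyThread (?P<job>[\w.]+)' runs high load cycle\. "
    r"current: (?P<current>[\d.]+) max\.: (?P<max>[\d.]+)"
)

def summarize(lines):
    """Count high-load warnings per job and track the highest load seen."""
    counts = Counter()
    peak = 0.0
    for line in lines:
        match = LOAD_LINE.search(line)
        if match:
            counts[match.group("job")] += 1
            peak = max(peak, float(match.group("current")))
    return counts, peak

# Three lines copied from the excerpt above.
sample = [
    "I 2014/05/19 12:14:06 RESOURCE OBSERVER resources ok",
    "S 2014/05/19 12:26:35 BusyThread Thread 'BusyThread net.yacy.contentcontrol.SMWListSyncThread.run' runs high load cycle. current: 10.15 max.: 10.0",
    "S 2014/05/19 12:26:46 BusyThread Thread 'BusyThread net.yacy.crawler.data.CrawlQueues.remoteTriggeredCrawlJob' runs high load cycle. current: 10.44 max.: 10.0",
]

counts, peak = summarize(sample)
for job, n in counts.most_common():
    print(f"{n:3d}  {job}")
print("peak load:", peak)
```

Fed with the full log file, this shows which jobs were held back while the optimization was running and how far the load rose above the configured maximum.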


Can you suggest a solution to my problem?
Thank you in advance for your help.

Best regards
Guims
 

Re: SOLR error auto-optimization Max Disk I/O

Post by sixcooler » Mon May 19, 2014 4:12 pm

Hello Guims,

The optimization starts after all crawls have finished and your machine is idle.
The optimization puts load on your machine, which is why other tasks are paused.
The optimization merges the index down to one segment per 5 million URLs.
So it looks OK to me.

But I am surprised that you have 85 GB for 8.5 million URLs. This means the index gets merged into a single segment of 85 GB.
This will take a long time!
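
To make the rule above concrete, here is a small sketch of how the target segment count could be derived from the URL count. It is not YaCy's actual code: the function name and the rounding (integer division with a minimum of one segment) are assumptions, chosen because the log above shows `solr.optimize(1)` for 8,462,504 URLs.

```python
def target_segments(url_count: int, urls_per_segment: int = 5_000_000) -> int:
    """Number of segments to merge down to; never fewer than one.

    Assumption: integer division, floored at 1 (hypothetical, inferred
    from the 'solr.optimize(1)' log line in this thread).
    """
    return max(1, url_count // urls_per_segment)

print(target_segments(8_462_504))   # Guims' peer
print(target_segments(41_000_000))  # a peer with roughly 41 million URLs
```

Under this assumption an index of 8,462,504 URLs is merged into one segment, so the whole 85 GB has to be rewritten in a single merge.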

@all: What is the disk usage of your Solr directory, and for how many URLs?
(I have 60 GB / 41 million URLs)

@Orbiter: perhaps we have to change the ratio of segments to URLs used for the optimization.

cu, sixcooler.

Re: SOLR error auto-optimization Max Disk I/O

Post by Guims » Mon May 19, 2014 4:39 pm

Thank you for your reply, sixcooler.
You have 60 GB for 41 million URLs o_O
Are you using the default Solr schema?
(If yes, then I have a big problem.)

PS: I'm on 1.73/9017.

Isn't the default merge 10 segments?

Re: SOLR error auto-optimization Max Disk I/O

Post by sixcooler » Mon May 19, 2014 5:03 pm

Hello Guims,

Yes, I'm using the default schema.
But no, you don't have a problem.

On the other, smaller peers I run, the ratio is also about 1 GB / 0.1 million URLs.

So I think your data volume seems to be normal and my big peer is the strange one :-)

Perhaps this is because it is very old and its index is filled by DHT transfers with only light crawling.
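
The three size reports in this thread can be compared by working out the disk usage per URL. This is just arithmetic on the figures quoted above; decimal units (1 GB = 1,000,000 KB) and the rounded URL counts for the two sixcooler peers are assumptions.

```python
def kb_per_url(size_gb: float, url_count: int) -> float:
    """Average index size per URL in KB, using decimal units (1 GB = 1e6 KB)."""
    return size_gb * 1_000_000 / url_count

# Figures as quoted in the thread; the last two URL counts are rounded.
peers = {
    "Guims (85 GB / 8,462,504 URLs)": (85, 8_462_504),
    "sixcooler, big peer (60 GB / 41 M URLs)": (60, 41_000_000),
    "sixcooler, small peers (1 GB / 0.1 M URLs)": (1, 100_000),
}

for name, (size_gb, urls) in peers.items():
    print(f"{name}: {kb_per_url(size_gb, urls):.1f} KB per URL")
```

Guims' peer and the small peers both come out at roughly 10 KB per URL, while the big peer is at roughly 1.5 KB per URL, which supports the reading that the big peer is the outlier.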

I've changed the optimization to 1 segment per 1 million URLs in 1.73-9029.
So please try this update.
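
As a rough estimate of what this change means for an 85 GB index with 8,462,504 URLs, the sketch below compares the old and new ratio. It assumes integer division with a minimum of one segment and evenly sized segments; real merges will not produce equal segments, and the function name is made up for illustration.

```python
def average_segment_gb(index_gb: float, url_count: int, urls_per_segment: int):
    """Return (segment count, average segment size in GB).

    Assumptions: integer division floored at one segment, and all
    segments the same size (a simplification).
    """
    segments = max(1, url_count // urls_per_segment)
    return segments, index_gb / segments

for ratio in (5_000_000, 1_000_000):
    segments, avg_gb = average_segment_gb(85, 8_462_504, ratio)
    print(f"1 segment per {ratio:,} URLs: {segments} segment(s), about {avg_gb:.1f} GB each")
```

With the new ratio the optimization no longer has to rewrite the whole index into one 85 GB segment, so each merge should move far less data.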

cu, sixcooler.

Re: SOLR error auto-optimization Max Disk I/O

Post by Guims » Mon May 19, 2014 5:26 pm

Thanks sixcooler, I will try this.
I will report back with the result.

Do you plan to update (commit to) debian.yacy.net?

Re: SOLR error auto-optimization Max Disk I/O

Post by sixcooler » Mon May 19, 2014 6:09 pm

Hello Guims,

I'm sorry, I can only commit code to the repository.
Other people build the packages.

Cu, sixcooler.

Re: SOLR error auto-optimization Max Disk I/O

Post by Guims » Thu May 22, 2014 9:03 am

The last commit solved my problem.

Thanks, sixcooler!

