Machine at 100% load, YaCy hangs when indexing a big URL list



Post by streetfighter » Sat Mar 28, 2009 11:16 am

I am trying to index http://pin-group.pl/mzm/index.html, a collection of 30 HTML files with 10000 links each.
YaCy imports the main URL, but then the machine sits at 100% CPU load for over an hour and the crawler does nothing. YaCy still responds via the browser.

When indexing only one file from the collection (a single 0.5 MB HTML file with 10000 links), YaCy has the same problem almost every time.

I have 2 GB RAM (1200 MB for YaCy) and a dual-core CPU, running Java 1.6 (the problem exists with 1.5 too). The problem also exists on the stable release.

Code:
************* Start Thread Dump Sat Mar 28 11:16:45 CET 2009 *******************

YaCy Version: 0.730/05746
Total Memory = 1245446144
Used  Memory = 288100376
Free  Memory = 957345768


THREADS WITH STATES: BLOCKED


THREADS WITH STATES: RUNNABLE

Thread= Session_83.10.32.178:53483#22 id=143 RUNNABLE
at java.lang.Thread.getAllStackTraces(Thread.java:1487)
at Threaddump_p.respond(Threaddump_p.java:90)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at de.anomic.http.httpdFileHandler.invokeServlet(httpdFileHandler.java:1171)
at de.anomic.http.httpdFileHandler.doResponse(httpdFileHandler.java:751)
at de.anomic.http.httpdFileHandler.doGet(httpdFileHandler.java:240)
at de.anomic.http.httpd.GET(httpd.java:489)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at de.anomic.server.serverCore$Session.listen(serverCore.java:739)
at de.anomic.server.serverCore$Session.run(serverCore.java:620)


Thread= job_pool-1-thread-23 id=76 RUNNABLE
Thread= job_pool-1-thread-19 id=68 RUNNABLE
Thread= job_pool-1-thread-20 id=70 RUNNABLE
Thread= job_pool-1-thread-21 id=72 RUNNABLE
Thread= job_pool-1-thread-22 id=74 RUNNABLE
at java.util.HashMap.get(HashMap.java:303)
at de.anomic.crawler.CrawlProfile$entry.domInc(CrawlProfile.java:466)
at de.anomic.crawler.CrawlStacker.stackCrawl(CrawlStacker.java:227)
at de.anomic.crawler.CrawlStacker.job(CrawlStacker.java:122)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at de.anomic.server.serverInstantBlockingThread.job(serverInstantBlockingThread.java:87)
at de.anomic.server.serverAbstractBlockingThread.run(serverAbstractBlockingThread.java:64)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)


Thread= Session_83.10.32.178:53484#26 id=144 RUNNABLE
Thread= Session_83.10.32.178:49381#2 id=423 RUNNABLE
at java.io.PushbackInputStream.read(PushbackInputStream.java:122)
at de.anomic.server.serverCore.receive(serverCore.java:840)
at de.anomic.server.serverCore$Session.readLine(serverCore.java:566)
at de.anomic.server.serverCore$Session.listen(serverCore.java:671)
at de.anomic.server.serverCore$Session.run(serverCore.java:620)


Thread= Timeout guard daemon id=242 RUNNABLE
Thread= Timeout guard daemon id=424 RUNNABLE
at java.net.Socket.<init>(Socket.java:240)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
at org.apache.commons.httpclient.protocol.ControllerThreadSocketFactory$1.doit(ControllerThreadSocketFactory.java:91)
at org.apache.commons.httpclient.protocol.ControllerThreadSocketFactory$SocketTask.run(ControllerThreadSocketFactory.java:158)
at java.lang.Thread.run(Thread.java:619)


Thread= httpd:8080 id=110 RUNNABLE
at java.net.ServerSocket.accept(ServerSocket.java:421)
at de.anomic.server.serverCore.job(serverCore.java:331)
at de.anomic.server.serverAbstractBusyThread.run(serverAbstractBusyThread.java:143)



THREADS WITH STATES: TIMED_WAITING

Thread= de.anomic.crawler.CrawlQueues.remoteTriggeredCrawlJob id=104 TIMED_WAITING
Thread= de.anomic.data.bookmarksDB.autoReCrawl id=55 TIMED_WAITING
Thread= de.anomic.plasma.plasmaSwitchboard.rwiCacheFlush id=102 TIMED_WAITING
Thread= de.anomic.yacy.yacyCore.publishSeedList id=107 TIMED_WAITING
Thread= de.anomic.plasma.plasmaSwitchboard.dhtTransferJob id=109 TIMED_WAITING
Thread= de.anomic.plasma.plasmaSwitchboard.cleanupJob id=101 TIMED_WAITING
Thread= de.anomic.crawler.CrawlQueues.coreCrawlJob id=106 TIMED_WAITING
Thread= de.anomic.crawler.CrawlQueues.remoteCrawlLoaderJob id=105 TIMED_WAITING
Thread= de.anomic.yacy.yacyCore.peerPing id=108 TIMED_WAITING
at java.lang.Thread.sleep(Native Method)
at de.anomic.server.serverAbstractBusyThread.ratz(serverAbstractBusyThread.java:199)
at de.anomic.server.serverAbstractBusyThread.run(serverAbstractBusyThread.java:164)


Thread= Thread-117 id=241 TIMED_WAITING
at java.lang.Thread.join(Thread.java:1151)
at org.apache.commons.httpclient.util.TimeoutController.execute(TimeoutController.java:63)
at org.apache.commons.httpclient.util.TimeoutController.execute(TimeoutController.java:82)
at org.apache.commons.httpclient.protocol.ControllerThreadSocketFactory.createSocket(ControllerThreadSocketFactory.java:95)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:128)
at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at de.anomic.http.httpClient.execute(httpClient.java:445)
at de.anomic.http.httpClient.GET(httpClient.java:253)
at de.anomic.crawler.HTTPLoader.load(HTTPLoader.java:149)
at de.anomic.crawler.HTTPLoader.load(HTTPLoader.java:103)
at de.anomic.crawler.ProtocolLoader.load(ProtocolLoader.java:98)
at de.anomic.crawler.ProtocolLoader.process(ProtocolLoader.java:120)
at de.anomic.crawler.CrawlQueues$crawlWorker.run(CrawlQueues.java:576)


Thread= Thread-1 id=9 TIMED_WAITING
at java.lang.Thread.sleep(Native Method)
at de.anomic.server.serverProfiling.run(serverProfiling.java:63)



THREADS WITH STATES: WAITING

Thread= parseDocument_pool-1-thread-31 id=92 WAITING
Thread= parseDocument_pool-1-thread-35 id=100 WAITING
Thread= parseDocument_pool-1-thread-34 id=98 WAITING
Thread= parseDocument_pool-1-thread-33 id=96 WAITING
Thread= parseDocument_pool-1-thread-32 id=94 WAITING
at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:254)
at de.anomic.server.serverProcessor.enQueue(serverProcessor.java:147)
at de.anomic.crawler.CrawlStacker.enqueueEntry(CrawlStacker.java:151)
at de.anomic.plasma.plasmaSwitchboard.parseDocument(plasmaSwitchboard.java:1579)
at de.anomic.plasma.plasmaSwitchboard.parseDocument(plasmaSwitchboard.java:1512)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at de.anomic.server.serverInstantBlockingThread.job(serverInstantBlockingThread.java:87)
at de.anomic.server.serverAbstractBlockingThread.run(serverAbstractBlockingThread.java:64)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)


Thread= main id=1 WAITING
at java.lang.Object.wait(Object.java:485)
at de.anomic.server.serverSemaphore.P(serverSemaphore.java:63)
at de.anomic.plasma.plasmaSwitchboard.waitForShutdown(plasmaSwitchboard.java:2112)
at yacy.startup(yacy.java:421)
at yacy.main(yacy.java:1038)


Thread= MultiThreadedHttpConnectionManager cleanup daemon id=153 WAITING
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)


Thread= Reference Handler daemon id=2 WAITING
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)


Thread= Finalizer daemon id=3 WAITING
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)


Thread= urls_pool-1-thread-38 id=155 WAITING
Thread= Java2D Disposer daemon id=125 WAITING
Thread= urls_pool-1-thread-36 id=150 WAITING
Thread= urls_pool-1-thread-37 id=154 WAITING
at java.lang.Thread.run(Thread.java:619)


Thread= de.anomic.plasma.plasmaSwitchboard.deQueueProcess id=103 WAITING
at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:254)
at de.anomic.server.serverProcessor.enQueue(serverProcessor.java:147)
at de.anomic.plasma.plasmaSwitchboard.deQueueProcess(plasmaSwitchboard.java:1246)
at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at de.anomic.server.serverInstantBusyThread.job(serverInstantBusyThread.java:96)
at de.anomic.server.serverAbstractBusyThread.run(serverAbstractBusyThread.java:143)


Thread= storeDocumentIndex_pool-1-thread-12 id=42 WAITING
Thread= webStructureAnalysis_pool-1-thread-26 id=82 WAITING
Thread= condenseDocument_pool-1-thread-30 id=90 WAITING
Thread= storeDocumentIndex_pool-1-thread-8 id=34 WAITING
Thread= storeDocumentIndex_pool-1-thread-7 id=32 WAITING
Thread= webStructureAnalysis_pool-1-thread-25 id=80 WAITING
Thread= job_pool-1-thread-18 id=66 WAITING
Thread= job_pool-1-thread-17 id=64 WAITING
Thread= storeDocumentIndex_pool-1-thread-11 id=40 WAITING
Thread= storeDocumentIndex_pool-1-thread-14 id=46 WAITING
Thread= storeDocumentIndex_pool-1-thread-10 id=38 WAITING
Thread= storeDocumentIndex_pool-1-thread-16 id=50 WAITING
Thread= storeDocumentIndex_pool-1-thread-24 id=78 WAITING
Thread= condenseDocument_pool-1-thread-28 id=86 WAITING
Thread= condenseDocument_pool-1-thread-29 id=88 WAITING
Thread= webStructureAnalysis_pool-1-thread-27 id=84 WAITING
Thread= storeDocumentIndex_pool-1-thread-15 id=48 WAITING
Thread= storeDocumentIndex_pool-1-thread-9 id=36 WAITING
Thread= storeDocumentIndex_pool-1-thread-13 id=44 WAITING
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
at de.anomic.server.serverProcessor.take(serverProcessor.java:97)
at de.anomic.server.serverAbstractBlockingThread.run(serverAbstractBlockingThread.java:55)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)



THREADS WITH STATES: NEW


THREADS WITH STATES: TERMINATED


************* End Thread Dump Sat Mar 28 11:16:45 CET 2009 *******************
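
The dump above groups live threads by state, and its first stack trace shows it being collected through the standard Thread.getAllStackTraces() call. For anyone who wants a similar overview outside the YaCy admin page, here is a minimal sketch. The class name ThreadStateDump and the output layout are illustrative assumptions, not YaCy's actual Threaddump_p code.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of YaCy: groups live threads by their state.
public class ThreadStateDump {

    // Builds one list of "name id=N" labels per Thread.State.
    public static Map<Thread.State, List<String>> groupByState(
            Map<Thread, StackTraceElement[]> traces) {
        Map<Thread.State, List<String>> byState = new EnumMap<>(Thread.State.class);
        for (Thread.State state : Thread.State.values()) {
            byState.put(state, new ArrayList<>());
        }
        for (Thread thread : traces.keySet()) {
            byState.get(thread.getState()).add(thread.getName() + " id=" + thread.getId());
        }
        return byState;
    }

    public static void main(String[] args) {
        // Snapshot of all live threads in this JVM.
        Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
        for (Map.Entry<Thread.State, List<String>> entry : groupByState(traces).entrySet()) {
            System.out.println("THREADS WITH STATES: " + entry.getKey());
            for (String label : entry.getValue()) {
                System.out.println("Thread= " + label);
            }
            System.out.println();
        }
    }
}
```

When the CPU is pinned, the RUNNABLE group is the one to look at first, since threads in WAITING or TIMED_WAITING are not consuming CPU time.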


I have set the crawler limits to: file size 1 MB, depth 5, link limit 100.


Maybe someone can try indexing http://pin-group.pl/mzm/index.html and check what is going on?
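
In the dump, five job_pool threads are all RUNNABLE inside HashMap.get, called from CrawlProfile$entry.domInc. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that a plain HashMap is shared between those threads without synchronization. On older JVMs, concurrent writes to a HashMap can corrupt a bucket chain so that get() loops forever, which would look like this symptom. A minimal sketch of a per-domain counter that is safe to call from several threads follows. DomainCounter and its methods are hypothetical names, not YaCy code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical thread-safe replacement for a HashMap-based domain counter.
public class DomainCounter {

    private final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

    // Atomically adds one to the counter for this domain and returns the new value.
    public int increment(String domain) {
        return counts.merge(domain, 1, Integer::sum);
    }

    // Returns the current count, or 0 if the domain has not been seen.
    public int get(String domain) {
        return counts.getOrDefault(domain, 0);
    }

    public static void main(String[] args) throws InterruptedException {
        DomainCounter counter = new DomainCounter();
        // Five workers, mirroring the five job_pool threads in the dump.
        ExecutorService pool = Executors.newFixedThreadPool(5);
        for (int t = 0; t < 5; t++) {
            pool.submit(() -> {
                for (int i = 0; i < 10000; i++) {
                    counter.increment("pin-group.pl");
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        System.out.println("pin-group.pl -> " + counter.get("pin-group.pl"));
    }
}
```

ConcurrentHashMap.merge performs the read-modify-write as one atomic step per key, so no external lock is needed and no update is lost.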

Re: Machine at 100% load, YaCy hangs when indexing a big URL list

Post by streetfighter » Thu Apr 09, 2009 5:24 am

The problem occurs only when the crawl is limited to .*.pl/.* and the crawl result limit is set to any number (tested with 100 and 1000).

The list from my previous post now works on a standalone YaCy with the .pl restriction but without the crawl result limit, with no problems at all.
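
As a side note on the .*.pl/.* filter: the sketch below shows how such an expression behaves under Java's regex engine, assuming the filter must match the whole URL (how YaCy applies it internally is an assumption here). The unescaped dot before "pl" matches any character, so the filter is looser than it looks. CrawlFilterDemo and the example.com URLs are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo of full-match URL filtering with java.util.regex.
public class CrawlFilterDemo {

    // Returns true if the entire URL matches the filter expression.
    public static boolean accepts(String filter, String url) {
        return Pattern.matches(filter, url);
    }

    public static void main(String[] args) {
        String loose = ".*.pl/.*";     // filter as written in the post
        String strict = ".*\\.pl/.*";  // same filter with the dot escaped
        String[] urls = {
            "http://pin-group.pl/mzm/index.html",
            "http://example.com/xpl/page.html",
            "http://example.com/index.html"
        };
        for (String url : urls) {
            System.out.println(url
                + " loose=" + accepts(loose, url)
                + " strict=" + accepts(strict, url));
        }
    }
}
```

The second URL passes the loose filter because "xpl/" satisfies ".pl/", while the escaped version rejects it. Neither form restricts the match to the host part of the URL.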

