Balancer blocks crawling


Post by dulcedo » Tue Aug 25, 2009 3:33 am

In current versions (noticed from 6246 onward, here 6261) the peer does not crawl properly when individual domains have long delay times.
The log below is from right after startup: it tries to load the domain in question once, and after that it only logs the following, although there are plenty of other domains waiting to be crawled. Only when it is allowed to load this one domain does it crawl a batch of other domains along with it; then it pauses again.


Code:
I 2009/08/25 04:26:06 INDEX-TRANSFER-DISPATCHER Transfer finished of chunk to target tqwFo7zt-gSP/KIT01-05

I 2009/08/25 04:26:06 INDEX-TRANSFER-DISPATCHER starting new index transmission request to iz93p3SvQT__

I 2009/08/25 04:26:06 BALANCER waiting for www.rki.de: 30 seconds remaining...

I 2009/08/25 04:26:06 YACY yacyClient.transferRWI error:Connection timed out: connect

I 2009/08/25 04:26:06 INDEX-TRANSFER-DISPATCHER Transfer failed of chunk to target 12p1rK2YMmE_/Infokrieger: no connection from transferRWI

I 2009/08/25 04:26:06 INDEX-TRANSFER-DISPATCHER STORE: Chunk 1DR57RZoPg__ has failed to transmit index; marked peer as busy

I 2009/08/25 04:26:06 INDEX-TRANSFER-DISPATCHER starting new index transmission request to 1DR57RZoPg__

I 2009/08/25 04:26:06 INDEX-TRANSFER-DISPATCHER Index transfer of 3 words [Gz9bS8zuXJ4z .. iz93p3SvQT__] and 18 URLs to peer kellerlanplayerRoot:ukTQ5w2Z6gVN in 0 seconds successful (5 words/s)

I 2009/08/25 04:26:06 INDEX-TRANSFER-DISPATCHER Transfer finished of chunk to target ukTQ5w2Z6gVN/kellerlanplayerRoot

I 2009/08/25 04:26:06 INDEX-TRANSFER-DISPATCHER STORE: Chunk iz93p3SvQT__ has FINISHED all transmissions!

I 2009/08/25 04:26:07 INDEX-TRANSFER-DISPATCHER Index transfer of 3 words [NDRs5Snjeopf .. 1DR57RZoPg__] and 7 URLs to peer dulcedo:148qJoTxccAZ in 0 seconds successful (4 words/s)

I 2009/08/25 04:26:07 INDEX-TRANSFER-DISPATCHER Transfer finished of chunk to target 148qJoTxccAZ/dulcedo

I 2009/08/25 04:26:07 INDEX-TRANSFER-DISPATCHER STORE: Chunk 1DR57RZoPg__ has FINISHED all transmissions!

I 2009/08/25 04:26:07 PLASMA dhtTransferJob: no selection, too many entries in transmission cloud: 68

I 2009/08/25 04:26:07 PLASMA dhtTransferJob: result from dequeueing: true

I 2009/08/25 04:26:07 INDEX-TRANSFER-DISPATCHER starting new index transmission request to mz93p3SvQT__

I 2009/08/25 04:26:07 INDEX-TRANSFER-DISPATCHER Index transfer of 3 words [Gz9qcB-gKhWD .. mz93p3SvQT__] and 10 URLs to peer ZZZ:pAYESzaeIOKH in 0 seconds successful (9 words/s)

I 2009/08/25 04:26:07 INDEX-TRANSFER-DISPATCHER Transfer finished of chunk to target pAYESzaeIOKH/ZZZ

I 2009/08/25 04:26:07 INDEX-TRANSFER-DISPATCHER starting new index transmission request to mz93p3SvQT__

I 2009/08/25 04:26:08 INDEX-TRANSFER-DISPATCHER Index transfer of 3 words [Gz9qcB-gKhWD .. mz93p3SvQT__] and 10 URLs to peer KIT01-05:tqwFo7zt-gSP in 0 seconds successful (10 words/s)

I 2009/08/25 04:26:08 INDEX-TRANSFER-DISPATCHER Transfer finished of chunk to target tqwFo7zt-gSP/KIT01-05

I 2009/08/25 04:26:08 INDEX-TRANSFER-DISPATCHER starting new index transmission request to mz93p3SvQT__

I 2009/08/25 04:26:08 INDEX-TRANSFER-DISPATCHER Index transfer of 3 words [Gz9qcB-gKhWD .. mz93p3SvQT__] and 10 URLs to peer kellerlanplayerRoot:ukTQ5w2Z6gVN in 0 seconds successful (5 words/s)

I 2009/08/25 04:26:08 INDEX-TRANSFER-DISPATCHER Transfer finished of chunk to target ukTQ5w2Z6gVN/kellerlanplayerRoot

I 2009/08/25 04:26:08 INDEX-TRANSFER-DISPATCHER STORE: Chunk mz93p3SvQT__ has FINISHED all transmissions!

I 2009/08/25 04:26:09 BALANCER waiting for www.rki.de: 27 seconds remaining...

I 2009/08/25 04:26:09 PLASMA dhtTransferJob: no selection, too many entries in transmission cloud: 67

I 2009/08/25 04:26:09 PLASMA dhtTransferJob: result from dequeueing: true

I 2009/08/25 04:26:09 INDEX-TRANSFER-DISPATCHER starting new index transmission request to qz93p3SvQT__

I 2009/08/25 04:26:09 PLASMA Received 107 Entries 4 Words [Yi8omSKAWVlT .. Yi-QzroZxme3]/1265016531510421668 from 1vrggnp0-Z0z:Lordy/0.9100617, processed in 50 milliseconds, requesting 9/107 URLs, blocked 0 RWIs

I 2009/08/25 04:26:09 PLASMA Received 9 URLs from peer 1vrggnp0-Z0z:Lordy/0.9100617 in 23 ms, blocked 0 URLs

I 2009/08/25 04:26:10 INDEX-TRANSFER-DISPATCHER Index transfer of 3 words [Gz9wfeYDv5lE .. qz93p3SvQT__] and 5 URLs to peer KIT01-05:tqwFo7zt-gSP in 0 seconds successful (5 words/s)

I 2009/08/25 04:26:10 INDEX-TRANSFER-DISPATCHER Transfer finished of chunk to target tqwFo7zt-gSP/KIT01-05

I 2009/08/25 04:26:10 INDEX-TRANSFER-DISPATCHER starting new index transmission request to qz93p3SvQT__

I 2009/08/25 04:26:10 INDEX-TRANSFER-DISPATCHER Index transfer of 3 words [Gz9wfeYDv5lE .. qz93p3SvQT__] and 5 URLs to peer kellerlanplayerRoot:ukTQ5w2Z6gVN in 0 seconds successful (5 words/s)

I 2009/08/25 04:26:10 INDEX-TRANSFER-DISPATCHER Transfer finished of chunk to target ukTQ5w2Z6gVN/kellerlanplayerRoot

I 2009/08/25 04:26:10 INDEX-TRANSFER-DISPATCHER starting new index transmission request to qz93p3SvQT__

I 2009/08/25 04:26:11 PLASMA Received 15 Entries 9 Words [kjfgdnhYR2Ta .. kjf8_uVYtHpR]/-465571904155602532 from CHk1tMQFkR22:locutus/0.91006255, processed in 5 milliseconds, requesting 2/15 URLs, blocked 0 RWIs

I 2009/08/25 04:26:11 INDEX-TRANSFER-DISPATCHER Index transfer of 3 words [Gz9wfeYDv5lE .. qz93p3SvQT__] and 5 URLs to peer vaisheshika:xY5gLzZqjn6d in 0 seconds successful (4 words/s)

I 2009/08/25 04:26:11 INDEX-TRANSFER-DISPATCHER Transfer finished of chunk to target xY5gLzZqjn6d/vaisheshika

I 2009/08/25 04:26:11 INDEX-TRANSFER-DISPATCHER STORE: Chunk qz93p3SvQT__ has FINISHED all transmissions!

I 2009/08/25 04:26:11 PLASMA Received 2 URLs from peer CHk1tMQFkR22:locutus/0.91006255 in 6 ms, blocked 0 URLs

I 2009/08/25 04:26:11 PLASMA dhtTransferJob: no selection, too many entries in transmission cloud: 66

I 2009/08/25 04:26:11 PLASMA dhtTransferJob: result from dequeueing: true

I 2009/08/25 04:26:11 INDEX-TRANSFER-DISPATCHER starting new index transmission request to 2z93p3SvQT__

I 2009/08/25 04:26:11 PLASMA Received 9 Entries 9 Words [kjfgdnhYR2Ta .. kjf8_uVYtHpR]/-465571904155602532 from CHk1tMQFkR22:locutus/0.91006255, processed in 3 milliseconds, requesting 0/9 URLs, blocked 0 RWIs

I 2009/08/25 04:26:12 INDEX-TRANSFER-DISPATCHER Index transfer of 3 words [Gz9nUzzRKGjs .. 2z93p3SvQT__] and 6 URLs to peer KIT01-09f:_NTXbiAmfN__ in 0 seconds successful (5 words/s)

I 2009/08/25 04:26:12 INDEX-TRANSFER-DISPATCHER Transfer finished of chunk to target _NTXbiAmfN__/KIT01-09f

I 2009/08/25 04:26:12 INDEX-TRANSFER-DISPATCHER starting new index transmission request to 2z93p3SvQT__

I 2009/08/25 04:26:12 BALANCER waiting for www.rki.de: 24 seconds remaining...

I 2009/08/25 04:26:12 INDEX-TRANSFER-DISPATCHER Index transfer of 3 words [Gz9nUzzRKGjs .. 2z93p3SvQT__] and 6 URLs to peer bigbird:Czf8mtXEXYVM in 0 seconds successful (4 words/s)

I 2009/08/25 04:26:12 INDEX-TRANSFER-DISPATCHER Transfer finished of chunk to target Czf8mtXEXYVM/bigbird

I 2009/08/25 04:26:12 INDEX-TRANSFER-DISPATCHER starting new index transmission request to 2z93p3SvQT__

I 2009/08/25 04:26:13 INDEX-TRANSFER-DISPATCHER Index transfer of 3 words [Gz9nUzzRKGjs .. 2z93p3SvQT__] and 6 URLs to peer mixer:C4_SBp33vU__ in 0 seconds successful (4 words/s)

I 2009/08/25 04:26:13 INDEX-TRANSFER-DISPATCHER Transfer finished of chunk to target C4_SBp33vU__/mixer

I 2009/08/25 04:26:13 INDEX-TRANSFER-DISPATCHER STORE: Chunk 2z93p3SvQT__ has FINISHED all transmissions!

I 2009/08/25 04:26:13 PLASMA dhtTransferJob: no selection, too many entries in transmission cloud: 65

I 2009/08/25 04:26:13 INDEX-TRANSFER-DISPATCHER starting new index transmission request to 9rVHaiJ-Xx__

I 2009/08/25 04:26:13 PLASMA dhtTransferJob: result from dequeueing: true

I 2009/08/25 04:26:14 INDEX-TRANSFER-DISPATCHER Index transfer of 3 words [prU8iRLuRlWQ .. 9rVHaiJ-Xx__] and 19 URLs to peer KIT01-09f:_NTXbiAmfN__ in 0 seconds successful (5 words/s)

I 2009/08/25 04:26:14 INDEX-TRANSFER-DISPATCHER Transfer finished of chunk to target _NTXbiAmfN__/KIT01-09f

I 2009/08/25 04:26:14 INDEX-TRANSFER-DISPATCHER starting new index transmission request to 9rVHaiJ-Xx__

I 2009/08/25 04:26:14 INDEX-TRANSFER-DISPATCHER Index transfer of 3 words [prU8iRLuRlWQ .. 9rVHaiJ-Xx__] and 19 URLs to peer locutus:CHk1tMQFkR22 in 0 seconds successful (21 words/s)

I 2009/08/25 04:26:14 INDEX-TRANSFER-DISPATCHER Transfer finished of chunk to target CHk1tMQFkR22/locutus

I 2009/08/25 04:26:14 INDEX-TRANSFER-DISPATCHER starting new index transmission request to 9rVHaiJ-Xx__

I 2009/08/25 04:26:14 PLASMA Received 1 Entries 1 Words [_X3fyATwfF4l .. _X3fyATwfF4l]/-4330496591671905688 from _NTXbiAmfN__:KIT01-09f/0.9100625, processed in 1 milliseconds, requesting 0/1 URLs, blocked 0 RWIs

I 2009/08/25 04:26:14 INDEX-TRANSFER-DISPATCHER Index transfer of 3 words [prU8iRLuRlWQ .. 9rVHaiJ-Xx__] and 19 URLs to peer bigbird:Czf8mtXEXYVM in 0 seconds successful (4 words/s)

I 2009/08/25 04:26:14 INDEX-TRANSFER-DISPATCHER Transfer finished of chunk to target Czf8mtXEXYVM/bigbird

I 2009/08/25 04:26:14 INDEX-TRANSFER-DISPATCHER STORE: Chunk 9rVHaiJ-Xx__ has FINISHED all transmissions!

I 2009/08/25 04:26:15 BALANCER waiting for www.rki.de: 21 seconds remaining...
Attachments
yacy090825.png (16.45 KiB)
dulcedo
 
Posts: 1006
Joined: Thu Oct 16, 2008 6:36 pm
Location: near Karlsruhe
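
To see how long the crawler sits on a single host, the BALANCER lines can be pulled out of a log like the one above. The following is only a sketch: the regular expression is derived from the four "waiting for" lines quoted in this post, and the function name `balancer_waits` is made up for this example; it is not part of YaCy.

```python
import re

# Matches lines such as:
#   I 2009/08/25 04:26:06 BALANCER waiting for www.rki.de: 30 seconds remaining...
# The pattern is inferred from the log excerpt in this thread (an assumption).
WAIT_RE = re.compile(
    r"^[IDWE] (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) BALANCER waiting for "
    r"(\S+): (\d+) seconds remaining"
)


def balancer_waits(lines):
    """Return (timestamp, host, seconds_remaining) for each BALANCER wait line."""
    waits = []
    for line in lines:
        match = WAIT_RE.match(line.strip())
        if match:
            timestamp, host, seconds = match.groups()
            waits.append((timestamp, host, int(seconds)))
    return waits


# Four lines copied from the log above.
sample = [
    "I 2009/08/25 04:26:06 BALANCER waiting for www.rki.de: 30 seconds remaining...",
    "I 2009/08/25 04:26:06 YACY yacyClient.transferRWI error:Connection timed out: connect",
    "I 2009/08/25 04:26:09 BALANCER waiting for www.rki.de: 27 seconds remaining...",
    "I 2009/08/25 04:26:15 BALANCER waiting for www.rki.de: 21 seconds remaining...",
]

waits = balancer_waits(sample)
hosts = {host for _, host, _ in waits}
for timestamp, host, seconds in waits:
    print(timestamp, host, seconds)
```

If `hosts` contains only one name over a long stretch of log, the crawler has been waiting on that single domain the whole time, which is the behaviour described in this post.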

Re: Balancer blocks crawling

Post by Orbiter » Tue Aug 25, 2009 4:44 pm

Well, blocking is what the balancer is supposed to do, or rather, it should always pick a domain in such a way that it does not have to block. We would have to see what caused this in detail, that is, not just report that the thing waits 30 seconds, but also which rule triggered the wait. I would have to rework quite a bit for that first.

But it cannot really be due to recent changes; nobody has touched that code.
Orbiter
 
Posts: 5792
Joined: Tue Jun 26, 2007 10:58 pm
Location: Frankfurt am Main
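
The idea of "always pick a domain so that the balancer does not have to block" can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not YaCy's actual Balancer: the class `HostBalancer`, its methods, the single fixed delay per host, and the URL paths `/a`, `/b`, `/x` are all hypothetical; only the host names come from the logs in this thread.

```python
import collections


class HostBalancer:
    """Toy per-host politeness queue (hypothetical; not YaCy's Balancer)."""

    def __init__(self, min_delay_seconds):
        self.min_delay = min_delay_seconds
        self.queues = collections.defaultdict(collections.deque)  # host -> URLs
        self.last_access = {}  # host -> time of the last (scheduled) fetch

    def push(self, host, url):
        self.queues[host].append(url)

    def remaining_wait(self, host, now):
        """Seconds until the host may be fetched again (0.0 if never fetched)."""
        last = self.last_access.get(host)
        if last is None:
            return 0.0
        return max(0.0, self.min_delay - (now - last))

    def pop(self, now):
        """Return (url, wait) for the host that is ready soonest, or None."""
        candidates = [host for host, queue in self.queues.items() if queue]
        if not candidates:
            return None
        # Prefer the host with the shortest remaining wait, so one slow
        # domain only delays the crawl when nothing else is queued.
        host = min(candidates, key=lambda h: self.remaining_wait(h, now))
        wait = self.remaining_wait(host, now)
        url = self.queues[host].popleft()
        self.last_access[host] = now + wait
        return url, wait


balancer = HostBalancer(min_delay_seconds=30)
balancer.push("www.rki.de", "http://www.rki.de/a")
balancer.push("www.rki.de", "http://www.rki.de/b")
balancer.push("www.taz.de", "http://www.taz.de/x")

first = balancer.pop(now=0.0)   # www.rki.de is fetched immediately
second = balancer.pop(now=1.0)  # www.taz.de is preferred over waiting 29 s
third = balancer.pop(now=2.0)   # only www.rki.de is left, so a wait is due
print(first, second, third)
```

In this model a wait is only reported once every other host's queue is empty. Whether the real balancer can work this way depends on details (robots.txt crawl-delay, forced latency, queue layout) that are not visible in this thread.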

Re: Balancer blocks crawling

Post by dulcedo » Tue Aug 25, 2009 5:15 pm

It is exactly the same under Linux for versions after 6177; 6177 itself still works correctly.

Re: Balancer blocks crawling

Post by dulcedo » Fri Aug 28, 2009 12:08 pm

SVN6275, set up as a fresh installation, has now been working very well for half a day; the crawling problems and the search problems are gone (cluster not tested). Over the next few days I will try to find out whether that was due to the fresh installation or to the data set I have been carrying along since v0.7. Which data is lost if, with both peers on 0.9, I copy only /freeworld/text/ into a freshly set-up peer, and does that bring any benefit in this respect at all? I am keeping an eye on the issue with OOMs and the httpd.

Re: Balancer blocks crawling

Post by dulcedo » Sat Aug 29, 2009 2:24 pm

SVN6275 under Debian, after 20 hours of flawless crawling at about 1000 ppm: it now writes only these messages to the log, but the peer is still reachable and I can also supply the complete log. The crawl is far from finished, and no threads are blocked.
Code:
I 2009/08/29 12:00:45 PLASMA Excluded 0 words in URL http://www.taz.de/regional/berlin/aktuell/artikel/kommentarseite/1/baden-bis-die-aerzte-kommen/kommentare/1/1/?type=98/&type=98
I 2009/08/29 12:00:45 PLASMA *Indexed 77 words in URL http://www.taz.de/regional/berlin/aktuell/artikel/kommentarseite/1/baden-bis-die-aerzte-kommen/kommentare/1/1/?type=98/&type=98 [bXzoKU9C5n5A]
   Description:  Kommentarseite - taz.de
   MimeType: text/html | Charset: UTF-8 | Size: 816 bytes | Anchors: 15
   LinkStorageTime: 1 ms | indexStorageTime: 0 ms
D 2009/08/29 12:00:45 CRAWLER LOCALCRAWL[913602, 0, 0, 0]: URL=http://www.fnp.de/tz/region/lokales/rmn01.c.6436679.de_index.htm?sv%5Bonline_date%5D%5B%5D=<2009-08-05&sv%5Bonline_date%5D%5B%5D=>2009-08-05&sv%5Bstatus%5D=online_archive, initiator=w6fmmh5tMQWr, crawlOrder=false, depth=6, crawlDepth=6, must-match=.*, must-not-match=, permission=true
D 2009/08/29 12:00:45 CRAWLER LOCALCRAWL[913601, 0, 0, 0]: URL=http://www.rundschau-online.de/html/service/tv-programm/index.php?ref=kr&aktion=archiv&mid=2001_rush_hour_2, initiator=w6fmmh5tMQWr, crawlOrder=false, depth=6, crawlDepth=6, must-match=.*, must-not-match=, permission=true
D 2009/08/29 12:00:45 CRAWLER LOCALCRAWL[913600, 0, 0, 0]: URL=http://www.lr-online.de/mediacenter/bilder/bilddetail/cme78835%2C1368761.html?SORT=PRIO&fCMS=358ace4f1d1fd302a00dc4b8c529c6c1, initiator=w6fmmh5tMQWr, crawlOrder=false, depth=6, crawlDepth=6, must-match=.*, must-not-match=, permission=true
I 2009/08/29 12:00:45 BALANCER forcing crawl-delay of 6000 milliseconds for http://www.merkur-online.de (forced latency)
I 2009/08/29 12:00:45 PLASMA CRAWL: ADDED 129 LINKS FROM http://www.merkur-online.de/verschiedenes/search/index.html?qr=elfriede%20jelinek&tt=1&sb=1&sn=&rs=&fd=&td=&es=1, NEW CRAWL STACK SIZE IS 913598, STACKING TIME = 1, PARSING TIME = 9
I 2009/08/29 12:00:45 PLASMA Excluded 0 words in URL http://www.merkur-online.de/verschiedenes/search/index.html?qr=elfriede%20jelinek&tt=1&sb=1&sn=&rs=&fd=&td=&es=1
I 2009/08/29 12:00:45 PLASMA *Indexed 422 words in URL http://www.merkur-online.de/verschiedenes/search/index.html?qr=elfriede%20jelinek&tt=1&sb=1&sn=&rs=&fd=&td=&es=1 [xaYbD0oJK6JC]
   Description:  Suchergebnis für 'elfriede jelinek'
   MimeType: text/html | Charset: UTF-8 | Size: 6311 bytes | Anchors: 129
   LinkStorageTime: 1 ms | indexStorageTime: 2 ms
D 2009/08/29 12:00:45 CRAWLER problem loading http://www.tz-online.de/community/profiles/vollstrecker27/photos/2275/favorites/create?referer=http:/www.tz-online.de/community/allMedia/showPhotosByUploadDate%3FlistIdx%3D211: REJECTED WRONG STATUS TYPE '500 Internal Server Error' for URL http://www.tz-online.de/community/profiles/vollstrecker27/photos/2275/favorites/create?referer=http:/www.tz-online.de/community/allMedia/showPhotosByUploadDate%3FlistIdx%3D211
D 2009/08/29 12:00:45 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:00:45 CRAWLER problem loading http://www.fnp.de/tz/region/lokales/rmn01.c.6436679.de_index.htm?sv%5Bonline_date%5D%5B%5D=<2009-08-05&sv%5Bonline_date%5D%5B%5D=>2009-08-05&sv%5Bstatus%5D=online_archive: REJECTED WRONG STATUS TYPE '404 Not Found' for URL http://www.fnp.de/tz/region/lokales/rmn01.c.6436679.de_index.htm?sv%5Bonline_date%5D%5B%5D=<2009-08-05&sv%5Bonline_date%5D%5B%5D=>2009-08-05&sv%5Bstatus%5D=online_archive
D 2009/08/29 12:00:48 CRAWLER omitting de-queue/remote: stack is empty
I 2009/08/29 12:00:49 PARSER Unable to parse 'http://www.neuepresse.de/index.php/disclose/send/237890'. No resource content available (1) source == null, url = http://www.neuepresse.de/index.php/disclose/send/237890
W 2009/08/29 12:00:49 PLASMA Unable to parse the resource 'http://www.neuepresse.de/index.php/disclose/send/237890'. No resource content available (1) source == null, url = http://www.neuepresse.de/index.php/disclose/send/237890; url = http://www.neuepresse.de/index.php/disclose/send/237890
I 2009/08/29 12:00:50 PLASMA Excluded 0 words in URL http://www.lr-online.de/mediacenter/bilder/bilddetail/cme78835%2C1368761.html?SORT=PRIO&fCMS=358ace4f1d1fd302a00dc4b8c529c6c1
I 2009/08/29 12:00:50 PLASMA *Indexed 1183 words in URL http://www.lr-online.de/mediacenter/bilder/bilddetail/cme78835%2C1368761.html?SORT=PRIO&fCMS=358ace4f1d1fd302a00dc4b8c529c6c1 [ALIj5sMeonGB]
   Description:  Lausitzer Rundschau - Brandenburger Tageszeitung mit Nachrichten aus Wirtschaft, Politik und Kultur :: lr-online
   MimeType: text/html | Charset: ISO-8859-1 | Size: 10245 bytes | Anchors: 1117
   LinkStorageTime: 0 ms | indexStorageTime: 4 ms
D 2009/08/29 12:00:51 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:00:54 CRAWLER problem loading http://www.rundschau-online.de/html/service/tv-programm/index.php?ref=kr&aktion=archiv&mid=2001_rush_hour_2: Read timed out
D 2009/08/29 12:00:54 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:00:57 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:01:00 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:01:03 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:01:06 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:01:09 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:01:12 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:01:15 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:01:18 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:01:21 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:01:24 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:01:27 CRAWLER omitting de-queue/remote: stack is empty
I 2009/08/29 12:01:27 YACY hello: responded remote peer 'derchris-eu' [94.23.1.26] in 78 milliseconds
D 2009/08/29 12:01:30 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:01:33 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:01:36 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:01:39 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:01:42 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:01:45 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:01:48 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:01:51 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:01:54 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:01:57 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:02:00 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:02:03 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:02:06 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:02:09 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:02:12 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:02:15 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:02:18 CRAWLER omitting de-queue/remote: stack is empty
I 2009/08/29 12:02:19 YACY yacyClient.publishMySeed thread 'PublishSeed_hax404' contacted peer at 89.14.96.25:9090, received 10453 bytes, time = 635 milliseconds
I 2009/08/29 12:02:19 YACY connect: SELF reference 84.38.74.230:8090
I 2009/08/29 12:02:19 YACY publish: handshaked senior peer 'hax404' at 89.14.96.25:9090
I 2009/08/29 12:02:19 YACY PeerPing: I am accessible for 6 peer(s), not accessible for 0 peer(s).
I 2009/08/29 12:02:19 YACY PeerPing: myType is senior
D 2009/08/29 12:02:21 CRAWLER omitting de-queue/remote: stack is empty
E 2009/08/29 12:02:23 YACY yacyClient.queryUrlCount error asking peer '192-168-178-31-16dpnw64':java.net.NoRouteToHostException: No route to host
I 2009/08/29 12:02:23 YACY hello: responded remote junior peer '192-168-178-31-16dpnw64' from 87.234.135.251
I 2009/08/29 12:02:23 YACY hello: responded remote peer '192-168-178-31-16dpnw64' [87.234.135.251] in 128 milliseconds
D 2009/08/29 12:02:24 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:02:27 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:02:30 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:02:33 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:02:36 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:02:39 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:02:42 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:02:45 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:02:48 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:02:51 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:02:54 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:02:57 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:03:00 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:03:03 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:03:06 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:03:09 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:03:12 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:03:15 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:03:18 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:03:21 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:03:24 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:03:27 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:03:30 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:03:33 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:03:36 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:03:39 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:03:42 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:03:45 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:03:48 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:03:51 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:03:54 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:03:57 CRAWLER omitting de-queue/remote: stack is empty
D 2009/08/29 12:03:59 KELONDRO file '/home/yacy/y2/yacy/DATA/INDEX/freeworld/NETWORK/newsProcessed.stack' closed.
D 2009/08/29 12:03:59 KELONDRO file '/home/yacy/y2/yacy/DATA/INDEX/freeworld/NETWORK/newsPublished.stack' closed.
I 2009/08/29 12:03:59 YACY rulebasedUpdateInfo: not an automatic update selected
I 2009/08/29 12:03:59 RESOURCE OBSERVER df of Volume /: 200824 MB
I 2009/08/29 12:03:59 RESOURCE OBSERVER run completed; everything in order

Re: Balancer blocks crawling

Post by dulcedo » Sun Aug 30, 2009 7:59 am

The error is reproducible, also with 6278. Is more information needed?

Re: Balancer blocks crawling

Post by Orbiter » Sun Aug 30, 2009 9:29 am

Do you mean "omitting de-queue/remote: stack is empty" here? That is not an error; it simply tells you that your remote crawl queue is empty. This message also only appears when logging is set to 'fine'.

The 'pumping' during crawling is an effect of the current balancer design. That cannot be fixed quickly; I would first have to develop an entirely new concept.

Re: Balancer blocks crawling

Post by dulcedo » Sun Aug 30, 2009 11:01 am

No, it stops crawling and writes only these messages to the log; this peer does not do DHT. After a restart it carries on.

Re: Balancer blocks crawling

Post by Orbiter » Sun Aug 30, 2009 11:34 am

Hm, can you check whether an exception is written out somewhere, probably at the boundary between crawling and no-longer-crawling? I assume you would have posted it already, but nothing else comes to mind at the moment.
Maybe a deadlock? Do you have blocked threads in the thread dump when the crawl hangs?
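
Looking for an exception "at the boundary between crawling and no-longer-crawling" can be partly automated. The sketch below assumes the log layout seen in this thread (level letter, timestamp, component, message) and treats the last `CRAWLER LOCALCRAWL` entry as the boundary; the function names and the 60-second window are arbitrary choices for illustration, not anything YaCy provides. The sample lines are copied from the log of Aug 29 quoted earlier.

```python
import re
from datetime import datetime, timedelta

# Level, timestamp, first word of the component, rest of the line.
# Components with a space (e.g. "RESOURCE OBSERVER") are cut after the
# first word; that is good enough for this sketch.
ENTRY_RE = re.compile(
    r"^([IDWE]) (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (\S+) (.*)$"
)


def parse(lines):
    """Yield (level, time, component, message) for lines that start an entry."""
    for line in lines:
        match = ENTRY_RE.match(line)
        if match:
            level, stamp, component, message = match.groups()
            time = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
            yield level, time, component, message


def problems_near_last_crawl(lines, window_seconds=60):
    """Return the time of the last LOCALCRAWL entry and nearby W/E entries."""
    entries = list(parse(lines))
    crawl_times = [
        time for _, time, component, message in entries
        if component == "CRAWLER" and message.startswith("LOCALCRAWL")
    ]
    if not crawl_times:
        return None, []
    last_crawl = max(crawl_times)
    window = timedelta(seconds=window_seconds)
    nearby = [
        entry for entry in entries
        if entry[0] in ("W", "E") and abs(entry[1] - last_crawl) <= window
    ]
    return last_crawl, nearby


sample = [
    "D 2009/08/29 12:00:45 CRAWLER LOCALCRAWL[913601, 0, 0, 0]: "
    "URL=http://www.rundschau-online.de/html/service/tv-programm/index.php"
    "?ref=kr&aktion=archiv&mid=2001_rush_hour_2, initiator=w6fmmh5tMQWr, "
    "crawlOrder=false, depth=6, crawlDepth=6, must-match=.*, "
    "must-not-match=, permission=true",
    "D 2009/08/29 12:00:48 CRAWLER omitting de-queue/remote: stack is empty",
    "W 2009/08/29 12:00:49 PLASMA Unable to parse the resource "
    "'http://www.neuepresse.de/index.php/disclose/send/237890'. "
    "No resource content available (1) source == null, "
    "url = http://www.neuepresse.de/index.php/disclose/send/237890; "
    "url = http://www.neuepresse.de/index.php/disclose/send/237890",
    "E 2009/08/29 12:02:23 YACY yacyClient.queryUrlCount error asking peer "
    "'192-168-178-31-16dpnw64':java.net.NoRouteToHostException: No route to host",
]

last_crawl, nearby = problems_near_last_crawl(sample)
print(last_crawl, [entry[0] for entry in nearby])
```

Continuation lines of an entry (stack traces, `Description:` lines) do not match the pattern and are skipped, so a stack trace would still have to be read in the full log around the reported timestamp.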

Re: Balancer blocks crawling

Post by dulcedo » Sun Aug 30, 2009 12:56 pm

I have attached the log in question. It always looks like this, and I cannot find anything conspicuous in it either; the peer would still have half a million well-mixed URLs to crawl.

Code:
************* Start Thread Dump Sun Aug 30 13:53:16 CEST 2009 *******************

YaCy Version: 0.91/6278
Total Memory = 821362688
Used  Memory = 476276016
Free  Memory = 345086672


THREADS WITH STATES: BLOCKED


THREADS WITH STATES: RUNNABLE
Attachments
yacy01_hang.zip
(111.61 KiB)

Re: Balancer blocks crawling

Post by dulcedo » Mon Aug 31, 2009 8:05 am

I would rather not test it on that system, but could it be due to Java 1.5? That machine runs Debian etch LTS.

Re: Balancer blocks crawling

Post by bluumi » Mon Aug 31, 2009 12:00 pm

yacy01_hang.zip
^^ Do you still have a log file from "before" this one? It seems to me that if you had an exception, it occurred before this log starts, because these crawl "empty" messages already appear right at its beginning.
bluumi
 
Posts: 388
Joined: Wed Oct 08, 2008 7:27 am

Re: Balancer blocks crawling

Post by dulcedo » Mon Aug 31, 2009 3:18 pm

So far I only find things like these; I have noticed something similar to the last one several times before. When it hangs again I will pack up all the logs.

Code:
E 2009/08/31 14:47:14 YACY yacyClient.queryUrlCount error asking peer 'itg-0000-4299':org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 5000 ms

E 2009/08/31 14:48:52 SERVER receive interrupted - exception 2 = Connection reset
E 2009/08/31 14:48:52 SERVER receive interrupted - exception 2 = Connection reset
E 2009/08/31 14:48:52 SERVER receive interrupted - exception 2 = Connection reset

E 2009/08/31 14:49:13 YACY yacyClient.queryUrlCount error asking peer '192-168-123-1-31dpnw3':java.net.ConnectException: Connection refusedI


2009/08/31 16:16:07 YACY yacyClient.queryUrlCount error asking peer '192-168-220-100-111dpnw99':org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 5000 ms
I 2009/08/31 16:16:07 YACY hello: responded remote junior peer '192-168-220-100-111dpnw99' from 89.48.143.96
I 2009/08/31 16:16:07 YACY hello: responded remote peer '192-168-220-100-111dpnw99' [89.48.143.96] in 5005 milliseconds
W 2009/08/31 16:16:08 FILEHANDLER Unexpected error while processing query.
Session: Session_79.207.159.26:63201#0
Query:   /yacy/search.html
Client:  79.207.159.26
Reason:  java.io.IOException: FileUploadException Processing of multipart/form-data request failed. Read timed out
java.io.IOException: FileUploadException Processing of multipart/form-data request failed. Read timed out
   at de.anomic.http.server.HTTPDemon.parseMultipart(HTTPDemon.java:913)
   at de.anomic.http.server.HTTPDFileHandler.doResponse(HTTPDFileHandler.java:358)
   at de.anomic.http.server.HTTPDFileHandler.doPost(HTTPDFileHandler.java:254)
   at de.anomic.http.server.HTTPDemon.POST(HTTPDemon.java:630)
   at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:585)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
D 2009/08/31 16:16:08 CRAWLER omitting de-queue/remote: stack is empty



Under Crawler Errors, this shows up quite often:
cannot load: not enqueued to indexer
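
Log excerpts pasted into a forum sometimes lose their line breaks, so that several entries end up on one physical line. Assuming every entry starts with a level letter (I, D, W or E), a space and a `YYYY/MM/DD HH:MM:SS` timestamp, as in the excerpts in this thread, such lines can be split again. This is a sketch only; `split_fused` is a name made up for this example, and splitting on an empty (lookahead) match needs Python 3.7 or newer.

```python
import re

# Zero-width lookahead: split *before* each "<level> <timestamp> " without
# consuming it. The entry layout is an assumption taken from this thread.
ENTRY_START = re.compile(r"(?=[IDWE] \d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} )")


def split_fused(line):
    """Split one physical line into the log entries that were fused into it."""
    return [part for part in ENTRY_START.split(line) if part]


# Three entries that arrived on a single line, as pasted in this thread.
fused = (
    "E 2009/08/31 14:48:52 SERVER receive interrupted - exception 2 = Connection reset"
    "E 2009/08/31 14:48:52 SERVER receive interrupted - exception 2 = Connection reset"
    "E 2009/08/31 14:48:52 SERVER receive interrupted - exception 2 = Connection reset"
)

parts = split_fused(fused)
for part in parts:
    print(part)
```

A line that already holds a single entry comes back unchanged as a one-element list, so the function can be applied to every line of a pasted excerpt.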

Re: Balancer blocks crawling

Post by dulcedo » Wed Sep 02, 2009 1:05 pm

With SVN6167 I now seem to have this hang as a complete log. Here is the spot where the problems begin; after that the peer only lets URLs be crawled remotely, which is a nasty bug in itself here, because several peers outside my own cluster are crawling. (Robinson, public cluster of 2 peers.)

Code:
D 2009/09/02 10:47:23 CRAWLER Crawling of URL 'http://gdata.youtube.com/feeds/base/users/fredbauer1/uploads?alt=rss&v=2&orderby=published&client=ytapi-youtube-profile' disallowed by robots.txt.
I 2009/09/02 10:47:24 PLASMA crawlReceipt: RECEIVED RECEIPT from MhS7yC-_6Q__:KIT01-07f/0.9100628 for URL PKaLMWZfZtXQ:http://karlweiss.twoday.net/stories/4056209/
I 2009/09/02 10:47:24 PLASMA Excluded 0 words in URL http://www.youtube.com/user/artistMarcus
I 2009/09/02 10:47:24 PLASMA *Indexed 289 words in URL http://www.youtube.com/user/artistMarcus [P7TU9YQnvQBY]
   Description:  YouTube - artistMarcus's Channel
   MimeType: text/html | Charset: UTF-8 | Size: 6253 bytes | Anchors: 50
   LinkStorageTime: 1 ms | indexStorageTime: 1 ms
D 2009/09/02 10:47:25 CRAWLER LOCALCRAWL[133070, 354981, 0, 0]: URL=http://www.youtube.com/watch?v=I77faKfZDIg, initiator=QUsWv-Ir0T3-, crawlOrder=false, depth=5, crawlDepth=5, must-match=.*, must-not-match=, permission=true
I 2009/09/02 10:47:26 PLASMA crawlReceipt: RECEIVED RECEIPT from MhS7yC-_6Q__:KIT01-07f/0.9100628 for URL PIO34QGobKmY:http://www.shinjiru.com/aup.html
I 2009/09/02 10:47:26 PLASMA crawlReceipt: RECEIVED RECEIPT from MhS7yC-_6Q__:KIT01-07f/0.9100628 for URL PQhwa3jHD7GA:http://wetter.t-online.de/Wetter/Blaustein/Heute/TA1DBE5LNKPBP.html
I 2009/09/02 10:47:26 PLASMA Excluded 0 words in URL http://www.youtube.com/watch?v=I77faKfZDIg
I 2009/09/02 10:47:26 PLASMA *Indexed 281 words in URL http://www.youtube.com/watch?v=I77faKfZDIg [j8kqYQQnvQBY]
   Description:  Das Fernsehen in Gehren
   MimeType: text/html | Charset: UTF-8 | Size: 4767 bytes | Anchors: 104
   LinkStorageTime: 0 ms | indexStorageTime: 2 ms
I 2009/09/02 10:47:27 PLASMA crawlReceipt: RECEIVED RECEIPT from MhS7yC-_6Q__:KIT01-07f/0.9100628 for URL PNQP5TmpKvHY:http://www.stone.com/iPhone/Twittelator/Twittelator_Hints_Tips.html
I 2009/09/02 10:47:27 PLASMA crawlReceipt: RECEIVED RECEIPT from MhS7yC-_6Q__:KIT01-07f/0.9100628 for URL PDsI2_oHm34A:http://bloggar.se/om/riksdagen
I 2009/09/02 10:47:28 BALANCER forcing crawl-delay of 12 seconds for www.moviepilot.de (forced latency)
D 2009/09/02 10:47:29 CRAWLER problem loading http://www.moviepilot.de/movies/berli...: REJECTED WRONG STATUS TYPE '404 Not Found' for URL http://www.moviepilot.de/movies/berli...
D 2009/09/02 10:47:31 ROBOTS Trying to download the robots.txt file from URL 'http://blog.peijnik.at/robots.txt'.
D 2009/09/02 10:47:31 ROBOTS robots.txt could not be downloaded from URL 'http://blog.peijnik.at/robots.txt'. [404 Not Found].
D 2009/09/02 10:47:31 ROBOTS Trying to download the robots.txt file from URL 'http://wikieducator.org/robots.txt'.
D 2009/09/02 10:47:32 ROBOTS Robots.txt successfully loaded from URL 'http://wikieducator.org/robots.txt' in 39 ms.
D 2009/09/02 10:47:32 ROBOTS Trying to download the robots.txt file from URL 'http://www.dinosaurgardens.com/robots.txt'.
D 2009/09/02 10:47:32 ROBOTS Robots.txt successfully loaded from URL 'http://www.dinosaurgardens.com/robots.txt' in 280 ms.
I 2009/09/02 10:47:32 PLASMA crawlReceipt: RECEIVED RECEIPT from nlVtODd2EZ__:konstantinnII/0.9000613 for URL NNHODQ8OgEuA:http://www.zebralog.de/
I 2009/09/02 10:47:40 YACY hello: responded remote peer 'vaisheshika' [87.155.246.81] in 194 milliseconds


Shortly afterwards, and also several times before, this appears:
Code:
I 2009/09/02 10:48:53 PLASMA crawlReceipt: RECEIVED RECEIPT from Z7fXdABG6r__:KIT01-04f-/0.9100625 for URL BwwYcMYDFdkB:http://www.sueddeutsche.de/politik/881/484319/zoom_0_0/
I 2009/09/02 10:48:54 PLASMA crawlReceipt: RECEIVED RECEIPT from Z7fXdABG6r__:KIT01-04f-/0.9100625 for URL Pcxwnes8VLHQ:http://www.faz.net/s/RubBD6B20C3D01A48D58DA92331B0A80BC3/Doc~E932EE46F61E84E76ACEEE49D77D721A8~ATpl~Ecommon~Sspezial.html
I 2009/09/02 10:48:54 PLASMA crawlReceipt: RECEIVED RECEIPT from Z7fXdABG6r__:KIT01-04f-/0.9100625 for URL PUbMMz_0wW0h:https://piraten-mfr.de/kalender/29-08-2009/infostand-fuerth
W 2009/09/02 10:48:54 FILEHANDLER Unexpected error while processing query.
Session: Session_84.189.107.50:56122#0
Query:   /yacy/search.html
Client:  84.189.107.50
Reason:  java.io.IOException: FileUploadException Processing of multipart/form-data request failed. Read timed out
java.io.IOException: FileUploadException Processing of multipart/form-data request failed. Read timed out
   at de.anomic.http.httpd.parseMultipart(httpd.java:910)
   at de.anomic.http.httpdFileHandler.doResponse(httpdFileHandler.java:352)
   at de.anomic.http.httpdFileHandler.doPost(httpdFileHandler.java:248)
   at de.anomic.http.httpd.POST(httpd.java:627)
   at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:585)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:742)
   at de.anomic.server.serverCore$Session.run(serverCore.java:621)



Before that, the only entries logged at Error level were these:
Code:
E 2009/09/02 09:59:00 YACY yacyClient.queryUrlCount error asking peer 'bakkdoor':org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 5000 ms
I 2009/09/02 09:59:00 YACY hello: responded remote junior peer 'bakkdoor' from localhost
I 2009/09/02 09:59:00 YACY hello: changing remote peer 'bakkdoor' [localhost] peerType from 'virgin' to 'junior'.
I 2009/09/02 09:59:00 YACY hello: responded remote peer 'bakkdoor' [localhost] in 5007 milliseconds


Code:
I 2009/09/02 07:16:50 PLASMA *Indexed 1772 words in URL http://www.tagesspiegel.de/kultur/Urheberrecht;art772%2C2835794 [6yQ21_SYPx9B]
   Description:  Die Ideen der anderen
   MimeType: text/html | Charset: ISO-8859-1 | Size: 41763 bytes | Anchors: 126
   LinkStorageTime: 0 ms | indexStorageTime: 6 ms
D 2009/09/02 07:16:50 CRAWLER LOCALCRAWL[136281, 340799, 0, 0]: URL=http://de.youtube.com/cthru?key=MqvIMLz4KN0YMaslzarGoo9r00zv6VZga3DFOhXfUKdl8XdPcjTC8OH_lPBj36r_voRxyjmd5twJn_d_yxhZD-xL5J_xX5xqSKxIhO6ODpc43DKseWhey25rSgkJnntVtQ3EZqtXNElAktqfA98yuEor2aJgzCI6PBPDI-QtU3CQL6oXKkvVRyOlQWjThPW_nGC7NQfC9cIcq9exeIp1cA%3D%3D, initiator=QUsWv-Ir0T3-, crawlOrder=false, depth=5, crawlDepth=5, must-match=.*, must-not-match=, permission=true
D 2009/09/02 07:16:50 CRAWLER LOCALCRAWL[136280, 340803, 0, 0]: URL=http://www.taz.de/1/leben/internet/artikel/1/piraten-blamieren-die-anklaeger/, initiator=QUsWv-Ir0T3-, crawlOrder=true, depth=3, crawlDepth=4, must-match=.*, must-not-match=, permission=true
I 2009/09/02 07:16:50 PLASMA CRAWL: ADDED 226 LINKS FROM http://www.ahlener-zeitung.de/lokales/muenster/special/kommunalwahl/meldungen/1082332_Piratenpartei_stellt_in_Muenster_fuenf_Kandidaten_fuer_Kommunalwahl_auf.html, NEW CRAWL STACK SIZE IS 136278, STACKING TIME = 3097, PARSING TIME = 21
E 2009/09/02 07:16:50 BLOCKINGTHREAD Runtime Error in serverInstantThread.job, thread 'java.lang.reflect.Method.webStructureAnalysis.20': null; target exception: null
java.lang.ArrayIndexOutOfBoundsException
   at java.lang.System.arraycopy(Native Method)
   at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
   at java.lang.StringBuilder.append(StringBuilder.java:131)
   at java.lang.StringBuilder.append(StringBuilder.java:172)
   at de.anomic.plasma.plasmaWebStructure.generateCitationReference(plasmaWebStructure.java:136)
   at de.anomic.plasma.plasmaParserDocument.notifyWebStructure(plasmaParserDocument.java:451)
   at de.anomic.plasma.plasmaSwitchboard.webStructureAnalysis(plasmaSwitchboard.java:1780)
   at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:585)
   at de.anomic.server.serverInstantBlockingThread.job(serverInstantBlockingThread.java:87)
   at de.anomic.server.serverAbstractBlockingThread.run(serverAbstractBlockingThread.java:64)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
   at java.util.concurrent.FutureTask.run(FutureTask.java:123)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
   at java.lang.Thread.run(Thread.java:595)

dulcedo

Posts: 1006
Registered: Thu Oct 16, 2008 6:36 pm
Location: near Karlsruhe

Re: Balancer blocks crawling

Post by dulcedo » Thu Sep 03, 2009 6:10 pm

This time with the current SVN6286: 50k URLs are still waiting to be crawled, but after this message (unable to parse resource) it does not crawl anything anymore.

Code:
D 2009/09/03 16:36:45 CRAWLER LOCALCRAWL[57421, 0, 0, 0]: URL=http://www.neuepresse.de/layout/set/gallery/Mediathek/Fotostrecken/Vollsperrung-auf-der-A-2/q13/(at)/232778, initiator=w6fmmh5tMQWr, crawlOrder=true, depth=3, crawlDepth=3, must-match=.*, must-not-match=, permission=true
I 2009/09/03 16:36:45 BALANCER forcing crawl-delay of 6000 milliseconds for www.neuepresse.de (forced latency)
D 2009/09/03 16:36:46 CRAWLER omitting de-queue/remote: stack is empty
I 2009/09/03 16:36:47 PARSER Unable to parse 'http://www.neuepresse.de/layout/set/gallery/Mediathek/Fotostrecken/Vollsperrung-auf-der-A-2/q13/(at)/232778'. No resource content available (1) source == null, url = http://www.neuepresse.de/layout/set/gallery/Mediathek/Fotostrecken/Vollsperrung-auf-der-A-2/q13/(at)/232778
W 2009/09/03 16:36:47 PLASMA Unable to parse the resource 'http://www.neuepresse.de/layout/set/gallery/Mediathek/Fotostrecken/Vollsperrung-auf-der-A-2/q13/(at)/232778'. No resource content available (1) source == null, url = http://www.neuepresse.de/layout/set/gallery/Mediathek/Fotostrecken/Vollsperrung-auf-der-A-2/q13/(at)/232778; url = http://www.neuepresse.de/layout/set/gallery/Mediathek/Fotostrecken/Vollsperrung-auf-der-A-2/q13/(at)/232778
D 2009/09/03 16:36:49 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/03 16:36:52 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/03 16:36:55 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/03 16:36:58 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/03 16:37:01 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/03 16:37:04 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/03 16:37:07 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/03 16:37:10 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/03 16:37:13 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/03 16:37:16 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/03 16:37:19 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/03 16:37:22 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/03 16:37:25 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/03 16:37:28 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/03 16:37:31 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/03 16:37:34 CRAWLER omitting de-queue/remote: stack is empty
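
To check saved logs for this pattern, something like the following could be used. It is only a sketch: the line layout is inferred from the excerpts in this thread, the first number in `LOCALCRAWL[...]` is assumed to be the size of the local crawl queue, and `find_stall` and `min_repeats` are made-up names, not part of YaCy.

```python
import re
from datetime import datetime

# Assumed line layout, inferred from the logs in this thread:
# "<level> <yyyy/mm/dd hh:mm:ss> <component> <message>"
LINE = re.compile(r"^[DIWE] (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (\S+) (.*)$")
# Assumption: the first number in LOCALCRAWL[...] is the local queue size.
LOCALCRAWL = re.compile(r"LOCALCRAWL\[(\d+),")
EMPTY = "omitting de-queue/remote: stack is empty"


def find_stall(lines, min_repeats=5):
    """Return (timestamp, queued_urls) of the last LOCALCRAWL entry if it is
    followed by at least `min_repeats` 'stack is empty' lines and no further
    LOCALCRAWL entry; otherwise return None."""
    last_crawl = None
    repeats = 0
    for raw in lines:
        m = LINE.match(raw)
        if not m:
            continue  # continuation lines such as stack traces
        stamp, component, message = m.groups()
        if component != "CRAWLER":
            continue
        crawl = LOCALCRAWL.match(message)
        if crawl:
            when = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
            last_crawl = (when, int(crawl.group(1)))
            repeats = 0  # a new crawl entry ends any earlier idle run
        elif message.strip() == EMPTY:
            repeats += 1
    if last_crawl and last_crawl[1] > 0 and repeats >= min_repeats:
        return last_crawl
    return None
```

Fed with the excerpt above, this should point at the 16:36:45 entry, where 57421 URLs were apparently still queued.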

Re: Balancer blocks crawling

Post by bluumi » Fri Sep 04, 2009 9:01 pm

"Reason: java.io.IOException: FileUploadException Processing of multipart/form-data request failed. Read timed out
java.io.IOException: FileUploadException Processing of multipart/form-data request failed. Read timed out"

Strange, I had that one too, but that was somewhere around revisions 6247 to 6256.
Unfortunately the old logs were overwritten long ago, so I cannot help you with this.
bluumi

Posts: 388
Registered: Wed Oct 08, 2008 7:27 am

Re: Balancer blocks crawling

Post by dulcedo » Sat Sep 05, 2009 5:19 am

That was a typo: it should be 6286, not 68.
I am keeping all of these logs in full in case someone wants to look into the bug.

Re: Balancer blocks crawling

Post by dulcedo » Mon Sep 07, 2009 11:00 am

W 2009/09/07 07:53:41 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException

This one is new, and afterwards there is no more crawling, with 50K URLs still in the buffer. SVN6290

Re: Balancer blocks crawling

Post by dulcedo » Tue Sep 08, 2009 3:08 am

The same peers are blocking the crawl yet again, in all recent versions on Debian and Win7. The displayed number of URLs in the crawl buffer cannot be wrong, because after a restart the peer resumes crawling normally. Here on Win7 with SVN6300: after the last successful page crawl and indexing it stops crawling and only does index distribution and reception. I cannot find any error message related to this anywhere; the logs are saved.

Code:
I 2009/09/07 23:10:54 INDEX-TRANSFER-DISPATCHER Index transfer of 9 words [IkX0Q4yYv3UR .. gkZAk4nx6k__] and 83 URLs to peer Pandora:oXz0bFOoZqFM in 0 seconds successful (12 words/s)
I 2009/09/07 23:10:54 INDEX-TRANSFER-DISPATCHER Transfer finished of chunk to target oXz0bFOoZqFM/Pandora
I 2009/09/07 23:10:54 INDEX-TRANSFER-DISPATCHER starting new index transmission request to gkZAk4nx6k__
D 2009/09/07 23:10:54 CRAWLER LOCALCRAWL[17331, 0, 0, 0]: URL=http://www.karlsruhe.city-map.de/01110000/physiotherapeutin-fuer-hunde-kabierske-pia-tierheilpraktikerin, initiator=hUvYXnWm4S1L, crawlOrder=false, depth=5, crawlDepth=5, must-match=.*, must-not-match=, permission=true
I 2009/09/07 23:10:54 BALANCER forcing crawl-delay of 51000 milliseconds for www.bmg.bund.de (forced latency)
I 2009/09/07 23:10:54 INDEX-TRANSFER-DISPATCHER Index transfer of 9 words [IkX0Q4yYv3UR .. gkZAk4nx6k__] and 83 URLs to peer ZZZ:pAYESzaeIOKH in 0 seconds successful (14 words/s)
I 2009/09/07 23:10:54 INDEX-TRANSFER-DISPATCHER Transfer finished of chunk to target pAYESzaeIOKH/ZZZ
I 2009/09/07 23:10:54 INDEX-TRANSFER-DISPATCHER STORE: Chunk gkZAk4nx6k__ has FINISHED all transmissions!
I 2009/09/07 23:10:54 PLASMA dhtTransferJob: no selection, too many entries in transmission cloud: 76
I 2009/09/07 23:10:54 PLASMA dhtTransferJob: result from dequeueing: true
I 2009/09/07 23:10:54 INDEX-TRANSFER-DISPATCHER starting new index transmission request to QkZAk4nx6k__
I 2009/09/07 23:10:54 PLASMA Excluded 0 words in URL http://www.karlsruhe.city-map.de/01110000/physiotherapeutin-fuer-hunde-kabierske-pia-tierheilpraktikerin
I 2009/09/07 23:10:54 PLASMA *Indexed 99 words in URL http://www.karlsruhe.city-map.de/01110000/physiotherapeutin-fuer-hunde-kabierske-pia-tierheilpraktikerin [8llPMQy9YpnA]
   Description:  Physiotherapeutin für Hunde Kabierske Pia Tierheilpraktikerin, Rheinstetten, Tierheilpraktiker
   MimeType: text/html | Charset: ISO-8859-1 | Size: 1409 bytes | Anchors: 40
   LinkStorageTime: 2 ms | indexStorageTime: 3 ms
I 2009/09/07 23:10:54 INDEX-TRANSFER-DISPATCHER Transfer failed of chunk to target RrPkRuW9m7mM/derchris-eu: busy
I 2009/09/07 23:10:54 INDEX-TRANSFER-DISPATCHER STORE: Chunk QkZAk4nx6k__ has failed to transmit index; marked peer as busy
I 2009/09/07 23:10:54 INDEX-TRANSFER-DISPATCHER starting new index transmission request to QkZAk4nx6k__
I 2009/09/07 23:10:55 PLASMA Excluded 0 words in URL http://www.karlsruhe.city-map.de/01110000/igk-geotechnik
I 2009/09/07 23:10:55 PLASMA *Indexed 93 words in URL http://www.karlsruhe.city-map.de/01110000/igk-geotechnik [i59p5Qy9YpnA]
   Description:  IGK Geotechnik, Weingarten, Geologie
   MimeType: text/html | Charset: ISO-8859-1 | Size: 1195 bytes | Anchors: 40
   LinkStorageTime: 2 ms | indexStorageTime: 1 ms
I 2009/09/07 23:10:55 PLASMA Received 7 Entries 7 Words [xApIsz5HSb9a .. xAr5bRN9zk8E]/-2260635870734177644 from oXz0bFOoZqFM:Pandora/0.91006297, processed in 1 milliseconds, requesting 1/7 URLs, blocked 0 RWIs
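
The forced delays logged by the BALANCER can be pulled out per host to see which domains are being waited on. A minimal sketch, assuming only the message wording visible in this thread's logs (which state the delay sometimes in seconds and sometimes in milliseconds); `forced_delays` is a hypothetical helper name.

```python
import re

# Wording taken from the BALANCER lines quoted in this thread.
DELAY = re.compile(
    r"BALANCER forcing crawl-delay of (\d+) (seconds|milliseconds) for (\S+)"
)


def forced_delays(lines):
    """Map each host to the longest forced crawl-delay seen, in seconds."""
    delays = {}
    for raw in lines:
        m = DELAY.search(raw)
        if not m:
            continue
        amount, unit, host = int(m.group(1)), m.group(2), m.group(3)
        # Normalise both units to seconds so the values are comparable.
        seconds = amount / 1000 if unit == "milliseconds" else float(amount)
        delays[host] = max(seconds, delays.get(host, 0.0))
    return delays
```

Sorting the result by value shows quickly whether a single host with a long delay coincides with the point where crawling stops.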


Is anyone planning to get to the bottom of this? I would like to use the recent versions; otherwise I will install v0.81. That is where I started my developments back at LinuxTag, as discussed; as things stand they are either unusable or cannot be run in continuous operation. Mixed crawling/searching across different versions does not work either, because the cluster function is faulty.

Re: Balancer blocks crawling

Post by dulcedo » Tue Sep 08, 2009 1:46 pm

This warning has repeatedly been the cause of the crawl problems, or at least it often shows up in the log around the same time.
SVN6303

Code:
I 2009/09/08 09:46:00 PLASMA CRAWL: ADDED 140 LINKS FROM http://www.westfaelischer-anzeiger.de/hammherringenstart/welver.html, NEW CRAWL STACK SIZE IS 88662, STACKING TIME = 1, PARSING TIME = 9
I 2009/09/08 09:46:00 PLASMA Excluded 0 words in URL http://www.westfaelischer-anzeiger.de/hammherringenstart/welver.html
I 2009/09/08 09:46:00 PLASMA *Indexed 581 words in URL http://www.westfaelischer-anzeiger.de/hammherringenstart/welver.html [6eAOg2FA8IMD]
   Description:  Lokales: Homepage | Nachrichten aus Hamm | Westfälischer Anzeiger
   MimeType: text/html | Charset: UTF-8 | Size: 9032 bytes | Anchors: 140
   LinkStorageTime: 13 ms | indexStorageTime: 2 ms
I 2009/09/08 09:46:00 PARSER Unable to parse 'http://www.sueddeutsche.de/thema/Donau'. No resource content available (1) source == null, url = http://www.sueddeutsche.de/thema/Donau
W 2009/09/08 09:46:00 PLASMA Unable to parse the resource 'http://www.sueddeutsche.de/thema/Donau'. No resource content available (1) source == null, url = http://www.sueddeutsche.de/thema/Donau; url = http://www.sueddeutsche.de/thema/Donau
D 2009/09/08 09:46:00 CRAWLER LOCALCRAWL[88663, 0, 0, 0]: URL=http://www.stimme.de/heilbronn/nachrichten/region/Heilbronn-Bundesgartenschau;art16305%2C1637279?fCMS=7d1c1a44677b43f77615dc736b057c60, initiator=w6fmmh5tMQWr, crawlOrder=true, depth=5, crawlDepth=4, must-match=.*, must-not-match=.*memberlist.*|.*previous.*|.*next.*|.*p=.*, permission=true
I 2009/09/08 09:46:00 CRAWLER shifted 1 jobs from global crawl to local crawl (coreCrawlJobSize()=88663, limitCrawlJobSize()=0, cluster.mode=publicpeer, robinsonMode=on
I 2009/09/08 09:46:00 BALANCER forcing crawl-delay of 60000 milliseconds for www.westfaelischer-anzeiger.de (forced latency)
I 2009/09/08 09:46:02 PLASMA Excluded 0 words in URL http://www.taz.de/1/politik/europa/artikel/kommentarseite/1/abgeschobene-iraker-verhaftet/kommentare/1/1/
I 2009/09/08 09:46:02 PLASMA *Indexed 486 words in URL http://www.taz.de/1/politik/europa/artikel/kommentarseite/1/abgeschobene-iraker-verhaftet/kommentare/1/1/ [7ahOMB9C5n5A]
   Description:  Heftige Proteste in Dänemark: Abgeschobene Iraker verhaftet - taz.de
   MimeType: text/html | Charset: UTF-8 | Size: 7601 bytes | Anchors: 80
   LinkStorageTime: 1 ms | indexStorageTime: 2 ms
D 2009/09/08 09:46:03 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:46:06 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:46:09 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:46:12 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:46:15 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:46:18 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:46:21 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:46:24 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:46:27 CRAWLER omitting de-queue/remote: stack is empty
W 2009/09/08 09:46:29 FILEHANDLER Unexpected error while processing query.
Session: Session_213.168.93.5:39309#0
Query:   /yacy/hello.html
Client:  213.168.93.5
Reason:  java.io.IOException: FileUploadException Stream ended unexpectedly
java.io.IOException: FileUploadException Stream ended unexpectedly
   at de.anomic.http.server.HTTPDemon.parseMultipart(HTTPDemon.java:913)
   at de.anomic.http.server.HTTPDFileHandler.doResponse(HTTPDFileHandler.java:358)
   at de.anomic.http.server.HTTPDFileHandler.doPost(HTTPDFileHandler.java:254)
   at de.anomic.http.server.HTTPDemon.POST(HTTPDemon.java:630)
   at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:585)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
D 2009/09/08 09:46:30 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:46:33 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:46:36 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:46:39 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:46:42 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:46:45 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:46:48 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:46:51 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:46:54 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:46:57 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:47:00 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:47:03 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:47:06 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:47:09 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:47:12 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:47:15 CRAWLER omitting de-queue/remote: stack is empty
I 2009/09/08 09:47:15 YACY yacyClient.publishMySeed thread 'PublishSeed_Motorrad-Suche' contacted peer at 80.134.241.209:8085, received 9697 bytes, time = 1090 milliseconds
I 2009/09/08 09:47:15 YACY connect: SELF reference 84.38.74.230:8090
I 2009/09/08 09:47:15 YACY publish: handshaked senior peer 'Motorrad-Suche' at 80.134.241.209:8085
I 2009/09/08 09:47:15 YACY PeerPing: I am accessible for 8 peer(s), not accessible for 0 peer(s).
I 2009/09/08 09:47:15 YACY PeerPing: myType is senior
D 2009/09/08 09:47:18 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:47:21 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:47:24 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:47:27 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:47:30 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:47:33 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:47:36 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:47:39 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:47:42 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:47:45 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:47:48 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:47:51 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:47:54 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:47:57 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:48:00 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:48:03 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:48:06 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:48:09 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:48:12 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:48:15 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:48:18 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:48:21 CRAWLER omitting de-queue/remote: stack is empty
I 2009/09/08 09:48:21 YACY hello: responded remote peer 'KIT01-15n017-ZUW' [141.52.175.29] in 35 milliseconds
D 2009/09/08 09:48:24 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:48:27 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:48:30 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:48:33 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:48:36 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:48:39 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:48:42 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:48:45 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:48:48 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:48:51 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:48:54 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:48:57 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:49:00 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/08 09:49:03 CRAWLER omitting de-queue/remote: stack is empty


Re: Balancer blocks crawling

Post by bluumi » Tue Sep 08, 2009 8:46 pm

Well, "unable to parse" errors can hardly be what triggers the problem, if that is the one you mean. I get them all the time without then seeing
"D 2009/09/03 16:36:49 CRAWLER omitting de-queue/remote: stack is empty"
Code:
I 2009/09/08 21:44:52 PARSER Unable to parse 'http://mobile.gearlog.com/'. No resource content available (1) source == null, url = http://mobile.gearlog.com/
W 2009/09/08 21:44:52 PLASMA Unable to parse the resource 'http://mobile.gearlog.com/'. No resource content available (1) source == null, url = http://mobile.gearlog.com/; url = http://mobile.gearlog.com/

Re: Balancer blocks crawling

Post by dulcedo » Wed Sep 09, 2009 3:34 am

No, those entries repeat at the end because this peer neither distributes nor receives data and does not crawl remotely either. So it does nothing but check whether there are remote crawls to process, which would be correct if it had nothing to crawl locally. At that point, however, many URLs are still waiting in the local queue.
The error I mentioned refers to the warning concerning "hello.html":
W 2009/09/08 09:46:29 FILEHANDLER Unexpected error while processing query.

There the peer is being queried from outside, and my suspicion is that this has side effects, because crawling has already stopped repeatedly after this warning.
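
This suspicion could be checked against the saved logs by measuring how long it takes after each FILEHANDLER warning until the next local crawl entry appears. A rough sketch, assuming the log layout seen in this thread; `crawl_gap_after_warnings` is a hypothetical name, and a long or missing gap would only show a coincidence in time, not a cause.

```python
import re
from datetime import datetime

# Assumed layout: "<level> <yyyy/mm/dd hh:mm:ss> <component and message>"
STAMP = re.compile(r"^[DIWE] (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (.*)$")
FMT = "%Y/%m/%d %H:%M:%S"


def crawl_gap_after_warnings(lines):
    """For every FILEHANDLER warning, return (warning_time, seconds until the
    next LOCALCRAWL entry); the second value is None if no entry follows."""
    events = []
    for raw in lines:
        m = STAMP.match(raw)
        if not m:
            continue  # skip stack traces and other continuation lines
        when = datetime.strptime(m.group(1), FMT)
        rest = m.group(2)
        if rest.startswith("FILEHANDLER Unexpected error"):
            events.append(("warn", when))
        elif rest.startswith("CRAWLER LOCALCRAWL["):
            events.append(("crawl", when))
    gaps = []
    for i, (kind, when) in enumerate(events):
        if kind != "warn":
            continue
        following = next((t for k, t in events[i + 1:] if k == "crawl"), None)
        seconds = None if following is None else (following - when).total_seconds()
        gaps.append((when, seconds))
    return gaps
```

Running this over logs from peers that kept crawling as well as from peers that stalled would show whether the warning really precedes the stalls more often than not.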

Re: Balancer blocks crawling

Post by dulcedo » Wed Sep 09, 2009 3:56 pm

SVN6307 no longer accepts connections here, after about 6 hours of crawling.

Code:
I 2009/09/09 15:36:57 SEARCH resultWorker thread 0 terminated
D 2009/09/09 15:36:57 CRAWLER omitting de-queue/remote: stack is empty
I 2009/09/09 15:36:57 PLASMA Received 31 Entries 6 Words [wpcc6UDdVRhP .. wpclGGb-rURn]/2616712110753394220 from CHk1tMQFkR22:locutus/0.91006297, processed in 29 milliseconds, requesting 8/31 URLs, blocked 0 RWIs
I 2009/09/09 15:36:57 PLASMA Received 11 Entries 4 Words [wpcZLD-ifyNH .. wpclGGb-rURn]/2616713138329478056 from CHk1tMQFkR22:locutus/0.91006297, processed in 47 milliseconds, requesting 0/11 URLs, blocked 0 RWIs
I 2009/09/09 15:36:57 PLASMA Received 3 Entries 1 Words [Y9OlFZj5xbT9 .. Y9OlFZj5xbT9]/-3192441070971171864 from hUvYXnWm4S1L:dulcedo-TEOi/0.9100631, processed in 34 milliseconds, requesting 0/3 URLs, blocked 0 RWIs
D 2009/09/09 15:37:04 CRAWLER LOCALCRAWL[19526, 0, 0, 0]: URL=http://www.mister-wong.com/users/10446953/, initiator=Czf8mtXEXYVM, crawlOrder=true, depth=6, crawlDepth=3, must-match=.*, must-not-match=.*memberlist.*|.*previous.*|.*next.*|.*p=.*, permission=true
D 2009/09/09 15:37:09 CRAWLER problem loading http://www.mister-wong.com/users/10446953/: CRAWLER Rejecting URL 'http://www.mister-wong.com/users/10446953/'. URL is in blacklist.
D 2009/09/09 15:37:09 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/09 15:37:28 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/09 15:37:53 CRAWLER LOCALCRAWL[19525, 0, 0, 0]: URL=http://www.bild.de/BILD/sport/fussball/EM-2008/2008/06/05/deutsche-nationalmannschaft/jungs-siegt-jogi-loew-michael-ballack.html, initiator=Czf8mtXEXYVM, crawlOrder=true, depth=4, crawlDepth=3, must-match=.*, must-not-match=.*memberlist.*|.*previous.*|.*next.*|.*p=.*, permission=true
E 2009/09/09 15:39:04 SERVER receive interrupted - exception 2 = Read timed out
D 2009/09/09 15:39:23 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/09 15:41:14 CRAWLER LOCALCRAWL[19524, 0, 0, 0]: URL=http://www.rundschau-online.de/html/artikel/1246895320144.shtml, initiator=Czf8mtXEXYVM, crawlOrder=true, depth=3, crawlDepth=3, must-match=.*, must-not-match=.*memberlist.*|.*previous.*|.*next.*|.*p=.*, permission=true
D 2009/09/09 15:43:38 CRAWLER omitting de-queue/remote: stack is empty
I 2009/09/09 15:45:35 PLASMA Received 2 Entries 2 Words [f1iZMJTvEAjm .. f1wd7HqwcfRJ]/-4184177729182478780 from HYgT5grapqPJ:ubuntu/0.910063, processed in 419031 milliseconds, requesting 0/2 URLs, blocked 0 RWIs
D 2009/09/09 15:46:09 CRAWLER omitting de-queue/remote: stack is empty
I 2009/09/09 15:47:13 PARSER Charset transformation needed from 'UTF-8' to 'ISO-8859-1' for URL = http://www.rundschau-online.de/html/artikel/1246895320144.shtml
D 2009/09/09 16:01:13 CRAWLER omitting de-queue/remote: stack is empty
E 2009/09/09 16:04:47 SERVER receive interrupted - exception 2 = Read timed out
E 2009/09/09 16:04:47 SERVER receive interrupted - exception 2 = Read timed out
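
The timestamps above already show long silences (for example from 15:47:13 to 16:01:13). A small sketch for listing such gaps in a full log, again assuming the line layout seen in this thread; `long_gaps` and its threshold are illustrative names and values.

```python
import re
from datetime import datetime

# Assumed layout: every log entry starts with "<level> <yyyy/mm/dd hh:mm:ss> ".
STAMP = re.compile(r"^[DIWE] (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ")


def long_gaps(lines, threshold_seconds=60):
    """Return (previous_time, next_time, gap_seconds) for consecutive
    timestamped lines that are more than `threshold_seconds` apart."""
    gaps = []
    previous = None
    for raw in lines:
        m = STAMP.match(raw)
        if not m:
            continue  # lines without a timestamp belong to the entry before
        current = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
        if previous is not None:
            gap = (current - previous).total_seconds()
            if gap > threshold_seconds:
                gaps.append((previous, current, gap))
        previous = current
    return gaps
```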


Before that, a few errors can be found.
Code:

D 2009/09/09 15:24:32 CRAWLER omitting de-queue/remote: stack is empty
I 2009/09/09 15:24:32 PLASMA Received 1 Entries 1 Words [RU-vMDMz4CtI .. RU-vMDMz4CtI]/-2093005366625812816 from inxtrrXW2h__:apfelmaennchen/0.9100631, processed in 0 milliseconds, requesting 0/1 URLs, blocked 0 RWIs
E 2009/09/09 15:24:32 YACY yacyClient.permissionTransfer error:The host did not accept the connection within timeout of 10000 ms
I 2009/09/09 15:24:32 PLASMA RankingDistribution - error transmitting file /media/ext4/b/yacy-b/DATA/RANKING/GLOBAL/014_othercr/CRG-A-20090708023150448.# Name=YaCy .cr.gz to yacy.dyndns.org:8000: no connection to remote address yacy.dyndns.org:8000; phase 1
I 2009/09/09 15:24:33 PLASMA RankingDistribution - error transmitting file /media/ext4/b/yacy-b/DATA/RANKING/GLOBAL/014_othercr/CRG-A-20090708023150448.# Name=YaCy .cr.gz to 84.38.74.230:8090: remote peer rejected transfer: denied
I 2009/09/09 15:24:33 PLASMA RankingDistribution - error transmitting file /media/ext4/b/yacy-b/DATA/RANKING/GLOBAL/014_othercr/CRG-A-20090708023150448.# Name=YaCy .cr.gz to 130.75.2.29:8080: remote peer rejected transfer: denied
I 2009/09/09 15:24:33 PLASMA RankingDistribution - error transmitting file /media/ext4/b/yacy-b/DATA/RANKING/GLOBAL/014_othercr/CRG-A-20090708023150448.# Name=YaCy .cr.gz to 134.107.24.49:8080: remote peer failed with transfer: transfer failure


E 2009/09/09 15:25:21 YACY yacyClient.queryUrlCount error asking peer 'ZZZ':java.net.SocketException: Connection reset
E 2009/09/09 15:25:21 YACY yacyClient.queryUrlCount error asking peer 'ZZZ':java.net.SocketException: Connection reset
I 2009/09/09 15:25:21 YACY hello: responded remote junior peer 'ZZZ' from 194.204.0.26
I 2009/09/09 15:25:21 YACY hello: responded remote peer 'ZZZ' [194.204.0.26] in 176 milliseconds
I 2009/09/09 15:25:21 PLASMA Received 4 Entries 4 Words [AaOf7MmL6nQQ .. AdUCXUsu-1hD]/341664145444402512 from MMMusbojoRw0:tiggerswelt_testsearch/0.9000613, processed in 1 milliseconds, requesting 0/3 URLs, blocked 0 RWIs
D 2009/09/09 15:25:21 CRAWLER omitting de-queue/remote: stack is empty



I 2009/09/09 15:30:10 PLASMA RankingDistribution - error transmitting file /media/ext4/b/yacy-b/DATA/RANKING/GLOBAL/010_owncr/CRG-A-20090629232733046.eYcdYBHYL5fZ.cr.gz to 92.200.87.92:8888: remote peer rejected transfer: denied
E 2009/09/09 15:30:11 YACY yacyClient.permissionTransfer error:Connection refused
I 2009/09/09 15:30:11 PLASMA RankingDistribution - error transmitting file /media/ext4/b/yacy-b/DATA/RANKING/GLOBAL/014_othercr/CRG-A-20090614083145253.uUv81PMZibfQ.cr.gz to kaskelix.de:8080: no connection to remote address kaskelix.de:8080; phase 1
E 2009/09/09 15:30:11 YACY yacyClient.permissionTransfer error:Connection refused
I 2009/09/09 15:30:11 PLASMA RankingDistribution - error transmitting file /media/ext4/b/yacy-b/DATA/RANKING/GLOBAL/014_othercr/CRG-A-20090614083145253.uUv81PMZibfQ.cr.gz to yacy.dyndns.org:8000: no connection to remote address yacy.dyndns.org:8000; phase 1
I 2009/09/09 15:30:11 PLASMA RankingDistribution - error transmitting file /media/ext4/b/yacy-b/DATA/RANKING/GLOBAL/014_othercr/CRG-A-20090614083145253.uUv81PMZibfQ.cr.gz to 84.151.162.31:8472: remote peer rejected transfer: denied
I 2009/09/09 15:30:11 PLASMA RankingDistribution - error transmitting file /media/ext4/b/yacy-b/DATA/RANKING/GLOBAL/014_othercr/CRG-A-20090614083145253.uUv81PMZibfQ.cr.gz to 130.75.2.29:8080: remote peer rejected transfer: denied
I 2009/09/09 15:30:11 PLASMA RankingDistribution - transmitted file /media/ext4/b/yacy-b/DATA/RANKING/GLOBAL/014_othercr/CRG-A-20090614083145253.uUv81PMZibfQ.cr.gz to 141.52.175.54:8080 successfully in 0 seconds
D 2009/09/09 15:30:11 KELONDRO file '/media/ext4/b/yacy-b/DATA/INDEX/freeworld/NETWORK/newsProcessed.stack' closed.
D 2009/09/09 15:30:11 KELONDRO file '/media/ext4/b/yacy-b/DATA/INDEX/freeworld/NETWORK/newsPublished.stack' closed.
I 2009/09/09 15:30:11 YACY rulebasedUpdateInfo: not an automatic update selected
I 2009/09/09 15:30:11 RESOURCE OBSERVER The observer is out of order: dfUnix: Cannot run program "df": java.io.IOException: error=12, Cannot allocate memory



I 2009/09/09 15:32:04 PLASMA Received 122 URLs from peer ILKnAGb2o5it:Hermes-bk2/0.61605394 in 95 ms, blocked 9 URLs
W 2009/09/09 15:32:04 FILEHANDLER Unexpected error while processing query.
Session: Session_121.242.105.211:54710#1
Query:   /yacy/crawlReceipt.html
Client:  121.242.105.211
Reason:  java.io.IOException: FileUploadException Processing of multipart/form-data request failed. Read timed out
java.io.IOException: FileUploadException Processing of multipart/form-data request failed. Read timed out
   at de.anomic.http.server.HTTPDemon.parseMultipart(HTTPDemon.java:913)
   at de.anomic.http.server.HTTPDFileHandler.doResponse(HTTPDFileHandler.java:358)
   at de.anomic.http.server.HTTPDFileHandler.doPost(HTTPDFileHandler.java:254)
   at de.anomic.http.server.HTTPDemon.POST(HTTPDemon.java:630)
   at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
E 2009/09/09 15:32:04 SERVER receive interrupted - exception 2 = Connection reset
I 2009/09/09 15:32:04 PLASMA Received 48 URLs from peer 8NWub8_NTVvu:75-52-205-238-187dpnw35/0.61005247 in 287 ms, blocked 0 URLs
I 2009/09/09 15:32:05 PLASMA Received 13 Entries 3 Words [txq2AraCGHXZ .. txq_OPiumxjP]/3030536614181208544 from qlSmZGiwiEB7:whitecloud/0.9100631, processed in 2 milliseconds, requesting 0/13 URLs, blocked 0 RWIs
D 2009/09/09 15:32:05 CRAWLER omitting de-queue/remote: stack is empty


E 2009/09/09 15:36:08 YACY yacyClient.permissionTransfer error:Connection refused
I 2009/09/09 15:36:08 PLASMA RankingDistribution - error transmitting file /media/ext4/b/yacy-b/DATA/RANKING/GLOBAL/014_othercr/CRG-A-20090526205418885.ZIsGrA0UUrJA.cr.gz to kaskelix.de:8080: no connection to remote address kaskelix.de:8080; phase 1
E 2009/09/09 15:36:08 YACY yacyClient.permissionTransfer error:Connection refused
I 2009/09/09 15:36:08 PLASMA RankingDistribution - error transmitting file /media/ext4/b/yacy-b/DATA/RANKING/GLOBAL/014_othercr/CRG-A-20090526205418885.ZIsGrA0UUrJA.cr.gz to yacy.dyndns.org:8000: no connection to remote address yacy.dyndns.org:8000; phase 1
I
dulcedo
 
Posts: 1006
Joined: Thu Oct 16, 2008 6:36 pm
Location: near Karlsruhe

Re: Balancer blocks crawling

Posted by dulcedo » Wed Sep 09, 2009 4:10 pm

Same version on Windows, started at the same time this morning. The peer is still reachable and has no blocked threads, but it is not crawling.
The two logs from around this error are attached.
Code: Select all
I 2009/09/09 12:03:51 PLASMA Received 52 Entries 18 Words [rAI5OpvOb0c9 .. rAN8YYVNUn8W]/-1394850586271383516 from bAQ1k6PNpagJ:sixcooler/0.9100631, processed in 3109 milliseconds, requesting 1/52 URLs, blocked 0 RWIs
I 2009/09/09 12:03:51 PLASMA Received 223 Entries 9 Words [j0NQjWOxlFJ5 .. j0NsiPSOuX2R]/-359095096093483744 from 148qJoTxccAZ:dulcedo/0.8100598, processed in 193988 milliseconds, requesting 53/223 URLs, blocked 0 RWIs
E 2009/09/09 12:03:51 HTTPD Unexpected Error ... (Software caused connection abort: recv failed), client = 85.216.61.209
java.net.SocketException: Software caused connection abort: recv failed
   at java.net.SocketInputStream.socketRead0(Native Method)
   at java.net.SocketInputStream.read(Unknown Source)
   at java.net.SocketInputStream.read(Unknown Source)
   at java.io.FilterInputStream.read(Unknown Source)
   at java.io.PushbackInputStream.read(Unknown Source)
   at org.apache.commons.httpclient.ChunkedInputStream.readCRLF(ChunkedInputStream.java:204)
   at org.apache.commons.httpclient.ChunkedInputStream.nextChunk(ChunkedInputStream.java:219)
   at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:176)
   at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:196)
   at org.apache.commons.httpclient.ChunkedInputStream.exhaustInputStream(ChunkedInputStream.java:369)
   at org.apache.commons.httpclient.ChunkedInputStream.close(ChunkedInputStream.java:346)
   at de.anomic.http.server.HTTPDemon.POST(HTTPDemon.java:648)
   at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
   at java.lang.reflect.Method.invoke(Unknown Source)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
I 2009/09/09 12:03:51 PLASMA Received 85 Entries 9 Words [uMFElBvvsEpK .. uMFgah0i_YBS]/-1854002074433436384 from inxtrrXW2h__:apfelmaennchen/0.9100631, processed in 3393 milliseconds, requesting 39/85 URLs, blocked 0 RWIs
I 2009/09/09 12:03:52 PLASMA Received 118 Entries 20 Words [VRqKciJDvVpZ .. VRt6oxqXuIuJ]/1736255209264157468 from bAQ1k6PNpagJ:sixcooler/0.9100631, processed in 3220 milliseconds, requesting 37/118 URLs, blocked 0 RWIs
I 2009/09/09 12:03:52 PLASMA Received 225 Entries 8 Words [pPSkq9nCzZb8 .. pPVQ3qXrJQpy]/-1140696255557012500 from bAQ1k6PNpagJ:sixcooler/0.9100631, processed in 3405 milliseconds, requesting 11/224 URLs, blocked 0 RWIs
I 2009/09/09 12:03:52 PLASMA Received 191 Entries 9 Words [j0NboVtIrZFK .. j0NsiPSOuX2R]/-359098141190023472 from 148qJoTxccAZ:dulcedo/0.8100598, processed in 2670 milliseconds, requesting 25/191 URLs, blocked 0 RWIs
I 2009/09/09 12:03:52 PLASMA dhtTransferJob: no selection, too many entries in transmission cloud: 74
I 2009/09/09 12:03:52 PLASMA dhtTransferJob: result from dequeueing: true
I 2009/09/09 12:03:52 INDEX-TRANSFER-DISPATCHER starting new index transmission request to BjRvyVtPj8__
I 2009/09/09 12:03:52 PLASMA Received 214 Entries 8 Words [pPTYVUF9B_9C .. pPVQ3qXrJQpy]/-1140710456227699372 from bAQ1k6PNpagJ:sixcooler/0.9100631, processed in 3460 milliseconds, requesting 29/214 URLs, blocked 0 RWIs
E 2009/09/09 12:03:52 HTTPD Unexpected Error ... (Software caused connection abort: recv failed), client = 85.216.61.209
java.net.SocketException: Software caused connection abort: recv failed
   at java.net.SocketInputStream.socketRead0(Native Method)
   at java.net.SocketInputStream.read(Unknown Source)
   at java.net.SocketInputStream.read(Unknown Source)
   at java.io.FilterInputStream.read(Unknown Source)
   at java.io.PushbackInputStream.read(Unknown Source)
   at org.apache.commons.httpclient.ChunkedInputStream.readCRLF(ChunkedInputStream.java:204)
   at org.apache.commons.httpclient.ChunkedInputStream.nextChunk(ChunkedInputStream.java:219)
   at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:176)
   at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:196)
   at org.apache.commons.httpclient.ChunkedInputStream.exhaustInputStream(ChunkedInputStream.java:369)
   at org.apache.commons.httpclient.ChunkedInputStream.close(ChunkedInputStream.java:346)
   at de.anomic.http.server.HTTPDemon.POST(HTTPDemon.java:648)
   at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
   at java.lang.reflect.Method.invoke(Unknown Source)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
I 2009/09/09 12:03:52 PLASMA Received 283 Entries 21 Words [pPRMwRdRYqe7 .. pPVQ3qXrJQpy]/-1140672089107968344 from bAQ1k6PNpagJ:sixcooler/0.9100631, processed in 3487 milliseconds, requesting 17/282 URLs, blocked 0 RWIs
I 2009/09/09 12:03:52 PLASMA Received 338 Entries 39 Words [pPQ7lFXiZpSy .. pPVQ3qXrJQpy]/-1140667368127590996 from bAQ1k6PNpagJ:sixcooler/0.9100631, processed in 3510 milliseconds, requesting 10/337 URLs, blocked 0 RWIs
E 2009/09/09 12:03:52 HTTPD Unexpected Error ... (Software caused connection abort: recv failed), client = 85.178.101.128
java.net.SocketException: Software caused connection abort: recv failed
   at java.net.SocketInputStream.socketRead0(Native Method)
   at java.net.SocketInputStream.read(Unknown Source)
   at java.net.SocketInputStream.read(Unknown Source)
   at java.io.FilterInputStream.read(Unknown Source)
   at java.io.PushbackInputStream.read(Unknown Source)
   at org.apache.commons.httpclient.ChunkedInputStream.readCRLF(ChunkedInputStream.java:204)
   at org.apache.commons.httpclient.ChunkedInputStream.nextChunk(ChunkedInputStream.java:219)
   at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:176)
   at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:196)
   at org.apache.commons.httpclient.ChunkedInputStream.exhaustInputStream(ChunkedInputStream.java:369)
   at org.apache.commons.httpclient.ChunkedInputStream.close(ChunkedInputStream.java:346)
   at de.anomic.http.server.HTTPDemon.POST(HTTPDemon.java:648)
   at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
   at java.lang.reflect.Method.invoke(Unknown Source)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
E 2009/09/09 12:03:52 HTTPD Unexpected Error ... (Software caused connection abort: recv failed), client = 85.178.101.128
java.net.SocketException: Software caused connection abort: recv failed
   at java.net.SocketInputStream.socketRead0(Native Method)
   at java.net.SocketInputStream.read(Unknown Source)
   at java.net.SocketInputStream.read(Unknown Source)
   at java.io.FilterInputStream.read(Unknown Source)
   at java.io.PushbackInputStream.read(Unknown Source)
   at org.apache.commons.httpclient.ChunkedInputStream.readCRLF(ChunkedInputStream.java:204)
   at org.apache.commons.httpclient.ChunkedInputStream.nextChunk(ChunkedInputStream.java:219)
   at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:176)
   at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:196)
   at org.apache.commons.httpclient.ChunkedInputStream.exhaustInputStream(ChunkedInputStream.java:369)
   at org.apache.commons.httpclient.ChunkedInputStream.close(ChunkedInputStream.java:346)
   at de.anomic.http.server.HTTPDemon.POST(HTTPDemon.java:648)
   at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
   at java.lang.reflect.Method.invoke(Unknown Source)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
I 2009/09/09 12:03:52 INDEX-TRANSFER-DISPATCHER Index transfer of 7 words [xjQbVXLbXQMP .. RjRvyVtPj8__] and 339 URLs to peer tp-guybrush242:YPPL8FIjhg4- in 197 seconds successful (0 words/s)
I 2009/09/09 12:03:52 INDEX-TRANSFER-DISPATCHER Transfer finished of chunk to target YPPL8FIjhg4-/tp-guybrush242
I 2009/09/09 12:03:52 INDEX-TRANSFER-DISPATCHER starting new index transmission request to RjRvyVtPj8__
I 2009/09/09 12:03:52 PLASMA RankingDistribution - error transmitting file F:\YaCy\DATA\RANKING\GLOBAL\010_owncr\CRG-A-20090708004228097.# Name=YaCy .cr.gz to 141.52.175.54:8080: remote peer failed with transfer: transfer failure
I 2009/09/09 12:03:53 INDEX-TRANSFER-DISPATCHER Index transfer of 8 words [xjRLCuMuSWvo .. VjRvyVtPj8__] and 343 URLs to peer KSBA-YaCy:grjH7GYu__lo in 5 seconds successful (1 words/s)
I 2009/09/09 12:03:53 INDEX-TRANSFER-DISPATCHER Transfer finished of chunk to target grjH7GYu__lo/KSBA-YaCy
I 2009/09/09 12:03:53 INDEX-TRANSFER-DISPATCHER Transfer of chunk to myself-target
I 2009/09/09 12:03:53 INDEX-TRANSFER-DISPATCHER STORE: Chunk VjRvyVtPj8__ has FINISHED all transmissions!
I 2009/09/09 12:03:53 PLASMA Received 9 Entries 5 Words [2nKnNH74jM5T .. 2nMIUXFFiHTZ]/-3067936174744802492 from zb7wAwrBD5__:KIT01-09f/0.910063, processed in 5 milliseconds, requesting 0/9 URLs, blocked 0 RWIs
I 2009/09/09 12:03:53 PLASMA crawlReceipt: RECEIVED RECEIPT from 1e9_1t8XkKjf:4o4/0.9100631 for URL 5XGb__nkfPBA:http://www.mainpost.de/nachrichten/wirtschaft/topthemen/?fCMS=406ca5d26e35395383ad60330bd6e55f
I 2009/09/09 12:03:53 PLASMA RankingDistribution - error transmitting file F:\YaCy\DATA\RANKING\GLOBAL\010_owncr\CRG-A-20090708004228097.# Name=YaCy .cr.gz to 141.52.175.26:8080: remote peer failed with transfer: io error
E 2009/09/09 12:03:53 YACY yacyClient.queryUrlCount error asking peer 'kyeldon':org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 5000 ms
I 2009/09/09 12:03:53 YACY hello: responded remote junior peer 'kyeldon' from 85.179.86.204
E 2009/09/09 12:03:53 YACY yacyClient.queryUrlCount error asking peer '192-168-178-20-47dpnw2':org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 5000 ms
I 2009/09/09 12:03:53 YACY hello: responded remote junior peer '192-168-178-20-47dpnw2' from 93.135.76.9
I 2009/09/09 12:03:53 YACY yacyClient.publishMySeed thread 'PublishSeed_KIT01-07f' contacted peer at 141.52.175.11:8080, received 10494 bytes, time = 5295 milliseconds
I 2009/09/09 12:03:53 YACY yacyClient.publishMySeed: Peer 'KIT01-07f' reported us as junior.
I 2009/09/09 12:03:53 YACY hello: responded remote peer 'kyeldon' [85.179.86.204] in 5107 milliseconds
I 2009/09/09 12:03:53 YACY hello: responded remote peer '192-168-178-20-47dpnw2' [93.135.76.9] in 5091 milliseconds
I 2009/09/09 12:03:53 PLASMA crawlReceipt: RECEIVED RECEIPT from MhS7yC-_6Q__:KIT01-07f/0.910063 for URL kWDSPNoJK6JC:http://www.mainpost.de/lokales/franken/weinkoenigin/?fCMS=53eb79f56e1db6cf7bc889f517afbcd8
I 2009/09/09 12:03:53 PLASMA crawlReceipt: RECEIVED RECEIPT from MhS7yC-_6Q__:KIT01-07f/0.910063 for URL YWEd941R8IiC:http://www.lr-online.de/mitmachen/horoskop/?mode=detail&xwert=2&bild=stier&fCMS=2df3385fbed6b6b03069a81fe7221875
I 2009/09/09 12:03:53 PLASMA crawlReceipt: RECEIVED RECEIPT from MhS7yC-_6Q__:KIT01-07f/0.910063 for URL sxhVcvVrlYWA:http://www.mainpost.de/sport/wuerzburg/asvrimpar/?fCMS=81c1ecd7fba07c4dabee977312ab673a
I 2009/09/09 12:03:53 INDEX-TRANSFER-DISPATCHER Index transfer of 7 words [xjRC1CAIFIU9 .. NjRvyVtPj8__] and 362 URLs to peer KSBA-YaCy:grjH7GYu__lo in 6 seconds successful (1 words/s)
I 2009/09/09 12:03:53 INDEX-TRANSFER-DISPATCHER Transfer finished of chunk to target grjH7GYu__lo/KSBA-YaCy
I 2009/09/09 12:03:53 PLASMA crawlReceipt: RECEIVED RECEIPT from MhS7yC-_6Q__:KIT01-07f/0.910063 for URL uvEhhc0DU1iA:http://www.lr-online.de/mediacenter/bilder/bilddetail/cme104213%2C1406810.html?SORT=PRIO&fCMS=9b907ba0e6399b8d94e9307a88bc5e90
I 2009/09/09 12:03:53 PLASMA crawlReceipt: RECEIVED RECEIPT from MhS7yC-_6Q__:KIT01-07f/0.910063 for URL DGhM2sRibhVA:http://www.mainpost.de/nachrichten/kulturwelt/dpakultur/?fCMS=c4203e762253ca9f543b0843013c157a
I 2009/09/09 12:03:53 INDEX-TRANSFER-DISPATCHER Transfer of chunk to myself-target
I 2009/09/09 12:03:53 INDEX-TRANSFER-DISPATCHER STORE: Chunk NjRvyVtPj8__ has FINISHED all transmissions!
I 2009/09/09 12:03:54 PLASMA RankingDistribution - error transmitting file F:\YaCy\DATA\RANKING\GLOBAL\010_owncr\CRG-A-20090708004228097.# Name=YaCy .cr.gz to 80.134.250.34:8090: remote peer rejected transfer: denied
I 2009/09/09 12:03:54 YACY publish: handshaked senior peer 'KIT01-07f' at 141.52.175.11:8080
I 2009/09/09 12:03:54 YACY PeerPing: I am accessible for 6 peer(s), not accessible for 1 peer(s).
I 2009/09/09 12:03:54 YACY PeerPing: myType is senior
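
With excerpts this long it can help to count the entries per log level and component before reading the stack traces. The following is a minimal sketch, not part of YaCy: the class name LogTally is made up, and the line shape (level letter, date, time, component, message) is only inferred from the lines quoted in this thread. Stack-trace frames and anything else that does not fit that shape are skipped.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of YaCy): tallies log lines by level and component.
final class LogTally {
    // Assumed line shape, inferred from the excerpts in this thread:
    // "<level> <yyyy/MM/dd> <HH:mm:ss> <COMPONENT> <message>"
    private static final Pattern LINE = Pattern.compile(
        "^([A-Z]) (\\d{4}/\\d{2}/\\d{2}) (\\d{2}:\\d{2}:\\d{2}) (\\S+) (.*)$");

    // Returns a sorted map from "<level> <COMPONENT>" to the number of matching lines.
    static Map<String, Integer> tally(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                continue; // stack-trace frames, exception names, blank lines
            }
            counts.merge(m.group(1) + " " + m.group(4), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // The first argument is the path of the log file to summarise.
        // ISO-8859-1 is used so that unexpected bytes never abort the read.
        List<String> lines =
            Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        tally(lines).forEach((key, n) -> System.out.println(n + "\t" + key));
    }
}
```

On the excerpt above, this would group the repeated "Software caused connection abort" entries under "E HTTPD", separate from the PLASMA and INDEX-TRANSFER-DISPATCHER info lines.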

Attachments
090909b.zip
logfiles
(402.73 KiB) Downloaded 59 times

Re: Balancer blocks crawling

Posted by dulcedo » Thu Sep 10, 2009 5:37 am

SVN 6307, Debian. While the peer is crawling, I run searches from another server via the portal search, and then it crashes badly. The peer keeps working, but its port is no longer reachable. It has 5 of 6 GB of memory in use.
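
For context, HotSpot raises "GC overhead limit exceeded" when the JVM spends almost all of its time in garbage collection while reclaiming very little heap, so a heap that is nearly full fits this picture. To read the used share from inside a JVM, a minimal sketch with the standard MemoryMXBean could look like this. The class name HeapCheck is made up and this is not YaCy code; it reports on the JVM it runs in, so it would have to be called from within the peer's own process to say anything about the peer.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not part of YaCy): reports how full the heap of the current JVM is.
final class HeapCheck {
    // Fraction of the maximum heap in use; NaN when the maximum is undefined,
    // which MemoryUsage.getMax() signals with -1.
    static double usedFraction(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return Double.NaN;
        }
        return (double) usedBytes / (double) maxBytes;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mib = 1024L * 1024L;
        double fraction = usedFraction(heap.getUsed(), heap.getMax());
        System.out.printf("heap used: %d MiB of %d MiB (%.0f%%)%n",
            heap.getUsed() / mib, heap.getMax() / mib, fraction * 100.0);
    }
}
```

With the figures from this post, usedFraction(5 GB, 6 GB) comes out at roughly 0.83.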

Code: Select all
I 2009/09/10 06:08:46 YACY REMOTE SEARCH - no answer from remote peer -ZAAFt45tfoC:KSBA-BSCW
D 2009/09/10 06:08:46 CRAWLER LOCALCRAWL[32984, 0, 0, 0]: URL=http://www.flickr.com/photos/melissawitcher/2829813302/comment72157622017751303/, initiator=Czf8mtXEXYVM, crawlOrder=false, depth=5, crawlDepth=5, must-match=.*, must-not-match=, permission=true
E 2009/09/10 06:12:13 SERVER command execution, target exception null for client 84.179.121.227
java.lang.reflect.InvocationTargetException
   at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
   at org.apache.commons.fileupload.MultipartStream.<init>(MultipartStream.java:337)
   at org.apache.commons.fileupload.MultipartStream.<init>(MultipartStream.java:365)
   at org.apache.commons.fileupload.FileUploadBase$FileItemIteratorImpl.<init>(FileUploadBase.java:938)
   at org.apache.commons.fileupload.FileUploadBase.getItemIterator(FileUploadBase.java:331)
   at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:349)
   at de.anomic.http.server.HTTPDemon.parseMultipart(HTTPDemon.java:910)
   at de.anomic.http.server.HTTPDFileHandler.doResponse(HTTPDFileHandler.java:358)
   at de.anomic.http.server.HTTPDFileHandler.doPost(HTTPDFileHandler.java:254)
   at de.anomic.http.server.HTTPDemon.POST(HTTPDemon.java:630)
   ... 5 more
E 2009/09/10 06:12:13 HTTPD Unexpected Error. java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: GC overhead limit exceeded
   at org.apache.commons.fileupload.MultipartStream.<init>(MultipartStream.java:337)
   at org.apache.commons.fileupload.MultipartStream.<init>(MultipartStream.java:365)
   at org.apache.commons.fileupload.FileUploadBase$FileItemIteratorImpl.<init>(FileUploadBase.java:938)
   at org.apache.commons.fileupload.FileUploadBase.getItemIterator(FileUploadBase.java:331)
   at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:349)
   at de.anomic.http.server.HTTPDemon.parseMultipart(HTTPDemon.java:910)
   at de.anomic.http.server.HTTPDFileHandler.doResponse(HTTPDFileHandler.java:358)
   at de.anomic.http.server.HTTPDFileHandler.doPost(HTTPDFileHandler.java:254)
   at de.anomic.http.server.HTTPDemon.POST(HTTPDemon.java:630)
   at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
E 2009/09/10 06:12:13 SERVER command execution, target exception null for client 85.179.202.129
java.lang.reflect.InvocationTargetException
   at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
E 2009/09/10 06:12:13 HTTPD Unexpected Error. java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: GC overhead limit exceeded
E 2009/09/10 06:12:13 SERVER command execution, target exception null for client 85.179.202.129
java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
E 2009/09/10 06:12:13 HTTPD Unexpected Error. java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: GC overhead limit exceeded
E 2009/09/10 06:12:13 SERVER command execution, IO exception Broken pipe for client 85.179.202.129
java.net.SocketException: Broken pipe
   at java.net.SocketOutputStream.socketWrite0(Native Method)
   at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
   at java.net.SocketOutputStream.write(SocketOutputStream.java:124)
   at de.anomic.server.serverCore.send(serverCore.java:865)
   at de.anomic.server.serverCore$Session.writeLine(serverCore.java:560)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:763)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
E 2009/09/10 06:12:13 SERVER command execution, target exception null for client 0:0:0:0:0:0:0:1
java.lang.reflect.InvocationTargetException
   at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
   at java.awt.image.Raster.createWritableRaster(Raster.java:994)
   at java.awt.image.Raster.createWritableRaster(Raster.java:938)
   at java.awt.image.BufferedImage.getData(BufferedImage.java:1401)
   at com.sun.imageio.plugins.png.PNGImageWriter.encodePass(PNGImageWriter.java:806)
   at com.sun.imageio.plugins.png.PNGImageWriter.write_IDAT(PNGImageWriter.java:930)
   at com.sun.imageio.plugins.png.PNGImageWriter.write(PNGImageWriter.java:1146)
   at javax.imageio.ImageWriter.write(ImageWriter.java:598)
   at javax.imageio.ImageIO.write(ImageIO.java:1479)
   at javax.imageio.ImageIO.write(ImageIO.java:1565)
   at de.anomic.ymage.ymageMatrix.exportImage(ymageMatrix.java:706)
   at de.anomic.http.server.HTTPDFileHandler.doResponse(HTTPDFileHandler.java:542)
   at de.anomic.http.server.HTTPDFileHandler.doGet(HTTPDFileHandler.java:246)
   at de.anomic.http.server.HTTPDemon.GET(HTTPDemon.java:493)
   ... 5 more
E 2009/09/10 06:12:13 HTTPD Unexpected Error. java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: GC overhead limit exceeded
   at java.awt.image.Raster.createWritableRaster(Raster.java:994)
   at java.awt.image.Raster.createWritableRaster(Raster.java:938)
   at java.awt.image.BufferedImage.getData(BufferedImage.java:1401)
   at com.sun.imageio.plugins.png.PNGImageWriter.encodePass(PNGImageWriter.java:806)
   at com.sun.imageio.plugins.png.PNGImageWriter.write_IDAT(PNGImageWriter.java:930)
   at com.sun.imageio.plugins.png.PNGImageWriter.write(PNGImageWriter.java:1146)
   at javax.imageio.ImageWriter.write(ImageWriter.java:598)
   at javax.imageio.ImageIO.write(ImageIO.java:1479)
   at javax.imageio.ImageIO.write(ImageIO.java:1565)
   at de.anomic.ymage.ymageMatrix.exportImage(ymageMatrix.java:706)
   at de.anomic.http.server.HTTPDFileHandler.doResponse(HTTPDFileHandler.java:542)
   at de.anomic.http.server.HTTPDFileHandler.doGet(HTTPDFileHandler.java:246)
   at de.anomic.http.server.HTTPDemon.GET(HTTPDemon.java:493)
   at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
E 2009/09/10 06:12:13 FILEHANDLER INTERNAL ERROR: java.lang.reflect.InvocationTargetException:null target exception at /media/ext4/b/yacy-b/htroot/yacy/transferRWI.class: java.lang.OutOfMemoryError: GC overhead limit exceeded:GC overhead limit exceeded
java.lang.reflect.InvocationTargetException
   at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.http.server.HTTPDFileHandler.invokeServlet(HTTPDFileHandler.java:1179)
   at de.anomic.http.server.HTTPDFileHandler.doResponse(HTTPDFileHandler.java:757)
   at de.anomic.http.server.HTTPDFileHandler.doPost(HTTPDFileHandler.java:254)
   at de.anomic.http.server.HTTPDemon.POST(HTTPDemon.java:630)
   at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
E 2009/09/10 06:12:52 SERVER command execution, target exception null for client 0:0:0:0:0:0:0:1
java.lang.reflect.InvocationTargetException
   at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
E 2009/09/10 06:12:52 HTTPD Unexpected Error. java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: GC overhead limit exceeded
E 2009/09/10 06:12:52 SERVER command execution, target exception null for client 0:0:0:0:0:0:0:1
java.lang.reflect.InvocationTargetException
   at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
E 2009/09/10 06:12:52 HTTPD Unexpected Error. java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: GC overhead limit exceeded
D 2009/09/10 06:12:52 CRAWLER omitting de-queue/remote: stack is empty
W 2009/09/10 06:12:52 Balancer no profile entry for handle O0tOmQmIchzA
W 2009/09/10 06:12:52 Balancer no profile entry for handle t7iLA7GdLX5C
D 2009/09/10 06:12:52 CRAWLER LOCALCRAWL[32983, 0, 0, 0]: URL=http://die-linke.de/politik/international/ausgewaehlte_informationen_zu_internationalen_themen/archiv/2008/april/kategorie/international/, initiator=Czf8mtXEXYVM, crawlOrder=false, depth=5, crawlDepth=5, must-match=.*, must-not-match=, permission=true
E 2009/09/10 06:12:52 SERVER command execution, target exception null for client 0:0:0:0:0:0:0:1
java.lang.reflect.InvocationTargetException
   at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
E 2009/09/10 06:12:52 HTTPD Unexpected Error. java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: GC overhead limit exceeded
W 2009/09/10 06:12:52 FILEHANDLER Unexpected error while processing query.
Session: Session_85.178.127.148:54590#9
Query:   /yacy/transferRWI.html
Client:  85.178.127.148
Reason:  java.lang.reflect.InvocationTargetException
java.lang.reflect.InvocationTargetException
   at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.http.server.HTTPDFileHandler.invokeServlet(HTTPDFileHandler.java:1179)
   at de.anomic.http.server.HTTPDFileHandler.doResponse(HTTPDFileHandler.java:757)
   at de.anomic.http.server.HTTPDFileHandler.doPost(HTTPDFileHandler.java:254)
   at de.anomic.http.server.HTTPDemon.POST(HTTPDemon.java:630)
   at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
D 2009/09/10 06:14:47 CRAWLER LOCALCRAWL[32980, 0, 0, 0]: URL=http://www.flickr.com/photos/tags/hills/, initiator=Czf8mtXEXYVM, crawlOrder=false, depth=5, crawlDepth=5, must-match=.*, must-not-match=, permission=true
D 2009/09/10 06:19:52 CRAWLER omitting de-queue/remote: stack is empty
E 2009/09/10 06:19:52 SERVER command execution, target exception null for client 85.216.61.209
java.lang.reflect.InvocationTargetException
   at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
E 2009/09/10 06:19:52 HTTPD Unexpected Error. java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: GC overhead limit exceeded
E 2009/09/10 06:19:52 SERVER command execution, target exception null for client 0:0:0:0:0:0:0:1
java.lang.reflect.InvocationTargetException
   at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
E 2009/09/10 06:21:09 HTTPD Unexpected Error. java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: GC overhead limit exceeded
E 2009/09/10 06:21:09 SERVER receive interrupted - exception 2 = Read timed out
E 2009/09/10 06:21:09 SERVER receive interrupted - exception 2 = Read timed out
E 2009/09/10 06:21:09 SERVER receive interrupted - exception 2 = Read timed out
D 2009/09/10 06:21:09 CRAWLER problem loading http://www.flickr.com/photos/melissawitcher/2829813302/comment72157622017751303/: The host did not accept the connection within timeout of 9000 ms
E 2009/09/10 06:21:09 FILEHANDLER INTERNAL ERROR: java.lang.reflect.InvocationTargetException:null target exception at /media/ext4/b/yacy-b/htroot/Status.class: java.lang.OutOfMemoryError: GC overhead limit exceeded:GC overhead limit exceeded
java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.http.server.HTTPDFileHandler.invokeServlet(HTTPDFileHandler.java:1179)
   at de.anomic.http.server.HTTPDFileHandler.doResponse(HTTPDFileHandler.java:757)
   at de.anomic.http.server.HTTPDFileHandler.doGet(HTTPDFileHandler.java:246)
   at de.anomic.http.server.HTTPDemon.GET(HTTPDemon.java:493)
   at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
E 2009/09/10 06:21:09 BUSYTHREAD Runtime Error in serverInstantThread.job, thread 'de.anomic.yacy.yacyCore.peerPing': null; target exception: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
E 2009/09/10 06:21:09 SERVER command execution, target exception null for client 0:0:0:0:0:0:0:1
java.lang.reflect.InvocationTargetException
   at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
E 2009/09/10 06:21:09 HTTPD Unexpected Error. java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: GC overhead limit exceeded
E 2009/09/10 06:21:09 FILEHANDLER INTERNAL ERROR: java.lang.reflect.InvocationTargetException:null target exception at /media/ext4/b/yacy-b/htroot/yacy/transferRWI.class: java.lang.OutOfMemoryError: GC overhead limit exceeded:GC overhead limit exceeded
java.lang.reflect.InvocationTargetException
   at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.http.server.HTTPDFileHandler.invokeServlet(HTTPDFileHandler.java:1179)
   at de.anomic.http.server.HTTPDFileHandler.doResponse(HTTPDFileHandler.java:757)
   at de.anomic.http.server.HTTPDFileHandler.doPost(HTTPDFileHandler.java:254)
   at de.anomic.http.server.HTTPDemon.POST(HTTPDemon.java:630)
   at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
E 2009/09/10 06:21:09 SERVER command execution, target exception null for client 0:0:0:0:0:0:0:1
java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
Caused by: java.net.SocketException: Broken pipe
   at java.net.SocketOutputStream.socketWrite0(Native Method)
   at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
   at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
   at de.anomic.kelondro.util.FileUtils.copy(FileUtils.java:244)
   at de.anomic.http.server.HTTPDemon.sendRespondError(HTTPDemon.java:1235)
   at de.anomic.http.server.HTTPDemon.sendRespondError(HTTPDemon.java:1061)
   at de.anomic.http.server.HTTPDemon.UNKNOWN(HTTPDemon.java:452)
   ... 6 more
E 2009/09/10 06:21:09 HTTPD Unexpected Error. java.net.SocketException
java.net.SocketException: Broken pipe
   at java.net.SocketOutputStream.socketWrite0(Native Method)
   at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
   at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
   at de.anomic.kelondro.util.FileUtils.copy(FileUtils.java:244)
   at de.anomic.http.server.HTTPDemon.sendRespondError(HTTPDemon.java:1235)
   at de.anomic.http.server.HTTPDemon.sendRespondError(HTTPDemon.java:1061)
   at de.anomic.http.server.HTTPDemon.UNKNOWN(HTTPDemon.java:452)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
E 2009/09/10 06:21:09 SERVER command execution, IO exception Broken pipe for client 0:0:0:0:0:0:0:1
java.net.SocketException: Broken pipe
   at java.net.SocketOutputStream.socketWrite0(Native Method)
   at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
   at java.net.SocketOutputStream.write(SocketOutputStream.java:124)
   at de.anomic.server.serverCore.send(serverCore.java:865)
   at de.anomic.server.serverCore$Session.writeLine(serverCore.java:560)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:763)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
W 2009/09/10 06:21:09 FILEHANDLER Unexpected error while processing query.
Session: Session_84.60.138.12:35390#7
Query:   /yacy/transferRWI.html
Client:  84.60.138.12
Reason:  java.lang.reflect.InvocationTargetException
java.lang.reflect.InvocationTargetException
   at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.http.server.HTTPDFileHandler.invokeServlet(HTTPDFileHandler.java:1179)
   at de.anomic.http.server.HTTPDFileHandler.doResponse(HTTPDFileHandler.java:757)
   at de.anomic.http.server.HTTPDFileHandler.doPost(HTTPDFileHandler.java:254)
   at de.anomic.http.server.HTTPDemon.POST(HTTPDemon.java:630)
   at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
E 2009/09/10 06:21:09 FILEHANDLER INTERNAL ERROR: java.lang.reflect.InvocationTargetException:null target exception at /media/ext4/b/yacy-b/htroot/yacy/transferRWI.class: java.lang.OutOfMemoryError: GC overhead limit exceeded:GC overhead limit exceeded
java.lang.reflect.InvocationTargetException
   at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.http.server.HTTPDFileHandler.invokeServlet(HTTPDFileHandler.java:1179)
   at de.anomic.http.server.HTTPDFileHandler.doResponse(HTTPDFileHandler.java:757)
   at de.anomic.http.server.HTTPDFileHandler.doPost(HTTPDFileHandler.java:254)
   at de.anomic.http.server.HTTPDemon.POST(HTTPDemon.java:630)
   at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
E 2009/09/10 06:21:09 BUSYTHREAD Runtime Error in serverInstantThread.job, thread 'de.anomic.search.Switchboard.cleanupJob': null; target exception: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
W 2009/09/10 06:21:09 FILEHANDLER Unexpected error while processing query.
Session: Session_141.52.175.47:60938#60
Query:   /yacy/transferRWI.html
Client:  141.52.175.47
Reason:  java.lang.reflect.InvocationTargetException
java.lang.reflect.InvocationTargetException
   at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.http.server.HTTPDFileHandler.invokeServlet(HTTPDFileHandler.java:1179)
   at de.anomic.http.server.HTTPDFileHandler.doResponse(HTTPDFileHandler.java:757)
   at de.anomic.http.server.HTTPDFileHandler.doPost(HTTPDFileHandler.java:254)
   at de.anomic.http.server.HTTPDemon.POST(HTTPDemon.java:630)
   at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
E 2009/09/10 06:21:09 SERVER command execution, target exception null for client 85.216.61.209
java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
   at de.anomic.server.serverCore$Session.run(serverCore.java:619)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
E 2009/09/10 06:21:09 HTTPD Unexpected Error. java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: GC overhead limit exceeded
D 2009/09/10 06:21:11 CRAWLER LOCALCRAWL[32979, 0, 0, 0]: URL=http://www.familien-partei-deutschlands.de/public/index/regional/bergischesland/leverkusen, initiator=Czf8mtXEXYVM, crawlOrder=false, depth=2, crawlDepth=5, must-match=.*, must-not-match=, permission=true
I 2009/09/10 06:21:17 PLASMA Received 6 Entries 3 Words [swHrvWDd8lBF .. swKHzaQdBJ7e]/3178100335292150024 from oXz0bFOoZqFM:Pandora/0.9100631, processed in 771702 milliseconds, requesting 0/6 URLs, blocked 0 RWIs
I 2009/09/10 06:21:17 PLASMA Received 3 Entries 1 Words [iEHpdi-sSpJ1 .. iEHpdi-sSpJ1]/-4504996478190876296 from 8TB5J3xHaIVX:Hermes/0.910063, processed in 5 milliseconds, requesting 0/3 URLs, blocked 0 RWIs
D 2009/09/10 06:21:17 CRAWLER LOCALCRAWL[32978, 0, 0, 0]: URL=http://www.rp-online.de/personen/Rafael--Marquez, initiator=Czf8mtXEXYVM, crawlOrder=false, depth=4, crawlDepth=5, must-match=.*, must-not-match=, permission=true
D 2009/09/10 06:21:17 CRAWLER problem loading http://www.familien-partei-deutschlands.de/public/index/regional/bergischesland/leverkusen: REJECTED WRONG STATUS TYPE '404 Not Found' for URL http://www.familien-partei-deutschlands.de/public/index/regional/bergischesland/leverkusen
I 2009/09/10 06:21:17 PLASMA Received 1 Entries 1 Words [XIRZr7sdSsW9 .. XIRZr7sdSsW9]/-2929079779815356064 from GuTqcE_nEpAA:sixcooler1/0.9100631, processed in 57 milliseconds, requesting 0/1 URLs, blocked 0 RWIs
I 2009/09/10 06:21:17 PLASMA Received 14 Entries 1 Words [0prznHWJMF4Y .. 0prznHWJMF4Y]/2039713363048371864 from 8TB5J3xHaIVX:Hermes/0.910063, processed in 64 milliseconds, requesting 0/14 URLs, blocked 0 RWIs
I 2009/09/10 06:21:17 PLASMA Received 17 Entries 1 Words [1VZpgi8ajPcu .. 1VZpgi8ajPcu]/1941273043927361608 from HYgT5grapqPJ:ubuntu/0.910063, processed in 5648 milliseconds, requesting 0/17 URLs, blocked 0 RWIs
I 2009/09/10 06:21:17 PLASMA Received 8 Entries 1 Words [1VZpgi8ajPcu .. 1VZpgi8ajPcu]/1941273043927361608 from HYgT5grapqPJ:ubuntu/0.910063, processed in 5670 milliseconds, requesting 0/8 URLs, blocked 0 RWIs
I 2009/09/10 06:21:17 PLASMA Received 4 Entries 1 Words [A0bD_ukA_wTe .. A0bD_ukA_wTe]/286150434998038336 from GuTqcE_nEpAA:sixcooler1/0.9100631, processed in 750545 milliseconds, requesting 0/4 URLs, blocked 0 RWIs
D 2009/09/10 06:21:17 CRAWLER LOCALCRAWL[32977, 0, 0, 0]: URL=http://www.npd-fraktion-sachsen.de/index.php?verweis=3%2C1%2C1&drucksache=pressemitteilungen&drucksacheid=605, initiator=Czf8mtXEXYVM, crawlOrder=false, depth=5, crawlDepth=5, must-match=.*, must-not-match=, permission=true
I 2009/09/10 06:21:17 PLASMA Received 1 Entries 1 Words [-mqlMKtibZJe .. -mqlMKtibZJe]/605359994160871416 from HIHQHn5X5HKO:geo-snap/0.910063, processed in 505200 milliseconds, requesting 1/1 URLs, blocked 0 RWIs
I 2009/09/10 06:21:17 YACY hello: responded remote peer 'geo-snap' [62.75.241.201] in 87 milliseconds
I 2009/09/10 06:21:17 YACY hello: responded remote peer 'geo-snap' [62.75.241.201] in 4005 milliseconds
I 2009/09/10 06:21:17 YACY hello: responded remote peer 'geo-snap' [62.75.241.201] in 4003 milliseconds
D 2009/09/10 06:21:17 CRAWLER LOCALCRAWL[32976, 0, 0, 0]: URL=http://www.spiegel.de/wissenschaft/natur/0%2C1518%2C477257%2C00.html, initiator=Czf8mtXEXYVM, crawlOrder=false, depth=5, crawlDepth=5, must-match=.*, must-not-match=, permission=true
W 2009/09/10 06:21:17 YACY transferRWI: blocked URL hash 'NAbWKQ0SfFZe' (the urlhash 'NAbWKQ0SfFZe' is local, but local addresses are not accepted) from peer ILKnAGb2o5it:Hermes-bk2/0.61605394; peer is suspected to be a spam-peer (or something is wrong)
D 2009/09/10 06:21:17 CRAWLER LOCALCRAWL[32975, 0, 0, 0]: URL=http://www.style.com/peopleparties/celebritysearch/person1744, initiator=Czf8mtXEXYVM, crawlOrder=false, depth=5, crawlDepth=5, must-match=.*, must-not-match=, permission=true
I 2009/09/10 06:21:19 YACY hello: responded remote peer 'sixcooler' [85.178.127.148] in 2030 milliseconds
I 2009/09/10 06:21:19 YACY hello: responded remote peer 'sixcooler' [85.178.127.148] in 2047 milliseconds
D 2009/09/10 06:21:19 CRAWLER LOCALCRAWL[32974, 0, 0, 0]: URL=https://www.die-linke.de/presse/presseerklaerungen/presseerklaerungen/, initiator=Czf8mtXEXYVM, crawlOrder=false, depth=3, crawlDepth=5, must-match=.*, must-not-match=, permission=true
I 2009/09/10 06:21:19 PLASMA Received 14 Entries 8 Words [ssghdOH5NPyJ .. ssiLh-0C_SXX]/3186247319395967548 from inxtrrXW2h__:apfelmaennchen/0.9100631, processed in 7649 milliseconds, requesting 0/14 URLs, blocked 0 RWIs
I 2009/09/10 06:21:19 YACY hello: responded remote peer 'sixcooler' [85.178.127.148] in 2042 milliseconds

Attachments
090910a.zip
(188.86 KiB) downloaded 63 times
dulcedo
 
Posts: 1006
Joined: Thu Oct 16, 2008 6:36 pm
Location: Near Karlsruhe

Re: Balancer blocks crawling

Post by dulcedo » Thu Sep 17, 2009 7:46 pm

At around 16:00 it stops crawling, although more than 200k URLs are still waiting to be processed. DHT is switched off.
Code: Select all
************* Start Thread Dump Thu Sep 17 20:45:41 CEST 2009 *******************

YaCy Version: 0.91/6318
Total Memory = 810090496
Used  Memory = 493922536
Free  Memory = 316167960


THREADS WITH STATES: BLOCKED


THREADS WITH STATES: RUNNABLE

Attachments
090917.zip
(1.51 MiB) downloaded 56 times

Re: Balancer blocks crawling

Post by Orbiter » Thu Sep 17, 2009 8:08 pm

The logs don't tell me anything; there is nothing in them that helps. If you could observe an exception, that would be helpful, but you only get one with the start option -l and by writing the output to a file:
./startYACY.sh -l > yacy.log

Elsewhere I have seen that the crawl profile sometimes cannot be found. In that case, however, the queue empties quickly.
Is your queue still filled while it nevertheless does nothing? Then a thread dump via kill -3 might help; if YaCy was started with the options above, the dump would also end up in yacy.log.
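
Once the output has been written to yacy.log, finding the exceptions in it can be automated. The following is only a minimal sketch in Python, not part of YaCy: the function name `count_throwables` and the regular expression are assumptions made for illustration. It streams over the log lines and counts fully qualified Java throwable names such as `java.lang.OutOfMemoryError`, skipping the `at ...` stack frames.

```python
import re
from collections import Counter
from typing import Iterable

# Matches a fully qualified Java throwable name whose class name ends in
# "Exception" or "Error", e.g. java.lang.OutOfMemoryError.
# This pattern is an illustrative assumption, not something YaCy defines.
THROWABLE = re.compile(
    r"\b((?:[a-z_][\w$]*\.)+[A-Z][\w$]*(?:Exception|Error))\b"
)


def count_throwables(lines: Iterable[str]) -> Counter:
    """Count throwable class names in log lines, ignoring stack frames."""
    counts: Counter = Counter()
    for line in lines:
        # Stack frames ("at pkg.Class.method(File.java:123)") carry no
        # throwable name of their own, so they are skipped.
        if line.lstrip().startswith("at "):
            continue
        for name in THROWABLE.findall(line):
            counts[name] += 1
    return counts
```

Because the function accepts any iterable of lines, an open file can be passed in directly, for example `count_throwables(open("yacy.log", errors="replace")).most_common(10)`, so even a very large log is read line by line rather than loaded into memory at once.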
Orbiter
 
Posts: 5792
Joined: Tue Jun 26, 2007 10:58 pm
Location: Frankfurt am Main

Re: Balancer blocks crawling

Post by dulcedo » Thu Sep 17, 2009 8:16 pm

The peer no longer starts up now; I had triggered a restart through the browser.

Code: Select all
I 2009/09/17 20:54:22 HeapReader finished index generation for /home/yacy/y2/yacy/DATA/INDEX/freeworld/TEXT/RICELL/index.20090917121400070.blob, 47807 entries, 0 gaps.
I 2009/09/17 20:54:23 HeapReader saturation of index.20090917135749097.blob.1B2M2Y8AsgTp.idx: keylength = 5, vallength = 3, possible saving: 0 MB
S 2009/09/17 20:54:23 YACY CORE INITIALIZED
E 2009/09/17 20:54:23 STARTUP FATAL ERROR: null
java.lang.NullPointerException
   at de.anomic.search.Switchboard.<init>(Unknown Source)
   at yacy.startup(Unknown Source)
   at yacy.main(Unknown Source)
S 2009/09/17 20:54:23 SHUTDOWN goodbye. (this is the last line)
S 2009/09/17 20:54:23 YACY thread 'de.anomic.search.Switchboard.loadSeedLists' deployed, starting job.
I 2009/09/17 20:54:23 YACY BOOTSTRAP: 61 seeds known from previous run

Re: Balancer blocks crawling

Post by Orbiter » Thu Sep 17, 2009 8:22 pm

Damn. If the line where the error occurs is not in there, I can't do anything either. So I have tried to build a debug flag into SVN 6321; you would have to install that manually over your current version and then please try again.

Re: Balancer blocks crawling

Post by dulcedo » Thu Sep 17, 2009 8:49 pm

Orbiter wrote: Elsewhere I have seen that the crawl profile sometimes cannot be found. In that case, however, the queue empties quickly.
Is your queue still filled while it nevertheless does nothing? Then a thread dump via kill -3 might help; if YaCy was started with the options above, the dump would also end up in yacy.log.


Yes and no, which is why I wanted to restart: after a restart it crawls on each time until it stops again after a while, and so far I have worked around this with cron. In the one case I had noted that the manual crawl did not work because the profile was apparently gone, since no bookmark had been created. That is a different bug.
kill -3 shows nothing at all for me under Debian. It never has, whenever I tried it.

Re: Balancer blocks crawling

Post by Orbiter » Thu Sep 17, 2009 9:10 pm

kill -3 only works when started with -l

Re: Balancer blocks crawling

Post by Orbiter » Thu Sep 17, 2009 9:47 pm

This one, viewtopic.php?p=17442#p17442, may also have influenced your problem. Have you perhaps assigned a particularly large amount of RAM? With more than 4GB the table RAM copies are activated everywhere, and those were put at risk by the bug mentioned above. The fix is in SVN 6322.

Re: Balancer blocks crawling

Post by dulcedo » Fri Sep 18, 2009 2:10 am

For clarification: I test every new version with 2GB JVMs under Windows and Debian, and with a 6GB JVM likewise under Windows and Debian. All of these are 64-bit versions. In addition there is a Debian peer with a 1GB JVM on 32-bit Debian Lenny. 3xxxMB brings hardly any advantage under Windows, so I don't test that; how much memory the peer has available and free is shown in the thread dump. All of them are senior peers, but the smaller ones run without DHT. I test the cluster function with 2 of the Robinson peers, and I have described the problems.

Peers with less than 4GB show the blocking behavior; those with more than 4GB mainly produce the OOMs from the other thread. They may block as well, but they don't run long enough to tell.
A peer with 6GB on 8.1 SVN5978 under Windows works flawlessly, and one with 2GB on Debian with SVN6167 does too, apart from minor quirks. Those quirks also already show up as crawls getting stuck, but far less often than in later versions.

It now writes the log file via -l to ./yacy.log, but that file gets very large because of all the I and D messages. Warnings and errors, however, appear only on the console and not in the new log; is that intentional? (SVN6322)
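
One possible explanation, which is an assumption and not confirmed anywhere in this thread: if the W and E lines are written to stderr while the I and D lines go to stdout, then a plain `> yacy.log` captures only the latter. The Python sketch below illustrates just that general mechanism with a hypothetical stand-in child process; it does not start YaCy, and the names `CHILD` and `capture` are made up for the example.

```python
import subprocess
import sys

# Hypothetical stand-in for a process that logs I lines to stdout and
# W lines to stderr. Whether YaCy behaves this way is an assumption.
CHILD = (
    "import sys;"
    "print('I info line');"
    "print('W warning line', file=sys.stderr)"
)


def capture(merge_stderr: bool) -> str:
    """Return what a log file fed by the child's redirected output would hold.

    merge_stderr=False mimics redirecting only stdout ("> file");
    merge_stderr=True mimics redirecting both streams ("> file 2>&1").
    """
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT if merge_stderr else subprocess.DEVNULL,
        text=True,
        check=True,
    )
    return result.stdout
```

With `merge_stderr=False` the captured text holds only the I line; with `merge_stderr=True` it holds both lines, although their relative order is not guaranteed because the two streams are buffered differently.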

Re: Balancer blocks crawling

Post by dulcedo » Sun Sep 20, 2009 10:03 am

Console output of SVN6331; it only continues very haltingly, and I am adding to this almost in real time.
What is of interest in yacy.log now? In this case there is 1GB to search through.
Code: Select all
alt-arzthaftungsrecht-muenster-kim.html?9bca3a0242c428da25b89b2d6b1d955a=c319c7bdc23b0ca6e12eec4f77222ed0
W 2009/09/20 09:32:38 PLASMA Unable to parse the resource 'http://www.edhardysell.com/'. No resource content available (1) source == null, url = http://www.edhardysell.com/; url = http://www.edhardysell.com/
W 2009/09/20 09:32:38 PLASMA Unable to parse the resource 'http://www.digitaldruck-web-to-print.de/buch-aus-word-guenstig-drucken.html'. No resource content available (1) source == null, url = http://www.digitaldruck-web-to-print.de/buch-aus-word-guenstig-drucken.html; url = http://www.digitaldruck-web-to-print.de/buch-aus-word-guenstig-drucken.html
W 2009/09/20 09:32:39 PLASMA Unable to parse the resource 'http://stauden.garten-arkaden.de/product_info.php?info=p3167'. No resource content available (1) source == null, url = http://stauden.garten-arkaden.de/product_info.php?info=p3167; url = http://stauden.garten-arkaden.de/product_info.php?info=p3167
W 2009/09/20 09:32:39 PLASMA Unable to parse the resource 'http://garten.garten-arkaden.de/product_info.php?info=p2506'. No resource content available (1) source == null, url = http://garten.garten-arkaden.de/product_info.php?info=p2506; url = http://garten.garten-arkaden.de/product_info.php?info=p2506
W 2009/09/20 09:32:39 PLASMA Unable to parse the resource 'http://www.revision3.com/'. No resource content available (1) source == null, url = http://www.revision3.com/; url = http://www.revision3.com/
W 2009/09/20 09:32:39 PLASMA Unable to parse the resource 'http://www.barelypolitical.com/'. No resource content available (1) source == null, url = http://www.barelypolitical.com/; url = http://www.barelypolitical.com/
W 2009/09/20 09:32:39 PLASMA Unable to parse the resource 'http://9cf9.com/'. No resource content available (1) source == null, url = http://9cf9.com/; url = http://9cf9.com/
E 2009/09/20 09:32:40 PARSER Unable to parse 'http://www.amazon.fr/exec/obidos/ASIN/B0000072SD/typepad-21'. Binary data found in resource
W 2009/09/20 09:32:40 PLASMA Unable to parse the resource 'http://www.amazon.fr/exec/obidos/ASIN/B0000072SD/typepad-21'. Binary data found in resource; url = http://www.amazon.fr/exec/obidos/ASIN/B0000072SD/typepad-21
W 2009/09/20 09:32:40 PLASMA Unable to parse the resource 'http://www.arrl.org/news/features/2004/03/30/1/'. No resource content available (1) source == null, url = http://www.arrl.org/news/features/2004/03/30/1/; url = http://www.arrl.org/news/features/2004/03/30/1/
W 2009/09/20 09:32:40 PLASMA Unable to parse the resource 'http://www.gizmag.com/go/5933/'. No resource content available (1) source == null, url = http://www.gizmag.com/go/5933/; url = http://www.gizmag.com/go/5933/
W 2009/09/20 09:32:40 PLASMA Unable to parse the resource 'http://www.phonematchup.com/phones.php?pid=44'. No resource content available (1) source == null, url = http://www.phonematchup.com/phones.php?pid=44; url = http://www.phonematchup.com/phones.php?pid=44
java.lang.OutOfMemoryError: Java heap space
java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at de.anomic.server.serverInstantBlockingThread.job(serverInstantBlockingThread.java:87)
at de.anomic.server.serverAbstractBlockingThread.run(serverAbstractBlockingThread.java:64)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
at java.util.concurrent.FutureTask.run(FutureTask.java:123)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)
Caused by: java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
E 2009/09/20 09:32:41 BLOCKINGTHREAD Runtime Error in serverInstantThread.job, thread 'java.lang.reflect.Method.parseDocument.27': null; target exception: Java heap space
java.lang.OutOfMemoryError: Java heap space
W 2009/09/20 09:32:41 PLASMA Unable to parse the resource 'http://slashdot.org/article.pl?sid=04/07/07/0740232'. No resource content available (1) source == null, url = http://slashdot.org/article.pl?sid=04/07/07/0740232; url = http://slashdot.org/article.pl?sid=04/07/07/0740232
W 2009/09/20 09:32:42 PLASMA Unable to parse the resource 'http://amygrindhouse.com/paris-hilton-sued-class-action-lawsuit.html'. No resource content available (1) source == null, url = http://amygrindhouse.com/paris-hilton-sued-class-action-lawsuit.html; url = http://amygrindhouse.com/paris-hilton-sued-class-action-lawsuit.html
W 2009/09/20 09:32:42 PLASMA Unable to parse the resource 'http://www.macnews.de/'. No resource content available (1) source == null, url = http://www.macnews.de/; url = http://www.macnews.de/
W 2009/09/20 09:32:43 PLASMA Unable to parse the resource 'http://www.martinbrotzler.de/modules/rssc/single_link.php?lid=143'. No resource content available (1) source == null, url = http://www.martinbrotzler.de/modules/rssc/single_link.php?lid=143; url = http://www.martinbrotzler.de/modules/rssc/single_link.php?lid=143
W 2009/09/20 09:32:43 PLASMA Unable to parse the resource 'http://terrassenmoebel.garten-arkaden.de/product_info.php?info=p9497'. No resource content available (1) source == null, url = http://terrassenmoebel.garten-arkaden.de/product_info.php?info=p9497; url = http://terrassenmoebel.garten-arkaden.de/product_info.php?info=p9497
W 2009/09/20 09:32:43 PLASMA Unable to parse the resource 'http://gartenhaus.garten-arkaden.de/product_info.php?info=p12467'. No resource content available (1) source == null, url = http://gartenhaus.garten-arkaden.de/product_info.php?info=p12467; url = http://gartenhaus.garten-arkaden.de/product_info.php?info=p12467
W 2009/09/20 09:32:43 PLASMA Unable to parse the resource 'http://www.stumbleupon.com/submit?url=http:/www.wired.com/threatlevel/2009/09/classified-material/&'. No resource content available (1) source == null, url = http://www.stumbleupon.com/submit?url=http:/www.wired.com/threatlevel/2009/09/classified-material/&; url = http://www.stumbleupon.com/submit?url=http:/www.wired.com/threatlevel/2009/09/classified-material/&
W 2009/09/20 09:32:44 PLASMA Unable to parse the resource 'http://gartendeko.garten-arkaden.de/'. No resource content available (1) source == null, url = http://gartendeko.garten-arkaden.de/; url = http://gartendeko.garten-arkaden.de/
W 2009/09/20 09:32:44 PLASMA Unable to parse the resource 'http://pflanzenschutz.garten-arkaden.de/product_info.php?info=p3748'. No resource content available (1) source == null, url = http://pflanzenschutz.garten-arkaden.de/product_info.php?info=p3748; url = http://pflanzenschutz.garten-arkaden.de/product_info.php?info=p3748
W 2009/09/20 09:32:44 PLASMA Unable to parse the resource 'http://www.rechtsanwalt-kanzleimarketing.de/index.php/rechtsanwalt-arbeitsrecht-muenster.html'. No resource content available (1) source == null, url = http://www.rechtsanwalt-kanzleimarketing.de/index.php/rechtsanwalt-arbeitsrecht-muenster.html; url = http://www.rechtsanwalt-kanzleimarketing.de/index.php/rechtsanwalt-arbeitsrecht-muenster.html
W 2009/09/20 09:32:45 PLASMA Unable to parse the resource 'http://blumenstrauss.garten-arkaden.de/'. No resource content available (1) source == null, url = http://blumenstrauss.garten-arkaden.de/; url = http://blumenstrauss.garten-arkaden.de/
W 2009/09/20 09:32:45 PLASMA Unable to parse the resource 'http://outdoor.garten-arkaden.de/product_info.php?info=p10283'. No resource content available (1) source == null, url = http://outdoor.garten-arkaden.de/product_info.php?info=p10283; url = http://outdoor.garten-arkaden.de/product_info.php?info=p10283
W 2009/09/20 09:32:45 PLASMA Unable to parse the resource 'http://www.emmys.tv/media'. No resource content available (1) source == null, url = http://www.emmys.tv/media; url = http://www.emmys.tv/media
W 2009/09/20 09:32:45 PLASMA Unable to parse the resource 'http://pflanzen.garten-arkaden.de/product_info.php?info=p30530'. No resource content available (1) source == null, url = http://pflanzen.garten-arkaden.de/product_info.php?info=p30530; url = http://pflanzen.garten-arkaden.de/product_info.php?info=p30530
W 2009/09/20 09:32:45 PLASMA Unable to parse the resource 'http://www.blogger.com/rearrange?blogID=1170270570597612047&widgetType=HTML&widgetId=HTML14&action=editWidget'. No resource content available (1) source == null, url = http://www.blogger.com/rearrange?blogID=1170270570597612047&widgetType=HTML&widgetId=HTML14&action=editWidget; url = http://www.blogger.com/rearrange?blogID=1170270570597612047&widgetType=HTML&widgetId=HTML14&action=editWidget
W 2009/09/20 09:32:45 PLASMA Unable to parse the resource 'http://gartentechnik.garten-arkaden.de/'. No resource content available (1) source == null, url = http://gartentechnik.garten-arkaden.de/; url = http://gartentechnik.garten-arkaden.de/
W 2009/09/20 09:32:45 PLASMA Unable to parse the resource 'http://www.ettlingen.de/servlet/PB/menu/1281832_l1_pprintPDF_yno/print.pdf?ContentType=pdf'. No resource content available (1) source == null, url = http://www.ettlingen.de/servlet/PB/menu/1281832_l1_pprintPDF_yno/print.pdf?ContentType=pdf; url = http://www.ettlingen.de/servlet/PB/menu/1281832_l1_pprintPDF_yno/print.pdf?ContentType=pdf
W 2009/09/20 09:32:45 PLASMA Unable to parse the resource 'http://kuebelpflanzen.garten-arkaden.de/product_info.php?info=p2267'. No resource content available (1) source == null, url = http://kuebelpflanzen.garten-arkaden.de/product_info.php?info=p2267; url = http://kuebelpflanzen.garten-arkaden.de/product_info.php?info=p2267
W 2009/09/20 09:32:45 PLASMA Unable to parse the resource 'http://gewaechshaus.garten-arkaden.de/'. No resource content available (1) source == null, url = http://gewaechshaus.garten-arkaden.de/; url = http://gewaechshaus.garten-arkaden.de/
W 2009/09/20 09:32:45 PLASMA Unable to parse the resource 'http://stauden.garten-arkaden.de/'. No resource content available (1) source == null, url = http://stauden.garten-arkaden.de/; url = http://stauden.garten-arkaden.de/
W 2009/09/20 09:32:46 PLASMA Unable to parse the resource 'http://garten.garten-arkaden.de/'. No resource content available (1) source == null, url = http://garten.garten-arkaden.de/; url = http://garten.garten-arkaden.de/
W 2009/09/20 09:33:34 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
E 2009/09/20 09:36:54 SERVER receive interrupted - exception 2 = Connection reset
W 2009/09/20 09:38:34 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
W 2009/09/20 09:43:34 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
E 2009/09/20 09:43:55 YACY yacyClient.queryUrlCount error asking peer '192-168-1-2-78dpnw65':java.net.ConnectException: Connection refused
W 2009/09/20 09:48:35 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
W 2009/09/20 09:53:35 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
E 2009/09/20 09:57:30 YACY yacyClient.queryUrlCount error asking peer 'phat':org.apache.commons.httpclient.NoHttpResponseException: The server 141.41.64.246 failed to respond
W 2009/09/20 09:58:35 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
E 2009/09/20 10:00:55 YACY yacyClient.queryUrlCount error asking peer 'KSBA-YaCy':java.net.SocketTimeoutException: Read timed out
E 2009/09/20 10:02:55 YACY yacyClient.queryUrlCount error asking peer 'nullserver':java.net.SocketTimeoutException: Read timed out
W 2009/09/20 10:03:35 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
W 2009/09/20 10:08:36 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
W 2009/09/20 10:13:36 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
W 2009/09/20 10:18:36 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
W 2009/09/20 10:23:36 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
E 2009/09/20 10:25:52 YACY yacyClient.queryUrlCount error asking peer 'emx-lon-uk-01':java.net.SocketTimeoutException: Read timed out
W 2009/09/20 10:28:37 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
E 2009/09/20 10:30:39 YACY yacyClient.queryUrlCount error asking peer 'phat':org.apache.commons.httpclient.NoHttpResponseException: The server 141.41.64.246 failed to respond
E 2009/09/20 10:30:46 YACY yacyClient.queryUrlCount error asking peer '192-168-1-3-332dpnw97':org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 5000 ms
E 2009/09/20 10:32:59 YACY yacyClient.queryUrlCount error asking peer 'debian-suche':java.net.SocketTimeoutException: Read timed out
E 2009/09/20 10:33:06 YACY yacyClient.queryUrlCount error asking peer 'bulldog2':java.net.SocketTimeoutException: Read timed out
W 2009/09/20 10:33:37 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
W 2009/09/20 10:38:37 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
E 2009/09/20 10:40:01 YACY yacyClient.queryUrlCount error asking peer '192-168-123-1-31dpnw3':java.net.ConnectException: Connection refused
E 2009/09/20 10:40:01 YACY yacyClient.queryUrlCount error asking peer '192-168-123-1-31dpnw3':java.net.ConnectException: Connection refused
E 2009/09/20 10:40:10 YACY yacyClient.queryUrlCount error asking peer 'clondike3':java.net.SocketTimeoutException: Read timed out
W 2009/09/20 10:43:37 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
W 2009/09/20 10:48:38 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
W 2009/09/20 10:53:38 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
E 2009/09/20 10:54:04 YACY yacyClient.queryUrlCount error asking peer '192-168-6-101-328dpnw99':org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 5000 ms

W 2009/09/20 10:58:38 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
E 2009/09/20 11:05:20 YACY yacyClient.queryUrlCount error asking peer 'mortenoesterlundjoergensen':java.net.SocketTimeoutException: Read timed out
W 2009/09/20 11:08:39 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
E 2009/09/20 11:10:58 YACY yacyClient.queryUrlCount error asking peer '192-168-1-3-332dpnw97':org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 5000 ms
W 2009/09/20 11:13:39 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
E 2009/09/20 11:17:31 YACY yacyClient.queryUrlCount error asking peer 'fastbull-yacy':java.net.SocketException: Connection reset
W 2009/09/20 11:18:39 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
W 2009/09/20 11:23:39 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException

W 2009/09/20 11:28:40 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
W 2009/09/20 11:29:44 ReferenceContainerArray timout in index retrieval (2): 3 tables searched. timeout = 1000
E 2009/09/20 11:31:52 YACY yacyClient.queryUrlCount error asking peer 'hkj':java.net.ConnectException: Connection refused
E 2009/09/20 11:31:53 YACY yacyClient.queryUrlCount error asking peer 'KSBA-YaCy':java.net.SocketTimeoutException: Read timed out
W 2009/09/20 11:33:40 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException
E 2009/09/20 11:33:48 YACY yacyClient.queryUrlCount error asking peer '192-168-1-2-250dpnw33':org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 5000 ms
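
To see at a glance which messages dominate an excerpt like the one above, the entries can be tallied by level and component. This is only a minimal sketch: the line layout (`<level> <date> <time> <component> <message>`) is inferred from the excerpts in this thread, and the function name `tally` is made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpts in this thread:
# "<level> <YYYY/MM/DD> <HH:MM:SS> <COMPONENT> <message>"
LOG_LINE = re.compile(
    r"^([A-Z]) (\d{4}/\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) (\S+) (.*)$"
)


def tally(lines):
    """Count log entries per (level, component); non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            level, _date, _time, component, _message = match.groups()
            counts[(level, component)] += 1
    return counts


# Three lines copied from the log excerpt above.
sample = [
    "W 2009/09/20 10:03:35 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException",
    "E 2009/09/20 10:25:52 YACY yacyClient.queryUrlCount error asking peer 'emx-lon-uk-01':java.net.SocketTimeoutException: Read timed out",
    "W 2009/09/20 10:28:37 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException",
]

for (level, component), n in tally(sample).most_common():
    print(level, component, n)
```

Passing an open log file instead of `sample` works the same way, since the function only iterates over lines.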

dulcedo
 
Posts: 1006
Joined: Thu Oct 16, 2008 6:36 pm
Location: Near Karlsruhe

Re: Balancer blocks crawling

Post by Orbiter » Sun Sep 20, 2009 11:40 am

By the way, do you have short memory cycles in PerformanceQueues_p.html?
Orbiter
 
Posts: 5792
Joined: Tue Jun 26, 2007 10:58 pm
Location: Frankfurt am Main

Re: Balancer blocks crawling

Post by dulcedo » Sun Sep 20, 2009 12:01 pm

The only thing I changed is the buffer size, from 100k to 30k; everything else should be at the defaults.
YaCy Version: 0.91/6331
Assigned Memory = 745668608
Used Memory = 692287176
Available Memory = 53381432
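
As a quick sanity check on these three figures, here is a small sketch. It assumes the values are in bytes, which the status output itself does not state.

```python
# Values copied from the status output above (assumed to be bytes).
assigned = 745668608
used = 692287176
available = 53381432

# The reported "available" figure is exactly assigned minus used.
assert assigned - used == available

MIB = 1024 * 1024
print(f"assigned:  {assigned / MIB:.1f} MiB")
print(f"used:      {used / MIB:.1f} MiB")
print(f"available: {available / MIB:.1f} MiB")
print(f"in use:    {used / assigned:.1%}")
```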

1.3 million links / 2.9 million words

The peer is now completely idle here, because there is no DHT activity and the crawl is blocked.
Attachments
yacy_090920a.png (75.82 KiB) Viewed 10029 times
dulcedo
 
Posts: 1006
Joined: Thu Oct 16, 2008 6:36 pm
Location: Near Karlsruhe

Re: Balancer blocks crawling

Post by dulcedo » Sun Sep 20, 2009 1:02 pm

I still suspect that this warning is either the cause or a side effect. I am letting the peer keep running for now, and apart from yacycore it is doing practically nothing.
Code: Select all
E 2009/09/20 13:55:02 YACY yacyClient.queryUrlCount error asking peer 'fduppa':java.net.SocketTimeoutException: Read timed out
E 2009/09/20 13:57:26 YACY yacyClient.queryUrlCount error asking peer '192-168-1-2-78dpnw65':java.net.ConnectException: Connection refused
W 2009/09/20 13:58:08 FILEHANDLER Unexpected error while processing query.
Session: Session_213.168.93.5:37035#0
Query:   /yacy/hello.html
Client:  213.168.93.5
Reason:  java.io.IOException: FileUploadException Stream ended unexpectedly
java.io.IOException: FileUploadException Stream ended unexpectedly
at de.anomic.http.server.HTTPDemon.parseMultipart(HTTPDemon.java:913)
at de.anomic.http.server.HTTPDFileHandler.doResponse(HTTPDFileHandler.java:358)
at de.anomic.http.server.HTTPDFileHandler.doPost(HTTPDFileHandler.java:254)
at de.anomic.http.server.HTTPDemon.POST(HTTPDemon.java:630)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at de.anomic.server.serverCore$Session.listen(serverCore.java:728)
at de.anomic.server.serverCore$Session.run(serverCore.java:619)
W 2009/09/20 13:58:47 HTTPC cleanUp ConnectionInfo interrupted by ConcurrentModificationException

dulcedo
 
Posts: 1006
Joined: Thu Oct 16, 2008 6:36 pm
Location: Near Karlsruhe

Re: Balancer blocks crawling

Post by dulcedo » Mon Sep 21, 2009 8:18 am

Hopefully this helps: the peer is idle again with 500k URLs in the buffer. If I restart it now, it carries on.
I will leave it in this state in case you need more information; it still answers requests.
************* Start Thread Dump Mon Sep 21 09:18:11 CEST 2009 *******************

YaCy Version: 0.91/6331
Assigned Memory = 747307008
Used Memory = 655679496
Available Memory = 91627512


THREADS WITH STATES: BLOCKED


THREADS WITH STATES: RUNNABLE

Code: Select all
yacy@myacy:~/y2/yacy$ kill -3 15461
yacy@myacy:~/y2/yacy$ tail -f yacy.log
"VM Thread" prio=1 tid=0x00002aaaaab5f800 nid=0x3c68 runnable

"GC task thread#0 (ParallelGC)" prio=1 tid=0x0000000040134c00 nid=0x3c66 runnable

"GC task thread#1 (ParallelGC)" prio=1 tid=0x00000000401359f0 nid=0x3c67 runnable

"VM Periodic Task Thread" prio=1 tid=0x000000004012bb60 nid=0x3c70 waiting on condition

D 2009/09/21 09:16:15 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/21 09:16:18 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/21 09:16:21 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/21 09:16:24 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/21 09:16:27 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/21 09:16:30 CRAWLER omitting de-queue/remote: stack is empty
dulcedo
 
Posts: 1006
Joined: Thu Oct 16, 2008 6:36 pm
Location: Near Karlsruhe

Re: Balancer blocks crawling

Post by dulcedo » Tue Sep 22, 2009 6:44 am

This is new in 6335, but the peer has been crawling stably for 2 hours.
Noticeably, it has more free memory while crawling.

Does anyone know how to copy from the console more conveniently in PuTTY while it is scrolling, or how to redirect the output on the client? I am losing a lot of the errors that are displayed.
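
One way to avoid copying from a scrolling console is to filter the log file afterwards. The sketch below keeps only error and warning entries plus the lines that follow them (typically stack traces). It assumes that every regular entry starts with a level letter and a timestamp, as in the excerpts in this thread; the name `keep_problems` is made up for illustration.

```python
import re

# Start of a regular log entry, e.g. "E 2009/09/22 07:26:00 ..."
ENTRY = re.compile(r"^([A-Z]) \d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} ")


def keep_problems(lines, levels=("E", "W")):
    """Yield entries with the given levels plus their continuation lines."""
    keeping = False
    for line in lines:
        match = ENTRY.match(line)
        if match:
            # A new entry decides whether we keep it and what follows it.
            keeping = match.group(1) in levels
        if keeping:
            yield line


sample = [
    "D 2009/09/22 16:30:28 CRAWLER omitting de-queue/remote: stack is empty\n",
    "E 2009/09/22 07:26:00 BLOCKINGTHREAD Runtime Error in serverInstantThread.job\n",
    "java.lang.AssertionError\n",
    "I 2009/09/22 16:30:29 YACY hello: responded remote peer\n",
]

for line in keep_problems(sample):
    print(line, end="")
```

To run it on a real log, pass an open file object instead of `sample` and write the result to a second file.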

************* Start Thread Dump Tue Sep 22 07:43:04 CEST 2009 *******************

YaCy Version: 0.91/6335
Assigned Memory = 783351808
Used Memory = 641696440
Available Memory = 141655368


Code: Select all
W 2009/09/22 07:01:31 IODispatcher emergency merge of files index.20090916043421726.blob, index.20090922045808242.blob to index.20090922050131517.blob
W 2009/09/22 07:08:39 IODispatcher emergency merge of files index.20090910040351701.blob, index.20090922050131517.blob to index.20090922050839020.blob
java.lang.AssertionError
at de.anomic.kelondro.text.ReferenceContainerCache.add(ReferenceContainerCache.java:447)
at de.anomic.kelondro.text.IndexCell.add(IndexCell.java:114)
at de.anomic.kelondro.text.Segment.addPageIndex(Segment.java:184)
at de.anomic.kelondro.text.Segment.storeDocument(Segment.java:290)
at de.anomic.search.Switchboard.storeDocumentIndex(Switchboard.java:1695)
at de.anomic.search.Switchboard.storeDocumentIndex(Switchboard.java:1678)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at de.anomic.server.serverInstantBlockingThread.job(serverInstantBlockingThread.java:87)
at de.anomic.server.serverAbstractBlockingThread.run(serverAbstractBlockingThread.java:64)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
at java.util.concurrent.FutureTask.run(FutureTask.java:123)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at de.anomic.server.serverInstantBlockingThread.job(serverInstantBlockingThread.java:87)
at de.anomic.server.serverAbstractBlockingThread.run(serverAbstractBlockingThread.java:64)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
at java.util.concurrent.FutureTask.run(FutureTask.java:123)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)
Caused by: java.lang.AssertionError
at de.anomic.kelondro.text.ReferenceContainerCache.add(ReferenceContainerCache.java:447)
at de.anomic.kelondro.text.IndexCell.add(IndexCell.java:114)
at de.anomic.kelondro.text.Segment.addPageIndex(Segment.java:184)
at de.anomic.kelondro.text.Segment.storeDocument(Segment.java:290)
at de.anomic.search.Switchboard.storeDocumentIndex(Switchboard.java:1695)
at de.anomic.search.Switchboard.storeDocumentIndex(Switchboard.java:1678)
... 12 more
E 2009/09/22 07:26:00 BLOCKINGTHREAD Runtime Error in serverInstantThread.job, thread 'java.lang.reflect.Method.storeDocumentIndex.17': null; target exception: null
java.lang.AssertionError
at de.anomic.kelondro.text.ReferenceContainerCache.add(ReferenceContainerCache.java:447)
at de.anomic.kelondro.text.IndexCell.add(IndexCell.java:114)
at de.anomic.kelondro.text.Segment.addPageIndex(Segment.java:184)
at de.anomic.kelondro.text.Segment.storeDocument(Segment.java:290)
at de.anomic.search.Switchboard.storeDocumentIndex(Switchboard.java:1695)
at de.anomic.search.Switchboard.storeDocumentIndex(Switchboard.java:1678)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at de.anomic.server.serverInstantBlockingThread.job(serverInstantBlockingThread.java:87)
at de.anomic.server.serverAbstractBlockingThread.run(serverAbstractBlockingThread.java:64)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
at java.util.concurrent.FutureTask.run(FutureTask.java:123)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)


dulcedo
 
Posts: 1006
Joined: Thu Oct 16, 2008 6:36 pm
Location: Near Karlsruhe

Re: Balancer blocks crawling

Post by dulcedo » Tue Sep 22, 2009 9:24 am

I only ever manage to catch individual ones; this one shows up every few seconds.
Code: Select all
W 2009/09/22 10:21:59 PLASMA Unable to parse the resource 'http://www.wikio.co.uk/forward?from=sharethis&go=ask&url=http%3A%2F%2Fwww.wikio.co.uk%2Finfo%3Fid%3D127074208&title=Channel+4+boss+Duncan+to+resign'. No resource content available (1) source == null, url = http://www.wikio.co.uk/forward?from=sharethis&go=ask&url=http%3A%2F%2Fwww.wikio.co.uk%2Finfo%3Fid%3D127074208&title=Channel+4+boss+Duncan+to+resign; url = http://www.wikio.co.uk/forward?from=sharethis&go=ask&url=http%3A%2F%2Fwww.wikio.co.uk%2Finfo%3Fid%3D127074208&title=Channel+4+boss+Duncan+to+resign
W 2009/09/22 10:22:02 PLASMA Unable to parse the resource 'http://twitter.com/mammarazzi1'. No resource content available (1) source == null, url = http://twitter.com/mammarazzi1; url = http://twitter.com/mammarazzi1
java.lang.IllegalArgumentException: Host name may not be null
at org.apache.commons.httpclient.HttpHost.<init>(HttpHost.java:68)
at org.apache.commons.httpclient.HttpHost.<init>(HttpHost.java:107)
at org.apache.commons.httpclient.HttpMethodBase.setURI(HttpMethodBase.java:280)
at org.apache.commons.httpclient.HttpMethodDirector.processRedirectResponse(HttpMethodDirector.java:616)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:179)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at de.anomic.http.client.Client.execute(Client.java:457)
at de.anomic.http.client.Client.GET(Client.java:276)
at de.anomic.crawler.retrieval.HTTPLoader.load(HTTPLoader.java:132)
at de.anomic.crawler.retrieval.HTTPLoader.load(HTTPLoader.java:78)
at de.anomic.crawler.retrieval.LoaderDispatcher.load(LoaderDispatcher.java:213)
at de.anomic.crawler.retrieval.LoaderDispatcher.load(LoaderDispatcher.java:128)
at de.anomic.crawler.CrawlQueues$crawlWorker.run(CrawlQueues.java:565)
W
dulcedo
 
Posts: 1006
Joined: Thu Oct 16, 2008 6:36 pm
Location: Near Karlsruhe

Re: Balancer blocks crawling

Post by dulcedo » Tue Sep 22, 2009 3:33 pm

Same behaviour after half a day of crawling at a very steady 1000 PPM (pages per minute).
YaCy Version: 0.91/6335
Assigned Memory = 803405824
Used Memory = 628932168
Available Memory = 174473656

Code: Select all
yacy@myacy:~/y2/yacy$ kill -3 9616
yacy@myacy:~/y2/yacy$ tail -f yacy.log

"GC task thread#0 (ParallelGC)" prio=1 tid=0x0000000040134c00 nid=0x2591 runnable

"GC task thread#1 (ParallelGC)" prio=1 tid=0x00000000401359f0 nid=0x2592 runnable

"VM Periodic Task Thread" prio=1 tid=0x000000004012bb60 nid=0x259b waiting on condition

D 2009/09/22 16:30:28 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/22 16:30:29 YACY connect: updated KNOWN direct senior peer 'yacy2-ubfuberlin' from 160.45.152.2:8081
I 2009/09/22 16:30:29 YACY hello: responded remote peer 'yacy2-ubfuberlin' [160.45.152.2] in 82 milliseconds
D 2009/09/22 16:30:31 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/22 16:30:31 YACY connect: updated KNOWN direct senior peer 'locutus' from 92.227.74.95:8080
I 2009/09/22 16:30:31 YACY hello: responded remote peer 'locutus' [92.227.74.95] in 101 milliseconds
D 2009/09/22 16:30:34 CRAWLER omitting de-queue/remote: stack is empty
dulcedo
 
Posts: 1006
Joined: Thu Oct 16, 2008 6:36 pm
Location: Near Karlsruhe

Re: Balancer blocks crawling

Post by Orbiter » Tue Sep 22, 2009 3:39 pm

Ah, I do need a bit more of the dump than that! It starts much further up.

For the two dumps before this one I made fixes in 6336 and 6337.
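
Once the whole `kill -3` output has been captured, a rough overview of what the threads are doing can be pulled from the per-thread header lines. This is only a sketch: the header layout is taken from the few lines quoted in this thread (`"name" prio=... tid=... nid=... state`), and `thread_states` is a made-up helper name.

```python
import re
from collections import Counter

# Header line of one thread in the dump, as quoted in this thread:
# "<name>" prio=N tid=0x... nid=0x... <state text> [optional address]
THREAD = re.compile(
    r'^"(?P<name>[^"]+)".*?\bnid=0x[0-9a-f]+ (?P<state>[^\[]+)'
)


def thread_states(lines):
    """Count threads per state text found in thread header lines."""
    counts = Counter()
    for line in lines:
        match = THREAD.match(line)
        if match:
            counts[match.group("state").strip()] += 1
    return counts


# Header lines copied from the dump fragments posted above.
sample = [
    '"VM Thread" prio=1 tid=0x00002aaaaab5f800 nid=0x3c68 runnable',
    '"GC task thread#0 (ParallelGC)" prio=1 tid=0x0000000040134c00 nid=0x3c66 runnable',
    '"GC task thread#1 (ParallelGC)" prio=1 tid=0x00000000401359f0 nid=0x3c67 runnable',
    '"VM Periodic Task Thread" prio=1 tid=0x000000004012bb60 nid=0x3c70 waiting on condition',
]

for state, n in thread_states(sample).most_common():
    print(f"{n:3d}  {state}")
```

The fragments posted so far only contain the JVM-internal threads at the very end of the dump; the application threads and their stack traces come before them.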
Orbiter
 
Posts: 5792
Joined: Tue Jun 26, 2007 10:58 pm
Location: Frankfurt am Main

Re: Balancer blocks crawling

Post by dulcedo » Tue Sep 22, 2009 5:54 pm

This is the regular log; if you need excerpts from the new one, tell me the approximate time.
Attachments
090922.zip
(416.25 KiB) Downloaded 55 times
dulcedo
 
Posts: 1006
Joined: Thu Oct 16, 2008 6:36 pm
Location: Near Karlsruhe

Re: Balancer blocks crawling

Post by anubis » Wed Sep 23, 2009 11:54 am

Yes, I have the same problem: version 0.91/6339
Linux (Ubuntu 9.04)

The 0.9 release that can be downloaded from the website still worked.
anubis
 

Re: Balancer blocks crawling

Post by dulcedo » Fri Sep 25, 2009 10:33 am

Reachable, but not crawling.
yacy01.log contains the point where it stops; .pt11 contains the complete thread dump taken with kill -3.

************* Start Thread Dump Fri Sep 25 11:28:12 CEST 2009 *******************

YaCy Version: 0.91/6343
Assigned Memory = 932118528
Used Memory = 734310848
Available Memory = 197807680


THREADS WITH STATES: BLOCKED


THREADS WITH STATES: RUNNABLE
Attachments
090925.zip
(389.59 KiB) Downloaded 63 times
dulcedo
 
Posts: 1006
Joined: Thu Oct 16, 2008 6:36 pm
Location: Near Karlsruhe

Re: Balancer blocks crawling

Post by dulcedo » Tue Sep 29, 2009 4:56 am

Same peer, SVN 6351.
Attachments
yacy.log.pt12.zip
(24.81 KiB) Downloaded 62 times
dulcedo
 
Posts: 1006
Joined: Thu Oct 16, 2008 6:36 pm
Location: Near Karlsruhe

Re: Balancer blocks crawling

Post by Orbiter » Tue Sep 29, 2009 9:49 am

I could not find anything suspicious in the last two postings here. What happens if you restart the peer while it is in that state: does it then continue crawling?
Orbiter
 
Posts: 5792
Joined: Tue Jun 26, 2007 10:58 pm
Location: Frankfurt am Main

Re: Balancer blocks crawling

Post by dulcedo » Tue Sep 29, 2009 10:28 am

Yes, after a restart (a kill works too) it carries on. I see the same behaviour on Windows, but there I cannot get a thread dump once the peer is no longer responsive.
dulcedo
 
Posts: 1006
Joined: Thu Oct 16, 2008 6:36 pm
Location: Near Karlsruhe

Re: Balancer blocks crawling

Post by Orbiter » Tue Sep 29, 2009 11:15 pm

What exactly does 'not responsive' mean: is the HTTP server gone completely? Does a 'telnet localhost 8080' still work in any way?

Update: my own peer is now doing this too; after the telnet, yacy00.log showed:
Code: Select all
W 2009/09/30 00:16:28 SERVER * connections (200) exceeding limit (200), closing new incoming connection from /yy.60.4.xx:27336
W 2009/09/30 00:16:28 SERVER * connections (200) exceeding limit (200), closing new incoming connection from /yy.52.175.xx:40904
W 2009/09/30 00:16:28 SERVER * connections (200) exceeding limit (200), closing new incoming connection from /yy.181.214.xx:49911
W 2009/09/30 00:16:28 SERVER * connections (200) exceeding limit (200), closing new incoming connection from /yy.91.22.xx:61213
W 2009/09/30 00:16:28 SERVER * connections (200) exceeding limit (200), closing new incoming connection from /yy.181.214.xx:49912
W 2009/09/30 00:16:28 SERVER * connections (200) exceeding limit (200), closing new incoming connection from /yy.181.214.xx:49913


Do you see that as well? If so, I will look into what this is tomorrow.
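
To check another peer's log for the same symptom, the connection-limit warnings can be counted per minute. This is a minimal sketch: the pattern is derived only from the warning lines quoted above, and `rejected_per_minute` is a made-up helper name.

```python
import re
from collections import Counter

# Matches warnings such as:
# "W 2009/09/30 00:16:28 SERVER * connections (200) exceeding limit (200), ..."
LIMIT = re.compile(
    r"^W (?P<minute>\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2} SERVER \* "
    r"connections \((?P<open>\d+)\) exceeding limit \((?P<limit>\d+)\)"
)


def rejected_per_minute(lines):
    """Count 'exceeding limit' warnings, grouped by date and minute."""
    per_minute = Counter()
    for line in lines:
        match = LIMIT.match(line.strip())
        if match:
            per_minute[match.group("minute")] += 1
    return per_minute


# Two warnings copied from the excerpt above, plus one unrelated line.
sample = [
    "W 2009/09/30 00:16:28 SERVER * connections (200) exceeding limit (200), closing new incoming connection from /yy.60.4.xx:27336",
    "W 2009/09/30 00:16:28 SERVER * connections (200) exceeding limit (200), closing new incoming connection from /yy.52.175.xx:40904",
    "D 2009/09/30 07:25:56 CRAWLER omitting de-queue/remote: stack is empty",
]

for minute, n in sorted(rejected_per_minute(sample).items()):
    print(minute, n)
```

A long run of minutes with many rejections would suggest that the server stays pinned at its connection limit rather than hitting it only briefly.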
Orbiter
 
Posts: 5792
Joined: Tue Jun 26, 2007 10:58 pm
Location: Frankfurt am Main

Re: Balancer blocks crawling

Post by dulcedo » Wed Sep 30, 2009 5:32 am

Not that one specifically, but I have noticed a connection with accesses to the peer several times. I have also seen messages saying that connections could not be opened, just not this particular one.

Telnet from another machine fails; the second attempt no longer times out. This is what it looks like in the log for the attempt with the timeout, although the peer is reachable through the browser.
What I mean by 'not reachable' is that the peer keeps running but does not respond on its port. That is not the case here, but I have paused the crawls.
Presumably it will hang on telnet attempts while crawling; I am testing that now with 6361.

A shot in the dark: the peer is running at the limit with its outgoing connections, and when one additional incoming connection arrives, things break.

Code: Select all
E 2009/09/30 07:25:01 SERVER receive interrupted - exception 2 = Connection reset
E 2009/09/30 07:25:01 SERVER receive interrupted - exception 2 = Connection reset
E 2009/09/30 07:25:01 SERVER receive interrupted - exception 2 = Connection reset
E 2009/09/30 07:25:01 SERVER receive interrupted - exception 2 = Connection reset
E 2009/09/30 07:25:01 SERVER receive interrupted - exception 2 = Connection reset
E 2009/09/30 07:25:01 SERVER receive interrupted - exception 2 = Connection reset

Second attempt, which does not produce a timeout:
Code: Select all
E 2009/09/30 07:25:55 SERVER receive interrupted - exception 2 = Read timed out
D 2009/09/30 07:25:56 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/30 07:25:59 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/30 07:26:02 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/30 07:26:05 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/30 07:26:08 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/30 07:26:11 CRAWLER omitting de-queue/remote: stack is empty

E 2009/09/30 07:26:27 SERVER receive interrupted - exception 2 = Read timed out


Telnet test against another peer that is not reachable on its port:
Code: Select all
W 2009/09/30 08:19:00 SERVER * connections (400) exceeding limit (400), closing new incoming connection from /85.216.61.209:61565
D 2009/09/30 08:19:01 CRAWLER omitting de-queue/local: stack is empty


Second attempt
Code: Select all
W 2009/09/30 08:21:51 SERVER * connections (400) exceeding limit (400), closing new incoming connection from /85.216.61.209:61615
D 2009/09/30 08:21:51 CRAWLER omitting de-queue/local: stack is empty
D 2009/09/30 08:21:52 CRAWLER omitting de-queue/local: stack is empty
D 2009/09/30 08:21:52 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/30 08:21:53 CRAWLER omitting de-queue/local: stack is empty
D 2009/09/30 08:21:54 CRAWLER omitting de-queue/local: stack is empty
D 2009/09/30 08:21:55 CRAWLER omitting de-queue/local: stack is empty
D 2009/09/30 08:21:55 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/30 08:21:56 CRAWLER omitting de-queue/local: stack is empty
D 2009/09/30 08:21:57 CRAWLER omitting de-queue/local: stack is empty
D 2009/09/30 08:21:58 CRAWLER omitting de-queue/local: stack is empty
D 2009/09/30 08:21:58 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/30 08:21:59 CRAWLER omitting de-queue/local: stack is empty
D 2009/09/30 08:22:00 CRAWLER omitting de-queue/local: stack is empty
D 2009/09/30 08:22:01 CRAWLER omitting de-queue/local: stack is empty
D 2009/09/30 08:22:01 CRAWLER omitting de-queue/remote: stack is empty
D 2009/09/30 08:22:02 CRAWLER omitting de-queue/local: stack is empty
D 2009/09/30 08:22:02 YACY HELLO #1 to peer 'dulcedo' at 85.216.61.209:8080
W 2009/09/30 08:22:02 SERVER * connections (400) exceeding limit (400), closing new incoming connection from /85.216.61.209:61617
W 2009/09/30 08:22:02 SERVER * connections (400) exceeding limit (400), closing new incoming connection from /85.216.61.209:61618
I 2009/09/30 08:22:03 YACY yacyClient.publishMySeed thread 'PublishSeed_dulcedo' contacted peer at 85.216.61.209:8080, received 9916 bytes, time = 612 milliseconds
I 2009/09/30 08:22:03 YACY yacyClient.publishMySeed: Peer 'dulcedo' reported us as junior.
D 2009/09/30 08:22:03 YACY connect: updated KNOWN direct senior peer 'dulcedo' from 85.216.61.209:8080
D 2009/09/30 08:22:03 YACY connect: rejecting old info about peer 'geo-snap'
D 2009/09/30 08:22:03 YACY connect: updated KNOWN principal peer 'sixcooler' from 85.178.80.95:8080
D 2009/09/30 08:22:03 YACY connect: updated KNOWN principal peer '4o4' from 188.40.74.66:8080
D 2009/09/30 08:22:03 YACY connect: rejecting old info about peer 'ImageProcWurst'
D 2009/09/30 08:22:03 YACY connect: updated KNOWN senior peer 'vega-1' from tokeek.homedns.org:8080
D 2009/09/30 08:22:03 YACY connect: updated KNOWN senior peer 'Jeanne-de-Belleville' from 87.172.175.54:8110
D 2009/09/30 08:22:03 YACY connect: updated KNOWN senior peer 'apfelmaennchen' from yacy.kicks-ass.net:8080
D 2009/09/30 08:22:03 YACY connect: rejecting old info about peer 'tichys_yacy'
D 2009/09/30 08:22:03 YACY connect: updated KNOWN senior peer 'KIT01-02-20090918' from 141.52.175.12:8080
D 2009/09/30 08:22:03 YACY connect: rejecting old info about peer 'yacystats-de-02'
D 2009/09/30 08:22:03 YACY connect: rejecting old info about peer 'yacystats-de-01'
D 2009/09/30 08:22:03 YACY connect: updated KNOWN senior peer 'KIT01-09-20090917' from 141.52.175.26:8080
D 2009/09/30 08:22:03 YACY connect: updated KNOWN senior peer 'ZZZ' from 194.204.0.26:8000
D 2009/09/30 08:22:03 YACY connect: rejecting old info about peer 'yacy98-telemedia'
D 2009/09/30 08:22:03 YACY connect: rejecting old info about peer 'Spezies-8472'
D 2009/09/30 08:22:03 YACY connect: rejecting old info about peer 'TortugaOnline'
D 2009/09/30 08:22:03 YACY connect: rejecting old info about peer 'Hasenjagd'
D 2009/09/30 08:22:03 YACY connect: updated KNOWN senior peer 'KIT-n023' from 141.52.175.16:8080
D 2009/09/30 08:22:03 YACY connect: rejecting old info about peer 'debian-suche'
D 2009/09/30 08:22:03 YACY connect: rejecting old info about peer 'KIT01-01F-GEOCACHING'
I 2009/09/30 08:22:03 YACY publish: handshaked senior peer 'dulcedo' at 85.216.61.209:8080
D 2009/09/30 08:22:03 YACY DBSize before -> after Cleanup: 9 -> 8
I 2009/09/30 08:22:03 YACY PeerPing: I am accessible for 0 peer(s), not accessible for 8 peer(s).
I 2009/09/30 08:22:03 YACY PeerPing: myType is junior


It considers itself a junior peer, which fits the picture.
dulcedo
 
Posts: 1006
Joined: Thu Oct 16, 2008 6:36 pm
Location: Near Karlsruhe
