Donation for a bug fix.

Posted by ds810 » Tue Oct 11, 2016 4:29 pm

Hello everyone,

I would like to donate to the project, or to one of the developers, in return for fixing a bug.

The bug is the 100% CPU load during the blacklist check. Once Occurrences reaches a value of 100, the check seems to stay in an endless loop, at least that is my impression. Even when YaCy has nothing left to do, the java process keeps running at 100% CPU load until YaCy is restarted.

Code:
Occurrences: 100
at java.util.regex.Matcher.matches(Matcher.java:604)
at net.yacy.repository.Blacklist.isListed(Blacklist.java:577)
at net.yacy.repository.Blacklist.isListed(Blacklist.java:480)
at net.yacy.peers.Protocol.remoteSearchProcess(Protocol.java:677)
at net.yacy.peers.Protocol.primarySearch(Protocol.java:545)
at net.yacy.peers.RemoteSearch.run(RemoteSearch.java:104)

or
Code:
Occurrences: 100
at java.util.regex.Matcher.matches(Matcher.java:604)
at net.yacy.repository.Blacklist.isListed(Blacklist.java:526)
at net.yacy.repository.Blacklist.isListed(Blacklist.java:480)
at net.yacy.crawler.CrawlStacker.checkAcceptanceChangeable(CrawlStacker.java:451)
at net.yacy.crawler.CrawlStacker.stackCrawl(CrawlStacker.java:314)
at net.yacy.crawler.CrawlStacker.job(CrawlStacker.java:134)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:101)
at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:82)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


The bug is very easy to reproduce.
-> Import the blacklists from my peer (UbuntuServer).
-> If you have enabled the option "verwerfe übertragene URLs, die zu Ihrer Blacklist passen." (reject transmitted URLs that match your blacklist) on ConfigNetwork_p.html, just wait a few minutes.
-> To speed things up, fire off a few search queries as well.

Negative side effects:
* CPU temperature reaches 72-75 °C
* Power consumption rises by about 10 watts.
Mind you, all this while YaCy is doing nothing at all!


I would appreciate a reply.

Regards
dS810
Last edited by ds810 on Wed Oct 12, 2016 11:17 pm; edited 1 time in total.

Re: Donation for a bug fix.

Posted by luc » Wed Oct 12, 2016 11:09 am

Hi ds810,
if you do not mind continuing this discussion in English (I am sorry that my German is still so poor), I am happy to check what can be done.

I have already imported your blacklist (UbuntuServer) on a YaCy peer running in peer-to-peer mode, but I could not reproduce the 100% CPU burn you report... Can you give some examples of search terms that trigger this behavior? Do you also have this problem when using your blacklist on a freshly installed YaCy (with an empty local index)?

Best regards
luc
 

Re: Donation for a bug fix.

Posted by ds810 » Wed Oct 12, 2016 12:13 pm

Have you activated the imported blacklists? They are disabled after import.

I tried it on the following systems:
* HP N54L (8GB (4GB for yacy)/SSD) - my current system
* DELL Latitude (i5/8GB (4GB for yacy)/SSD)
* Tuxedo (i5/16GB (8GB for yacy)/SSD)
* Raspberry PI 1-3
* vServer - Provided by euserv.de

All systems run Debian or Ubuntu Server with the default Java from the repository.

Re: Donation for a bug fix.

Posted by luc » Wed Oct 12, 2016 12:50 pm

Yes, it is activated and it is effectively filtering (I even checked in debug that Blacklist.isListed() sometimes returns true).

I can try to review the code and run some performance measurements, but it would be helpful to know what kind of search makes the CPU burn on your peer.

Re: Donation for a bug fix.

Posted by ds810 » Wed Oct 12, 2016 8:27 pm

I can create an account for you on my YaCy instance... You can test it online...

Re: Donation for a bug fix.

Posted by luc » Wed Oct 12, 2016 8:34 pm

Ok ds810, if you wish you can send me the login details by private message and I will give it a try... But in my opinion we will only really be able to improve something if we have a reproducible scenario on a development environment.

Re: Donation for a bug fix.

Posted by ds810 » Wed Oct 12, 2016 8:36 pm

login: see pm

If you want, I can show it via TeamViewer (or whatever).

Re: Donation for a bug fix.

Posted by ds810 » Wed Oct 12, 2016 8:39 pm

To reproduce: activate "Remote Index" on /ConfigNetwork_p.html

Re: Donation for a bug fix.

Posted by luc » Thu Oct 13, 2016 7:13 am

I had indeed already enabled "Accept remote Index Transmissions" in the Network config.

By the way, I could finally reproduce a similar issue on my development peer: it hung at more than 100% CPU after running many random search queries, becoming totally unresponsive.
In my case I found the following errors in the logs: "java.lang.OutOfMemoryError: GC overhead limit exceeded" and then "java.lang.OutOfMemoryError: Java heap space".

Can you check your log and see if you also have this kind of error?

Edit: After restarting my peer and waiting some time, I finally also get continuous CPU usage over 100% without searching for anything. I guess my initial dev index data was not large enough for the problem to occur...
Some profiling indeed reveals the hot spot, as expected: the transferURL servlet is spending all its time in calls to net.yacy.repository.Blacklist.isListed(). I will report here as soon as I know a little more about what to optimize.

Re: Donation for a bug fix.

Posted by luc » Thu Oct 13, 2016 10:00 am

As I suspected, some specific URLs take a really long time to process against YaCy blacklist patterns. After modifying my peer to trace long processing times in Blacklist.isListed(), I found some examples:
- http://molodezhnaja.ch/../../../../../. ... ndyman.htm is processed in about 15 seconds
- http://molodezhnaja.ch/../../../../../. ... bambi2.jpg is first processed in about 60 seconds, and this time then increases to several minutes the next times it is encountered.

So I think there is definitely something that can be done, maybe fixing the URL normalizing method for this kind of path. I will try to see what is most appropriate.
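
To illustrate the normalization idea, here is a minimal Java sketch. It is not the code of any actual YaCy fix: the class DotSegmentSketch and the method removeDotSegments are hypothetical names, the sample path is made up, and the logic follows the "remove dot segments" rule of RFC 3986 (section 5.2.4), which is only an assumption about what a suitable normalization could look like.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class DotSegmentSketch {

    /** Removes "." and ".." segments from an absolute URL path (hypothetical helper). */
    static String removeDotSegments(String path) {
        Deque<String> kept = new ArrayDeque<>();
        String[] segments = path.split("/", -1);
        // segments[0] is the empty string before the leading '/', so start at 1
        for (int i = 1; i < segments.length; i++) {
            String s = segments[i];
            boolean last = (i == segments.length - 1);
            if (s.equals(".")) {
                // "." refers to the current directory: drop it
                if (last) kept.addLast("");
            } else if (s.equals("..")) {
                // ".." removes the previous segment; above the root it is simply dropped
                kept.pollLast();
                if (last) kept.addLast("");
            } else {
                kept.addLast(s);
            }
        }
        return "/" + String.join("/", kept);
    }

    public static void main(String[] args) {
        // Made-up path with a long run of "../" at the start
        System.out.println(removeDotSegments("/../../../../../a/b/handyman.htm"));
    }
}
```

With a step like this applied before the blacklist check, a path that starts with a long run of '../' would collapse to a much shorter one, leaving the patterns far fewer '/' characters to backtrack over.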

Re: Donation for a bug fix.

Posted by luc » Thu Oct 13, 2016 3:25 pm

I committed a fix to properly handle these long '../' URLs (https://github.com/yacy/yacy_search_ser ... 3c4d2bc9d6). On my peer they now pass the Blacklist check in a reasonable amount of time.

ds810, can you test from the latest GitHub sources and check whether this fix alone solves your problem?

Re: Donation for a bug fix.

Posted by ds810 » Thu Oct 13, 2016 6:09 pm

Sure, I will check it when I'm back at home.

Re: Donation for a bug fix.

Posted by ds810 » Fri Oct 14, 2016 9:47 pm

Hey luc,

it looks very good. I will test it overnight. For the past hour the CPU has not hit 100%; CPU temperature ~45° 8-)

Re: Donation for a bug fix.

Posted by ds810 » Fri Oct 14, 2016 9:49 pm

:cry:
just now

CPU: 100%

Code:
Occurrences: 100
at java.util.regex.Matcher.matches(Matcher.java:604)
at net.yacy.repository.Blacklist.isListed(Blacklist.java:577)
at net.yacy.repository.Blacklist.isListed(Blacklist.java:480)
at net.yacy.crawler.CrawlStacker.checkAcceptanceChangeable(CrawlStacker.java:451)
at net.yacy.crawler.CrawlStacker.stackCrawl(CrawlStacker.java:314)
at net.yacy.crawler.CrawlStacker.job(CrawlStacker.java:134)
at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:101)
at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:82)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Re: Donation for a bug fix.

Posted by ds810 » Sat Oct 15, 2016 11:23 am

If I disable the proxy (in my case with Prefetch Depth: 2), it works fine.

Re: Donation for a bug fix.

Posted by luc » Sat Oct 15, 2016 11:40 am

Ok, at least there is some improvement... As I did on my dev peer, I propose adding some log output that would be activated only with log level = FINE on Blacklist, and would thus make it possible to detect which remaining kinds of URL use too much CPU power.
I will have some time to do it in the coming days.

See you later
luc
 

Re: Donation for a bug fix.

Posted by luc » Mon Oct 17, 2016 7:19 pm

@ds810, are you sure you are running your peer from the latest compiled sources? The last trace you reported matches the 1.90 sources: see Blacklist.java line 577 in the 1.90 release, vs. line 577 in the latest sources, which is a comment...

Re: Donation for a bug fix.

Posted by ds810 » Mon Oct 17, 2016 8:16 pm

I think so.

I replaced the modified files. After that I ran "ant clean all".

Additionally:
On my notebook I have cloned the current git repository. Version 1.92

Re: Donation for a bug fix.

Posted by luc » Mon Oct 17, 2016 9:55 pm

Ok. Indeed, the right way to test from the latest sources is to clone or pull the git repository, or to get the latest sources as a zip. Replacing only two source files is likely to produce unpredictable behavior, as they are taken out of the context in which they were coded and tested.

By the way, I committed a modification that makes it possible to trace long Blacklist processing (over 10 seconds): you just have to add a line with "Blacklist.level = FINE" to your yacy.logging conf file (and recompile the latest sources, of course), and long-running Blacklist.isListed() calls will be traced in the log, with lines such as "D 2016/10/17 22:28:15 Long processing : nn seconds. URL : http://host/path"
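
This kind of tracing could look roughly like the following sketch. It is an illustration only, not the committed YaCy code: the class SlowMatchTraceSketch, the method tracedMatch and the sample pattern are hypothetical, and the logger name "Blacklist" is merely assumed from the "Blacklist.level = FINE" setting. The sketch uses plain java.util.logging.

```java
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.regex.Pattern;

class SlowMatchTraceSketch {

    // Logger name assumed to correspond to the "Blacklist.level" config key
    private static final Logger LOG = Logger.getLogger("Blacklist");

    /** Matches url against pattern and logs at FINE when the match took longer than thresholdMillis. */
    static boolean tracedMatch(Pattern pattern, String url, long thresholdMillis) {
        long start = System.nanoTime();
        boolean listed = pattern.matcher(url).matches();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        if (elapsedMillis > thresholdMillis && LOG.isLoggable(Level.FINE)) {
            LOG.fine("Long processing : " + (elapsedMillis / 1000) + " seconds. URL : " + url);
        }
        return listed;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(".*/ads/.*"); // made-up sample pattern
        System.out.println(tracedMatch(p, "http://host/ads/banner.gif", 10_000L));
        System.out.println(tracedMatch(p, "http://host/news/index.html", 10_000L));
    }
}
```

Measuring around the single matches() call keeps the overhead negligible for normal URLs, and the isLoggable check means the message string is only built when FINE logging is switched on.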

See you later
luc
 

Re: Donation for a bug fix.

Posted by ds810 » Tue Oct 18, 2016 4:27 pm

Two processes at 100% at the same time :(
Code:
Occurrences: 100
at java.util.regex.Matcher.matches(Matcher.java:604)
at net.yacy.repository.Blacklist.isListed(Blacklist.java:577)
at net.yacy.repository.Blacklist.isListed(Blacklist.java:480)
at net.yacy.peers.Protocol.remoteSearchProcess(Protocol.java:677)
at net.yacy.peers.Protocol.primarySearch(Protocol.java:545)
at net.yacy.peers.RemoteSearch.run(RemoteSearch.java:104)


Occurrences: 100
at java.util.regex.Matcher.matches(Matcher.java:604)
at net.yacy.repository.Blacklist.isListed(Blacklist.java:577)
at net.yacy.repository.Blacklist.isListed(Blacklist.java:480)
at net.yacy.peers.Protocol.solrQuery(Protocol.java:1173)
at net.yacy.peers.RemoteSearch$2.run(RemoteSearch.java:349)

Re: Donation for a bug fix.

Posted by luc » Wed Oct 19, 2016 10:40 am

Ok ds810, I had some time to let a peer run with your blacklist, and it also hung on another URL causing a 100% CPU burn: http://www.chemgapedia.de/vsengine/tra/ ... vscml.html

I will try to find the reason, and this time I will let my peer run longer after a fix to check that everything really works fine.

See you later
luc
 

Re: Donation for a bug fix.

Posted by luc » Wed Oct 19, 2016 1:16 pm

Hey ds810, it is this kind of pattern, ".*.*/(.*/)*abmw?\.asp.*" (with a "(.*/)*" capturing group), which is highly CPU consuming against URLs having many segments in their path (see the previously mentioned sample URL).

I propose replacing all occurrences of "(.*/)*" in your blacklist file with "(.*/|)", which does the same job but appears to be much more efficient (at least with JDK 7). I updated the Blacklist unit test with some more examples to confirm this pattern works as expected.

After modifying your blacklist file, you can also immediately check performance by using the /BlacklistTest_p.html page.

Examples of affected patterns:
- your current version:
Code:
.*.*/(.*/)*abmw?\.asp.*
.*.*/(.*/)*ads/.*
.*.*/(.*/)*adv/.*

- modified version:
Code:
.*.*/(.*/|)abmw?\.asp.*
.*.*/(.*/|)ads/.*
.*.*/(.*/|)adv/.*


I have been running my peer for two hours with the modified version of your blacklist, and so far the issue has not occurred again.
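
For anyone who wants to measure the difference outside of YaCy, here is a small, hedged Java sketch. The class BlacklistPatternTiming, its methods and the example.org URLs are made up for illustration, and the patterns are applied as plain Java regular expressions to the whole URL, which may differ from how YaCy evaluates blacklist entries internally. The segment counts are kept small on purpose so the slow variant still finishes quickly; actual timings depend heavily on the JDK version, and newer JDKs may show a much smaller gap.

```java
import java.util.regex.Pattern;

class BlacklistPatternTiming {

    /** Returns the time in nanoseconds needed to match url fully against regex. */
    static long timeMatch(String regex, String url) {
        Pattern p = Pattern.compile(regex);
        long start = System.nanoTime();
        p.matcher(url).matches();
        return System.nanoTime() - start;
    }

    /** Builds a made-up URL with the given number of path segments and no "ads/" part. */
    static String urlWithSegments(int segments) {
        StringBuilder sb = new StringBuilder("http://example.org");
        for (int i = 0; i < segments; i++) {
            sb.append("/seg").append(i);
        }
        return sb.append("/page.html").toString();
    }

    public static void main(String[] args) {
        String slow = ".*.*/(.*/)*ads/.*";  // original form with a repeated group
        String fast = ".*.*/(.*/|)ads/.*";  // proposed form with an optional group
        for (int n = 4; n <= 12; n += 4) {
            String url = urlWithSegments(n);
            System.out.printf("%2d segments: (.*/)* %d us, (.*/|) %d us%n",
                    n, timeMatch(slow, url) / 1000, timeMatch(fast, url) / 1000);
        }
    }
}
```

The URLs deliberately do not match, because a failed match forces the regex engine to try every way of splitting the path among repetitions of the group before giving up.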

Re: Donation for a bug fix.

Posted by ds810 » Wed Oct 19, 2016 7:44 pm

I will replace all patterns in my blacklists and report back on the result.

Re: Donation for a bug fix.

Posted by ds810 » Wed Oct 19, 2016 9:09 pm

Code:
.*/.*\.exe
doesn't work.
Code:
.*.*/.*\.exe
works as expected.

Test link:
https://github.com/getgauge/gauge/relea ... x86_64.exe

Re: Donation for a bug fix.

Posted by luc » Thu Oct 20, 2016 7:32 am

Ok, but this kind of pattern
Code:
.*.*/.*\.exe
was not a problem. The performance problem is really with patterns containing
Code:
(.*/)*


Does your peer behave better now?

Re: Donation for a bug fix.

Posted by ds810 » Thu Oct 20, 2016 7:01 pm

Not really. In the morning the CPU was at 100% and 78°.

I have replaced all (.*/)* with (.*/|)*

Re: Donation for a bug fix.

Posted by luc » Fri Oct 21, 2016 7:15 am

Hello ds810, apparently you didn't apply the fix correctly: you must use (.*/|) and not (.*/|)* . It is important to remove the * after the capturing group, because this is what causes the performance issue with URLs having many path segments (by segment I mean each /nnn/ part).
You can check for yourself that (.*/|) is sufficient. You do not need to add the * after the parenthesis. Detailed explanation of the new capturing group (.*/|):
- .* : captures any characters, including /
- / : ensures the capturing group effectively ends with a / character
- | : ensures this capturing group can be empty: we either have .*/ or nothing in the group

So this new capturing group effectively captures the same things as (.*/)* but in a more efficient manner with current JDK Pattern implementations.
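
A quick way to convince yourself of the equivalence is a small check like the one below. This is only a sketch: the class GroupEquivalenceCheck and the sample URLs are made up, and the patterns are applied as plain Java regular expressions to the full URL rather than through YaCy's Blacklist class.

```java
import java.util.regex.Pattern;

class GroupEquivalenceCheck {

    static final Pattern OLD = Pattern.compile(".*.*/(.*/)*ads/.*");
    static final Pattern NEW = Pattern.compile(".*.*/(.*/|)ads/.*");

    /** True when both variants agree on whether the URL is listed. */
    static boolean agree(String url) {
        return OLD.matcher(url).matches() == NEW.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://host/ads/banner.gif",       // group is empty
            "http://host/a/ads/banner.gif",     // group holds one segment
            "http://host/a/b/c/ads/banner.gif", // group holds several segments
            "http://host/a/b/c/index.html",     // no "ads/" at all
            "http://host/uploads/x.png"         // "ads/" is only the tail of a longer segment
        };
        for (String url : samples) {
            System.out.println(NEW.matcher(url).matches() + " " + agree(url) + " " + url);
        }
    }
}
```

The reason both forms accept the same strings is that .* already spans any number of '/' characters, so a single optional ".*/" covers everything that repeating the group could cover.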

Best regards
luc
 

Re: Donation for a bug fix.

Posted by ds810 » Sat Oct 22, 2016 10:32 am

Ah... thanks.

I have replaced it in all regex items.

I will test it now.

Re: Donation for a bug fix.

Posted by ds810 » Mon Oct 24, 2016 9:38 pm

For more than 24 hours now:

CPU: ~1-10 %
Temp: ~38-42°C

:D

Re: Donation for a bug fix.

Posted by luc » Sat Oct 29, 2016 1:21 pm

Great, a happy ending!