Yacy won't re-start


Post by eros » Fri May 12, 2017 2:50 pm

Hi,

I've installed Yacy on a rather powerful server (48 cores, 256 GB RAM, Ubuntu 14.04, Java 1.7).

It ran smoothly for about two days, and I started a large crawl of the Italian web (I used about 10,000 URLs as seeds; I'm working on a corpus linguistics project at the University of Bologna).

I launched the crawl yesterday afternoon, but this morning I realized that the 16 GB of RAM I had assigned to Yacy was maybe too small, so I decided to stop the server, increase the amount of RAM in the configuration file, and then start the service again.

Here's what I did:

- I stopped the server using the web interface
- I waited for a few minutes for the Java process to terminate gracefully
- I changed this setting in /etc/yacy/yacy.conf: javastart_Xmx=Xmx32768m (the original amount was 16384)
- I restarted the service using: service yacy start
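
Note: the configuration change in the steps above can be scripted. The following is only a minimal sketch, assuming yacy.conf is a plain key=value file as the javastart_Xmx line suggests and that the service is stopped first; the helper names set_property and xmx_megabytes are hypothetical and not part of YaCy.

```python
from pathlib import Path


def set_property(conf_path: str, key: str, value: str) -> bool:
    """Set key=value in a properties-style file; return True if the key already existed."""
    path = Path(conf_path)
    lines = path.read_text(encoding="utf-8").splitlines()
    found = False
    for i, line in enumerate(lines):
        # Ignore comments and lines without '='; compare the exact key name only.
        if line.lstrip().startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        if name == key:
            lines[i] = f"{key}={value}"
            found = True
    if not found:
        lines.append(f"{key}={value}")
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return found


def xmx_megabytes(setting: str) -> int:
    """Convert a value such as 'Xmx32768m' into a number of megabytes."""
    if not (setting.startswith("Xmx") and setting.endswith("m")):
        raise ValueError(f"unexpected heap setting: {setting!r}")
    return int(setting[3:-1])
```

For example, set_property("/etc/yacy/yacy.conf", "javastart_Xmx", "Xmx32768m") would apply the change described above, and xmx_megabytes("Xmx32768m") gives 32768.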

Now Yacy won't start. I waited for 30+ minutes and then I manually killed the Java process and tried again but to no avail. From what I can see, the problem seems to be here:

Code:
java.io.FileNotFoundException: /usr/share/yacy/DATA/INDEX/freeworld/QUEUES/CrawlerLimitStacks/toni.org-#gctLmQ.80/0003.stack (Too many open files)
   at java.io.FileInputStream.open(Native Method)
   at java.io.FileInputStream.<init>(FileInputStream.java:146)
   at net.yacy.kelondro.table.ChunkIterator.<init>(ChunkIterator.java:65)
   at net.yacy.kelondro.table.Table.<init>(Table.java:161)
   at net.yacy.kelondro.index.OnDemandOpenFileIndex.getIndex(OnDemandOpenFileIndex.java:61)
   at net.yacy.kelondro.index.OnDemandOpenFileIndex.size(OnDemandOpenFileIndex.java:153)
   at net.yacy.kelondro.index.BufferedObjectIndex.size(BufferedObjectIndex.java:152)
   at net.yacy.crawler.HostBalancer$1.run(HostBalancer.java:101)
W 2017/05/12 14:30:22 ConcurrentLog net.yacy.kelondro.util.kelondroException: /usr/share/yacy/DATA/INDEX/freeworld/QUEUES/CrawlerLimitStacks/toni.org-#gctLmQ.80/0003.stack (Too many open files)
net.yacy.kelondro.util.kelondroException: /usr/share/yacy/DATA/INDEX/freeworld/QUEUES/CrawlerLimitStacks/toni.org-#gctLmQ.80/0003.stack (Too many open files)
   at net.yacy.kelondro.table.Table.<init>(Table.java:228)
   at net.yacy.kelondro.index.OnDemandOpenFileIndex.getIndex(OnDemandOpenFileIndex.java:61)
   at net.yacy.kelondro.index.OnDemandOpenFileIndex.size(OnDemandOpenFileIndex.java:153)
   at net.yacy.kelondro.index.BufferedObjectIndex.size(BufferedObjectIndex.java:152)
   at net.yacy.crawler.HostBalancer$1.run(HostBalancer.java:101)


But I don't know how to fix it.

The directory /usr/share/yacy/DATA/INDEX/freeworld/QUEUES/CrawlerLimitStacks/ contains 180330 files. That seems like a lot, but it shouldn't be a problem on an ext4 filesystem.

I tried changing the RAM value back to the initial setting but it didn't help. I tried moving the data directory but in that case Yacy won't start.

During one (just one!) of the various page refreshes I did, I got this error page on the web browser:

Code:
HTTP ERROR 500

Problem accessing /. Reason:

    Server Error
Caused by:

javax.servlet.ServletException: /usr/share/yacy/htroot/index.html
   at net.yacy.http.servlets.YaCyDefaultServlet.handleTemplate(YaCyDefaultServlet.java:895)
   at net.yacy.http.servlets.YaCyDefaultServlet.doGet(YaCyDefaultServlet.java:312)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
   at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:595)
   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
   at org.eclipse.jetty.server.Dispatcher.forward(Dispatcher.java:191)
   at org.eclipse.jetty.server.Dispatcher.forward(Dispatcher.java:72)
   at net.yacy.http.servlets.YaCyDefaultServlet.doGet(YaCyDefaultServlet.java:349)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
   at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:553)
   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
   at net.yacy.http.CrashProtectionHandler.handle(CrashProtectionHandler.java:33)
   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
   at org.eclipse.jetty.server.Server.handle(Server.java:499)
   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
   at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
   at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
   at java.lang.Thread.run(Thread.java:745)


Any suggestions?

Thanks in advance,
Eros

Post by luc » Sat May 13, 2017 8:41 am

Hi eros,
the error you report suggests there might be a leak somewhere in YaCy (maybe some missing InputStream.close() or OutputStream.close() calls) or a "file-max" setting that is too low on your machine (I guess you already checked "/proc/sys/fs/file-max" and ulimit, but checking "/proc/sys/fs/file-nr" while YaCy is running could also be interesting).

I am not sure this is directly related to your new memory setting. Maybe it was rather caused by the fact that you stopped the server while running a large crawl.

Which version do you use: 1.92/9000?

Did you try removing some or all of the files from your "/usr/share/yacy/DATA/INDEX/freeworld/QUEUES/CrawlerLimitStacks" folder (possibly after backing them up for later re-insertion)?

There is also a setting in yacy.conf that might help you: "crawler.onDemandLimit". Its default value is 1000, meaning that when more than 1000 hosts are in the crawler stack, each queue backing file will be opened, read and closed each time it is needed rather than being loaded into memory only once. Maybe increasing this value could help you, especially given the large amount of memory you have available.
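
Note: the open/read/close-per-access pattern described above can be illustrated with a toy sketch. This is not YaCy's OnDemandOpenFileIndex code; the class OnDemandStack and the function count_lines_on_demand are hypothetical names used only to show why such a pattern holds at most one file descriptor at a time, at the cost of reopening the file on every access.

```python
from typing import List


class OnDemandStack:
    """Toy stand-in for a queue file that is opened, read and closed per access."""

    def __init__(self, path: str) -> None:
        self.path = path  # only the path is stored, no open file handle

    def size(self) -> int:
        # The descriptor exists only for the duration of this 'with' block.
        with open(self.path, "rb") as handle:
            return sum(1 for _ in handle)


def count_lines_on_demand(paths: List[str]) -> int:
    """Visit many stack files while holding at most one descriptor at a time."""
    return sum(OnDemandStack(p).size() for p in paths)
```

Keeping every file open instead would avoid the repeated open calls but would need one descriptor per host queue, which matters once there are many thousands of them.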

Edit: if you can check your logs around the "HTTP ERROR 500" trace, there may be additional details that could also be interesting.

Best regards

Post by eros » Sat May 13, 2017 8:28 pm

Thanks for your reply Luc.

Code:
cat /proc/sys/fs/file-max
26251184

cat /proc/sys/fs/file-nr
1248   0   26251184

ulimit
unlimited


It looks like the crawl was within the limits. In any case I deleted the "freeworld" directory in the index and I was able to start yacy. I then restarted the crawl from scratch, but I'm afraid it will happen again if yacy crashes or if I need to reboot the machine. I'll try to terminate (or at least pause) the crawl before shutting down yacy, but what worries me is that the whole application doesn't seem to be very robust if a system crash can compromise days of crawling.

Also: does the crawl ever really end on its own? I suppose it could potentially go on forever; do I need to stop it manually after a while?

I'll experiment some more.
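
Note: in bash, ulimit without an option reports the maximum file size (-f), so the "unlimited" shown above does not describe the open-file limit; that one is reported by ulimit -n, applies per process, and is often far below file-max. The sketch below is one way to look at the limit and the descriptor count of a running process on Linux. It assumes the standard /proc/<pid>/limits and /proc/<pid>/fd layout; the function names are hypothetical, and reading another user's process usually needs root.

```python
import os
import resource


def own_nofile_limit() -> tuple:
    """Return (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def parse_max_open_files(limits_text: str) -> tuple:
    """Extract (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # Row layout: Max open files <soft> <hard> files
            return fields[3], fields[4]
    raise ValueError("no 'Max open files' row found")


def fd_usage(pid: int) -> tuple:
    """Return (open descriptors, soft limit, hard limit) for a running process."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits", encoding="ascii") as handle:
        soft, hard = parse_max_open_files(handle.read())
    return open_fds, soft, hard
```

Calling fd_usage with the PID of the YaCy Java process while it starts up would show whether the number of open descriptors approaches the soft limit.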

Post by luc » Sun May 14, 2017 9:52 am

It is certainly very annoying to lose your work. If this happens again, you should at least try keeping or backing up your /freeworld/SEGMENTS/ folder, which holds the data you have already indexed in your local Solr and RWI indexes.

The main parameter controlling the end of a crawl is the "Crawling depth", i.e. the number of links followed in the web graph starting from your crawl starting point. So if you set too high a value (over 8 according to the YaCy help), it is likely you are indeed trying to crawl the whole Internet, and you should certainly stop it manually after some time.
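
Note: the effect of a depth limit can be shown with a small breadth-first traversal over a toy link graph. This is only an illustration of the general idea, not YaCy's crawler; the function crawl and the dictionary-based graph are assumptions made for the example.

```python
from collections import deque
from typing import Dict, List, Set


def crawl(links: Dict[str, List[str]], seeds: List[str], max_depth: int) -> Set[str]:
    """Breadth-first traversal that stops following links beyond max_depth."""
    seen: Set[str] = set(seeds)
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # the page is kept, but its outgoing links are not followed
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen
```

With a finite depth the queue eventually empties and the traversal ends on its own; the number of pages reached can still grow very quickly with each extra level.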

Post by eros » Sun May 14, 2017 10:50 am

Yup, it happened again.

I let the crawler run for about 14 hours, then I paused it before shutting down the server. It didn't help and I got the same error message:

Code:
java.io.FileNotFoundException: /usr/share/yacy/DATA/INDEX/freeworld/QUEUES/CrawlerLimitStacks/toni.org-#gctLmQ.80/0003.stack (Too many open files)


I moved the /INDEX/freeworld/QUEUES/ directory: the system started in no time and the index seems to be safe.

BTW, this is the size of QUEUES:

Code:
root@amelia:/storage# ls -R QUEUES/| wc -l
536442
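
Note: ls -R | wc -l counts every output line, including one header line per directory and the blank lines between listings, so the figure above is somewhat higher than the real number of files. A minimal sketch for an exact count follows; the function name count_tree is hypothetical.

```python
import os
from typing import Tuple


def count_tree(root: str) -> Tuple[int, int]:
    """Return (number of files, number of subdirectories) below root."""
    files = dirs = 0
    for _current, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        files += len(filenames)
    return files, dirs
```

Running count_tree("QUEUES") from /storage would give the file and directory counts separately.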


I'll try increasing the "crawler.onDemandLimit" value to 50000, any suggestions on suitable values? Currently I assigned 64 GB of RAM to Yacy, but I could bump it up to 96 GB.

I'm using the default Crawling depth of 3 for now, so that shouldn't be the problem. I don't need to restart the server that often, but I'm experimenting to see if Yacy is a viable solution to the problems I'm having with my research project, so I need to know that I can rely on it.

Unfortunately, most of the documentation is in German and I don't speak German, so I need to do a bit of trial and error and use Google Translate... ;)

Thanks for your help Luc!

Post by luc » Mon May 15, 2017 6:15 am

eros wrote:
I'll try increasing the "crawler.onDemandLimit" value to 50000, any suggestions on suitable values?

With the amount of RAM you have, I guess you can increase this value even further, but that's only a supposition: personally I always run YaCy on mid-range desktop machines or on low-end virtual machines, so I am not really experienced with large-scale crawls on high-performance servers...
But I can say that some months ago I successfully ran some crawls with 100,000 links per crawl start file; they needed some days to finish but worked rather well on a machine with only 2 GB of RAM. I have to admit that I didn't try to restart YaCy while these crawls were running. And I was not aware that so many files could be created in the QUEUES folder... It looks like there is definitely something to do, at least at startup, to avoid exhausting the file descriptors made available by the OS.

Post by eros » Mon May 15, 2017 1:48 pm

Increasing "crawler.onDemandLimit" to 50000 didn't work: I started getting "Too many open files" errors in the web frontend. I decreased it to 5000 and got errors again, so I reverted the value to 1000, since I figured that probably wasn't the problem anyway.

I tried terminating the crawler before restarting the service and this time it worked. I then started a new crawl using the same seed URLs but, after 24 hours no new pages had been indexed, so I removed the index and I started over. Now I'm going to let the crawler run for a few days.

I noticed that Yacy uses *a lot* of memory: I assigned 96 GB to stay on the safe side, after 24 hours it was running steadily on 20 GB and now (after a restart and 3 hours of crawling) it's already using 19 GB. I guess it's normal for Java to use large amounts of memory when it's available, I just wanted to let you know my experience.

Another question: I installed Yacy on my laptop too, and I tried submitting the same query on the server and on the laptop. The results are different (I get fewer results on the laptop) even though theoretically the server's index should be reachable since it's running in "senior mode" (i.e. port 8090 is open on the firewall and the server claims to be "senior"). Is that behaviour to be expected?

Post by luc » Mon May 15, 2017 6:16 pm

Thanks for sharing your experience. It looks like there is really something to dig into regarding this problem with file descriptors... I will try to reproduce your scenario when I have some time.

eros wrote:
I guess it's normal for Java to use large amounts of memory when it's available, I just wanted to let you know my experience.

Certainly Java makes it easy to fill the available memory when data structures are used without size limits. But in the end, to my mind, this really depends on how the application code is organized and is not so much related to Java. In YaCy there are many places where the available memory is checked, so it is not a surprise to me that it uses a lot when a lot is available... which doesn't mean that YaCy necessarily needs such large amounts of memory to run fine.

eros wrote:
I installed Yacy on my laptop too, and I tried submitting the same query on the server and on the laptop. The results are different

Yes, in peer-to-peer mode this may sound a bit surprising, but it is the expected behavior, at least with the current YaCy architecture. When performing a search in p2p mode, some peers in the network are selected to be queried and to aggregate results from, and the selection rules even include some randomness. Thus we cannot obtain deterministic behavior (like in Solr Cloud) in this p2p mode, because each node has its own index and its own blacklisting and crawling rules, and is not supposed to obey the rules of some master node(s). The index distribution over the nodes that accept remote index parts (DHT-in) is meant to homogenize the whole distributed index a bit, but in the end the behavior is not deterministic... After your two YaCy peers have been running for a long time, though, I guess you should obtain more similar results.
I hope this answers your question a bit.
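
Note: why a randomly chosen subset of peers gives varying results can be shown with a toy model. This is not YaCy's actual peer selection, which is only described above as including some randomness; the function p2p_search, the fanout parameter and the peer names are assumptions made for the example.

```python
import random
from typing import Dict, Set


def p2p_search(peer_indexes: Dict[str, Set[str]], fanout: int, rng: random.Random) -> Set[str]:
    """Ask a random subset of peers and merge whatever they return."""
    chosen = rng.sample(sorted(peer_indexes), k=min(fanout, len(peer_indexes)))
    results: Set[str] = set()
    for peer in chosen:
        results |= peer_indexes[peer]
    return results
```

When fanout is smaller than the number of peers, two calls with differently seeded generators may return different result sets for the same query, because each call may reach a different part of the distributed index.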

Post by luc » Fri May 19, 2017 6:19 am

Hello eros, did you keep a copy of the full log file containing the "(Too many open files)" error? If you would like to share it, it may be helpful for a deeper analysis of the issue.

I have now run some more tests with large crawl queues (around 100000 to 220000 files in the QUEUES folder) on Debian Jessie, with the file-max system setting at its default value of 404027. But so far, after many stops and restarts of YaCy, including changes to the memory settings, I have not reproduced the error you had.

I was running with at most 2 GB of RAM dedicated to YaCy on a domestic DSL connection, so not really under the same conditions as you. But if you wish, I am still interested in having a look at your full log trace before the error to find some clue about what went wrong.

Post by eros » Mon May 22, 2017 11:21 am

Hi Luc,

unfortunately I don't have the logs. I suspect the problem might be tied to the version of Java I was using (the server had Java 1.7).

So I tried running 2 crawls on a regular PC (Ubuntu 16.04.2, 4 GB RAM, Java 1.8.0_131) and I noticed that:

1. the crawl is running much faster
2. I didn't have any problems restarting the server

Caveats: for these new crawls I started from a single URL: https://it.wikipedia.org/wiki/Portale:Portali for the first and https://en.wikipedia.org/wiki/Portal:Contents/Indices for the second. Both were set with a crawling depth of 6; they are still running (and much faster than the crawls on the server ever did) and have indexed about 1.1 million pages from Wikipedia.

Now I upgraded Java to version 1.8.0_131 on the server (I obviously stopped the crawls before restarting Yacy) and it started without incident. The problem now is that I cannot export the server's large index to XML: whenever I try (using the default settings) I get this error message:

Code:
HTTP ERROR 500

Problem accessing /IndexExport_p.html. Reason:

    Server Error
Caused by:

javax.servlet.ServletException: /usr/share/yacy/htroot/IndexExport_p.html
   at net.yacy.http.servlets.YaCyDefaultServlet.handleTemplate(YaCyDefaultServlet.java:895)
   at net.yacy.http.servlets.YaCyDefaultServlet.doGet(YaCyDefaultServlet.java:312)
   at net.yacy.http.servlets.YaCyDefaultServlet.doPost(YaCyDefaultServlet.java:374)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
   at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:542)
   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
   at net.yacy.http.CrashProtectionHandler.handle(CrashProtectionHandler.java:33)
   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
   at org.eclipse.jetty.server.Server.handle(Server.java:499)
   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
   at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
   at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
   at java.lang.Thread.run(Thread.java:748)

YaCy 1.92 - powered by Jetty -


Any ideas?

Post by eros » Mon May 22, 2017 11:33 am

Update: these are the errors I get in the log file:

Code:
I 2017/05/22 12:30:04 Fulltext HOT DUMP dump path = /usr/share/yacy/DATA/ARCHIVE
W 2017/05/22 12:30:04 ConcurrentLog java.lang.reflect.InvocationTargetException
java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:498)
   at net.yacy.http.servlets.YaCyDefaultServlet.invokeServlet(YaCyDefaultServlet.java:670)
   at net.yacy.http.servlets.YaCyDefaultServlet.handleTemplate(YaCyDefaultServlet.java:881)
   at net.yacy.http.servlets.YaCyDefaultServlet.doGet(YaCyDefaultServlet.java:312)
   at net.yacy.http.servlets.YaCyDefaultServlet.doPost(YaCyDefaultServlet.java:374)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
   at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:542)
   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
   at net.yacy.http.CrashProtectionHandler.handle(CrashProtectionHandler.java:33)
   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
   at org.eclipse.jetty.server.Server.handle(Server.java:499)
   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
   at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
   at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
   at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
W 2017/05/22 12:30:04 org.eclipse.jetty.servlet.ServletHandler
javax.servlet.ServletException: /usr/share/yacy/htroot/IndexExport_p.html
   at net.yacy.http.servlets.YaCyDefaultServlet.handleTemplate(YaCyDefaultServlet.java:895)
   at net.yacy.http.servlets.YaCyDefaultServlet.doGet(YaCyDefaultServlet.java:312)
   at net.yacy.http.servlets.YaCyDefaultServlet.doPost(YaCyDefaultServlet.java:374)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
   at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:542)
   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
   at net.yacy.http.CrashProtectionHandler.handle(CrashProtectionHandler.java:33)
   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
   at org.eclipse.jetty.server.Server.handle(Server.java:499)
   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
   at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
   at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
   at java.lang.Thread.run(Thread.java:748)
I 2017/05/22 12:30:08 SWITCHBOARD dhtTransferJob: no selection, too many entries in transmission buffer: 17
I 2017/05/22 12:30:08 SWITCHBOARD dhtTransferJob: result from dequeueing: true


I see a NullPointerException, which is never good...

Post by luc » Tue May 23, 2017 6:50 am

Hi eros,
thanks for your feedback, it is interesting to know that the upgrade to Java 1.8 seems to solve your initial problem.

Regarding the error on index export, I recently fixed some issues with this feature, notably a NullPointerException case (see commit e5858bc on the GitHub repository). So if you want to give it a try... you just have to build and install your own deb package from the latest source. It is not very difficult, and you will also benefit from all the other fixes and improvements made since release 1.92/9000.

Post by eros » Tue May 23, 2017 3:31 pm

I tried building a deb package, but apparently something went wrong. Do I need to use Java 7 to compile it?

Code:
Buildfile: /home/eros/yacy_search_server/build.xml

buildGitRevTask:
   [delete] Deleting: /home/eros/yacy_search_server/libbuild/GitRevTask.jar
      [jar] Building jar: /home/eros/yacy_search_server/libbuild/GitRevTask.jar

determineGitRevision:
   [gitRev] SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
   [gitRev] SLF4J: Defaulting to no-operation (NOP) logger implementation
   [gitRev] SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

readBuildProperties:

init:
     [echo] YaCy Branch:
     [echo] YaCy Version number: 1.92
     [echo] YaCy Release number: 9216
   [delete] Deleting: /home/eros/yacy_search_server/classes/net/yacy/peers/operation/yacyBuildProperties.java
     [copy] Copying 1 file to /home/eros/yacy_search_server/classes/net/yacy/peers/operation
     [copy] Copying 1 file to /home/eros/yacy_search_server/classes

deb:
     [exec] dpkg-buildpackage: source package yacy
     [exec] dpkg-buildpackage: source version 1.92.9216
     [exec] dpkg-buildpackage: source distribution unstable
     [exec] dpkg-buildpackage: source changed by Michael Peter Christen <mc@yacy.net>
     [exec]  dpkg-source --before-build yacy_search_serverdpkg-buildpackage: host architecture amd64
     [exec]
     [exec] dpkg-checkbuilddeps: error: Unmet build dependencies: openjdk-7-jdk debhelper (>= 5) m4
     [exec] dpkg-buildpackage: warning: build dependencies/conflicts unsatisfied; aborting
     [exec] dpkg-buildpackage: warning: (Use -d flag to override.)
     [exec] Result: 3

BUILD SUCCESSFUL

Post by luc » Wed May 24, 2017 7:50 am

Yes, that's it: currently openjdk-7-jdk is the dependency required to build the Debian package. But this doesn't prevent you from then installing and running with openjdk-8!
If this is a problem, you can modify this dependency for your own use in the file yacy_search_server/debian/control.

And for the build, as stated in the message, you also need the "debhelper" and "m4" packages.
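
Note: the Ant log quoted earlier in this thread ends with BUILD SUCCESSFUL even though dpkg-buildpackage aborted with Result: 3, apparently because the exec step does not treat a non-zero exit code as fatal. The sketch below scans a saved build log for such results and for the unmet build dependencies; it only relies on the two line formats visible in that log, and the function names are hypothetical.

```python
import re
from typing import List

RESULT_LINE = re.compile(r"^\s*\[exec\]\s+Result:\s+(-?\d+)\s*$")
UNMET_LINE = re.compile(r"Unmet build dependencies:\s*(.+)$")


def failed_exec_results(log_text: str) -> List[int]:
    """Collect non-zero '[exec] Result: N' codes from an Ant log."""
    codes = []
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line)
        if match and int(match.group(1)) != 0:
            codes.append(int(match.group(1)))
    return codes


def unmet_dependencies(log_text: str) -> List[str]:
    """Return package names from an 'Unmet build dependencies' line."""
    for line in log_text.splitlines():
        match = UNMET_LINE.search(line)
        if match:
            # Drop version constraints such as '(>= 5)' before splitting on whitespace.
            return re.sub(r"\([^)]*\)", "", match.group(1)).split()
    return []
```

A non-empty list from failed_exec_results means the package was probably not built, whatever the final Ant status line says.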

Post by eros » Wed May 24, 2017 10:00 am

Thank you Luc, I was able to compile the deb package and I started the server, but when I try to authenticate on the web interface I get this error message:

Code:
Ops!

Message: null
java.lang.NullPointerException
   at org.eclipse.jetty.security.authentication.DigestAuthenticator$Digest.check(DigestAuthenticator.java:353)
   at net.yacy.http.YaCyLegacyCredential.check(YaCyLegacyCredential.java:68)
   at org.eclipse.jetty.security.MappedLoginService$KnownUser.authenticate(MappedLoginService.java:320)
   at org.eclipse.jetty.security.MappedLoginService.login(MappedLoginService.java:226)
   at org.eclipse.jetty.security.authentication.LoginAuthenticator.login(LoginAuthenticator.java:61)
   at org.eclipse.jetty.security.authentication.DigestAuthenticator.validateRequest(DigestAuthenticator.java:229)
   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:512)
   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
   at net.yacy.http.CrashProtectionHandler.handle(CrashProtectionHandler.java:33)
   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
   at org.eclipse.jetty.server.Server.handle(Server.java:499)
   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
   at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:258)
   at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
   at java.lang.Thread.run(Thread.java:748)


Below you can see the log file on the server side (/var/log/yacy/yacy00.log)

Code:
W 2017/05/24 10:52:21 ConcurrentLog java.lang.reflect.InvocationTargetException
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at net.yacy.http.servlets.YaCyDefaultServlet.invokeServlet(YaCyDefaultServlet.java:672)
        at net.yacy.http.servlets.YaCyDefaultServlet.handleTemplate(YaCyDefaultServlet.java:883)
        at net.yacy.http.servlets.YaCyDefaultServlet.doGet(YaCyDefaultServlet.java:314)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:553)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
        at net.yacy.http.CrashProtectionHandler.handle(CrashProtectionHandler.java:33)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
        at org.eclipse.jetty.server.Server.handle(Server.java:499)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:258)
        at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
        at org.eclipse.jetty.security.authentication.DigestAuthenticator$Digest.check(DigestAuthenticator.java:353)
        at net.yacy.http.YaCyLegacyCredential.check(YaCyLegacyCredential.java:68)
        at org.eclipse.jetty.security.MappedLoginService$KnownUser.authenticate(MappedLoginService.java:320)
        at org.eclipse.jetty.security.MappedLoginService.login(MappedLoginService.java:226)
        at org.eclipse.jetty.security.authentication.LoginAuthenticator.login(LoginAuthenticator.java:61)
        at org.eclipse.jetty.security.authentication.DigestAuthenticator.validateRequest(DigestAuthenticator.java:229)
        at org.eclipse.jetty.security.authentication.DeferredAuthentication.authenticate(DeferredAuthentication.java:68)
        at org.eclipse.jetty.server.Request.isUserInRole(Request.java:1553)
        at net.yacy.cora.protocol.RequestHeader.isUserInRole(RequestHeader.java:353)
        at net.yacy.search.Switchboard.adminAuthenticated(Switchboard.java:3618)
        at net.yacy.search.Switchboard.verifyAuthentication(Switchboard.java:3710)
        at feed.respond(feed.java:28)
        ... 30 more
W 2017/05/24 10:52:21 org.eclipse.jetty.servlet.ServletHandler
javax.servlet.ServletException: /usr/share/yacy/htroot/api/feed.rss
        at net.yacy.http.servlets.YaCyDefaultServlet.handleTemplate(YaCyDefaultServlet.java:909)
        at net.yacy.http.servlets.YaCyDefaultServlet.doGet(YaCyDefaultServlet.java:314)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:553)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
        at net.yacy.http.CrashProtectionHandler.handle(CrashProtectionHandler.java:33)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
        at org.eclipse.jetty.server.Server.handle(Server.java:499)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:258)
        at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
        at java.lang.Thread.run(Thread.java:748)


I'm running version 1.92.9216
eros
 
Posts: 15
Joined: Fri May 12, 2017 1:56 pm

Re: Yacy won't re-start

Post by luc » Wed May 24, 2017 6:01 pm

Arrh, sorry! I forgot to mention that the authentication method has meanwhile switched from HTTP Basic to HTTP Digest, so the admin password is now encoded differently.
So, after upgrading from 1.92/9000, you currently have to explicitly run the command below and enter your admin password again (this may be improved for the next official release, as may the error message you got):
Code:
dpkg-reconfigure yacy
Then it should be OK!
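
For background, here is a minimal Python sketch of why a password stored for HTTP Basic cannot simply be reused for HTTP Digest: Basic sends base64(user:password), while Digest (RFC 2617, MD5 variant) works with a hash of user:realm:password. The user name, realm and password are made-up placeholders, and this is not YaCy's own code or storage format.

```python
import base64
import hashlib


def basic_token(user: str, password: str) -> str:
    """Value sent after 'Basic ' in an HTTP Basic Authorization header."""
    return base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")


def digest_ha1(user: str, realm: str, password: str) -> str:
    """HA1 value used by HTTP Digest (RFC 2617, MD5 algorithm)."""
    return hashlib.md5(f"{user}:{realm}:{password}".encode("utf-8")).hexdigest()


# Placeholder credentials and realm, chosen only for illustration.
print(basic_token("admin", "secret"))
print(digest_ha1("admin", "example-realm", "secret"))
```

Because the realm is part of the hash, a stored Digest credential has to be regenerated from the clear-text password, which would explain why the password must be entered again.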
luc
 
Posts: 301
Joined: Wed Aug 26, 2015 1:04 am

Re: Yacy won't re-start

Post by eros » Thu May 25, 2017 2:23 pm

OK, now it works, thanks!
eros
 
Posts: 15
Joined: Fri May 12, 2017 1:56 pm

Re: Yacy won't re-start

Post by smokingwheels » Sat May 27, 2017 12:39 am

Have you allocated a SWAP file in your VM?

https://www.digitalocean.com/community/tutorials/how-to-add-swap-on-ubuntu-14-04
I recommend a large one: in my experience it lets you push the CPU to a higher average load and improves stability.

Have you increased any of the settings on the /PerformanceQueues_p.html page?
smokingwheels
 
Posts: 137
Joined: Sat Aug 31, 2013 7:16 am

Re: Yacy won't re-start

Post by eros » Mon May 29, 2017 10:20 am

Hi,

I'm not running Yacy on a virtual machine: the server has 256 GB RAM (96 GB are dedicated to Yacy) and a 64 GB swap partition.

Since I installed version 1.92/9218 I noticed some improvements: I can now restart Yacy without messing up the index or the crawl, and the crawl seems to be somewhat faster, at least for some time after the restart.

Unfortunately there are still problems: I've been forced to restart the server a couple of times because, after about 20 hours, the crawls slow down considerably to 5-10 pages per minute until they basically stop and the web interface becomes unresponsive. Restarting the server every 10 hours or so seems to fix the problem. The log reported multiple Solr exceptions; unfortunately the logs got wiped when I restarted the server, so I'll post them next time it happens.

I'm running multiple crawls simultaneously, as suggested in the documentation, and I get an indexing speed of about 100 to 800 pages per minute (at least right after a Yacy restart, then it slows down). Is that a reasonable speed? Bandwidth shouldn't be a bottleneck since I'm on a fast gigabit fiber connection at the University.

BTW: the server I'm experimenting with is http://amelia.sslmit.unibo.it:8090/
eros
 
Posts: 15
Joined: Fri May 12, 2017 1:56 pm

Re: Yacy won't re-start

Post by smokingwheels » Mon May 29, 2017 6:18 pm

Network Bandwidth Guidelines?

Have you considered using a RAM disk for the initial crawl, with a scheduled task to back up its contents (Yacy would have to be stopped and started)?
Do you have any log lines similar to this one: HostQueue forcing crawl-delay of 673 milliseconds for http://www.zxyyy.net: minimumDelta = 250, flux = 0, host.average = 1683 ?
I think a lot of sites are protected by DDoS protection providers; this could be an issue for you.
What sort of increments do you get on the Traffic (Crawler) counter per update?
Did you get over 11 mb/s when cloning from GitHub?
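
To check how often such lines occur, something like the following Python sketch could pull the numbers out of the log. The pattern is modelled only on the single example line quoted above, so other YaCy versions may format the message differently; the function and field names are made up for illustration.

```python
import re

# Pattern modelled on the one example line quoted in this post (an assumption).
CRAWL_DELAY = re.compile(
    r"forcing crawl-delay of (?P<delay>\d+) milliseconds for (?P<host>\S+): "
    r"minimumDelta = (?P<minimum_delta>\d+), flux = (?P<flux>\d+), "
    r"host\.average = (?P<host_average>\d+)"
)


def parse_crawl_delay(line: str):
    """Return the fields of a crawl-delay log line, or None if it does not match."""
    match = CRAWL_DELAY.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    host = fields.pop("host")
    # All remaining fields are plain integers (milliseconds or counters).
    return {"host": host, **{name: int(value) for name, value in fields.items()}}


sample = ("HostQueue forcing crawl-delay of 673 milliseconds for "
          "http://www.zxyyy.net: minimumDelta = 250, flux = 0, host.average = 1683")
print(parse_crawl_delay(sample))
```

Running it over every line of a log file and counting the matches per host would show whether forced delays are concentrated on a few slow hosts.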

I have done a few stress tests and Java spikes my CPU; I'm not sure of the reason. https://twitter.com/smokingwheels/status/868477414178865152
This one asks for a peak of nearly 25 x the power of my CPU. Open both pics and you will see the scale; the one on the right shows the load averages. https://twitter.com/smokingwheels/status/868705075778117632
Searches were performed during that period.

You can define your own network and have lots of peers on it doing remote crawls; I have never tried it.
smokingwheels
 
Posts: 137
Joined: Sat Aug 31, 2013 7:16 am

Re: Yacy won't re-start

Post by eros » Wed May 31, 2017 10:07 am

I'm still having problems when the crawl gets too large. Here's what I did:

- last Friday I did a fresh install of yacy, removing all configuration files and indexes

- I started 5 separate crawls (4 of them had a starting list of 3.000 URLs, the last one had 10.000 URLs)

- after about 20 hours the web interface became unresponsive, I restarted Yacy and the crawls resumed correctly

- I restarted Yacy every 10-15 hours because I noticed that the crawls had a tendency to slow down after a while

- last night (Tuesday) the crawl was "flatlining" (i.e. the index wasn't growing anymore, indexer consistently reported 0 pages per minute) so I tried restarting Yacy again

- restarting didn't work, I briefly got this error page on the web interface during the restart attempt (afterwards the web interface did not respond at all):

Code:
javax.servlet.ServletException: /usr/share/yacy/htroot/index.html
   at net.yacy.http.servlets.YaCyDefaultServlet.handleTemplate(YaCyDefaultServlet.java:909)
   at net.yacy.http.servlets.YaCyDefaultServlet.doGet(YaCyDefaultServlet.java:314)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
   at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:595)
   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
   at org.eclipse.jetty.server.Dispatcher.forward(Dispatcher.java:191)
   at org.eclipse.jetty.server.Dispatcher.forward(Dispatcher.java:72)
   at net.yacy.http.servlets.YaCyDefaultServlet.doGet(YaCyDefaultServlet.java:351)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
   at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:553)
   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
   at net.yacy.http.CrashProtectionHandler.handle(CrashProtectionHandler.java:33)
   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
   at org.eclipse.jetty.server.Server.handle(Server.java:499)
   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
   at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:258)
   at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
   at java.lang.Thread.run(Thread.java:748)


On the server side I got a bunch of errors:

Code:
E 2017/05/31 01:37:13 org.apache.solr.handler.RequestHandlerBase org.apache.solr.common.SolrException: Exception during facet.field: coordinate_p
   at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:721)
   at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:706)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at org.apache.solr.request.SimpleFacets$2.execute(SimpleFacets.java:660)
   at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:731)
   at org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:294)
   at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:256)
   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:272)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
   at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.query(EmbeddedSolrConnector.java:219)
   at net.yacy.http.servlets.SolrSelectServlet.service(SolrSelectServlet.java:251)
   at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:553)
   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
   at net.yacy.http.CrashProtectionHandler.handle(CrashProtectionHandler.java:33)
   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
   at org.eclipse.jetty.server.Server.handle(Server.java:499)
   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
   at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:258)
   at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
   at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: unexpected docvalues type NONE for field 'coordinate_p' (expected=SORTED). Use UninvertingReader or index with docvalues.
   at org.apache.lucene.index.DocValues.checkField(DocValues.java:208)
   at org.apache.lucene.index.DocValues.getSorted(DocValues.java:264)
   at org.apache.solr.request.PerSegmentSingleValuedFaceting$SegFacet.countTerms(PerSegmentSingleValuedFaceting.java:269)
   at org.apache.solr.request.PerSegmentSingleValuedFaceting$1.call(PerSegmentSingleValuedFaceting.java:109)
   at org.apache.solr.request.PerSegmentSingleValuedFaceting$1.call(PerSegmentSingleValuedFaceting.java:106)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   ... 1 more


yacy07.log:java.io.FileNotFoundException: /usr/share/yacy/htroot/yacy/hello.html (Too many open files)
yacy07.log:   at java.io.FileInputStream.open0(Native Method)
yacy07.log:   at java.io.FileInputStream.open(FileInputStream.java:195)
yacy07.log:   at java.io.FileInputStream.<init>(FileInputStream.java:138)
yacy07.log:   at net.yacy.http.servlets.YaCyDefaultServlet.handleTemplate(YaCyDefaultServlet.java:1080)
yacy07.log:   at net.yacy.http.servlets.YaCyDefaultServlet.doGet(YaCyDefaultServlet.java:314)
yacy07.log:   at net.yacy.http.servlets.YaCyDefaultServlet.doPost(YaCyDefaultServlet.java:376)
yacy07.log:   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
yacy07.log:   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
yacy07.log:   at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
yacy07.log:   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
yacy07.log:   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
yacy07.log:   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:553)
yacy07.log:   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
yacy07.log:   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
yacy07.log:   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
yacy07.log:   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
yacy07.log:   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
yacy07.log:   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
yacy07.log:   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
yacy07.log:   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
yacy07.log:   at net.yacy.http.CrashProtectionHandler.handle(CrashProtectionHandler.java:33)
yacy07.log:   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
yacy07.log:   at org.eclipse.jetty.server.Server.handle(Server.java:499)
yacy07.log:   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
yacy07.log:   at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:258)
yacy07.log:   at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
yacy07.log:   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
yacy07.log:   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
yacy07.log:   at java.lang.Thread.run(Thread.java:748)


yacy06.log:java.io.FileNotFoundException: /usr/share/yacy/DATA/INDEX/freeworld/QUEUES/CrawlerLimitStacks/www.orinstarn.com-#i9ulCZ.80/0004.stack (Too many open files)
yacy06.log:   at java.io.FileInputStream.open0(Native Method)
yacy06.log:   at java.io.FileInputStream.open(FileInputStream.java:195)
yacy06.log:   at java.io.FileInputStream.<init>(FileInputStream.java:138)
yacy06.log:   at net.yacy.kelondro.table.ChunkIterator.<init>(ChunkIterator.java:65)
yacy06.log:   at net.yacy.kelondro.table.Table.<init>(Table.java:161)
yacy06.log:   at net.yacy.kelondro.index.OnDemandOpenFileIndex.getIndex(OnDemandOpenFileIndex.java:61)
yacy06.log:   at net.yacy.kelondro.index.OnDemandOpenFileIndex.size(OnDemandOpenFileIndex.java:153)
yacy06.log:   at net.yacy.kelondro.index.BufferedObjectIndex.size(BufferedObjectIndex.java:152)
yacy06.log:   at net.yacy.crawler.HostBalancer$1.run(HostBalancer.java:101)
yacy06.log:W 2017/05/31 01:37:30 ConcurrentLog net.yacy.kelondro.util.kelondroException: /usr/share/yacy/DATA/INDEX/freeworld/QUEUES/CrawlerLimitStacks/www.orinstarn.com-#i9ulCZ.80/0004.stack (Too many open files)
yacy06.log:net.yacy.kelondro.util.kelondroException: /usr/share/yacy/DATA/INDEX/freeworld/QUEUES/CrawlerLimitStacks/www.orinstarn.com-#i9ulCZ.80/0004.stack (Too many open files)
yacy06.log:   at net.yacy.kelondro.table.Table.<init>(Table.java:228)
yacy06.log:   at net.yacy.kelondro.index.OnDemandOpenFileIndex.getIndex(OnDemandOpenFileIndex.java:61)
yacy06.log:   at net.yacy.kelondro.index.OnDemandOpenFileIndex.size(OnDemandOpenFileIndex.java:153)
yacy06.log:   at net.yacy.kelondro.index.BufferedObjectIndex.size(BufferedObjectIndex.java:152)
yacy06.log:   at net.yacy.crawler.HostBalancer$1.run(HostBalancer.java:101)


yacy05.log:javax.servlet.ServletException: /usr/share/yacy/htroot/yacy/query.html
yacy05.log:   at net.yacy.http.servlets.YaCyDefaultServlet.handleTemplate(YaCyDefaultServlet.java:909)
yacy05.log:   at net.yacy.http.servlets.YaCyDefaultServlet.doGet(YaCyDefaultServlet.java:314)
yacy05.log:   at net.yacy.http.servlets.YaCyDefaultServlet.doPost(YaCyDefaultServlet.java:376)
yacy05.log:   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
yacy05.log:   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
yacy05.log:   at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
yacy05.log:   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
yacy05.log:   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
yacy05.log:   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:553)
yacy05.log:   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
yacy05.log:   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
yacy05.log:   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
yacy05.log:   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
yacy05.log:   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
yacy05.log:   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
yacy05.log:   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
yacy05.log:   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
yacy05.log:   at net.yacy.http.CrashProtectionHandler.handle(CrashProtectionHandler.java:33)
yacy05.log:   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
yacy05.log:   at org.eclipse.jetty.server.Server.handle(Server.java:499)
yacy05.log:   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
yacy05.log:   at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:258)
yacy05.log:   at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
yacy05.log:   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
yacy05.log:   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
yacy05.log:   at java.lang.Thread.run(Thread.java:748)


yacy05.log:java.lang.reflect.InvocationTargetException: class /usr/share/yacy/htroot/yacy/query.class is missing:/usr/share/yacy/htroot/yacy/query.class (Too many open files):/usr/share/yacy/htroot/yacy/query.class
yacy05.log:   at net.yacy.http.servlets.YaCyDefaultServlet.rewriteMethod(YaCyDefaultServlet.java:808)
yacy05.log:   at net.yacy.http.servlets.YaCyDefaultServlet.invokeServlet(YaCyDefaultServlet.java:672)
yacy05.log:   at net.yacy.http.servlets.YaCyDefaultServlet.handleTemplate(YaCyDefaultServlet.java:883)
yacy05.log:   at net.yacy.http.servlets.YaCyDefaultServlet.doGet(YaCyDefaultServlet.java:314)
yacy05.log:   at net.yacy.http.servlets.YaCyDefaultServlet.doPost(YaCyDefaultServlet.java:376)
yacy05.log:   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
yacy05.log:   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
yacy05.log:   at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
yacy05.log:   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
yacy05.log:   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
yacy05.log:   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:553)
yacy05.log:   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
yacy05.log:   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
yacy05.log:   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
yacy05.log:   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
yacy05.log:   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
yacy05.log:   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
yacy05.log:   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
yacy05.log:   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
yacy05.log:   at net.yacy.http.CrashProtectionHandler.handle(CrashProtectionHandler.java:33)
yacy05.log:   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
yacy05.log:   at org.eclipse.jetty.server.Server.handle(Server.java:499)
yacy05.log:   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
yacy05.log:   at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:258)
yacy05.log:   at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
yacy05.log:   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
yacy05.log:   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
yacy05.log:   at java.lang.Thread.run(Thread.java:748)
yacy05.log:Caused by: java.lang.ClassNotFoundException: /usr/share/yacy/htroot/yacy/query.class (Too many open files):/usr/share/yacy/htroot/yacy/query.class
yacy05.log:   at net.yacy.server.serverClassLoader.loadClass(serverClassLoader.java:100)
yacy05.log:   at net.yacy.http.servlets.YaCyDefaultServlet.rewriteMethod(YaCyDefaultServlet.java:792)
yacy05.log:   ... 27 more


yacy05.log:W 2017/05/31 01:37:31 ConcurrentLog java.lang.NullPointerException
yacy05.log:java.lang.NullPointerException
yacy05.log:   at net.yacy.kelondro.io.CachedFileWriter.seek(CachedFileWriter.java:143)
yacy05.log:   at net.yacy.kelondro.blob.HeapReader.get(HeapReader.java:498)
yacy05.log:   at net.yacy.kelondro.blob.ArrayStack$BlobValues.next0(ArrayStack.java:701)
yacy05.log:   at net.yacy.kelondro.blob.ArrayStack$BlobValues.next0(ArrayStack.java:685)
yacy05.log:   at net.yacy.cora.util.LookAheadIterator.checkInit(LookAheadIterator.java:53)
yacy05.log:   at net.yacy.cora.util.LookAheadIterator.hasNext(LookAheadIterator.java:60)
yacy05.log:   at net.yacy.kelondro.rwi.ReferenceContainerArray.get(ReferenceContainerArray.java:308)
yacy05.log:   at net.yacy.kelondro.rwi.IndexCell.get(IndexCell.java:355)
yacy05.log:   at net.yacy.search.index.Segment$ReferenceReport.<init>(Segment.java:272)
yacy05.log:   at net.yacy.search.index.Segment$ReferenceReportCache.getReferenceReport(Segment.java:244)
yacy05.log:   at net.yacy.search.schema.CollectionConfiguration.postprocessing_references(CollectionConfiguration.java:1887)
yacy05.log:   at net.yacy.search.index.Segment.storeDocument(Segment.java:597)
yacy05.log:   at net.yacy.search.Switchboard.storeDocumentIndex(Switchboard.java:3134)
yacy05.log:   at net.yacy.search.Switchboard.storeDocumentIndex(Switchboard.java:3068)
yacy05.log:   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
yacy05.log:   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
yacy05.log:   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
yacy05.log:   at java.lang.reflect.Method.invoke(Method.java:498)
yacy05.log:   at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:101)
yacy05.log:   at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:82)
yacy05.log:   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
yacy05.log:   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
yacy05.log:   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
yacy05.log:   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
yacy05.log:   at java.lang.Thread.run(Thread.java:748)
yacy05.log:E 2017/05/31 01:37:31 BLOCKINGTHREAD Runtime Error in serverInstantThread.job, thread 'java.lang.reflect.Method.storeDocumentIndex.5': null


In the few weeks I've been experimenting with Yacy, I've noticed that this tends to happen when the local index gets to about 2 million pages. The size of the index was:

du -hcs freeworld/*
4.2M freeworld/NETWORK
36G freeworld/QUEUES
44G freeworld/SEGMENTS
79G total

Ultimately, I removed the QUEUES directory and I was able to restart Yacy preserving the index (even though the crawls have stopped).
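
Since the number of files in QUEUES may matter as much as its size, here is a small Python sketch that reports both for a directory tree. The path is the one shown in the log excerpts above and may differ on other installs. Note that it sums apparent file sizes, so the total can differ somewhat from what du reports.

```python
import os


def directory_stats(root: str):
    """Return (file_count, total_bytes) for all files below root."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total_bytes += os.lstat(path).st_size
            except OSError:
                continue  # file vanished while walking, e.g. crawler still running
            file_count += 1
    return file_count, total_bytes


if __name__ == "__main__":
    # Path taken from the log excerpts; adjust if DATA lives elsewhere.
    queues = "/usr/share/yacy/DATA/INDEX/freeworld/QUEUES"
    if os.path.isdir(queues):
        count, size = directory_stats(queues)
        print(f"{count} files, {size / 2**30:.1f} GiB")
```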
eros
 
Posts: 15
Joined: Fri May 12, 2017 1:56 pm

Re: Yacy won't re-start

Post by luc » Thu Jun 01, 2017 11:46 pm

It looks like, in the end, your upgrade to Java 1.8 didn't solve the problem with too many open files... At least I think this is the key problem in the traces you report; the error "Exception during facet.field: coordinate_p", for example, is normally not a blocking one.
I will dig in that direction again to find a possible fix... unfortunately this kind of issue takes a non-negligible amount of time to test.
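
To see whether the Java process really creeps towards its limit before things break, a rough Python sketch like this could be run periodically on Linux. It only reads /proc; the function names are made up for illustration, and it must run as root or as the user owning the process to read that process's fd directory.

```python
import os


def count_open_fds(pid: int, proc_root: str = "/proc") -> int:
    """Number of file descriptors currently open in process `pid` (Linux /proc)."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def soft_nofile_limit(pid: int, proc_root: str = "/proc"):
    """Soft 'Max open files' limit of process `pid`, or None if not found."""
    with open(os.path.join(proc_root, str(pid), "limits")) as handle:
        for line in handle:
            if line.startswith("Max open files"):
                soft = line.split()[3]  # columns: name (3 words), soft, hard, units
                return None if soft == "unlimited" else int(soft)
    return None


if __name__ == "__main__":
    # Demonstrated on this script's own PID; pass the Java PID instead.
    if os.path.isdir("/proc/self/fd"):
        pid = os.getpid()
        print(count_open_fds(pid), "open of", soft_nofile_limit(pid), "allowed")
```

If the first number keeps growing over the hours while the crawl runs, that would point to a descriptor leak rather than a limit that is simply too low.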
luc
 
Posts: 301
Joined: Wed Aug 26, 2015 1:04 am

Re: Yacy won't re-start

Post by smokingwheels » Sat Jun 03, 2017 12:48 am

I know it's not recommended, but you could manually tune the Java options. Here is an example for running a Minecraft server: http://www.minecraftforum.net/forums/archive/alpha/alpha-survival-multiplayer/823328-making-your-server-lag-less-by-tuning-java There are plenty more to look at on other sites as well.

36G for freeworld/QUEUES is large; could you split the crawler lists up?
I have a program for that, but you would need to run QB64 and know what to change: https://github.com/smokingwheels/loklak_split
I could edit the program to suit and add it to GitHub if needed.

I think I have found a bug: when the "VIRT" memory of the Java process shown in top skyrockets (9m to 9.9 GB), it slows the crawler down no end and there is nothing you can do.
The "VIRT" value in top got up to 14 GB, and a system shutdown was causing constant disk I/O for so long that I hit reset and started again.

I have also learned that, to restart Yacy, you perform a shutdown of Yacy, pkill java if needed, and actually turn the machine off by shutting it down in the normal manner, then power it up again.

Ubuntu 16.04 x64, Oracle Java 1.8.131.
smokingwheels
 
Posts: 137
Joined: Sat Aug 31, 2013 7:16 am

Re: Yacy won't re-start

Post by smokingwheels » Mon Jun 05, 2017 7:01 pm

If you still have the "too many open files" error, a possible fix is to increase the number of open files allowed on your system:
https://www.tecmint.com/increase-set-open-file-limits-in-linux/

I have not really tried it yet.
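
As an illustration of the soft and hard limits involved, this Python sketch reads the open-file limit of the current process and raises the soft limit up to the hard limit where the platform allows it. It only affects the process that runs it (and its children), so it does not change a YaCy service that is already running; for that, the limit would have to be raised wherever the service is started, which is an assumption about the setup.

```python
import resource


def raise_nofile_soft_limit():
    """Raise this process's soft open-file limit to its hard limit, if possible.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY and soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            pass  # keep the old limit if the platform refuses the change
    return resource.getrlimit(resource.RLIMIT_NOFILE)


print(raise_nofile_soft_limit())
```

The soft limit is the same number that `ulimit -n` prints in a shell.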
smokingwheels
 
Posts: 137
Joined: Sat Aug 31, 2013 7:16 am

Re: Yacy won't re-start

Post by luc » Thu Jun 08, 2017 6:43 am

@eros, I have done a big cleanup of all the potential file handle leaks I could find in the YaCy codebase. I cannot guarantee this will solve your specific restart issue, but it could help.

So if you have some time, do not hesitate to upgrade to the latest GitHub sources.
luc
 
Posts: 301
Joined: Wed Aug 26, 2015 1:04 am

Re: Yacy won't re-start

Post by smokingwheels » Sun Jun 11, 2017 2:08 am

luc wrote: Indeed, I have again run some tests with large crawl queues (around 100000 to 220000 files in the QUEUES folder) on Debian Jessie, with the file-max system setting at its default value of 404027. But so far, after many stops and restarts of YaCy, including changes to the memory settings, I have not reproduced the error you had.


On my headless Debian 8 (Jessie), cat /proc/sys/fs/file-max returns 100847.
This page has more information: https://www.tecmint.com/increase-set-open-file-limits-in-linux/

If you set it too large, it slows down file saving.
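
For comparison with file-max, /proc/sys/fs/file-nr shows how many file handles the whole system has allocated right now. A small Python sketch to read it follows; the function names are made up for illustration. Keep in mind that file-max is system-wide, while the "Too many open files" text in the traces is the wording of the per-process error (EMFILE), so file-max may not be the limit that is actually being hit.

```python
import os


def parse_file_nr(text: str):
    """Parse /proc/sys/fs/file-nr: allocated handles, unused handles, system maximum."""
    allocated, unused, maximum = (int(part) for part in text.split())
    return {"allocated": allocated, "unused": unused, "max": maximum}


def read_file_nr(path: str = "/proc/sys/fs/file-nr"):
    """Read and parse the kernel's system-wide file handle counters."""
    with open(path) as handle:
        return parse_file_nr(handle.read())


if __name__ == "__main__":
    if os.path.exists("/proc/sys/fs/file-nr"):
        print(read_file_nr())
```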
smokingwheels
 
Posts: 137
Joined: Sat Aug 31, 2013 7:16 am

Re: Yacy won't re-start

Post by eros » Wed Jun 14, 2017 8:51 am

luc wrote: @eros, I have done a big cleanup of all the potential file handle leaks I could find in the YaCy codebase. I cannot guarantee this will solve your specific restart issue, but it could help.

So if you have some time, do not hesitate to upgrade to the latest GitHub sources.


Thanks Luc, I'll try it in a few days (I just started a new crawl and I think I'll let it finish before I start fiddling with that).

BTW: is there a way of automating a backup of the index? I tried looking at the "Process scheduler" but the index export operations don't appear there.
eros
 
Posts: 15
Joined: Fri May 12, 2017 1:56 pm

