Improving YaCy performance

Post by netsearch » Wed Oct 30, 2013 11:06 am

Hello,

As the index grows and performance is no longer optimal: in your experience, what are the most important factors for improving performance?

What else can be tuned in the configuration so that large indexes run better? I am probably still using the default values for most settings.

RAM, CPU, or something else?

Where is it most worthwhile to invest?

Thanks!

Re: Improving YaCy performance

Post by sixcooler » Wed Oct 30, 2013 12:57 pm

Hello,

I think your question can be answered clearly with 'more memory'.

How much memory have you assigned to your peer so far?
And where do you notice that performance is dropping?

Cu, sixcooler.

Re: Improving YaCy performance

Post by netsearch » Wed Oct 30, 2013 3:47 pm

I have 16 GB on the machine and have allocated about 8 GB to YaCy.

This is what is slow:

- Search results trickle onto the screen.

- When there are only a few results, they are often not displayed; when the same search is run again, the results usually do appear.

- Restarting YaCy takes forever (really forever...).

- The backend is not always quick either.

- Now and then YaCy barely responds at all for a few minutes.

Thanks for any further input!

Re: Improving YaCy performance

Post by smokingwheels » Tue Nov 12, 2013 4:14 pm

Well, my P4 runs fine. No eggs in one basket.
Dont assign much memory, try DEFAULT Setting first and Build Data Base, more PEERS is what is needed. Cross install Java. Delete Java.exe with icon.
I how Google Translate..

Re: Improving YaCy performance

Post by netsearch » Tue Nov 12, 2013 4:23 pm

smokingwheels wrote: Well, my P4 runs fine. No eggs in one basket.
Dont assign much memory, try DEFAULT Setting first and Build Data Base, more PEERS is what is needed. Cross install Java. Delete Java.exe with icon.
I how Google Translate..


Can you please post that in English? Unfortunately, the machine translation is not understandable...

Re: Improving YaCy performance

Post by smokingwheels » Tue Nov 26, 2013 3:35 pm

netsearch wrote:
smokingwheels wrote: Well, my P4 runs fine. No eggs in one basket.
Dont assign much memory, try DEFAULT Setting first and Build Data Base, more PEERS is what is needed. Cross install Java. Delete Java.exe with icon.
I how Google Translate..


Can you please post that in English? Unfortunately, the machine translation is not understandable...


OK, from memory, after a YaCy crash on my Windows P4:

Unless the YaCy server has SSD drives, do not increase the Java memory that much to improve performance.
Note: if you have a normal HDD, consider reformatting/repartitioning it with 512-byte sectors, because most of YaCy's disk activity is in 1024-byte units; the normal format is, e.g., 4096 bytes per sector.

I found that a Java.exe with an icon was loading to perform an update. I deleted that Java.exe (the one with the icon), which I located via a search window. YaCy was not affected, there are no more automatic updates, and the icon is gone.

The CPU speed is not the problem; it's the hard disk access time. Typically, an old P4 server with SCSI disks can have roughly a quarter of the data latency of a SATA disk drive. So try to build your search index first, BACKUP, and then play with the Java memory settings.

Happy to be proven wrong.

Re: Improving YaCy performance

Post by fherb » Wed Dec 18, 2013 8:14 pm

My experiences are similar.

When you have a very slow processor (like the ARM in my Raspberry Pi), you will see the processor load at around 100%, and that is the bottleneck. Otherwise, look at your HDD I/O access. On Linux you can use iotop. Write access is not so intensive, but read access is.

When HDD access slows things down too much and the web interface responds in slow motion, the following can help to make the system more stable (I tried it on a dual-core ARM system and on a virtual server): increase the busy-sleep time of crawling and DHT distribution, and set performanceIO to a higher value. performanceIO is a percentage value that specifies how much time is used for I/O processes. This gives the system more time to work with the hard disk and to respond on the web front-end. I don't know if this is the best way, but it seems to help when the system starts to lag. What you will not get this way, however, is a higher crawl and DHT distribution rate. For that you should think about a faster HDD (an SSD, of course) or a faster HDD interface if yours is not the newest one. But I'm not sure whether the cost of changing the hardware is in good proportion to the benefit of YaCy for your system.

Re: Improving YaCy performance

Post by netsearch » Thu Dec 19, 2013 11:09 am

In the meantime I have bought a new server for YaCy with 84 GB RAM and an 8-core processor.

Performance when searching is now better.

As for the hard disks: I don't know how large the available SSDs are, and the index is huge.

But crawling is not really faster; I would still like to speed that up...

Re: Improving YaCy performance

Post by smokingwheels » Thu Dec 19, 2013 12:14 pm

netsearch wrote: In the meantime I have bought a new server for YaCy with 84 GB RAM and an 8-core processor.

Performance when searching is now better.

As for the hard disks: I don't know how large the available SSDs are, and the index is huge.

But crawling is not really faster; I would still like to speed that up...


Wow, what's your portal? I will use yours. http://?

Re: Improving YaCy performance

Post by Orbiter » Thu Dec 19, 2013 12:16 pm

When you have more RAM in the server, you must assign it in /Performance_p.html; otherwise YaCy will not use it.
Crawling cannot be sped up much with RAM, since crawling obeys speed rules that depend on the response time of the crawled servers. You can speed up crawling by crawling many different remote servers at the same time. YaCy can index tens of thousands of documents per minute, but only if the number of remote servers is large.
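
To illustrate why the number of distinct remote servers matters more than RAM here, the following is a minimal sketch, not YaCy's actual scheduler. It assumes a fixed pause between two requests to the same host; the function name and the 2-second delay are hypothetical, and the real rules depend on each server's response time.

```python
def max_pages_per_minute(num_hosts: int, delay_per_host_s: float) -> float:
    """Upper bound on the crawl rate when every host is fetched at most
    once per `delay_per_host_s` seconds (assumed, simplified rule)."""
    if num_hosts < 0 or delay_per_host_s <= 0:
        raise ValueError("num_hosts must be >= 0 and delay_per_host_s > 0")
    return num_hosts * 60.0 / delay_per_host_s


# The bound grows linearly with the number of distinct hosts,
# independent of how much RAM the crawling machine has.
for hosts in (10, 100, 1000):
    print(hosts, "hosts ->", max_pages_per_minute(hosts, 2.0), "pages/min at most")
```

Under this assumption, a crawl spread over 1000 hosts can reach tens of thousands of pages per minute, while a crawl over 10 hosts stays in the hundreds no matter how strong the hardware is.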

Re: Improving YaCy performance

Post by netsearch » Thu Dec 19, 2013 12:54 pm

Hi

Of course I have assigned the RAM to YaCy ;-)

Regarding crawling: does that mean it will be faster if I start more crawling tasks at the same time?

At the moment I have about 10 crawlers running, so should I increase that to 30 or so?

Thanks

Re: Improving YaCy performance

Post by fherb » Fri Dec 20, 2013 12:19 am

84GB???

Do you design your own motherboards? ;)

But, Orbiter, should YaCy really be designed to work on such fat servers? If the power of YaCy is the network of many users, YaCy should get most of its power from many standard PCs in a big network (and don't forget that more and more laptops are replacing PCs at home). Or what if YaCy could use the large number of NAS devices that users have installed in recent years? Those are really widely distributed servers!

I have been reading across the forum for the last 3 weeks. What I found was that a big index doesn't need a big HDD; it needs a lot of RAM. Or is this a wrong conclusion? Maybe it is a fast computing solution to keep the index in the Java heap. But users normally have some 100 GB of HDD space free and need the RAM for their work. I think that no more than 1/8 to 1/4 of the RAM should be used in the background by services like YaCy. For a typical PC or laptop, that is a range of 250 MB to 2 GB.
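
The 1/8 to 1/4 suggestion can be worked through as a small calculation. This is only an illustration of the arithmetic; the function name and the example machine sizes (2 GB and 8 GB of installed RAM) are assumptions chosen to match the quoted range.

```python
def background_ram_budget_mb(total_ram_mb: float) -> tuple[float, float]:
    """Return (lower, upper) RAM budget in MB for a background service,
    using the 1/8 to 1/4 share suggested in the post."""
    return total_ram_mb / 8, total_ram_mb / 4


# Hypothetical machine sizes: a 2 GB laptop and an 8 GB desktop.
for total_mb in (2048, 8192):
    low, high = background_ram_budget_mb(total_mb)
    print(f"{total_mb} MB installed -> {low:.0f} to {high:.0f} MB for the service")
```

One eighth of 2 GB is 256 MB and one quarter of 8 GB is 2048 MB, which is roughly the 250 MB to 2 GB range mentioned above.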

Maybe a dedicated YaCy search server can have the full 8 to 32 GB for YaCy alone. But when we need to upgrade a computer to 84 GB just so that YaCy can make use of the HDD and LAN performance... that is not the right proportion, in my opinion.

Best greetings,
Frank

Re: Improving YaCy performance

Post by Orbiter » Fri Dec 20, 2013 9:43 am

While I constantly try to keep the memory and CPU demand low (I partly develop and test on a 2006 MacBook), there may be use cases with very large memory demands. Memory may speed up search performance, and it ensures the capacity for large indexes, while the p2p architecture ensures unlimited scalability.
Which means: @fherb is right to demand that YaCy work without high CPU and RAM requirements, favoring the p2p technology, while @netsearch is right to expand his capacity for high load and a high document count.
I really like tests on such strong hardware because they strengthen the case for professional use of YaCy. It is costly to do such tests, and it is very valuable to have such users and their experience reports here.
At the other end of the performance range is hardware like the Raspberry Pi, which I would also like to see as a YaCy platform. But it is not right to demand that YaCy be made for a single kind of configuration and a 'typical' class of home computers.

Re: Improving YaCy performance

Post by netsearch » Fri Dec 20, 2013 12:26 pm

I started with YaCy on a smaller server, but I reached its limits very soon because I try to collect as many URLs as possible (basically I don't set any limits).

I think a restricted index with a few thousand pages/URLs will run without problems on any regular machine.

I am running the peer called zerberos, which has so far indexed more than 42 million pages; not many peers have that many pages.

And yes, I have invested a few thousand in the server, just because I am interested to see what happens when the index grows bigger :D

Orbiter, if you have any questions about the system or suggestions for improving the spidering speed, don't hesitate to contact me.

Also, I posted a question some time ago: I wonder whether direct access to the Solr server from Java would be possible. Maybe you know?

And I would have some more technical questions about what information can be extracted...

Re: Improving YaCy performance

Post by zottel » Thu Jan 30, 2014 2:56 pm

Is it somehow possible to have yacy run reliably when the index becomes larger?

I've given yacy 3500 MB RAM on my VPS (need the rest for other stuff running on the server).

I allow remote crawls and DHT and have YaCy index everything I browse using the Greasemonkey script, so the index will never stop growing. Currently, there are about 9 million documents and 31 million citations in the index.

I'm restarting yacy four times a day using a cron job. Plus, I have it delete documents older than 28 days every day.

This worked well for a month or so, but not anymore. YaCy generally works well for about half an hour, or an hour at most, after it has been started.

Then viewing pages of the admin interface still works, and search works initially, but when I try to switch to the second page of results, the first page is shown again.

Then more and more short memory cycles are showing up in the log, then the first exceptions for not enough Java heap space start coming up. After a while, the log consists almost exclusively of "W 2014/01/30 14:43:51 COLLECTION d[] is empty, iid=…" lines.

When the next restart is reached, in 75% of the cases a clean shutdown is not possible anymore, and the stop script kills yacy.

This at least worked better until a week or so ago. Until that point, YaCy used to stop the crawlers before memory became too low, so it at least stayed more or less usable (though searches often still didn't work anymore). Now, YaCy very seldom stops the crawlers, and when it does, it's much too late.

What can I do to keep my index at a size my VPS can handle? I.e. as big as possible so that yacy can at least run without problems for six or seven hours or so? Is there a rule of thumb how many documents yacy can handle with a certain memory size?

Why is it always working well at the beginning, but stops doing so later? What is eating up so much memory over time? It seems to me that crawling is the biggest problem here, but why would I run a search engine if I can't have it crawl? :-)

Are there any optimizations I could do?

Re: Improving YaCy performance

Post by David » Sun Feb 02, 2014 11:32 pm

The bigger your index grows, the more RAM it needs. There's nothing you can do about it. You should probably stop crawling new pages before it gets too big, and uncheck "Index Receive" in the network configuration: http://localhost:8090/ConfigNetwork_p.html. If you want to index additional pages, you either have to upgrade your RAM or start a new index on another computer.

zottel wrote: I'm restarting yacy four times a day using a cron job.

Wow. This should not be necessary. Under normal circumstances a healthy peer should be able to run for several days, weeks or even months without the need for restarting.

zottel wrote: Is there a rule of thumb how many documents yacy can handle with a certain memory size?

I'm not 100% sure, but as far as I know, with 15 GB of RAM you should be able to maintain an index of 50 to 60 million links. So with 3.5 GB you can probably run an index of 10 to 12 million links. But as I said, I'm not really sure about this. It depends on many different factors.
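
The rule of thumb above can be turned into a rough calculator. This is a sketch that assumes the link capacity scales linearly with RAM from the single reference point in the post (15 GB for 50 to 60 million links); linear scaling is an assumption, and the function and constant names are made up for illustration.

```python
REFERENCE_RAM_GB = 15.0                       # reference point from the post
REFERENCE_LINKS = (50_000_000, 60_000_000)    # links manageable at that size


def estimate_max_links(ram_gb: float) -> tuple[int, int]:
    """Linearly extrapolate the (low, high) link capacity for `ram_gb`.
    Assumes capacity is proportional to RAM, which is only a rough guess."""
    low, high = REFERENCE_LINKS
    return (round(low * ram_gb / REFERENCE_RAM_GB),
            round(high * ram_gb / REFERENCE_RAM_GB))


low, high = estimate_max_links(3.5)
print(f"3.5 GB -> roughly {low / 1e6:.1f} to {high / 1e6:.1f} million links")
```

For 3.5 GB this gives about 11.7 to 14 million links, in the same ballpark as the 10 to 12 million quoted above; since it depends on many other factors, it should be read as an order of magnitude and not as a limit.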

Re: Improving YaCy performance

Post by davidm » Mon Feb 03, 2014 4:30 am

Is there really no mechanism to automatically remove the oldest/least accessed data once we approach the upper limits of the server? One would think this would be the way to do things and that it would be important to keep the node stable. I can't see a use case where we should prefer a crash or lockup over removing the oldest data... unless I am missing something.

On a positive note, YaCy is one of the most beautiful and well-designed projects I have ever seen. The web admin is simply amazing. You can tell there are some people involved in the project who really care about it.

Re: Improving YaCy performance

Post by anonufe » Wed Feb 05, 2014 10:37 pm

The requested option does not exist directly, but it is possible, for example, to remove all entries from the index that are older than a specified time. I guess it makes sense to drop search results older than one year, for example. This can be achieved in two steps:

1.) Go to "http://localhost:8090/IndexDeletion_p.html" and delete entries older than a specified age.
2.) At "http://localhost:8090/Table_API_p.html" you find this deletion as a recorded action and you've got the possibility to set it as a repeated action, e.g. after every start.

Re: Improving YaCy performance

Post by zottel » Fri Feb 07, 2014 3:26 pm

I had done that already and even daily removed everything older than 28 days. :-)

But I'm not sure if it might have been a problem of config leftovers etc., because a few days after my post, I completely dumped my yacy installation and everything in it to start from scratch.

Of course, the index on my peer is not even half as large as it was before, but yacy has been running like a charm for a whole week now, and it's still fast and everything works.

When I first set up YaCy a year or so ago, I immediately gave it a number of very big crawls, and the index was quickly large enough to make it unstable. So I gave up on it, later tried again, but kept my config and only deleted the index. I then set up periodic index cleaning, and that worked for a while, but the index still became too large to handle, it seems. And: at some point, when I was originally trying to get it to work for more than a few hours, I had switched to Generation Memory Strategy and had also fiddled with a lot of other settings, hoping to make it work. These changes were never reverted, and when I tried to switch back to Standard Memory Strategy in January, it didn't work; it always set itself back to Generation Memory Strategy.

So it might be that something was very wrong with my settings, too.

Now I'm back to defaults, and at least at the current index size of about 4 million documents, everything works very smoothly. I also didn't allow remote crawls this time, so the index doesn't grow too quickly.

I'm now indexing the public parts of the Red Matrix ( https://redmatrix.me ) daily, have yacy crawl everything I visit, and accept DHT transfer. Let's see how large the index can become now that I'm back to defaults. And I think I'll add periodic deletion of old documents, anyway, maybe it will never grow too large then.

Re: Improving YaCy performance

Post by Orbiter » Fri Feb 07, 2014 4:50 pm

davidm wrote: One would think this would be the way to do things and that it would be important to keep the node stable.

I see that this feature is now really missing for operating YaCy on limited devices like an RPi. There is not one single reason why this feature has not been implemented yet, but many:
- time for development
- unanswered questions about the deletion strategy (delete the least accessed, or the oldest?)
- a missing architecture for the deletion (two databases, RWI and Metadata/Solr, must be cleaned in balance, and doing that efficiently is not easy), and
- the philosophical contradiction (a search engine which rejects censoring deletes its own data).

There are also some administration questions, like
- should 'auto-delete' be a default setting if resources are not available (if not set by default, most people will not enable it so it does not work as peer-protection)
- should the limit be set by remaining space ("df ." does not work on all systems) or to-occupy-space (requires frequent counting of all file sizes in DATA)
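
The two ways of setting the limit mentioned in the last point can be sketched as follows. This is not YaCy code, just an illustration of the trade-off: the helper names and thresholds are hypothetical. `shutil.disk_usage` asks the operating system for the remaining space in one cheap call, while the occupied-space variant has to walk every file below the data directory.

```python
import os
import shutil


def free_bytes(path: str) -> int:
    """Remaining space on the file system that holds `path`."""
    return shutil.disk_usage(path).free


def occupied_bytes(root: str) -> int:
    """Total size of all regular files below `root`.
    Walks the whole tree, so it gets slower as the data directory grows."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # do not count symlink targets
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file disappeared while walking
    return total


def should_auto_delete(root: str, min_free: int, max_occupied: int) -> bool:
    """True if either limit (hypothetical thresholds, in bytes) is violated."""
    return free_bytes(root) < min_free or occupied_bytes(root) > max_occupied
```

A caller would run something like `should_auto_delete("DATA", min_free=..., max_occupied=...)` periodically; the threshold values are left open here because the thread does not propose any.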

Re: Improving YaCy performance

Post by davide » Sun May 24, 2015 11:39 pm

This is an idea I was actually going to propose. Regardless of how big the "muscle" is, the server will unavoidably hit its hardware limits sooner or later, unless the administrator undertakes the tedious task of periodically checking memory and disk consumption and clearing the index accordingly.

Has there been any progress with this, so far?

Re: Improving YaCy performance

Post by davide » Sun May 24, 2015 11:44 pm

@netsearch
Your hardware configuration is interesting. I'm going to buy something similar next week (I have already bought 12 HDDs :)

My concern, based on what I can observe, is that I no longer see your host in the YaCy network. Did you take it offline because of some problem or defect you found in YaCy?