Multi-threaded access to SOLR index and RAID1 load balancing

Post by davide » Thu Sep 24, 2015 1:25 am

We can expect that a substantial share of the high-end YaCy nodes out there with large indexes store their laboriously crawled data on some sort of redundant RAID, to protect months of crawling work against data corruption.

On my own node, I have a medium-sized index of 21M records taking up 220 GB of storage, mirrored on a two-disk software RAID1 driven by the Linux md driver.
The md RAID1 driver can split concurrent read requests across its component devices, increasing read throughput almost proportionally to the number of devices.
To benefit from this, however, md needs the requests to come from different threads. When they do, the IOPS across the mirror can reach appreciable values even on mechanical disks, maybe high enough for YaCy to serve local results with a "realtime" delay.
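
To make "requests coming from different threads" concrete, here is a minimal sketch in Python of a reader that keeps several reads in flight at once. It is not how YaCy or SOLR read their index; the function names, chunk size and worker count are my own assumptions, chosen only to illustrate the access pattern that gives md a chance to balance reads across the mirror.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_chunk(path, offset, length):
    """Read up to `length` bytes at `offset` through a private file descriptor."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = bytearray()
        # pread may return fewer bytes than requested, so loop until done or EOF.
        while len(data) < length:
            part = os.pread(fd, length - len(data), offset + len(data))
            if not part:
                break
            data += part
        return bytes(data)
    finally:
        os.close(fd)


def parallel_read(path, chunk_size=1 << 20, workers=4):
    """Read the whole file with `workers` threads; return the byte count read."""
    size = os.path.getsize(path)
    offsets = range(0, size, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker issues its own read, so several requests are in flight at once.
        chunks = pool.map(lambda off: read_chunk(path, off, chunk_size), offsets)
        return sum(len(c) for c in chunks)
```

Pointing `parallel_read` at a large file on the mirror while watching `atop` should show whether both disks get busy. Data already in the page cache never reaches the disks, so the test file should be larger than RAM or the cache should be dropped first.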

However, running a YaCy search query on the local index does not appear to distribute the load across the RAID devices: one of the two disks receives 10 times more read requests than the other, as reported by `atop`. It therefore appears that YaCy (SOLR) performs most of the intensive index reads from a single thread and does not take advantage of the full hardware potential, which could be several times higher on large RAID setups.
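
For anyone who wants to reproduce the measurement without `atop`, a rough sketch: Linux exposes per-device counters in `/proc/diskstats`, where the third field is the device name and the fourth is the number of reads completed. The helper names below and the device names `sda` and `sdb` are assumptions for illustration; substitute the actual members of the mirror.

```python
def parse_reads(diskstats_text):
    """Map device name -> reads completed, from /proc/diskstats content."""
    counts = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            # fields: major, minor, device name, reads completed, ...
            counts[fields[2]] = int(fields[3])
    return counts


def read_shares(before, after, devices):
    """Fraction of the reads issued between two snapshots that hit each device."""
    deltas = {d: after[d] - before[d] for d in devices}
    total = sum(deltas.values())
    return {d: (deltas[d] / total if total else 0.0) for d in devices}
```

Take one snapshot of `/proc/diskstats` before the search query and one after, then call `read_shares(parse_reads(before_text), parse_reads(after_text), ["sda", "sdb"])`. A well-balanced mirror should give shares close to 0.5 each; the 10:1 ratio described above would show up as roughly 0.91 and 0.09.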

If this is correct, how could the issue be worked around?
davide
 
Posts: 78
Registered: Fri Feb 15, 2013 8:03 am

Re: Multi-threaded access to SOLR index and RAID1 load balan

Post by smokingwheels » Fri Sep 25, 2015 4:20 pm

Have you tried pushing YaCy into realtime?
Experiment with http://www.thegeekstuff.com/2013/08/nic ... -examples/
Create a syacy.sh to launch YaCy.
I think it puts it at a lower priority than you set, so just increase or decrease until the desired level is found.
nice -10 ....
smokingwheels
 
Posts: 107
Registered: Sat Aug 31, 2013 7:16 am

Re: Multi-threaded access to SOLR index and RAID1 load balan

Post by davide » Fri Sep 25, 2015 4:36 pm

Priority level is most likely unrelated to the number of I/O threads.

Re: Multi-threaded access to SOLR index and RAID1 load balan

Post by smokingwheels » Fri Sep 25, 2015 5:30 pm

Please PM me the server address and I will give it a bit of a load test...

Re: Multi-threaded access to SOLR index and RAID1 load balan

Post by davide » Fri Sep 25, 2015 5:45 pm

No problem, here's a temporary address (it will become public in a minute): tts.hwcharts.com:8090

Do whatever you like to it, make it collapse if needed, so we can understand its weaknesses.

Re: Multi-threaded access to SOLR index and RAID1 load balan

Post by davide » Fri Sep 25, 2015 6:01 pm

Never mind, the server is offline because of bugs in KVM.
I have no time for this now.

BTW, a moment ago it was online, and YaCy refused connections from external IPs; only 192.* and 127.* were allowed.

Re: Multi-threaded access to SOLR index and RAID1 load balan

Post by davide » Fri Sep 25, 2015 7:47 pm

OK, I upgraded KVM and the VMs are up.

Still, my question remains: does YaCy perform multi-threaded reads? Does it take advantage of software RAID?
Why is Flashcache so dear to some users (Botec, apparently) that it even deserves a place in the documentation, if YaCy apparently isn't even able to use RAID properly?
Last edited by davide on Fri Sep 25, 2015 7:52 pm, edited 1 time in total.

Re: Multi-threaded access to SOLR index and RAID1 load balan

Post by smokingwheels » Fri Sep 25, 2015 7:50 pm

Try this...
nano crontab
*/1 * * * * apt-get update
*/<whatever you think is a random number 1-59> * * * * apt-get update

Ctrl+X
Y
#acdc @smokingwheels

Re: Multi-threaded access to SOLR index and RAID1 load balan

Post by smokingwheels » Fri Sep 25, 2015 7:56 pm

davide wrote: OK, I upgraded KVM and the VMs are up.

Still, my question remains: does YaCy perform multi-threaded reads? Does it take advantage of software RAID?
Why is Flashcache so dear to some users (Botec, apparently) that it even deserves a place in the documentation, if YaCy apparently isn't even able to use RAID properly?


Hey, my dad was a Z80 programmer and found mistakes in Rodnay Zaks..

When he found a new error he used to be like a little kid: he would pencil it in and tell me about it.

Re: Multi-threaded access to SOLR index and RAID1 load balan

Post by Orbiter » Sat Sep 26, 2015 11:11 pm

I don't know if I am doing this right, but this thread had two more posts, which I just deleted because I believe their content was too emotional. One poster asked to continue in another thread, which is now here: viewtopic.php?f=8&t=5684
Orbiter
 
Posts: 5778
Registered: Tue Jun 26, 2007 10:58 pm
Location: Frankfurt am Main

