We can expect that a substantial portion of the major high-end YaCy nodes out there with large indexes store their laboriously crawled data on some sort of redundant RAID, to protect months' worth of crawling from data loss.
On my own node, I have a medium-sized index of 21M records taking up 220GB of storage, mirrored on a two-disk software RAID1 managed by the Linux md driver.
The md RAID1 driver can distribute concurrent read requests across its component devices, increasing read throughput almost proportionally to the number of devices.
To take advantage of this, however, md needs the requests to come from different threads. When that is the case, the IOPS across the mirror can rise to appreciable levels even on mechanical disks, perhaps high enough for YaCy to serve responsive local results within a "realtime" delay.
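To see this effect in isolation, one can run a small multithreaded random-read benchmark against a file on the array and watch per-disk request counts in `atop` while varying the thread count. Here is a minimal sketch (the path `/data/testfile` is hypothetical; the file should be much larger than RAM, or the page cache should be dropped first, so the reads actually hit the disks):

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ThreadLocalRandom;

public class MirrorReadBench {
    public static void main(String[] args) throws Exception {
        final int threads = args.length > 0 ? Integer.parseInt(args[0]) : 4;
        final long durationMs = 10_000;
        // Hypothetical test file on the md RAID1 array; must be > 4KiB.
        try (FileChannel ch = FileChannel.open(Paths.get("/data/testfile"),
                                               StandardOpenOption.READ)) {
            final long size = ch.size();
            Thread[] workers = new Thread[threads];
            for (int i = 0; i < threads; i++) {
                workers[i] = new Thread(() -> {
                    ByteBuffer buf = ByteBuffer.allocate(4096);
                    long end = System.currentTimeMillis() + durationMs;
                    long ops = 0;
                    try {
                        while (System.currentTimeMillis() < end) {
                            long pos = ThreadLocalRandom.current().nextLong(size - 4096);
                            buf.clear();
                            // Positional read: thread-safe, so each worker issues
                            // independent requests that md can route to either disk.
                            ch.read(buf, pos);
                            ops++;
                        }
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                    System.out.println(Thread.currentThread().getName() + ": " + ops + " reads");
                });
                workers[i].start();
            }
            for (Thread t : workers) t.join();
        }
    }
}
```

With a single thread I would expect the requests to land mostly on one disk; with several threads, both mirror legs should show comparable read counts.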
However, running a YaCy search query against the local index does not appear to distribute the load across the RAID devices: according to `atop`, one of the two disks receives roughly 10 times more read requests than the other. From this it appears that YaCy (Solr) performs most of the intensive index reads from a single thread, and therefore doesn't exploit the full hardware potential, which could be several times higher on larger RAID setups.
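For what it's worth, Lucene itself can parallelize a single search across index segments when the `IndexSearcher` is constructed with an executor; whether YaCy's embedded Solr ever searches this way, or could be configured to, is exactly what I'm unsure about. A minimal sketch of that API against a plain Lucene index (the path `/data/index` and the field name `text_t` are hypothetical, and the parallelism only helps if the index has multiple segments):

```java
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class ConcurrentSearch {
    public static void main(String[] args) throws Exception {
        // One thread per mirror leg at minimum; more can still help.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try (DirectoryReader reader = DirectoryReader.open(
                FSDirectory.open(Paths.get("/data/index")))) {
            // Passing an executor lets IndexSearcher fan the search out
            // across segment slices, producing the concurrent reads that
            // md RAID1 can balance between disks.
            IndexSearcher searcher = new IndexSearcher(reader, pool);
            TopDocs hits = searcher.search(new TermQuery(new Term("text_t", "yacy")), 10);
            System.out.println("total hits: " + hits.totalHits);
        } finally {
            pool.shutdown();
        }
    }
}
```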
If this is correct, how could the issue be worked around?