I deliberately blacklisted a few German-language (.de, .ch) hosts listed in "/CrawlResults.html?process=3". As a result, my index size dropped from 4M to 3.9M and about 1GB of hard disk space was released. My own crawler never fetched these URLs: it has a whitelist filter that allows only 5 .com domains.
Now I wonder: what is actually stored on my hard disks? How many GB am I wasting on documents I don't want?
I have drawn up a table of the costs of progressive hardware upgrades, including hard drive purchases, so storing documents I don't need is an extra, parasitic cost.
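As a rough back-of-the-envelope estimate, the figures from the blacklisting can be turned into a per-document storage cost. This is only a sketch: it assumes the 4M and 3.9M figures are document counts, that "about 1GB" means 10^9 bytes, and that disk usage scales linearly with the number of documents, none of which has been verified. The `wasted_gb` helper and the 500,000 figure are hypothetical, for illustration only.

```python
# Rough estimate only. Assumptions (not verified):
#  - the index sizes are document counts
#  - "about 1GB" is taken as 10**9 bytes
#  - disk usage scales linearly with document count
docs_before = 4_000_000
docs_after = 3_900_000
freed_bytes = 1 * 1000**3

removed_docs = docs_before - docs_after
bytes_per_doc = freed_bytes / removed_docs


def wasted_gb(unwanted_docs: int) -> float:
    """Estimate the disk space (in GB) taken by a given number of unwanted documents."""
    return unwanted_docs * bytes_per_doc / 1000**3


print(f"~{bytes_per_doc / 1000:.0f} KB per document")
# 500_000 is a made-up count, used only to show how the helper is called.
print(f"{wasted_gb(500_000):.1f} GB for a hypothetical 500,000 unwanted documents")
```

The real number of unwanted documents is exactly what I don't know, so the estimate is only useful once that count can be read from the index.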
Specifically, I need to know whether the reverse can happen: that is, whether my documents can end up stored on others' computers. I don't trust others' equipment as much as my own to protect against data loss. If my documents do end up on others' hard disks, will copies still be kept on my own disks, or could they be moved off my server entirely?