Planet - Solar System - Deep Space

Ideas and suggestions are welcome.

Post by tinkerphone » Thu Oct 16, 2014 10:04 pm

Since this is a wishlist, these are my wishes for the future:

1. Planet:
a) Augmented crawling with a private and an open mode. Private = protected private index. Open = can be distributed to the network.
b) A cache of the most prominent searches and their results within the YaCy network.
c) A search frontend for the cache and for YaCy peers
(no "real" search engine!)

2. Solar System:
An extension to the planet. It adds the crawler, Solr and DB features. Planet + Solar System = what we know as YaCy, but in components.

3. Deep Space:
A special index for the rarest results. You can launch a deep space probe which will search for specific information on the net. Its crawler evaluates every visited page but indexes only those pages which contain the search pattern. A webgraph is built and used to determine which "galaxies" do not contain the search pattern. The starting vector towards such a "dead galaxy" is blacklisted. If more than one probe is sent, the blacklist can be used to avoid dead galaxies. Each probe is given a certain lifespan (number of links to follow).
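The probe idea can be sketched in a few lines. This is only an illustration under my own assumptions, not YaCy code: the "web" is an in-memory dict, a "starting vector" is taken to be an outgoing link of the start page, and a vector counts as "dead" only if it was fully explored without a hit. The names `launch_probe`, `web`, `lifespan` and `blacklist` are all hypothetical.

```python
import re
from collections import deque

def launch_probe(web, start, pattern, lifespan, blacklist=None):
    """Crawl `web` (url -> (text, links)) breadth-first from `start`.

    Returns (hits, dead_vectors). Every visited page is evaluated,
    but only pages matching `pattern` are "indexed" (added to hits).
    """
    blacklist = set(blacklist or ())
    regex = re.compile(pattern)
    hits = []
    hits_per_vector = {}  # starting vector -> number of hits found below it
    seen = {start}
    queue = deque()
    # The start page is only a launch pad; each of its links is a starting vector.
    for link in web[start][1]:
        if link not in blacklist and link not in seen:
            seen.add(link)
            hits_per_vector[link] = 0
            queue.append((link, link))
    followed = 0
    while queue and followed < lifespan:  # lifespan = number of links to follow
        url, vector = queue.popleft()
        followed += 1
        text, links = web.get(url, ("", []))
        if regex.search(text):
            hits.append(url)
            hits_per_vector[vector] += 1
        for link in links:
            if link not in seen and link not in blacklist:
                seen.add(link)
                queue.append((link, vector))
    # Vectors with pages still waiting in the queue were not fully explored,
    # so they cannot be declared dead yet.
    pending = {vector for _, vector in queue}
    dead = {v for v, n in hits_per_vector.items() if n == 0 and v not in pending}
    return hits, dead

web = {
    "start": ("portal", ["a", "b"]),
    "a": ("nothing here", ["a1"]),
    "a1": ("still nothing", []),
    "b": ("intro", ["b1"]),
    "b1": ("rare pattern found", []),
}
hits, dead = launch_probe(web, "start", r"rare pattern", lifespan=10)
# A second, short-lived probe reuses the blacklist of the first one.
hits2, dead2 = launch_probe(web, "start", r"rare pattern", lifespan=2, blacklist=dead)
# The same short-lived probe without the blacklist wastes its lifespan on "a".
hits3, dead3 = launch_probe(web, "start", r"rare pattern", lifespan=2)
```

In this toy graph the first probe marks "a" as a dead galaxy. With that blacklist, a probe that may follow only two links still reaches the hit, while the same probe without the blacklist comes back empty.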

Along with some other dings and dangs, this comes pretty close to my idea of a P2P search engine, and YaCy is the nicest foundation I have found so far. :)
Re: Planet - Solar System - Deep Space

Post by tinkerphone » Fri Oct 17, 2014 10:59 am

Oh ha!:
Orbiter wrote: The function is already there!
In the Expert Crawl Start, under the section "Document Filter", please use the regular expression for "Filter on Content of Document".

If you only want to filter for a single word "wort", the expression there is ".*wort.*". For two words "wort1" and "wort2", the regular expression is ".*wort1.*|.*wort2.*". To prepare a suitable filter, you can test regular expressions at /RegexTest.html.
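The expressions from the quote can also be tried outside YaCy. The following is a small sketch, not YaCy's implementation: it assumes the filter has to match the whole document text (which is why the words are wrapped in ".*"), and it uses Python's `re.fullmatch` with `re.DOTALL` to mimic that. The helper names `build_content_filter` and `passes_filter` are made up for this example; how YaCy itself treats line breaks is not stated in the thread.

```python
import re

def build_content_filter(words):
    """Build a pattern of the form .*w1.*|.*w2.* from plain words."""
    return "|".join(".*" + re.escape(word) + ".*" for word in words)

def passes_filter(pattern, text):
    """True if the pattern matches the complete text (assumed filter behaviour)."""
    # DOTALL lets "." also match line breaks; this is an assumption of the sketch.
    return re.fullmatch(pattern, text, flags=re.DOTALL) is not None

single = build_content_filter(["wort"])
double = build_content_filter(["wort1", "wort2"])

print(single)
print(double)
print(passes_filter(single, "ein wort im text"))
print(passes_filter(double, "erste zeile\nhier steht wort2"))
print(passes_filter(double, "kein treffer"))
```

A page would only be handed to the indexer when `passes_filter` returns True; everything else is crawled for its links but not indexed.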

This means that the function to "send deep space probes" is already there! If you use the document filter in the expert crawling section, the crawler will only send those URLs to the indexer whose content matches the phrase. It would be great if we could have separate indexes for those cases.

This could have the following benefits:
1. Your "special" queries can stay confidential; the separate index can be excluded from the DHT.
2. It is very simple to verify results from this special query, because they are in a separate index.
3. An easy and understandable workflow if you want to crawl the complete domain where the hit was found.
4. A mysterious-sounding and very useful feature for the frontend:
-> Sorry, no results found
-> "Send Deep-Net Probe".
You can get a notification by email when the probe returns.
() Keep my query confidential.
(x) Send me a link to my query with the notification.
