Some questions about using Yacy as a local indexer

Post by frderi2 » Wed Mar 09, 2016 5:52 pm

I came across YaCy recently, and I must say that what the project has achieved so far is quite impressive. I'm evaluating YaCy as a local file and intranet web indexer, because I see a very valid use case for a search engine that indexes both internal web-based resources (websites, wiki pages, issue trackers, etc.) and unstructured data from file servers, and presents the results in one consistent UI.

First, I understand that this use case is probably not the main focus of this project, so I accept that some features, such as access rights for specific content and the resulting omission of that content from certain users' results, will be unavailable in this search engine. I'd therefore limit the scope of the indexed data to information that should be available to all users of the installation.

But even with this in mind, YaCy doesn't seem able to do what I need, for two simple reasons:
-> To index the content of file servers on the LAN, I suppose I should use an smb:// URL. However, no sane person makes file servers available without a password, and I haven't found any way to make YaCy authenticate to a remote file server with a login and password.
-> For internal wiki pages, hardly anyone who sets up an internal information system in a multi-user environment does so without some kind of authorization on the content. Again, I have yet to find a way to make YaCy handle this kind of authorization.

Could this functionality be implemented in a future version? It probably isn't very hard to do, since all other indexers aimed at this use case support it, and it would make the difference between YaCy being usable or not in an intranet setup. If there is already another way to do this, I'd like to learn about it; the wiki wasn't any help in this regard.

Thank you!
Posts: 1
Registered: Wed Mar 09, 2016 5:27 pm

Re: Some questions about using Yacy as a local indexer

Post by luc » Tue Mar 22, 2016 3:28 pm

Hello, as far as I know, YaCy is currently designed to index resources in a given network (internet, intranet, custom...) and to provide access to, and search through, its index for all users inside that network.

But inside an intranet, I guess the following configuration should work (I have not tested it) and could fit your needs:
- one wiki instance
- one issue tracker
- one YaCy instance having full access without authentication to wiki and issue tracker resources
- access to the wiki, tracker and YaCy instances is restricted to intranet users by the same CAS SSO server

But of course things become more complicated if you want a YaCy peer running on each intranet user's computer, if full access cannot be opened between the resource servers and the YaCy server, or if different credentials have to be applied to each resource. I think adapting YaCy to these needs represents a non-negligible amount of work.
Posts: 313
Registered: Wed Aug 26, 2015 1:04 am

Re: Some questions about using Yacy as a local indexer

Post by Orbiter » Wed Mar 30, 2016 9:30 am

Hi frderi2,
authorization/authentication questions regarding smb and/or wiki servers have been discussed many times in the past. The problem is that any information YaCy is able to crawl using an authentication method would be leaked to users without that authentication, as soon as they can use YaCy to find at least the link to those resources. YaCy would need to provide the same authentication methods as the resources it had accessed. That's a problem, and not an easy one to solve.
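To make the leak described above concrete, here is a minimal Python sketch of "security trimming": each index entry records which groups may see it, and results are filtered against the searching user's groups. Everything in it (`IndexedDoc`, `allowed_groups`, `search`, the example hosts) is hypothetical and not part of YaCy. It also assumes the indexer somehow knows each user's groups, which is exactly the authentication integration that YaCy would have to provide.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class IndexedDoc:
    url: str
    title: str
    # Hypothetical ACL: groups allowed to see this entry; empty means public.
    allowed_groups: FrozenSet[str] = frozenset()


def search(index: List[IndexedDoc], query: str,
           user_groups: Optional[Set[str]] = None) -> List[str]:
    """Return URLs whose title contains the query (case-insensitive).

    With user_groups=None no trimming happens: every hit is returned,
    including entries crawled with credentials (the leak case).
    """
    q = query.lower()
    hits = [d for d in index if q in d.title.lower()]
    if user_groups is None:
        return [d.url for d in hits]
    # Keep public entries and entries sharing at least one group with the user.
    return [d.url for d in hits
            if not d.allowed_groups or d.allowed_groups & user_groups]


if __name__ == "__main__":
    index = [
        IndexedDoc("http://wiki.intranet/Holiday_Plan", "Holiday plan"),
        IndexedDoc("smb://fileserver/hr/salaries.xlsx", "Salary plan 2016",
                   frozenset({"hr"})),
    ]
    print(search(index, "plan"))           # both URLs: the protected link leaks
    print(search(index, "plan", {"dev"}))  # only the public wiki page
    print(search(index, "plan", {"hr"}))   # both URLs, legitimately
```

Even with trimming, the filter is only as good as the group information behind it, which is why the approach needs the same authentication source as the crawled resources.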

However, there are solutions you can build around YaCy: e.g. you can mount disk drives on the machine running YaCy and then crawl the paths to these mounted drives using file:// URLs. Granting access to your server with YaCy on it is then equivalent to granting access to the drives; that is something you can set up as an administrator around YaCy. Users with the same disk drives mounted at the same path are then able to open the content they find in YaCy. That also works on Windows (e.g. file://z:\data..). On Linux the URL has three '/' after "file:", like file:///media/disk3.
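The three slashes come from the empty host part of a file URL ("file://" plus an absolute path that itself starts with "/"). The following is a minimal Python sketch, assuming you want to turn mount points into crawl start URLs; the function name `mount_to_file_url` is hypothetical. Note that for drive letters it produces the RFC 8089 form file:///Z:/data rather than the file://z:\ form written above; which form YaCy's crawler accepts on Windows is an assumption you should check.

```python
from urllib.parse import quote


def mount_to_file_url(path: str) -> str:
    """Convert an absolute mount path into a file:// URL (hypothetical helper).

    Windows drive paths such as Z:\\data become file:///Z:/data;
    POSIX paths such as /media/disk3 become file:///media/disk3.
    Characters like spaces are percent-encoded; "/" is kept as-is.
    """
    if len(path) >= 2 and path[0].isalpha() and path[1] == ":":
        # Drive-letter path: normalise backslashes to forward slashes.
        posix = path.replace("\\", "/")
        drive, rest = posix[:2], posix[2:]
        return "file:///" + drive + quote(rest)
    if not path.startswith("/"):
        raise ValueError("expected an absolute path, got %r" % path)
    # "file://" + "/media/disk3" yields the three slashes.
    return "file://" + quote(path)


if __name__ == "__main__":
    for mount in ["/media/disk3", "Z:\\data", "/mnt/share/Annual Report"]:
        print(mount_to_file_url(mount))
```

Users who open such a link only reach the file if they have the same drive mounted at the same path, which is what keeps access control with the operating system rather than with YaCy.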
Posts: 5799
Registered: Tue Jun 26, 2007 10:58 pm
Location: Frankfurt am Main
