yacybot indexing my own data?

Post by oneaty » Tue Apr 07, 2015 3:57 pm

Hi,
Is YaCy indexing my own data?

I'm not an expert, but that's the impression I got from this Apache log:

Code:
localhost:80 127.0.0.1 - - [07/Apr/2015:09:15:36 -0300] "POST /yacy/query.html HTTP/1.1" 404 377 "-" "yacybot (/global; amd64 Linux 3.13.0-48-generic; java 1.7.0_75; America/en) http://yacy.net/bot.html"
oneaty
 
Posts: 66
Joined: Mon Feb 04, 2013 12:47 pm
Location: Rio de Janeiro

Re: yacybot indexing my own data?

Post by LA_FORGE » Tue Apr 07, 2015 5:58 pm

Hi,

No, it's trying to contact another YaCy instance at this hostname/IP address. The request

Code:
POST /yacy/query.html

is specific to YaCy <=> YaCy communication.
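As a rough illustration of that point, here is a minimal Python sketch that flags a request as probable YaCy peer-to-peer traffic. Only `/yacy/query.html` and the `yacybot` user agent come from this thread. Treating every path under `/yacy/` as peer traffic is an assumption, and the function name is made up for the example.

```python
def looks_like_yacy_peer_request(path: str, user_agent: str) -> bool:
    """Heuristic check, not an official YaCy rule.

    Assumption: peer-to-peer calls use a path under /yacy/ (only
    /yacy/query.html is confirmed in this thread) and identify
    themselves with a user agent that starts with "yacybot".
    """
    return path.startswith("/yacy/") and user_agent.startswith("yacybot")


# Values taken from the Apache log entry quoted at the top of the thread.
agent = ("yacybot (/global; amd64 Linux 3.13.0-48-generic; "
         "java 1.7.0_75; America/en) http://yacy.net/bot.html")
print(looks_like_yacy_peer_request("/yacy/query.html", agent))
```

A plain crawl of a web page would normally be a GET for an ordinary path, so it would not match this check even though the user agent is the same.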

Greetings from Germany

Steve
LA_FORGE
 
Posts: 542
Joined: Sat Oct 11, 2008 5:24 pm

Re: yacybot indexing my own data?

Post by oneaty » Tue Apr 07, 2015 7:18 pm

Thanks for the quick answer, Steve.

But I don't have another YaCy instance running on my network. Note that my only YaCy instance runs 24 x 7, and this is the first time I've seen this kind of entry in the Apache log (well, since I started monitoring Apache about 5 days ago). Shouldn't Apache be logging this attempt all the time?

Also, there are two things I don't understand about that entry.

Why is it on port 80 if my YaCy instance is listening on another port?

And what is the URL http://yacy.net/bot.html referenced in that log entry about?

That page gives instructions on how to keep yacybot from crawling "my website". But what website exactly is this talking about? My YaCy peer?

Re: yacybot indexing my own data?

Post by LA_FORGE » Wed Apr 08, 2015 1:41 pm

>>> Shouldn't Apache be logging this attempt all the time?

Definitely!

>>> Why is it on port 80 if my YaCy instance is listening on another port?

Maybe it's a bug. I've seen these connections in some of my log files too, although my YaCy instance is running on port 6070.

>>> That page gives instructions on how to keep yacybot from crawling "my website". But what website exactly is this talking about? My YaCy peer?

Exactly. That page is informational content for people who don't know YaCy, and it explains how to stop the bot from crawling their pages.

Re: yacybot indexing my own data?

Post by oneaty » Thu Apr 09, 2015 10:44 pm

When I ask my YaCy instance to crawl someone's website, wouldn't it try to do that on that host's port 80? That host probably has nothing to do with YaCy, and it probably listens on the standard port 80.

By that reasoning, that Apache log entry would be the result of someone else's YaCy instance crawling "my website".

But... what website?

I did recently create a website, just a week ago, but it is not public yet: not even Google, which knows everything, has indexed it, and I haven't advertised it anywhere. Only two other people and I know about it, for testing purposes. Besides, it is not even listening on port 80.

So this yacybot log entry still puzzles me: who was trying to crawl my host, and why?

Re: yacybot indexing my own data?

Post by LA_FORGE » Sat Apr 11, 2015 3:12 pm

Look at the IP address in your Apache log that the traffic originates from: 127.0.0.1 :-)

It's definitely your own machine, unless you are using a port-forwarding tool that doesn't pass on the original source IP address, such as rinetd or WinGate in non-NAT mode.
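To make that check concrete, here is a small Python sketch that pulls the client address out of a log line and tests whether it is a loopback address. It assumes the log uses the layout of the entry quoted at the top of the thread (virtual host and port first, then the usual combined-format fields). The regular expression and the helper names are illustrative, not part of Apache or YaCy.

```python
import ipaddress
import re

# Assumed layout: vhost:port client ident user [time] "request" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<vhost>\S+) (?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def is_local_client(entry):
    """True if the logged client address is a loopback address such as 127.0.0.1."""
    try:
        return ipaddress.ip_address(entry["client"]).is_loopback
    except ValueError:
        # The client field holds a hostname rather than an IP address.
        return False


# The log entry from the first post, split only for readability.
SAMPLE = (
    'localhost:80 127.0.0.1 - - [07/Apr/2015:09:15:36 -0300] '
    '"POST /yacy/query.html HTTP/1.1" 404 377 "-" '
    '"yacybot (/global; amd64 Linux 3.13.0-48-generic; java 1.7.0_75; '
    'America/en) http://yacy.net/bot.html"'
)

entry = parse_line(SAMPLE)
print(entry["client"], is_local_client(entry))
```

As noted above, a forwarding tool that hides the original source address would also make the request appear to come from the local machine, so a loopback result only shows what Apache saw.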

Re: yacybot indexing my own data?

Post by oneaty » Sat Apr 11, 2015 8:37 pm

:shock: You're right, I hadn't noticed that. I'm still getting used to Apache...

So, my own YaCy instance is crawling my own network?

Is there somewhere I can learn more about this feature?

I want to know how to configure or even disable it, and how deep it goes within the network.

Re: yacybot indexing my own data?

Post by LA_FORGE » Sat Apr 11, 2015 10:38 pm

I believe it's a bug. Feel free to open a ticket in our bug tracker.

Re: yacybot indexing my own data?

Post by oneaty » Mon Apr 13, 2015 2:34 pm

LA_FORGE wrote: Look at the IP address in your Apache log that the traffic originates from: 127.0.0.1 :-)

It's definitely your own machine, unless you are using a port-forwarding tool that doesn't pass on the original source IP address, such as rinetd or WinGate in non-NAT mode.

I'm not sure it really originates from my machine.
Looking at Apache's modsecurity log, the entries for the same yacybot events refer to different hostnames.
For example,

954bbe540b813c9059.yacyh
af6eca1f9eeb987775.yacyh
a802157d2faa32b74d.yacyh

to name a few.

So, I repeat my question: what are those hosts (other YaCy peers) trying to do on my machine?

Re: yacybot indexing my own data?

Post by LA_FORGE » Sat Apr 18, 2015 5:36 pm

These addresses are used as YaCy-internal identifiers and are not resolvable or reachable on the internet. The modsecurity module "sees" this kind of traffic because the YaCy P2P communication isn't encrypted. I think the devs are working on implementing secure YaCy <=> YaCy communication in future releases.
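For anyone sorting through such log entries, a short Python sketch like the following could separate these internal identifiers from ordinary hostnames. The pattern (hexadecimal characters followed by `.yacyh`) is inferred only from the three names quoted earlier in this thread, so it is an assumption rather than a documented YaCy format. The function name is hypothetical.

```python
import re

# Assumption: inferred from the three sample names in this thread,
# e.g. "954bbe540b813c9059.yacyh"; not taken from YaCy documentation.
YACYH_PATTERN = re.compile(r"[0-9a-f]+\.yacyh", re.IGNORECASE)


def is_yacy_internal_host(host: str) -> bool:
    """True if the host string looks like a YaCy-internal peer identifier."""
    return YACYH_PATTERN.fullmatch(host.strip()) is not None


hosts = [
    "954bbe540b813c9059.yacyh",
    "af6eca1f9eeb987775.yacyh",
    "a802157d2faa32b74d.yacyh",
    "yacy.net",
]
for host in hosts:
    print(host, is_yacy_internal_host(host))
```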

I work as a security engineer and have been participating in the YaCy community for over 6 years. I can assure you that YaCy doesn't have any spy or phone-home features built in. If you are sceptical about your security, I recommend installing an intrusion detection system such as Snort.

Re: yacybot indexing my own data?

Post by oneaty » Wed Apr 22, 2015 3:42 pm

LA_FORGE wrote: These addresses are used as YaCy-internal identifiers and are not resolvable or reachable on the internet.

However, they do uniquely identify a host within the YaCy network, right? So what I'm trying to understand is what those hosts are trying to do, and this does not necessarily mean I suspect anyone of bad behaviour. I would just like to understand it, and maybe learn which YaCy functionality on another peer results in this kind of connection. If I want to customize modsecurity, I have to know which connections are good and which aren't. Just that.

LA_FORGE wrote: I work as a security engineer and have been participating in the YaCy community for over 6 years. I can assure you that YaCy doesn't have any spy or phone-home features built in. If you are sceptical about your security, I recommend installing an intrusion detection system such as Snort.

By no means was I suggesting that such features exist in YaCy. My line of thought is more about figuring out which tweaks I need to make in my YaCy peer and in modsecurity.
I'm very glad to hear that you and others are working to improve YaCy even more, and security, these days, is certainly an area that deserves attention.
Please note that I'm a strong believer in YaCy; that is exactly why I have been running my own peer in senior mode, 24 x 7, for more than six months now.
Finally, thanks for the tip about Snort. I didn't know that tool and will consider using it.

Re: yacybot indexing my own data?

Post by LA_FORGE » Sat Apr 25, 2015 5:22 pm

Sorry for the misunderstanding. You're welcome. I'm a strong believer in YaCy, too. Where are you from? I'm from Germany.

Re: yacybot indexing my own data?

Post by oneaty » Sun Apr 26, 2015 2:50 pm

No need to be sorry, I guess.

I just want to understand what's going on, so I would appreciate it if you could answer my last questions, which I repeat below:

However, they do uniquely identify a host within the YaCy network, right? So what I'm trying to understand is what those hosts are trying to do, and this does not necessarily mean I suspect anyone of bad behaviour. I would just like to understand it, and maybe learn which YaCy functionality on another peer results in this kind of connection. If I want to customize modsecurity, I have to know which connections are good and which aren't. Just that.

Re: yacybot indexing my own data?

Post by LA_FORGE » Thu Apr 30, 2015 11:38 am

Those hosts are communicating using a distributed hash table algorithm. Out of the box, every YaCy installation participates in our global network 'Freeworld' unless you configure another profile on the page http://localhost:8090/ConfigBasic.html
Since the traffic isn't encrypted at all, you can simply monitor it with tools like tcpdump or Wireshark to understand what's going on.