same sku diff id (you too)?

Forum for developers

Post by reger » Tue Jan 24, 2017 11:35 pm

Recently I have been noticing redundant results in the search result list.

I'm curious whether you have noticed something similar.

Description of my observation:
- the URL (= sku) is the same
- the ID is different (otherwise it wouldn't show up as a redundant result)
- all entries belong to the dht collection (so they are not from my own crawls)

Searching and then looking at the metadata on test peer 1 gave:
Code:

ID                     SKU
r6aMoSiLDDWi  https://www.uni-stuttgart.de/hkom/termine/index.html?calYear=2014&calMonth=11&calDay=10&calView=2
AhblkSiLDDWi  https://www.uni-stuttgart.de/hkom/termine/index.html?calYear=2014&calMonth=11&calDay=11&calView=2
WbA7gSiLDDWi  https://www.uni-stuttgart.de/hkom/termine/index.html?calYear=2014&calMonth=11&calDay=14&calView=2


The same search on a second peer:
Code:

ID                    SKU
r6aMoSiLDDWi  https://www.uni-stuttgart.de/hkom/termine/index.html?calYear=2014&calMonth=11&calDay=10&calView=2
AhblkSiLDDWi  https://www.uni-stuttgart.de/hkom/termine/index.html?calYear=2014&calMonth=11&calDay=11&calView=2
WIIk_SiLDDWi  https://www.uni-stuttgart.de/hkom/termine/index.html?calYear=2014&calMonth=11&calDay=13&calView=2
69ojMSiLDDWi  https://www.uni-stuttgart.de/hkom/termine/index.html?calYear=2014&calMonth=11&calDay=12&calView=2
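
The observation described above (same sku, different ID) could be checked mechanically on any result list. The following is only a minimal sketch: the class `DuplicateSkuCheck` and its method are hypothetical and not part of YaCy, and the records in `main` are invented placeholders, not real search results.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of YaCy: groups (id, sku) pairs by sku and
// keeps every sku that occurs with more than one distinct id.
class DuplicateSkuCheck {

    // Each entry of results is a two-element array: {id, sku}.
    static Map<String, Set<String>> findConflicts(List<String[]> results) {
        Map<String, Set<String>> idsBySku = new LinkedHashMap<>();
        for (String[] r : results) {
            idsBySku.computeIfAbsent(r[1], k -> new LinkedHashSet<>()).add(r[0]);
        }
        // Drop every sku that was seen with a single id only.
        idsBySku.values().removeIf(ids -> ids.size() < 2);
        return idsBySku;
    }

    public static void main(String[] args) {
        List<String[]> results = new ArrayList<>();
        // Placeholder records, invented for illustration only.
        results.add(new String[] {"AAAAAAAAAAAA", "http://example.org/a?x=1"});
        results.add(new String[] {"BBBBBBBBBBBB", "http://example.org/a?x=1"});
        results.add(new String[] {"CCCCCCCCCCCC", "http://example.org/b"});
        findConflicts(results)
            .forEach((sku, ids) -> System.out.println(sku + " -> " + ids));
    }
}
```

Feeding the merged metadata of several peers into such a check would list exactly the skus that show up under more than one ID.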


P.S. A local recrawl yields ID=AhblkSiLDDWi.
reger
 
Posts: 45
Joined: Wed Jan 02, 2013 9:23 am

Re: same sku diff id (you too)?

Post by luc » Thu Jan 26, 2017 9:46 am

Hi reger, so far I haven't noticed that behavior on my peers. I also tried search terms related to the University of Stuttgart website, first in P2P and then in local searches, but did not get any duplicates...
luc
 
Posts: 276
Joined: Wed Aug 26, 2015 1:04 am

Re: same sku diff id (you too)?

Post by reger » Thu Jan 26, 2017 11:34 pm

After more debugging I found that differences between the newly calculated hash and the received hash do occur. Not very frequently, but roughly one or two occurrences per 4 to 5 searches (from different peers and versions).

Most often only the last hash character differs, which comes from hashing a different protocol (http instead of the actual https), as in this first example:
Code:
newCalculated  received Hash  URL example
ZJRfs4eCdSU8   ZJRfs4eCdSU4   https://www.land.nrw/de/landesregierung/staatssekretaerinnen-und-staatssekretaere/ludwig-hecke

But I also received hashes that differ from the hash of the supplied URL in the leading part, as in the next two examples:
Code:
kSxEbrYaIN6S 5fARtrYaIN6S  http://permaculturenews.org/forums/index.php?threads/hello-from-northern-spain.15648/=


Code:
TD48LEckm6GY BGXH9Eckm6GY http://forum.detik.com/ridwan-kamil-lelang-kaus-di-twitter-untuk-bantu-bobotoh-t1064476p2.html?s=1d4ae3ab77d03b574cc833038032f231
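
To make the two kinds of mismatch easier to tell apart, the calculated and received hashes can be compared position by position. This is a minimal sketch only: `HashDiff` is a hypothetical helper, not YaCy code, and it treats the hashes as plain strings without assuming anything about how YaCy builds them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of YaCy: lists the zero-based positions at
// which two equally long hash strings differ.
class HashDiff {

    static List<Integer> differingPositions(String calculated, String received) {
        if (calculated.length() != received.length()) {
            throw new IllegalArgumentException("hashes must have equal length");
        }
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < calculated.length(); i++) {
            if (calculated.charAt(i) != received.charAt(i)) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Hash pairs quoted in this post: {newly calculated, received}.
        String[][] pairs = {
            {"ZJRfs4eCdSU8", "ZJRfs4eCdSU4"},
            {"kSxEbrYaIN6S", "5fARtrYaIN6S"},
            {"TD48LEckm6GY", "BGXH9Eckm6GY"},
        };
        for (String[] p : pairs) {
            System.out.println(p[0] + " vs " + p[1] + " differ at "
                + differingPositions(p[0], p[1]));
        }
    }
}
```

For the three pairs quoted here this reports position 11 (the last character) for the first pair and positions 0 to 4 for the other two, while the remaining characters match in each pair.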


Those are the findings from the test session so far. From this it looks random and spread over numerous peers. Looking at the URLs, the only thing I recognize is that all of them have at least one = in the query part (at least the 15 I checked...).
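
Whether a URL has an = in its query part can be tested with the standard `java.net.URI` class. The sketch below is only an illustration of that check: `QueryEqualsCheck` is a hypothetical name, not YaCy code, and it is applied to three of the URLs quoted in this thread.

```java
import java.net.URI;

// Hypothetical helper, not part of YaCy: reports whether the query part of a
// URL contains at least one '=' character.
class QueryEqualsCheck {

    static boolean queryContainsEquals(String url) {
        // getRawQuery() returns null when the URL has no query part.
        String query = URI.create(url).getRawQuery();
        return query != null && query.indexOf('=') >= 0;
    }

    public static void main(String[] args) {
        String[] urls = {
            "https://www.uni-stuttgart.de/hkom/termine/index.html?calYear=2014&calMonth=11&calDay=10&calView=2",
            "http://permaculturenews.org/forums/index.php?threads/hello-from-northern-spain.15648/=",
            "http://forum.detik.com/ridwan-kamil-lelang-kaus-di-twitter-untuk-bantu-bobotoh-t1064476p2.html?s=1d4ae3ab77d03b574cc833038032f231",
        };
        for (String url : urls) {
            System.out.println(queryContainsEquals(url) + "  " + url);
        }
    }
}
```

Running such a filter over all mismatching results would show whether the = pattern holds beyond the handful of URLs checked by hand.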

P.S. Some time ago I already added a check in "URIMetadataNode(final SolrDocument doc)", but it looks like that didn't prevent this.
reger
 
Posts: 45
Joined: Wed Jan 02, 2013 9:23 am

