Commit 136e8f6 and 1-char queries

Post by davide » Sat Oct 17, 2015 12:18 am

Thanks Orbiter for your continued efforts.

I was having a look at the latest commits and found a dubious one: commit 136e8f6 (source/net/yacy/search/query/QueryGoal.java @ line 187) shows that now, if a query is composed of both multi-character strings and 1-char strings, all 1-char strings are stripped from the query.
I never learned Java and I don't know YaCy, so pardon me. But is this new algorithm going to be applied unconditionally to every query, or is it somewhat more selective / specific? In the first case, it would no longer be possible to search for, e.g., "Pentium 4" or "OS X" or "The incredibles 4".
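To make the described behavior concrete, here is a minimal Java sketch of the stripping rule as it is summarized above. It is not the code from QueryGoal.java: the class name `SingleCharFilterSketch` and the method `stripSingleCharTerms` are hypothetical, and leaving a query of only 1-char terms unchanged is an assumption based on the wording "composed of both".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; names and details are assumptions,
// not taken from the YaCy source.
final class SingleCharFilterSketch {

    // Drops every 1-char term, but only if the query also contains
    // at least one longer term.
    static List<String> stripSingleCharTerms(List<String> terms) {
        boolean hasLongTerm = false;
        for (String term : terms) {
            if (term.length() > 1) {
                hasLongTerm = true;
                break;
            }
        }
        if (!hasLongTerm) {
            // Assumption: a query made only of 1-char terms is left as-is.
            return new ArrayList<>(terms);
        }
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (term.length() > 1) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        for (String query : new String[] {"Pentium 4", "OS X", "The incredibles 4"}) {
            List<String> terms = Arrays.asList(query.split("\\s+"));
            System.out.println(query + " -> " + stripSingleCharTerms(terms));
        }
    }
}
```

Under these assumptions, "Pentium 4" is reduced to the single term "Pentium" and "OS X" to "OS", which is the concern raised in this post.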
davide
 
Posts: 84
Joined: Fri Feb 15, 2013 8:03 am

Re: Commit 136e8f6 and 1-char queries

Post by Orbiter » Sat Oct 17, 2015 9:49 am

I made that change because one of my customers is running YaCy in their intranet for file search, and they had trouble finding files by file name when using an exact copy of the file name. The problem is that the file indexing strips away 'rubbish' from file names (i.e. single numeric characters), so it was not possible to find the same files again with a copy of the file name.

davide wrote: more selective / specific? In the first case, it would no longer be possible to search for, e.g., "Pentium 4" or "OS X" or "The incredibles 4".

No, this is actually less selective: you would be able to find "The incredibles 4", but also "The incredibles 3", "The incredibles 2" and "The incredibles", just by searching for "The incredibles 4". This change applies a special kind of fuzziness, and I believe a desirable one.
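The effect can be illustrated with a small Java sketch. It assumes that titles and queries are both reduced to their multi-character terms and that a title matches when it contains all remaining query terms. The class `FuzzyMatchSketch` and its methods are hypothetical, and the lowercasing and whitespace splitting are simplifications, not YaCy's actual tokenizer or ranking.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not YaCy's indexing or search code.
final class FuzzyMatchSketch {

    // Assumed normalization, applied to titles and queries alike:
    // lowercase, split on whitespace, drop terms shorter than 2 chars.
    static Set<String> normalize(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.length() > 1) {
                terms.add(term);
            }
        }
        return terms;
    }

    // A title matches when it contains every normalized query term.
    static boolean matches(String query, String title) {
        return normalize(title).containsAll(normalize(query));
    }

    public static void main(String[] args) {
        String query = "The incredibles 4";
        String[] titles = {
            "The incredibles 4", "The incredibles 3", "The incredibles", "Pentium 4"
        };
        for (String title : titles) {
            System.out.println(title + " -> " + matches(query, title));
        }
    }
}
```

With this sketch, the query "The incredibles 4" matches all three "The incredibles" titles and does not match "Pentium 4", because the trailing digit plays no part in the comparison on either side.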
Orbiter
 
Posts: 5798
Joined: Tue Jun 26, 2007 10:58 pm
Location: Frankfurt am Main

Re: Commit 136e8f6 and 1-char queries

Post by davide » Sat Oct 17, 2015 10:23 am

Thanks for pointing out that this behavior is not entirely due to commit 136e8f6, as the indexer itself already strips 1-char elements too. Still, the overall behavior is that YaCy yields irrelevant results for queries where a 1-char element is indispensable to the meaning of the query.
Anyone searching for queries like the ones below would not get the relevant results they expect.

Examples:
  • Y chromosome
  • P S waves
  • World War 2
  • 9 11
  • Ford T
  • Pentium 4
  • OS X
  • X factor
  • The incredibles 4
davide
 
Posts: 84
Joined: Fri Feb 15, 2013 8:03 am

