Help: We need a big web-site for test purposes!

Events, Suggestions and Actions

Post by fherb » Fri Jan 10, 2014 8:57 pm

Dear colleagues!

I am planning to test YaCy on ARM hardware in a dedicated test environment. The aim is to get reproducible results that do not depend on the provider or on server load. The test should show how YaCy performs on ARM boards in terms of timing, and where critical situations arise.

What I need is the content of some big websites. To keep things simple, these websites should not come from a content management system or a similar dynamic system; they should consist of static content.

The alternative is to crawl a dynamic content system once in order to produce a static mirror for these test purposes.

Can anybody help?

Either with an image of some static content, or with permission to crawl your content over the Internet once to produce such an image.

Best regards, Frank.

The reason for this is explained here: http://forum.yacy-websuche.de/viewtopic.php?f=15&t=3363&start=50#p29424
fherb
 
Posts: 111
Registered: Tue Nov 26, 2013 10:02 am
Location: Dresden (Germany)

Re: Help: We need a big web-site for test purposes!

Post by Yududi » Fri Jan 10, 2014 10:37 pm

One HTML page with a lot of content, or several pages with a lot of content?
Maybe just create a text file of random words.
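A minimal sketch of this suggestion, assuming Python 3 and a small hand-picked vocabulary (the function name and word list are illustrative only). A fixed seed keeps the output identical between runs, which helps with the reproducibility goal of the original post:

```python
import random


def random_text(vocabulary, n_words, seed=0):
    """Return n_words words drawn uniformly at random from vocabulary.

    Using a private Random instance with a fixed seed makes the text
    reproducible across benchmark runs without touching global state.
    """
    rng = random.Random(seed)
    return " ".join(rng.choice(vocabulary) for _ in range(n_words))
```

For example, `open("random_words.txt", "w", encoding="utf-8").write(random_text(["yacy", "peer", "index"], 100_000))` writes such a test file.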
Yududi
 
Posts: 64
Registered: Tue Dec 10, 2013 12:30 pm

Re: Help: We need a big web-site for test purposes!

Post by fherb » Fri Jan 10, 2014 10:58 pm

:D

Thanks! Yes, that is a cheap option. ;)

But this would not reproduce a typical crawl, parse and index process.

First, we need not only documents with senseless content; we also need links between them. A program that generates such content would therefore need more intelligence, so that it produces the kind of links typical of blogs or forums. The test should use typical content with typical content structures, not a random structure.
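The kind of generator described here could start out like the following sketch. It assumes Python 3 and a hypothetical, blog-like link structure (an index page, previous/next links, and a few random links to older posts); all names are illustrative, and real blogs and forums have much richer structure, so this only approximates "typical" linking:

```python
import random
from html import escape
from pathlib import Path


def generate_blog_site(n_posts, vocabulary, words_per_post=200,
                       related_links=3, seed=0):
    """Return {file name: HTML} for a small, blog-like static site.

    Hypothetical link structure (a rough imitation, not measured from real blogs):
    - index.html links to every post
    - each post links to index.html, to its previous and next post,
      and to up to `related_links` randomly chosen older posts
    """
    rng = random.Random(seed)  # fixed seed -> identical site on every run
    names = [f"post{i:05d}.html" for i in range(n_posts)]
    pages = {}
    for i, name in enumerate(names):
        body = " ".join(rng.choice(vocabulary) for _ in range(words_per_post))
        links = ["index.html"]
        if i > 0:
            links.append(names[i - 1])  # previous post
        if i + 1 < n_posts:
            links.append(names[i + 1])  # next post
        older = names[:max(i - 1, 0)]  # excludes the previous post, already linked
        links += rng.sample(older, min(related_links, len(older)))
        items = "\n".join(
            f'<li><a href="{escape(t)}">{escape(t)}</a></li>' for t in links)
        pages[name] = (f"<!DOCTYPE html>\n<html><head><title>Post {i}</title>"
                       f"</head><body>\n<p>{escape(body)}</p>\n"
                       f"<ul>\n{items}\n</ul>\n</body></html>\n")
    items = "\n".join(f'<li><a href="{n}">{n}</a></li>' for n in names)
    pages["index.html"] = ("<!DOCTYPE html>\n<html><head><title>Index</title>"
                           f"</head><body>\n<ul>\n{items}\n</ul>\n</body></html>\n")
    return pages


def write_site(pages, out_dir):
    """Write the generated pages as static files into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, html in pages.items():
        (out / name).write_text(html, encoding="utf-8")
```

After `write_site(generate_blog_site(5000, vocab), "testsite")`, the static site can be served locally for a crawl with `python -m http.server 8000 --directory testsite`, so the test no longer depends on an external provider.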

And I hope that the results of a modern search engine are more than a simple word statistic over an ocean of words in an ocean of pages. (1)

I think we should test with human-written content. :ugeek:

(1) Maybe YaCy is no more than that at the moment. If so, we should optimize the indexing, the way the index is distributed between peers, and how YaCy interprets human search strategies and the kind of result the user expects.

Re: Help: We need a big web-site for test purposes!

Post by fherb » Fri Jan 10, 2014 11:06 pm

An additional thought on (1): As far as I know, YaCy collects an index of the words on pages, but not yet an index of relevance. Is that right? This is a big difference from Google, and I think it is the most difficult part of a modern search engine.
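To make the distinction concrete: an "index of words of pages" in its simplest form is an inverted index that maps each word to the pages containing it. The sketch below is a generic Python illustration, not YaCy's actual data structure; ordering hits by raw occurrence count stands in for the "cheap statistic" mentioned earlier in the thread and carries no notion of relevance:

```python
from collections import defaultdict


def build_word_index(pages):
    """Map each lower-cased word to {page: number of occurrences}."""
    index = defaultdict(lambda: defaultdict(int))
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] += 1
    return index


def lookup(index, word):
    """Pages containing `word`, ordered only by how often it occurs there.

    Ties are broken by URL so the result is deterministic.
    """
    hits = index.get(word.lower(), {})
    return sorted(hits, key=lambda url: (-hits[url], url))
```

Everything a relevance index would add (link structure, context, what the user actually expects) is absent here; the lookup only knows that a word appears on a page and how often.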