Thunderstone Search Appliance Manual

Date Bias

Syntax: a group of drop-downs and an optional date-picker

The Date Bias settings control how the relative date (age) of a document affects its result ranking. The older (or farther into the future) a document is, the lower its rank will be.

Weight is the importance of date for ranking, relative to other rank factors. It defaults to off, i.e. date will have no significance. Note that search users can override this setting on the Advanced search form.

Half-life is a measure of how fast the rank "decays" with document age. It is the time it takes for this rank factor to decrease to half of Weight. (The factor will reach 0 for an "infinitely" old document.) This can be tuned according to the profile's data set: an often-crawled news profile that has new articles appear hourly or daily, for example, might benefit from a Half-life of 1 day, since its documents "age" over that time frame. On the other hand, a crawl of a company-wide document archive going back a decade or more might work best with a Half-life of 1 year or even 5 years. The default is 1 year.
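
As a rough illustration of the Half-life definition above, the sketch below assumes the rank factor decays exponentially with document age. The manual does not state the exact curve, so the formula, the numeric Weight value, and the function name date_bias_factor are all assumptions for illustration, not the appliance's actual implementation.

```python
def date_bias_factor(weight: float, age_days: float, half_life_days: float) -> float:
    """Hypothetical date-bias factor, assuming exponential decay.

    weight         -- assumed numeric stand-in for the Weight setting
    age_days       -- document age relative to the reference date, in days
    half_life_days -- the Half-life setting, in days
    """
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    # abs(): a document dated in the future is penalized like an older one.
    return weight * 0.5 ** (abs(age_days) / half_life_days)


# With a Half-life of 1 year, the factor halves for each year of age
# and approaches 0 only for very old documents.
fresh = date_bias_factor(100.0, 0, 365)
one_year = date_bias_factor(100.0, 365, 365)
two_years = date_bias_factor(100.0, 730, 365)
print(fresh, one_year, two_years)
```

Under this assumption, shortening Half-life makes the factor fall off faster, which is why a 1 day setting suits a news profile and a multi-year setting suits a long-lived archive.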

Field is the field to use for computing a document's age. It defaults to Modified (the document's Last-Modified date according to the server), but can also be set to id (the last time the crawl saw the document change) or to a Parametric Date field. Note that using a Parametric field requires that field to be set Sortable. The Field chosen should also be set as one of the Compound Index Fields for best results; this will help ensure faster searching and more accurate result counts.

Anchor is the reference point or "best" date for the age of a document: documents with this date get the full Weight applied to their rank, while documents older or newer than this get less. It defaults to Current Date, i.e. right now.

Sometimes Current Date is not the best choice, however, because it is a moving target. For example, a daily crawl of news articles would see date biasing change throughout the day: an 8am article would rank higher when searched at 9am than when searched at 5pm. Setting Anchor to Last Walk Finished may help in this case: it uses the date of completion of the last successful walk - which will be fixed from search to search, yet still update with each walk.
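
The moving-target effect described above can be sketched numerically. This example again assumes an exponential decay curve and a Half-life of 1 day; the function bias_factor, the timestamps, and the walk-completion time are hypothetical values chosen for illustration only.

```python
from datetime import datetime


def bias_factor(doc_time: datetime, anchor_time: datetime, half_life_hours: float) -> float:
    """Hypothetical date-bias factor (Weight taken as 1.0), assuming exponential decay."""
    age_hours = abs((anchor_time - doc_time).total_seconds()) / 3600
    return 0.5 ** (age_hours / half_life_hours)


article = datetime(2024, 11, 8, 8, 0)  # article dated 8am

# Anchor = Current Date: the factor depends on when the search is run.
at_9am = bias_factor(article, datetime(2024, 11, 8, 9, 0), 24)
at_5pm = bias_factor(article, datetime(2024, 11, 8, 17, 0), 24)

# Anchor = Last Walk Finished: the same factor for every search until the next walk.
walk_finished = datetime(2024, 11, 8, 8, 30)  # assumed completion time
fixed = bias_factor(article, walk_finished, 24)

print(f"9am search: {at_9am:.3f}, 5pm search: {at_5pm:.3f}, walk anchor: {fixed:.3f}")
```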

In other cases, Current Date is not appropriate because the dataset is fixed. For example, a crawl of an unchanging historical archive from the 1990s - whose most recent document is from 1999 - should not see date biasing change for the same documents searched next year vs. now. Nor should it treat 1995 documents nearly the same as 1998 documents (because both are 20+ years old now). In this instance, it might help to set Anchor to Fixed Date and choose a date of e.g. 1999-12-31 in the Fixed Date date-picker that appears: this will treat 1995 documents as significantly older (roughly 4x) than 1998 documents.
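
The age comparison in the archive example can be checked with simple date arithmetic. The document dates below (end of 1995 and end of 1998) and the "current" date are illustrative assumptions; only the 1999-12-31 anchor comes from the text above.

```python
from datetime import date


def age_days(doc_date: date, anchor: date) -> int:
    """Distance in days between a document's date and the anchor."""
    return abs((anchor - doc_date).days)


doc_1995 = date(1995, 12, 31)  # assumed example document dates
doc_1998 = date(1998, 12, 31)

fixed_anchor = date(1999, 12, 31)   # Anchor = Fixed Date
current_anchor = date(2024, 11, 8)  # illustrative stand-in for Current Date

# How many times older the 1995 document is than the 1998 one, per anchor.
ratio_fixed = age_days(doc_1995, fixed_anchor) / age_days(doc_1998, fixed_anchor)
ratio_current = age_days(doc_1995, current_anchor) / age_days(doc_1998, current_anchor)

print(f"fixed anchor: {ratio_fixed:.2f}x, current date: {ratio_current:.2f}x")
```

With the fixed anchor the 1995 document is about four times as old as the 1998 one, while measured from a present-day date the two ages differ by only a little over 10 percent, so the date bias would barely distinguish them.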


Copyright © Thunderstone Software     Last updated: Nov 8 2024