Friday, September 30, 2005

Search Index Rules Syntax

SharePoint does a pretty good job of handling long, complex or unique URLs within the interface; many lists use relative pathing, Area and Site URLs preserve spaces for easy readability, and so on. One instance where this is definitely NOT the case is the application of rules for inclusion and exclusion of content sources.

When creating a rule that references a URL path, such as http://servername/Area, you must replace any spaces and special characters (ampersands, apostrophes, commas, etc.) with the proper URL encoding (%20, %26, %27, %2C, etc.). Otherwise, the MSSearch process will ignore the rule and continue evaluating results against the remaining valid rules.
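If you have many paths to convert, the encoding can be scripted rather than done by hand. The following is a minimal sketch in Python, not anything SharePoint provides; the helper name encode_rule_path and the sample area names are made up for illustration, and it assumes the input URL is not already encoded.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_rule_path(url: str) -> str:
    """Percent-encode the path portion of a URL for use in a rule.

    Hypothetical helper for illustration. Assumes the input is NOT
    already encoded; an existing %20 would become %2520.
    """
    parts = urlsplit(url)
    # Keep "/" as the path separator; encode spaces, ampersands,
    # apostrophes, commas and other reserved characters.
    encoded_path = quote(parts.path, safe="/")
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    # Sample area names are invented for this example.
    print(encode_rule_path("http://servername/Sales & Marketing"))
    print(encode_rule_path("http://servername/Bob's Area, West"))
```

The scheme and server name are left untouched; only the path after the server name is encoded.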

Also, remember that the order of the rules is important. The Portal_Content index has a default inclusion rule of http://servername/, and any new rules you create will be processed after this rule is satisfied. Since the default rule includes all content, any subsequent rules, such as an exclusion for the above URL, will be ignored. To address this issue, simply move your exclusions to precede the blanket inclusion and the search service will properly exclude the specified content source.
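To see why the ordering matters, here is a simplified first-match model of the behavior described above. It is only a sketch of the idea, not how MSSearch is actually implemented; the evaluate function, the plain prefix matching, and the page.aspx URL are assumptions made for illustration.

```python
from typing import List, Tuple

# A rule is (action, url_prefix); the list order is the evaluation order.
Rule = Tuple[str, str]


def evaluate(url: str, rules: List[Rule]) -> str:
    """Return the action of the first rule whose prefix matches the URL.

    Simplified model: real rules may support wildcards and other options.
    """
    for action, prefix in rules:
        if url.startswith(prefix):
            return action  # first match wins; later rules are never consulted
    return "no match"


# Default ordering: the blanket inclusion comes first and shadows the exclusion.
default_first = [
    ("include", "http://servername/"),
    ("exclude", "http://servername/Area"),
]

# Corrected ordering: the exclusion precedes the blanket inclusion.
exclusion_first = [
    ("exclude", "http://servername/Area"),
    ("include", "http://servername/"),
]

if __name__ == "__main__":
    page = "http://servername/Area/page.aspx"  # hypothetical page
    print(evaluate(page, default_first))
    print(evaluate(page, exclusion_first))
```

With the default ordering the page is included because the blanket rule matches first; with the exclusion moved up, the same page is excluded.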