Tuesday, August 30, 2005

SharePoint Blogsearch Expands to Mexico

For our friends from Latin America, Luis Du Solier's Spanish-language blog has just been added to the SharePoint Blogsearch index. As I grew up in El Paso (which is right on the Mexican-American border), I am quite fond of our neighbors to the south and glad to see Luis is making a contribution to the SharePoint community. If I continue to get multilingual submissions I'll create a content source for each language to make native searches a bit easier.

To all our Latin American friends, please welcome Luis Du Solier to the SharePoint Blogsearch family. As I am from El Paso, Texas, I am proud of our neighbors to the south. I am glad to see that Luis is making a contribution to the SharePoint community. Your blog is very good, Luis!

Thursday, August 25, 2005

Database Permissions

I'd been wondering for a long time how the ACLs in the SharePoint database work. In fact, I was just discussing the issue this morning with a colleague when - voila! - Paolo posts the answer.

Now that's what the blogosphere is all about!

Tuesday, August 23, 2005

Working with SharePoint Lists

A colleague of mine is working on a custom application which makes extensive use of SharePoint lists to store data. He has written a webpart that rolls up the XML data from various lists, transforms it with XSL, and renders it in a way-cool tabbed control. While putting this together, he found this link on Using Data from Sharepoint 2003 Lists. Very cool and a big thanks to Paul Ballard for his post.

Perhaps if we needle him enough, Scot will be nice enough to post his nifty code and show us how it's done. Or, he could just get a blog and start sharing his wizardry with the rest of the world, eh???

Hiding the Site Settings Link

One of the most common user complaints in new implementations is links that are exposed to users who don't have rights to access the content - like the 'Site Settings' link at the top of every page. Heather Solomon has figured out how to hide that pesky critter for good.

Where does she find this stuff? More importantly, how on earth would I make it through the week without her blog???

Heather for MVP!!!

Sunday, August 21, 2005

SharePoint Templates Article

While following up on a comment from Claudia, I found her article on WSS themes. Good stuff - go check it out.

Friday, August 19, 2005

Extreme SharePoint Design Series

Site definitions and custom templates are essential to delivering a customized SharePoint experience. While there are several good resources for information on how to deploy a custom site definition, such as the SDK, MSDN, and Heather Solomon's excellent site, there aren't many in-depth examples of custom code for UI modifications. It's time to change that, don't you think?

Over the next few months I'll be presenting "Extreme SharePoint Design", a regular blog series with advanced tips and tricks for designers. Topics won't be in any particular order, just cool stuff as I come across it with lots of code to copy and paste. Watch this space for updates, often daily but sometimes weekly, depending upon my travel schedule.

As always, feel free to share your thoughts or suggest ideas.

UPDATE: I'll be posting a lengthy article next week on customizing various context menus (Actions, Views, etc.) - it's a bigger task than I thought at first but well worth the effort. Stay tuned!

Tuesday, August 16, 2005

Extreme SharePoint Design: Site Definition File Differences

Creating a custom site definition from scratch can be a daunting task. Each file in the folder structure serves a particular purpose and many of them vary slightly from folder to folder - allitems.aspx is not the same in DOCLIB as it is in VOTING (see Heather's post for a complete list of files in each folder). Due to slight variances, there is no way to do a global search and replace to copy your customizations. Getting the correct code in each file is essential to maintaining SharePoint's functionality. Isn't there an easy way to solve this problem?

The answer is - sort of. First, begin with a blank template file that has all the necessary components for that type (allitems.aspx, for example). Remember that many page elements require specific registrations and script files, so make sure your [HEAD] tags include all the necessary code blocks. Once you have a working template file, copy it into each directory, then add the code necessary for each list type.
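The copy-then-customize step can be scripted. Here is a minimal sketch, assuming the working template carries a placeholder line where each list's unique code belongs. The `deploy_template` function, the `[!-- LIST-SPECIFIC --]` marker, and the folder names passed in are all hypothetical, not part of any SharePoint tooling.

```python
from pathlib import Path

# Hypothetical placeholder line inside the base template; it marks the spot
# where each list type's unique code should be dropped in.
MARKER = "[!-- LIST-SPECIFIC --]"


def deploy_template(base_template, site_root, list_blocks, marker=MARKER):
    """Copy a working template into each list folder, filling in that list's code.

    list_blocks maps a folder name (e.g. "DOCLIB") to the code unique to it.
    Returns the paths of the files that were written.
    """
    base_path = Path(base_template)
    base = base_path.read_text(encoding="utf-8")
    written = []
    for folder, block in list_blocks.items():
        target = Path(site_root) / folder / base_path.name
        target.parent.mkdir(parents=True, exist_ok=True)
        # Same template everywhere; only the marked section differs per list.
        target.write_text(base.replace(marker, block), encoding="utf-8")
        written.append(target)
    return written
```

Calling `deploy_template("allitems.aspx", "LISTS", {"DOCLIB": doclib_code, "VOTING": voting_code})` would write one allitems.aspx per folder. Run it against a copy of your site definition first, never the live one.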

I have compiled a list of each file type along with the code elements that are unique to each list (folder) below, using the SPS site definition as a baseline (60\templates\1033\SPS). The code goes in various places but I tried to keep it sequential; that is, the first code block (individual blocks are separated by ***) goes first, then the second, and so on. Some are contextual (they contain more than just the unique code) to make searching and replacing easier. I also tried to keep all script elements together to ensure that no code gets orphaned. Once you've modified a couple of files you'll know exactly where to place each block.

Site Definition Original Code (SPS)

Please note that I did not include files that are specific to a certain list type (such as calendar.aspx in EVENTS), only those that recur in multiple folders. In a subsequent post, we'll look at how these files differ from those in STS to show how a custom deployment can seamlessly incorporate both SPS and WSS in a unified design.

NOTE ON CODE SAMPLES: Few things frustrate me more than not being able to cut and paste HTML code into Blogger's editor. You would think they could solve this with a special tag like many discussion forums do, but if they have I sure don't know about it. Until such a thing comes along, all <> tags will be displayed as [ and ]. Sorry for the inconvenience.

Friday, August 05, 2005

Optimizing External Site Crawls

After launching SharePoint Blogsearch, I discovered (with some help from Greg) that the SharePointPSSearch (SPSSearch) service needs a few tweaks to work well on external web sites. After doing some digging around in the documentation, looking into the packets with a protocol analyzer, and generally scratching my head in confusion, I learned a few things:

1. SPSSearch does not always honor robots.txt files (this is a text file placed in the root directory of a web site that tells crawlers how to behave). Yes, the documentation says it does, and you can modify the id string in the registry, but it doesn't always seem to work. I'm still trying to come up with an answer to this one.

2. By default, the crawler will request as many documents from the target site as it can fit into the available threads or until it starts receiving TCP errors; in other words, it will hammer an external site into submission. Fortunately, you can control this errant behavior. Go to SharePoint Central Site Administration, then Manage Search Settings. In the 'Site Hit Frequency Rules' section, click on 'Manage Site Hit Frequency Rules'. Click 'Add Update Rule' on the toolbar. In the Site Name field, enter "*" for all web sites (you can also set rules by explicit name, domain, etc.). Next, click the 'Limit number of documents requested simultaneously' radio button and enter a small number (minimum is 1, max is 999, I used 5) in the Number of Documents field. This will significantly reduce the load on target servers.

3. Incremental updates are SUPPOSED to ignore content that has not been changed, crawling only those docs that have changed. In reality, this is not the case. I noticed that all documents were being processed even though the content was static. I ran a full update and an incremental update against a static site, and both generated exactly the same load (packet count, bytes, etc.). This could be a bug or it could be some hidden registry setting somewhere that I haven't found yet. Any ideas would be appreciated.

4. Adaptive updates only appear to work on SPS/WSS sites. According to the docs, adaptive updates make an educated guess, based on historical patterns, as to what content may have changed on a site. In theory, this should greatly reduce the load on a crawled site and it may very well work that way in SPS/WSS. It doesn't seem to have any effect on other types of sites but my test configuration was limited so I may not have all the data. Again, if anyone has any ideas, please share.
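For anyone who wants to see what polite crawling looks like outside of SPSSearch, here is a rough Python sketch of the first two points: honoring robots.txt and capping the number of simultaneous requests. It is not how SPSSearch works internally. The function names and the `fetch` callable are my own placeholders, and only `urllib.robotparser` and `concurrent.futures` come from the standard library.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.robotparser import RobotFileParser

MAX_SIMULTANEOUS = 5  # the same small number used in the hit frequency rule above


def allowed_urls(robots_txt, user_agent, urls):
    """Return only the URLs that the robots.txt text permits for this user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


def crawl(urls, fetch, limit=MAX_SIMULTANEOUS):
    """Call fetch(url) for every URL with at most `limit` requests in flight.

    `fetch` is any caller-supplied function that downloads one URL.
    Results come back in the same order as `urls`.
    """
    # The pool size is the throttle: no more than `limit` worker threads exist.
    with ThreadPoolExecutor(max_workers=limit) as pool:
        return list(pool.map(fetch, urls))
```

Filter the URL list through `allowed_urls` first, then hand the survivors to `crawl`. Dropping `limit` to 1 gives strictly sequential requests, which is the gentlest setting for a small blog host.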

More on this topic as I continue to tweak the settings.

Update: If you want to learn more about search optimization, here is a KB article to get you started.

Update 2: Here's another tip. When creating an inclusion rule for a subdirectory on a site, the default behavior is to also include the parent site. For example, a new inclusion rule for http://www.theegroup.net/blogs/ would create two entries - one for the full path and one for the parent http://www.theegroup.net. This means that any links to other URLs on the same site would be processed, which greatly expands the scope of the crawl. To restrict this behavior, change the parent rule to an exclusion and leave the child URL as an inclusion. This will restrict the crawler to links under the child URL.
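To make the effect of the parent/child rule pair concrete, here is a small Python sketch. It assumes the most specific (longest) matching rule wins, which matches the behavior I observed but is my assumption rather than documented SPS logic. The `is_crawlable` function and the rule tuples are made up for illustration.

```python
# Each rule is (URL prefix, include?). The parent is excluded, the child included.
RULES = [
    ("http://www.theegroup.net", False),
    ("http://www.theegroup.net/blogs/", True),
]


def is_crawlable(url, rules=RULES):
    """Apply the longest matching prefix rule; URLs matching no rule are skipped."""
    best = None
    for prefix, include in rules:
        if url.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, include)
    return best is not None and best[1]
```

With these two rules, anything under /blogs/ is crawled, while links elsewhere on the same site (or on any other site) are ignored.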