SharePoint Search: An Enterprise Contender?
February, 2008
by Jean Graef
This article was originally published in the Enterprise Search Sourcebook, February 6, 2008. See also SharePoint Thesaurus Web Part
Is the search component of Microsoft’s SharePoint suite a viable option for enterprise search? Some of our members have already chosen it, some have tried and rejected it, and many more are considering it as a serious contender. Gartner lists it along with Google in its “Challenger” category.(1) The reason is that, with the 2007 release of the product (now called Microsoft Office SharePoint Server or MOSS), SharePoint search now has most of the basic features we’ve come to expect in enterprise search along with low cost and tight integration with existing SharePoint installations and other Microsoft applications. As one person put it, “It isn’t the best in class, but it’s good enough.”
Whether you deploy MOSS for enterprise search depends on your technology strategy and budget, how much you’ve invested in metadata and taxonomies, and how you plan to search multiple content repositories. If you use SharePoint for collaboration and content management but choose another product for enterprise search, you’ll need to consider two kinds of complementary products:
Either way, you’ll need a strategy that integrates SharePoint’s bottom-up (decentralized) publication and management model with the top-down (centralized) enterprise search deployment model. You want users to be able to find resources – documents, Web sites, people – regardless of company location or technology yet not be overwhelmed by the minutiae of documents generated by local collaboration.
MOSS 2007: A big improvement
MOSS 2007 search, a big improvement over the 2003 version, provides the basic functionality we’ve come to expect from search engines such as:
In addition, MOSS 2007 search has two new features:
MOSS vs Google
How effective are these new and improved features? One large, global organization compared MOSS 2007 search with Google using a test collection of half a million documents. From a relevancy standpoint, both gave similar results without using metadata cues. Three-fourths of the 500 users enrolled in the test said that MOSS 2007 search was better than what they had before (a combination of SharePoint 2003 and a well-known enterprise search engine). Testers especially liked:
1. More informative results. Document summaries let users tell what each document was about – a big time saver.
2. Simplicity. People didn’t need to learn how to search. They got reasonably good results by typing a word or phrase in the search box.
3. Integration with desktop applications. SharePoint search is available in the upper right hand corner of the Internet Explorer 7 browser and is integrated with Windows desktop search. Users only see results that they are permitted to access.
The search manager also reported that MOSS 2007 is easier to administer and maintain, though he said that the index update process is still too time consuming. He liked the variety of usage reports, especially the one that shows the most popular search terms that have no Best Bets assigned to them (i.e. the editors have not selected one or more documents or sites to display at the top of the results list).
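The report the search manager describes can be pictured as a frequency count over the query log, filtered against the list of keywords that already have Best Bets. The Python sketch below illustrates that idea only; it is not how MOSS builds its reports, and the `BEST_BETS` table, the example URL, and both function names are hypothetical.

```python
from collections import Counter

# Hypothetical editor-maintained table: keyword -> pages to pin at the top.
BEST_BETS = {
    "expenses": ["http://intranet.example/finance/expenses"],
}


def popular_terms_without_best_bets(query_log, top_n=10):
    """Most frequent queries that no editor has assigned a Best Bet to."""
    counts = Counter(q.strip().lower() for q in query_log)
    missing = [(term, n) for term, n in counts.most_common()
               if term not in BEST_BETS]
    return missing[:top_n]


def search_with_best_bets(query, ranked_results):
    """Put Best Bets first, flagged True, followed by the normal ranking."""
    pinned = BEST_BETS.get(query.strip().lower(), [])
    rest = [url for url in ranked_results if url not in pinned]
    return [(url, True) for url in pinned] + [(url, False) for url in rest]
```

Given a log such as `["expenses", "Payroll", "payroll", "benefits"]`, the first function would surface "payroll" and "benefits" as candidates for an editor to review, since "expenses" already has a Best Bet.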
Room for improvement
Even those who like MOSS for search point out that there’s still room for improvement. Features they would like to see include:
Many, if not all, of these features are available through third-party add-ons from vendors such as Coveo and Mondosoft Ontolica. Unlike other search engine vendors, who provide new features exclusively through the upgrade process, Microsoft encourages its customers to purchase enhancement packages created by independent developers. These add-ons, however, increase the total cost of MOSS search deployment.
Influence of strategy and budget
MOSS search is especially compelling for those organizations that have standardized on Microsoft products as a way to reduce the costs of systems integration and support – or because Microsoft is a major business partner for software consulting services. On the other hand, MOSS is less appealing to organizations that subscribe to a best-of-breed strategy where products from multiple vendors believed to be best at what they do are purchased and then integrated.
Moreover, companies selecting MOSS tend to look at search as part of a single system in which:
In other words, MOSS search is well suited to organizations that have standardized on the Microsoft technology platform, use SharePoint for collaboration, have a decentralized organization structure, and are in knowledge-intensive industries (e.g. R&D, software consulting).
Investments in metadata and taxonomies
Organizations that have invested in populating content with metadata and creating extensive taxonomies naturally want to leverage this effort to enhance enterprise search. MOSS search can use existing metadata in documents as well as some relationships from an external thesaurus.
The MOSS search crawler will discover metadata embedded within documents, then use it to filter search results and display options in Advanced Search. However, the administrator must first map the crawled metadata elements to “managed properties” (attributes such as author, title, and URL that can be used in search scopes and queries). The Dublin Core metadata library comes with MOSS out of the box.
Some common metadata elements are mapped by default, but it’s also possible to create new managed properties for such attributes as customer name, customer service rep, or customer service region. Managed properties can be incorporated into document and site templates to make it easier to add metadata values at creation time, but MOSS provides no auto-categorization program to add metadata retrospectively to an existing document collection.
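The mapping step can be thought of as a lookup from the property names the crawler finds in documents to the managed properties that search exposes. The following Python sketch models that idea under stated assumptions; it does not use any MOSS API, and the property names, the `MANAGED_PROPERTY_MAP` table, and the two helper functions are hypothetical.

```python
# Hypothetical mapping from crawled property names (as found in documents)
# to managed property names (as exposed to search scopes and queries).
MANAGED_PROPERTY_MAP = {
    "dc:creator": "Author",
    "dc:title": "Title",
    "CustName": "CustomerName",
    "CustRegion": "CustomerServiceRegion",
}


def to_managed(crawled: dict[str, str]) -> dict[str, str]:
    """Keep only the crawled properties that an administrator has mapped."""
    managed = {}
    for name, value in crawled.items():
        target = MANAGED_PROPERTY_MAP.get(name)
        if target is not None:
            managed[target] = value
    return managed


def filter_by_property(docs: list[dict[str, str]], prop: str, value: str) -> list[dict[str, str]]:
    """Return documents whose managed property equals value (case-insensitive)."""
    return [d for d in docs if d.get(prop, "").lower() == value.lower()]


# Made-up crawl output; "InternalRev" is unmapped, so search never sees it.
crawled_docs = [
    {"dc:title": "Q3 support summary", "CustRegion": "EMEA", "InternalRev": "7"},
    {"dc:title": "Onboarding guide", "CustRegion": "APAC"},
]
managed_docs = [to_managed(d) for d in crawled_docs]
emea_docs = filter_by_property(managed_docs, "CustomerServiceRegion", "emea")
```

The point of the sketch is that unmapped metadata is invisible to search: a property has to appear in the mapping before it can filter results or show up in Advanced Search.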
Using a thesaurus with MOSS search
With MOSS keywords and synonyms it is possible to use some thesaurus data and relationships to expand a search or influence the order of documents in the results list. Keywords in a search engine context are somewhat different from terms in a thesaurus that is used for classification or browse purposes. In a traditional thesaurus, there are preferred terms, non-preferred (USE) terms, broader terms, narrower terms, and related terms. In the MOSS “thesaurus” file (used to expand or redirect a query), there are only three kinds of relationships:
In MOSS you can also associate definitions with keywords.
It’s not possible to simply import a traditional thesaurus into the MOSS thesaurus XML format because they’re two different animals. For one thing, a search thesaurus (i.e. a list of synonyms) should contain words that real users will type in the search box (from search logs) – not terms created by a professional indexer (though there will be some overlap). For another, a traditional thesaurus may contain phrases such as “packaging law & legislation,” while a search thesaurus should contain single words or, at most, two-word phrases. Finally, there’s no way to show broader/narrower relationships in search results (e.g. as “see also” links or an expandable hierarchy of related topics).
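To make the difference concrete, here is a simplified Python model of what a search thesaurus does to a query: it either redirects a typed term to another term or expands it to a set of synonyms. This is a sketch of the general technique, not the MOSS thesaurus file format or its matching logic; the synonym data and the function names are hypothetical.

```python
# Hypothetical synonym data; in practice the terms would come from search logs.
EXPANSIONS = [
    {"laptop", "notebook"},
    {"vacation", "holiday", "pto"},
]
# Redirects: a term users type -> the term the content actually uses.
REPLACEMENTS = {"moss": "sharepoint"}


def rewrite_query(query):
    """Return one set of acceptable terms for each word in the query."""
    result = []
    for word in query.lower().split():
        word = REPLACEMENTS.get(word, word)   # redirect first
        terms = {word}
        for group in EXPANSIONS:              # then expand to synonyms
            if word in group:
                terms |= group
        result.append(terms)
    return result


def matches(document_text, query):
    """A document matches if it contains at least one term from every set."""
    words = set(document_text.lower().split())
    return all(terms & words for terms in rewrite_query(query))
```

Note that nothing in this model can express a broader or narrower term, which is why a classification thesaurus cannot simply be loaded into it.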
At least two organizations we know of have bumped into size and performance limitations with the MOSS thesaurus (Microsoft says there’s a 10 MB limit).
Changing the order of search results
One of the major uses of a thesaurus is to classify documents (i.e. assign terms to them). Organizations that have used a thesaurus in this way, either by using human indexers or an auto-categorization program, can leverage some of this work in MOSS through Best Bets and Authoritative Pages.
With Best Bets, MOSS administrators can associate keywords with specific Web pages or sites. When a user types the keyword into the search box, MOSS displays those sites designated as Best Bets at the top of the results list (or in a sidebar) and marks them with an icon, such as a star (see below).
With Authoritative Pages, administrators increase or decrease the relevance of content within search results by assigning one of four levels to a Web page or site: most authoritative, second-level authoritative, third-level authoritative, or sites to demote in the ranking. Pages that are not assigned an Authoritative Page level are weighted based on their “click distance” from an authoritative page. Click distance is the number of links a user must follow to get from an authoritative page to the content item.
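Click distance is a shortest-path measure, so it can be computed with a breadth-first search that starts from all authoritative pages at once. The Python sketch below shows only that computation plus a toy ordering rule; how MOSS actually combines click distance with other relevance signals is not described here, and the link graph, page names, and both functions are hypothetical.

```python
from collections import deque


def click_distances(links, authoritative):
    """Breadth-first search from every authoritative page at once.

    links maps a page to the list of pages it links to. Pages that cannot
    be reached from any authoritative page are absent from the result.
    """
    dist = {page: 0 for page in authoritative}
    queue = deque(authoritative)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in dist:            # first visit is the shortest path
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist


def order_by_click_distance(pages, dist, demoted=frozenset()):
    """Toy ordering: demoted pages last, otherwise nearest to authority first."""
    return sorted(pages, key=lambda p: (p in demoted, dist.get(p, float("inf"))))
```

In a graph where "home" is authoritative and links to "hr", which links to "benefits", the distances are 0, 1, and 2 respectively; a page that no authoritative page leads to has no distance and sorts behind every page that has one.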
So, while it’s possible to tweak MOSS search results using a variety of techniques along with some data from an existing thesaurus, it’s a labor-intensive endeavor. For this reason, some organizations with large, complex taxonomies opt to purchase third-party thesaurus management software that integrates with SharePoint – an approach that Microsoft endorses. Examples of MOSS-compatible taxonomy management tools include Factiva Synaptica, Data Harmony Machine-Aided Indexer, SchemaLogic SchemaServer, and Interse I-box.
A consistent search experience
Users want a simple, effective way to search all available content collections – whether they reside in SharePoint, on the company’s intranet, in databases, or in external information services. The ideal is a single search box, a results page that contains relevant listings without duplicates, and a way to match user security profiles with content access levels in each content source.
Within MOSS, an administrator can create a Shared Services Provider (SSP) and instruct it to crawl all the content sources deemed necessary for a particular business function. Sources can include SharePoint content, the company intranet, database applications such as SAP and Oracle, and external information services such as FindLaw. The crawl results are stored in a single index, which makes the search relatively fast and efficient.
However, large organizations typically have multiple SSPs. To allow a user to search all of them from a single user interface, you can purchase a third-party application such as Mondosoft’s Ontolica (see the federated search option on the Ontolica Web site). Or you can select an enterprise search engine that can crawl and index SharePoint content. Examples include Autonomy, FAST, Longitude (BA-Insight), Oracle, Recommind, and Vivisimo.
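Whichever route you take, a federated layer has to do three things with the result lists it gets back from each index: drop items the user may not see, collapse duplicates, and produce one ranking. The Python sketch below illustrates those steps in the simplest possible form. It is not based on any vendor's product; the `Hit` record, the group-based access check, and the assumption that relevance scores from different indexes are directly comparable are all simplifications made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    url: str
    score: float                 # assumed comparable across indexes
    allowed_groups: frozenset    # groups permitted to read the item


def federated_search(result_lists, user_groups):
    """Merge result lists from several indexes into one ranked list."""
    best = {}
    for hits in result_lists:
        for hit in hits:
            # Security trimming: skip items the user has no group access to.
            if not (hit.allowed_groups & user_groups):
                continue
            # Deduplicate on a normalized URL, keeping the higher score.
            key = hit.url.lower().rstrip("/")
            if key not in best or hit.score > best[key].score:
                best[key] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)
```

In practice the hardest part is the one this sketch assumes away: scores produced by separate indexes are rarely on the same scale, so a real federated layer needs some way to normalize or re-rank them.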
Is MOSS 2007 right for you?
Organizations that use SharePoint for collaboration and content management should consider MOSS for enterprise search. Its tight integration with Microsoft applications (especially Office), low cost, and new search features make it a serious contender. Because MOSS is designed for bottom-up implementation, it’s important to get input from business units as well as the enterprise search team and taxonomy manager, if there is one.
Several of our members have mentioned the effort needed to customize MOSS search and set up interfaces to other business applications through the MOSS Business Data Catalog. Added to that is the cost of purchasing third-party programs for enhanced search features and taxonomy management. We suspect that for many organizations, the question is not “Should we use MOSS as our enterprise search engine?” but rather “What’s the best way to integrate our non-Microsoft enterprise search engine with MOSS?”
(1) See “Magic Quadrant for Information Access Technology,” September 5, 2007. The Gartner Group.
Created on February 6, 2008 | Updated on January 4, 2010