
Searching ProcessBook Files in Windows and SharePoint

Blog Post created by hanyong Employee on May 12, 2010

Windows Search

Windows Search (successor of the Windows Indexing Service, formerly known as Windows Desktop Search or WDS on Windows XP and Windows Server 2003) is an indexed desktop search platform released by Microsoft for the Windows operating system.

Search for ProcessBook file contents in Windows Search

In Windows Vista, Windows 7 and Windows Server 2008, Windows Search is integrated into the operating system. Upon installation, Windows Search (and Windows Desktop Search) builds a full-text index of the files on a user's hard drive. Once a file's contents have been added to this index, Windows Search can return results much faster than it could by scanning every file on the computer. Searches are performed not only on file names, but also on the contents of the file (provided a proper handler for the file type is installed) as well as the keywords, comments and metadata the file might be tagged with.

 

By default, Windows Search includes handlers for common file types, including Word documents, Excel spreadsheets, PowerPoint presentations, HTML documents, text files, MP3 and WMA music files, WMV, ASF and AVI videos, JPEG, BMP and PNG images, among others. At first glance, there is no handler available for the PI ProcessBook file formats. However, if we take a step further and look at how the underlying file content is saved, we can see that there are ways to search the contents of PDI or SVG files without a custom handler or IFilter.

 

.pdi is the file extension for ProcessBook display files. Most of the .pdi file content is encoded rather than stored as plain text, so if we open a .pdi file in a text editor like Notepad, we can expect to see something like:
Indexing_5F00_WS_5F00_001.jpg

 

However, if we take a closer look at the file content, we can see that not all of the information is encoded. Some key information, such as tagnames and symbol names, is still readable as text. This means that we can index .pdi files as text files to search for that information.
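
To check which parts of a file are readable before relying on the index, something like the sketch below can help. It is only an illustration: it assumes the readable parts are stored as runs of plain ASCII bytes, the function name printable_strings is made up for this post, and the sample bytes are invented rather than taken from a real .pdi file.

```python
import re


def printable_strings(data, min_len=4):
    """Return runs of printable ASCII characters that are at least min_len long."""
    # Bytes 0x20-0x7e are the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Invented sample: unreadable bytes with two readable fragments in between.
sample = b"\x00\x01\x9fCDT158\x00\xff\x10Value\x02ab\x00"
print(printable_strings(sample))
```

For a real display you would read the bytes from disk first (for example with open(path, "rb").read(), where path points to your own .pdi file) and look for your tagnames in the returned list.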

 

On the other hand, .svg is the file type for Scalable Vector Graphics files. The graphical content of a .svg file is saved as an XML document, which means that we can configure Windows Search to index .svg file content as plain text.
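
Because an .svg file is XML, its text can also be read with any XML parser. The sketch below is a rough illustration using Python's standard library. The sample document is a hand-written, hypothetical SVG and not an actual ProcessBook export, and svg_text_content is a name made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical SVG used only for illustration.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="100">
  <title>Reactor Overview</title>
  <text x="10" y="20">CDT158</text>
</svg>"""


def svg_text_content(svg_source):
    """Collect the non-empty text found inside the elements of an SVG document."""
    root = ET.fromstring(svg_source)
    texts = []
    for element in root.iter():
        if element.text and element.text.strip():
            texts.append(element.text.strip())
    return texts


print(svg_text_content(SAMPLE_SVG))
```

Note that the plain text filter does not parse the XML at all; it simply treats the whole file as text, so markup and attribute values end up in the index as well.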

 

Given that we can index both file types as plain text files, all we have to do next is configure Windows Search to index .pdi and .svg content.

 

In Windows Vista and Windows 7, this can be done pretty easily from "Indexing Options" in "Control Panel". If you go into Advanced Options and select the File Types tab, you should see a dialog window like the one below:
Indexing_5F00_WS_5F00_002.jpg

 

If you choose the option "Index Properties and File Contents", you should see that the "Filter Description" column for .pdi and .svg shows "Plain Text Filter".

 

The next requirement is to ensure that the location storing your ProcessBook files is an indexed location. If you refer to the screenshot above again, you can also see the list of locations that are indexed. If your files are in those locations, you can search them easily using Windows Search. Of course, if you save your ProcessBook files in other locations, you can add those to the list of indexed locations.

 

Once the configuration is done, you can use Windows Search to find files that include tagnames like CDT158, or dataset names, like this:
Indexing_5F00_WS_5F00_003.jpg

Indexing PDI, PIW and SVG in Microsoft SharePoint Search 

In most enterprise environments, a lot of information is shared among different users. So instead of searching only the local desktop, we would want to extend the search to the SharePoint portal if possible. Similar search features are implemented in the various versions of SharePoint technologies, so it is possible to configure SharePoint search to index .svg and .pdi files based on their content, allowing users to search within the files that are shared on a SharePoint portal.

 

Let's see how we can configure Microsoft SharePoint Search to index the contents of .pdi, .piw and .svg files. Because there are different versions of SharePoint, the configuration may differ between versions. In addition, SharePoint servers can be deployed with different roles (Web Front-End server, Application server, Database server) in a server farm. I did my configuration on a standalone installation of Microsoft SharePoint 2007 server (which runs everything on a single box), but if you perform the steps on the appropriate servers, it should work.

 

Of course, the first thing you need is a working Office SharePoint Search configuration. You can find a reference online here: http://technet.microsoft.com/en-us/library/cc303425.aspx. Alternatively, you can do a web search to find blog posts or articles on this. Take note of which servers have the Index server and Query server roles. Index servers are responsible for crawling (i.e. indexing) the SharePoint contents, while Query servers handle search queries from users. If you are using Microsoft SharePoint 2007 server, you can find your server roles in "SharePoint 3.0 Central Administration" under "Application Management > Search Service", which shows which servers are your Query and Index servers.

 

Indexing_5F00_SS_5F00_011.jpg

 

Configuring Office SharePoint Search to index custom file types with the text filter is not as straightforward as doing it in Windows Search; we even need to edit the Windows Registry on the server to achieve this! This means that you need to run these steps as a user with sufficient privileges. With that, let's get started:

  1. The .pdi and .svg file types are not indexed by default, so we need to add them first. To do this, open the "SharePoint 3.0 Central Administration" page. Under Shared Services Administration, select the instance that you want to configure.
    Indexing_5F00_SS_5F00_001.jpg
    Then go into "Search Settings"
    Indexing_5F00_SS_5F00_002.jpg
    And finally "File Types"
    Indexing_5F00_SS_5F00_003.jpg
    You should now see a list of file types that Office SharePoint Search indexes. Click on "New File Type", which should be in the top left corner. Type in "pdi" as the extension, and click on "OK". Repeat this for any other extension you want to index, such as "svg".
    Indexing_5F00_SS_5F00_004.jpg
  2. Just adding the file type is not enough because, by default, Office SharePoint Search will try to find an IFilter for the new .pdi extension to index the file content. Since a PDI IFilter is not available, Office SharePoint Search will only be able to index the file's properties. What we need to do next is make Office SharePoint Search treat the extension as a plain text file and use the text IFilter to index the file. This is the part where we start to meddle with the registry. You can leave the SharePoint Central Administration page alone for a while.
    1. Create a .reg file that edits the registry so that SharePoint Search uses the text filter to index pdi files. A sample .reg file can be downloaded here.
    2. Run the .reg file. You will be shown a warning message telling you that you are editing the registry. Confirm to make the required changes. Of course, it is always good practice to back up the registry before doing this.
    3. Repeat steps 1 and 2 on all Index and Query servers.
    4. There is another part that we need to edit manually on all the Index servers. Run "regedit" to open the Registry Editor. Find the registry key
      HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\<some guid>\Gather\Search\Extensions\ExtensionList
    5. Create a new String Value, rename it from "New Value #1" to 38 (or the next available number) and press Enter.
    6. Set the value data to "pdi".
      The Registry Editor should then look something like this:
      Indexing_5F00_SS_5F00_005.jpg
    7. Repeat steps 4 to 6 on all Index servers.
  3. The next thing to do is to reset all crawled content (i.e. the index) on the server. Go back to SharePoint 3.0 Central Administration and select Search Settings for the Shared Services again (as in step 1). This time, select the option "Reset all crawled content".
    Indexing_5F00_SS_5F00_006.jpg
  4. Restart the search service for the configuration to take effect. One way to do this is to open a command prompt and run the following commands:
    - net stop osearch
    - net start osearch
  5. Finally, after the service is restarted, initiate a full crawl for the Shared Services. Go back to Search Settings again, and this time select the option "Content Sources and Crawl Schedule" instead.
    Indexing_5F00_SS_5F00_007.jpg
    On the next page ("Manage Content Sources"), choose to start the crawls.
    Indexing_5F00_SS_5F00_008.jpg
    The search service should start to crawl through the SharePoint files and index them, and this time the crawl includes the contents of pdi and svg files.
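
For sub-steps 4 to 6 of step 2, picking "the next number available" by hand is easy to get wrong when there are several servers to update. The sketch below shows one way to work out the entry to add. It is only an illustration: it works on a plain dictionary of value names and data that you have copied from the ExtensionList key, it does not touch the registry itself, and plan_extension_entry is a name made up for this post.

```python
def plan_extension_entry(existing, extension):
    """Return the (name, data) pair to add to ExtensionList, or None if already listed.

    existing maps value names (for example "1", "2", ...) to the extension stored there.
    """
    wanted = extension.lower().lstrip(".")
    # Nothing to do if the extension is already in the list.
    if any(str(data).lower() == wanted for data in existing.values()):
        return None
    # Ignore non-numeric names such as "(Default)" when looking for the highest number.
    numbers = [int(name) for name in existing if name.isdecimal()]
    return (str(max(numbers, default=0) + 1), wanted)


# Shortened, made-up snapshot of an ExtensionList key.
snapshot = {"(Default)": "", "1": "doc", "2": "txt", "37": "xml"}
print(plan_extension_entry(snapshot, ".pdi"))
```

With the made-up snapshot above, the highest existing number is 37, so the new entry would be named 38 with the data "pdi", which matches the manual steps.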

After the crawl is done, it's time to test the configuration. You can type a tagname or a dataset name as a search term in any of the search boxes on your SharePoint sites to find the pdi or svg files that contain the term.
Indexing_5F00_SS_5F00_009.jpg
Indexing_5F00_SS_5F00_010.jpg

Note: There are still some limitations when searching for tagnames that contain certain characters. For example, searching for "BA:Level.1" will not find the pdi or svg file that contains the tag, but searching for "BA:Level" will.
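
One way to work around this limitation is to retry the search with shorter versions of the tagname. The sketch below only generates the candidate search terms. It assumes the "." is the troublesome character, based on the single example above, and fallback_terms is a name made up for this post. Why the search behaves this way has not been investigated here.

```python
def fallback_terms(tagname, separator="."):
    """Return the tagname followed by shorter variants cut at each separator."""
    terms = [tagname]
    current = tagname
    while separator in current:
        # Drop everything from the last separator onwards.
        current = current.rsplit(separator, 1)[0]
        if current:
            terms.append(current)
    return terms


print(fallback_terms("BA:Level.1"))
```

You would try the terms in order and stop at the first one that returns results, keeping in mind that a shorter term may also match other tags that share the same prefix.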

Acknowledgement

Thanks to Jay Lakumb for his idea of searching through ProcessBook file contents in Windows and SharePoint Search.

 

Thanks to Anne and Navdeep for their respective blog posts about configuring SharePoint search for .csv and .cs files. These served as a reference for me to come up with the steps in this blog post:
Anne Stenberg's Blog: http://blogs.technet.com/anneste/archive/2010/02/11/how-to-crawl-the-content-in-csv-files-in-microsoft-office-sharepoint-server-2007.aspx
Navdeep Madan's Blog: http://blogs.msdn.com/navdeepm/archive/2009/10/21/how-to-crawl-the-content-of-file-with-the-extension-cs-c-sharp.aspx
