Sunday, February 19, 2012

Can I see what is inside Full-Text Search Catalog?

I have a problem whith knowing what is inside catalog, especially words that
are indexed in tables. I want to improve some search capabilities, and I need
to know structure of index (catalog). My improvements will probably be
approximate string matching, by word distance, and I need to know set of
indexed words. I work with SQL Server 2005.
Thankful in advance,
Marko Mladenovic
Have a look at cidump. You can find it in C:\Program Files\Microsoft SQL
Server\MSSQL.1\MSSQL\Binn
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"Marko Mladenovic" <Marko Mladenovic@.discussions.microsoft.com> wrote in
message news:ADAF0E16-D782-464D-A274-A23206C1940E@.microsoft.com...
> I have a problem whith knowing what is inside catalog, especially words
that
> are indexed in tables. I want to improve some search capabilities, and I
need
> to know structure of index (catalog). My improvements will probably be
> approximate string matching, by word distance, and I need to know set of
> indexed words. I work with SQL Server 2005.
> Thankful in advance,
> Marko Mladenovic
|||Marko,
The SQL Server 2005 Beta2 as well as Bet3 Books Online (BOL) have yet to be
updated on this most useful SQL FTS utility. In the meantime, you should
checkout the cidump utility. Use cidump /? to get the syntax as well as
example of use, located at
\MSSQL90\MSSQL.1\MSSQL\Binn:
cidump /?
-- edited output:
Use cidump.exe for:
- Dumping the content of the catalog or a specified index
cidump -dump <catalog_path> [options]
- Checking the integrity of the catalog or a specified index
cidump -check <catalog_path> [options]
- Computing statistics on the content of the catalog or a specified index
cidump -stats <catalog_path> [options]
...
Display global statistics for the index 0001002A:
cidump -stats c:\catalog -g -i 0x1002A
cidump /?
Advanced options:
-x [<format>] - Dump the index
<format> can be also:
ks - statistics per key
kw+ - keys and wids and wid metadata (all but
occurrences)
kp - list of keys with position in index
kwp - list of keys and wids with position in index
kph - dump phrases that contain a given key.
Requires -kph and -pch options.
ph - dump all phrases in more than a given nr of docs
Requires -phc option.
-dir [<format>] - Dump the index directory
<format> can be:
kbo - keys and BitOffset in index (default)
kp - list of keys with position in the directory file
-phk <key> - Used only with "-x kph" dump format.
Dump phrases that contain the given <key>. Use also -phc.
-phc <minWidCount> - Used only with "-x ph" and "-x kph" dump formats.
Dump all the 2-3 word phrases that occur in at least <minOccCount>
documents in the indexed corpus.
-kwc <minWidCount> <maxWidCount> - Display only keys in a widcount range.
-alr - Display the allocated ranges for the master index.
-fbs - Force binary search when dumping widsets.
-rec <index> <type> <maxWid> <R/W> - Dump a standalone index.
Don't use the Index Table.
<index> - the index id (e.g.) 0x1001C
<type> - 0 - master index; 1 - shadow index.
<maxWid> - the maximum workid in the index.
<R/W> - 0 - complete index; 1 - incomplete index (write mode)
Enjoy!
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/
"Marko Mladenovic" <Marko Mladenovic@.discussions.microsoft.com> wrote in
message news:ADAF0E16-D782-464D-A274-A23206C1940E@.microsoft.com...
>I have a problem whith knowing what is inside catalog, especially words
>that
> are indexed in tables. I want to improve some search capabilities, and I
> need
> to know structure of index (catalog). My improvements will probably be
> approximate string matching, by word distance, and I need to know set of
> indexed words. I work with SQL Server 2005.
> Thankful in advance,
> Marko Mladenovic
|||Thank You Hilary and John, Your answers were very helpful to me. If You can
answer me just one more question, please. How can I relate "wid" from a
catalog with real database records, i.e. table, row and column.
Thanks again,
Marko.
"John Kane" wrote:

> Marko,
> The SQL Server 2005 Beta2 as well as Bet3 Books Online (BOL) have yet to be
> updated on this most useful SQL FTS utility. In the meantime, you should
> checkout the cidump utility. Use cidump /? to get the syntax as well as
> example of use, located at
> \MSSQL90\MSSQL.1\MSSQL\Binn:
> cidump /?
> -- edited output:
> Use cidump.exe for:
> - Dumping the content of the catalog or a specified index
> cidump -dump <catalog_path> [options]
> - Checking the integrity of the catalog or a specified index
> cidump -check <catalog_path> [options]
> - Computing statistics on the content of the catalog or a specified index
> cidump -stats <catalog_path> [options]
> ...
> Display global statistics for the index 0001002A:
> cidump -stats c:\catalog -g -i 0x1002A
> cidump /?
> Advanced options:
> -x [<format>] - Dump the index
> <format> can be also:
> ks - statistics per key
> kw+ - keys and wids and wid metadata (all but
> occurrences)
> kp - list of keys with position in index
> kwp - list of keys and wids with position in index
> kph - dump phrases that contain a given key.
> Requires -kph and -pch options.
> ph - dump all phrases in more than a given nr of docs
> Requires -phc option.
> -dir [<format>] - Dump the index directory
> <format> can be:
> kbo - keys and BitOffset in index (default)
> kp - list of keys with position in the directory file
> -phk <key> - Used only with "-x kph" dump format.
> Dump phrases that contain the given <key>. Use also -phc.
> -phc <minWidCount> - Used only with "-x ph" and "-x kph" dump formats.
> Dump all the 2-3 word phrases that occur in at least <minOccCount>
> documents in the indexed corpus.
> -kwc <minWidCount> <maxWidCount> - Display only keys in a widcount range.
> -alr - Display the allocated ranges for the master index.
> -fbs - Force binary search when dumping widsets.
> -rec <index> <type> <maxWid> <R/W> - Dump a standalone index.
> Don't use the Index Table.
> <index> - the index id (e.g.) 0x1001C
> <type> - 0 - master index; 1 - shadow index.
> <maxWid> - the maximum workid in the index.
> <R/W> - 0 - complete index; 1 - incomplete index (write mode)
>
> Enjoy!
> John
> --
> SQL Full Text Search Blog
> http://spaces.msn.com/members/jtkane/
>
>
> "Marko Mladenovic" <Marko Mladenovic@.discussions.microsoft.com> wrote in
> message news:ADAF0E16-D782-464D-A274-A23206C1940E@.microsoft.com...
>
>
|||You can't, its a work identifier - it is changes with each population.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"Marko Mladenovic" <MarkoMladenovic@.discussions.microsoft.com> wrote in
message news:5578E081-AE78-4FA1-8008-C20967BCD41E@.microsoft.com...
> Thank You Hilary and John, Your answers were very helpful to me. If You
can[vbcol=seagreen]
> answer me just one more question, please. How can I relate "wid" from a
> catalog with real database records, i.e. table, row and column.
> Thanks again,
> Marko.
> "John Kane" wrote:
be[vbcol=seagreen]
index[vbcol=seagreen]
docs[vbcol=seagreen]
file[vbcol=seagreen]
also -phc.[vbcol=seagreen]
formats.[vbcol=seagreen]
<minOccCount>[vbcol=seagreen]
range.[vbcol=seagreen]
I[vbcol=seagreen]
of[vbcol=seagreen]

No comments:

Post a Comment