Thursday, March 8, 2012

Can I use Top here and I'm stuck on how to get the results I'm looking for

The tables and queries are below with comments on what I'm trying to
accomplish. Thanks to all !
Copying from the next line to the end and pasting into QA does it all
(except what I'm looking for)
set nocount on
--drop table combinations
--drop table results
create table combinations (id int, combination int, innerindex int, value
int)
-- the combination field indicates a 2 number combination
-- although this just a sampling of my combinations table,
-- it holds 3,4 and 5 number combinations as well
-- innerindex is for my testing and is used below
-- value is the actual number from the 2 number combination
insert into combinations values(1,2,1,1)
insert into combinations values(1,2,2,2)
insert into combinations values(2,2,1,1)
insert into combinations values(2,2,2,3)
insert into combinations values(3,2,1,1)
insert into combinations values(3,2,2,4)
insert into combinations values(4,2,1,1)
insert into combinations values(4,2,2,5)
insert into combinations values(5,2,1,2)
insert into combinations values(5,2,2,3)
insert into combinations values(6,2,1,2)
insert into combinations values(6,2,2,4)
go
create table results (id int, testsetid int)
-- the results table lists which testsets (from another table)
-- have which combination. So for the first record
-- testset 50 contains the combination 1,2 as indicated
-- by the id of 1
insert into results values(5,50)
insert into results values(1,2)
insert into results values(6,50)
insert into results values(1,2)
insert into results values(4,450)
insert into results values(1,2)
insert into results values(3,150)
insert into results values(1,2)
go
-- this works nicely to give me a total of the results grouped by the
combinations
select id, count(*) as Total from results group by id order by count(*) desc
-- Is there a way to use the Top function to only
-- give me the top 1 or 5 or 5 percent
select a.id, count(*) as Total from results a group by a.id
having a.id in (select id from combinations where combination = 2)
order by total desc
-- here I am trying to get a Total and the actual combination listed
horizontally
-- but I don't see how. I was reading that this type of thing should be done
in the
-- presentation layer but I'm not sure how I can do it there either.
-- I'm using c# and can relate the tables via a dataset but I;m still stuck
-- any suggestions would be appreciated.
select a.id, count(*) as Total,
SUM(CASE [innerindex] WHEN 1 THEN value ELSE ' ' END)AS F1,
SUM(CASE [innerindex] WHEN 2 THEN value ELSE ' ' END)AS F2,
SUM(CASE [innerindex] WHEN 3 THEN value ELSE ' ' END)AS F3,
SUM(CASE [innerindex] WHEN 4 THEN value ELSE ' ' END)AS F4,
SUM(CASE [innerindex] WHEN 5 THEN value ELSE ' ' END)AS F5
from combinations a join results b on a.id = b.id
group by a.id
order by a.id
-- the ultimate goal is to have something that looks like
--
-- Total Combination
-- 55 1,3,8,55
-- 10 1,3
-- 8 1,9
-- and if I could say show me the top 3 number combinations or
-- top 5 percent of all combinations etc etc
-- thanks alot!!!!
drop table combinations
drop table resultsOn Tue, 17 May 2005 11:03:10 -0400, D wrote:

>The tables and queries are below with comments on what I'm trying to
>accomplish. Thanks to all !
>Copying from the next line to the end and pasting into QA does it all
>(except what I'm looking for)
(snip)
Hi D,
First: thanks for posting the CREATE TABLE and INSERT statements and the
requested output.
Unfortunately, it didn't help in this case - I am still at a loss when I
try to grok what you're trying to do.
It might help if your script included the constraints as well. The
tables you posted have no primary key, and there is no foreign key to
indicate the relationship either.
For a better understanding, a short explanation of the business problem
you're trying to solve might help lots too.
Finally, I really don't see how the required rersults you posted relate
to the sample data in the INSERT statements. Could you elaborate on
that?
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Hey Hugo hows it going?
This code was actually based upon some of your examples that you showed me
in an earlier post I had. At the time I was trying to search for
combinations within a table and you had suggested I create a normalized
table of combinations and join in the data table to seek out the
combinations. I have attached it at the very end of this message in case you
wish to review, look for 'Start of old post' below.
I created the table of combinations (called combinations) and I have my
table of data then I was running the join that you gave me but I was
concerned that it was taking a long time to complete. I then changed the sql
select to 'SELECT w.combination, t.testset into results' to save the results
except I renamed combination to id and testset to testsetid.
So in the 'Query Analyzer Section' below, the table 'Results' is the actual
output from the join query in the section 'Start of old post' and there
isn't any keys set because I was just doing a select into
(I changed some names and text to make it more clear)
============== Start of Query Analyzer Section
====================================
set nocount on
--drop table Combinations
--drop table Results
create table Combinations
(CombinationId int, Combination int, InnerIndex int, CombinationValue int,
primary key(CombinationId, Combination, InnerIndex) )
-- CombinationId = Identity field
-- Combination = field indicates a 2 number combination
-- although this just a sampling of my Combinations table,
-- it holds 3,4 and 5 number Combinations as well
-- InnerIndex = for my testing and is used below
-- CombinationValue = is a number from the 2 number combination
insert into Combinations values(1,2,1,1)
insert into Combinations values(1,2,2,2)
insert into Combinations values(2,2,1,1)
insert into Combinations values(2,2,2,3)
insert into Combinations values(3,2,1,1)
insert into Combinations values(3,2,2,4)
insert into Combinations values(4,2,1,1)
insert into Combinations values(4,2,2,5)
insert into Combinations values(5,2,1,2)
insert into Combinations values(5,2,2,3)
insert into Combinations values(6,2,1,2)
insert into Combinations values(6,2,2,4)
go
create table Results (CombinationId int, TestsetId int)
-- The Results table lists the CombinationId's
-- which you could consider a foreign key
-- that originates from the Combinations table
-- and the Testsetid
-- which you could consider a foreign key
-- that originates from my data table not shown
-- For the records below
-- Testsetid 1 contains the combination 1,2 as indicated
-- by the Combinationid of 1
insert into Results values(1,1)
-- Testsetid 50 contains the combination 2,4 as indicated
-- by the Combinationid of 6
insert into Results values(5,50)
insert into Results values(6,50)
insert into Results values(1,2)
insert into Results values(4,450)
insert into Results values(1,3)
insert into Results values(3,150)
insert into Results values(1,4)
insert into Results values(2,1)
insert into Results values(2,2)
insert into Results values(2,3)
insert into Results values(2,4)
insert into Results values(2,5)
insert into Results values(2,6)
go
-- here I found the solution that I was looking for
-- **see comments in post about this
--
-- the top row of the Results is
-- 6 1 3 0 0 0
-- which reads there are 6 Combinations of 1 & 3
-- the next row reads there are 4 Combinations of 1 & 2
select
(
select count(*) from Results b where a.CombinationId = b.CombinationId
group by b.CombinationId
) [Total Combinations],
SUM(CASE [InnerIndex] WHEN 1 THEN CombinationValue ELSE ' ' END)AS [
First
Number],
SUM(CASE [InnerIndex] WHEN 2 THEN CombinationValue ELSE ' ' END)AS [
Second
Number],
SUM(CASE [InnerIndex] WHEN 3 THEN CombinationValue ELSE ' ' END)AS [
Third
Number],
SUM(CASE [InnerIndex] WHEN 4 THEN CombinationValue ELSE ' ' END)AS [
Fourth
Number],
SUM(CASE [InnerIndex] WHEN 5 THEN CombinationValue ELSE ' ' END)AS [
Fifth
Number]
from Combinations a
group by a.CombinationId
having a.CombinationId in
(
-- using this I can select the combination I want
-- this says only show me the Results from the 2 number Combinations
select a.CombinationId from Results a
group by a.CombinationId
having a.CombinationId in (select CombinationId from Combinations where
Combination = 2)
-- using this count I can control some of the Results
and count(*) > 1
)
order by [Total Combinations] desc
-- thanks alot!!!!
drop table Combinations
drop table Results
============== End of Query Analyzer Section
====================================
*** Comments about my large query
It works good for small sets like these but if I set it to search from a 4
or 5 number combinations from my results table
which contains 37960 records, it just slowly chugs away for a long long
time.
I was hoping to just get back the top 5 or 10 combinations and their totals
but sql has to run every combination to figure out
the top totals.
Thanks for your help as always!!!!
============== Start of old post ====================================
Hi D,
All the more reason to store the values you want to find in a table. Only,
you'll have to add another column, so you can store different combinations
at once. And you'll have to adapt the query, of course. See if the code
below helps.
create table test (testset int, testnumber int, value int,
primary key(testset, testnumber),
unique(testset, value))
insert into test values(1,1,1)
insert into test values(1,2,2)
insert into test values(1,3,3)
insert into test values(1,4,4)
insert into test values(1,5,5)
insert into test values(2,1,1)
insert into test values(2,2,2)
insert into test values(2,3,7)
insert into test values(2,4,8)
insert into test values(2,5,9)
insert into test values(3,1,2)
insert into test values(3,2,3)
insert into test values(3,3,6)
insert into test values(3,4,7)
insert into test values(3,5,8)
go
create table wanted (combination int not null,
value int not null,
primary key(combination, value))
insert into wanted (combination, value)
-- Testset 1: values 1 and 2
select 1, 1 union all
select 1, 2 union all
-- Testset 2: values 1 and 3
select 2, 1 union all
select 2, 3 union all
-- Testset 3: values 1, 2, and 3
select 3, 1 union all
select 3, 2 union all
select 3, 3
go
SELECT w.combination, t.testset
FROM test AS t
INNER JOIN wanted AS w
ON t.value = w.value
GROUP BY w.combination, t.testset
HAVING COUNT(*) = (SELECT COUNT(*)
FROM wanted AS w2
WHERE w2.combination = w.combination)
go
drop table wanted
drop table test
go
============== End of old post ====================================|||On Thu, 19 May 2005 11:15:42 -0400, D wrote:

>Hey Hugo hows it going?
Hi D,
I'm fine, thanks!
(BTW, sorry to have kept you waiting - I needed some time for this, and
real life kept interfering).

>This code was actually based upon some of your examples that you showed me
>in an earlier post I had. At the time I was trying to search for
>combinations within a table and you had suggested I create a normalized
>table of combinations and join in the data table to seek out the
>combinations. I have attached it at the very end of this message in case yo
u
>wish to review, look for 'Start of old post' below.
Thanks. I didn't recognise it from your earlier post, but now I do
recall having that discussion with you (though I don't recall what
exactly your original question was - but I think that's not really
relevant to this question, right?)

>I created the table of combinations (called combinations) and I have my
>table of data then I was running the join that you gave me but I was
>concerned that it was taking a long time to complete. I then changed the sq
l
>select to 'SELECT w.combination, t.testset into results' to save the result
s
>except I renamed combination to id and testset to testsetid.
>So in the 'Query Analyzer Section' below, the table 'Results' is the actual
>output from the join query in the section 'Start of old post' and there
>isn't any keys set because I was just doing a select into
I have to disagree - in a proper table, there ALWAYS are keys. They may
not be defined, and therefor unknown to the server, but LOGICALLY, the
keys exist.
Knowing the keys (the LOGICAL keys, that is) is important when querying
a table. Keys tell you what columns to use in filtering, joining, or
grouping. They tell you when to expect duplicates. Etc.
For the record - you can add keys to a table that is populated with
SELECT INTO. Whether or not you do that has no consequence for the logic
of the query. It will change the performance of all queries and
statements involving the table. It can also change the way unexpected
bad data is handled.
(snip)
>*** Comments about my large query
>It works good for small sets like these but if I set it to search from a 4
>or 5 number combinations from my results table
>which contains 37960 records, it just slowly chugs away for a long long
>time.
>I was hoping to just get back the top 5 or 10 combinations and their totals
>but sql has to run every combination to figure out
>the top totals.
>Thanks for your help as always!!!!
I must admit that I had some trouble understanding what you did. I'm
still unsure if I understand your query completely. I did manage to
write two alternatives that both produce the same results from the
sample data you provided, but do test it against some other test sets as
well, as I might have misinterpreted the logic of your query.
-- Alternative #1
SELECT Total AS [Total Combination],
First AS [First Number],
Second AS [Second Number],
Third AS [Third Number],
Fourth AS [Fourth Number],
Fifth AS [Fifth Number]
FROM (SELECT (SELECT COUNT(*)
FROM Results AS b
WHERE b.CombinationId = a.CombinationId) Total,
SUM(CASE InnerIndex WHEN 1 THEN CombinationValue ELSE
0 END) AS First,
SUM(CASE InnerIndex WHEN 2 THEN CombinationValue ELSE
0 END) AS Second,
SUM(CASE InnerIndex WHEN 3 THEN CombinationValue ELSE
0 END) AS Third,
SUM(CASE InnerIndex WHEN 4 THEN CombinationValue ELSE
0 END) AS Fourth,
SUM(CASE InnerIndex WHEN 5 THEN CombinationValue ELSE
0 END) AS Fifth
FROM Combinations a
WHERE Combination = 2
GROUP BY a.CombinationId) AS x
WHERE Total > 1
ORDER BY Total DESC
-- Alternative #2
SELECT MAX(b.Total) AS [Total Combination],
SUM(CASE a.InnerIndex WHEN 1 THEN a.CombinationValue ELSE 0
END) AS [First Number],
SUM(CASE a.InnerIndex WHEN 2 THEN a.CombinationValue ELSE 0
END) AS [Second Number],
SUM(CASE a.InnerIndex WHEN 3 THEN a.CombinationValue ELSE 0
END) AS [Third Number],
SUM(CASE a.InnerIndex WHEN 4 THEN a.CombinationValue ELSE 0
END) AS [Fourth Number],
SUM(CASE a.InnerIndex WHEN 5 THEN a.CombinationValue ELSE 0
END) AS [Fifth Number]
FROM Combinations a
INNER JOIN (SELECT CombinationId, COUNT(*) AS Total
FROM Results
GROUP BY CombinationId
HAVING COUNT(*) > 1) AS b
ON b.CombinationId = a.CombinationId
WHERE a.Combination = 2
GROUP BY a.CombinationId
ORDER BY SUM(b.Total) DESC
go
If my queries are correct, the next step is to do some performance
tests. You can use SET STATISTICS IO ON and SET STATISTICS TIME ON to
get a quick overview. But the best way to test is to use an "empty"
server (ie no other processes running), clear the cache, then measure
the time taken to run one of the queries. Like this:
CHECKPOINT -- Write dirty pages to disk
DBCC FREEPROCCACHE -- Free procedure cache
DBCC DROPCLEANBUFFERS -- Flush non-dirty pages from cache
go
DECLARE @.strt datetime, @.end datetime
SET @.start = CURRENT_TIMESTAMP
-- Insert query to be tested here.
-- Use SELECT .. INTO #Tmp if result set is large,
-- to eliminate network speed from the test
SET @.end = CURRENT_TIMESTAMP
SELECT @.strt AS started, @.end AS ended,
DATEDIFF(ms, @.strt, @.end) AS "Elapsed (ms)"
go
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Hi Hugo,
I can't thank you enough for your help.
I haven't fully tested the results but they look good initially. Two quick
tests showed that alternative #2 was under a second and #1 took roughly 30
seconds. I'll have more comments and perhaps some questions later but I just
wanted to stop in here and thank you again because those are both brilliant
solutions and have expanded my sql knowledge enormously.
Best Regards
D.
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:ke679196cmvjgji6026p83lmm5vesvck8k@.
4ax.com...
> On Thu, 19 May 2005 11:15:42 -0400, D wrote:
>
> Hi D,
> I'm fine, thanks!
> (BTW, sorry to have kept you waiting - I needed some time for this, and
> real life kept interfering).
>
> Thanks. I didn't recognise it from your earlier post, but now I do
> recall having that discussion with you (though I don't recall what
> exactly your original question was - but I think that's not really
> relevant to this question, right?)
>
> I have to disagree - in a proper table, there ALWAYS are keys. They may
> not be defined, and therefor unknown to the server, but LOGICALLY, the
> keys exist.
> Knowing the keys (the LOGICAL keys, that is) is important when querying
> a table. Keys tell you what columns to use in filtering, joining, or
> grouping. They tell you when to expect duplicates. Etc.
> For the record - you can add keys to a table that is populated with
> SELECT INTO. Whether or not you do that has no consequence for the logic
> of the query. It will change the performance of all queries and
> statements involving the table. It can also change the way unexpected
> bad data is handled.
> (snip)
> I must admit that I had some trouble understanding what you did. I'm
> still unsure if I understand your query completely. I did manage to
> write two alternatives that both produce the same results from the
> sample data you provided, but do test it against some other test sets as
> well, as I might have misinterpreted the logic of your query.
> -- Alternative #1
> SELECT Total AS [Total Combination],
> First AS [First Number],
> Second AS [Second Number],
> Third AS [Third Number],
> Fourth AS [Fourth Number],
> Fifth AS [Fifth Number]
> FROM (SELECT (SELECT COUNT(*)
> FROM Results AS b
> WHERE b.CombinationId = a.CombinationId) Total,
> SUM(CASE InnerIndex WHEN 1 THEN CombinationValue ELSE
> 0 END) AS First,
> SUM(CASE InnerIndex WHEN 2 THEN CombinationValue ELSE
> 0 END) AS Second,
> SUM(CASE InnerIndex WHEN 3 THEN CombinationValue ELSE
> 0 END) AS Third,
> SUM(CASE InnerIndex WHEN 4 THEN CombinationValue ELSE
> 0 END) AS Fourth,
> SUM(CASE InnerIndex WHEN 5 THEN CombinationValue ELSE
> 0 END) AS Fifth
> FROM Combinations a
> WHERE Combination = 2
> GROUP BY a.CombinationId) AS x
> WHERE Total > 1
> ORDER BY Total DESC
> -- Alternative #2
> SELECT MAX(b.Total) AS [Total Combination],
> SUM(CASE a.InnerIndex WHEN 1 THEN a.CombinationValue ELSE 0
> END) AS [First Number],
> SUM(CASE a.InnerIndex WHEN 2 THEN a.CombinationValue ELSE 0
> END) AS [Second Number],
> SUM(CASE a.InnerIndex WHEN 3 THEN a.CombinationValue ELSE 0
> END) AS [Third Number],
> SUM(CASE a.InnerIndex WHEN 4 THEN a.CombinationValue ELSE 0
> END) AS [Fourth Number],
> SUM(CASE a.InnerIndex WHEN 5 THEN a.CombinationValue ELSE 0
> END) AS [Fifth Number]
> FROM Combinations a
> INNER JOIN (SELECT CombinationId, COUNT(*) AS Total
> FROM Results
> GROUP BY CombinationId
> HAVING COUNT(*) > 1) AS b
> ON b.CombinationId = a.CombinationId
> WHERE a.Combination = 2
> GROUP BY a.CombinationId
> ORDER BY SUM(b.Total) DESC
> go
>
> If my queries are correct, the next step is to do some performance
> tests. You can use SET STATISTICS IO ON and SET STATISTICS TIME ON to
> get a quick overview. But the best way to test is to use an "empty"
> server (ie no other processes running), clear the cache, then measure
> the time taken to run one of the queries. Like this:
> CHECKPOINT -- Write dirty pages to disk
> DBCC FREEPROCCACHE -- Free procedure cache
> DBCC DROPCLEANBUFFERS -- Flush non-dirty pages from cache
> go
> DECLARE @.strt datetime, @.end datetime
> SET @.start = CURRENT_TIMESTAMP
> -- Insert query to be tested here.
> -- Use SELECT .. INTO #Tmp if result set is large,
> -- to eliminate network speed from the test
> SET @.end = CURRENT_TIMESTAMP
> SELECT @.strt AS started, @.end AS ended,
> DATEDIFF(ms, @.strt, @.end) AS "Elapsed (ms)"
> go
>
> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)

No comments:

Post a Comment