Add Up Conditional Counts On Multiple Columns Of The Same Table

October 23, 2024

I am looking for a 'better' way to write a query that shows, for a single player, every opponent he has previously played and the associated win-loss record against each such opponent.

Solution 1:

Solution with correlated subquery:

SELECT *,
       (SELECT COUNT(*) FROM match WHERE loser_id = p.player_id),
       (SELECT COUNT(*) FROM match WHERE winner_id = p.player_id)
FROM dbo.player p WHERE player_id <> 1

Solution with UNION and conditional aggregation:

SELECT  t.loser_id ,
        SUM(CASE WHEN result = 1 THEN 1 ELSE 0 END) ,
        SUM(CASE WHEN result = -1 THEN 1 ELSE 0 END)
FROM    ( SELECT * , 1 AS result FROM match WHERE winner_id = 1
          UNION ALL
          SELECT loser_id , winner_id , -1 AS result FROM match WHERE loser_id = 1
        ) t
GROUP BY t.loser_id

Solution 2:

Query

The query is not as simple as it looks at first. The shortest query string does not necessarily yield the best performance. This should be as fast as it gets, while staying as short as possible:

SELECT p.username, COALESCE(w.ct, 0) AS won, COALESCE(l.ct, 0) AS lost
FROM  (
   SELECT loser_id AS player_id, count(*) AS ct
   FROM   match
   WHERE  winner_id = 1  -- your player_id here
   GROUP  BY 1           -- positional reference (not your player_id)
   ) w
FULL JOIN (
   SELECT winner_id AS player_id, count(*) AS ct
   FROM   match
   WHERE  loser_id = 1   -- your player_id here
   GROUP  BY 1
   ) l USING (player_id)
JOIN   player p USING (player_id)
ORDER  BY 1;

Result exactly as requested:

username | won | lost
---------+-----+-----
alice    | 3   | 2
bob      | 1   | 0
mary     | 2   | 1

SQL Fiddle - with more revealing test data!

The key feature is the FULL [OUTER] JOIN between the two subqueries for losses and wins. This produces a table of all players our candidate has played against. The USING clause in the join condition conveniently merges the two player_id columns into one.

After that, a single JOIN to player to get the name, and COALESCE to replace NULL with 0. Voilà.
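
To make the join semantics concrete, here is a minimal Python sketch (not SQL, and not part of the original answer) of what the FULL JOIN plus COALESCE combination does: two per-opponent counts are merged on the union of their keys, and a missing side defaults to 0. The function name head_to_head and the sample matches are hypothetical, chosen only for illustration.

```python
from collections import Counter

def head_to_head(matches, me):
    """Return {opponent_id: (won, lost)} for player `me`.

    `matches` is an iterable of (winner_id, loser_id) pairs.
    """
    # Subquery "w": opponents I beat, with a count per opponent.
    won = Counter(loser for winner, loser in matches if winner == me)
    # Subquery "l": opponents who beat me, with a count per opponent.
    lost = Counter(winner for winner, loser in matches if loser == me)
    # FULL JOIN ... USING (player_id): keep every opponent from either side.
    opponents = sorted(won.keys() | lost.keys())
    # COALESCE(ct, 0): a Counter returns 0 for a key it has never seen.
    return {opp: (won[opp], lost[opp]) for opp in opponents}

# Hypothetical data: player 1 beat player 2 twice, lost to 2 once, lost to 3 once.
# The last match does not involve player 1 and must be ignored.
sample = [(1, 2), (1, 2), (2, 1), (3, 1), (2, 3)]
print(head_to_head(sample, me=1))
```

Opponent 3 appears only on the "lost" side, which is exactly the case a plain inner join between the two subqueries would drop.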

Index

This would be even faster with two multicolumn indexes:

CREATE INDEX idx_winner ON match (winner_id, loser_id);
CREATE INDEX idx_loser  ON match (loser_id, winner_id);

Only if you get index-only scans out of this. Then Postgres does not even visit the match table at all and you get super-fast results.
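
As a rough, runnable illustration of the idea, the sketch below uses SQLite (through Python's standard sqlite3 module) as a stand-in for Postgres. SQLite's "covering index" is only an analogue of a Postgres index-only scan: Postgres additionally depends on the visibility map, so whether you get index-only scans there has to be checked with EXPLAIN on your own database. The table layout is an assumption based on the columns used in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed minimal layout; the real match table may have more columns.
    CREATE TABLE "match" (winner_id INTEGER NOT NULL, loser_id INTEGER NOT NULL);
    CREATE INDEX idx_winner ON "match" (winner_id, loser_id);
    CREATE INDEX idx_loser  ON "match" (loser_id, winner_id);
""")

def plan(sql):
    # The human-readable description is the last column of each plan row.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[-1] for row in rows)

# Both columns the query needs live in the index, so the table is not touched.
wins_plan = plan(
    'SELECT loser_id, count(*) FROM "match" WHERE winner_id = 1 GROUP BY loser_id'
)
losses_plan = plan(
    'SELECT winner_id, count(*) FROM "match" WHERE loser_id = 1 GROUP BY winner_id'
)
print(wins_plan)
print(losses_plan)
```

If the plans mention a covering index, every column the subquery reads is served from the index alone, which is the effect the two multicolumn indexes are meant to achieve.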

With two integer columns you happen to hit a local optimum: these indexes have just the same size as the simple ones you had. Details:

Is a composite index also good for queries on the first field?

Shorter, but slow

You could run correlated subqueries like @Giorgi suggested, just working correctly:

SELECT *
FROM  (
   SELECT username
       , (SELECT count(*) FROM match
          WHERE  loser_id  = p.player_id
          AND    winner_id = 1) AS won
       , (SELECT count(*) FROM match
          WHERE  winner_id = p.player_id
          AND    loser_id  = 1) AS lost
   FROM   player p
   WHERE  player_id <> 1
   ) sub
WHERE (won > 0 OR lost > 0)
ORDER  BY username;

Works fine for small tables, but doesn't scale. This needs a sequential scan on player and two index scans on match per existing player. Compare performance with EXPLAIN ANALYZE.

Solution 3:

For a single 'subject' player, I would simply union the player in both the winning and losing roles, and sum up the wins / losses:

SELECT opponent, SUM(won) as won, SUM(lost) as lost
FROM
(
    select w.username AS opponent, 0 AS won, 1 as lost, m.loser_id as me
     from "match" m
     inner join "player" w on m.winner_id = w.player_id

    UNION ALL

    select l.username AS opponent, 1 AS won, 0 as lost, m.winner_id as me
     from "match" m
     inner join "player" l on m.loser_id = l.player_id
) x
WHERE me = 1
GROUP BY opponent;
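
For anyone who wants to try the union approach without a Postgres instance, here is a small sketch that runs the same query shape against an in-memory SQLite database from Python. The table definitions and sample rows are hypothetical, made up for illustration, and SQLite is only a stand-in, so behaviour and performance on Postgres may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and data, for illustration only.
    CREATE TABLE "player" (player_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE "match"  (winner_id INTEGER NOT NULL, loser_id INTEGER NOT NULL);
    INSERT INTO "player" VALUES (1, 'carol'), (2, 'alice'), (3, 'bob');
    INSERT INTO "match"  VALUES (1, 2), (1, 2), (2, 1), (3, 1), (2, 3);
""")

UNION_QUERY = """
SELECT opponent, SUM(won) AS won, SUM(lost) AS lost
FROM (
    -- Subject player in the losing role: the winner is the opponent.
    SELECT w.username AS opponent, 0 AS won, 1 AS lost, m.loser_id AS me
    FROM "match" m JOIN "player" w ON m.winner_id = w.player_id
    UNION ALL
    -- Subject player in the winning role: the loser is the opponent.
    SELECT l.username AS opponent, 1 AS won, 0 AS lost, m.winner_id AS me
    FROM "match" m JOIN "player" l ON m.loser_id = l.player_id
) x
WHERE me = ?
GROUP BY opponent
ORDER BY opponent
"""

# Record of player 1 (carol) against everyone she has played.
rows = conn.execute(UNION_QUERY, (1,)).fetchall()
print(rows)
```

With this data, bob shows up with zero wins because he only ever appears in the branch where the subject player lost.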

For a set-based operation, we can just left join the players to the same derived union table:

SELECT p.username as player, x.opponent, SUM(x.won) as won, SUM(x.lost) as lost
FROM "player" p
LEFT JOIN
(
    select w.username AS opponent, 0 AS won, 1 as lost, m.loser_id as me
     from "match" m
     inner join "player" w on m.winner_id = w.player_id

    UNION ALL

    select l.username AS opponent, 1 AS won, 0 as lost, m.winner_id as me
     from "match" m
     inner join "player" l on m.loser_id = l.player_id
) x
on p.player_id = x.me
GROUP BY player, opponent;

SqlFiddles of both here

One small point - the names of the indices must be unique - presumably you meant:

create index idx_winners on match(winner_id);
create index idx_losers on match(loser_id);

Solution 4:

Something more readable than my original. Thoughts?

with W as (
    select loser_id as opponent_id,
    count(*) as n
    from match
    where winner_id = 1
    group by loser_id
),
L as (
    select winner_id as opponent_id,
    count(*) as n
    from match
    where loser_id = 1
    group by winner_id
)
select player.username, coalesce(W.n, 0) as wins, coalesce(L.n, 0) as losses
from player
left join W on W.opponent_id = player.player_id
left join L on L.opponent_id = player.player_id
where player.player_id != 1;

                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Hash Left Join  (cost=73.78..108.58 rows=1224 width=48)
   Hash Cond: (player.player_id = l.opponent_id)
   CTE w
     ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
           Group Key: match.loser_id
           ->  Seq Scan on match  (cost=0.00..36.75 rows=11 width=4)
                 Filter: (winner_id = 1)
   CTE l
     ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
           Group Key: match_1.winner_id
           ->  Seq Scan on match match_1  (cost=0.00..36.75 rows=11 width=4)
                 Filter: (loser_id = 1)
   ->  Hash Left Join  (cost=0.07..30.15 rows=1224 width=44)
         Hash Cond: (player.player_id = w.opponent_id)
         ->  Seq Scan on player  (cost=0.00..25.38 rows=1224 width=36)
               Filter: (player_id <> 1)
         ->  Hash  (cost=0.04..0.04 rows=2 width=12)
               ->  CTE Scan on w  (cost=0.00..0.04 rows=2 width=12)
   ->  Hash  (cost=0.04..0.04 rows=2 width=12)
         ->  CTE Scan on l  (cost=0.00..0.04 rows=2 width=12)

The above has a performance killer with the player_id != 1. I think I can avoid that by only scanning the results of the joins, no?

explain with W as (
        select loser_id as opponent_id,
        count(*) as n
        from match
        where winner_id = 1
        group by loser_id
    ),
    L as (
        select winner_id as opponent_id,
        count(*) as n
        from match
        where loser_id = 1
        group by winner_id
    )
    select t.* from (
        select player.player_id, player.username, coalesce(W.n, 0) as wins, coalesce(L.n, 0) as losses
        from player
        left join W on W.opponent_id = player.player_id
        left join L on L.opponent_id = player.player_id
    ) t
    where t.player_id != 1;

                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Hash Left Join  (cost=73.78..74.89 rows=3 width=52)
   Hash Cond: (player.player_id = l.opponent_id)
   CTE w
     ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
           Group Key: match.loser_id
           ->  Seq Scan on match  (cost=0.00..36.75 rows=11 width=4)
                 Filter: (winner_id = 1)
   CTE l
     ->  HashAggregate  (cost=36.81..36.83 rows=2 width=4)
           Group Key: match_1.winner_id
           ->  Seq Scan on match match_1  (cost=0.00..36.75 rows=11 width=4)
                 Filter: (loser_id = 1)
   ->  Hash Left Join  (cost=0.07..1.15 rows=3 width=44)
         Hash Cond: (player.player_id = w.opponent_id)
         ->  Seq Scan on player  (cost=0.00..1.05 rows=3 width=36)
               Filter: (player_id <> 1)
         ->  Hash  (cost=0.04..0.04 rows=2 width=12)
               ->  CTE Scan on w  (cost=0.00..0.04 rows=2 width=12)
   ->  Hash  (cost=0.04..0.04 rows=2 width=12)
         ->  CTE Scan on l  (cost=0.00..0.04 rows=2 width=12)
