code logs -> 2024 -> Tue, 23 Jan 2024< code.20240122.log - code.20240124.log >
--- Log opened Tue Jan 23 00:00:53 2024
01:48 Degi_ [Degi@Nightstar-kut0um.pool.telefonica.de] has joined #code
01:50 Degi [Degi@Nightstar-dd0lbr.pool.telefonica.de] has quit [Ping timeout: 121 seconds]
01:50 Degi_ is now known as Degi
03:03 ToxicFrog [ToxicFrog@ServerAdministrator.Nightstar.Net] has quit [The TLS connection was non-properly terminated.]
03:04 ToxicFrog [ToxicFrog@ServerAdministrator.Nightstar.Net] has joined #code
03:04 mode/#code [+ao ToxicFrog ToxicFrog] by ChanServ
06:02 Vornicus [Vorn@Nightstar-ivektl.res.spectrum.com] has joined #code
06:02 mode/#code [+qo Vornicus Vornicus] by ChanServ
06:04 Kindamoody [Kindamoody@Nightstar-pqh9gl.tbcn.telia.com] has quit [Ping timeout: 121 seconds]
06:40 Kimo|autojoin [Kindamoody@Nightstar-pqh9gl.tbcn.telia.com] has joined #code
06:41 mode/#code [+o Kimo|autojoin] by ChanServ
09:28 JustBob [justbob@Nightstar.Customer.Dissatisfaction.Administrator] has quit [NickServ (RECOVER command used by JustLurk)]
09:28 JustBob [justbob@Nightstar.Customer.Dissatisfaction.Administrator] has joined #code
09:28 mode/#code [+o JustBob] by ChanServ
10:56 Reiver [quassel@Nightstar-ksqup0.co.uk] has quit [[NS] Quit: Reblooting.]
11:30 Reiver [quassel@Nightstar-ksqup0.co.uk] has joined #code
11:30 mode/#code [+ao Reiver Reiver] by ChanServ
17:34 Emmy [Emmy@Nightstar-qo29c7.fixed.kpn.net] has joined #code
20:14
<@macdjord>
Mahal, Reiver: Re: My SQL question: Mahal's suggestion does not work:
20:14
<@macdjord>
SELECT machine_id, timestamp, user FROM scans WHERE CONCAT(machine_id, max(timestamp)) in (SELECT CONCAT(machine_id, max(timestamp)) FROM scans GROUP BY machine_id`);
20:14
<@macdjord>
Erm.
20:14
<@macdjord>
I mean:
20:14
<@macdjord>
postgres=# CREATE TABLE scans (machine_id INT NOT NULL, scan_time INT NOT NULL, usr VARCHAR NOT NULL, result VARCHAR, PRIMARY KEY (machine_id, scan_time));
20:14
<@macdjord>
CREATE TABLE
20:14
<@macdjord>
postgres=# INSERT INTO scans (machine_id, scan_time, usr) VALUES (1, 1, 'alice'), (1, 2, 'bob'), (2, 2, 'carol');
20:14
<@macdjord>
INSERT 0 3
20:14
<@macdjord>
postgres=# SELECT machine_id, MAX(scan_time), usr FROM scans GROUP BY machine_id;
20:14
<@macdjord>
ERROR: column "scans.usr" must appear in the GROUP BY clause or be used in an aggregate function
20:14
<@macdjord>
LINE 1: SELECT machine_id, MAX(scan_time), usr FROM scans GROUP BY m...
20:14 Kimo|autojoin is now known as Kindamoody
20:16
<@macdjord>
My solution with a subquery and concatenation *does* work, but as mentioned, is both awkward and inefficient:
20:17
<@macdjord>
postgres=# SELECT machine_id, scan_time, usr FROM scans WHERE CONCAT(machine_id, scan_time::VARCHAR) in (SELECT CONCAT(machine_id, MAX(scan_time)::VARCHAR) FROM scans GROUP BY machine_id);
20:17
<@macdjord>
machine_id | scan_time | usr
20:17
<@macdjord>
------------+-----------+-------
20:17
<@macdjord>
1 | 2 | bob
20:17
<@macdjord>
2 | 2 | carol
20:17
<@macdjord>
(2 rows)
20:18
<&Reiver>
hrm
20:19
<&Reiver>
Wish I had a SQL engine to play with for this one
20:19
<@macdjord>
(For anybody who missed my original statement of the problem: I am trying to find an SQL query against the table defined above, which will return, for each different machine, the time of the most recent scan, and the user who performed said most recent scan.
20:19
<@macdjord>
)
20:19
<&Reiver>
ohh
20:20
<&Reiver>
You don't want the most recent scan per user, you want the most recent scan per machine and which user did it
20:20
<@macdjord>
Yes.
20:20
<&Reiver>
two ways, now I need to decide which is most efficient
20:21
<@macdjord>
Though the question of 'what is the most recent scan per user, and which machine did they do it on?' is essentially equivalent~
20:21
<&Reiver>
Yes
20:21
<&Reiver>
We gave you most recent scan per user per machine
20:23
<&Reiver>
Does your log data have UIDs instead available?
20:24
<&Reiver>
*Can* it have UIDs? It would simplify life to have a key.
20:24
<@macdjord>
Hrm. Let me take a look...
20:29
<@macdjord>
Okay, yes, we do have a unique ID, which consists of an arbitrary 64-bit integer.
20:31
<@macdjord>
Table now looks like: CREATE TABLE scans (row_id BIGSERIAL NOT NULL, machine_id INT NOT NULL, scan_time INT NOT NULL, usr VARCHAR NOT NULL, result VARCHAR, PRIMARY KEY (row_id));
20:32
<@macdjord>
(Note: 'BIGSERIAL' is PostgreSQL's auto-incrementing 8-byte integer type with values starting at 1.)
22:02
<@macdjord>
Reiver: Found the solution: https://stackoverflow.com/questions/7745609/sql-select-only-rows-with-max-value-on-a-column
22:04
<&Reiver>
Ah, yes, that top solution was the one I was writing for you when I got a work call, sorry
22:05
<&Reiver>
(I would personally actually use the second one a lot, but I appreciate it's a harder one to get ones head around.)
22:05
<&Reiver>
If you find this has meaningful performance issues, try the join trick instead.
22:06
<&Reiver>
And apologies for not getting to it before you found the answer anyway. :)
22:06
<@macdjord>
NP
22:07
<&Reiver>
(But I can, at least, confirm that was the solution I was going to provide.)
22:07
<&Reiver>
There's an even better one using rank(), but that's engine dependent.
22:08
<@macdjord>
Would that be the next answer down? https://stackoverflow.com/a/38854846/1503005 ?
22:09
<&Reiver>
I have not syntax-checked it in my head, but that looks the correct shape, yes
22:10
<&Reiver>
ROW_NUMBER(), RANK() etc there's several that do similar-but-interesting-things
22:10
<@macdjord>
The INNER JOIN solution seems both simplest and sufficiently good.
22:10
<&Reiver>
Yes, it's the one I use if I don't have window functions
22:11
<&Reiver>
The top one is the most *maintainable*, as it's more human-readable as to what's going on, but I like the join trick (I am a sucker for join tricks, once you have the set theory in your head they work great and compilers love them)
22:11
<@macdjord>
When you say 'join trick' which one are you talking about? the LEFT OUTER JOIN one?
22:16
<@macdjord>
I have no problem understanding how that one works, but I don't immediately see why it should be substantially faster than the INNER JOIN solution, assuming there's an index on (machine_id, scan_time).
23:07 himi [sjjf@Nightstar-o4k.pal.170.103.IP] has quit [Connection closed]
23:11 himi [sjjf@Nightstar-o4k.pal.170.103.IP] has joined #code
23:11 mode/#code [+o himi] by ChanServ
23:37 Emmy [Emmy@Nightstar-qo29c7.fixed.kpn.net] has quit [Ping timeout: 121 seconds]
--- Log closed Wed Jan 24 00:00:54 2024
code logs -> 2024 -> Tue, 23 Jan 2024< code.20240122.log - code.20240124.log >

[ Latest log file ]