code logs -> 2017 -> Sun, 12 Nov 2017< code.20171111.log - code.20171113.log >
--- Log opened Sun Nov 12 00:00:28 2017
00:45 Vornlicious [Vorn@Nightstar-fbrltd.sub-70-211-132.myvzw.com] has quit [Connection closed]
00:45 Vorntastic [Vorn@Nightstar-1l3nul.res.rr.com] has joined #code
01:25 Kindamoody is now known as Kindamoody[zZz]
01:32 celmin|away is now known as celticminstrel
03:07 Degi [Degi@Nightstar-8jctgl.dyn.telefonica.de] has quit [[NS] Quit: Leaving]
03:07 Degi [Degi@Nightstar-8jctgl.dyn.telefonica.de] has joined #code
04:18 Degi [Degi@Nightstar-8jctgl.dyn.telefonica.de] has quit [Connection closed]
04:27 RchrdB [RchrdB@Nightstar-qe9.aug.187.81.IP] has quit [Connection closed]
05:01 Derakon is now known as Derakon[AFK]
05:07 Vornlicious [Vorn@Nightstar-h0nrf1.sub-70-211-140.myvzw.com] has joined #code
05:10 Vorntastic [Vorn@Nightstar-1l3nul.res.rr.com] has quit [Ping timeout: 121 seconds]
05:22 Soare [cute@Nightstar-gvt3mb.ip-164-132-106.eu] has joined #code
05:23 abilal [a@Nightstar-lgpmok.dfri.se] has quit [Ping timeout: 121 seconds]
06:00 celticminstrel is now known as celmin|sleep
07:44 Kindamoody[zZz] is now known as Kindamoody
09:05 Vornicus [Vorn@Nightstar-1l3nul.res.rr.com] has quit [Ping timeout: 121 seconds]
09:20 macdjord|dance is now known as macdjord
10:35 macdjord is now known as macdjord|slep
10:43
< Vornlicious>
Whee more date conversions. This one: "12L17" is today. (It skips "I" as a month letter)
10:49 Kindamoody [Kindamoody@Nightstar-eubaqc.tbcn.telia.com] has quit [Client exited]
10:52 Kindamoody|autojoin [Kindamoody@Nightstar-eubaqc.tbcn.telia.com] has joined #code
10:52 mode/#code [+o Kindamoody|autojoin] by ChanServ
11:09 Kindamoody|autojoin is now known as Kindamoody|out
11:19
< Vornlicious>
Whee. Code(letter)-index({64;65;96;97},Match(code(letter), {64;73;96;105}))
11:26
<@gnolam>
12L17? What kind of date format is that supposed to be?
11:29
< Vornlicious>
Some crazy bullshit that some beverage bottlers use in their manufacturing codes. Day, month (as a letter, a-m skipping I), two digit year
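The format described above can be sketched in Python. This is only an illustration: the function name, the `MONTH_LETTERS` table, and the `century` default are assumptions, since the code itself carries only a two-digit year.

```python
from datetime import date

# Month letters A-M with "I" skipped, as described in the log:
# A=Jan ... H=Aug, J=Sep ... M=Dec.
MONTH_LETTERS = "ABCDEFGHJKLM"

def parse_bottler_code(code: str, century: int = 2000) -> date:
    """Parse a 'day, month letter, two-digit year' code such as '12L17'.

    The century default is an assumption; the code only has two year digits.
    """
    code = code.strip().upper()
    # The month letter is the only non-digit; everything before it is the day.
    letter_pos = next(i for i, ch in enumerate(code) if ch.isalpha())
    day = int(code[:letter_pos])
    month = MONTH_LETTERS.index(code[letter_pos]) + 1  # ValueError for 'I'
    year = century + int(code[letter_pos + 1:])
    return date(year, month, day)

print(parse_bottler_code("12L17"))  # 2017-11-12
```

Using a lookup string instead of character-code arithmetic makes the skipped "I" explicit, at the cost of a linear search over twelve letters.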
11:30 gnolam [lenin@Nightstar-ego6cb.cust.bahnhof.se] has quit [Ping timeout: 121 seconds]
11:31 Kindamoody|out [Kindamoody@Nightstar-eubaqc.tbcn.telia.com] has quit [Ping timeout: 121 seconds]
11:31
< Vornlicious>
And now he will never know
11:36 gnolam [lenin@Nightstar-ego6cb.cust.bahnhof.se] has joined #code
11:36 mode/#code [+o gnolam] by ChanServ
11:37
< Vornlicious>
Some crazy bullshit that some beverage bottlers use in their manufacturing codes. Day, month (as a letter, a-m skipping I), two digit year
11:38
<@gnolam>
(Power outage)
11:39
<@gnolam>
Ah.
11:39
<@gnolam>
justwhy.gif
11:40
< Vornlicious>
Also popular is 7315, ones digit of year and then day of year
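A similar sketch for the "ones digit of year, then day of year" format. The function name and the `decade` default are assumptions, because a single year digit cannot identify the decade on its own.

```python
from datetime import date, timedelta

def parse_ordinal_code(code: str, decade: int = 2010) -> date:
    """Parse 'last digit of year + day of year', e.g. '7315'.

    The decade default is an assumption; the code has only one year digit.
    """
    year = decade + int(code[0])
    day_of_year = int(code[1:])
    # Day 1 is January 1st, so add (day_of_year - 1) days to it.
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)

print(parse_ordinal_code("7315"))  # 2017-11-11
```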
11:42
<@gnolam>
... what is wrong with your bottlers
11:42 Kindamoody|autojoin [Kindamoody@Nightstar-eubaqc.tbcn.telia.com] has joined #code
11:42 mode/#code [+o Kindamoody|autojoin] by ChanServ
11:42
< Vornlicious>
A startlingly large number of things.
12:16 Soare [cute@Nightstar-gvt3mb.ip-164-132-106.eu] has quit [Ping timeout: 121 seconds]
13:11 Jessikat [Jessikat@Nightstar-bt5k4h.81.in-addr.arpa] has joined #code
13:32 Degi [Degi@Nightstar-8jctgl.dyn.telefonica.de] has joined #code
14:14 VirusJTG [VirusJTG@Nightstar-42s.jso.104.208.IP] has quit [Connection reset by peer]
14:14 VirusJTG [VirusJTG@Nightstar-42s.jso.104.208.IP] has joined #code
14:14 mode/#code [+ao VirusJTG VirusJTG] by ChanServ
15:39 Jessikat` [Jessikat@Nightstar-r1fphs.dab.02.net] has joined #code
15:39 celmin|sleep is now known as celticminstrel
15:41 Jessikat [Jessikat@Nightstar-bt5k4h.81.in-addr.arpa] has quit [Ping timeout: 121 seconds]
17:05 Jessikat [Jessikat@Nightstar-bt5k4h.81.in-addr.arpa] has joined #code
17:06 Jessikat [Jessikat@Nightstar-bt5k4h.81.in-addr.arpa] has quit [[NS] Quit: Leaving]
17:06 Jessikat [Jessikat@Nightstar-bt5k4h.81.in-addr.arpa] has joined #code
17:12 macdjord [macdjord@Nightstar-a1fj2k.mc.videotron.ca] has joined #code
17:12 mode/#code [+o macdjord] by ChanServ
17:12 macdjord|slep [macdjord@Nightstar-a1fj2k.mc.videotron.ca] has quit [Ping timeout: 121 seconds]
17:44 Kindamoody|autojoin is now known as Kindamoody
18:11 mac [macdjord@Nightstar-a1fj2k.mc.videotron.ca] has joined #code
18:11 mode/#code [+o mac] by ChanServ
18:13 macdjord [macdjord@Nightstar-a1fj2k.mc.videotron.ca] has quit [Ping timeout: 121 seconds]
19:13 RchrdB [RchrdB@Nightstar-qe9.aug.187.81.IP] has joined #code
19:25 macdjord|slep [macdjord@Nightstar-a1fj2k.mc.videotron.ca] has joined #code
19:25 mode/#code [+o macdjord|slep] by ChanServ
19:28 mac [macdjord@Nightstar-a1fj2k.mc.videotron.ca] has quit [Ping timeout: 121 seconds]
19:42 Kindamoody is now known as Kindamoody|afk
19:51 KiMo|autorejoin [Kindamoody@Nightstar-eubaqc.tbcn.telia.com] has joined #code
19:54 Kindamoody|afk [Kindamoody@Nightstar-eubaqc.tbcn.telia.com] has quit [Ping timeout: 121 seconds]
20:01 Vornicus [Vorn@Nightstar-1l3nul.res.rr.com] has joined #code
20:02 mode/#code [+qo Vornicus Vornicus] by ChanServ
20:02 gnolam [lenin@Nightstar-ego6cb.cust.bahnhof.se] has quit [[NS] Quit: Computer maintenance]
20:11 Jessikat [Jessikat@Nightstar-bt5k4h.81.in-addr.arpa] has quit [Ping timeout: 121 seconds]
20:15 himi [sjjf@Nightstar-v37cpe.internode.on.net] has quit [Ping timeout: 121 seconds]
20:17 gnolam [lenin@Nightstar-ego6cb.cust.bahnhof.se] has joined #code
20:17 mode/#code [+o gnolam] by ChanServ
20:19 IRCFrEAK [GK-1WM-SU@Nightstar-820.c0d.45.5.IP] has joined #code
20:20 IRCFrEAK [GK-1WM-SU@Nightstar-820.c0d.45.5.IP] has quit [RecvQ exceeded]
20:23 IRCFrEAK [g_k_800k@Nightstar-ji9.phg.27.23.IP] has joined #code
20:24 IRCFrEAK [g_k_800k@Nightstar-ji9.phg.27.23.IP] has quit [RecvQ exceeded]
20:42 Degi [Degi@Nightstar-8jctgl.dyn.telefonica.de] has quit [[NS] Quit: Leaving]
20:42 Degi [Degi@Nightstar-8jctgl.dyn.telefonica.de] has joined #code
21:14
<&[R]>
https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f027338b0fab0f5078971fbe <-- TIL POSIX locale stuff is completely screwy
21:16
<&McMartin>
"Everything uses UTF-8 for "char" and what doesn't is broken and terrible anyway."
21:16
<&McMartin>
This is an important PSA: NEVER USE UTF-8 INTERNALLY, ONLY AT THE EDGES.
21:16
<&McMartin>
Use UCS-4 internally.
21:17
<&McMartin>
UTF-8 strings are not indexable.
21:18
<&[R]>
UCS-4 is what?
21:19
<&McMartin>
32-bit integers, one per Unicode code point.
21:20
<&McMartin>
UTF-8 is a variable-length encoding, and one of the rather important string operations is "character at offset X"
21:20
<&McMartin>
You do not want that to be O(x).
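The cost difference can be sketched in Python on raw byte buffers. The helper names are hypothetical, and the UTF-8 walker assumes well-formed input. With 4 bytes per code point, the offset is a multiplication; with UTF-8, finding the index-th code point means walking every lead byte before it.

```python
import struct

def utf8_len(lead: int) -> int:
    """Sequence length implied by a UTF-8 lead byte (assumes valid input)."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    return 4

def codepoint_at_utf32(buf: bytes, index: int) -> str:
    """O(1): every code point occupies exactly 4 bytes in UTF-32/UCS-4."""
    (value,) = struct.unpack_from("<I", buf, 4 * index)
    return chr(value)

def codepoint_at_utf8(buf: bytes, index: int) -> str:
    """O(index): walk lead bytes from the start to reach the code point."""
    pos = 0
    for _ in range(index):
        pos += utf8_len(buf[pos])
    return buf[pos:pos + utf8_len(buf[pos])].decode("utf-8")

text = "na\u00efve \U0001F40D!"  # code point 6 is U+1F40D
print(codepoint_at_utf32(text.encode("utf-32-le"), 6) == "\U0001F40D")  # True
print(codepoint_at_utf8(text.encode("utf-8"), 6) == "\U0001F40D")       # True
```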
21:21
<&McMartin>
You do not want your substring operations to contain half or a third of a code point in them.
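A small Python illustration of a substring containing half a code point. `is_valid_utf8` is a hypothetical helper written for this example.

```python
def is_valid_utf8(buf: bytes) -> bool:
    """Return True if buf decodes cleanly as UTF-8."""
    try:
        buf.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

data = "h\u00e9llo".encode("utf-8")  # U+00E9 occupies two bytes: C3 A9

print(is_valid_utf8(data[:2]))  # False: the slice ends halfway through U+00E9
print(is_valid_utf8(data[:3]))  # True: the slice ends on a code point boundary
```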
21:27
<&McMartin>
UTF-8 is fantastic for exactly those cases where you can treat "string" as "opaque, immutable binary blob"
21:34 KM|autorejoin [Kindamoody@Nightstar-k1m8bj.mobileonline.telia.com] has joined #code
21:35 KM|autorejoin is now known as Kindamoody
21:35 mode/#code [+o Kindamoody] by ChanServ
21:37 KiMo|autorejoin [Kindamoody@Nightstar-eubaqc.tbcn.telia.com] has quit [Ping timeout: 121 seconds]
21:40 Soare [mm@Nightstar-vg7om8.danwin1210.me] has joined #code
21:56 Jessikat [Jessikat@Nightstar-mob28h.dab.02.net] has joined #code
21:58 Jessikat` [Jessikat@Nightstar-r1fphs.dab.02.net] has quit [Ping timeout: 121 seconds]
21:58 macdjord [macdjord@Nightstar-a1fj2k.mc.videotron.ca] has joined #code
21:58 mode/#code [+o macdjord] by ChanServ
22:01 macdjord|slep [macdjord@Nightstar-a1fj2k.mc.videotron.ca] has quit [Ping timeout: 121 seconds]
22:27 Degi [Degi@Nightstar-8jctgl.dyn.telefonica.de] has quit [[NS] Quit: Leaving]
22:38 himi [sjjf@Nightstar-dm0.2ni.203.150.IP] has joined #code
22:38 mode/#code [+o himi] by ChanServ
22:57
< RchrdB>
FWIW, just because your data is in UCS-4 or UTF-32, that doesn't by itself mean the operation "index into it" has a meaning that you'd like or expect. Unicode codepoints don't correspond 1:1 with characters on screen. The existence of things like combining characters means that if you index into a sequence of UTF-32 codepoints at a random position, you may actually be indexing into the middle of a grapheme cluster. (A grapheme cluster is defined
22:57
< RchrdB>
as "thing that looks like a character on screen, and which the text editing cursor should usually treat as an atomic unit for the purposes of selection with the mouse and the left and right and backspace keys.")
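The point about combining characters can be shown with Python's `str`, which indexes by code point. The sample string is an assumption chosen for illustration.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT; displayed as five characters.
s = "e\u0301tude"

print(len(s))                            # 6
print(unicodedata.name(s[1]))            # COMBINING ACUTE ACCENT
print(unicodedata.combining(s[1]) != 0)  # True: s[1] is not a standalone character
```

Index 1 lands inside the first grapheme cluster even though every element of `s` is a whole code point.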
23:00
< RchrdB>
You kind of can index into a UTF-8 string at a random point; the encoding is designed so that if you index into a UTF-8 string at a random byte, you can, without any ambiguity, get from the byte you're looking at, which may well be in the middle of a codepoint, to the start/end of the next/previous codepoint boundary by looking at only a small constant number of bytes to the right or left of the one you're currently looking at (I think 6 bytes).
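A minimal Python sketch of that resynchronization, assuming well-formed UTF-8; `codepoint_start` is a hypothetical name. Continuation bytes always have the bit pattern 10xxxxxx, so stepping left until a byte without that pattern finds the start of the code point. In UTF-8 as currently standardized (RFC 3629) a code point takes at most 4 bytes, so the loop steps back at most 3 times on valid input.

```python
def codepoint_start(buf: bytes, pos: int) -> int:
    """Move left from pos to the first byte of the code point containing it."""
    # Continuation bytes have the bit pattern 10xxxxxx.
    while pos > 0 and (buf[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos

data = "a\u20acb".encode("utf-8")  # U+20AC is E2 82 AC, at byte offsets 1..3

print([codepoint_start(data, i) for i in range(len(data))])  # [0, 1, 1, 1, 4]
```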
23:03
< RchrdB>
There's a programming language called "Emily" made by Andi McClure which gets unicode really, really right. I think the way she did this is that text strings are UTF-8 bytes in memory and they have a bunch of different methods that return different kinds of iterators; one for iterating by byte, one for iterating by codepoint, one for iterating by grapheme cluster.
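A rough Python sketch of the "one buffer, several iterators" idea. This is not Emily's actual API; the class and method names are hypothetical, and the grapheme iterator is a deliberate simplification that only attaches combining marks to the preceding code point. Real segmentation follows Unicode UAX #29 and handles many more cases.

```python
import unicodedata
from typing import Iterator

class Utf8Text:
    """Hypothetical wrapper: UTF-8 bytes in memory, several iteration views."""

    def __init__(self, text: str) -> None:
        self._data = text.encode("utf-8")

    def iter_bytes(self) -> Iterator[int]:
        return iter(self._data)

    def iter_codepoints(self) -> Iterator[str]:
        return iter(self._data.decode("utf-8"))

    def iter_graphemes(self) -> Iterator[str]:
        # Simplification: a cluster is a base code point plus any combining
        # marks that follow it. This is not full UAX #29 segmentation.
        cluster = ""
        for ch in self.iter_codepoints():
            if cluster and unicodedata.combining(ch) == 0:
                yield cluster
                cluster = ""
            cluster += ch
        if cluster:
            yield cluster

t = Utf8Text("e\u0301a")
print(len(list(t.iter_bytes())))       # 4
print(len(list(t.iter_codepoints())))  # 3
print(len(list(t.iter_graphemes())))   # 2
```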
23:03
< RchrdB>
I have a vague notion that at least one other programming language did about the same thing but I can't remember.
23:03
<&McMartin>
Yeah, two clarifications on that
23:04
< RchrdB>
?
23:04
<&McMartin>
(a) I'm intentionally ignoring the issue of grapheme clusters/glyphs, because anything past "codepoints" is largely agreed to be something that machines should only have to deal with occasionally and at the human-interaction level
23:05
<&McMartin>
(b) Iterators are O(n) for access and that's part of what I'm considering bad
23:05
< RchrdB>
What meaningful processing can you implement with only indexing into codepoints?
23:06
<&McMartin>
substring, split
23:07
<&McMartin>
You *can* do it with bytes
23:07
<&McMartin>
But your life is harder unless you're doing one of the ones that UTF-8 was intentionally designed to make work just like it would for Latin-1
23:09
<&McMartin>
Also, if you're working in C
23:09
<&McMartin>
Which you are, because that's what the link was about
23:09
< RchrdB>
Substring on codepoints is kind of a weird buggy operation anyway, since it can rip grapheme clusters in half by accident.
23:10
<&McMartin>
In some cases that's even correct behavior!
23:10
<&McMartin>
But yes, the usual issue you solve with this is that when you allocate space for a string of at least N length you don't have to measure twice.
23:12 himi [sjjf@Nightstar-dm0.2ni.203.150.IP] has quit [Ping timeout: 121 seconds]
23:12
< RchrdB>
I have a vague memory of hearing that one of the new fashionable compiled PLs like Go or Rust or something had a regex library where you ask it to do operations on unicode codepoints and it builds automata that implement the thing you're asking for but do it on the UTF-8 bytes instead of via a separate expensive decoding step.
23:13
<&McMartin>
Rust does that.
23:13
<&McMartin>
Rust strings are also, however, immutable blobs.
23:14
<&McMartin>
Rust also uses WTF-8
23:14
<&McMartin>
Which covers a certain infelicity introduced by UTF-16.
23:17
< RchrdB>
are wtf-8 and cesu-8 the same thing?
23:18
<&McMartin>
No. WTF-8 can losslessly send strings that UTF-16 rejects there-and-back.
23:18
<&McMartin>
WTF-8 is specifically because the only things that do UTF-16 these days actually accept arbitrary sequences of 16-bit numbers.
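The "arbitrary sequences of 16-bit numbers" problem can be illustrated in Python, whose `str` can hold an unpaired surrogate. This is a sketch of the unpaired case only, not a WTF-8 implementation: Python's `surrogatepass` error handler happens to produce the same three bytes that WTF-8 uses for a lone surrogate, but it does not apply WTF-8's other rules (for example, how paired surrogates must be encoded).

```python
# An unpaired UTF-16 surrogate: legal as a 16-bit unit, but not a valid
# Unicode scalar value, so strict UTF-8 has no encoding for it.
lone = "\ud800"

try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print(strict_ok)  # False

# 'surrogatepass' emits the generalized 3-byte form and can read it back.
raw = lone.encode("utf-8", "surrogatepass")
print(raw.hex())                                      # eda080
print(raw.decode("utf-8", "surrogatepass") == lone)   # True
```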
23:19
<&McMartin>
(Because they're all systems that embraced Unicode Too Soon while being standardized.)
23:19
< RchrdB>
I knew what WTF-8 is but I had the wrong definition in my head for what CESU-8 was.
23:19
< RchrdB>
Yeah, unlucky that.
23:21
<&McMartin>
Actually looking at how it works, CESU-8 is Just Really Awful Across The Board.
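For comparison, a sketch of what CESU-8 does to a code point outside the Basic Multilingual Plane: it encodes each UTF-16 code unit as if it were a code point, so a surrogate pair becomes two 3-byte sequences instead of one 4-byte sequence. `cesu8_encode` is a hypothetical helper written for this example, not a library function.

```python
import struct

def cesu8_encode(text: str) -> bytes:
    """Sketch of CESU-8: encode each UTF-16 code unit as its own sequence."""
    utf16 = text.encode("utf-16-be")
    units = struct.unpack(f">{len(utf16) // 2}H", utf16)
    # 'surrogatepass' lets the surrogate halves through as 3-byte sequences.
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)

snake = "\U0001F40D"
print(snake.encode("utf-8").hex())  # f09f908d (4 bytes)
print(cesu8_encode(snake).hex())    # eda0bdedb08d (6 bytes)
```

Text limited to the Basic Multilingual Plane comes out byte-identical to UTF-8 under this sketch; only supplementary code points differ.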
23:26 himi [sjjf@Nightstar-dm0.2ni.203.150.IP] has joined #code
23:26 mode/#code [+o himi] by ChanServ
23:26
< RchrdB>
I had to go look up the definition to check and yes it's pretty dumb.
23:26 Derakon[AFK] is now known as Derakon
23:26
< RchrdB>
WTF-8 is the one which has a sensible reason for actually existing. >_>
--- Log closed Mon Nov 13 00:00:29 2017