code logs -> 2019 -> Sun, 20 Jan 2019< code.20190119.log - code.20190121.log >
--- Log opened Sun Jan 20 00:00:52 2019
00:19
<&McMartin>
All right. Time to see if I broke everything
00:21 Derakon[AFK] is now known as Derakon
02:26
<&McMartin>
Heh, cute
02:27
<&McMartin>
ARMv4 has an Unsigned Multiply Long instruction UMULL destl, desth, op1, op2 - multiplies 2 32-bit numbers to produce one 64-bit result across two registers
02:27
<&McMartin>
op1 isn't allowed to be one of the dest operands (later chips allow it, and apparently no actual chips ever enforced the restriction)
02:27
<&McMartin>
But op2 is, and the assembler I'm using will swap the order of the arguments if it can make that fit
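A minimal C sketch of the arithmetic UMULL performs, as described above. The `u32_pair` type and the `umull` function name are hypothetical, chosen only for illustration. Because multiplication is commutative, swapping op1 and op2 (as the assembler does) cannot change the result.

```c
#include <stdint.h>

/* Hypothetical pair type standing in for the two destination registers. */
typedef struct {
    uint32_t lo; /* destl: bits 0..31 of the product  */
    uint32_t hi; /* desth: bits 32..63 of the product */
} u32_pair;

/* Models UMULL destl, desth, op1, op2: a full 32x32 -> 64 unsigned multiply
   whose result is split across two 32-bit halves. */
static u32_pair umull(uint32_t op1, uint32_t op2)
{
    uint64_t product = (uint64_t)op1 * (uint64_t)op2;
    u32_pair result;
    result.lo = (uint32_t)product;
    result.hi = (uint32_t)(product >> 32);
    return result;
}
```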
02:33 * McMartin fiddles with his ARM code, realizes he can shave two bytes off his x86 code.
02:43 Emmy [Emmy@Nightstar-9p7hb1.direct-adsl.nl] has joined #code
02:46 Kindamoody is now known as Kindamoody[zZz]
02:49 Emmy [Emmy@Nightstar-9p7hb1.direct-adsl.nl] has quit [Ping timeout: 121 seconds]
03:01 * McMartin then realizes he can shave three more bytes, and two memory operations, off the x86 code.
03:01
<&McMartin>
Now the ARM and x86 code are the same size. >_>
03:18 Degi [Degi@Nightstar-72nf2v.dyn.telefonica.de] has quit [Connection reset by peer]
03:24
<&Reiver>
wut
03:25
<&McMartin>
ARM and x86 are different kinds of chips, built and designed in very different ways
03:25
<&McMartin>
So instructing them to do roughly the same thing will look, at the chip level, very different
03:25
<&McMartin>
However, for this rather messy function I am computing, the code for computing it is exactly the same size even though the two functions are organized completely differently.
03:26
<&McMartin>
They also both use the same number of registers, which is usually not something that happens, because ARM has lots of registers and uses that to make up for the fact that it has trouble with large hardcoded constants
03:26
<&McMartin>
But these all sort of balanced out here
03:26
<&McMartin>
The part I was improving was "I have two 64-bit numbers, and I want to multiply them together into a 128-bit number, but I only actually care about bits 32 through 63."
03:27
<&McMartin>
And I had a considerable amount of wasted work on the x86 code, including juggling some values around that I could instead simply refrain from trashing
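A rough C sketch of why that multiplication needs less work than a full 64x64 -> 128 product. This is not the code discussed in the channel; the function name and the split into 32-bit halves are assumptions for illustration. Bits 32 through 63 depend only on the high word of a_lo * b_lo plus the low words of the two cross products, so a_hi * b_hi never has to be computed.

```c
#include <stdint.h>

/* Returns bits 32..63 of the product of two 64-bit numbers, each passed as
   two 32-bit halves. Hypothetical helper; assumes only 32x32 -> 64 multiplies
   are available, as on a 32-bit CPU. */
static uint32_t mul64_bits_32_63(uint32_t a_lo, uint32_t a_hi,
                                 uint32_t b_lo, uint32_t b_hi)
{
    /* High word of the low partial product lands in bits 32..63. */
    uint32_t low_high_word = (uint32_t)(((uint64_t)a_lo * b_lo) >> 32);

    /* Only the low 32 bits of each cross product reach bits 32..63;
       everything above that is discarded. */
    uint32_t cross1 = (uint32_t)((uint64_t)a_lo * b_hi);
    uint32_t cross2 = (uint32_t)((uint64_t)a_hi * b_lo);

    /* Unsigned addition wraps modulo 2^32, which is the truncation we want. */
    return low_high_word + cross1 + cross2;
}
```

The a_hi * b_hi term only affects bits 64 and up, which is one way wasted work can creep into a naive implementation.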
03:28
<&[R]>
"ARM has lots of registers and uses that to make up for the fact that it has trouble with large hardcoded constants" <-- I'm kind of curious what x86 does to make that less of a problem. Wouldn't the resulting code still have to load the constant into a register to do something?
03:29
<&McMartin>
Nope! It feeds the constant from the instruction directly into the ALU.
03:29
<&McMartin>
It is completely legal to say IMUL EBX, 0x12345678
03:29
<&McMartin>
Which multiplies EBX by that value.
03:29
<&McMartin>
The instruction is, admittedly, ten bytes long
03:30
<&McMartin>
Er
03:30
<&McMartin>
six bytes long.
03:30
<&McMartin>
ARM, meanwhile, you have to basically say...
03:30
<&McMartin>
... well, OK
03:31
<&McMartin>
What you say is LDR r0, =&12345678; MUL r1, r0, r1
03:32
<&McMartin>
But LDR r0, ={whatever} puts {whatever} in a read-only data table somewhere else, computes the distance of that table entry from the instruction in question, and emits LDR r0, [r15+nnnn] with the nnnn as computed.
03:32
<&McMartin>
(r15 is the program counter)
03:32
<&McMartin>
(You can do computed GOTO by doing math with it, or virtual dispatch or function returns by assigning variables/table entries to it)
03:36
<&McMartin>
32-bit ARM (but not 16- or 64-bit) also has this incredibly wacky thing where you can arbitrarily bitshift one of the arguments on the way in to almost any instruction
03:36
<&McMartin>
So while the x86 code for what I was doing had a whole bunch of fancy multiprecision bitshift instructions in it, the ARM code did not...
03:37
<&McMartin>
... but it lost no space or time on this because I could fold the many more operations needed to execute multiprecision bitshifts "by hand" into the rest of the computation as it went. <3
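For illustration, a multiprecision shift done "by hand" looks roughly like the following in C. The `u64_pair` type and `shr64` name are hypothetical, and the sketch assumes a shift count strictly between 0 and 32. Each of the three steps is a shift feeding a move or an OR, which is the shape that 32-bit ARM's shifted operands can absorb into neighbouring instructions.

```c
#include <stdint.h>

/* Hypothetical 64-bit value held in two 32-bit "registers". */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* Logical right shift of a 64-bit value by n bits, for 0 < n < 32.
   (n == 0 or n >= 32 would need separate handling and is not covered.) */
static u64_pair shr64(u64_pair v, unsigned n)
{
    u64_pair r;
    /* Low word: its own bits shifted down, plus the bits that fall out of
       the high word, shifted up into the vacated top positions. */
    r.lo = (v.lo >> n) | (v.hi << (32u - n));
    /* High word: a plain shift. */
    r.hi = v.hi >> n;
    return r;
}
```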
03:38
<&McMartin>
(In each case that code ended up just about as tight as it could be made, with a little bit of slack on the x86 side that it needed in order to do bits of the later computation, so, no overall penalty paid.)
03:39
<&McMartin>
And even with that, in the end each architecture used exactly 4 32-bit registers to do all its work.
03:42
<&McMartin>
OTOH, a platform I'd *like* to have this routine for would be the Genesis, but its CPU, despite having 32-bit registers, not only lacks a "multiply two 32-bit numbers, get a 64-bit number" instruction, like x86 has had since the 386 and ARM has had since the original Game Boy Advance...
03:42
<&McMartin>
... it doesn't even have a "multiply two 32-bit numbers, get a 32-bit truncated number" instruction ;_;
03:42
<&McMartin>
(The 68000's multiply instruction is 16x16->32 and it is the only one it has)
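A sketch, in C, of how a truncated 32x32 -> 32 multiply can be assembled from 16x16 -> 32 multiplies alone. The helper names are hypothetical and this is not code from the discussion; `mulu16` merely stands in for the 68000's unsigned 16-bit multiply.

```c
#include <stdint.h>

/* Stand-in for a 16x16 -> 32 unsigned multiply instruction. */
static uint32_t mulu16(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* 32x32 -> 32 truncated multiply using only 16x16 -> 32 multiplies.
   The a_hi * b_hi term is skipped because it only affects bits 32 and up. */
static uint32_t mul32_truncated(uint32_t a, uint32_t b)
{
    uint16_t a_lo = (uint16_t)(a & 0xFFFFu);
    uint16_t a_hi = (uint16_t)(a >> 16);
    uint16_t b_lo = (uint16_t)(b & 0xFFFFu);
    uint16_t b_hi = (uint16_t)(b >> 16);

    uint32_t low   = mulu16(a_lo, b_lo);
    uint32_t cross = mulu16(a_lo, b_hi) + mulu16(a_hi, b_lo);

    /* Only the low 16 bits of the cross terms survive the shift. */
    return low + (cross << 16);
}
```

That is three multiplies plus shifts and adds for what is a single instruction on the other two chips.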
03:44
<&McMartin>
The routine in question is a PRNG, and while it's not cryptographically strong, it's better than every random number generator in libc. On x86 and ARMv4 it's also only 100 bytes long (92 bytes ROM, 8 bytes RAM).
03:46
<&McMartin>
I suppose the 68k way to do it would be to just hit RAM harder.
06:46 Vorntastic [uid293981@Nightstar-6br85t.irccloud.com] has joined #code
06:46 mode/#code [+qo Vorntastic Vorntastic] by ChanServ
10:42 Emmy [Emmy@Nightstar-9p7hb1.direct-adsl.nl] has joined #code
11:56 Kindamoody[zZz] is now known as Kindamoody
12:18 Degi [Degi@Nightstar-qb8cbe.dyn.telefonica.de] has joined #code
12:33 Kindamoody is now known as Kindamoody|afk
12:35 Emmy [Emmy@Nightstar-9p7hb1.direct-adsl.nl] has quit [Ping timeout: 121 seconds]
15:23
<&[R]>
https://twitter.com/da_667/status/1086874402959097856
16:50 Kindamoody|afk is now known as Kindamoody
17:59 Degi [Degi@Nightstar-qb8cbe.dyn.telefonica.de] has quit [Ping timeout: 121 seconds]
18:14 Emmy [Emmy@Nightstar-9p7hb1.direct-adsl.nl] has joined #code
18:25 * McMartin has a glorious and terrible vision
18:41
<&McMartin>
also wat
18:41
<&McMartin>
"the Raspberry Pi, a powerful "micro-computer" that is used for digital-making and coding"
18:42 * McMartin gabefaces
18:42
<&McMartin>
I suppose this is technically correct
18:42
<&McMartin>
(the best kind of correct!)
18:42
<&[R]>
-*- McMartin has a glorious and terrible vision <-- a vision about Poettering dying, but everyone continues to use his shitware anyways?
18:43
<&McMartin>
No, this involved hard-coding working binary search trees as C89 literals
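One way such a thing could look, purely as an assumption about what is meant: a static array initializer whose entries point at each other, so the tree exists fully formed at compile time. The `struct node` layout, the keys, and `bst_find` are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical node layout for a read-only binary search tree. */
struct node {
    int key;
    const struct node *left;
    const struct node *right;
};

/* The whole tree as one initializer; addresses of static array elements
   are constant expressions, so no runtime construction is needed. */
static const struct node tree[7] = {
    /* 0 */ { 40, &tree[1], &tree[2] },
    /* 1 */ { 20, &tree[3], &tree[4] },
    /* 2 */ { 60, &tree[5], &tree[6] },
    /* 3 */ { 10, NULL, NULL },
    /* 4 */ { 30, NULL, NULL },
    /* 5 */ { 50, NULL, NULL },
    /* 6 */ { 70, NULL, NULL }
};

/* Ordinary iterative lookup; returns NULL when the key is absent. */
static const struct node *bst_find(const struct node *n, int key)
{
    while (n != NULL && n->key != key) {
        n = (key < n->key) ? n->left : n->right;
    }
    return n;
}
```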
18:46 Vorntastic [uid293981@Nightstar-6br85t.irccloud.com] has quit [[NS] Quit: Connection closed for inactivity]
19:51 himi [sjjf@Nightstar-v37cpe.internode.on.net] has quit [Ping timeout: 121 seconds]
21:33 Reiv [NSkiwiirc@Nightstar-ih0uis.global-gateway.net.nz] has joined #code
21:33 mode/#code [+o Reiv] by ChanServ
22:01 Alek [Alek@Nightstar-o723m2.cicril.sbcglobal.net] has quit [Ping timeout: 121 seconds]
22:02 himi [sjjf@Nightstar-1drtbs.anu.edu.au] has joined #code
22:02 mode/#code [+o himi] by ChanServ
22:10 Alek [Alek@Nightstar-o723m2.cicril.sbcglobal.net] has joined #code
22:10 mode/#code [+o Alek] by ChanServ
22:40 Alek [Alek@Nightstar-o723m2.cicril.sbcglobal.net] has quit [Ping timeout: 121 seconds]
22:43 Alek [Alek@Nightstar-o723m2.cicril.sbcglobal.net] has joined #code
22:43 mode/#code [+o Alek] by ChanServ
22:49 Emmy [Emmy@Nightstar-9p7hb1.direct-adsl.nl] has quit [Ping timeout: 121 seconds]
23:27 Alek [Alek@Nightstar-o723m2.cicril.sbcglobal.net] has quit [Ping timeout: 121 seconds]
23:31 Alek [Alek@Nightstar-o723m2.cicril.sbcglobal.net] has joined #code
23:31 mode/#code [+o Alek] by ChanServ
--- Log closed Mon Jan 21 00:00:53 2019