Finished my conversion to big-endian values, though now that it's done I can see the benefits of having kept little-endian: from a stack perspective, it makes more sense for byte significance to increase as the stack goes up. Oh well, I think I'll live with this decision for now, because it still makes sense for the other reasons I made the change in the first place. Just goes to show that there's really no single best way to do it.
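To make the trade-off concrete, here is a rough sketch of how the two byte orders lay out the same 16-bit value in memory. The helper `memory_layout` and the sample value `0x1234` are just for illustration, and it assumes "up" means increasing addresses, which may not match how the actual stack grows.

```python
def memory_layout(value: int, width: int, byteorder: str) -> list[str]:
    """Return one line per byte, lowest address offset first."""
    # int.to_bytes orders the bytes exactly as they would sit in memory
    # for the given byte order ("big" or "little").
    raw = value.to_bytes(width, byteorder)
    return [f"+{offset}: 0x{byte:02X}" for offset, byte in enumerate(raw)]


if __name__ == "__main__":
    for order in ("big", "little"):
        print(order)
        for line in memory_layout(0x1234, 2, order):
            print("  " + line)
```

With little-endian, the least significant byte sits at the lowest offset, so significance rises with the address; with big-endian, the bytes read left to right the same way the value is written.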
At this point, writing bulk data out to a device, in a way that looks like it does in the source code, works, so it's probably time to move on to the next step: getting the CPU to receive data sent from a device.
---
Spoke too soon on the last one. I should actually first check whether I can write multiple buffers out to the device with the send flag, *without* using the done flag at the same time, since there are many cases where device IO will be more than 64 bytes at a time.
This might also call for more visualization in the debug inspector, to at least see the state of the device registers, if not an entire device.
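As a thinking aid for the multi-buffer case above, here is a minimal sketch of how a payload larger than one buffer might be split up. Everything named here is hypothetical: `FLAG_SEND`, `FLAG_DONE`, their bit values, and `plan_transfers` are not the real device interface, and it assumes the done flag marks only the final buffer of a transfer.

```python
BUFFER_SIZE = 64   # bytes per device buffer, per the 64-byte limit above

# Hypothetical flag bits; the real register layout may differ.
FLAG_SEND = 0x01   # "this buffer is ready to go out"
FLAG_DONE = 0x02   # "no more buffers follow" (assumed meaning)


def plan_transfers(payload: bytes, buffer_size: int = BUFFER_SIZE):
    """Split payload into (chunk, flags) pairs, one per device buffer.

    Every chunk carries the send flag; only the last one also carries
    the done flag. An empty payload produces no transfers.
    """
    transfers = []
    for offset in range(0, len(payload), buffer_size):
        chunk = payload[offset:offset + buffer_size]
        is_last = offset + buffer_size >= len(payload)
        flags = FLAG_SEND | (FLAG_DONE if is_last else 0)
        transfers.append((chunk, flags))
    return transfers


if __name__ == "__main__":
    for chunk, flags in plan_transfers(bytes(150)):
        print(f"{len(chunk):3d} bytes, flags=0b{flags:02b}")
```

The thing to verify on the device side is the middle case: buffers that arrive with send set but done clear should be accepted without ending the transfer.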