Started adding source code “instructions” to the coins::Instruction set, and also started implementing translation between my representation structs and the coins ByteCo. Now I want to wrap some of this work into ByteCo functions, to make translating between things easier.
The further I get into defining the internal representation with ByteCo, the more I wonder whether I need any of the other parts. At this point I’ve been heavily refactoring for a while, and a lot of this feels redundant. I really don’t want to spend too much time on this, because eventually I’d like to write the compiler in co itself, and too much time here throws away more time later. But the reason I came over here to write is that SourceToken now feels kind of bare compared to the information available in the ByteCo, like imports and the beginning/end of macros and routines. I think this will actually come in handy when we use it for persisting source code, though for now we can also use it for constructing macros and routines from the database.
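To make the “bare SourceToken” complaint concrete, here’s a hedged sketch of what a richer token type might look like, carrying the structure the ByteCo already has (imports, macro/routine open and close). Every variant name here is invented for illustration and is not the project’s actual type:

```rust
// Hypothetical sketch only -- these are not the real co/coins types.
#[derive(Debug, PartialEq)]
enum SourceToken {
    Word(String),         // an ordinary source word
    Import(String),       // an import, as the byteco already records
    MacroBegin(String),   // open a macro definition by name
    MacroEnd,
    RoutineBegin(String), // open a routine definition by name
    RoutineEnd,
}

/// With begin/end markers in the token stream, reconstructing the set of
/// routines from a database of tokens becomes a simple scan.
fn routine_names(tokens: &[SourceToken]) -> Vec<&str> {
    tokens
        .iter()
        .filter_map(|t| match t {
            SourceToken::RoutineBegin(name) => Some(name.as_str()),
            _ => None,
        })
        .collect()
}
```

The payoff is the last function: once the open/close semantics live in the tokens themselves, “construct macros and routines from the database” is just filtering a stream.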
I have all these haphazard blocks of internal representation, and that seems to be what’s really bogging me down now. Really, there are two types of representation: a source module and a library symbol. A source module is something that can be edited easily and has the full “programmer context”, like human-readable names and open/close semantics, while library symbols have specific formats. The source module should use instructions that capture the entire set of co language semantics. A routine symbol is a block of machine instructions plus the `hash routine call` and `hash routine address` instructions. A macro symbol should start with a parameter count followed by any valid source tokens, but should not use any source tokens that involve names. Routine symbols should be completely “deterministic”, with no “fluff” to throw off the hashes.

What about macros, though? It would be nice to have more info available for macros, but you’d lose the “determinism”… this might be okay, though, because the duplication burden ends up on developers, not end users? It might honestly be more correct too, since a macro is supposed to capture source symbols, not just machine instructions. So yeah, I think macros should use all source tokens, and should also start with the actual parameter names, not just a count.
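A minimal sketch of the two symbol formats described above, under the assumption that “deterministic” means a routine body contains no name-bearing tokens. All type and variant names are made up for illustration:

```rust
// Hypothetical sketch of the two library-symbol formats; not real co types.

/// A routine symbol: a deterministic block of machine instructions, plus
/// the two hash-based instructions that can reference other routines.
#[derive(Debug, PartialEq)]
enum RoutineOp {
    Machine(u8),                  // an opaque machine instruction byte
    HashRoutineCall([u8; 32]),    // `hash routine call`
    HashRoutineAddress([u8; 32]), // `hash routine address`
}

/// A source token as a macro body would capture it; `Name` is the kind of
/// token a deterministic routine symbol must never contain.
#[derive(Debug, PartialEq)]
enum SrcTok {
    Name(String),
    Lit(u64),
    Op(u8),
}

/// A macro symbol: the actual parameter names up front (not just a count),
/// then any source tokens at all.
#[derive(Debug, PartialEq)]
struct MacroSymbol {
    params: Vec<String>,
    body: Vec<SrcTok>,
}

/// Check the routine-side invariant: no "fluff" that would throw off the hash.
fn routine_is_deterministic(body: &[SrcTok]) -> bool {
    body.iter().all(|t| !matches!(t, SrcTok::Name(_)))
}
```

The asymmetry in the text falls out naturally here: `MacroSymbol` keeps names because it captures source symbols, while `routine_is_deterministic` is the gate that keeps hashes stable.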
What if, instead of all the representation I’ve already built out, I go from TextTokens directly into ByteCo? The source module would just be ByteCo directly, and we could run a validator on the document that makes sure the different contexts of source code are all semantically correct?
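The validator idea can be sketched as a single pass over a flat ByteCo-like token stream, checking that macro and routine contexts open and close in a properly nested way. Token names are invented here, and a real validator would check much more than nesting:

```rust
// Hedged sketch: validate open/close contexts in a flat token stream.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    MacroBegin,
    MacroEnd,
    RoutineBegin,
    RoutineEnd,
    Instr, // any other instruction
}

/// Returns true if every begin has a matching end and contexts nest properly.
fn validate(tokens: &[Tok]) -> bool {
    let mut stack = Vec::new();
    for t in tokens {
        match t {
            Tok::MacroBegin | Tok::RoutineBegin => stack.push(*t),
            Tok::MacroEnd => {
                if stack.pop() != Some(Tok::MacroBegin) {
                    return false; // close without a matching macro open
                }
            }
            Tok::RoutineEnd => {
                if stack.pop() != Some(Tok::RoutineBegin) {
                    return false; // close without a matching routine open
                }
            }
            Tok::Instr => {}
        }
    }
    stack.is_empty() // anything left open is an error
}
```

The nice property of going TextTokens → ByteCo directly is exactly this: the document itself becomes the thing you validate, instead of an intermediate representation.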
Another thing I’m thinking about: I’d like the “source code” to be available for routines. That means we’d need a way to link from a compiled routine’s hash back to available sources for that machine code. At the same time, any routine would be inspectable directly as assembly, but you’d lose all the nice parts of the SourceCo subset of ByteCo. It sucks how much my vision of this system depends on an integrated operating system/development environment, because it feels like it would be so cool, but is so far away.
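The hash-to-source link is essentially a reverse index keyed by the routine’s content hash. A sketch, assuming a stand-in `u64` hash (the real system would use its own content hash over the actual ByteCo bytes) and allowing multiple sources per routine, since distinct sources can compile to identical machine code:

```rust
// Hypothetical sketch: index from compiled-routine hash back to sources.
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

type RoutineHash = u64; // stand-in for the real content hash

fn hash_routine(machine_code: &[u8]) -> RoutineHash {
    // DefaultHasher::new() uses fixed keys, so this is stable across runs.
    let mut h = DefaultHasher::new();
    machine_code.hash(&mut h);
    h.finish()
}

/// Maps a deterministic routine hash to every known source module that
/// compiles down to that machine code.
#[derive(Default)]
struct SourceIndex {
    sources: HashMap<RoutineHash, Vec<String>>,
}

impl SourceIndex {
    fn record(&mut self, machine_code: &[u8], source: String) {
        self.sources
            .entry(hash_routine(machine_code))
            .or_default()
            .push(source);
    }

    fn lookup(&self, machine_code: &[u8]) -> &[String] {
        self.sources
            .get(&hash_routine(machine_code))
            .map(|v| v.as_slice())
            .unwrap_or(&[])
    }
}
```

When no source is recorded, `lookup` comes back empty, which matches the fallback described above: the routine is still inspectable as raw assembly, just without the SourceCo niceties.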