I finally got enough of the project administration stuff out of the way to come back to this. I just read through all of my notes from this year, and it seems I left off trying to figure out the specifics of source code validation. I think it would be helpful to start fresh with my exploration of this problem, especially since, in the gap since my last entry, I designed and implemented another little language that may have shifted my approach.
An author will start with a text document, writing routines and macros that are each given a name. As part of the file, there can be some raw assembly that references those symbols by name. When the author runs the file through the assembler, it will create a local symbol dictionary and resolve those name usages from the locally defined symbols, rendering macros and appending routines as necessary to create an executable ROM.
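
To make that concrete for myself, here is a rough Python sketch of that assembly pass over a toy in-memory representation. Everything in it (Macro, Routine, the placeholder call bytes) is made up for illustration, not the actual assembler.

```python
from dataclasses import dataclass

@dataclass
class Macro:
    name: str
    body: list          # tokens to splice in at each use site

@dataclass
class Routine:
    name: str
    code: bytes         # code to append once and reference by offset

def assemble(definitions, program_tokens):
    # 1. Local symbol dictionary: every named definition in this file.
    symbols = {d.name: d for d in definitions}

    rom = bytearray()
    needed = []                              # routines referenced by the program

    def emit(tokens):
        for tok in tokens:
            sym = symbols.get(tok)
            if isinstance(sym, Macro):
                emit(sym.body)               # macros get rendered in place
            elif isinstance(sym, Routine):
                if sym not in needed:
                    needed.append(sym)
                rom.extend(b"\xff\xff")      # placeholder for a call, patched later
            else:
                rom.extend(tok.encode())     # stand-in for encoding raw assembly

    emit(program_tokens)

    # 2. Append each referenced routine after the main program.
    offsets = {}
    for routine in needed:
        offsets[routine.name] = len(rom)
        rom.extend(routine.code)
    # A real assembler would now patch the call placeholders using `offsets`.
    return bytes(rom)
```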
The author can then “import” that file, which will compile the symbols into ByteCo and then insert them into the local library under the names they are defined with. Those symbols can then be used by other source modules by referencing their name in the local library. When a file imports a symbol from the library, it looks up that path in the library and then adds the hash to the “local symbol dictionary” during any assembly process. This is how a library import gets resolved to a context name.
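
A quick sketch of that round trip, assuming the library is just a name-keyed map and picking SHA-256 purely as a stand-in for whatever hash ends up being used:

```python
import hashlib

library = {}          # name/path in the local library -> (hash, byteco)

def import_into_library(name, byteco):
    """'Importing' a file's symbol: store its compiled ByteCo under its name."""
    digest = hashlib.sha256(byteco).hexdigest()
    library[name] = (digest, byteco)
    return digest

def resolve_import(path, local_symbol_dictionary):
    """Resolving a library import during assembly: look up the path in the
    library and record the hash in the local symbol dictionary."""
    digest, _byteco = library[path]
    local_symbol_dictionary[path] = digest
    return digest
```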
Aside from everything the author is doing, there is also the reader. This is someone who opens up the source code to compile or make changes outside of the original conditions of authoring, meaning they do *not* have the same library as the author and so their names will not resolve the same way. (The reader may even be the exact same person as the author… just later on, or on a different computer.) For the reader, there will need to be two components to a symbol identifier: the original unicode “name” (to show intention) and the concrete “hash id”. The name is always passive information, available for relinking to updated routines if you want, but generally just there to show the original intent. The true identifier is always the hash. This means that each source code file must maintain a mapping from its imports to hashes, and that updating this mapping is an intentional change. There are many tooling-level processes that can automate or augment this, but at the lowest level this will always need to be there to translate between human identifiers and hashes.
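
In data terms I'm imagining something like the following, where the field names and the example entries are purely illustrative:

```python
from dataclasses import dataclass

@dataclass
class SymbolRef:
    name: str     # the original unicode name: passive, shows intent
    hash_id: str  # the concrete identifier actually used for linking

# Each source file carries its own import -> hash mapping. Updating an
# entry here is an intentional, recorded change, not an automatic one.
imports_for_file = {
    "draw-sprite": SymbolRef(name="draw-sprite", hash_id="3fa9c1..."),
    "memcpy":      SymbolRef(name="memcpy",      hash_id="b07e44..."),
}
```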
---
Writing this all out like this makes me realize that, for the short term, I could actually not bother with a binary source code representation, and instead create a secondary index file for each source file with all of the name->hash mappings. This could even be in plain text, which would make the whole system work with git as well. Once CDB is up and running, this won't be necessary, but it might be nice in the meantime to rely on existing code sharing.
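
One possible shape for that sidecar index, with the file naming and one-pair-per-line format being nothing more than a guess to react to later:

```python
# e.g. "player.src" might sit next to "player.src.index" containing lines like:
#   draw-sprite 3fa9c1...
#   memcpy b07e44...

def write_index(path, mapping):
    # Sorted, one "name hash" pair per line, so git diffs stay readable.
    with open(path, "w", encoding="utf-8") as f:
        for name, digest in sorted(mapping.items()):
            f.write(f"{name} {digest}\n")

def read_index(path):
    mapping = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            name, digest = line.split()
            mapping[name] = digest
    return mapping
```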
Another thing I noticed is that it would be nice to add features to the CLI specifically for managing those mappings: scan for discrepancies, update specific connections, force-update all of them, etc.
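
Roughly what those subcommands could look like, sketched with argparse; the program, command, and argument names here are all placeholders:

```python
import argparse

def build_cli():
    parser = argparse.ArgumentParser(prog="byteco")
    sub = parser.add_subparsers(dest="command", required=True)

    scan = sub.add_parser("scan", help="report name->hash mappings that no longer match the library")
    scan.add_argument("source")

    update = sub.add_parser("update", help="relink one named import to the library's current hash")
    update.add_argument("source")
    update.add_argument("name")

    update_all = sub.add_parser("update-all", help="force every mapping in the file to the library's current hashes")
    update_all.add_argument("source")

    return parser

# e.g.: byteco scan player.src
#       byteco update player.src draw-sprite
```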
---
To get moving on the implementation again, I need to figure out where to pick back up. Right now, I’m able to parse a text source into ByteCo. The first thing to do is probably to assemble a binary from that.
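
The shape of that step is probably something like the skeleton below, assuming ByteCo arrives as a flat sequence of ops that know how to encode themselves; the real representation will almost certainly differ:

```python
def assemble_binary(byteco_ops):
    # Each op contributes its byte encoding to the ROM image.
    rom = bytearray()
    for op in byteco_ops:
        rom.extend(op.encode())
    return bytes(rom)

def write_rom(path, rom):
    with open(path, "wb") as f:
        f.write(rom)
```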