January 16, 2023

My shower thoughts last night revolved around how “formal” the source code should be. In particular: is it valid to write source code that doesn’t compile? This is specifically about the potential mismatch between “text” that references some named symbol and the potential lack of that symbol existing in the development environment. I kinda believe that you shouldn’t be able to generate valid byteco without being able to resolve symbols; at that point you just have text that implies the potential for valid byteco, but not an actual program. How does this work for macros that interpolate a name? I guess any given source module will be able to statically check whether all invocations of a macro create a compilable name. That means when we encounter a static use, we can use an index into a table of name/hashes, but for a dynamic one we’ll have to resolve the name and then search for it.

So every source code file has a block of name->hash definitions. When we validate byteco, every symbol reference is handled in one of two ways. If it is static, we resolve the name and insert an index into the block of name/hashes. If it is interpolated dynamically, we “run” through the code to see what all the concrete names end up being and make sure all of those names are defined.
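A rough sketch of those two cases, in Python purely for illustration. Everything named here is hypothetical: `NAME_BLOCK`, `resolve_static`, and `missing_dynamic_names` are made up for the example, SHA-256 is assumed only because it produces 32 bytes, and a `str.format` template stands in for a macro that interpolates a name.

```python
import hashlib


def symbol_hash(body: bytes) -> bytes:
    """Stand-in content hash (assumption: SHA-256, chosen for its 32-byte size)."""
    return hashlib.sha256(body).digest()


# The module's name -> hash block. Insertion order doubles as the index order.
NAME_BLOCK = {
    "%push": symbol_hash(b"push body"),
    "%pop": symbol_hash(b"pop body"),
}


def resolve_static(name: str) -> int:
    """A static reference becomes an index into the name/hash block."""
    if name not in NAME_BLOCK:
        raise KeyError(f"unresolved name: {name}")
    return list(NAME_BLOCK).index(name)


def missing_dynamic_names(template: str, arguments: list) -> list:
    """'Run' an interpolating macro over the argument of every invocation and
    report the concrete names that are not defined in the block."""
    concrete = [template.format(arg) for arg in arguments]
    return [name for name in concrete if name not in NAME_BLOCK]
```

With this, `resolve_static("%pop")` gives the index `1`, and `missing_dynamic_names("%{}", ["push", "peek"])` reports `["%peek"]` as the one name the module would still need to define.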

Effectively, a valid module must have all of its names defined locally, which acts as an overlay of the host’s environment. The development environment should have tools to let the programmer know when the name->hash of the module and the local library are out of sync.

In addition to the imported symbols that map a module-local name to a hash, there can also be symbols defined with a name inside the module itself. In order to resolve these symbols, we might have to do an initial run through the code.
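That “initial run” could be as small as a two-pass check. This is only a sketch under my own assumptions: a module is modeled as a list of `(name, referenced names)` pairs, and `unresolved_names` is a hypothetical helper, not anything that exists yet.

```python
def unresolved_names(definitions, imported):
    """definitions: list of (name, [referenced names]) in file order.
    imported: names that the import block maps to hashes."""
    # Pass 1: collect every name defined inside the module itself,
    # so a definition may be referenced before it appears in the file.
    local = {name for name, _ in definitions}
    known = local | set(imported)

    # Pass 2: every reference must be either local or imported.
    missing = []
    for _, refs in definitions:
        for ref in refs:
            if ref not in known and ref not in missing:
                missing.append(ref)
    return missing


module = [
    ("main", ["helper", "%print"]),  # uses `helper` before it is defined
    ("helper", ["%add"]),
]
```

Here `unresolved_names(module, {"%print"})` returns `["%add"]`, while `helper` resolves fine even though it is defined after its first use.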

---

Been all over the place today, feeling like this linear text format of organizing my thoughts is not letting me quite wrap my head around the problems. I now have text being parsed into byteco, but my next step is still to figure out how to validate it and store the name mapping for the module, both of which I still don’t think I’ve sufficiently defined.

What is “validation”? Technically, the parsing from text -> byteco does a good amount of validation, making sure that illegal operations don’t exist in different definition types, and that “chords” of text tokens are properly grouped. In this way, the parsed byteco is already valid; it’s just that names have not been resolved, and all references use labels instead of hashes. So the real thing that “validation” does is resolve names into hashes, and replace all static name references for symbols (routines and hashes) with the hash. It would also be helpful to have a local “list” of those symbols so that hash usage can refer to an index instead of the full 32-byte hash every time.

Another question is whether there should be a validated layout or grouping of the file. I think it would be nice and clean if order didn’t matter, so that you don’t necessarily have to declare names in the order that they are used. However, this goes against the goal of simplifying implementation and “having things be intentional and mechanical”. The only thing is that a fixed order would force people to lay out their files from smallest components to biggest, never biggest to smallest, and either could arguably be the better way to do it. I think that’s reason enough alone to allow arbitrary order; the goal is for the system to be well designed, but it is ultimately a tool for creating other programs, and it should favor robustness in use over purity of design.
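To make the resolve step concrete, here is a minimal sketch of swapping label references for indexes into a local list of 32-byte hashes. The instruction shape (`("ref", label)` pairs), the `resolve` function, and the way the placeholder hashes are produced are all assumptions for illustration; real hashes would come from the definitions, not from the names.

```python
import hashlib


def resolve(instructions, name_to_hash):
    """Replace label references with indexes into a local symbol list.
    instructions: list of (op, operand); operand is a label for "ref" ops."""
    symbols = []    # local list of hashes, one entry per distinct symbol
    index_of = {}   # label -> position in `symbols`
    resolved = []
    for op, operand in instructions:
        if op == "ref":
            if operand not in index_of:
                index_of[operand] = len(symbols)
                symbols.append(name_to_hash[operand])  # KeyError if undefined
            resolved.append((op, index_of[operand]))
        else:
            resolved.append((op, operand))
    return symbols, resolved


# Placeholder 32-byte hashes; hashing the name is just a stand-in.
table = {name: hashlib.sha256(name.encode()).digest() for name in ("%add", "%mul")}
code = [("ref", "%add"), ("lit", 2), ("ref", "%mul"), ("ref", "%add")]
symbols, resolved = resolve(code, table)
```

The second use of `%add` reuses index `0`, so each 32-byte hash is stored once no matter how often the symbol is referenced.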

Onto the other part: how do we actually represent the “name table”, and in particular the hashes of nested imports? A direct name -> hash list would be pretty straightforward, but I also want to know *which* co and *which* stack when importing from .co.stack, in addition to the end symbol %macro. A simple way would be to say `.co => abc123 ; .co.stack => def456`, but this is strange because we are using paths as names. Does it make sense to have a separate section of the resolved header that documents path hashes? And then what happens to these import sections, are they just redundant? We’d have all this source code that describes to an environment the names and paths it wants to resolve, but we also want to store the hard results. “Here is what I intended, and here is what I got. You can try your hand at my intentions in your environment, but I’ve included my results for reproducibility.”

So perhaps the validation/resolving stage will just add a new “resolved” block that includes all of the names/paths/hashes? Maybe we don’t even comb through the byteco and replace label uses with symbol index uses. We lose out on the compression factor, but replacing them would make the source binary so difficult to follow. Though perhaps that’s not a problem? Without the import blocks, the names in the source code already don’t make much sense. When else would a snippet of a source code file make much sense if it was removed from its context? The other weird part is that if you are editing a file, you get this coupling between indexes and references. Might make things just too complicated. Can always just use a compression algorithm if you really need to.
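One possible shape for that “resolved” block, sketched with separate sections for paths and names so that the imports stay as the statement of intent. The `ResolvedBlock` type, the `resolve_imports` function, and the dictionary standing in for the host environment are all hypothetical, and the hash `789abc` is a made-up placeholder in the style of the ones above.

```python
from dataclasses import dataclass, field


@dataclass
class ResolvedBlock:
    """What this environment actually resolved, stored beside the imports."""
    paths: dict = field(default_factory=dict)  # ".co.stack" -> hash
    names: dict = field(default_factory=dict)  # "%macro" -> hash


def resolve_imports(imports, environment):
    """imports: list of (path, symbol) pairs as written in the source.
    environment: stand-in lookup keyed by path or by (path, symbol)."""
    block = ResolvedBlock()
    for path, symbol in imports:
        # Record every prefix of the path: ".co", then ".co.stack".
        parts = path.strip(".").split(".")
        for i in range(1, len(parts) + 1):
            prefix = "." + ".".join(parts[:i])
            block.paths[prefix] = environment[prefix]
        block.names[symbol] = environment[(path, symbol)]
    return block


environment = {
    ".co": "abc123",
    ".co.stack": "def456",
    (".co.stack", "%macro"): "789abc",  # placeholder hash
}
block = resolve_imports([(".co.stack", "%macro")], environment)
```

The import list stays untouched as “what I intended”, and `block` records “what I got”: both path hashes plus the hash for `%macro`.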

So this feels like we’ve settled on the idea that validation will just make updates to some new definition block that maps names and paths to concrete hashes. The name-to-hash is dead simple, and we’ve explored an awkward option for paths. Is there a better way to encode the resolved paths?
