Co

Co is a concatenative programming language that compiles to COINS. It allows COINS assembly instructions in addition to parameterized macros that get resolved at assembly time and routines that are invoked at runtime. Because the language was designed in tandem with the COINS spec, Co source code tokens are actually integrated into the single-byte COINS spec, allowing for a simple and coalescent binary representation.

Source Code

Design Goals

Coming Soon.

Installation

From Source

To compile from source, you'll need Rust installed on your system. Assuming you have that, clone the repo and install like so:

$ git clone https://git.sr.ht/~jakintosh/co
$ cargo install --path co

There are very few dependencies and source files, so this should take only a few seconds.

Download

If you don't mind downloading executable binaries off the internet (or don't have Rust installed), you can download the latest Linux binary below and drop it in a directory on your $PATH.

co v0.2.4 (Linux)

To do the same from the command line:

$ sudo curl http://coalescent.computer/downloads/co -o /usr/local/bin/co
$ sudo chmod 755 /usr/local/bin/co

To verify:

$ shasum -a 256 `which co`
> 2dc2e18307690e36ed4788bd55eb626f13ab7e2d8d18c20f8d906b4af067e43e

You can now use the co CLI via the command line. As of 0.2.4, the program binary built with Rust 1.74.0 is 670 KB on Arch Linux.

Usage

Installing Co will add the co executable to your path, which is the entire interface to the language. The co program handles both the assembly of executable ROM files and the export to, and management of, the local symbol library.

To start exploring the CLI on your own:

$ co --help

co assemble

$ co assemble <source> <output>

Used to assemble a Co source file into an executable ROM.

For example, co assemble coffee.co coffee.rom will read the ./coffee.co source file, assemble it into a ROM, and write that ROM to ./coffee.rom.

co library import

$ co library import [-n/--name] <source>

Used to import all of the named symbols in a Co source file to the symbol library.

For example, co library import -n .co.stack stack.co will read the ./stack.co source file, and then import all of its named symbols into the .co.stack namespace in the symbol library.

co library resolve

$ co library resolve <source>

Used to resolve all of the imports in a Co source file to concrete hashes in the symbol library.

For example, co library resolve stack.co will read the ./stack.co source file, parse all of its imports, and then resolve those names to hashes from the symbol library.

co library list

$ co library list <namespace>

Used to browse the symbol library namespaces.

For example, co library list . will list all names at the root of the library. This includes both named symbols and other namespaces. Using co library list on a namespace table will show another list of names, and using it on a symbol will output the textual bytecode representation of that symbol.

Build System

Co includes a small build system that helps automate the library import, symbol resolution, and ROM assembly functions of co. The following command runs a build:

$ co build <build-file>

This is an example of what a co build script looks like, and the current commands it supports.

resolve   ./stack.co
import    ./stack.co     into   .co.stack
resolve   ./loop.co
import    ./loop.co      into   .co.loop
resolve   ./reverse.co
import    ./reverse.co   into   .co.stack
assemble  ./reverse.co   to     ./reverse.rom
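
As a rough illustration of how such a script could be read, the Python sketch below parses each line into a step and maps it to the equivalent co invocation from the Usage section. The function names are hypothetical, and the mapping to CLI commands is an assumption drawn from the usage text, not a description of how co build is actually implemented.

```python
def parse_build_script(text):
    """Parse build-script text into a list of step tuples (sketch only)."""
    steps = []
    for number, line in enumerate(text.splitlines(), start=1):
        words = line.split()
        if not words:
            continue  # skip blank lines
        if len(words) == 2 and words[0] == "resolve":
            steps.append(("resolve", words[1]))
        elif len(words) == 4 and words[0] == "import" and words[2] == "into":
            steps.append(("import", words[1], words[3]))
        elif len(words) == 4 and words[0] == "assemble" and words[2] == "to":
            steps.append(("assemble", words[1], words[3]))
        else:
            raise ValueError(f"line {number}: unrecognized step: {line!r}")
    return steps


def to_argv(step):
    """Assumed mapping from a parsed step to the documented CLI form."""
    kind = step[0]
    if kind == "resolve":
        return ["co", "library", "resolve", step[1]]
    if kind == "import":
        return ["co", "library", "import", "-n", step[2], step[1]]
    return ["co", "assemble", step[1], step[2]]
```

Under these assumptions, the line import ./stack.co into .co.stack corresponds to co library import -n .co.stack ./stack.co.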

Executing Programs

The co binary only deals with transforming source code into COINS assembly. To actually run the generated bytecode, you'll need a virtual machine that executes against the COINS spec. The current reference implementation is called Cohost (installation|usage|source code).

Language Features

Coming Soon

Language Specification

Co has three fundamental rules.

First is that tokens are whitespace delimited: they are separated by spaces, tabs, or newlines.

Second is that there are five types of tokens:

Runes, which are a specific set of single-character symbols.
Commands, which are a text label prefixed by a single-character symbol.
Names, which are plain-text at the beginning of a symbol definition.
Opcodes, which are the plain-text representations of the machine opcodes.
Number Literals, which are decimal or hex numbers that render to binary.

Third is that there are four types of definitions:

Routines, which are named units of assembly that are jumped to and executed at runtime.
Macros, which are named units of source code that are rendered at assembly time.
Imports, which make routines and macros from the library referenceable in a source file.
Assembly, which is any code that exists outside of the previous definition types.

Runes

There are 8 runes in total:

+ denotes the beginning of an import definition.
: denotes the beginning of a routine definition.
% denotes the beginning of a macro definition.
; denotes the end of any of the prior definitions.
[ denotes the beginning of a list of macro parameters.
] denotes the end of a list of macro parameters.
( denotes the beginning of a comment.
) denotes the end of a comment.

Remember that all tokens are whitespace delimited, meaning that (invalid comment) will parse as an error, and that ( valid comment ) is correct.

Some examples of Runes in source code:

% push-one LIT8 1 ;                ( Push 8-bit '1' on the stack )
: one-plus-one LIT8 1 DUP8 ADD8 ;  ( Push 1, duplicate it, add )

Again, notice the usage of spaces around the Runes.
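
To make the whitespace rule concrete, here is a small Python sketch that models it. It is not the real Co parser: the function names are hypothetical, Python's str.split() treats a few more characters as whitespace than the three listed above, and nested comments are not considered.

```python
# Sketch only: models the whitespace rule, not the actual Co parser.
RUNES = {"+", ":", "%", ";", "[", "]", "(", ")"}


def tokenize(source):
    """Split source text into tokens on runs of whitespace."""
    return source.split()


def strip_comments(tokens):
    """Drop every token from a "(" rune through the next ")" rune."""
    kept, in_comment = [], False
    for token in tokens:
        if in_comment:
            in_comment = token != ")"  # stay in the comment until ")"
        elif token == "(":
            in_comment = True
        else:
            kept.append(token)
    return kept
```

With this model, "( valid comment )" disappears entirely, while "(invalid comment)" splits into the tokens "(invalid" and "comment)", neither of which is a rune, so nothing marks them as a comment.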

Commands

A command looks like this: >send. It is composed of {marker}{label}; in this example, the marker is '>' and the label is "send".

There are 9 types of markers:

> denotes a routine call.
@ denotes the address of a routine.
~ denotes a macro usage.
' denotes a macro parameter.
| denotes an absolute padding value.
$ denotes a relative padding value.
# denotes an anchor definition.
* denotes the absolute address of an anchor.
& denotes the relative address of an anchor.

There are 3 rules for labels:

>, @, and ~ must refer to a known symbol.
| and $ must parse to 16-bit unsigned integers.
* and & must refer to a defined anchor in the file.

Some examples of commands in action:

% push-one LIT8 1 ;        ( macro: push 1 on stack )
: add-one ~push-one ADD8 ; ( routine: add 1 to byte on top of stack )

|0x0000 #start             ( set padding to 0, create 'start' anchor )
    ~push-one              ( push 1 on the stack )
    >add-one               ( call 'add-one' routine )
    &start JPR16           ( jump to the 'start' anchor )
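
The {marker}{label} structure and the padding rule can be modeled with a short Python sketch. The names here are hypothetical, and symbol and anchor lookups are left out because they depend on the rest of the file and on the library; only the 16-bit range check for | and $ is shown, assuming hex labels as in the example above.

```python
# Sketch only: splits a command token and checks the padding rule.
MARKERS = {
    ">": "routine call",
    "@": "routine address",
    "~": "macro usage",
    "'": "macro parameter",
    "|": "absolute padding",
    "$": "relative padding",
    "#": "anchor definition",
    "*": "absolute anchor address",
    "&": "relative anchor address",
}


def parse_command(token):
    """Return (kind, label); padding labels come back as integers."""
    if len(token) < 2 or token[0] not in MARKERS:
        raise ValueError(f"not a command: {token!r}")
    marker, label = token[0], token[1:]
    if marker in ("|", "$"):
        if not label.startswith("0x"):
            raise ValueError(f"padding must be a hex literal: {token!r}")
        value = int(label, 16)  # raises ValueError on bad digits
        if value > 0xFFFF:
            raise ValueError(f"padding exceeds 16 bits: {token!r}")
        return MARKERS[marker], value
    return MARKERS[marker], label
```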

Import Commands

There is a special set of command markers that are only valid inside an Import definition. There are 3 types of import markers:

. denotes that the label is a Path, and also separates its components.
: denotes that the label is a Routine name.
% denotes that the label is a Macro name.

We will cover how these are used in an import definition later on.

Names

When creating a new Routine or Macro definition, you must give it a local name so that it can be referenced by other Commands. Name tokens are position dependent, and are required after : and % Runes that mark the beginning of a Routine and Macro, respectively.

Examples of names:

% macro-name ( macro body goes here ) ;
: routine-name ( routine body goes here ) ;

Opcodes

To actually issue commands to the CPU, you write Opcodes. All of the other features of the language exist to create helpful abstractions around rendering useful sequences of opcodes, but ultimately the opcodes are the only part of the language that makes the CPU do anything at all. Co is designed specifically to work with the "Coalescent Instruction Set", abbreviated to COINS.

COINS Opcodes look like this: LIT8 ADD32 SWP16R DUP64. Each opcode has a 3-letter all-caps identifier (with the exception of OR), followed by an 8, 16, 32, or 64 to specify the bit-width of the instruction. Finally, the stack manipulation opcodes can optionally have an R appended to the end to specify that they operate on the return stack, instead of the data stack.

For a deeper dive on what all the opcodes are, and their function, check out the COINS repository.
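
The naming scheme described above can be captured in a regular expression. The following Python sketch is an illustration only: it checks the shape of an opcode token, not whether the mnemonic is a real COINS opcode or whether the R suffix is allowed on it, and the function name is hypothetical.

```python
import re

# Shape only: 3 capital letters (or OR), a bit width, optional R suffix.
OPCODE = re.compile(r"([A-Z]{3}|OR)(8|16|32|64)(R?)")


def parse_opcode(token):
    """Return (mnemonic, width_bits, uses_return_stack) or None."""
    match = OPCODE.fullmatch(token)
    if match is None:
        return None
    mnemonic, width, suffix = match.groups()
    return mnemonic, int(width), suffix == "R"
```

For instance, SWP16R splits into the mnemonic SWP, a width of 16 bits, and the return-stack flag.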

Number Literals

The final type of token is a number literal. There are two types of number literal:

0 the decimal number literal, which is used with LIT opcodes.
0x00 the hex number literal, which can be used in all number contexts.

Decimal literals can only be used after a LIT8, LIT16, LIT32, or LIT64 opcode. These literals will automatically adapt to the specified literal size, without requiring any padding. For example, LIT8 1 will render to binary 00000001 while LIT16 1 will render to binary 0000000000000001. When used outside of a LIT context, decimal literals will render to a 64-bit unsigned integer, making them of limited use.

Hex literals are prefixed by 0x and must include all padding. They can include _ characters as visual separators. Like decimal literals, they can also be used with LIT opcodes, but must match the opcode with full padding. LIT8 0x01 is valid, but LIT16 0x00 is not; LIT16 0x0001 must be used. Hex literals are also required for padding commands. Furthermore, Hex literals can be used anywhere to render directly to bytes in-place. This can be helpful if you want to define binary data.

Examples of number literals in use:

|0x0000 #program
	LIT8 0
	LIT16 0xC0DE
	LIT32 1337

#data
	0x00010203_04050607
	0x08090A0B_0C0D0E0F
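
The rendering rules for the two literal types can be sketched in Python as follows. This is a model under stated assumptions, not the assembler's code: the function name is hypothetical, and big-endian byte order is assumed from the LIT16 1 example above.

```python
def render_literal(text, lit_bits=None):
    """Render a number literal to bytes.

    lit_bits is the width of the preceding LIT opcode, or None when the
    literal appears outside a LIT context. Big-endian order is assumed.
    """
    if text.startswith("0x"):
        digits = text[2:].replace("_", "")  # "_" is only a visual separator
        if not digits or len(digits) % 2:
            raise ValueError(f"hex literal needs whole bytes: {text!r}")
        data = bytes.fromhex(digits)  # raises ValueError on bad digits
        if lit_bits is not None and len(data) * 8 != lit_bits:
            raise ValueError(f"{text!r} is not padded to {lit_bits} bits")
        return data
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a number literal: {text!r}")
    bits = lit_bits if lit_bits is not None else 64  # 64-bit outside LIT
    value = int(text)
    if value >= 1 << bits:
        raise ValueError(f"{text!r} does not fit in {bits} bits")
    return value.to_bytes(bits // 8, "big")
```

Here render_literal("1", 16) and render_literal("0x0001", 16) yield the same two bytes, while render_literal("0x00", 16) is rejected because the hex literal is not fully padded.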

Routines

Routines are the primary way of abstracting executable code in Co. A routine begins with the : rune, followed by its name, a set of source code tokens, and ends with a ; rune. To call a routine from somewhere else in code, use a command composed of the > routine call marker and the routine's name.

Because a routine must render out all of its internal calls to generate a content hash, a routine cannot call itself using the > command. In a future version of the COINS spec, there will be a new Opcode that allows you to push the address of the current routine on the stack, and the Co toolchain will automatically detect that when trying to use > recursively. As of this writing, recursion is not implemented.

Routines cannot contain Commands that use absolute addresses, since a Routine in practice may end up anywhere in memory. It can only use relative addresses.

An example of some routines (shown earlier):

: sip      DUP8 >swallow SWP8 SUB8 ;
: swallow  >extract >absorb ;
: extract  LIT8 4 MUL8 LIT8 10 SWP8 DIV8 ;
: absorb   LIT8 0xC0 SWP8 DVW8 ;

LIT8 250 LIT8 10 >sip

Macros

Macros reduce source code size by letting you create reusable blocks of source code. A macro definition begins with the % rune, followed by a name, a set of source tokens, and ends with a ; rune. The source code in a macro is rendered directly in place of each ~ macro usage during assembly.

An example of macros being used to reduce a while loop boilerplate:

% while-start #while-start DUP8 LIT8 0 EQU8 &while-end JCR16 ;
% while-end &while-start JPR16 #while-end ;

: interesting-routine
	~while-start
		( do some work )
	~while-end
;

Parameterized Macros

After declaring the name of a macro, you can also choose to provide a list of named parameters, which you can then interpolate into labels, using { and } inline of the label.

These parameterized macros are powerful, but limited in their scope. Since they interpolate strings, they don't get the benefit of their symbol references being linked to content hashes, and so they can't be validated and canonicalized on import. However, they provide interesting and useful flexibility in certain use-cases.

An example of a parameterized macro running an infinite loop based on its input:

% infinite-loop [ macro param ]
	#{macro}-loop-start
		~{macro} '{param}
	*{macro}-loop-start JMP16
;
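
Since the interpolation is plain string substitution, its effect on the body above can be modeled in a few lines of Python. This sketch is illustrative only: the expand function and the argument values blink and led are hypothetical, and how arguments are supplied at the macro's use site is not covered here.

```python
import re

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def expand(body_tokens, bindings):
    """Replace each {name} inside a token with its bound string."""

    def substitute(match):
        name = match.group(1)
        if name not in bindings:
            raise KeyError(f"unbound macro parameter: {name}")
        return bindings[name]

    return [PLACEHOLDER.sub(substitute, token) for token in body_tokens]


body = "#{macro}-loop-start ~{macro} '{param} *{macro}-loop-start JMP16".split()
expanded = expand(body, {"macro": "blink", "param": "led"})
```

With these hypothetical bindings, the body becomes #blink-loop-start ~blink 'led *blink-loop-start JMP16. The resulting labels only exist after substitution, which is why they cannot be linked to content hashes ahead of time.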

Import

Earlier, we covered some of the Commands that are unique to the Import definition. An import definition begins with a + rune, followed by a Path, a set of import commands, and ends with a ; rune.

A Path begins with the . marker, and then contains a set of path names separated by more . symbols. The root path is just ., but a deeper namespace might look like this: .co.stack.

An actual import command looks like this: %stash8=stash or %stash8. The structure of this command is {symbol-type}{name}={local-name}, where the ={local-name} is optional if you want to keep the {name} for local use. In the example, the % means we're importing a Macro, stash8 is the name of the symbol being imported, and stash in the first example is the override name to be used in the file.

A full example of an import block with its special commands might look like this:

+ .co.stack
	%stash8=stash %unstash8=unstash
;
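
The {symbol-type}{name}={local-name} structure and the Path format can be modeled with a short Python sketch. The function names are hypothetical, and the sketch only splits the tokens; it does not look anything up in the symbol library.

```python
IMPORT_KINDS = {":": "routine", "%": "macro"}


def parse_path(token):
    """Split a Path such as .co.stack into its components."""
    if not token.startswith("."):
        raise ValueError(f"a path must start with '.': {token!r}")
    return [part for part in token.split(".") if part]


def parse_import_command(token):
    """Return (kind, name, local_name) for one import command."""
    kind = IMPORT_KINDS.get(token[:1])
    if kind is None:
        raise ValueError(f"unknown import marker: {token!r}")
    name, equals, local = token[1:].partition("=")
    if not name or (equals and not local):
        raise ValueError(f"malformed import command: {token!r}")
    return kind, name, local or name  # local name defaults to the name
```

Under this model, %stash8=stash imports the macro stash8 under the local name stash, while %stash8 keeps the name stash8.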
