CG
Author:
Latest Version: | |||
---|---|---|---|
Release Date: | |||
Company: | Sierra | ||
Publication Status: | Unpublished | ||
Developer(s): | |||
Language: | C | ||
Open Source: | Closed | ||
Source Availability: | No | ||
License: | None | ||
Platform: | DOS | ||
Type: | Compiler, In-house, AGI Development Tool, Development Tool | ||
Localization: | English | ||
Website: | www.sierra.com |
Description
CG was Sierra's in-house AGI Game Compiler.
General compiler behavior: CG.EXE was intentionally designed so that action and test commands could be changed or added without rebuilding the compiler itself. The compiler only manages the syntax used to access the commands; the commands themselves must be declared to the compiler every time it is run. Only a small number of keywords are hard coded in the program. Because of this, the compiler was able to compile all versions of AGI source code without needing an update every time the AGI interpreter changed.
The compiler processes each source file separately. The sequence of actions is:
- parse input file name, build output file name
- reset compiler parameters
- open source file and assign to buffer space
- create output file (overwriting any existing file) and assign to buffer space
- initialize the hash table, adding predefined symbols
- allocate space for output, and for AGI messages
- compile the source (errors are displayed as they are encountered)
- close the output file
- display results, including total number of errors encountered
- repeat for next source file
The compiler uses a single pass when compiling source. The term 'preprocessor' is used occasionally throughout this article to describe some aspects of the compiler, but unlike more modern compilers, all input is compiled linearly in a single pass, so all declarations and defines must appear BEFORE they are used in source code.
The compiler converts every text field (symbol) it encounters into a hash value by summing the ASCII values of all characters in the symbol and returning that sum MOD 203. This value is then used to look the symbol up in the compiler's symbol hash table to determine what it represents. Symbols that share a hash value are kept in a linked list for that value, so collisions do not cause conflicts. If the symbols use up all available memory, the compiler will throw an error and quit. No duplicate symbols are allowed; if a duplicate is detected, the compiler throws an error. Symbols are case sensitive.
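The hashing scheme described above can be sketched in a few lines of Python. This is only an illustration of the technique, not CG.EXE's actual code: the names `symbol_hash` and `SymbolTable` are hypothetical, and Python lists stand in for the linked lists.

```python
HASH_BUCKETS = 203  # modulus stated in the text above


def symbol_hash(name):
    """Sum the ASCII values of the characters, then take the sum MOD 203."""
    return sum(name.encode("ascii")) % HASH_BUCKETS


class SymbolTable:
    """Hash table with one chain per hash value (lists stand in for linked lists)."""

    def __init__(self):
        self.buckets = [[] for _ in range(HASH_BUCKETS)]

    def add(self, name, value):
        chain = self.buckets[symbol_hash(name)]
        if any(existing == name for existing, _ in chain):
            # Duplicate symbols are an error, as described above.
            raise ValueError("duplicate symbol: " + name)
        chain.append((name, value))

    def lookup(self, name):
        for existing, value in self.buckets[symbol_hash(name)]:
            if existing == name:  # exact comparison, so lookups are case sensitive
                return value
        return None
```

With this scheme 'ab' and 'ba' both hash to 195 and end up in the same chain, while 'VAR' and 'var' hash to different values and are distinct symbols in any case.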
On startup, the following symbols are added to the hash table: %include, %tokens, %test, %action, %flag, %var, %object, %define, %message, %view, #include, #tokens, #test, #action, #flag, #var, #object, #define, #message, #view, goto, if, else, FLAG, OBJECT, MSG, WORD, NUM, MSGNUM, VIEW, VAR, ANY, WORDLIST. These can be broken down into three groups:
Preprocessor symbols:
The symbols starting with '%' or '#' are compiler commands that add symbols to the hash table which tell the compiler how to handle other symbols it will encounter later. Note that there are two versions of each, with either the '#' or '%' character to start. There is no difference between them; the compiler just offers the flexibility to use either format.
%include: This symbol is used to include another source or header file. Syntax is:
%include "filename.ext"
The filename must be enclosed in double-quotes. It can include path information, but not wildcards. Included files can be nested, but there is a limit of 5 layers of nesting. More than that will cause an error.
%tokens: This symbol is used to load the WORDS.TOK file. Syntax is:
%tokens "WORDS.TOK"
The filename doesn't have to be WORDS.TOK, but it must be a valid AGI word file. It must be enclosed in double-quotes. It can include path information, but not wildcards. If you do not assign a WORDS.TOK file with the %tokens command, the compiler will enter an infinite loop if it tries to process a 'said' command.
%test/%action: These symbols are used to declare AGI command symbols. None of the AGI commands are hard coded into the compiler; they must be declared at the beginning of the source code, typically in a header file that is included with the %include command. Syntax is:
%test testname([arg1, arg2, ...]) cmdnum
%action actionname([arg1, arg2, ...]) cmdnum
The names of commands are well established by precedent (and by example from released original game code), but you can assign any name you want for any command. The arguments must be enclosed in parentheses, separated by commas. If no arguments, empty parentheses must be included. Arguments must be one of the pre-defined argument types (see below). The number of the command (the byte value that goes into the compiled logic) must follow the command declaration. It is imperative that the arguments assigned match exactly with the AGI command associated with the byte command number; the compiler will happily create code that has any combination of command byte values and arguments, but if they don't match what the interpreter expects, the code won't run.
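As a rough illustration of the declaration syntax, the following Python sketch parses a single %test or %action line into its parts. The function name `parse_command_decl`, the regular expression and the returned dictionary are assumptions made for this example; they say nothing about how CG.EXE parses declarations internally. The sketch accepts all ten predefined type names and does not model the error that MSG is reported to cause.

```python
import re

# The ten predefined argument type names (case sensitive).
ARG_TYPES = {"FLAG", "OBJECT", "MSG", "WORD", "NUM",
             "MSGNUM", "VIEW", "VAR", "ANY", "WORDLIST"}

# '%' or '#', then 'test' or 'action', a name, a parenthesised list, a number.
DECL_RE = re.compile(r"^[%#](test|action)\s+([^\s()]+)\s*\(([^)]*)\)\s*(\d+)\s*$")


def parse_command_decl(line):
    """Split a command declaration into kind, name, argument types and number."""
    match = DECL_RE.match(line.strip())
    if match is None:
        raise ValueError("bad command declaration: " + line)
    kind, name, arg_text, number = match.groups()
    args = [arg.strip() for arg in arg_text.split(",") if arg.strip()]
    for arg in args:
        if arg not in ARG_TYPES:
            raise ValueError("unknown argument type: " + arg)
    return {"kind": kind, "name": name, "args": args, "number": int(number)}
```

For example, `parse_command_decl("%action assignn(VAR, NUM) 3")` yields the name 'assignn', the argument types VAR and NUM, and command number 3, while a declaration with empty parentheses yields an empty argument list.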
%flag/%var/%object/%view: Flags, variables, screen objects and views can be assigned symbols using the respective designated preprocessor command. Note that there isn't a symbol command for numbers; the compiler doesn't convert numbers into a symbol. Syntax is:
%flag flagname flagnum
%var variablename varnum
%object objname objnum
%view viewname viewnum
These symbol types correspond to the expected argument types that are used in command declarations. For example, if a command is declared with %action as '%action assignn(VAR, NUM)', a symbol of type %var must be passed as the first argument, and a number must be passed as the second argument. Note that these types are completely arbitrary, meaning as long as the command declaration matches the argument type passed in source code, the compiler won't have any problem. For example:
%action assignv(VIEW, OBJECT) 4
%view a_view 5
%object an_object 10
assignv(a_view, an_object);
will compile, and when run in AGI it would assign the value of variable 10 to variable 5. Of course, there is little practical value in constructs such as this.
The OBJECT argument type is also used in most original Sierra game source for both screen objects and inventory objects. Care must be taken by the programmer to keep them straight, because the compiler won't do it for you.
%define: The generic define command allows you to assign a symbol to a number, text, or another symbol. Syntax is:
%define define_name define_value
Because the define value can be another symbol, you can do things like:
%var v0 0
%define currentroom v0
or
%var v200 200
%define lvar1 v200
%define counter lvar1
addn(counter, 1); [ same as addn(lvar1, 1) or addn(v200, 1)
Symbols can be nested in this manner as deep as you want. As long as each symbol is defined before it is referenced, the compiler will continue substituting define values until it reaches a non-text symbol type (var, num, flag, msgnum, etc.)
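The substitution chain can be pictured with a small Python sketch. The dictionary layout and the function name `resolve` are assumptions for illustration only; the point is simply that a %define whose value names another symbol is followed until a non-define symbol is reached.

```python
def resolve(name, symbols):
    """Follow %define substitutions until a non-define symbol is reached.

    `symbols` maps a symbol name to a (kind, value) pair, for example
    ("var", 200) or ("define", "v200"). This layout is hypothetical.
    """
    kind, value = symbols[name]
    # Keep substituting while the define's value is itself a known symbol.
    while kind == "define" and value in symbols:
        kind, value = symbols[value]
    return kind, value


symbols = {
    "v200": ("var", 200),          # %var v200 200
    "lvar1": ("define", "v200"),   # %define lvar1 v200
    "counter": ("define", "lvar1"),  # %define counter lvar1
}
```

Here `resolve("counter", symbols)` walks counter, lvar1 and v200 in turn and ends at the variable numbered 200, which is why all three spellings of the addn call compile identically.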
%message: message symbol declarations are a bit different from other symbol declarations. Syntax is:
%message msgnum "message text"
Note that the number precedes the text value, unlike other symbols, where the number is last. The message text must be enclosed in double-quotes. It cannot be split across lines, i.e. the compiler will not automatically concatenate multi-line strings. The message text can also be a symbol that was previously %defined to be a string value. For example:
%define aboutmsg "AGI Game, by Author"
%message 1 aboutmsg
will compile with no errors.
Keywords:
There are only three keywords that the compiler recognizes: 'if', 'else' and 'goto'. The syntax for 'if' and 'else' used by the compiler is identical to the AGI 'canon' syntax: the tests of an 'if' statement must be in parentheses, curly brackets separate code blocks, etc. The 'and' and 'or' features are the same ('&&' and '||' as operators, and 'or'ed tests must be in parentheses). The exclamation point '!' is used for negation of test commands.
The goto command does not use parentheses. Using parentheses will cause an error. The syntax is:
goto label
Labels are defined with a colon followed by the label name, with no space between them. If there is a space between them, the compiler will create a label using a null string (""), and the following text will be interpreted by the compiler as the next command symbol.
Argument types:
Arguments used in action and test commands must be designated as one of ten different argument types. When a command is declared, each argument must be given as one of the predefined argument types; when the command is used, the compiler expects a symbol whose type matches the declared argument type. The type names are case sensitive, so 'var' is not the same as 'VAR'. With the exception of numbers and vocabulary words (from WORDS.TOK), all arguments must be passed as a symbol that was previously declared using the appropriate 'preprocessor' command. For example, if an argument is of type FLAG, you must pass a symbol declared with the '%flag' command.
- FLAG: Use this argument type when you want a command to use a flag argument.
- OBJECT: Use this argument type when you want a command to use a screen object argument.
- MSG: This argument type is not used by CG.EXE or AGI; it appears to be a legacy type that no longer works. The internal value assigned to it will actually create an error if you try to use this argument type in a command declaration.
- WORD: This is another legacy argument type. If used, the compiler will expect a single word from the WORDS.TOK file. There are no AGI commands that take a single word as an argument. (Earlier versions of AGI did use a 'said' command that took a single word as an argument.)
- NUM: Use this argument type when you want a command to use a numeric argument value.
- MSGNUM: Use this argument type when you want a command to use a message number as an argument value. The message must be properly declared with %message before it is used in a command.
- VIEW: Use this argument type when you want a command to use a view argument.
- VAR: Use this argument type when you want a command to use a variable argument.
- ANY: This argument type should not be used in command declarations. It is an internal type that is used when the compiler is handling shorthand syntax such as 'v0 = v1;'. Technically, you could use this in a command declaration, in which case, any valid symbol would be compiled. But using strict argument typing helps prevent bugs by forcing the game programmer to use correct argument types.
- WORDLIST: This argument type is only used by the 'said' command. It acts as a placeholder for one or more words from the WORDS.TOK file. Its placement at the end of the list suggests it was added after the 'said' command was changed from having a single word argument to a variable number of arguments. Unlike modern AGI compilers (WinAGI and AGIStudio for example), argument values are not passed as double-quoted strings; instead, they are passed quote free, and the dollar sign ($) is used in place of spaces. For example, if your word in WORDS.TOK is 'save game', it would look like this in a 'said' command:
said(save$game)
Shorthand Syntax:
The compiler provides limited support for shorthand syntax in lieu of command names. Instead of recognizing the shorthand and directly emitting the appropriate byte code, the compiler inserts the matching command symbol into the data stream, as if it had been typed in the source code, and then compiles that symbol. This means the declarations of the commands behind the shorthand must exactly match the internal spelling. For example, you can't declare the assignn command (byte code 3) under a custom name; it must be declared as 'assignn'. (You could, however, use #define to create a different name that refers to assignn.) The supported shorthand commands are:
- '++' and '--' can be used as shorthand for increment/decrement. The operators must precede the variable being modified. So
++variable1; [ OK
will compile, but
variable1++; [ syntax error
will throw an error.
- assignn/assignv, addn/addv and subn/subv can be replaced with 'v# = #/v#', 'v# += #/v#' and 'v# -= #/v#'. For addition and subtraction, you can't use the longer notation 'v# = v# + #'. For example:
v1 = v2; [ OK, same as assignv(v1, v2);
v1 += 1; [ OK, same as addn(v1, 1);
v1 -= v3; [ OK, same as subv(v1, v3);
v1 = v1 + 2; [ NOT OK, syntax error
- left and right indirection can be replaced with '@v# = #/v#' and 'v# = @v#'. Note this is different from modern compilers that use the asterisk/star character (*). For example:
@v1 = 1; [ OK, same as lindirectn(v1, 1);
@v2 = v3; [ OK, same as lindirectv(v2, v3);
v4 = @v5; [ OK, same as rindirect(v4, v5);
*v1 = 1; [ NOT OK, wrong indirect symbol
- for test commands, '==', '>', and '<', can be used in place of equaln/equalv, greatern/greaterv and lessn/lessv. '!=', '>=' and '<=' can be used in place of the negated versions of these commands.
- flags can be tested by their name; i.e. 'if(flag1)' will compile as if it were 'if(isset(flag1))'.
- variables can also be tested by name; 'if(var1)' will compile as if it were 'if(greatern(var1, 0))'.
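The action shorthand listed above can be modelled as a text rewrite, which is close in spirit to the compiler inserting the command symbol into the data stream. The Python sketch below is a hypothetical helper, not a description of CG.EXE internals. It assumes the caller supplies the set of names declared as variables, that the '++' and '--' commands are declared as 'increment' and 'decrement', and that indirection cannot be combined with '+=' or '-='.

```python
import re

INCDEC_RE = re.compile(r"^(\+\+|--)\s*(\w+)$")
ASSIGN_RE = re.compile(r"^(@?)(\w+)\s*(=|\+=|-=)\s*(@?)(\w+)$")
BASE_NAMES = {"=": "assign", "+=": "add", "-=": "sub"}


def expand_shorthand(statement, variables):
    """Rewrite one shorthand statement as the equivalent command text."""
    text = statement.strip().rstrip(";,").strip()
    match = INCDEC_RE.match(text)
    if match:
        name = "increment" if match.group(1) == "++" else "decrement"
        return "%s(%s)" % (name, match.group(2))
    match = ASSIGN_RE.match(text)
    if match is None:
        raise ValueError("syntax error: " + statement)
    left_ind, left, op, right_ind, right = match.groups()
    indirect = bool(left_ind) or bool(right_ind)
    if (left_ind and right_ind) or (indirect and op != "="):
        raise ValueError("syntax error: " + statement)  # assumed to be invalid
    # 'v' variants take a variable on the right, 'n' variants take a number.
    suffix = "v" if right in variables else "n"
    if right_ind:
        return "rindirect(%s, %s)" % (left, right)
    if left_ind:
        return "lindirect%s(%s, %s)" % (suffix, left, right)
    return "%s%s(%s, %s)" % (BASE_NAMES[op], suffix, left, right)
```

Given the variable names v1 to v5, 'v1 += 1;' becomes 'addn(v1, 1)' and 'v4 = @v5;' becomes 'rindirect(v4, v5)', while 'v1 = v1 + 2;' and '*v1 = 1;' are rejected as syntax errors.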
Miscellaneous Syntax Information:
Commas and semi-colons are completely interchangeable. You can use either to separate arguments in a command, or to mark the end of a line.
The compiler does not require an end-of-line marker. Commands can be separated by one or more spaces, a line feed, a semi-colon, or a comma. Line feeds (ASCII value 10) mark new lines; carriage returns (ASCII value 13) are completely ignored. For example:
assignn(v1, 1); [ OK
assignn(v2; 2) ,, assignn(v3;3) [ OK
assignn
(
v1
,
1
) [ OK; same as first line
For numeric arguments, the compiler does not enforce unsigned byte values. If a number is greater than 255, the compiler uses the number MOD 256. Negative numbers also compile without error; the compiler converts them to two's complement (and also reduces the result MOD 256 if it does not fit in 8 bits).
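The arithmetic is easy to check with a one-line Python helper (the name `to_byte` is made up for this example). Python's modulo operator returns a non-negative result for a positive modulus, so it gives the low 8 bits of the two's complement form directly.

```python
def to_byte(value):
    """Reduce any integer to the single byte that would be stored."""
    return value % 256  # for negative values this is the 8-bit two's complement


examples = {300: to_byte(300), -1: to_byte(-1), -300: to_byte(-300)}
```

So 300 is stored as 44, -1 as 255, and -300 as 212.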
The only supported comment tag is the open square bracket ([). Double-slash (//) is not supported by the compiler, nor are block comments.
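The separator and comment rules can be combined into a tiny tokenizer sketch in Python. It is only a model of the behavior described above: the function name `tokenize` is hypothetical, comments are assumed to run to the end of the line, and quoted strings (which could contain '[' or commas) are not handled.

```python
def tokenize(source):
    """Split source text into symbols and parentheses."""
    tokens = []
    # Carriage returns are ignored entirely; line feeds mark new lines.
    for line in source.replace("\r", "").split("\n"):
        code = line.split("[", 1)[0]  # '[' starts a comment (assumed to end at the line end)
        # Commas and semi-colons are interchangeable separators, like spaces.
        code = code.replace(";", " ").replace(",", " ")
        # Parentheses are kept as tokens of their own.
        code = code.replace("(", " ( ").replace(")", " ) ")
        tokens.extend(code.split())
    return tokens
```

Under these assumptions 'assignn(v1, 1); [ OK' and the same command spread over six lines produce the identical token list, which matches the examples above.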
A 'return' command (byte code 0) is automatically added by the compiler. If the source code also ends with a 'return' command, the resulting compiled logic will in fact end with two return byte codes.
Version History
Version 3.14:
- Original File Date/Time: August 1, 1986 (10:29:58 AM)
- CRC32: 022E711B
- MD5: A70C533DCBDAA58491DD09A10AC1F251
- SHA-1: 4700B789548A53FBE3600ADDB55E4B1C492436E0
Usage
cg room [room...] [-o output_directory] [-b buffer_size]
- -o: Sets the output directory (where the compiled logic files will be saved). The default is the current directory. The output directory can be any legal MSDOS directory name, including relative paths. If the path is not valid, the compiler will quit with an error. If the argument following the -o switch is missing, the current directory is used. If the argument is a valid directory name, but does not exist, the compiler will run, but will raise an error at the end when it can't write the output files.
- -b: Adjusts buffer size. The default is 1000h (4K). If the argument following the -b switch is non-numeric or missing, a value of zero is used, which will cause the compiler to fail immediately. A larger value means disk writes are minimized, but it will also take up more memory.
- -v: Sets 'verbose' mode. This switch does not take an argument. When used, it causes the output to display additional information, including the number of symbols and messages found in the source file, and information on the symbol hash table.
Arguments (command switches and source files) can be in any order. Each argument, including source code files ('room') must be separated by one or more spaces. Filenames have to be valid MSDOS filenames (no long-filenames) and can include path info and wildcard characters ('?' and '*').
If no arguments are passed, the usage information is displayed. If no source files are passed (only one or more command switches), the program switches to console mode, and source filenames can then be typed in one at a time. Pressing ENTER adds a file to the list; pressing CTRL+Z and then ENTER sends the list to the program, which then processes the files.
Source filenames without an extension are assumed to have the '.cg' extension. If specified, any extension will work just fine. You can also pass a list of files by preceding the file that has the list with the '@' symbol; i.e. 'cg @roomlist.txt' will open the file 'roomlist.txt' and read each line as an input source file name. You can also specify DOS environment variables 'HEAD' and 'TAIL', which are added to the beginning and end of the file list passed on the command line.
The source filename (not the extension) must include a number, or the compiler will throw an error. The output filename is always the source filename truncated at the first number, with that number as the extension. For example, the output for 'logic1.cg' is 'logic.1', the output for 'log2a3.txt' is 'log.2', etc.
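The naming rule can be expressed as a short Python sketch. The function `output_name` is a hypothetical helper, and two details are assumptions rather than documented behavior: any directory part of the source name is dropped, and a run of several digits is treated as a single number.

```python
import re


def output_name(source):
    """Derive the output filename from a source filename."""
    base = re.split(r"[\\/]", source)[-1]  # drop any path (assumption)
    stem = base.rsplit(".", 1)[0] if "." in base else base
    match = re.search(r"\d+", stem)  # first number in the name, not the extension
    if match is None:
        raise ValueError("source filename must include a number: " + source)
    # Truncate at the first number and use that number as the extension.
    return stem[:match.start()] + "." + match.group()
```

With this helper 'logic1.cg' maps to 'logic.1' and 'log2a3.txt' maps to 'log.2', matching the examples above, while a name such as 'logic.cg' raises an error.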
The maximum number of source files (including those specified by name, by wildcard and in file lists) is 200, which seems odd since AGI allows up to 256 logics in a game.
References
https://sciprogramming.com/community/index.php?topic=2034.0