266b9d49bfa3d2d16b4111378b1f9794373ee141
The Go Annotated Specification
This document supersedes all previous Go spec attempts. The intent is to make this a reference for syntax and semantics. It is annotated with additional information not strictly belonging in a language spec.
Open questions
- how to delete from a map
- how to test for map membership (we may want an 'atomic install'? m[i] ?= x; )
- compound struct literals? StructTypeName { a, b, c }
- array literals should be easy/natural to write [ 1, 2, 3 ] ArrayTypeName [ 1, 2, 3 ]
- map literals [ "a" : 1, "d" : 2, "z" : 3 ] MapTypeName [ "a" : 1, "d" : 2, "z" : 3 ]
- are basic types interfaces / do they define interfaces?
- package initialization?
Design decisions
A list of decisions made but for which we haven't incorporated proper language into this spec. Keep this section small and the spec up-to-date instead.
- multi-dimensional arrays: implementation restriction for now
- no '->', always '.'
- (*a)[i] can be sugared into: a[i]
- '.' to select package elements
- arrays are not automatically pointers; we must always say explicitly: "*array T" if we mean a pointer to that array
- there is no pointer arithmetic in the language
- there are no unions
- packages: need to pin it all down
- tuple notation: (a, b) = (b, a); generally: need to make this clear
- for now: no (C) 'static' variables inside functions
- exports: we write: 'export a, b, c;' (with a, b, c, etc. a list of exported names, possibly also: structure.field)
- the ordering of methods in interfaces is not relevant
- structs must be identical (same decl) to be the same (Ken has a different implementation: equivalent declarations are the same; what about methods?)
- new methods can be added to a struct outside the package where the struct is declared (need to think through all implications)
- array assignment by value
- do we need a type switch?
- write down scoping rules for statements
- semicolons: where are they needed and where are they not needed? need a simple and consistent rule
- we have: postfix ++ and -- as statements
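Several of these decisions carried over into Go as it was later released, though with different notation: a pointer to an array is written `*[3]int` rather than `*array T`, and tuple assignment is written without parentheses. The following sketch, in released Go syntax rather than this draft's, illustrates the (*a)[i] sugar, array assignment by value, and tuple assignment.

```go
package main

import "fmt"

func main() {
	arr := [3]int{1, 2, 3}
	p := &arr // explicit pointer to the array; arrays themselves are values

	p[0] = 10           // shorthand for (*p)[0] = 10
	fmt.Println(arr[0]) // prints 10

	b := arr // array assignment copies the whole array by value
	b[1] = 20
	fmt.Println(arr[1], b[1]) // prints "2 20"

	x, y := 1, 2
	x, y = y, x       // tuple assignment swaps without a temporary
	fmt.Println(x, y) // prints "2 1"
}
```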
Guiding principles
Go is an attempt at a new systems programming language. [gri: this needs to be expanded. some keywords below]
- small, concise, crisp
- procedural
- strongly typed
- few, orthogonal, and general concepts
- avoid repetition of declarations
- multi-threading support in the language
- garbage collected
- containers w/o templates
- compiler can be written in Go and so can its GC
- very fast compilation possible (1MLOC/s stretch goal)
- reasonably efficient (C ballpark)
- compact, predictable code (local program changes generally have local effects)
- no macros
Syntax
The syntax of Go borrows from the C tradition with respect to statements and from the Pascal tradition with respect to declarations. Go programs are written using a lean notation with a small set of keywords, without filler keywords (such as 'of', 'to', etc.) or other gratuitous syntax, and with a slight preference for expressive keywords (e.g. 'function') over operators or other syntactic mechanisms. Generally, "light" language features (variables, simple control flow, etc.) are expressed using a light-weight notation (short keywords, little syntax), while "heavy" language features use a more heavy-weight notation (longer keywords, more syntax).
[gri: should say something about syntactic alternatives: if a syntactic form foreseeably will lead to a style recommendation, try to make that the syntactic form instead. For instance, Go structured statements always require the {} braces even if there is only a single sub-statement. Similar ideas apply elsewhere.]
Modularity, identifiers and scopes
A Go program consists of one or more files compiled separately, though not independently. A single file or compilation unit may make individual identifiers visible to other files by marking them as exported; there is no "header file". The exported interface of a file may be exposed in condensed form (without the corresponding implementation) through tools.
A package collects types, constants, functions, and so on into a named entity that may be imported to enable its constituents to be used in another compilation unit. Each source file is part of exactly one package; each package is constructed from one source file.
Within a file, all identifiers are declared explicitly (except for general predeclared identifiers such as true and false), and thus for each identifier in a file the corresponding declaration can be found in that same file (usually before its use, except for the rare case of forward declarations). Identifiers may denote program entities that are implemented in other files. Nevertheless, such identifiers are still declared via an import declaration in the file that refers to them. This explicit declaration requirement ensures that every compilation unit can be read by itself.
The scoping of identifiers is uniform: an identifier is visible from the point of its declaration to the end of the immediately surrounding block, and nested identifiers shadow outer identifiers with the same name. All identifiers are in the same namespace; i.e., no two identifiers in the same scope may have the same name even if they denote different language concepts (for instance, a variable and a function). Uniform scoping rules make Go programs easier to read and to understand.
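A minimal sketch of block scoping and shadowing, written in Go as later released (whose syntax differs from this draft in places); the function name shadow is a hypothetical illustration. The inner x hides the package-level x only until the end of its block.

```go
package main

import "fmt"

// x is declared at package level and is visible throughout the file.
var x = 1

// shadow reports the value of x seen inside a nested block and after it.
func shadow() (inner, outer int) {
	{
		x := 2 // this x shadows the package-level x until the closing brace
		inner = x
	}
	outer = x // the inner x is out of scope; this is the package-level x
	return inner, outer
}

func main() {
	in, out := shadow()
	fmt.Println(in, out) // prints "2 1"
}
```

Declaring a second x in the same block (for example, two `var x int` declarations side by side) is rejected at compile time, consistent with the single-namespace rule.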
Program structure
A compilation unit consists of a package specifier followed by import declarations followed by other declarations. There are no statements at the top level of a file. [gri: do we have a main function? or do we treat all functions uniformly and instead permit a program to be started by providing a package name and a "start" function? I like the latter because it gives a lot of flexibility and should not be hard to implement]. [r: i suggest that we define a symbol, main or Main or start or Start, and begin execution in the single exported function of that name in the program. the flexibility of having a choice of name is unimportant and the corresponding need to define the name in order to link or execute adds complexity. by default it should be trivial; we could allow a run-time flag to override the default for gri's flexibility.]
Typing, polymorphism, and object-orientation
Go programs are strongly typed; i.e., each program entity has a static type known at compile time. Variables also have a dynamic type, which is the type of the value they hold at run-time. Generally, the dynamic and the static type of a variable are identical, except for variables of interface type. In that case the dynamic type of the variable is a pointer to a structure that implements the variable's (static) interface type. There may be many different structures implementing an interface and thus the dynamic type of such variables is generally not known at compile time. Such variables are called polymorphic.
Interface types are the mechanism to support an object-oriented programming style. Different interface types are independent of each other and no explicit hierarchy is required (such as single or multiple inheritance explicitly specified through respective type declarations). Interface types only define a set of functions that a corresponding implementation must provide. Thus interface and implementation are strictly separated.
An interface is implemented by associating functions (methods) with structures. If a structure implements all methods of an interface, it implements that interface and thus can be used where that interface is required. Unless used through a variable of interface type, methods can always be statically bound (they are not "virtual"), and incur no runtime overhead compared to an ordinary function.
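A sketch of implicit interface satisfaction in Go as later released; the names Shape, Rect, Square, and total are hypothetical. Neither structure mentions Shape, yet both can be used where a Shape is required because they provide its single method. As in the text, the dynamic type of a Shape variable here is a pointer to a structure.

```go
package main

import "fmt"

// Shape lists the methods an implementation must provide; nothing more.
type Shape interface {
	Area() float64
}

// Rect and Square are unrelated structures; neither declares that it implements Shape.
type Rect struct{ W, H float64 }
type Square struct{ S float64 }

func (r *Rect) Area() float64   { return r.W * r.H }
func (s *Square) Area() float64 { return s.S * s.S }

// total accepts any values whose dynamic types implement Shape.
func total(shapes []Shape) float64 {
	sum := 0.0
	for _, s := range shapes {
		sum += s.Area() // dispatched through the interface at run time
	}
	return sum
}

func main() {
	r := &Rect{W: 2, H: 3}
	fmt.Println(r.Area()) // statically bound: r has concrete type *Rect; prints 6
	fmt.Println(total([]Shape{r, &Square{S: 4}})) // prints 22
}
```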
Go has no explicit notion of classes, sub-classes, or inheritance. These concepts are trivially modeled in Go through the use of functions, structures, associated methods, and interfaces.
Go has no explicit notion of type parameters or templates. Instead, containers (such as stacks, lists, etc.) are implemented through the use of abstract data types operating on interface types. [gri: there is some automatic boxing, semi-automatic unboxing support for basic types].
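A minimal sketch of a container built on an interface type rather than templates, in Go as later released. Stack is a hypothetical name; elements are stored as the empty interface, so any value can be pushed, and getting a concrete value back requires an explicit type assertion (the "unboxing" mentioned above).

```go
package main

import "fmt"

// Stack holds values of any type by storing them as interface{} values.
type Stack struct {
	items []interface{}
}

// Push appends v; a basic value such as an int is boxed into an interface value.
func (s *Stack) Push(v interface{}) { s.items = append(s.items, v) }

// Pop removes and returns the top element; ok is false if the stack is empty.
func (s *Stack) Pop() (v interface{}, ok bool) {
	if len(s.items) == 0 {
		return nil, false
	}
	v = s.items[len(s.items)-1]
	s.items = s.items[:len(s.items)-1]
	return v, true
}

func main() {
	var s Stack
	s.Push(1)
	s.Push("two")

	v, _ := s.Pop()
	fmt.Println(v.(string)) // type assertion unboxes the string; prints "two"

	v, _ = s.Pop()
	fmt.Println(v.(int) + 1) // prints 2
}
```

The cost of this design is that type errors surface at run time: asserting v.(int) on a string panics.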
Pointers and garbage collection
Variables may be allocated automatically (when entering the scope of the variable) or explicitly on the heap. Pointers are used to refer to heap-allocated variables. Pointers may also be used to point to any other variable; such a pointer is obtained by "getting the address" of that variable. In particular, pointers may point "inside" other variables, or to automatic variables (which are usually allocated on the stack). Variables are automatically reclaimed when they are no longer accessible. There is no pointer arithmetic in Go.
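A short sketch of explicit allocation, pointers into other variables, and pointers to local variables, in Go as later released. Point and newCounter are hypothetical names. In released Go the compiler decides where such a local variable is stored, so returning its address is safe; the garbage collector reclaims it once nothing refers to it.

```go
package main

import "fmt"

type Point struct{ X, Y int }

// newCounter returns the address of a local variable; the variable stays
// valid after the function returns.
func newCounter() *int {
	n := 0
	return &n
}

func main() {
	p := new(Point) // explicit allocation of a zero-valued Point
	py := &p.Y      // a pointer "inside" another variable
	*py = 7
	fmt.Println(p.Y) // prints 7

	c := newCounter()
	*c++            // increments the variable c points to
	fmt.Println(*c) // prints 1
	// An expression such as py + 1 does not compile: there is no pointer arithmetic.
}
```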
Functions
Functions contain declarations and statements. They may be invoked recursively. Functions may declare nested functions, and nested functions have access to the variables in the surrounding functions; they are in fact closures. Functions may be anonymous and appear as literals in expressions.
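A sketch of an anonymous function literal acting as a closure, in Go as later released (which permits nested functions only as literals, not as named nested declarations). makeCounter is a hypothetical name; each call creates a fresh count variable that the returned function captures and updates.

```go
package main

import "fmt"

// makeCounter returns a function literal that captures the local variable count.
func makeCounter() func() int {
	count := 0
	return func() int {
		count++ // updates the captured variable, not a copy
		return count
	}
}

func main() {
	next := makeCounter()
	next()
	next()
	fmt.Println(next()) // prints 3

	other := makeCounter() // a separate closure with its own count
	fmt.Println(other())   // prints 1
}
```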
Multithreading and channels
[Rob: We need something here]
Notation
The syntax is specified in green productions using Extended Backus-Naur Form (EBNF). In particular:
'' encloses lexical symbols
| separates alternatives
() used for grouping
[] specifies option (0 or 1 times)
{} specifies repetition (0 to n times)
A production may be referred to from various places in this document but is usually defined close to its first use. Code examples are written in gray. Annotations are in blue, and open issues are in red. One goal is to get rid of all red text in this document. [r: done!]
Vocabulary and representation
REWRITE THIS: BADLY EXPRESSED
Go program source is a sequence of characters. Each character is a Unicode code point encoded in UTF-8.
A Go program is a sequence of symbols satisfying the Go syntax. A symbol is a non-empty sequence of characters. Symbols are identifiers, numbers, strings, operators, delimiters, and comments. White space must not occur within symbols (except in comments, and in the case of blanks and tabs in strings). Otherwise, white space is ignored unless it is needed to separate two consecutive symbols.
White space is composed of blanks, newlines, carriage returns, and tabs only.
A character is a Unicode code point. In particular, capital and lower-case letters are distinct. Note that some Unicode characters (e.g., the character ä) may be representable in two forms: as a single code point, or as two code points. For the Unicode standard these two encodings represent the same character, but for Go they correspond to two different characters.
Source encoding
The input is encoded in UTF-8. In the grammar we use the notation
utf8_char
to refer to an arbitrary Unicode code point encoded in UTF-8.
Digits and Letters
octal_digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' .
decimal_digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' .
hex_digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | 'a' | 'A' | 'b' | 'B' | 'c' | 'C' | 'd' | 'D' | 'e' | 'E' | 'f' | 'F' .
letter = 'A' | 'a' | ... | 'Z' | 'z' | '_' .
For now, letters and digits are ASCII. We may expand this to allow Unicode definitions of letters and digits.
Identifiers
An identifier is a name for a program entity such as a variable, a type, a function, etc.
identifier = letter { letter | decimal_digit } .
- need to explain scopes, visibility (elsewhere)
- need to say something about predeclared identifiers, and their (universe) scope (elsewhere)
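The identifier production can be checked mechanically. This sketch follows the draft's ASCII-only definitions of letter and decimal_digit; the function names are hypothetical. Note that Go as later released also accepts Unicode letters in identifiers, so "ä" is rejected here but would be a valid identifier there.

```go
package main

import "fmt"

// isLetter follows the draft: 'A'..'Z', 'a'..'z', and '_' (ASCII only).
func isLetter(c byte) bool {
	return 'A' <= c && c <= 'Z' || 'a' <= c && c <= 'z' || c == '_'
}

// isDecimalDigit matches '0'..'9'.
func isDecimalDigit(c byte) bool {
	return '0' <= c && c <= '9'
}

// isIdentifier reports whether s matches
//
//	identifier = letter { letter | decimal_digit } .
func isIdentifier(s string) bool {
	if len(s) == 0 || !isLetter(s[0]) {
		return false
	}
	for i := 1; i < len(s); i++ {
		if !isLetter(s[i]) && !isDecimalDigit(s[i]) {
			return false
		}
	}
	return true
}

func main() {
	for _, s := range []string{"x", "_tmp1", "MaxInt64", "1st", "ä", ""} {
		fmt.Printf("%q: %v\n", s, isIdentifier(s))
	}
}
```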
Character and string literals
A RawStringLit is a string literal delimited by back quotes ``; the first back quote encountered after the opening back quote terminates the string.
RawStringLit = '`' { utf8_char } '`' .
`abc` `\n`
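The difference between a raw and an interpreted string can be seen by comparing lengths. In this sketch (Go as later released), the raw literal keeps the backslash and the letter n as two separate bytes, while the interpreted literal turns the escape into a single newline byte.

```go
package main

import "fmt"

func main() {
	raw := `\n`         // raw string: backslash followed by 'n', two bytes
	interpreted := "\n" // interpreted string: one newline byte

	fmt.Println(len(raw), len(interpreted)) // prints "2 1"
	fmt.Println(raw == "\\n")               // prints true
}
```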
Character and string literals are very similar to C except:
- Octal character escapes are always 3 digits (\077 not \77)
- Hexadecimal character escapes are always 2 digits (\x07 not \x7)
- Strings are UTF-8 and represent Unicode
- `` strings exist; they do not interpret backslashes
CharLit = ''' ( UnicodeValue | ByteValue ) ''' .
StringLit = RawStringLit | InterpretedStringLit .
InterpretedStringLit = '"' { UnicodeValue | ByteValue } '"' .
ByteValue = OctalByteValue | HexByteValue .
OctalByteValue = '\' octal_digit octal_digit octal_digit .
HexByteValue = '\' 'x' hex_digit hex_digit .
UnicodeValue = utf8_char | EscapedCharacter | LittleUValue | BigUValue .
LittleUValue = '\' 'u' hex_digit hex_digit hex_digit hex_digit .
BigUValue = '\' 'U' hex_digit hex_digit hex_digit hex_digit hex_digit hex_digit hex_digit hex_digit .
EscapedCharacter = '\' ( 'a' | 'b' | 'f' | 'n' | 'r' | 't' | 'v' ) .
An OctalByteValue contains three octal digits. A HexByteValue contains two hexadecimal digits. (Note: This differs from C but is simpler.)
It is erroneous for an OctalByteValue to represent a value larger than 255. (By construction, a HexByteValue cannot.)
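A sketch of byte values in an interpreted string, in Go as later released. Each octal or hexadecimal escape contributes exactly one byte (it is not UTF-8 encoded), so the three escapes below yield a three-byte string.

```go
package main

import "fmt"

func main() {
	// \077 is octal 77 (63, '?'), \x07 is hex 07 (BEL), \377 is octal 377 (255).
	s := "\077\x07\377"
	fmt.Println(len(s))           // prints 3
	fmt.Println(s[0], s[1], s[2]) // prints "63 7 255"
	// "\400" is rejected: octal 400 is 256, which does not fit in a byte.
}
```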
A UnicodeValue takes one of four forms:
1. The UTF-8 encoding of a Unicode code point. Since Go source text is in UTF-8, this is the obvious translation from input text into Unicode characters.
2. The usual list of C backslash escapes: \n \t etc.
3. A 'little u' value, such as \u12AB. This represents the Unicode code point with the corresponding hexadecimal value. It always has exactly 4 hexadecimal digits.
4. A 'big U' value, such as \U00101234. This represents the Unicode code point with the corresponding hexadecimal value. It always has exactly 8 hexadecimal digits.
Some values that can be represented this way are illegal because they are not valid Unicode code points. These include values above 0x10FFFF and surrogate halves.
A character literal is a form of unsigned integer constant. Its value is that of the Unicode code point represented by the text between the quotes.
'a' 'ä' '本' '\t' '