BNF AND EBNF
1. A grammar, G, is a tuple (N, Sigma, P, S) where N is a set of nonterminals,
Sigma is a set of terminals, P is a set of productions, and S is an element
of N. In the case of BNF, these components can be defined as follows.
2. N is a set of strings. The strings begin with a left angle bracket (<),
contain a sequence of alphabetic characters
(ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz) and spaces ( )
beginning with an alphabetic character, and terminate with a right
angle bracket (>).
3. The following BNF rules can be used to describe this set. My comments are
placed within Cstyle comments brackets.
::= A  B  C  D  E  F  G  H  I  J  K  L  M 
N  O  P  Q  R  S  T  U  V  W  X  Y  Z 
a  b  c  d  e  f  g  h  i  j  k  l  m 
n  o  p  q  r  s  t  u  v  w  x  y  z
::= _ /* A space character. I am trying to approximate
the graphic used in the Algol 60 report */
::= 
::= /* The null string of symbols */

::=
::= < >
N is a set of nonterminal names.
4. Sigma is another set of strings, this one of terminal symbols. Some of these
symbols are metacharacters used by BNF and some of them are used by the
target language. Backus used the following symbols in the Algol 60 report.
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
0123456789
+*/
quotient /* colon overstriking hyphen approximates this symbol */
^ /* actually an up arrow */
<
<= /* actually they are overstruck */
=
>= /* actually they are overstruck */
/= /* actually they are overstruck */
()[]'`.
10 /* as a subscript for base notation */
,:;
 /* the PL/I not character */
/\ /* as a single character */
\/ /* as a single character */
implication /* a subset symbol */
equivalence /* three horizontal lines */
I am not listing space here although Algol recognizes it. It is defined
above, and I am going to use it to separate tokens rather than be
included in them. Backus was not explicitly concerned with lexical issues.
For example, the report can be interpreted as indicating that Algol 60
keywords could be treated as single terminal symbols. In defining BNF, I
will punt by making the following definition.
::= 
::=  /* Note the ambiguity here */
::= ::= /* Note the ambiguity here */
::= /* What ever you need */
::=
::= 
::= 
Sigma is a set of tokens.
5. P is a set of productions. A production has a left hand side and a
right hand side. They are separated by a . The right hand
side is either empty or contains a sequence of alternatives. An
alternative is a concatenation of symbols. A symbol is either a nonterminal
or a token.
::=
::=
::= 
::= 
::=
::= 
::= 
P is a set of productions.
6. S is merely an element of N.
7. Note that there are several problems with the notation described above.
+ There is a confusion between metacharacters and terminal characters
when trying to define the metacharacters of BNF.
+ Lists are awkward to express.
+ Grouping/factoring is only available by making a new definition.
8. BNF is capable of describing all context free grammars.
9. EBNF (Extended BNF) adds some features that makes it easier to express
productions while still describing the same set of grammars. There
are several extensions and some transformations provided by EBNF.
For example, as languages have gotten more complex, the number of
nonterminals found in a grammar has gone up, while the number of terminals
has stayed constants. It makes sense therefore to quote the terminals
and let the nonterminals stand alone.
+ Nonterminals now stand alone, without being surrounded by angle brackets.
+ Therefore, terminals must now be enclosed in apostrophes.
+ The left hand side is now separated from the right hand side by a
right pointing arrow (>) instead of the (::=) symbol. The productions
are thought of a rules where the left hand side generates or produces
the right hand side.
+ Rules are terminated by periods, avoiding the potential for confusion
when empty strings occur in the grammar.
+ Parentheses can now be used for grouping, allowing nested alternatives.
Hence,
A > B  C X.
B > T U  W Z.
can be expressed with the single rule
A > (T U  W Z)  C X.
assuming B is not used elsewhere.
+ Optional constructs can be placed in square brackets. Hence,
A > X  X A  X B.
can be expressed by the rule
A > X [ A  B ].
+ The Kleene star (*), used in describing regular expressions can be
used to describe lists. In particular, A* indicates zero or more
occurrences of the symbol A.
A > X  X B.
B > empty  Y B.
can be expressed with
A > X (Y*).
assuming B is not used elsewhere.
Sometimes curly brackets are used instead of *, avoiding the need for
parentheses. The above example could then be expressed as the following.
A > X {Y}.
+ A variation of * is the use of + to indicate one or more occurrences.
A > X B.
B > Y  Y B.
can be expressed with
A > X (Y+).
assuming B is not used elsewhere.