I came to this question in the context of macros (for example, where macros can be substituted and where they cannot), but the question is relevant elsewhere as well.

In 1364-2005, 3.1 says, "The types of lexical tokens in the language are as follows:
- White space
- Comment
- Operator
- Number
- String
- Identifier
- Keyword"

That does not seem to be complete. For example, a semicolon at the end of a statement is a token, but we do not think of it as an operator. And this says that a number is a single token, whereas 3.5.1 says that an integer constant number is made up of 3 tokens!

Tokens are not explicitly mentioned in many places in the LRM. One of the few places is 19.3.1, where it says, "The text specified for macro text shall not be split across the following lexical tokens:
- Comments
- Numbers
- Strings
- Identifiers
- Keywords
- Operators"

The intent seems to have been to say "all tokens except white space". Would that be correct?

The question then comes up, what is a token and what is not a token? At first, one might propose that tokens are delimited by wherever white space is allowed: if white space is allowed between two pieces of text, then we have two tokens, and if white space is not allowed between them, then we have one token and not two. The first half seems probably correct. But I think the second half is not correct. I think we have cases where white space is not allowed, but it is still two tokens and not one. For example, a time literal is an unsigned or fixed point number followed by a time unit. A space is not allowed between them. Yet I think it is still two tokens? Do we have an exhaustive list of where space is not allowed between tokens?

I think tokenizing is also context specific. I remember that in 1364, we discussed whether @* is one token or two, and whether @(*) is one or two or four. It was never formally settled, but I think that de facto, the answers were two and four, respectively. What about, for example, a (vw) entry in a UDP table (see Table 8-1)?
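To make the white-space point concrete, here is a rough Python sketch, not taken from the LRM and not a real Verilog lexer. The function names and the regular expression are hypothetical, chosen only for illustration. It contrasts the "split wherever white space appears" rule with the view that a time literal is two tokens even though no space separates them.

```python
import re

# Hypothetical pattern (an assumption for illustration): an unsigned or
# fixed point number followed directly by a time unit, with no space.
TIME_LITERAL = re.compile(r"(\d+(?:\.\d+)?)(fs|ps|ns|us|ms|s)")


def split_on_white_space(text):
    """Candidate rule: pieces are delimited wherever white space appears."""
    return text.split()


def split_time_literal(text):
    """Alternative view: number and time unit are separate tokens."""
    match = TIME_LITERAL.fullmatch(text)
    if match is None:
        return [text]  # not a time literal; leave the text as one piece
    return [match.group(1), match.group(2)]


# An integer constant written with spaces yields three pieces, which
# agrees with the three tokens described in 3.5.1.
print(split_on_white_space("8 'h FF"))   # ['8', "'h", 'FF']

# A time literal has no internal space, so the white-space rule sees one piece...
print(split_on_white_space("10ns"))      # ['10ns']

# ...while the two-token view separates the number from the time unit.
print(split_time_literal("10ns"))        # ['10', 'ns']
```

The sketch only shows that the two rules disagree on a time literal; it does not settle which reading the LRM intends.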
Maybe you compiler people can publish a list of tokens so we can get agreement and commonality on this?

And where are macro calls allowed? For any token, whatever a token is? Only where white space delimiters are allowed? I don't think the latter is correct. For example, again in a time literal, can a time literal be written as `A`B, where `A is a number and `B is a timeunit? This is related to the question of whether macros insert white space before and after them, which I think we discussed. See also Mantis 1339.

Thanks,
Shalom

Shalom Bresticker
Intel Jerusalem LAD DA
+972 2 589-6852
+972 54 721-1033
I don't represent Intel
This archive was generated by hypermail 2.1.8 : Thu Oct 26 2006 - 00:27:43 PDT