Coffin, Eric wrote:
> Embedded comments start with [ecoffin:]
>
>> Coffin, Eric wrote:
>
> [ecoffin:] The ASCII/non-ASCII representation has greater significance
> than just the `" feature.

It can also be significant for ( `` ) token gluing, depending on how
macro bodies undergoing substitution of actuals for formals are
tokenized (either accepting or ignoring escaped identifiers). But your
example illustrates my concern precisely.

> [ecoffin:] The ongoing dialog about white space (what is it, when to
> trim it, what function it has) comes into play. For example, the LRM
> states (1800/D4 section 5.6.1) that for an escaped identifier the
> trailing white space is not a part of the identifier itself. Knowing
> how to treat white space and how to represent an escaped identifier
> in a macro expansion are important. Consider the following trio of
> macros and their usage. What are reasonable expansions?
>
>   `define escapedIdent \Tuesday
>   `define variant_A `"Tuesday`"
>   `define variant_B `"`escapedIdent`"
>   string S1 = `variant_A;
>   string S2 = `variant_B;
>
> Most SV users would likely agree that S1's initializer is "Tuesday",
> but what about S2's initializer? Should it be "Tuesday", "\Tuesday ",
> "\Tuesday", or a macro expansion error? The answer depends upon
> whether you treat the macro bodies as ASCII text, and thus maintain
> the escaped identifier's leading backslash and trailing whitespace,
> or whether you treat the macro bodies as lists of tokens.

Because of `", these choices affect the bytes in S2. I don't see a
plausible macro /expansion/ error, unless `variant_A is undefined after
\Tuesday consumes the newline, leaving no terminator for the first
macro. (Yuck!) Maybe you mean a macro definition error. The line
"`define escapedIdent \Tuesday" only needs to be fully "tokenized" up
to the character after the macro identifier.
If this `define introduces a formal argument list, then that list is of
course tokenized, but here we can choose NOT to accept full Verilog
identifiers; we are free to describe more classical C-style preprocessor
identifiers. The reasons one wants escaped identifiers for the physical
components being modeled really don't apply to the preprocessing
language. Allowing a many-to-one mapping from text strings to token
sequences would always be problematic, since the `" rules seem to ask
for fairly strict text substitution. And my final argument for a
dedicated "macro_formal_arg_identifier" is that it leaves no doubt that
there is exactly one point in your algorithm where the macro body is
tokenized, to look for just a few traditionally delimited words:

 - not to activate a lot of reduction rules best left to the core
   parser, and
 - not to allow macro actuals to reference macro formals.

This view favors the expansion "\Tuesday". It is neutral to negative on
trimming leading whitespace, and negative on trimming trailing
whitespace. This is the classical view that macro processing
manipulates text and knows very little about the language into which
that text is being forged. To preserve generality, it wants to allow
you to juxtapose and glue text together before the core language even
tokenizes it. Similar concerns then apply to macro invocations, where
the actual arguments are also not fully tokenized.

C++ treats macro expansion as a text-to-token transformation. This is a
more advanced view, integrating the meta and object languages in a way
that SV users probably also expect - ignoring all monster cases. C++
has a table of many-to-one spellings of its language tokens. I don't
know if it "canonicalizes" all token streams before expanding macros,
leaves them raw, or just squeezes whitespace down to a minimum.
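To make the fork in the road concrete, here is a toy Python model of
what each policy yields for S2 in Eric's example. The helper names are
mine and hypothetical; this sketches the two viewpoints under
discussion, not any tool's actual algorithm.

```python
# Toy model of the two policies for `variant_B above.  Helper names
# are hypothetical; this is a sketch of the debate, not the LRM.

def stringify_text_policy(expansion: str) -> str:
    """ASCII view: the macro body is raw text, so the escaped
    identifier's backslash and its trailing space survive into the
    string literal unchanged."""
    return expansion

def stringify_token_policy(expansion: str) -> str:
    """Token view: \Tuesday is recognized as the identifier named
    Tuesday; the backslash and the terminating whitespace are
    delimiters, not part of the name (per 1800/D4 5.6.1)."""
    tok = expansion.strip()
    return tok[1:] if tok.startswith("\\") else tok

# Raw replacement text of `escapedIdent, trailing space included:
body = "\\Tuesday "
print(repr(stringify_text_policy(body)))    # '\\Tuesday '
print(repr(stringify_token_policy(body)))   # 'Tuesday'
```

So under the text policy S2 becomes "\Tuesday " and under the token
policy it becomes "Tuesday", which is exactly the divergence Eric's
question exposes.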
If SV were to fully tokenize macro formal arg lists and macro bodies,
it would have to answer a question C++ doesn't confront: many-to-one
spellings for identifiers.

  `define escFormal( \F[1] , \Tuesday ) \
    `"Tuesday F[1] \Tuesday \F[1] `"

If the formal list and the body are "tokenized" for SV parsing, we
might get `escFormal(A,B) expanding to

  "B F[1] BA"

SV has a little bit of context-dependent lexical analysis that could
complicate this. The string "\n " and the quoted escaped identifier
`"\n `" bear watching.

There is also a danger of double tokenizing to be avoided. Notice that
the canonical form of an escaped identifier can appear in a context
where it will look like another escaped identifier. Will such things
/always/ be reduced again, or would the user have to "``" glue it to a
trailing whitespace character? Multiple token reductions could come
from two sources:

1) After token gluing, something has to "retokenize" the graft to see
   what congealed.
2) If the macro actuals are not tokenized before substitution, they'll
   need to be tokenized before parsing.

Your system is very careful to arrange BOTH treatments: literal
substitution for `` operands and pre-tokenizing for non-glued
expansions. So when your system is used,

  `escFormal(\\once ,\\twice )

(as defined above) expands to:

  "\twice F[1] \twice\once"

If the body of escFormal HAD NOT begun and ended with `", the expansion
would not be a single string token, but a sequence of seven tokens:

  \\twice  F  [  1  ]  \\twice  \\once

i.e. NO additional reduction of escaped identifiers occurs.

The motivation for using this text-to-token approach has to be to
reach a useful definition for token gluing, so I think we need to work
a hard example. When one operand is an already recognized token and
the other is raw actual argument text, I think the recognized token
reverts to its original text form before the compound is retokenized.
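The many-to-one spelling problem can be modeled with a toy
canonicalizing tokenizer. Everything below is my own illustrative
stand-in, not the LRM's rules: it shows why \Tuesday and Tuesday
collapse to the same token, while the body text F[1] lexes as four
tokens and can never match a formal spelled \F[1].

```python
import re

# Toy canonicalizing tokenizer for the many-to-one spelling question.
# Hypothetical model of the discussion above, not the LRM's lexer.
TOKEN = re.compile(r"\\\S+|[A-Za-z_][\w$]*|[^\sA-Za-z_\\]")

def canon(tok: str) -> str:
    # Drop the leading backslash of an escaped identifier; its
    # terminating whitespace was never captured to begin with.
    return tok[1:] if tok.startswith("\\") else tok

def tokens(text: str) -> list[str]:
    return [canon(t) for t in TOKEN.findall(text)]

print(tokens("Tuesday F[1] \\Tuesday \\F[1]"))
# ['Tuesday', 'F', '[', '1', ']', 'Tuesday', 'F[1]']
```

Substituting the canonical formal names {"Tuesday": "B", "F[1]": "A"}
into that token list gives B F [ 1 ] B A, which re-renders as Eric's
"B F[1] BA" once the whitespace eaten by the escaped-identifier
delimiters is accounted for (spacing is not modeled here).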
If a doubly escaped identifier is the operand of a gluing operator with
a macro formal,

  \\twice ``macro_formal

and the macro's actual is [2], I believe we want one token (not two or
three) as the result, and it should be

  \\twice[2]

This cannot happen with the raw text methods, and it won't happen
correctly unless your algorithm makes it explicit. But by considering
token streams and choosing this nice ordering of immediate and deferred
expansions, you have the descriptive tools to build this the right way.
That /might/ involve special treatment of the whitespace in the macro
actual adjacent to the `` glue operator, but too many special rules
raise implementation costs.

Hybrids of these opposite viewpoints are possible, but seem arbitrary
and confusing to me. Multiple reductions of escape sequences must be
avoided - they cannot happen INSIDE the fixed-point expansion loop.
Your algorithm makes the advanced approach fairly safe and as intuitive
as this topic will ever be (i.e. just barely).

Good work,
Greg

>> Is your proposal (below) different from C++ pre-processing in
>> any major respect?
>
> [ecoffin:] I tried to follow C++'s treatment of macro expansion as
> closely as I could.

>> In the "... looking for identifiers matching formal argument names"
>> activity, there is an implied tokenization of the unexpanded body.
>> We need to specify the rules for that tokenization, in particular
>> whether escaped identifiers are to be recognized.
>>
>> On first review, though, this looks a lot better than the existing
>> section (23.2 of 1800-2005)!
>>
>> Greg

>>> *********************************************************************
>>>
>>> Here is a rough outline of a possible way to expand macros that
>>> might give some consistency to the various SV implementations out
>>> there.
>>>
>>> Order of actions to expand a macro:
>>>
>>> - After the macro use has been identified in the SV source text,
>>>   gather the use's actual arguments.
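The glue rule argued for above can be sketched as: the
already-recognized left token reverts to its original spelling, the raw
actual text (with its adjacent whitespace trimmed) is appended, and the
graft is retokenized exactly once. The function name and the trimming
rule are my assumptions about the proposal, not settled semantics.

```python
# Sketch of the proposed glue rule: revert, append, retokenize once.
# The name and the whitespace trimming are assumptions, not the LRM.

def glue(left_original_spelling: str, raw_actual: str) -> list[str]:
    graft = left_original_spelling + raw_actual.strip()
    # Retokenize the graft exactly once: a leading backslash makes the
    # whole run of non-whitespace characters a single escaped
    # identifier, so no further reduction can occur.
    if graft.startswith("\\"):
        return [graft]
    return graft.split()

print(glue("\\\\twice", " [2] "))
# prints ['\\\\twice[2]'], i.e. the single token \\twice[2]
```

The single retokenization is the point: because the graft is reduced
once and only once, the doubly escaped \\twice[2] does not get reduced
again inside the fixed-point expansion loop.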
>>>
>>> - Independently expand all actual arguments, but do not substitute
>>>   them into the macro body. If the macro use did not specify an
>>>   actual and a default value was specified, then expand the default
>>>   text. Some SV implementations first expand and then substitute,
>>>   while others do not. Note that all arguments should be expanded
>>>   even if they are not used within the macro body.
>>>
>>> - Walk through the macro body looking for identifiers matching
>>>   formal argument names. Replace any macro formal argument with its
>>>   expanded actual text, unless the macro formal is adjacent to a
>>>   tick-tick (``). If the formal arg is next to a tick-tick, then
>>>   literally substitute the (unexpanded) actual text for the formal
>>>   arg.
>>>
>>> - do {
>>>     - Perform token pasting upon the expansion's body. Token
>>>       pasting should have no effect upon the `" and `\`" macro
>>>       operators. Furthermore, token pasting ignores any white
>>>       space, and will not paste comments, nor paste across
>>>       comments.
>>>     - Rescan the resulting body for any more macros to expand.
>>>       Expand them. Do not expand `" or `\`".
>>>   } while the expansion body changes
>>>
>>> - Expand the special macro operators, tick-quote `" and
>>>   tick-slash-tick-quote `\`"
>>>
>>> -Eric

--
Received on Tue Nov 20 21:50:14 2007
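The ordering in Eric's outline (expand actuals first, substitute, then
paste-and-rescan to a fixed point) can be walked through with a
deliberately tiny Python model. All names are my stand-ins, and the
text-based "tokenization" is crude on purpose: `" handling, default
arguments, comments, and the `` raw-substitution case are elided.

```python
import re

# Tiny model of the proposed expansion ordering.  A sketch of the
# outline above, not a conforming SV preprocessor.

MACROS = {}                      # name -> (formals, body)
USE = re.compile(r"`(\w+)")

def define(name, formals, body):
    MACROS[name] = (formals, body)

def rescan(text):
    # Rescan step: expand every `name of a defined argument-less macro.
    def repl(m):
        entry = MACROS.get(m.group(1))
        return entry[1] if entry and not entry[0] else m.group(0)
    return USE.sub(repl, text)

def fixed_point(text):
    # do { paste; rescan } while the expansion body changes
    while True:
        new = rescan(text.replace(" `` ", ""))   # crude token pasting
        if new == text:
            return new
        text = new

def expand(name, actuals):
    formals, body = MACROS[name]
    # 1. Expand all actuals independently, before substitution.
    expanded = [fixed_point(a) for a in actuals]
    # 2. Substitute expanded actuals for formals (the `` case, which
    #    would take the raw unexpanded actual instead, is omitted).
    for f, a in zip(formals, expanded):
        body = re.sub(rf"\b{re.escape(f)}\b", lambda m: a, body)
    # 3. Paste and rescan until nothing changes.
    return fixed_point(body)

define("W", [], "world")
define("greet", ["X"], "hello X")
print(expand("greet", ["`W"]))   # hello world
```

Even this toy shows why the ordering matters: the actual `W is expanded
before substitution, so the body never sees a nested macro use, and the
fixed-point loop terminates as soon as a rescan changes nothing.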