Krishanu Debnath wrote:
> Now consider this example: (note the embedded comments)
>
> module sample;
>
> string s;
>
> initial
> begin
>   s = "\x41"; // this means now s is "A". ASCII value of A is 0x41.
>   $display("value of s %s \n", s);
>
>   s = "\x4142"; // does this mean s is "A42" ?
>   $display("value of s %s \n", s);
>
>   s = "\x41\x42"; // does this mean s is "AB" ?
>   $display("value of s %s \n", s);
>
>   s = "\x4"; // less than two characters followed by x,
>              // so it will be not treated as hex number.
>   $display("value of s %s \n", s);
> end
> endmodule
>
> Does the above make sense?

The question makes a lot of sense. The LRM is very far from definitive on this subject.

To be fair, the C standard, where this syntax got started, is a bit ambiguous too. Its BNF says the octal escapes are 1-3 digits and its hexadecimal escapes go on for as long as they can. But then C adds two confusing extra constraints:

    Each octal or hexadecimal escape sequence is the longest sequence
    of characters that can constitute the escape sequence.

[Subject, we assume, to the BNF 1-3 digit definition, so \177 is not \17 followed by ASCII "7"?]

Second, they want the value of the octal or hexadecimal escape sequence to be in the range representable as unsigned char or wchar_t data.

I think we should consider that the natural intent is for each escape sequence to produce exactly one character element of the string. To that end, there should be lexical cues that indicate whether the characters are to be 8 or 16 bits wide. Those cues have to tell the lexical scan which upper bound to use on the escape sequence length.

I don't think SV has wchar_t strings (yet). But surely it is inevitable.

Octal notation for 16-bit characters is awkward: should the two pad bits both go into the first digit, or one each into the first and fourth digits? (\177777 vs \377377) I say neither... octal can only represent 8 bits of a char, or 9 of a wchar.

Hex notation can represent 4 or 8 bits of a char, and 4, 8, 12, or 16 bits of a wchar. These right-align in the char (wchar). When lexing a char string, 1-2 hex digits may be escaped. When lexing a wchar string, 1-4 hex digits may be escaped.

Allowing one escape to create several character elements makes the literals harder to migrate upward from char to wchar. Requiring one escape per char means you can replace \x by \x00 and get a sensible result. This also spares us from endianness problems when trying to convert long escape sequences into byte streams.

If anyone would like an exhaustive review of all the different flavors of Hollerith encoding, perhaps we could find something SV has not yet claimed to implement... ;-)

Greg Jaxon

Received on Fri Mar 24 11:14:28 2006
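
For illustration only (this is not LRM text, and not something any tool is known to implement), the one-escape-per-element rule proposed above might be sketched in Python roughly as follows. The names decode_hex_escapes, max_hex_digits and as_text are hypothetical. The sketch assumes that max_hex_digits is 2 for a char string and 4 for a wchar string, handles only \x escapes, and copies every other character through literally.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_hex_escapes(body, max_hex_digits=2):
    """Return one integer per string element for the text between the quotes.

    Assumption: max_hex_digits=2 models an 8-bit char string and
    max_hex_digits=4 models a 16-bit wchar string.
    """
    elements = []
    i = 0
    while i < len(body):
        if body.startswith("\\x", i):
            start = i + 2
            j = start
            # Longest match, but capped so one escape fills one element.
            while (j < len(body)
                   and j - start < max_hex_digits
                   and body[j] in HEX_DIGITS):
                j += 1
            if j == start:
                raise ValueError("\\x needs at least one hex digit")
            elements.append(int(body[start:j], 16))
            i = j
        else:
            # Other escapes (\n, octal, ...) are out of scope for this sketch.
            elements.append(ord(body[i]))
            i += 1
    return elements


def as_text(elements):
    """Render element values as characters, for display only."""
    return "".join(chr(e) for e in elements)


if __name__ == "__main__":
    # The four literals from the quoted example, lexed as char strings.
    for literal in (r"\x41", r"\x4142", r"\x41\x42", r"\x4"):
        print(literal, decode_hex_escapes(literal))
    # The same literal lexed as a hypothetical wchar string.
    print(r"\x4142", decode_hex_escapes(r"\x4142", max_hex_digits=4))
```

Under these assumptions "\x4142" lexes as "A42" in a char string but as the single element 0x4142 in a wchar string, "\x41\x42" is "AB" either way, and "\x4" is one element with value 0x04 rather than a non-escape.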