A few months ago I proposed introducing character literals in Stratego 
as syntactic sugar for the integer ASCII value of a character. I would 
like to raise this issue again ;-) .

My proposal is to introduce the widely used 'c' syntax to represent a 
character in Stratego. The Stratego compiler can desugar this to the 
integer ASCII value of the character, which is how we represent 
characters right now. The implementation is trivial: it requires no 
changes to the back end and no special ATerm types.
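
To make the desugaring concrete, here is a minimal sketch. The strategy names are made up for illustration, the literal syntax is the proposed one, and I am assuming =leq= from the library succeeds on a pair of integers (x, y) when x is at most y.

<verbatim>
strategies

  // Written with the proposed character literals (hypothetical name).
  is-ascii-digit =
    ?c; where(<leq> ('0', c); <leq> (c, '9'))

  // What the compiler would see after desugaring, i.e. what we write today.
  is-ascii-digit-desugared =
    ?c; where(<leq> (48, c); <leq> (c, 57))
</verbatim>

Both definitions should behave identically, since '0' and '9' are nothing more than the integers 48 and 57.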

Advantages:

	* You don't need an ASCII table when you're programming string manipulations in Stratego.

	* String processing code such as string.r and char.r in the SSL will become much clearer.

	* It is tempting to explode string literals (even of length 1) at runtime so you don't have to look up the character codes. A character literal is resolved at compile time, so it avoids this runtime explosion and will improve performance in this case.
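
The last point can be sketched as follows. Both strategy names are hypothetical; the first shows the tempting runtime variant, the second the same test with the proposed literal.

<verbatim>
strategies

  // Tempting today: no ASCII table needed, but the string is exploded on every call.
  is-amp-exploded =
    ?c; where(<explode-string> "&" => [c])

  // With the proposed literal: desugars to a plain match against the integer 38.
  is-amp-literal =
    ?'&'
</verbatim>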

Disadvantages:

<ul>
<li>The disadvantage Eelco Visser mentioned when I proposed this before: 
what if the result of a Stratego transformation contains characters? 
The user will then see integer values instead of character literals. 
I think that this is not a huge problem because: (1) 
characters are not likely to occur in the result of a program: they are 
mostly used in string manipulation, where the string will be rebuilt 
(imploded) later. (2) In another Stratego application the integer value 
can still be matched against a pattern with a character literal, because 
a character literal is simply an int. (3) Concrete syntax already makes 
the resulting ATerms more difficult to compare with the Stratego code. I 
think the added distance between Stratego terms and ATerms is negligible.

<li>The single quote (apostrophe) is allowed in an identifier. This is a 
problem if the character in the literal is also allowed 
in an identifier: it introduces an ambiguity. However, an identifier 
like 'c' is very unlikely to occur and could be rejected by the SDF grammar.

</ul>

I'm currently writing strategies to rewrite XML entities and character 
references. I'm using overlays right now. This is already an 
improvement over bare integers, but it is quite verbose.


<verbatim>
------------------------
rules

unescape-amp :
	[c_amp(), c_a(), c_m(), c_p(), c_semicolon() | cs] -> [c_amp() | cs]
unescape-lt :
	[c_amp(), c_l(), c_t(), c_semicolon() | cs] -> [c_lt() | cs]
unescape-gt :
	[c_amp(), c_g(), c_t(), c_semicolon() | cs] -> [c_gt() | cs]

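overlays

	c_space() = 32
	c_quote() = 34
	c_amp()	= 38
	c_apos()  = 39
	c_0()	  = 48
	c_9()	  = 57
	c_semicolon() = 59
	c_numbersign() = 35

	// also needed by the rules above
	c_lt() = 60
	c_gt() = 62
	c_a() = 97
	c_g() = 103
	c_l() = 108
	c_m() = 109
	c_p() = 112
	c_t() = 116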
------------------------
</verbatim>

With character literals in Stratego, this could be written as:

<verbatim>
-----------------------------------
unescape-amp : ['&', 'a', 'm', 'p', ';' | cs] -> ['&' | cs]
unescape-lt  : ['&', 'l', 't', ';' | cs] -> ['<' | cs]
unescape-gt  : ['&', 'g', 't', ';' | cs] -> ['>' | cs]
-----------------------------------
</verbatim>

Of course, (un)escaping is an example where character literals are 
especially useful. In general you won't use character literals a lot in 
Stratego. Still, because the implementation is so simple, I think that 
it is worth the effort.

I would like to hear your opinion :-) .

-- Martin Bravenboer - 07 Dec 2002

Ok. I have added the following to the SDF definition of Stratego:

<verbatim>
----------------------------------------------------------------------
  lexical syntax
	 "\'" CharChar "\'"		-> Char
	 ~[\']			-> CharChar
	 [\\] [\'ntr\ ]		-> CharChar
	 Char		 	-> Id {reject}
 
  context-free syntax
	 Char				  -> Term {cons("Char")}
----------------------------------------------------------------------
</verbatim>

and the following desugaring rules to stratego-desugar:

<verbatim>
----------------------------------------------------------------------
  Desugar :
	 Char(c) -> Int(i)
	 where <DesugarChar <+ explode-string; DesugarCharGeneric> c => i

  DesugarCharGeneric :
	 [39, i, 39] -> i
  DesugarChar :
	 "'\\''" -> 39
  DesugarChar :
	 "'\\n'" -> 10
  DesugarChar :
	 "'\\t'" -> 9
  DesugarChar : // carriage return
	 "'\\r'" -> 13
  DesugarChar : // space
	 "'\\ '" -> 32
----------------------------------------------------------------------
</verbatim>
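
As a sanity check of these rules, the following sketch applies =Desugar= to three =Char= terms. The strategy name =check-desugar-char= is made up, and the sketch assumes that the =Char= constructor carries the literal text including its quotes, as the =DesugarChar= rules suggest.

<verbatim>
strategies

  // Hypothetical check, not part of stratego-desugar.
  check-desugar-char =
      // plain character: explode-string yields [39, 97, 39], the generic rule picks 97
      <Desugar> Char("'a'") => Int(97)
      // escapes are caught by the DesugarChar rules before the generic rule is tried
    ; <Desugar> Char("'\\n'") => Int(10)
    ; <Desugar> Char("'\\''") => Int(39)
</verbatim>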

Note that the desugaring is done at the syntactic level, as part of parsing.
This means that characters are pretty-printed as integers. This can be improved
later by deferring the desugaring to a later stage in the process. That requires
a deeper embedding of this notion in Stratego, though.

Are any other escapes needed? Note that this will break existing specifications
with identifiers of the form 'c' (which I have never seen).

These changes are available in StrategoRelease09 (beta7).

-- Main.EelcoVisser - 21 Dec 2002