3. PHP's tokenizer. Like regular expressions, but we have a list of tokens and what PHP thinks they are. A bit smarter, but still fails if someone mixes up the syntax.
potx uses the tokenizer to extract t()'ed strings from the source code, IIRC. I think the tokenizer is the way to go for this documentation issue. We can even reuse parts of the potx module for discovering the t() calls. Also, your argument about syntax is not really valid: with almost every imaginable syntax, it is possible to mess something up.
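To make the tokenizer idea concrete, here is a rough sketch of how t() calls could be located with PHP's token_get_all(). It is not potx's actual code; the function name find_t_strings() and the shape of its return value are made up for illustration, and it only handles the simple case where the first argument is a plain string literal.

```php
<?php
// Hypothetical helper (not taken from potx): collect the first argument of
// every t() call whose first argument is a plain string literal.
function find_t_strings(string $source): array {
  $tokens = token_get_all($source);
  $count = count($tokens);
  $found = [];

  for ($i = 0; $i < $count; $i++) {
    $token = $tokens[$i];
    // Named tokens are arrays of [id, text, line]; punctuation is a bare string.
    if (!is_array($token) || $token[0] !== T_STRING || $token[1] !== 't') {
      continue;
    }

    // Skip whitespace between the function name and the opening parenthesis.
    $j = $i + 1;
    while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
      $j++;
    }
    if ($j >= $count || $tokens[$j] !== '(') {
      continue;
    }

    // Skip whitespace between the parenthesis and the first argument.
    $j++;
    while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
      $j++;
    }

    // Only literal strings are collected; variables and concatenations are ignored.
    if ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_CONSTANT_ENCAPSED_STRING) {
      $found[] = ['line' => $tokens[$j][2], 'text' => $tokens[$j][1]];
    }
  }

  return $found;
}
```

Because the tokenizer reports comments and string literals as single tokens, a t() that merely appears inside a comment or inside another string is not picked up, which is where this approach beats a plain regular expression. Note that the returned text still includes the surrounding quotes.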
Documentation comments would be much better. They are in code, but have the advantage of not being code. The challenge is figuring out a convenient and consistent comment syntax.
If we want to display the documentation somewhere other than in the source code, we'd also have to extract it somehow (possibly using regular expressions), which is not really different from writing it as actual code in the schema file. And your argument about messing up the syntax is especially true here, since PHP doesn't perform "syntax checking" on comments. If documentation is provided as code, we can at least be sure that the file parses correctly.

Konstantin Käfer, http://kkaefer.com/