Stripping multiline comments using RegEx


REMIYA
06-08-2005, 12:08 AM
I want to strip single-line and multi-line comments wrapped in /* */.

The following RegEx does exactly that, but it also strips comment-like text that appears inside strings.







// Strips all multi-line and single-line /* */ comments, even inside strings ""
$pattern = '~/\*[\s\S]*?\*/~';
$replacement = "THE COMMENT HAS BEEN HERE";
echo nl2br(preg_replace($pattern, $replacement, $phpcode));







So an initial code of:





/*

This is a multiline comment

*/



/* This is a single line comment */



echo "Hi, /* This is a single line comment in a string*/";





Processed through the above RegEx returns:

THE COMMENT HAS BEEN HERE



THE COMMENT HAS BEEN HERE



echo "Hi, THE COMMENT HAS BEEN HERE";





I've done a lot of testing with RegExes, but it was all in vain. So was my search on the Internet.

Is there any solution to the problem using RegEx?

Please test before you post.

Bruce
06-08-2005, 12:59 AM
This kind of problem is very hard to solve with regular expressions. What you need is "context", and regular expressions have no context. I can personally think of no easy way to do it with regexp searching.

What is typically done is to build a parser that keeps track of whether it's in a string or a comment, and does the appropriate replacement. You can use regular expressions to do the searching (for example, search for /.*?("|\/\*)/ and then check whether $1 is a quote or a comment opener), but it's an iterative process that can't be done from within a regexp itself.
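
One way to read the suggestion above is as a search that finds either a quoted string or a comment, followed by a check of which one was found. The sketch below is only an illustration under assumptions, not Bruce's code: the function name replace_block_comments is made up, and it ignores heredocs, // and # comments, and anything outside plain quoted strings. It assumes PHP 7 or later.

```php
<?php
// Hypothetical helper, not from the thread: replaces /* */ comments
// while leaving quoted strings untouched.
function replace_block_comments(string $code, string $replacement): string
{
    // Alternation: a double-quoted string, a single-quoted string,
    // or a /* ... */ comment. Whichever starts first in the input is matched.
    $pattern = '~"(?:[^"\\\\]|\\\\.)*"|\'(?:[^\'\\\\]|\\\\.)*\'|/\*.*?\*/~s';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($replacement): string {
            // Check what was found: a comment is replaced, a string is kept as-is.
            return strncmp($m[0], '/*', 2) === 0 ? $replacement : $m[0];
        },
        $code
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $code;
}
```

The context tracking happens in the callback rather than in the pattern: because a string is consumed as one match, a /* inside it is never seen as the start of a comment.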

REMIYA
06-08-2005, 11:07 AM
Writing a whole parser is quite a lot of work for such a task! :umm:

sheila
06-08-2005, 11:40 AM
Generally, there are already parser libraries available, depending on the language you are using, and you would just grab one of those?

REMIYA
06-08-2005, 12:04 PM
Thank you mates,

I have arrived at the following solution:
http://php.paco.net/manual/en/function.php-strip-whitespace.php

This is the same as php -w from the command line.

Perhaps I'll have to save the code to a temporary file, run the function on it,

and then grab the resulting code.

I wonder whether it would be possible to do this dynamically, without saving the code to a temporary file.
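
The temporary-file round trip described above might look roughly like the sketch below. The helper name strip_code_string is hypothetical. php_strip_whitespace() takes a filename and returns the stripped source as a string, so nothing has to be read back from the file. Two caveats: it removes comments outright instead of substituting a placeholder, and it also drops // comments and collapses whitespace. The input has to contain the opening <?php tag.

```php
<?php
// Hypothetical helper: strips comments and whitespace from a code string
// by round-tripping it through a temporary file.
function strip_code_string(string $code): string
{
    $tmp = tempnam(sys_get_temp_dir(), 'strip');
    if ($tmp === false) {
        throw new RuntimeException('Could not create a temporary file.');
    }

    try {
        file_put_contents($tmp, $code);
        // php_strip_whitespace() reads the file and returns the stripped source.
        return php_strip_whitespace($tmp);
    } finally {
        // Always clean up the temporary file, even if something above fails.
        unlink($tmp);
    }
}
```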

kitchin
06-08-2005, 12:32 PM
Maybe all your comments occur at the beginning of a line, and all the quoted ones do not!

REMIYA
06-12-2005, 11:26 PM
The solution is the PHP tokenizer, which comes built into PHP5.

Why write a parser when there is already one written :)
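
For anyone finding this later, a tokenizer-based version might look something like the sketch below. It is not REMIYA's actual code: the function name replace_comments_with_tokenizer is made up, and it assumes PHP 7.1 or later and input that starts with <?php, since without the opening tag the tokenizer treats everything as inline HTML.

```php
<?php
// Hypothetical helper: replaces /* */ comments using PHP's tokenizer,
// so comment-like text inside strings is never touched.
function replace_comments_with_tokenizer(string $code, string $replacement): string
{
    $out = '';
    foreach (token_get_all($code) as $token) {
        if (!is_array($token)) {
            // Single-character tokens such as ";" or "(" come back as plain strings.
            $out .= $token;
            continue;
        }

        [$id, $text] = $token;
        // T_COMMENT also covers // and # comments, so check the opening characters.
        // T_DOC_COMMENT covers /** ... */ blocks.
        $isBlock = ($id === T_COMMENT || $id === T_DOC_COMMENT)
            && strncmp($text, '/*', 2) === 0;

        $out .= $isBlock ? $replacement : $text;
    }
    return $out;
}
```

Unlike the php_strip_whitespace() route, this needs no temporary file and leaves the rest of the source, including whitespace and // comments, exactly as it was.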

PaulKroll
06-13-2005, 03:27 PM
Well, Terra has said PHP5 is still cooking, but given the time so far, it could be closer to "the lithosphere has cooled enough to form a thin, glowing crust" stage... it might be a little while before the surface can sustain life.
- Sorry, Terra, for the "terran" motif. :)