path: root/clang/lib/Format/FormatTokenLexer.cpp
author Richard Smith <richard-llvm@metafoo.co.uk> 2017-04-17 23:44:51 +0000
committer Richard Smith <richard-llvm@metafoo.co.uk> 2017-04-17 23:44:51 +0000
commit 1d2ae94b5dbd0ec575b28744a5c34d3a5c133568
tree 9de349d61e509c27b1d3c0a81fb1c3125af99cfa /clang/lib/Format/FormatTokenLexer.cpp
parent 76423dce15107a7ad772677eefba5597fcb59f98
Fix mishandling of escaped newlines followed by newlines or nuls.

Previously, if an escaped newline was followed by a newline or a nul, we'd lex the escaped newline as a bogus space character. This led to a bunch of different broken corner cases:

For the pattern "\\\n\0#", we would then have a (horizontal) space whose spelling ends in a newline, and would decide that the '#' is at the start of a line, and incorrectly start preprocessing a directive in the middle of a logical source line. If we were already in the middle of a directive, this would result in our attempting to process multiple directives at the same time! This resulted in crashes, asserts, and hangs on invalid input, as discovered by fuzz-testing.

For the pattern "\\\n" at EOF (with an implicit following nul byte), we would produce a bogus trailing space character with spelling "\\\n". This was mostly harmless, but would lead to clang-format getting confused and misformatting in rare cases. We now produce a trailing EOF token with spelling "\\\n", consistent with our handling for other similar cases -- an escaped newline is always part of the token containing the next character, if any.

For the pattern "\\\n\n", this was somewhat more benign, but would produce an extraneous whitespace token to clients who care about preserving whitespace. However, it turns out that our lexing for line comments was relying on this bug due to an off-by-one error in its computation of the end of the comment, on the slow path where the comment might contain escaped newlines.

llvm-svn: 300515

Diffstat (limited to 'clang/lib/Format/FormatTokenLexer.cpp')
-rw-r--r-- clang/lib/Format/FormatTokenLexer.cpp 3
1 files changed, 3 insertions, 0 deletions
diff --git a/clang/lib/Format/FormatTokenLexer.cpp b/clang/lib/Format/FormatTokenLexer.cpp
index 4ee43d6937e..1acc0c30651 100644
--- a/clang/lib/Format/FormatTokenLexer.cpp
+++ b/clang/lib/Format/FormatTokenLexer.cpp
@@ -467,6 +467,9 @@ FormatToken *FormatTokenLexer::getNextToken() {
         if (pos >= 0 && Text[pos] == '\r')
           --pos;
         // See whether there is an odd number of '\' before this.
+        // FIXME: This is wrong. A '\' followed by a newline is always removed,
+        // regardless of whether there is another '\' before it.
+        // FIXME: Newlines can also be escaped by a '?' '?' '/' trigraph.
         unsigned count = 0;
         for (; pos >= 0; --pos, ++count)
           if (Text[pos] != '\\')
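
The first FIXME in this hunk can be illustrated with a small standalone sketch. It is not the clang-format code: both helper names are hypothetical, each takes the index of the newline itself (an assumed convention, not taken from the patch), and the trigraph case from the second FIXME is deliberately left out. The first helper reproduces the odd-count heuristic from the context lines; the second applies the rule the FIXME describes, where a '\' directly before the newline always escapes it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the heuristic in the hunk: the newline at
// index `nl` is treated as escaped only if an odd number of '\' precede it.
bool escapedByOddCount(const std::string &text, std::size_t nl) {
  int pos = static_cast<int>(nl) - 1;
  // A '\r' here is just part of "\r\n"; skip it.
  if (pos >= 0 && text[pos] == '\r')
    --pos;
  unsigned count = 0;
  for (; pos >= 0; --pos, ++count)
    if (text[pos] != '\\')
      break;
  return (count & 1) != 0;
}

// Hypothetical helper for the rule stated in the FIXME: a '\' immediately
// before the newline escapes it, however many '\' come before that one.
// Trigraph-escaped newlines ("??/") are NOT handled in this sketch.
bool escapedByPrecedingBackslash(const std::string &text, std::size_t nl) {
  int pos = static_cast<int>(nl) - 1;
  if (pos >= 0 && text[pos] == '\r')
    --pos;
  return pos >= 0 && text[pos] == '\\';
}
```

The two helpers agree on a single backslash before the newline and disagree on two: for the three-character input backslash, backslash, newline, the odd-count heuristic reports the newline as unescaped, while the FIXME's rule reports it as escaped.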