path: root/clang/lib/Lex/Lexer.cpp
Commit message | Author | Age | Files | Lines
* [Preprocessor] Correct internal token parsing of newline characters in CRLF | Erich Keane | 2017-08-24 | 1 | -0/+2
  Discovered due to a goofy git setup: the test system-headerline-directive.c (and a few others) failed because token consumption consumed only the '\r' in CRLF, making the preprocessor's printed value give the wrong line number when returning from an include. For example:
    (line 1):#include <noline.h>\r\n
  The "file exit" code causes the printer to try to print the 'returned to the main file' line. It looks up the current line number. However, since the current 'token' is the '\n' (only the '\r' was consumed), it gives the line number as '1', not '2'. This results in a few failed tests but, more importantly, in incorrect error messages when compiling a previously preprocessed file.
  Differential Revision: https://reviews.llvm.org/D37079
  llvm-svn: 311683
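  The CRLF handling described in this commit can be illustrated with a minimal sketch. It is not the code from the patch: `newlineLength` is a hypothetical helper, and treating "\n\r" as a single newline as well is an assumption of the sketch. The point is only that both characters of a two-character newline are consumed together, so the cursor never rests on the second one.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper, not Clang's API: returns how many bytes the newline
// sequence starting at `pos` occupies, or 0 if `pos` is not at a newline.
// A mixed pair ("\r\n" or, by assumption, "\n\r") counts as one newline.
std::size_t newlineLength(std::string_view buf, std::size_t pos) {
  assert(pos <= buf.size() && "position must lie within the buffer");
  if (pos == buf.size())
    return 0;
  const char c = buf[pos];
  if (c != '\r' && c != '\n')
    return 0;
  // Consume the partner character so that the '\n' of a CRLF is never
  // left behind as the "current" character.
  if (pos + 1 < buf.size()) {
    const char next = buf[pos + 1];
    if ((next == '\r' || next == '\n') && next != c)
      return 2;
  }
  return 1;
}
```

  With the input from the commit message, `newlineLength("#include <noline.h>\r\n", 19)` covers both bytes, so a line-number lookup made after consuming the newline starts from the beginning of line 2.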
* [Lexer] Finding beginning of token with escaped new line | Alexander Kornienko | 2017-08-10 | 1 | -28/+44
  Summary: Lexer::GetBeginningOfToken produced an invalid location when backtracking across escaped new lines. This fixes PR26228.
  Reviewers: akyrtzi, alexfh, rsmith, doug.gregor
  Reviewed By: alexfh
  Subscribers: alexfh, cfe-commits
  Patch by Paweł Żukowski!
  Differential Revision: https://reviews.llvm.org/D30748
  llvm-svn: 310576
* Fix invalid warnings for header guards in preambles | Erik Verbruggen | 2017-07-05 | 1 | -1/+1
  Fixes https://bugs.llvm.org/show_bug.cgi?id=33574
  Differential Revision: https://reviews.llvm.org/D34882
  llvm-svn: 307134
* [PR33394] Avoid lexing editor placeholders when Clang is used only for preprocessing | Alex Lorenz | 2017-06-16 | 1 | -1/+2
  r300667 added support for editor placeholders to Clang. That commit didn’t take into account that users who use Clang for preprocessing only (-E) will get the "editor placeholder in source file" error when preprocessing their source (PR33394). This commit ensures that Clang doesn't lex editor placeholders when running a preprocessor-only action.
  rdar://32718000
  Differential Revision: https://reviews.llvm.org/D34256
  llvm-svn: 305576
* Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC. | Galina Kistanova | 2017-06-03 | 1 | -0/+2
  llvm-svn: 304643
* Allow for unfinished #if blocks in preambles | Erik Verbruggen | 2017-05-30 | 1 | -28/+11
  Previously, a preamble only included #if blocks (and friends like ifdef) if there was a corresponding #endif before any declaration or definition. The problem is that any header file that uses include guards will not have a preamble generated, which can make code-completion very slow.
  To prevent errors about unbalanced preprocessor conditionals in the preamble, and unbalanced preprocessor conditionals after a preamble containing unfinished conditionals, the conditional stack is stored in the pch file.
  This fixes PR26045.
  Differential Revision: http://reviews.llvm.org/D15994
  llvm-svn: 304207
* [Lexer] Ensure that the token is not an annotation token when retrieving the identifier info for an Objective-C keyword | Alex Lorenz | 2017-05-17 | 1 | -0/+4
  This commit fixes an assertion that's triggered in getIdentifier when the token is an annotation token.
  rdar://32225463
  llvm-svn: 303246
* Add a fix-it for -Wunguarded-availability | Alex Lorenz | 2017-05-05 | 1 | -17/+49
  This patch adds a fix-it for the -Wunguarded-availability warning. This fix-it is similar to the Swift one: it suggests that you wrap the statement in an `if (@available)` check. The produced fix-its are indented (just like the Swift ones) to make them look nice in Xcode's fix-it preview.
  rdar://31680358
  Differential Revision: https://reviews.llvm.org/D32424
  llvm-svn: 302253
* Add support for editor placeholders to Clang | Alex Lorenz | 2017-04-19 | 1 | -0/+33
  This commit teaches Clang to recognize editor placeholders that are produced when an IDE like Xcode inserts a code-completion result that includes a placeholder. Now when the lexer sees a placeholder token, it emits an 'editor placeholder in source file' error and creates an identifier token that represents the placeholder. The parser/sema can now recognize the placeholders and can suppress the diagnostics related to the placeholders. This ensures that live issues in an IDE like Xcode won't get spurious diagnostics related to placeholders.
  This commit also adds a new compiler option named '-fallow-editor-placeholders' that silences the 'editor placeholder in source file' error. This is useful for an IDE like Xcode as we don't want to display those errors in live issues.
  rdar://31581400
  Differential Revision: https://reviews.llvm.org/D32081
  llvm-svn: 300667
* Do not warn about whitespace between ??/ trigraph and newline in line comments if trigraphs are disabled in the current language. | Richard Smith | 2017-04-18 | 1 | -4/+6
  llvm-svn: 300609
* Fix mishandling of escaped newlines followed by newlines or nuls. | Richard Smith | 2017-04-17 | 1 | -18/+10
  Previously, if an escaped newline was followed by a newline or a nul, we'd lex the escaped newline as a bogus space character. This led to a bunch of different broken corner cases:
  For the pattern "\\\n\0#", we would then have a (horizontal) space whose spelling ends in a newline, and would decide that the '#' is at the start of a line, and incorrectly start preprocessing a directive in the middle of a logical source line. If we were already in the middle of a directive, this would result in our attempting to process multiple directives at the same time! This resulted in crashes, asserts, and hangs on invalid input, as discovered by fuzz-testing.
  For the pattern "\\\n" at EOF (with an implicit following nul byte), we would produce a bogus trailing space character with spelling "\\\n". This was mostly harmless, but would lead to clang-format getting confused and misformatting in rare cases. We now produce a trailing EOF token with spelling "\\\n", consistent with our handling for other similar cases -- an escaped newline is always part of the token containing the next character, if any.
  For the pattern "\\\n\n", this was somewhat more benign, but would produce an extraneous whitespace token to clients who care about preserving whitespace. However, it turns out that our lexing for line comments was relying on this bug due to an off-by-one error in its computation of the end of the comment, on the slow path where the comment might contain escaped newlines.
  llvm-svn: 300515
* Skip Unicode character expansion in assembly files | Sanne Wouda | 2017-04-07 | 1 | -9/+11
  Summary: When using the C preprocessor with assembly files, either with a capital `S` file extension, or with `-xassembler-with-cpp`, the Unicode escape sequence `\u` is ignored. The `\u` pattern can be used for expanding a macro argument that starts with `u`.
  Author: Salman Arif <salman.arif@arm.com>
  Reviewers: rengolin, olista01
  Reviewed By: olista01
  Subscribers: cfe-commits
  Differential Revision: https://reviews.llvm.org/D31765
  llvm-svn: 299754
* Allow lexer to handle string_view literals. Patch from Anton Bikineev. | Eric Fiselier | 2016-12-30 | 1 | -3/+3
  This implements the compiler side of p0403r0. This patch was reviewed as https://reviews.llvm.org/D26829.
  llvm-svn: 290744
* Move UTF functions into namespace llvm. | Justin Lebar | 2016-09-30 | 1 | -12/+12
  Summary: This lets people link against LLVM and their own version of the UTF library.
  I determined this only affects llvm, clang, lld, and lldb by running
    $ git grep -wl 'UTF[0-9]\+\|\bConvertUTF\bisLegalUTF\|getNumBytesFor' | cut -f 1 -d '/' | sort | uniq
    clang
    lld
    lldb
    llvm
  Tested with
    ninja lldb
    ninja check-clang check-llvm check-lld
  (ninja check-lldb doesn't complete for me with or without this patch.)
  Reviewers: rnk
  Subscribers: klimek, beanz, mgorny, llvm-commits
  Differential Revision: https://reviews.llvm.org/D24996
  llvm-svn: 282822
* Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes. | Eugene Zelenko | 2016-09-07 | 1 | -20/+22
  Differential revision: https://reviews.llvm.org/D24115
  llvm-svn: 280870
* Implement filtering for code completion of identifiers. | Vassil Vassilev | 2016-07-27 | 1 | -1/+9
  Patch by Cristina Cristescu and Axel Naumann!
  Agreed on post commit review (D17820).
  llvm-svn: 276878
* [Lexer] Let the compiler infer string lengths. No functionality change intended. | Benjamin Kramer | 2016-04-01 | 1 | -2/+2
  llvm-svn: 265126
* [Lexer] Don't read out of bounds if a conflict marker is at the end of a file | Benjamin Kramer | 2016-04-01 | 1 | -1/+1
  This can happen as we look for '<<<<' while scanning tokens but then expect '<<<<\n' to tell apart perforce from diff3 conflict markers. Just harden the pointer arithmetic.
  Found by libfuzzer + asan!
  llvm-svn: 265125
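  The kind of hardening this commit describes can be sketched as a bounds-checked prefix match. This is an illustration under assumptions, not the change that was committed: `startsWithAt` and `isMarkerLine` are hypothetical names, and the sketch uses std::string_view rather than the raw pointers the lexer works with.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not Clang's API): true if `marker` occurs at `pos`.
// substr() clamps the requested length to the end of the buffer, so no byte
// past buf.size() is read even when end of file cuts the marker short.
bool startsWithAt(std::string_view buf, std::size_t pos,
                  std::string_view marker) {
  assert(pos <= buf.size() && "position must lie within the buffer");
  return buf.substr(pos, marker.size()) == marker;
}

// True only if the complete "<<<<\n" sequence is present at `pos`.
bool isMarkerLine(std::string_view buf, std::size_t pos) {
  return startsWithAt(buf, pos, "<<<<\n");
}
```

  A buffer that ends in "<<<<" with no trailing newline matches the four-character scan but fails the five-character check cleanly, instead of reading one byte past the end.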
* Update diagnostics now that hexadecimal literals look likely to be part of C++17. | Richard Smith | 2016-03-04 | 1 | -2/+3
  llvm-svn: 262753
* Remove use of builtin comma operator. | Richard Trieu | 2016-02-18 | 1 | -1/+3
  Cleanup for upcoming Clang warning -Wcomma. No functionality change intended.
  llvm-svn: 261271
* [OpenCL] Adding reserved operator logical xor for OpenCL | Anastasia Stulova | 2016-02-03 | 1 | -0/+3
  This patch adds the reserved operator ^^ when compiling for OpenCL (spec v1.1 s6.3.g), which results in a more meaningful error message.
  Patch by Neil Hickey!
  Review: http://reviews.llvm.org/D13280
    M test/SemaOpenCL/unsupported.cl
    M include/clang/Basic/TokenKinds.def
    M include/clang/Basic/DiagnosticParseKinds.td
    M lib/Basic/OperatorPrecedence.cpp
    M lib/Lex/Lexer.cpp
    M lib/Parse/ParseExpr.cpp
  llvm-svn: 259651
* Fix -Wnull-conversion for long macros. | Richard Trieu | 2016-01-26 | 1 | -0/+25
  Move the function to get a macro name from DiagnosticRenderer.cpp to Lexer.cpp so that other files can use it. Lexer now has two functions to get the immediate macro name; the newly added one is better for diagnostic purposes. Make -Wnull-conversion use this function for better NULL macro detection.
  llvm-svn: 258778
* Emit a -Wmicrosoft warning when treating ^Z as EOF in MS mode. | Nico Weber | 2015-12-29 | 1 | -1/+4
  llvm-svn: 256596
* [clang] Disable Unicode in asm files | Vinicius Tinti | 2015-11-20 | 1 | -2/+6
  Clang should not convert tokens to Unicode when preprocessing assembly files.
  Fixes PR25558.
  llvm-svn: 253738
* Use %select to merge similar diagnostics. NFC | Craig Topper | 2015-11-14 | 1 | -5/+5
  llvm-svn: 253119
* Disable trigraph and escaped newline expansion on all types of raw string literals not just ASCII type. | Craig Topper | 2015-10-22 | 1 | -1/+1
  llvm-svn: 251025
* Replace a few std::string& with StringRef. NFC. | Rafael Espindola | 2015-06-01 | 1 | -1/+1
  Patch by Косов Евгений!
  llvm-svn: 238774
* Fix buffer overflow in Lexer | Kostya Serebryany | 2015-05-04 | 1 | -1/+1
  Summary: Fix PR22407, where the Lexer overflows the buffer when parsing
    #include<\
  (end of file after slash)
  Test Plan: Added a test that will trigger in asan build. This case is also covered by the clang-fuzzer bot.
  Reviewers: rnk
  Reviewed By: rnk
  Subscribers: cfe-commits
  Differential Revision: http://reviews.llvm.org/D9489
  llvm-svn: 236466
* Use delegating ctors to reduce code duplication. NFC. | Benjamin Kramer | 2015-03-06 | 1 | -8/+2
  llvm-svn: 231476
* Lex: Don't crash if both conflict markers are on the same line | David Majnemer | 2014-12-14 | 1 | -2/+2
  We would check if the terminator marker is on a newline. However, the logic would end up out-of-bounds if the terminator marker immediately follows the start marker.
  This fixes PR21820.
  llvm-svn: 224210
* [c++1z] Support for u8 character literals. | Richard Smith | 2014-11-08 | 1 | -6/+14
  llvm-svn: 221576
* Fix warning in Altivec code when building with GCC 4.8.2 on Ubuntu 14.04. | Jay Foad | 2014-10-29 | 1 | -1/+1
  llvm-svn: 220855
* C++1y is now C++14! | Aaron Ballman | 2014-08-19 | 1 | -2/+2
  Changes diagnostic options, language standard options, diagnostic identifiers, and diagnostic wording to use c++14 instead of c++1y. It also modifies related test cases to use the updated diagnostic wording.
  llvm-svn: 215982
* Use StringRef instead of MemoryBuffer&. | Rafael Espindola | 2014-08-12 | 1 | -7/+7
  This code doesn't care where the data it is processing comes from, so a StringRef is probably the most natural interface.
  llvm-svn: 215448
* Change MemoryBuffer* to MemoryBuffer& parameter to Lexer::ComputePreamble | David Blaikie | 2014-08-11 | 1 | -9/+9
  (dropping const from the reference as MemoryBuffer is immutable already, so const is just redundant - and while I'd personally put const everywhere, that's not the LLVM Way (see llvm::Type for another example of an immutable type where "const" is omitted for brevity))
  Changing the pointer argument to a reference parameter makes call sites identical between callers with unique_ptrs or raw pointers, minimizing the churn in a pending unique_ptr migration.
  llvm-svn: 215391
* Hide the concept of diagnostic levels from lex, parse and sema | Alp Toker | 2014-06-15 | 1 | -6/+3
  The compilation pipeline doesn't actually need to know about the high-level concept of diagnostic mappings, and hiding the final computed level presents several simplifications and other potential benefits.
  The only exceptions are opportunistic checks to see whether expensive code paths can be avoided for diagnostics that are guaranteed to be ignored at a certain SourceLocation.
  This commit formalizes that invariant by introducing and using DiagnosticsEngine::isIgnored() in place of individual level checks throughout lex, parse and sema.
  llvm-svn: 211005
* Remove historical Unicode TODOs | Alp Toker | 2014-05-18 | 1 | -16/+3
  There's no immediate demand or plan to work on these.
  llvm-svn: 209090
* [C++11] Use 'nullptr'. Lex edition. | Craig Topper | 2014-05-17 | 1 | -9/+12
  llvm-svn: 209083
* Provide and use a safe Token::getRawIdentifier() accessor | Alp Toker | 2014-05-17 | 1 | -3/+2
  llvm-svn: 209061
* Revert r205436: | Roman Divacky | 2014-04-03 | 1 | -28/+5
    Extend the SSE2 comment lexing to AVX2. Only 16byte align when not on AVX2.
    This provides some 3% speedup when preprocessing gcc.c as a single file.
  The patch is wrong: it always uses SSE2, and when I fix that there's no speedup at all. I am not sure where the 3% came from previously.
  llvm-svn: 205548
* Extend the SSE2 comment lexing to AVX2. Only 16byte align when not on AVX2. | Roman Divacky | 2014-04-02 | 1 | -5/+28
  This provides some 3% speedup when preprocessing gcc.c as a single file.
  llvm-svn: 205436
* [C++11] Replace llvm::tie with std::tie. | Benjamin Kramer | 2014-03-02 | 1 | -1/+1
  llvm-svn: 202639
* Fix a minor bug in lexing pp-numbers with digit separators: if a pp-number contains "'e+", the pp-number ends between the 'e' and the '+'. | Richard Smith | 2014-02-28 | 1 | -0/+1
  llvm-svn: 202533
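  The rule in this commit can be seen in a small pp-number scanner. This is a simplified sketch, not Clang's lexer: `ppNumberLength` is a hypothetical name, universal character names are left out, and 'p'/'P' exponents are accepted unconditionally. Because a digit separator and the character after it are consumed as one step, an 'e' that directly follows a separator can no longer pair with a sign.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper: length of the pp-number at the start of `s`, 0 if none.
std::size_t ppNumberLength(std::string_view s) {
  auto isDigit = [](char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
  };
  auto isIdentChar = [](char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
  };

  // A pp-number starts with a digit, or with '.' followed by a digit.
  std::size_t i = 0;
  if (!s.empty() && isDigit(s[0]))
    i = 1;
  else if (s.size() >= 2 && s[0] == '.' && isDigit(s[1]))
    i = 2;
  else
    return 0;

  while (i < s.size()) {
    const char c = s[i];
    if (c == '\'') {
      // Separator plus the following digit/nondigit form a single step, so
      // in "1'e+2" the 'e' is used up here and the '+' ends the number.
      if (i + 1 < s.size() && isIdentChar(s[i + 1])) {
        i += 2;
        continue;
      }
      break;
    }
    const bool isExponent = c == 'e' || c == 'E' || c == 'p' || c == 'P';
    if (isExponent && i + 1 < s.size() &&
        (s[i + 1] == '+' || s[i + 1] == '-')) {
      i += 2; // exponent letter together with its sign
      continue;
    }
    if (isIdentChar(c) || c == '.') {
      ++i;
      continue;
    }
    break;
  }
  assert(i <= s.size());
  return i;
}
```

  For "1e+2" the whole string is one pp-number, while for "1'e+2" the scan stops after "1'e", leaving "+" and "2" as separate tokens.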
* PR18855: Add support for UCNs and UTF-8 encoding within ud-suffixes. | Richard Smith | 2014-02-17 | 1 | -60/+90
  llvm-svn: 201532
* Rename language option MicrosoftMode to MSVCCompat | Alp Toker | 2014-01-14 | 1 | -4/+4
  There's been long-standing confusion over the role of these two options. This commit makes the necessary changes to differentiate them clearly, following up from r198936.
  MicrosoftExt (aka. -fms-extensions): Enable largely unobjectionable Microsoft language extensions to ease portability. This mode, also supported by gcc, is used for building software like FreeBSD and Linux kernel extensions that share code with Windows drivers.
  MSVCCompat (aka. -fms-compatibility, formerly MicrosoftMode): Turn on a special mode supporting 'heinous' extensions for drop-in compatibility with the Microsoft Visual C++ product. Standards-compliant C and C++ code isn't guaranteed to work in this mode. Implies MicrosoftExt.
  Note that full -fms-compatibility mode is currently enabled by default on the Windows target, which may need tuning to serve as a reasonable default.
  See cfe-commits for the full discourse, thread 'r198497 - Move MS predefined type_info out of InitializePredefinedMacros'
  No change in behaviour.
  llvm-svn: 199209
* Sort all the #include lines with LLVM's utils/sort_includes.py which encodes the canonical rules for LLVM's style. | Chandler Carruth | 2014-01-07 | 1 | -1/+1
  I noticed this had drifted quite a bit when cleaning up LLVM, so wanted to clean up Clang as well.
  llvm-svn: 198686
* Lexer: Issue -Wbackslash-newline-escape for line comments | Alp Toker | 2013-12-14 | 1 | -1/+8
  The warning for backslash and newline separated by whitespace was missed in this code path.
  backslash<whitespace><newline> is handled differently from compiler to compiler so it's important to warn consistently where there's ambiguity.
  Matches similar handling of block comments and non-comment lines.
  llvm-svn: 197331
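  The backslash<whitespace><newline> case that this warning covers can be detected with a short scan. The sketch below is illustrative only: `scanEscapedNewline` and the `EscapedNewline` struct are hypothetical and do not mirror Clang's internal helpers, and the set of characters counted as horizontal whitespace is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical result type: total length of the escaped newline (0 if there
// is none) and whether whitespace sat between the backslash and the newline.
struct EscapedNewline {
  std::size_t length = 0;
  bool hasSpace = false;
};

// `pos` is expected to point at the backslash.
EscapedNewline scanEscapedNewline(std::string_view buf, std::size_t pos) {
  assert(pos <= buf.size() && "position must lie within the buffer");
  if (pos == buf.size() || buf[pos] != '\\')
    return {};
  std::size_t i = pos + 1;
  // Skip horizontal whitespace (assumed set: space, tab, form feed, vtab).
  while (i < buf.size() && (buf[i] == ' ' || buf[i] == '\t' ||
                            buf[i] == '\f' || buf[i] == '\v'))
    ++i;
  if (i == buf.size() || (buf[i] != '\n' && buf[i] != '\r'))
    return {}; // a backslash that is not followed by a newline
  const bool hasSpace = i > pos + 1;
  // Treat a mixed pair such as "\r\n" as a single newline.
  std::size_t newlineLen = 1;
  if (i + 1 < buf.size() && (buf[i + 1] == '\n' || buf[i + 1] == '\r') &&
      buf[i + 1] != buf[i])
    newlineLen = 2;
  return {i + newlineLen - pos, hasSpace};
}
```

  A caller could emit the warning whenever `hasSpace` is true, regardless of whether the backslash sits in a line comment, a block comment, or ordinary code.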
* Fix raw lex crash and -frewrite-includes noeol-at-eof failure | Alp Toker | 2013-12-13 | 1 | -1/+2
  Raw lexers don't have a preprocessor so we need to null check.
  llvm-svn: 197245
* Lex: Don't restrict legal UCNs when preprocessing assembly | Justin Bogner | 2013-10-21 | 1 | -0/+4
  The C and C++ standards disallow using universal character names to refer to some characters, such as basic ascii and control characters, so we reject these sequences in the lexer. However, when the preprocessor isn't being used on C or C++, it doesn't make sense to apply these restrictions.
  Notably, accepting these characters avoids issues with unicode escapes when GHC uses the compiler as a preprocessor on haskell sources.
  Fixes rdar://problem/14742289
  llvm-svn: 193067
* Per updates to D3781, allow underscore under ' in a pp-number, and allow ' in a #line directive. | Richard Smith | 2013-09-26 | 1 | -1/+1
  llvm-svn: 191443