path: root/clang/lib/Lex/Lexer.cpp
Commit message | Author | Age | Files | Lines
...
* C++1y is now C++14! | Aaron Ballman | 2014-08-19 | 1 | -2/+2
  Changes diagnostic options, language standard options, diagnostic identifiers, diagnostic wording to use c++14 instead of c++1y. It also modifies related test cases to use the updated diagnostic wording.
  llvm-svn: 215982
* Use StringRef instead of MemoryBuffer&. | Rafael Espindola | 2014-08-12 | 1 | -7/+7
  This code doesn't care where the data it is processing comes from, so a StringRef is probably the most natural interface.
  llvm-svn: 215448
* Change MemoryBuffer* to MemoryBuffer& parameter to Lexer::ComputePreamble | David Blaikie | 2014-08-11 | 1 | -9/+9
  (dropping const from the reference as MemoryBuffer is immutable already, so const is just redundant - and while I'd personally put const everywhere, that's not the LLVM Way (see llvm::Type for another example of an immutable type where "const" is omitted for brevity))
  Changing the pointer argument to a reference parameter makes call sites identical between callers with unique_ptrs or raw pointers, minimizing the churn in a pending unique_ptr migration.
  llvm-svn: 215391
* Hide the concept of diagnostic levels from lex, parse and sema | Alp Toker | 2014-06-15 | 1 | -6/+3
  The compilation pipeline doesn't actually need to know about the high-level concept of diagnostic mappings, and hiding the final computed level presents several simplifications and other potential benefits. The only exceptions are opportunistic checks to see whether expensive code paths can be avoided for diagnostics that are guaranteed to be ignored at a certain SourceLocation. This commit formalizes that invariant by introducing and using DiagnosticsEngine::isIgnored() in place of individual level checks throughout lex, parse and sema.
  llvm-svn: 211005
* Remove historical Unicode TODOs | Alp Toker | 2014-05-18 | 1 | -16/+3
  There's no immediate demand or plan to work on these.
  llvm-svn: 209090
* [C++11] Use 'nullptr'. Lex edition. | Craig Topper | 2014-05-17 | 1 | -9/+12
  llvm-svn: 209083
* Provide and use a safe Token::getRawIdentifier() accessor | Alp Toker | 2014-05-17 | 1 | -3/+2
  llvm-svn: 209061
* Revert r205436: | Roman Divacky | 2014-04-03 | 1 | -28/+5
  Extend the SSE2 comment lexing to AVX2. Only 16byte align when not on AVX2. This provides some 3% speedup when preprocessing gcc.c as a single file.
  The patch is wrong, it always uses SSE2, and when I fix that there's no speedup at all. I am not sure where the 3% came from previously.
  llvm-svn: 205548
* Extend the SSE2 comment lexing to AVX2. Only 16byte align when not on AVX2. | Roman Divacky | 2014-04-02 | 1 | -5/+28
  This provides some 3% speedup when preprocessing gcc.c as a single file.
  llvm-svn: 205436
* [C++11] Replace llvm::tie with std::tie. | Benjamin Kramer | 2014-03-02 | 1 | -1/+1
  llvm-svn: 202639
* Fix a minor bug in lexing pp-numbers with digit separators: if a pp-number contains "'e+", the pp-number ends between the 'e' and the '+'. | Richard Smith | 2014-02-28 | 1 | -0/+1
  llvm-svn: 202533
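A minimal sketch of the rule this fix describes, assuming a simplified pp-number grammar (digits, identifier characters, '.', exponent-plus-sign pairs, and a single quote followed by a digit or nondigit). The helper `ppNumberLength` is hypothetical and is not Clang's implementation; it only illustrates why the '+' in "1'e+1" is not part of the pp-number.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not Clang's code): returns the length of the pp-number
// that starts at the beginning of `s`, or 0 if no pp-number starts there.
std::size_t ppNumberLength(const std::string &s) {
  auto isDigit = [](char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
  };
  auto isIdentChar = [](char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
  };

  std::size_t i = 0;
  if (!s.empty() && isDigit(s[0]))
    i = 1;                                  // digit
  else if (s.size() >= 2 && s[0] == '.' && isDigit(s[1]))
    i = 2;                                  // . digit
  else
    return 0;

  while (i < s.size()) {
    const char c = s[i];
    // ' digit / ' nondigit: the character after the quote is consumed together
    // with the quote, so it cannot also start an "e sign" pair.
    if (c == '\'' && i + 1 < s.size() && isIdentChar(s[i + 1])) {
      i += 2;
      continue;
    }
    // e sign / E sign / p sign / P sign
    if ((c == 'e' || c == 'E' || c == 'p' || c == 'P') && i + 1 < s.size() &&
        (s[i + 1] == '+' || s[i + 1] == '-')) {
      i += 2;
      continue;
    }
    if (isIdentChar(c) || c == '.') {
      ++i;
      continue;
    }
    break;
  }
  return i;
}
```

Under these assumptions, "1e+1" is consumed whole, while "1'e+1" stops after the 'e'.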
* PR18855: Add support for UCNs and UTF-8 encoding within ud-suffixes. | Richard Smith | 2014-02-17 | 1 | -60/+90
  llvm-svn: 201532
* Rename language option MicrosoftMode to MSVCCompat | Alp Toker | 2014-01-14 | 1 | -4/+4
  There's been long-standing confusion over the role of these two options. This commit makes the necessary changes to differentiate them clearly, following up from r198936.
  MicrosoftExt (aka. -fms-extensions): Enable largely unobjectionable Microsoft language extensions to ease portability. This mode, also supported by gcc, is used for building software like FreeBSD and Linux kernel extensions that share code with Windows drivers.
  MSVCCompat (aka. -fms-compatibility, formerly MicrosoftMode): Turn on a special mode supporting 'heinous' extensions for drop-in compatibility with the Microsoft Visual C++ product. Standards-compliant C and C++ code isn't guaranteed to work in this mode. Implies MicrosoftExt.
  Note that full -fms-compatibility mode is currently enabled by default on the Windows target, which may need tuning to serve as a reasonable default.
  See cfe-commits for the full discourse, thread 'r198497 - Move MS predefined type_info out of InitializePredefinedMacros'
  No change in behaviour.
  llvm-svn: 199209
* Sort all the #include lines with LLVM's utils/sort_includes.py which | Chandler Carruth | 2014-01-07 | 1 | -1/+1
  encodes the canonical rules for LLVM's style. I noticed this had drifted quite a bit when cleaning up LLVM, so wanted to clean up Clang as well.
  llvm-svn: 198686
* Lexer: Issue -Wbackslash-newline-escape for line comments | Alp Toker | 2013-12-14 | 1 | -1/+8
  The warning for backslash and newline separated by whitespace was missed in this code path. backslash<whitespace><newline> is handled differently from compiler to compiler so it's important to warn consistently where there's ambiguity. Matches similar handling of block comments and non-comment lines.
  llvm-svn: 197331
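To illustrate the pattern being diagnosed, here is a small sketch that detects backslash<whitespace><newline> at the end of a line. The function name is hypothetical and the check is far simpler than the lexer's real code path; it assumes the line is passed without its newline terminator and that only spaces and tabs count as whitespace.

```cpp
#include <cstddef>
#include <string>

// Hypothetical check (not Clang's code): true if `line`, given without its
// terminating newline, ends in a backslash followed by at least one space or
// tab. That is the ambiguous "backslash, whitespace, newline" sequence.
bool endsWithBackslashThenWhitespace(const std::string &line) {
  std::size_t end = line.size();
  // Skip trailing horizontal whitespace.
  while (end > 0 && (line[end - 1] == ' ' || line[end - 1] == '\t'))
    --end;
  const bool sawTrailingWhitespace = end < line.size();
  return sawTrailingWhitespace && end > 0 && line[end - 1] == '\\';
}
```

A line comment such as `// note \ ` (with a trailing space) matches, while `// note \` is an ordinary line continuation and does not.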
* Fix raw lex crash and -frewrite-includes noeol-at-eof failure | Alp Toker | 2013-12-13 | 1 | -1/+2
  Raw lexers don't have a preprocessor so we need to null check.
  llvm-svn: 197245
* Lex: Don't restrict legal UCNs when preprocessing assembly | Justin Bogner | 2013-10-21 | 1 | -0/+4
  The C and C++ standards disallow using universal character names to refer to some characters, such as basic ASCII and control characters, so we reject these sequences in the lexer. However, when the preprocessor isn't being used on C or C++, it doesn't make sense to apply these restrictions.
  Notably, accepting these characters avoids issues with Unicode escapes when GHC uses the compiler as a preprocessor on Haskell sources.
  Fixes rdar://problem/14742289
  llvm-svn: 193067
* Per updates to D3781, allow underscore under ' in a pp-number, and allow ' in a #line directive. | Richard Smith | 2013-09-26 | 1 | -1/+1
  llvm-svn: 191443
* Implement C++1y digit separator proposal (' as a digit separator). This is not | Richard Smith | 2013-09-26 | 1 | -0/+12
  yet approved by full committee, but was unanimously supported by EWG.
  llvm-svn: 191417
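For readers unfamiliar with the feature, a short illustration of the digit separator as it was later standardized in C++14. The constant names are made up for this example; it requires a C++14 or later compiler.

```cpp
// Illustrative constants only; the single quote is ignored when the literal's
// value is computed.
constexpr long long kMillion = 1'000'000;  // decimal grouping
constexpr unsigned kMask = 0b1010'1010;    // binary literal, also C++14
constexpr unsigned kWord = 0xFF'FF;        // hexadecimal grouping

static_assert(kMillion == 1000000, "separators do not change the value");
```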
* Avoid a signed/unsigned comparison warning with compilers that don't know how | Richard Smith | 2013-09-24 | 1 | -1/+1
  to handle constant expressions.
  llvm-svn: 191336
* Handle standard libraries that miss out the space when defining the standard | Richard Smith | 2013-09-24 | 1 | -6/+28
  literal operators. Also, for now, allow the proposed C++1y "il", "i", and "if" suffixes too. (Will revert the latter if LWG decides not to go ahead with that change after all.)
  llvm-svn: 191274
* Fix use-after-free in r190980. | Eli Friedman | 2013-09-19 | 1 | -3/+6
  llvm-svn: 190984
* Make Preprocessor::Lex non-recursive. | Eli Friedman | 2013-09-19 | 1 | -90/+163
  Before this patch, Lex() would recurse whenever the current lexer changed (e.g. upon entry into a macro). This patch turns the recursion into a loop: the various lex routines now don't return a token when the current lexer changes, and at the top level Preprocessor::Lex() now loops until it finds a token.
  Normally, the recursion wouldn't end up being very deep, but the recursion depth can explode in edge cases like a bunch of consecutive macros which expand to nothing (like in the testcase test/Preprocessor/macro_expand_empty.c in this patch).
  <rdar://problem/14569770>
  llvm-svn: 190980
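The recursion-to-loop change can be sketched with a toy model. Everything below (`Source`, `MiniPreprocessor`, `lexFromCurrent`) is hypothetical and much simpler than Clang's Preprocessor; it only shows the shape of the idea: the inner routine reports "no token, the current source changed" instead of recursing, and the top-level `lex` loops until a token appears.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token source standing in for a file lexer or a macro expansion.
struct Source {
  std::vector<std::string> tokens;
  std::size_t next = 0;
};

class MiniPreprocessor {
public:
  void push(Source s) { stack_.push_back(std::move(s)); }

  // Top-level entry point: loops (rather than recursing) until some source
  // yields a token. Returns false once every source is exhausted.
  bool lex(std::string &out) {
    while (!stack_.empty()) {
      if (lexFromCurrent(out))
        return true;
    }
    return false;
  }

private:
  // Returns true if a token was produced. Returns false, without recursing,
  // when the current source ran out and was popped.
  bool lexFromCurrent(std::string &out) {
    Source &cur = stack_.back();
    if (cur.next == cur.tokens.size()) {
      stack_.pop_back();
      return false;
    }
    out = cur.tokens[cur.next++];
    return true;
  }

  std::vector<Source> stack_;
};
```

With this shape, a long run of sources that produce nothing (the analogue of consecutive macros expanding to nothing) costs loop iterations rather than stack depth.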
* Use new UnicodeCharSet interface. | Alexander Kornienko | 2013-08-29 | 1 | -15/+35
  Summary: This is a Clang part of http://llvm-reviews.chandlerc.com/D1534
  Reviewers: jordan_rose, klimek, rsmith
  Reviewed By: rsmith
  CC: cfe-commits
  Differential Revision: http://llvm-reviews.chandlerc.com/D1535
  llvm-svn: 189583
* Fix "//" comments with -traditional-cpp in C++. | Eli Friedman | 2013-08-28 | 1 | -2/+4
  Apparently, gcc's -traditional-cpp behaves slightly differently in C++ mode; specifically, it discards "//" comments. Match gcc's behavior.
  <rdar://problem/14808126>
  llvm-svn: 189515
* Respect -Wnewline-eof even in C++11 mode. | Jordan Rose | 2013-08-23 | 1 | -4/+22
  If the user has requested this warning, we should emit it, even if it's not an extension in the current language mode. However, being an extension is more important, so prefer the pedantic warning or the pedantic-compatibility warning if those are enabled.
  <rdar://problem/12922063>
  llvm-svn: 189110
* ObjectiveC migrator: More work towards | Fariborz Jahanian | 2013-08-20 | 1 | -2/+3
  insertion of ObjC audit pragmas.
  llvm-svn: 188733
* C++1y literal suffix support: | Richard Smith | 2013-07-23 | 1 | -6/+18
  * Allow ns, us, ms, s, min, h as numeric ud-suffixes
  * Allow s as string ud-suffix
  llvm-svn: 186933
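As a usage illustration, these are the suffixes as they ended up in the C++14 standard library (`std::chrono_literals` and `std::string_literals`). The two function names are invented for the example, and a C++14 or later compiler and standard library are assumed.

```cpp
#include <chrono>
#include <string>

// Hypothetical example function: one and a half hours, expressed in seconds,
// using the h and min numeric suffixes.
inline long long ninetyMinutesInSeconds() {
  using namespace std::chrono_literals;
  return std::chrono::duration_cast<std::chrono::seconds>(1h + 30min).count();
}

// Hypothetical example function: the s string suffix yields a std::string,
// so operator+ concatenates.
inline std::string joined() {
  using namespace std::string_literals;
  return "lex"s + "er";
}
```

Note that the same spelling `s` is a duration suffix on a number and a string suffix on a string literal, which is why the lexer has to accept it in both positions.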
* Replace Count{Leading,Trailing}Zeros_{32,64} with count{Leading,Trailing}Zeros. | Michael J. Spencer | 2013-05-24 | 1 | -1/+1
  llvm-svn: 182675
* [modules] If we hit a failure while loading a PCH/module, abort parsing instead of trying to continue in an invalid state. | Argyrios Kyrtzidis | 2013-05-24 | 1 | -0/+6
  Also don't let libclang create a PCH with such an error.
  Fixes rdar://13953768
  llvm-svn: 182629
* [Lexer] Improve Lexer::getSourceText() when the given range deals with function macro arguments. | Argyrios Kyrtzidis | 2013-05-16 | 1 | -33/+24
  This is a modified version of a patch by Manuel Klimek.
  llvm-svn: 182055
* Typo and misc comment fix. | Richard Smith | 2013-05-10 | 1 | -2/+4
  llvm-svn: 181583
* [libclang] Make sure the preamble does not truncate comments. | Argyrios Kyrtzidis | 2013-04-19 | 1 | -2/+15
  rdar://13647445
  llvm-svn: 179907
* Add -Wc99-compat warning for C11 unicode string and character literals. | Richard Smith | 2013-03-11 | 1 | -6/+8
  llvm-svn: 176817
* When lexing in C11 mode, accept unicode character and string literals, per C11 | Richard Smith | 2013-03-09 | 1 | -9/+13
  6.4.4.4/1 and 6.4.5/1.
  llvm-svn: 176780
* Preprocessor: don't consider // to be a line comment in -E -std=c89 mode. | Jordan Rose | 2013-03-05 | 1 | -4/+7
  It's beneficial when compiling to treat // as the start of a line comment even in -std=c89 mode, since it's not valid C code (with a few rare exceptions) and is usually intended as such. We emit a pedantic warning and then continue on as if line comments were enabled. This has been our behavior for quite some time.
  However, people use the preprocessor for things besides C source files. In today's prompting example, the input contains (unquoted) URLs, which contain // but should still be preserved.
  This change instructs the lexer to treat // as a plain token if Clang is in C90 mode and generating preprocessed output rather than actually compiling.
  <rdar://problem/13338743>
  llvm-svn: 176526
* Preprocessor: preserve whitespace in -traditional-cpp mode. | Jordan Rose | 2013-02-21 | 1 | -17/+28
  Note that unlike GNU cpp we currently do not preserve whitespace in macros (even in -traditional-cpp mode).
  <rdar://problem/12897179>
  llvm-svn: 175778
* Properly validate UCNs for C99 and C++03 (both more restrictive than C(++)11). | Jordan Rose | 2013-02-09 | 1 | -89/+86
  Add warnings under -Wc++11-compat, -Wc++98-compat, and -Wc99-compat when a particular UCN is incompatible with a different standard, and -Wunicode when a UCN refers to a surrogate character in C++03.
  llvm-svn: 174788
* Pull Lexer's CharInfo table out for general use throughout Clang. | Jordan Rose | 2013-02-08 | 1 | -170/+5
  Rewriting the same predicates over and over again is bad for code size and code maintenance. Using the functions in <ctype.h> is generally unsafe unless they are specified to be locale-independent (i.e. only isdigit and isxdigit).
  The next commit will try to clean up uses of <ctype.h> functions within Clang.
  llvm-svn: 174765
* Lexer: Don't warn about Unicode in preprocessor directives. | Jordan Rose | 2013-01-31 | 1 | -2/+4
  This allows people to use Unicode in their #pragma mark and in macros that exist only to be string-ized.
  <rdar://problem/13107323&13121362>
  llvm-svn: 174081
* Fix r173881 to properly skip invalid UTF-8 characters in raw lexing and -E. | Jordan Rose | 2013-01-30 | 1 | -0/+1
  This caused hangs as we processed the same invalid byte over and over.
  <rdar://problem/13115651>
  llvm-svn: 173959
* Move UTF conversion routines from clang/lib/Basic to llvm/lib/Support | Dmitri Gribenko | 2013-01-30 | 1 | -9/+11
  This is required to use them in TableGen.
  llvm-svn: 173924
* Don't warn about Unicode characters in -E mode. | Jordan Rose | 2013-01-30 | 1 | -18/+20
  People use the C preprocessor for things other than C files. Some of them have Unicode characters. We shouldn't warn about Unicode characters appearing outside of identifiers in this case.
  There's not currently a way for the preprocessor to tell if it's in -E mode, so I added a new flag, derived from the PreprocessorOutputOptions. This is only used by the Unicode warnings for now, but could conceivably be used by other warnings or even behavioral differences later.
  <rdar://problem/13107323>
  llvm-svn: 173881
* PR15067 (again): Don't warn about UCNs in C90 if we're raw-lexing. | Jordan Rose | 2013-01-28 | 1 | -1/+2
  Fixes a crash. Thanks, Richard.
  llvm-svn: 173701
* PR15067: Don't assert when a UCN appears in a C90 file. | Jordan Rose | 2013-01-27 | 1 | -3/+6
  Unfortunately, we can't accept the UCN as an extension because we're required to treat it as two tokens for preprocessing purposes.
  llvm-svn: 173622
* Lexer.cpp: Fix a warning with ptrdiff_t on i686. [-Wsign-compare] | NAKAMURA Takumi | 2013-01-25 | 1 | -1/+1
  llvm-svn: 173447
* Clarify comment: "diagnose" is better than "warn" when emitting an error. | Jordan Rose | 2013-01-25 | 1 | -1/+1
  Thanks, Dmitri.
  llvm-svn: 173400
* Add a fixit for \U1234 -> \u1234. | Jordan Rose | 2013-01-24 | 1 | -1/+9
  llvm-svn: 173371
* As an extension, treat Unicode whitespace characters as whitespace. | Jordan Rose | 2013-01-24 | 1 | -0/+23
  llvm-svn: 173370
* Handle universal character names and Unicode characters outside of literals. | Jordan Rose | 2013-01-24 | 1 | -13/+275
  This is a missing piece for C99 conformance.
  This patch handles UCNs by adding a '\\' case to LexTokenInternal and LexIdentifier -- if we see a backslash, we tentatively try to read in a UCN. If the UCN is not syntactically well-formed, we fall back to the old treatment: a backslash followed by an identifier beginning with 'u' (or 'U').
  Because the spelling of an identifier with UCNs still has the UCN in it, we need to convert that to UTF-8 in Preprocessor::LookUpIdentifierInfo.
  Of course, valid code that does *not* use UCNs will see only a very minimal performance hit (checks after each identifier for non-ASCII characters, checks when converting raw_identifiers to identifiers that they do not contain UCNs, and checks when getting the spelling of an identifier that it does not contain a UCN).
  This patch also adds basic support for actual UTF-8 in the source. This is treated almost exactly the same as UCNs except that we consider stray Unicode characters to be mistakes and offer a fixit to remove them.
  llvm-svn: 173369
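The "tentatively try to read in a UCN, otherwise fall back" step can be sketched as follows. `tryReadUCN` is a hypothetical helper, not the function added by this patch; it checks only the syntactic form (\u plus four hex digits, or \U plus eight) and does not apply the per-standard restrictions on which code points are allowed.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not Clang's code). If `s` holds a syntactically
// well-formed UCN at `pos`, stores its value in `codePoint`, advances `pos`
// past it, and returns true. Otherwise leaves `pos` untouched and returns
// false, so the caller can fall back to treating the backslash as a separate
// token.
bool tryReadUCN(const std::string &s, std::size_t &pos,
                std::uint32_t &codePoint) {
  if (pos + 1 >= s.size() || s[pos] != '\\')
    return false;

  std::size_t digits;
  if (s[pos + 1] == 'u')
    digits = 4;
  else if (s[pos + 1] == 'U')
    digits = 8;
  else
    return false;

  if (s.size() - (pos + 2) < digits)
    return false;  // not enough characters left for a full UCN

  std::uint32_t value = 0;
  for (std::size_t i = 0; i < digits; ++i) {
    const char c = s[pos + 2 + i];
    std::uint32_t d;
    if (c >= '0' && c <= '9')
      d = static_cast<std::uint32_t>(c - '0');
    else if (c >= 'a' && c <= 'f')
      d = static_cast<std::uint32_t>(c - 'a' + 10);
    else if (c >= 'A' && c <= 'F')
      d = static_cast<std::uint32_t>(c - 'A' + 10);
    else
      return false;  // malformed: fall back without consuming anything
    value = (value << 4) | d;
  }

  codePoint = value;
  pos += 2 + digits;
  return true;
}
```

Because the position is only advanced on success, a malformed sequence such as `\u12` leaves the input exactly where it was, which mirrors the fallback described in the commit message.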