path: root/llvm/lib
Commit message | Author | Age | Files | Lines
...
* [x86] narrow a shuffle that doesn't use or set any high elementsSanjay Patel2019-01-251-2/+47
| | | | | | | | | | | | | | | This isn't the final fix for our reduction/horizontal codegen, but it takes care of a lot of the problems. After we narrow the shuffle, existing combines for insert/extract and binops kick in, and we end up with cheaper 128-bit ops. The avg and mul reduction tests show an existing shuffle lowering hole for AVX2/AVX512. I think in its most minimal form this is: https://bugs.llvm.org/show_bug.cgi?id=40434 ...but we might need multiple fixes to get it right. Differential Revision: https://reviews.llvm.org/D57156 llvm-svn: 352209
* [JSON] Work around excess-precision issue when comparing T_Integer numbers.Sam McCall2019-01-251-0/+6
| | | | | | | | | | Reviewers: bkramer Subscribers: kristina, llvm-commits Differential Revision: https://reviews.llvm.org/D57237 llvm-svn: 352204
* Fix gcc -Wparentheses warning. NFCI.Simon Pilgrim2019-01-251-4/+4
| | | | llvm-svn: 352193
* Fix gcc -Wparentheses warning. NFCI.Simon Pilgrim2019-01-251-2/+2
| | | | llvm-svn: 352191
* [ARM GlobalISel] Support shifts for Thumb2Diana Picus2019-01-251-4/+4
| | | | | | | | | | Same as ARM. On this occasion we split some of the instruction select tests for more complicated instructions into their own files, so we can reuse them for ARM and Thumb mode. Likewise for the legalizer tests. llvm-svn: 352188
* [ARM GlobalISel] Remove rebase artifact from r351882. NFCDiana Picus2019-01-251-3/+0
| | | | | | | r351882 introduced some superfluous calls to mark G_INTTOPTR and G_PTRTOINT as legal (looks like a rebase mishap). Remove them. llvm-svn: 352187
* [TblGen] Extend !if semantics through new feature !condJaved Absar2019-01-255-1/+224
| | | | | | | | | | | | | | | | | This patch extends the TableGen language with a !cond operator. Instead of nesting !if inside !if, which can get cumbersome, one can now use !cond. Below is an example that converts an integer 'x' into a string: !cond(!lt(x,0) : "Negative", !eq(x,0) : "Zero", !eq(x,1) : "One", 1 : "MoreThanOne") Reviewed By: hfinkel, simon_tatham, greened Differential Revision: https://reviews.llvm.org/D55758 llvm-svn: 352185
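The first-match behaviour of !cond described above can be mimicked in a few lines of Python. This is only an illustrative sketch, not TableGen code: the helper names `cond` and `describe` are hypothetical, and raising an error when no clause matches is an assumption about how an unmatched !cond should be treated.

```python
def cond(*clauses):
    """Return the value paired with the first truthy condition.

    Each clause is a (condition, value) tuple, mirroring the
    `condition : value` pairs of !cond.
    """
    for condition, value in clauses:
        if condition:
            return value
    # Assumption: an unmatched cond is treated as an error.
    raise ValueError("no condition matched")


def describe(x):
    """Python analogue of the !cond example from the commit message."""
    return cond(
        (x < 0, "Negative"),
        (x == 0, "Zero"),
        (x == 1, "One"),
        (1, "MoreThanOne"),  # constant-true clause acts as the default
    )
```

Because clauses are tested in order, the trailing constant-true clause only applies when none of the earlier conditions hold, which is what replaces the innermost else-branch of a nested !if.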
* [MSP430] Fix absolute addressing mode printing in AsmPrinterAnton Korobeynikov2019-01-251-2/+2
| | | | | | | | | | | | | Align checks for absolute addressing mode with its current implementation (SR is used as a base register). This fixes https://bugs.llvm.org/show_bug.cgi?id=39993 Patch by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D56785 llvm-svn: 352178
* [PowerPC] Enhance the fast selection of cmp instruction and clean up related assertsZi Xuan Wu2019-01-251-3/+12
| | | | | | | | | | | | | | | | | Fast selection of LLVM icmp and fcmp instructions does not handle VSX instruction support well. We'd use the VSX float comparison instruction instead of the non-VSX float comparison instruction if the operand register class is VSSRC or VSFRC, because i32 and i64 are mapped to VSSRC and VSFRC respectively when the VSX feature is enabled. If the target does not have a corresponding VSX comparison instruction for some type, just copy the VSX-related register to the common float register class and use the non-VSX comparison instruction. Differential Revision: https://reviews.llvm.org/D57078 llvm-svn: 352174
* [X86] Add non-masked versions of vpconflict intrinsics so we can use a select in the header file in clang.Craig Topper2019-01-251-0/+6
| | | | | | | | I'll remove and autoupgrade the old intrinsics in a future commit. llvm-svn: 352172
* [RISCV] Custom-legalise i32 SDIV/UDIV/UREM on RV64MAlex Bradbury2019-01-253-49/+55
| | | | | | | | | | | | | | Follow the same custom legalisation strategy as used in D57085 for variable-length shifts (see that patch summary for more discussion). Although we may lose out on some late-stage DAG combines, I think this custom legalisation strategy is ultimately easier to reason about. There are some codegen changes in rv64m-exhaustive-w-insts.ll but they are all neutral in terms of the number of instructions. Differential Revision: https://reviews.llvm.org/D57096 llvm-svn: 352171
* [LoopSimplifyCFG] Fix inconsistency in blocks in loop markupMax Kazantsev2019-01-251-1/+1
| | | | | | | | | | | | 2nd part of D57095 with the same reason, just in another place. We never fold branches that are not immediately in the current loop, but this check is missing in `IsEdgeLive`. As a result, it may think that the edge in a subloop is dead while it's live. It's a pessimization in the current stance. Differential Revision: https://reviews.llvm.org/D57147 Reviewed By: rupprecht llvm-svn: 352170
* [RISCV] Custom-legalise 32-bit variable shifts on RV64Alex Bradbury2019-01-253-67/+105
| | | | | | | | | | | | | | | | | | | | | | | | | The previous DAG combiner-based approach had an issue with infinite loops between the target-dependent and target-independent combiner logic (see PR40333). Although this was worked around in rL351806, the combiner-based approach is still potentially brittle and can fail to select the 32-bit shift variant when profitable to do so, as demonstrated in the pr40333.ll test case. This patch instead introduces target-specific SelectionDAG nodes for SHLW/SRLW/SRAW and custom-lowers variable i32 shifts to them. pr40333.ll is a good example of how this approach can improve codegen. This adds DAG combine that does SimplifyDemandedBits on the operands (only lower 32-bits of first operand and lower 5 bits of second operand are read). This seems better than implementing SimplifyDemandedBitsForTargetNode as there is no guarantee that would be called (and it's not for e.g. the anyext return test cases). Also implements ComputeNumSignBitsForTargetNode. There are codegen changes in atomic-rmw.ll and atomic-cmpxchg.ll but the new instruction sequences are semantically equivalent. Differential Revision: https://reviews.llvm.org/D57085 llvm-svn: 352169
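The demanded-bits claim above (only the lower 32 bits of the first operand and the lower 5 bits of the second are read) follows from the RV64 word-shift semantics. A minimal Python model is sketched below; the function names are illustrative and are not LLVM's, and register values are modelled as Python integers with results returned as signed values to represent the sign-extended 64-bit register.

```python
MASK32 = 0xFFFFFFFF


def sext32(value):
    """Sign-extend the low 32 bits of value to a signed Python int."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def sllw(rs1, rs2):
    """Logical left shift of the low 32 bits by the low 5 bits of rs2."""
    return sext32((rs1 & MASK32) << (rs2 & 0x1F))


def srlw(rs1, rs2):
    """Logical right shift of the low 32 bits by the low 5 bits of rs2."""
    return sext32((rs1 & MASK32) >> (rs2 & 0x1F))


def sraw(rs1, rs2):
    """Arithmetic right shift of the low 32 bits by the low 5 bits of rs2."""
    return sext32(sext32(rs1) >> (rs2 & 0x1F))
```

Since every function masks its inputs first, changing bits 32 to 63 of the first operand or bits 5 and above of the shift amount cannot change the result, which is what makes a SimplifyDemandedBits-style combine on the operands safe.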
* AMDGPU/GlobalISel: Remove leftover setActionMatt Arsenault2019-01-251-11/+8
| | | | | | Also move G_GEP actions together. llvm-svn: 352168
* AMDGPU/GlobalISel: Scalarize add/subMatt Arsenault2019-01-252-3/+2
| | | | llvm-svn: 352167
* GlobalISel: fewerElementsVector for more cast typesMatt Arsenault2019-01-252-3/+11
| | | | llvm-svn: 352166
* GlobalISel: fewerElementsVector for a few more trivial opsMatt Arsenault2019-01-252-5/+11
| | | | llvm-svn: 352165
* AMDGPU/GlobalISel: Legalize smulh/umulh and scalarize mulMatt Arsenault2019-01-253-1/+9
| | | | llvm-svn: 352162
* [HotColdSplit] Describe the pass in more detail, NFCVedant Kumar2019-01-251-5/+18
| | | | llvm-svn: 352161
* [HotColdSplit] Split more aggressively before/after cold invokesVedant Kumar2019-01-251-39/+55
| | | | | | | | | | | While a cold invoke itself and its unwind destination can't be extracted, code which unconditionally executes before/after the invoke may still be profitable to extract. With cost model changes from D57125 applied, this gives a 3.5% increase in split text across LNT+externals on arm64 at -Os. llvm-svn: 352160
* GlobalISel: Support fewerElementsVector for icmp/fcmpMatt Arsenault2019-01-252-9/+83
| | | | | | Also legalize 64-bit compares for AMDGPU llvm-svn: 352157
* GlobalISel: Implement fewerElementsVector for extensionsMatt Arsenault2019-01-252-2/+61
| | | | llvm-svn: 352155
* hwasan: If we split the entry block, move static allocas back into the entry block.Peter Collingbourne2019-01-251-0/+15
| | | | | | | | | | | | Otherwise they are treated as dynamic allocas, which ends up increasing code size significantly. This reduces size of Chromium base_unittests by 2MB (6.7%). Differential Revision: https://reviews.llvm.org/D57205 llvm-svn: 352152
* GlobalISel: Add convenience mutations to scalarizeMatt Arsenault2019-01-254-46/+26
| | | | llvm-svn: 352143
* [GlobalISel][AArch64] Avoid unused variable warning for variable only used in assertBenjamin Kramer2019-01-241-0/+1
| | | | | | llvm-svn: 352133
* [PowerPC] Exploit store instructions that store a single vector elementNemanja Ivanovic2019-01-241-2/+102
| | | | | | | | | | | This patch exploits the instructions that store a single element from a vector to perform a (store (extract_elt)). We already have code that does this with ISA 3.0 instructions that were added to handle i8/i16 types. However, we had never exploited the existing ones that handle f32/f64/i32/i64 types. Differential revision: https://reviews.llvm.org/D56175 llvm-svn: 352131
* RegBankSelect: Fix use after free in r352123Matt Arsenault2019-01-241-1/+1
| | | | llvm-svn: 352130
* [GlobalISel][AArch64] Avoid unused function warnings in Release buildsBenjamin Kramer2019-01-241-0/+2
| | | | llvm-svn: 352129
* [x86] move half-size shuffle mask creation to helper; NFCSanjay Patel2019-01-241-34/+60
| | | | | | | | As noted in D57156, we want to check at least part of this pattern earlier (in combining), so this will allow the code to be shared instead of duplicated. llvm-svn: 352127
* [GISel]: Change how CSE is enabled by default for each passAditya Nandakumar2019-01-243-6/+12
| | | | | | | | | | | | | | | https://reviews.llvm.org/D57178 Now add a hook in TargetPassConfig to query if CSE needs to be enabled. By default this hook returns false only for the O0 opt level, but this can be overridden by the target. As a consequence of CSE being enabled by default for non-O0, a few tests needed to be updated to not use CSE (by passing -O0 on the run line). reviewed by: arsenm llvm-svn: 352126
* Suppress unused capture warning in CheckCopyJessica Paquette2019-01-241-1/+1
| | | | | | | | | | | Werror bots didn't like the lambda + assert thing in my previous commit. Capture everything to suppress the error. Example failure here: http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/29393 llvm-svn: 352124
* RegBankSelect: Support some more complex part mappingsMatt Arsenault2019-01-245-25/+295
| | | | llvm-svn: 352123
* [PDB] Increase TPI hash bucket count.Zachary Turner2019-01-241-2/+2
| | | | | | | | | | | | | | | PDBs contain several serialized hash tables. In the microsoft-pdb repo published to support LLVM implementing PDB support, the provided code initializes the bucket count for the TPI and IPI streams to the maximum size. This occurs in tpi.cpp L33 and tpi.cpp L398. In the LLVM code for generating PDBs, these streams are created with the minimum number of buckets. This difference makes LLVM-generated PDBs slower when used for debugging. Patch by C.J. Hebert Differential Revision: https://reviews.llvm.org/D56942 llvm-svn: 352117
* [GlobalISel][AArch64] Add isel support for FP16 vector @llvm.ceilJessica Paquette2019-01-244-94/+484
| This patch adds support for vector @llvm.ceil intrinsics when full 16 bit floating point support isn't available. To do this, this patch...
| - Implements basic isel for G_UNMERGE_VALUES
| - Teaches the legalizer about 16 bit floats
| - Teaches AArch64RegisterBankInfo to respect floating point registers on G_BUILD_VECTOR and G_UNMERGE_VALUES
| - Teaches selectCopy about 16-bit floating point vectors
| It also adds
| - A legalizer test for the 16-bit vector ceil which verifies that we create a G_UNMERGE_VALUES and G_BUILD_VECTOR when full fp16 isn't supported
| - An instruction selection test which makes sure we lower to G_FCEIL when full fp16 is supported
| - A test for selecting G_UNMERGE_VALUES
| And also updates arm64-vfloatintrinsics.ll to show that the new ceiling types work as expected.
| https://reviews.llvm.org/D56682
| llvm-svn: 352113
* allow COFF .def directive in module assembly when using ThinLTOBob Haarman2019-01-241-0/+9
| | | | | | | | | | | | | | | | | | | | Summary: Using COFF's .def directive in module assembly used to crash ThinLTO with "this directive only supported on COFF targets" when getting symbol information in ModuleSymbolTable. This change allows ModuleSymbolTable to process such code and adds a test to verify that the .def directive has the desired effect on the native object file, with and without ThinLTO. Fixes https://bugs.llvm.org/show_bug.cgi?id=36789 Reviewers: rnk, pcc, vlad.tsyrklevich Subscribers: mehdi_amini, eraman, hiraditya, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D57073 llvm-svn: 352112
* [Analysis] Fix isSafeToLoadUnconditionally handling of volatile.Eli Friedman2019-01-241-0/+8
| | | | | | | | | A volatile operation cannot be used to prove an address points to normal memory. (LangRef was recently updated to state it explicitly.) Differential Revision: https://reviews.llvm.org/D57040 llvm-svn: 352109
* Limit dyld image suffixes guessed by guessLibraryShortName()Michael Trent2019-01-241-3/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: guessLibraryShortName() separates a full Mach-O dylib install name path into a short name and a dyld image suffix. The short name is the name of the dylib without its path or extension. The dyld image suffix is a string used by dyld to load variants of dylibs if available at runtime; for example, "when binding this process, load 'debug' variants of all required dylibs." dyld knows exactly what the image suffix is, but by convention diagnostic tools such as llvm-nm attempt to guess suffix names by looking at the install name path. These dyld image suffixes are separated from the short name by a '_' character. Because the '_' character is commonly used to separate words in filenames, guessLibraryShortName() cannot reliably separate a dylib's short name from an arbitrary image suffix; imagine if both the short name and the suffix contain an '_' character! To better deal with this ambiguity, guessLibraryShortName() will recognize only "_debug" and "_profile" as valid Suffix values. Calling code needs to be tolerant of guessLibraryShortName() guessing incorrectly. The previous implementation of guessLibraryShortName() did not allow '_' characters to appear in short names. When present, the short name would be truncated, e.g., "libcompiler_rt" => "libcompiler". This change allows "libcompiler_rt" and "libcompiler_rt_debug" to both be recognized as "libcompiler_rt". rdar://47412244 Reviewers: kledzik, lhames, pete Reviewed By: pete Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56978 llvm-svn: 352104
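The suffix rule described above can be sketched in Python. This is not the LLVM implementation: `split_dylib_name` is a hypothetical helper that only handles plain `lib*.dylib` install names (no frameworks), and returning the suffix with its leading underscore is an assumption made for illustration.

```python
import posixpath

# Only these two image suffixes are recognised, per the commit message.
KNOWN_SUFFIXES = ("_debug", "_profile")


def split_dylib_name(install_name):
    """Split an install name into (short_name, suffix).

    The suffix is an empty string when no recognised suffix is present.
    """
    base = posixpath.basename(install_name)
    # Drop the extension and any dotted version components.
    stem = base.split(".", 1)[0]
    for suffix in KNOWN_SUFFIXES:
        if stem.endswith(suffix) and len(stem) > len(suffix):
            return stem[: -len(suffix)], suffix
    # Any other '_' is treated as part of the short name.
    return stem, ""
```

With an allow-list of suffixes, underscores elsewhere in the name are left alone, so "libcompiler_rt" is no longer truncated to "libcompiler". The trade-off is that a dylib whose real short name ends in "_debug" is still guessed incorrectly, which is why callers need to tolerate wrong guesses.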
* Fix a compiler error introduced in r352093.Haojian Wu2019-01-241-1/+1
| | | | llvm-svn: 352098
* [LICM] Cleanup duplicated code. [NFCI]Alina Sbirlea2019-01-241-17/+11
| | | | llvm-svn: 352093
* [MemorySSA +LICM CFHoist] Solve PR40317.Alina Sbirlea2019-01-241-0/+5
| | | | | | | | | | | | | | | Summary: MemorySSA needs updating each time an instruction is moved. LICM and control flow hoisting re-hoists instructions, thus needing another update when re-moving those instructions. Pending cleanup: the MSSA update is duplicated, should be moved inside moveInstructionBefore. Reviewers: jnspaulsson Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits Differential Revision: https://reviews.llvm.org/D57176 llvm-svn: 352092
* [HotColdSplit] Move splitting earlier in the pipelineVedant Kumar2019-01-242-9/+18
| Performing splitting early has several advantages:
| - Inhibiting inlining of cold code early improves code size. Compared to scheduling splitting at the end of the pipeline, this cuts code size growth in half within the iOS shared cache (0.69% to 0.34%).
| - Inhibiting inlining of cold code improves compile time. There's no need to inline split cold functions, or to inline as much *within* those split functions as they are marked `minsize`.
| - During LTO, extra work is only done in the pre-link step. Less code must be inlined during cross-module inlining.
| An additional motivation here is that the most common cold regions identified by the static/conservative splitting heuristic can (a) be found before inlining and (b) do not grow after inlining. E.g. __assert_fail, os_log_error.
| The disadvantages are:
| - Some opportunities for splitting out cold code may be missed. This gap can potentially be narrowed by adding a worklist algorithm to the splitting pass.
| - Some opportunities to reduce code size may be lost (e.g. store sinking, when one side of the CFG diamond is split). This does not outweigh the code size benefits of splitting earlier.
| On net, splitting early in the pipeline has substantial code size benefits, and no major effects on memory locality or performance. We measured memory locality using ktrace data, and consistently found that 10% fewer pages were needed to capture 95% of text page faults in key iOS benchmarks. We measured performance on frequency-stabilized iOS devices using LNT+externals.
| This reverses course on the decision made to schedule splitting late in r344869 (D53437).
| Differential Revision: https://reviews.llvm.org/D57082
| llvm-svn: 352080
* [x86] rename VectorShuffle -> Shuffle; NFCSanjay Patel2019-01-241-706/+631
| | | | | | | This wasn't consistent within the file, so made it harder to search. Standardize on the shorter name to save some typing. llvm-svn: 352077
* Fix emission of _fltused for MSVC.James Y Knight2019-01-245-43/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | It should be emitted when any floating-point operations (including calls) are present in the object, not just when calls to printf/scanf with floating point args are made. The difference caused by this is very subtle: in static (/MT) builds, on x86-32, in a program that uses floating point but doesn't print it, the default x87 rounding mode may not be set properly upon initialization. This commit also removes the walk of the types pointed to by pointer arguments in calls. (To assist in opaque pointer types migration -- eventually the pointee type won't be available.) That latter implies that it will no longer consider a call like `scanf("%f", &floatvar)` as sufficient to emit _fltused on its own. And without _fltused, `scanf("%f")` will abort with error R6002. This new behavior is unlikely to bite anyone in practice (you'd have to read a float, and do nothing with it!), and also, is consistent with MSVC. Differential Revision: https://reviews.llvm.org/D56548 llvm-svn: 352076
* Revert "[Sanitizers] UBSan unreachable incompatible with ASan in the presence of `noreturn` calls"Julian Lettner2019-01-2410-19/+3
| | | | | | | | This reverts commit cea84ab93aeb079a358ab1c8aeba6d9140ef8b47. llvm-svn: 352069
* [SelectionDAGBuilder] Simplify HasSideEffect calculation. NFC.Nirav Dave2019-01-241-13/+7
| | | | llvm-svn: 352067
* [InlineAsm] Don't calculate registers for inline asm memory operands. NFCI.Nirav Dave2019-01-241-0/+4
| | | | llvm-svn: 352066
* [x86] add low/high undef half shuffle mask helpers; NFCSanjay Patel2019-01-241-8/+19
| | | | | | | | This is the most common usage for isUndefInRange, so make the code slightly less duplicated and more readable. llvm-svn: 352063
* [RS4GC] Be slightly less conservative for gep vector_base, scalar_idxPhilip Reames2019-01-241-11/+10
| | | | | | After submitting https://reviews.llvm.org/D57138, I realized it was slightly more conservative than needed. The scalar indices don't appear to be a problem on a vector gep; we even had a test for that. Differential Revision: https://reviews.llvm.org/D57161 llvm-svn: 352061
* [RS4GC] Avoid crashing on gep scalar_base, vector_idxPhilip Reames2019-01-241-0/+28
| | | | | | | | | | This is an alternative to https://reviews.llvm.org/D57103. After discussion, we decided to check this in as a temporary workaround, and pursue a true fix under the original thread. The issue at hand is that the base rewriting algorithm doesn't consider the fact that GEPs can turn a scalar input into a vector of outputs. We had handling for scalar GEPs and fully vector GEPs (i.e. all vector operands), but not the scalar-base + vector-index forms. A true fix here requires treating GEP analogously to extractelement or shufflevector. This patch is merely a workaround. It simply hides the crash at the cost of some ugly code gen for this presumably very rare pattern. Differential Revision: https://reviews.llvm.org/D57138 llvm-svn: 352059
* [TargetLowering] Rename getExpandedFixedPointMultiplication to expandFixedPointMul. NFCI.Simon Pilgrim2019-01-242-3/+2
| | | | | | | | Match the (much shorter) name used in various legalization methods. llvm-svn: 352056