path: root/llvm/lib
Commit log (newest first); each entry lists the commit message, author, date, number of files changed, and lines removed/added.
...
* [X86] Remove and autoupgrade vpconflict intrinsics that take a mask and passthru argument (Craig Topper, 2019-01-26, 2 files, -12/+16)
  We have unmasked versions as of r352172.
  llvm-svn: 352270
* Revert r352255 "[SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer" (Craig Topper, 2019-01-26, 2 files, -16/+4)
  This might be breaking an lldb windows buildbot.
  llvm-svn: 352268
* [X86] Remove GCCBuiltins from 512-bit cvt(u)qqtops, cvt(u)qqtopd, and cvt(u)dqtops intrinsics. Add new variadic uitofp/sitofp with rounding mode intrinsics. (Craig Topper, 2019-01-26, 2 files, -39/+34)
  Summary: See clang patch D56998 for a full description.
  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D56999
  llvm-svn: 352266
* [WebAssembly][NFC] Group SIMD-related ISel configuration (Thomas Lively, 2019-01-26, 1 file, -59/+45)
  Reviewers: aheejin
  Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish
  Differential Revision: https://reviews.llvm.org/D57263
  llvm-svn: 352262
* [PowerPC] Update Vector Costs for P9 (Nemanja Ivanovic, 2019-01-26, 5 files, -12/+59)
  For the power9 CPU, vector operations consume a pair of execution units rather than one execution unit like a scalar operation. Update the target transform cost functions to reflect the higher cost of vector operations when targeting Power9.
  Patch by RolandF.
  Differential revision: https://reviews.llvm.org/D55461
  llvm-svn: 352261
* [X86] Add DAG combine to merge vzext_movl with the various fp<->int conversion operations that only write the lower 64-bits of an xmm register and zero the rest (Craig Topper, 2019-01-26, 3 files, -84/+26)
  Summary: We have isel patterns for this, but we're missing some load patterns and all broadcast patterns. A DAG combine seems like a better fit for this.
  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D56971
  llvm-svn: 352260
* [SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer (Craig Topper, 2019-01-26, 2 files, -4/+16)
  Summary: I'm not sure why we were using SEXTLOAD. EXTLOAD seems more appropriate since we don't care about the upper bits.
  This patch changes this and then modifies the X86 post-legalization combine to emit an extending shuffle instead of a sign_extend_vector_inreg. Could maybe use an any_extend_vector_inreg, but I just did what we already do in LowerLoad. I think we can actually get rid of this code entirely if we switch to -x86-experimental-vector-widening-legalization.
  On AVX512 targets I think we might be able to use a masked vpmovzx and not have to expand this at all.
  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D57186
  llvm-svn: 352255
* [NFC] Test commit: fix typo. (Alexey Lapshin, 2019-01-25, 1 file, -1/+1)
  llvm-svn: 352248
* [RISCV] Add target DAG combine for bitcast fabs/fneg on RV32FD (Alex Bradbury, 2019-01-25, 1 file, -3/+28)
  DAGCombiner::visitBITCAST will perform:
    fold (bitconvert (fneg x)) -> (xor (bitconvert x), signbit)
    fold (bitconvert (fabs x)) -> (and (bitconvert x), (not signbit))
  As shown in double-bitmanip-dagcombines.ll, this can be advantageous. But RV32FD doesn't use bitcast directly (as i64 isn't a legal type), and instead uses RISCVISD::SplitF64. This patch adds an equivalent DAG combine for SplitF64.
  llvm-svn: 352247
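  For readers unfamiliar with these folds, they rest on the IEEE-754 layout of double: negation flips only the sign bit and fabs clears it. A minimal standalone C++ sketch of the underlying bit manipulation (not taken from the commit):

  ```
  #include <cassert>
  #include <cmath>
  #include <cstdint>
  #include <cstring>

  int main() {
    const uint64_t SignBit = 1ULL << 63;
    double x = 3.5;

    uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);      // the "bitconvert"

    // fneg(x) == bitcast(bitcast(x) ^ signbit)
    uint64_t negBits = bits ^ SignBit;
    double negX;
    std::memcpy(&negX, &negBits, sizeof negX);
    assert(negX == -x);

    // fabs(x) == bitcast(bitcast(x) & ~signbit)
    uint64_t absBits = negBits & ~SignBit;
    double absX;
    std::memcpy(&absX, &absBits, sizeof absX);
    assert(absX == std::fabs(-x));
    return 0;
  }
  ```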
* [llvm] Opt-in flag for X86DiscriminateMemOps (Mircea Trofin, 2019-01-25, 2 files, -1/+13)
  Summary: Currently, if an instruction with a memory operand has no debug information, X86DiscriminateMemOps will generate one based on the first line of the enclosing function, or the last seen debug info. This may cause confusion in certain debugging scenarios. The long term approach would be to use the line number '0' in such cases, however, that brings in challenges: the base discriminator value range is limited (4096 values). For the short term, adding an opt-in flag for this feature.
  See bug 40319 (https://bugs.llvm.org/show_bug.cgi?id=40319)
  Reviewers: dblaikie, jmorse, gbedwell
  Reviewed By: dblaikie
  Subscribers: aprantl, eraman, hiraditya
  Differential Revision: https://reviews.llvm.org/D57257
  llvm-svn: 352246
* [GlobalISel][AArch64][NFC] Fix incorrect comment in selectUnmergeValues (Jessica Paquette, 2019-01-25, 1 file, -1/+1)
  s/scalar/vector/
  llvm-svn: 352243
* Revert rL352238. (Alina Sbirlea, 2019-01-25, 1 file, -2/+2)
  llvm-svn: 352241
* [WarnMissedTransforms] Set default to 1. (Alina Sbirlea, 2019-01-25, 1 file, -2/+2)
  Summary: Set default value for retrieved attributes to 1, since the check is against 1. Eliminates the warning noise generated when the attributes are not present.
  Reviewers: sanjoy
  Subscribers: jlebar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D57253
  llvm-svn: 352238
* Reapply: [RISCV] Set isAsCheapAsAMove for ADDI, ORI, XORI, LUI (Ana Pazos, 2019-01-25, 3 files, -3/+18)
  This reapplies commit r352010 with RISC-V test fixes.
  llvm-svn: 352237
* [MBP] Don't move bottom block before header if it can't reduce taken branches (Guozhi Wei, 2019-01-25, 1 file, -0/+38)
  If the bottom block BB has only one successor OldTop, in most cases it is profitable to move it before OldTop, except in the following case:

      -->OldTop<-
      |    .    |
      |    .    |
      |    .    |
      ---Pred   |
                |
                |
          BB-----

  Moving BB before OldTop can't reduce the number of taken branches; this patch detects this case and prevents the move.
  Differential Revision: https://reviews.llvm.org/D57067
  llvm-svn: 352236
* [X86] Combine masked store and truncate into masked truncating stores. (Craig Topper, 2019-01-25, 1 file, -5/+13)
  We also need to combine to masked truncating-with-saturation stores, but I'm leaving that for a future patch.
  This does regress some tests that used truncate with saturation followed by a masked store. Those now use a truncating store and use min/max to saturate.
  Differential Revision: https://reviews.llvm.org/D57218
  llvm-svn: 352230
* [HotColdSplit] Introduce a cost model to control splitting behavior (Vedant Kumar, 2019-01-25, 1 file, -36/+91)
  The main goal of the model is to avoid *increasing* function size, as that would eradicate any memory locality benefits from splitting. This happens when:
  - There are too many inputs or outputs to the cold region. Argument materialization and reloads of outputs have a cost.
  - The cold region has too many distinct exit blocks, causing a large switch to be formed in the caller.
  - The code size cost of the split code is less than the cost of a set-up call.
  A secondary goal is to prevent excessive overall binary size growth.

  With the cost model in place, I experimented to find a splitting threshold that works well in practice. To make warm & cold code easily separable for analysis purposes, I moved split functions to a "cold" section. I experimented with thresholds between [0, 4] and set the default to the threshold which minimized geomean __text size.

  Experiment data from building LNT+externals for X86 (N = 639 programs, all sizes in bytes):

  | Configuration | __text geom size | __cold geom size | TEXT geom size |
  | **-Os**       | 1736.3           | 0, n=0           | 10961.6        |
  | -Os, thresh=0 | 1740.53          | 124.482, n=134   | 11014          |
  | -Os, thresh=1 | 1734.79          | 57.8781, n=90    | 10978.6        |
  | -Os, thresh=2 | ** 1733.85 **    | 65.6604, n=61    | 10977.6        |
  | -Os, thresh=3 | 1733.85          | 65.3071, n=61    | 10977.6        |
  | -Os, thresh=4 | 1735.08          | 67.5156, n=54    | 10965.7        |
  | **-Oz**       | 1554.4           | 0, n=0           | 10153          |
  | -Oz, thresh=2 | ** 1552.2 **     | 65.633, n=61     | 10176          |
  | **-O3**       | 2563.37          | 0, n=0           | 13105.4        |
  | -O3, thresh=2 | ** 2559.49 **    | 71.1072, n=61    | 13162.4        |

  Picking thresh=2 reduces the geomean __text section size by 0.14% at -Os, -Oz, and -O3 and causes ~0.2% growth in the TEXT segment. Note that TEXT size is page-aligned, whereas section sizes are byte-aligned.

  Experiment data from building LNT+externals for ARM64 (N = 558 programs, all sizes in bytes):

  | Configuration | __text geom size | __cold geom size | TEXT geom size |
  | **-Os**       | 1763.96          | 0, n=0           | 42934.9        |
  | -Os, thresh=2 | ** 1760.9 **     | 76.6755, n=61    | 42934.9        |

  Picking thresh=2 reduces the geomean __text section size by 0.17% at -Os and causes no growth in the TEXT segment.

  Measurements were done with D57082 (r352080) applied.
  Differential Revision: https://reviews.llvm.org/D57125
  llvm-svn: 352228
* [MC] Teach the MachO object writer about N_FUNC_COLD (Vedant Kumar, 2019-01-25, 5 files, -0/+15)
  N_FUNC_COLD is a new MachO symbol attribute. It's a hint to the linker to order a symbol towards the end of its section, to improve locality.

  Example:

  ```
  void a1() {}
  __attribute__((cold)) void a2() {}
  void a3() {}
  int main() {
    a1();
    a2();
    a3();
    return 0;
  }
  ```

  A linker that supports N_FUNC_COLD will order _a2 to the end of the text section. From `nm -njU` output, we see:

  ```
  _a1
  _a3
  _main
  _a2
  ```

  Differential Revision: https://reviews.llvm.org/D57190
  llvm-svn: 352227
* [x86] simplify logic in lowerShuffleWithUndefHalf(); NFCI (Sanjay Patel, 2019-01-25, 1 file, -7/+9)
  This seems unnecessarily complicated because we gave names to opposite-polarity bools and have code comments that don't really line up with the logic.
  Step 1: remove UndefUpper and assert that it is the opposite of UndefLower after the initial early exit.
  llvm-svn: 352217
* [DiagnosticInfo] Add support for preserving newlines in remark arguments. (Florian Hahn, 2019-01-25, 1 file, -1/+23)
  This patch adds a new type StringBlockVal which can be used to emit a YAML block scalar, which preserves newlines in a multiline string. It also updates MappingTraits<DiagnosticInfoOptimizationBase::Argument> to use it for argument values with more than a single newline.
  This is helpful for remarks that want to display more in-depth information in a more structured way.
  Reviewers: thegameg, anemet
  Reviewed By: anemet
  Subscribers: hfinkel, hiraditya, llvm-commits
  Differential Revision: https://reviews.llvm.org/D57159
  llvm-svn: 352216
* [TEST][COMMIT] - fix comment typo in AsmPrinter/DwarfDebug.cpp - NFC (Tom Weaver, 2019-01-25, 1 file, -1/+1)
  llvm-svn: 352214
* [X86] Simplify X86ISD::ADD/SUB if we don't use the result flag (Simon Pilgrim, 2019-01-25, 1 file, -0/+18)
  Simplify to the generic ISD::ADD/SUB if we don't make use of the result flag.
  This mainly helps with ADDCARRY/SUBBORROW intrinsics which get expanded to X86ISD::ADD/SUB but could be simplified further.
  Noticed in some of the test cases in PR31754
  Differential Revision: https://reviews.llvm.org/D57234
  llvm-svn: 352210
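  To make the "result flag unused" case concrete, here is a small hypothetical C++ example (not from the patch) where the carry-out of _addcarry_u64 is discarded, so only the plain 64-bit sum is live and the node can degrade to a generic ADD:

  ```
  // Requires an x86-64 target; <immintrin.h> provides _addcarry_u64.
  #include <immintrin.h>
  #include <cstdio>

  unsigned long long add_ignore_carry(unsigned long long a, unsigned long long b) {
    unsigned long long sum;
    // The returned carry flag is deliberately ignored, so the only value
    // that is live afterwards is the 64-bit sum itself.
    (void)_addcarry_u64(0, a, b, &sum);
    return sum;
  }

  int main() {
    std::printf("%llu\n", add_ignore_carry(40, 2)); // prints 42
    return 0;
  }
  ```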
* [x86] narrow a shuffle that doesn't use or set any high elements (Sanjay Patel, 2019-01-25, 1 file, -2/+47)
  This isn't the final fix for our reduction/horizontal codegen, but it takes care of a lot of the problems. After we narrow the shuffle, existing combines for insert/extract and binops kick in, and we end up with cheaper 128-bit ops.
  The avg and mul reduction tests show an existing shuffle lowering hole for AVX2/AVX512. I think in its most minimal form this is: https://bugs.llvm.org/show_bug.cgi?id=40434
  ...but we might need multiple fixes to get it right.
  Differential Revision: https://reviews.llvm.org/D57156
  llvm-svn: 352209
* [JSON] Work around excess-precision issue when comparing T_Integer numbers. (Sam McCall, 2019-01-25, 1 file, -0/+6)
  Reviewers: bkramer
  Subscribers: kristina, llvm-commits
  Differential Revision: https://reviews.llvm.org/D57237
  llvm-svn: 352204
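  As background for why such a workaround is needed at all (a generic illustration, not the library's actual code): 64-bit integers above 2^53 are not exactly representable as double, so distinct values can compare equal once converted to floating point.

  ```
  #include <cstdint>
  #include <iostream>

  int main() {
    int64_t a = (int64_t{1} << 53) + 1;  // 9007199254740993
    int64_t b = (int64_t{1} << 53);      // 9007199254740992

    // As 64-bit integers the values differ...
    std::cout << (a == b) << '\n';  // prints 0
    // ...but after conversion to double they collapse to the same value,
    // so a floating-point comparison can no longer tell them apart.
    std::cout << (static_cast<double>(a) == static_cast<double>(b)) << '\n';  // prints 1
    return 0;
  }
  ```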
* Fix gcc -Wparentheses warning. NFCI. (Simon Pilgrim, 2019-01-25, 1 file, -4/+4)
  llvm-svn: 352193
* Fix gcc -Wparentheses warning. NFCI. (Simon Pilgrim, 2019-01-25, 1 file, -2/+2)
  llvm-svn: 352191
* [ARM GlobalISel] Support shifts for Thumb2 (Diana Picus, 2019-01-25, 1 file, -4/+4)
  Same as ARM.
  On this occasion we split some of the instruction select tests for more complicated instructions into their own files, so we can reuse them for ARM and Thumb mode. Likewise for the legalizer tests.
  llvm-svn: 352188
* [ARM GlobalISel] Remove rebase artifact from r351882. NFC (Diana Picus, 2019-01-25, 1 file, -3/+0)
  r351882 introduced some superfluous calls to mark G_INTTOPTR and G_PTRTOINT as legal (looks like a rebase mishap). Remove them.
  llvm-svn: 352187
* [TblGen] Extend !if semantics through new feature !cond (Javed Absar, 2019-01-25, 5 files, -1/+224)
  This patch extends the TableGen language with the !cond operator. Instead of embedding !if inside !if, which can get cumbersome, one can now use !cond. Below is an example to convert an integer 'x' into a string:

    !cond(!lt(x,0) : "Negative",
          !eq(x,0) : "Zero",
          !eq(x,1) : "One",
          1        : "MoreThanOne")

  Reviewed By: hfinkel, simon_tatham, greened
  Differential Revision: https://reviews.llvm.org/D55758
  llvm-svn: 352185
* [MSP430] Fix absolute addressing mode printing in AsmPrinter (Anton Korobeynikov, 2019-01-25, 1 file, -2/+2)
  Align checks for absolute addressing mode with its current implementation (SR is used as a base register).
  This fixes https://bugs.llvm.org/show_bug.cgi?id=39993
  Patch by Kristina Bessonova!
  Differential Revision: https://reviews.llvm.org/D56785
  llvm-svn: 352178
* [PowerPC] Enhance the fast selection of cmp instructions and clean up related asserts (Zi Xuan Wu, 2019-01-25, 1 file, -3/+12)
  Fast selection of llvm icmp and fcmp instructions does not handle VSX instruction support well. We should use a VSX float comparison instruction instead of a non-VSX float comparison instruction if the operand register class is VSSRC or VSFRC, because f32 and f64 are mapped to VSSRC and VSFRC respectively when the VSX feature is enabled. If the target does not have a corresponding VSX comparison instruction for some type, just copy the VSX-related register to the common float register class and use a non-VSX comparison instruction.
  Differential Revision: https://reviews.llvm.org/D57078
  llvm-svn: 352174
* [X86] Add non-masked versions of vpconflict intrinsics so we can use a select in the header file in clang. (Craig Topper, 2019-01-25, 1 file, -0/+6)
  I'll remove and autoupgrade the old intrinsics in a future commit.
  llvm-svn: 352172
* [RISCV] Custom-legalise i32 SDIV/UDIV/UREM on RV64M (Alex Bradbury, 2019-01-25, 3 files, -49/+55)
  Follow the same custom legalisation strategy as used in D57085 for variable-length shifts (see that patch summary for more discussion). Although we may lose out on some late-stage DAG combines, I think this custom legalisation strategy is ultimately easier to reason about.
  There are some codegen changes in rv64m-exhaustive-w-insts.ll but they are all neutral in terms of the number of instructions.
  Differential Revision: https://reviews.llvm.org/D57096
  llvm-svn: 352171
* [LoopSimplifyCFG] Fix inconsistency in blocks in loop markup (Max Kazantsev, 2019-01-25, 1 file, -1/+1)
  2nd part of D57095, with the same reason, just in another place. We never fold branches that are not immediately in the current loop, but this check is missing in `IsEdgeLive`. As a result, it may consider an edge in a subloop dead while it is live. In its current state this is only a pessimization.
  Differential Revision: https://reviews.llvm.org/D57147
  Reviewed By: rupprecht
  llvm-svn: 352170
* [RISCV] Custom-legalise 32-bit variable shifts on RV64 (Alex Bradbury, 2019-01-25, 3 files, -67/+105)
  The previous DAG combiner-based approach had an issue with infinite loops between the target-dependent and target-independent combiner logic (see PR40333). Although this was worked around in rL351806, the combiner-based approach is still potentially brittle and can fail to select the 32-bit shift variant when profitable to do so, as demonstrated in the pr40333.ll test case.
  This patch instead introduces target-specific SelectionDAG nodes for SHLW/SRLW/SRAW and custom-lowers variable i32 shifts to them. pr40333.ll is a good example of how this approach can improve codegen.
  This adds a DAG combine that does SimplifyDemandedBits on the operands (only the lower 32 bits of the first operand and the lower 5 bits of the second operand are read). This seems better than implementing SimplifyDemandedBitsForTargetNode as there is no guarantee that would be called (and it's not for e.g. the anyext return test cases). Also implements ComputeNumSignBitsForTargetNode.
  There are codegen changes in atomic-rmw.ll and atomic-cmpxchg.ll but the new instruction sequences are semantically equivalent.
  Differential Revision: https://reviews.llvm.org/D57085
  llvm-svn: 352169
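  For context, here is a hypothetical C++ function of the kind this legalisation targets (not taken from the patch): a variable shift on a 32-bit value compiled for a 64-bit RISC-V target, where the W-form shift instructions (sllw/srlw/sraw) read only the low 5 bits of the shift amount.

  ```
  #include <cstdint>
  #include <cstdio>

  // Variable 32-bit shifts; the shift amount is masked to the low 5 bits,
  // which matches the sllw/srlw/sraw semantics and avoids undefined
  // behaviour in C++ for amounts >= 32.
  uint32_t shl32(uint32_t x, unsigned amt)  { return x << (amt & 31); }
  uint32_t lshr32(uint32_t x, unsigned amt) { return x >> (amt & 31); }
  int32_t  ashr32(int32_t x, unsigned amt)  { return x >> (amt & 31); }

  int main() {
    std::printf("%u %u %d\n", shl32(1, 4), lshr32(256, 4), ashr32(-256, 4));
    return 0;
  }
  ```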
* AMDGPU/GlobalISel: Remove leftover setAction (Matt Arsenault, 2019-01-25, 1 file, -11/+8)
  Also move G_GEP actions together.
  llvm-svn: 352168
* AMDGPU/GlobalISel: Scalarize add/sub (Matt Arsenault, 2019-01-25, 2 files, -3/+2)
  llvm-svn: 352167
* GlobalISel: fewerElementsVector for more cast types (Matt Arsenault, 2019-01-25, 2 files, -3/+11)
  llvm-svn: 352166
* GlobalISel: fewerElementsVector for a few more trivial ops (Matt Arsenault, 2019-01-25, 2 files, -5/+11)
  llvm-svn: 352165
* AMDGPU/GlobalISel: Legalize smulh/umulh and scalarize mul (Matt Arsenault, 2019-01-25, 3 files, -1/+9)
  llvm-svn: 352162
* [HotColdSplit] Describe the pass in more detail, NFC (Vedant Kumar, 2019-01-25, 1 file, -5/+18)
  llvm-svn: 352161
* [HotColdSplit] Split more aggressively before/after cold invokes (Vedant Kumar, 2019-01-25, 1 file, -39/+55)
  While a cold invoke itself and its unwind destination can't be extracted, code which unconditionally executes before/after the invoke may still be profitable to extract.
  With cost model changes from D57125 applied, this gives a 3.5% increase in split text across LNT+externals on arm64 at -Os.
  llvm-svn: 352160
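  To illustrate the shape of source this targets (a hypothetical example, not from the patch): the call to the cold, potentially-throwing helper below becomes an invoke because a local with a destructor is live across it, and the unconditionally-executed cold code around that invoke is what the splitter can still pull out.

  ```
  #include <stdexcept>
  #include <string>

  // Hypothetical cold helper that may throw.
  __attribute__((cold)) void report_failure(const std::string &why) {
    throw std::runtime_error(why);
  }

  int process(int value) {
    if (value >= 0)
      return value * 2;  // hot path

    // Cold path: msg must be destroyed on unwind, so the call below is an
    // invoke. The straight-line code before and after it is the kind of
    // region that can still be extracted even though the invoke cannot.
    std::string msg = "negative input: " + std::to_string(value);
    report_failure(msg);
    return -1;  // not reached; keeps the example well-formed
  }
  ```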
* GlobalISel: Support fewerElementsVector for icmp/fcmp (Matt Arsenault, 2019-01-25, 2 files, -9/+83)
  Also legalize 64-bit compares for AMDGPU.
  llvm-svn: 352157
* GlobalISel: Implement fewerElementsVector for extensions (Matt Arsenault, 2019-01-25, 2 files, -2/+61)
  llvm-svn: 352155
* hwasan: If we split the entry block, move static allocas back into the entry block. (Peter Collingbourne, 2019-01-25, 1 file, -0/+15)
  Otherwise they are treated as dynamic allocas, which ends up increasing code size significantly. This reduces size of Chromium base_unittests by 2MB (6.7%).
  Differential Revision: https://reviews.llvm.org/D57205
  llvm-svn: 352152
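  For readers unfamiliar with the distinction (a generic illustration, not code from the patch): a fixed-size local becomes a "static" alloca, which LLVM can fold into the function prologue as a fixed stack slot when it sits in the entry block, whereas a runtime-sized allocation is a "dynamic" alloca that must adjust the stack pointer at run time.

  ```
  #include <alloca.h>   // alloca(); non-standard, but available on common Unix toolchains
  #include <cstddef>
  #include <cstring>

  void use(char *p, std::size_t n) { if (n) p[0] = 1; }  // keeps the buffers observable

  void static_alloca_example() {
    char buf[64];                    // fixed size: a static alloca in LLVM IR
    std::memset(buf, 0, sizeof buf);
    use(buf, sizeof buf);
  }

  void dynamic_alloca_example(std::size_t n) {
    char *buf = static_cast<char *>(alloca(n));  // runtime size: a dynamic alloca
    std::memset(buf, 0, n);
    use(buf, n);
  }
  ```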
* GlobalISel: Add convenience mutations to scalarize (Matt Arsenault, 2019-01-25, 4 files, -46/+26)
  llvm-svn: 352143
* [GlobalISel][AArch64] Avoid unused variable warning for variable only used in assert (Benjamin Kramer, 2019-01-24, 1 file, -0/+1)
  llvm-svn: 352133
* [PowerPC] Exploit store instructions that store a single vector element (Nemanja Ivanovic, 2019-01-24, 1 file, -2/+102)
  This patch exploits the instructions that store a single element from a vector to perform a (store (extract_elt)). We already have code that does this with ISA 3.0 instructions that were added to handle i8/i16 types. However, we had never exploited the existing ones that handle f32/f64/i32/i64 types.
  Differential revision: https://reviews.llvm.org/D56175
  llvm-svn: 352131
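  A small hypothetical C++ example of the (store (extract_elt)) pattern the commit refers to, written with the GCC/Clang vector extension (not code from the patch): storing one lane of a vector value to memory.

  ```
  // 128-bit vector of four 32-bit floats via the GCC/Clang vector extension.
  typedef float v4f32 __attribute__((vector_size(16)));

  // Storing a single lane produces a store of an extracted vector element
  // in the SelectionDAG, the pattern a target can match to a
  // single-element vector store instruction.
  void store_lane2(v4f32 v, float *dst) {
    *dst = v[2];
  }
  ```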
* RegBankSelect: Fix use after free in r352123 (Matt Arsenault, 2019-01-24, 1 file, -1/+1)
  llvm-svn: 352130
* [GlobalISel][AArch64] Avoid unused function warnings in Release builds (Benjamin Kramer, 2019-01-24, 1 file, -0/+2)
  llvm-svn: 352129