summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* Revert 354564: [ARM] Add some missing thumb1 opcodes to enable peephole ↵David Green2019-02-211-54/+12
| | | | | | | | | optimisation of CMPs I believe it's causing bootstrap failures for A32 code. I'll take a look at what's wrong. llvm-svn: 354569
* [AArch64] Print instruction before atomic semantic annotationsDavid Spickett2019-02-211-5/+6
| | | | | | | | | | | | | | | | | Commit r353303 added annotations when acquire semantics were dropped from an instruction. printAnnotation was called before printInstruction. So if you didn't set a separate comment output stream you got <comment><instr> instead of <instr><comment> as expected. To fix this move the new printAnnotation to after the instruction is printed. Differential Revision: https://reviews.llvm.org/D58059 llvm-svn: 354565
* [ARM] Add some missing thumb1 opcodes to enable peephole optimisation of CMPsDavid Green2019-02-211-12/+54
| | | | | | | | | This adds a number of missing Thumb1 opcodes so that the peephole optimiser can remove redundant CMP instructions. Differential Revision: https://reviews.llvm.org/D57833 llvm-svn: 354564
* [ARM] Negative constants mishandled in ARM CGPSam Parker2019-02-211-5/+5
| | | | | | | | | | | | | During type promotion, sometimes we convert negative an add with a negative constant into a sub with a positive constant. The loop that performs this transformation has two issues: - it iterates over a set, causing non-determinism. - it breaks, instead of continuing, when it finds the first non-negative operand. Differential Revision: https://reviews.llvm.org/D58452 llvm-svn: 354557
* [WebAssembly] Default to something reasonable in WebAssemblyAddMissingPrototypesSam Clegg2019-02-211-7/+13
| | | | | | | | | | | | | | | | | | Previously if we couldn't derive a prototype for a "no-prototype" function from C we would leave it as is: void foo(...) With this change we instead give is an empty signature and remove the "no-prototype" attribute. This fixes the current wasm waterfall test failure. Tags: #llvm Differential Revision: https://reviews.llvm.org/D58488 llvm-svn: 354544
* [AMDGPU] fix commuted case of sub combineStanislav Mekhanoshin2019-02-211-5/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D58481 llvm-svn: 354543
* Revert "[AArch64][GlobalISel] Implement partial support for G_SHUFFLE_VECTOR"Amara Emerson2019-02-212-153/+0
| | | | | | This reverts r354521 because it broke the bots, but passes on Darwin somehow. llvm-svn: 354532
* [WebAssembly] Don't error on conflicting uses of prototype-less functionsSam Clegg2019-02-201-6/+8
| | | | | | | | | | | | | | | | When we can't determine with certainty the signature of a function import we pick the fist signature we find rather than error'ing out. The resulting program might not do what is expected since we might pick the wrong signature. However since undefined behavior in C to use the same function with different signatures this seems better than refusing to compile such programs. Fixes PR40472 Differential Revision: https://reviews.llvm.org/D58304 llvm-svn: 354523
* [AArch64][GlobalISel] Implement partial support for G_SHUFFLE_VECTORAmara Emerson2019-02-202-0/+153
| | | | | | | | | | | | | | This change makes some basic type combinations for G_SHUFFLE_VECTOR legal, and implements them with a very pessimistic TBL2 instruction in the selector. For TBL2, support is also needed to generate constant pool entries and load from them in order to materialize the mask register. Currently supports <2 x s64> and <4 x s32> result types. Differential Revision: https://reviews.llvm.org/D58466 llvm-svn: 354521
* AMDGPU/GlobalISel: Move SMRD selection logic to TableGenTom Stellard2019-02-204-128/+136
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: volkan, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52922 llvm-svn: 354516
* [SDAG] Support vector UMULO/SMULONikita Popov2019-02-201-0/+2
| | | | | | | | | | | | | | | Second part of https://bugs.llvm.org/show_bug.cgi?id=40442. This adds an extra UnrollVectorOverflowOp() method to SDAG, because the general UnrollOverflowOp() method can't deal with multiple results. Additionally we need to expand UMULO/SMULO during vector op legalization, as it may result in unrolling, which may need additional type legalization. Differential Revision: https://reviews.llvm.org/D57997 llvm-svn: 354513
* [X86] Add more load folding patterns for blend instructions as a follow up ↵Craig Topper2019-02-201-6/+66
| | | | | | | | | | | | | | to r354363. This avoids depending on the peephole pass to do load folding. Also adds some load folding for some insert_subvector patterns that use blend. All of this was found by temporarily adding TB_NO_FORWARD to the blend immediate entries in the load folding tables. I've added -disable-peephole to some of the affected tests from that experiment to ensure we're testing isel patterns. llvm-svn: 354511
* [X86][SSE] combineX86ShufflesRecursively - begin generalizing the number of ↵Simon Pilgrim2019-02-201-11/+7
| | | | | | | | shuffle inputs. NFCI. We currently bail if the target shuffle decodes to more than 2 input vectors, this is some initial cleanup that still has the limit but generalizes the opindices to an array that will be necessary when we drop the limit. llvm-svn: 354489
* GlobalISel: Fix fewerElementsVector for ctlz with different result typeMatt Arsenault2019-02-201-2/+2
| | | | | | Also complete the set of related operations. llvm-svn: 354480
* GlobalISel: Implement moreElementsVector for g_insert resultsMatt Arsenault2019-02-201-14/+24
| | | | llvm-svn: 354477
* [Hexagon] Split vector pairs for ISD::SIGN_EXTEND and ISD::ZERO_EXTENDKrzysztof Parzyszek2019-02-202-0/+7
| | | | llvm-svn: 354473
* [MIPS MSA] Avoid some DAG combines for vector shiftsPetar Avramovic2019-02-202-0/+9
| | | | | | | | | | DAG combiner combines two shifts into shift + and with bitmask. Avoid such combines for vectors since leaving two vector shifts as they are produces better end results. Differential Revision: https://reviews.llvm.org/D58225 llvm-svn: 354461
* [X86] Remove FeatureSlowIncDec from Sandy Bridge and later Intel Core CPUsCraig Topper2019-02-201-1/+0
| | | | | | | | | | | | | | | | | | | Summary: Inc and Dec were at one point slow on Intel CPUs due to their tendency to cause partial flag stalls on P6 derived CPU cores. This is because these instructions are defined to preserve the carry flag. This partial flag stall issue persisted until Sandy Bridge when flag merging was changed to be handled as a data dependency instead of as a stall until retirement. Sandy Bridge and later CPUs rename the C flag separately from OSPAZ so there is no flag merge needed on INC/DEC to preserve the C flag. Given these improvements I don't know why INC/DEC was ever considered slow on Sandy Bridge. If anything they should have been disabled on the earlier CPUs instead. Note after this patch, INC/DEC are still considered slow on Silvermont, Goldmont, Knights Landing and our generic "x86-64" CPU. Reviewers: spatel, RKSimon, chandlerc Reviewed By: chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D58412 llvm-svn: 354436
* Temporarily Revert "[X86][SLP] Enable SLP vectorization for 128-bit ↵Eric Christopher2019-02-202-8/+0
| | | | | | | | | | | | | | | | horizontal X86 instructions (add, sub)" As this has broken the lto bootstrap build for 3 days and is showing a significant regression on the Dither_benchmark results (from the LLVM benchmark suite) -- specifically, on the BENCHMARK_FLOYD_DITHER_128, BENCHMARK_FLOYD_DITHER_256, and BENCHMARK_FLOYD_DITHER_512; the others are unchanged. These have regressed by about 28% on Skylake, 34% on Haswell, and over 40% on Sandybridge. This reverts commit r353923. llvm-svn: 354434
* [RISCV] Implement pseudo instructions for load/store from a symbol address.Kito Cheng2019-02-205-14/+137
| | | | | | | | | | | | | Summary: Those pseudo-instructions are making load/store instructions able to load/store from/to a symbol, and its always using PC-relative addressing to generating a symbol address. Reviewers: asb, apazos, rogfer01, jrtc27 Differential Revision: https://reviews.llvm.org/D50496 llvm-svn: 354430
* [PowerPC] exploit P9 instruction maddld.Chen Zheng2019-02-202-4/+12
| | | | | | Differential Revision: https://reviews.llvm.org/D58364 llvm-svn: 354427
* [WebAssembly] Refactor atomic operation definitions (NFC)Heejin Ahn2019-02-201-205/+226
| | | | | | | | | | | | | | | | | | | Summary: - Make `ATOMIC_I`, `ATOMIC_NRI`, `AtomicLoad`, `AtomicStore` classes and make other operations inherit from them - Factor the common opcode prefix '0xfe' out from the opcodes into the common class - Reorder instructions in the order of increasing opcodes Reviewers: tlively Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58338 llvm-svn: 354421
* [WebAssembly] Fix load/store name detection for atomic instructionsHeejin Ahn2019-02-202-7/+6
| | | | | | | | | | | | | | | | | | | | Summary: Fixed a bug in the routine in AsmParser that determines whether the current instruction is a load or a store. Atomic instructions' prefixes are not `atomic_` but `atomic.`, and all atomic instructions are also memory instructions. Also fixed the printing format of atomic instructions to match other memory instructions and added encoding tests for atomic instructions. Reviewers: aardappel, tlively Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58337 llvm-svn: 354419
* [WebAssembly] Fixed disassembler not knowing about OPERAND_EVENTWouter van Oortmerssen2019-02-201-0/+1
| | | | | | | | | | | | Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58414 llvm-svn: 354416
* [WebAssembly] Update MC for bulk memoryThomas Lively2019-02-191-0/+1
| | | | | | | | | | | | | | | | | | Summary: Rename MemoryIndex to InitFlags and implement logic for determining data segment layout in ObjectYAML and MC. Also adds a "passive" flag for the .section assembler directive although this cannot be assembled yet because the assembler does not support data sections. Reviewers: sbc100, aardappel, aheejin, dschuff Subscribers: jgravelle-google, hiraditya, sunfish, rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57938 llvm-svn: 354397
* [X86] Mark FP32_TO_INT16_IN_MEM/FP32_TO_INT32_IN_MEM/FP32_TO_INT64_IN_MEM as ↵Craig Topper2019-02-191-1/+3
| | | | | | | | | | | | | | clobbering EFLAGS to prevent mis-scheduling during conversion from SelectionDAG to MIR. After r354178, these instruction expand to a sequence that uses an OR instruction. That OR clobbers EFLAGS so we need to state that to avoid accidentally using the clobbered flags. Our tests show the bug, but I didn't notice because the SETcc instructions didn't move after r354178 since it used to be safe to do the fp->int conversion first. We should probably convert this whole sequence to SelectionDAG instead of a custom inserter to avoid mistakes like this. Fixes PR40779 llvm-svn: 354395
* PowerPC: Fix typos in commentsJinsong Ji2019-02-191-2/+2
| | | | llvm-svn: 354382
* [X86] Don't consider functions ABI compatible for ArgumentPromotion pass if ↵Craig Topper2019-02-192-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | they view 512-bit vectors differently. The use of the -mprefer-vector-width=256 command line option mixed with functions using vector intrinsics can create situations where one function thinks 512 vectors are legal, but another fucntion does not. If a 512 bit vector is passed between them via a pointer, its possible ArgumentPromotion might try to pass by value instead. This will result in type legalization for the two functions handling the 512 bit vector differently leading to runtime failures. Had the 512 bit vector been passed by value from clang codegen, both functions would have been tagged with a min-legal-vector-width=512 function attribute. That would make them be legalized the same way. I observed this issue in 32-bit mode where a union containing a 512 bit vector was being passed by a function that used intrinsics to one that did not. The caller ended up passing in zmm0 and the callee tried to read it from ymm0 and ymm1. The fix implemented here is just to consider it a mismatch if two functions would handle 512 bit differently without looking at the types that are being considered. This is the easist and safest fix, but it can be improved in the future. Differential Revision: https://reviews.llvm.org/D58390 llvm-svn: 354376
* [X86][SSE] Generalize X86ISD::BLENDI support to more value typesSimon Pilgrim2019-02-192-60/+63
| | | | | | | | | | | | | | | | | | D42042 introduced the ability for the ExecutionDomainFixPass to more easily change between BLENDPD/BLENDPS/PBLENDW as the domains required. With this ability, we can avoid most bitcasts/scaling in the DAG that was occurring with X86ISD::BLENDI lowering/combining, blend with the vXi32/vXi64 vectors directly and use isel patterns to lower to the float vector equivalent vectors. This helps the shuffle combining and SimplifyDemandedVectorElts be more aggressive as we lose track of fewer UNDEF elements than when we go up/down through bitcasts. I've introduced a basic blend(bitcast(x),bitcast(y)) -> bitcast(blend(x,y)) fold, there are more generalizations I can do there (e.g. widening/scaling and handling the tricky v16i16 repeated mask case). The vector-reduce-smin/smax regressions will be fixed in a future improvement to SimplifyDemandedBits to peek through bitcasts and support X86ISD::BLENDV. Reapplied after reversion at rL353699 - AVX2 isel fix was applied at rL354358, additional test at rL354360/rL354361 Differential Revision: https://reviews.llvm.org/D57888 llvm-svn: 354363
* [X86][AVX2] Hide VPBLENDD instructions behind AVX2 predicateSimon Pilgrim2019-02-191-0/+2
| | | | | | This was the cause of the regression in D57888 - the commuted load pattern wasn't hidden by the predicate so once we enabled v4i32 blends on SSE41+ targets then isel was incorrectly matched against AVX2+ instructions. llvm-svn: 354358
* [X86] Bugfix for nullptr check by klocworkCraig Topper2019-02-192-4/+5
| | | | | | | | | | klocwork critical issues in CG files: Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D58363 llvm-svn: 354357
* X86AsmParser AVX-512: Return error instead of hitting assertCraig Topper2019-02-191-0/+2
| | | | | | | | | | When parsing a sequence of tokens beginning with {, it will hit an assert and crash if the token afterwards is not an identifier. Instead of this, return a more verbose error as seen elsewhere in the function. Patch by Brandon Jones (BrandonTJones) Differential Revision: https://reviews.llvm.org/D57375 llvm-svn: 354356
* [X86] Filter out tuning feature flags and a few ISA feature flags when ↵Craig Topper2019-02-192-4/+57
| | | | | | | | | | | | | | | | checking for function inline compatibility. Tuning flags don't have any effect on the available instructions so aren't a good reason to prevent inlining. There are also some ISA flags that don't have any intrinsics our ABI requirements that we can exclude. I've put only the most basic ones like cmpxchg16b and lahfsahf. These are interesting because they aren't present in all 64-bit CPUs, but we have codegen workarounds when they aren't present. Loosening these checks can help with scenarios where a caller has a more specific CPU than a callee. The default tuning flags on our generic 'x86-64' CPU can currently make it inline compatible with other CPUs. I've also added an example test for 'nocona' and 'prescott' where 'nocona' is just a 64-bit capable version of 'prescott' but in 32-bit mode they should be completely compatible. I've based the implementation here of the similar code in AMDGPU. Differential Revision: https://reviews.llvm.org/D58371 llvm-svn: 354355
* GlobalISel: Implement moreElementsVector for selectMatt Arsenault2019-02-191-18/+9
| | | | llvm-svn: 354354
* GlobalISel: Implement moreElementsVector for G_EXTRACT sourceMatt Arsenault2019-02-191-0/+1
| | | | llvm-svn: 354348
* [X86][AVX] Update VBROADCAST folds to always use v2i64 X86vzloadSimon Pilgrim2019-02-192-3/+3
| | | | | | | | The VBROADCAST combines and SimplifyDemandedVectorElts improvements mean that we now more consistently use shorter (128-bit) X86vzload input operands. Follow up to D58053 llvm-svn: 354346
* GlobalISel: Implement moreElementsVector for bit opsMatt Arsenault2019-02-191-0/+20
| | | | llvm-svn: 354345
* Cast from SDValue directly instead of superfluous getNode(). NFCI.Simon Pilgrim2019-02-191-2/+2
| | | | llvm-svn: 354343
* [X86][AVX] EltsFromConsecutiveLoads - Add BROADCAST lowering supportSimon Pilgrim2019-02-191-3/+70
| | | | | | | | | | This patch adds scalar/subvector BROADCAST handling to EltsFromConsecutiveLoads. It mainly shows codegen changes to 32-bit code which failed to handle i64 loads, although 64-bit code is also using this new path to more efficiently combine to a broadcast load. Differential Revision: https://reviews.llvm.org/D58053 llvm-svn: 354340
* [RISCV][NFC] Move some std::string to StringRefAlex Bradbury2019-02-193-5/+5
| | | | llvm-svn: 354333
* [ARM GlobalISel] Support G_PHI for Thumb2Diana Picus2019-02-191-5/+5
| | | | | | Same as arm mode. llvm-svn: 354310
* [X86] Remove command line strings from the ProcIntel* features.Craig Topper2019-02-191-10/+10
| | | | | | These should always follow the CPU string. There's no reason to control them independently. llvm-svn: 354304
* [GlobalISel][AArch64] Legalize + select some llvm.ctlz.* intrinsicsJessica Paquette2019-02-181-0/+4
| | | | | | | | | | | | Legalize/select llvm.ctlz.* Add select-ctlz to show that we actually select them. Update arm64-clrsb.ll and arm64-vclz.ll to show that we perform valid transformations in optimized builds, and document where GISel can improve. Differential Revision: https://reviews.llvm.org/D58155 llvm-svn: 354299
* [CGP] form usub with overflow from sub+icmpSanjay Patel2019-02-182-0/+12
| | | | | | | | | | | | | | | | | The motivating x86 cases for forming the intrinsic are shown in PR31754 and PR40487: https://bugs.llvm.org/show_bug.cgi?id=31754 https://bugs.llvm.org/show_bug.cgi?id=40487 ..and those are shown in the IR test file and x86 codegen file. Matching the usubo pattern is harder than uaddo because we have 2 independent values rather than a def-use. This adds a TLI hook that should preserve the existing behavior for uaddo formation, but disables usubo formation by default. Only x86 overrides that setting for now although other targets will likely benefit by forming usbuo too. Differential Revision: https://reviews.llvm.org/D57789 llvm-svn: 354298
* AMDGPU: Use MachineInstr::mayAlias to replace ↵Changpeng Fang2019-02-182-21/+8
| | | | | | | | | | | | | | | areMemAccessesTriviallyDisjoint in LoadStoreOptimizer pass. Summary: This is to fix a memory dependence bug in LoadStoreOptimizer. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D58295 llvm-svn: 354295
* GlobalISel: Implement widenScalar for g_extract scalar resultsMatt Arsenault2019-02-181-2/+3
| | | | llvm-svn: 354293
* [x86] split more v8f32/v8i32 shuffles in loweringSanjay Patel2019-02-181-4/+5
| | | | | | | | | | | | | | Similar to D57867 - this is a small patch with lots of test diffs. With half-vector-width narrowing potential, using an extract + 128-bit vshufps is a win because it replaces a 256-bit shuffle with a 128-bit shufle. This seems like it should be a win even for targets with 'fast-variable-shuffle', but we are intentionally deferring that to an independent change to make sure that is true. Differential Revision: https://reviews.llvm.org/D58181 llvm-svn: 354279
* [X86] In FP_TO_INTHelper, when moving data from SSE register to X87 register ↵Craig Topper2019-02-171-13/+9
| | | | | | | | | | file via the stack, use the same stack slot we use for the integer conversion. No need for a separate stack slot. The lifetimes don't overlap. Also fix the MachinePointerInfo for the final load after the integer conversion to indicate it came from the stack slot. llvm-svn: 354234
* [X86] When type legalizing the result of a i64 fp_to_uint on 32-bit targets. ↵Craig Topper2019-02-161-27/+10
| | | | | | | | | | Generate all of the ops as i64 and let them be legalized. No need to manually split everything. We can let the type legalizer work for us. The test change seems to be caused by some DAG ordering issue that was previously circumventing a one use check in LowerSELECT where FP selects are turned into blends if the setcc has one use. But it was running after an integer select and the same setcc had been legalized to cmov and X86SISD::CMP. This dropped the use count of the setcc, but wasn't what was intended. llvm-svn: 354197
* [X86] Don't prevent load folding for cvtsi2ss/cvtsi2sd based on ↵Craig Topper2019-02-161-39/+48
| | | | | | | | | | hasPartialRegUpdate. Preventing the load fold won't fix the partial register update since the input we can fold is a GPR. So it will do nothing to prevent a false dependency on an XMM register. llvm-svn: 354193
OpenPOWER on IntegriCloud