path: root/llvm/lib/Target
Commit log, newest first. Each entry: commit message (author, date, files changed, lines -removed/+added).
...
* [WebAssembly] Update expectations for gcc torture tests (Sam Clegg, 2019-04-30, 1 file, -0/+12)
  This is needed to make the wasm waterfall green again after we land the update to WASI: https://github.com/WebAssembly/waterfall/pull/492
  Differential Revision: https://reviews.llvm.org/D61351
  llvm-svn: 359634
* [AMDGPU] gfx1010 VMEM and SMEM implementation (Stanislav Mekhanoshin, 2019-04-30, 16 files, -317/+1071)
  Differential Revision: https://reviews.llvm.org/D61330
  llvm-svn: 359621
* [X86][SSE] Fold extract_subvector(extend(x)) -> extend_vector_inreg(x) (Simon Pilgrim, 2019-04-30, 1 file, -5/+7)
  This adds any-extend support, folding to zero_extend_vector_inreg (PMOVZX) for legality.
  Minor improvement for PR39709.
  llvm-svn: 359608
* [WebAssembly] Support f16 libcalls (Dan Gohman, 2019-04-30, 2 files, -1/+23)
  Add support for f16 libcalls in WebAssembly. This entails adding signatures for the remaining F16 libcalls, and renaming gnu_f2h_ieee/gnu_h2f_ieee to truncsfhf2/extendhfsf2 for consistency between f32 and f64/f128 (compiler-rt already supports this).
  Differential Revision: https://reviews.llvm.org/D61287
  Reviewer: dschuff
  llvm-svn: 359600
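  A minimal sketch of source that exercises these libcalls (not from the patch; whether Clang's __fp16 extension is accepted when targeting WebAssembly is an assumption here):

    // On targets without native half-precision instructions, float <-> half
    // conversions are typically lowered to compiler-rt libcalls.
    __fp16 narrow(float f) { return (__fp16)f; }  // usually becomes a call to __truncsfhf2
    float widen(__fp16 h) { return (float)h; }    // usually becomes a call to __extendhfsf2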
* [X86] Remove if that's always true (Craig Topper, 2019-04-30, 1 file, -2/+1)
  It's been like this since it was added in a refactor of this code.
  Fixes PR41659.
  llvm-svn: 359597
* [X86] If PreprocessISelDAG reorders a load before a call, make sure we remove dead nodes from the graph (Craig Topper, 2019-04-30, 1 file, -0/+5)
  The reordering can leave at least a dead TokenFactor in the graph. This causes the linearize scheduler to fail with something like the assert seen in PR22614. This is only one of many ways we can break the linearize scheduler today, so I can't say for sure that any of the other failures in that bug were caused by this issue.

  This takes the heavy hammer approach of just running RemoveDeadNodes unconditionally at the end of PreprocessISelDAG. If this turns out to be a compile time hit, we can try to refine it.

  Differential Revision: https://reviews.llvm.org/D61164
  llvm-svn: 359582
* [X86] Initial cleanups on the FixupLEAs pass. Separate Atom LEA creation from other LEA optimizations. (Craig Topper, 2019-04-30, 1 file, -91/+75)
  This removes some of the class variables. Merge basic block processing into runOnMachineFunction to keep the flags local.

  Pass MachineBasicBlock around instead of an iterator. We can get the iterator in the few places that need it. Allows a range-based outer for loop.

  Separate the Atom optimization from the rest of the optimizations. This allows fixupIncDec to create INC/DEC and still allow Atom to turn it back into LEA when profitable by its heuristics.

  I'd like to improve fixupIncDec to turn LEAs into ADD any time the base or index register is equal to the destination register. This is profitable regardless of the various slow flags. But again we would want Atom to be able to undo that.

  Differential Revision: https://reviews.llvm.org/D60993
  llvm-svn: 359581
* [ARM] Implement TTI::getMemcpyCost (Sjoerd Meijer, 2019-04-30, 2 files, -0/+37)
  This implements the TargetTransformInfo method getMemcpyCost, which estimates the number of instructions to which a memcpy instruction expands.
  Differential Revision: https://reviews.llvm.org/D59787
  llvm-svn: 359547
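  To make the notion of "cost" concrete, a hedged illustration (not from the patch; the instruction count mentioned below is an assumption that depends on target and alignment):

    #include <cstring>

    // A small fixed-size memcpy like this is typically expanded inline into a
    // short sequence of loads and stores (for example, roughly two loads and
    // two stores on a 64-bit target) rather than a call to memcpy. That
    // expansion's instruction count is what TTI::getMemcpyCost estimates.
    void copy16(char *dst, const char *src) {
      std::memcpy(dst, src, 16);
    }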
* Fix for bug 41512: lower INSERT_VECTOR_ELT(ZeroVec, 0, Elt) to SCALAR_TO_VECTOR(Elt) for all SSE flavors (Simon Pilgrim, 2019-04-30, 1 file, -0/+8)
  Current LLVM uses pxor+pinsrb on SSE4+ for INSERT_VECTOR_ELT(ZeroVec, 0, Elt) instead of a much simpler movd. INSERT_VECTOR_ELT(ZeroVec, 0, Elt) is an idiomatic construct which is used e.g. for _mm_cvtsi32_si128(Elt) and for lowest element initialization in _mm_set_epi32. Such inefficient lowering leads to significant performance degradations in certain cases when switching from SSSE3 to SSE4. https://bugs.llvm.org/show_bug.cgi?id=41512

  Here INSERT_VECTOR_ELT(ZeroVec, 0, Elt) is simply converted to SCALAR_TO_VECTOR(Elt) when applicable, since the latter is a closer match to the desired behavior and is always efficiently lowered to movd and the like.

  Committed on behalf of @Serge_Preis (Serge Preis)

  Differential Revision: https://reviews.llvm.org/D60852
  llvm-svn: 359545
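  For reference, a small sketch of the idioms mentioned above (illustrative only; the single-MOVD claim is the intent of the fix rather than verified output, and the exact codegen depends on target and flags):

    #include <immintrin.h>

    // Both functions build a vector whose low 32-bit element is x and whose
    // remaining elements are zero. With this change they should lower to a
    // single MOVD on SSE4 targets as well, rather than PXOR + PINSRD.
    __m128i low_elt_a(int x) { return _mm_cvtsi32_si128(x); }
    __m128i low_elt_b(int x) { return _mm_set_epi32(0, 0, 0, x); }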
* [ARM GlobalISel] Widen small shift operands (Diana Picus, 2019-04-30, 1 file, -0/+1)
  The legalizer was already widening the shift amount. Add tests for that behaviour, and also support widening the shifted value.
  llvm-svn: 359542
* [AsmPrinter] Make AsmPrinter::HandlerInfo::Handler a unique_ptr (Fangrui Song, 2019-04-30, 1 file, -3/+2)
  Handlers.clear() in AsmPrinter::doFinalization() will destroy these handlers. A unique_ptr makes the ownership clearer.
  llvm-svn: 359541
* [ARM GlobalISel] Be more careful about bailing out (Diana Picus, 2019-04-30, 1 file, -2/+2)
  Bail out on function arguments/returns with types aggregating an unsupported type. This fixes cases where we would happily and incorrectly lower functions taking e.g. [1 x i64] parameters, when we don't even support plain i64 yet.
  llvm-svn: 359540
* [TargetLowering] Change getOptimalMemOpType to take a function attribute list (Sjoerd Meijer, 2019-04-30, 15 files, -48/+38)
  The MachineFunction wasn't used in getOptimalMemOpType, but more importantly, this allows reuse of findOptimalMemOpLowering, which calls getOptimalMemOpType.

  This is the groundwork for the changes in D59766 and D59787, which allow implementation of TTI::getMemcpyCost.

  Differential Revision: https://reviews.llvm.org/D59785
  llvm-svn: 359537
* [WebAssembly] Make an assertion message prettier. NFC. (Dan Gohman, 2019-04-29, 1 file, -2/+2)
  This is a follow-up to https://reviews.llvm.org/D59521.
  llvm-svn: 359509
* [WebAssembly] Define the signature for __stack_chk_fail (Dan Gohman, 2019-04-29, 1 file, -1/+9)
  The WebAssembly backend needs to know the signatures of all runtime libcall functions. This adds the signature for __stack_chk_fail, which was previously missing.

  Also, make the error message for a missing libcall include the name of the function.

  Differential Revision: https://reviews.llvm.org/D59521
  Reviewed By: sbc100
  llvm-svn: 359505
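  As background, a hedged sketch of how this libcall is reached (the function below is an arbitrary example, not taken from the patch): __stack_chk_fail is conventionally a no-argument, non-returning function invoked by the compiler-inserted canary check.

    #include <cstdio>

    // Built with -fstack-protector-strong (or -fstack-protector-all), this
    // function gets a stack canary; if the canary is clobbered before the
    // return, the generated epilogue calls __stack_chk_fail, whose usual C
    // signature is: void __stack_chk_fail(void);
    void greet(const char *name) {
      char buf[16];
      std::snprintf(buf, sizeof(buf), "hi %s", name);
      std::puts(buf);
    }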
* [PowerPC] Try harder to avoid load/move-to VSR for partial vector loads (Roland Froese, 2019-04-29, 1 file, -15/+36)
  Change the PPCISelLowering.cpp function that decides to avoid update form in favor of partial vector loads to know about newer load types and to not be confused by the chain operand.
  Differential Revision: https://reviews.llvm.org/D60102
  llvm-svn: 359504
* [GlobalISel][AArch64] Select llvm.aarch64.crypto.sha1h (Jessica Paquette, 2019-04-29, 1 file, -5/+68)
  This was falling back and gives us a reason to create a selectIntrinsic function, which we would need eventually anyway. Update arm64-crypto.ll to show that we correctly select it.

  Also factor out the code for finding an intrinsic ID.

  llvm-svn: 359501
* [X86] Run CFIInstrInserter on Windows if Dwarf is used (Martin Storsjo, 2019-04-29, 1 file, -1/+5)
  This is necessary since SVN r330706, as tail merging can include CFI instructions since then.

  This fixes PR40322 and PR40012.

  Differential Revision: https://reviews.llvm.org/D61252
  llvm-svn: 359496
* [X86][SSE] isHorizontalBinOp - add support for target shuffles (Simon Pilgrim, 2019-04-29, 1 file, -30/+49)
  Add target shuffle decoding to isHorizontalBinOp as well as ISD::VECTOR_SHUFFLE support.

  This does mean we can go through bitcasts, so we need to bitcast the extracted args to ensure they are the correct type.

  Fixes PR39936 and should help with PR39920/PR39921.

  Differential Revision: https://reviews.llvm.org/D61245
  llvm-svn: 359491
* [X86] scaleShuffleMask - avoid potential signed overflow warning. (Simon Pilgrim, 2019-04-29, 1 file, -3/+3)
  Use size_t assignment to prevent a bad explicit type conversion warning.

  Given the typical size of shuffle masks this was never going to happen, but this at least stops the warning.

  Reported in https://www.viva64.com/en/b/0629/
  llvm-svn: 359479
* Avoid "checking a pointer after dereferencing" warning. NFCI.Simon Pilgrim2019-04-291-1/+1
| | | | | | Reported in https://www.viva64.com/en/b/0629/ llvm-svn: 359473
* Move if() to newline to stop ambiguity over whether it should be else if. NFCI. (Simon Pilgrim, 2019-04-29, 1 file, -1/+2)
  Reported in https://www.viva64.com/en/b/0629/
  llvm-svn: 359472
* Fix operator precedence warning. NFCI. (Simon Pilgrim, 2019-04-29, 1 file, -1/+2)
  Reported in https://www.viva64.com/en/b/0629/
  llvm-svn: 359469
* Remove superfluous break from switch statement. NFCI. (Simon Pilgrim, 2019-04-29, 1 file, -1/+0)
  Reported in https://www.viva64.com/en/b/0629/
  llvm-svn: 359467
* [AArch64][SVE] Asm: add aliases for unpredicated bitwise logical instructions (Cullen Rhodes, 2019-04-29, 2 files, -4/+14)
  This patch adds aliases for element sizes .B/.H/.S to the AND/ORR/EOR/BIC bitwise logical instructions. The assembler now accepts these instructions with all element sizes up to 64-bit (.D). The preferred disassembly is .D.
  llvm-svn: 359457
* [ARM] Add bitcast/extract_subvec. of fp16 vectors (Diogo N. Sampaio, 2019-04-29, 1 file, -91/+144)
  Summary: This patch adds some basic operations for fp16 vectors, such as bitcast from fp16 to i16, required to perform extract_subvector (also added here) and extract_element.

  Reviewers: SjoerdMeijer, DavidSpickett, t.p.northover, ostannard
  Reviewed By: ostannard
  Subscribers: javed.absar, kristof.beyls, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60618
  llvm-svn: 359433
* [ARM] Add v4f16 and v8f16 types to the CallingConv (Diogo N. Sampaio, 2019-04-29, 1 file, -18/+18)
  Summary: The Procedure Call Standard for the Arm Architecture states that float16x4_t and float16x8_t behave just as uint16x4_t and uint16x8_t for argument passing. This patch adds the fp16 vectors to the ARMCallingConv.td file.

  Reviewers: miyuki, ostannard
  Reviewed By: ostannard
  Subscribers: ostannard, javed.absar, kristof.beyls, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60720
  llvm-svn: 359431
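  To illustrate the calling-convention point, a minimal sketch (assumes a Clang/GCC toolchain targeting ARM with FP16 vector support enabled; not part of the patch):

    #include <arm_neon.h>

    // With the CallingConv change described above, a float16x4_t argument and
    // return value are passed the same way as uint16x4_t, i.e. in a vector
    // register under the AAPCS, so this function simply forwards its input.
    float16x4_t forward(float16x4_t v) { return v; }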
* [X86] Remove some intel syntax aliases on (v)cvtpd2(u)dq, (v)cvtpd2ps, (v)cvt(u)qq2ps. Add 'x' and 'y' suffix aliases to masked version of the same in att syntax. (Craig Topper, 2019-04-29, 2 files, -42/+160)
  The 128/256-bit versions of these instructions require an 'x' or 'y' suffix to disambiguate the memory form in att syntax.

  We were allowing the same suffix in intel syntax, but it appears gas does not do that.

  gas does allow the 'x' and 'y' suffix on register and broadcast forms even though it's not needed. We were allowing it on the unmasked register form, but not on masked versions or on masked or unmasked broadcast forms.

  While there, fix some test coverage holes so they can be extended with the 'x' and 'y' suffix tests.

  llvm-svn: 359418
* [X86][SSE] combineExtractVectorElt - add early-out to return zero/undef for out-of-range extraction indices. (Simon Pilgrim, 2019-04-28, 1 file, -2/+4)
  llvm-svn: 359406
* [X86][AVX] Combine non-lane crossing binary shuffles using X86ISD::VPERMV3 (Simon Pilgrim, 2019-04-28, 1 file, -0/+22)
  Some of the combines might be further improved if we lower more shuffles with X86ISD::VPERMV3 directly, instead of waiting to combine the results.
  llvm-svn: 359400
* [X86][SSE] Optimize llvm.experimental.vector.reduce.xor.vXi1 parity reduction (PR38840) (Simon Pilgrim, 2019-04-28, 1 file, -2/+12)
  An xor reduction of a bool vector can be optimized to a parity check of the MOVMSK/BITCAST'd integer - if the population count is odd return 1, else return 0.
  Differential Revision: https://reviews.llvm.org/D61230
  llvm-svn: 359396
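  Roughly the scalar form the reduction is turned into, written with intrinsics as a hedged illustration (the backend works on the DAG, so this is an analogy rather than its literal output):

    #include <immintrin.h>

    // XOR-reducing a vector of bools asks whether an odd number of lanes are
    // set: gather the per-lane bits with MOVMSK, then check the parity of the
    // resulting integer mask.
    bool xor_reduce_parity(__m128i a, __m128i b) {
      __m128i eq = _mm_cmpeq_epi8(a, b);        // bool vector as a byte mask
      int mask = _mm_movemask_epi8(eq);         // one bit per byte lane
      return (__builtin_popcount(mask) & 1) != 0;
    }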
* [X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead (Craig Topper, 2019-04-28, 3 files, -70/+22)
  Summary: The register forms of these instructions are CodeGenOnly instructions that cover GR32->FR32 and GR64->FR64 bitcasts. There is a similar set of instructions for the opposite bitcast. Due to the patterns using bitcasts these instructions get marked as "bitcast" machine instructions as well. The peephole pass is able to look through these as well as other copies to try to avoid register bank copies.

  Because FR32/FR64/VR128 are all coalescable to each other we can end up in a situation where a GR32->FR32->VR128->FR64->GR64 sequence can be reduced to GR32->GR64, which the copyPhysReg code can't handle.

  To prevent this, this patch removes one set of the 'bitcast' instructions. So now we can only go GR32->VR128->FR32 or GR64->VR128->FR64. The instruction that converts from GR32/GR64->VR128 has no special significance to the peephole pass and won't be looked through.

  I guess the other option would be to add support to copyPhysReg to just promote the GR32->GR64 to a GR64->GR64 copy. The upper bits were basically undefined anyway. But removing the CodeGenOnly instruction in favor of one that won't be optimized seemed safer.

  I deleted the peephole test because it couldn't be made to work with the bitcast instructions removed.

  The load versions of the instructions were unnecessary, as the pattern that selects them contains a bitcasted load which should never happen.

  Fixes PR41619.

  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61223
  llvm-svn: 359392
* Revert rL359389: [X86][SSE] Add support for <64 x i1> bool reduction (Simon Pilgrim, 2019-04-27, 1 file, -12/+10)
  Minor generalization of the existing <32 x i1> pre-AVX2 split code.
  ........
  Causing irregular buildbot failures.
  llvm-svn: 359391
* [X86][SSE] Add support for <64 x i1> bool reduction (Simon Pilgrim, 2019-04-27, 1 file, -10/+12)
  Minor generalization of the existing <32 x i1> pre-AVX2 split code.
  llvm-svn: 359389
* [X86][AVX512] Improve vector bool reductions (Simon Pilgrim, 2019-04-27, 1 file, -11/+22)
  As predicate masks are legal on AVX512 targets, we avoid MOVMSK in these cases, but we can just bitcast the bool vector to the integer equivalent directly - avoiding expansion of the reduction to a shuffle pattern.
  llvm-svn: 359386
* [X86][AVX] Merge mask select with shuffles across extract_subvector (PR40332) (Simon Pilgrim, 2019-04-27, 1 file, -4/+44)
  Fixes PR40332 in the limited case where we're selecting between a target shuffle and a zero vector.

  We can extend this in the future to handle more opcodes and non-zero selections.

  llvm-svn: 359378
* [X86] Use MOVQ for i64 atomic_stores when SSE2 is enabled (Craig Topper, 2019-04-27, 5 files, -19/+72)
  Summary: If we have SSE2 we can use a MOVQ to store 64 bits and avoid falling back to a cmpxchg8b loop. If it's a seq_cst store we need to insert an mfence after the store.

  Reviewers: spatel, RKSimon, reames, jfb, efriedma
  Reviewed By: RKSimon
  Subscribers: hiraditya, dexonsmith, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60546
  llvm-svn: 359368
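  The source-level pattern this change targets, as a hedged sketch (the MOVQ/MFENCE sequence is what the commit describes for 32-bit x86 with SSE2; the actual instructions chosen are up to the backend):

    #include <atomic>
    #include <cstdint>

    std::atomic<int64_t> g_counter;

    // On 32-bit x86 with SSE2, this 64-bit seq_cst store can now be emitted as
    // a single MOVQ from an XMM register followed by MFENCE, instead of a
    // CMPXCHG8B loop.
    void publish(int64_t v) {
      g_counter.store(v, std::memory_order_seq_cst);
    }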
* Revert "AMDGPU: Split block for si_end_cf"Mark Searles2019-04-275-128/+17
| | | | | | | | | | This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4. We discovered some internal test failures, so reverting for now. Differential Revision: https://reviews.llvm.org/D61213 llvm-svn: 359363
* [AMDGPU] gfx1010 VOPC implementation (Stanislav Mekhanoshin, 2019-04-26, 8 files, -361/+696)
  Differential Revision: https://reviews.llvm.org/D61208
  llvm-svn: 359358
* [GlobalISel][AArch64] Use getConstantVRegValWithLookThrough for extracts (Jessica Paquette, 2019-04-26, 1 file, -38/+6)
  getConstantVRegValWithLookThrough does the same thing as the getConstantValueForReg function, and has more visibility across GISel. Plus, it supports looking through G_TRUNC, G_SEXT, and G_ZEXT. So, we get better code reuse and more functionality for free by using it.

  Add some test cases to select-extract-vector-elt.mir to show that we can now look through those instructions.

  llvm-svn: 359351
* [AsmPrinter] refactor to support %c w/ GlobalAddress' (Nick Desaulniers, 2019-04-26, 17 files, -78/+75)
  Summary: Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when printing the address of a MachineOperand::MO_GlobalAddress. Move that handling into a new overridden method in each base class. A virtual method was added to the base class for handling the generic case.

  Refactors a few subclasses to support the target independent %a, %c, and %n.

  The patch also contains small cleanups for AVRAsmPrinter and SystemZAsmPrinter.

  It seems that NVPTXTargetLowering is possibly missing some logic to transform GlobalAddressSDNodes for TargetLowering::LowerAsmOperandForConstraint to handle "i" extended inline assembly constraints.

  Fixes:
  - https://bugs.llvm.org/show_bug.cgi?id=41402
  - https://github.com/ClangBuiltLinux/linux/issues/449

  Reviewers: echristo, void
  Reviewed By: void
  Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60887
  llvm-svn: 359337
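  A hypothetical example of the inline-asm construct this refactor supports (the section name and symbol are made up; GCC-style inline asm and an x86-64 ELF, non-PIC build are assumed, since the "i" constraint on a global's address may not be satisfiable under PIC):

    static int marker;

    // The %c operand modifier prints a constant operand - here the address of
    // a global - without the usual immediate-operand punctuation, which lets
    // inline asm emit that address as data.
    void record() {
      asm volatile(".pushsection .discard.markers, \"aw\"\n\t"
                   ".quad %c0\n\t"
                   ".popsection"
                   :
                   : "i"(&marker));
    }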
* [X86][AVX] Fold extract_subvector(broadcast(x)) -> broadcast(x) iff x has one use (Simon Pilgrim, 2019-04-26, 1 file, -0/+7)
  llvm-svn: 359332
* [AArch64][GlobalISel] Select G_BSWAP for vectors of s32 and s64 (Jessica Paquette, 2019-04-26, 2 files, -1/+38)
  There are instructions for these, so mark them as legal. Select the correct instruction in AArch64InstructionSelector.cpp.

  Update select-bswap.mir and arm64-rev.ll to reflect the changes.

  llvm-svn: 359331
* [AMDGPU] gfx1010 VOP3 and VOP3P implementation (Stanislav Mekhanoshin, 2019-04-26, 4 files, -102/+281)
  Differential Revision: https://reviews.llvm.org/D61202
  llvm-svn: 359328
* [X86] Sink NoRegister creation for unused Base/Index registers into getAddressOperands. NFCI (Craig Topper, 2019-04-26, 1 file, -36/+26)
  llvm-svn: 359318
* [X86] Segment registers should have i16 type not i32. (Craig Topper, 2019-04-26, 1 file, -1/+1)
  Probably doesn't really matter, but was inconsistent with the rest of the code.
  llvm-svn: 359317
* [AMDGPU] gfx1010 VOP2 changes (Stanislav Mekhanoshin, 2019-04-26, 6 files, -154/+605)
  Differential Revision: https://reviews.llvm.org/D61156
  llvm-svn: 359316
* [PowerPC] Update P9 vector costs for insert/extract element (Roland Froese, 2019-04-26, 1 file, -0/+29)
  The PPC vector cost model values for insert/extract element reflect older processors that lacked vector insert/extract and move-to/move-from VSR instructions. Update getVectorInstrCost to give appropriate values for when the newer instructions are present.
  Differential Revision: https://reviews.llvm.org/D60160
  llvm-svn: 359313
* Fix Wparentheses warning. NFCI. (Simon Pilgrim, 2019-04-26, 1 file, -5/+5)
  llvm-svn: 359299
* [X86][SSE] Pull out OR(EXTRACTELT(X,0),OR(EXTRACTELT(X,1),...)) matching code from LowerVectorAllZeroTest (Simon Pilgrim, 2019-04-26, 1 file, -50/+59)
  Create a matchBitOpReduction helper that checks for the pattern with any opcode.

  First step towards reusing this code to recognize other scalar reduction patterns.

  llvm-svn: 359296