path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [X86][SSE] SimplifyDemandedBits - call PEXTRB/PEXTRW ↵Simon Pilgrim2019-05-111-1/+6
| | | | | | | | | | SimplifyDemandedVectorElts as well. See if we can simplify the demanded vector elts from the extraction before trying to simplify the demanded bits. This helps us with target shuffles and hops in particular. llvm-svn: 360535
* [CostModel][X86] Add min/max reduction costs for all SSE targetsSimon Pilgrim2019-05-111-6/+90
| | | | | | | | The original costs stopped at SSE42, I've added conservative estimates for everything down to SSE1/SSE2 and moved some of the SSE42 costs to SSE41 (really only the addition of PCMPGT makes any difference). I've also added missing vXi8 costs (we use PHMINPOSUW for i8/i16 for scarily quick results) and 256-bit vector costs for AVX1. llvm-svn: 360528
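A minimal scalar sketch of why PHMINPOSUW helps with 16-bit min/max reductions. It assumes the usual bias tricks (bit inversion for unsigned max, sign-bit flip for signed min); the helper names `hminU16`, `hmaxU16`, `hminS16`, and `demoMinMax` are hypothetical and are not taken from the cost model or the X86 backend. The i8 case and the index half of the instruction's result are not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of PHMINPOSUW: minimum of eight unsigned 16-bit lanes.
// (The real instruction also reports the index of the minimum; omitted here.)
inline std::uint16_t hminU16(const std::array<std::uint16_t, 8> &V) {
  std::uint16_t Min = V[0];
  for (std::uint16_t X : V)
    if (X < Min)
      Min = X;
  return Min;
}

// Unsigned max: inverting every bit reverses the unsigned order,
// so max(V) == ~min(~V).
inline std::uint16_t hmaxU16(std::array<std::uint16_t, 8> V) {
  for (std::uint16_t &X : V)
    X = static_cast<std::uint16_t>(~X);
  return static_cast<std::uint16_t>(~hminU16(V));
}

// Signed min: flipping the sign bit maps signed order onto unsigned order.
inline std::int16_t hminS16(const std::array<std::int16_t, 8> &V) {
  std::array<std::uint16_t, 8> U{};
  for (std::size_t I = 0; I < 8; ++I)
    U[I] = static_cast<std::uint16_t>(static_cast<std::uint16_t>(V[I]) ^ 0x8000u);
  return static_cast<std::int16_t>(
      static_cast<std::uint16_t>(hminU16(U) ^ 0x8000u));
}

// Small usage check of the three reductions.
inline void demoMinMax() {
  const std::array<std::uint16_t, 8> V{5, 3, 9, 7, 3, 8, 6, 4};
  assert(hminU16(V) == 3);
  assert(hmaxU16(V) == 9);
}
```

Each variant costs one reduction instruction plus at most two cheap XORs in this model, which is consistent with the "scarily quick" remark in the commit message.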
* [X86][SSE] Add SimplifyDemandedVectorElts HADD/HSUB handling.Simon Pilgrim2019-05-111-0/+45
| | | | | | Still missing PHADDW/PHSUBW tests because PEXTRW doesn't call SimplifyDemandedVectorElts llvm-svn: 360526
* FixupLEAPass::fixupIncDec - non-LEA opcodes should not happen here. NFCI.Simon Pilgrim2019-05-111-0/+2
| | | | | | Matches what we do in other functions and fixes scan-build warning about uninitialized NewOpcode variable. llvm-svn: 360525
* [X86] Add CMOV_FR32X/CMOV_FR64X pseudo instructions. Use them in fast isel ↵Craig Topper2019-05-113-4/+14
| | | | | | | | to fix a machine verifier error after adding test cases. Fast isel picks the FR32X/FR64X register classes when lowering pseudo select, but it didn't have the right opcode to go with it. llvm-svn: 360524
* [X86] Sink some fast isel code into the only if that uses it. NFCCraig Topper2019-05-111-13/+13
| | | | llvm-svn: 360523
* [X86] Use TLI.getRegClassFor to simplify some more fast isel code. NFCICraig Topper2019-05-111-16/+7
| | | | llvm-svn: 360522
* HexagonConstEvaluator::evaluateHexExt - check incoming opcodes. NFCI.Simon Pilgrim2019-05-111-0/+2
| | | | | | Only certain extension opcodes are supported - fixes scan build warning. llvm-svn: 360520
* [X86] Use getRegClassFor to simplify some code in fast isel. NFCICraig Topper2019-05-111-40/+18
| | | | | | | No need to select the register class based on type and features. It should already be setup by X86ISelLowering. llvm-svn: 360513
* [X86] Don't emit MOVNTDQA loads from fast-isel without SSE4.1.Craig Topper2019-05-111-1/+1
| | | | | | | | We were checking for SSE4.1 for FP types, but not integer 128-bit types. Fixes PR41837. llvm-svn: 360512
* [X86] Add a test case for idempotent atomic operations with speculative load ↵Craig Topper2019-05-111-1/+3
| | | | | | | | hardening. Fix an additional issue found by the test. This test covers the fix from r360475 as well. llvm-svn: 360511
* [SystemZ] Move InstPrinter files to MCTargetDesc. NFCRichard Trieu2019-05-1112-36/+11
| | | | | | | | | For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360510
* [Sparc] Move InstPrinter files to MCTargetDesc. NFCRichard Trieu2019-05-1111-34/+9
| | | | | | | | | For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360506
* [RISCV] Move InstPrinter files to MCTargetDesc. NFCRichard Trieu2019-05-1110-33/+8
| | | | | | | | | For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360505
* [PowerPC] Move InstPrinter files to MCTargetDesc. NFCRichard Trieu2019-05-1110-34/+9
| | | | | | | | | For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360502
* [NVPTX] Move InstPrinter files to MCTargetDesc. NFCRichard Trieu2019-05-1110-34/+9
| | | | | | | | | For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360500
* [MSP430] Move InstPrinter files to MCTargetDesc. NFCRichard Trieu2019-05-1110-33/+8
| | | | | | | | | For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360498
* [Mips] Move InstPrinter files to MCTargetDesc. NFCRichard Trieu2019-05-1113-37/+11
| | | | | | | | | For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360497
* [Lanai] Move InstPrinter files to MCTargetDesc. NFCRichard Trieu2019-05-1110-35/+9
| | | | | | | | | For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360496
* [BPF] Move InstPrinter files to MCTargetDesc. NFCRichard Trieu2019-05-1110-34/+8
| | | | | | | | | For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360494
* [AVR] Move InstPrinter files to MCTargetDesc. NFCRichard Trieu2019-05-1110-36/+6
| | | | | | | | | For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360493
* [ARM] Move InstPrinter files to MCTargetDesc. NFCRichard Trieu2019-05-1112-36/+11
| | | | | | | | | For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360490
* [ARC] Move InstPrinter files to MCTargetDesc. NFCRichard Trieu2019-05-1110-31/+5
| | | | | | | | | For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360488
* [AMDGPU] Move InstPrinter files to MCTargetDesc. NFCRichard Trieu2019-05-1111-37/+11
| | | | | | | | | For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360487
* [AArch64] Move InstPrinter files to MCTargetDesc. NFCRichard Trieu2019-05-1010-39/+9
| | | | | | | | | For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360486
* [XCore] Move InstPrinter files to MCTargetDesc. NFCRichard Trieu2019-05-1010-34/+8
| | | | | | | | | For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360485
* [X86] Move InstPrinter files to MCTargetDesc. NFCRichard Trieu2019-05-1022-58/+33
| | | | | | | | | For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360484
* Factor out redzone ABI checks [NFCI]Philip Reames2019-05-104-5/+18
| | | | | | | | | | As requested in D58632, clean up our red zone detection logic in the X86 backend. The existing X86MachineFunctionInfo flag is used to track whether we *use* the red zone (via a particular optimization?), but there's no common way to check whether the function *has* a red zone. I'd appreciate careful review of the uses being updated. I think they are NFC, but a careful eye from someone else would be appreciated. Differential Revision: https://reviews.llvm.org/D61799 llvm-svn: 360479
* [X86] Disable speculative load hardening for operations with an explicit RSP ↵Craig Topper2019-05-101-0/+8
| | | | | | | | | | | | | | base. After D58632, we can create idempotent atomic operations to the top of stack. This confused speculative load hardening because it thinks accesses should have virtual register base except for the cases it already excluded. This commit adds a new exclusion for this case. I'll try to reduce a test case for this, but this fix was verified to work by the reporter. This should avoid needing to revert D58632. llvm-svn: 360475
* Skip over prefetchesMircea Trofin2019-05-101-0/+16
| | | | | | | | | | | | | | | | Summary: Skip over prefetches when assigning debug info to instructions with memory operands. This way, the debug info is stable after instrumenting a binary with prefetches, allowing for iterative profiling and instrumentation. Reviewers: davidxl Reviewed By: davidxl Subscribers: aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61789 llvm-svn: 360471
* [X86] Avoid SFB - Fix inconsistent codegen with/without debug info Robert Lougher2019-05-101-2/+4
| | | | | | | | | | | | | | | | | | Fixes https://bugs.llvm.org/show_bug.cgi?id=40969 The functions findPotentiallyBlockedCopies and buildCopy are currently not accounting for the presence of debug instructions. In the former this results in the optimization not being triggered, and in the latter it results in inconsistent codegen. This patch enables the optimization to be performed in a debug build and ensures the codegen is consistent with non-debug builds. Patch by Chris Dawson. Differential Revision: https://reviews.llvm.org/D61680 llvm-svn: 360436
* [X86][SSE] Add getHopForBuildVector vector splittingSimon Pilgrim2019-05-101-0/+16
| | | | | | | | If we only use the lower xmm of a ymm hop, then extract the xmm's (for free), perform the xmm hop and then insert back into a ymm (for free). Fixes some of the regressions noted in D61782 llvm-svn: 360435
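A scalar sketch of why the split is legal, assuming the per-128-bit-lane behaviour of 256-bit horizontal adds on x86. Integer lanes are used to keep the comparison exact. The types `V4`/`V8` and the helpers `hadd128`, `hadd256`, `lo`, and `demoSplit` are hypothetical models, not code from getHopForBuildVector.

```cpp
#include <array>
#include <cassert>

using V4 = std::array<int, 4>; // models one xmm register (4 x i32)
using V8 = std::array<int, 8>; // models one ymm register (8 x i32)

// 128-bit horizontal add: adjacent pairs of A, then adjacent pairs of B.
inline V4 hadd128(const V4 &A, const V4 &B) {
  return {A[0] + A[1], A[2] + A[3], B[0] + B[1], B[2] + B[3]};
}

// 256-bit horizontal add: each 128-bit lane is processed independently.
inline V8 hadd256(const V8 &A, const V8 &B) {
  return {A[0] + A[1], A[2] + A[3], B[0] + B[1], B[2] + B[3],
          A[4] + A[5], A[6] + A[7], B[4] + B[5], B[6] + B[7]};
}

// Extract the lower 128 bits (free on x86: just use the xmm subregister).
inline V4 lo(const V8 &V) { return {V[0], V[1], V[2], V[3]}; }

// If only the low half of the ymm hop is used, the xmm hop on the low
// halves of the inputs produces the same lanes.
inline void demoSplit() {
  const V8 A{1, 2, 3, 4, 5, 6, 7, 8};
  const V8 B{10, 20, 30, 40, 50, 60, 70, 80};
  assert(lo(hadd256(A, B)) == hadd128(lo(A), lo(B)));
}
```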
* [PowerPC] custom lower `v2f64 fpext v2f32`Lei Huang2019-05-103-0/+84
| Reduces scalarization overhead via custom lowering of v2f64 fpext v2f32.
|
| E.g., for the following IR:
|
|   %0 = load <2 x float>, <2 x float>* %Ptr, align 8
|   %1 = fpext <2 x float> %0 to <2 x double>
|   ret <2 x double> %1
|
| Pre custom lowering:
|
|   ld r3, 0(r3)
|   mtvsrd f0, r3
|   xxswapd vs34, vs0
|   xscvspdpn f0, vs0
|   xxsldwi vs1, vs34, vs34, 3
|   xscvspdpn f1, vs1
|   xxmrghd vs34, vs0, vs1
|
| After custom lowering:
|
|   lfd f0, 0(r3)
|   xxmrghw vs0, vs0, vs0
|   xvcvspdp vs34, vs0
|
| Differential Revision: https://reviews.llvm.org/D57857
| llvm-svn: 360429
* [WebAssembly] Don't assume that strongly defined symbols are DSO-localSam Clegg2019-05-101-3/+3
| | | | | | | | | | | | | | | | The current PIC model for WebAssembly is more like ELF in that it allows symbol interposition. This means that more functions end up being addressed via the GOT and fewer directly added to the wasm table. One effect is a reduction in the number of wasm table entries similar to the previous attempt in https://reviews.llvm.org/D61539 which was reverted. Differential Revision: https://reviews.llvm.org/D61772 llvm-svn: 360402
* [WebAssembly] Remove friend18.C from list of known gcc torture test ↵Sam Clegg2019-05-101-1/+0
| | | | | | | | failures. NFC. Differential Revision: https://reviews.llvm.org/D61775 llvm-svn: 360401
* [llvm] X86DiscriminateMemOps: insert debug info when missingMircea Trofin2019-05-101-2/+3
| | | | | | | | | | | | | | Reviewers: davidxl Reviewed By: davidxl Subscribers: aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61735 llvm-svn: 360396
* [AMDGPU] Pattern for v_xor3_b32Stanislav Mekhanoshin2019-05-101-1/+4
| | | | | | | | | This also allows three op patterns to use increased constant bus limit of GFX10. Differential Revision: https://reviews.llvm.org/D61763 llvm-svn: 360395
* [X86] Improve lowering of idempotent RMW operationsPhilip Reames2019-05-091-1/+88
| | | | | | | | The current lowering uses an mfence. mfences have substantially higher latency than the locked operations originally requested, but we do want to avoid contention on the original cache line. As such, use a locked instruction on a cache line assumed to be thread local. Differential Revision: https://reviews.llvm.org/D58632 llvm-svn: 360393
* Add ".dword" directiveBill Wendling2019-05-091-3/+5
| | | | | | | | | | | | | | | | | | Summary: The ".dword" directive is a synonym for ".xword" and is used by klibc, a minimalistic libc subset for initramfs. Reviewers: t.p.northover, nickdesaulniers Reviewed By: nickdesaulniers Subscribers: nickdesaulniers, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61719 llvm-svn: 360381
* [AMDGPU] gfx1010 v_interp_* instructionsStanislav Mekhanoshin2019-05-091-6/+11
| | | | | | Differential Revision: https://reviews.llvm.org/D61703 llvm-svn: 360364
* [X86][SSE] Fold add(shuffle(),shuffle()) to hadd on 'slow' targets (PR39920)Simon Pilgrim2019-05-091-2/+3
| | | | | | | | | | As reported on PR39920, "slow horizontal ops" targets tend to internally expand to 2*shuffle+add/sub - so if we can reduce 2*shuffle+add/sub to a hadd/sub then we should do it - similar port usage but reduced instruction count. This works out in most cases, although the "PR22377" regression in vector-shuffle-combining.ll is annoying - going from 2*shuffle+add+shuffle to hadd+2*shuffle - I've opened PR41813 to cover this. Differential Revision: https://reviews.llvm.org/D61308 llvm-svn: 360360
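A scalar sketch of the equivalence this fold relies on: two shuffles that gather the even and odd lanes, followed by a vertical add, compute the same result as a single horizontal add. Integer lanes are used so the comparison is exact. All names (`V4`, `hadd`, `shuffleEven`, `shuffleOdd`, `vadd`, `demoFold`) are hypothetical and model the operations only, not the DAG combine itself.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using V4 = std::array<int, 4>; // models one 128-bit vector of 4 x i32

// Horizontal add: adjacent pairs of A, then adjacent pairs of B.
inline V4 hadd(const V4 &A, const V4 &B) {
  return {A[0] + A[1], A[2] + A[3], B[0] + B[1], B[2] + B[3]};
}

// The two shuffles of the expanded form: even lanes and odd lanes of A:B.
inline V4 shuffleEven(const V4 &A, const V4 &B) {
  return {A[0], A[2], B[0], B[2]};
}
inline V4 shuffleOdd(const V4 &A, const V4 &B) {
  return {A[1], A[3], B[1], B[3]};
}

// Plain lane-wise (vertical) add.
inline V4 vadd(const V4 &X, const V4 &Y) {
  V4 R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = X[I] + Y[I];
  return R;
}

// add(shuffle(), shuffle()) and hadd agree lane for lane.
inline void demoFold() {
  const V4 A{1, 2, 3, 4};
  const V4 B{5, 6, 7, 8};
  assert(vadd(shuffleEven(A, B), shuffleOdd(A, B)) == hadd(A, B));
}
```

In this model the expanded form is three operations and the folded form is one, which matches the commit's point about reduced instruction count for similar port usage.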
* [AMDGPU] gfx1010 changes for PAL metadataStanislav Mekhanoshin2019-05-091-2/+3
| | | | | | Differential Revision: https://reviews.llvm.org/D61704 llvm-svn: 360353
* [X86] AMD Piledriver (BdVer2): major cleanup (mainly inverse throughput)Roman Lebedev2019-05-091-209/+303
| | | | | | | | | | | | | | | | I've started this cleanup several times now, but got sidetracked elsewhere, e.g. by llvm-exegesis problems. Not this time, finally! This is mainly cleaning up the inverse throughput values, and a few latencies/uops, based on the llvm-exegesis measured values. Though this is not complete by any means, there's certainly more cleanup to be done. The performance numbers (I've only checked with the RawSpeed benchmark) aren't really surprising - overall this *slightly* (< -1%) improves perf. llvm-svn: 360341
* [ARM][CGP] Guard against signext args and sitofpSam Parker2019-05-091-10/+12
| | | | | | | | | | Add an Argument that has the SExtAttr attached, as well as SIToFP instructions, as values that generate sign bits. SIToFP doesn't strictly do this and could be treated as a sink to be sign-extended. Differential Revision: https://reviews.llvm.org/D61381 llvm-svn: 360331
* [ARM GlobalISel] Map DBG_VALUE for types != s32Diana Picus2019-05-091-2/+8
| | | | | | | | ...and make sure we fail elegantly for unsupported values. s64 goes into DPR, anything <= 32 into GPR. llvm-svn: 360321
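The size-to-bank rule stated above can be sketched as a small standalone function. This is only an illustration of the rule "s64 goes into DPR, anything <= 32 into GPR, everything else fails cleanly"; the enum `RegBank`, its `Unsupported` value, and the functions `bankForDebugValue` and `demoBank` are hypothetical and do not reflect the actual ARM register bank info API.

```cpp
#include <cassert>

// Hypothetical stand-in for the two register banks named in the commit,
// plus an explicit "cannot map" result instead of an assertion failure.
enum class RegBank { GPR, DPR, Unsupported };

// Pick a bank for a DBG_VALUE operand from its scalar size in bits.
inline RegBank bankForDebugValue(unsigned SizeInBits) {
  if (SizeInBits <= 32)
    return RegBank::GPR; // anything up to s32 lives in a core register
  if (SizeInBits == 64)
    return RegBank::DPR; // s64 lives in a double-precision register
  return RegBank::Unsupported; // report failure so the caller can bail out
}

// Small usage check covering the three outcomes.
inline void demoBank() {
  assert(bankForDebugValue(16) == RegBank::GPR);
  assert(bankForDebugValue(64) == RegBank::DPR);
  assert(bankForDebugValue(128) == RegBank::Unsupported);
}
```

Returning an explicit failure value, rather than asserting, is one way to model "fail elegantly": the caller can fall back to another selector instead of crashing.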
* X86WinAllocaExpander: Drop code looking through register copies (PR41786)Hans Wennborg2019-05-091-16/+4
| | | | | | | | | | | | | This code was never covered by tests, in PR41786 it was pointed out that the deletion part doesn't work, and in a full Chrome build I was never able to hit the code path that looks through copies. It seems the situation it's supposed to handle doesn't actually come up in practice. Delete it to simplify the code. Differential revision: https://reviews.llvm.org/D61671 llvm-svn: 360320
* AMDGPU: Mark scheduler classes as finalMatt Arsenault2019-05-081-2/+2
| | | | llvm-svn: 360294
* AMDGPU: Select VOP3 form of addMatt Arsenault2019-05-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | The VOP3 form should always be the preferred selection, to be shrunk later. This should only be an optimization issue, but this partially works around a problem from clobbering VCC when SIFixSGPRCopies rewrites an SCC defining operation directly to VCC. 3 of the testcases are regressions from failing to fold the immediate in cases it should. These can be avoided by improving the VCC liveness handling in SIFoldOperands. Simply increasing the threshold to computeRegisterLiveness works, although this is common enough that VCC liveness should probably be tracked throughout the pass. The hack of leaving behind an implicit_def instruction to avoid breaking the iterator wastes instruction count, which inhibits finding the VCC def in long chains of adds. Doing this however exposes different, worse looking regressions from poor scheduling behavior. This could probably be avoided by forcing the shrink of the addc here, but the scheduler should probably be fixed. The r600 add test needs to be split out because it asserts on the arguments in the new test during the calling convention lowering. llvm-svn: 360293
* [AMDGPU] gfx1010 exp modificationsStanislav Mekhanoshin2019-05-083-2/+17
| | | | | | Differential Revision: https://reviews.llvm.org/D61701 llvm-svn: 360287
* AMDGPU: Fix a mis-placed bracketChangpeng Fang2019-05-081-1/+1
| | | | | | | Differential Revision: https://reviews.llvm.org/D61430 llvm-svn: 360283