path: root/llvm/lib/Target/X86
Commit message | Author | Age | Files | Lines
...
* [X86][SSE] Relax use limits for lowerAddSubToHorizontalOp (PR32433) | Simon Pilgrim | 2019-05-13 | 1 | -7/+2
| | | | | | | | | | Now that we can use HADD/SUB for scalar additions from any pair of extracted elements (D61263), we can relax the one use limit as we will be able to merge multiple uses into using the same HADD/SUB op. This exposes a couple of missed opportunities in LowerBuildVectorv4x32 which will be committed separately. Differential Revision: https://reviews.llvm.org/D61782 llvm-svn: 360594
* [X86] Add SimplifyDemandedBits support for PEXTRB/PEXTRW (PR39709) | Simon Pilgrim | 2019-05-13 | 1 | -1/+6
| | | | | | Test case will be included in a followup - it's being used but it's tricky to show a case that isn't caught at a later stage anyway. llvm-svn: 360588
* Recommit r358887 "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits ↵ | Craig Topper | 2019-05-13 | 2 | -0/+20
| | | | | | | | | | | | | | | | | | | | bitcast handling" I've included a new fix in X86RegisterInfo to prevent PR41619 without reintroducing r359392. We might be able to improve that in the base class implementation of shouldRewriteCopySrc somehow. But this hopefully enables forward progress on SimplifyDemandedBits improvements for now. Original commit message: This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly. The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl x, c2), c1) combine to encourage BFE creation. I investigated putting this in DAGComb, but it caused a lot of noise on other targets - some improvements, some regressions. The X86 changes are all definite wins. llvm-svn: 360552
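The AMDGPU combine quoted in this message rests on a plain bit identity: masking with `c1 << c2` and then shifting right by `c2` gives the same value as shifting first and masking with `c1`. A minimal scalar sketch, assuming 32-bit unsigned values and a shift amount below 32; the function names are illustrative and are not LLVM APIs:

```cpp
#include <cassert>
#include <cstdint>

// Original pattern: (srl (and x, c1 << c2), c2).
inline uint32_t maskThenShift(uint32_t x, uint32_t c1, unsigned c2) {
  return (x & (c1 << c2)) >> c2;
}

// Rewritten pattern: (and (srl x, c2), c1).
inline uint32_t shiftThenMask(uint32_t x, uint32_t c1, unsigned c2) {
  return (x >> c2) & c1;
}

// Spot-checks the identity over a few sample values and every shift below 32.
inline bool identityHolds() {
  const uint32_t samples[] = {0u, 1u, 0xFFu, 0xABCD1234u, 0x80000000u, 0xFFFFFFFFu};
  for (uint32_t x : samples)
    for (uint32_t c1 : samples)
      for (unsigned c2 = 0; c2 < 32; ++c2)
        if (maskThenShift(x, c1, c2) != shiftThenMask(x, c1, c2))
          return false;
  return true;
}
```

The second form leaves a shift followed by a low mask, which is the shape a bitfield-extract (BFE) instruction can match directly.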
* [X86][AVX] Split VZEXT_MOVL ymm/zmm if the upper elements are not demanded. | Simon Pilgrim | 2019-05-12 | 1 | -0/+12
| | | | | | Removes unnecessary vzeroupper noted in D61806 llvm-svn: 360543
* [X86][SSE] SimplifyDemandedBits - call PEXTRB/PEXTRW ↵ | Simon Pilgrim | 2019-05-11 | 1 | -1/+6
| | | | | | | | | | SimplifyDemandedVectorElts as well. See if we can simplify the demanded vector elts from the extraction before trying to simplify the demanded bits. This helps us with target shuffles and hops in particular. llvm-svn: 360535
* [CostModel][X86] Add min/max reduction costs for all SSE targets | Simon Pilgrim | 2019-05-11 | 1 | -6/+90
| | | | | | | | The original costs stopped at SSE42, I've added conservative estimates for everything down to SSE1/SSE2 and moved some of the SSE42 costs to SSE41 (really only the addition of PCMPGT makes any difference). I've also added missing vXi8 costs (we use PHMINPOSUW for i8/i16 for scarily quick results) and 256-bit vector costs for AVX1. llvm-svn: 360528
* [X86][SSE] Add SimplifyDemandedVectorElts HADD/HSUB handling. | Simon Pilgrim | 2019-05-11 | 1 | -0/+45
| | | | | | Still missing PHADDW/PHSUBW tests because PEXTRW doesn't call SimplifyDemandedVectorElts llvm-svn: 360526
* FixupLEAPass::fixupIncDec - non-LEA opcodes should not happen here. NFCI. | Simon Pilgrim | 2019-05-11 | 1 | -0/+2
| | | | | | Matches what we do in other functions and fixes scan-build warning about uninitialized NewOpcode variable. llvm-svn: 360525
* [X86] Add CMOV_FR32X/CMOV_FR64X pseudo instructions. Use them in fast isel ↵ | Craig Topper | 2019-05-11 | 3 | -4/+14
| | | | | | | | to fix a machine verifier error after adding test cases. Fast isel picks the FR32X/FR64X register classes when lowering pseudo select, but it didn't have the right opcode to go with it. llvm-svn: 360524
* [X86] Sink some fast isel code into the only if that uses it. NFC | Craig Topper | 2019-05-11 | 1 | -13/+13
| | | | llvm-svn: 360523
* [X86] Use TLI.getRegClassFor to simplify some more fast isel code. NFCI | Craig Topper | 2019-05-11 | 1 | -16/+7
| | | | llvm-svn: 360522
* [X86] Use getRegClassFor to simplify some code in fast isel. NFCI | Craig Topper | 2019-05-11 | 1 | -40/+18
| | | | | | | No need to select the register class based on type and features. It should already be setup by X86ISelLowering. llvm-svn: 360513
* [X86] Don't emit MOVNTDQA loads from fast-isel without SSE4.1. | Craig Topper | 2019-05-11 | 1 | -1/+1
| | | | | | | | We were checking for SSE4.1 for FP types, but not integer 128-bit types. Fixes PR41837. llvm-svn: 360512
* [X86] Add a test case for idempotent atomic operations with speculative load ↵ | Craig Topper | 2019-05-11 | 1 | -1/+3
| | | | | | | | hardening. Fix an additional issue found by the test. This test covers the fix from r360475 as well. llvm-svn: 360511
* [X86] Move InstPrinter files to MCTargetDesc. NFC | Richard Trieu | 2019-05-10 | 22 | -58/+33
| | | | | | | | | For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360484
* Factor out redzone ABI checks [NFCI] | Philip Reames | 2019-05-10 | 4 | -5/+18
| | | | | | | | | | As requested in D58632, clean up our red zone detection logic in the X86 backend. The existing X86MachineFunctionInfo flag is used to track whether we *use* the redzone (via a particular optimization?), but there's no common way to check whether the function *has* a red zone. I'd appreciate careful review of the uses being updated. I think they are NFC, but a careful eye from someone else would be appreciated. Differential Revision: https://reviews.llvm.org/D61799 llvm-svn: 360479
* [X86] Disable speculative load hardening for operations with an explicit RSP ↵ | Craig Topper | 2019-05-10 | 1 | -0/+8
| | | | | | | | | | | | | | base. After D58632, we can create idempotent atomic operations to the top of stack. This confused speculative load hardening because it thinks accesses should have virtual register base except for the cases it already excluded. This commit adds a new exclusion for this case. I'll try to reduce a test case for this, but this fix was verified to work by the reporter. This should avoid needing to revert D58632. llvm-svn: 360475
* Skip over prefetches | Mircea Trofin | 2019-05-10 | 1 | -0/+16
| | | | | | | | | | | | | | | | Summary: Skip over prefetches when assigning debug info to instructions with memory operands. This way, the debug info is stable after instrumenting a binary with prefetches, allowing for iterative profiling and instrumentation. Reviewers: davidxl Reviewed By: davidxl Subscribers: aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61789 llvm-svn: 360471
* [X86] Avoid SFB - Fix inconsistent codegen with/without debug info | Robert Lougher | 2019-05-10 | 1 | -2/+4
| | | | | | | | | | | | | | | | | | Fixes https://bugs.llvm.org/show_bug.cgi?id=40969 The functions findPotentiallyBlockedCopies and buildCopy are currently not accounting for the presence of debug instructions. In the former this results in the optimization not being triggered, and in the latter results in inconsistent codegen. This patch enables the optimization to be performed in a debug build and ensures the codegen is consistent with non-debug builds. Patch by Chris Dawson. Differential Revision: https://reviews.llvm.org/D61680 llvm-svn: 360436
* [X86][SSE] Add getHopForBuildVector vector splitting | Simon Pilgrim | 2019-05-10 | 1 | -0/+16
| | | | | | | | If we only use the lower xmm of a ymm hop, then extract the xmm's (for free), perform the xmm hop and then insert back into a ymm (for free). Fixes some of the regressions noted in D61782 llvm-svn: 360435
* [llvm] X86DiscriminateMemOps: insert debug info when missing | Mircea Trofin | 2019-05-10 | 1 | -2/+3
| | | | | | | | | | | | | | Reviewers: davidxl Reviewed By: davidxl Subscribers: aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61735 llvm-svn: 360396
* [X86] Improve lowering of idempotent RMW operations | Philip Reames | 2019-05-09 | 1 | -1/+88
| | | | | | | | The current lowering uses an mfence. mfences are substantially higher latency than the locked operations originally requested, but we do want to avoid contention on the original cache line. As such, use a locked instruction on a cache line assumed to be thread local. Differential Revision: https://reviews.llvm.org/D58632 llvm-svn: 360393
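For readers unfamiliar with the term, an idempotent RMW is an atomic read-modify-write whose operand leaves the stored value unchanged, so only its ordering effect and returned value matter. A minimal source-level sketch using `std::atomic`; the helper names are hypothetical, and the code says nothing about which instructions a given compiler version emits:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// OR-ing with 0 never changes the stored value; the call acts as a
// sequentially consistent ordering point that also returns the value.
inline uint32_t idempotentOr(std::atomic<uint32_t> &word) {
  return word.fetch_or(0u, std::memory_order_seq_cst);
}

// Adding 0 is another idempotent RMW with the same observable result.
inline uint32_t idempotentAdd(std::atomic<uint32_t> &word) {
  return word.fetch_add(0u, std::memory_order_seq_cst);
}
```

Because the stored value cannot change, a backend is free to satisfy the ordering requirement without writing to the contended cache line, which is the opportunity this commit describes.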
* [X86][SSE] Fold add(shuffle(),shuffle()) to hadd on 'slow' targets (PR39920) | Simon Pilgrim | 2019-05-09 | 1 | -2/+3
| | | | | | | | | | As reported on PR39920, "slow horizontal ops" targets tend to internally expand to 2*shuffle+add/sub - so if we can reduce 2*shuffle+add/sub to a hadd/sub then we should do it - similar port usage but reduced instruction count. This works out in most cases, although the "PR22377" regression in vector-shuffle-combining.ll is annoying - going from 2*shuffle+add+shuffle to hadd+2*shuffle - I've opened PR41813 to cover this. Differential Revision: https://reviews.llvm.org/D61308 llvm-svn: 360360
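The 2*shuffle+add pattern and the horizontal add compute the same lanes, which is why the fold is legal. A scalar model of the 4 x float case, assuming HADDPS lane order `[a0+a1, a2+a3, b0+b1, b2+b3]`; the types and helpers are illustrative only and do not mirror the X86ISelLowering code:

```cpp
#include <array>
#include <cassert>

using V4 = std::array<float, 4>;

// Two-input shuffle: mask values 0-3 select from a, 4-7 select from b.
inline V4 shuffle(const V4 &a, const V4 &b, const std::array<int, 4> &mask) {
  V4 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = mask[i] < 4 ? a[mask[i]] : b[mask[i] - 4];
  return r;
}

// Lane-wise add.
inline V4 add(const V4 &x, const V4 &y) {
  return V4{{x[0] + y[0], x[1] + y[1], x[2] + y[2], x[3] + y[3]}};
}

// Scalar model of a 128-bit horizontal add.
inline V4 hadd(const V4 &a, const V4 &b) {
  return V4{{a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]}};
}

// The expanded form: gather even lanes, gather odd lanes, then add.
inline V4 shuffleShuffleAdd(const V4 &a, const V4 &b) {
  const V4 evens = shuffle(a, b, std::array<int, 4>{{0, 2, 4, 6}});
  const V4 odds = shuffle(a, b, std::array<int, 4>{{1, 3, 5, 7}});
  return add(evens, odds);
}
```

Both forms produce identical lanes, so the choice between them is purely a question of instruction count and port usage on the target.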
* [X86] AMD Piledriver (BdVer2): major cleanup (mainly inverse throughput) | Roman Lebedev | 2019-05-09 | 1 | -209/+303
| | | | | | | | | | | | | | | | I've started this cleanup several times now, but got sidetracked elsewhere, e.g. by llvm-exegesis problems. Not this time, finally! This is mainly cleaning up the inverse throughput values, and a few latencies/uops, based on the llvm-exegesis measured values. Though this is not complete by any means, there's certainly more cleanup to be done. The performance numbers (I've only checked with the RawSpeed benchmark) aren't really surprising - overall this *slightly* (< -1%) improves perf. llvm-svn: 360341
* X86WinAllocaExpander: Drop code looking through register copies (PR41786) | Hans Wennborg | 2019-05-09 | 1 | -16/+4
| | | | | | | | | | | | | This code was never covered by tests, in PR41786 it was pointed out that the deletion part doesn't work, and in a full Chrome build I was never able to hit the code path that looks through copies. It seems the situation it's supposed to handle doesn't actually come up in practice. Delete it to simplify the code. Differential revision: https://reviews.llvm.org/D61671 llvm-svn: 360320
* [COFF] Use COFF stubs for extern_weak functions | Reid Kleckner | 2019-05-07 | 3 | -11/+15
| | | | | | | | | | | | | | | | | | | | | | | Summary: A COFF stub indirects the reference to a symbol through memory. A .refptr.$sym global variable pointer is created to refer to $sym. Typically mingw uses these for external global variable declarations, but we can use them for weak function declarations as well. Updates the dso_local classification to add a special case for extern_weak symbols on COFF in both clang and LLVM. Fixes PR37598 Reviewers: smeenai, mstorsjo Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D61615 llvm-svn: 360207
* Make sure that the DAG combiner doesn't merge stores that we explicitly | Eric Christopher | 2019-05-07 | 1 | -9/+15
| | | | | | | asked not to be greater than preferred vector width for the vectorizer. Test for both 128 and 256 with a skylake architecture. llvm-svn: 360183
* Fix local shadow variable warning. NFCI. | Simon Pilgrim | 2019-05-07 | 1 | -2/+2
| | | | llvm-svn: 360157
* [X86][AVX] Fold concat(packus(),packus()) -> packus(concat(),concat()) (PR34773) | Simon Pilgrim | 2019-05-07 | 1 | -0/+24
| | | | | | Basic "revectorization" combine, we can probably do more opcodes here but it can be a tricky cost-benefit depending on where the subvectors came from - but this case helps shuffle combining. llvm-svn: 360134
* Fixed "Value stored to 'Opc' is never read" warning. NFCI. | Simon Pilgrim | 2019-05-07 | 1 | -1/+1
| | | | llvm-svn: 360133
* [X86] Reduce scope of variables where possible. NFCI. | Simon Pilgrim | 2019-05-07 | 3 | -10/+4
| | | | | | Fixes cppcheck warnings. llvm-svn: 360131
* [X86] Use extended vector register classes in getRegForInlineAsmConstraint ↵ | Craig Topper | 2019-05-06 | 1 | -6/+6
| | | | | | | | | | | | | | to support x/y/zmm16-31 when the type is mismatched. The FR32/FR64/VR128/VR256 register classes don't contain the upper 16 registers. For most cases we use the default implementation which will find any register class that contains the register in question if the VT is legal for the register class. But if the VT is i32 or i64, we won't find a matching register class and will instead end up in the code modified in this patch. If the requested register is x/y/zmm16-31 we weren't returning a register class that contains those registers and will hit an assertion in the caller. To fix this, I've changed to use the extended register class instead. I don't believe we need a subtarget check to see if avx512 is enabled. The default implementation just picks whatever register class it finds first. I checked and we currently pick FR32X for XMM0 with an f32 type using the default implementation regardless of whether avx512 is enabled. So I assume it is ok to do the same for i32. Differential Revision: https://reviews.llvm.org/D61457 llvm-svn: 360102
* [X86] Remove the suffix on vcvt[u]si2ss/sd register variants in assembly ↵ | Craig Topper | 2019-05-06 | 2 | -51/+80
| | | | | | | | | | | | | | printing. We require d/q suffixes on the memory form of these instructions to disambiguate the memory size. We don't require it on the register forms, but need to support parsing both with and without it. Previously we always printed the d/q suffix on the register forms, but it's redundant and inconsistent with gcc and objdump. After this patch we should support the d/q for parsing, but not print it when it's unneeded. llvm-svn: 360085
* Revert r359392 and r358887 | Craig Topper | 2019-05-06 | 3 | -22/+70
| | | | | | | | | | | | | | | | | | | | Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead" Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling" Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list. Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead of moving to the integer domain (in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work there. I'll file a separate PR for that and add test cases. Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised if that bug can still be hit independent of that. This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again. llvm-svn: 360066
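The two fneg strategies contrasted in this revert can be modelled at the source level. The sketch below assumes IEEE-754 binary32 `float`; the helper names are made up for illustration and are not the fast-isel code. The xor form flips only the sign bit, for NaNs as well, while the subtraction form goes through floating-point arithmetic, where the sign of a NaN result is not guaranteed to flip:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a float as its 32-bit pattern and back, without UB.
inline uint32_t bitsOf(float f) {
  uint32_t b;
  std::memcpy(&b, &f, sizeof b);
  return b;
}
inline float floatFromBits(uint32_t b) {
  float f;
  std::memcpy(&f, &b, sizeof f);
  return f;
}

// Integer-domain negate: flip the sign bit and nothing else.
inline float fnegViaXor(float x) {
  return floatFromBits(bitsOf(x) ^ 0x80000000u);
}

// Arithmetic negate: -0.0 - x. Matches the xor form for ordinary values,
// but NaN inputs follow NaN propagation rules instead of a pure bit flip.
inline float fnegViaSub(float x) {
  return -0.0f - x;
}
```

For ordinary inputs the two agree; the difference the message refers to only shows up in the bit pattern produced for NaN operands.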
* Modernize repmovsb implementation of x86 memcpy and allow runtime sizes. | Guillaume Chatelet | 2019-05-06 | 1 | -97/+118
| | | | | | | | | | | | | | | | | | | | Summary: This is a prerequisite to RFC http://lists.llvm.org/pipermail/llvm-dev/2019-April/131973.html Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61593 Fix typo. Turn this patch into an NFC. Addressing comments llvm-svn: 360050
* [X86] Fix uninitialized members in constructor warnings. NFCI. | Simon Pilgrim | 2019-05-06 | 2 | -3/+3
| | | | | | Initialize all member variables in X86ATTInstPrinter and X86DAGToDAGISel constructors to fix cppcheck warning. llvm-svn: 360047
* X86DAGToDAGISel::tryVPTESTM - fix uninitialized variable warning. NFCI. | Simon Pilgrim | 2019-05-06 | 1 | -1/+1
| | | | | | findBroadcastedOp should always initialize the value if it returns true but static-analyzer isn't great at recognising this. llvm-svn: 360037
* [X86] X86InstrInfo::findThreeSrcCommutedOpIndices - fix unread variable warning. | Simon Pilgrim | 2019-05-06 | 1 | -1/+2
| | | | | | scan-build was reporting that CommutableOpIdx1 never used its original initialized value - move it down to where it's first used to make the real initialization more obvious (and matches the comment that's there). llvm-svn: 360028
* [X86] lowerVectorShuffle - use any_of to detect out of bounds shuffle ↵ | Simon Pilgrim | 2019-05-06 | 1 | -9/+8
| | | | | | | | indices. NFCI. Fixes cppcheck local shadow warning as well. llvm-svn: 360027
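The shape of an `any_of`-based bounds check can be sketched as follows. This uses `std::any_of` from the standard library rather than LLVM's range wrapper, and it assumes the usual two-input mask convention (valid indices in `[0, 2 * numElts)`, with `-1` marking an undefined lane); the helper is hypothetical and is not the code from the commit:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns true if any mask entry lies outside the assumed valid range.
inline bool hasOutOfBoundsIndex(const std::vector<int> &mask, int numElts) {
  return std::any_of(mask.begin(), mask.end(), [numElts](int m) {
    return m < -1 || m >= 2 * numElts;
  });
}
```

Compared with a hand-written loop that sets a flag, the predicate form needs no extra local variable, which is how a change like this can also remove a shadowing warning.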
* Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake | Luo, Yuanke | 2019-05-06 | 8 | -0/+210
| | | | | | | | | | | | | | | | | | | | | | | | Summary: 1. Enable infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake; 2. Enable VCVTNE2PS2BF16, VCVTNEPS2BF16 and VDPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision. VCVTNE2PS2BF16: Convert Two Packed Single Data to One Packed BF16 Data. VCVTNEPS2BF16: Convert Packed Single Data to Packed BF16 Data. VDPBF16PS: Dot Product of BF16 Pairs Accumulated into Packed Single Precision. For more details about BF16 ISA, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference Author: LiuTianle Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, RKSimon, spatel Reviewed By: craig.topper Subscribers: kristina, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60550 llvm-svn: 360017
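As a rough scalar model of what these operations compute, BF16 keeps the top 16 bits of an IEEE single, and the conversion rounds to nearest even. The sketch below is a simplification under stated assumptions: it does not model the instructions' denormal handling, exception behaviour, or intermediate rounding, and all function names are hypothetical rather than intrinsics:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Single precision -> BF16 with round-to-nearest-even on the dropped bits.
inline uint16_t floatToBF16(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  if ((bits & 0x7FFFFFFFu) > 0x7F800000u)  // NaN: force a quiet NaN result
    return static_cast<uint16_t>((bits >> 16) | 0x0040u);
  const uint32_t bias = 0x7FFFu + ((bits >> 16) & 1u);  // ties go to even
  return static_cast<uint16_t>((bits + bias) >> 16);
}

// BF16 -> single precision is exact: the low 16 bits are zero-filled.
inline float bf16ToFloat(uint16_t h) {
  const uint32_t bits = static_cast<uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// One output lane of a BF16 pair dot product accumulated into a float.
inline float dotPairAccumulate(float acc, uint16_t a0, uint16_t a1,
                               uint16_t b0, uint16_t b1) {
  return acc + bf16ToFloat(a0) * bf16ToFloat(b0) +
         bf16ToFloat(a1) * bf16ToFloat(b1);
}
```

The Intel ISE document linked in the message remains the authority for the exact semantics of each instruction.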
* [X86] Pull out repeated Subtarget feature tests. NFCI. | Simon Pilgrim | 2019-05-05 | 1 | -12/+11
| | | | | | Avoids a scan-build "uninitialized value" warning in X86FastISel::X86SelectFPExtOrFPTrunc llvm-svn: 360001
* [TTI][X86] Make getAddressComputationCost cost value const. NFCI. | Simon Pilgrim | 2019-05-05 | 1 | -1/+1
| | | | llvm-svn: 359999
* Move getOpcode() call into if statement. NFCI. | Simon Pilgrim | 2019-05-05 | 1 | -2/+1
| | | | | | Avoids a cppcheck "Local variable name shadows outer variable" warning. llvm-svn: 359991
* [X86] Make X86RegisterInfo(const Triple &TT) constructor explicit. | Simon Pilgrim | 2019-05-05 | 1 | -1/+1
| | | | | | Fixes cppcheck warning. llvm-svn: 359981
* [X86] Fix some cppcheck "Local variable name shadows outer variable" ↵ | Simon Pilgrim | 2019-05-05 | 1 | -44/+42
| | | | | | warnings. NFCI. llvm-svn: 359976
* [X86] Allow assembly parser to accept x/y/z suffixes on non-memory ↵ | Craig Topper | 2019-05-03 | 1 | -5/+26
| | | | | | | | | | | | vfpclassps/pd and on memory forms in intel syntax The x/y/z suffix is needed to disambiguate the memory form in at&t syntax since no xmm/ymm/zmm register is mentioned. But we should also allow it for the register and broadcast forms where it's not needed, for consistency. This matches gas. The printing code will still only use the suffix for the memory form where it is needed. llvm-svn: 359903
* [X86] LowerToHorizontalOp - Tidyup calls to getHopForBuildVector. NFCI. | Simon Pilgrim | 2019-05-03 | 1 | -15/+7
| | | | | | Merge the if() tests for the various HADD/SUB + Subtarget tests llvm-svn: 359901
* [X86] Remove repeated variables. NFCI. | Simon Pilgrim | 2019-05-03 | 1 | -2/+0
| | | | llvm-svn: 359889
* Avoid cppcheck operator precedence warnings. NFCI. | Simon Pilgrim | 2019-05-03 | 2 | -2/+2
| | | | | | Prefer ((X & Y) ? A : B) to (X & Y ? A : B) llvm-svn: 359884
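Since `&` binds tighter than `?:` in C++, both spellings parse identically, and the parentheses only make that grouping visible to readers and static analysers. A small sketch with hypothetical function names, including the grouping a reader might wrongly assume:

```cpp
#include <cassert>

// Parsed as (x & y) ? a : b, because & has higher precedence than ?:.
inline int withoutParens(unsigned x, unsigned y, int a, int b) {
  return x & y ? a : b;
}

// Same meaning, with the grouping spelled out.
inline int withParens(unsigned x, unsigned y, int a, int b) {
  return (x & y) ? a : b;
}

// The grouping a reader might mistakenly assume: x & (y ? a : b).
inline int misreadGrouping(unsigned x, unsigned y, int a, int b) {
  return static_cast<int>(x & static_cast<unsigned>(y ? a : b));
}
```

With `x = 6`, `y = 3`, `a = 10`, `b = 20`, the first two functions return 10 while the misread grouping yields 2, which illustrates why the explicit form is easier to review even though behaviour is unchanged.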
* [X86] LowerMULH - remove unused Lo/Hi vector indices. NFCI. | Simon Pilgrim | 2019-05-03 | 1 | -5/+2
| | | | | | Leftover from before we had the extract128BitVector helpers. llvm-svn: 359871