summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [AVX-512] If gather mask is all ones, force the input to a zero vector.Craig Topper2017-03-131-1/+4
| | | | | | | | We were already forcing undef inputs to become a zero vector, this now catches an all ones mask too. Ideally we'd use undef and let execution dep fix handle picking the best register/clearance for the undef, but I don't think it can handle the early clobber today. llvm-svn: 297651
* [ARM] GlobalISel: Support SP in regbankselectDiana Picus2017-03-131-0/+1
| | | | | | | We used to hit an unreachable in getRegBankFromRegClass when dealing with the stack pointer. This commit adds support for the GPRsp reg class. llvm-svn: 297621
* [AArch64] Map Sched Read/Write resources for Falkor.Balaram Makam2017-03-131-1/+183
| | | | llvm-svn: 297611
* ARMDisassembler: loop over ARM decode tablesSjoerd Meijer2017-03-131-57/+20
| | | | | | | | | Loop over the ARM decode tables; this is a clean-up to reduce some code duplication. Differential Revision: https://reviews.llvm.org/D30814 llvm-svn: 297608
* [AVX-512] Add VEX_WIG to VEX vcvtsd2ss/vcvtss2sd intrinsic instructions so ↵Craig Topper2017-03-131-8/+8
| | | | | | they can be correctly matched by EVEX2VEX table generation. llvm-svn: 297601
* [AVX-512] Use sse_loadf32/f64 for vcvtss2sd and vcvtsd2ss intrinsic patterns.Craig Topper2017-03-131-3/+2
| | | | llvm-svn: 297600
* [AVX-512] Use sse_load_f64/f32 in VCVTSS2SI/VCVTSD2SI patterns.Craig Topper2017-03-131-10/+10
| | | | llvm-svn: 297599
* [X86] Remove unused SDTypeProfile. NFCCraig Topper2017-03-121-2/+0
| | | | llvm-svn: 297594
* [X86] Lower SSE/AVX cmpps/pd intrinsics directly to X86ISD::CMPP SDNodes.Craig Topper2017-03-123-46/+17
| | | | | | This allows us to remove a duplicate set of patterns. llvm-svn: 297593
* [AVX-512] Fix the valid immediates for the scatter/gather prefetch intrinsics.Craig Topper2017-03-121-2/+3
| | | | | | The immediate should be 1 or 2, not 0 or 1. This was found while adding bounds checking to clang. In fact the existing clang builtin test failed if we ran it all the way to assembly. llvm-svn: 297591
* [x86] don't blindly transform SETB into SBBSanjay Patel2017-03-121-39/+50
| | | | | | | | | | | | | | | | | | | | I noticed unnecessary 'sbb' instructions in D30472 and while looking at 'ptest' codegen recently. This happens because we were transforming any 'setb' - even when we only wanted a single-bit result. This patch moves those transforms under visitAdd/visitSub, so we we're only creating sbb/adc when it is a win. I don't know why we need a SETCC_CARRY node type, but I'm not proposing to change that existing behavior in this patch. Also, I'm skeptical that sbb/adc are a win for all micro-arches, so I added comments to the test files where this transform still fires. The test changes here are all cases where we no longer produce sbb/adc. Avoiding partial register stalls (generating an xor to clear a register) is not handled in some cases, but that's a separate issue. Differential Revision: https://reviews.llvm.org/D30611 llvm-svn: 297586
* Remove CRC32 instructions from AArch64InstrInfo::hasShiftedRegAzharuddin Mohammed2017-03-121-8/+0
| | | | | | | | | | | | | | | | | | | | | | | Summary: A53 scheduler causes an assertion failure on all CRC instructions: include/llvm/CodeGen/MachineInstr.h:280: const llvm::MachineOperand &llvm::MachineInstr::getOperand(unsigned int) const: Assertion `i < getNumOperands() && "getOperand() out of range!"' failed. The case statements corresponding to CRC instructions are incorrect and should be removed. Also adding a testcase while on this. Reviewers: t.p.northover, javed.absar, apazos, rengolin Reviewed By: rengolin Subscribers: evandro, aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D30274 llvm-svn: 297582
* [AVX-512] Fix a bad use of a high GR8 register after copying from a mask ↵Craig Topper2017-03-122-0/+13
| | | | | | | | | | register during fast isel. This ends up extracting from bits 15:8 instead of the lower bits of the mask. I'm pretty sure there are more problems lurking here. But I think this fixes PR32241. I've added the test case from that bug and added asserts that will fail if we ever try to copy between high registers and mask registers again. llvm-svn: 297574
* [AVX-512] Remove unused field in X86VectorVTInfo tablegen class.Craig Topper2017-03-121-7/+0
| | | | llvm-svn: 297572
* [X86][SSE] Improve extraction of elements from v16i8 (pre-SSE41)Simon Pilgrim2017-03-111-1/+27
| | | | | | | | | | | | Without SSE41 (pextrb) we currently extract byte elements from a vector by spilling to stack and reloading the byte. This patch is an initial attempt at using MOVD/PEXTRW to extract the relevant DWORD/WORD from the vector and then shift+truncate to collect the correct byte. Extraction of multiple bytes this way would result in code bloat, but as explained in the patch we could probably afford to be more aggressive with the supported extractions before again falling back on spilling - possibly through counting the number of extracts and which DWORD/WORD they originate? Differential Revision: https://reviews.llvm.org/D29841 llvm-svn: 297568
* Remove unnecessary whitespace.Simon Pilgrim2017-03-111-7/+1
| | | | llvm-svn: 297567
* [X86] Remove unnecessary commented out code. NFCCraig Topper2017-03-111-1/+0
| | | | llvm-svn: 297563
* AMDGPU: Remove packf16 intrinsicMatt Arsenault2017-03-112-7/+0
| | | | llvm-svn: 297557
* AMDGPU: Keep track of modifiers when converting v_mac to v_madMatt Arsenault2017-03-111-4/+10
| | | | | | | | | | | | | | | | Since v_max_f32_e64/v_max_f16_e64 can be folded if the target instruction supports the clamp bit, we also need to maintain modifiers when converting v_mac to v_mad. This fixes a rendering issue with Dirt Rally because a v_mac instruction with the clamp bit set was converted to a v_mad but that bit was lost during the conversion. Fixes: e184e01dd79 ("AMDGPU: Fold FP clamp as modifier bit") Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com> llvm-svn: 297556
* [AMDGPU] Remove getBidirectionalReasonRankStanislav Mekhanoshin2017-03-111-13/+1
| | | | | | | | | | | | | | This method inverts the Reason field of a scheduling candidate. It does right comparison between RegCritical and RegExcess, but everything else is broken. In fact it can prefer less strong reason such as Weak over RegCritical because Weak > -RegCritical. The CandReason enum is properly sorted, so just remove artificial ranking. Differential Revision: https://reviews.llvm.org/D30557 llvm-svn: 297536
* [RDF] Remove the map of reaching defs from copy propagationKrzysztof Parzyszek2017-03-102-57/+26
| | | | | | Use Liveness::getNearestAliasedRef to find the reaching def instead. llvm-svn: 297526
* [RDF] Implement Liveness::getNearestAliasedRef(Reg, Inst)Krzysztof Parzyszek2017-03-103-4/+66
| | | | | | | | This function will find the closest ref node aliased to Reg that is in an instruction preceding Inst. This could be used to identify the hypothetical reaching def of Reg, if Reg was a member of Inst. llvm-svn: 297524
* [X86][SSE] Fix load folding for (V)CVTDQ2PDSimon Pilgrim2017-03-101-2/+2
| | | | | | This only requires a 64-bit memory source, not the whole 128-bits. But the 128-bit case is still supported via X86InstrInfo::foldMemoryOperandImpl llvm-svn: 297523
* [X86] Fix Wunused-lambda-capture warningSimon Pilgrim2017-03-101-2/+2
| | | | llvm-svn: 297521
* Sink accessing TII to fix release Werror builds.Eric Christopher2017-03-101-5/+3
| | | | llvm-svn: 297507
* [AArch64, X86] Additional debug information for MacroFusionEvandro Menezes2017-03-102-7/+19
| | | | | | | | In order to make it easier to parse information about the performance of MacroFusion, this patch adds the function and the instruction names to the debug output of this pass. llvm-svn: 297504
* [AMDGPU] Split R600/SI getFrameIndexReference and emit stack object offsets ↵Konstantin Zhuravlyov2017-03-106-39/+48
| | | | | | | | for SI Differential Revision: https://reviews.llvm.org/D29674 llvm-svn: 297499
* Rename PT_NOTE namespace name used in AMDGPUPTNote.hYaxun Liu2017-03-103-10/+11
| | | | | | | | Patch by Guansong Zhang. Differential Revision: https://reviews.llvm.org/D30750 llvm-svn: 297498
* [APInt] Add APInt::insertBits() method to insert an APInt into a larger APIntSimon Pilgrim2017-03-102-6/+4
| | | | | | | | | | | | We currently have to insert bits via a temporary variable of the same size as the target with various shift/mask stages, resulting in further temporary variables, all of which require the allocation of memory for large APInts (MaskSizeInBits > 64). This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::insertBits() helper method which avoids the temporary memory allocation and masks/inserts the raw bits directly into the target. Differential Revision: https://reviews.llvm.org/D30780 llvm-svn: 297458
* [mips][msa] Accept more values for constant splatsSimon Dardis2017-03-102-11/+233
| | | | | | | | | | | | | | | | | This patches teaches the MIPS backend to accept more values for constant splats. Previously, only 10 bit signed immediates or values that could be loaded using an ldi.[bhwd] instruction would be acceptted. This patch relaxes that constraint so that any constant value that be splatted is accepted. As a result, the constant pool is used less for vector operations, and the suite of bit manipulation instructions b(clr|set|neg)i can now be used with the full range of their immediate operand. Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D30640 llvm-svn: 297457
* imm_comp_XFORM (defined in ARMInstrThumb.td) duplicates imm_not_XFORM ↵Artyom Skrobov2017-03-102-7/+2
| | | | | | | | | | | | | | (defined in ARMInstrInfo.td) Reviewers: grosbach, rengolin, jmolloy Reviewed By: jmolloy Subscribers: aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D30782 llvm-svn: 297456
* Refactor the multiply-accumulate combines to act onArtyom Skrobov2017-03-102-108/+64
| | | | | | | | | | | | | | | | | ARMISD::ADD[CE] nodes, instead of the generic ISD::ADD[CE]. Summary: This allows for some simplification because the combines are no longer limited to just one go at the node before it gets legalized into an ARM target-specific one. Reviewers: jmolloy, rogfer01 Subscribers: aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D30401 llvm-svn: 297453
* For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,Artyom Skrobov2017-03-103-28/+151
| | | | | | | | | | | | same as already done for ARM and Thumb2. Reviewers: jmolloy, rogfer01, efriedma Subscribers: aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D30400 llvm-svn: 297443
* [WebAssembly] Fix the opcode numbers for floating-point le and gt.Dan Gohman2017-03-091-2/+2
| | | | llvm-svn: 297420
* [Hexagon] Fixes to the bitsplit generationKrzysztof Parzyszek2017-03-091-11/+46
| | | | | | | | - Fix the insertion point, which occasionally could have been incorrect. - Avoid creating multiple bitsplits with the same operands, if an old one could be reused. llvm-svn: 297414
* [Hexagon] Refactor the DAG preprocessing code, NFCKrzysztof Parzyszek2017-03-091-32/+88
| | | | | | Extract individual transformations into their own functions. llvm-svn: 297401
* [Hexagon] Add -mhvx option to the Hexagon backendKrzysztof Parzyszek2017-03-091-2/+8
| | | | llvm-svn: 297393
* [Hexagon] Propagate zext of i1 into arithmetic code in selection DAGKrzysztof Parzyszek2017-03-091-0/+96
| | | | | | | (op ... (zext i1 c) ...) -> (select c (op ... 1 ...), (op ... 0 ...)) llvm-svn: 297391
* [X86][SSE] Speed up constant pool shuffle mask decoding with direct copy ↵Simon Pilgrim2017-03-091-7/+27
| | | | | | | | (PR32037). If the constants are already the correct size, we can copy them directly into the shuffle mask. llvm-svn: 297381
* [mips] Revert fixes for PR32020.Simon Dardis2017-03-097-400/+162
| | | | | | | | | | | | | | | The fix introduces segfaults and clobbers the value to be stored when the atomic sequence loops. Revert "[Target/MIPS] Kill dead code, no functional change intended." This reverts commit r296153. Revert "Recommit "[mips] Fix atomic compare and swap at O0."" This reverts commit r296134. llvm-svn: 297380
* [ARM] remove FIXMEs and add vcmp MC testSjoerd Meijer2017-03-091-5/+0
| | | | | | | | | Minor cleanup in ARMInstrVFP.td: removed some FIXMEs and added a MC test for vcmp that was actually missing. Differential Revision: https://reviews.llvm.org/D30745 llvm-svn: 297376
* [mips] Fix return loweringSimon Dardis2017-03-091-3/+12
| | | | | | | | | | | | | Fix a machine verifier issue where a instruction was using a invalid register. The return pseudo is expanded and has the return address register added to it. The return register may have been spuriously mark as killed earlier. This partially resolves PR/27458 Thanks to Quentin Colombet for reporting the issue! llvm-svn: 297372
* AMDGPU/SI: Disable unrolling in the loop vectorizer if the loop is not ↵Changpeng Fang2017-03-091-0/+4
| | | | | | | | | | | | vectorized. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D30719 llvm-svn: 297328
* [Hexagon] Use correct offset when extracting from the high wordKrzysztof Parzyszek2017-03-081-0/+1
| | | | | | | | When extracting a bitfield from the high register in a register pair, the final offset should be relative to the high register (for 32-bit extracts). llvm-svn: 297288
* [Sparc] Check register use with isPhysRegUsed() instead of reg_nodbg_empty()Daniel Cederman2017-03-081-6/+5
| | | | | | | | | | | | | | | | | | Summary: By using reg_nodbg_empty() to determine if a function can be treated as a leaf function or not, we miss the case when the register pair L0_L1 is used but not L0 by itself. This has the effect that use_all_i32_regs(), a test in reserved-regs.ll which tries to use all registers, gets treated as a leaf function. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: davide, RKSimon, sepavloff, llvm-commits Differential Revision: https://reviews.llvm.org/D27089 llvm-svn: 297285
* [X86][SSE] combineX86ShufflesRecursively can handle shuffle masks up to 64 ↵Simon Pilgrim2017-03-081-8/+7
| | | | | | | | elements wide By defining the mask types as SmallVector<int, 16> we were causing a lot of unnecessary heap usage. llvm-svn: 297267
* Revert "Revert "[PowerPC][ELFv2ABI] Allocate parameter area on-demand to ↵Tim Shen2017-03-081-5/+43
| | | | | | | | | | | | | reduce stack frame size"" After inspection, it's an UB in our code base. Someone cast a var-arg function pointer to a non-var-arg one. :/ Re-commit r296771 to continue testing on the patch. Sorry for the trouble! llvm-svn: 297256
* [NVPTX] Remove unnecessary isImageReadoOnly(), isImageWriteOnly(), & ↵Justin Lebar2017-03-081-3/+1
| | | | | | | | | | | | isImageReadWrite calls This is repetition of isImage() function in NVPTXUtilities.cpp. Patch by Briana Grace! Differential Revision: https://reviews.llvm.org/D30706 llvm-svn: 297252
* AMDGPU: Don't wait at end of block with a trivial successorMatt Arsenault2017-03-081-2/+14
| | | | | | | | | | If there is only one successor, and that successor only has one predecessor the wait can obviously be delayed until uses or the end of the next block. This avoids code quality regressions when there are trivial fallthrough blocks inserted for structurization. llvm-svn: 297251
* AMDGPU: Constant fold rcp nodeMatt Arsenault2017-03-081-2/+12
| | | | | | | When doing arcp optimization with a constant denominator, this was leaving behind rcps with constant inputs. llvm-svn: 297248
OpenPOWER on IntegriCloud