path: root/llvm/lib/Target
Commit message | Author | Date | Files | Lines (-/+)
...
* AVX-512: Truncating store for i1 vectors | Elena Demikhovsky | 2016-04-04 | 1 | -1/+62
    Implemented truncstore for KNL and skylake-avx512. Covered vectors from v2i1 to v64i1.
    We save the value in bits (not in bytes) - v32i1 is saved in 4 bytes.
    Differential Revision: http://reviews.llvm.org/D18740
    llvm-svn: 265283
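    To illustrate the bit-packed layout described above (an editor's sketch, not code from the patch), the in-memory size of a vNi1 store works out to:
        // Illustrative only: i1 vectors are stored as packed bits, so an
        // N-element mask occupies ceil(N/8) bytes, e.g. v32i1 -> 4 bytes.
        unsigned i1VectorStoreSizeInBytes(unsigned NumElts) {
          return (NumElts + 7) / 8;
        }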
* [X86] Removed duplicate code. | Simon Pilgrim | 2016-04-03 | 1 | -5/+5
    llvm-svn: 265274
* [X86][SSE] Support for MOVMSK signbit extraction instructions | Simon Pilgrim | 2016-04-03 | 5 | -45/+32
    Add support for lowering with the MOVMSK instruction to extract vector element signbits to a GPR.
    This is an early step towards more optimal handling of vector comparison results.
    Differential Revision: http://reviews.llvm.org/D18741
    llvm-svn: 265266
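    A minimal sketch of the lowering direction described above, assuming the patch exposes the instruction as an X86ISD::MOVMSK DAG node (the node name and the helper are assumptions, not taken from the patch):
        // Illustrative only: extract one sign bit per vector element into the
        // low bits of an i32 GPR via a MOVMSK node.
        SDValue lowerSignbitsToGPR(SelectionDAG &DAG, const SDLoc &DL, SDValue Vec) {
          return DAG.getNode(X86ISD::MOVMSK, DL, MVT::i32, Vec);
        }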
* [X86] Tidied up X86ISD instruction nodes. NFCI. | Simon Pilgrim | 2016-04-03 | 1 | -50/+59
    Tidied up comments, stripped trailing whitespace, split apart nodes that aren't related.
    No change in ordering although there is definitely some scope for it.
    llvm-svn: 265263
* AVX-512: Load and Extended Load for i1 vectors | Elena Demikhovsky | 2016-04-03 | 2 | -10/+122
    Implemented load+{sign|zero}_extend for i1 vectors.
    Fixed failures in i1 vector load.
    Covered loading of v2i1, v4i1, v8i1, v16i1, v32i1, v64i1 vectors for KNL and SKX.
    Differential Revision: http://reviews.llvm.org/D18737
    llvm-svn: 265259
* [lanai] Fix for LanaiDelaySlotFiller and LanaiMCInstLower.cpp | Jacques Pienaar | 2016-04-03 | 3 | -104/+87
    Summary:
    * Fix to stop the delay slot filler from inserting SP-modifying instructions in the newly expanded call/return instructions.
    * In LowerSymbol the outermost type was not LanaiMCExpr if there was a binary expression.
    * Remove printExpr in LanaiInstPrinter.
    Subscribers: joker.eph, llvm-commits
    Differential Revision: http://reviews.llvm.org/D18734
    llvm-svn: 265251
* [mips][microMIPS] Revert commits r264245 and r264248. | Zoran Jovanovic | 2016-04-02 | 11 | -106/+51
    Commit r264245 caused test failures in the LLVM test suite. Commit r264248 depends on the first one.
    llvm-svn: 265249
* AArch64: support .cpu directive | Saleem Abdulrasool | 2016-04-02 | 1 | -0/+72
    Add support for the AArch64 .cpu directive. This is a slightly involved directive since the parameter is actually a variable encoded string. The general structure is:
        <cpu>[[+-]<feature>]*
    We now map some of the supported feature name strings to our internal representation of feature flags. If we encounter one which we do not support, bail out, as we cannot validate the assembly any longer.
    Resolves PR27010.
    llvm-svn: 265240
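    A minimal sketch of parsing that string format (illustrative only; the function name is made up and, unlike the real parser, it assumes the CPU name itself contains no '+' or '-'):
        #include <string>
        #include <utility>
        #include <vector>

        // Split "<cpu>[[+-]<feature>]*" into the CPU name and a list of
        // (feature, enable) toggles, e.g. "generic+crc-crypto".
        static std::pair<std::string, std::vector<std::pair<std::string, bool>>>
        parseCpuDirective(const std::string &Arg) {
          std::vector<std::pair<std::string, bool>> Features;
          size_t Pos = Arg.find_first_of("+-");
          std::string CPU = Arg.substr(0, Pos);
          while (Pos != std::string::npos) {
            bool Enable = Arg[Pos] == '+';
            size_t Next = Arg.find_first_of("+-", Pos + 1);
            Features.emplace_back(Arg.substr(Pos + 1, Next - Pos - 1), Enable);
            Pos = Next;
          }
          return {CPU, Features};
        }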
* AArch64: avoid clobbering SP for dead MOVimm pseudos. | Tim Northover | 2016-04-01 | 3 | -2/+13
    We were producing ORR, which actually defines a GPR32sp rather than a GPR32.
    Should fix PR23209.
    llvm-svn: 265198
* Remove useless check for ThreadModel==Single in ARMISelLowering. NFC. | James Y Knight | 2016-04-01 | 1 | -7/+3
    ThreadModel::Single is already handled by ARMPassConfig adding LowerAtomicPass to the pass list, which lowers all atomics to non-atomic ops and deletes fences.
    So by the time we get to ISel, there are no atomic fences left, so they don't need special handling.
    llvm-svn: 265178
* AMDGPU: Implement {BUFFER,FLAT}_ATOMIC_CMPSWAP{,_X2} | Tom Stellard | 2016-04-01 | 8 | -3/+117
    Summary: Implement BUFFER_ATOMIC_CMPSWAP{,_X2} instructions on all GCN targets, and FLAT_ATOMIC_CMPSWAP{,_X2} on CI+.
    32-bit instruction variants tested manually on Kabini and Bonaire. Tests and parts of code provided by Jan Veselý.
    Patch by: Vedran Miletić
    Reviewers: arsenm, tstellarAMD, nhaehnle
    Subscribers: jvesely, scchan, kanarayan, arsenm
    Differential Revision: http://reviews.llvm.org/D17280
    llvm-svn: 265170
* [x86] avoid intermediate splat for non-zero memsets (PR27100) | Sanjay Patel | 2016-04-01 | 1 | -1/+2
    Follow-up to http://reviews.llvm.org/D18566 and http://reviews.llvm.org/D18676 - where we noticed that an intermediate splat was being generated for memsets of non-zero chars.
    That was because we told getMemsetStores() to use a 32-bit vector element type, and it happily obliged by producing that constant using an integer multiply.
    The 16-byte test that was added in D18566 is now equivalent for AVX1 and AVX2 (no splats, just a vector load), but we have PR27141 to track that splat difference.
    Note that the SSE1 path is not changed in this patch. That can be a follow-up.
    This patch should resolve PR27100.
    llvm-svn: 265161
* [AArch64] Fix a typo. NFC. | Chad Rosier | 2016-04-01 | 1 | -1/+1
    llvm-svn: 265160
* [x86] avoid intermediate splat for non-zero memsets (PR27100) | Sanjay Patel | 2016-04-01 | 1 | -6/+8
    Follow-up to D18566 - where we noticed that an intermediate splat was being generated for memsets of non-zero chars.
    That was because we told getMemsetStores() to use a 32-bit vector element type, and it happily obliged by producing that constant using an integer multiply.
    The tests that were added in the last patch are now equivalent for AVX1 and AVX2 (no splats, just a vector load), but we have PR27141 to track that splat difference.
    In the new tests, the splat via shuffling looks ok to me, but there might be some room for improvement depending on uarch there.
    Note that the SSE1/2 paths are not changed in this patch. That can be a follow-up.
    This patch should resolve PR27100.
    Differential Revision: http://reviews.llvm.org/D18676
    llvm-svn: 265148
* [AMDGPU] fix MADAK/MADMK instruction operand names to match encoding fields. | Valery Pykhtin | 2016-04-01 | 2 | -8/+8
    $vsrc1 -> $src1, $k -> $imm
    Differential Revision: http://reviews.llvm.org/D18659
    llvm-svn: 265141
* [x86] Remove redundant call to setTargetDAGCombine for BUILD_VECTOR node type. | Andrea Di Biagio | 2016-04-01 | 1 | -1/+0
    Since revision 235394, we no longer perform target specific combines on build_vector nodes. No functional change intended.
    llvm-svn: 265138
* [MIPS][LLVM-MC] Fix JR encoding for MIPSR6 ISA | Sagar Thakur | 2016-04-01 | 1 | -1/+1
    Summary: The assembler was picking the wrong JR variant because the pre-R6 one was still enabled at R6.
    Author: nitesh.jain
    Reviewers: vkalintiris, dsanders
    Subscribers: dsanders, llvm-commits, mohit.bhakkad, sagar, bhushan, jaydeep
    Differential: D18387
    llvm-svn: 265134
* [X86] Introduce Lakemont CPU. | Andrey Turetskiy | 2016-04-01 | 1 | -0/+3
    Add a new Intel MCU CPU Lakemont, which doesn't support X87.
    Differential Revision: http://reviews.llvm.org/D18650
    llvm-svn: 265128
* Fix for PR24346: ARM asm label calculation error in sub | James Molloy | 2016-04-01 | 3 | -6/+16
    Some ARM instructions encode 32-bit immediates as an 8-bit integer (0-255) and a 4-bit rotation (0-30, even) in their least significant 12 bits. The original fixup, FK_Data_4, patches the instruction with the value bit-for-bit, regardless of the encoding. For example, assuming the labels L1 and L2 are 0x0 and 0x104 respectively, the following instruction:
        add r0, r0, #(L2 - L1) ; expects 0x104, i.e., 260
    would be assembled to the following, which adds 1 to r0, instead of 260:
        e2800104 add r0, r0, #4, 2 ; equivalently 1
    The new fixup kind fixup_arm_mod_imm takes care of the encoding:
        e2800f41 add r0, r0, #260
    Patch by Ting-Yuan Huang!
    llvm-svn: 265122
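    The encoding search the new fixup has to perform can be sketched as follows (an illustration based on the description above, not the patch's code):
        #include <cstdint>
        #include <optional>

        // Illustrative only: find the ARM "modified immediate" encoding of a
        // 32-bit value, i.e. an 8-bit value rotated right by an even amount.
        // Returns the low 12 bits of the instruction (rotate/2 in bits 11:8,
        // imm8 in bits 7:0), or nullopt if the value is not representable.
        std::optional<uint32_t> encodeARMModImm(uint32_t Value) {
          for (uint32_t Rot = 0; Rot < 32; Rot += 2) {
            // Rotate left by Rot, undoing a rotate-right-by-Rot encoding.
            uint32_t Imm8 =
                Rot == 0 ? Value : ((Value << Rot) | (Value >> (32 - Rot)));
            if (Imm8 < 256)
              return ((Rot / 2) << 8) | Imm8;
          }
          return std::nullopt;
        }
    For example, encodeARMModImm(260) yields 0xf41 (imm8 = 0x41, rotation = 30), matching the e2800f41 encoding shown above.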
* [AArch64] Better errors for out-of-range fixups | Oliver Stannard | 2016-04-01 | 1 | -24/+45
    When a fixup that can be resolved by the assembler is out of range, we should report an error in the source, rather than crashing.
    Differential Revision: http://reviews.llvm.org/D18402
    llvm-svn: 265120
* [PPC64] Bug fix: when enabling sibling-call-opt and shrink-wrapping, the tail call branch instruction might disappear | Chuang-Yu Cheng | 2016-04-01 | 2 | -26/+62
    Bug pattern:
        # BB#0:                 # %entry
            cmpldi   3, 0
            beq-     0, .LBB0_2
        # BB#1:                 # %exit
            lwz 4, 0(3)
            #TC_RETURNd8 LVComputationKind 0
        .LBB0_2:                # %cond.false
            mflr 0
            std 0, 16(1)
            stdu 1, -96(1)
        .Ltmp0:
            .cfi_def_cfa_offset 96
        .Ltmp1:
            .cfi_offset lr, 16
            bl __assert_fail
            nop
    The branch instruction for the tail call return is not generated, because the shrink-wrapping pass chose a new restore point, %cond.false, so the %exit block is not sent to emitEpilogue; that's why the branch is not generated.
    Thanks to Kit for his opinions!
    Reviewers: nemanjai hfinkel tjablin kbarton
    http://reviews.llvm.org/D17606
    llvm-svn: 265112
* Use range-based for loops. NFC. | Michael Kuperstein | 2016-04-01 | 1 | -6/+5
    llvm-svn: 265105
* AArch64ISelLowering: Remove unused variables/arguments; NFC | Matthias Braun | 2016-04-01 | 2 | -5/+1
    llvm-svn: 265098
* [NVPTX] Add a truncate DAG node to some calls. | Justin Lebar | 2016-04-01 | 1 | -2/+10
    Summary: Previously, we were running afoul of the assertion
        EVT(CLI.Ins[i].VT) == InVals[i].getValueType() && "LowerCall emitted a value with the wrong type!"
    in SelectionDAGBuilder.cpp when running the NVPTX/i8-param.ll test.
    This is because our backend (for some reason) treats small return values as i32, but it wasn't ever truncating the i32 back down to the expected width in the DAG.
    Unclear to me whether this fixes any actual bugs -- in this test, at least, the generated code is unchanged.
    Reviewers: jingyue
    Subscribers: llvm-commits, tra, jholewinski
    Differential Revision: http://reviews.llvm.org/D17872
    llvm-svn: 265091
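    A hedged sketch of the kind of fix described (helper name and shape assumed; not the patch itself): after lowering the call, the widened i32 result is truncated back to the type the caller expects.
        // Illustrative only: narrow a value the backend widened to i32 back to
        // the declared return type (e.g. i8 or i16).
        SDValue truncateToExpected(SelectionDAG &DAG, const SDLoc &DL,
                                   SDValue Widened, EVT ExpectedVT) {
          if (Widened.getValueType() != ExpectedVT)
            return DAG.getNode(ISD::TRUNCATE, DL, ExpectedVT, Widened);
          return Widened;
        }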
* [NVPTX] Read __CUDA_FTZ from module flags in NVVMReflect. | Justin Lebar | 2016-04-01 | 1 | -7/+17
    Summary: Previously the NVVMReflect pass would read its configuration from command-line flags or a static configuration given to the pass at instantiation time.
    This doesn't quite work for clang's use-case. It needs to pass a value for __CUDA_FTZ down on a per-module basis. We use a module flag for this, so the NVVMReflect pass needs to be updated to read said flag.
    Reviewers: tra, rnk
    Subscribers: cfe-commits, jholewinski
    Differential Revision: http://reviews.llvm.org/D18672
    llvm-svn: 265090
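    Reading such a per-module flag typically looks like the sketch below (the flag name "nvvm-reflect-ftz" and the default value are assumptions for illustration, not confirmed by the message):
        #include "llvm/IR/Constants.h"
        #include "llvm/IR/Metadata.h"
        #include "llvm/IR/Module.h"
        using namespace llvm;

        static unsigned readFtzModuleFlag(const Module &M) {
          if (auto *Flag = mdconst::extract_or_null<ConstantInt>(
                  M.getModuleFlag("nvvm-reflect-ftz")))
            return Flag->getZExtValue();
          return 0; // assumed default: FTZ disabled
        }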
* [NVPTX] Annotate some instructions as hasSideEffects = 0. | Justin Lebar | 2016-04-01 | 2 | -146/+171
    Summary: Tablegen tries to infer this from the selection DAG patterns defined for the instructions, but it can't always.
    An instructive example is CLZr64. CLZr32 is correctly inferred to have no side-effects, but the selection DAG pattern for CLZr64 is slightly more complicated, and in particular the ctlz DAG node is not at the root of the pattern. Thus tablegen can't infer that CLZr64 has no side-effects.
    Reviewers: jholewinski
    Subscribers: jholewinski, tra, llvm-commits
    Differential Revision: http://reviews.llvm.org/D17472
    llvm-svn: 265089
* Follow-up to r265036: I got these iterators mixed up | Hans Wennborg | 2016-03-31 | 1 | -2/+2
    llvm-svn: 265076
* [AArch64] Allow loads with imp-def to be handled in getMemOpBaseRegImmOfsWidth() | Jun Bum Lim | 2016-03-31 | 1 | -1/+1
    Summary: This change will allow loads with imp-def to be clustered in the machine-scheduler pass. areMemAccessesTriviallyDisjoint() can also handle loads with imp-def.
    Reviewers: mcrosier, jmolloy, t.p.northover
    Subscribers: aemerson, rengolin, mcrosier, llvm-commits
    Differential Revision: http://reviews.llvm.org/D18665
    llvm-svn: 265051
* [PowerPC] Add a late MI-level pass for QPX load/splat simplification | Hal Finkel | 2016-03-31 | 5 | -4/+170
    Chapter 3 of the QPX manual states that, "Scalar floating-point load instructions, defined in the Power ISA, cause a replication of the source data across all elements of the target register." Thus, if we have a load followed by a QPX splat (from the first lane), the splat is redundant.
    This adds a late MI-level pass to remove the redundant splats in some of these cases (specifically when both occur in the same basic block).
    This optimization is scheduled just prior to post-RA scheduling. It can't happen before anything that might replace the load with some already-computed quantity (i.e. store-to-load forwarding).
    llvm-svn: 265047
* Revert r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)" | Hans Wennborg | 2016-03-31 | 1 | -19/+12
    I think it might have caused these build breakages:
    http://lab.llvm.org:8011/builders/clang-x86-win2008-selfhost/builds/7234/steps/build%20stage%202/logs/stdio
    http://lab.llvm.org:8011/builders/sanitizer-windows/builds/19566/steps/run%20tests/logs/stdio
    llvm-svn: 265046
* [ARM] Expand v1i64 and v2i64 ctpop. | Benjamin Kramer | 2016-03-31 | 1 | -0/+2
    The default is legal, which results in 'Cannot select' errors. This is triggered during selfhost due to a recent cost model change.
    llvm-svn: 265040
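    In a target's ISelLowering constructor, such a request usually amounts to something like the following sketch (not necessarily the patch's exact lines):
        // Illustrative only: ask legalization to expand CTPOP for the 64-bit
        // element vector types instead of treating it as Legal.
        setOperationAction(ISD::CTPOP, MVT::v1i64, Expand);
        setOperationAction(ISD::CTPOP, MVT::v2i64, Expand);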
* [X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140) | Hans Wennborg | 2016-03-31 | 1 | -12/+19
    For code such as:
        void f(int, int);
        void g() { f(1, 2); }
    compiled for 32-bit X86 Linux, Clang would previously generate:
        subl $12, %esp
        subl $8, %esp
        pushl $2
        pushl $1
        calll f
        addl $16, %esp
        addl $12, %esp
        retl
    This patch fixes that by merging adjacent stack adjustments in eliminateCallFramePseudoInstr().
    Differential Revision: http://reviews.llvm.org/D18627
    llvm-svn: 265039
* Change eliminateCallFramePseudoInstr() to return an iterator | Hans Wennborg | 2016-03-31 | 30 | -79/+73
    This will become necessary in a subsequent change to make this method merge adjacent stack adjustments, i.e. it might erase the previous and/or next instruction.
    It also greatly simplifies the calls to this function from PrologEpilogInserter. Previously, that had a bunch of logic to resume iteration after the call; now it just continues with the returned iterator.
    Note that this changes the behaviour of PEI a little. Previously, it attempted to re-visit the new instruction created by eliminateCallFramePseudoInstr(). That code was added in r36625, but I can't see any reason for it: the new instructions will obviously not be pseudo instructions, they will not have FrameIndex operands, and we have already accounted for the stack adjustment.
    Differential Revision: http://reviews.llvm.org/D18627
    llvm-svn: 265036
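    The resulting interface presumably looks roughly like the sketch below (signature reconstructed from the description, not copied from the patch), which lets callers such as PEI simply continue from the returned iterator:
        // Sketch only: the hook now returns the iterator from which the caller
        // should resume, since it may erase neighbouring instructions.
        MachineBasicBlock::iterator
        eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB,
                                      MachineBasicBlock::iterator MI) const override;

        // Call site in PrologEpilogInserter, roughly:
        //   I = TFI->eliminateCallFramePseudoInstr(MF, MBB, I);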
* [lanai] isBrImm should accept any non-constant immediate. | Jacques Pienaar | 2016-03-31 | 1 | -17/+6
    isBrImm should accept any non-constant immediate. Previously it was only accepting LanaiMCExpr ones, which was wrong.
    Differential Revision: http://reviews.llvm.org/D18571
    llvm-svn: 265032
* [PPC] basic support for Power 9 direct move instructions | Ehsan Amiri | 2016-03-31 | 1 | -2/+17
    http://reviews.llvm.org/D18097
    Initial support does not include any patterns to generate these instructions.
    llvm-svn: 265031
* [x86] use SSE/AVX ops for non-zero memsets (PR27100) | Sanjay Patel | 2016-03-31 | 1 | -5/+7
    Move the memset check down to the CPU-with-slow-SSE-unaligned-memops case: this allows fast targets to take advantage of SSE/AVX instructions and prevents slow targets from stepping into a codegen sinkhole while trying to splat a byte into an XMM reg.
    Follow-on bugs exposed by the current codegen are:
    https://llvm.org/bugs/show_bug.cgi?id=27141
    https://llvm.org/bugs/show_bug.cgi?id=27143
    Differential Revision: http://reviews.llvm.org/D18566
    llvm-svn: 265029
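    For reference, the affected source pattern is a plain non-zero memset such as this illustrative example:
        #include <cstring>

        // A 16-byte memset of a non-zero byte; with this change, fast SSE/AVX
        // targets can splat the byte in a vector register instead of falling
        // into the scalar/stack sinkhole described above.
        void fill16(char *P) { std::memset(P, 0x2a, 16); }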
* [PowerPC] Correctly compute 64-bit offsets in fast isel | Ulrich Weigand | 2016-03-31 | 1 | -6/+5
    PPCSimplifyAddress contains this code:
        IntegerType *OffsetTy = ((VT == MVT::i32) ? Type::getInt32Ty(*Context)
                                                  : Type::getInt64Ty(*Context));
    to determine the type to be used for an index register, if one needs to be created.
    However, the "VT" here is the type of the data being loaded or stored, *not* the type of an address. This means that if a data element of type i32 is accessed using an index that does not fit into 32 bits, a wrong address is computed here.
    Note that PPCFastISel is only ever used on 64-bit currently, so the type of an address is actually *always* MVT::i64. Other parts of the code, even in this same PPCSimplifyAddress routine, already rely on that fact.
    Thus, this patch changes the code to simply unconditionally use Type::getInt64Ty(*Context) as OffsetTy.
    llvm-svn: 265023
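    The fix described above therefore amounts to something like (sketch based on the message, not verbatim from the patch):
        // Addresses are always 64-bit in PPCFastISel, so the index register
        // type must be i64 regardless of the type of the data being accessed.
        IntegerType *OffsetTy = Type::getInt64Ty(*Context);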
* [PowerPC] Basic support for P9 atomic loads and stores | Nemanja Ivanovic | 2016-03-31 | 7 | -0/+66
    This patch corresponds to review: http://reviews.llvm.org/D18032
    This patch provides asm implementation for the following instructions: lwat, ldat, stwat, stdat, ldmx, mcrxrx
    llvm-svn: 265022
* [AArch64] Handle missing store pair opportunity | Jun Bum Lim | 2016-03-31 | 1 | -22/+23
    Summary: This change will handle a missing store pair opportunity where the first store instruction stores zero followed by the non-zero store. For example, this change will convert:
        str wzr, [x8]
        str w1, [x8, #4]
    into:
        stp wzr, w1, [x8]
    Reviewers: jmolloy, t.p.northover, mcrosier
    Subscribers: flyingforyou, aemerson, rengolin, mcrosier, llvm-commits
    Differential Revision: http://reviews.llvm.org/D18570
    llvm-svn: 265021
* [PowerPC] Remove incorrect use of COPY_TO_REGCLASS in fast isel | Ulrich Weigand | 2016-03-31 | 3 | -20/+3
    The fast isel pass currently emits a COPY_TO_REGCLASS node to convert from a F4RC to a F8RC register class during conversion of a floating-point number to integer. There is actually no support in the common code instruction printers to emit COPY_TO_REGCLASS nodes, so the PowerPC back-end has special code there to simply ignore COPY_TO_REGCLASS.
    This is correct *if and only if* the source and destination registers of COPY_TO_REGCLASS are the same (except for the different register class). But nothing guarantees this to be the case, and if the register allocator does end up allocating source and destination to different registers after all, the back-end simply generates incorrect code. I've included a test case that shows such incorrect code generation.
    However, it seems that COPY_TO_REGCLASS is actually not intended to be used at the MI layer at all. It is used during SelectionDAG, but always lowered to a plain COPY before emitting MI. Other back-ends' fast isel passes never emit COPY_TO_REGCLASS at all. I suspect it is simply wrong for the PowerPC back-end to emit it here.
    This patch changes the PowerPC back-end to directly emit COPY instead of COPY_TO_REGCLASS and removes the special handling in the instruction printers.
    Differential Revision: http://reviews.llvm.org/D18605
    llvm-svn: 265020
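    Emitting a plain COPY at this point would look roughly like the sketch below (register, iterator and TII names are placeholders, not taken from the patch):
        // Illustrative only: copy SrcReg into DestReg with a generic COPY
        // rather than a COPY_TO_REGCLASS pseudo.
        BuildMI(*MBB, InsertPt, DbgLoc, TII.get(TargetOpcode::COPY), DestReg)
            .addReg(SrcReg);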
* [mips] Range check simm16 | Daniel Sanders | 2016-03-31 | 4 | -35/+68
    Summary: There are too many instructions to exhaustively test so addiu and lwc2 are used as representative examples.
    It should be noted that many memory instructions that should have simm16 range checking do not because it is also necessary to support the macro of the same name which accepts simm32. The range checks for these occur in the macro expansion.
    Reviewers: vkalintiris
    Subscribers: dsanders, llvm-commits
    Differential Revision: http://reviews.llvm.org/D18437
    llvm-svn: 265019
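    The operand check itself boils down to a signed 16-bit range test, sketched here (illustrative; the predicate name is made up):
        #include "llvm/Support/MathExtras.h"

        // Illustrative only: true if Imm fits in a signed 16-bit immediate.
        static bool isSImm16(int64_t Imm) { return llvm::isInt<16>(Imm); }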
* [mips] Range check simm11 and mem_simm11. | Daniel Sanders | 2016-03-31 | 2 | -6/+15
    Summary: ldc2/sdc2 now emit slightly worse diagnostics for MIPS-I. The problem is that they don't trigger the custom parser because all the candidates are disabled by feature bits.
    On all other subtargets, the diagnostics are accurate but are subject to the usual issues of needing to report multiple ways to correct the code (e.g. smaller offset, enable a CPU feature) but only being able to report one error.
    Reviewers: vkalintiris
    Subscribers: dsanders, llvm-commits
    Differential Revision: http://reviews.llvm.org/D18436
    llvm-svn: 265018
* [AMDGPU] Disassembler: support for DPP | Sam Kolton | 2016-03-31 | 2 | -7/+23
    Review: http://reviews.llvm.org/D18642
    llvm-svn: 265015
* [mips] Split mem_msa into range checked mem_simm10 and mem_simm10_lsl[123] | Daniel Sanders | 2016-03-31 | 5 | -65/+99
    Summary: Also, made test_mi10.s formatting consistent with the majority of the MC tests.
    Reviewers: vkalintiris
    Subscribers: dsanders, llvm-commits
    Differential Revision: http://reviews.llvm.org/D18435
    llvm-svn: 265014
* Prevent X86ISelLowering from merging volatile loads | Nirav Dave | 2016-03-31 | 1 | -7/+7
    Change isConsecutiveLoads to check that loads are non-volatile as this is a requirement for any load merges. Propagate change to two callers.
    Reviewers: RKSimon
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D18546
    llvm-svn: 265013
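    The guard described above can be sketched as follows (illustrative; the helper name and surrounding logic are assumptions):
        // Illustrative only: refuse to merge loads when either one is volatile.
        static bool mayMergeLoads(const LoadSDNode *LD1, const LoadSDNode *LD2) {
          if (LD1->isVolatile() || LD2->isVolatile())
            return false;
          // ... consecutiveness checks elided ...
          return true;
        }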
* [mips] Range check simm9 and fix a bug this revealed. | Daniel Sanders | 2016-03-31 | 4 | -12/+21
    Summary: The bug was that microMIPS's [ls]w[lr]e instructions claimed to support a 12-bit offset when it is only 9-bit.
    Reviewers: vkalintiris
    Subscribers: llvm-commits, dsanders
    Differential Revision: http://reviews.llvm.org/D18434
    llvm-svn: 265010
* [mips][microMIPS] Implement MFC*, MFHC* and DMFC* instructions | Zlatko Buljan | 2016-03-31 | 7 | -15/+101
    Differential Revision: http://reviews.llvm.org/D17334
    llvm-svn: 265002
* Indentation fix in SystemZInstrInfo.cpp | Jonas Paulsson | 2016-03-31 | 1 | -2/+2
    llvm-svn: 265000
* [X86] Use MVT instead of EVT in code called after legalization. | Craig Topper | 2016-03-31 | 1 | -3/+3
    llvm-svn: 264992
* [PowerPC] Load two floats directly instead of using one 64-bit integer load | Hal Finkel | 2016-03-31 | 1 | -0/+105
    When dealing with complex<float>, and similar structures with two single-precision floating-point numbers, especially when such things are being passed around by value, we'll sometimes end up loading both float values by extracting them from one 64-bit integer load. It looks like this:
        t13: i64,ch = load<LD8[%ref.tmp]> t0, t6, undef:i64
          t16: i64 = srl t13, Constant:i32<32>
        t17: i32 = truncate t16
        t18: f32 = bitcast t17
        t19: i32 = truncate t13
        t20: f32 = bitcast t19
    The problem, especially before the P8 where those bitcasts aren't legal (and get expanded via the stack), is that it would have been better to use two floating-point loads directly. Here we add a target-specific DAGCombine to do just that. In short, we turn:
        ld 3, 0(5)
        stw 3, -8(1)
        rldicl 3, 3, 32, 32
        stw 3, -4(1)
        lfs 3, -4(1)
        lfs 0, -8(1)
    into:
        lfs 3, 4(5)
        lfs 0, 0(5)
    llvm-svn: 264988