summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* AVX-512: fixed a bug in fp_to_uint pattern on KNLElena Demikhovsky2016-03-291-151/+499
| | | | | | | | | Fixed fp_to_uint instruction selection on KNL. One pattern was missing for <4 x double> to <4 x i32> Differential Revision: http://reviews.llvm.org/D18512 llvm-svn: 264701
* [PowerPC] Refactor popcnt[dw] target featuresHal Finkel2016-03-291-0/+2
| | | | | | | | | Instead of using two feature bits, one to indicate the availability of the popcnt[dw] instructions, and another to indicate whether or not they're fast, use a single enum. This allows more consistent control via target attribute strings, and via Clang's command line. llvm-svn: 264690
* [Codegen] Decrease minimum jump table density.Kyle Butt2016-03-298-30/+111
| | | | | | | | | | | Minimum density for both optsize and non optsize are now options -sparse-jump-table-density (default 10) for non optsize functions -dense-jump-table-density (default 40) for optsize functions, which matches the current default. This improves several benchmarks at google at the cost of a small codesize increase. For code compiled with -Os, the old behavior continues llvm-svn: 264689
* fix checks: *_DAG -> *-DAGSanjay Patel2016-03-282-4/+4
| | | | llvm-svn: 264676
* fix CHECK_NEXT -> CHECK-NEXTSanjay Patel2016-03-282-2/+2
| | | | llvm-svn: 264674
* fix CHECK_DAG -> CHECK-DAGSanjay Patel2016-03-282-4/+4
| | | | llvm-svn: 264673
* fix CHECK_NEXT -> CHECK-NEXTSanjay Patel2016-03-281-4/+4
| | | | llvm-svn: 264672
* fix CHECK_LABEL -> CHECK-LABELSanjay Patel2016-03-281-16/+16
| | | | llvm-svn: 264671
* trailing whitespaceSanjay Patel2016-03-281-315/+315
| | | | llvm-svn: 264670
* [X86][SSE] Vectorize a bit (AND/XOR/OR) op if a BUILD_VECTOR has the same op ↵Simon Pilgrim2016-03-284-214/+119
| | | | | | | | | | | | | | for all their scalar elements. If all a BUILD_VECTOR's source elements are the same bit (AND/XOR/OR) operation type and each has one constant operand, lower to a pair of BUILD_VECTOR and just apply the bit operation to the vectors. The constant operands will form a constant vector meaning that we still only have a single BUILD_VECTOR to lower and we will have replaced all the scalarized operations with a single SSE equivalent. Its not in our interest to start make a general purpose vectorizer from this, but I'm seeing enough of these scalar bit operations from the later legalization/scalarization stages to support them at least. Differential Revision: http://reviews.llvm.org/D18492 llvm-svn: 264666
* fix CHECK_NEXT -> CHECK-NEXTSanjay Patel2016-03-281-15/+15
| | | | llvm-svn: 264661
* MIRParser: Add %subreg.xxx syntax for subregister index operandsMatthias Braun2016-03-282-0/+58
| | | | | | Differential Revision: http://reviews.llvm.org/D18279 llvm-svn: 264608
* CodeGen: Correct specification of PHI nodesMatthias Braun2016-03-281-2/+2
| | | | | | | | | | They do have a def machine operand. Fixing the definition is necessary for an upcoming patch. Differential Revision: http://reviews.llvm.org/D18384 llvm-svn: 264607
* [AArch64] Do not lower scalar sdiv/udiv to a shifts + mul sequence when ↵Haicheng Wu2016-03-281-0/+45
| | | | | | | | optimizing for minsize Mimic what x86 does when optimizing sdiv/udiv for minsize. llvm-svn: 264606
* [PowerPC] On the A2, popcnt[dw] are very slowHal Finkel2016-03-281-4/+18
| | | | | | | | | | | | | | | | The A2 cores support the popcntw/popcntd instructions, but they're microcoded, and slower than our default software emulation. Specifically, popcnt[dw] take approximately 74 cycles, whereas our software emulation takes only 24-28 cycles. I've added a new target feature to indicate a slow popcnt[dw], instead of just removing the existing target feature from the a2/a2q processor models, because: 1. This allows us to return more accurate information via the TTI interface (I recognize that this currently makes no practical difference) 2. Is hopefully easier to understand (it allows the core's features to match its manual while still having the desired effect). llvm-svn: 264600
* Introduce MachineFunctionProperties and the AllVRegsAllocated propertyDerek Schuff2016-03-283-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | MachineFunctionProperties represents a set of properties that a MachineFunction can have at particular points in time. Existing examples of this idea are MachineRegisterInfo::isSSA() and MachineRegisterInfo::tracksLiveness() which will eventually be switched to use this mechanism. This change introduces the AllVRegsAllocated property; i.e. the property that all virtual registers have been allocated and there are no VReg operands left. With this mechanism, passes can declare that they require a particular property to be set, or that they set or clear properties by implementing e.g. MachineFunctionPass::getRequiredProperties(). The MachineFunctionPass base class verifies that the requirements are met, and handles the setting and clearing based on the delcarations. Passes can also directly query and update the current properties of the MF if they want to have conditional behavior. This change annotates the target-independent post-regalloc passes; future changes will also annotate target-specific ones. Reviewers: qcolombet, hfinkel Differential Revision: http://reviews.llvm.org/D18421 llvm-svn: 264593
* AMDGPU/SI: Limit load clustering to 16 bytes instead of 4 instructionsTom Stellard2016-03-283-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This helps prevent load clustering from drastically increasing register pressure by trying to cluster 4 SMRDx8 loads together. The limit of 16 bytes was chosen, because it seems like that was the original intent of setting the limit to 4 instructions, but more analysis could show that a different limit is better. This fixes yields small decreases in register usage with shader-db, but also helps avoid a large increase in register usage when lane mask tracking is enabled in the machine scheduler, because lane mask tracking enables more opportunities for load clustering. shader-db stats: 2379 shaders in 477 tests Totals: SGPRS: 49744 -> 48600 (-2.30 %) VGPRS: 34120 -> 34076 (-0.13 %) Code Size: 1282888 -> 1283184 (0.02 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 495616 -> 492544 (-0.62 %) bytes per wave Max Waves: 6843 -> 6853 (0.15 %) Wait states: 0 -> 0 (0.00 %) Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18451 llvm-svn: 264589
* [Hexagon] Improve handling of unaligned vector loads and storesKrzysztof Parzyszek2016-03-281-0/+31
| | | | llvm-svn: 264584
* [Hexagon] Only use restore functions for single register at -OzKrzysztof Parzyszek2016-03-281-0/+42
| | | | llvm-svn: 264581
* [lanai] Add Lanai backend.Jacques Pienaar2016-03-2813-0/+746
| | | | | | | | | | Add the Lanai backend to lib/Target. General Lanai backend discussion on llvm-dev thread "[RFC] Lanai backend" (http://lists.llvm.org/pipermail/llvm-dev/2016-February/095118.html). Differential Revision: http://reviews.llvm.org/D17011 llvm-svn: 264578
* AVX-512: Fixed ICMP instruction selection for i1 operandsElena Demikhovsky2016-03-281-21/+111
| | | | | | | | | | ICMP instruction selection fails on SKX and KNL for i1 operand. I use XOR to resolve: (A == B) is equivalent to (A xor B) == 0 Differential Revision: http://reviews.llvm.org/D18511 llvm-svn: 264566
* [PowerPC] Map max/minnum intrinsics and fmax/fmin to ISD nodes for CTR-based ↵Hal Finkel2016-03-271-4/+191
| | | | | | | | | | | | | | | | | loop legality Intrinsic::maxnum and Intrinsic::minnum, along with the associated libc function calls (fmax[f], etc.) generally map to function calls after lowering. For some vector types with QPX at least, however, we can legally lower these, and we don't need to prohibit CTR-based loops on their account. It turned out, however, that the logic that checked the opcodes associated with intrinsics was broken (it would set the Opcode variable, but that variable was later checked only if set for some otherwise-external function call. This fixes the latter problem and adds the FMAX/MINNUM mappings. llvm-svn: 264532
* [X86][AVX] Enabled SMUL_LOHI/UMUL_LOHI v8i32 vectors on AVX1 targetsSimon Pilgrim2016-03-262-352/+106
| | | | | | Correct splitting of v8i32 vectors into v4i32 vectors to prevent scalarization llvm-svn: 264517
* [X86][AVX] Enabled MULHS/MULHU v16i16 vectors on AVX1 targetsSimon Pilgrim2016-03-262-694/+52
| | | | | | | | Correct splitting of v16i16 vectors into v8i16 vectors to prevent scalarization Differential Revision: http://reviews.llvm.org/D18307 llvm-svn: 264512
* [X86][SSE] Add MULHS/MULHU custom lowering for i8 vectorsSimon Pilgrim2016-03-264-5135/+695
| | | | | | | | Currently this is to mainly to prevent scalarization of integer division by constants. Differential Revision: http://reviews.llvm.org/D18307 llvm-svn: 264511
* [X86][SSE] Added v64i8 vector integer multiply testsSimon Pilgrim2016-03-261-0/+946
| | | | llvm-svn: 264510
* [X86][AVX512BW] AVX512BW can sign-extend v32i8 to v32i16 for simpler v32i8 ↵Simon Pilgrim2016-03-261-18/+7
| | | | | | | | multiplies. Only pre-AVX512BW targets need to split v32i8 vectors. llvm-svn: 264509
* [PowerPC] Disable the CTR optimization in the presence of {min,max}numDavid Majnemer2016-03-261-0/+44
| | | | | | | | | The minnum and maxnum intrinsics get lowered to libcalls which invalidates the CTR optimization. This fixes PR27083. llvm-svn: 264508
* [X86][SSE] Refreshed vector integer multiply testsSimon Pilgrim2016-03-261-121/+485
| | | | | | | | Add all 256-bit vector tests. Added AVX512F/AVX512BW test targets. Renamed tests something more meaningful. llvm-svn: 264507
* [X86] Emit a proper ADJCALLSTACKDOWN in EmitLoweredTLSAddrDavid Majnemer2016-03-251-0/+29
| | | | | | | | | We forgot to add the second machine operand to our ADJCALLSTACKDOWN, resulting in crashes in PEI. This fixes PR27071. llvm-svn: 264465
* [MachineCopyPropagation] Expose more dead copies across instructions with ↵Jun Bum Lim2016-03-251-0/+67
| | | | | | | | | | | regmasks When encountering instructions with regmasks, instead of cleaning up all the elements in MaybeDeadCopies map, remove only the instructions erased. By keeping more instruction in MaybeDeadCopies, this change will expose more dead copies across instructions with regmasks. llvm-svn: 264462
* Prevent construction of cycle in DAG store mergeNirav Dave2016-03-251-0/+41
| | | | | | | | | | | | | | | | | | | | | When merging stores in DAGCombiner, add check to ensure that no dependenices exist that would cause the construction of a cycle in our DAG. This may happen if one store has a data dependence on another instruction (e.g. a load) which itself has a (chain) dependence on another store being merged. These stores cannot be merged safely and doing so results in a cycle that is discovered in LegalizeDAG. This test is only done in cases where Antialias analysis is used (UseAA) as non-AA store merge candidates will be merged logically after all loads which have been checked to not alias. Reviewers: ahatanak, spatel, niravd, arsenm, hfinkel, tstellarAMD, jyknight Subscribers: llvm-commits, tberghammer, danalbert, srhines Differential Revision: http://reviews.llvm.org/D18336 llvm-svn: 264461
* ARM: maintain BB ordering when expanding WIN__DBZCHKSaleem Abdulrasool2016-03-252-25/+64
| | | | | | | | | | | | | It is possible to have a fallthrough MBB prior to MBB placement. The original addition of the BB would result in reordering the BB as not preceding the successor. Because of the fallthrough nature of the BB, we could end up executing incorrect code or even a constant pool island! Insert the spliced BB into the same location to avoid that. Thanks to Tim Northover for invaluable hints and Fiora for the discussion on what may have been occurring! llvm-svn: 264454
* [X86] Use "and $0" and "orl $-1" to store 0 and -1 when optimizing for minsizeHans Wennborg2016-03-252-1/+89
| | | | | | | | | | | | 64-bit, 32-bit and 16-bit move-immediate instructions are 7, 6, and 5 bytes, respectively, whereas and/or with 8-bit immediate is only three bytes. Since these instructions imply an additional memory read (which the CPU could elide, but we don't think it does), restrict these patterns to minsize functions. Differential Revision: http://reviews.llvm.org/D18374 llvm-svn: 264440
* X86: Use push-pop for materializing 8-bit immediates for minsize (take 2)Hans Wennborg2016-03-253-102/+217
| | | | | | | | | This is the same as r255936, with added logic for avoiding clobbering of the red zone (PR26023). Differential Revision: http://reviews.llvm.org/D18246 llvm-svn: 264375
* ARM: fix optimised division on WoASaleem Abdulrasool2016-03-251-3/+32
| | | | | | | | | We did not have an explicit branch to the continuation BB. When the check was hoisted, this could permit control follow to fall through into the division trap. Add the explicit branch to the continuation basic block to ensure that code execution is correct. llvm-svn: 264370
* CXX TLS: collect return blocks after SelectAllBasicBlocks.Manman Ren2016-03-241-0/+16
| | | | | | | | | | It is incorrect to get the corresponding MBB for a ReturnInst before SelectAllBasicBlocks since SelectAllBasicBlocks can change the correspondence between a ReturnInst and the MBB it is in. PR27062 llvm-svn: 264358
* Lower varargs correctly in deopt bundle loweringSanjoy Das2016-03-241-0/+17
| | | | | | | Earlier we were ignoring varargs in LowerCallSiteWithDeoptBundle because populateCallLoweringInfo does not set CallLoweringInfo::IsVarArg. llvm-svn: 264354
* LiveInterval: Fix Distribute() failing on liveranges with unused VNInfosMatthias Braun2016-03-241-0/+53
| | | | | | This fixes http://llvm.org/PR26991 llvm-svn: 264345
* Finish the incomplete 'd' inline asm constraint support for PPC byEric Christopher2016-03-241-1/+31
| | | | | | making sure we give it a register and mark it as a register constraint. llvm-svn: 264340
* Reorder check lines, comments in test and remove unnecessary IR.Eric Christopher2016-03-241-17/+15
| | | | llvm-svn: 264339
* Match call and target calling conventions in testSanjoy Das2016-03-241-3/+4
| | | | | | Fixes an issue in rL264329. llvm-svn: 264337
* Add lowering support for llvm.experimental.deoptimizeSanjoy Das2016-03-241-0/+86
| | | | | | | | | | | | | | | Summary: Only adds support for "naked" calls to llvm.experimental.deoptimize. Support for round-tripping through RewriteStatepointsForGC will come as a separate patch (should be simpler than this one). Reviewers: reames Subscribers: sanjoy, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18429 llvm-svn: 264329
* [Hexagon] Add support for run-time stack overflow checkingKrzysztof Parzyszek2016-03-241-0/+44
| | | | | | Patch by Sundeep Kushwaha. llvm-svn: 264328
* [Hexagon] Generate PIC-specific versions of save/restore routinesKrzysztof Parzyszek2016-03-241-0/+69
| | | | | | | | | | | | In PIC mode, the registers R14, R15 and R28 are reserved for use by the PLT handling code. This causes all functions to clobber these registers. While this is not new for regular function calls, it does also apply to save/restore functions, which do not follow the standard ABI conventions with respect to the volatile/non-volatile registers. Patch by Jyotsna Verma. llvm-svn: 264324
* [Statepoints] Fix yet another issue around gc pointer uniqueingSanjoy Das2016-03-241-2/+13
| | | | | | | | | | | | | Given that StatepointLowering now uniques derived pointers before putting them in the per-statepoint spill map, we may end up with missing entries for derived pointers when we visit a gc.relocate on a pointer that was de-duplicated away. Fix this by keeping two maps, one mapping gc pointers to their de-duplicated values, and one mapping a de-duplicated value to the slot it is spilled in. llvm-svn: 264320
* Remove unnecessary redirect from testSanjoy Das2016-03-241-2/+2
| | | | llvm-svn: 264308
* AVX-512: Generate KTEST instead of TEST fir i1 vectorsElena Demikhovsky2016-03-241-2/+111
| | | | | | | | | | | | KTEST instruction may be used instead of TEST in this case: %int_sel3 = bitcast <8 x i1> %sel3 to i8 %res = icmp eq i8 %int_sel3, zeroinitializer br i1 %res, label %L2, label %L1 Differential Revision: http://reviews.llvm.org/D18444 llvm-svn: 264298
* CodeGen: extend RHS when splitting ATOMIC_CMP_SWAP_WITH_SUCCESS.Tim Northover2016-03-241-2/+52
| | | | | | | | | | | | | If the operation's type has been promoted during type legalization, we need to account for the fact that the high bits of the comparison operand are likely unspecified. The LHS is usually zero-extended, but MIPS sign extends it, so we have to be slightly careful. Patch by Simon Dardis. llvm-svn: 264296
* Remove unsafe AssertZext after promoting result of FP_TO_FP16Pirama Arumuga Nainar2016-03-241-0/+12
| | | | | | | | | | | | | | | Summary: Some target lowerings of FP_TO_FP16, for instance ARM's vcvtb.f16.f32 instruction, do not guarantee that the top 16 bits are zeroed out. Remove the unsafe AssertZext and add tests to exercise this. Reviewers: jmolloy, sbaranga, kristof.beyls, aadg Subscribers: llvm-commits, srhines, aemerson Differential Revision: http://reviews.llvm.org/D18426 llvm-svn: 264285
OpenPOWER on IntegriCloud