summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* R600/SI: Remove explicit m0 operand from DS instructionsTom Stellard2015-05-122-2/+2
| | | | | | | Instead add m0 as an implicit operand. This helps avoid spills of the m0 register in some cases. llvm-svn: 237141
* R600/SI: Make sendmsg test more strictTom Stellard2015-05-121-0/+2
| | | | | | We want to make sure that the m0 copies are being cse'd. llvm-svn: 237134
* AVX-512, X86: Added lowering for shift operations for SKX.Elena Demikhovsky2015-05-121-0/+29
| | | | | | | | The other changes in the LowerShift() are not functional, just to make the code more convenient. So, the functional changes for SKX only. llvm-svn: 237129
* [ARM] Use AEABI aligned function variantsJohn Brawn2015-05-121-31/+82
| | | | | | | | | | | AEABI defines aligned variants of memcpy etc. that can be faster than the default version due to not having to do alignment checks. When emitting target code for these functions make use of these aligned variants if possible. Also convert memset to memclr if possible. Differential Revision: http://reviews.llvm.org/D8060 llvm-svn: 237127
* Reverse ordering of base and derived pointer during safepoint lowering.Igor Laevsky2015-05-122-21/+136
| | | | | | | | According to the documentation in StackMap section for the safepoint we should have: "The first Location in each pair describes the base pointer for the object. The second is the derived pointer actually being relocated." But before this change we emitted them in reverse order - derived pointer first, base pointer second. llvm-svn: 237126
* [mips][FastISel] Handle calls with non legal types i8 and i16.Vasileios Kalintiris2015-05-121-0/+184
| | | | | | | | | | | | | | | | | | Summary: Allow calls with non legal integer types based on i8 and i16 to be processed by mips fast-isel. Based on a patch by Reed Kotler. Test Plan: "Make check" test forthcoming. Test-suite passes at O0/O2 and with mips32 r1/r2 Reviewers: rkotler, dsanders Subscribers: llvm-commits, rfuhler Differential Revision: http://reviews.llvm.org/D6770 llvm-svn: 237121
* [mips][FastISel] Simplify callabi.ll by using multiple check prefixes.Vasileios Kalintiris2015-05-121-397/+274
| | | | | | | | | | Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9635 llvm-svn: 237119
* [mips][FastISel] Allow computation of addresses from constant expressions.Vasileios Kalintiris2015-05-121-0/+18
| | | | | | | | | | | | | | | | | | | Summary: Try to compute addresses when the offset from a memory location is a constant expression. Based on a patch by Reed Kotler. Test Plan: Passes test-suite for -O0/O2 and mips 32 r1/r2 Reviewers: rkotler, dsanders Subscribers: llvm-commits, aemerson, rfuhler Differential Revision: http://reviews.llvm.org/D6767 llvm-svn: 237117
* AVX-512: select operation for i1 vectorsElena Demikhovsky2015-05-121-0/+75
| | | | | | | | like: select i1 %cond, <16 x i1> %a, <16 x i1> %b. I added pseudo-CMOV patterns to resolve the "select". Added tests for KNL and SKX. llvm-svn: 237106
* [X86] DAGCombine should not assume arbitrary vector types are simpleMichael Kuperstein2015-05-121-0/+11
| | | | | | | | | The X86-specific DAGCombine for stores should not assume vector types are always simple. This fixes PR23476. Differential Revision: http://reviews.llvm.org/D9659 llvm-svn: 237097
* Migrate existing backends that care about software floating pointEric Christopher2015-05-125-10/+14
| | | | | | | | | | | | | | | | | | | | to use the information in the module rather than TargetOptions. We've had and clang has used the use-soft-float attribute for some time now so have the backends set a subtarget feature based on a particular function now that subtargets are created based on functions and function attributes. For the one middle end soft float check go ahead and create an overloadable TargetLowering::useSoftFloat function that just checks the TargetSubtargetInfo in all cases. Also remove the command line option that hard codes whether or not soft-float is set by using the attribute for all of the target specific test cases - for the generic just go ahead and add the attribute in the one case that showed up. llvm-svn: 237079
* [WinEH] Handle nested landing pads that return directly to the parent function.Andrew Kaylor2015-05-115-5/+219
| | | | | | Differential Revision: http://reviews.llvm.org/D9684 llvm-svn: 237063
* [WinEH] Update exception numbering to give handlers their own base state.Andrew Kaylor2015-05-114-3/+300
| | | | | | Differential Revision: http://reviews.llvm.org/D9512 llvm-svn: 237014
* [X86] Updates to X86 backend for f16 promotionPirama Arumuga Nainar2015-05-111-7/+201
| | | | | | | | | | | | | | | | | | | | | | | Summary: r235215 adds support for f16 to be considered as a load/store type and promote f16 operations to f32. This patch has miscellaneous fixes for the X86 backend so all f16 operations are handled: 1. Set loadextaction for f16 vectors to expand. 2. Handle FP_EXTEND in a switch statement when handling v2f32 3. Do not fold (FP_TO_SINT (load f16)) into FP_TO_INT*_IN_MEM or (store (SINT_TO_FP )) to a FILD. Tests included. Reviewers: ab, srhines, delena Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9092 llvm-svn: 237004
* AVX-512: Changed CC parameter in "cmp" intrinsicElena Demikhovsky2015-05-114-408/+408
| | | | | | | | from i8 to i32 according to the Intel Spec by Igor Breger (igor.breger@intel.com) llvm-svn: 236979
* AVX-512: Added SKX instructions and intrinsics:Elena Demikhovsky2015-05-113-30/+635
| | | | | | | | {add/sub/mul/div/} x {ps/pd} x {128/256} 2. max/min with sae By Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 236971
* AVX-512: fixed UINT_TO_FP operation for 512-bit types.Elena Demikhovsky2015-05-101-0/+17
| | | | llvm-svn: 236955
* AVX-512: fixed a bug in i1 vectors loweringElena Demikhovsky2015-05-101-1/+50
| | | | llvm-svn: 236947
* llvm/test/CodeGen/AArch64/tailcall_misched_graph.ll: s/REQUIRE/REQUIRES/NAKAMURA Takumi2015-05-091-1/+1
| | | | llvm-svn: 236928
* Fix MergeConsecutiveStore for non-byte-sized memory accesses.James Y Knight2015-05-091-0/+15
| | | | | | | | | | | | | | | | | | | The bug showed up as a compile-time assertion failure: Assertion `NumBits >= MIN_INT_BITS && "bitwidth too small"' failed when building msan tests on x86-64. Prior to r236850, this bug was masked due to a bogus alignment check, which also accidentally rejected non-byte-sized accesses. Afterwards, an invalid ElementSizeBytes == 0 got further into the function, and triggered the assertion failure. It would probably be a good idea to allow it to handle merging stores of unusual widths as well, but for now, to un-break it, I'm just making the minimal fix. Differential Revision: http://reviews.llvm.org/D9626 llvm-svn: 236927
* [Fast-ISel] Don't mark the first use of a remat constant as killed.Pete Cooper2015-05-091-0/+29
| | | | | | | | | | | | | | | | | | | When emitting something like 'add x, 1000' if we remat the 1000 then we should be able to mark the vreg containing 1000 as killed. Given that we go bottom up in fast-isel, a later use of 1000 will be higher up in the BB and won't kill it, or be impacted by the lower kill. However, rematerialised constant expressions aren't generated bottom up. The local value save area grows downwards. This means that if you remat 2 constant expressions which both use 1000 then the first will kill it, then the second, which is *lower* in the BB will read a killed register. This is the case in the attached test where the 2 GEPs both need to generate 'add x, 6680' for the constant offset. Note that this commit only makes kill flag generation conservative. There's nothing else obviously wrong with the local value save area growing downwards, and in fact it needs to for handling arbitrarily complex constant expressions. However, it would be nice if there was a solution which would let us generate more accurate kill flags, or just kill flags completely. llvm-svn: 236922
* ScheduleDAGInstrs: In functions with tail calls PseudoSourceValues are not ↵Arnold Schwaighofer2015-05-084-13/+55
| | | | | | | | | | | | | | | | | | | | non-aliasing distinct objects The code that builds the dependence graph assumes that two PseudoSourceValues don't alias. In a tail calling function two FixedStackObjects might refer to the same location. Worse 'immutable' fixed stack objects like function arguments are not immutable and will be clobbered. Change this so that a load from a FixedStackObject is not invariant in a tail calling function and don't return a PseudoSourceValue for an instruction in tail calling functions when building the dependence graph so that we handle function arguments conservatively. Fix for PR23459. rdar://20740035 llvm-svn: 236916
* Switch lowering: cluster adjacent fall-through cases even at -O0Hans Wennborg2015-05-081-3/+12
| | | | | | | It's cheap to do, and codegen is much faster if cases can be merged into clusters. llvm-svn: 236905
* [Fast-ISel] Clear kill flags on registers replaced by updateValueMap.Pete Cooper2015-05-081-0/+24
| | | | | | | | | | When selecting an extract instruction, we don't actually generate code but instead work out which register we are reading, and rewrite uses of the extract def to the source register. This is done via updateValueMap,. However, its possible that the source register we are rewriting *to* to also have uses. If those uses are after a kill of the value we are rewriting *from* then we have uses after a kill and the verifier fails. This code checks for the case where the to register is also used, and if so it clears all kill on the from register. This is conservative, but better that always clearing kills on the from register. llvm-svn: 236897
* [Hexagon] Generate more hardware loopsBrendon Cahoon2015-05-087-89/+450
| | | | | | | | | Refactored parts of the hardware loop pass to generate more. Also, added more tests. Differential Revision: http://reviews.llvm.org/D9568 llvm-svn: 236896
* [X86] Fast-ISel was incorrectly always killing the source of a truncate.Pete Cooper2015-05-081-0/+40
| | | | | | | | | | | A trunc from i32 to i1 on x86_64 generates an instruction such as %vreg19<def> = COPY %vreg9:sub_8bit<kill>; GR8:%vreg19 GR32:%vreg9 However, the copy here should only have the kill flag on the 32-bit path, not the 64-bit one. Otherwise, we are killing the source of the truncate which could be used later in the program. llvm-svn: 236890
* Extend the statepoint intrinsic to allow statepoints to be marked as ↵Pat Gavlin2015-05-088-45/+178
| | | | | | | | | | | | | | | | | | | | | | transitions from GC-aware code to code that is not GC-aware. This changes the shape of the statepoint intrinsic from: @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 unused, ...call args, i32 # deopt args, ...deopt args, ...gc args) to: @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 flags, ...call args, i32 # transition args, ...transition args, i32 # deopt args, ...deopt args, ...gc args) This extension offers the backend the opportunity to insert (somewhat) arbitrary code to manage the transition from GC-aware code to code that is not GC-aware and back. In order to support the injection of transition code, this extension wraps the STATEPOINT ISD node generated by the usual lowering lowering with two additional nodes: GC_TRANSITION_START and GC_TRANSITION_END. The transition arguments that were passed passed to the intrinsic (if any) are lowered and provided as operands to these nodes and may be used by the backend during code generation. Eventually, the lowering of the GC_TRANSITION_{START,END} nodes should be informed by the GC strategy in use for the function containing the intrinsic call; for now, these nodes are instead replaced with no-ops. Differential Revision: http://reviews.llvm.org/D9501 llvm-svn: 236888
* Clear kill flags on all used registers when sinking instructions.Pete Cooper2015-05-081-0/+29
| | | | | | | | | | | | | The test here was sinking the AND here to a lower BB: %vreg7<def> = ANDWri %vreg8, 0; GPR32common:%vreg7,%vreg8 TBNZW %vreg8<kill>, 0, <BB#1>; GPR32common:%vreg8 which meant that vreg8 was read after it was killed. This commit changes the code from clearing kill flags on the AND to clearing flags on all registers used by the AND. llvm-svn: 236886
* [Hexagon] Update AnalyzeBranch, etc target hooksBrendon Cahoon2015-05-082-0/+655
| | | | | | | | | | | | | Improved the AnalyzeBranch, InsertBranch, and RemoveBranch functions in order to handle more of our branch instructions. This requires changes to analyzeCompare and PredicateInstructions. Specifically, we've added support for new value compare jumps, improved handling of endloop, added more compare instructions, and improved support for predicate instructions. Differential Revision: http://reviews.llvm.org/D9559 llvm-svn: 236876
* [X86] Teach 'getTargetShuffleMask' how to look through ISD::WrapperRIP when ↵Andrea Di Biagio2015-05-081-0/+17
| | | | | | | | | | | | | | | | | | | | | | decoding a PSHUFB mask. The function 'getTargetShuffleMask' already knows how to deal with PSHUFB nodes where the mask node is a load from constant pool, and the constant pool node is wrapped by a X86ISD::Wrapper node. This patch extends that logic by teaching it how to also look through X86ISD::WrapperRIP. This helps function combineX86ShufflesRecusively to combine more shuffle sequences containing PSHUFB nodes if we are in RIPRel PIC mode. Before this change, llc (with -relocation-model=pic -march=x86-64) was unable to decode a pshufb where the mask was loaded from a constant pool. For example, the no-op shuffle from test 'x86-fold-pshufb.ll' was not folded into its operand, so instead of generating a single 'movaps' the backend always generated a sub-optimal 'movdqa + pshufb' sequence. Added test x86-fold-pshufb.ll. llvm-svn: 236863
* Fix test added in r236850 for OSX builders.James Y Knight2015-05-081-1/+1
| | | | | | | Need to specify triple so that llvm emits the asm syntax that the test expected. llvm-svn: 236855
* Fix alignment checks in MergeConsecutiveStores.James Y Knight2015-05-082-8/+71
| | | | | | | | | | | | | | | 1) check whether the alignment of the memory is sufficient for the *merged* store or load to be efficient. Not doing so can result in some ridiculously poor code generation, if merging creates a vector operation which must be aligned but isn't. 2) DON'T check that the alignment of each load/store is equal. If you're merging 2 4-byte stores, the first *might* have 8-byte alignment, but the second certainly will have 4-byte alignment. We do want to allow those to be merged. llvm-svn: 236850
* [mips] Emit the .insn directive for empty basic blocks.Vasileios Kalintiris2015-05-081-0/+27
| | | | | | | | | | | | | | | | | | | Summary: In microMIPS, labels need to know whether they are on code or data. This is indicated with STO_MIPS_MICROMIPS and can be inferred by being followed by instructions. For empty basic blocks, we can ensure this by emitting the .insn directive after the label. Also, this fixes some failures in our out-of-tree microMIPS buildbots, for the exception handling regression tests under: SingleSource/Regression/C++/EH Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9530 llvm-svn: 236815
* Now that we have a soft-float attribute, use it instead of theEric Christopher2015-05-0827-60/+60
| | | | | | hard coded command line option for the Mips soft float tests. llvm-svn: 236801
* Clear kill flags in tail duplication.Pete Cooper2015-05-071-0/+54
| | | | | | | | | | | | | | If we duplicate an instruction then we must also clear kill flags on any uses we rewrite. Otherwise we might be killing a register which was used in other BBs. For example, here the entry BB ended up with these instructions, the ADD having been tail duplicated. %vreg24<def> = t2ADDri %vreg10<kill>, 1, pred:14, pred:%noreg, opt:%noreg; GPRnopc:%vreg24 rGPR:%vreg10 %vreg22<def> = COPY %vreg10; GPR:%vreg22 rGPR:%vreg10 The copy here is inserted after the add and so needs vreg10 to be live. llvm-svn: 236782
* [AArch64] Fix sext/zext folding in address arithmetic.Pete Cooper2015-05-071-0/+39
| | | | | | | | We were accidentally folding a sign/zero extend in to address arithmetic in a different BB when the extend wasn't available there. Cross BB fast-isel isn't safe, so restrict this to only when the extend is in the same BB as the use. llvm-svn: 236764
* Add VSX Scalar loads and stores to the PPC back endNemanja Ivanovic2015-05-072-2/+143
| | | | | | | | | | | This patch corresponds to review: http://reviews.llvm.org/D9440 It adds a new register class to the PPC back end to contain single precision values in VSX registers. Additionally, it adds scalar loads and stores for VSX registers. llvm-svn: 236755
* Fix information loss in branch probability computation.Diego Novillo2015-05-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This addresses PR 22718. When branch weights are too large, they were being clamped to the range [1, MaxWeightForBB]. But this clamping is only applied to edges that go outside the range, so it distorts the relative branch probabilities. This patch changes the weight calculation to scale every branch so the relative probabilities are preserved. The scaling is done differently now. First, all the branch weights are added up, and if the sum exceeds 32 bits, it computes an integer scale to bring all the weights within the range. The patch fixes an existing test that had slightly wrong branch probabilities due to the previous clamping. It now gets branch weights scaled accordingly. Reviewers: dexonsmith Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9442 llvm-svn: 236750
* [x86] eliminate unnecessary shuffling/moves with unary scalar math ops (PR21507)Sanjay Patel2015-05-071-0/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Finish the job that was abandoned in D6958 following the refactoring in http://reviews.llvm.org/rL230221: 1. Uncomment the intrinsic def for the AVX r_Int instruction. 2. Add missing r_Int entries to the load folding tables; there are already tests that check these in "test/Codegen/X86/fold-load-unops.ll", so I haven't added any more in this patch. 3. Add patterns to solve PR21507 ( https://llvm.org/bugs/show_bug.cgi?id=21507 ). So instead of this: movaps %xmm0, %xmm1 rcpss %xmm1, %xmm1 movss %xmm1, %xmm0 We should now get: rcpss %xmm0, %xmm0 And instead of this: vsqrtss %xmm0, %xmm0, %xmm1 vblendps $1, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm1[0],xmm0[1,2,3] We should now get: vsqrtss %xmm0, %xmm0, %xmm0 Differential Revision: http://reviews.llvm.org/D9504 llvm-svn: 236740
* AVX-512: Added all forms of FP compare instructions for KNL and SKX.Elena Demikhovsky2015-05-073-189/+414
| | | | | | | | Added intrinsics for the instructions. CC parameter of the intrinsics was changed from i8 to i32 according to the spec. By Igor Breger (igor.breger@intel.com) llvm-svn: 236714
* llvm/test/CodeGen/X86/llc-override-mcpu-mattr.ll: Tweak not to be affected ↵NAKAMURA Takumi2015-05-071-4/+6
| | | | | | by x64 Calling Convention. llvm-svn: 236710
* Let llc and opt override "-target-cpu" and "-target-features" via command lineAkira Hatanaka2015-05-061-0/+19
| | | | | | | | | | | options. This commit fixes a bug in llc and opt where "-mcpu" and "-mattr" wouldn't override function attributes "-target-cpu" and "-target-features" in the IR. Differential Revision: http://reviews.llvm.org/D9537 llvm-svn: 236677
* Handle dead defs in the if converter.Pete Cooper2015-05-061-0/+55
| | | | | | | | | | | | | | | | | | | | | | | | We had code such as this: r2 = ... t2Bcc label1: ldr ... r2 label2; return r2<dead, def> The if converter was transforming this to r2<def> = ... return [pred] r2<dead,def> ldr <r2, kill> return which fails the machine verifier because the ldr now reads from a dead def. The fix here detects dead defs in stepForward and passes them back to the caller in the clobbers list. The caller then clears the dead flag from the def is the value is live. llvm-svn: 236660
* Fix incorrect kill flags in fastisel.Pete Cooper2015-05-061-0/+25
| | | | | | | | If called twice in the same BB on the same constant, FastISel::fastEmit_ri_ was marking the materialized vreg as killed on each use, instead of only the last use. Change this to only mark the last use as killed by making earlier uses check if the vreg is already used elsewhere. llvm-svn: 236650
* [x86] Fix register class of folded load index reg.Pete Cooper2015-05-061-0/+20
| | | | | | | When folding a load in to another instruction, we need to fix the class of the index register Otherwise, it could be something like GR64 not GR64_NOSP and would fail the machine verifier. llvm-svn: 236644
* CodeGen: move over-zealous assert into actual if statement.Tim Northover2015-05-061-0/+19
| | | | | | | | | | | | | | | It's quite possible to encounter an insertvalue instruction that's more deeply nested than the value we're looking for, but when that happens we really mustn't compare beyond the end of the index array. Since I couldn't see any guarantees about what comparisons std::equal makes, we probably need to directly check the size beforehand. In practice, I suspect most std::equal implementations would probably bail early, which would be OK. But just in case... rdar://20834485 llvm-svn: 236635
* DwarfDebug: Emit number of bytes in .debug_loc entry directlyDuncan P. N. Exon Smith2015-05-062-10/+3
| | | | | | | | | | | | | | | | | | | | | | Emit the number of bytes in a `.debug_loc` entry directly. The old code created temp labels (expensive), emitted the difference between them, and then emitted one on each side of the relevant bytes. (I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc` (the optimized version of ld64's `-save-temps` when linking the `verify-uselistorder` executable in an LTO bootstrap). I've hacked `MCContext::Allocate()` to just call `malloc()` instead of using the `BumpPtrAllocator` so that the heap profile is easier to read. As far as peak memory is concerned, `MCContext::Allocate()` is equivalent to a leak, since it only gets freed at process teardown. In my heap profile, this patch drops memory usage of `DwarfDebug::emitDebugLoc()` from 132.56 MB (11.4%) down to 29.86 MB (2.7%) at peak memory. Some of that must be noise from `SmallVector` (or other) allocations -- peak memory only dropped from 1160 MB down to 1100 MB -- but this nevertheless shaves 5% off the top.) llvm-svn: 236629
* Readd the regression test from r236584. Calling convention fixed to linux.Pawel Bylica2015-05-061-0/+51
| | | | llvm-svn: 236610
* [ARM] Fast-Isel was incorrectly selecting <2 x double> adds.Pete Cooper2015-05-061-0/+33
| | | | | | | | | | With neon enabled, we reach SelectBinaryFPOp and are able to get registers for a <2 x double> add. However, we shouldn't actually attempt arithmetic on it as ARMIselLowering says "v2f64 is legal so that QR subregs can be extracted as f64 elements, but neither Neon nor VFP support any arithmetic operations on it." This commit disables SelectBinaryFPOp for any vector types. There's already a FIXME to try handle neon. Doing so would require fixing this conditional which isn't safe for vectors 'VT == MVT::f64 || VT == MVT::i64' llvm-svn: 236609
* [PPC64LE] Adjust vector splats during VSX swap optimizationBill Schmidt2015-05-061-0/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | The initial code drop for VSX swap optimization permitted the optimization only when all operations in a web of related computation are lane-insensitive. For some lane-sensitive operations, we can still permit the optimization provided that we make adjustments to those operations. This patch adds special handling for vector splats so that their presence doesn't kill the optimization. Vector splats are lane-sensitive since they identify by number a vector element to be used as the source of a splat. When swap optimizations take place, the desired vector element will move to the opposite doubleword of the quadword vector. We thus replace the index I by (I + N/2) % N, where N is the number of elements in the vector. A new test case is added to test that swap optimization succeeds when vector splats are present, and that the proper input element is used as the source of the splat. An ancillary change removes SH_BUILDVEC as one of the kinds of special handling that may be required by VSX swap optimization. From experience with GCC, I had expected to need some modifications for vector build operations, but I did not find that to be the case. llvm-svn: 236606
OpenPOWER on IntegriCloud