summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [SystemZ] TargetTransformInfo cost functions implemented.Jonas Paulsson2017-04-1211-32/+620
| | | | | | | | | | | | | | | | getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(), getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(), getInterleavedMemoryOpCost() implemented. Interleaved access vectorization enabled. BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads, in which case the cost of the z/sext instruction becomes 0. Review: Ulrich Weigand, Renato Golin. https://reviews.llvm.org/D29631 llvm-svn: 300052
* [AMDGPU] SDWA: make pass globalSam Kolton2017-04-121-183/+175
| | | | | | | | | | | | Summary: Remove checks for basic blocks. Reviewers: vpykhtin, rampitec, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31935 llvm-svn: 300040
* [AMDGPU] Add a new pass to insert waitcnts. Leave under an option for testing.Kannan Narayanan2017-04-125-1/+1881
| | | | | | Based on comments in https://reviews.llvm.org/D31161. llvm-svn: 300023
* Revert "[WebAssembly] Update use of Attributes after r299875"Derek Schuff2017-04-121-14/+17
| | | | | | | | This reverts commit 2a0eb61dcccb15058d5b2a572bb3da0cf47fd550, r300015 I raced with rnk on the commit. llvm-svn: 300016
* [WebAssembly] Update use of Attributes after r299875Derek Schuff2017-04-121-17/+14
| | | | | | This fixes the failing WebAssemblyLowerEmscriptenEHSjLj tests llvm-svn: 300015
* AMDGPU: Insert wait at start of callee functionsMatt Arsenault2017-04-111-0/+14
| | | | llvm-svn: 300000
* AMDGPU: Refactor SIMachineFunctionInfo slightlyMatt Arsenault2017-04-113-16/+38
| | | | | | Prepare for handling non-entry functions. llvm-svn: 299999
* AMDGPU: Refactor argument loweringMatt Arsenault2017-04-1110-276/+375
| | | | | | | Split into smaller functions and prepare for handling non-entry functions. llvm-svn: 299998
* AMDGPU: Fix folding reg_sequence into copy to phys regMatt Arsenault2017-04-111-0/+4
| | | | | | | This was producing an illegal reg_sequence defining a physical register with virtual register inputs. llvm-svn: 299997
* AMDGPU: Prune unecessary includeMatt Arsenault2017-04-111-2/+0
| | | | llvm-svn: 299996
* [AArch64] Fix scheduling info for INS(vector, general) instruction.Balaram Makam2017-04-112-1/+6
| | | | llvm-svn: 299994
* [x86] Relax the check in areLoadsFromSameBasePtrEaswaran Raman2017-04-111-19/+16
| | | | | | | | | Check if the scale operand is identical (doesn't have to be 1) and do not check the chaain operand. Differential revision: https://reviews.llvm.org/D31833 llvm-svn: 299986
* [AArch64] Simplify MacroFusionEvandro Menezes2017-04-111-79/+89
| | | | | | | | | | | | This patch assumes that the dependents to be scanned for the ExitSU are its predecessors; otherwise, the successors of the instr are scanned. Furthermore, sometimes the ExitSU was being fused twice, since it may be fused once when scanning the successors from the beginning of the BB and then again when scanning the predecessors of ExitSU. Thus, when scanning the successors of an instr, skip the ExitSU. llvm-svn: 299974
* [X86] Create the correct ADC/SBB SDNode when lowering add.Davide Italiano2017-04-111-2/+4
| | | | | | Differential Revision: https://reviews.llvm.org/D31911 llvm-svn: 299973
* Fix spelling compliment->complement. Mostly refering to 2s complement. NFCCraig Topper2017-04-113-4/+4
| | | | llvm-svn: 299970
* [AMDGPU] Add A5 to data layout for amdgiz environmentYaxun Liu2017-04-111-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D31589 llvm-svn: 299964
* Module::getOrInsertFunction is using C-style vararg instead of variadic ↵Serge Guelton2017-04-113-4/+3
| | | | | | | | | | | templates. From a user prospective, it forces the use of an annoying nullptr to mark the end of the vararg, and there's not type checking on the arguments. The variadic template is an obvious solution to both issues. Differential Revision: https://reviews.llvm.org/D31070 llvm-svn: 299949
* Remove unused functions. Remove static qualifier from functions in header ↵Vassil Vassilev2017-04-111-10/+0
| | | | | | files. NFC. llvm-svn: 299947
* [AVR] Migrate to new MCAsmBackend applyFixupJonathan Roelofs2017-04-112-2/+2
| | | | | | | | https://reviews.llvm.org/D31875 Patch by Leslie Zhai! llvm-svn: 299946
* [ARM] Refactor Thumb2 sat instructionsSam Parker2017-04-111-48/+30
| | | | | | | | | Refactor the USAT, SSAT, USAT16 and SSAT16 instruction descriptions for Thumb2. Differential Revision: https://reviews.llvm.org/D31933 llvm-svn: 299945
* GlobalISel: Allow legalizing G_FADD to a libcallDiana Picus2017-04-111-0/+3
| | | | | | | | | Use the same handling in the generic legalizer code as for the other libcalls (G_FREM, G_FPOW). Enable it on ARM for float and double so we can test it. llvm-svn: 299931
* Revert "Turn some C-style vararg into variadic templates"Diana Picus2017-04-113-3/+4
| | | | | | | This reverts commit r299925 because it broke the buildbots. See e.g. http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/6008 llvm-svn: 299928
* Turn some C-style vararg into variadic templatesSerge Guelton2017-04-113-4/+3
| | | | | | | | | | | | Module::getOrInsertFunction is using C-style vararg instead of variadic templates. From a user prospective, it forces the use of an annoying nullptr to mark the end of the vararg, and there's not type checking on the arguments. The variadic template is an obvious solution to both issues. llvm-svn: 299925
* [PowerPC] multiply-with-overflow might use the CTR registerHal Finkel2017-04-111-9/+11
| | | | | | | | | | | | Check the legality of ISD::[US]MULO to see whether Intrinsic::[us]mul_with_overflow will legalize into a function call (and, thus, will use the CTR register). Fixes PR32485. Patch by Tim Neumann! Differential Revision: https://reviews.llvm.org/D31790 llvm-svn: 299910
* Allow DataLayout to specify addrspace for allocas.Matt Arsenault2017-04-101-1/+2
| | | | | | | | | | | | | | | | | | | | | | | LLVM makes several assumptions about address space 0. However, alloca is presently constrained to always return this address space. There's no real way to avoid using alloca, so without this there is no way to opt out of these assumptions. The problematic assumptions include: - That the pointer size used for the stack is the same size as the code size pointer, which is also the maximum sized pointer. - That 0 is an invalid, non-dereferencable pointer value. These are problems for AMDGPU because alloca is used to implement the private address space, which uses a 32-bit index as the pointer value. Other pointers are 64-bit and behave more like LLVM's notion of generic address space. By changing the address space used for allocas, we can change our generic pointer type to be LLVM's generic pointer type which does have similar properties. llvm-svn: 299888
* Get the TOC save offset off of PPCFrameLowering rather than a separate copy ↵Eric Christopher2017-04-101-1/+1
| | | | | | of the same data. llvm-svn: 299887
* [mips] Use Triple::isLittleEndian to check endianness. NFCSimon Atanasyan2017-04-101-3/+1
| | | | llvm-svn: 299872
* [ARM/AArch64] Ensure valid vector element types for interleaved accessesMatthew Simpson2017-04-106-39/+86
| | | | | | | | | | | This patch refactors and strengthens the type checks performed for interleaved accesses. The primary functional change is to ensure that the interleaved accesses have valid element types. The added test cases previously failed because the element type is f128. Differential Revision: https://reviews.llvm.org/D31817 llvm-svn: 299864
* AMDGPU: Fix crash when disassembling VOP3 macMatt Arsenault2017-04-1010-19/+23
| | | | | | | | | | | | The unused dummy src2_modifiers is missing, so it crashes when trying to print it. I tried to fully remove src2_modifiers, but there are some irritations in the places where it is converted to mad since it starts to require modifying use lists while iterating over them. llvm-svn: 299861
* [X86][MMX] Add fast-isel support for MMX non-temporal writesSimon Pilgrim2017-04-101-0/+4
| | | | | | Differential Revision: https://reviews.llvm.org/D31754 llvm-svn: 299852
* [ARM] GlobalISel: Support G_FPOW for float and doubleDiana Picus2017-04-101-2/+3
| | | | | | Legalize to a libcall. llvm-svn: 299841
* AMDGPU: Actually write nops for writeNopDataMatt Arsenault2017-04-081-1/+14
| | | | | | | Before this was just writing 0s, which ends up looking like a v_cndmask_b32 v0, s0, v0, vcc. Write out an encoded s_nop instead. llvm-svn: 299816
* [AArch64] Refine Falkor Machine Model - Part 3Balaram Makam2017-04-085-26/+135
| | | | | | | | | This concludes the refinements to Falkor Machine Model. It includes SchedPredicates for immediate zero and LSL Fast. Forwarding logic is also modeled for vector multiply and accumulate only. llvm-svn: 299810
* [ARM] Prefer BIC over BFC in ARM mode.Eli Friedman2017-04-071-0/+1
| | | | | | | | | | | | BIC is generally faster, and it can put the output in a different register from the input. We already do this in Thumb2 mode; not sure why the equivalent fix never got applied to ARM mode. Differential Revision: https://reviews.llvm.org/D31797 llvm-svn: 299803
* [AArch64] Allow global register asm("x18") or asm("w18") under -ffixed-x18Petr Hosek2017-04-071-0/+5
| | | | | | | | | | | | When using -ffixed-x18, the x18 (or w18) register can safely be used with the "global register variable" GCC extension, but the backend fails to recognize it. Patch by Roland McGrath. Differential Revision: https://reviews.llvm.org/D31793 llvm-svn: 299799
* Revert "[SelectionDAG] Enable target specific vector scalarization of calls ↵Simon Dardis2017-04-076-198/+15
| | | | | | | | | | | | | and returns" This reverts commit r299766. This change appears to have broken the MIPS buildbots. Reverting while I investigate. Revert "[mips] Remove usage of debug only variable (NFC)" This reverts commit r299769. Follow up commit. llvm-svn: 299788
* [AMDGPU] Unroll more to eliminate phis and conditionsStanislav Mekhanoshin2017-04-071-2/+52
| | | | | | | | | | | | | Increase threshold to unroll a loop which contains an "if" statement whose condition defined by a PHI belonging to the loop. This may help to eliminate if region and potentially even PHI itself, saving on both divergence and registers used for the PHI. Add a small bonus for each of such "if" statements. Differential Revision: https://reviews.llvm.org/D31693 llvm-svn: 299779
* Use PMADDWD to expand reduction in a loopDehao Chen2017-04-071-0/+47
| | | | | | | | | | | | | | | | | | Summary: PMADDWD can help improve 8/16 bit integer mutliply-add operation performance for cases like: for (int i = 0; i < count; i++) a += x[i] * y[i]; Reviewers: wmi, davidxl, hfinkel, RKSimon, zvi, mkuper Reviewed By: mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31679 llvm-svn: 299776
* [GlobalISel] implement narrowing for G_CONSTANT.Igor Breger2017-04-071-0/+16
| | | | | | | | | | | | | | Summary: [GlobalISel] implement narrowing for G_CONSTANT. Reviewers: bogner, zvi, t.p.northover Reviewed By: t.p.northover Subscribers: llvm-commits, dberris, rovka, kristof.beyls Differential Revision: https://reviews.llvm.org/D31744 llvm-svn: 299772
* [mips] Remove usage of debug only variable (NFC)Simon Dardis2017-04-071-2/+2
| | | | | | | Fix the lld-x86_64-darwin13 buildbot by removing the declaration of a debug only variable and instead moving the value into the debug statement. llvm-svn: 299769
* [mips][msa] Fix generation of bm(n)zi and bins[lr]i instructionsPetar Jovanovic2017-04-072-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | We have two cases here, the first one being the following instruction selection from the builtin function: bm(n)zi builtin -> vselect node -> bins[lr]i machine instruction In case of bm(n)zi having an immediate which has either its high or low bits set, a bins[lr] instruction can be selected through the selectVSplatMask[LR] function. The function counts the number of bits set, and that value is being passed to the bins[lr]i instruction as its immediate, which in turn copies immediate modulo the size of the element in bits plus 1 as per specs, where we get the off-by-one-error. The other case is: bins[lr]i -> vselect node -> bsel.v In this case, a bsel.v instruction gets selected with a mask having one bit less set than required. Patch by Stefan Maksimovic. Differential Revision: https://reviews.llvm.org/D30579 llvm-svn: 299768
* [AMDGPU][MC] Fix for Bug 28211 + LIT testsDmitry Preobrazhensky2017-04-072-36/+48
| | | | | | | | | | | | | | | | | | | | - corrected DS_GWS_* opcodes (see VI_Shader_Programming#16.pdf for detailed description) - address operand is not used - several opcodes have data operand - all opcodes have offset modifier - DS_AND_SRC2_B32: corrected typo in mnemo - DS_WRAP_RTN_F32 replaced with DS_WRAP_RTN_B32 - added CI/VI opcodes: - DS_CONDXCHG32_RTN_B64 - DS_GWS_SEMA_RELEASE_ALL - added VI opcodes: - DS_CONSUME - DS_APPEND - DS_ORDERED_COUNT Differential Revision: https://reviews.llvm.org/D31707 llvm-svn: 299767
* [SelectionDAG] Enable target specific vector scalarization of calls and returnsSimon Dardis2017-04-076-15/+198
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By target hookifying getRegisterType, getNumRegisters, getVectorBreakdown, backends can request that LLVM to scalarize vector types for calls and returns. The MIPS vector ABI requires that vector arguments and returns are passed in integer registers. With SelectionDAG's new hooks, the MIPS backend can now handle LLVM-IR with vector types in calls and returns. E.g. 'call @foo(<4 x i32> %4)'. Previously these cases would be scalarized for the MIPS O32/N32/N64 ABI for calls and returns if vector types were not legal. If vector types were legal, a single 128bit vector argument would be assigned to a single 32 bit / 64 bit integer register. By teaching the MIPS backend to inspect the original types, it can now implement the MIPS vector ABI which requires a particular method of scalarizing vectors. Previously, the MIPS backend relied on clang to scalarize types such as "call @foo(<4 x float> %a) into "call @foo(i32 inreg %1, i32 inreg %2, i32 inreg %3, i32 inreg %4)". This patch enables the MIPS backend to take either form for vector types. Reviewers: zoran.jovanovic, jaydeep, vkalintiris, slthakur Differential Revision: https://reviews.llvm.org/D27845 llvm-svn: 299766
* [SystemZ] Check for presence of vector support in SystemZISelLoweringJonas Paulsson2017-04-072-2/+6
| | | | | | | | | | | | | | A test case was found with llvm-stress that caused DAGCombiner to crash when compiling for an older subtarget without vector support. SystemZTargetLowering::combineTruncateExtract() should do nothing for older subtargets. This check was placed in canTreatAsByteVector(), which also helps in a few other places. Review: Ulrich Weigand llvm-svn: 299763
* [SystemZ] Remove confusing comment in combineEXTRACT_VECTOR_ELT()Jonas Paulsson2017-04-071-2/+0
| | | | | | It isn't just one-element vectors that can appear here. llvm-svn: 299762
* [AMDGPU] Move SiShrinkInstruction and SDWAPeephole to SSAOptimization passesSam Kolton2017-04-071-5/+5
| | | | | | | | | | | | | | Summary: Difference beetween PreRegAlloc() and MachineSSAOptimization() are that the former is run despite of -O0 optimization level. In my undestanding SiShrinkInstructions and SDWAPeephole shouldn't run when optimizations are disabled. With this change order of passes will not change. Reviewers: arsenm, vpykhtin, rampitec Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31705 llvm-svn: 299757
* [ARM] GlobalISel: Support frem for 64-bit valuesDiana Picus2017-04-071-0/+1
| | | | | | Legalize to a libcall. llvm-svn: 299756
* [ARM] GlobalISel: Support frem for 32-bit valuesDiana Picus2017-04-072-5/+3
| | | | | | | | Legalize to a libcall. On this occasion, also start allowing soft float subtargets. For the moment G_FREM is the only legal floating point operation for them. llvm-svn: 299753
* [WebAssembly] Fix -Wcovered-switch-default warningDerek Schuff2017-04-061-2/+1
| | | | llvm-svn: 299736
* AMDGPU/GFX9: Fix shared and private aperture queriesKonstantin Zhuravlyov2017-04-063-14/+35
| | | | | | Differential Revision: https://reviews.llvm.org/D31786 llvm-svn: 299727
OpenPOWER on IntegriCloud