summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [WebAssembly] Implement __builtin_frame_address.Dan Gohman2016-02-164-8/+23
| | | | | | Differential Revision: http://reviews.llvm.org/D17307 llvm-svn: 261032
* [X86] Remove the now-unused X86ISD::PSIGN. NFC.Ahmed Bougacha2016-02-166-46/+30
| | | | llvm-svn: 261025
* [X86] Generalize logic blend of (x, -x) combine to match (-x, x).Ahmed Bougacha2016-02-161-7/+17
| | | | | | I suspect this is what let PR26110 lie dormant for so long. llvm-svn: 261024
* [X86] Don't turn (c?-v:v) into (c?-v:0) by blindly using PSIGN.Ahmed Bougacha2016-02-161-10/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we sometimes miscompile this vector pattern: (c ? -v : v) We lower it to (because "c" is <4 x i1>, lowered as a vector mask): (~c & v) | (c & -v) When we have SSSE3, we incorrectly lower that to PSIGN, which does: (c < 0 ? -v : c > 0 ? v : 0) in other words, when c is either all-ones or all-zero: (c ? -v : 0) While this is an old bug, it rarely triggers because the PSIGN combine is too sensitive to operand order. This will be improved separately. Note that the PSIGN tests are also incorrect. Consider: %b.lobit = ashr <4 x i32> %b, <i32 31, i32 31, i32 31, i32 31> %sub = sub nsw <4 x i32> zeroinitializer, %a %0 = xor <4 x i32> %b.lobit, <i32 -1, i32 -1, i32 -1, i32 -1> %1 = and <4 x i32> %a, %0 %2 = and <4 x i32> %b.lobit, %sub %cond = or <4 x i32> %1, %2 ret <4 x i32> %cond if %b is zero: %b.lobit = <4 x i32> zeroinitializer %sub = sub nsw <4 x i32> zeroinitializer, %a %0 = <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1> %1 = <4 x i32> %a %2 = <4 x i32> zeroinitializer %cond = or <4 x i32> %a, zeroinitializer ret <4 x i32> %a whereas we currently generate: psignd %xmm1, %xmm0 retq which returns 0, as %xmm1 is 0. Instead, use a pure logic sequence, as described in: https://graphics.stanford.edu/~seander/bithacks.html#ConditionalNegate Fixes PR26110. Differential Revision: http://reviews.llvm.org/D17181 llvm-svn: 261023
* [X86] Extract PSIGN/BLENDVP combine. NFC.Ahmed Bougacha2016-02-161-77/+95
| | | | llvm-svn: 261021
* [X86] Extract ANDNP combine. NFC.Ahmed Bougacha2016-02-161-61/+57
| | | | | | This makes it IMO more readable and reduces indentation. llvm-svn: 261020
* [WebAssembly] Update torture test expectationsDerek Schuff2016-02-161-7/+0
| | | | | | These were fixed with r260978 llvm-svn: 261017
* [WebAssemly] Don't move calls or stores past intervening loadsDerek Schuff2016-02-161-0/+1
| | | | | | | | | | The register stackifier currently checks for intervening stores (and loads that may alias them) but doesn't account for the fact that the instruction being moved may affect intervening loads. Differential Revision: http://reviews.llvm.org/D17298 llvm-svn: 261014
* [Hexagon] Adding relocation for code size, cold path optimization allowing a ↵Colin LeMahieu2016-02-1611-1/+70
| | | | | | | | | | | | 23-bit 4-byte aligned relocation to be a valid instruction encoding. The usual way to get a 32-bit relocation is to use a constant extender which doubles the size of the instruction, 4 bytes to 8 bytes. Another way is to put a .word32 and mix code and data within a function. The disadvantage is it's not a valid instruction encoding and jumping over it causes prefetch stalls inside the hardware. This relocation packs a 23-bit value in to an "r0 = add(rX, #a)" instruction by overwriting the source register bits. Since r0 is the return value register, if this instruction is placed after a function call which return void, r0 will be filled with an undefined value, the prefetch won't be confused, and the callee can access the constant value by way of the link register. llvm-svn: 261006
* [AArch64] Add pass to remove redundant copy after RAJun Bum Lim2016-02-164-0/+181
| | | | | | | | | | | | | | | | | | | | | Summary: This change will add a pass to remove unnecessary zero copies in target blocks of cbz/cbnz instructions. E.g., the copy instruction in the code below can be removed because the cbz jumps to BB1 when x0 is zero : BB0: cbz x0, .BB1 BB1: mov x0, xzr Jun Reviewers: gberry, jmolloy, HaoLiu, MatzeB, mcrosier Subscribers: mcrosier, mssimpso, haicheng, bmakam, llvm-commits, aemerson, rengolin Differential Revision: http://reviews.llvm.org/D16203 llvm-svn: 261004
* [GlobalISel] Re-apply r260922-260923 with MSVC-friendly code.Quentin Colombet2016-02-167-88/+161
| | | | | | | | | Original message: Get rid of the ifdefs in TargetLowering. Introduce a new API used only by GlobalISel: CallLowering. This API will contain target hooks dedicated to call lowering. llvm-svn: 260998
* [WebAssembly] Insert COPY_LOCAL between CopyToReg and FrameIndex DAG nodesDerek Schuff2016-02-163-24/+34
| | | | | | | | | | | | | | CopyToReg nodes don't support FrameIndex operands. Other targets select the FI to some LEA-like instruction, but since we don't have that, we need to insert some kind of instruction that can take an FI operand and produces a value usable by CopyToReg (i.e. in a vreg). So insert a dummy copy_local between Op and its FI operand. This results in a redundant copy which we should optimize away later (maybe in the post-FI-lowering peephole pass). Differential Revision: http://reviews.llvm.org/D17213 llvm-svn: 260987
* [AMDGPU] Rename $dst operand to $vdst for VOP instructions.Tom Stellard2016-02-166-77/+120
| | | | | | | | | | | | | | Summary: This change renames output operand for VOP instructions from dst to vdst. This is needed to enable decoding named operands for disassembler. Reviewers: vpykhtin, tstellarAMD, arsenm Subscribers: arsenm, llvm-commits, nhaustov Projects: #llvm-amdgpu-spb Differential Revision: http://reviews.llvm.org/D16920 llvm-svn: 260986
* [X86] Enable the LEA optimization pass by default.Andrey Turetskiy2016-02-161-4/+5
| | | | | | Differential Revision: http://reviews.llvm.org/D16877 llvm-svn: 260979
* [WebAssembly] Switch from RPO sorting to topological sorting.Dan Gohman2016-02-161-109/+167
| | | | | | | | | | WebAssembly doesn't require full RPO; topological sorting is sufficient and can preserve more of the MachineBlockPlacement ordering. Unfortunately, this still depends a lot on heuristics, because while we use the MachineBlockPlacement ordering as a guide, we can't use it in places where it isn't topologically ordered. This area will require further attention. llvm-svn: 260978
* Reverting r260922-260923; they cause link failures with MSVC.Aaron Ballman2016-02-167-159/+88
| | | | | | | http://lab.llvm.org:8011/builders/lldb-x86-windows-msvc2015/builds/15436/steps/build/logs/stdio http://bb.pgr.jp/builders/msbuild-llvmclang-x64-msc18-DA/builds/961/steps/build_llvm/logs/stdio llvm-svn: 260972
* [WebAssembly] Create new registers instead of reusing old ones in RegStackify.Dan Gohman2016-02-161-7/+9
| | | | | | | | This avoids some complications updating LiveIntervals to be aware of the new register lifetimes, because we can just compute new intervals from scratch rather than describe how the old ones have been changed. llvm-svn: 260971
* [WebAssembly] Implement support for custom NaN bit patterns.Dan Gohman2016-02-164-14/+36
| | | | llvm-svn: 260968
* [X86] PR26575: Fix LEA optimization pass.Andrey Turetskiy2016-02-161-0/+5
| | | | | | | | | | Add a missing check for a type of address displacement operand of the load/store instruction being a candidate for LEA substitution. Ref: https://llvm.org/bugs/show_bug.cgi?id=26575 Differential Revision: http://reviews.llvm.org/D17261 llvm-svn: 260959
* [Hexagon] Hoist nonnull assert up.Benjamin Kramer2016-02-162-1/+1
| | | | | | | | Once a pointer is turned into a reference it cannot be nullptr, clang rightfully warns about this assert being a tautology. Put the assert before the reference is created. llvm-svn: 260949
* [X86] Fix typos. NFCCraig Topper2016-02-161-2/+2
| | | | llvm-svn: 260943
* [X86] Use range-based for loop. NFCCraig Topper2016-02-161-3/+2
| | | | llvm-svn: 260942
* [X86] Fix typo in comment. NFCCraig Topper2016-02-161-1/+1
| | | | llvm-svn: 260940
* Restore the capability to manipulate datalayout from the C APIAmaury Sechet2016-02-162-0/+12
| | | | | | | | | | | | | | | | | Summary: This consist in variosu addition to the C API: LLVMTargetDataRef LLVMGetModuleDataLayout(LLVMModuleRef M); void LLVMSetModuleDataLayout(LLVMModuleRef M, LLVMTargetDataRef DL); LLVMTargetDataRef LLVMCreateTargetMachineData(LLVMTargetMachineRef T); Reviewers: joker.eph, Wallbraker, echristo Subscribers: axw Differential Revision: http://reviews.llvm.org/D17255 llvm-svn: 260936
* [GlobalISel] Get rid of the ifdefs in TargetLowering.Quentin Colombet2016-02-167-88/+159
| | | | | | | Introduce a new API used only by GlobalISel: CallLowering. This API will contain target hooks dedicated to call lowering. llvm-svn: 260922
* Kill LLVMAddTargetDataAmaury Sechet2016-02-161-3/+0
| | | | | | | | | | | | Summary: It's red, it's dead. Reviewers: joker.eph, Wallbraker, echristo Subscribers: llvm-commits, axw Differential Revision: http://reviews.llvm.org/D17282 llvm-svn: 260919
* Implemented stack symbol table ordering/packing optimization to improve data ↵Zia Ansari2016-02-152-0/+149
| | | | | | | | locality and code size from SP/FP offset encoding. Differential Revision: http://reviews.llvm.org/D15393 llvm-svn: 260917
* [X86] Remove now-dead variable and redundant assert. NFC.Ahmed Bougacha2016-02-151-2/+0
| | | | | | | The variable was made dead in NDEBUG by r260901, but the assert was redundant anyway: get rid of both. llvm-svn: 260908
* [NFC] Fixing naming convention, lowercase start of function name.Colin LeMahieu2016-02-156-33/+33
| | | | llvm-svn: 260903
* [Hexagon] Wrapping all MCExprs inside MCOperands within HexagonMCExpr to ↵Colin LeMahieu2016-02-158-92/+169
| | | | | | simplify handling and allow flags on the expression. llvm-svn: 260902
* [CodeGen] Document and use getConstant's splat-building feature. NFC.Ahmed Bougacha2016-02-153-86/+32
| | | | | | Differential Revision: http://reviews.llvm.org/D17229 llvm-svn: 260901
* [mips] Implemented the .hword directive.Scott Egerton2016-02-152-0/+6
| | | | | | | | | | | | | | Summary: In order to pass the tests, this required marking R_MIPS_16 relocations as needing to point to the symbol and not the section. Reviewers: vkalintiris, dsanders Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D17200 llvm-svn: 260896
* [Hexagon] Use zero-extending loads for anyextKrzysztof Parzyszek2016-02-151-6/+6
| | | | llvm-svn: 260895
* Reverted r260879 as it caused test failures in lld.Scott Egerton2016-02-151-1/+1
| | | | llvm-svn: 260880
* [mips] Removed the SHF_ALLOC flag from the .pdr section.Scott Egerton2016-02-151-1/+1
| | | | | | | | | | | | | | | | | Summary: This section is used for debug information and has no need to be in memory at runtime. With this patch, LLVM now emits the same flags as the GNU assembler. This patch also fixes an error when compiling the Linux kernel, The error is that there are relocations within the .pdr section in a VDSO. Reviewers: vkalintiris, dsanders Subscribers: llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D17199 llvm-svn: 260879
* AVX512: Change store size of kmask. Store size of v8i1, v4i1 , v2i1 and i1 ↵Igor Breger2016-02-153-20/+40
| | | | | | | | | | are changed to 16 bits. If KMOVB not supported (require AVX512DQ) only KMOVW can be used so store size should be 2 bytes. Differential Revision: http://reviews.llvm.org/D17138 llvm-svn: 260878
* Support: Fix incremental build when re-configuring targetsDuncan P. N. Exon Smith2016-02-131-1/+1
| | | | | | | | | | | | | | r180893 added an indirect include of llvm/Config/Targets.def to llvm/Support/CodeGen.h, which in turn is included by things like llvm/IR/Module.h. After a full build of LLVM and Clang, ninja had to rebuild 1274 files after reconfiguring. This commit strips CodeGen.h back down to just a pile of enums and moves the expensive includes over to CodeGenCWrappers.h (which is only included in two places). This gets ninja down to 88 files if you reconfigure with, e.g., -DLLVM_TARGETS_TO_BUILD=X86. llvm-svn: 260835
* [X86][AVX] Lower shuffles as repeated lane shuffles then lane-crossing shufflesSimon Pilgrim2016-02-131-0/+166
| | | | | | | | | | | | | | | | | | This patch attempts to represent a shuffle as a repeating shuffle (recognisable by is128BitLaneRepeatedShuffleMask) with the source input(s) in their original lanes, followed by a single permutation of the 128-bit lanes to their final destinations. On AVX2 we can additionally attempt to match using 64-bit sub-lane permutation. AVX2 can also now match a similar 'broadcasted' repeating shuffle. This patch has several benefits: * Avoids prematurely matching with lowerVectorShuffleByMerging128BitLanes which can require both inputs to have their input lanes permuted before shuffling. * Can replace PERMPS/PERMD instructions - although these are useful for cross-lane unary shuffling, they require their shuffle mask to be pre-loaded (and increase register pressure). * Matching the repeating shuffle makes use of a lot of existing shuffle lowering. There is an outstanding minor AVX1 regression (combine_unneeded_subvector1 in vector-shuffle-combining.ll) of a previously 128-bit shuffle + subvector splat being converted to a subvector splat + (2 instruction) 256-bit shuffle, I intend to fix this in a followup patch for review. Differential Revision: http://reviews.llvm.org/D16537 llvm-svn: 260834
* Remove Proc feature flags for X86 processors that are used to inherit ↵Craig Topper2016-02-132-38/+34
| | | | | | features from one processor to another. This exposed extra features to the -mattr command line that we shouldn't. Replace with just inherited listconcats. llvm-svn: 260832
* [x86-64] allow mfence even with -mno-sse (PR23203)Sanjay Patel2016-02-134-10/+11
| | | | | | | | | | | | | As shown in: https://llvm.org/bugs/show_bug.cgi?id=23203 ...we currently die because lowering believes that mfence is allowed without SSE2 on x86-64, but the instruction def doesn't know that. I don't know if allowing mfence without SSE is right, but if not, at least now it's consistently wrong. :) Differential Revision: http://reviews.llvm.org/D17219 llvm-svn: 260828
* [Hexagon] Replace use of "std::map::emplace" with "insert"Krzysztof Parzyszek2016-02-131-1/+4
| | | | | | | Gcc 4.7.2-4 does not seem to have "emplace" in its implementation of map. This should fix the build failure on polly-amd64-linux. llvm-svn: 260816
* HexagonFrameLowering.cpp: Appease msc18 to give an explicit constructor ↵NAKAMURA Takumi2016-02-131-2/+4
| | | | | | SlotInfo() instead of member initializers. llvm-svn: 260812
* AMDGPU: Prepare for reducing private element size.Matt Arsenault2016-02-131-14/+48
| | | | | | | | | | | | Tests for the new scalarize all private access options will be included with a future commit. The only functional change is to make the split/scalarize behavior for private access of > 4 element vectors to be consistent with the flat/global handling. This makes the spilling worse in the two changed tests. llvm-svn: 260804
* AMDGPU/SI: Add llvm.amdgcn.mov.dpp intrinsicTom Stellard2016-02-131-0/+11
| | | | | | | This intrinsic will be used to expose dpp functionality to higher-level languages. It will map to the dpp version of v_mov_b32. llvm-svn: 260792
* AMDGPU: Cleanup includes and random macrosMatt Arsenault2016-02-131-11/+4
| | | | llvm-svn: 260784
* AMDGPU: Add intrinsics for sin/cosMatt Arsenault2016-02-132-1/+18
| | | | | | | These provide direct access to the hardware instruction without the unit version required like llvm.sin/llvm.cos lowering requires. llvm-svn: 260782
* AMDGPU: Rename intrinsic to better match instruction nameMatt Arsenault2016-02-137-9/+9
| | | | | | Also fixes missing f32 test. llvm-svn: 260780
* AMDGPU/SI: Add instruction defs for VOP1 DPP instructionsTom Stellard2016-02-132-0/+107
| | | | | | | | | | Reviewers: nhaustov, cfang, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17159 llvm-svn: 260774
* AMDGPU: Fix broken condition causing warningMatt Arsenault2016-02-131-1/+1
| | | | llvm-svn: 260773
* Fix Windows buildbot breakage.Alexey Samsonov2016-02-121-3/+4
| | | | llvm-svn: 260766
OpenPOWER on IntegriCloud