summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* AVX512: Fix LowerMSCATTER() return value.Igor Breger2016-02-171-1/+1
| | | | | | | | | | | Bug description: The bug was discovered when test was compiled with -O0. In case scatter result is DAG root , VectorLegalizer failed (assert) due to LowerMSCATTER() return kmask as result. Change LowerMSCATTER() to return chain as original node do. Differential Revision: http://reviews.llvm.org/D17331 llvm-svn: 261090
* [mips] Removed the SHF_ALLOC flag and the SHT_REL flag from the .pdr section.Scott Egerton2016-02-171-2/+1
| | | | | | | | | | | | This section is used for debug information and has no need to be in memory at runtime. This patch also fixes an error when compiling the Linux kernel. The error is that there are relocations within the .pdr section in a VDSO. SHT_REL was removed as it is a section type and not a section flag, therefore it does not make sense for it to be there. With this patch, LLVM now emits the same flags as the GNU assembler. llvm-svn: 261083
* [X86][AVX] Support bit-blend integer shuffles for 256-bit integer vectorsSimon Pilgrim2016-02-171-1/+3
| | | | | | | | | | | | AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back. This patch adds the ability to lower using the bit-blend patterns before defaulting to the splitting behaviour. Part 2 of 2 Differential Revision: http://reviews.llvm.org/D17292 llvm-svn: 261082
* [X86][AVX] Support bit-mask integer shuffles for 256-bit integer vectorsSimon Pilgrim2016-02-171-2/+6
| | | | | | | | | | | | AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back. This patch adds the ability to lower using the bit-mask patterns before defaulting to the splitting behaviour. In some cases this ends up matching what AVX2 would do anyhow or what AVX1 does on the split vectors. Part 1 of 2 Differential Revision: http://reviews.llvm.org/D17292 llvm-svn: 261081
* [X86][SSE] Tidyup BUILD_VECTOR operand collection. NFCI.Simon Pilgrim2016-02-171-23/+20
| | | | | | | | Avoid reuse of operand variables, keep them local to a particular lowering - the operand collection is unique to each case anyhow. Renamed from V to Ops to more closely match their purpose. llvm-svn: 261078
* [Hexagon] cast<> a reference instead of referencing + dereferencing.Benjamin Kramer2016-02-171-1/+1
| | | | llvm-svn: 261077
* Revert r260979 "[X86] Enable the LEA optimization pass by default."Hans Wennborg2016-02-171-5/+4
| | | | | | Asserts are still firing in Chromium builds. PR26575. llvm-svn: 261058
* WebAssembly: update expected failuresJF Bastien2016-02-171-3/+0
| | | | | | r261050 seems to inadvertently fix the assertion failure. llvm-svn: 261051
* [WebAssembly] Call memcpy for large byval copies.Dan Gohman2016-02-171-1/+1
| | | | | | | | | | This fixes very slow compilation on test/CodeGen/Generic/2010-11-04-BigByval.ll . Note that MaxStoresPerMemcpy and friends are not yet carefully tuned so the cutoff point is currently somewhat arbitrary. However, it's important that there be a cutoff point so that we don't emit unbounded quantities of loads and stores. llvm-svn: 261050
* WebAssembly: update expected test failuresJF Bastien2016-02-171-3/+0
| | | | | | r261032 adds frame address support. llvm-svn: 261044
* [X86] Fix a shrink-wrapping miscompile around __chkstkReid Kleckner2016-02-171-7/+6
| | | | | | | | | __chkstk clobbers EAX. If EAX is live across the prologue, then we have to take extra steps to save it. We already had code to do this if EAX was a register parameter. This change adapts it to work when shrink wrapping is used. llvm-svn: 261039
* [WebAssembly] Use SDValue::getConstantOperandVal. NFC.Dan Gohman2016-02-171-1/+1
| | | | llvm-svn: 261037
* [WebAssembly] Implement __builtin_frame_address.Dan Gohman2016-02-164-8/+23
| | | | | | Differential Revision: http://reviews.llvm.org/D17307 llvm-svn: 261032
* [X86] Remove the now-unused X86ISD::PSIGN. NFC.Ahmed Bougacha2016-02-166-46/+30
| | | | llvm-svn: 261025
* [X86] Generalize logic blend of (x, -x) combine to match (-x, x).Ahmed Bougacha2016-02-161-7/+17
| | | | | | I suspect this is what let PR26110 lie dormant for so long. llvm-svn: 261024
* [X86] Don't turn (c?-v:v) into (c?-v:0) by blindly using PSIGN.Ahmed Bougacha2016-02-161-10/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we sometimes miscompile this vector pattern: (c ? -v : v) We lower it to (because "c" is <4 x i1>, lowered as a vector mask): (~c & v) | (c & -v) When we have SSSE3, we incorrectly lower that to PSIGN, which does: (c < 0 ? -v : c > 0 ? v : 0) in other words, when c is either all-ones or all-zero: (c ? -v : 0) While this is an old bug, it rarely triggers because the PSIGN combine is too sensitive to operand order. This will be improved separately. Note that the PSIGN tests are also incorrect. Consider: %b.lobit = ashr <4 x i32> %b, <i32 31, i32 31, i32 31, i32 31> %sub = sub nsw <4 x i32> zeroinitializer, %a %0 = xor <4 x i32> %b.lobit, <i32 -1, i32 -1, i32 -1, i32 -1> %1 = and <4 x i32> %a, %0 %2 = and <4 x i32> %b.lobit, %sub %cond = or <4 x i32> %1, %2 ret <4 x i32> %cond if %b is zero: %b.lobit = <4 x i32> zeroinitializer %sub = sub nsw <4 x i32> zeroinitializer, %a %0 = <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1> %1 = <4 x i32> %a %2 = <4 x i32> zeroinitializer %cond = or <4 x i32> %a, zeroinitializer ret <4 x i32> %a whereas we currently generate: psignd %xmm1, %xmm0 retq which returns 0, as %xmm1 is 0. Instead, use a pure logic sequence, as described in: https://graphics.stanford.edu/~seander/bithacks.html#ConditionalNegate Fixes PR26110. Differential Revision: http://reviews.llvm.org/D17181 llvm-svn: 261023
* [X86] Extract PSIGN/BLENDVP combine. NFC.Ahmed Bougacha2016-02-161-77/+95
| | | | llvm-svn: 261021
* [X86] Extract ANDNP combine. NFC.Ahmed Bougacha2016-02-161-61/+57
| | | | | | This makes it IMO more readable and reduces indentation. llvm-svn: 261020
* [WebAssembly] Update torture test expectationsDerek Schuff2016-02-161-7/+0
| | | | | | These were fixed with r260978 llvm-svn: 261017
* [WebAssemly] Don't move calls or stores past intervening loadsDerek Schuff2016-02-161-0/+1
| | | | | | | | | | The register stackifier currently checks for intervening stores (and loads that may alias them) but doesn't account for the fact that the instruction being moved may affect intervening loads. Differential Revision: http://reviews.llvm.org/D17298 llvm-svn: 261014
* [Hexagon] Adding relocation for code size, cold path optimization allowing a ↵Colin LeMahieu2016-02-1611-1/+70
| | | | | | | | | | | | 23-bit 4-byte aligned relocation to be a valid instruction encoding. The usual way to get a 32-bit relocation is to use a constant extender which doubles the size of the instruction, 4 bytes to 8 bytes. Another way is to put a .word32 and mix code and data within a function. The disadvantage is it's not a valid instruction encoding and jumping over it causes prefetch stalls inside the hardware. This relocation packs a 23-bit value in to an "r0 = add(rX, #a)" instruction by overwriting the source register bits. Since r0 is the return value register, if this instruction is placed after a function call which return void, r0 will be filled with an undefined value, the prefetch won't be confused, and the callee can access the constant value by way of the link register. llvm-svn: 261006
* [AArch64] Add pass to remove redundant copy after RAJun Bum Lim2016-02-164-0/+181
| | | | | | | | | | | | | | | | | | | | | Summary: This change will add a pass to remove unnecessary zero copies in target blocks of cbz/cbnz instructions. E.g., the copy instruction in the code below can be removed because the cbz jumps to BB1 when x0 is zero : BB0: cbz x0, .BB1 BB1: mov x0, xzr Jun Reviewers: gberry, jmolloy, HaoLiu, MatzeB, mcrosier Subscribers: mcrosier, mssimpso, haicheng, bmakam, llvm-commits, aemerson, rengolin Differential Revision: http://reviews.llvm.org/D16203 llvm-svn: 261004
* [GlobalISel] Re-apply r260922-260923 with MSVC-friendly code.Quentin Colombet2016-02-167-88/+161
| | | | | | | | | Original message: Get rid of the ifdefs in TargetLowering. Introduce a new API used only by GlobalISel: CallLowering. This API will contain target hooks dedicated to call lowering. llvm-svn: 260998
* [WebAssembly] Insert COPY_LOCAL between CopyToReg and FrameIndex DAG nodesDerek Schuff2016-02-163-24/+34
| | | | | | | | | | | | | | CopyToReg nodes don't support FrameIndex operands. Other targets select the FI to some LEA-like instruction, but since we don't have that, we need to insert some kind of instruction that can take an FI operand and produces a value usable by CopyToReg (i.e. in a vreg). So insert a dummy copy_local between Op and its FI operand. This results in a redundant copy which we should optimize away later (maybe in the post-FI-lowering peephole pass). Differential Revision: http://reviews.llvm.org/D17213 llvm-svn: 260987
* [AMDGPU] Rename $dst operand to $vdst for VOP instructions.Tom Stellard2016-02-166-77/+120
| | | | | | | | | | | | | | Summary: This change renames output operand for VOP instructions from dst to vdst. This is needed to enable decoding named operands for disassembler. Reviewers: vpykhtin, tstellarAMD, arsenm Subscribers: arsenm, llvm-commits, nhaustov Projects: #llvm-amdgpu-spb Differential Revision: http://reviews.llvm.org/D16920 llvm-svn: 260986
* [X86] Enable the LEA optimization pass by default.Andrey Turetskiy2016-02-161-4/+5
| | | | | | Differential Revision: http://reviews.llvm.org/D16877 llvm-svn: 260979
* [WebAssembly] Switch from RPO sorting to topological sorting.Dan Gohman2016-02-161-109/+167
| | | | | | | | | | WebAssembly doesn't require full RPO; topological sorting is sufficient and can preserve more of the MachineBlockPlacement ordering. Unfortunately, this still depends a lot on heuristics, because while we use the MachineBlockPlacement ordering as a guide, we can't use it in places where it isn't topologically ordered. This area will require further attention. llvm-svn: 260978
* Reverting r260922-260923; they cause link failures with MSVC.Aaron Ballman2016-02-167-159/+88
| | | | | | | http://lab.llvm.org:8011/builders/lldb-x86-windows-msvc2015/builds/15436/steps/build/logs/stdio http://bb.pgr.jp/builders/msbuild-llvmclang-x64-msc18-DA/builds/961/steps/build_llvm/logs/stdio llvm-svn: 260972
* [WebAssembly] Create new registers instead of reusing old ones in RegStackify.Dan Gohman2016-02-161-7/+9
| | | | | | | | This avoids some complications updating LiveIntervals to be aware of the new register lifetimes, because we can just compute new intervals from scratch rather than describe how the old ones have been changed. llvm-svn: 260971
* [WebAssembly] Implement support for custom NaN bit patterns.Dan Gohman2016-02-164-14/+36
| | | | llvm-svn: 260968
* [X86] PR26575: Fix LEA optimization pass.Andrey Turetskiy2016-02-161-0/+5
| | | | | | | | | | Add a missing check for a type of address displacement operand of the load/store instruction being a candidate for LEA substitution. Ref: https://llvm.org/bugs/show_bug.cgi?id=26575 Differential Revision: http://reviews.llvm.org/D17261 llvm-svn: 260959
* [Hexagon] Hoist nonnull assert up.Benjamin Kramer2016-02-162-1/+1
| | | | | | | | Once a pointer is turned into a reference it cannot be nullptr, clang rightfully warns about this assert being a tautology. Put the assert before the reference is created. llvm-svn: 260949
* [X86] Fix typos. NFCCraig Topper2016-02-161-2/+2
| | | | llvm-svn: 260943
* [X86] Use range-based for loop. NFCCraig Topper2016-02-161-3/+2
| | | | llvm-svn: 260942
* [X86] Fix typo in comment. NFCCraig Topper2016-02-161-1/+1
| | | | llvm-svn: 260940
* Restore the capability to manipulate datalayout from the C APIAmaury Sechet2016-02-162-0/+12
| | | | | | | | | | | | | | | | | Summary: This consist in variosu addition to the C API: LLVMTargetDataRef LLVMGetModuleDataLayout(LLVMModuleRef M); void LLVMSetModuleDataLayout(LLVMModuleRef M, LLVMTargetDataRef DL); LLVMTargetDataRef LLVMCreateTargetMachineData(LLVMTargetMachineRef T); Reviewers: joker.eph, Wallbraker, echristo Subscribers: axw Differential Revision: http://reviews.llvm.org/D17255 llvm-svn: 260936
* [GlobalISel] Get rid of the ifdefs in TargetLowering.Quentin Colombet2016-02-167-88/+159
| | | | | | | Introduce a new API used only by GlobalISel: CallLowering. This API will contain target hooks dedicated to call lowering. llvm-svn: 260922
* Kill LLVMAddTargetDataAmaury Sechet2016-02-161-3/+0
| | | | | | | | | | | | Summary: It's red, it's dead. Reviewers: joker.eph, Wallbraker, echristo Subscribers: llvm-commits, axw Differential Revision: http://reviews.llvm.org/D17282 llvm-svn: 260919
* Implemented stack symbol table ordering/packing optimization to improve data ↵Zia Ansari2016-02-152-0/+149
| | | | | | | | locality and code size from SP/FP offset encoding. Differential Revision: http://reviews.llvm.org/D15393 llvm-svn: 260917
* [X86] Remove now-dead variable and redundant assert. NFC.Ahmed Bougacha2016-02-151-2/+0
| | | | | | | The variable was made dead in NDEBUG by r260901, but the assert was redundant anyway: get rid of both. llvm-svn: 260908
* [NFC] Fixing naming convention, lowercase start of function name.Colin LeMahieu2016-02-156-33/+33
| | | | llvm-svn: 260903
* [Hexagon] Wrapping all MCExprs inside MCOperands within HexagonMCExpr to ↵Colin LeMahieu2016-02-158-92/+169
| | | | | | simplify handling and allow flags on the expression. llvm-svn: 260902
* [CodeGen] Document and use getConstant's splat-building feature. NFC.Ahmed Bougacha2016-02-153-86/+32
| | | | | | Differential Revision: http://reviews.llvm.org/D17229 llvm-svn: 260901
* [mips] Implemented the .hword directive.Scott Egerton2016-02-152-0/+6
| | | | | | | | | | | | | | Summary: In order to pass the tests, this required marking R_MIPS_16 relocations as needing to point to the symbol and not the section. Reviewers: vkalintiris, dsanders Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D17200 llvm-svn: 260896
* [Hexagon] Use zero-extending loads for anyextKrzysztof Parzyszek2016-02-151-6/+6
| | | | llvm-svn: 260895
* Reverted r260879 as it caused test failures in lld.Scott Egerton2016-02-151-1/+1
| | | | llvm-svn: 260880
* [mips] Removed the SHF_ALLOC flag from the .pdr section.Scott Egerton2016-02-151-1/+1
| | | | | | | | | | | | | | | | | Summary: This section is used for debug information and has no need to be in memory at runtime. With this patch, LLVM now emits the same flags as the GNU assembler. This patch also fixes an error when compiling the Linux kernel, The error is that there are relocations within the .pdr section in a VDSO. Reviewers: vkalintiris, dsanders Subscribers: llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D17199 llvm-svn: 260879
* AVX512: Change store size of kmask. Store size of v8i1, v4i1 , v2i1 and i1 ↵Igor Breger2016-02-153-20/+40
| | | | | | | | | | are changed to 16 bits. If KMOVB not supported (require AVX512DQ) only KMOVW can be used so store size should be 2 bytes. Differential Revision: http://reviews.llvm.org/D17138 llvm-svn: 260878
* Support: Fix incremental build when re-configuring targetsDuncan P. N. Exon Smith2016-02-131-1/+1
| | | | | | | | | | | | | | r180893 added an indirect include of llvm/Config/Targets.def to llvm/Support/CodeGen.h, which in turn is included by things like llvm/IR/Module.h. After a full build of LLVM and Clang, ninja had to rebuild 1274 files after reconfiguring. This commit strips CodeGen.h back down to just a pile of enums and moves the expensive includes over to CodeGenCWrappers.h (which is only included in two places). This gets ninja down to 88 files if you reconfigure with, e.g., -DLLVM_TARGETS_TO_BUILD=X86. llvm-svn: 260835
* [X86][AVX] Lower shuffles as repeated lane shuffles then lane-crossing shufflesSimon Pilgrim2016-02-131-0/+166
| | | | | | | | | | | | | | | | | | This patch attempts to represent a shuffle as a repeating shuffle (recognisable by is128BitLaneRepeatedShuffleMask) with the source input(s) in their original lanes, followed by a single permutation of the 128-bit lanes to their final destinations. On AVX2 we can additionally attempt to match using 64-bit sub-lane permutation. AVX2 can also now match a similar 'broadcasted' repeating shuffle. This patch has several benefits: * Avoids prematurely matching with lowerVectorShuffleByMerging128BitLanes which can require both inputs to have their input lanes permuted before shuffling. * Can replace PERMPS/PERMD instructions - although these are useful for cross-lane unary shuffling, they require their shuffle mask to be pre-loaded (and increase register pressure). * Matching the repeating shuffle makes use of a lot of existing shuffle lowering. There is an outstanding minor AVX1 regression (combine_unneeded_subvector1 in vector-shuffle-combining.ll) of a previously 128-bit shuffle + subvector splat being converted to a subvector splat + (2 instruction) 256-bit shuffle, I intend to fix this in a followup patch for review. Differential Revision: http://reviews.llvm.org/D16537 llvm-svn: 260834
OpenPOWER on IntegriCloud