summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵Nirav Dave2017-02-023-6/+11
| | | | | | | | | UseAA is enabled." This reverts commit r293893 which is miscompiling lua on ARM and bootstrapping for x86-windows. llvm-svn: 293915
* [mips] Expansion of BEQL and BNEL with immediate operandsSimon Dardis2017-02-022-5/+30
| | | | | | | | | | | | Adds support for BEQL and BNEL macros with immediate operands. Patch by: Srdjan Obucina Reviewers: dsanders, zoran.jovanovic, vkalintiris, sdardis, obucina, seanbruno Differential Revision: https://reviews.llvm.org/D17040 llvm-svn: 293905
* [SystemZ] Add comment for ISD::FP_TO_UINT expansion.Jonas Paulsson2017-02-021-0/+3
| | | | | | | (Copied from the fp-conv-10.ll test to SystemZISelLowering.cpp) Review: Ulrich Weigand llvm-svn: 293900
* [Hexagon] Emitting individual instructions without copying themKrzysztof Parzyszek2017-02-022-97/+82
| | | | | | Patch by Colin LeMahieu. llvm-svn: 293899
* [Hexagon] Rename TypeCOMPOUND to TypeCJKrzysztof Parzyszek2017-02-028-16/+15
| | | | llvm-svn: 293894
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵Nirav Dave2017-02-023-11/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | enabled. Recommiting after fixing X86 inc/dec chain bug. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 293893
* [X86,ISEL] Fix X86 increment chain dependence calculationNirav Dave2017-02-021-0/+2
| | | | | | | | Merging Load-add-store pattern into a increment op previously dropped the load's chain from the instructions dependence if the store is chained to a TokenFactor. llvm-svn: 293892
* [ARM] GlobalISel: Lower pointer args and returnsDiana Picus2017-02-022-6/+35
| | | | | | | | | It is important to change the ArgInfo's type from pointer to integer, otherwise the CC assign function won't know what to do. Instead of hacking it up, we use ComputeValueVTs and introduce some of the helpers that we will need later on for lowering more complex types. llvm-svn: 293889
* [ARM] GlobalISel: Error out instead of assertingDiana Picus2017-02-021-1/+1
| | | | | | | | Allow unknown types in TLI.getValueType, otherwise we get asserts for certain types that we do not support yet (instead of returning that we don't support them and falling through the normal error path). llvm-svn: 293888
* [ARM] GlobalISel: Legalize loading pointersDiana Picus2017-02-021-1/+1
| | | | | | | Make it legal to load pointer values. Also check that pointers are assigned to the GPR reg bank by default. llvm-svn: 293886
* [X86][SSE] Use MOVMSK for all_of/any_of reduction patternsSimon Pilgrim2017-02-021-0/+83
| | | | | | | | | | This is a first attempt at using the MOVMSK instructions to replace all_of/any_of reduction patterns (i.e. an and/or + shuffle chain). So far this only matches patterns where we are reducing an all/none bits source vector (i.e. a comparison result) but we should be able to expand on this in conjunction with improvements to 'bool vector' handling both in the x86 backend as well as the vectorizers etc. Differential Revision: https://reviews.llvm.org/D28810 llvm-svn: 293880
* [X86] Remove some unused DAGCombinerInfo parameters. NFCCraig Topper2017-02-021-7/+4
| | | | llvm-svn: 293873
* [X86] Move some INSERT_SUBVECTOR optimizations from legalize to DAG combine.Craig Topper2017-02-021-53/+74
| | | | | | | | This moves creation of SUBV_BROADCAST and merging of adjacent loads that are being inserted together. This is a step towards removing legalizing of INSERT_SUBVECTOR except for vXi1 cases. llvm-svn: 293872
* [AVX-512] Fix the implicit defs for VZEROALL/VZEROUPPER to include YMM16-YMM31.Craig Topper2017-02-021-1/+3
| | | | llvm-svn: 293862
* AMDGPU: Use source modifiers with f16->f32 conversionsMatt Arsenault2017-02-028-8/+84
| | | | | | | | | | | The operand types were defined to fit the fp16_to_fp node, which has the half as an integer type. v_cvt_f32_f16 does support source modifiers, so change this to have an FP type and modifiers. For targets without legal f16, this requires recognizing the bit operations and trying to produce them. llvm-svn: 293857
* AArch64RegisterInfo: Simplify getReservedReg(); NFCMatthias Braun2017-02-021-12/+4
| | | | | | | After marking a 32bit register and all its super registers the 64bit register does not need to be marked again. llvm-svn: 293855
* NVPTX: Fix not preserving volatile when expanding memsetMatt Arsenault2017-02-021-3/+4
| | | | llvm-svn: 293851
* X86: Produce @ABS8 symbol modifiers for absolute symbols in range [0,128).Peter Collingbourne2017-02-023-3/+19
| | | | | | Differential Revision: https://reviews.llvm.org/D28689 llvm-svn: 293844
* [AMDGPU] Account workgroup size in LDS occupancy limitsStanislav Mekhanoshin2017-02-014-58/+25
| | | | | | | | | | | | | | | | | | Functions matching LDS use to occupancy return results for a workgroup of 64 workitems. The numbers has to be adjusted for bigger workgroups. For example a workgroup of size 256 already occupies 4 waves just by itself. Given that all numbers of LDS use in the compiler are per workgroup, occupancy shall be multiplied by 4 in this case. Each 64 workitems still limited by the same number, but 4 subrgoups 64 workitems each can afford 4 times more LDS to get the same occupancy. In addition change initializes LDS size in the subtarget to a real value for SI+ targets. This is required since LDS size is a variable in these calculations. Differential Revision: https://reviews.llvm.org/D29423 llvm-svn: 293837
* [AArch64] Fix some Clang-tidy modernize and Include What You Use warnings; ↵Eugene Zelenko2017-02-013-34/+59
| | | | | | other minor fixes (NFC). llvm-svn: 293836
* Change debug-info-for-profiling from a TargetOption to a function attribute.Dehao Chen2017-02-011-6/+0
| | | | | | | | | | | | | | Summary: LTO requires the debug-info-for-profiling to be a function attribute. Reviewers: echristo, mehdi_amini, dblaikie, probinson, aprantl Reviewed By: mehdi_amini, dblaikie, aprantl Subscribers: aprantl, probinson, ahatanak, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D29203 llvm-svn: 293833
* AMDGPU: Allow clustering flat memory operationsMatt Arsenault2017-02-011-1/+2
| | | | llvm-svn: 293809
* [mips] Parse the 'bopt' and 'nobopt' directives in IAS.Simon Dardis2017-02-011-0/+8
| | | | | | | | | | | | | | | The GAS assembler supports the ".set bopt" directive but according to the sources it doesn't do anything. It's supposed to optimize branches by filling the delay slot of a branch with it's target. This patch teaches the MIPS asm parser to accept both and warn in the case of 'bopt' that the bopt directive is unsupported. This resolves PR/31841. Thanks to Sean Bruno for reporting the issue! llvm-svn: 293798
* [LV] Move interleaved access helper functions to VectorUtils (NFC)Matthew Simpson2017-02-012-28/+6
| | | | | | | | | | | | This patch moves some helper functions related to interleaved access vectorization out of LoopVectorize.cpp and into VectorUtils.cpp. We would like to use these functions in a follow-on patch that improves interleaved load and store lowering in (ARM/AArch64)ISelLowering.cpp. One of the functions was already duplicated there and has been removed. Differential Revision: https://reviews.llvm.org/D29398 llvm-svn: 293788
* [X86][SSE] Remove unused argument. NFCI.Simon Pilgrim2017-02-011-9/+6
| | | | llvm-svn: 293777
* AMDGPU: Improve nsw/nuw/exact when promoting uniform i16 opsMatt Arsenault2017-02-011-18/+41
| | | | | | | | | | | | These were simply preserving the flags of the original operation, which was too conservative in most cases and incorrect for mul. nsw/nuw may be needed for some combines to cleanup messes when intermediate sext_inregs are introduced later. Tested valid combinations with alive. llvm-svn: 293776
* [mips] Fix an initialization issue with MipsABIInfo in MipsTargetELFStreamerSimon Dardis2017-02-011-0/+12
| | | | | | | | | | | | | | DebugInfoDWARFTests is the only user so far which initializes the MCObjectStreamer without initializing the ASMParser. The MIPS backend relies on the ASMParser to initialize the MipsABIInfo object and to update the target streamer with it. This should turn the mips buildbots green. Reviewers: atanasyan, zoran.jovanovic Differential Revision: https://reviews.llvm.org/D28025 llvm-svn: 293772
* [PowerPC] Fix sjlj pseduo instructions to use G8RC_NOX0 register classKit Barton2017-02-011-2/+3
| | | | | | | | | | | | | | The the following instructions: - LD/LWZ (expanded from sjLj pseudo-instructions) - LXVL/LXVLL vector loads - STXVL/STXVLL vector stores all require G8RC_NO0X class registers for RA. Differential Revision: https://reviews.llvm.org/D29289 Committed for Lei Huang llvm-svn: 293769
* [X86][SSE] Merge SSE2 PINSRW lowering with SSE41 PINSRB/PINSRW lowering. NFCI.Simon Pilgrim2017-02-011-32/+21
| | | | | | These are identical apart from the extra SSE41 guard for PINSRB. llvm-svn: 293766
* [ARM] Enable Cortex-M23 and Cortex-M33 support.Javed Absar2017-02-011-0/+10
| | | | | | | | | | | | | Add both cores to the target parser and TableGen. Test that eabi attributes are set correctly for both cores. Additionally, test the absence and presence of MOVT in Cortex-M23 and Cortex-M33, respectively. Committed on behalf of Sanne Wouda. Reviewers : rengolin, olista01. Differential Revision: https://reviews.llvm.org/D29073 llvm-svn: 293761
* *MacroFusion.cpp: Suppress warnings to eliminate \param(s). [-Wdocumentation]NAKAMURA Takumi2017-02-012-3/+3
| | | | llvm-svn: 293744
* [X86] For AVX1/AVX2 isel, don't use FP move instructions for 128-bit ↵Craig Topper2017-02-011-78/+74
| | | | | | | | | | loads/stores of integer types. For SSE we use fp because of the smaller encoding, but that doesn't apply to AVX. So just do the natural thing so we don't have to explain why we aren't. We can't do this for 256-bit loads/stores since integer loads and stores aren't available in AVX1 so we need fallback patterns since the integer types are legal. This doesn't affect any tests because execution domain fixing freely converts the instructions anyway. Honestly, we could probably rely on it for the SSE size optimization too. llvm-svn: 293743
* [AArch64] Add new target feature to fuse literal generationEvandro Menezes2017-02-014-0/+46
| | | | | | | | | This feature enables the fusion of such operations on Cortex A57, as recommended in its Software Optimisation Guide, sections 4.14 and 4.15. Differential revision: https://reviews.llvm.org/D28698 llvm-svn: 293739
* [AArch64] Add new subtarget feature to fuse AES crypto operationsEvandro Menezes2017-02-015-2/+26
| | | | | | | | | | This feature enables the fusion of such operations on Cortex A57, as recommended in its Software Optimisation Guide, section 4.13, and on Exynos M1. Differential revision: https://reviews.llvm.org/D28491 llvm-svn: 293738
* [CodeGen] Move MacroFusion to the targetEvandro Menezes2017-02-0112-249/+536
| | | | | | | | | | | | | This patch moves the class for scheduling adjacent instructions, MacroFusion, to the target. In AArch64, it also expands the fusion to all instructions pairs in a scheduling block, beyond just among the predecessors of the branch at the end. Differential revision: https://reviews.llvm.org/D28489 llvm-svn: 293737
* [Mips] Fix some Clang-tidy modernize and Include What You Use warnings; ↵Eugene Zelenko2017-02-0113-163/+227
| | | | | | other minor fixes (NFC). llvm-svn: 293729
* AMDGPU: Cleanup fmin/fmax legacy functionMatt Arsenault2017-02-014-13/+14
| | | | | | Use a more specific subtarget check and combine hasOneUse checks llvm-svn: 293726
* AMDGPU: Fix warningMatt Arsenault2017-01-311-1/+2
| | | | llvm-svn: 293717
* [NVPTX] Compute approx sqrt as 1/rsqrt(x) rather than x*rsqrt(x).Justin Lebar2017-01-311-3/+8
| | | | | | | | | | x*rsqrt(x) returns NaN for x == 0, whereas 1/rsqrt(x) returns 0, as desired. Verified that the particular nvptx approximate instructions here do in fact return 0 for x = 0. llvm-svn: 293713
* Shut up GCC warning about operator precedence. NFC.Michael Kuperstein2017-01-311-3/+3
| | | | | | | | Technically, this is actually changes the expression and the original assert was "wrong", but since the conjunction is with true, it doesn't matter in this case. llvm-svn: 293709
* MC: Introduce the ABS8 symbol modifier.Peter Collingbourne2017-01-311-0/+2
| | | | | | | | | | | @ABS8 can be applied to symbols which appear as immediate operands to instructions that have a 8-bit immediate form for that operand. It causes the assembler to use the 8-bit form and an 8-bit relocation (e.g. R_386_8 or R_X86_64_8) for the symbol. Differential Revision: https://reviews.llvm.org/D28688 llvm-svn: 293667
* AMDGPU: Use source mods with fcanonicalizeMatt Arsenault2017-01-311-6/+6
| | | | llvm-svn: 293654
* [X86] Implement -mfentryNirav Dave2017-01-312-0/+18
| | | | | | | | | | | | Summary: Insert calls to __fentry__ at function entry. Reviewers: hfinkel, craig.topper Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D28000 llvm-svn: 293648
* AMDGPU/SI: Fix inst-select-load-smrd.mir on some buildsTom Stellard2017-01-311-4/+10
| | | | | | | | | | | | | | Summary: For some reason instructions are being inserted in the wrong order with some builds. I'm not sure why this is happening. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr, llvm-commits Differential Revision: https://reviews.llvm.org/D29325 llvm-svn: 293639
* [X86][SSE] Add support for combining PINSRB into a target shuffle.Simon Pilgrim2017-01-311-4/+7
| | | | llvm-svn: 293637
* [ARM] Avoid using ARM instructions in Thumb modeSam Parker2017-01-311-12/+13
| | | | | | | | | | | | | | The Requires class overrides the target requirements of an instruction, rather than adding to them, so all ARM instructions need to include the IsARM predicate when they have overwitten requirements. This caused the swp and swpb instructions to be allowed in thumb mode assembly, and the ARM encoding of CDP to be selected in codegen (which is different for conditional instructions). Differential Revision: https://reviews.llvm.org/D29283 llvm-svn: 293634
* [X86] Silence unused variable warning in Release builds.Benjamin Kramer2017-01-311-4/+5
| | | | llvm-svn: 293631
* [X86][SSE] Detect unary PBLEND shuffles.Simon Pilgrim2017-01-311-0/+1
| | | | | | These can appear during shuffle combining. llvm-svn: 293628
* [X86][SSE] Add support for combining PINSRW into a target shuffle.Simon Pilgrim2017-01-311-2/+31
| | | | | | | | | | Also add the ability to recognise PINSR(Vex, 0, Idx). Targets shuffle combines won't replace multiple insertions with a bit mask until a depth of 3 or more, so we avoid codesize bloat. The unnecessary vpblendw in clearupper8xi16a will be fixed in an upcoming patch. llvm-svn: 293627
* [PowerPC][Altivec] Add vmr extended mnemonicNemanja Ivanovic2017-01-311-0/+3
| | | | | | | | | | | Just adds the vmr (Vector Move Register) mnemonic for the VOR instruction in the PPC back end. Committing on behalf of brunoalr (Bruno Rosa). Differential Revision: https://reviews.llvm.org/D29133 llvm-svn: 293626
OpenPOWER on IntegriCloud