path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* [AArch64] Fix some Include What You Use warnings; other minor fixes (NFC).Eugene Zelenko2017-02-034-25/+46
| | | | | | This is preparation to reduce MCExpr.h dependencies. llvm-svn: 294053
* [ARM] Fix some Include What You Use warnings; other minor fixes (NFC).Eugene Zelenko2017-02-032-2/+9
| | | | | | This is preparation to reduce MCExpr.h dependencies. llvm-svn: 294052
* [XCore] Fix some Include What You Use warnings; other minor fixes (NFC).Eugene Zelenko2017-02-032-3/+9
| | | | | | This is preparation to reduce MCExpr.h dependencies. llvm-svn: 294051
* AMDGPU: AsmParser cleanupsMatt Arsenault2017-02-031-17/+24
| | | | | | Use typedef, remove unnecessary enum, line wraps. llvm-svn: 294039
* [AMDGPU] Bump -amdgpu-unroll-threshold-private to 2000Stanislav Mekhanoshin2017-02-031-1/+1
| | | | | | | | | | | This has a quite positive performance impact according to measurements. Before the previous fixes that limit the optimization, this value was too high and blew up compile time and scratch usage; that is no longer the case, so we can bump the threshold. Differential Revision: https://reviews.llvm.org/D29505 llvm-svn: 294032
* AMDGPU: Set MCAsmInfo::PointerSizeMatt Arsenault2017-02-031-0/+1
| | | | llvm-svn: 294031
* AMDGPU: Don't unroll for private with dynamic allocasMatt Arsenault2017-02-031-1/+1
| | | | | | | This won't be eliminated, so this will just bloat code if/when these are ever used/supported. llvm-svn: 294030
* [X86][SSE] Add support for combining scalar_to_vector(extract_vector_elt) into a target shuffle.Simon Pilgrim2017-02-031-0/+14
| | | | | | | | Correctly flags upper elements as undef. llvm-svn: 294020
* [mips] Remove absolute size assertion for end directiveSimon Dardis2017-02-031-4/+4
| | | | | | | | | | | | | | The .end <symbol> directive for MIPS marks the end of a symbol and sets the symbol's size. Previously, the corresponding emitDirective handler asserted that a function's size could be evaluated to an absolute value at that point in time. This cannot be done when directives like .align have been encountered; instead, set the function's size to the corresponding symbolic expression and let ELFObjectWriter resolve the expression to an absolute value. This avoids a redundant call to evaluateAsAbsolute. llvm-svn: 294012
* [NVPTX] Enable combineRepeatedFPDivisors for NVPTX.Justin Lebar2017-02-031-0/+2
| | | | | | | | | | Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D29477 llvm-svn: 294011
* [AMDGPU][mc] Fix AddressSanitizer leftover issue in gfx7_asm_all testArtem Tamazov2017-02-033-9/+11
| | | | | | Issue occurs when assembling "ds_ordered_count v0, v0 gds". llvm-svn: 294004
* [ARM] Change TCReturn to tBL if tailcall optimization fails.Sanne Wouda2017-02-032-6/+16
| | | | | | | | | | | | | | | | | | Summary: The tail call optimisation is performed before register allocation, so at that point we don't know if LR is being spilt or not. If LR was spilt to the stack, then we cannot do a tail call optimisation. That would involve popping back into LR which is not possible in Thumb1 code. Reviewers: rengolin, jmolloy, rovka, olista01 Reviewed By: olista01 Subscribers: llvm-commits, aemerson Differential Revision: https://reviews.llvm.org/D29020 llvm-svn: 294000
* [AMDGPU] Unroll preferences improvementsStanislav Mekhanoshin2017-02-031-1/+28
| | | | | | | | | | | Exit loop analysis early if a suitable private access is found. Do not account for GEPs which are invariant to the loop induction variable. Do not account for Allocas which are too big to fit into the register file anyway. Add an option for tuning: -amdgpu-unroll-threshold-private. Differential Revision: https://reviews.llvm.org/D29473 llvm-svn: 293991
* AMDGPU: Fold fneg into fmin/fmax_legacyMatt Arsenault2017-02-031-2/+24
| | | | llvm-svn: 293972
* [X86] Mark 256-bit and 512-bit INSERT_SUBVECTOR operations as legal and remove the custom lowering.Craig Topper2017-02-031-27/+6
| | | | | | llvm-svn: 293969
* AMDGPU: Fold fneg into fminnum/fmaxnumMatt Arsenault2017-02-031-0/+30
| | | | llvm-svn: 293968
* AMDGPU: Check if users of fneg can fold modsMatt Arsenault2017-02-021-4/+64
| | | | | | In multi-use cases this can save a few instructions. llvm-svn: 293962
* [X86] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).Eugene Zelenko2017-02-0213-151/+240
| | | | | | llvm-svn: 293949
* [X86] Avoid sorted order check in release buildsReid Kleckner2017-02-021-4/+6
| | | | | | | Effectively reverts r290248 and fixes the unused function warning with ifndef NDEBUG. llvm-svn: 293945
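The pattern this commit message describes, a sortedness check that exists only in assert-enabled builds, can be sketched as follows. This is a minimal illustration and not the actual X86 code: `isTableSorted` and `lookupSorted` are hypothetical names, and the table is a plain vector of integers. Guarding the helper with `#ifndef NDEBUG` means release builds neither run the check nor warn about an unused function, because `assert` expands to nothing there.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

#ifndef NDEBUG
// Hypothetical debug-only helper: compiled only when asserts are enabled,
// so release builds do not see an unused static function.
static bool isTableSorted(const std::vector<int> &Table) {
  return std::is_sorted(Table.begin(), Table.end());
}
#endif

// Hypothetical lookup that relies on the table being sorted.
bool lookupSorted(const std::vector<int> &Table, int Key) {
  // In release builds (NDEBUG defined) this line disappears entirely,
  // so the O(n) sortedness check costs nothing.
  assert(isTableSorted(Table) && "lookup table must be sorted");
  return std::binary_search(Table.begin(), Table.end(), Key);
}
```

The design choice is that the expensive invariant check stays available to developers while the shipping binary pays only for the binary search.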
* [X86] Move turning 256-bit INSERT_SUBVECTORS into BLENDI from legalize to DAG combine.Craig Topper2017-02-021-44/+39
| | | | | | | | On one test this seems to have given DAG combine more chances to do other INSERT_SUBVECTOR/EXTRACT_SUBVECTOR combines before the BLENDI was created. It looks like we can still improve further by teaching DAG combine to optimize INSERT_SUBVECTOR/EXTRACT_SUBVECTOR with BLENDI. llvm-svn: 293944
* [CodeGen] Remove dead call-or-prologue enum from CCStateReid Kleckner2017-02-021-30/+9
| | | | | | | This enum has been dead since Olivier Stannard re-implemented ARM byval handling in r202985 (2014). llvm-svn: 293943
* Change how we handle section symbols on ELF.Rafael Espindola2017-02-021-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | On ELF every section can have a corresponding section symbol. When an assembly file contains ".quad .text", the '.text' refers to that symbol. The way we used to handle them was to leave .text an undefined symbol until the very end, when the object writer would map them to the actual section symbol. The problem with that is that anything before the end would see an undefined symbol. This could result in bad diagnostics (test/MC/AArch64/label-arithmetic-diags-elf.s), or incorrect results when using the asm streamer (test/MC/Mips/expansion-jal-sym-pic.s). Fixing this will also allow using the section symbol earlier for setting sh_link of SHF_METADATA sections. This patch includes a few hacks to avoid changing our behaviour when handling conflicts between section symbols and other symbols. I reported pr31850 to track that. llvm-svn: 293936
* [ARM] Classification Improvements to ARM Sched-Model. NFCI.Javed Absar2017-02-025-58/+160
| | | | | | | | | | | | This is the second in the series of patches to make adding machine sched-models for ARM processors easier and more compact. This patch focuses on integer instructions and adds missing sched definitions. Reviewers: rovka, rengolin Differential Revision: https://reviews.llvm.org/D29127 llvm-svn: 293935
* [Hexagon] Adding opExtentBits and opExtentAlign to GPrel instructionsKrzysztof Parzyszek2017-02-024-12/+62
| | | | | | Patch by Colin LeMahieu. llvm-svn: 293933
* [X86] Add costs for non-AVX512 single-source permutation integer shufflesMichael Kuperstein2017-02-021-3/+16
| | | | | | Differential Revision: https://reviews.llvm.org/D29416 llvm-svn: 293932
* [Hexagon] Fix relocation kind for extended predicated callsKrzysztof Parzyszek2017-02-021-5/+7
| | | | | | Patch by Sid Manning. llvm-svn: 293931
* [Hexagon] Remove A4_ext_* pseudo instructionsKrzysztof Parzyszek2017-02-026-38/+35
| | | | | | Patch by Colin LeMahieu. llvm-svn: 293929
* [Hexagon] Fix insertBranch for loops with multiple ENDLOOP instructionsKrzysztof Parzyszek2017-02-021-18/+24
| | | | llvm-svn: 293925
* [WebAssembly] Add instruction definitions for drop and get/set_global.Dan Gohman2017-02-023-0/+18
| | | | llvm-svn: 293922
* Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."Nirav Dave2017-02-023-6/+11
| | | | | | | | | This reverts commit r293893 which is miscompiling lua on ARM and bootstrapping for x86-windows. llvm-svn: 293915
* [mips] Expansion of BEQL and BNEL with immediate operandsSimon Dardis2017-02-022-5/+30
| | | | | | | | | | | | Adds support for BEQL and BNEL macros with immediate operands. Patch by: Srdjan Obucina Reviewers: dsanders, zoran.jovanovic, vkalintiris, sdardis, obucina, seanbruno Differential Revision: https://reviews.llvm.org/D17040 llvm-svn: 293905
* [SystemZ] Add comment for ISD::FP_TO_UINT expansion.Jonas Paulsson2017-02-021-0/+3
| | | | | | | (Copied from the fp-conv-10.ll test to SystemZISelLowering.cpp) Review: Ulrich Weigand llvm-svn: 293900
* [Hexagon] Emitting individual instructions without copying themKrzysztof Parzyszek2017-02-022-97/+82
| | | | | | Patch by Colin LeMahieu. llvm-svn: 293899
* [Hexagon] Rename TypeCOMPOUND to TypeCJKrzysztof Parzyszek2017-02-028-16/+15
| | | | llvm-svn: 293894
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.Nirav Dave2017-02-023-11/+6
| | | | Recommitting after fixing X86 inc/dec chain bug.
| | | | * Simplify Consecutive Merge Store Candidate Search
| | | | Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner because it separates non-interfering loads/stores from the store-merging logic.
| | | | When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited.
| | | | This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which appear but require more expensive constant generation).
| | | | Some minor peephole optimizations to deal with improved SubDAG shapes (listed below).
| | | | Additional Minor Changes:
| | | | 1. Finishes removing unused AliasLoad code
| | | | 2. Unifies the chain aggregation in the merged stores across code paths
| | | | 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits.
| | | | 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
| | | | 5. Remove Chain dependencies of Memory operations on CopyFromReg nodes as these are captured by data dependence
| | | | 6. Forward load-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values.
| | | | 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll)
| | | | 8. Store merging for the ARM target is restricted to 32-bit, as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.
| | | | This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.
| | | | Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting:
| | | | CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well.
| | | | Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
| | | | llvm-svn: 293893
* [X86,ISEL] Fix X86 increment chain dependence calculationNirav Dave2017-02-021-0/+2
| | | | | | | | Merging a load-add-store pattern into an increment op previously dropped the load's chain from the instruction's dependencies if the store was chained to a TokenFactor. llvm-svn: 293892
* [ARM] GlobalISel: Lower pointer args and returnsDiana Picus2017-02-022-6/+35
| | | | | | | | | It is important to change the ArgInfo's type from pointer to integer, otherwise the CC assign function won't know what to do. Instead of hacking it up, we use ComputeValueVTs and introduce some of the helpers that we will need later on for lowering more complex types. llvm-svn: 293889
* [ARM] GlobalISel: Error out instead of assertingDiana Picus2017-02-021-1/+1
| | | | | | | | Allow unknown types in TLI.getValueType, otherwise we get asserts for certain types that we do not support yet (instead of returning that we don't support them and falling through the normal error path). llvm-svn: 293888
* [ARM] GlobalISel: Legalize loading pointersDiana Picus2017-02-021-1/+1
| | | | | | | Make it legal to load pointer values. Also check that pointers are assigned to the GPR reg bank by default. llvm-svn: 293886
* [X86][SSE] Use MOVMSK for all_of/any_of reduction patternsSimon Pilgrim2017-02-021-0/+83
| | | | | | | | | | This is a first attempt at using the MOVMSK instructions to replace all_of/any_of reduction patterns (i.e. an and/or + shuffle chain). So far this only matches patterns where we are reducing an all/none bits source vector (i.e. a comparison result) but we should be able to expand on this in conjunction with improvements to 'bool vector' handling both in the x86 backend as well as the vectorizers etc. Differential Revision: https://reviews.llvm.org/D28810 llvm-svn: 293880
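The reduction this commit message targets can be illustrated at the source level with SSE2 intrinsics. The sketch below is not the backend code from the patch; it only shows why a single MOVMSK can stand in for an and/or + shuffle chain when the source vector is a comparison result (every lane all-ones or all-zeros). The function names `allLanesEqual` and `anyLaneEqual` are hypothetical, and the code assumes an x86 target with SSE2.

```cpp
#include <cassert>
#include <emmintrin.h> // SSE2 intrinsics; x86 targets only

// Hypothetical helper: true if all four 32-bit lanes of A and B are equal.
bool allLanesEqual(const int *A, const int *B) {
  assert(A && B && "inputs must point at four ints each");
  __m128i VA = _mm_loadu_si128(reinterpret_cast<const __m128i *>(A));
  __m128i VB = _mm_loadu_si128(reinterpret_cast<const __m128i *>(B));
  // Each lane of Cmp is all-ones (equal) or all-zeros (not equal).
  __m128i Cmp = _mm_cmpeq_epi32(VA, VB);
  // PMOVMSKB collects the top bit of each of the 16 bytes into an int.
  // all_of holds exactly when all 16 mask bits are set.
  return _mm_movemask_epi8(Cmp) == 0xFFFF;
}

// Hypothetical helper: true if at least one 32-bit lane of A equals B.
bool anyLaneEqual(const int *A, const int *B) {
  assert(A && B && "inputs must point at four ints each");
  __m128i VA = _mm_loadu_si128(reinterpret_cast<const __m128i *>(A));
  __m128i VB = _mm_loadu_si128(reinterpret_cast<const __m128i *>(B));
  __m128i Cmp = _mm_cmpeq_epi32(VA, VB);
  // any_of holds when any mask bit is set.
  return _mm_movemask_epi8(Cmp) != 0;
}
```

Because the comparison result has only all-ones or all-zeros lanes, one mask extraction plus a scalar compare replaces the log-depth shuffle-and-combine sequence.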
* [X86] Remove some unused DAGCombinerInfo parameters. NFCCraig Topper2017-02-021-7/+4
| | | | llvm-svn: 293873
* [X86] Move some INSERT_SUBVECTOR optimizations from legalize to DAG combine.Craig Topper2017-02-021-53/+74
| | | | | | | | This moves creation of SUBV_BROADCAST and merging of adjacent loads that are being inserted together. This is a step towards removing legalizing of INSERT_SUBVECTOR except for vXi1 cases. llvm-svn: 293872
* [AVX-512] Fix the implicit defs for VZEROALL/VZEROUPPER to include YMM16-YMM31.Craig Topper2017-02-021-1/+3
| | | | llvm-svn: 293862
* AMDGPU: Use source modifiers with f16->f32 conversionsMatt Arsenault2017-02-028-8/+84
| | | | | | | | | | | The operand types were defined to fit the fp16_to_fp node, which has the half as an integer type. v_cvt_f32_f16 does support source modifiers, so change this to have an FP type and modifiers. For targets without legal f16, this requires recognizing the bit operations and trying to produce them. llvm-svn: 293857
* AArch64RegisterInfo: Simplify getReservedReg(); NFCMatthias Braun2017-02-021-12/+4
| | | | | | | After marking a 32bit register and all its super registers the 64bit register does not need to be marked again. llvm-svn: 293855
* NVPTX: Fix not preserving volatile when expanding memsetMatt Arsenault2017-02-021-3/+4
| | | | llvm-svn: 293851
* X86: Produce @ABS8 symbol modifiers for absolute symbols in range [0,128).Peter Collingbourne2017-02-023-3/+19
| | | | | | Differential Revision: https://reviews.llvm.org/D28689 llvm-svn: 293844
* [AMDGPU] Account workgroup size in LDS occupancy limitsStanislav Mekhanoshin2017-02-014-58/+25
| | | | | | | | | | | | | | | | | | Functions matching LDS use to occupancy return results for a workgroup of 64 workitems. The numbers have to be adjusted for bigger workgroups. For example, a workgroup of size 256 already occupies 4 waves just by itself. Given that all numbers of LDS use in the compiler are per workgroup, occupancy shall be multiplied by 4 in this case. Each group of 64 workitems is still limited by the same number, but 4 subgroups of 64 workitems each can afford 4 times more LDS to get the same occupancy. In addition, this change initializes LDS size in the subtarget to a real value for SI+ targets. This is required since LDS size is a variable in these calculations. Differential Revision: https://reviews.llvm.org/D29423 llvm-svn: 293837
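The arithmetic in this commit message can be sketched as below. This is an illustration under stated assumptions, not the subtarget code from the patch: `wavesPerWorkGroup` and `occupancyWithLDS` are hypothetical names, and the LDS size and wave limit used in the usage note are made-up example values. The only facts taken from the message are that a wave covers 64 workitems, that LDS use is counted per workgroup, and that a 256-workitem group therefore scales the occupancy by 4.

```cpp
#include <algorithm>
#include <cassert>

// From the commit text: occupancy tables assume 64 workitems per wave.
constexpr unsigned WaveSize = 64;

// Number of waves a single workgroup occupies (rounded up).
unsigned wavesPerWorkGroup(unsigned WorkGroupSize) {
  assert(WorkGroupSize > 0 && "workgroup must contain at least one workitem");
  return (WorkGroupSize + WaveSize - 1) / WaveSize;
}

// Hypothetical occupancy estimate in waves.
// LDSBytesPerGroup is the LDS use of one workgroup, LDSSize the total LDS
// available, and MaxWaves a hardware cap; all are caller-supplied assumptions.
unsigned occupancyWithLDS(unsigned LDSBytesPerGroup, unsigned LDSSize,
                          unsigned WorkGroupSize, unsigned MaxWaves) {
  if (LDSBytesPerGroup == 0)
    return MaxWaves; // LDS is not a limiting factor.
  // How many whole workgroups fit in LDS at once.
  unsigned GroupsThatFit = LDSSize / LDSBytesPerGroup;
  // Each resident group contributes several waves when it is larger than 64.
  unsigned Waves = GroupsThatFit * wavesPerWorkGroup(WorkGroupSize);
  return std::min(Waves, MaxWaves);
}
```

With an assumed 65536 bytes of LDS, a cap of 10 waves, and 32768 bytes used per workgroup, a 64-workitem group gives 2 waves while a 256-workitem group gives 8, which is the factor of 4 the message describes.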
* [AArch64] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).Eugene Zelenko2017-02-013-34/+59
| | | | | | llvm-svn: 293836
* Change debug-info-for-profiling from a TargetOption to a function attribute.Dehao Chen2017-02-011-6/+0
| | | | | | | | | | | | | | Summary: LTO requires the debug-info-for-profiling to be a function attribute. Reviewers: echristo, mehdi_amini, dblaikie, probinson, aprantl Reviewed By: mehdi_amini, dblaikie, aprantl Subscribers: aprantl, probinson, ahatanak, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D29203 llvm-svn: 293833