path: root/llvm/lib/Target
* AMDGPU: Change exp with compr bit printing
  Matt Arsenault, 2017-02-22 (1 file, -3/+11)
  llvm-svn: 295873
* Revert "AMDGPU : Update TrapCode based on Trap Handler ABI."Wei Ding2017-02-224-16/+12
| | | | | | This reverts commit r295867. llvm-svn: 295871
* AMDGPU : Update TrapCode based on Trap Handler ABI.
  Wei Ding, 2017-02-22 (4 files, -12/+16)
  Differential Revision: http://reviews.llvm.org/D30232
  llvm-svn: 295867
* [AArch64] Extend AArch64RedundantCopyElimination to do simple copy propagation.
  Geoff Berry, 2017-02-22 (1 file, -43/+127)

  Summary:
  Extend AArch64RedundantCopyElimination to catch cases where the register
  that is known to be zero is COPY'd in the predecessor block.

  Before this change, this pass would catch cases like:

      CBZW %W0, <BB#1>
    BB#1:
      %W0 = COPY %WZR // removed

  After this change, cases like the one below are also caught:

      %W0 = COPY %W1
      CBZW %W1, <BB#1>
    BB#1:
      %W0 = COPY %WZR // removed

  This change results in a 4% increase in static copies removed by this
  pass when compiling the llvm test-suite. It also fixes regressions caused
  by doing post-RA copy propagation (a separate change to be put up for
  review shortly).

  Reviewers: junbuml, mcrosier, t.p.northover, qcolombet, MatzeB
  Subscribers: aemerson, rengolin, llvm-commits
  Differential Revision: https://reviews.llvm.org/D30113
  llvm-svn: 295863
* [WebAssembly] Define a table of function signatures for runtime library calls.
  Dan Gohman, 2017-02-22 (3 files, -0/+1345)
  LLVM CodeGen emits references to external symbols that are never declared
  at the LLVM IR level, so they have no declared signature. However,
  WebAssembly requires that all functions be declared with signatures. This
  patch adds a table providing signatures for known runtime libcalls; it
  will be used in subsequent patches to emit declarations for such
  functions.
  llvm-svn: 295857
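  A minimal sketch of the idea, assuming hypothetical names and a made-up
  signature encoding (the real table, enum, and encoding in the patch may
  differ):

    #include <map>
    #include <string>

    enum class RuntimeLibcall { MEMCPY, SDIV_I64, FMA_F32 };

    // Map each known libcall to a signature string: result type first,
    // then parameter types (i = i32, I = i64, f = f32). Illustrative only.
    static const std::map<RuntimeLibcall, std::string> LibcallSignatures = {
        {RuntimeLibcall::MEMCPY,   "i:iii"}, // void *memcpy(void *, const void *, size_t)
        {RuntimeLibcall::SDIV_I64, "I:II"},  // int64_t __divdi3(int64_t, int64_t)
        {RuntimeLibcall::FMA_F32,  "f:fff"}, // float fmaf(float, float, float)
    };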
* [RDF] Skip undef uses when calculating kill flags
  Krzysztof Parzyszek, 2017-02-22 (1 file, -1/+1)
  llvm-svn: 295856
* [RDF] Only access block live-ins when tracking liveness
  Krzysztof Parzyszek, 2017-02-22 (1 file, -2/+4)
  llvm-svn: 295855
* [WebAssembly] Configure codegen to legalize f16 values.
  Dan Gohman, 2017-02-22 (1 file, -0/+5)
  llvm-svn: 295850
* [X86][SSE] getTargetConstantBitsFromNode - insert constant bits directly into masks.
  Simon Pilgrim, 2017-02-22 (1 file, -18/+15)
  Minor optimization: don't create temporary mask APInts that are just
  going to be OR'd into the accumulated masks; insert directly instead.
  llvm-svn: 295848
* [X86][SSE] Use APInt::getBitsSet() instead of APInt::getLowBitsSet().shl() separately. NFCI.
  Simon Pilgrim, 2017-02-22 (2 files, -8/+10)
  llvm-svn: 295845
* Fix -Wunused-but-set-variable warning by removing the unused 'aggregateIsPacked' check
  Simon Pilgrim, 2017-02-22 (1 file, -4/+0)
  llvm-svn: 295830
* [GlobalISel] Fix compiler warnings and make assert assert something.
  Benjamin Kramer, 2017-02-22 (3 files, -11/+7)
  llvm-svn: 295827
* [X86][GlobalISel] Initial implementation: select G_ADD gpr, gpr
  Igor Breger, 2017-02-22 (6 files, -6/+194)

  Summary:
  Initial implementation for X86InstructionSelector. Handle selection of
  COPY and G_ADD/G_SUB gpr, gpr.

  Reviewers: qcolombet, rovka, zvi, ab
  Reviewed By: rovka
  Subscribers: mgorny, dberris, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D29816
  llvm-svn: 295824
* [ARM] Fix constant islands pass.
  Roger Ferrer Ibanez, 2017-02-22 (1 file, -0/+7)
  The pass tries to fix a spill of LR that turns out to be unnecessary, so
  it removes the tPOP but forgets to remove the tPUSH. This causes the
  stack to be misaligned upon returning from the function. Thus, remove the
  tPUSH as well in this case.
  Differential Revision: https://reviews.llvm.org/D30207
  llvm-svn: 295816
* [X86] Fix memory operands definition for some instructions.
  Ayman Musa, 2017-02-22 (1 file, -10/+14)
  Change integer memory operands to FP memory operands for some FP
  instructions.
  Differential Revision: https://reviews.llvm.org/D30201
  llvm-svn: 295813
* [ARM] Classification Improvements to ARM Sched-Models. NFCI.
  Javed Absar, 2017-02-22 (5 files, -69/+111)
  This patch adds missing sched classes for Thumb2 instructions. These
  classes have been missing so far, and as a consequence, machine scheduler
  models for individual sub-targets have tended to be larger than they
  needed to be. These patches should help write schedulers better and
  faster in the future for ARM sub-targets.
  Reviewer: Diana Picus
  Differential Revision: https://reviews.llvm.org/D29953
  llvm-svn: 295811
* [AVX-512] Allow legacy scalar min/max intrinsics to select EVEX instructions when available
  Craig Topper, 2017-02-22 (7 files, -45/+85)
  This patch introduces new X86ISD::FMAXS and X86ISD::FMINS opcodes. The
  legacy intrinsics now lower to these nodes, as do the AVX-512 masked
  intrinsics when the rounding mode is CUR_DIRECTION.
  I've merged a copy of the tablegen multiclass avx512_fp_scalar into
  avx512_fp_scalar_sae. avx512_fp_scalar still needs to support
  CUR_DIRECTION appearing as a rounding mode for X86ISD::FADD_ROUND and
  others.
  Differential revision: https://reviews.llvm.org/D30186
  llvm-svn: 295810
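  For context, the kind of user-level code this affects - a legacy SSE
  scalar intrinsic (illustrative example, not from the commit):

    #include <xmmintrin.h>

    // _mm_max_ss now lowers through the new X86ISD::FMAXS node, so on an
    // AVX-512 target it can select the EVEX encoding of vmaxss (which can
    // address xmm16-xmm31) instead of only the SSE/VEX forms.
    __m128 scalar_max(__m128 A, __m128 B) {
      return _mm_max_ss(A, B);
    }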
* [WebAssembly] Add skeleton MC support for the Wasm container format
  Dan Gohman, 2017-02-22 (13 files, -14/+253)
  This just adds the basic skeleton for supporting a new object file
  format. All of the actual encoding will be implemented in followup
  patches.
  Differential Revision: https://reviews.llvm.org/D26722
  llvm-svn: 295803
* AMDGPU: Add cvt.pkrtz intrinsic
  Matt Arsenault, 2017-02-22 (7 files, -5/+56)
  Convert llvm.SI.packf16 test uses.
  llvm-svn: 295797
* AMDGPU: Remove llvm.AMDGPU.clamp intrinsic
  Matt Arsenault, 2017-02-21 (2 files, -13/+0)
  llvm-svn: 295789
* AMDGPU: Redefine clamp node as clamp 0.0-1.0
  Matt Arsenault, 2017-02-21 (12 files, -29/+163)
  Change the implementation to use max instead of add. min/max/med3 do not
  flush denormals regardless of the mode, so it is OK to use them whether
  or not denormals are enabled. Also allow using clamp with f16, and make
  use of knowledge of dx10_clamp.
  llvm-svn: 295788
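  A scalar model of the clamp being defined (a sketch of the semantics,
  not the lowering code itself):

    #include <algorithm>

    // clamp(x) == min(max(x, 0.0), 1.0). Lowering it through max/min (or
    // v_med3_f32) is safe on AMDGPU because those operations do not flush
    // denormals regardless of the FP mode.
    float clamp01(float X) {
      return std::min(std::max(X, 0.0f), 1.0f);
    }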
* [NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.
  Artem Belevich, 2017-02-21 (1 file, -710/+420)

  The original code only used vector loads/stores for explicit vector
  arguments. It could also do more loads/stores than necessary (e.g. v5f32
  would touch 8 f32 values). Aggregate types were loaded one element at a
  time, even the vectors contained within.

  This change attempts to generalize (and simplify) parameter space
  loads/stores so that vector loads/stores can be used more broadly.
  Functionality of the patch has been verified by compiling the thrust
  test suite and manually checking the differences between PTX generated
  by llvm with and without the patch.

  General algorithm:
  * ComputePTXValueVTs() flattens the input/output argument into a flat
    list of scalars to load/store and returns their types and offsets.
  * VectorizePTXValueVTs() uses that data to create a vectorization plan,
    returned as an array of flags marking the boundaries of the vectorized
    loads/stores. Scalars are represented as 1-element vectors.
  * The code that generates loads/stores implements a simple state machine
    that constructs a vector according to the plan (see the sketch after
    this entry).

  Differential Revision: https://reviews.llvm.org/D30011
  llvm-svn: 295784
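  A simplified sketch of the planning step (assumed shape; the real
  VectorizePTXValueVTs() interface and rules differ):

    #include <cstdint>
    #include <vector>

    // Given the byte offsets of the flattened scalars and a uniform
    // element size, greedily start a vector access wherever a naturally
    // aligned, contiguous power-of-two group fits. The result marks each
    // index that begins a (possibly 1-element) vector.
    std::vector<bool> planVectorization(const std::vector<uint64_t> &Offsets,
                                        uint64_t EltBytes, unsigned MaxElts) {
      std::vector<bool> IsStart(Offsets.size(), false);
      for (size_t I = 0; I < Offsets.size();) {
        unsigned Width = MaxElts; // MaxElts assumed to be a power of two
        for (; Width > 1; Width /= 2) {
          // Group must be in bounds and start at a naturally aligned offset.
          if (I + Width > Offsets.size() ||
              Offsets[I] % (EltBytes * Width) != 0)
            continue;
          bool Contiguous = true;
          for (unsigned J = 1; J < Width; ++J)
            if (Offsets[I + J] != Offsets[I] + J * EltBytes)
              Contiguous = false;
          if (Contiguous)
            break;
        }
        IsStart[I] = true;
        I += Width;
      }
      return IsStart;
    }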
* AMDGPU: Formatting fixes
  Matt Arsenault, 2017-02-21 (1 file, -4/+5)
  llvm-svn: 295783
* [AArch64, X86] Add statistics for the MacroFusion pass
  Evandro Menezes, 2017-02-21 (2 files, -0/+8)
  llvm-svn: 295777
* [AArch64, X86] Guard against both instrs being wild cards
  Evandro Menezes, 2017-02-21 (2 files, -10/+12)
  If both instrs are wild cards, the result can be a crash.
  llvm-svn: 295776
* Fix PR31896.
  Evgeniy Stepanov, 2017-02-21 (1 file, -5/+8)
  The address of an alias of a global with an offset was incorrectly
  lowered as the address of the global (i.e. ignoring the offset).
  llvm-svn: 295762
* AMDGPU: Remove llvm.AMDGPU.flbit intrinsic
  Matt Arsenault, 2017-02-21 (2 files, -4/+0)
  llvm-svn: 295754
* AMDGPU: Don't use stack space for SGPR->VGPR spills
  Matt Arsenault, 2017-02-21 (8 files, -90/+225)
  Before frame offsets are calculated, try to eliminate the frame indexes
  used by SGPR spills, then delete them afterwards. I think for now we can
  be sure that no other instruction will be re-using the same frame
  indexes. It should be easy to notice if this assumption ever breaks,
  since everything asserts if it tries to use a dead frame index later.
  The unused emergency stack slot seems to still be left behind, so an
  additional 4 bytes is still wasted.
  llvm-svn: 295753
* [CodeGenPrepare] Sink and duplicate more 'and' instructions.
  Geoff Berry, 2017-02-21 (4 files, -2/+22)

  Summary:
  Rework the code that was sinking/duplicating (icmp and, 0) sequences
  into blocks where they were being used by conditional branches to form
  more tbz instructions on AArch64. The new code is more general in that
  it just looks for 'and's that have all icmp 0's as users, with a target
  hook used to select which subset of 'and' instructions to consider.
  This change also enables 'and' sinking for X86, where it is more widely
  beneficial than on AArch64.

  The 'and' sinking/duplicating code is moved into the optimizeInst phase
  of CodeGenPrepare, where it can take advantage of the fact that
  OptimizeCmpExpression has already sunk/duplicated any icmps into the
  blocks where they are used. One minor complication from this change is
  that optimizeLoadExt needed to be updated to always mark 'and's it has
  determined should be in the same block as their feeding load in the
  InsertedInsts set, to avoid an infinite loop of hoisting and sinking
  the same 'and'.

  This change fixes a regression on X86 in the tsan runtime caused by
  moving GVNHoist to a later place in the optimization pipeline (see
  PR31382).

  Reviewers: t.p.northover, qcolombet, MatzeB
  Subscribers: aemerson, mcrosier, sebpop, llvm-commits
  Differential Revision: https://reviews.llvm.org/D28813
  llvm-svn: 295746
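  An illustrative source pattern (not from the patch): a single 'and'
  whose only users are zero-compares feeding conditional branches, which
  the pass can now duplicate into each user's block so AArch64 forms
  TBZ/TBNZ and X86 a single TEST per branch:

    // Before sinking, 'Flags & 0x10' lives in the entry block; after
    // sinking/duplication, each compare gets its own local copy.
    int classify(unsigned Flags, bool Verbose) {
      unsigned Bit = Flags & 0x10;
      if (Verbose) {
        if (Bit == 0)
          return 0;
        return 1;
      }
      if (Bit == 0)
        return 2;
      return 3;
    }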
* [X86] EltsFromConsecutiveLoads SDLoc argument should be const&.
  Simon Pilgrim, 2017-02-21 (1 file, -1/+1)
  There appears never to have been a time that the reference was updated.
  llvm-svn: 295739
* [X86][AVX2] Fix VPBROADCASTQ folding on 32-bit targets.
  Simon Pilgrim, 2017-02-21 (2 files, -0/+16)
  As i64 isn't a value type on 32-bit targets, we need to fold the
  VZEXT_LOAD into VPBROADCASTQ.
  llvm-svn: 295733
* [ARM] Correct SP/PC handling in t2MOVr
  John Brawn, 2017-02-21 (2 files, -4/+20)
  PC isn't allowed in the source operand of t2MOVr, so change the register
  class to one without PC. SP handling is slightly trickier and changes
  depending on whether we're in ARMv8, so do that in
  checkTargetMatchPredicate.
  Differential Revision: https://reviews.llvm.org/D30199
  llvm-svn: 295732
* [X86][SSE] Prefer to combine shuffles to VZEXT over VZEXT_MOVL.
  Simon Pilgrim, 2017-02-21 (1 file, -9/+9)
  This matches what is already done during shuffle lowering and helps
  prevent the need for a zero vector in cases where shuffles match both
  patterns.
  llvm-svn: 295723
* [AVX512] Fix EXTRACT_VECTOR_ELT for v2i1/v4i1/v32i1/v64i1 with variable index.
  Igor Breger, 2017-02-21 (1 file, -3/+7)
  Differential Revision: https://reviews.llvm.org/D30189
  llvm-svn: 295718
* [ARM] GlobalISel: Lower calls to void() functions
  Diana Picus, 2017-02-21 (2 files, -0/+39)
  For now, we hardcode a BLX instruction and generate an
  ADJCALLSTACKDOWN/UP pair with amount 0.
  llvm-svn: 295716
* [X86] Use SHLD with both inputs from the same register to implement rotate on Sandy Bridge and later Intel CPUs
  Craig Topper, 2017-02-21 (5 files, -1/+26)

  Summary:
  Sandy Bridge and later CPUs have better throughput when using SHLD to
  implement rotate than when using the normal rotate instructions.
  Additionally, it saves one uop and avoids a partial flag update
  dependency. This patch implements this change on any Sandy Bridge or
  later processor without BMI2 instructions. With BMI2 we will use RORX
  as we currently do.

  Reviewers: zvi
  Reviewed By: zvi
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D30181
  llvm-svn: 295697
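  A canonical rotate in source form (illustrative), which ISel matches as
  a rotate and which, with this patch, lowers to SHLD with both inputs
  tied to the same register on Sandy Bridge and later (RORX still wins
  when BMI2 is available):

    // Rotate-left written so that N == 0 is well defined.
    unsigned rotl32(unsigned X, unsigned N) {
      return (X << (N & 31)) | (X >> ((32 - N) & 31));
    }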
* [X86] Fix formatting. NFC
  Craig Topper, 2017-02-21 (1 file, -1/+1)
  llvm-svn: 295695
* [AVX-512] Use sse_load_f32/f64 in place of scalar_to_vector and scalar load in some patterns.
  Craig Topper, 2017-02-21 (1 file, -15/+18)
  llvm-svn: 295693
* [AVX-512] Fix the ExeDomain for vcmpss/vcmpsd.
  Craig Topper, 2017-02-21 (1 file, -0/+2)
  llvm-svn: 295691
* Add a wrapper around copy_if in STLExtras; NFC
  Sanjoy Das, 2017-02-21 (1 file, -8/+5)
  I will add one more use for this in a later change.
  llvm-svn: 295685
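  The wrapper is presumably of this range-based shape (a sketch; the
  exact definition in the commit may differ):

    #include <algorithm>
    #include <iterator>

    // Let callers pass a whole range instead of a begin/end pair.
    template <typename R, typename OutputIt, typename UnaryPredicate>
    OutputIt copy_if(R &&Range, OutputIt Out, UnaryPredicate P) {
      return std::copy_if(std::begin(Range), std::end(Range), Out, P);
    }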
* [AVX-512] Add a few more patterns for selecting masked vpternlog with broadcast loads where the passthru operand is not operand 0.
  Craig Topper, 2017-02-20 (1 file, -0/+25)
  llvm-svn: 295673
* [X86] Tidyup combineExtractVectorElt. NFCI.
  Simon Pilgrim, 2017-02-20 (1 file, -8/+9)
  Pull out repeated code for extraction index operand and source vector
  value type. Use isNullConstant helper to check for zero extraction index.
  llvm-svn: 295670
* [ARM] GlobalISel: Don't select atomic loads
  Diana Picus, 2017-02-20 (1 file, -0/+6)
  There used to be a check in the IRTranslator that prevented us from
  having to deal with atomic loads/stores. That check has been removed in
  r294993 and the AArch64 backend was updated accordingly. This commit
  does the same thing for the ARM backend.
  In general, in the ARM backend we introduce fences during the atomic
  expand pass, so we don't have to worry about atomics, *except* for the
  32-bit ARMv8 target, which handles atomics more like AArch64. Since we
  don't want to worry about that yet, just bail out of instruction
  selection if we find any atomic loads.
  llvm-svn: 295662
* [X86] Fix EXTRACT_VECTOR_ELT with variable index from v32i16 and v64i8 vector.
  Igor Breger, 2017-02-20 (5 files, -49/+30)
  It's more profitable to go through memory (1 cycle throughput) than to
  use a VMOVD + VPERMV/PSHUFB sequence (2/3 cycles throughput) to
  implement EXTRACT_VECTOR_ELT with variable index. The IACA tool was used
  to get performance estimates
  (https://software.intel.com/en-us/articles/intel-architecture-code-analyzer).
  For example, for the var_shuffle_v16i8_v16i8_xxxxxxxxxxxxxxxx_i8 test
  from vector-shuffle-variable-128.ll I get 26 cycles vs 79 cycles.
  Also remove the VINSERT node; we don't need it any more.
  Differential Revision: https://reviews.llvm.org/D29690
  llvm-svn: 295660
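  A scalar model of the new lowering (illustrative only):

    #include <emmintrin.h>
    #include <cstdint>

    // Spill the vector to a stack slot and do one indexed load, instead
    // of building a variable shuffle (VMOVD + VPERMV/PSHUFB) to move the
    // selected element into lane 0.
    int8_t extract_var(__m128i V, unsigned Idx) {
      alignas(16) int8_t Buf[16];
      _mm_store_si128(reinterpret_cast<__m128i *>(Buf), V);
      return Buf[Idx & 15];
    }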
* [X86][AVX512] Add support for ASHR v2i64/v4i64 without VLX
  Simon Pilgrim, 2017-02-20 (2 files, -1/+28)
  Use v8i64 ASHR instructions if we don't have VLX.
  Differential Revision: https://reviews.llvm.org/D28537
  llvm-svn: 295656
* AArch64AsmParser: tablegen the isBranchTarget helper functions
  Sjoerd Meijer, 2017-02-20 (2 files, -37/+18)
  Use tablegen to autogenerate the isBranchTarget helper functions. This is
  a cleanup that removes almost identical functions that differ only in a
  few constants.
  Differential Revision: https://reviews.llvm.org/D30160
  llvm-svn: 295649
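  A sketch of the duplication being removed (helper name and constants
  assumed for illustration): each hand-written helper checked "offset is
  2^Shift-aligned and the scaled value fits in NumBits signed bits",
  differing only in the two constants, which can now be stamped out:

    #include <cstdint>

    template <unsigned NumBits, unsigned Shift>
    static bool isBranchTargetImm(int64_t Val) {
      if (Val & ((int64_t(1) << Shift) - 1))
        return false; // not a multiple of the instruction's scale
      int64_t Scaled = Val >> Shift;
      int64_t Min = -(int64_t(1) << (NumBits - 1));
      int64_t Max = (int64_t(1) << (NumBits - 1)) - 1;
      return Scaled >= Min && Scaled <= Max;
    }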
* [X86][AVX] Extend hasVEX_WPrefix bit to accept WIG value (W Ignore) + update all AVX instructions with the new value.
  Ayman Musa, 2017-02-20 (2 files, -304/+306)
  Add the WIG value to all AVX instructions that ignore the W-bit in their
  encoding, instead of giving them the default value of 0.
  This patch is needed for follow-up work on the EVEX2VEX pass (replacing
  EVEX-encoded instructions with their corresponding VEX versions when
  possible).
  Differential Revision: https://reviews.llvm.org/D29876
  llvm-svn: 295643
* [AVX-512] Add more patterns to fold masked VPTERNLOG with load when the passthru isn't operand 0.
  Craig Topper, 2017-02-20 (1 file, -0/+50)
  llvm-svn: 295640
* [AVX-512] Fix mistake in the immediate swizzle for some of the VPTERNLOG patterns.
  Craig Topper, 2017-02-20 (1 file, -2/+2)
  llvm-svn: 295638
* [AVX-512] Add more VPTERNLOG patterns to enable folding of broadcast loads that aren't in operand 2.
  Craig Topper, 2017-02-20 (1 file, -0/+39)
  llvm-svn: 295634