path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* [Hexagon] Avoid IMPLICIT_DEFs as new-value producers | Krzysztof Parzyszek | 2017-02-23 | 1 | -0/+2
  llvm-svn: 295997
* AMDGPU/SI: Fix trunc i16 pattern | Jan Vesely | 2017-02-23 | 2 | -6/+5
  Hit on ASICs that support 16bit instructions.
  Differential Revision: https://reviews.llvm.org/D30281
  llvm-svn: 295990
* [Hexagon] Patterns for CTPOP, BSWAP and BITREVERSE | Krzysztof Parzyszek | 2017-02-23 | 3 | -23/+16
  llvm-svn: 295981
* [ARM] GlobalISel: Lower call returns | Diana Picus | 2017-02-23 | 1 | -11/+52
  Introduce a common ValueHandler for call returns and formal arguments, and derive two different versions from it to handle the differences (at the moment the only difference is the way physical registers are marked as used).
  llvm-svn: 295973
* [ARM] GlobalISel: Lower call parameters in regs | Diana Picus | 2017-02-23 | 1 | -15/+39
  Add support for lowering calls with parameters that can fit into regs. Use the same ValueHandler that we used for function returns, but rename it to match its new, extended purpose.
  llvm-svn: 295971
* [X86][AVX] Disable VCVTSS2SD & VCVTSD2SS memory folding and fix the register class of their first input when creating node in fast-isel. (Quick fix to buildbot failure after rL295940 commit). | Ayman Musa | 2017-02-23 | 2 | -6/+7
  llvm-svn: 295970
* [mips][ias] Further relax operands of certain assembly instructions | Simon Dardis | 2017-02-23 | 3 | -80/+84
  This patch adjusts the most relaxed predicate of immediate operands to accept immediate forms such as ~(0xf0000000|0x000f00000). Previously these forms would be accepted by GAS and rejected by IAS.
  This partially resolves PR/30383.
  Thanks to Sean Bruno for reporting the issue!
  Reviewers: slthakur, seanbruno
  Differential Revision: https://reviews.llvm.org/D29218
  llvm-svn: 295965
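As a rough illustration of why a form such as ~(0xf0000000|0x000f00000) is awkward for a strict immediate predicate, the sketch below evaluates it with 64-bit arithmetic. The helper names are hypothetical and do not come from the patch, and treating the operand as a 64-bit expression is an assumption about how an assembler might evaluate it.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; none of these names exist in LLVM.

// The example operand from the commit message, evaluated in 64 bits
// (an assumption about the assembler's expression evaluator).
constexpr std::uint64_t exampleOperand() {
  return ~(std::uint64_t{0xf0000000} | std::uint64_t{0x000f00000});
}

// True if the 64-bit pattern is the sign extension of a 32-bit signed value.
constexpr bool fitsSigned32(std::uint64_t v) {
  return v <= 0x7fffffffULL || v >= 0xffffffff80000000ULL;
}

// True if the value is representable as a 32-bit unsigned immediate.
constexpr bool fitsUnsigned32(std::uint64_t v) { return v <= 0xffffffffULL; }

// The low 32 bits, i.e. what a 32-bit register would end up holding.
constexpr std::uint32_t low32(std::uint64_t v) {
  return static_cast<std::uint32_t>(v);
}
```

Under these assumptions the operand evaluates to 0xffffffff0f0fffff, which fits neither the signed nor the unsigned 32-bit range even though its low 32 bits (0x0f0fffff) form an ordinary register value. A predicate that only checks those two ranges would reject it.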
* Fix assertion failure in ARMConstantIslandPass. | Kristof Beyls | 2017-02-23 | 1 | -0/+1
  The ARMConstantIslandPass didn't have support for handling accesses to constant island objects through ARM::t2LDRBpci instructions. This adds support for that.
  This fixes PR31997.
  llvm-svn: 295964
* [X86][AVX512] Remove VCVTSS2SDZ & VCVTSD2SSZ from memory folding tables as they introduce new read dependency when folding. (Quick fix to buildbot fail). | Ayman Musa | 2017-02-23 | 1 | -4/+0
  llvm-svn: 295946
* [X86][AVX512] Change VCVTSS2SD and VCVTSD2SS node types to keep consistency between VEX/EVEX versions. | Ayman Musa | 2017-02-23 | 3 | -26/+74
  AVX versions of the converts work on f32/f64 types, while AVX512 versions work on vectors.
  Differential Revision: https://reviews.llvm.org/D29988
  llvm-svn: 295940
* LoadStoreVectorizer: Split even sized illegal chains properly | Matt Arsenault | 2017-02-23 | 2 | -0/+36
  Implement isLegalToVectorizeLoadChain for AMDGPU to avoid producing private address space accesses that will need to be split up later.
  This was doing the wrong thing in the case where the queried chain was an even number of elements. A possible <4 x i32> store was being split into
    store <2 x i32>
    store i32
    store i32
  rather than
    store <2 x i32>
    store <2 x i32>
  when legal.
  llvm-svn: 295933
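The even-chain split described in the LoadStoreVectorizer commit above can be pictured with a small recursive sketch. This is not the vectorizer's code: `splitChain` and the `isLegal` callback are hypothetical names, and the halving strategy is only an assumption that reproduces the <2 x i32>, <2 x i32> outcome.

```cpp
#include <functional>
#include <vector>

// Hypothetical helper: recursively halve a chain of numElts elements until
// every piece is accepted by the (assumed) legality callback.
// Returns the element count of each resulting access, in order.
std::vector<unsigned> splitChain(unsigned numElts,
                                 const std::function<bool(unsigned)> &isLegal) {
  // A single element cannot be split further; a legal chain needs no split.
  if (numElts <= 1 || isLegal(numElts))
    return {numElts};

  // Split as evenly as possible so an even chain yields two equal halves.
  unsigned left = numElts / 2;
  unsigned right = numElts - left;

  std::vector<unsigned> pieces = splitChain(left, isLegal);
  std::vector<unsigned> rest = splitChain(right, isLegal);
  pieces.insert(pieces.end(), rest.begin(), rest.end());
  return pieces;
}
```

With a callback that accepts at most two elements, a four-element chain comes back as two pieces of two elements rather than one piece of two followed by two scalars.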
* AMDGPU: Add another BFE pattern | Matt Arsenault | 2017-02-23 | 3 | -39/+52
  This is the pattern that falls out of the instruction's definition if offset == 0.
  llvm-svn: 295912
* AMDGPU: Use clamp with f64 | Matt Arsenault | 2017-02-22 | 3 | -7/+11
  llvm-svn: 295908
* AMDGPU: Fold FP clamp as modifier bit | Matt Arsenault | 2017-02-22 | 6 | -6/+89
  The manual is unclear on the details of this. It's not clear to me if denormals are not allowed with clamp, or if that is only omod. Not allowing denorms for fp16 or fp64 isn't useful so I also question if that is really a restriction. Same with whether this is valid without IEEE mode enabled.
  llvm-svn: 295905
* AMDGPU : Update TrapCode based on Trap Handler ABI. | Wei Ding | 2017-02-22 | 4 | -13/+17
  Differential Revision: http://reviews.llvm.org/D30232
  llvm-svn: 295904
* AMDGPU: Add replacement bfe intrinsics | Matt Arsenault | 2017-02-22 | 1 | -0/+6
  llvm-svn: 295899
* [Hexagon] Implement @llvm.readcyclecounter() | Krzysztof Parzyszek | 2017-02-22 | 6 | -9/+34
  llvm-svn: 295892
* AMDGPU: Don't add emergency stack slot if all spills are SGPR->VGPR | Matt Arsenault | 2017-02-22 | 1 | -36/+55
  This should avoid reporting any stack needs to be allocated in the case where no stack is truly used. An unused stack slot is still left around in other cases where there are real stack objects but no spilling occurs.
  llvm-svn: 295891
* [RDF] Support for partial structural aliases in RegisterAggr | Krzysztof Parzyszek | 2017-02-22 | 2 | -61/+67
  llvm-svn: 295883
* [Hexagon] Add intrinsics for masked vector stores | Krzysztof Parzyszek | 2017-02-22 | 1 | -0/+19
  Patch by Harsha Jagasia.
  llvm-svn: 295879
* AMDGPU: Don't look at chain users when adjusting writemask | Matt Arsenault | 2017-02-22 | 1 | -0/+4
  Fixes the writemask not being adjusted when using the new intrinsics that have chains.
  llvm-svn: 295878
* AMDGPU: Always allocate emergency stack slot at offset 0 | Matt Arsenault | 2017-02-22 | 1 | -5/+19
  This allows us to ensure that 0 is never a valid pointer to a user object, and ensures that the offset is always legal without needing a register to access it. This comes at the cost of usable offsets and wasted stack space.
  llvm-svn: 295877
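The effect of pinning the emergency slot at offset 0 can be sketched with a toy frame layout. Everything here is hypothetical: `FrameObject`, `layoutAfterEmergencySlot`, and the default 4-byte slot size are assumptions made for illustration, not the AMDGPU frame lowering code.

```cpp
#include <cstdint>
#include <vector>

// Toy description of a stack object (hypothetical, for illustration only).
struct FrameObject {
  std::uint64_t size;
  std::uint64_t align; // assumed to be a nonzero power of two
};

// Assign offsets to user objects, keeping bytes [0, emergencySlotSize)
// reserved for the emergency slot so no user object starts at offset 0.
std::vector<std::uint64_t>
layoutAfterEmergencySlot(const std::vector<FrameObject> &objects,
                         std::uint64_t emergencySlotSize = 4) {
  std::vector<std::uint64_t> offsets;
  std::uint64_t next = emergencySlotSize;
  for (const FrameObject &obj : objects) {
    // Round up to the object's alignment before placing it.
    next = (next + obj.align - 1) & ~(obj.align - 1);
    offsets.push_back(next);
    next += obj.size;
  }
  return offsets;
}
```

In this model a single 8-byte-aligned object lands at offset 8, so four bytes of padding follow the reserved slot, which is one way to read the "wasted stack space" trade-off mentioned in the commit message.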
* AMDGPU: Change exp with compr bit printing | Matt Arsenault | 2017-02-22 | 1 | -3/+11
  llvm-svn: 295873
* Revert "AMDGPU : Update TrapCode based on Trap Handler ABI." | Wei Ding | 2017-02-22 | 4 | -16/+12
  This reverts commit r295867.
  llvm-svn: 295871
* AMDGPU : Update TrapCode based on Trap Handler ABI. | Wei Ding | 2017-02-22 | 4 | -12/+16
  Differential Revision: http://reviews.llvm.org/D30232
  llvm-svn: 295867
* [AArch64] Extend AArch64RedundantCopyElimination to do simple copy propagation. | Geoff Berry | 2017-02-22 | 1 | -43/+127
  Summary:
  Extend AArch64RedundantCopyElimination to catch cases where the register that is known to be zero is COPY'd in the predecessor block. Before this change, this pass would catch cases like:
      CBZW %W0, <BB#1>
    BB#1:
      %W0 = COPY %WZR // removed
  After this change, cases like the one below are also caught:
      %W0 = COPY %W1
      CBZW %W1, <BB#1>
    BB#1:
      %W0 = COPY %WZR // removed
  This change results in a 4% increase in static copies removed by this pass when compiling the llvm test-suite. It also fixes regressions caused by doing post-RA copy propagation (a separate change to be put up for review shortly).
  Reviewers: junbuml, mcrosier, t.p.northover, qcolombet, MatzeB
  Subscribers: aemerson, rengolin, llvm-commits
  Differential Revision: https://reviews.llvm.org/D30113
  llvm-svn: 295863
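The extension described in the AArch64RedundantCopyElimination commit above can be modelled on a toy instruction stream. This is a minimal sketch, not the pass itself: the `Copy` struct and `knownZeroAtBranch` are hypothetical, and the model assumes the predecessor block contains only COPY instructions and ends in a compare-and-branch-on-zero of `cbzReg`.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy COPY instruction: dst = COPY src (hypothetical model, not MachineInstr).
struct Copy {
  std::string dst;
  std::string src;
};

// Return the registers known to be zero on the taken edge of a CBZ on cbzReg,
// given the COPYs that precede the branch in program order.
std::set<std::string> knownZeroAtBranch(const std::vector<Copy> &predCopies,
                                        const std::string &cbzReg) {
  // dst -> src for every copy relation that is still valid at this point.
  std::map<std::string, std::string> copyOf;
  for (const Copy &c : predCopies) {
    // Redefining c.dst invalidates any relation that mentions it.
    for (auto it = copyOf.begin(); it != copyOf.end();) {
      if (it->first == c.dst || it->second == c.dst)
        it = copyOf.erase(it);
      else
        ++it;
    }
    copyOf[c.dst] = c.src;
  }

  // The tested register is zero, and so is every live copy of it.
  std::set<std::string> zero{cbzReg};
  for (const auto &entry : copyOf)
    if (entry.second == cbzReg)
      zero.insert(entry.first);
  return zero;
}
```

For the second example in the commit message (`%W0 = COPY %W1` followed by `CBZW %W1`), the model reports both W0 and W1 as known zero, so a later `%W0 = COPY %WZR` in the successor would be redundant. If W1 were redefined after the copy, the relation is dropped and W0 is no longer reported.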
* [WebAssembly] Define a table of function signatures for runtime library calls. | Dan Gohman | 2017-02-22 | 3 | -0/+1345
  LLVM CodeGen emits references to external symbols that are never declared at the LLVM IR level, so they have no declared signature. However, WebAssembly requires that all functions be declared with signatures. This patch adds a table for providing signatures for known runtime libcalls that will be used in subsequent patches to emit declarations for such functions.
  llvm-svn: 295857
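A signature table of the kind described in the WebAssembly commit above might look roughly like the sketch below. The types, the function name, and the three entries are illustrative assumptions (the entries assume a 32-bit pointer target); they are not copied from the table the patch adds.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical value types and signature record, for illustration only.
enum class ValType { I32, I64, F32, F64 };

struct Signature {
  std::vector<ValType> params;
  std::vector<ValType> results;
};

// Look up the signature of a runtime library call by symbol name.
// Returns nullptr when the symbol is not in the table.
const Signature *lookupLibcallSignature(const std::string &name) {
  static const std::map<std::string, Signature> table = {
      // void *memcpy(void *, const void *, size_t) with 32-bit pointers.
      {"memcpy", {{ValType::I32, ValType::I32, ValType::I32}, {ValType::I32}}},
      // float sinf(float).
      {"sinf", {{ValType::F32}, {ValType::F32}}},
      // 64-bit integer multiply helper.
      {"__muldi3", {{ValType::I64, ValType::I64}, {ValType::I64}}},
  };
  auto it = table.find(name);
  return it == table.end() ? nullptr : &it->second;
}
```

A caller that needs to emit a declaration for an undeclared external symbol could consult such a table first and treat a null result as "unknown libcall".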
* [RDF] Skip undef uses when calculating kill flags | Krzysztof Parzyszek | 2017-02-22 | 1 | -1/+1
  llvm-svn: 295856
* [RDF] Only access block live-ins when tracking liveness | Krzysztof Parzyszek | 2017-02-22 | 1 | -2/+4
  llvm-svn: 295855
* [WebAssembly] Configure codegen to legalize f16 values. | Dan Gohman | 2017-02-22 | 1 | -0/+5
  llvm-svn: 295850
* [X86][SSE] getTargetConstantBitsFromNode - insert constant bits directly into masks. | Simon Pilgrim | 2017-02-22 | 1 | -18/+15
  Minor optimization, don't create temporary mask APInts that are just going to be OR'd into the accumulate masks - insert directly instead.
  llvm-svn: 295848
* [X86][SSE] Use APInt::getBitsSet() instead of APInt::getLowBitsSet().shl() separately. NFCI. | Simon Pilgrim | 2017-02-22 | 2 | -8/+10
  llvm-svn: 295845
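The equivalence behind the APInt change above can be shown on plain 64-bit integers. The helpers below are stand-ins written for this sketch and only mirror the idea of a low-bits mask and a half-open bit range; they are not the APInt API.

```cpp
#include <cstdint>

// Mask with the low n bits set (n may be 0..64).
constexpr std::uint64_t lowBitsSet(unsigned n) {
  return n >= 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << n) - 1;
}

// Mask with bits in the half-open range [lo, hi) set, built in one step.
// Assumes lo <= hi <= 64.
constexpr std::uint64_t bitsSet(unsigned lo, unsigned hi) {
  return lowBitsSet(hi) & ~lowBitsSet(lo);
}
```

Building an 8-bit field at bit 8 gives the same mask either way: `bitsSet(8, 16)` equals `lowBitsSet(8) << 8`. The single call simply avoids the separate shift.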
* Fix -Wunused-but-set-variable warning by removing unused 'aggregateIsPacked' checking | Simon Pilgrim | 2017-02-22 | 1 | -4/+0
  llvm-svn: 295830
* [GlobalISel] Fix compiler warnings and make assert assert something. | Benjamin Kramer | 2017-02-22 | 3 | -11/+7
  llvm-svn: 295827
* [X86][GlobalISel] Initial implementation, select G_ADD gpr, gpr | Igor Breger | 2017-02-22 | 6 | -6/+194
  Summary: Initial implementation for X86InstructionSelector. Handle selection of COPY and G_ADD/G_SUB gpr, gpr.
  Reviewers: qcolombet, rovka, zvi, ab
  Reviewed By: rovka
  Subscribers: mgorny, dberris, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D29816
  llvm-svn: 295824
* [ARM] Fix constant islands pass. | Roger Ferrer Ibanez | 2017-02-22 | 1 | -0/+7
  The pass tries to fix a spill of LR that turns out to be unnecessary. So it removes the tPOP but forgets to remove the tPUSH. This causes the stack to be misaligned upon returning from the function. Thus, remove the tPUSH as well in this case.
  Differential Revision: https://reviews.llvm.org/D30207
  llvm-svn: 295816
* [X86] Fix memory operands definition for some instructions. | Ayman Musa | 2017-02-22 | 1 | -10/+14
  Change integer memory operands to FP memory operands for some FP instructions.
  Differential Revision: https://reviews.llvm.org/D30201
  llvm-svn: 295813
* [ARM] Classification Improvements to ARM Sched-Models. NFCI. | Javed Absar | 2017-02-22 | 5 | -69/+111
  This patch adds missing sched classes for Thumb2 instructions. These have been missing so far, and as a consequence, machine scheduler models for individual sub-targets have tended to be larger than they needed to be. These patches should help write schedulers better and faster in the future for ARM sub-targets.
  Reviewer: Diana Picus
  Differential Revision: https://reviews.llvm.org/D29953
  llvm-svn: 295811
* [AVX-512] Allow legacy scalar min/max intrinsics to select EVEX instructions when available | Craig Topper | 2017-02-22 | 7 | -45/+85
  This patch introduces new X86ISD::FMAXS and X86ISD::FMINS opcodes. The legacy intrinsics now lower to this node. As do the AVX-512 masked intrinsics when the rounding mode is CUR_DIRECTION.
  I've merged a copy of the tablegen multiclass avx512_fp_scalar into avx512_fp_scalar_sae. avx512_fp_scalar still needs to support CUR_DIRECTION appearing as a rounding mode for X86ISD::FADD_ROUND and others.
  Differential revision: https://reviews.llvm.org/D30186
  llvm-svn: 295810
* [WebAssembly] Add skeleton MC support for the Wasm container format | Dan Gohman | 2017-02-22 | 13 | -14/+253
  This just adds the basic skeleton for supporting a new object file format. All of the actual encoding will be implemented in followup patches.
  Differential Revision: https://reviews.llvm.org/D26722
  llvm-svn: 295803
* AMDGPU: Add cvt.pkrtz intrinsic | Matt Arsenault | 2017-02-22 | 7 | -5/+56
  Convert llvm.SI.packf16 test uses
  llvm-svn: 295797
* AMDGPU: Remove llvm.AMDGPU.clamp intrinsic | Matt Arsenault | 2017-02-21 | 2 | -13/+0
  llvm-svn: 295789
* AMDGPU: Redefine clamp node as clamp 0.0-1.0 | Matt Arsenault | 2017-02-21 | 12 | -29/+163
  Change implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use it whether or not they are enabled.
  Also allow using clamp with f16, and use knowledge of dx10_clamp.
  llvm-svn: 295788
* [NVPTX] Unify vectorization of load/stores of aggregate arguments and return values. | Artem Belevich | 2017-02-21 | 1 | -710/+420
  Original code only used vector loads/stores for explicit vector arguments. It could also do more loads/stores than necessary (e.g. v5f32 would touch 8 f32 values). Aggregate types were loaded one element at a time, even the vectors contained within.
  This change attempts to generalize (and simplify) parameter space loads/stores so that vector loads/stores can be used more broadly. Functionality of the patch has been verified by compiling the thrust test suite and manually checking the differences between PTX generated by llvm with and without the patch.
  General algorithm:
  * ComputePTXValueVTs() flattens input/output argument into a flat list of scalars to load/store and returns their types and offsets.
  * VectorizePTXValueVTs() uses that data to create a vectorization plan which returns an array of flags marking boundaries of vectorized load/stores. Scalars are represented as 1-element vectors.
  * Code that generates loads/stores implements a simple state machine that constructs a vector according to the plan.
  Differential Revision: https://reviews.llvm.org/D30011
  llvm-svn: 295784
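The "array of flags marking boundaries" idea from the NVPTX commit above can be sketched as follows. This is a simplified stand-in, not the code from the patch: the flag names, `makeVectorizationPlan`, the restriction to equally sized scalars, and the choice of 4- and 2-element vectors are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical boundary flags. A scalar is a 1-element vector, so it is
// both the first and the last element of its group.
enum PlanFlag : unsigned { Inner = 0, First = 1, Last = 2, Scalar = First | Last };

// Build a plan for scalars of eltSize bytes (eltSize > 0) at the given
// ascending byte offsets. A run of 4 or 2 contiguous scalars is grouped
// when its start offset is aligned to the vector's size.
std::vector<unsigned>
makeVectorizationPlan(const std::vector<std::uint64_t> &offsets,
                      std::uint64_t eltSize) {
  std::vector<unsigned> plan(offsets.size(), Scalar);
  std::size_t i = 0;
  while (i < offsets.size()) {
    std::size_t chosen = 1;
    for (std::size_t width : {std::size_t{4}, std::size_t{2}}) {
      if (i + width > offsets.size() || offsets[i] % (width * eltSize) != 0)
        continue;
      bool contiguous = true;
      for (std::size_t k = 1; k < width; ++k)
        contiguous = contiguous && offsets[i + k] == offsets[i] + k * eltSize;
      if (contiguous) {
        chosen = width;
        break;
      }
    }
    if (chosen > 1) {
      plan[i] = First;
      for (std::size_t k = 1; k + 1 < chosen; ++k)
        plan[i + k] = Inner;
      plan[i + chosen - 1] = Last;
    }
    i += chosen;
  }
  return plan;
}
```

For five f32 values at offsets 0, 4, 8, 12 and 16, this model produces one 4-element group followed by a scalar, so only five values are touched instead of eight. A consumer can walk the flags as a small state machine: start a vector on First, append on Inner, and emit the access on Last.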
* AMDGPU: Formatting fixes | Matt Arsenault | 2017-02-21 | 1 | -4/+5
  llvm-svn: 295783
* [AArch64, X86] Add statistics for the MacroFusion pass | Evandro Menezes | 2017-02-21 | 2 | -0/+8
  llvm-svn: 295777
* [AArch64, X86] Guard against both instrs being wild cards | Evandro Menezes | 2017-02-21 | 2 | -10/+12
  If both instrs are wild cards, the result can be a crash.
  llvm-svn: 295776
* Fix PR31896. | Evgeniy Stepanov | 2017-02-21 | 1 | -5/+8
  The address of an alias of a global with an offset was incorrectly lowered as the address of the global itself (i.e. ignoring the offset).
  llvm-svn: 295762
* AMDGPU: Remove llvm.AMDGPU.flbit intrinsic | Matt Arsenault | 2017-02-21 | 2 | -4/+0
  llvm-svn: 295754
* AMDGPU: Don't use stack space for SGPR->VGPR spills | Matt Arsenault | 2017-02-21 | 8 | -90/+225
  Before frame offsets are calculated, try to eliminate the frame indexes used by SGPR spills. Then we can delete them after.
  I think for now we can be sure that no other instruction will be re-using the same frame indexes. It should be easy to notice if this assumption ever breaks since everything asserts if it tries to use a dead frame index later.
  The unused emergency stack slot seems to still be left behind, so an additional 4 bytes is still wasted.
  llvm-svn: 295753