summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [AMDGPU] Add f16 support (VI+)Konstantin Zhuravlyov2016-11-1318-238/+617
| | | | | | Differential Revision: https://reviews.llvm.org/D25975 llvm-svn: 286753
* [AVX-512] Remove the remaining masked shift by immediate or by single value. ↵Craig Topper2016-11-121-22/+0
| | | | | | | | Autoupgrade them to recently introduced unmasked versions and a select. After this I'll add the unmasked intrinsics to InstCombineCalls to finish making our handling of these types of shuffles consistent between AVX-512 and the legacy intrinsics. llvm-svn: 286725
* [AVX-512] Add unmasked version of shift by immediate and shift by single ↵Craig Topper2016-11-121-0/+22
| | | | | | | | | | | | | | | | | | | element in XMM. Summary: This is the first step towards being able to add the avx512 shift by immediate intrinsics to InstCombineCalls where we aleady support the sse2 and avx2 intrinsics. We need to the unmasked versions so we can avoid having to teach InstCombineCalls that it would need to insert selects sometimes. Instead we'll just add the selects around the new instrinsics in the frontend. This change should also enable the shift by i32 intrinsics to take a non-constant shift value just like the avx2 and sse intrinsics. This will enable us to fix PR30691 once we update clang. Next I'll switch clang to use the new builtins. Then we'll come back to the backend and remove/autoupgrade the old intrinsics. Then I'll work on the same series for variable shifts. Reviewers: RKSimon, zvi, delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26333 llvm-svn: 286711
* [AVX-512] Add support for lowering shuffles to VALIGND/VALIGNQCraig Topper2016-11-121-28/+96
| | | | | | | | | | | | Summary: VALIGND and VALIGNQ are similar to PALIGNR but instead of working on a 128-bit lane they work on the entire vector register. This change leverages the shuffle rotate detection code used for PALIGNR to detect these cases. Reviewers: delena, RKSimon Subscribers: Farhana, llvm-commits Differential Revision: https://reviews.llvm.org/D26297 llvm-svn: 286709
* AMDGPU/SI: Promote i16 = fp_[us]int f32 for VITom Stellard2016-11-121-0/+6
| | | | | | | | | | | | Summary: This fixes a regression caused by r286464. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26570 llvm-svn: 286687
* AMDGPU/SI: Fix visit order assumption in SIFixSGPRCopiesTom Stellard2016-11-111-24/+44
| | | | | | | | | | | | | | | | | | | | Summary: This pass was assuming that when a PHI instruction defined a register used by another PHI instruction that the defining insstruction would be legalized before the using instruction. This assumption was causing the pass to not legalize some PHI nodes within divergent flow-control. This fixes a bug that was uncovered by r285762. Reviewers: nhaehnle, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D26303 llvm-svn: 286676
* [PowerPC] Add remaining vector permute builtins in altivec.h - LLVM portionNemanja Ivanovic2016-11-112-5/+23
| | | | | | | | | | This patch corresponds to review: https://reviews.llvm.org/D26480 Adds all the intrinsics used for various permute builtins that will be added to altivec.h. llvm-svn: 286638
* [AArch64] Update a FIXME comment to reflect current state. NFC.Chad Rosier2016-11-111-2/+4
| | | | llvm-svn: 286625
* [AArch64] Fix bugs in isel lowering replaceSplatVectorStore.Geoff Berry2016-11-111-11/+27
| | | | | | | | | | | | | | | | | Summary: Fix off-by-one indexing error in loop checking that inserted value was a splat vector. Add code to check that INSERT_VECTOR_ELT nodes constructing the splat vector have the expected constant index values. Reviewers: t.p.northover, jmolloy, mcrosier Subscribers: aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D26409 llvm-svn: 286616
* [AArch64] Remove lots of redundant code. NFC.Chad Rosier2016-11-111-30/+41
| | | | llvm-svn: 286606
* [AArch64] Early return and minor renaming/refactoring to ease code review. NFC.Chad Rosier2016-11-111-43/+43
| | | | llvm-svn: 286601
* [PowerPC] Add vector conversion builtins to altivec.h - LLVM portionNemanja Ivanovic2016-11-111-8/+16
| | | | | | | | | | | This patch corresponds to review: https://reviews.llvm.org/D26307 Adds all the intrinsics used for various conversion builtins that will be added to altivec.h. These are type conversions between various types of vectors. llvm-svn: 286596
* [AArch64] Enable merging of adjacent zero stores for all subtargets.Chad Rosier2016-11-113-11/+1
| | | | | | | | | | | | | | | | | | | | | | | | This optimization merges adjacent zero stores into a wider store. e.g., strh wzr, [x0] strh wzr, [x0, #2] ; becomes str wzr, [x0] e.g., str wzr, [x0] str wzr, [x0, #4] ; becomes str xzr, [x0] Previously, this was only enabled for Kryo and Cortex-A57. Differential Revision: https://reviews.llvm.org/D26396 llvm-svn: 286592
* [AMDGPU] TargetStreamer: Fix .note section nameSam Kolton2016-11-111-2/+2
| | | | llvm-svn: 286591
* [SystemZ] Support CL(G)T instructionsUlrich Weigand2016-11-116-3/+58
| | | | | | | | This adds support for the compare logical and trap (memory) instructions that were added as part of the miscellaneous instruction extensions feature with zEC12. llvm-svn: 286587
* [SystemZ] Support load-and-zero-rightmost-byte facilityUlrich Weigand2016-11-116-3/+49
| | | | | | | | | | This adds support for the LZRF/LZRG/LLZRGF instructions that were added on z13, and uses them for code generation were appropriate. SystemZDAGToDAGISel::tryRISBGZero is updated again to prefer LLZRGF over RISBG where both would be possible. llvm-svn: 286586
* [SystemZ] Use LLGT(R) instructionsUlrich Weigand2016-11-115-46/+50
| | | | | | | | | | | | | This adds support for the 31-to-64-bit zero extension instructions LLGT and LLGTR and uses them for code generation where appropriate. Since this operation can also be performed via RISBG, we have to update SystemZDAGToDAGISel::tryRISBGZero so that we prefer LLGT over RISBG in case both are possible. The patch includes some simplification to the tryRISBGZero code; this is not intended to cause any (further) functional change in codegen. llvm-svn: 286585
* [ARM] Add plumbing for GlobalISelDiana Picus2016-11-1113-4/+407
| | | | | | Add GlobalISel skeleton, up to the point where we can select a ret void. llvm-svn: 286573
* AMDGPU: Attempt to fix build failure on x86-64 selfhost buildYaxun Liu2016-11-111-2/+0
| | | | | | Remove redundant include file. llvm-svn: 286552
* Add a blank line for a test commit.Sean Fertile2016-11-111-0/+1
| | | | llvm-svn: 286550
* Revert "[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate ↵Stanislav Mekhanoshin2016-11-112-26/+5
| | | | | | | | condition copies" This reverts commit r286171, it breaks piglit test fs-discard-exit-2 llvm-svn: 286530
* Fix requirements.Joerg Sonnenberger2016-11-101-1/+1
| | | | llvm-svn: 286527
* Timer: Remove group-less NamedRegionTimer constructor.Matthias Braun2016-11-101-2/+0
| | | | | | | | | | | | | | | The NamedRegionTimer initializer without a group name puts the Timer into the "Misc" group and is (nearly) unused. Remove it. The only user of this constructor appears to be the HexagonGenInsert pass, which creates a counter without group to count the complete execution time of that pass, however since every pass gets a counter by the PassManager anyway this should be unnecessary. Also removed the pointless TimerGroup there. Differential Revision: https://reviews.llvm.org/D25582 llvm-svn: 286524
* [DAG Combiner] Fix the native computation of the Newton series for reciprocalsEvandro Menezes2016-11-108-26/+31
| | | | | | | | | | | | The generic infrastructure to compute the Newton series for reciprocal and reciprocal square root was conceived to allow a target to compute the series itself. However, the original code did not properly consider this condition if returned by a target. This patch addresses the issues to allow a target to compute the series on its own. Differential revision: https://reviews.llvm.org/D22975 llvm-svn: 286523
* AMDGPU: Emit runtime metadata as a note element in .note sectionYaxun Liu2016-11-106-348/+450
| | | | | | | | | | | | Currently runtime metadata is emitted as an ELF section with name .AMDGPU.runtime_metadata. However there is a standard way to convey vendor specific information about how to run an ELF binary, which is called vendor-specific note element (http://www.netbsd.org/docs/kernel/elf-notes.html). This patch lets AMDGPU backend emits runtime metadata as a note element in .note section. Differential Revision: https://reviews.llvm.org/D25781 llvm-svn: 286502
* [Target] Rename X86/ARM Assembly printer to reflect reality.Davide Italiano2016-11-102-2/+2
| | | | | | | This shows up a lot profiling LTO testcases with -time-passes, so better have a non confusing name. llvm-svn: 286488
* AMDGPU: Add VI i16 supportTom Stellard2016-11-1015-78/+409
| | | | | | | | Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 286464
* [ARM] Thumb2 LDR (literal) should accept PC as the destinationOliver Stannard2016-11-101-1/+1
| | | | | | | | | The version of this instruction with the .w suffix already correctly accepts this, but the alias without the .w did not. Differential Revision: https://reviews.llvm.org/D26499 llvm-svn: 286446
* [AVX-512] Allow legacy cvtpd2dq intrinsics to select EVEX encoded ↵Craig Topper2016-11-102-8/+12
| | | | | | instruction when available. llvm-svn: 286435
* [AVX-512][X86] Convert avx_cvtt_ps2dq_256 and sse2_cvttps2dq intrinsics to ↵Craig Topper2016-11-102-54/+28
| | | | | | | | ISD::FP_TO_SINT in the intrinsics table and delete patterns. While nearby also move CVTDQ2PS patterns into their instructions. This allows these intrinsics to also use EVEX instructons. llvm-svn: 286434
* [X86] Convert int_x86_avx_cvtt_pd2dq_256 to fp_to_sint using the intrinsics ↵Craig Topper2016-11-102-7/+5
| | | | | | table. Removes extra patterns and allows legacy intrinsic to select EVEX encoded instructions when available. llvm-svn: 286433
* [X86] Move some custom patterns into the currently empty pattern of their ↵Craig Topper2016-11-101-46/+37
| | | | | | corresponding instructions. NFC llvm-svn: 286432
* [X86] Remove some patterns still referencing int_x86_sse2_cvttpd2dq that ↵Craig Topper2016-11-101-9/+5
| | | | | | should have been removed in r286344. NFC llvm-svn: 286431
* Re-apply r286384, "X86: Introduce the "relocImm" ComplexPattern, which ↵Peter Collingbourne2016-11-094-52/+35
| | | | | | | | | represents a relocatable immediate.", with a fix for 32-bit x86. Teach X86InstrInfo::analyzeCompare() not to crash on CMP and SUB instructions that take a global address operand. llvm-svn: 286420
* GlobalISel: translate invoke and landingpad instructionsTim Northover2016-11-091-1/+1
| | | | | | | Pretty bare-bones support for exception handling (no weird MSVC stuff, no SjLj etc), but it should get things going. llvm-svn: 286407
* Revert r286384, "X86: Introduce the "relocImm" ComplexPattern, which ↵Peter Collingbourne2016-11-093-31/+52
| | | | | | | | | represents a relocatable immediate." Suspected to be the cause of a sanitizer-windows bot failure: Assertion failed: isImm() && "Wrong MachineOperand accessor", file C:\b\slave\sanitizer-windows\llvm\include\llvm/CodeGen/MachineOperand.h, line 420 llvm-svn: 286385
* X86: Introduce the "relocImm" ComplexPattern, which represents a relocatable ↵Peter Collingbourne2016-11-093-52/+31
| | | | | | | | | | | | | | | immediate. A relocatable immediate is either an immediate operand or an operand that can be relocated by the linker to an immediate, such as a regular symbol in non-PIC code. Start using relocImm for 32-bit and 64-bit MOV instructions, and for operands of type "imm32_su". Remove a number of now-redundant patterns. Differential Revision: https://reviews.llvm.org/D25812 llvm-svn: 286384
* [Hexagon] Silence "sometimes uninitialized" warning in HexagonCopyToCombineKrzysztof Parzyszek2016-11-091-1/+3
| | | | llvm-svn: 286383
* [Hexagon] Separate Hexagon subreg indices for different register classesKrzysztof Parzyszek2016-11-0923-204/+255
| | | | | | | | | | | For pairs of 32-bit registers: isub_lo, isub_hi. For pairs of vector registers: vsub_lo, vsub_hi. Add generic subreg indices: ps_sub_lo, ps_sub_hi, and a function HexagonRegisterInfo::getHexagonSubRegIndex(RegClass, GenericSubreg) that returns the appropriate subreg index for RegClass. llvm-svn: 286377
* [Hexagon] Eliminate Insert4 pseudo-instruction, use combines insteadKrzysztof Parzyszek2016-11-093-48/+2
| | | | llvm-svn: 286368
* [SystemZ] A few fixes in scheduler files.Jonas Paulsson2016-11-093-11/+11
| | | | | Review: U Weigand llvm-svn: 286362
* [MachineScheduler] Comments fixing.Jonas Paulsson2016-11-091-1/+1
| | | | | | | | The name/comment of the third argument to the ScheduleDAGMI constructor is RemoveKillFlags and not IsPostRA. Only the comments are changed. Review: A Trick llvm-svn: 286350
* [AVX-512] Add lowering to cvttpd2udq/cvttps2udq for fptoui v2f64/2f32 to 2i32Craig Topper2016-11-095-8/+26
| | | | | | | | | | | | This patch adds support for fptoui to 2i32 from both 2f64 and 2f32, building on Simon's change for the signed version in r284459 and using AVX-512 instructions. If we don't have VLX support we need to use a 512-bit operation for v2f64->v2i32 and extract the result. It also recognises that cvttpd2udq zeroes the upper 64-bits of the xmm result. Differential Revision: https://reviews.llvm.org/D26331 llvm-svn: 286345
* [X86] Lower AVX512 and SSE intrinsics for CVTTPD2DQ to X86ISD::CVTTPD2DQ.Craig Topper2016-11-093-30/+34
| | | | | | | | | | | | Summary: This allows the SSE intrinsic to use the EVEX instruction when available. It also fixes EVEX to not use a weird (v4i32 (fp_to_sint v2f64)) node and it merges some isel patterns. This also fixes some cases that weren't combining vzmovl with cvttpd2dq to remove extra moves. Reviewers: delena, zvi, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26330 llvm-svn: 286344
* [AVX-512] Use alignedstore256 in patterns that look for stores of the lower ↵Craig Topper2016-11-091-10/+10
| | | | | | | | 256-bits of a 512-bit vector to use a 256-bit aligned store. Previously we were only checking for 16 byte alignment instead of 32 byte alignment. Fixes PR30947. llvm-svn: 286342
* [AVX-512] Make VBMI instruction set enabling imply that the BWI instruction ↵Craig Topper2016-11-091-2/+2
| | | | | | | | | | | | | | | set is also enabled. Summary: This is needed to make the v64i8 and v32i16 types legal for the 512-bit VBMI instructions. Fixes PR30912. Reviewers: delena, zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26322 llvm-svn: 286339
* AArch64DeadRegisterDefinitionsPass: Fix Changed flagMatthias Braun2016-11-081-1/+0
| | | | | | Fix a bug in the calculation of the changed flag introduced in r285488. llvm-svn: 286293
* [SystemZ] Add missing FP extension instructionsUlrich Weigand2016-11-084-18/+42
| | | | | | | | This completes assembler / disassembler support for all BFP instructions provided by the floating-point extensions facility. The instructions added here are not currently used for codegen. llvm-svn: 286285
* [SystemZ] Add program mask and addressing mode instructionsUlrich Weigand2016-11-085-11/+109
| | | | | | | | | Add several instructions that operate on the program mask or the addressing mode. These are not really needed for code generation under Linux, but are provided for completeness for the assembler/disassembler. llvm-svn: 286284
* [SystemZ] Model access registers as LLVM registersUlrich Weigand2016-11-0817-102/+126
| | | | | | | | | | | | | Add the 16 access registers as LLVM registers. This allows removing a lot of special cases in the assembler and disassembler where we were handling access registers; this can all just use the generic register code now. Also add a bunch of instructions to operate on access registers, for assembler/disassembler use only. No change in code generation intended. llvm-svn: 286283
OpenPOWER on IntegriCloud