summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* Plumb useAA through TargetTransformInfo to remove Transforms->CodeGen header ↵David Blaikie2018-03-282-2/+2
| | | | | | | | dependency Thanks to echristo for the pointers on direction. llvm-svn: 328737
* [X86][SkylakeServer] Remove checks for 'k', 'z', '_Int' and 'b' from ↵Craig Topper2018-03-281-2116/+2116
| | | | | | | | | | scheduler regexs. Most of these were optional matches at the end of the strings, but since the strings themselves are prefix matches by default you don't need to check for something optional at the end. I've left the 'b' on memory instructions where it means 'broadcast' because I'm not sure those really have the same load latency and we may need to split them explicitly in the future. llvm-svn: 328730
* [Hexagon] Add support for "new" circular buffer intrinsicsKrzysztof Parzyszek2018-03-285-91/+280
| | | | | | | | | | | | | | | | | | | These instructions have been around for a long time, but we haven't supported intrinsics for them. The "new" versions use the CSx register for the start of the buffer instead of the K field in the Mx register. We need to use pseudo instructions for these instructions until after register allocation. The problem is that these instructions allocate a M0/CS0 or M1/CS1 pair. But, we can't generate code for the CSx set-up until after register allocation when the Mx register has been fixed for the instruction. There is a related clang patch. Patch by Brendon Cahoon. llvm-svn: 328724
* [MachineOutliner] Simplify call outlining + require valid callee save info ↵Jessica Paquette2018-03-281-31/+18
| | | | | | | | | | for call outlining This commit simplifies the call outlining logic by removing references to the Function associated with the callee. To do this, it requires that valid callee save info is available to the outliner. llvm-svn: 328719
* Transforms: Introduce Transforms/Utils.h rather than spreading the ↵David Blaikie2018-03-286-0/+6
| | | | | | | | | declarations amongst Scalar.h and IPO.h Fixes layering - Transforms/Utils shouldn't depend on including a Scalar or IPO header, because Scalar and IPO depend on Utils. llvm-svn: 328717
* [AMDGPU][MC] Added ds_add_src2_f32Dmitry Preobrazhensky2018-03-281-0/+3
| | | | | | | | | See bug 36833: https://bugs.llvm.org/show_bug.cgi?id=36833 Differential Revision: https://reviews.llvm.org/D44779 Reviewers: arsenm, artem.tamazov, timcorringham llvm-svn: 328713
* [AMDGPU][MC] Added PCK variants of image load/store instructionsDmitry Preobrazhensky2018-03-281-26/+40
| | | | | | | | | See bug 36834: https://bugs.llvm.org/show_bug.cgi?id=36834 Differential Revision: https://reviews.llvm.org/D44795 Reviewers: artem.tamazov, arsenm, timcorringham, nhaehnle llvm-svn: 328710
* [AMDGPU][MC][GFX9] Added buffer_*_format_d16_hi_xDmitry Preobrazhensky2018-03-281-0/+10
| | | | | | | | | See bug 36835: https://bugs.llvm.org/show_bug.cgi?id=36835 Differential Revision: https://reviews.llvm.org/D44825 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328707
* [AMDGPU][MC][GFX9] Added s_scratch* instructionsDmitry Preobrazhensky2018-03-281-0/+15
| | | | | | | | | See bug 36836: https://bugs.llvm.org/show_bug.cgi?id=36836 Differential Revision: https://reviews.llvm.org/D44832 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 328704
* [X86][Btver2] Moved JWriteFCmp/JWriteFCmpY classes next to each other. NFCISimon Pilgrim2018-03-281-14/+14
| | | | | | Renamed JWriteFPAY22 to JWriteFCmpY - we've tended to avoid latency based names llvm-svn: 328701
* [X86][BtVer2] Fix the number of micro opcodes for AES[ENC|DEC] and other YMM ↵Andrea Di Biagio2018-03-281-1/+4
| | | | | | | | | | | instructions. Similar to r328694. The number of micro opcodes should be 2 for those instructions. This was found when testing AVX code for BtVer2 using llvm-mca. llvm-svn: 328698
* Revert "[AMDGPU] For OS type AMDPAL, fixed scratch on compute shader"Tim Renouf2018-03-281-4/+2
| | | | | | | | | | | This reverts commit 0daf86291d3aa04d3cc280cd0ef24abdb0174981. It was causing an assert in test/CodeGen/AMDGPU/amdpal.ll only on a release-with-asserts build. I will resubmit the change when I have fixed that. Change-Id: If270594eba27a7dc4076bdeab3fa8e6bfda3288a llvm-svn: 328695
* [X86][BtVer2] Fix the number of micro opcodes for a bunch of YMM instructions.Andrea Di Biagio2018-03-281-0/+12
| | | | | | | | | | | | | The Jaguar backend natively supports 128-bit data types. Operations on YMM registers are split into two COPs (complex operations). Each COP consumes a slot in the dispatch group, and in the reorder buffer. The scheduling model for Jaguar should mark those instructions as `let NumMicroOps = 2`. This was found when testing AVX code for BtVer2 using llvm-mca. llvm-svn: 328694
* [ARM] Support float literals under XOChristof Douma2018-03-283-3/+6
| | | | | | | | | | Follow up patch of r328313 to support the UseVMOVSR constraint. Removed some unneeded instructions from the test and removed some stray comments. Differential Revision: https://reviews.llvm.org/D44941 llvm-svn: 328691
* AMDGPU: Really implement getFrameRegisterMatt Arsenault2018-03-271-1/+2
| | | | | | | Currently this seems to only really be used for debug info. llvm-svn: 328677
* [MachineOutliner] AArch64: Don't outline ADRPs with un-outlinable operandsJessica Paquette2018-03-271-11/+7
| | | | | | | | | If an ADRP appears with, say, a CPI operand, we shouldn't outline it. This moves the check for unsafe operands so that it occurs before the special-case for ADRPs. Also add a test for outlining ADRPs. llvm-svn: 328674
* [AMDGPU] For OS type AMDPAL, fixed scratch on compute shaderTim Renouf2018-03-271-2/+4
| | | | | | | | | | | | | | | | | | Summary: For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders). This commit fixes that to use offset 0x10 instead of offset 0 for a compute shader, per the PAL ABI spec. Reviewers: kzhuravl, nhaehnle, timcorringham Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm Differential Revision: https://reviews.llvm.org/D44468 Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f llvm-svn: 328673
* Initialize variable added in r328617.Sterling Augustine2018-03-271-0/+1
| | | | llvm-svn: 328667
* [X86] Add WriteFMOVMSK/WriteVecMOVMSK/WriteMMXMOVMSK scheduler classesSimon Pilgrim2018-03-2711-60/+57
| | | | | | | | Currently MOVMSK instructions use the WriteVecLogic class, which is a very poor choice given that MOVMSK involves a SSE->GPR transfer. Differential Revision: https://reviews.llvm.org/D44924 llvm-svn: 328664
* AMDGPU: Fix not preserving CSR VGPR if used for SGPR spillsMatt Arsenault2018-03-271-4/+3
| | | | | | | | Before this was not done if the function had no calls in it. This is still a possible issue with any callable function, regardless of calls present. llvm-svn: 328659
* AMDGPU: Set natural stack alignment in DataLayoutMatt Arsenault2018-03-271-2/+2
| | | | | | | Only 4 byte alignment is ever useful, so increasing anything beyond this may require realigning the stack. llvm-svn: 328656
* AMDGPU: Fix crash when MachinePointerInfo invalidMatt Arsenault2018-03-271-1/+1
| | | | | | | | The combine on a select of a load only triggers for addrspace 0, and discards the MachinePointerInfo. The conservative default needs to be used for this. llvm-svn: 328652
* AMDGPU: Fix FP restore from being reordered with stack opsMatt Arsenault2018-03-271-1/+6
| | | | | | | | | | | | | | | | | In a function, s5 is used as the frame base SGPR. If a function is calling another function, during the call sequence it is copied to a preserved SGPR and restored. Before it was possible for the scheduler to move stack operations before the restore of s5, since there's nothing to associate a frame index access with the restore. Add an implicit use of s5 to the adjcallstack pseudo which ends the call sequence to preven this from happening. I'm not 100% satisfied with this solution, but I'm not sure what else would be better. llvm-svn: 328650
* [Hexagon] Implement TTI::shouldMaximizeVectorBandwidthKrzysztof Parzyszek2018-03-271-0/+1
| | | | llvm-svn: 328648
* [Power9] Fix the resource list for the COPY instruction.Stefan Pintilie2018-03-271-1/+1
| | | | | | | The COPY instruction was listed as a 4 cycle instruction. It is now listed correctly as a 2 cycle ALU instruction. llvm-svn: 328647
* [Hexagon] Rudimentary support for auto-vectorization for HVXKrzysztof Parzyszek2018-03-272-12/+155
| | | | | | | | This implements a set of TTI functions that the loop vectorizer uses. The only purpose of this is to enable testing. Auto-vectorization is disabled by default, enabled by -hexagon-autohvx. llvm-svn: 328639
* [AArch64] Decorate AArch64 instrs with OPERAND_PCRELRafael Auler2018-03-272-1/+27
| | | | | | | | | | | | | | Summary: This is a canonical way to teach objdump to print the target symbols for branches when disassembling AArch64 code. Reviewers: evandro, t.p.northover, espindola Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D44851 llvm-svn: 328638
* [X86][Btver2] Add MMX_PMOVMSKBrr to MOVMSK scheduler classSimon Pilgrim2018-03-271-1/+1
| | | | llvm-svn: 328620
* [PowerPC] Secure PLT supportStrahinja Petrovic2018-03-275-26/+91
| | | | | | | | This patch supports secure PLT mode for PowerPC 32 architecture. Differential Revision: https://reviews.llvm.org/D42112 llvm-svn: 328617
* [MIPS] Add static_assert that all Fixups are handled in getFixupKindAlexander Richardson2018-03-271-2/+7
| | | | | | | | | | | | | | | | | | Summary: I recently added a new Fixup kind to our fork of LLVM but forgot to add it to the table in MipsAsmBackend.cpp. With this static_assert the error would have been caught instead of zero-initializing the array entries for the new fixups. Reviewers: sdardis, atanasyan Reviewed By: atanasyan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44895 llvm-svn: 328616
* [X86] Add WriteCRC32 scheduler classSimon Pilgrim2018-03-2610-24/+15
| | | | | | | | Currently CRC32 instructions use the WriteFAdd class, this patch splits them off into their own, at the moment it is still mostly just a duplicate of WriteFAdd but it can now be tweaked on a target by target basis. Differential Revision: https://reviews.llvm.org/D44647 llvm-svn: 328582
* [Hexagon] Assertion failure in HexagonSubtarget.cppKrzysztof Parzyszek2018-03-261-7/+7
| | | | | | | | In restoreLatency, replace range-for loop with std::find. Patch by Jyotsna Verma. llvm-svn: 328574
* [X86][Btver2] Add (U)COMISD/(U)COMISD scheduler costsSimon Pilgrim2018-03-261-0/+10
| | | | | | Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write) llvm-svn: 328573
* [X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32Reid Kleckner2018-03-262-6/+16
| | | | | | | | | | | | | | | | | | | | | | | Summary: Re-lands r328386 and r328443, reverting r328482. Incorporates fixes from @mstorsjo in D44876 (thanks!) so that small parameters in i8 and i16 do not end up in the SysV register parameters (EDI, ESI, etc). I added tests for how we receive small parameters, since that is the important part. It's always safe to store more bytes than will be read, but the assumptions you make when loading them are what really matter. I also tested this by self-hosting clang and it passed tests on win64. Reviewers: mstorsjo, hans Subscribers: hiraditya, mstorsjo, llvm-commits Differential Revision: https://reviews.llvm.org/D44900 llvm-svn: 328570
* [X86] Add WriteBitScan/WriteLZCNT/WriteTZCNT/WritePOPCNT scheduler classes ↵Simon Pilgrim2018-03-2611-125/+93
| | | | | | | | | | | | (PR36881) Give the bit count instructions their own scheduler classes instead of forcing them into existing classes. These were mostly overridden anyway, but I had to add in costs from Agner for silvermont and znver1 and the Fam16h SoG for btver2 (Jaguar). Differential Revision: https://reviews.llvm.org/D44879 llvm-svn: 328566
* [XCore] Change std::sort to llvm::sort in response to r327219Mandeep Singh Grang2018-03-262-3/+3
| | | | | | | | | | | | | | | | | | | | | | Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused due to undefined sorting order of objects having the same key. To make use of that infrastructure we need to invoke llvm::sort instead of std::sort. Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort. Refer the comments section in D44363 for a list of all the required patches. Reviewers: dblaikie, RKSimon, robertlytton Reviewed By: robertlytton Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44875 llvm-svn: 328564
* [Power9]Legalize and emit code for quad-precision convert from double-precisionLei Huang2018-03-262-5/+13
| | | | | | | | | Legalize and emit code for quad-precision floating point operation xscvdpqp and add option to guard the quad precision operation support. Differential Revision: https://reviews.llvm.org/D44746 llvm-svn: 328558
* [PowerPC] Infrastructure work. Implement getting the opcode for a spill in ↵Stefan Pintilie2018-03-268-509/+621
| | | | | | | | | | | one place. A new function getOpcodeForSpill should now be the only place to get the opcode for a given spilled register. Differential Revision: https://reviews.llvm.org/D43086 llvm-svn: 328556
* [AMDGPU] Improve disassembler error handlingTim Corringham2018-03-261-1/+4
| | | | | | | | | | | | | | | | | | | | | | Summary: llvm-objdump now disassembles unrecognised opcodes as data, using the .long directive. We treat unrecognised opcodes as being 32 bit values, so move along 4 bytes rather than the single byte which previously resulted in a cascade of bogus disassembly following an unrecognised opcode. While no solution can always disassemble code that contains embedded data correctly this provides a significant improvement. The disassembler will now cope with an arbitrary length section as it no longer truncates it to a multiple of 4 bytes, and will use the .byte directive for trailing bytes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D44685 llvm-svn: 328553
* [X86][Btver2] Add CVTSI2SD/CVTSI2SS scheduler costsSimon Pilgrim2018-03-261-4/+17
| | | | | | We still need to account for how Jaguar passes data from GPR -> XMM, which isn't as clean as XMM -> GPR..... llvm-svn: 328551
* Remove an unneeded (& mislayered) include from ↵David Blaikie2018-03-261-1/+0
| | | | | | Target/TargetLoweringObjectFile on a CodeGen header llvm-svn: 328549
* Remove unneeded (& mislayered) include from TargetMachine.cpp on a CodeGen ↵David Blaikie2018-03-261-1/+0
| | | | | | header llvm-svn: 328548
* [Pipeliner] Use latency to compute RecMIIKrzysztof Parzyszek2018-03-262-15/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch contains severals changes needed to pipeline an example that was transformed so that a Phi with a subreg is converted to copies. The pipeliner wasn't working for a couple of reasons. - The RecMII was 3 instead of 2 due to the extra copies. - Copy instructions contained a latency of 1. - The node order algorithm was not choosing the best "bottom" node, which caused an instruction to be scheduled that had a predecessor and successor already scheduled. - Updated the Hexagon Machine Scheduler to check if the node is latency bound when adding the cost for a 0-latency dependence. The RecMII was 3 because the computation looks at the number of nodes in the recurrence. The extra copy is an extra node but it shouldn't increase the latency. The new RecMII computation looks at the latency of the instructions in the recurrence. We changed the latency of the dependence of a copy to 0. The latency computation for the copy also checks the use of the copy (similar to a reg_sequence). The node order algorithm was not choosing the last instruction in the recurrence for a bottom up traversal. This was when the last instruction is a copy. A check was added when choosing the instruction to check for NodeNum if the maxASAP is the same. This means that the scheduler will not end up with another node in the recurrence that has both a predecessor and successor already scheduled. The cost computation in Hexagon Machine Scheduler adds cost when an instruction can be packetized with a zero-latency instruction. We should only do this if the schedule is latency bound. Patch by Brendon Cahoon. llvm-svn: 328542
* [X86][Btver2] Add CVTSD2SS/CVTSS2SD scheduler costsSimon Pilgrim2018-03-261-0/+14
| | | | llvm-svn: 328541
* [X86][Btver2] Account for the "+i" integer pipe transfer costs (1cy use of ↵Simon Pilgrim2018-03-261-20/+11
| | | | | | JALU0 for GPR PRF write) llvm-svn: 328536
* [Hexagon] Give priority to post-incremementing memory accesses in LSRKrzysztof Parzyszek2018-03-262-1/+8
| | | | llvm-svn: 328506
* [X86][Btver2] Add CVTSD2SI/CVTSS2SI scheduler costsSimon Pilgrim2018-03-261-0/+12
| | | | | | | | Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write) This also adds missing vcvttss2si tests llvm-svn: 328505
* [X86][Btver2] Fix YMM BLENDPD/BLENDPS + UNPCKPD/UNPCKP instructions costsSimon Pilgrim2018-03-261-4/+8
| | | | | | These should match the YMM MOVDUP/ PERMILPD/PERMILPS + SHUFPD/SHUFPS shuffles instead of using the WriteFShuffle defaults. llvm-svn: 328501
* [X86][Btver2] Add (V)SQRTPD/(V)SQRTSD costsSimon Pilgrim2018-03-261-0/+16
| | | | | | The xmm sd/pd versions were using the WriteFSQRT default which is modelled on sqrtss/sqrtps llvm-svn: 328497
* AMDGPU: Introduce common SOP_Pseudo and VOP_Pseudo TableGen base classesNicolai Haehnle2018-03-266-67/+56
| | | | | | | Differential revision: https://reviews.llvm.org/D44820 Change-Id: I732979e2964006aa15d78a333d8886e6855f319a llvm-svn: 328496
OpenPOWER on IntegriCloud