path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* [X86][BtVer2] Fix the number of micro opcodes for a bunch of YMM instructions. | Andrea Di Biagio | 2018-03-28 | 1 | -0/+12
    The Jaguar backend natively supports 128-bit data types. Operations on YMM registers are split into two COPs (complex operations). Each COP consumes a slot in the dispatch group and in the reorder buffer. The scheduling model for Jaguar should mark those instructions as `let NumMicroOps = 2`.
    This was found when testing AVX code for BtVer2 using llvm-mca.
    llvm-svn: 328694
* [ARM] Support float literals under XO | Christof Douma | 2018-03-28 | 3 | -3/+6
    Follow-up patch to r328313 to support the UseVMOVSR constraint. Removed some unneeded instructions from the test and removed some stray comments.
    Differential Revision: https://reviews.llvm.org/D44941
    llvm-svn: 328691
* AMDGPU: Really implement getFrameRegister | Matt Arsenault | 2018-03-27 | 1 | -1/+2
    Currently this seems to only really be used for debug info.
    llvm-svn: 328677
* [MachineOutliner] AArch64: Don't outline ADRPs with un-outlinable operands | Jessica Paquette | 2018-03-27 | 1 | -11/+7
    If an ADRP appears with, say, a CPI operand, we shouldn't outline it. This moves the check for unsafe operands so that it occurs before the special-case for ADRPs.
    Also add a test for outlining ADRPs.
    llvm-svn: 328674
* [AMDGPU] For OS type AMDPAL, fixed scratch on compute shader | Tim Renouf | 2018-03-27 | 1 | -2/+4
    Summary: For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders). This commit fixes that to use offset 0x10 instead of offset 0 for a compute shader, per the PAL ABI spec.
    Reviewers: kzhuravl, nhaehnle, timcorringham
    Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm
    Differential Revision: https://reviews.llvm.org/D44468
    Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f
    llvm-svn: 328673
* Initialize variable added in r328617. | Sterling Augustine | 2018-03-27 | 1 | -0/+1
    llvm-svn: 328667
* [X86] Add WriteFMOVMSK/WriteVecMOVMSK/WriteMMXMOVMSK scheduler classes | Simon Pilgrim | 2018-03-27 | 11 | -60/+57
    Currently MOVMSK instructions use the WriteVecLogic class, which is a very poor choice given that MOVMSK involves an SSE->GPR transfer.
    Differential Revision: https://reviews.llvm.org/D44924
    llvm-svn: 328664
* AMDGPU: Fix not preserving CSR VGPR if used for SGPR spills | Matt Arsenault | 2018-03-27 | 1 | -4/+3
    Before, this was not done if the function had no calls in it. This is still a possible issue with any callable function, regardless of calls present.
    llvm-svn: 328659
* AMDGPU: Set natural stack alignment in DataLayout | Matt Arsenault | 2018-03-27 | 1 | -2/+2
    Only 4 byte alignment is ever useful, so increasing anything beyond this may require realigning the stack.
    llvm-svn: 328656
* AMDGPU: Fix crash when MachinePointerInfo invalid | Matt Arsenault | 2018-03-27 | 1 | -1/+1
    The combine on a select of a load only triggers for addrspace 0, and discards the MachinePointerInfo. The conservative default needs to be used for this.
    llvm-svn: 328652
* AMDGPU: Fix FP restore from being reordered with stack ops | Matt Arsenault | 2018-03-27 | 1 | -1/+6
    In a function, s5 is used as the frame base SGPR. If a function is calling another function, during the call sequence it is copied to a preserved SGPR and restored.
    Before, it was possible for the scheduler to move stack operations before the restore of s5, since there's nothing to associate a frame index access with the restore.
    Add an implicit use of s5 to the adjcallstack pseudo which ends the call sequence to prevent this from happening. I'm not 100% satisfied with this solution, but I'm not sure what else would be better.
    llvm-svn: 328650
* [Hexagon] Implement TTI::shouldMaximizeVectorBandwidth | Krzysztof Parzyszek | 2018-03-27 | 1 | -0/+1
    llvm-svn: 328648
* [Power9] Fix the resource list for the COPY instruction. | Stefan Pintilie | 2018-03-27 | 1 | -1/+1
    The COPY instruction was listed as a 4 cycle instruction. It is now listed correctly as a 2 cycle ALU instruction.
    llvm-svn: 328647
* [Hexagon] Rudimentary support for auto-vectorization for HVX | Krzysztof Parzyszek | 2018-03-27 | 2 | -12/+155
    This implements a set of TTI functions that the loop vectorizer uses. The only purpose of this is to enable testing. Auto-vectorization is disabled by default, enabled by -hexagon-autohvx.
    llvm-svn: 328639
* [AArch64] Decorate AArch64 instrs with OPERAND_PCREL | Rafael Auler | 2018-03-27 | 2 | -1/+27
    Summary: This is a canonical way to teach objdump to print the target symbols for branches when disassembling AArch64 code.
    Reviewers: evandro, t.p.northover, espindola
    Reviewed By: t.p.northover
    Differential Revision: https://reviews.llvm.org/D44851
    llvm-svn: 328638
* [X86][Btver2] Add MMX_PMOVMSKBrr to MOVMSK scheduler class | Simon Pilgrim | 2018-03-27 | 1 | -1/+1
    llvm-svn: 328620
* [PowerPC] Secure PLT support | Strahinja Petrovic | 2018-03-27 | 5 | -26/+91
    This patch adds support for secure PLT mode on the 32-bit PowerPC architecture.
    Differential Revision: https://reviews.llvm.org/D42112
    llvm-svn: 328617
* [MIPS] Add static_assert that all Fixups are handled in getFixupKind | Alexander Richardson | 2018-03-27 | 1 | -2/+7
    Summary: I recently added a new Fixup kind to our fork of LLVM but forgot to add it to the table in MipsAsmBackend.cpp. With this static_assert the error would have been caught instead of zero-initializing the array entries for the new fixups.
    Reviewers: sdardis, atanasyan
    Reviewed By: atanasyan
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D44895
    llvm-svn: 328616
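
The idea in the MIPS entry above can be sketched as follows. This is a minimal illustration, not the code from the patch: the enum, the `FixupInfo` struct, the table contents, and `getFixupInfo` are all hypothetical stand-ins for the much larger table in MipsAsmBackend.cpp. The point is that the table is declared without an explicit bound, so its length comes from the initializer list and can be compared against the enum at compile time.

```cpp
#include <cstddef>

// Hypothetical fixup kinds (assumption); the real MIPS enum is far larger.
enum FixupKind { FixupNone, FixupAbs32, FixupPCRel16, NumFixupKinds };

// Hypothetical per-fixup description (assumption).
struct FixupInfo {
  const char *Name;
  unsigned Offset;
  unsigned Bits;
};

// One entry per fixup kind, in enum order. The array has no explicit bound,
// so its element count is deduced from the initializer list.
static const FixupInfo Infos[] = {
    {"fixup_none", 0, 0},
    {"fixup_abs32", 0, 32},
    {"fixup_pcrel16", 0, 16},
};

// If an enumerator is added without a matching table entry, this fails to
// compile instead of leaving a silently zero-initialized slot.
static_assert(sizeof(Infos) / sizeof(Infos[0]) == NumFixupKinds,
              "Not all fixup kinds have an entry in the table");

// Table lookup indexed by fixup kind.
const FixupInfo &getFixupInfo(FixupKind Kind) { return Infos[Kind]; }
```

With an explicitly sized array such as `Infos[NumFixupKinds]`, a missing initializer would be value-initialized to zeros and the mismatch would go unnoticed, which is the failure mode the commit message describes.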
* [X86] Add WriteCRC32 scheduler class | Simon Pilgrim | 2018-03-26 | 10 | -24/+15
    Currently CRC32 instructions use the WriteFAdd class. This patch splits them off into their own class; at the moment it is still mostly a duplicate of WriteFAdd, but it can now be tweaked on a target-by-target basis.
    Differential Revision: https://reviews.llvm.org/D44647
    llvm-svn: 328582
* [Hexagon] Assertion failure in HexagonSubtarget.cpp | Krzysztof Parzyszek | 2018-03-26 | 1 | -7/+7
    In restoreLatency, replace range-for loop with std::find.
    Patch by Jyotsna Verma.
    llvm-svn: 328574
* [X86][Btver2] Add (U)COMISD/(U)COMISD scheduler costs | Simon Pilgrim | 2018-03-26 | 1 | -0/+10
    Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write)
    llvm-svn: 328573
* [X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32 | Reid Kleckner | 2018-03-26 | 2 | -6/+16
    Summary: Re-lands r328386 and r328443, reverting r328482.
    Incorporates fixes from @mstorsjo in D44876 (thanks!) so that small parameters in i8 and i16 do not end up in the SysV register parameters (EDI, ESI, etc).
    I added tests for how we receive small parameters, since that is the important part. It's always safe to store more bytes than will be read, but the assumptions you make when loading them are what really matter.
    I also tested this by self-hosting clang and it passed tests on win64.
    Reviewers: mstorsjo, hans
    Subscribers: hiraditya, mstorsjo, llvm-commits
    Differential Revision: https://reviews.llvm.org/D44900
    llvm-svn: 328570
* [X86] Add WriteBitScan/WriteLZCNT/WriteTZCNT/WritePOPCNT scheduler classes (PR36881) | Simon Pilgrim | 2018-03-26 | 11 | -125/+93
    Give the bit count instructions their own scheduler classes instead of forcing them into existing classes. These were mostly overridden anyway, but I had to add in costs from Agner for silvermont and znver1 and the Fam16h SoG for btver2 (Jaguar).
    Differential Revision: https://reviews.llvm.org/D44879
    llvm-svn: 328566
* [XCore] Change std::sort to llvm::sort in response to r327219 | Mandeep Singh Grang | 2018-03-26 | 2 | -3/+3
    Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused by the undefined sorting order of objects having the same key.
    To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
    Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort. Refer to the comments section in D44363 for a list of all the required patches.
    Reviewers: dblaikie, RKSimon, robertlytton
    Reviewed By: robertlytton
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D44875
    llvm-svn: 328564
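
A rough sketch of the shuffle-then-sort idea described in the entry above. This is not the llvm::sort implementation; `shuffleThenSort`, `Item`, `byKey`, and the explicit seed parameter are hypothetical names introduced only for illustration.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Hypothetical element type (assumption): Key decides the order, Tag does not.
struct Item {
  int Key;
  char Tag;
};

// Comparator that looks only at Key, so items with equal keys are "tied".
bool byKey(const Item &A, const Item &B) { return A.Key < B.Key; }

// Hypothetical stand-in for the wrapper: shuffle the range, then sort it.
// std::sort is not stable, so the relative order of tied elements is
// unspecified; shuffling first makes any code that depends on that order
// misbehave visibly rather than by accident of the input order.
template <typename Iter, typename Compare>
void shuffleThenSort(Iter First, Iter Last, Compare Comp, unsigned Seed) {
  std::mt19937 Gen(Seed);
  std::shuffle(First, Last, Gen);
  std::sort(First, Last, Comp);
}
```

After the call the keys are always in non-decreasing order, but the order of `Tag` values among equal keys may differ from one seed to another. Code whose output changes with the seed is relying on an ordering that std::sort never promised.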
* [Power9]Legalize and emit code for quad-precision convert from double-precision | Lei Huang | 2018-03-26 | 2 | -5/+13
    Legalize and emit code for quad-precision floating point operation xscvdpqp and add option to guard the quad precision operation support.
    Differential Revision: https://reviews.llvm.org/D44746
    llvm-svn: 328558
* [PowerPC] Infrastructure work. Implement getting the opcode for a spill in one place. | Stefan Pintilie | 2018-03-26 | 8 | -509/+621
    A new function getOpcodeForSpill should now be the only place to get the opcode for a given spilled register.
    Differential Revision: https://reviews.llvm.org/D43086
    llvm-svn: 328556
* [AMDGPU] Improve disassembler error handling | Tim Corringham | 2018-03-26 | 1 | -1/+4
    Summary: llvm-objdump now disassembles unrecognised opcodes as data, using the .long directive. We treat unrecognised opcodes as being 32 bit values, so move along 4 bytes rather than the single byte which previously resulted in a cascade of bogus disassembly following an unrecognised opcode.
    While no solution can always disassemble code that contains embedded data correctly, this provides a significant improvement.
    The disassembler will now cope with an arbitrary length section as it no longer truncates it to a multiple of 4 bytes, and will use the .byte directive for trailing bytes.
    Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
    Differential Revision: https://reviews.llvm.org/D44685
    llvm-svn: 328553
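
The stepping behaviour described in the entry above can be illustrated with a toy loop. This is a sketch under simplifying assumptions, not the AMDGPU disassembler: `isKnownOpcode`, the placeholder opcode value, and the string labels are hypothetical, and every recognised instruction is assumed to be exactly 32 bits wide.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical decoder (assumption): a single arbitrary placeholder word
// counts as a recognised 32-bit opcode; everything else is unrecognised.
bool isKnownOpcode(uint32_t Word) { return Word == 0x12345678u; }

// Walk the section in 4-byte steps. Unrecognised words are emitted as data
// (".long") and the cursor still advances by 4, so one bad word does not
// shift every following decode. Leftover bytes that do not fill a full word
// are emitted one at a time as ".byte".
std::vector<std::string> disassemble(const std::vector<uint8_t> &Bytes) {
  std::vector<std::string> Out;
  std::size_t I = 0;
  while (I + 4 <= Bytes.size()) {
    // Assemble a little-endian 32-bit word from four bytes.
    uint32_t Word = uint32_t(Bytes[I]) | uint32_t(Bytes[I + 1]) << 8 |
                    uint32_t(Bytes[I + 2]) << 16 |
                    uint32_t(Bytes[I + 3]) << 24;
    Out.push_back(isKnownOpcode(Word) ? "<insn>" : ".long");
    I += 4;
  }
  for (; I < Bytes.size(); ++I)
    Out.push_back(".byte");
  return Out;
}
```

For a 10-byte input consisting of the placeholder opcode, one unrecognised word, and two stray bytes, this yields one instruction, one `.long`, and two `.byte` entries.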
* [X86][Btver2] Add CVTSI2SD/CVTSI2SS scheduler costs | Simon Pilgrim | 2018-03-26 | 1 | -4/+17
    We still need to account for how Jaguar passes data from GPR -> XMM, which isn't as clean as XMM -> GPR.....
    llvm-svn: 328551
* Remove an unneeded (& mislayered) include from Target/TargetLoweringObjectFile on a CodeGen header | David Blaikie | 2018-03-26 | 1 | -1/+0
    llvm-svn: 328549
* Remove unneeded (& mislayered) include from TargetMachine.cpp on a CodeGen header | David Blaikie | 2018-03-26 | 1 | -1/+0
    llvm-svn: 328548
* [Pipeliner] Use latency to compute RecMII | Krzysztof Parzyszek | 2018-03-26 | 2 | -15/+19
    The patch contains several changes needed to pipeline an example that was transformed so that a Phi with a subreg is converted to copies.
    The pipeliner wasn't working for a couple of reasons.
    - The RecMII was 3 instead of 2 due to the extra copies.
    - Copy instructions contained a latency of 1.
    - The node order algorithm was not choosing the best "bottom" node, which caused an instruction to be scheduled that had a predecessor and successor already scheduled.
    - Updated the Hexagon Machine Scheduler to check if the node is latency bound when adding the cost for a 0-latency dependence.
    The RecMII was 3 because the computation looks at the number of nodes in the recurrence. The extra copy is an extra node but it shouldn't increase the latency. The new RecMII computation looks at the latency of the instructions in the recurrence. We changed the latency of the dependence of a copy to 0. The latency computation for the copy also checks the use of the copy (similar to a reg_sequence).
    The node order algorithm was not choosing the last instruction in the recurrence for a bottom up traversal. This was when the last instruction is a copy. A check was added when choosing the instruction to check for NodeNum if the maxASAP is the same. This means that the scheduler will not end up with another node in the recurrence that has both a predecessor and successor already scheduled.
    The cost computation in Hexagon Machine Scheduler adds cost when an instruction can be packetized with a zero-latency instruction. We should only do this if the schedule is latency bound.
    Patch by Brendon Cahoon.
    llvm-svn: 328542
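
The difference between the two RecMII computations described in the entry above can be shown with a toy model. This is only a sketch: the `Node` struct and both function names are hypothetical, and the recurrence is assumed to have a loop-carried distance of 1, so the bound is simply the total along the cycle.

```cpp
#include <vector>

// Hypothetical model of one instruction in a recurrence (assumption).
struct Node {
  unsigned Latency;
};

// Node-count approach: every node in the recurrence adds one cycle,
// including copies that should be free.
unsigned recMIIByNodeCount(const std::vector<Node> &Recurrence) {
  return static_cast<unsigned>(Recurrence.size());
}

// Latency approach: sum the latencies along the recurrence, so a copy
// with latency 0 no longer inflates the bound.
unsigned recMIIByLatency(const std::vector<Node> &Recurrence) {
  unsigned Total = 0;
  for (const Node &N : Recurrence)
    Total += N.Latency;
  return Total;
}
```

For a recurrence of two 1-cycle instructions plus one 0-latency copy, the node count gives 3 while the latency sum gives 2, matching the "3 instead of 2" case in the commit message.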
* [X86][Btver2] Add CVTSD2SS/CVTSS2SD scheduler costs | Simon Pilgrim | 2018-03-26 | 1 | -0/+14
    llvm-svn: 328541
* [X86][Btver2] Account for the "+i" integer pipe transfer costs (1cy use of JALU0 for GPR PRF write) | Simon Pilgrim | 2018-03-26 | 1 | -20/+11
    llvm-svn: 328536
* [Hexagon] Give priority to post-incremementing memory accesses in LSR | Krzysztof Parzyszek | 2018-03-26 | 2 | -1/+8
    llvm-svn: 328506
* [X86][Btver2] Add CVTSD2SI/CVTSS2SI scheduler costs | Simon Pilgrim | 2018-03-26 | 1 | -0/+12
    Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write)
    This also adds missing vcvttss2si tests
    llvm-svn: 328505
* [X86][Btver2] Fix YMM BLENDPD/BLENDPS + UNPCKPD/UNPCKP instructions costs | Simon Pilgrim | 2018-03-26 | 1 | -4/+8
    These should match the YMM MOVDUP/PERMILPD/PERMILPS + SHUFPD/SHUFPS shuffles instead of using the WriteFShuffle defaults.
    llvm-svn: 328501
* [X86][Btver2] Add (V)SQRTPD/(V)SQRTSD costs | Simon Pilgrim | 2018-03-26 | 1 | -0/+16
    The xmm sd/pd versions were using the WriteFSQRT default which is modelled on sqrtss/sqrtps
    llvm-svn: 328497
* AMDGPU: Introduce common SOP_Pseudo and VOP_Pseudo TableGen base classes | Nicolai Haehnle | 2018-03-26 | 6 | -67/+56
    Differential revision: https://reviews.llvm.org/D44820
    Change-Id: I732979e2964006aa15d78a333d8886e6855f319a
    llvm-svn: 328496
* [X86][Btver2] Double the AGU and schedule pipe resources for YMM | Simon Pilgrim | 2018-03-26 | 1 | -31/+31
    For 256-bit instructions, both the AGUs and the schedule pipes are double pumped, as are the functional units, which we already model.
    llvm-svn: 328491
* Revert r328386 "[X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32" | Hans Wennborg | 2018-03-26 | 2 | -15/+5
    This broke Chromium (see crbug.com/825748). It looks like mstorsjo's follow-up patch at D44876 fixes this, but let's revert back to green for now until that's ready to land.
    (Also reverts r328443.)
    > Both GCC and MSVC only look at the low byte of a boolean when it is
    > passed.
    llvm-svn: 328482
* [ARM] Simplify constructing the ARMArchFeature string. NFC. | Martin Storsjo | 2018-03-26 | 1 | -12/+9
    Differential Revision: https://reviews.llvm.org/D44819
    llvm-svn: 328478
* [X86] Fix the SchedRW for intrinsic register form of SQRT/RCP/RSQRT. | Craig Topper | 2018-03-26 | 1 | -2/+2
    llvm-svn: 328474
* [X86] Merge the SSE and AVX versions of fp divs and sqrts in the SandyBridge/Haswell/Broadwell/Skylake scheduler models. | Craig Topper | 2018-03-26 | 5 | -141/+81
    I've used Agner's data as best I could to get the values to converge on.
    llvm-svn: 328473
* [X86] Add itinerary to intrinsic version of sqrtss, rcpss, and rsqrtss instructions. | Craig Topper | 2018-03-26 | 1 | -2/+2
    llvm-svn: 328472
* [X86] Correct the itineraries for the dot production instructions. | Craig Topper | 2018-03-26 | 1 | -2/+2
    llvm-svn: 328471
* [X86] Use the same itinerary for VCVTDQ2PD as the SSE version so that the generated scheduler classes will merge. | Craig Topper | 2018-03-26 | 1 | -8/+10
    llvm-svn: 328470
* [X86] Swap the itineraries on the memory and register forms of CVTDQ2PD. | Craig Topper | 2018-03-26 | 1 | -2/+2
    They were backwards.
    llvm-svn: 328469
* [X86] Give VMOVSX/ZX the same itinerary as the SSE version so they'll reuse the same generated scheduler class. | Craig Topper | 2018-03-26 | 1 | -11/+6
    llvm-svn: 328468
* [X86] Give vpmsadbw the same itinerary as the SSE version so they'll be able to share the same generated scheduler class. | Craig Topper | 2018-03-25 | 1 | -7/+2
    llvm-svn: 328466
* [X86] Move (v)movss to port 5 only for Skylake. Move (v)movups/d to port 015 for Skylake. | Craig Topper | 2018-03-25 | 2 | -11/+11
    This matches Agner's data and is consistent with what the EVEX instructions were doing on SKX.
    llvm-svn: 328465