summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* [x86] add RUN for target before roundss; NFCSanjay Patel2018-03-271-30/+232
| | | | llvm-svn: 328601
* [x86] add tests for ftrunc; NFCSanjay Patel2018-03-262-12/+404
| | | | llvm-svn: 328592
* Fix newlines. NFCI.Simon Pilgrim2018-03-264-18305/+18305
| | | | llvm-svn: 328583
* [X86] Add WriteCRC32 scheduler classSimon Pilgrim2018-03-261-6/+6
| | | | | | | | Currently CRC32 instructions use the WriteFAdd class, this patch splits them off into their own, at the moment it is still mostly just a duplicate of WriteFAdd but it can now be tweaked on a target by target basis. Differential Revision: https://reviews.llvm.org/D44647 llvm-svn: 328582
* Use local symbols for creating .stack-size.Rafael Espindola2018-03-263-7/+14
| | | | llvm-svn: 328581
* [X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32Reid Kleckner2018-03-268-2/+143
| | | | | | | | | | | | | | | | | | | | | | | Summary: Re-lands r328386 and r328443, reverting r328482. Incorporates fixes from @mstorsjo in D44876 (thanks!) so that small parameters in i8 and i16 do not end up in the SysV register parameters (EDI, ESI, etc). I added tests for how we receive small parameters, since that is the important part. It's always safe to store more bytes than will be read, but the assumptions you make when loading them are what really matter. I also tested this by self-hosting clang and it passed tests on win64. Reviewers: mstorsjo, hans Subscribers: hiraditya, mstorsjo, llvm-commits Differential Revision: https://reviews.llvm.org/D44900 llvm-svn: 328570
* [X86] Add WriteBitScan/WriteLZCNT/WriteTZCNT/WritePOPCNT scheduler classes ↵Simon Pilgrim2018-03-264-18305/+18305
| | | | | | | | | | | | (PR36881) Give the bit count instructions their own scheduler classes instead of forcing them into existing classes. These were mostly overridden anyway, but I had to add in costs from Agner for silvermont and znver1 and the Fam16h SoG for btver2 (Jaguar). Differential Revision: https://reviews.llvm.org/D44879 llvm-svn: 328566
* [Hexagon] Add more lit testsKrzysztof Parzyszek2018-03-2612-0/+1367
| | | | llvm-svn: 328561
* [Power9]Legalize and emit code for quad-precision convert from double-precisionLei Huang2018-03-261-1/+29
| | | | | | | | | Legalize and emit code for quad-precision floating point operation xscvdpqp and add option to guard the quad precision operation support. Differential Revision: https://reviews.llvm.org/D44746 llvm-svn: 328558
* [PowerPC] Infrastructure work. Implement getting the opcode for a spill in ↵Stefan Pintilie2018-03-261-8/+8
| | | | | | | | | | | one place. A new function getOpcodeForSpill should now be the only place to get the opcode for a given spilled register. Differential Revision: https://reviews.llvm.org/D43086 llvm-svn: 328556
* [X86][Btver2] Add CVTSI2SD/CVTSI2SS scheduler costsSimon Pilgrim2018-03-262-16/+16
| | | | | | We still need to account for how Jaguar passes data from GPR -> XMM, which isn't as clean as XMM -> GPR..... llvm-svn: 328551
* [Pipeliner] Add missing loop carried dependencesKrzysztof Parzyszek2018-03-261-0/+54
| | | | | | | | | | | | | | | | | | | | | | The pipeliner is not adding a dependence edge for a loop carried dependence, and ends up scheduling a load from iteration n prior to an aliased store in iteration n-1. The code that adds the loop carried dependences in the pipeliner doesn't check if the memory objects for loads and stores are "identified" (i.e., distinct) objects. If they are not, then the code that adds the dependences needs to be conservative. The objects can be used to check dependences only when they are distinct objects. The code that checks for loop carried dependences has been updated to classify loads and stores that are not identified as "unknown" values. A store with an "unknown" value can potentially create a loop carried dependence with any pending load. Patch by Brendon Cahoon. llvm-svn: 328547
* [Pipeliner] Use latency to compute RecMIIKrzysztof Parzyszek2018-03-262-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch contains severals changes needed to pipeline an example that was transformed so that a Phi with a subreg is converted to copies. The pipeliner wasn't working for a couple of reasons. - The RecMII was 3 instead of 2 due to the extra copies. - Copy instructions contained a latency of 1. - The node order algorithm was not choosing the best "bottom" node, which caused an instruction to be scheduled that had a predecessor and successor already scheduled. - Updated the Hexagon Machine Scheduler to check if the node is latency bound when adding the cost for a 0-latency dependence. The RecMII was 3 because the computation looks at the number of nodes in the recurrence. The extra copy is an extra node but it shouldn't increase the latency. The new RecMII computation looks at the latency of the instructions in the recurrence. We changed the latency of the dependence of a copy to 0. The latency computation for the copy also checks the use of the copy (similar to a reg_sequence). The node order algorithm was not choosing the last instruction in the recurrence for a bottom up traversal. This was when the last instruction is a copy. A check was added when choosing the instruction to check for NodeNum if the maxASAP is the same. This means that the scheduler will not end up with another node in the recurrence that has both a predecessor and successor already scheduled. The cost computation in Hexagon Machine Scheduler adds cost when an instruction can be packetized with a zero-latency instruction. We should only do this if the schedule is latency bound. Patch by Brendon Cahoon. llvm-svn: 328542
* [X86][Btver2] Add CVTSD2SS/CVTSS2SD scheduler costsSimon Pilgrim2018-03-261-8/+8
| | | | llvm-svn: 328541
* [Pipeliner] Fix assert caused by pipeliner serializationKrzysztof Parzyszek2018-03-261-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The pipeliner is asserting because the serialization step that occurs at the end is deleting an instruction. The assert occurs later on because there is a use without a definition. The problem occurs when an instruction defines a value used by a REQ_SEQUENCE and that value is used by a COPY instruction. The latencies between these instructions are zero, so they are put in to the same packet. The serialization code is unable to handle this correctly, and ends up putting the REG_SEQUENCE before its definition. There is special code in the serialization step that attempts to handle zero-cost instructions (phis, copy, reg_sequence) differently than regular instructions. Unfortunately, this means the order does not come out correct. This patch simplifies the code by changing the seperate steps for handling zero-cost and regular instructions. Only phis are handled separate now, since they should occurs first. Then, this patch adds checks to make use the MoveUse is set to the smallest value if there are multiple uses in a cycle. Patch by Brendon Cahoon. llvm-svn: 328540
* [Pipeliner] Fix check for order dependences when finalizing instructionsKrzysztof Parzyszek2018-03-261-0/+51
| | | | | | | | | | | | | | | | | The code in orderDepdences that looks at the order dependences between instructions was processing all the successor and predecessor order dependences. However, we really only want to check for an order dependence for instructions scheduled in the same cycle. Also, fixed how the pipeliner handles output dependences. An output dependence is also a potential loop carried dependence. The pipeliner didn't handle this case properly so an invalid schedule could be created that allowed an output dependence to be scheduled in the next iteration at the same cycle. Patch by Brendon Cahoon. llvm-svn: 328516
* [Pipeliner] Fix in the pipeliner phi reuse codeKrzysztof Parzyszek2018-03-261-0/+44
| | | | | | | | | | | When the definition of a phi is used by a phi in the next iteration, the pipeliner was assuming that the definition is processed first. Because of the assumption, an incorrect phi name was used. This patch has a check to see if the phi definition has been processed already. Patch by Brendon Cahoon. llvm-svn: 328510
* [Pipeliner] Correctly update memoperands in the epilogKrzysztof Parzyszek2018-03-261-0/+102
| | | | | | | | | | | | | | | | The pipeliner needs to be conservative when updating the memoperands of instructions in the epilog. Previously, the pipeliner was changing the offset of the memoperand based upon the scheduling stage. However, that is incorrect when control flow branches around the kernel code. The bug enabled a load and store to the same stack offset to be swapped. This patch fixes the bug by updating the size of the memoperands to be UINT_MAX. This conservative value means that dependences will be created between other loads and stores. Patch by Brendon Cahoon. llvm-svn: 328508
* [Hexagon] Give priority to post-incremementing memory accesses in LSRKrzysztof Parzyszek2018-03-264-43/+47
| | | | llvm-svn: 328506
* [X86][Btver2] Add CVTSD2SI/CVTSS2SI scheduler costsSimon Pilgrim2018-03-262-32/+32
| | | | | | | | Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write) This also adds missing vcvttss2si tests llvm-svn: 328505
* [X86][Btver2] Fix YMM BLENDPD/BLENDPS + UNPCKPD/UNPCKP instructions costsSimon Pilgrim2018-03-261-12/+12
| | | | | | These should match the YMM MOVDUP/ PERMILPD/PERMILPS + SHUFPD/SHUFPS shuffles instead of using the WriteFShuffle defaults. llvm-svn: 328501
* [X86][Btver2] Add (V)SQRTPD/(V)SQRTSD costsSimon Pilgrim2018-03-261-8/+8
| | | | | | The xmm sd/pd versions were using the WriteFSQRT default which is modelled on sqrtss/sqrtps llvm-svn: 328497
* [X86][Btver2] Double the AGU and schedule pipe resources for YMMSimon Pilgrim2018-03-261-15/+15
| | | | | | Both the AGUs and schedule pipes are double pumped for 256-bit instructions as well as the functional units which we already model. llvm-svn: 328491
* Revert r328386 "[X86] Fix Windows `i1 zeroext` conventions to use i8 instead ↵Hans Wennborg2018-03-267-76/+2
| | | | | | | | | | | | | | | of i32" This broke Chromium (see crbug.com/825748). It looks like mstorsjo's follow-up patch at D44876 fixes this, but let's revert back to green for now until that's ready to land. (Also reverts r328443.) > Both GCC and MSVC only look at the low byte of a boolean when it is > passed. llvm-svn: 328482
* [X86] Fix the SchedRW for intrinsic register form of SQRT/RCP/RSQRT.Craig Topper2018-03-262-36/+36
| | | | llvm-svn: 328474
* [X86] Merge the SSE and AVX versions of fp divs and sqrts in the ↵Craig Topper2018-03-263-62/+62
| | | | | | | | SandyBridge/Haswell/Broadwell/Skylake scheduler models. I've used Agner's data as best I could to get the values to converge on. llvm-svn: 328473
* [X86] Add itinerary to intrinsic version of sqrtss, rcpss, and rsqrtss ↵Craig Topper2018-03-262-8/+8
| | | | | | instructions. llvm-svn: 328472
* [X86] Swap the itineraries on the memory and register forms of CVTDQ2PD.Craig Topper2018-03-261-3/+4
| | | | | | They were backwards. llvm-svn: 328469
* [X86] Move (v)movss to port 5 only for Skylake. Move (v)movups/d to port 015 ↵Craig Topper2018-03-251-2/+2
| | | | | | | | for Skylake. This matches Agner's data and is consistent with what the EVEX instructions were doing on SKX. llvm-svn: 328465
* [RISCV] Use init_array instead of ctors for RISCV target, by defaultMandeep Singh Grang2018-03-241-0/+30
| | | | | | | | | | | | | | | | | | | | | Summary: LLVM defaults to the newer .init_array/.fini_array scheme for static constructors rather than the less desirable .ctors/.dtors (the UseCtors flag defaults to false). This wasn't being respected in the RISC-V backend because it fails to call TargetLoweringObjectFileELF::InitializeELF with the the appropriate flag for UseInitArray. This patch fixes this by implementing RISCVELFTargetObjectFile and overriding its Initialize method to call InitializeELF(TM.Options.UseInitArray). Reviewers: asb, apazos Reviewed By: asb Subscribers: mgorny, rbar, johnrusso, simoncook, jordy.potman.lists, sabuasal, niosHD, kito-cheng, shiva0217, llvm-commits Differential Revision: https://reviews.llvm.org/D44750 llvm-svn: 328433
* [X86][AES] Ensure we're testing both non-VEX/VEX variants of AES ↵Simon Pilgrim2018-03-241-7/+321
| | | | | | | | instructions on AVX targets Add skylake server tests as well llvm-svn: 328424
* [X86][SSE] Ensure we're testing both non-VEX/VEX variants of SSE ↵Simon Pilgrim2018-03-246-112/+12529
| | | | | | | | instructions on AVX targets And ensure we don't use later instruction sets in SSE schedule tests llvm-svn: 328423
* [X86][AVX1] Ensure we don't use later instruction sets in AVX1 schedule testsSimon Pilgrim2018-03-241-9/+9
| | | | llvm-svn: 328421
* [X86][AVX2] Ensure we don't use later instruction sets in AVX2 schedule testsSimon Pilgrim2018-03-241-9/+13
| | | | llvm-svn: 328420
* [X86] Add a DAG combine to simplify PMULDQ/PMULUDQ nodesCraig Topper2018-03-241-18/+16
| | | | | | | | These nodes only use the lower 32 bits of their inputs so we can use SimplifyDemandedBits to simplify them. Differential Revision: https://reviews.llvm.org/D44375 llvm-svn: 328405
* [X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32Reid Kleckner2018-03-237-2/+76
| | | | | | | Both GCC and MSVC only look at the low byte of a boolean when it is passed. llvm-svn: 328386
* [Hexagon] Boost profit for word-mask immediates, reduce for othersKrzysztof Parzyszek2018-03-232-0/+145
| | | | | | This avoids unnecessary splitting due to uninteresting immediates. llvm-svn: 328364
* [Hexagon] Fold offset in base+immediate loads/storesKrzysztof Parzyszek2018-03-231-0/+55
| | | | | | | | Optimize Ry = add(Rx,#n); memw(Ry+#0) = Rz => memw(Rx,#n) = Rz. Patch by Jyotsna Verma. llvm-svn: 328355
* [AMDGPU] Update OpenCL to use 48 bytes of implicit arguments for AMDGPUTony Tye2018-03-232-7/+7
| | | | | | | | Add two additional implicit arguments for OpenCL for the AMDGPU target using the AMDHSA runtime to support device enqueue. Differential Revision: https://reviews.llvm.org/D44697 llvm-svn: 328351
* [AMDGPU] Remove use of OpenCL triple environment and replace with function ↵Tony Tye2018-03-232-20/+137
| | | | | | | | | | | attribute for AMDGPU - Remove use of the opencl and amdopencl environment member of the target triple for the AMDGPU target. - Use function attribute to communicate to the AMDGPU backend to add implicit arguments for OpenCL kernels for the AMDHSA OS. Differential Revision: https://reviews.llvm.org/D43736 llvm-svn: 328349
* [Hexagon] Always generate mux out of predicated transfers if possibleKrzysztof Parzyszek2018-03-236-3/+64
| | | | | | | | | | | | HexagonGenMux would collapse pairs of predicated transfers if it assumed that the predicated .new forms cannot be created. Turns out that generating mux is preferable in almost all cases. Introduce an option -hexagon-gen-mux-threshold that controls the minimum distance between the instruction defining the predicate and the later of the two transfers. If the distance is closer than the threshold, mux will not be generated. Set the threshold to 0 by default. llvm-svn: 328346
* [Hexagon] Avoid early if-conversion for one sided branchesKrzysztof Parzyszek2018-03-231-0/+80
| | | | | | Patch by Anand Kodnani. llvm-svn: 328344
* [ARM] Fix "Constant pool entry out of range!" in Thumb1 modeAna Pazos2018-03-231-0/+359
| | | | | | | | | | | | | | | | | | | This patch fixes PR36658, "Constant pool entry out of range!" in Thumb1 mode. In ARMConstantIslands::optimizeThumb2JumpTables() in Thumb1 mode, adjustBBOffsetsAfter() is not calculating postOffset correctly by properly accounting for the padding that is required for the constant pool that immediately follows the jump table branch instruction. Reviewers: t.p.northover, eli.friedman Reviewed By: t.p.northover Subscribers: chrib, tstellar, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D44709 llvm-svn: 328341
* [Hexagon] Two fixes in early if-conversionKrzysztof Parzyszek2018-03-231-0/+75
| | | | | | | | | - Fix checking for vector predicate registers. - Avoid speculating llvm.lifetime.end intrinsic. Patch by Harsha Jagasia and Brendon Cahoon. llvm-svn: 328339
* [X86][Btver2] Cleanup MOVMSK instructions to use JFPA function unitSimon Pilgrim2018-03-231-1/+1
| | | | | | Add missing non-VEX and (V)PMOVMSKB instructions to the pattern llvm-svn: 328338
* Re-commit: [MachineLICM] Add functions to MachineLICM to hoist invariant storesZaara Syeda2018-03-233-4/+129
| | | | | | | | | | | | | | | | This patch adds functions to allow MachineLICM to hoist invariant stores. Currently, MachineLICM does not hoist any store instructions, however when storing the same value to a constant spot on the stack, the store instruction should be considered invariant and be hoisted. The function isInvariantStore iterates each operand of the store instruction and checks that each register operand satisfies isCallerPreservedPhysReg. The store may be fed by a copy, which is hoisted by isCopyFeedingInvariantStore. This patch also adds the PowerPC changes needed to consider the stack register as caller preserved. Differential Revision: https://reviews.llvm.org/D40196 llvm-svn: 328326
* [AArch64] Don't reduce the width of loads if it prevents combining a shiftJohn Brawn2018-03-232-2/+307
| | | | | | | | | | | | | | | Loads and stores can only shift the offset register by the size of the value being loaded, but currently the DAGCombiner will reduce the width of the load if it's followed by a trunc making it impossible to later combine the shift. Solve this by implementing shouldReduceLoadWidth for the AArch64 backend and make it prevent the width reduction if this is what would happen, though do allow it if reducing the load width will let us eliminate a later sign or zero extend. Differential Revision: https://reviews.llvm.org/D44794 llvm-svn: 328321
* [X86][Btver2] Cleanup SSE42 PCMPISTR/PCMPESTR string instructions to ↵Simon Pilgrim2018-03-231-4/+4
| | | | | | | | correctly use JFPU1 scheduler pipe followed by JLAGU/JSAGU/JFPA/JVALU function units Fixes throughput to match Agner/Fam16h-SoG as well. llvm-svn: 328318
* [ARM] Support float literals under XOChristof Douma2018-03-231-0/+118
| | | | | | | | | | | | | When targeting execute-only and fp-armv8, float constants in a compare resulted in instruction selection failures. This is now fixed by using vmov.f32 where possible, otherwise the floating point constant is lowered into a integer constant that is moved into a floating point register. This patch also restores using fpcmp with immediate 0 under fp-armv8. Change-Id: Ie87229706f4ed879a0c0cf66631b6047ed6c6443 llvm-svn: 328313
* [GlobalISel] Fix legalizer combine to not use illegal input G_EXTRACT.Amara Emerson2018-03-231-3/+3
| | | | | | | This was being masked because GISel is enabled by default for -O0 and the abort was disabled. Modified test to explicitly enable abort. llvm-svn: 328311
OpenPOWER on IntegriCloud