path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
...
* AMDGPU: Scalarize vector argument types to calls | Matt Arsenault | 2018-07-31 | 1 | -31/+15
  When lowering calling conventions, prefer to decompose vectors into the constituent register types. This avoids artificial constraints to satisfy a wide super-register. This improves code quality because now optimizations don't need to deal with the super-register constraint. For example, the immediate folding code doesn't deal with 4-component reg_sequences, so by breaking the register down earlier the existing immediate folding code is able to work. This also avoids the need for the shader input processing code to manually split vector types.
  llvm-svn: 338416
* [X86] WriteBSWAP sched classes are reg-reg only. | Simon Pilgrim | 2018-07-31 | 10 | -20/+20
  Don't declare them as X86SchedWritePair when the folded class will never be used.
  Note: MOVBE (load/store endian conversion) instructions tend to have a very different behaviour to BSWAP.
  llvm-svn: 338412
* [X86][SSE] Use ISD::MULHU for constant/non-zero ISD::SRL lowering (PR38151) | Simon Pilgrim | 2018-07-31 | 1 | -0/+18
  As was done for vector rotations, we can efficiently use ISD::MULHU for vXi8/vXi16 ISD::SRL lowering.
  Shift-by-zero cases are still problematic (mainly on v32i8 due to extra AND/ANDN/OR or VPBLENDVB blend masks, but v8i16/v16i16 aren't great either if PBLENDW fails), so I've limited this first patch to known non-zero cases if we can't easily use PBLENDW.
  Differential Revision: https://reviews.llvm.org/D49562
  llvm-svn: 338407
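The identity behind the MULHU lowering can be sketched on a single 16-bit lane. This is a scalar model only, not the DAG lowering itself; `mulhu16`, `srlViaMulhu`, and `checkSrlViaMulhu` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one 16-bit lane of ISD::MULHU: the high 16 bits of the
// 32-bit unsigned product.
constexpr uint16_t mulhu16(uint16_t a, uint16_t b) {
  return static_cast<uint16_t>((static_cast<uint32_t>(a) * b) >> 16);
}

// Hypothetical helper: logical shift right by a known non-zero amount
// (1..15), expressed as a multiply-high by 2^(16 - amt).
constexpr uint16_t srlViaMulhu(uint16_t x, unsigned amt) {
  return mulhu16(x, static_cast<uint16_t>(1u << (16 - amt)));
}

// Exhaustively compare against a plain shift for every value and amount.
inline void checkSrlViaMulhu() {
  for (uint32_t x = 0; x <= 0xFFFF; ++x)
    for (unsigned amt = 1; amt <= 15; ++amt)
      assert(srlViaMulhu(static_cast<uint16_t>(x), amt) ==
             static_cast<uint16_t>(x >> amt));
}
```

In this model a shift by zero would need the multiplier 2^16, which does not fit in a 16-bit lane, so such lanes would presumably have to be blended back from the original value.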
* [X86] Add pattern matching for PMADDUBSW | Craig Topper | 2018-07-31 | 1 | -0/+143
  Summary: Similar to D49636, but for PMADDUBSW. This instruction has the additional complexity that the addition of the two products saturates to 16 bits rather than wrapping around. And one operand is treated as signed and the other as unsigned.
  A C example that triggers this pattern:
  ```
  static const int N = 128;
  int8_t A[2*N];
  uint8_t B[2*N];
  int16_t C[N];

  void foo() {
    for (int i = 0; i != N; ++i)
      C[i] = MIN(MAX((int16_t)A[2*i]*(int16_t)B[2*i] + (int16_t)A[2*i+1]*(int16_t)B[2*i+1], -32768), 32767);
  }
  ```
  Reviewers: RKSimon, spatel, zvi
  Reviewed By: RKSimon, zvi
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D49829
  llvm-svn: 338402
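The saturating behaviour described in this commit can be illustrated with a scalar model of one 16-bit output lane. This is a sketch under the assumption that each lane combines two unsigned-by-signed byte products; `saturateToI16`, `maddubsLane`, and `checkMaddubsLane` are hypothetical names, not LLVM or intrinsic APIs.

```cpp
#include <cassert>
#include <cstdint>

// Clamp a 32-bit intermediate to the signed 16-bit range.
constexpr int16_t saturateToI16(int32_t v) {
  return static_cast<int16_t>(v < -32768 ? -32768 : (v > 32767 ? 32767 : v));
}

// Hypothetical scalar model of one output lane: two unsigned-by-signed byte
// products, added with signed saturation instead of wrap-around.
constexpr int16_t maddubsLane(uint8_t u0, int8_t s0, uint8_t u1, int8_t s1) {
  return saturateToI16(static_cast<int32_t>(u0) * s0 +
                       static_cast<int32_t>(u1) * s1);
}

// Spot checks: one in-range sum and both saturation directions.
inline void checkMaddubsLane() {
  assert(maddubsLane(1, 2, 3, 4) == 14);               // 1*2 + 3*4
  assert(maddubsLane(255, 127, 255, 127) == 32767);    // 64770 clamps high
  assert(maddubsLane(255, -128, 255, -128) == -32768); // -65280 clamps low
}
```

The 32-bit intermediate matters: with wrap-around 16-bit addition the two extreme cases would produce unrelated values instead of the clamped limits.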
* [X86] Preserve more liveness information in emitStackProbeInline | Francis Visoiu Mistrih | 2018-07-31 | 1 | -18/+37
  This commit fixes two issues with the liveness information after the call:
  1) The code always spills RCX and RDX if InProlog == true, which results in a use of an undefined phys reg.
  2) FinalReg, JoinReg, RoundedReg, SizeReg are not added as live-ins to the basic blocks that use them, therefore they are seen as undefined.
  https://llvm.org/PR38376
  Differential Revision: https://reviews.llvm.org/D50020
  llvm-svn: 338400
* Revert Enrich inline messages | David Bolvansky | 2018-07-31 | 1 | -11/+6
  llvm-svn: 338389
* Enrich inline messages | David Bolvansky | 2018-07-31 | 1 | -6/+11
  Summary: This patch improves Inliner to provide causes/reasons for negative inline decisions.
  1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
  2. Several functions that used to return the inlining results as boolean are changed to return InlineResult, which carries the cause for a negative decision.
  3. Changed remark printing and debug output messages to provide the additional messages and related inline cost.
  4. Adjusted tests for changed printing.
  Patch by: yrouban (Yevgeny Rouban)
  Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00
  Reviewed By: tejohnson, xbolva00
  Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith
  Differential Revision: https://reviews.llvm.org/D49412
  llvm-svn: 338387
* AMDGPU: Don't handle FP16_TO_FP in isCanonicalized | Matt Arsenault | 2018-07-31 | 1 | -4/+0
  This needs more special handling to do correctly. Fixes test in subsequent commit.
  llvm-svn: 338381
* AMDGPU: Fold undef fcanonicalize to qNaN | Matt Arsenault | 2018-07-31 | 1 | -2/+10
  We could choose a free 0 for this, but this matches the behavior for fmul undef, 1.0. Also, the NaN is more useful for folding use operations, although if it's not eliminated it is more expensive in terms of code size.
  llvm-svn: 338376
* [llvm-mca][BtVer2] Teach how to identify dependency-breaking idioms. | Andrea Di Biagio | 2018-07-31 | 1 | -0/+74
  This patch teaches llvm-mca how to identify dependency-breaking instructions on btver2.
  An example of a dependency-breaking instruction is the zero-idiom XOR (example: `XOR %eax, %eax`), which always generates zero regardless of the actual value of the input register operands. Dependency-breaking instructions don't have to wait on their input register operands before executing. This is because the computation is not dependent on the inputs.
  Not all dependency-breaking idioms are also zero-latency instructions. For example, `CMPEQ %xmm1, %xmm1` is independent of the value of XMM1, and it generates a vector of all-ones. That instruction is not eliminated at the register renaming stage, and its opcode is issued to a pipeline for execution. So, the latency is not zero.
  This patch adds a new method named isDependencyBreaking() to the MCInstrAnalysis interface. That method takes as input an instruction (i.e. MCInst) and a MCSubtargetInfo. The default implementation of isDependencyBreaking() conservatively returns false for all instructions. Targets may override the default behavior for specific CPUs, and return a value which better matches the subtarget behavior.
  In future, we should teach TableGen how to automatically generate the body of isDependencyBreaking from scheduling predicate definitions. This would allow us to expose the knowledge about dependency-breaking instructions to the machine schedulers (and, potentially, other codegen passes).
  Differential Revision: https://reviews.llvm.org/D49310
  llvm-svn: 338372
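The distinction between dependency-breaking and zero-latency idioms can be sketched with a toy instruction model. Everything here is hypothetical: `Opcode`, `Inst`, and both predicates are illustration-only stand-ins and do not reflect the MCInst or MCInstrAnalysis interfaces.

```cpp
#include <cassert>

// Hypothetical, simplified instruction model; not the MCInst API.
enum class Opcode { Xor, CmpEq, Add };

struct Inst {
  Opcode Op;
  unsigned Src0; // register number of the first source operand
  unsigned Src1; // register number of the second source operand
};

// True when the result cannot depend on the register inputs: both idioms
// named in the commit message read the same register twice.
constexpr bool isDependencyBreakingSketch(const Inst &I) {
  return (I.Op == Opcode::Xor || I.Op == Opcode::CmpEq) && I.Src0 == I.Src1;
}

// Assumption taken from the commit text: only the zero idiom is removed at
// register renaming, while the all-ones compare still executes.
constexpr bool isZeroLatencySketch(const Inst &I) {
  return I.Op == Opcode::Xor && I.Src0 == I.Src1;
}

inline void checkIdioms() {
  assert(isDependencyBreakingSketch(Inst{Opcode::Xor, 0, 0}));
  assert(isDependencyBreakingSketch(Inst{Opcode::CmpEq, 1, 1}));
  assert(!isDependencyBreakingSketch(Inst{Opcode::Xor, 0, 1}));
  assert(!isZeroLatencySketch(Inst{Opcode::CmpEq, 1, 1}));
}
```

A conservative default, as the commit describes, would simply return false for every instruction and leave refinements to each subtarget.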
* Revert r338365: [X86] Improved sched models for X86 BT*rr instructions. | Simon Pilgrim | 2018-07-31 | 11 | -18/+48
  https://reviews.llvm.org/D49243
  Contains WIP code that should not have been included.
  llvm-svn: 338369
* [SystemZ] Improve decoding in case of instructions with four register operands. | Jonas Paulsson | 2018-07-31 | 3 | -11/+46
  Since z13, the max group size will be 2 if any μop has more than 3 register sources.
  This has been ignored so far in the SystemZHazardRecognizer, but is now handled by recognizing those instructions and adjusting the tracking of decoding and the cost heuristic for grouping.
  Review: Ulrich Weigand
  https://reviews.llvm.org/D49847
  llvm-svn: 338368
* [X86] Improved sched models for X86 BT*rr instructions. | Andrew V. Tischenko | 2018-07-31 | 11 | -48/+18
  https://reviews.llvm.org/D49243
  llvm-svn: 338365
* [X86] Improved sched models for X86 SHLD/SHRD* instructions. | Andrew V. Tischenko | 2018-07-31 | 11 | -196/+70
  Differential Revision: https://reviews.llvm.org/D9611
  llvm-svn: 338359
* [X86][SSE] isFNEG - Use getTargetConstantBitsFromNode to handle all constant cases | Simon Pilgrim | 2018-07-31 | 1 | -31/+7
  isFNEG was duplicating much of what was done by getTargetConstantBitsFromNode in its own calls to getTargetConstantFromNode.
  Noticed while reviewing D48467.
  llvm-svn: 338358
* [ARM] Allow automatically deducing the thumb instruction size for .inst | Martin Storsjo | 2018-07-31 | 1 | -3/+14
  This matches GAS, which allows unsuffixed .inst for thumb.
  Differential Revision: https://reviews.llvm.org/D49937
  llvm-svn: 338357
* [ARM] Support the .inst directive for MachO and COFF targets | Martin Storsjo | 2018-07-31 | 2 | -7/+43
  Contrary to ELF, we don't add any markers that distinguish data generated with .short/.long from normal instructions, so the .inst directive only adds compatibility with assembly that uses it.
  Differential Revision: https://reviews.llvm.org/D49936
  llvm-svn: 338356
* [AArch64] Support the .inst directive for MachO and COFF targets | Martin Storsjo | 2018-07-31 | 2 | -7/+18
  Contrary to ELF, we don't add any markers that distinguish data generated with .long from normal instructions, so the .inst directive only adds compatibility with assembly that uses it.
  Differential Revision: https://reviews.llvm.org/D49935
  llvm-svn: 338355
* [ARM] Revert r337821 | Sam Parker | 2018-07-31 | 1 | -1/+1
  Re-enabling ARMCodeGenPrepare by default after failing to reproduce the bootstrap issues that I was concerned it was causing.
  llvm-svn: 338354
* [X86] Stop accidentally running the Bonnell LEA fixup path on Goldmont. | Craig Topper | 2018-07-31 | 1 | -1/+1
  In one place we checked X86Subtarget.slowLEA() to decide if the pass should run. But to decide what the pass should do, we only checked isSLM. This resulted in Goldmont going down the Bonnell path.
  llvm-svn: 338342
* [AArch64][GlobalISel] Add isel support for G_BLOCK_ADDR. | Amara Emerson | 2018-07-31 | 1 | -31/+65
  Also refactors some existing code to materialize addresses for the large code model so it can be shared between G_GLOBAL_VALUE and G_BLOCK_ADDR.
  This implements PR36390.
  Differential Revision: https://reviews.llvm.org/D49903
  llvm-svn: 338337
* [AArch64][GlobalISel] Make G_BLOCK_ADDR legal. | Amara Emerson | 2018-07-31 | 1 | -0/+2
  Differential Revision: https://reviews.llvm.org/D49902
  llvm-svn: 338336
* [DAGCombiner][TargetLowering] Pass a SmallVector instead of a std::vector to BuildSDIV/BuildUDIV/etc. | Craig Topper | 2018-07-30 | 4 | -5/+5
  The vector contains the SDNodes that these functions create. The number of nodes is always a small number so we should use SmallVector to avoid a heap allocation.
  llvm-svn: 338329
* [DAGCombiner][PowerPC][AArch64] Pass Created vector by reference to BuildSDIVPow2. | Craig Topper | 2018-07-30 | 4 | -15/+10
  llvm-svn: 338303
* Remove trailing space | Fangrui Song | 2018-07-30 | 82 | -195/+195
  sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}
  llvm-svn: 338293
* [MachineOutliner][AArch64] Add support for saving LR to a register | Jessica Paquette | 2018-07-30 | 2 | -84/+163
  This teaches the outliner to save LR to a register rather than the stack when possible. This allows us to avoid bumping the stack in outlined functions in some cases.
  By doing this, in a later patch, we can teach the outliner to do something like this:
      f1:
        ...
        bl OUTLINED_FUNCTION
        ...
      f2:
        ...
        move LR's contents to a register
        bl OUTLINED_FUNCTION
        move the register's contents back
  instead of falling back to saving LR in both cases.
  llvm-svn: 338278
* [X86] Fix typo in comment. NFC | Craig Topper | 2018-07-30 | 1 | -1/+1
  llvm-svn: 338274
* Recommit r338204 "[X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'." | Craig Topper | 2018-07-30 | 1 | -1/+7
  This checks in a more direct way without triggering a UBSAN error.
  llvm-svn: 338273
* Fix uninitialized read in ARM's PrintAsmOperand | Thomas Preud'homme | 2018-07-30 | 1 | -2/+3
  Summary: Fix read of uninitialized RC variable in ARM's PrintAsmOperand when hasRegClassConstraint returns false. This was causing inline-asm-operand-implicit-cast test to fail in r338206.
  Reviewers: t.p.northover, weimingz, javed.absar, chill
  Reviewed By: chill
  Subscribers: chill, eraman, kristof.beyls, chrib, llvm-commits
  Differential Revision: https://reviews.llvm.org/D49984
  llvm-svn: 338268
* [AArch64][SVE] Asm: Enable instructions to be prefixed. | Sander de Smalen | 2018-07-30 | 2 | -48/+125
  This patch enables instructions that are destructive on their destination and first source operand to be prefixed with a MOVPRFX instruction.
  This patch also adds a variety of tests:
  - positive tests for all instructions and forms that accept a movprfx for either or both predicated and unpredicated forms.
  - negative tests for all instructions and forms that do not accept an unpredicated or predicated movprfx.
  - negative tests for the diagnostics that get emitted when a MOVPRFX instruction is used incorrectly.
  This is patch [2/2] in a series to add MOVPRFX instructions:
  - Patch [1/2]: https://reviews.llvm.org/D49592
  - Patch [2/2]: https://reviews.llvm.org/D49593
  Reviewers: rengolin, SjoerdMeijer, samparker, fhahn, javed.absar
  Reviewed By: SjoerdMeijer
  Differential Revision: https://reviews.llvm.org/D49593
  llvm-svn: 338261
* [AArch64][SVE] Asm: Add MOVPRFX instructions. | Sander de Smalen | 2018-07-30 | 6 | -30/+273
  This patch adds predicated and unpredicated MOVPRFX instructions, which can be prepended to SVE instructions that are destructive on their first source operand, to make them a constructive operation, e.g.
      add z1.s, p0/m, z1.s, z2.s        <=> z1 = z1 + z2
  can be made constructive:
      movprfx z0, z1
      add z0.s, p0/m, z0.s, z2.s        <=> z0 = z1 + z2
  The predicated MOVPRFX instruction can additionally be used to zero inactive elements, e.g.
      movprfx z0.s, p0/z, z1.s
      add z0.s, p0/m, z0.s, z2.s
  Not all instructions can be prefixed with the MOVPRFX instruction, which is why this patch also adds a mechanism to validate prefixed instructions. The exact rules for when a MOVPRFX applies are detailed in the SVE supplement of the Architectural Reference Manual.
  This is patch [1/2] in a series to add MOVPRFX instructions:
  - Patch [1/2]: https://reviews.llvm.org/D49592
  - Patch [2/2]: https://reviews.llvm.org/D49593
  Reviewers: rengolin, SjoerdMeijer, samparker, fhahn, javed.absar
  Reviewed By: SjoerdMeijer
  Differential Revision: https://reviews.llvm.org/D49592
  llvm-svn: 338258
* [Hexagon] Simplify A4_rcmp[n]eqi R, 0 | Krzysztof Parzyszek | 2018-07-30 | 3 | -3/+157
  Consider cases when register R is known to be zero/non-zero, or when it is defined by a C2_muxii instruction.
  llvm-svn: 338251
* AMDGPU: Reduce code size with fcanonicalize (fneg x) | Matt Arsenault | 2018-07-30 | 2 | -0/+11
  When fcanonicalize is lowered to a mul, we can use -1.0 for free and avoid the cost of the bigger encoding for source modifiers.
  llvm-svn: 338244
* AMDGPU: Make fneg combine handle fcanonicalize | Matt Arsenault | 2018-07-30 | 1 | -0/+2
  llvm-svn: 338243
* [MachineOutliner][X86] Use TAILJMPd64 instead of JMP_1 for TailCall construction | Francis Visoiu Mistrih | 2018-07-30 | 1 | -1/+1
  The machine verifier asserts with:
      Assertion failed: (isMBB() && "Wrong MachineOperand accessor"), function getMBB, file ../include/llvm/CodeGen/MachineOperand.h, line 542.
  It calls analyzeBranch which tries to call getMBB if the opcode is JMP_1, but in this case we do:
      JMP_1 @OUTLINED_FUNCTION
  I believe we have to use TAILJMPd64 instead of JMP_1 since JMP_1 is used with brtarget8.
  Differential Revision: https://reviews.llvm.org/D49299
  llvm-svn: 338237
* Revert "[X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'." | Dean Michael Berris | 2018-07-30 | 1 | -7/+1
  This reverts commit r338204.
  llvm-svn: 338236
* AMDGPU: Force skip over s_sendmsg and exp instructions | Nicolai Haehnle | 2018-07-30 | 3 | -20/+35
  Summary: These instructions interact with hardware blocks outside the shader core, and they can have "scalar" side effects even when EXEC = 0. We don't want these scalar side effects to occur when all lanes want to skip these instructions, so always add the execz skip branch instruction for basic blocks that contain them.
  Also ensure that we skip scalar stores / atomics, though we don't code-gen those yet.
  Reviewers: arsenm, rampitec
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D48431
  Change-Id: Ieaeb58352e2789ffd64745603c14970c60819d44
  llvm-svn: 338235
* [ARM] Fix over-alignment in arguments that are HA of 128-bit vectors | Petr Pavlu | 2018-07-30 | 1 | -5/+6
  Code in `CC_ARM_AAPCS_Custom_Aggregate()` is responsible for handling homogeneous aggregates for `CC_ARM_AAPCS_VFP`. When an aggregate ends up fully on stack, the function tries to pack all resulting items of the aggregate as tightly as possible according to AAPCS.
  Once the first item was laid out, the alignment used for consecutive items was the size of one item. This logic went wrong for 128-bit vectors because their alignment is normally only 64 bits, and so could result in inserting unexpected padding between the first and second element.
  The patch fixes the problem by updating the alignment with the item size only if this results in reducing it.
  Differential Revision: https://reviews.llvm.org/D49720
  llvm-svn: 338233
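The padding problem and the fix can be illustrated with a small offset calculation. This is a simplified model, not the code in `CC_ARM_AAPCS_Custom_Aggregate()`; `alignUp`, `layoutItems`, and the `clampToItemAlign` flag are hypothetical names, and offsets and sizes are in bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Round an offset up to the next multiple of align.
constexpr uint64_t alignUp(uint64_t offset, uint64_t align) {
  return (offset + align - 1) / align * align;
}

// Hypothetical model: place `count` items of `size` bytes on the stack,
// starting at `start`, and return each item's offset.
// clampToItemAlign == false models the old behaviour (alignment becomes the
// item size after the first item); true models the fix (the alignment is
// replaced by the item size only if that reduces it).
std::vector<uint64_t> layoutItems(uint64_t start, uint64_t size, uint64_t align,
                                  unsigned count, bool clampToItemAlign) {
  std::vector<uint64_t> offsets;
  uint64_t offset = start;
  for (unsigned i = 0; i < count; ++i) {
    offset = alignUp(offset, align);
    offsets.push_back(offset);
    offset += size;
    align = clampToItemAlign ? std::min(align, size) : size;
  }
  return offsets;
}
```

For two 16-byte vectors with 8-byte alignment starting at offset 8, the old rule yields offsets 8 and 32 (8 bytes of padding), while the clamped rule yields 8 and 24.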
* [AVR] Re-enable expansion of ADDE/ADDC/SUBE/SUBC in ISel | Dylan McKay | 2018-07-29 | 1 | -0/+7
  This was disabled in r333748, which broke four tests.
  In the future, these need to be updated to UADDO/ADDCARRY or USUBO/SUBCARRY.
  llvm-svn: 338212
* [AArch64][SVE] Asm: Support for WHILE(LE|LO|LS|LT) instructions. | Sander de Smalen | 2018-07-29 | 2 | -0/+45
  The WHILE instructions generate a predicate that is true while the comparison of the first scalar operand (incremented for each predicate element) with the second scalar operand is true, and false thereafter.
      WHILELE  While incrementing signed scalar less than or equal to scalar
      WHILELO  While incrementing unsigned scalar lower than scalar
      WHILELS  While incrementing unsigned scalar lower than or same as scalar
      WHILELT  While incrementing signed scalar less than scalar
  e.g.
      whilele p0.s, x0, x1
  generates predicate p0 (for 32-bit elements) by incrementing (signed) x0 and comparing that vector to splat(x1).
  llvm-svn: 338211
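The predicate generation described above can be modelled in scalar code for the two strictly-less-than forms. This is a sketch, not an emulator: `whileLess`, `whilelt`, and `whilelo` are hypothetical helpers, the element count is passed explicitly, and the WHILELE/WHILELS forms (which would use `<=` and need extra care at the type's maximum value) are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model: element i is true while (first + i) < limit,
// and every element after the first failing comparison is false.
template <typename T>
std::vector<bool> whileLess(T first, T limit, std::size_t numElts) {
  std::vector<bool> pred(numElts, false);
  T value = first;
  for (std::size_t i = 0; i < numElts; ++i) {
    if (!(value < limit))
      break; // false from here on
    pred[i] = true;
    ++value; // cannot overflow: value < limit implies value is below the maximum
  }
  return pred;
}

// Signed comparison, as described for WHILELT.
inline std::vector<bool> whilelt(int64_t first, int64_t limit, std::size_t n) {
  return whileLess<int64_t>(first, limit, n);
}

// Unsigned comparison, as described for WHILELO.
inline std::vector<bool> whilelo(uint64_t first, uint64_t limit, std::size_t n) {
  return whileLess<uint64_t>(first, limit, n);
}
```

The signed and unsigned forms differ only in how the operands are interpreted: starting from an all-ones bit pattern, `whilelt` sees -1 and produces leading true elements, while `whilelo` sees the largest unsigned value and produces none.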
* [AArch64][SVE] Asm: Instructions to perform serialized operations. | Sander de Smalen | 2018-07-29 | 2 | -0/+63
  The instructions added in this patch permit active elements within a vector to be processed sequentially without unpacking the vector.
      PFIRST   Set the first active element to true.
      PNEXT    Find next active element in predicate.
      CTERMEQ  Compare and terminate loop when equal.
      CTERMNE  Compare and terminate loop when not equal.
  llvm-svn: 338210
* [X86] Correct the immediate cost for 'add/sub i64 %x, 0x80000000'. | Craig Topper | 2018-07-28 | 1 | -1/+7
  X86 normally requires immediates to be a signed 32-bit value, which would exclude i64 0x80000000. But for add/sub we can negate the constant and use the opposite instruction.
  llvm-svn: 338204
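The negation trick can be checked with plain integer arithmetic. This is a sketch of the underlying identity only; `fitsInSignedImm32`, `addSubImmIsEncodable`, and `addViaSub` are hypothetical names and are not the functions touched by the commit.

```cpp
#include <cassert>
#include <cstdint>

// Does the value fit in a sign-extended 32-bit immediate?
constexpr bool fitsInSignedImm32(int64_t v) {
  return v >= INT32_MIN && v <= INT32_MAX;
}

// Hypothetical predicate for add/sub: the immediate is usable either as-is
// or, for exactly 0x80000000, by switching to the opposite instruction.
// Comparing against the constant directly avoids negating an arbitrary
// int64_t, which would overflow for INT64_MIN.
constexpr bool addSubImmIsEncodable(int64_t imm) {
  return fitsInSignedImm32(imm) || imm == INT64_C(0x80000000);
}

// x + 0x80000000 computed as x - (-0x80000000); the subtracted immediate is
// INT32_MIN sign-extended to 64 bits. Unsigned arithmetic models wrap-around.
constexpr uint64_t addViaSub(uint64_t x) {
  return x - static_cast<uint64_t>(static_cast<int64_t>(INT32_MIN));
}

inline void checkAddViaSub() {
  assert(!fitsInSignedImm32(INT64_C(0x80000000)));
  assert(fitsInSignedImm32(-INT64_C(0x80000000)));
  assert(addViaSub(5) == 5 + UINT64_C(0x80000000));
}
```

The same reasoning applies in the other direction: a subtraction of 0x80000000 can be expressed as an addition of the sign-extended INT32_MIN.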
* [X86] Use alignTo and divideCeil to make some code more readable. NFC | Craig Topper | 2018-07-28 | 1 | -3/+3
  llvm-svn: 338203
* [AArch64][SVE] Asm: Support for PFALSE and PTEST instructions. | Sander de Smalen | 2018-07-28 | 2 | -0/+45
  This patch adds PFALSE (unconditionally sets all elements of the predicate to false) and PTEST (set the status flags for the predicate).
  llvm-svn: 338198
* AMDGPU: Stop wasting argument registers with v3i32/v3f32 | Matt Arsenault | 2018-07-28 | 2 | -0/+59
  SelectionDAGBuilder widens v3i32/v3f32 arguments to v4i32/v4f32, which consume an additional register. In addition to wasting argument space, this produces extra instructions since now it appears the 4th vector component has a meaningful value to most combines.
  llvm-svn: 338197
* [AArch64][SVE] Asm: Data-dependent loop predicate partitioning instructions. | Sander de Smalen | 2018-07-28 | 2 | -0/+100
  This patch adds support for instructions that partition a predicate based on data-dependent termination conditions in a loop.
      BRKA    Break after the first true condition
      BRKAS   Break after the first true condition, setting condition flags
      BRKB    Break before the first true condition
      BRKBS   Break before the first true condition, setting condition flags
      BRKPA   Break after the first true condition, propagating from the previous partition
      BRKPAS  Break after the first true condition, propagating from the previous partition, setting condition flags
      BRKPB   Break before the first true condition, propagating from the previous partition
      BRKPBS  Break before the first true condition, propagating from the previous partition, setting condition flags
      BRKN    Propagate break to next partition
      BRKNS   Propagate break to next partition, setting condition flags
  llvm-svn: 338196
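The difference between breaking after and breaking before the first true condition can be sketched on a plain boolean vector. This is a deliberately reduced model: it assumes every element is active (no governing predicate), ignores the flag-setting and propagating variants, and `breakPartition`, `brka`, and `brkb` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model with all elements active: elements before the first
// true condition are set, elements after it are cleared, and the element
// holding the first true condition is set only for the "after" form.
inline std::vector<bool> breakPartition(const std::vector<bool> &cond,
                                        bool breakAfter) {
  std::vector<bool> out(cond.size(), false);
  for (std::size_t i = 0; i < cond.size(); ++i) {
    if (cond[i]) {
      out[i] = breakAfter; // include the breaking element only for BRKA
      break;
    }
    out[i] = true;
  }
  return out;
}

// Break after the first true condition.
inline std::vector<bool> brka(const std::vector<bool> &cond) {
  return breakPartition(cond, true);
}

// Break before the first true condition.
inline std::vector<bool> brkb(const std::vector<bool> &cond) {
  return breakPartition(cond, false);
}
```

With the condition vector {0, 0, 1, 0, 1}, this model gives {1, 1, 1, 0, 0} for the "after" form and {1, 1, 0, 0, 0} for the "before" form; if no condition is true, both return an all-true partition.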
* DAG: Add calling convention argument to calling convention funcs | Matt Arsenault | 2018-07-28 | 13 | -17/+27
  This seems like a pretty glaring omission, and AMDGPU wants to treat kernels differently from other calling conventions.
  llvm-svn: 338194
* AMDGPU: Stop trying to extend arguments for clover | Matt Arsenault | 2018-07-28 | 2 | -31/+1
  This was trying to replace i8/i16 arguments with i32, which was broken and no longer necessary.
  llvm-svn: 338193
* Revert "[WebAssembly] Added default stack-only instruction mode for MC." | Wouter van Oortmerssen | 2018-07-27 | 5 | -475/+252
  This reverts commit d3c9af4179eae7793d1487d652e2d4e23844555f. (SVN revision 338164)
  llvm-svn: 338176
* [X86] Add support for expanding multiplies by constant where the constant is -3/-5/-9 multiplied by a power of 2. | Craig Topper | 2018-07-27 | 1 | -18/+31
  These can be replaced with an LEA, a shift, and a negate. This seems to match what gcc and icc would do.
  llvm-svn: 338174
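The three-step expansion can be checked arithmetically. This is a scalar sketch of the identity, not the DAG combine; `lea`, `mulNegViaLea`, and `checkMulNegViaLea` are hypothetical names, and unsigned 64-bit arithmetic is used so that wrap-around is well defined.

```cpp
#include <cassert>
#include <cstdint>

// Model of an LEA: base + index * scale, with scale in {2, 4, 8}.
constexpr uint64_t lea(uint64_t base, uint64_t index, unsigned scale) {
  return base + index * scale;
}

// Hypothetical helper computing x * -((scale + 1) << shift) as an LEA
// (x * 3, x * 5 or x * 9), a left shift, and a negate.
constexpr uint64_t mulNegViaLea(uint64_t x, unsigned scale, unsigned shift) {
  return 0 - (lea(x, x, scale) << shift);
}

// Compare against a plain multiply for a range of inputs and factors.
inline void checkMulNegViaLea() {
  const unsigned scales[] = {2, 4, 8};
  for (uint64_t x = 0; x < 1000; ++x)
    for (unsigned scale : scales)
      for (unsigned shift = 0; shift < 8; ++shift) {
        const uint64_t factor = 0 - (static_cast<uint64_t>(scale + 1) << shift);
        assert(mulNegViaLea(x, scale, shift) == x * factor);
      }
}
```

For example, multiplying by -12 corresponds to scale 2 and shift 2: the LEA forms 3x, the shift forms 12x, and the negate gives -12x.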