path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
...
* [MC][ARM] Delete MCSection::HasData and move SHF_ARM_PURECODE logic to ARMELFObjectWriter::addTargetSectionFlags | Fangrui Song | 2020-01-05 | 1 | -2/+5
  This simplifies the generic interface and also makes SHF_ARM_PURECODE more robust (fixes a TODO). Inspecting MCDataFragment contents covers more cases than MCObjectStreamer::EmitBytes.
* [X86][SSE] Combine combineLogicBlendIntoConditionalNegate for VSELECT nodes (PR43660) | Simon Pilgrim | 2020-01-05 | 1 | -2/+13
  Attempt to use combineLogicBlendIntoConditionalNegate for (select M, (sub 0, X), X) -> (sub (xor X, M), M). We limit this to cases where the VSELECT can't easily be replaced with a shuffle (non-constant masks) or where a BLENDV is likely to occur (which tends to result in slower codegen).
* [X86] Move combineLogicBlendIntoConditionalNegate before combineSelect. NFCI. | Simon Pilgrim | 2020-01-05 | 1 | -62/+62
  Updates function order in preparation for a future fix for PR43660.
* [X86] Merge (identical) LowerGC_TRANSITION_START and LowerGC_TRANSITION_END (NFC) | Simon Pilgrim | 2020-01-05 | 2 | -27/+4
  Silences a copy+paste analyzer warning: both functions just insert NOOPs in exactly the same way.
* [ARM] Use isFMAFasterThanFMulAndFAdd for scalars as well as MVE vectors | David Green | 2020-01-05 | 5 | -18/+45
  This adds extra scalar handling to isFMAFasterThanFMulAndFAdd, allowing the target-independent code to handle more folds in more situations (for example, if the fast-math flags are present but the global AllowFPOpFusion option isn't). It also splits HasSlowFPVFMx out of HasSlowFPVMLx, so that VFMA and VMLA can be controlled separately if needed. Differential Revision: https://reviews.llvm.org/D72139
* [ARM] Fill in FP16 FMA patterns | David Green | 2020-01-05 | 1 | -0/+21
  This adds fp16 variants of all the fma patterns in the ARM backend. Differential Revision: https://reviews.llvm.org/D72138
* GlobalISel: Scalarize all division operations | Matt Arsenault | 2020-01-04 | 1 | -0/+7
  Previously only G_SDIV was handled, but all of them are trivially scalarizable. Also define placeholder AMDGPU division legalizer rules.
* Revert "[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC)." | Florian Hahn | 2020-01-04 | 3 | -3/+3
  This reverts commit 51ef53f3bd23559203fe9af82ff2facbfedc1db3, as it breaks some bots.
* [SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC). | Florian Hahn | 2020-01-04 | 3 | -3/+3
  SCEVExpander modifies the underlying function, so it belongs in Transforms/Utils rather than in Analysis. This allows using other transform utils in SCEVExpander. Reviewers: sanjoy.google, efriedma, reames Reviewed By: sanjoy.google Differential Revision: https://reviews.llvm.org/D71537
* [SCEV] Remove unused ScalarEvolutionExpander.h includes (NFC). | Florian Hahn | 2020-01-04 | 1 | -1/+0
* AMDGPU/GlobalISel: Refine SMRD selection rules | Matt Arsenault | 2020-01-04 | 1 | -4/+22
  Fix selecting these for volatile global loads, and ensure the loads are constant enough.
* AMDGPU/GlobalISel: Legalize more odd sized loads | Matt Arsenault | 2020-01-04 | 1 | -5/+9
  The attempts to widen sufficiently aligned, odd-sized loads weren't consistently applied.
* AMDGPU/GlobalISel: Assume vcc phis for any vcc input | Matt Arsenault | 2020-01-04 | 1 | -2/+3
  This produces more intelligible-looking results, more comparable to the DAG output in the simplest cases. This is probably wrong in complex control flow, but RegBankSelect doesn't yet attempt to analyze whether this is on a masked path when selecting the bank.
* AMDGPU/GlobalISel: Implement applyMappingImpl less incorrectly | Matt Arsenault | 2020-01-04 | 1 | -13/+23
  We're checking the current register bank of the registers in the instruction, but the mapping may have inserted cross-bank copies and is expecting to replace the registers. We mostly get away with this currently, because VGPR->SGPR copies are illegal, and we assume this won't happen. In a future change, we'll start relying on more cross-register-bank copies being inserted, and this starts to break down.
* [AMDGPU] need to insert wait between the scalar load and vector store to the same address to avoid WAR conflict. | alex-t | 2020-01-04 | 1 | -0/+21
  Reviewers: rampitec, vpykhtin, nhaehnle Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D71934
* [AMDGPU] Revert scheduling to reduce spilling | Stanislav Mekhanoshin | 2020-01-03 | 1 | -2/+11
  We can revert a region's schedule if the new schedule decreases occupancy. However, if we already have only one wave, we would accept any new schedule, even if it blows up register pressure. Such a schedule may cause quite heavy spilling, which can be avoided by rejecting it. Differential Revision: https://reviews.llvm.org/D72181
* GlobalISel: Add type argument to getRegBankFromRegClass | Matt Arsenault | 2020-01-03 | 10 | -20/+25
  AMDGPU can't unambiguously go back from the selected instruction register class to the register bank without knowing if this was used in a boolean context.
* [amdgpu] Skip non-instruction values in CF user tracing. | Michael Liao | 2020-01-03 | 1 | -0/+2
  Summary: - CF users won't be non-instruction values. Skip them to save compilation time. This is especially true when there are multiple functions in the module, where, say, a constant may be used in most functions. The current CF user tracing adds significant overhead. Reviewers: alex-t, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72174
* [SystemZ] Don't allow CL option -mpacked-stack with -mbackchain. | Jonas Paulsson | 2020-01-03 | 1 | -0/+2
  -mpacked-stack is currently not supported with -mbackchain, so this combination should produce a compilation error message instead of being silently ignored. Review: Ulrich Weigand
* AMDGPU/GlobalISel: Add new utils file | Matt Arsenault | 2020-01-03 | 4 | -33/+77
  There are some things that are shareable between the legalizer, regbankselect, and the selector that don't have an obvious place to go.
* AMDGPU: Only allow regs for s_movrel_{b32|b64} | Matt Arsenault | 2020-01-03 | 1 | -2/+13
  This would incorrectly allow folding immediates. These currently aren't selectable, but will be from GlobalISel soon.
* [X86] Improve for v2i32->v2f64 uint_to_fp | Craig Topper | 2020-01-03 | 1 | -36/+14
  This uses an alternative implementation of this conversion derived from our v2i32->v2f32 handling. We can zero extend the v2i32 to v2i64 and OR it with the bit representation of 2.0^52. Because a double's mantissa is 52 bits, this gives us 2.0^52 plus the 32-bit integer. Then we just need to subtract 2.0^52 as a double and let the floating-point unit normalize the remaining bits into a valid double. This needs fewer instructions than our previous code, but it does require a port 5 shuffle for the zero extend or unpack. Differential Revision: https://reviews.llvm.org/D71945
* Move tail call disabling code to target independent code | Reid Kleckner | 2020-01-03 | 6 | -35/+6
  When the "disable-tail-calls" attribute was added, checks were added for it in various backends. Now this code has proliferated, and it is something the target is responsible for checking. Move that responsibility back to the ISels (fast, global, and SD). There's no major functionality change, except for targets that never implemented this check. This LLVM attribute was originally added in d9699bc7bdf0362173fcd256690f61a4d47429c2 (2015). Reviewers: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D72118
* AMDGPU/GlobalISel: Fix off by one in operand index | Matt Arsenault | 2020-01-03 | 1 | -4/+4
  This should be looking at the RHS of the add for a constant.
* Fix typo "psuedo" in comments | Jay Foad | 2020-01-03 | 4 | -5/+5
* [ARM][NFC] Move tail predication checks | Sam Parker | 2020-01-03 | 1 | -69/+76
  Extract the tail predication validation checks out into their own LowOverHeadLoop method.
* [X86] Reorder X86any* PatFrags to put the strict node first so that chain property will be inferred for the instruction by the tablegen backend. | Craig Topper | 2020-01-03 | 4 | -10/+10
  Also use X86any_vfpround instead of X86vfpround in some instruction definitions so that the strict version can be used to infer the chain property. Without these changes, we don't propagate the strict FP chain through isel for some instructions.
* [X86] Re-enable lowerUINT_TO_FP_vXi32 under fast-math by using an FSUB instead of an FADD. | Craig Topper | 2020-01-02 | 1 | -15/+9
  Summary: We previously disabled this under fast math due to aggressive reassociation by the machine combiner. But I think we can work around this by using an FSUB instead of an FADD for the first operation. This matches the similar algorithm we use for uint_to_fp i64->f64 in TargetLowering::expandUINT_TO_FP. If reassociation hasn't been a problem for that, hopefully it's not a problem here. Reviewers: RKSimon, spatel, scanon Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71968
* [DAGCombine] Initialize the default operation action for SIGN_EXTEND_INREG for vector type as 'expand' instead of 'legal' | QingShan Zhang | 2020-01-03 | 4 | -0/+23
  Until now, we didn't set a default operation action for SIGN_EXTEND_INREG on vector types, so it was 0, which means Legal. However, most targets don't have native instructions for this opcode. It should default to Expand, as we already do for ANY_EXTEND_VECTOR_INREG. Differential Revision: https://reviews.llvm.org/D70000
* [X86] Enable strict FP by default and remove option -disable-strictnode-mutation. NFCI. | Wang, Pengfei | 2020-01-03 | 1 | -0/+3
* [PowerPC]: Fix predicate handling with SPE | Justin Hibbits | 2020-01-02 | 1 | -9/+21
  SPE floating-point compare instructions only update the GT bit in the CR field. All predicates must therefore be reduced to GT/LE.
* [X86] Optimization of inserting vxi1 sub vector into vXi1 vector | Wang, Pengfei | 2020-01-03 | 1 | -2/+20
  Summary: After the bugfix for the undef value case here, we used more operations to insert a vXi1 subvector into a vXi1 vector; this patch optimizes it to use fewer operations. The history is at https://reviews.llvm.org/D68311 Reviewers: craig.topper, LuoYuanke, yubing, annita.zhang, pengfei, LiuChen3, RKSimon Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D71917
* [PowerPC][AIX] Enable sret arguments. | Sean Fertile | 2020-01-02 | 1 | -3/+0
  Removes the fatal error for sret arguments and adds lit testing. Differential Revision: https://reviews.llvm.org/D71504
* AMDGPU/GlobalISel: Remove manual G_FENCE selection | Matt Arsenault | 2020-01-02 | 1 | -5/+0
  The tablegen emitter now handles the immediate operand correctly, so let the generated matcher do the work.
* DAG: Use TargetConstant for FENCE operands | Matt Arsenault | 2020-01-02 | 7 | -16/+16
* [SystemZ] Create brcl 0,0 instead of brcl 0,3 in EmitNop for 6 bytes. | Jonas Paulsson | 2020-01-02 | 1 | -1/+1
  For consistency with GCC, the target label is moved to the brcl itself instead of the next instruction. Review: Ulrich Weigand
* [X86] Move STRICT_ ISD nodes into the new section of X86ISelLowering.h where STRICT nodes are collected after D71841 | Craig Topper | 2020-01-02 | 1 | -4/+17
* [PowerPC] Only legalize FNEARBYINT with unsafe fp math | Nemanja Ivanovic | 2020-01-02 | 1 | -2/+7
  Commit 0f0330a78709 legalized these nodes on PPC without considering unsafe math, which means that nearbyint raises inexact exceptions. Since this doesn't conform to the standard, make this legalization depend on unsafe fp math.
* X86: remove unused variable | Saleem Abdulrasool | 2020-01-02 | 1 | -1/+0
  Remove the now-unused variable from aa17d31edb00c66461093b5a7cd2f4a35dc143e9. The unused variable breaks `-Werror` builds.
* [X86] Remove FP0-6 operands from call instructions in FPStackifier pass. Only count defs as returns. | Craig Topper | 2020-01-02 | 1 | -9/+11
  All FP0-6 operands should be removed by the FP stackifier. Removing them fixes the machine verifier error in PR39437. I've also made it so that only defs are counted for STReturns, which removes what I think were extra stack cleanup instructions. And I've removed the regcall assert, because it was checking the attributes of the caller, but here we're concerned with the attributes of the callee. But I don't know how to get that information from this level.
* [FPEnv] Default NoFPExcept SDNodeFlag to false | Ulrich Weigand | 2020-01-02 | 2 | -16/+40
  The NoFPExcept bit in SDNodeFlags currently defaults to true, unlike all other such flags. This is a problem, because it means that any code that transforms SDNodes without copying flags can introduce a correctness bug, not just a missed optimization. This patch changes the default to false. This makes it necessary to move setting the (No)FPExcept flag for constrained intrinsics from the visitConstrainedIntrinsic routine to the generic visit routine, at the place where the other flags are set; otherwise the intersectFlagsWith call would erase the NoFPExcept flag again. To avoid making non-strict FP code worse, whenever SelectionDAGISel::SelectCodeCommon matches on a set of original nodes, none of which can raise FP exceptions, it preserves this property on all generated result nodes. It does so by setting the NoFPExcept flag on those result nodes that would otherwise be considered as raising an FP exception. To check whether an SD node should be considered as raising an FP exception, the following logic applies:
  - For machine nodes, check the mayRaiseFPException property of the underlying MI instruction
  - For regular nodes, check isStrictFPOpcode
  - For target nodes, check a newly introduced isTargetStrictFPOpcode
  The latter is implemented by reserving a range of target opcodes, similarly to how memory opcodes are identified. (Note that there is a bit of a quirk in identifying target nodes that are both memory nodes and strict FP nodes. To simplify the logic, all target memory nodes are currently also considered strict FP nodes automatically; this could be fixed by adding one more range.) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D71841
* Remove unneeded extra variable realArgIdx. NFC. | Jay Foad | 2020-01-02 | 2 | -10/+8
* [AArch64][SVE] Gather loads: pass 32 bit unpacked offsets as nxv2i32 | Andrzej Warzynski | 2020-01-02 | 1 | -14/+21
  Summary: Currently, 32-bit unpacked offsets are passed as nxv2i64. However, as pointed out in https://reviews.llvm.org/D71074, using nxv2i32 instead would improve consistency with:
    - how other arguments are treated
    - how scatter stores are implemented
  This patch makes sure that 32-bit unpacked offsets are passed as nxv2i32 instead of nxv2i64. Reviewers: sdesmalen, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71724
* [NFC] Make the type of X86AlignBranchBoundary compatible | Shengchen Kan | 2020-01-02 | 1 | -1/+1
  Change the type of X86AlignBranchBoundary from cl::opt<uint64_t> to cl::opt<unsigned>, since the template class cl::opt is only instantiated with the types unsigned, int, std::string, char, and bool.
* [X86] Call SimplifyMultipleUseDemandedBits from combineVSelectToBLENDV if the condition is used by something other than select conditions. | Craig Topper | 2020-01-01 | 1 | -24/+42
  We might be able to bypass some nodes on the condition path. Differential Revision: https://reviews.llvm.org/D71984
* add strict float for round operation | Liu, Chen3 | 2020-01-01 | 6 | -41/+86
  Differential Revision: https://reviews.llvm.org/D72026
* [MC][TargetMachine] Delete MCTargetOptions::MCPIECopyRelocations | Fangrui Song | 2020-01-01 | 1 | -8/+7
  clang/lib/CodeGen/CodeGenModule performs the -mpie-copy-relocations check and sets dso_local on applicable global variables. We don't need to duplicate the work in TargetMachine shouldAssumeDSOLocal. Verified that -mpie-copy-relocations can still emit PC-relative relocations for external variable accesses:
    clang -target x86_64 -fpie -mpie-copy-relocations -c => R_X86_64_PC32
    clang -target aarch64 -fpie -mpie-copy-relocations -c => R_AARCH64_ADR_PREL_PG_HI21+R_AARCH64_LDST64_ABS_LO12_NC
* [X86] Fix typo in getCMovOpcode. | Craig Topper | 2019-12-31 | 1 | -1/+1
  The 64-bit HasMemoryOperand line was using CMOV32rm instead of CMOV64rm. Not sure how to test this. We have no test coverage that passes true for HasMemoryOperand.
* [X86] Add X87 FCMOV support to X86FlagsCopyLowering. | Craig Topper | 2019-12-31 | 1 | -0/+73
  Fixes PR44396
* [X86] Constant fold KSHIFT of an all zeros vector to just an all zeros vector. | Craig Topper | 2019-12-31 | 1 | -0/+3
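The conditional-negate fold in the PR43660 commit ("[X86][SSE] Combine combineLogicBlendIntoConditionalNegate for VSELECT nodes") relies on a two's-complement identity, (X ^ M) - M. A minimal scalar sketch of one lane follows. It assumes the mask lane is either all zeros or all ones, as a VSELECT condition lane is. The function names are illustrative, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// One lane of (select M, (sub 0, X), X) -> (sub (xor X, M), M).
// M must be 0 (keep X) or all ones (negate X), like a VSELECT condition lane.
// Unsigned arithmetic keeps wraparound well defined (e.g. negating 0x80000000).
uint32_t conditional_negate(uint32_t x, uint32_t m) {
    assert(m == 0u || m == 0xFFFFFFFFu);
    return (x ^ m) - m;
}

// Reference semantics: the select form that the combine replaces.
uint32_t select_negate(uint32_t x, uint32_t m) {
    return m != 0u ? 0u - x : x;
}
```

With M all ones, XOR is bitwise NOT and subtracting all ones adds 1, so the result is ~X + 1, which is -X. With M zero, both operations are no-ops.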
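The AMDGPU scheduling commit ("[AMDGPU] Revert scheduling to reduce spilling", D72181) adds a second reason to revert a region's schedule. Below is a hypothetical sketch of that decision. It assumes a single pressure number per region and a fixed register budget. The struct, the function name, and the exact pressure test are our assumptions, not the GCN scheduler's code.

```cpp
#include <cassert>

// Hypothetical per-region summary. Real GCN scheduling tracks SGPR and VGPR
// pressure separately; this sketch collapses them into one number.
struct RegionResult {
    unsigned occupancy;    // waves per execution unit the schedule allows
    unsigned maxPressure;  // peak register pressure of the region
};

// Returns true if the new schedule should be discarded in favor of the old one.
bool shouldRevertSchedule(const RegionResult& before, const RegionResult& after,
                          unsigned regBudget) {
    // Existing rule: reject a schedule that lowers occupancy.
    if (after.occupancy < before.occupancy)
        return true;
    // Added rule (as the commit describes it): at one wave, occupancy cannot
    // drop further, so also reject a schedule whose pressure rises past the
    // budget (i.e. would spill) and is worse than the old schedule's.
    if (after.occupancy == 1 && after.maxPressure > regBudget &&
        after.maxPressure > before.maxPressure)
        return true;
    return false;
}
```

Under the old rule alone, the third test case below would be accepted because occupancy stays at 1.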
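The v2i32->v2f64 uint_to_fp commit (D71945) builds 2.0^52 + x by OR-ing the zero-extended integer into the bit pattern of 2.0^52, then subtracts 2.0^52. A scalar sketch of one lane follows, using memcpy for the bit reinterpretation. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(uint64_t), "expects a 64-bit IEEE double");

// Convert an unsigned 32-bit integer to double with integer ops plus one FSUB.
// 0x4330000000000000 is the bit pattern of 2^52. OR-ing x into its low mantissa
// bits yields exactly 2^52 + x, because a double has 52 explicit mantissa bits.
double u32_to_f64(uint32_t x) {
    const uint64_t bits = 0x4330000000000000ULL | static_cast<uint64_t>(x);  // zero extend, then OR
    double biased;
    std::memcpy(&biased, &bits, sizeof biased);  // reinterpret the bits as a double
    return biased - 4503599627370496.0;          // subtract 2^52
}
```

The subtraction is exact, because both operands lie in [2^52, 2^53) and differ only in the low 32 mantissa bits. So every uint32_t converts without rounding.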
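The lowerUINT_TO_FP_vXi32 commit (D71968) switches the first operation of the unsigned i32->f32 conversion to an FSUB. As far as we understand the X86 lowering, it splits each 32-bit lane into 16-bit halves and uses the float magic constants 2^23 (0x4B000000) and 2^39 (0x53000000). The scalar sketch below models that arithmetic for one lane. Treat the constants and the name u32_to_f32 as our assumptions, not a copy of the LLVM code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static float bits_to_float(uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);  // reinterpret the bits as an IEEE float
    return f;
}

// Scalar model of an unsigned i32 -> f32 conversion using only signed-safe FP ops.
//   lo  = 2^23 + (x & 0xFFFF)       0x4B000000 is 2^23, whose mantissa ulp is 1
//   hi  = 2^39 + (x >> 16) * 2^16   0x53000000 is 2^39, whose mantissa ulp is 2^16
//   fhi = hi - (2^39 + 2^23)        exact: both operands are within a factor of two
//   res = lo + fhi                  the only step that rounds
float u32_to_f32(uint32_t x) {
    const float lo = bits_to_float(0x4B000000u | (x & 0xFFFFu));
    const float hi = bits_to_float(0x53000000u | (x >> 16));
    const float fhi = hi - 549764202496.0f;  // FSUB of 2^39 + 2^23 (exactly representable)
    return lo + fhi;
}
```

Only the final addition rounds, so the result should match static_cast<float>(x) under round-to-nearest. Reassociating lo + (hi - C) into (lo + hi) - C would round twice, which shows why reassociation matters for this sequence.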
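The SIGN_EXTEND_INREG commit (D70000) makes Expand the default action for vector types. The usual expansion of this node is a shift left followed by an arithmetic shift right. This scalar sketch shows that pair on int32_t. The helper name is illustrative, and it assumes the sign-extended width `bits` is between 1 and 32.

```cpp
#include <cassert>
#include <cstdint>

// sign_extend_inreg(x, bits): reinterpret the low `bits` bits of x as a signed
// value and sign-extend it to 32 bits, using the SHL + SRA pair.
int32_t sign_extend_inreg(int32_t x, unsigned bits) {
    assert(bits >= 1 && bits <= 32);
    const unsigned shift = 32u - bits;
    // Shift left as unsigned to avoid signed-overflow UB. Converting back to
    // int32_t and right-shifting a negative value are implementation-defined
    // before C++20, but they wrap and shift arithmetically on mainstream compilers.
    const uint32_t shl = static_cast<uint32_t>(x) << shift;
    return static_cast<int32_t>(shl) >> shift;
}
```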
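The NoFPExcept commit (D71841) describes a three-way check for whether an SD node may raise an FP exception, plus a reserved opcode range for target strict-FP nodes. The toy model below encodes our reading of that description. All fields and opcode values are hypothetical. We assume the strict-FP range starts below the memory range and both checks are a simple ">=", which is one way to get the quirk the commit mentions (every target memory opcode also counts as strict FP). We also assume an explicit NoFPExcept flag overrides the opcode-based checks.

```cpp
#include <cassert>

// Hypothetical opcode layout: the values are made up; only the ordering matters.
constexpr unsigned kBuiltinOpEnd = 1000;
constexpr unsigned kFirstTargetStrictFPOpcode = kBuiltinOpEnd + 400;
constexpr unsigned kFirstTargetMemoryOpcode = kBuiltinOpEnd + 500;

constexpr bool isTargetStrictFPOpcode(unsigned op) {
    return op >= kFirstTargetStrictFPOpcode;
}
constexpr bool isTargetMemoryOpcode(unsigned op) {
    return op >= kFirstTargetMemoryOpcode;
}

enum class NodeKind { Machine, Generic, Target };

// Hypothetical node summary; the fields stand in for the properties the commit names.
struct Node {
    NodeKind kind;
    unsigned opcode;
    bool instrMayRaiseFPException;  // machine nodes: the MI instruction property
    bool isStrictFPOpcode;          // regular nodes: is this a STRICT_* opcode?
    bool noFPExcept;                // the SDNodeFlag, which now defaults to false
};

bool mayRaiseFPException(const Node& n) {
    if (n.noFPExcept)  // explicitly marked as exception-free
        return false;
    switch (n.kind) {
    case NodeKind::Machine: return n.instrMayRaiseFPException;
    case NodeKind::Generic: return n.isStrictFPOpcode;
    case NodeKind::Target:  return isTargetStrictFPOpcode(n.opcode);
    }
    return true;  // unreachable; stay conservative
}
```

Adding a separate upper bound for the strict-FP range (the "one more range" the commit mentions) would let target memory opcodes opt out of being treated as strict FP.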