path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
* Address post-commit review comments on revision 366727. | Sean Fertile | 2019-07-30 | 1 | -5/+5
  Addresses a number of comments made on D64652 after committing:
  - Reorder function decls in the TargetLoweringObjectFileXCOFF class.
  - Fix the comment in MCSectionXCOFF to include a description of external reference csects.
  - Convert several llvm_unreachables to report_fatal_error.
  - Convert several dyn_casts to casts, as they are not expected to fail.
  - Avoid copying the DataLayout object.
  llvm-svn: 367324
* [X86] SimplifyDemandedVectorEltsForTargetNode should be calling resolveTargetShuffleInputs, not getTargetShuffleMask | Simon Pilgrim | 2019-07-30 | 1 | -0/+1
  Add TODO comment.
  llvm-svn: 367318
* [X86][AVX] SimplifyDemandedVectorElts - handle extraction from X86ISD::SUBV_BROADCAST source (PR42819) | Simon Pilgrim | 2019-07-30 | 1 | -8/+10
  PR42819 showed that we couldn't handle the case where we demanded a 'sub-sub-vector' of the SUBV_BROADCAST 'sub-vector' source. This patch recognizes these cases and extracts the sub-sub-vector instead of trying to broadcast to a type smaller than the 'sub-vector' source.
  llvm-svn: 367306
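A minimal sketch of the index arithmetic behind this fix, assuming a simplified model in which a SUBV_BROADCAST result is just its source repeated and vectors are plain arrays; `broadcast_elt` and `subsubvector_offset` are hypothetical names, not LLVM APIs. Element i of the result is source element i mod the source length, so a demanded slice that fits inside one copy of the source can be taken straight from the source.

```c
#include <assert.h>
#include <stddef.h>

/* Model (assumption): element i of a SUBV_BROADCAST result is element
 * (i % src_len) of the broadcast sub-vector source. */
static int broadcast_elt(const int *src, size_t src_len, size_t i) {
    assert(src_len > 0);
    return src[i % src_len];
}

/* A demanded slice [start, start + len) of the broadcast result that lies
 * inside one copy of the source equals the source slice starting at
 * start % src_len, so it can be extracted from the source directly.
 * Returns that source offset, or -1 if the slice straddles two copies. */
static long subsubvector_offset(size_t src_len, size_t start, size_t len) {
    assert(len > 0 && len <= src_len);
    size_t off = start % src_len;
    if (off + len > src_len)
        return -1; /* crosses a copy boundary */
    return (long)off;
}
```

With a 4-element source, the 2-element slice starting at result element 6 is source elements 2 and 3.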
* [ARM][LowOverheadLoops] Enable by default | Sam Parker | 2019-07-30 | 1 | -1/+1
  The code is now in a good enough state to pass the bunch of tests that I have run (after fixing the bugs), so let's enable it by default.
  Differential Revision: https://reviews.llvm.org/D65277
  llvm-svn: 367297
* [ARM][LowOverheadLoops] Revert non-header LE target | Sam Parker | 2019-07-30 | 1 | -3/+9
  Revert the hardware loop upon finding a LoopEnd that doesn't target the loop header, instead of asserting a failure.
  Differential Revision: https://reviews.llvm.org/D65268
  llvm-svn: 367296
* [PowerPC][NFC] Fix a typo in a comment. | Jinsong Ji | 2019-07-29 | 1 | -1/+1
  llvm-svn: 367252
* [X86] Fix typo in comment. We're looking at a right shift, not a left shift. NFC | Craig Topper | 2019-07-29 | 1 | -1/+1
  llvm-svn: 367251
* [X86] resolveTargetShuffleInputs - add depth to limit recursion. | Simon Pilgrim | 2019-07-29 | 1 | -15/+19
  Avoids slowdowns from calls to ComputeNumSignBits/computeKnownBits going too deep.
  llvm-svn: 367240
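The depth limit follows the usual SelectionDAG analysis pattern: each recursive call passes Depth + 1, and once a fixed budget is reached the walk gives up with a conservative answer. The C sketch below shows that pattern on a hypothetical operand chain; `struct node`, `chain_length` and the limit of 6 are illustrative assumptions, not LLVM code. Starting the walk at depth 0 lets a caller hand its own depth to a helper without an off-by-one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical recursion budget; the value is an assumption. */
enum { MAX_RECURSION_DEPTH = 6 };

struct node {
    const struct node *operand; /* NULL for a leaf */
};

/* Counts the nodes visited along the operand chain. Once the depth budget is
 * exhausted it stops recursing and contributes nothing further, so the result
 * never exceeds MAX_RECURSION_DEPTH. */
static int chain_length(const struct node *n, unsigned depth) {
    if (depth >= MAX_RECURSION_DEPTH)
        return 0; /* budget exhausted: give up conservatively */
    if (n->operand == NULL)
        return 1;
    return 1 + chain_length(n->operand, depth + 1);
}
```

A 10-node chain is cut off after 6 nodes, while a 3-node chain is walked completely.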
* AMDGPU/LoadStoreOptimizer: combine MMOs when merging instructions | Tom Stellard | 2019-07-29 | 1 | -3/+38
  Summary: The LoadStoreOptimizer was creating instructions with 2 MachineMemOperands, which meant they were assumed to alias with all other instructions, because MachineInstr::mayAlias() returns true when an instruction has multiple MachineMemOperands. This was preventing these instructions from being merged again, and was giving the scheduler less freedom to reorder them.
  Reviewers: arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65036
  llvm-svn: 367237
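A simplified model of why combining matters, assuming a memory operand can be reduced to a byte range from a common base: with two memory operands the alias query answers "may alias" unconditionally, while a single combined operand still allows a disjointness check. `struct mem_operand`, `may_alias` and `merge` are hypothetical C stand-ins, not the MachineMemOperand API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory operand: a byte range from a shared base. */
struct mem_operand {
    int64_t offset;
    uint64_t size;
};

struct mem_instr {
    struct mem_operand mmos[2];
    size_t num_mmos;
};

/* Conservative rule from the commit message: more than one memory operand
 * means "assume it aliases everything". */
static bool may_alias(const struct mem_instr *a, const struct mem_instr *b) {
    if (a->num_mmos != 1 || b->num_mmos != 1)
        return true;
    const struct mem_operand *x = &a->mmos[0], *y = &b->mmos[0];
    /* Disjoint byte ranges cannot alias. */
    return !(x->offset + (int64_t)x->size <= y->offset ||
             y->offset + (int64_t)y->size <= x->offset);
}

/* Merging two adjacent accesses into one instruction with a single memory
 * operand covering both ranges. */
static struct mem_instr merge(const struct mem_operand *lo,
                              const struct mem_operand *hi) {
    assert(lo->offset + (int64_t)lo->size == hi->offset);
    struct mem_instr m = { .num_mmos = 1 };
    m.mmos[0].offset = lo->offset;
    m.mmos[0].size = lo->size + hi->size;
    return m;
}
```

The merged instruction can still be proven independent of an access at offset 16, whereas the two-operand form cannot.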
* [AMDGPU] Fix typo in error message | Jay Foad | 2019-07-29 | 1 | -1/+1
  llvm-svn: 367235
* [X86] combineX86ShufflesRecursively - start recursion at depth = 0. NFCI. | Simon Pilgrim | 2019-07-29 | 1 | -18/+18
  As discussed on rL367171, we have a problem where the depth recursion used in combineX86ShufflesRecursively was subtly different from computeKnownBits etc.: it starts at Depth=1 instead of Depth=0 like the others, and has a different maximum recursion depth. This NFC patch fixes the recursion depth to start at 0, so we can more easily reuse depth values in calls from combineX86ShufflesRecursively and its helper functions in computeKnownBits etc.
  llvm-svn: 367232
* [RISCV] Fix uninitialized variable after call to evaluateConstantImm | Francis Visoiu Mistrih | 2019-07-29 | 1 | -22/+22
  For llvm/test/MC/RISCV/rv64i-aliases-invalid.s, UBSan reports:
    lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp:371:9: runtime error: load of value 3879186881, which is not a valid value for type 'RISCVMCExpr::VariantKind'
    SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp:371:9 in
  It turns out that evaluateConstantImm does not set `VK`, so it remains uninitialized when doing comparisons in `isImmXLenLI()`.
  Differential Revision: https://reviews.llvm.org/D65347
  llvm-svn: 367230
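The bug pattern is an out-parameter that a helper writes on only some paths while the caller reads it on all of them. The C sketch below mirrors that shape with hypothetical names (`evaluate_constant_imm`, `is_imm_xlen_li`, `enum variant_kind`) and shows one way to rule it out: the helper sets the out-parameter before any early return. This is an assumed fix for illustration, not necessarily the shape of the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for RISCVMCExpr::VariantKind and a parsed operand. */
enum variant_kind { VK_NONE, VK_LO, VK_HI };

struct operand {
    bool is_constant;
    bool has_modifier;
    enum variant_kind modifier;
    int64_t value;
};

static bool evaluate_constant_imm(const struct operand *op, int64_t *imm,
                                  enum variant_kind *vk) {
    /* Write the out-parameter on every path, including failure. */
    *vk = op->has_modifier ? op->modifier : VK_NONE;
    if (!op->is_constant)
        return false;
    *imm = op->value;
    return true;
}

static bool is_imm_xlen_li(const struct operand *op) {
    int64_t imm = 0;
    enum variant_kind vk;
    bool is_constant = evaluate_constant_imm(op, &imm, &vk);
    /* vk is read even when evaluation fails, so the helper must have set it. */
    if (vk == VK_LO)
        return true;
    return is_constant && vk == VK_NONE;
}
```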
* [DivergenceAnalysis] Add methods for querying divergence at use | Jay Foad | 2019-07-29 | 1 | -4/+4
  Summary: The existing isDivergent(Value) methods query whether a value is divergent at its definition. However, even if a value is uniform at its definition, a use of it in another basic block can be divergent because of divergent control flow between the def and the use.
  This patch adds new isDivergent(Use) methods to DivergenceAnalysis, LegacyDivergenceAnalysis and GPUDivergenceAnalysis.
  This might allow D63953 or other similar workarounds to be removed.
  Reviewers: alex-t, nhaehnle, arsenm, rtaylor, rampitec, simoll, jingyue
  Reviewed By: nhaehnle
  Subscribers: jfb, jvesely, wdng, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65141
  llvm-svn: 367218
* [NFC][ARM][ParallelDSP] Cleanup of BinOpChain | Sam Parker | 2019-07-29 | 1 | -81/+58
  - Remove some unused typedefs.
  - Rename the BinOpChain struct to MulCandidate.
  - Remove the size method of MulCandidate.
  - Store only the first input of the ValueList provided to MulCandidate, as it's the only value we care about. This means we don't have to perform any ugly (and unnecessary) iterations of the list later on.
  llvm-svn: 367208
* [AMDGPU] Enable v4f16 and above for v_pk_fma instructions | David Stuttard | 2019-07-29 | 2 | -0/+28
  Summary: If isel is presented with <2 x half> vectors, it will correctly select v_pk_fma style instructions. If isel is presented with e.g. <4 x half> vectors, it will scalarize, unlike for other instruction types (such as fadd, fmul etc.). Added extra support to enable this. Updated one of the tests to include a test for this (as well as extending the test to GFX9).
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65325
  Change-Id: I50a4577a3f8223fb53992af3b7d26121f65b71ee
  llvm-svn: 367206
* [NFC][ARM][ParallelDSP] Remove AreSymmetrical | Sam Parker | 2019-07-29 | 1 | -43/+0
  We explicitly search for a parallel MAC and we only care about its inputs, so checking for symmetry doesn't add anything here.
  llvm-svn: 367205
* [NFC][ARM][ParallelDSP] Remove PopulateLoads | Sam Parker | 2019-07-29 | 1 | -9/+0
  We no longer have to check which loads are used; all of this is performed at the start of the transform, so this function isn't doing anything now.
  llvm-svn: 367204
* [X86] Don't use PMADDWD for vector add reductions of multiplies if the mul inputs have an additional user. | Craig Topper | 2019-07-29 | 1 | -12/+22
  The pmaddwd inserts a truncate; if that truncate would end up creating additional instructions instead of making a zext narrower, then we shouldn't do it.
  I've restricted this to only SSE4.1 targets, since on prior targets the zext will be done in stages, so the truncate will probably not create additional instructions.
  Might need some more investigation of mul shrinking and the other pmaddwd transform to be sure this is the right decision.
  There might be a slight regression on AVX1 targets due to add splitting. Hard to say for sure. Maybe we need to look into using the vector reduction flag to use 2 narrow loads and a blend instead of extracting and inserting.
  llvm-svn: 367198
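For reference, a scalar C model of what one 128-bit PMADDWD computes: eight signed 16-bit products, summed in adjacent pairs into four 32-bit lanes. Because an add reduction does not care which products end up paired, a reduction of multiplies whose inputs fit in 16 bits can be mapped onto it. The function name and array layout are illustrative, not an LLVM or intrinsic API.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a 128-bit PMADDWD. */
static void pmaddwd(const int16_t a[8], const int16_t b[8], int32_t out[4]) {
    for (int i = 0; i < 4; ++i) {
        int64_t sum = (int64_t)a[2 * i] * b[2 * i] +
                      (int64_t)a[2 * i + 1] * b[2 * i + 1];
        /* The only out-of-range case is all four inputs equal to -32768; the
         * instruction wraps it to INT32_MIN, which this conversion reproduces
         * on two's-complement compilers. */
        out[i] = (int32_t)(uint32_t)sum;
    }
}
```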
* [X86] In combineLoopMAddPattern and combineLoopSADPattern, preserve the vector reduction flag on the final add. Handle unrolled loops by letting DAG combine revisit. | Craig Topper | 2019-07-28 | 1 | -78/+63
  This reverts r340478 and r340631 and replaces them with a simpler method of just letting DAG combine revisit the nodes to handle the other operand.
  llvm-svn: 367195
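combineLoopSADPattern targets sum-of-absolute-differences reductions that PSADBW can compute. As an illustrative sketch (the function name and layout are assumptions), a scalar C model of one 128-bit PSADBW: each 8-byte half yields the sum of absolute differences of its unsigned bytes, stored in the corresponding 64-bit lane.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar model of a 128-bit PSADBW. The per-half sum is at most 8 * 255, so
 * it always fits in the low 16 bits of its 64-bit lane. */
static void psadbw(const uint8_t a[16], const uint8_t b[16], uint64_t out[2]) {
    for (int lane = 0; lane < 2; ++lane) {
        uint64_t sum = 0;
        for (int j = 0; j < 8; ++j) {
            /* Bytes are unsigned, so widen to int before subtracting. */
            int d = (int)a[8 * lane + j] - (int)b[8 * lane + j];
            sum += (uint64_t)abs(d);
        }
        out[lane] = sum;
    }
}
```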
* [ARM] MVE VPNOT | David Green | 2019-07-28 | 2 | -3/+22
  This adds the patterns required to transform xor P0, -1 to a VPNOT. The instruction operands have to change a little for this, adding an in and an out VCCR reg and using a custom DecodeMVEVPNOT for the decode.
  Differential Revision: https://reviews.llvm.org/D65133
  llvm-svn: 367192
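Why the pattern is valid, as a small C check: an MVE predicate is the 16-bit P0 field of VPR, one bit per byte lane, so XOR with -1 (all ones at 16 bits) is exactly a bitwise NOT. The two helpers are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise NOT of a 16-bit predicate; the cast drops the bits produced by
 * integer promotion. */
static uint16_t vpnot(uint16_t p0) { return (uint16_t)~p0; }

/* The IR form the pattern matches: xor with all ones. */
static uint16_t xor_all_ones(uint16_t p0) { return (uint16_t)(p0 ^ 0xFFFFu); }
```

An exhaustive loop over all 65536 predicate values confirms the two agree.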
* [ARM] Better patterns for fp <> predicate vectors | David Green | 2019-07-28 | 2 | -4/+26
  These are some better patterns for converting between predicates and floating-point vectors. Much like the extends, we select "1"/"-1" or "0" depending on the predicate value. In the other direction, we perform a compare against 0 to convert to a predicate.
  Differential Revision: https://reviews.llvm.org/D65103
  llvm-svn: 367191
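A per-lane C sketch of the two conversion directions, with hypothetical function names: a true predicate lane becomes 1.0 for an unsigned conversion and -1.0 for a signed one (a set i1 reads as -1 when signed), a false lane becomes 0.0, and the reverse direction is a compare against zero. Using `!= 0.0f` as that compare is an assumption about the exact condition.

```c
#include <assert.h>
#include <stdbool.h>

/* uitofp of a predicate lane: true -> 1.0, false -> 0.0. */
static float predicate_to_fp_unsigned(bool lane) { return lane ? 1.0f : 0.0f; }

/* sitofp of a predicate lane: an i1 "true" is -1 when read as signed. */
static float predicate_to_fp_signed(bool lane) { return lane ? -1.0f : 0.0f; }

/* fp -> predicate as a compare against zero (assumed condition). */
static bool fp_to_predicate(float x) { return x != 0.0f; }
```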
* [X86][SSE] Replace PMULDQ GetDemandedBits combine with SimplifyMultipleUseDemandedBits handler (Reapplied) | Simon Pilgrim | 2019-07-27 | 1 | -9/+12
  Recommit rL367100, which was reverted at rL367141. Until PR42777 is fixed, we no longer get the benefits of peeking through bitcasts, but it does still remove a GetDemandedBits user and gives us the equivalent combines.
  llvm-svn: 367172
* [AArch64][GlobalISel] Implement narrowing of G_SEXT. | Amara Emerson | 2019-07-26 | 1 | -20/+26
  We need this to narrow a sext to s128.
  Differential Revision: https://reviews.llvm.org/D65357
  llvm-svn: 367164
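Narrowing a sext to s128 means producing the two 64-bit halves separately. A C sketch of the s64 to s128 case, with a hypothetical `struct s128`: the low half is the source value and the high half is filled with copies of its sign bit (what an arithmetic shift right by 63 produces). How the legalizer handles narrower sources is not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical s128 split into two 64-bit halves. */
struct s128 {
    uint64_t lo;
    uint64_t hi;
};

static struct s128 sext_64_to_128(int64_t x) {
    struct s128 r;
    r.lo = (uint64_t)x;
    /* All ones if negative, all zeros otherwise; same result as an arithmetic
     * shift right by 63, without relying on implementation-defined shifts. */
    r.hi = (uint64_t)(x < 0 ? -1 : 0);
    return r;
}
```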
* [AArch64][GlobalISel] Select @llvm.aarch64.stlxr for 32-bit pointers | Jessica Paquette | 2019-07-26 | 1 | -3/+21
  Add partial instruction selection for intrinsics like this:
  ```
  declare i32 @llvm.aarch64.stlxr(i64, i32*)
  ```
  (This only handles the case where a G_ZEXT is feeding the intrinsic.)
  Also make sure that the added store instruction actually has the memory op from the original G_STORE.
  Update select-stlxr-intrin.mir and arm64-ldxr-stxr.ll.
  Differential Revision: https://reviews.llvm.org/D65355
  llvm-svn: 367163
* Revert "[X86][SSE] Replace PMULDQ GetDemandedBits combine with SimplifyMultipleUseDemandedBits handler." | Vlad Tsyrklevich | 2019-07-26 | 1 | -12/+9
  This reverts r367100; it appears to be causing test failures after Nico's revert of r367091.
  llvm-svn: 367141
* [PowerPC][AIX] Add lowering of MCSymbol MachineOperand. | Sean Fertile | 2019-07-26 | 1 | -0/+3
  Adds machine operand lowering for MCSymbolSDNodes to the PowerPC backend. This is needed to produce call instructions in assembly for AIX, because the callee operand is an MCSymbolSDNode. The test is XFAIL'ed for asserts due to a (valid) assertion in PEI that the AIX ABI isn't supported yet.
  Differential Revision: https://reviews.llvm.org/D63738
  llvm-svn: 367133
* [AMDGPU] Fix typo. | Michael Liao | 2019-07-26 | 1 | -2/+2
  llvm-svn: 367131
* [AArch64][SVE2] Rename bitperm feature to sve2-bitperm | Cullen Rhodes | 2019-07-26 | 3 | -3/+3
  Summary: The bitperm feature flag is now prefixed with SVE2, as it is for all other SVE2 extensions.
  Patch by Maciej Gabka.
  Reviewers: sdesmalen, rovka, chill, SjoerdMeijer, rengolin
  Reviewed By: SjoerdMeijer, rengolin
  Differential Revision: https://reviews.llvm.org/D65327
  llvm-svn: 367124
* [ARM][ParallelDSP] Combine structs | Sam Parker | 2019-07-26 | 1 | -19/+15
  Combine the OpChain and BinOpChain structs, as OpChain is a base class of BinOpChain that is never used on its own.
  llvm-svn: 367114
* [PowerPC] Add getCRSaveOffset to improve readability. [NFC] | Sean Fertile | 2019-07-26 | 2 | -6/+17
  In preparation for AIX support in FrameLowering: replace a number of literal '8's that represent the stack offset of the condition register save area with a member in PPCFrameLowering.
  Patch by Chris Bowler.
  llvm-svn: 367111
* [MIPS GlobalISel] Fix check for void return during lowerCall | Petar Avramovic | 2019-07-26 | 1 | -2/+2
  A void return used to be indicated by an unsigned virtual register value of 0, but with the addition of the Register class and the changes to the arguments of lowerCall, this is no longer valid. Check for a void return by inspecting the Ty field in OrigRet.
  Differential Revision: https://reviews.llvm.org/D65321
  llvm-svn: 367107
* [AMDGPU] Move WQM/WWM intrinsic instruction selection to AMDGPUISelDAGToDAG | Carl Ritson | 2019-07-26 | 2 | -10/+6
  Reviewers: arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65328
  llvm-svn: 367105
* [MIPS GlobalISel] Select inttoptr and ptrtoint | Petar Avramovic | 2019-07-26 | 3 | -1/+11
  Select G_INTTOPTR and G_PTRTOINT for MIPS32.
  Differential Revision: https://reviews.llvm.org/D65217
  llvm-svn: 367104
* [X86][SSE] Replace PMULDQ GetDemandedBits combine with SimplifyMultipleUseDemandedBits handler. | Simon Pilgrim | 2019-07-26 | 1 | -9/+12
  This removes a GetDemandedBits user and allows us to benefit from the DemandedElts propagated through SimplifyDemandedBits.
  llvm-svn: 367100
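Why a demanded-bits handler applies to PMULDQ at all: each 64-bit lane reads only the low 32 bits of its operands, sign-extends them, and produces a full 64-bit product, so the upper 32 bits of each input lane are never demanded. A scalar C model of one lane (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* One 64-bit lane of PMULDQ: sign-extend the low 32 bits of each operand and
 * multiply into a 64-bit result. The uint32_t -> int32_t conversion relies on
 * the usual two's-complement behaviour of GCC and Clang. */
static int64_t pmuldq_lane(uint64_t a, uint64_t b) {
    int32_t lo_a = (int32_t)(uint32_t)a;
    int32_t lo_b = (int32_t)(uint32_t)b;
    return (int64_t)lo_a * (int64_t)lo_b;
}
```

Changing the upper halves of the inputs never changes the result, which is the property a demanded-bits simplification can exploit.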
* [NFC][ARM][ParallelDSP] Cleanup isNarrowSequence | Sam Parker | 2019-07-26 | 1 | -26/+5
  Remove unused logic.
  llvm-svn: 367099
* [AMDGPU] Add llvm.amdgcn.softwqm intrinsic | Carl Ritson | 2019-07-26 | 5 | -1/+38
  Add the llvm.amdgcn.softwqm intrinsic, which behaves like llvm.amdgcn.wqm only if there is other WQM computation in the shader.
  Reviewers: nhaehnle, tpr
  Reviewed By: nhaehnle
  Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64935
  llvm-svn: 367097
* [AArch64] Define ETE and TRBE system registers | Momchil Velikov | 2019-07-26 | 4 | -1/+45
  Embedded Trace Extension and Trace Buffer Extension are optional future architecture extensions (cf. https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools).
  Their system registers are documented here: https://developer.arm.com/docs/ddi0601/a
  ETE shares register names with ETM. One exception is the ETE TRCEXTINSELR0 register, which has the same encoding as the ETM TRCEXTINSELR register (but different semantics). This patch treats them as aliases: the assembler will accept both names, emitting identical encoding, and the disassembler will keep disassembling to TRCEXTINSELR.
  Differential Revision: https://reviews.llvm.org/D63707
  llvm-svn: 367093
* [ARM][LowOverheadLoops] Add CPSR defs | Sam Parker | 2019-07-26 | 1 | -2/+4
  Both WhileLoopStart and LoopEnd may get turned into a cmp and br pair, so add an implicit def to these pseudo instructions in case WLS and LE aren't generated.
  Differential Revision: https://reviews.llvm.org/D65275
  llvm-svn: 367089
* [WinEH] Allocate space in funclets stack to save XMM CSRs | Pengfei Wang | 2019-07-26 | 3 | -23/+127
  Summary: This is an alternate approach to D57970. Currently, funclets reuse the same stack slots that are used in the parent function for saving callee-saved xmm registers. If the parent function modifies a callee-saved xmm register before an exception is thrown, the catch handler will overwrite the original saved value.
  This patch allocates space in the funclet's stack for saving callee-saved xmm registers and uses RSP instead of RBP to access memory.
  Reviewers: andrew.w.kaylor, LuoYuanke, annita.zhang, craig.topper, RKSimon
  Subscribers: rnk, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63396
  Signed-off-by: pengfei <pengfei.wang@intel.com>
  llvm-svn: 367088
* AMDGPU/GlobalISel: Handle most function return types | Matt Arsenault | 2019-07-26 | 2 | -32/+141
  handleAssignments gives up pretty easily on structs, and i8 values for some reason. The other case that doesn't work is when an implicit sret needs to be inserted if the return size exceeds the number of return registers.
  llvm-svn: 367082
* [AArch64][GlobalISel] Simplify zext/sext selection, use MachineIRBuilder. NFC. | Amara Emerson | 2019-07-26 | 1 | -32/+28
  llvm-svn: 367075
* [BPF] fix typedef issue for offset relocation | Yonghong Song | 2019-07-25 | 2 | -9/+38
  Currently, the CO-RE offset relocation does not work if any struct/union member or array element is a typedef. For example,
      typedef const int arr_t[7];
      struct input {
          arr_t a;
      };
      func(...) {
          struct input *in = ...;
          ... __builtin_preserve_access_index(&in->a[1]) ...
      }
  The default offset calculated by the BPF backend is 0, while 4 is the correct answer. Similar issues exist for struct/union typedefs.
  When getting a struct/union member or array element type, we should trace down to the underlying type by skipping typedefs and const/volatile qualifiers, as this is what clang did to generate the getelementptr instructions. (const/volatile member type qualifiers are already ignored by clang.)
  This patch fixes this issue: for each access index, it skips typedef and const/volatile/restrict BTF types.
  Signed-off-by: Yonghong Song <yhs@fb.com>
  Differential Revision: https://reviews.llvm.org/D65259
  llvm-svn: 367062
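A C sketch of the type walk this fix describes, on a hypothetical, simplified BTF-like table (the struct layout and helper names are assumptions, not the kernel's BTF encoding): typedef and const/volatile/restrict entries are skipped before the array element size is read, which yields the correct offset of 4 for `a[1]` in the `arr_t` example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified type table (not the real BTF encoding). */
enum kind { K_INT, K_ARRAY, K_TYPEDEF, K_CONST, K_VOLATILE, K_RESTRICT };

struct type {
    enum kind kind;
    uint32_t size;   /* K_INT: size in bytes */
    uint32_t ref;    /* modifiers/typedef: referenced type; K_ARRAY: element type */
    uint32_t nelems; /* K_ARRAY only */
};

/* Skip typedefs and const/volatile/restrict, as the fix describes. */
static uint32_t strip(const struct type *types, uint32_t id) {
    while (types[id].kind == K_TYPEDEF || types[id].kind == K_CONST ||
           types[id].kind == K_VOLATILE || types[id].kind == K_RESTRICT)
        id = types[id].ref;
    return id;
}

static uint32_t type_size(const struct type *types, uint32_t id) {
    id = strip(types, id);
    if (types[id].kind == K_ARRAY)
        return types[id].nelems * type_size(types, types[id].ref);
    return types[id].size;
}

/* Byte offset of element `index` within an array-typed member whose type may
 * be hidden behind typedefs or qualifiers. Returns UINT32_MAX for non-arrays. */
static uint32_t array_elem_offset(const struct type *types, uint32_t member_type,
                                  uint32_t index) {
    uint32_t id = strip(types, member_type);
    if (types[id].kind != K_ARRAY)
        return UINT32_MAX;
    return index * type_size(types, types[id].ref);
}
```

For `typedef const int arr_t[7]`, the table holds int (0), const int (1), an array of 7 const int (2) and the typedef arr_t (3); member `a` sits at offset 0 of `struct input`, so `&in->a[1]` is at byte 4.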
* [AArch64][GlobalISel] Fix G_SELECT legalization fallback after r366943. | Amara Emerson | 2019-07-25 | 1 | -1/+1
  Changes the order of legalization of G_ICMP, as suggested by Petar in D65079.
  llvm-svn: 367060
* [BPF] fix CO-RE incorrect index access string | Yonghong Song | 2019-07-25 | 2 | -17/+16
  Currently, we expect the CO-RE offset relocation to record a string encoding the original getelementptr access index, so the kernel bpf loader can decode it correctly. For example,
      struct s { int a; int b; };
      struct t { int c; int d; };
      #define _(x) (__builtin_preserve_access_index(x))
      int get_value(const void *addr1, const void *addr2);
      int test(struct s *arg1, struct t *arg2) {
          return get_value(_(&arg1->b), _(&arg2->d));
      }
  We expect two offset relocations:
      reloc 1: type s, access index 0, 1
      reloc 2: type t, access index 0, 1
  Two globals are created to retain the access indexes for the above two relocations, using global variable names. The first global has the name "0:1:". Unfortunately, the second global has the name "0:1:.1", as the LLVM internals automatically add the suffix ".1" to a global with the same name. Later on, the BPF backend peels off the last character and records "0:1" and "0:1:." in the relocation table.
  This is not desirable. The BPF backend could use its knowledge of the global variable suffix to generate the correct access string. This patch instead takes an approach that does not rely on that knowledge. It generates "s:0:1:" and "t:0:1:" to avoid global variable suffixes, and later on generates the correct index access string "0:1" for both records.
  Signed-off-by: Yonghong Song <yhs@fb.com>
  Differential Revision: https://reviews.llvm.org/D65258
  llvm-svn: 367030
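A C sketch of decoding the new global names, under the assumption that each name has the form `<type>:<idx>:...:<idx>:`; `access_string` is a hypothetical helper, not part of the BPF backend. Taking the text between the first and last ':' gives "0:1" for both "s:0:1:" and "t:0:1:", whereas the old scheme of dropping the final character turned "0:1:.1" into "0:1:.".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the access string between the first and last ':' of a global name into
 * out. Anything after the last ':' (such as a ".1" uniquing suffix) is ignored.
 * Returns 0 on success, -1 on a malformed name or a too-small buffer. */
static int access_string(const char *name, char *out, size_t out_size) {
    const char *first = strchr(name, ':');
    const char *last = strrchr(name, ':');
    if (first == NULL || last == first)
        return -1;
    size_t len = (size_t)(last - first - 1);
    if (len + 1 > out_size)
        return -1;
    memcpy(out, first + 1, len);
    out[len] = '\0';
    return 0;
}
```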
* [AMDGPU] Run `unreachable-mbb-elimination` after isel to clean up PHIs. | Michael Liao | 2019-07-25 | 1 | -0/+3
  Summary:
  - As LCSSA is turned on just before isel, it may create PHIs of the flow, which are consumed by pseudo structurized CFG instructions. When those PHIs are eliminated at O0, COPYs may be placed wrongly, as these pseudo structurized CFG instructions are considered part of the MBB prologue.
  - Run an extra `unreachable-mbb-elimination` at the end of isel to clean up PHIs.
  Reviewers: arsenm, rampitec
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64353
  llvm-svn: 367023
* [AArch64][SVE] Allow explicit size specifier for predicate operand | Momchil Velikov | 2019-07-25 | 1 | -8/+15
  ... for the vector forms of `{SQ,UQ,}{INC,DEC}P` instructions.
  Also continue supporting the existing behaviour of not requiring an explicit size specifier. The preferred disassembly is *with* the specifier.
  This is implemented by redefining instruction forms to require vector predicates with an explicit size and adding aliases, which allow a predicate with no size.
  Differential Revision: https://reviews.llvm.org/D65145
  llvm-svn: 367019
* AMDGPU: Don't assert on v4f16 arguments to shader calling conventions | Matt Arsenault | 2019-07-25 | 1 | -1/+2
  llvm-svn: 367018
* [X86] concatSubVectors - remove unnecessary args. NFCI. | Simon Pilgrim | 2019-07-25 | 1 | -9/+12
  All these args can be cheaply recomputed, and removing them makes it much easier to use the function as a quick helper.
  llvm-svn: 367014
* [ARM][AArch64] Support for Cortex-A65 & A65AE, Neoverse E1 & N1 | Pablo Barrio | 2019-07-25 | 6 | -2/+63
  Summary: Add support for Cortex-A65, Cortex-A65AE, Neoverse E1 and Neoverse N1. Neoverse E1 and Cortex-A65(&AE) only implement the AArch64 state of the Arm architecture. Neoverse N1 implements both AArch32 and AArch64.
  Cortex-A65: https://developer.arm.com/ip-products/processors/cortex-a/cortex-a65
  Cortex-A65AE: https://developer.arm.com/ip-products/processors/cortex-a/cortex-a65ae
  Neoverse E1: https://developer.arm.com/ip-products/processors/neoverse/neoverse-e1
  Neoverse N1: https://developer.arm.com/ip-products/processors/neoverse/neoverse-n1
  Patch by Diogo Sampaio and Pablo Barrio
  Reviewers: samparker, LukeCheeseman, sbaranga, ostannard
  Reviewed By: ostannard
  Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64406
  llvm-svn: 367007
* [PowerPC][NFC] Make `getDefMIPostRA` public | Kai Luo | 2019-07-25 | 1 | -5/+5
  llvm-svn: 366995