summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* Revert "Revert "[PowerPC][ELFv2ABI] Allocate parameter area on-demand to ↵Tim Shen2017-03-081-5/+43
| | | | | | | | | | | | | reduce stack frame size"" After inspection, it's an UB in our code base. Someone cast a var-arg function pointer to a non-var-arg one. :/ Re-commit r296771 to continue testing on the patch. Sorry for the trouble! llvm-svn: 297256
* Revert "[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce ↵Tim Shen2017-03-071-43/+5
| | | | | | | | | | | stack frame size" This reverts commit r296771. We found some wide spread test failures internally. I'm working on a testcase. Politely revert the patch in the mean time. :) llvm-svn: 297124
* [PowerPC] Fix failure with STBRX when store is narrower than the bswapNemanja Ivanovic2017-03-061-2/+5
| | | | | | | | | | | Fixes a crash caused by r296811 by truncating the input of the STBRX node when the bswap is wider than i32. Fixes https://bugs.llvm.org/show_bug.cgi?id=32140 Differential Revision: https://reviews.llvm.org/D30615 llvm-svn: 297001
* [PPC] Fix code generation for bswap(int32) followed by store16Guozhi Wei2017-03-021-2/+10
| | | | | | | | | | | | | | | | | | | This patch fixes pr32063. Current code in PPCTargetLowering::PerformDAGCombine can transform bswap store into a single PPCISD::STBRX instruction. but it doesn't consider the case that the operand size of bswap may be larger than store size. When it occurs, we need 2 modifications, 1 For the last operand of PPCISD::STBRX, we should not use DAG.getValueType(N->getOperand(1).getValueType()), instead we should use cast<StoreSDNode>(N)->getMemoryVT(). 2 Before PPCISD::STBRX, we need to shift the original operand of bswap to the right side. Differential Revision: https://reviews.llvm.org/D30362 llvm-svn: 296811
* [PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame sizeNemanja Ivanovic2017-03-021-5/+43
| | | | | | | | | | | | | | This patch reduces the stack frame size by not allocating the parameter area if it is not required. In the current implementation LowerFormalArguments_64SVR4 already handles the parameter area, but LowerCall_64SVR4 does not (when calculating the stack frame size). What this patch does is make LowerCall_64SVR4 consistent with LowerFormalArguments_64SVR4. Committing on behalf of Hiroshi Inoue. Differential Revision: https://reviews.llvm.org/D29881 llvm-svn: 296771
* [PowerPC] Expand ISEL instruction into if-then-else sequence.Tony Jiang2017-01-161-3/+2
| | | | | | | | | Generally, the ISEL is expanded into if-then-else sequence, in some cases (like when the destination register is the same with the true or false value register), it may just be expanded into just the if or else sequence. llvm-svn: 292154
* Revert "[PowerPC] Expand ISEL instruction into if-then-else sequence."Tony Jiang2017-01-161-2/+3
| | | | | | This reverts commit 1d0e0374438ca6e153844c683826ba9b82486bb1. llvm-svn: 292131
* [PowerPC] Expand ISEL instruction into if-then-else sequence.Tony Jiang2017-01-161-3/+2
| | | | | | | | | Generally, the ISEL is expanded into if-then-else sequence, in some cases (like when the destination register is the same with the true or false value register), it may just be expanded into just the if or else sequence. llvm-svn: 292128
* [PowerPC] Fix some Clang-tidy modernize and Include What You Use warnings; ↵Eugene Zelenko2017-01-131-112/+233
| | | | | | other minor fixes (NFC). llvm-svn: 291872
* [PowerPC] Fix logic dealing with nop after calls (and tail-call eligibility)Hal Finkel2017-01-041-40/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change aims to unify and correct our logic for when we need to allow for the possibility of the linker adding a TOC restoration instruction after a call. This comes up in two contexts: 1. When determining tail-call eligibility. If we make a tail call (i.e. directly branch to a function) then there is no place for the linker to add a TOC restoration. 2. When determining when we need to add a nop instruction after a call. Likewise, if there is a possibility that the linker might need to add a TOC restoration after a call, then we need to put a nop after the call (the bl instruction). First problem: We were using similar, but different, logic to decide (1) and (2). This is just wrong. Both the resideInSameModule function (used when determining tail-call eligibility) and the isLocalCall function (used when deciding if the post-call nop is needed) were supposed to be determining the same underlying fact (i.e. might a TOC restoration be needed after the call). The same logic should be used in both places. Second problem: The logic in both places was wrong. We only know that two functions will share the same TOC when both functions come from the same section of the same object. Otherwise the linker might cause the functions to use different TOC base addresses (unless the multi-TOC linker option is disabled, in which case only shared-library boundaries are relevant). There are a number of factors that can cause functions to be placed in different sections or come from different objects (-ffunction-sections, explicitly-specified section names, COMDAT, weak linkage, etc.). All of these need to be checked. The existing logic only checked properties of the callee, but the properties of the caller must also be checked (for example, calling from a function in a COMDAT section means calling between sections). There was a conceptual error in the resideInSameModule function in that it allowed tail calls to functions with weak linkage and protected/hidden visibility. While protected/hidden visibility does prevent the function implementation from being replaced at runtime (via interposition), it does not prevent the linker from using an alternate implementation at link time (i.e. using some strong definition to replace the provided weak one during linking). If this happens, then we're still potentially looking at a required TOC restoration upon return. Otherwise, in general, the post-call nop is needed wherever ELF interposition needs to be supported. We don't currently support ELF interposition at the IR level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html for more information), and I don't think we should try to make it appear to work in the backend in spite of that fact. Unfortunately, because of the way that the ABI works, we need to generate code as if we supported interposition whenever the linker might insert stubs for the purpose of supporting it. Differential Revision: https://reviews.llvm.org/D27231 llvm-svn: 291003
* Revert r289638: [PowerPC] Fix logic dealing with nop after calls (and ↵Chandler Carruth2016-12-161-25/+40
| | | | | | | | | | | | | tail-call eligibility) This patch appears to result in trampolines in vtables being miscompiled when they in turn tail call a method. I've posted some preliminary details about the failure on the thread for this commit and talked to Hal. He was comfortable going ahead and reverting until we sort out what is wrong. llvm-svn: 289928
* [PowerPC] Fix logic dealing with nop after calls (and tail-call eligibility)Hal Finkel2016-12-141-40/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change aims to unify and correct our logic for when we need to allow for the possibility of the linker adding a TOC restoration instruction after a call. This comes up in two contexts: 1. When determining tail-call eligibility. If we make a tail call (i.e. directly branch to a function) then there is no place for the linker to add a TOC restoration. 2. When determining when we need to add a nop instruction after a call. Likewise, if there is a possibility that the linker might need to add a TOC restoration after a call, then we need to put a nop after the call (the bl instruction). First problem: We were using similar, but different, logic to decide (1) and (2). This is just wrong. Both the resideInSameModule function (used when determining tail-call eligibility) and the isLocalCall function (used when deciding if the post-call nop is needed) were supposed to be determining the same underlying fact (i.e. might a TOC restoration be needed after the call). The same logic should be used in both places. Second problem: The logic in both places was wrong. We only know that two functions will share the same TOC when both functions come from the same section of the same object. Otherwise the linker might cause the functions to use different TOC base addresses (unless the multi-TOC linker option is disabled, in which case only shared-library boundaries are relevant). There are a number of factors that can cause functions to be placed in different sections or come from different objects (-ffunction-sections, explicitly-specified section names, COMDAT, weak linkage, etc.). All of these need to be checked. The existing logic only checked properties of the callee, but the properties of the caller must also be checked (for example, calling from a function in a COMDAT section means calling between sections). There was a conceptual error in the resideInSameModule function in that it allowed tail calls to functions with weak linkage and protected/hidden visibility. While protected/hidden visibility does prevent the function implementation from being replaced at runtime (via interposition), it does not prevent the linker from using an alternate implementation at link time (i.e. using some strong definition to replace the provided weak one during linking). If this happens, then we're still potentially looking at a required TOC restoration upon return. Otherwise, in general, the post-call nop is needed wherever ELF interposition needs to be supported. We don't currently support ELF interposition at the IR level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html for more information), and I don't think we should try to make it appear to work in the backend in spite of that fact. This will yield subtle bugs if interposition is attempted. As a result, regardless of whether we're in PIC mode, we don't assume that we need to add the nop to support the possibility of ELF interposition. However, the necessary check is in place (i.e. calling GV->isInterposable and TM.shouldAssumeDSOLocal) so when we have functions for which interposition is allowed at the IR level, we'll add the nop as necessary. In the mean time, we'll generate more tail calls and fewer nops when compiling position-independent code. Differential Revision: https://reviews.llvm.org/D27231 llvm-svn: 289638
* [PPC] Prefer direct move on power8 if load 1 or 2 bytes to VSRGuozhi Wei2016-12-121-1/+7
| | | | | | | | | Power8 has MTVSRWZ but no LXSIBZX/LXSIHZX, so move 1 or 2 bytes to VSR through MTVSRWZ is much faster than store the extended value into stack and load it with LXSIWZX. This patch fixes pr31144. Differential Revision: https://reviews.llvm.org/D27287 llvm-svn: 289473
* [PowerPC] Improvements for BUILD_VECTOR Vol. 2Nemanja Ivanovic2016-11-291-4/+98
| | | | | | | | | | This patch corresponds to review: https://reviews.llvm.org/D26023 This patch adds support for converting a vector of loads into a single load if the loads are consecutive (in either direction). llvm-svn: 288219
* [PowerPC] Improvements for BUILD_VECTOR Vol. 2Nemanja Ivanovic2016-11-291-1/+94
| | | | | | | | | | | | | | | This patch corresponds to review: https://reviews.llvm.org/D25980 This is the 2nd patch in a series of 4 that improve the lowering and combining for BUILD_VECTOR nodes on PowerPC. This particular patch combines a build vector of fp-to-int conversions into an fp-to-int conversion of a build vector of fp values. For example: Converts (build_vector (fp_to_[su]i $A), (fp_to_[su]i $B), ...) Into (fp_to_[su]i (build_vector $A, $B, ...))). Which is a natural match for much cleaner code. llvm-svn: 288218
* Revert https://reviews.llvm.org/rL287679Nemanja Ivanovic2016-11-291-15/+0
| | | | | | | This commit caused some miscompiles that did not show up on any of the bots. Reverting until we can investigate the cause of those failures. llvm-svn: 288214
* [PowerPC] Improvements for BUILD_VECTOR Vol. 1Nemanja Ivanovic2016-11-291-23/+70
| | | | | | | | | | This patch corresponds to review: https://reviews.llvm.org/D25912 This is the first patch in a series of 4 that improve the lowering and combining for BUILD_VECTOR nodes on PowerPC. llvm-svn: 288152
* [PowerPC] Emit VMX loads/stores for aligned ops to avoid adding swaps on LENemanja Ivanovic2016-11-221-0/+15
| | | | | | | | | | | This patch corresponds to review: https://reviews.llvm.org/D26861 It also fixes PR30730. Committing on behalf of Lei Huang. llvm-svn: 287679
* [PPC][DAGCombine] Convert SETCC to subtract when the result is zero extendedEhsan Amiri2016-11-181-1/+83
| | | | | | | | | | | | | | | | | When we see a SETCC whose only users are zero extend operations, we can replace it with a subtraction. This results in doing all calculations in GPRs and avoids CR use. Currently we do this only for ULT, ULE, UGT and UGE condition codes. There are ways that this can be extended. For example for signed condition codes. In that case we will be introducing additional sign extend instructions, so more careful profitability analysis may be required. Another direction to extend this is for equal, not equal conditions. Also when users of SETCC are any_ext or sign_ext, we might be able to do something similar. llvm-svn: 287329
* Always use relative jump table encodings on PowerPC64.Joerg Sonnenberger2016-11-161-0/+50
| | | | | | | | | | | | | | | | | For the default, small and medium code model, use the existing difference from the jump table towards the label. For all other code models, setup the picbase and use the difference between the picbase and the block address. Overall, this results in smaller data tables at the expensive of one or two more arithmetic operation at the jump site. Given that we only create jump tables with a lot more than two entries, it is a net win in size. For larger code models the assumption remains that individual functions are no larger than 2GB. Differential Revision: https://reviews.llvm.org/D26336 llvm-svn: 287059
* [PowerPC] Implement BE VSX load/store builtins - llvm portion.Tony Jiang2016-11-151-0/+8
| | | | | | | | | | | | This patch implements all the overloads for vec_xl_be and vec_xst_be. On BE, they behaves exactly the same with vec_xl and vec_xst, therefore they are simply implemented by defining a matching macro. On LE, they are implemented by defining new builtins and intrinsics. For int/float/long long/double, it is just a load (lxvw4x/lxvd2x) or store(stxvw4x/stxvd2x). For char/char/short, we also need some extra shuffling before or after call the builtins to get the desired BE order. For int128, simply call vec_xl or vec_xst. llvm-svn: 286967
* [DAG Combiner] Fix the native computation of the Newton series for reciprocalsEvandro Menezes2016-11-101-3/+4
| | | | | | | | | | | | The generic infrastructure to compute the Newton series for reciprocal and reciprocal square root was conceived to allow a target to compute the series itself. However, the original code did not properly consider this condition if returned by a target. This patch addresses the issues to allow a target to compute the series on its own. Differential revision: https://reviews.llvm.org/D22975 llvm-svn: 286523
* [PPC] Generate positive FP zero using xor insn instead of loading from ↵Ehsan Amiri2016-10-241-0/+21
| | | | | | | | | | | constant area https://reviews.llvm.org/D23614 Currently we load +0.0 from constant area. That can change to be generated using XOR instruction. llvm-svn: 284995
* [Target] remove TargetRecip class; 2nd trySanjay Patel2016-10-201-46/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs caused by faulty usage of StringRef. This version also renames a pair of functions: getRecipEstimateDivEnabled() getRecipEstimateSqrtEnabled() as suggested by Eric Christopher. original commit msg: [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes rather than TargetOptions. This patch is intended to be a structural, but not functional change. By moving all of the TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate state, shield the callers from the string format implementation, and simplify/localize the logic needed for a target to enable this. If a function has a "reciprocal-estimates" attribute, those settings may override the target's default reciprocal preferences for whatever operation and data type we're trying to optimize. If there's no attribute string or specific setting for the op/type pair, just use the target default settings. As noted earlier, a better solution would be to move the reciprocal estimate settings to IR instructions and SDNodes rather than function attributes, but that's a multi-step job that requires infrastructure improvements. I intend to work on that, but it's not clear how long it will take to get all the pieces in place. Differential Revision: https://reviews.llvm.org/D25440 llvm-svn: 284746
* revert r284495: [Target] remove TargetRecip classSanjay Patel2016-10-181-19/+46
| | | | | | There's something wrong with the StringRef usage while parsing the attribute string. llvm-svn: 284513
* [Target] remove TargetRecip class; move reciprocal estimate isel ↵Sanjay Patel2016-10-181-46/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | functionality to TargetLowering This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes rather than TargetOptions. This patch is intended to be a structural, but not functional change. By moving all of the TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate state, shield the callers from the string format implementation, and simplify/localize the logic needed for a target to enable this. If a function has a "reciprocal-estimates" attribute, those settings may override the target's default reciprocal preferences for whatever operation and data type we're trying to optimize. If there's no attribute string or specific setting for the op/type pair, just use the target default settings. As noted earlier, a better solution would be to move the reciprocal estimate settings to IR instructions and SDNodes rather than function attributes, but that's a multi-step job that requires infrastructure improvements. I intend to work on that, but it's not clear how long it will take to get all the pieces in place. Differential Revision: https://reviews.llvm.org/D25440 llvm-svn: 284495
* [Target] move reciprocal estimate settings from TargetOptions to TargetLoweringSanjay Patel2016-10-041-2/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | The motivation for the change is that we can't have pseudo-global settings for codegen living in TargetOptions because that doesn't work with LTO. Ideally, these reciprocal attributes will be moved to the instruction-level via FMF, metadata, or something else. But making them function attributes is at least an improvement over the current state. The ingredients of this patch are: Remove the reciprocal estimate command-line debug option. Add TargetRecip to TargetLowering. Remove TargetRecip from TargetOptions. Clean up the TargetRecip implementation to work with this new scheme. Set the default reciprocal settings in TargetLoweringBase (everything is off). Update the PowerPC defaults, users, and tests. Update the x86 defaults, users, and tests. Note that if this patch needs to be reverted, the related clang patch checked in at r283251 should be reverted too. Differential Revision: https://reviews.llvm.org/D24816 llvm-svn: 283252
* [Power9] Part-word VSX integer scalar loads/stores and sign extend instructionsNemanja Ivanovic2016-10-041-38/+69
| | | | | | | | | | | | | | | | | | This patch corresponds to review: https://reviews.llvm.org/D23155 This patch removes the VSHRC register class (based on D20310) and adds exploitation of the Power9 sub-word integer loads into VSX registers as well as vector sign extensions. The new instructions are useful for a few purposes: Int to Fp conversions of 1 or 2-byte values loaded from memory Building vectors of 1 or 2-byte integers with values loaded from memory Storing individual 1 or 2-byte elements from integer vectors This patch implements all of those uses. llvm-svn: 283190
* [PowerPC] Refactor soft-float support, and enable PPC64 soft floatHal Finkel2016-10-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | | This change enables soft-float for PowerPC64, and also makes soft-float disable all vector instruction sets for both 32-bit and 64-bit modes. This latter part is necessary because the PPC backend canonicalizes many Altivec vector types to floating-point types, and so soft-float breaks scalarization support for many operations. Both for embedded targets and for operating-system kernels desiring soft-float support, it seems reasonable that disabling hardware floating-point also disables vector instructions (embedded targets without hardware floating point support are unlikely to have Altivec, etc. and operating system kernels desiring not to use floating-point registers to lower syscall cost are unlikely to want to use vector registers either). If someone needs this to work, we'll need to change the fact that we promote many Altivec operations to act on v4f32. To make it possible to disable Altivec when soft-float is enabled, hardware floating-point support needs to be expressed as a positive feature, like the others, and not a negative feature, because target features cannot have dependencies on the disabling of some other feature. So +soft-float has now become -hard-float. Fixes PR26970. llvm-svn: 283060
* [Power9] Builtins for ELF v.2 API conformance - back end portionNemanja Ivanovic2016-09-271-4/+56
| | | | | | | | | | | | | | | | This patch corresponds to review: https://reviews.llvm.org/D24396 This patch adds support for the "vector count trailing zeroes", "vector compare not equal" and "vector compare not equal or zero instructions" as well as "scalar count trailing zeroes" instructions. It also changes the vector negation to use XXLNOR (when VSX is enabled) so as not to increase register pressure (previously this was done with a splat immediate of all ones followed by an XXLXOR). This was done because the altivec.h builtins (patch to follow) use vector negation and the use of an additional register for the splat immediate is not optimal. llvm-svn: 282478
* [Power9] Exploit move and splat instructions for build_vector improvementNemanja Ivanovic2016-09-231-1/+39
| | | | | | | | | | | | | | | This patch corresponds to review: https://reviews.llvm.org/D21135 This patch exploits the following instructions: mtvsrws lxvwsx mtvsrdd mfvsrld In order to improve some build_vector and extractelement patterns. llvm-svn: 282246
* [PowerPC] Sign extend sub-word values for atomic comparisonsNemanja Ivanovic2016-09-221-2/+11
| | | | | | | | Atomic comparison instructions use the sub-word load instruction on Power8 and up but the value is not sign extended prior to the signed word compare instruction. This patch adds that sign extension. llvm-svn: 282182
* [Power9] Add exploitation of non-permuting memory opsNemanja Ivanovic2016-09-221-4/+8
| | | | | | | | | | | | This patch corresponds to review: https://reviews.llvm.org/D19825 The new lxvx/stxvx instructions do not require the swaps to line the elements up correctly. In order to select them over the lxvd2x/lxvw4x instructions which require swaps, the patterns for the old instruction have a predicate that ensures they won't be selected on Power9 and newer CPUs. llvm-svn: 282143
* getValueType().getSizeInBits() -> getValueSizeInBits() ; NFCISanjay Patel2016-09-141-1/+1
| | | | llvm-svn: 281493
* getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits() ; NFCISanjay Patel2016-09-141-5/+3
| | | | llvm-svn: 281490
* getScalarType().getSizeInBits() -> getScalarSizeInBits() ; NFCISanjay Patel2016-09-141-1/+1
| | | | llvm-svn: 281489
* Fix code-gen crash on Power9 for insert_vector_elt with variable index (PR30189)Nemanja Ivanovic2016-09-141-2/+15
| | | | | | | | | | | This patch corresponds to review: https://reviews.llvm.org/D24021 In the initial implementation of this instruction, I forgot to account for variable indices. This patch fixes PR30189 and should probably be merged into 3.9.1 (I'll open a bug according to the new instructions). llvm-svn: 281479
* [CodeGen] Split out the notions of MI invariance and MI dereferenceability.Justin Lebar2016-09-111-10/+6
| | | | | | | | | | | | | | | | | | | Summary: An IR load can be invariant, dereferenceable, neither, or both. But currently, MI's notion of invariance is IR-invariant && IR-dereferenceable. This patch splits up the notions of invariance and dereferenceability at the MI level. It's NFC, so adds some probably-unnecessary "is-dereferenceable" checks, which we can remove later if desired. Reviewers: chandlerc, tstellarAMD Subscribers: jholewinski, arsenm, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D23371 llvm-svn: 281151
* Add ISD::EH_DWARF_CFA, simplify @llvm.eh.dwarf.cfa on Mips, fix on PowerPCHal Finkel2016-09-011-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible __builtin_dwarf_cfa() builtin. As pointed out in PR26761, this is currently broken on PowerPC (and likely on ARM as well). Currently, @llvm.eh.dwarf.cfa is lowered using: ADD(FRAMEADDR, FRAME_TO_ARGS_OFFSET) where FRAME_TO_ARGS_OFFSET defaults to the constant zero. On x86, FRAME_TO_ARGS_OFFSET is lowered to 2*SlotSize. This setup, however, does not work for PowerPC. Because of the way that the stack layout works, the canonical frame address is not exactly (FRAMEADDR + FRAME_TO_ARGS_OFFSET) on PowerPC (there is a lower save-area offset as well), so it is not just a matter of implementing FRAME_TO_ARGS_OFFSET for PowerPC (unless we redefine its semantics -- We can do that, since it is currently used only for @llvm.eh.dwarf.cfa lowering, but the better to directly lower the CFA construct itself (since it can be easily represented as a fixed-offset FrameIndex)). Mips currently does this, but by using a custom lowering for ADD that specifically recognizes the (FRAMEADDR, FRAME_TO_ARGS_OFFSET) pattern. This change introduces a ISD::EH_DWARF_CFA node, which by default expands using the existing logic, but can be directly lowered by the target. Mips is updated to use this method (which simplifies its implementation, and I suspect makes it more robust), and updates PowerPC to do the same. Fixes PR26761. Differential Revision: https://reviews.llvm.org/D24038 llvm-svn: 280350
* [PowerPC] Add support for -mlongcallHal Finkel2016-08-301-1/+10
| | | | | | | | | | | The "long call" option forces the use of the indirect calling sequence for all calls (even those that don't really need it). GCC provides this option; This is helpful, under certain circumstances, for building very-large binaries, and some other specialized use cases. Fixes PR19098. llvm-svn: 280040
* [PowerPC] Fix i8/i16 atomics for little-Endian targets without partword atomicsHal Finkel2016-08-291-6/+12
| | | | | | | | | | | | | | For little-Endian PowerPC, we generally target only P8 and later by default. However, generic (older) 64-bit configurations are still an option, and in that case, partword atomics are not available (e.g. stbcx.). To lower i8/i16 atomics without true i8/i16 atomic operations, we emulate using i32 atomics in combination with a bunch of shifting and masking, etc. The amount by which to shift in little-Endian mode is different from the amount in big-Endian mode (it is inverted -- meaning we can leave off the xor when computing the amount). Fixes PR22923. llvm-svn: 280022
* [PowerPC] Implement lowering for atomicrmw min/max/umin/umaxHal Finkel2016-08-281-3/+98
| | | | | | Implement lowering for atomicrmw min/max/umin/umax. Fixes PR28818. llvm-svn: 279933
* Replace "fallthrough" comments with LLVM_FALLTHROUGHJustin Bogner2016-08-171-1/+1
| | | | | | | This is a mechanical change of comments in switches like fallthrough, fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead. llvm-svn: 278902
* [ppc64] Don't apply sibling call optimization if callee has any byval argChuang-Yu Cheng2016-08-171-1/+8
| | | | | | | | | | | | | | | This is a quick work around, because in some cases, e.g. caller's stack size > callee's stack size, we are still able to apply sibling call optimization even callee has any byval arg. This patch fix: https://llvm.org/bugs/show_bug.cgi?id=28328 Reviewers: hfinkel kbarton nemanjai amehsan Subscribers: hans, tjablin https://reviews.llvm.org/D23441 llvm-svn: 278900
* [x86] Refactor a PowerPC specific ctlz/srl transformation (NFC).Pierre Gousseau2016-08-161-13/+3
| | | | | | | | Following the discussion on D22038, this refactors a PowerPC specific setcc -> srl(ctlz) transformation so it can be used by other targets. Differential Revision: https://reviews.llvm.org/D23445 llvm-svn: 278799
* Use range algorithms instead of unpacking begin/endDavid Majnemer2016-08-111-2/+1
| | | | | | No functionality change is intended. llvm-svn: 278417
* [PowerPC] fix passing long double arguments to function (soft-float)Strahinja Petrovic2016-08-051-0/+26
| | | | | | | | | | This patch fixes passing long double type arguments to function in soft float mode. If there is less than 4 argument registers free (long double type is mapped in 4 gpr registers in soft float mode) long double type argument must be passed through stack. Differential Revision: https://reviews.llvm.org/D20114. llvm-svn: 277804
* MachineFunction: Return reference for getFrameInfo(); NFCMatthias Braun2016-07-281-52/+52
| | | | | | | getFrameInfo() never returns nullptr so we should use a reference instead of a pointer. llvm-svn: 277017
* TargetInstrInfo: rename GetInstSizeInBytes to getInstSizeInBytes. NFCSjoerd Meijer2016-07-281-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D22925 llvm-svn: 276997
* [SelectionDAG] Get rid of bool parameters in SelectionDAG::getLoad, ↵Justin Lebar2016-07-151-225/+171
| | | | | | | | | | | | | | | | | | | | | | | getStore, and friends. Summary: Instead, we take a single flags arg (a bitset). Also add a default 0 alignment, and change the order of arguments so the alignment comes before the flags. This greatly simplifies many callsites, and fixes a bug in AMDGPUISelLowering, wherein the order of the args to getLoad was inverted. It also greatly simplifies the process of adding another flag to getLoad. Reviewers: chandlerc, tstellarAMD Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits Differential Revision: http://reviews.llvm.org/D22249 llvm-svn: 275592
OpenPOWER on IntegriCloud