path: root/llvm/lib
Commit message (Author, Age, Files, Lines)
...
* [X86] Support SHLD/SHRD masked shift-counts (PR34641) (Simon Pilgrim, 2019-01-02, 1 file, -8/+29)

  Peek through shift modulo masks while matching double shift patterns.

  I was hoping to delay this until I could remove the X86 code with generic funnel shift matching (PR40081), but this will do for now.

  Differential Revision: https://reviews.llvm.org/D56199

  llvm-svn: 350222
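  A rough source-level sketch of the double-shift-with-masked-count pattern described above (the function and names are invented, not taken from the patch):

  ```cpp
  #include <cstdint>

  // SHLD-style double shift: shift Hi left and fill the vacated low bits
  // from Lo. The explicit "& 63" modulo masks mirror x86's hardware count
  // masking; this change lets the matcher peek through them and still
  // select a single SHLD. (For C a multiple of 64 this expression yields
  // Hi | Lo rather than SHLD's result, so read it as the C % 64 != 0 case.)
  uint64_t DoubleShiftLeft(uint64_t Hi, uint64_t Lo, unsigned C) {
    return (Hi << (C & 63)) | (Lo >> ((64 - C) & 63));
  }
  ```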
* [BasicAA] Support arbitrary pointer sizes (and fix an overflow bug) (Hal Finkel, 2019-01-02, 2 files, -49/+104)

  Motivated by the discussion in D38499, this patch updates BasicAA to support arbitrary pointer sizes by switching most remaining non-APInt calculations to use APInt. The size of these APInts is set to the maximum pointer size (maximum over all address spaces described by the data layout string).

  Most of this translation is straightforward, but this patch contains a fix for a bug that revealed itself during this translation process. In order for test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit pointers, the intermediate calculations must be performed using 64-bit integers. This is because, as noted in the patch, when GetLinearExpression decomposes an expression into C1*V+C2, and we then multiply this by Scale, and distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that, even though C1*V+C2 does not overflow for relevant values of V, (C2*Scale) can overflow. If this happens, later logic will draw invalid conclusions from the (base) offset value. Thus, when initially applying the APInt conversion, because the maximum pointer size in this test is 32 bits, it started failing. Suspicious, I created a 64-bit version of this test (included here), and that failed (miscompiled) on trunk for a similar reason (the multiplication can overflow).

  After fixing this overflow bug, the first test case (at least) in Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and was relying on having 64-bit intermediate values to have BasicAA return an accurate result. In order to fix this problem, and because I believe that it is not uncommon to use i64 indexing expressions in 32-bit code (especially portable code using int64_t), it seems reasonable to always use at least 64-bit integers. In this way, we won't regress our analysis capabilities (and there's a command-line option added, so experimenting with this should be easy).

  As pointed out by Eli during the review, there are other potential overflow conditions that this patch does not address. Fixing those is left to follow-up work.

  Patch by me with contributions from Michael Ferguson (mferguson@cray.com).

  Differential Revision: https://reviews.llvm.org/D38662

  llvm-svn: 350220
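  A minimal numeric sketch of the overflow mode described above; the names C2 and Scale follow the commit message, and the concrete values are invented:

  ```cpp
  #include <cstdint>
  #include <cstdio>

  // When GetLinearExpression yields C1*V + C2 and the analysis distributes a
  // multiply by Scale, C2 * Scale can wrap at 32 bits even though C1*V + C2
  // itself never overflows for the relevant values of V.
  int main() {
    uint32_t C2 = 0x40000000u; // constant part of the decomposition
    uint32_t Scale = 4u;       // scale factor from the enclosing expression
    uint32_t At32 = C2 * Scale;                     // wraps to 0
    uint64_t At64 = uint64_t(C2) * uint64_t(Scale); // 0x100000000
    printf("32-bit: %u, 64-bit: %llu\n", At32, (unsigned long long)At64);
    return 0; // later alias logic fed the wrapped 32-bit offset would be wrong
  }
  ```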
* Extend Module::getOrInsertGlobal to control the construction of the GlobalVariable (Philip Pfaffe, 2019-01-02, 1 file, -8/+14)

  Summary: Extend Module::getOrInsertGlobal to accept a callback for creating a new GlobalVariable if necessary, instead of calling the GV constructor directly using default arguments. Additionally, overload getOrInsertGlobal for the previous default behavior.

  Reviewers: chandlerc

  Subscribers: hiraditya, llvm-commits, bollu

  Differential Revision: https://reviews.llvm.org/D56130

  llvm-svn: 350219
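  A sketch of how the callback overload can be used, assuming the signature described in the summary; the global's name, type, and linkage here are illustrative:

  ```cpp
  #include "llvm/IR/Constants.h"
  #include "llvm/IR/GlobalVariable.h"
  #include "llvm/IR/Module.h"

  using namespace llvm;

  // Get or create @my_counter, constructing it with the desired linkage and
  // initializer instead of the GlobalVariable constructor's defaults.
  Constant *getCounter(Module &M) {
    Type *I64 = Type::getInt64Ty(M.getContext());
    return M.getOrInsertGlobal("my_counter", I64, [&] {
      return new GlobalVariable(M, I64, /*isConstant=*/false,
                                GlobalValue::InternalLinkage,
                                ConstantInt::get(I64, 0), "my_counter");
    });
  }
  ```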
* [MCA] Minor refactoring of method DefaultResourceStrategy::select. NFCI (Andrea Di Biagio, 2019-01-02, 1 file, -18/+21)

  Common code used by the default resource strategy to select pipeline resources has been moved to a helper function.

  The new selection logic has been slightly rewritten to get rid of a redundant zero check on the `ReadyMask` value. Before this patch, method select internally called function `PowerOf2Floor` to compute the next ready pipeline resource. However, `PowerOf2Floor` forces an implicit (redundant) zero check on the input value. By construction, `ReadyMask` can never be zero. This patch replaces the call to `PowerOf2Floor` with an equivalent block of code that avoids the redundant zero check. This gives a minor 3-3.5% speedup on a release build.

  No functional change intended.

  llvm-svn: 350218
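  The replacement block isn't reproduced in the log, but a zero-check-free floor-power-of-two has this general shape (a sketch under the stated non-zero precondition, not the committed code):

  ```cpp
  #include <cstdint>

  // Floor power of two for a mask known to be non-zero: take the index of
  // the highest set bit directly, with no Value == 0 guard like the one
  // inside llvm::PowerOf2Floor.
  inline uint64_t PowerOf2FloorNonZero(uint64_t Mask) {
    // Precondition: Mask != 0 (true for ReadyMask by construction).
    return uint64_t(1) << (63 - __builtin_clzll(Mask));
  }
  ```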
* [AMDGPU] Handle OR as operand of raw load/store (Piotr Sobczak, 2019-01-02, 1 file, -4/+6)

  Summary: Use isBaseWithConstantOffset(), which handles OR as an operand to llvm.amdgcn.raw.buffer.load and llvm.amdgcn.raw.buffer.store.

  Change-Id: Ifefb9dc5ded8710d333df07ab1900b230e33539a

  Reviewers: nhaehnle, mareko, arsenm

  Reviewed By: arsenm

  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

  Differential Revision: https://reviews.llvm.org/D55999

  llvm-svn: 350208
* [X86] Remove the separate SMUL8/UMUL8 X86ISD opcodes by merging with SMUL/UMUL. Remove the second result from X86ISD::UMUL. (Craig Topper, 2019-01-02, 3 files, -47/+25)

  All of these use custom isel, so we can pretty easily detect the differences in the custom code in X86ISelDAGToDAG. The ISD opcodes just need to express the desired semantics, not the details of how they would be selected by isel. So unifying them lets us remove the special casing from lowering.

  llvm-svn: 350206
* [X86] Allow LowerSELECT and LowerBRCOND to directly lower i8 UMULO/SMULO. (Craig Topper, 2019-01-02, 1 file, -4/+2)

  These require a different X86ISD node to be created than i16/i32/i64. I guess no one wanted to add the special code for that except in LowerXALUO. But now LowerXALUO, LowerSELECT, and LowerBRCOND all use a common helper function, so they all share the special code.

  Unfortunately, there are no test changes because we seem to correct the miss in a DAG combine later. I did verify it manually using test cases from xmulo.ll.

  llvm-svn: 350205
* [InstCombine] canonicalize raw IR rotate patterns to funnel shift (Sanjay Patel, 2019-01-01, 1 file, -13/+8)

  The final piece of IR-level analysis to allow this was committed with rL350188.

  Using the intrinsics should improve transforms based on cost models like vectorization and inlining. The backend should be prepared too, so we can now canonicalize more sequences of shift/logic to the intrinsics and know that the end result should be equal or better to the original code even if the target does not have an actual rotate instruction.

  llvm-svn: 350199
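  For reference, the UB-safe source idiom whose shift/or expansion InstCombine now canonicalizes to @llvm.fshl (a sketch, not a test from the patch):

  ```cpp
  #include <cstdint>

  // Rotate-left written without undefined behavior at N == 0: both shift
  // amounts are reduced modulo 32, and the N == 0 case folds to X | X == X.
  uint32_t RotateLeft(uint32_t X, uint32_t N) {
    return (X << (N & 31)) | (X >> ((32 - N) & 31));
  }
  ```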
* [X86] Factor the core code out of LowerXALUO into a helper function. Use it in LowerBRCOND and LowerSELECT to avoid some duplicated code. (Craig Topper, 2019-01-01, 1 file, -138/+95)

  This makes it easier to keep the LowerBRCOND and LowerSELECT code in sync with LowerXALUO, so they always pick the same operation for overflowing instructions.

  This is inspired by the helper functions used by ARM and AArch64 for the same purpose.

  The test change is because LowerSELECT was not in sync with LowerXALUO with regard to INC/DEC for SADDO/SSUBO.

  llvm-svn: 350198
* [LLVM-C] bool -> LLVMBool (Robert Widmann, 2019-01-01, 1 file, -2/+2)

  llvm-svn: 350197
* [LLVM-C] Add Accessors for Discarding Value Names in the IR (Robert Widmann, 2019-01-01, 1 file, -0/+8)

  Summary: Add accessors so the performance improvement from this setting is accessible to third parties.

  Reviewers: whitequark, deadalnix

  Reviewed By: whitequark

  Subscribers: llvm-commits

  Differential Revision: https://reviews.llvm.org/D56179

  llvm-svn: 350196
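  A usage sketch, assuming the accessors added by D56179 follow the usual LLVMContext naming (LLVMContextSetDiscardValueNames / LLVMContextShouldDiscardValueNames):

  ```cpp
  #include "llvm-c/Core.h"

  // Discarding local value names avoids string allocation and churn when
  // building IR, a common release-pipeline configuration.
  int main() {
    LLVMContextRef Ctx = LLVMContextCreate();
    if (!LLVMContextShouldDiscardValueNames(Ctx))
      LLVMContextSetDiscardValueNames(Ctx, 1);
    LLVMContextDispose(Ctx);
    return 0;
  }
  ```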
* [x86] move/rename helper for horizontal op codegen; NFC (Sanjay Patel, 2019-01-01, 1 file, -16/+16)

  Preliminary commit as suggested in D56011.

  llvm-svn: 350193
* Reapply "[BDCE][DemandedBits] Detect dead uses of undead instructions" (Nikita Popov, 2019-01-01, 2 files, -17/+59)

  This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.

  BDCE currently detects instructions that don't have any demanded bits and replaces their uses with zero. However, if an instruction has multiple uses, then some of the uses may be dead (have no demanded bits) even though the instruction itself is still live.

  This patch extends DemandedBits/BDCE to detect such uses and replace them with zero. While this will not immediately render any instructions dead, it may lead to simplifications (in the motivating case, by converting a rotate into a simple shift), break dependencies, etc.

  The implementation tries to strike a balance between analysis power and complexity/memory usage. Originally I wanted to track demanded bits on a per-use level, but ultimately we're only really interested in whether a use is entirely dead or not. I'm using an extra set to track which uses are dead. However, as initially all uses are dead, I'm not storing uses whose user is also dead. This case is checked separately instead.

  The previous attempt to land this led to miscompiles, because cases where uses were initially dead but were later found to be live during further analysis were not always correctly removed from the DeadUses set. This is fixed now and the added test case demonstrates such an instance.

  Differential Revision: https://reviews.llvm.org/D55563

  llvm-svn: 350188
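  A source-level sketch of the motivating case (an invented function with the rotate shape mentioned above): the rotate's operand is live, yet one of its uses demands no bits, so replacing that use with zero collapses the rotate into a plain shift:

  ```cpp
  #include <cstdint>

  uint32_t LowHalfOfRotate(uint32_t X) {
    // Both halves of the rotate use X ...
    uint32_t Rot = (X << 16) | (X >> 16);
    // ... but only Rot's low 16 bits are demanded, so the (X << 16) use of X
    // carries no demanded bits: replacing it with zero leaves X >> 16.
    return Rot & 0xFFFF;
  }
  ```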
* Reverting the commit in revision 350186; the revision causes regressions in 4 tests. (Ayonam Ray, 2019-01-01, 2 files, -33/+53)

  llvm-svn: 350187
* Omit range checks from jump tables when lowering switches with unreachable default (Ayonam Ray, 2019-01-01, 2 files, -53/+33)

  During the lowering of a switch that would result in the generation of a jump table, a range check is performed before indexing into the jump table, to test whether the switch value lies outside the jump table range, and a conditional branch to the default block is inserted. In case the default block is unreachable, this conditional jump can be omitted. This patch implements omitting this conditional branch for unreachable defaults.

  Review Reference: D52002

  llvm-svn: 350186
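  Conceptually, the switches that benefit look like this invented example (whether a jump table is actually formed depends on target heuristics):

  ```cpp
  // Dense cases produce a jump table; the default is marked unreachable, so
  // the bounds check guarding the table index can be dropped.
  int Dispatch(int Op) {
    switch (Op) {
    case 0: return 10;
    case 1: return 11;
    case 2: return 12;
    case 3: return 13;
    default: __builtin_unreachable();
    }
  }
  ```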
* [InstCombine] canonicalize MUL with NEG operand (Chen Zheng, 2019-01-01, 1 file, -0/+5)

  -X * Y --> -(X * Y)
  X * -Y --> -(X * Y)

  Differential Revision: https://reviews.llvm.org/D55961

  llvm-svn: 350185
* [SelectionDAG] Add SIGN_EXTEND_VECTOR_INREG support to computeKnownBits. (Craig Topper, 2018-12-31, 1 file, -1/+9)

  Differential Revision: https://reviews.llvm.org/D56168

  llvm-svn: 350179
* [X86] Add X86ISD::VSRAI to computeKnownBitsForTargetNode. (Craig Topper, 2018-12-31, 1 file, -1/+5)

  Differential Revision: https://reviews.llvm.org/D56169

  llvm-svn: 350178
* Keep tablegen commands in alphabetical order. NFCI. (Simon Pilgrim, 2018-12-31, 1 file, -1/+1)

  Mentioned on D56167.

  llvm-svn: 350176
* [AArch64] Accept "sve" as arch feature in assembler (Martin Storsjo, 2018-12-31, 1 file, -0/+1)

  Differential Revision: https://reviews.llvm.org/D56128

  llvm-svn: 350174
* [MSan] Handle llvm.is.constant intrinsic (Alexander Potapenko, 2018-12-31, 1 file, -0/+6)

  MSan used to report false positives in the case the argument of the llvm.is.constant intrinsic was uninitialized. In fact, checking this argument is unnecessary, as the intrinsic is only used at compile time, and its value doesn't depend on the value of the argument.

  llvm-svn: 350173
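  A sketch of the kind of source that triggered the false positive (whether llvm.is.constant survives to MSan instrumentation depends on the pipeline):

  ```cpp
  // The argument feeds only __builtin_constant_p, which clang lowers to
  // llvm.is.constant and which is resolved entirely at compile time, so the
  // variable's uninitialized shadow must not be checked.
  int UsesIsConstant() {
    int Uninit;                          // deliberately left uninitialized
    return __builtin_constant_p(Uninit); // folds to a constant; no real use
  }
  ```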
* [DAGCombiner] Add missing one use check on the shuffle in the bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1) transform. (Craig Topper, 2018-12-31, 1 file, -1/+1)

  Found while trying out some other changes, so I don't really have a test case.

  llvm-svn: 350172
* [AArch64] Implement the .arch_extension directive (Martin Storsjo, 2018-12-30, 1 file, -0/+47)

  Differential Revision: https://reviews.llvm.org/D56131

  llvm-svn: 350169
* [PowerPC] Fix machine verifier pass error for the PATCHPOINT pseudo instruction that produced bad machine code (Kang Zhang, 2018-12-30, 1 file, -3/+8)

  Summary: For SDAG, we pretend patchpoints aren't special at all until we emit the code for the pseudo. Then the verifier runs and it seems like we have a use of an undefined register (the register will be reserved later, but the verifier doesn't know that). So this patch calls setUsesTOCBasePtr before emitting the code for the pseudo, so the verifier knows X2 is a reserved register.

  Reviewed By: nemanjai

  Differential Revision: https://reviews.llvm.org/D56148

  llvm-svn: 350165
* [NFC] Fixed extra semicolon warning (David Bolvansky, 2018-12-30, 1 file, -1/+1)

  -This line, and those below, will be ignored-- M lib/Support/Error.cpp

  llvm-svn: 350162
* [PowerPC] Fix "ADDE, SUBE: do not know how to promote operator" (Kang Zhang, 2018-12-30, 1 file, -0/+5)

  Summary: This patch fixes Bugzilla bug 39815: https://bugs.llvm.org/show_bug.cgi?id=39815

  It adds support for promoting the integer results of the ADDE and SUBE instructions.

  Reviewed By: hfinkel

  Differential Revision: https://reviews.llvm.org/D56119

  llvm-svn: 350161
* [X86] Don't mark SEXTLOAD from v4i8/v4i16/v8i8 as Custom on pre-sse4.1. (Craig Topper, 2018-12-30, 1 file, -6/+0)

  This seems to be getting in the way more than it's helping. This does mean we stop scalarizing some cases, but I'm not convinced the scalarization was really better.

  Some of the changes to vsel-cmp-load.ll are a regression, but D56156 should fix it.

  llvm-svn: 350159
* [X86] Add custom type legalization for SIGN_EXTEND_VECTOR_INREG from v16i16/v32i8 to v4i64 when v4i64 needs splitting. (Craig Topper, 2018-12-30, 1 file, -2/+47)

  This allows us to sign extend to v4i32 first, and then share that extension to implement the final steps to v4i64 using a pcmpgt and punpckl and punpckh.

  We already do something similar for SIGN_EXTEND with -x86-experimental-vector-widening-legalization.

  llvm-svn: 350158
* [PowerPC][NFC] Macro for register set defs for the Asm Parser (Nemanja Ivanovic, 2018-12-29, 3 files, -355/+79)

  We have some unfortunate code in the back end that defines a bunch of register sets for the Asm Parser. Every time another class is needed in the parser, we have to add another one of those definitions with explicit lists of registers. This NFC patch simply provides macros to use to condense that code a little bit.

  Differential revision: https://reviews.llvm.org/D54433

  llvm-svn: 350156
* [PowerPC] Complete the custom legalization of vector int to fp conversion (Nemanja Ivanovic, 2018-12-29, 2 files, -45/+93)

  A recent patch has added custom legalization of vector conversions of v2i16 -> v2f64. This just rounds it out for other types where the input vector has an illegal (narrower) type than the result vector. Specifically, this will handle the following conversions:

  v2i8 -> v2f64
  v4i8 -> v4f32
  v4i16 -> v4f32

  Differential revision: https://reviews.llvm.org/D54663

  llvm-svn: 350155
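  As a concrete shape for one entry in that list, a clang vector-extension sketch (the typedef names are invented) that lowers to a v4i8 -> v4f32 conversion:

  ```cpp
  #include <cstdint>

  typedef uint8_t V4I8 __attribute__((vector_size(4)));  // v4i8
  typedef float V4F32 __attribute__((vector_size(16)));  // v4f32

  // The integer source type is illegal (narrower) relative to the legal
  // f32 result type, the case this patch custom-legalizes on PowerPC.
  V4F32 ConvertV4I8(V4I8 V) {
    return __builtin_convertvector(V, V4F32);
  }
  ```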
* [PowerPC] Fix CR Bit spill pseudo expansion (Nemanja Ivanovic, 2018-12-29, 1 file, -5/+8)

  The current CRBIT spill pseudo-op expansion creates a KILL instruction that kills the CRBIT and defines the enclosing CR field. However, this paints a false picture to the register allocator that all bits in the CR field are killed, so copies of other bits out of the field become dead and removable. This changes the expansion to preserve the KILL flag on the CRBIT as an implicit use and to treat the CR field as an undef input.

  Thanks to Hal Finkel for the review and Uli Weigand for implementation input.

  Differential revision: https://reviews.llvm.org/D55996

  llvm-svn: 350153
* [mips] Show an error on attempt to use 64-bit PC-relative relocation (Simon Atanasyan, 2018-12-29, 1 file, -0/+4)

  The following code requests a 64-bit PC-relative relocation, which is unsupported by the MIPS ABI. Such code used to trigger an assertion; it's better to show an error message.

  ```
  foo:
    .quad bar - foo
  ```

  llvm-svn: 350152
* [mips] Show a regular error message on attempt to use one byte relocation (Simon Atanasyan, 2018-12-29, 1 file, -1/+4)

  llvm-svn: 350151
* Drop SE cache early because loop parent can change in LoopSimplifyCFG (Max Kazantsev, 2018-12-29, 1 file, -3/+7)

  llvm-svn: 350145
* [WebAssembly] Fix comments in ExplicitLocals (NFC) (Heejin Ahn, 2018-12-29, 1 file, -3/+4)

  llvm-svn: 350144
* Add vtable anchor to classes. (Richard Trieu, 2018-12-29, 8 files, -0/+22)

  llvm-svn: 350142
* [X86] Don't mark SEXTLOAD v4i8->v4i64 and v8i8->v8i64 as custom under vector widening legalization. (Craig Topper, 2018-12-29, 1 file, -8/+0)

  This was tricking us into making these operations and then letting them get scalarized later. But I can't prove that the scalarized version is actually better.

  llvm-svn: 350141
* [X86] Directly emit X86ISD::PMULUDQ from the ReplaceNodeResults handling of v2i8/v2i16/v2i32 multiply. (Craig Topper, 2018-12-28, 1 file, -25/+47)

  Previously we emitted a multiply and some masking that was supposed to match to PMULUDQ, but the masking could sometimes be removed before we got a chance to match it. So instead just emit the PMULUDQ directly.

  Remove the DAG combine that was added when the ReplaceNodeResults code was originally added. Add a new DAG combine to avoid regressions in shrink_vmul.ll.

  Some of the shrink_vmul.ll test cases now pick PMULUDQ instead of PMADDWD/PMULLD, but I think this should be an improvement on most CPUs.

  I think all of this can go away if/when we switch to -x86-experimental-vector-widening-legalization.

  llvm-svn: 350134
* [UnrollRuntime] NFC: Add comment and verify LCSSA (Anna Thomas, 2018-12-28, 1 file, -2/+2)

  Added -verify-loop-lcssa to test cases. Updated comments in ConnectProlog.

  llvm-svn: 350131
* [AArch64] Add command-line option for SB (Diogo N. Sampaio, 2018-12-28, 3 files, -9/+9)

  SB (Speculative Barrier) is only mandatory from Armv8.5 onwards but is optional from Armv8.0-A. This patch adds a command-line option to enable SB, as it was previously only possible to enable it by selecting -march=armv8.5-a. This patch also renames the old FeatureSpecRestrict to FeatureSB.

  Reviewers: pbarrio, olista01, t.p.northover, LukeCheeseman

  Differential Revision: https://reviews.llvm.org/D55921

  llvm-svn: 350126
* [PowerPC] handle ISD::TRUNCATE in BitPermutationSelector (Hiroshi Inoue, 2018-12-28, 1 file, -8/+47)

  This is the last one in a series of patches to support better code generation for bitfield insert. BitPermutationSelector already supports ISD::ZERO_EXTEND but not TRUNCATE. This patch adds support for ISD::TRUNCATE in BitPermutationSelector.

  For example, for this test case

      struct s64b {
        int a:4;
        int b:16;
        int c:24;
      };
      void bitfieldinsert64b(struct s64b *p, unsigned char v) {
        p->b = v;
      }

  the selection DAG looks like:

      t14: i32,ch = load<(load 4 from %ir.0)> t0, t2, undef:i64
      t18: i32 = and t14, Constant:i32<-1048561>
      t4: i64,ch = CopyFromReg t0, Register:i64 %1
      t22: i64 = AssertZext t4, ValueType:ch:i8
      t23: i32 = truncate t22
      t16: i32 = shl nuw nsw t23, Constant:i32<4>
      t19: i32 = or t18, t16
      t20: ch = store<(store 4 into %ir.0)> t14:1, t19, t2, undef:i64

  By handling truncate in the BitPermutationSelector, we can use information from AssertZext when selecting t19 and skip the mask operation corresponding to t18.

  So the generated sequences with and without this patch are:

  without this patch:

      rlwinm 5, 5, 0, 28, 11 # corresponding to t18
      rlwimi 5, 4, 4, 20, 27

  with this patch:

      rlwimi 5, 4, 4, 12, 27

  Differential Revision: https://reviews.llvm.org/D49076

  llvm-svn: 350118
* Temporarily disable term folding in LoopSimplifyCFG, add tests (Max Kazantsev, 2018-12-28, 1 file, -1/+1)

  llvm-svn: 350117
* [LoopSimplifyCFG] Delete dead blocks in RPO (Max Kazantsev, 2018-12-28, 1 file, -5/+8)

  Deletion of dead blocks in arbitrary order may lead to a failure of the assertion in `DeleteDeadBlock` that requires that we have deleted all predecessors before we can delete the current block. We should instead delete them in RPO order.

  llvm-svn: 350116
* [PowerPC] Remove the implicit use of the register if it is replaced by Imm (QingShan Zhang, 2018-12-28, 2 files, -6/+37)

  If we are changing an MI operand from Reg to Imm, we also need to handle its implicit use, if any.

  Differential Revision: https://reviews.llvm.org/D56078

  llvm-svn: 350115
* [NFC] clang-format functions related to r350113 (Zi Xuan Wu, 2018-12-28, 1 file, -87/+146)

  llvm-svn: 350114
* [PowerPC] Fix machine verifier assert: atomic pseudo expansion causes a mismatched register class (Zi Xuan Wu, 2018-12-28, 1 file, -35/+46)

  Atomic value operands narrower than 4 bytes need to be masked, and the related operations to calculate the new value can be done in a 32-bit gprc. So just use gprc for the mask and value calculations.

  Differential Revision: https://reviews.llvm.org/D56077

  llvm-svn: 350113
* [PowerPC] fix register class after converting X-FORM instruction to D-FORM instruction (Chen Zheng, 2018-12-28, 1 file, -7/+12)

  Differential Revision: https://reviews.llvm.org/D55806

  llvm-svn: 350111
* [CallSite removal] Add and flesh out APIs on the new `CallBase` base class that previously were only available on the `CallSite` wrapper. (Chandler Carruth, 2018-12-27, 1 file, -0/+21)

  Summary: This will make migrating code easier and generally seems like a good collection of API improvements. Some of these APIs seem like more consistent / better naming of existing ones. I've retained the old names for migration simplicity and am just adding the new ones in this commit. I'll try to garbage collect these once CallSite is gone.

  Subscribers: sanjoy, mcrosier, hiraditya, llvm-commits

  Differential Revision: https://reviews.llvm.org/D55638

  llvm-svn: 350109
* [X86] Remove check that avoids creating PMULDQ with illegal types. Rely on SplitOpsAndApply to legalize it. (Craig Topper, 2018-12-27, 1 file, -1/+2)

  Create PMULDQ/PMULUDQ as long as the number of elements is a power of 2. This seems to give some improvements in our ability to use SimplifyDemandedBits.

  llvm-svn: 350084
* [X86] Factor the core code out of LowerSETCC into a helper that can create CMP/BT/PTEST/KORTEST etc. without making an X86ISD::SETCC node. NFCI (Craig Topper, 2018-12-27, 2 files, -39/+65)

  Make each of the helper functions only return their comparison node and the condition code. Leave X86ISD::SETCC creation to the LowerSETCC function itself.

  Looking into whether we can use this code directly in BRCOND and SELECT lowering instead of going through LowerSETCC, which creates an X86ISD::SETCC node we need to look through.

  llvm-svn: 350082