path: root/llvm/lib
Commit message | Author | Age | Files | Lines
* [X86] Remove X86ISD::INC/DEC. Just select them from X86ISD::ADD/SUB at isel time (Craig Topper, 2019-01-02, 7 files, -208/+177)
  INC/DEC are pretty much the same as ADD/SUB, except that they don't update the C flag. This patch removes the special nodes and just pattern matches from ADD/SUB during isel if the C flag isn't being used. I had to avoid selecting DEC if the result isn't used; that would become a SUB immediate, which would be turned into a CMP later by optimizeCompareInstr. This led to the one test change where we use a CMP instead of a DEC for an overflow intrinsic, since we only checked the flag.
  This also exposed a hole in our RMW flag matching use of hasNoCarryFlagUses. Our root node for the match is a store, and there's no guarantee that all the flag users have been selected yet. So hasNoCarryFlagUses needs to check CopyToReg and machine opcodes, but it also needs to check for the pre-match SETCC, SETCC_CARRY, BRCOND, and CMOV opcodes.
  Differential Revision: https://reviews.llvm.org/D55975
  llvm-svn: 350245
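The flag behavior behind the "C flag isn't being used" condition can be sketched with a tiny model (illustrative Python, not LLVM code): 8-bit ADD rewrites the carry flag while INC preserves it, so folding INC into ADD is only safe when no user reads C.

```python
# Minimal x86-style flag model: ADD updates CF, INC leaves it untouched.
def add8(x, y, cf):
    total = x + y
    return total & 0xFF, int(total > 0xFF)   # result, new CF

def inc8(x, cf):
    return (x + 1) & 0xFF, cf                # result, CF preserved

r_add, cf_add = add8(0xFF, 1, cf=0)
r_inc, cf_inc = inc8(0xFF, cf=0)
assert r_add == r_inc == 0           # identical arithmetic result
assert (cf_add, cf_inc) == (1, 0)    # but the flags diverge on wraparound
```

Both operations produce the same value, so the fold is purely a question of whether any selected user observes the carry flag.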
* [MS Demangler] Add a flag for dumping types without tag specifier (Zachary Turner, 2019-01-02, 1 file, -8/+10)
  Sometimes it's useful to be able to output demangled names without tag specifiers like "struct", "class", etc. This patch adds a flag enabling this.
  llvm-svn: 350241
* [DAGCombiner] After performing the division by constant optimization for a DIV or REM node, replace the users of the corresponding REM or DIV node if it exists (Craig Topper, 2019-01-02, 1 file, -2/+29)
  Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similar sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced.
  Improves the test case from PR38217. There may be additional opportunities after this.
  Differential Revision: https://reviews.llvm.org/D56145
  llvm-svn: 350239
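The payoff of expanding both nodes together can be illustrated with a toy sketch (plain Python, not DAG code): the remainder can reuse the quotient via r = x - q*d, so only one division sequence is materialized and both results share it.

```python
# Toy divrem: one division expansion, remainder derived from the quotient.
def divrem_shared(x, d):
    q = x // d          # the one "expensive" expansion
    r = x - q * d       # remainder piggybacks on the quotient
    return q, r

assert divrem_shared(38217, 10) == (3821, 7)
assert divrem_shared(100, 7) == (14, 2)
```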
* [LegalizeIntegerTypes] When promoting the result of an extract_vector_elt, also promote the input type if necessary (Craig Topper, 2019-01-02, 1 file, -2/+20)
  By also promoting the input type we get a better idea of what scalar type to use. This can provide better results if the result of the extract is sign extended. What was previously happening is that the extract result would be legalized; some time later the input of the sign extend would be legalized using the result of the extract. Then later the extract input would be legalized, forcing a truncate into the input of the sign extend via a replace-all-uses. This requires DAG combine to combine out the sext/truncate pair, but sometimes we visited the truncate first and messed things up before the sext could be combined.
  By creating the extract with the correct scalar type when we legalize the result type, the truncate will be added right away. Then when the sign_extend input is legalized it will create an any_extend of the truncate, which can be optimized by getNode to maybe remove the truncate, and then a sign_extend_inreg. Now DAG combine doesn't have to worry about getting rid of the extend.
  This fixes the regression on X86 in D56156.
  Differential Revision: https://reviews.llvm.org/D56176
  llvm-svn: 350236
* [DAGCombiner][X86][PowerPC] Teach visitSIGN_EXTEND_INREG to fold (sext_in_reg (aext/sext x)) -> (sext x) when x has more than 1 sign bit and the sext_inreg is from one of them (Craig Topper, 2019-01-02, 1 file, -2/+5)
  If x has multiple sign bits then it doesn't matter which one we extend from, so we can sext from x's msb instead.
  The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow-up patch to fix this.
  Differential Revision: https://reviews.llvm.org/D56156
  llvm-svn: 350235
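The fold rests on a simple fact, modeled here in Python (illustrative only): when x already has more than one known sign bit, sign_extend_inreg from any of those bit positions reproduces x unchanged, so the sext_inreg is redundant.

```python
# Model of sign_extend_inreg: sign-extend from bit (from_bits - 1) of x.
def sext_in_reg(x, from_bits, total_bits=32):
    m = 1 << (from_bits - 1)
    x &= (1 << from_bits) - 1
    return ((x ^ m) - m) & ((1 << total_bits) - 1)

x = 0xFFFFFFF5  # i32 value -11: bits 4..31 are all copies of the sign bit
assert sext_in_reg(x, 8) == x    # extending from bit 7 changes nothing
assert sext_in_reg(x, 16) == x   # likewise from bit 15
```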
* [PowerPC] Remove SeenUse check when optimizing conditional branch in PPCPreEmitPeephole pass (Wei Mi, 2019-01-02, 1 file, -1/+1)
  PPCPreEmitPeephole will convert a BC to B when the conditional branch is based on a constant CR set by CRSET or CRUNSET. This was added in https://reviews.llvm.org/rL343100. When the conditional branch is known to be always taken, all branches will be removed and a new unconditional branch will be inserted. However, when SeenUse is false, the original patch does not remove the branches but still inserts the new unconditional branch and updates the successors, creating inconsistent IR. Compiling the included synthetic testcase shows the problem. The patch simply removes the SeenUse condition when adding branches into the InstrsToErase set.
  Differential Revision: https://reviews.llvm.org/D56041
  llvm-svn: 350223
* [X86] Support SHLD/SHRD masked shift-counts (PR34641) (Simon Pilgrim, 2019-01-02, 1 file, -8/+29)
  Peek through shift modulo masks while matching double shift patterns. I was hoping to delay this until I could remove the X86 code with generic funnel shift matching (PR40081), but this will do for now.
  Differential Revision: https://reviews.llvm.org/D56199
  llvm-svn: 350222
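Why the modulo mask can be peeked through: the hardware double shift already takes its count modulo the operand width, so an explicit `c & 63` in the IR is a no-op. A rough model (illustrative Python, not the actual matcher):

```python
# SHLD-style double shift: shift hi left, filling from the top bits of lo.
def shld64(hi, lo, c):
    c &= 63                # the instruction masks the count itself
    if c == 0:
        return hi
    return ((hi << c) | (lo >> (64 - c))) & ((1 << 64) - 1)

assert shld64(0x1, 0x8000000000000000, 1) == 0x3
# An explicit mask on the count is redundant: count 64 behaves like 0.
assert shld64(0xDEAD, 0xBEEF, 64) == shld64(0xDEAD, 0xBEEF, 0)
```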
* [BasicAA] Support arbitrary pointer sizes (and fix an overflow bug) (Hal Finkel, 2019-01-02, 2 files, -49/+104)
  Motivated by the discussion in D38499, this patch updates BasicAA to support arbitrary pointer sizes by switching most remaining non-APInt calculations to use APInt. The size of these APInts is set to the maximum pointer size (maximum over all address spaces described by the data layout string).
  Most of this translation is straightforward, but this patch contains a fix for a bug that revealed itself during this translation process. In order for test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit pointers, the intermediate calculations must be performed using 64-bit integers. This is because, as noted in the patch, when GetLinearExpression decomposes an expression into C1*V+C2, and we then multiply this by Scale, and distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that, even though C1*V+C2 does not overflow for relevant values of V, (C2*Scale) can overflow. If this happens, later logic will draw invalid conclusions from the (base) offset value. Thus, when initially applying the APInt conversion, because the maximum pointer size in this test is 32 bits, it started failing. Suspicious, I created a 64-bit version of this test (included here), and that failed (miscompiled) on trunk for a similar reason (the multiplication can overflow).
  After fixing this overflow bug, the first test case (at least) in Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and was relying on having 64-bit intermediate values to have BasicAA return an accurate result. In order to fix this problem, and because I believe that it is not uncommon to use i64 indexing expressions in 32-bit code (especially portable code using int64_t), it seems reasonable to always use at least 64-bit integers. In this way, we won't regress our analysis capabilities (and there's a command-line option added, so experimenting with this should be easy).
  As pointed out by Eli during the review, there are other potential overflow conditions that this patch does not address. Fixing those is left to follow-up work.
  Patch by me with contributions from Michael Ferguson (mferguson@cray.com).
  Differential Revision: https://reviews.llvm.org/D38662
  llvm-svn: 350220
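The C2*Scale overflow can be reproduced numerically (a sketch with made-up constants, not BasicAA's actual values): with 32-bit intermediates the C2*Scale term wraps to zero even though C1*V+C2 itself never overflows, so the derived offset is wrong.

```python
BITS = 32

def wrap(v):
    # reduce to the 32-bit pointer width
    return v & ((1 << BITS) - 1)

C1, C2, Scale, V = 1, 1 << 30, 4, 3
# With wide (64-bit-style) intermediates the distributed form is exact:
wide = (C1 * Scale) * V + C2 * Scale
# With 32-bit intermediates, C2*Scale (== 2**32) wraps to 0:
narrow = wrap(wrap(C1 * Scale) * V + wrap(C2 * Scale))
assert wide == 12 + (1 << 32)
assert narrow == 12   # the entire C2*Scale contribution vanished
```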
* Extend Module::getOrInsertGlobal to control the construction of the GlobalVariable (Philip Pfaffe, 2019-01-02, 1 file, -8/+14)
  Summary: Extend Module::getOrInsertGlobal to accept a callback for creating a new GlobalVariable if necessary, instead of calling the GV constructor directly using default arguments. Additionally overload getOrInsertGlobal for the previous default behavior.
  Reviewers: chandlerc
  Subscribers: hiraditya, llvm-commits, bollu
  Differential Revision: https://reviews.llvm.org/D56130
  llvm-svn: 350219
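The general shape of a callback-taking get-or-insert API can be sketched generically (hypothetical names; the real overload takes a callback that returns the new GlobalVariable): the factory runs only when the entry is missing, so callers control construction without paying for it on lookups.

```python
# Generic get-or-insert with a construction callback.
def get_or_insert(table, name, create):
    if name not in table:
        table[name] = create()   # callback decides how to build the entry
    return table[name]

globals_table = {}
g1 = get_or_insert(globals_table, "counter", lambda: {"init": 0})
g2 = get_or_insert(globals_table, "counter", lambda: {"init": 99})
assert g1 is g2 and g1["init"] == 0   # second callback never ran
```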
* [MCA] Minor refactoring of method DefaultResourceStrategy::select. NFCI (Andrea Di Biagio, 2019-01-02, 1 file, -18/+21)
  Common code used by the default resource strategy to select pipeline resources has been moved to a helper function. The new selection logic has been slightly rewritten to get rid of a redundant zero check on the `ReadyMask` value. Before this patch, method select internally called function `PowerOf2Floor` to compute the next ready pipeline resource. However, `PowerOf2Floor` forces an implicit (redundant) zero check on the input value. By construction, `ReadyMask` can never be zero. This patch replaces the call to `PowerOf2Floor` with an equivalent block of code which avoids the redundant zero check. This gives a minor 3-3.5% speedup on a release build.
  No functional change intended.
  llvm-svn: 350218
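The replaced computation amounts to a power-of-2 floor of a known-nonzero mask, i.e. isolating the highest set bit with no zero guard. A sketch of the idea (Python, not the actual MCA code):

```python
# Power-of-2 floor for a mask that is nonzero by construction:
# the highest set bit, computed without any zero check.
def pow2_floor_nonzero(ready_mask):
    assert ready_mask != 0          # invariant guaranteed by the caller
    return 1 << (ready_mask.bit_length() - 1)

assert pow2_floor_nonzero(0b1011) == 0b1000
assert pow2_floor_nonzero(1) == 1
```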
* [AMDGPU] Handle OR as operand of raw load/store (Piotr Sobczak, 2019-01-02, 1 file, -4/+6)
  Summary: Use isBaseWithConstantOffset(), which handles OR, as an operand of llvm.amdgcn.raw.buffer.load and llvm.amdgcn.raw.buffer.store.
  Change-Id: Ifefb9dc5ded8710d333df07ab1900b230e33539a
  Reviewers: nhaehnle, mareko, arsenm
  Reviewed By: arsenm
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D55999
  llvm-svn: 350208
* [X86] Remove the separate SMUL8/UMUL8 X86ISD opcodes by merging with SMUL/UMUL. Remove the second result from X86ISD::UMUL (Craig Topper, 2019-01-02, 3 files, -47/+25)
  All of these use custom isel so we can pretty easily detect the differences in the custom code in X86ISelDAGToDAG. The ISD opcodes just need to express the desired semantics, not the details of how they would be selected by isel. So unifying them lets us remove the special casing from lowering.
  llvm-svn: 350206
* [X86] Allow LowerSELECT and LowerBRCOND to directly lower i8 UMULO/SMULO (Craig Topper, 2019-01-02, 1 file, -4/+2)
  These require a different X86ISD node to be created than i16/i32/i64. I guess no one wanted to add the special code for that except in LowerXALUO. But now LowerXALUO, LowerSELECT, and LowerBRCOND all use a common helper function, so they all share the special code.
  Unfortunately, there are no test changes because we seem to correct the miss in a DAG combine later. I did verify it manually using test cases from xmulo.ll.
  llvm-svn: 350205
* [InstCombine] canonicalize raw IR rotate patterns to funnel shift (Sanjay Patel, 2019-01-01, 1 file, -13/+8)
  The final piece of IR-level analysis to allow this was committed with rL350188. Using the intrinsics should improve transforms based on cost models like vectorization and inlining. The backend should be prepared too, so we can now canonicalize more sequences of shift/logic to the intrinsics and know that the end result should be equal or better to the original code even if the target does not have an actual rotate instruction.
  llvm-svn: 350199
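The equivalence being canonicalized: a raw shift/or rotate pattern is exactly a funnel shift applied to one value, rot_left(x, c) == fshl(x, x, c). Modeled for 32-bit values (illustrative Python, not the InstCombine code):

```python
W = 32
M = (1 << W) - 1

def fshl(a, b, c):
    # funnel shift left: top W bits of ((a:b) << (c mod W))
    c %= W
    if c == 0:
        return a & M
    return ((a << c) | (b >> (W - c))) & M

def raw_rotl(x, c):
    # the raw IR pattern: (x << c) | (x >> (W - c))
    c %= W
    if c == 0:
        return x & M
    return ((x << c) | (x >> (W - c))) & M

for x, c in [(0x80000001, 1), (0x12345678, 13), (0xDEADBEEF, 0)]:
    assert raw_rotl(x, c) == fshl(x, x, c)
```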
* [X86] Factor the core code out of LowerXALUO into a helper function. Use it in LowerBRCOND and LowerSELECT to avoid some duplicated code (Craig Topper, 2019-01-01, 1 file, -138/+95)
  This makes it easier to keep the LowerBRCOND and LowerSELECT code in sync with LowerXALUO so they always pick the same operation for overflowing instructions. This is inspired by the helper functions used by ARM and AArch64 for the same purpose. The test change is because LowerSELECT was not in sync with LowerXALUO with regard to INC/DEC for SADDO/SSUBO.
  llvm-svn: 350198
* [LLVM-C] bool -> LLVMBool (Robert Widmann, 2019-01-01, 1 file, -2/+2)
  llvm-svn: 350197
* [LLVM-C] Add Accessors for Discarding Value Names in the IR (Robert Widmann, 2019-01-01, 1 file, -0/+8)
  Summary: Add accessors so the performance improvement from this setting is accessible to third parties.
  Reviewers: whitequark, deadalnix
  Reviewed By: whitequark
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D56179
  llvm-svn: 350196
* [x86] move/rename helper for horizontal op codegen; NFC (Sanjay Patel, 2019-01-01, 1 file, -16/+16)
  Preliminary commit as suggested in D56011.
  llvm-svn: 350193
* Reapply "[BDCE][DemandedBits] Detect dead uses of undead instructions" (Nikita Popov, 2019-01-01, 2 files, -17/+59)
  This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.
  BDCE currently detects instructions that don't have any demanded bits and replaces their uses with zero. However, if an instruction has multiple uses, then some of the uses may be dead (have no demanded bits) even though the instruction itself is still live. This patch extends DemandedBits/BDCE to detect such uses and replace them with zero. While this will not immediately render any instructions dead, it may lead to simplifications (in the motivating case, by converting a rotate into a simple shift), break dependencies, etc.
  The implementation tries to strike a balance between analysis power and complexity/memory usage. Originally I wanted to track demanded bits on a per-use level, but ultimately we're only really interested in whether a use is entirely dead or not. I'm using an extra set to track which uses are dead. However, as initially all uses are dead, I'm not storing uses whose user is also dead. This case is checked separately instead.
  The previous attempt to land this led to miscompiles, because cases where uses were initially dead but were later found to be live during further analysis were not always correctly removed from the DeadUses set. This is fixed now, and the added test case demonstrates such an instance.
  Differential Revision: https://reviews.llvm.org/D55563
  llvm-svn: 350188
* Reversing the commit in revision 350186. Revision causes regression in 4 tests (Ayonam Ray, 2019-01-01, 2 files, -33/+53)
  llvm-svn: 350187
* Omit range checks from jump tables when lowering switches with unreachable default (Ayonam Ray, 2019-01-01, 2 files, -53/+33)
  During the lowering of a switch that results in the generation of a jump table, a range check is performed before indexing into the jump table, in case the switch value lies outside the jump table range, and a conditional branch is inserted to jump to the default block. When the default block is unreachable, this conditional jump can be omitted. This patch implements omitting this conditional branch for unreachable defaults.
  Review Reference: D52002
  llvm-svn: 350186
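The lowering can be sketched abstractly (illustrative Python, not SelectionDAG code): a jump table normally guards against out-of-range selectors before indexing; when the default target is unreachable, the selector is known to be in range and the guard can be dropped.

```python
# Jump-table dispatch with an optional range check.
def dispatch(table, idx, default, default_unreachable=False):
    if not default_unreachable and not (0 <= idx < len(table)):
        return default()       # range check + branch to default block
    return table[idx]()        # direct indexed jump, no check emitted

table = [lambda: "a", lambda: "b", lambda: "c"]
assert dispatch(table, 2, lambda: "dflt") == "c"
assert dispatch(table, 7, lambda: "dflt") == "dflt"
# With an unreachable default, idx is known in-range and the check is gone:
assert dispatch(table, 1, lambda: "dflt", default_unreachable=True) == "b"
```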
* [InstCombine] canonicalize MUL with NEG operand (Chen Zheng, 2019-01-01, 1 file, -0/+5)
  -X * Y --> -(X * Y)
  X * -Y --> -(X * Y)
  Differential Revision: https://reviews.llvm.org/D55961
  llvm-svn: 350185
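The two folds can be sanity-checked directly (exact integer arithmetic; the IR versions also have to account for nsw flags, which this sketch ignores). Hoisting the negation exposes the bare multiply to further combines.

```python
# -X * Y --> -(X * Y) and X * -Y --> -(X * Y)
for x in (-7, 0, 3, 1 << 20):
    for y in (-5, 1, 9):
        assert (-x) * y == -(x * y)
        assert x * (-y) == -(x * y)
```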
* [SelectionDAG] Add SIGN_EXTEND_VECTOR_INREG support to computeKnownBits (Craig Topper, 2018-12-31, 1 file, -1/+9)
  Differential Revision: https://reviews.llvm.org/D56168
  llvm-svn: 350179
* [X86] Add X86ISD::VSRAI to computeKnownBitsForTargetNode (Craig Topper, 2018-12-31, 1 file, -1/+5)
  Differential Revision: https://reviews.llvm.org/D56169
  llvm-svn: 350178
* Keep tablegen commands in alphabetical order. NFCI (Simon Pilgrim, 2018-12-31, 1 file, -1/+1)
  Mentioned on D56167.
  llvm-svn: 350176
* [AArch64] Accept "sve" as arch feature in assembler (Martin Storsjo, 2018-12-31, 1 file, -0/+1)
  Differential Revision: https://reviews.llvm.org/D56128
  llvm-svn: 350174
* [MSan] Handle llvm.is.constant intrinsic (Alexander Potapenko, 2018-12-31, 1 file, -0/+6)
  MSan used to report false positives when the argument of the llvm.is.constant intrinsic was uninitialized. In fact checking this argument is unnecessary, as the intrinsic is only used at compile time, and its value doesn't depend on the value of the argument.
  llvm-svn: 350173
* [DAGCombiner] Add missing one use check on the shuffle in the bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1) transform (Craig Topper, 2018-12-31, 1 file, -1/+1)
  Found while trying out some other changes, so I don't really have a test case.
  llvm-svn: 350172
* [AArch64] Implement the .arch_extension directive (Martin Storsjo, 2018-12-30, 1 file, -0/+47)
  Differential Revision: https://reviews.llvm.org/D56131
  llvm-svn: 350169
* [PowerPC] Fix machine verifier pass error for PATCHPOINT pseudo instruction that produced bad machine code (Kang Zhang, 2018-12-30, 1 file, -3/+8)
  Summary: For SDAG, we pretend patchpoints aren't special at all until we emit the code for the pseudo. Then the verifier runs and it seems like we have a use of an undefined register (the register will be reserved later, but the verifier doesn't know that). So this patch calls setUsesTOCBasePtr before emitting the code for the pseudo, so the verifier can know X2 is a reserved register.
  Reviewed By: nemanjai
  Differential Revision: https://reviews.llvm.org/D56148
  llvm-svn: 350165
* [NFC] Fixed extra semicolon warning (David Bolvansky, 2018-12-30, 1 file, -1/+1)
  M lib/Support/Error.cpp
  llvm-svn: 350162
* [PowerPC] Fix "ADDE, SUBE do not know how to promote operator" error (Kang Zhang, 2018-12-30, 1 file, -0/+5)
  Summary: This patch fixes Bugzilla bug 39815: https://bugs.llvm.org/show_bug.cgi?id=39815. It adds support for promoting the integer result of the ADDE and SUBE instructions.
  Reviewed By: hfinkel
  Differential Revision: https://reviews.llvm.org/D56119
  llvm-svn: 350161
* [X86] Don't mark SEXTLOAD from v4i8/v4i16/v8i8 as Custom on pre-sse4.1 (Craig Topper, 2018-12-30, 1 file, -6/+0)
  This seems to be getting in the way more than it's helping. This does mean we stop scalarizing some cases, but I'm not convinced the scalarization was really better.
  Some of the changes to vsel-cmp-load.ll are a regression, but D56156 should fix it.
  llvm-svn: 350159
* [X86] Add custom type legalization for SIGN_EXTEND_VECTOR_INREG from v16i16/v32i8 to v4i64 when v4i64 needs splitting (Craig Topper, 2018-12-30, 1 file, -2/+47)
  This allows us to sign extend to v4i32 first, and then share that extension to implement the final steps to v4i64 using a pcmpgt and punpckl and punpckh.
  We already do something similar for SIGN_EXTEND with -x86-experimental-vector-widening-legalization.
  llvm-svn: 350158
* [PowerPC][NFC] Macro for register set defs for the Asm Parser (Nemanja Ivanovic, 2018-12-29, 3 files, -355/+79)
  We have some unfortunate code in the back end that defines a bunch of register sets for the Asm Parser. Every time another class is needed in the parser, we have to add another one of those definitions with explicit lists of registers. This NFC patch simply provides macros to use to condense that code a little bit.
  Differential revision: https://reviews.llvm.org/D54433
  llvm-svn: 350156
* [PowerPC] Complete the custom legalization of vector int to fp conversion (Nemanja Ivanovic, 2018-12-29, 2 files, -45/+93)
  A recent patch added custom legalization of vector conversions of v2i16 -> v2f64. This just rounds it out for other types where the input vector has an illegal (narrower) type than the result vector. Specifically, this will handle the following conversions:
    v2i8 -> v2f64
    v4i8 -> v4f32
    v4i16 -> v4f32
  Differential revision: https://reviews.llvm.org/D54663
  llvm-svn: 350155
* [PowerPC] Fix CR Bit spill pseudo expansion (Nemanja Ivanovic, 2018-12-29, 1 file, -5/+8)
  The current CRBIT spill pseudo-op expansion creates a KILL instruction that kills the CRBIT and defines the enclosing CR field. However, this paints a false picture to the register allocator that all bits in the CR field are killed, so copies of other bits out of the field become dead and removable. This changes the expansion to preserve the KILL flag on the CRBIT as an implicit use and to treat the CR field as an undef input.
  Thanks to Hal Finkel for the review and Uli Weigand for implementation input.
  Differential revision: https://reviews.llvm.org/D55996
  llvm-svn: 350153
* [mips] Show an error on attempt to use 64-bit PC-relative relocation (Simon Atanasyan, 2018-12-29, 1 file, -0/+4)
  The following code requests 64-bit PC-relative relocations unsupported by the MIPS ABI. Previously it triggered an assertion; it's better to show an error message.
  ```
  foo:
    .quad bar - foo
  ```
  llvm-svn: 350152
* [mips] Show a regular error message on attempt to use a one-byte relocation (Simon Atanasyan, 2018-12-29, 1 file, -1/+4)
  llvm-svn: 350151
* Drop SE cache early because loop parent can change in LoopSimplifyCFG (Max Kazantsev, 2018-12-29, 1 file, -3/+7)
  llvm-svn: 350145
* [WebAssembly] Fix comments in ExplicitLocals (NFC) (Heejin Ahn, 2018-12-29, 1 file, -3/+4)
  llvm-svn: 350144
* Add vtable anchor to classes (Richard Trieu, 2018-12-29, 8 files, -0/+22)
  llvm-svn: 350142
* [X86] Don't mark SEXTLOAD v4i8->v4i64 and v8i8->v8i64 as custom under vector widening legalization (Craig Topper, 2018-12-29, 1 file, -8/+0)
  This was tricking us into making these operations and then letting them get scalarized later. But I can't prove that the scalarized version is actually better.
  llvm-svn: 350141
* [X86] Directly emit X86ISD::PMULUDQ from the ReplaceNodeResults handling of v2i8/v2i16/v2i32 multiply (Craig Topper, 2018-12-28, 1 file, -25/+47)
  Previously we emitted a multiply and some masking that was supposed to match to PMULUDQ, but the masking could sometimes be removed before we got a chance to match it. So instead just emit the PMULUDQ directly.
  Remove the DAG combine that was added when the ReplaceNodeResults code was originally added. Add a new DAG combine to avoid regressions in shrink_vmul.ll.
  Some of the shrink_vmul.ll test cases now pick PMULUDQ instead of PMADDWD/PMULLD, but I think this should be an improvement on most CPUs. I think all of this can go away if/when we switch to -x86-experimental-vector-widening-legalization.
  llvm-svn: 350134
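PMULUDQ's per-lane semantics explain why the masking was only ever a matching aid: the instruction itself multiplies the unsigned low 32 bits of each 64-bit lane and ignores the high halves. A per-lane model (illustrative Python):

```python
# One 64-bit lane of PMULUDQ: unsigned low-32 x low-32 -> full 64-bit product.
MASK32 = (1 << 32) - 1

def pmuludq_lane(a64, b64):
    return ((a64 & MASK32) * (b64 & MASK32)) & ((1 << 64) - 1)

assert pmuludq_lane(0xFFFFFFFF, 0xFFFFFFFF) == 0xFFFFFFFE00000001
assert pmuludq_lane(0xDEAD_0000_0000_0005, 3) == 15  # high half ignored
```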
* [UnrollRuntime] NFC: Add comment and verify LCSSA (Anna Thomas, 2018-12-28, 1 file, -2/+2)
  Added -verify-loop-lcssa to test cases. Updated comments in ConnectProlog.
  llvm-svn: 350131
* [AArch64] Add command-line option for SB (Diogo N. Sampaio, 2018-12-28, 3 files, -9/+9)
  SB (Speculative Barrier) is only mandatory from Armv8.5 onwards but is optional from Armv8.0-A. This patch adds a command line option to enable SB, as it was previously only possible to enable it by selecting -march=armv8.5-a. This patch also moves the old FeatureSpecRestrict to FeatureSB.
  Reviewers: pbarrio, olista01, t.p.northover, LukeCheeseman
  Differential Revision: https://reviews.llvm.org/D55921
  llvm-svn: 350126
* [PowerPC] handle ISD::TRUNCATE in BitPermutationSelector (Hiroshi Inoue, 2018-12-28, 1 file, -8/+47)
  This is the last one in a series of patches to support better code generation for bitfield insert. BitPermutationSelector already supports ISD::ZERO_EXTEND but not TRUNCATE. This patch adds support for ISD::TRUNCATE in BitPermutationSelector.
  For example, for this test case
  ```
  struct s64b {
    int a:4;
    int b:16;
    int c:24;
  };
  void bitfieldinsert64b(struct s64b *p, unsigned char v) {
    p->b = v;
  }
  ```
  the selection DAG looks like:
  ```
  t14: i32,ch = load<(load 4 from %ir.0)> t0, t2, undef:i64
    t18: i32 = and t14, Constant:i32<-1048561>
  t4: i64,ch = CopyFromReg t0, Register:i64 %1
    t22: i64 = AssertZext t4, ValueType:ch:i8
    t23: i32 = truncate t22
    t16: i32 = shl nuw nsw t23, Constant:i32<4>
  t19: i32 = or t18, t16
  t20: ch = store<(store 4 into %ir.0)> t14:1, t19, t2, undef:i64
  ```
  By handling truncate in the BitPermutationSelector, we can use information from AssertZext when selecting t19 and skip the mask operation corresponding to t18. The generated sequences with and without this patch are:
  without this patch
  ```
  rlwinm 5, 5, 0, 28, 11 # corresponding to t18
  rlwimi 5, 4, 4, 20, 27
  ```
  with this patch
  ```
  rlwimi 5, 4, 4, 12, 27
  ```
  Differential Revision: https://reviews.llvm.org/D49076
  llvm-svn: 350118
* Temporarily disable term folding in LoopSimplifyCFG, add tests (Max Kazantsev, 2018-12-28, 1 file, -1/+1)
  llvm-svn: 350117
* [LoopSimplifyCFG] Delete dead blocks in RPO (Max Kazantsev, 2018-12-28, 1 file, -5/+8)
  Deletion of dead blocks in arbitrary order may lead to failure of the assertion in `DeleteDeadBlock` that requires that we have deleted all predecessors before we can delete the current block. We should instead delete them in RPO order.
  llvm-svn: 350116
* [PowerPC] Remove the implicit use of the register if it is replaced by Imm (QingShan Zhang, 2018-12-28, 2 files, -6/+37)
  If we are changing the MI operand from Reg to Imm, we also need to handle its implicit use, if it has one.
  Differential Revision: https://reviews.llvm.org/D56078
  llvm-svn: 350115