path: root/llvm/lib
Commit message (Author, Age, Files, Lines)
* [AArch64][MachineCombine] Fold FNMUL+FSUB -> FNMADD. (Chad Rosier, 2017-05-11, 1 file, -0/+28)
| | | | | | Differential Revision: http://reviews.llvm.org/D33101. llvm-svn: 302822
* [AMDGPU] Placate unused variable warning in release builds. (Davide Italiano, 2017-05-11, 1 file, -0/+1)
| | | | llvm-svn: 302821
* [MSP430] Generate EABI-compliant libcalls (Vadzim Dambrouski, 2017-05-11, 4 files, -38/+237)
| | | | | | | | | | | | | Updates the MSP430 target to generate EABI-compatible libcall names. As a byproduct, adjusts the hardware multiplier options available in the MSP430 target, adds support for promotion of the ISD::MUL operation for 8-bit integers, and correctly marks R11 as used by call instructions. Patch by Andrew Wygle. Differential Revision: https://reviews.llvm.org/D32676 llvm-svn: 302820
* [LiveVariables] Switch Kill/Defs sets to be DenseSet(s). (Davide Italiano, 2017-05-11, 1 file, -1/+1)
| | | | | | | | | | | | | | | | | | | | | | | | The testcase in PR32984 shows a non-linear compile time increase after a change that made the LoopUnroll pass more aggressive (increasing the threshold). My profiling shows all the time of PHI elimination goes to llvm::LiveVariables::addNewBlock. This is because we keep Defs/Kills registers in a SmallSet and vfind(const T &V) is O(N). Switching to a DenseSet reduces the time spent in the pass from 297 seconds to 97 seconds. Profiling still shows a lot of time is spent iterating the data structure, so I guess there's room for improvement. Dan tells me GCC uses real set operations for live registers and it takes no time on this testcase. Matthias points out we might want to switch all this to LiveIntervalAnalysis, so it's not entirely clear whether a rewrite is worth it. Differential Revision: https://reviews.llvm.org/D33088 llvm-svn: 302819
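The lookup-cost difference behind the commit above can be sketched without LLVM. This is a minimal illustration, assuming `std::vector` with a linear scan as a stand-in for a SmallSet that has grown past its inline storage and `std::unordered_set` as a stand-in for DenseSet; the function names are hypothetical and nothing here is taken from the patch itself.

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Linear membership test: every query may walk the whole container, so the
// cost grows with the number of registers stored (assumed stand-in for the
// O(N) lookup the commit message describes).
bool containsLinear(const std::vector<unsigned> &Regs, unsigned Reg) {
  return std::find(Regs.begin(), Regs.end(), Reg) != Regs.end();
}

// Hashed membership test: expected constant time per query (assumed stand-in
// for a DenseSet lookup).
bool containsHashed(const std::unordered_set<unsigned> &Regs, unsigned Reg) {
  return Regs.count(Reg) != 0;
}

// Fills both containers with the same even-numbered "registers" and checks
// that the two strategies answer every query identically.
bool strategiesAgree(unsigned NumRegs) {
  std::vector<unsigned> Linear;
  std::unordered_set<unsigned> Hashed;
  for (unsigned R = 0; R < NumRegs; ++R) {
    Linear.push_back(2 * R);
    Hashed.insert(2 * R);
  }
  for (unsigned Q = 0; Q < 2 * NumRegs; ++Q)
    if (containsLinear(Linear, Q) != containsHashed(Hashed, Q))
      return false;
  return true;
}
```

Both strategies return the same answers; only the per-query cost differs, which is why swapping the container type can be a one-line change.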
* [APInt] Remove an APInt copy from the return of APInt::multiplicativeInverse. (Craig Topper, 2017-05-11, 1 file, -1/+4)
| | | | llvm-svn: 302816
* [APInt] Fix typo in comment. NFC (Craig Topper, 2017-05-11, 1 file, -1/+1)
| | | | llvm-svn: 302815
* AMDGPU: Remove tfe bit from flat instruction definitions (Matt Arsenault, 2017-05-11, 3 files, -23/+22)
| | | | | | | | | | We don't use it and it was removed in gfx9, and the encoding bit repurposed. Additionally actually using it requires changing the output register class, which wasn't done anyway. llvm-svn: 302814
* AMDGPU: Pull fneg out of extract_vector_elt (Matt Arsenault, 2017-05-11, 4 files, -1/+31)
| | | | | | | This allows folding source modifiers in more f16 cases. Makes it easier to select per-component packed neg modifiers. llvm-svn: 302813
* [AMDGPU] Fix incorrect register pressure calculation (Stanislav Mekhanoshin, 2017-05-11, 1 file, -2/+3)
| | | | | | | Earlier fix D32572 introduced a bug where live-ins were calculated for the basic block instead of the scheduling region. This change fixes it. Differential Revision: https://reviews.llvm.org/D33086 llvm-svn: 302812
* [SLP] Emit optimization remarks (Adam Nemet, 2017-05-11, 1 file, -6/+36)
| | | | | | | | | | | | | | | | | | The approach I followed was to emit the remark after getTreeCost concludes that SLP is profitable. I initially tried emitting them after the vectorizeRootInstruction calls in vectorizeChainsInBlock but I vaguely remember missing a few cases, for example in HorizontalReduction::tryToReduce. ORE is placed in BoUpSLP so that it's available from everywhere (notably HorizontalReduction::tryToReduce). We use the first instruction in the root bundle as the locator for the remark. In order to get a sense of how far the tree is spanning I've included the size of the tree in the remark. This is not perfect of course but it gives you at least a rough idea about the tree. Then you can follow up with -view-slp-tree to really see the actual tree. llvm-svn: 302811
* [PowerPC] Eliminate integer compare instructions - vol. 1 (Nemanja Ivanovic, 2017-05-11, 5 files, -5/+284)
| | | | | | | | | | | | | This patch is the first in a series of patches to provide code gen for doing compares in GPRs when the compare result is required in a GPR. It adds the infrastructure to select GPR sequences for i1->i32 and i1->i64 extensions. This first patch handles equality comparison on i32 operands with the result sign or zero extended. Differential Revision: https://reviews.llvm.org/D31847 llvm-svn: 302810
* [DAGCombine] Use SelectionDAG::getAnyExtOrTrunc helper. NFCI. (Simon Pilgrim, 2017-05-11, 1 file, -18/+4)
| | | | llvm-svn: 302808
* Fix -DLLVM_ENABLE_THREADS=OFF build after r302748 (Hans Wennborg, 2017-05-11, 1 file, -0/+2)
| | | | llvm-svn: 302806
* [IR] Allow attributes with global variables (Javed Absar, 2017-05-11, 7 files, -7/+107)
| | | | | | | | | | | | | This patch extends llvm-ir to allow attributes to be set on global variables. An RFC was sent out earlier by my colleague James Molloy: http://lists.llvm.org/pipermail/cfe-dev/2017-March/053100.html A key part of that proposal was to extend LLVM-IR to carry attributes on global variables. This generic feature could be useful for multiple purposes. In our present context, it would be useful to carry user specified sections for bss/rodata/data. Reviewed by: Jonathan Roelofs, Reid Kleckner Differential Revision: https://reviews.llvm.org/D32009 llvm-svn: 302794
* [GlobalISel][X86] Remove hand-written G_FADD/G_FSUB selection. (Igor Breger, 2017-05-11, 1 file, -105/+0)
| | | | | | It is now handled by TableGen. llvm-svn: 302793
* [LV] Refactor ILV.vectorize{Loop}() by introducing LVP.executePlan(); NFC (Ayal Zaks, 2017-05-11, 1 file, -80/+101)
| | | | | | | | | | | | | | Introduce LoopVectorizationPlanner.executePlan(), replacing ILV.vectorize() and refactoring ILV.vectorizeLoop(). Method collectDeadInstructions() is moved from ILV to LVP. These changes facilitate building VPlans and using them to generate code, following https://reviews.llvm.org/D28975 and its tentative breakdown. Method ILV.createEmptyLoop() is renamed ILV.createVectorizedLoopSkeleton() to improve clarity; its contents remain intact. Differential Revision: https://reviews.llvm.org/D32200 llvm-svn: 302790
* [msan] Fix PR32842 (Alexander Potapenko, 2017-05-11, 1 file, -2/+5)
| | | | | | | | | | | | | | | | | | | | It turned out that MSan was incorrectly calculating the shadow for int comparisons: it was done by truncating the result of (Shadow1 OR Shadow2) to i1, effectively rendering all bits except the LSB useless. This approach doesn't work e.g. in the case where the values being compared are even (i.e. have the LSB of the shadow equal to zero). Instead, if CreateShadowCast() has to cast a bigger int to i1, we replace the truncation with an ICMP to 0. This patch doesn't affect the code generated for SPEC 2006 binaries, i.e. there's no performance impact. For the test case reported in PR32842, MSan with the patch generates slightly more efficient code: "orq %rcx, %rax; jne .LBB0_6" instead of "orl %ecx, %eax; testb $1, %al; jne .LBB0_6". llvm-svn: 302787
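The difference between the two shadow casts in the commit above can be sketched in plain C++. This is an illustration only, assuming 64-bit shadows in which a set bit marks an uninitialized bit of the compared values; the function names are hypothetical and are not MSan's.

```cpp
#include <cstdint>

// Truncation to one bit, as the commit message describes the old behaviour:
// only the least significant bit of the combined shadow survives.
bool shadowByTruncation(uint64_t ShadowA, uint64_t ShadowB) {
  return ((ShadowA | ShadowB) & 1) != 0;
}

// Comparison against zero, as the commit message describes the fix: any set
// bit in the combined shadow marks the comparison result as uninitialized.
bool shadowByCompare(uint64_t ShadowA, uint64_t ShadowB) {
  return (ShadowA | ShadowB) != 0;
}
```

With `ShadowA == 0x2` and `ShadowB == 0`, the truncating version reports a fully initialized result while the comparing version reports it as uninitialized, which is the missed case the fix targets.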
* [x86] Fix a failure to select with AVX-512 when the type legalizer (Chandler Carruth, 2017-05-11, 1 file, -5/+29)
| | | | | | | | | | | | | | | | | | | | | | | | | manages to form a VSELECT with a non-i1 element type condition. Those are technically allowed in SDAG (at least, the generic type legalization logic will form them and I wouldn't want to try to audit everything to preclude forming them) so we need to be able to lower them. This isn't too hard to implement. We mark VSELECT as custom so we get a chance in C++, add a fast path for i1 conditions to get directly handled by the patterns, and a fallback when we need to manually force the condition to be an i1 that uses the vptestm instruction to turn a non-mask into a mask. This, unsurprisingly, generates awful code. But it at least doesn't crash. This was actually impacting open source packages built with LLVM for AVX-512 in the wild, so I'm quickly landing a patch that at least stops the immediate bleeding. I think I've found where to fix the codegen quality issue, but I'm less confident of that change, so I'm separating it out from the thing that doesn't change the result of any existing test case but causes mine to not crash. llvm-svn: 302785
* Strip trailing whitespace. NFCI. (Simon Pilgrim, 2017-05-11, 2 files, -2/+2)
| | | | llvm-svn: 302784
* [ARM][GlobalISel] Legalize narrow scalar ops by widening (Diana Picus, 2017-05-11, 1 file, -3/+5)
| | | | | | | | | | | | | | This is the same as r292827 for AArch64: we widen 8- and 16-bit ADD, SUB and MUL to 32 bits since we only have TableGen patterns for 32 bits. See the commit message for r292827 for more details. At this point we could just remove some of the tests for regbankselect and instruction-select, since we're not going to see any narrow operations at those levels anymore. Instead I decided to update them with G_ANYEXT/G_TRUNC operations, so we can validate the full sequences generated by the legalizer. llvm-svn: 302782
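Why widening is safe for ADD, SUB and MUL can be checked with a small standalone sketch. It assumes 8-bit operands carried in 32-bit values whose upper 24 bits are arbitrary, which models the unspecified high bits of an any-extend; the names are hypothetical and this is not the legalizer's code.

```cpp
#include <cstdint>

enum class NarrowOp { Add, Sub, Mul };

// Performs an 8-bit operation using only 32-bit arithmetic. HighA and HighB
// supply arbitrary upper bits, modelling what an any-extend may leave there.
uint8_t narrowViaWide(NarrowOp Op, uint8_t A, uint8_t B, uint32_t HighA,
                      uint32_t HighB) {
  uint32_t WideA = (HighA << 8) | A; // models G_ANYEXT of A
  uint32_t WideB = (HighB << 8) | B; // models G_ANYEXT of B
  uint32_t Wide = 0;
  switch (Op) {
  case NarrowOp::Add:
    Wide = WideA + WideB;
    break;
  case NarrowOp::Sub:
    Wide = WideA - WideB;
    break;
  case NarrowOp::Mul:
    Wide = WideA * WideB;
    break;
  }
  return static_cast<uint8_t>(Wide); // models G_TRUNC back to 8 bits
}

// Checks exhaustively that the upper bits never change the truncated result.
bool highBitsAreIrrelevant() {
  const NarrowOp Ops[] = {NarrowOp::Add, NarrowOp::Sub, NarrowOp::Mul};
  for (NarrowOp Op : Ops)
    for (unsigned A = 0; A < 256; ++A)
      for (unsigned B = 0; B < 256; ++B) {
        uint8_t NA = static_cast<uint8_t>(A);
        uint8_t NB = static_cast<uint8_t>(B);
        if (narrowViaWide(Op, NA, NB, 0, 0) !=
            narrowViaWide(Op, NA, NB, 0xABCDEFu, 0x123456u))
          return false;
      }
  return true;
}
```

The low bits of a sum, difference or product depend only on the low bits of the operands, so the extend/operate/truncate sequence gives the same 8-bit answer whatever the extension leaves in the upper bits.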
* Remove spurious cast of nullptr. NFC. (Serge Guelton, 2017-05-11, 5 files, -6/+6)
| | | | | | Conversion rules allow automatic casting of nullptr to any pointer type. llvm-svn: 302780
* Remove now useless trailing nullptr in StructType::get (Serge Guelton, 2017-05-11, 1 file, -1/+1)
| | | | llvm-svn: 302779
* [ARM][GlobalISel] Support for G_ANYEXT (Diana Picus, 2017-05-11, 2 files, -10/+4)
| | | | | | | | | | | | | | G_ANYEXT can be introduced by the legalizer when widening scalars. Add support for it in the register bank info (same mapping as everything else) and in the instruction selector. When selecting it, we treat it as a COPY, just like G_TRUNC. On this occasion we get rid of some assertions in selectCopy so we can reuse it. This shouldn't be a problem at the moment since we're not supporting any complicated cases (e.g. FPR, different register banks). We might want to separate the paths when we do. llvm-svn: 302778
* [GlobalISel][X86] G_ICMP support. (Igor Breger, 2017-05-11, 2 files, -0/+69)
| | | | | | | | | | | | | | Summary: support G_ICMP for scalar types i8/i16/i64. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, llvm-commits, krytarowski Differential Revision: https://reviews.llvm.org/D32995 llvm-svn: 302774
* [APInt] Remove an unneeded extra temporary APInt from toString. (Craig Topper, 2017-05-11, 1 file, -5/+1)
| | | | | | Turns out udivrem can write its output to the same location as one of its inputs so the extra temporary isn't needed. llvm-svn: 302772
* [APInt] Use negate() instead of copying an APInt to negate it and then writing back over the original value. (Craig Topper, 2017-05-11, 1 file, -3/+3)
| | | | | | llvm-svn: 302770
* [SCEV] Reduce possible APInt allocations a bit. (Craig Topper, 2017-05-11, 1 file, -7/+11)
| | | | llvm-svn: 302769
* [SCEV] Remove unneeded 'using namespace APIntOps'. (Craig Topper, 2017-05-11, 1 file, -37/+34)
| | | | llvm-svn: 302768
* [X86] Move getX86ConditionCode() from X86FastISel.cpp to X86InstrInfo.cpp. NFC (Igor Breger, 2017-05-11, 3 files, -42/+46)
| | | | | | | | | | | | | | | | Summary: Move getX86ConditionCode() from X86FastISel.cpp to X86InstrInfo.cpp so it can be used by the GlobalISel instruction selector. This is a pre-commit for a patch I'm working on to support G_ICMP. NFC. Reviewers: zvi, guyblank, delena Reviewed By: guyblank, delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33038 llvm-svn: 302767
* Final (hopefully) fix for the build bots. (Zachary Turner, 2017-05-11, 1 file, -1/+1)
| | | | | | | | This time it actually occurred to me to change the #defines to actually test the pre-processed out codepath. Hopefully this time it works. llvm-svn: 302752
* Try again to fix the buildbots. (Zachary Turner, 2017-05-11, 1 file, -1/+1)
| | | | | | | TaskGroup and Latch need to be in llvm::parallel::detail, not in llvm::detail. llvm-svn: 302751
* Fix build errors with Parallel. (Zachary Turner, 2017-05-11, 1 file, -1/+1)
| | | | llvm-svn: 302749
* [Support] Move Parallel algorithms from LLD to LLVM. (Zachary Turner, 2017-05-11, 2 files, -0/+137)
| | | | | | Differential Revision: https://reviews.llvm.org/D33024 llvm-svn: 302748
* [libFuzzer] fix a compiler warning (Kostya Serebryany, 2017-05-10, 1 file, -1/+2)
| | | | llvm-svn: 302747
* Revert "[SDAG] Relax conditions under stores of loaded values can be merged" (David L. Jones, 2017-05-10, 1 file, -22/+10)
| | | | | | | | | | | | | | | | | | | | | This reverts r302712. The change fails with ASAN enabled: ERROR: AddressSanitizer: use-after-poison on address ... at ... READ of size 2 at ... thread T0 #0 ... in llvm::SDNode::getNumValues() const <snip>/include/llvm/CodeGen/SelectionDAGNodes.h:855:42 #1 ... in llvm::SDNode::hasAnyUseOfValue(unsigned int) const <snip>/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:7270:3 #2 ... in llvm::SDValue::use_empty() const <snip> include/llvm/CodeGen/SelectionDAGNodes.h:1042:17 #3 ... in (anonymous namespace)::DAGCombiner::MergeConsecutiveStores(llvm::StoreSDNode*) <snip>/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:12944:7 Reviewers: niravd Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33081 llvm-svn: 302746
* [IR] Fix some Clang-tidy modernize-use-using warnings; other minor fixes (NFC). (Eugene Zelenko, 2017-05-10, 2 files, -20/+67)
| | | | llvm-svn: 302744
* [PHIElimination] Use the same name for DEBUG_TYPE and pass name. (Davide Italiano, 2017-05-10, 1 file, -1/+1)
| | | | | | In an attempt to reduce the confusion. llvm-svn: 302742
* [InstCombine] remove fold that swaps xor/or with constants; NFCI (Sanjay Patel, 2017-05-10, 1 file, -12/+0)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | // (X ^ C1) | C2 --> (X | C2) ^ (C1&~C2) This canonicalization was added at: https://reviews.llvm.org/rL7264 By moving xors out/down, we can more easily combine constants. I'm adding tests that do not change with this patch, so we can verify that those kinds of transforms are still happening. This is no-functional-change-intended because there's a later fold: // (X^C)|Y -> (X|Y)^C iff Y&C == 0 ...and demanded-bits appears to guarantee that any fold that would have hit the fold we're removing here would be caught by that 2nd fold. Similar reasoning was used in: https://reviews.llvm.org/rL299384 The larger motivation for removing this code is that it could interfere with the fix for PR32706: https://bugs.llvm.org/show_bug.cgi?id=32706 I.e., we're not checking if the 'xor' is actually a 'not', so we could reverse a 'not' optimization and cause an infinite loop by altering an 'xor X, -1'. Differential Revision: https://reviews.llvm.org/D33050 llvm-svn: 302733
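Both identities quoted in the commit above can be checked by brute force over 8-bit values. The sketch below is only a sanity check of the algebra, with hypothetical function names; it says nothing about how InstCombine or demanded-bits apply the folds.

```cpp
#include <cstdint>

// Checks (X ^ C1) | C2 == (X | C2) ^ (C1 & ~C2) for every 8-bit X, C1, C2.
bool removedFoldHolds() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned C1 = 0; C1 < 256; ++C1)
      for (unsigned C2 = 0; C2 < 256; ++C2) {
        unsigned Lhs = ((X ^ C1) | C2) & 0xFFu;
        unsigned Rhs = ((X | C2) ^ (C1 & ~C2)) & 0xFFu;
        if (Lhs != Rhs)
          return false;
      }
  return true;
}

// Checks (X ^ C) | Y == (X | Y) ^ C whenever Y & C == 0, for 8-bit values.
bool laterFoldHolds() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned C = 0; C < 256; ++C)
      for (unsigned Y = 0; Y < 256; ++Y) {
        if ((Y & C) != 0)
          continue; // the fold only applies when Y and C share no set bits
        if ((((X ^ C) | Y) & 0xFFu) != (((X | Y) ^ C) & 0xFFu))
          return false;
      }
  return true;
}
```

Bit by bit: where C2 is 1 both sides of the first identity are 1, and where C2 is 0 both sides reduce to X ^ C1. The second identity follows the same argument with Y in place of C2.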
* AMDGPU: Make some packed shuffles free (Matt Arsenault, 2017-05-10, 2 files, -1/+36)
| | | | | | | VOP3P instructions can encode access to either half of the register. llvm-svn: 302730
* AMDGPU: Add new subtarget features for gfx9 flat instructions (Matt Arsenault, 2017-05-10, 3 files, -1/+38)
| | | | | | | Flat instructions gain an immediate offset, and 2 new sets of segment specific flat instructions are added. llvm-svn: 302729
* [ConstantRange] Fix the early out in ConstantRange::multiply for positive numbers to really do what the comment says (Craig Topper, 2017-05-10, 1 file, -1/+2)
| | | | | | | | | | r271020 added an early out to skip the signed multiply portion of ConstantRange::multiply. The comment says we don't need to do signed multiply if the range is only positive numbers, but the implemented check only ensures that the start of the range is positive. It doesn't look at the end of the range. This patch checks the end of the range instead. Because Upper is one more than the end, we have to see if it's positive or if it's one past the last positive number. llvm-svn: 302717
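The end-of-range reasoning in the commit above can be illustrated on 8-bit values. The sketch assumes a non-wrapped, non-empty half-open range [Lower, Upper) with Lower < Upper as unsigned numbers; the function names are hypothetical, and this is a model of the argument rather than ConstantRange's code.

```cpp
#include <cstdint>

// Reference answer: walk the range and look for a value whose sign bit is set.
bool allNonNegativeByScan(unsigned Lower, unsigned Upper) {
  for (unsigned V = Lower; V < Upper; ++V)
    if (V & 0x80u)
      return false;
  return true;
}

// The flawed check the commit message describes: only the start is examined.
bool startOnlyCheck(unsigned Lower, unsigned /*Upper*/) {
  return (Lower & 0x80u) == 0;
}

// The end-based check: Upper is one past the last element, so it must be
// non-negative itself or equal to 0x80, one past the last positive value 0x7F.
bool endCheck(unsigned /*Lower*/, unsigned Upper) {
  return (Upper & 0x80u) == 0 || Upper == 0x80u;
}

// Compares the end-based check with the scan for every non-wrapped range.
bool endCheckMatchesScan() {
  for (unsigned Lower = 0; Lower < 256; ++Lower)
    for (unsigned Upper = Lower + 1; Upper < 256; ++Upper)
      if (endCheck(Lower, Upper) != allNonNegativeByScan(Lower, Upper))
        return false;
  return true;
}
```

For example, [100, 200) starts at a positive value but contains 128 through 199, which are negative as signed 8-bit numbers; the start-only check accepts it while the end-based check rejects it.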
* [APInt] Add negate helper method to implement twos complement. Use it to shorten code. (Craig Topper, 2017-05-10, 1 file, -6/+3)
| | | | | | llvm-svn: 302716
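Two's complement negation is the usual "flip every bit, then add one" rule. The sketch below shows it on a fixed 32-bit width as an assumption for illustration; the function name is hypothetical and this is not APInt's implementation, which works on arbitrary widths.

```cpp
#include <cstdint>

// Negates V in two's complement: invert all bits, then add one. Unsigned
// arithmetic wraps, so the result is well defined for every input.
uint32_t negateTwosComplement(uint32_t V) {
  return ~V + 1u;
}
```

Adding a value to its negation wraps around to zero, and zero is its own negation.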
* [NewGVN] Introduce a definesNoMemory() helper and use it. (Davide Italiano, 2017-05-10, 1 file, -3/+5)
| | | | | | | This is nice as is, but it will be used in my next patch to fix a bug. Suggested by Daniel Berlin. llvm-svn: 302714
* [SDAG] Relax conditions under stores of loaded values can be merged (Nirav Dave, 2017-05-10, 1 file, -10/+22)
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Allow consecutive stores whose values come from consecutive loads to be merged in the presence of other uses of the loads. Previously this was disallowed as in general the merged load cannot be shared with the other uses. Merging N stores into 1 may cause as many as N redundant loads. However, in the context of caching this should have a negligible effect on memory pressure and reduce instruction count, making it almost always a win. Fixes PR32086. Reviewers: spatel, jyknight, andreadb, hfinkel, efriedma Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30471 llvm-svn: 302712
* Ensure non-null ProfileSummaryInfo passed to ModuleSummaryIndex builder (Teresa Johnson, 2017-05-10, 3 files, -2/+5)
| | | | | | | | | | | | This fixes a ubsan bot failure after r302597, which made getProfileCount non-static, but ended up invoking it on a null ProfileSummaryInfo object in some cases from buildModuleSummaryIndex. Most testing passed because the non-static getProfileCount currently doesn't access any member variables, but I found this when testing a follow on patch (D32877) that adds a member variable access. llvm-svn: 302705
* [APInt] Make toString use udivrem instead of calling the divide helper method directly. Do a better job of reusing allocations while looping. NFCI (Craig Topper, 2017-05-10, 1 file, -8/+9)
| | | | | | | | | | This lets toString take advantage of the degenerate case checks in udivrem and is just generally cleaner. One minor downside of this is that the divisor APInt now needs to be the same size as Tmp which requires an additional allocation. But we were doing a poor job of reusing allocations before so the new code should still be an improvement. llvm-svn: 302704
* [APInt] Use uint32_t instead of unsigned for the storage type throughout the divide code. Use Lo_32/Hi_32/Make_64 helpers instead of casts and shifts. NFCI (Craig Topper, 2017-05-10, 1 file, -39/+34)
| | | | | | llvm-svn: 302703
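The kind of casts and shifts that such helpers replace can be sketched standalone. The functions below are hypothetical lower-case stand-ins written for illustration, assuming the usual meaning of the names in the commit title (upper half, lower half, and recombination of a 64-bit value); they are not copied from LLVM.

```cpp
#include <cstdint>

// Returns the upper 32 bits of a 64-bit value.
constexpr uint32_t hi32(uint64_t V) {
  return static_cast<uint32_t>(V >> 32);
}

// Returns the lower 32 bits of a 64-bit value.
constexpr uint32_t lo32(uint64_t V) {
  return static_cast<uint32_t>(V);
}

// Combines two 32-bit halves into one 64-bit value.
constexpr uint64_t make64(uint32_t High, uint32_t Low) {
  return (static_cast<uint64_t>(High) << 32) | Low;
}
```

Naming the three operations keeps multi-word division code readable: splitting a 64-bit intermediate and recombining it round-trips exactly, as in `make64(hi32(V), lo32(V)) == V`.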
* [APInt] Use getRawData to slightly simplify some code. (Craig Topper, 2017-05-10, 1 file, -2/+2)
| | | | llvm-svn: 302702
* [APInt] Remove check for single word since single word was handled earlier in the function. NFC (Craig Topper, 2017-05-10, 1 file, -2/+2)
| | | | | | llvm-svn: 302701
* Small refactoring in DAGCombine. NFC (Amaury Sechet, 2017-05-10, 1 file, -3/+3)
| | | | llvm-svn: 302699