path: root/llvm/lib
Commit message | Author | Age | Files | Lines
...
* [x86] Teach lots of the new vector shuffle lowering to use UNPCK | Chandler Carruth | 2014-08-16 | 1 | -0/+24
  instructions for blend operations at 128 bits. This was a serious hole in our prior blend lowering.
  llvm-svn: 215819
* InstCombine: Fix a potential bug in 0 - (X sdiv C) -> (X sdiv -C) | David Majnemer | 2014-08-16 | 2 | -1/+23
  While *most* (X sdiv 1) operations will get caught by InstSimplify, it is still possible for a sdiv to appear in the worklist which hasn't been simplified yet.
  This means that it is possible for 0 - (X sdiv 1) to get transformed into (X sdiv -1); dividing by -1 can make the transform produce undef values instead of the proper result.
  Sorry for the lack of testcase, it's a bit problematic because it relies on the exact order of operations in the worklist.
  llvm-svn: 215818
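A minimal sketch of the kind of guard this fix implies, written against plain 32-bit integers rather than LLVM's APIs. The helper names are hypothetical, and the INT_MIN check is an extra assumption added here (negating INT_MIN is not representable); the commit message itself only discusses C == 1.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not LLVM's API): may 0 - (X sdiv C) be rewritten
// as X sdiv -C for every X?
bool canFoldNegIntoSDiv(int32_t C) {
  const int32_t Min = std::numeric_limits<int32_t>::min();
  // C == 1:   the rewrite would divide by -1, which overflows for X == INT_MIN.
  // C == Min: -C is not representable, so the divisor cannot be negated.
  return C != 1 && C != Min;
}

// Original expression, 0 - (X sdiv C).
// Callers must keep C away from 0 and -1 so the division itself is defined.
int32_t negOfSDiv(int32_t X, int32_t C) { return 0 - X / C; }

// Rewritten expression, X sdiv -C. Only call when canFoldNegIntoSDiv(C).
int32_t sdivByNeg(int32_t X, int32_t C) { return X / -C; }
```

For divisors that pass the guard (and are not 0 or -1), the two forms agree for every X, including INT_MIN.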
* InstCombine: Combine mul with div. | David Majnemer | 2014-08-16 | 1 | -2/+75
  We can combine a mul with a div if one of the operands is a multiple of the other:
    %mul = mul nsw nuw %a, C1
    %ret = udiv %mul, C2
  =>
    %ret = mul nsw %a, (C1 / C2)
  This can expose further optimization opportunities if we end up multiplying or dividing by a power of 2.
  Consider this small example:
    define i32 @f(i32 %a) {
      %mul = mul nuw i32 %a, 14
      %div = udiv exact i32 %mul, 7
      ret i32 %div
    }
  which gets CodeGen'd to:
    imull $14, %edi, %eax
    imulq $613566757, %rax, %rcx
    shrq $32, %rcx
    subl %ecx, %eax
    shrl %eax
    addl %ecx, %eax
    shrl $2, %eax
    retq
  We can now transform this into:
    define i32 @f(i32 %a) {
      %shl = shl nuw i32 %a, 1
      ret i32 %shl
    }
  which gets CodeGen'd to:
    leal (%rdi,%rdi), %eax
    retq
  This fixes PR20681.
  llvm-svn: 215815
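The arithmetic behind this combine can be checked with a small standalone sketch. The function names are hypothetical and the code works on plain `uint32_t` values, not on LLVM IR. It assumes the `nuw` precondition, i.e. that the multiplication does not wrap.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: if C1 is a multiple of C2, udiv (mul nuw a, C1), C2
// can become mul a, (C1 / C2). Returns true and sets Factor on success.
bool foldMulThenUDiv(uint32_t C1, uint32_t C2, uint32_t &Factor) {
  if (C2 == 0 || C1 % C2 != 0)
    return false;
  Factor = C1 / C2;
  return true;
}

// The example from the commit message, before the combine.
uint32_t udivOfMul(uint32_t A) { return (A * 14u) / 7u; }

// The same example after the combine: a multiply by 2, i.e. a shift by 1.
uint32_t shlByOne(uint32_t A) { return A << 1; }
```

The two forms agree only while `A * 14` fits in 32 bits. Once the product wraps they diverge, which is why the fold relies on the `nuw` flag.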
* arm asm: Let .fpu enable instructions, PR20447. | Nico Weber | 2014-08-16 | 1 | -0/+36
  I'm not very happy with duplicating the fpu->feature mapping in ARMAsmParser.cpp and in clang's driver. See the bug for a patch that doesn't do that, and the review thread [1] for why this duplication exists.
  1: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140811/231052.html
  llvm-svn: 215811
* BitcodeReader: Only create one basic block for each blockaddress | Duncan P. N. Exon Smith | 2014-08-16 | 2 | -20/+18
  Block address forward-references are implemented by creating a `BasicBlock` ahead of time that gets inserted in the `Function` when it's eventually encountered.
  However, if the same blockaddress was used in two separate functions that were parsed *before* the referenced function (and the blockaddress was never used at global scope), two separate basic blocks would get created, one of which would be forgotten, creating invalid IR.
  This commit changes the forward-reference logic to create only one basic block (and always return the same blockaddress).
  llvm-svn: 215805
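The "create only one placeholder" idea can be sketched as a get-or-create table. This is an illustration only: `PlaceholderBlock` and `BlockAddressFwdRefs` are hypothetical names, and keying by function name and block index is an assumption, not a description of how the BitcodeReader stores its forward references.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Stand-in for a placeholder basic block; not LLVM's BasicBlock.
struct PlaceholderBlock {};

// Hypothetical table of forward references, keyed by (function, block index).
class BlockAddressFwdRefs {
  std::map<std::pair<std::string, unsigned>, std::unique_ptr<PlaceholderBlock>> Refs;

public:
  // Return the placeholder for this block, creating it on first use only.
  PlaceholderBlock *getOrCreate(const std::string &Fn, unsigned BlockIdx) {
    std::unique_ptr<PlaceholderBlock> &Slot = Refs[{Fn, BlockIdx}];
    if (!Slot)
      Slot.reset(new PlaceholderBlock());
    return Slot.get();
  }

  std::size_t size() const { return Refs.size(); }
};
```

Two users that reference the same block before its function is parsed get the same pointer back, so no second placeholder exists to be forgotten.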
* UseListOrder: Correctly count the number of uses | Duncan P. N. Exon Smith | 2014-08-16 | 1 | -2/+2
  This is an off-by-one bug I found by inspection, which would only trigger if the bitcode writer sees more uses of a `Value` than the reader. Since this is only relevant when an instruction gets upgraded somehow, there unfortunately isn't a reasonable way to add test coverage.
  llvm-svn: 215804
* IR: Don't add inbounds to GEPs of extern_weak variables | Duncan P. N. Exon Smith | 2014-08-16 | 1 | -3/+4
  Global variables that have `extern_weak` linkage may be null, so it's incorrect to add `inbounds` when constant folding.
  This also fixes a bug when parsing global aliases, whose forward reference placeholders are global variables with `extern_weak` linkage. If GEPs to these aliases are encountered before the alias itself, the GEPs would incorrectly gain the `inbounds` keyword as well.
  llvm-svn: 215803
* [DAGCombiner] Improve the folding of target-independent shuffles to Undef. | Andrea Di Biagio | 2014-08-16 | 1 | -0/+16
  When combining a pair of shuffle nodes, check if the combined shuffle mask is trivially Undef. If so, immediately fold that pair of shuffles to Undef.
  The lack of checks for undef masks was the root cause of a poor-codegen bug in the DAG combiner.
  Example:
    %1 = shufflevector <4 x i32> %A, <4 x i32> %B, <4 x i32> <i32 4, i32 1, i32 1, i32 6>
    %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 0, i32 4, i32 1, i32 6>
    %3 = shufflevector <4 x i32> %2, <4 x i32> undef, <4 x i32> <i32 1, i32 5, i32 3, i32 3>
  Before this patch, on x86 (with -mcpu=corei7) we failed to fold the entire sequence to an Undef value and therefore we generated:
    shufps $-123, %xmm1, %xmm0
    pshufd $-46, %xmm0, %xmm0
  With this patch, the entire shuffle sequence is folded to Undef and no shuffles are generated in the output assembly.
  Added new test cases to test 'combine-vec-shuffle-5.ll'.
  llvm-svn: 215797
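The undef check can be illustrated by composing the masks of %2 and %3 from the example above. This is a rough sketch, not the DAG combiner's code: the function names are hypothetical, and it assumes the usual mask convention where -1 means undef and indices of N or more select from the second operand, which is undef in both shuffles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compose two shuffles whose second operand is undef:
//   inner = shufflevector V, undef, InnerMask
//   outer = shufflevector inner, undef, OuterMask
// The result is a single mask that applies directly to V (-1 means undef).
std::vector<int> composeUndefShuffles(const std::vector<int> &InnerMask,
                                      const std::vector<int> &OuterMask) {
  const int N = static_cast<int>(InnerMask.size());
  std::vector<int> Result;
  for (int Idx : OuterMask) {
    if (Idx < 0 || Idx >= N) { // outer lane reads from the undef operand
      Result.push_back(-1);
      continue;
    }
    const int Src = InnerMask[Idx];
    // The inner lane may itself read from the undef operand.
    Result.push_back(Src < 0 || Src >= N ? -1 : Src);
  }
  return Result;
}

// A mask is trivially Undef when no lane selects a defined element.
bool isAllUndef(const std::vector<int> &Mask) {
  for (int Idx : Mask)
    if (Idx >= 0)
      return false;
  return true;
}
```

With the masks `<0, 4, 1, 6>` and `<1, 5, 3, 3>` from the example, every lane of the combined mask reads undef, so the pair folds away.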
* [PowerPC] Mark fixed-offset byvals as pointed-to by IR values | Hal Finkel | 2014-08-16 | 1 | -2/+2
  A byval object, even if allocated at a fixed offset (prescribed by the ABI), is pointed to by IR values. Most fixed-offset stack objects are not pointed-to by IR values, so the default is to assume this is not possible. However, we need to override the default in this case (instruction scheduling can cause miscompiles otherwise).
  Fixes PR20280.
  llvm-svn: 215795
* Make isAliased property for fixed-offset stack objects adjustable | Hal Finkel | 2014-08-16 | 2 | -11/+9
  We used to assume that any fixed-offset stack object was not aliased. This meant that no IR value could point to the memory contained in such an object. This is a reasonable default, but is not a universally-correct target-independent fact. For example, on PowerPC (both Darwin and non-Darwin), some byval arguments are allocated at fixed offsets by the ABI. These, however, certainly can be pointed to by IR values.
  This change moves the 'isAliased' logic out of FixedStackPseudoSourceValue and into MFI, and allows the isAliased property to be overridden for fixed-offset objects. This will be used by an upcoming commit to the PowerPC backend to fix PR20280.
  No functionality change intended (the behavior of FixedStackPseudoSourceValue::isAliased has been made more conservative for callers that don't pass an MFI object, but I don't see any in-tree callers that do that).
  llvm-svn: 215794
* [PowerPC] Darwin byval arguments are not immutable | Hal Finkel | 2014-08-16 | 1 | -1/+1
  On PPC/Darwin, byval arguments occur at fixed stack offsets in the callee's frame, but are not immutable -- the pointer value is directly available to the higher-level code as the address of the argument, and the value of the byval argument can be modified at the IR level.
  This is necessary, but not sufficient, to fix PR20280. When PR20280 is fixed in a follow-up commit, its test case will cover this change.
  llvm-svn: 215793
* Revert "[Support] Promote cl::StringSaver to a separate utility" | Sean Silva | 2014-08-15 | 1 | -7/+27
  This reverts commit r215784 / 3f8a26f6fe16cc76c98ab21db2c600bd7defbbaa.
  LLD has 3 StringSaver's, one of which takes a lock when saving the string... Need to investigate more closely.
  llvm-svn: 215790
* Get rid of dead code: SelectAtomic64 in X86ISelDAGtoDAG.cpp | Robin Morisset | 2014-08-15 | 1 | -19/+0
  llvm-svn: 215789
* [Support] Promote cl::StringSaver to a separate utility | Sean Silva | 2014-08-15 | 1 | -27/+7
  This class is generally useful.
  In breaking it out, the primary change is that it has been made non-virtual. It seems like being abstract led to there being 3 different (2 in llvm + 1 in clang) concrete implementations which disagreed about the ownership of the saved strings (see the manual call to free() in the unittest StrDupSaver; yes this is different from the CommandLine.cpp StrDupSaver which owns the stored strings; which is different from Clang's StringSetSaver which just holds a reference to a std::set<std::string> which owns the strings).
  I've identified 2 other places in the codebase that are open-coding this pattern:
    memcpy(Alloc.Allocate<char>(strlen(S)+1), S, strlen(S)+1)
  I'll be switching them over. They are
    * llvm::sys::Process::GetArgumentVector
    * The StringAllocator member of YAMLIO's Input class
  This also will allow simplifying Clang's driver.cpp quite a bit.
  Let me know if there are any other places that could benefit from StringSaver. I'm also thinking of adding a saveStringRef member for getting a stable StringRef.
  llvm-svn: 215784
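For readers unfamiliar with the pattern, here is a simplified stand-in for a non-virtual saver that owns its copies. `SimpleStringSaver` and `saveCStr` are hypothetical names, and the sketch stores copies in `std::unique_ptr<char[]>` instead of the allocator used in the open-coded `memcpy` pattern quoted above, so it is not the class this commit introduces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Simplified, standard-library-only stand-in for a string saver with a single,
// unambiguous ownership rule: the saver owns every copy it hands out.
class SimpleStringSaver {
  std::vector<std::unique_ptr<char[]>> Storage;

public:
  // Copy S (including the terminating NUL) and return a pointer that stays
  // valid for the lifetime of the saver.
  const char *saveCStr(const char *S) {
    const std::size_t Len = std::strlen(S) + 1;
    std::unique_ptr<char[]> Copy(new char[Len]);
    std::memcpy(Copy.get(), S, Len);
    Storage.push_back(std::move(Copy));
    return Storage.back().get();
  }
};
```

Because the saver is a concrete class with one storage policy, callers never need to ask who frees the string, which is the ownership disagreement the commit message describes.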
* Fix typos in comments | Robin Morisset | 2014-08-15 | 5 | -7/+7
  llvm-svn: 215777
* [AArch32] Add support for FP rounding operations for ARMv8/AArch32. | Chad Rosier | 2014-08-15 | 2 | -12/+29
  Phabricator Revision: http://reviews.llvm.org/D4935
  llvm-svn: 215772
* [Option] Support MultiArg in --help | Nick Kledzik | 2014-08-15 | 1 | -1/+12
  Currently, if you use a MultiArg<> option, then printing out the help/usage message will cause an assert. This fixes getOptionHelpName() to work with MultiArg Options.
  llvm-svn: 215770
* Set comdats when lazily linking functions. | Rafael Espindola | 2014-08-15 | 1 | -0/+5
  We were setting the comdat when functions were copied in the initial pass, but not when they were linked only when we found out that they are needed.
  llvm-svn: 215765
* [FastISel][AArch64] Fix a latent bug in floating-point materialization. | Juergen Ributzka | 2014-08-15 | 1 | -1/+10
  The floating-point value positive zero (+0.0) is a valid immediate value according to isFPImmLegal. As a result AArch64 FastISel went ahead and used the immediate version of fmov to materialize the constant. The problem is that the immediate version of fmov cannot encode an immediate for positive zero. Instead a fmov from the zero register was supposed to be used in this case.
  This fix adds handling for this special case and uses fmov from the zero register to materialize a positive zero (negative zeroes go to the constant pool).
  There is no test case for this, because this code is currently dead. It will be enabled in a future commit and I will add a test case in a separate commit after that.
  This fixes <rdar://problem/18027157>.
  llvm-svn: 215753
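The special case hinges on telling +0.0 apart from -0.0, which a plain `==` comparison cannot do. The sketch below shows one way to make that distinction in standard C++. `isPositiveZero`, `FPMaterialize`, and `classify` are hypothetical names for illustration and do not reflect the FastISel code.

```cpp
#include <cassert>
#include <cmath>

// True only for +0.0. Comparing with == is not enough, because -0.0 == 0.0.
bool isPositiveZero(double V) { return V == 0.0 && !std::signbit(V); }

// Hypothetical classification of how a constant would be materialized.
enum class FPMaterialize { FMovFromZeroReg, Other };

FPMaterialize classify(double V) {
  // Only +0.0 takes the zero-register path; every other value, including
  // -0.0, is left to the remaining strategies (immediate fmov, constant pool).
  return isPositiveZero(V) ? FPMaterialize::FMovFromZeroReg
                           : FPMaterialize::Other;
}
```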
* Reapplying [FastISel][AArch64] Cleanup constant materialization code. NFCI. | Juergen Ributzka | 2014-08-15 | 1 | -26/+30
  Note: This reapplies r215582 without any modifications. The refactoring wasn't responsible for the buildbot failures.
  Original commit message: Cleanup and prepare constant materialization code for future commits.
  llvm-svn: 215752
* R600/SI: Move all fabs / fneg handling to patterns | Matt Arsenault | 2014-08-15 | 2 | -117/+31
  llvm-svn: 215749
* R600/SI: Use source modifiers for f64 fneg | Matt Arsenault | 2014-08-15 | 3 | -6/+38
  llvm-svn: 215748
* R600/SI: Use source modifier for f64 fabs | Matt Arsenault | 2014-08-15 | 2 | -2/+30
  llvm-svn: 215747
* R600/SI: Refactor fneg / fabs patterns | Matt Arsenault | 2014-08-15 | 1 | -22/+17
  llvm-svn: 215746
* Fix the build with MSVC 2013 after new shuffle code | Reid Kleckner | 2014-08-15 | 1 | -2/+8
  MSVC gives this awesome diagnostic:
    ..\lib\Target\X86\X86ISelLowering.cpp(7085) : error C2971: 'llvm::VariadicFunction1' : template parameter 'Func' : 'isShuffleEquivalentImpl' : a local variable cannot be used as a non-type argument
    ..\include\llvm/ADT/VariadicFunction.h(153) : see declaration of 'llvm::VariadicFunction1'
    ..\lib\Target\X86\X86ISelLowering.cpp(7061) : see declaration of 'isShuffleEquivalentImpl'
  Using an anonymous namespace makes the problem go away.
  llvm-svn: 215744
* R600/SI: Fix offset folding in some cases with shifted pointers. | Matt Arsenault | 2014-08-15 | 4 | -1/+137
  Ordinarily (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) is only done if the add has one use. If the resulting constant add can be folded into an addressing mode, force this to happen for the pointer operand.
  This ends up happening a lot because of how LDS objects are allocated. Since the globals are allocated next to each other, accessing the first element of the second object is directly indexed by a shifted pointer.
  llvm-svn: 215739
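The rewrite relies on shifts distributing over addition in wrapping arithmetic. A quick standalone check of that identity on `uint32_t` values follows. The function names are hypothetical, and the shift amount is assumed to be less than 32.

```cpp
#include <cassert>
#include <cstdint>

// (shl (add x, c1), c2), evaluated with 32-bit wrapping arithmetic.
uint32_t shlOfAdd(uint32_t X, uint32_t C1, unsigned C2) {
  return (X + C1) << C2;
}

// (add (shl x, c2), c1 << c2): the same value, with the constant offset
// exposed as a separate add that an addressing mode could absorb.
uint32_t addOfShl(uint32_t X, uint32_t C1, unsigned C2) {
  return (X << C2) + (C1 << C2);
}
```

Both forms agree even when the intermediate sums wrap, since all operations are taken modulo 2^32.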
* [x86] Teach the new AVX v4f64 shuffle lowering to use UNPCK instructions | Chandler Carruth | 2014-08-15 | 1 | -0/+42
  where applicable for blending.
  llvm-svn: 215737
* [FastISel] Remove a performance debugging assert. | Juergen Ributzka | 2014-08-15 | 1 | -1/+0
  As Jim pointed out, this assert isn't really needed to test for correctness, because the code right afterwards does the same check and falls back to SelectionDAG, as intended.
  llvm-svn: 215735
* R600/SI: Add intrinsic for ldexp | Matt Arsenault | 2014-08-15 | 4 | -2/+14
  llvm-svn: 215734
* R600/SI: Implement isLegalAddressingMode | Matt Arsenault | 2014-08-15 | 2 | -0/+47
  The default assumes that a 16-bit signed offset is used. LDS instructions use a 16-bit unsigned offset, so it wasn't being used in some cases where it was assumed a negative offset could be used.
  More should be done here, but first isLegalAddressingMode needs to gain an addressing mode argument. For now, copy most of the rest of the default implementation with the immediate offset change.
  llvm-svn: 215732
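The difference between the two offset ranges is easy to see with a pair of range checks. These are hypothetical stand-ins written for illustration, not the helpers the backend uses.

```cpp
#include <cassert>
#include <cstdint>

// Range accepted by a 16-bit signed immediate offset (the default assumption).
bool fitsSigned16(int64_t Offset) { return Offset >= -32768 && Offset <= 32767; }

// Range accepted by a 16-bit unsigned immediate offset (what LDS instructions
// use, per the commit message).
bool fitsUnsigned16(int64_t Offset) { return Offset >= 0 && Offset <= 65535; }
```

An offset such as 40000 is legal only under the unsigned rule, while -4 is legal only under the signed rule, so applying the signed default to LDS accesses gives the wrong answer in both directions.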
* ARM: Fix and re-enable load/store optimizer for Thumb1. | Moritz Roth | 2014-08-15 | 1 | -111/+8
  In a previous iteration of the pass, we would try to compensate for writeback by updating later instructions and/or inserting a SUBS to reset the base register if necessary. Since such a SUBS sets the condition flags it's not generally safe to do this.
  For now, only merge LDR/STRs if there is no writeback to the base register (LDM that loads into the base register) or the base register is killed by one of the merged instructions. These cases are clear wins both in terms of instruction count and performance.
  Also add three new test cases, and update the existing ones accordingly.
  llvm-svn: 215729
* ARM load/store optimizer: Compute BaseKill correctly. | Moritz Roth | 2014-08-15 | 1 | -5/+11
  This adds some code back that was deleted in r92053. The location of the last merged memory operation needs to be kept up-to-date since MemOps may be in a different order to the original instruction stream to allow merging (since registers need to be in ascending order).
  Also simplify the logic to determine BaseKill using findRegisterUseOperandIdx to use an equivalent function call instead.
  llvm-svn: 215728
* [FastISel][ARM] Fix a think-o in my previous commit (r215682). | Juergen Ributzka | 2014-08-15 | 1 | -15/+15
  We actually need to return the register into which we materialized the constant and not just "true" for success. This code is currently partially dead, that is why it didn't trigger any failures yet. Once I change the order of the constant materialization this code will be fully exercised.
  llvm-svn: 215727
* Introduce a helper to combine instruction metadata. | Rafael Espindola | 2014-08-15 | 4 | -75/+70
  Replace the old code in GVN and BBVectorize with it. Update SimplifyCFG to use it.
  Patch by Björn Steinbrink!
  llvm-svn: 215723
* Make EmitAbsValue a static helper. | Rafael Espindola | 2014-08-15 | 2 | -31/+31
  llvm-svn: 215721
* Delete dead code. NFC. | Rafael Espindola | 2014-08-15 | 3 | -11/+0
  llvm-svn: 215720
* Make EmitDwarfSetLineAddr a static helper. NFC. | Rafael Espindola | 2014-08-15 | 2 | -13/+13
  llvm-svn: 215718
* Make BuildSymbolDiff a static helper. | Rafael Espindola | 2014-08-15 | 2 | -15/+13
  llvm-svn: 215717
* [AArch64] Narrow arguments passed in wrong position on the stack in | Amara Emerson | 2014-08-15 | 1 | -2/+2
  big-endian mode.
  Patch by Asiri Rathnayake.
  Differential Revision: http://reviews.llvm.org/D4922
  llvm-svn: 215716
* Make ForceExpAbs a static helper. | Rafael Espindola | 2014-08-15 | 1 | -3/+4
  llvm-svn: 215715
* Add a helper to MCExpr for when an expression is known to be absolute. | Rafael Espindola | 2014-08-15 | 3 | -22/+23
  llvm-svn: 215713
* Remove HasLEB128. | Rafael Espindola | 2014-08-15 | 13 | -17/+0
  We already require CFI, so it should be safe to require .leb128 and .uleb128.
  llvm-svn: 215712
* PPC: Clean up pointer casting, no functionality change. | Benjamin Kramer | 2014-08-15 | 1 | -2/+2
  Silences GCC's -Wcast-qual.
  llvm-svn: 215703
* [x86] Add the initial skeleton of type-based dispatch for AVX vectors in | Chandler Carruth | 2014-08-15 | 1 | -9/+125
  the new shuffle lowering and an implementation for v4 shuffles.
  This allows us to handle non-half-crossing shuffles directly for v4 shuffles, both integer and floating point. This currently misses places where we could perform the blend via UNPCK instructions, but otherwise generates equally good or better code for the test cases included to the existing vector shuffle lowering. There are a few cases that are entertainingly better. ;]
  llvm-svn: 215702
* [x86] Teach the instruction printer to decode immediate operands to | Chandler Carruth | 2014-08-15 | 3 | -0/+74
  BLENDPS, BLENDPD, and PBLENDW instructions into pretty shuffle comments.
  These will be used in my next commit as part of test cases for AVX shuffles which can directly use blend in more places.
  llvm-svn: 215701
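As a rough illustration of what decoding a blend immediate involves, the sketch below turns an immediate into a shuffle mask over two sources. `decodeBlendMask` is a hypothetical name for this example, and the code is not taken from the instruction printer. It assumes bit i of the immediate selects element i of the second source, and which operand counts as "second" depends on the assembly syntax in use.

```cpp
#include <cassert>
#include <vector>

// Decode a blend immediate into a shuffle mask over two NumElts-wide sources:
// a clear bit i keeps element i of the first source (index i), a set bit takes
// element i of the second source (index NumElts + i).
std::vector<int> decodeBlendMask(unsigned NumElts, unsigned Imm) {
  std::vector<int> Mask;
  for (unsigned I = 0; I != NumElts; ++I)
    Mask.push_back(((Imm >> I) & 1u) ? static_cast<int>(NumElts + I)
                                      : static_cast<int>(I));
  return Mask;
}
```

For a four-element blend with immediate 5 (binary 0101), elements 0 and 2 come from the second source and elements 1 and 3 from the first.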
* ARM: implement MRS/MSR (banked reg) system instructions. | Tim Northover | 2014-08-15 | 7 | -4/+241
  These are system-only instructions for CPUs with virtualization extensions, allowing a hypervisor easy access to all of the various different AArch32 registers.
  rdar://problem/17861345
  llvm-svn: 215700
* Remove testcase from README which we didn't get. We do get it now. | Erik Verbruggen | 2014-08-15 | 1 | -1/+1
  llvm-svn: 215699
* Current implementation of c.cond.fmt instructions only accepts default cc0 | Vladimir Medic | 2014-08-15 | 2 | -14/+51
  register. This patch enables the instruction to accept other fcc registers. The aliases with default fcc0 registers are also defined.
  llvm-svn: 215698
* [x86] Remove the duplicated code for testing whether we can widen the | Chandler Carruth | 2014-08-15 | 1 | -12/+4
  elements of a shuffle mask and simplify how it works. No functionality changed now that the bug that was here has been fixed.
  llvm-svn: 215696
* [x86] Fix the very broken formation of vpunpck instructions in the | Chandler Carruth | 2014-08-15 | 1 | -1/+1
  target-specific shuffle DAG combines.
  We were recognizing the paired shuffles backwards. This code needs to be replaced anyways as we have the same functionality elsewhere, but I'll do the refactoring in a follow-up, this is the minimal fix to the behavior.
  In addition to fixing miscompiles with the new vector shuffle lowering, it also causes the canonicalization to kick in much better, selecting the smaller encoding variants in lots of places in the new AVX path. This still isn't quite ideal as we don't need both the shufpd and the punpck instructions, but that'll get fixed in a follow-up patch.
  llvm-svn: 215690