summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* Small cleanup on how we clear constant variables. NFC.Rafael Espindola2014-12-051-14/+9
| | | | llvm-svn: 223474
* Use an early return. NFC.Rafael Espindola2014-12-051-19/+19
| | | | llvm-svn: 223470
* [msan] Avoid extra origin address realignment.Evgeniy Stepanov2014-12-051-21/+24
| | | | | | | | | Do not realign origin address if the corresponding application address is at least 4-byte-aligned. Saves 2.5% code size in track-origins mode. llvm-svn: 223464
* [X86] Avoid introducing extra shuffles when lowering packed vector shifts.Andrea Di Biagio2014-12-051-15/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When lowering a vector shift node, the backend checks if the shift count is a shuffle with a splat mask. If so, then it introduces an extra dag node to extract the splat value from the shuffle. The splat value is then used to generate a shift count of a target specific shift. However, if we know that the shift count is a splat shuffle, we can use the splat index 'I' to extract the I-th element from the first shuffle operand. The advantage is that the splat shuffle may become dead since we no longer use it. Example: ;; define <4 x i32> @example(<4 x i32> %a, <4 x i32> %b) { %c = shufflevector <4 x i32> %b, <4 x i32> undef, <4 x i32> zeroinitializer %shl = shl <4 x i32> %a, %c ret <4 x i32> %shl } ;; Before this patch, llc generated the following code (-mattr=+avx): vpshufd $0, %xmm1, %xmm1 # xmm1 = xmm1[0,0,0,0] vpxor %xmm2, %xmm2 vpblendw $3, %xmm1, %xmm2, %xmm1 # xmm1 = xmm1[0,1],xmm2[2,3,4,5,6,7] vpslld %xmm1, %xmm0, %xmm0 retq With this patch, the redundant splat operation is removed from the code. vpxor %xmm2, %xmm2 vpblendw $3, %xmm1, %xmm2, %xmm1 # xmm1 = xmm1[0,1],xmm2[2,3,4,5,6,7] vpslld %xmm1, %xmm0, %xmm0 retq llvm-svn: 223461
* Add missing FP build attribute tests.Charlie Turner2014-12-051-0/+2
| | | | | | | | | | | | | | | | | | | | | The test file test/CodeGen/ARM/build-attributes.ll was missing several floating-point build attribute tests. The intention of this commit is that for each CPU / architecture currently tested, there are now tests that make sure the following attributes are sufficiently checked, * Tag_ABI_FP_rounding * Tag_ABI_FP_denormal * Tag_ABI_FP_exceptions * Tag_ABI_FP_user_exceptions * Tag_ABI_FP_number_model Also in this commit, the -unsafe-fp-math flag has been augmented with the full suite of flags Clang sends to LLVM when you pass -ffast-math to Clang. That is, `-unsafe-fp-math' has been changed to `-enable-unsafe-fp-math -disable-fp-elim -enable-no-infs-fp-math -enable-no-nans-fp-math -fp-contract=fast' Change-Id: I35d766076bcbbf09021021c0a534bf8bf9a32dfc llvm-svn: 223454
* Revert "r223440 - Consider subregs when calling MI::registerDefIsDead for ↵Hal Finkel2014-12-051-7/+1
| | | | | | | | | phys deps" Reverting this because, while it fixes the problem in the reduced test case, it does not fix the problem in the full test case from the bug report. llvm-svn: 223442
* Consider subregs when calling MI::registerDefIsDead for phys depsHal Finkel2014-12-051-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The scheduling dependency graph is built bottom-up within each scheduling region, and ScheduleDAGInstrs::addPhysRegDeps is called to add output/anti dependencies, based on physical registers, to the SUs for instructions based on those that come before them. In the test case, we start before post-RA scheduling with a block that looks like this: ... INLINEASM <... andc $0,$0,$2 stdcx. $0,0,$3 bne- 1b > [sideeffect] [mayload] [maystore] [attdialect], $0:[regdef-ec:G8RC], %X6<earlyclobber,def,dead>, $1:[mem], %X3<kill>, $2:[reguse:G8RC], %X5<kill>, $3:[reguse:G8RC], %X3, $4:[mem], %X3, $5:[clobber], %CC<earlyclobber,imp-def,dead>, <<badref>> ... %X4<def,dead> = ANDIo8 %X4<kill>, 1, %CR0<imp-def,dead>, %CR0GT<imp-def> ... %R29<def> = ISEL %R3<undef>, %R4<kill>, %CR0GT<kill> where it is relevant that %CC is an alias to %CR0, and that %CR0GT is a subregister of %CR0. However, for post-RA scheduling, no dependency was added to prevent the INLINEASM from being scheduled in between the ANDIo8 and the ISEL (which communicate via the %CR0GT register). In ScheduleDAGInstrs::addPhysRegDeps, when called for the %CC operand, we'd iterate over all of its aliases (which include %CC itself and also %CR0), and look for previously-encountered defs of those registers. We'd find the ANDIo8, but decide not to add a dependency between the INLINEASM and the ANDIo8 because both the INLINEASM's def of %CC is dead, and also the ANDIo8 def of %CR0 is dead. This ignores, however, that ANDIo8 has a non-dead def of %CR0GT, a subregister of %CR0, and thus a dependency still must exist. To fix this problem, when calling registerDefIsDead on the SU with the def, we also check all subregisters for possible non-dead defs, and add the dependency if any are found. Fixes PR21742. llvm-svn: 223440
* IR: Stop relying on GetStringMapEntryFromValue()Duncan P. N. Exon Smith2014-12-051-1/+3
| | | | | | It relies on undefined behaviour. llvm-svn: 223438
* Cleanup: Calls to getDwarfRegNum() may actually fail, if there isAdrian Prantl2014-12-053-27/+44
| | | | | | | | | | | | | no DWARF register number mapping, or if the register was a virtual register that was never materialized. Previously, we would just emit a bogus location, after this patch we don't emit a location at all by doing an early exit. After my bugfix in r223401 today, this doesn't actually happen on any target that I tested this with, but it's still preferable to make the possibility of a failure explicit. llvm-svn: 223428
* linkGlobalVariableProto never returns null. Simplify the caller. NFC.Rafael Espindola2014-12-051-6/+3
| | | | llvm-svn: 223424
* Rename the x86 isTargetMacho to isTargetMachO for uniformity.Eric Christopher2014-12-054-8/+8
| | | | llvm-svn: 223421
* Both of these subtargets have functions that check whether orEric Christopher2014-12-052-3/+2
| | | | | | not the target is mach-o. Use them. llvm-svn: 223420
* Move merging of alignment to a central location. NFC.Rafael Espindola2014-12-051-19/+3
| | | | llvm-svn: 223418
* [X86] Delete dead code in fcopysign lowering. NFC.Ahmed Bougacha2014-12-041-11/+0
| | | | | | | | | r32900 introduced custom lowering for fcopysign, with two checks to change the magnitude value's type if it's larger/smaller than the sign value's type. r32932 replaced that code for the smaller case. r43205 did the same for the larger case, but left the old code, now dead. llvm-svn: 223415
* Simplify implementation and testcase of r223401 based on feedback from dblaikie.Adrian Prantl2014-12-041-4/+2
| | | | llvm-svn: 223405
* Debug info: If the RegisterCoalescer::reMaterializeTrivialDef() isAdrian Prantl2014-12-041-1/+13
| | | | | | | eliminating all uses of a vreg, update any DBG_VALUE describing that vreg to point to the rematerialized register instead. llvm-svn: 223401
* Add a FIXME as requested by Renato Golin.Roman Divacky2014-12-041-0/+3
| | | | llvm-svn: 223390
* Silence warning: variable 'buffer' set but not used.Yaron Keren2014-12-041-3/+5
| | | | llvm-svn: 223389
* [x86] Fix isOffsetSuitableForCodeModel kernel code model offsetBruno Cardoso Lopes2014-12-041-1/+1
| | | | | | | Offset == 0 is a valid offset for kernel code model according to the x86_64 System V ABI. Found by inspection, no testcase. llvm-svn: 223383
* [AArch64] Combining Load and IntToFp should check for neon availabilityWeiming Zhao2014-12-041-3/+4
| | | | llvm-svn: 223382
* Fix yet another unseen regression caused by r223113Asiri Rathnayake2014-12-041-14/+26
| | | | | | | | | | r223113 added support for ARM modified immediate assembly syntax. Which assumes all immediate operands are prefixed with a '#'. This assumption is wrong as per the ARMARM - which recommends that all '#' characters be treated optional. The current patch fixes this regression and adds a test case. A follow-up patch will expand the test coverage to other instructions. llvm-svn: 223381
* Fix thumbv4t indirect callsJonathan Roelofs2014-12-042-11/+50
| | | | | | | | | | | | | | | | | | | | | So there are a couple of issues with indirect calls on thumbv4t. First, the most 'obvious' instruction, 'blx' isn't available until v5t. And secondly, the next-most-obvious sequence: 'mov lr, pc; bx rN' doesn't DTRT in thumb code because the saved off pc has its thumb bit cleared, so when the callee returns we end up in ARM mode.... yuck. The solution is to 'bl' to a nearby landing pad with a 'bx rN' in it. We could cut down on code size by sharing the landing pads between call sites that are close enough, but for the moment let's do correctness first and look at performance later. Patch by: Iain Sandoe http://reviews.llvm.org/D6519 llvm-svn: 223380
* Revert "r223364 - Revert r223347 which has caused crashes on bootstrap bots."Hal Finkel2014-12-041-3/+14
| | | | | | | | | | | | | | | | | | | | Reapply r223347, with a fix to not crash on uninserted instructions (or more precisely, instructions in uninserted blocks). bugpoint was able to reduce the test case somewhat, but it is still somewhat large (and relies on setting things up to be simplified during inlining), so I've not included it here. Nevertheless, it is clear what is going on and why. Original commit message: Restrict somewhat the memory-allocation pointer cmp opt from r223093 Based on review comments from Richard Smith, restrict this optimization from applying to globals that might resolve lazily to other dynamically-loaded modules, and also from dynamic allocas (which might be transformed into malloc calls). In short, take extra care that the compared-to pointer is really simultaneously live with the memory allocation. llvm-svn: 223371
* Fix a typo: use of cast where dyn_cast was intendedPhilip Reames2014-12-041-1/+1
| | | | | | | | This bug has the effect of converting a test of isGCRelocate(InvokeInst*) from a false return to a crash. This may be the root cause of the crash Joerg reported against r223137, but I'm still waiting for a clean build of clang to complete to be able to confirm. Once I've confirmed the issue, I'll submit a test case separately. llvm-svn: 223370
* Remove dead code. NFC.Rafael Espindola2014-12-046-87/+11
| | | | | | This interface was added 2 years ago but users never developed. llvm-svn: 223368
* Fix a minor regression introduced in r223113Asiri Rathnayake2014-12-041-11/+21
| | | | | | | | | | r223113 added support for ARM modified immediate assembly syntax. That patch has broken support for immediate expressions, as in: add r0, #(4 * 4) It wasn't caught because we don't have any tests for this feature. This patch fixes this regression and adds test cases. llvm-svn: 223366
* Revert r223347 which has caused crashes on bootstrap bots.Alexander Potapenko2014-12-041-13/+3
| | | | llvm-svn: 223364
* Revert "[Thumb/Thumb2] Added restrictions on PC, LR, SP in the register list ↵Rafael Espindola2014-12-041-145/+89
| | | | | | | | | | for PUSH/POP/LDM/STM. <Differential Revision: http://reviews.llvm.org/D6090>" This reverts commit r223356. It was failing check-all (MC/ARM/thumb.s in particular). llvm-svn: 223363
* [X86] Improve a dag-combine that handles a vector extract -> zext sequence.Michael Kuperstein2014-12-041-24/+51
| | | | | | | | | The current DAG combine turns a sequence of extracts from <4 x i32> followed by zexts into a store followed by scalar loads. According to measurements by Martin Krastev (see PR 21269) for x86-64, a sequence of an extract, movs and shifts gives better performance. However, for 32-bit x86, the previous sequence still seems better. Differential Revision: http://reviews.llvm.org/D6501 llvm-svn: 223360
* [Thumb/Thumb2] Added restrictions on PC, LR, SP in the register list for ↵Jyoti Allur2014-12-041-89/+145
| | | | | | PUSH/POP/LDM/STM. <Differential Revision: http://reviews.llvm.org/D6090> llvm-svn: 223356
* [X86] Simplify code. NFC.Andrea Di Biagio2014-12-041-16/+6
| | | | | | | | Replaced some logic that checked if a build_vector node is doing a splat of a non-undef value with a call to method BuildVectorSDNode::getSplatValue(). No functional change intended. llvm-svn: 223354
* Use DomTree in MachineSink to sink over diamonds.Patrik Hagglund2014-12-041-15/+19
| | | | | | | | | | | | | | | | | | | | | According to a previous FIXME comment we now not only look at MBB successors, but also handle code sinking past them: x = computation if () {} else {} use x The instruction could be sunk over the whole diamond for the if/then/else (or loop, etc), allowing it to be sunk into other blocks after that. Modified test added in r204522, due to one spill less present. Minor fixes in comments. Patch provided by Jonas Paulsson. Reviewed by Hal Finkel. llvm-svn: 223350
* [InstCombine] Minor optimization for bswap with binary opsSimon Pilgrim2014-12-043-0/+67
| | | | | | | | | | | | | | | | | Added instcombine optimizations for BSWAP with AND/OR/XOR ops: OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) ) OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) ) Since its just a one liner, I've also added BSWAP to the DAGCombiner equivalent as well: fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y)) Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone. Differential Revision: http://reviews.llvm.org/D6407 llvm-svn: 223349
* Masked Load / Store Intrinsics - the CodeGen part.Elena Demikhovsky2014-12-0416-6/+661
| | | | | | | | | | | | | | | | | | I'm recommiting the codegen part of the patch. The vectorizer part will be send to review again. Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 llvm-svn: 223348
* Restrict somewhat the memory-allocation pointer cmp opt from r223093Hal Finkel2014-12-041-3/+13
| | | | | | | | | | Based on review comments from Richard Smith, restrict this optimization from applying to globals that might resolve lazily to other dynamically-loaded modules, and also from dynamic allocas (which might be transformed into malloc calls). In short, take extra care that the compared-to pointer is really simultaneously live with the memory allocation. llvm-svn: 223347
* clang-formatted ranged loops and assignment, NFC.Yaron Keren2014-12-041-12/+12
| | | | llvm-svn: 223344
* Add mach-o LC_RPATH support to llvm-objdumpJean-Daniel Dupas2014-12-041-0/+5
| | | | | | | | | | Summary: Add rpath load command support in Mach-O object and update llvm-objdump to use it. Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6512 llvm-svn: 223343
* [X86] Clean up whitespace as well as minor coding styleMichael Liao2014-12-0433-409/+403
| | | | llvm-svn: 223339
* [Hexagon] Marking some instructions as CodeGenOnly=0 and adding disassembly ↵Colin LeMahieu2014-12-043-3/+9
| | | | | | tests. llvm-svn: 223334
* [X86] Restore X86 base pointer after call to llvm.eh.sjlj.setjmpMichael Liao2014-12-044-0/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit on - This patch fixes the bug described in http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-May/062343.html The fix allocates an extra slot just below the GPRs and stores the base pointer there. This is done only for functions containing llvm.eh.sjlj.setjmp that also need a base pointer. Because code containing llvm.eh.sjlj.setjmp saves all of the callee-save GPRs in the prologue, the offset to the extra slot can be computed before prologue generation runs. Impact at run-time on affected functions is:: - One extra store in the prologue, The store saves the base pointer. - One extra load after a llvm.eh.sjlj.setjmp. The load restores the base pointer. Because the extra slot is just above a gap between frame-pointer-relative and base-pointer-relative chunks of memory, there is no impact on other offset calculations other than ensuring there is room for the extra slot. http://reviews.llvm.org/D6388 Patch by Arch Robison <arch.robison@intel.com> llvm-svn: 223329
* [PowerPC] 'cc' should be an alias only to 'cr0'Hal Finkel2014-12-041-4/+2
| | | | | | | | | | | We had mistakenly believed that GCC's 'cc' referred to the entire condition-code register (cr0 through cr7) -- and implemented this in r205630 to fix PR19326, but 'cc' is actually an alias only to 'cr0'. This is causing LLVM to clobber too much with legacy code with inline asm using the 'cc' clobber. Fixes PR21451. llvm-svn: 223328
* HexagonMCInst.h: Qualify constants explicitly to appease msc17.NAKAMURA Takumi2014-12-041-2/+2
| | | | llvm-svn: 223325
* Allow target to specify prefix for labelsMatt Arsenault2014-12-049-4/+13
| | | | | | | | Use the MCAsmInfo instead of the DataLayout, and allow specifying a custom prefix for labels specifically. HSAIL requires that labels begin with @, but global symbols with &. llvm-svn: 223323
* A few more checks for gc.statepoints in the VerifierPhilip Reames2014-12-041-0/+11
| | | | | | | | | This is simply a grab bag of unrelated checks: - A statepoint call can't be marked readonly or readnone - We don't currently support inline asm or varadic target functions. Both could be supported, but don't currently work. - I forgot to check that the number of call arguments actually matched the wrapped callee in my previous change. Included here. llvm-svn: 223322
* [PowerPC] Fix inline asm memory operands not to use r0Hal Finkel2014-12-031-2/+12
| | | | | | | | | | | | | | | On PowerPC, inline asm memory operands might be expanded as 0($r), where $r is a register containing the address. As a result, this register cannot be r0, and we need to enforce this register subclass constraint to prevent miscompiling the code (we'd get this constraint for free with the usual instruction definitions, but that scheme has no knowledge of how we end up printing inline asm memory operands, and so here we need to do it 'by hand'). We can accomplish this within the current address-mode selection framework by introducing an explicit COPY_TO_REGCLASS node. Fixes PR21443. llvm-svn: 223318
* [RegAllocFast] Handle implicit definitions conservatively.Quentin Colombet2014-12-031-7/+14
| | | | | | | | | | | | | | | | | Prior to this commit, physical registers defined implicitly were considered free right after their definition, i.e.. like dead definitions. Therefore, their uses had to immediately follow their definitions, otherwise the related register may be reused to allocate a virtual register. This commit fixes this assumption by keeping implicit definitions alive until they are actually used. The downside is that if the implicit definition was dead (and not marked at such), we block an otherwise available register. This is however conservatively correct and makes the fast register allocator much more robust in particular regarding the scheduling of the instructions. Fixes PR21700. llvm-svn: 223317
* [msan] allow -fsanitize-coverage=N together with -fsanitize=memory, llvm partKostya Serebryany2014-12-032-2/+5
| | | | llvm-svn: 223312
* Test commit.Jacques Pienaar2014-12-031-2/+2
| | | | llvm-svn: 223310
* Split the set of identified struct types into opaque and non-opaque ones.Rafael Espindola2014-12-031-106/+201
| | | | | | | | | | | | | | | | | | | The non-opaque part can be structurally uniqued. To keep this to just a hash lookup, we don't try to unique cyclic types. Also change the type mapping algorithm to be optimistic about a type not being recursive and only create a new type when proven to be wrong. This is not as strong as trying to speculate that we can keep the source type, but is simpler (no speculation to revert) and more powerfull than what we had before (we don't copy non-recursive types at least). I initially wrote this to try to replace the name based type merging. It is not strong enough to replace it, but is is a useful addition. With this patch the number of named struct types is a clang lto bootstrap goes from 49674 to 15986. llvm-svn: 223278
* fix typos, grammar, formatting; NFCSanjay Patel2014-12-031-22/+19
| | | | llvm-svn: 223276
OpenPOWER on IntegriCloud