path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
...
* Add a FIXME as requested by Renato Golin. | Roman Divacky | 2014-12-04 | 1 | -0/+3
  llvm-svn: 223390
* [x86] Fix isOffsetSuitableForCodeModel kernel code model offset | Bruno Cardoso Lopes | 2014-12-04 | 1 | -1/+1
  Offset == 0 is a valid offset for kernel code model according to the x86_64 System V ABI. Found by inspection, no testcase.
  llvm-svn: 223383
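The rule this commit describes can be sketched as below. This is only an illustration: the function names are hypothetical, and the signed 32-bit range check is an assumption about the surrounding logic rather than something the commit message states. The one point taken from the message is that an offset of zero is accepted for the kernel code model.

```python
def fits_signed_32(offset: int) -> bool:
    """True if offset is representable as a signed 32-bit displacement."""
    return -(1 << 31) <= offset < (1 << 31)


def kernel_offset_ok(offset: int) -> bool:
    """Hypothetical kernel-code-model check: zero and positive offsets pass."""
    # The off-by-one described in the commit would correspond to
    # writing 'offset > 0' here, which wrongly rejects offset == 0.
    return fits_signed_32(offset) and offset >= 0
```

With this sketch, `kernel_offset_ok(0)` is accepted, while negative offsets and offsets that overflow 32 bits are rejected.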
* [AArch64] Combining Load and IntToFp should check for neon availability | Weiming Zhao | 2014-12-04 | 1 | -3/+4
  llvm-svn: 223382
* Fix yet another unseen regression caused by r223113 | Asiri Rathnayake | 2014-12-04 | 1 | -14/+26
  r223113 added support for ARM modified immediate assembly syntax, which assumes that all immediate operands are prefixed with a '#'. This assumption is wrong as per the ARMARM, which recommends that all '#' characters be treated as optional. The current patch fixes this regression and adds a test case. A follow-up patch will expand the test coverage to other instructions.
  llvm-svn: 223381
* Fix thumbv4t indirect calls | Jonathan Roelofs | 2014-12-04 | 2 | -11/+50
  So there are a couple of issues with indirect calls on thumbv4t. First, the most 'obvious' instruction, 'blx' isn't available until v5t. And secondly, the next-most-obvious sequence: 'mov lr, pc; bx rN' doesn't DTRT in thumb code because the saved off pc has its thumb bit cleared, so when the callee returns we end up in ARM mode.... yuck.
  The solution is to 'bl' to a nearby landing pad with a 'bx rN' in it.
  We could cut down on code size by sharing the landing pads between call sites that are close enough, but for the moment let's do correctness first and look at performance later.
  Patch by: Iain Sandoe
  http://reviews.llvm.org/D6519
  llvm-svn: 223380
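The state-switching problem in this message can be modelled with a small sketch. The helper names are hypothetical and addresses are plain integers; the model relies only on the interworking rule that bit 0 of a 'bx' target selects Thumb state, and on the message's statement that the pc value saved by 'mov lr, pc' has that bit cleared.

```python
def bx(target: int):
    """Model an interworking branch: bit 0 of the target selects the state."""
    mode = "thumb" if target & 1 else "arm"
    return mode, target & ~1


def mov_lr_pc(return_addr: int) -> int:
    """Per the commit message, the saved pc has its Thumb bit cleared."""
    return return_addr & ~1


def bl_return_address(return_addr: int) -> int:
    """In Thumb state, 'bl' sets bit 0 of lr, so 'bx lr' stays in Thumb."""
    return return_addr | 1
```

Returning through the value produced by `mov_lr_pc` lands in ARM state, while returning through the value produced by `bl_return_address` stays in Thumb state, which is why the fix routes the call through a 'bl' to a landing pad.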
* Fix a minor regression introduced in r223113 | Asiri Rathnayake | 2014-12-04 | 1 | -11/+21
  r223113 added support for ARM modified immediate assembly syntax. That patch has broken support for immediate expressions, as in:
    add r0, #(4 * 4)
  It wasn't caught because we don't have any tests for this feature. This patch fixes this regression and adds test cases.
  llvm-svn: 223366
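For background on what an assembler has to do once an expression such as `#(4 * 4)` has been evaluated, the sketch below checks whether a 32-bit value fits the ARM modified-immediate form (an 8-bit value rotated right by an even amount). It does not reproduce the parser change in this commit; the function names are hypothetical and the lowest-rotation search order is an assumption made for simplicity.

```python
def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by 'amount' bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def encode_modified_imm(value: int):
    """Return (imm8, rot) with ror32(imm8, rot) == value, or None."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):          # only even rotations are allowed
        imm8 = ror32(value, 32 - rot)    # undo the rotation (rotate left)
        if imm8 <= 0xFF:
            return imm8, rot
    return None
```

The expression from the commit message evaluates to 16, which this sketch encodes as `(16, 0)`; a value such as 0x101 has no encoding and yields `None`.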
* Revert "[Thumb/Thumb2] Added restrictions on PC, LR, SP in the register list for PUSH/POP/LDM/STM. <Differential Revision: http://reviews.llvm.org/D6090>" | Rafael Espindola | 2014-12-04 | 1 | -145/+89
  This reverts commit r223356. It was failing check-all (MC/ARM/thumb.s in particular).
  llvm-svn: 223363
* [X86] Improve a dag-combine that handles a vector extract -> zext sequence. | Michael Kuperstein | 2014-12-04 | 1 | -24/+51
  The current DAG combine turns a sequence of extracts from <4 x i32> followed by zexts into a store followed by scalar loads. According to measurements by Martin Krastev (see PR 21269) for x86-64, a sequence of an extract, movs and shifts gives better performance. However, for 32-bit x86, the previous sequence still seems better.
  Differential Revision: http://reviews.llvm.org/D6501
  llvm-svn: 223360
* [Thumb/Thumb2] Added restrictions on PC, LR, SP in the register list for PUSH/POP/LDM/STM. <Differential Revision: http://reviews.llvm.org/D6090> | Jyoti Allur | 2014-12-04 | 1 | -89/+145
  llvm-svn: 223356
* [X86] Simplify code. NFC. | Andrea Di Biagio | 2014-12-04 | 1 | -16/+6
  Replaced some logic that checked if a build_vector node is doing a splat of a non-undef value with a call to method BuildVectorSDNode::getSplatValue(). No functional change intended.
  llvm-svn: 223354
* Masked Load / Store Intrinsics - the CodeGen part. | Elena Demikhovsky | 2014-12-04 | 4 | -4/+166
  I'm recommitting the codegen part of the patch. The vectorizer part will be sent to review again.
  Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator.
  Examples:
    <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
    declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
  Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
  http://reviews.llvm.org/D6191
  llvm-svn: 223348
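The per-lane behaviour of the two intrinsics named in this message can be sketched as follows. This is an illustrative model only: memory is a plain Python list indexed by element rather than by byte, alignment is ignored, and the function names are hypothetical stand-ins for the intrinsics.

```python
def masked_load(memory, addr, passthru, mask):
    """Lane i reads memory[addr + i] if mask[i] is set, else keeps passthru[i]."""
    return [
        memory[addr + i] if m else p
        for i, (p, m) in enumerate(zip(passthru, mask))
    ]


def masked_store(memory, addr, value, mask):
    """Lane i writes value[i] to memory[addr + i] only if mask[i] is set."""
    for i, (v, m) in enumerate(zip(value, mask)):
        if m:
            memory[addr + i] = v
```

Lanes whose mask bit is clear never touch memory, which is what lets a vectorizer use these operations for loops with conditional memory accesses.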
* [X86] Clean up whitespace as well as minor coding style | Michael Liao | 2014-12-04 | 33 | -409/+403
  llvm-svn: 223339
* [Hexagon] Marking some instructions as CodeGenOnly=0 and adding disassembly tests. | Colin LeMahieu | 2014-12-04 | 3 | -3/+9
  llvm-svn: 223334
* [X86] Restore X86 base pointer after call to llvm.eh.sjlj.setjmp | Michael Liao | 2014-12-04 | 4 | -0/+57
  Commit on - This patch fixes the bug described in http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-May/062343.html
  The fix allocates an extra slot just below the GPRs and stores the base pointer there. This is done only for functions containing llvm.eh.sjlj.setjmp that also need a base pointer. Because code containing llvm.eh.sjlj.setjmp saves all of the callee-save GPRs in the prologue, the offset to the extra slot can be computed before prologue generation runs.
  Impact at run-time on affected functions is:
  - One extra store in the prologue. The store saves the base pointer.
  - One extra load after a llvm.eh.sjlj.setjmp. The load restores the base pointer.
  Because the extra slot is just above a gap between frame-pointer-relative and base-pointer-relative chunks of memory, there is no impact on other offset calculations other than ensuring there is room for the extra slot.
  http://reviews.llvm.org/D6388
  Patch by Arch Robison <arch.robison@intel.com>
  llvm-svn: 223329
* [PowerPC] 'cc' should be an alias only to 'cr0' | Hal Finkel | 2014-12-04 | 1 | -4/+2
  We had mistakenly believed that GCC's 'cc' referred to the entire condition-code register (cr0 through cr7) -- and implemented this in r205630 to fix PR19326, but 'cc' is actually an alias only to 'cr0'. This is causing LLVM to clobber too much with legacy code with inline asm using the 'cc' clobber.
  Fixes PR21451.
  llvm-svn: 223328
* HexagonMCInst.h: Qualify constants explicitly to appease msc17. | NAKAMURA Takumi | 2014-12-04 | 1 | -2/+2
  llvm-svn: 223325
* Allow target to specify prefix for labels | Matt Arsenault | 2014-12-04 | 5 | -0/+9
  Use the MCAsmInfo instead of the DataLayout, and allow specifying a custom prefix for labels specifically. HSAIL requires that labels begin with @, but global symbols with &.
  llvm-svn: 223323
* [PowerPC] Fix inline asm memory operands not to use r0 | Hal Finkel | 2014-12-03 | 1 | -2/+12
  On PowerPC, inline asm memory operands might be expanded as 0($r), where $r is a register containing the address. As a result, this register cannot be r0, and we need to enforce this register subclass constraint to prevent miscompiling the code (we'd get this constraint for free with the usual instruction definitions, but that scheme has no knowledge of how we end up printing inline asm memory operands, and so here we need to do it 'by hand').
  We can accomplish this within the current address-mode selection framework by introducing an explicit COPY_TO_REGCLASS node.
  Fixes PR21443.
  llvm-svn: 223318
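The reason r0 is unusable as the base of a 0($r) operand can be shown with a small model of displacement-form address computation, where a base-register field of 0 denotes the constant zero rather than the contents of r0. The function name and the list-based register file are hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def dform_effective_address(gpr, ra: int, displacement: int) -> int:
    """Effective address of a D-form access such as 'lwz rT, d(rA)'."""
    # A base field of 0 selects the literal value 0, not the contents of r0.
    base = 0 if ra == 0 else gpr[ra]
    return (base + displacement) & MASK64
```

If the register allocator placed an address in r0 and the operand were printed as `0(0)`, the access would go to address 0 instead of the intended location, which is the miscompile the register-class constraint prevents.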
* Test commit. | Jacques Pienaar | 2014-12-03 | 1 | -2/+2
  llvm-svn: 223310
* fix typos, grammar, formatting; NFC | Sanjay Patel | 2014-12-03 | 1 | -22/+19
  llvm-svn: 223276
* [Hexagon] Converting member InstrDesc to static variable. | Colin LeMahieu | 2014-12-03 | 4 | -22/+28
  llvm-svn: 223268
* [Hexagon] Converting subclass members to an implicit operand. | Colin LeMahieu | 2014-12-03 | 3 | -24/+68
  llvm-svn: 223264
* Add TableGen info for Power8. | Will Schmidt | 2014-12-03 | 2 | -0/+395
  This is based on the Power7 version, with units added and renamed to match P8.
  Differential Revision: http://reviews.llvm.org/D6358
  llvm-svn: 223257
* Change the name to be in style. | Roman Divacky | 2014-12-03 | 1 | -1/+1
  llvm-svn: 223255
* R600/SI: Move SIInsertWaits into AMDGPUPassConfig::addPreSched2() | Tom Stellard | 2014-12-03 | 1 | -1/+3
  This pass needs to be run after PrologEpilogInserter, because that pass may insert spill code which reads or writes memory.
  llvm-svn: 223253
* R600/SI: Don't run SI passes on R600 subtargets | Tom Stellard | 2014-12-03 | 1 | -1/+1
  llvm-svn: 223252
* AArch64: fix wrong-endian parameter passing. | Tim Northover | 2014-12-03 | 1 | -2/+4
  The blocked arguments code didn't take account of the hacks needed to support it.
  llvm-svn: 223247
* [NFC] Fixing pedantic warning extra semicolons. | Colin LeMahieu | 2014-12-03 | 1 | -7/+7
  llvm-svn: 223246
* [Hexagon] [NFC] Moving function implementations out of header. Clang-formatting files. | Colin LeMahieu | 2014-12-03 | 2 | -79/+88
  llvm-svn: 223245
* [Hexagon] [NFC] Renaming *packetStart to *packetBegin | Colin LeMahieu | 2014-12-03 | 3 | -11/+11
  llvm-svn: 223243
* Silencing a 32-bit implicit conversion warning in MSVC; NFC. | Aaron Ballman | 2014-12-03 | 1 | -1/+1
  llvm-svn: 223237
* [PowerPC] Print all inline-asm consts as signed numbers | Hal Finkel | 2014-12-03 | 1 | -13/+18
  Almost all immediates in PowerPC assembly (both 32-bit and 64-bit) are signed numbers, and it is important that we print them as such. To make sure that happens, we change PPCTargetLowering::LowerAsmOperandForConstraint so that it does all intermediate checks on a sign-extended int64_t value, and then creates the resulting target constant using MVT::i64. This will ensure that all negative values are printed as negative values (mirroring what is done in other backends to achieve the same sign-extension effect).
  This came up in the context of inline assembly like this:
    "add%I2 %0,%0,%2", ..., "Ir"(-1ll)
  where we used to print:
    addi 3,3,4294967295
  and gcc would print:
    addi 3,3,-1
  and gas accepts both forms, but our builtin assembler (correctly) does not. Now we print -1 like gcc does.
  While here, I replaced a bunch of custom integer checks with isInt<16> and friends from MathExtras.h.
  Thanks to Paul Hargrove for the bug report.
  llvm-svn: 223220
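The effect described in this message can be reproduced with a short sketch. The helpers below are hypothetical Python stand-ins: `sign_extend` plays the role of interpreting a bit pattern as a signed value, and `is_int` mirrors the range test that `isInt<16>` performs. Treating the problematic value as a 32-bit pattern is an assumption made for the illustration.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low 'bits' bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def is_int(bits: int, value: int) -> bool:
    """True if value fits in a signed integer of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))
```

Checked as an unsigned quantity, 4294967295 does not fit a signed 16-bit immediate; after sign extension it becomes -1, which both fits and prints the way gcc prints it.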
* Emit ABI_FP_rounding attribute. | Charlie Turner | 2014-12-03 | 1 | -0/+6
  LLVM understands a -enable-sign-dependent-rounding-fp-math codegen option. When the user has specified this option, the Tag_ABI_FP_rounding attribute should be emitted with value 1. This option currently does not appear to disable transformations and optimizations that assume default floating point rounding behavior, AFAICT, but the intention should be recorded in the build attributes, regardless of what the compiler actually does with the intention.
  Change-Id: If838578df3dc652b6f2796b8d152545674bcb30e
  llvm-svn: 223218
* R600/SI: Fix SIFixSGPRCopies for copies to physical registers | Matt Arsenault | 2014-12-03 | 1 | -1/+6
  This shows up when operands required to be passed in VCC are copied to.
  llvm-svn: 223208
* R600/SI: Remove incorrect assertion | Matt Arsenault | 2014-12-03 | 1 | -5/+5
  This can be a COPY to a physical register, such as VCC
  llvm-svn: 223207
* R600/SI: Remove i1 pseudo VALU ops | Matt Arsenault | 2014-12-03 | 3 | -63/+70
  Select i1 logical ops directly to 64-bit SALU instructions. Vector i1 values are always really in SGPRs, with each bit for each item in the wave. This saves about 4 instructions when and/or/xoring any condition, and also helps write conditions that need to be passed in vcc.
  This should work correctly now that the SGPR live range fixing pass works. More work is needed to eliminate the VReg_1 pseudo regclass and possibly the entire SILowerI1Copies pass.
  llvm-svn: 223206
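The representation this message relies on (one bit per work-item, held in a 64-bit scalar value) can be sketched as below. The wave size of 64 follows from the message's mention of 64-bit SALU instructions; the function names are hypothetical and the model says nothing about actual instruction selection.

```python
WAVE_SIZE = 64


def pack_lanes(flags) -> int:
    """Pack one boolean per work-item into a 64-bit lane mask."""
    assert len(flags) == WAVE_SIZE
    mask = 0
    for lane, flag in enumerate(flags):
        if flag:
            mask |= 1 << lane
    return mask


def lanes_and(a: int, b: int) -> int:
    """AND of two per-lane conditions is a single 64-bit scalar AND."""
    return a & b
```

Because the whole wave's condition is one scalar value, combining two conditions takes one scalar operation rather than a per-lane vector sequence.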
* R600/SI: Fix suspicious indexing | Matt Arsenault | 2014-12-03 | 1 | -5/+7
  The loop is over the operands of an instruction, and checks the register with the sub reg index of the dest register. This was probably meant to check the sub reg index of the same operand.
  llvm-svn: 223205
* R600/SI: Fix running SILowerI1Copies a second time | Matt Arsenault | 2014-12-03 | 1 | -2/+1
  llvm-svn: 223204
* R600/SI: Fix live range error hidden by SIFoldOperands | Matt Arsenault | 2014-12-03 | 1 | -0/+9
  m0 is treated as a virtual register class with a single register rather than the physical register it really is. This was updating the live range of the used virtual copy of m0 from the first ds_read instruction, and leaving the unused copy unchanged. This resulted in a "Live segment doesn't end at a valid instruction" verifier error because of the erased instructions. Update the live range of the second copy (which should be dead).
  No test since I'm not sure how to trigger this with SIFoldOperands enabled.
  llvm-svn: 223203
* NVPTX: Delete dead code | Duncan P. N. Exon Smith | 2014-12-03 | 1 | -5/+0
  `MDNode` does not inherit from `User`, and it never has a name.
  llvm-svn: 223198
* R600/SI: Enable inline assembly | Tom Stellard | 2014-12-03 | 1 | -2/+1
  We just needed to remove the assertion in AMDGPURegisterInfo::getFrameRegister(), which is called when initializing the parser for inline assembly.
  llvm-svn: 223197
* R600/SI: Change mubuf offsets to print as decimal | Matt Arsenault | 2014-12-03 | 1 | -1/+1
  This matches SC's behavior.
  llvm-svn: 223194
* [X86][MC] Intel syntax: accept implicit memory operand sizes larger than 80. | Ahmed Bougacha | 2014-12-03 | 1 | -1/+1
  The X86AsmParser intel handling was refactored in r216481, making it try each different memory operand size to see which one matches. Operand sizes larger than 80 ("[xyz]mmword ptr") were forgotten, which led to an "invalid operand" error for code such as:
    movdqa [rax], xmm0
  llvm-svn: 223187
* [PowerPC] Fix readcyclecounter to be custom expanded for all 32-bit targets | Hal Finkel | 2014-12-03 | 1 | -5/+3
  We need to use the custom expansion of readcyclecounter on all 32-bit targets (even those with 64-bit registers). This should fix the ppc64 buildbot.
  llvm-svn: 223182
* AArch64: strengthen Darwin ABI alignment assumptions | Tim Northover | 2014-12-02 | 1 | -1/+1
  A global variable without an explicit alignment specified should be assumed to be ABI-aligned according to its type, like on other platforms. This allows us to use better memory operations when accessing it.
  rdar://18533701
  llvm-svn: 223180
* AArch64: don't be too greedy when folding :lo12: accesses into mem ops. | Tim Northover | 2014-12-02 | 1 | -1/+22
  This frequently leads to cases like:
    ldr xD, [xN, :lo12:var]
    add xA, xN, :lo12:var
    ldr xD, [xA, #8]
  where the ADD would have been needed anyway, and the two distinct addressing modes can prevent the formation of an ldp. Because of how we handle ADRP (aggressively forming an ADRP/ADD pseudo-inst at ISel time), this pattern also results in duplicated ADRP instructions (one on its own to cover the ldr, and one combined with the add).
  llvm-svn: 223172
* [X86][SSE] Keep 4i32 vector insertions in integer domain on SSE4.1 targets | Simon Pilgrim | 2014-12-02 | 1 | -2/+2
  4i32 shuffles for single insertions into zero vectors lower to X86vzmovl, which was using (v)blendps and so caused domain switch stalls. This patch fixes this by using (v)pblendw instead.
  The updated tests on test/CodeGen/X86/sse41.ll still contain a domain stall due to the use of insertps - I'm looking at fixing this in a future patch.
  Differential Revision: http://reviews.llvm.org/D6458
  llvm-svn: 223165
* [PowerPC] Implement readcyclecounter for PPC32 | Hal Finkel | 2014-12-02 | 4 | -0/+72
  We've long supported readcyclecounter on PPC64, but it is easier there (the read of the 64-bit time-base register can be accomplished via a single instruction). This now provides an implementation for PPC32 as well. On PPC32, the time-base register is still 64 bits, but can only be read 32 bits at a time via two separate SPRs. The ISA manual explains how to do this properly (it involves re-reading the upper bits and looping if the counter has wrapped while being read).
  This requires PPC to implement a custom integer splitting legalization for the READCYCLECOUNTER node, turning it into a target-specific SDAG node, which then gets turned into a pseudo-instruction, which is then expanded to the necessary sequence (which has three SPR reads, the comparison and the branch).
  Thanks to Paul Hargrove for pointing out to me that this was still unimplemented.
  llvm-svn: 223161
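The read sequence described in this message (three SPR reads, a comparison and a branch) can be simulated with a toy counter. The `TimeBase` class is a hypothetical stand-in for the hardware register that advances on every 32-bit read, so a carry into the upper half can happen between reads; it is not how the backend implements the expansion.

```python
class TimeBase:
    """Toy 64-bit counter that advances on every 32-bit read."""

    def __init__(self, start: int, step: int = 1):
        self.value = start
        self.step = step

    def _tick(self):
        self.value = (self.value + self.step) & 0xFFFFFFFFFFFFFFFF

    def read_upper(self) -> int:
        result = self.value >> 32
        self._tick()
        return result

    def read_lower(self) -> int:
        result = self.value & 0xFFFFFFFF
        self._tick()
        return result


def read_time_base(tb: TimeBase) -> int:
    """Read upper, lower, upper again; retry if the upper half changed."""
    while True:
        upper = tb.read_upper()
        lower = tb.read_lower()
        if tb.read_upper() == upper:
            return (upper << 32) | lower
```

Starting the toy counter at 0x1FFFFFFFF makes the lower half wrap between the first two reads; the re-read of the upper half detects this and the loop retries, so the result is never a mix of an old upper half and a new lower half.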
* R600/SI: Emit amd_kernel_code_t header for AMDGPU environment | Tom Stellard | 2014-12-02 | 5 | -1/+829
  llvm-svn: 223160
* [AArch64][Stackmaps] Optimize stackmap shadows on AArch64. | Lang Hames | 2014-12-02 | 1 | -1/+16
  Reduce the number of nops emitted for stackmap shadows on AArch64 by counting non-stackmap instructions up to the next branch target towards the requested shadow.
  <rdar://problem/14959522>
  llvm-svn: 223156
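The nop-counting idea in this message can be sketched as follows. This is an assumption-laden illustration, not the backend's code: instructions are represented by hypothetical string labels, the caller is assumed to pass only the instructions between the stackmap and the next branch target, and every instruction is taken to be 4 bytes, as on AArch64.

```python
INSN_BYTES = 4  # fixed-width AArch64 instructions


def shadow_nops(requested_shadow_bytes: int, following) -> int:
    """Number of nops still needed after crediting following instructions.

    'following' lists the instructions after the stackmap, up to the next
    branch target (an assumption of this sketch).
    """
    assert requested_shadow_bytes % INSN_BYTES == 0
    remaining = requested_shadow_bytes
    for kind in following:
        # Stop once the shadow is covered or another stackmap is reached.
        if remaining <= 0 or kind == "stackmap":
            break
        remaining -= INSN_BYTES
    return remaining // INSN_BYTES
```

For a 16-byte shadow followed by two ordinary instructions, only two nops are needed instead of four; with nothing following, the full shadow is still padded with nops.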