summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* Don't sibcall between SysV and Win64 convention functionsReid Kleckner2015-02-261-0/+6
| | | | | | | | The shadow stack space expectations won't match. Fixes PR22709. llvm-svn: 230667
* Fix justify error for small structures in varargs for MIPS64BEPetar Jovanovic2015-02-261-0/+4
| | | | | | | | | | | | | | | There was a problem when passing structures as variable arguments. The structures smaller than 64 bit were not left justified on MIPS64 big endian. This is now fixed by shifting the value to make it left- justified when appropriate. This fixes the bug http://llvm.org/bugs/show_bug.cgi?id=21608 Patch by Aleksandar Beserminji. Differential Revision: http://reviews.llvm.org/D7881 llvm-svn: 230657
* Use ".arch_extension" ARM directive to support hwdiv on kraitSumanth Gundapaneni2015-02-261-3/+12
| | | | | | | | | | | In case of "krait" CPU, asm printer doesn't emit any ".cpu" so the features bits are not computed. This patch lets the asm printer emit ".cpu cortex-a9" directive for krait and the hwdiv feature is enabled through ".arch_extension". In short, krait is treated as "cortex-a9" with hwdiv. We can not emit ".krait" as CPU since it is not supported bu GNU GAS yet llvm-svn: 230651
* Use ".arch_extension" ARM directive to specify the additional CPU featuresSumanth Gundapaneni2015-02-264-0/+75
| | | | | | | | | | | This patch is in response to r223147 where the avaiable features are computed based on ".cpu" directive. This will work clean for the standard variants like cortex-a9. For custom variants which rely on standard cpu names for assembly, the additional features of a CPU should be propagated. This can be done via ".arch_extension" as long as the assembler supports it. The implementation for krait along with unit test will be submitted in next patch. llvm-svn: 230650
* R600/SI: Remove M0 from DS assembly stringsTom Stellard2015-02-261-8/+8
| | | | | | This matches the assembly syntax for the proprietary compiler. llvm-svn: 230645
* [X86][Haswell][SchedModel] Fix WriteMULm latency.Michael Kuperstein2015-02-261-1/+1
| | | | | | | The latency for the WriteMULm class was set to 4, which is actually lower than the latency for WriteMULr (5). A better estimate would be 4 added to WriteMULr, that is, 9. llvm-svn: 230634
* [x86] Sink the single-input v8i16 lowering code that is actuallyChandler Carruth2015-02-261-24/+26
| | | | | | | | | | formulaic into the top v8i16 lowering routine. This makes the generalized lowering a completely general and single path lowering which will allow generalizing it in turn for multiple 128-bit lanes. llvm-svn: 230623
* [x86] Remove a SimpleTy usage. No need for it here, we already have theChandler Carruth2015-02-261-2/+2
| | | | | | MVT. llvm-svn: 230622
* [x86] Make the vector shuffle helpers order the SDLoc and MVT arguments.Chandler Carruth2015-02-261-27/+27
| | | | | | This ordering matches that of DAG.getNode. llvm-svn: 230617
* Pass /nologo to ml64 for quieter buildsReid Kleckner2015-02-261-1/+1
| | | | | | | It still prints "Assembling path/to/X86CompilationCallback_Win64.asm", but linking does the same thing. llvm-svn: 230596
* Remove a FIXME.Eric Christopher2015-02-261-1/+0
| | | | | | | | | | Explanation: This function is in TargetLowering because it uses RegClassForVT which would need to be moved to TargetRegisterInfo and would necessitate moving isTypeLegal over as well - a massive change that would just require TargetLowering having a TargetRegisterInfo class member that it would use. llvm-svn: 230585
* Remove an argument-less call to getSubtargetImpl from TargetLoweringBase.Eric Christopher2015-02-2621-31/+39
| | | | | | | | | This required plumbing a TargetRegisterInfo through computeRegisterProperties and into findRepresentativeClass which uses it for register class iteration. This required passing a subtarget into a few target specific initializations of TargetLowering. llvm-svn: 230583
* [PowerPC] Make LDtocL and friends invariant loadsHal Finkel2015-02-254-34/+54
| | | | | | | | | | | | | | | | | | | | | LDtocL, and other loads that roughly correspond to the TOC_ENTRY SDAG node, represent loads from the TOC, which is invariant. As a result, these loads can be hoisted out of loops, etc. In order to do this, we need to generate GOT-style MMOs for TOC_ENTRY, which requires treating it as a legitimate memory intrinsic node type. Once this is done, the MMO transfer is automatically handled for TableGen-driven instruction selection, and for nodes generated directly in PPCISelDAGToDAG, we need to transfer the MMOs manually. Also, we were not transferring MMOs associated with pre-increment loads, so do that too. Lastly, this fixes an exposed bug where R30 was not added as a defined operand of UpdateGBR. This problem was highlighted by an example (used to generate the test case) posted to llvmdev by Francois Pichet. llvm-svn: 230553
* X86, Win64: Allow 'mov' to restore the stack pointer if we have a FPDavid Majnemer2015-02-251-13/+12
| | | | | | | | | | | | | | | | | | The Win64 epilogue structure is very restrictive, it permits a very small number of opcodes and none of them are 'mov'. This means that given: mov %rbp, %rsp pop %rbp The mov isn't the epilogue, only the pop is. This is problematic unless a frame pointer is present in which case we are free to do whatever we'd like in the "body" of the function. If a frame pointer is present, unwinding will undo the prologue operations in reverse order regardless of the fact that we are at an instruction which is reseting the stack pointer. llvm-svn: 230543
* [PowerPC] Cleanup unused target-specific SDAG nodesHal Finkel2015-02-254-35/+7
| | | | | | | | We had somehow accumulated a few target-specific SDAG nodes dealing with PPC64 TOC access that were referenced only in TableGen patterns. The associated (pseudo-)instructions are used, but are being generated directly. NFC. llvm-svn: 230518
* AArch64: Add debug message for large shift constants.Matthias Braun2015-02-251-2/+8
| | | | | | As requested in code review. llvm-svn: 230517
* [MIPS]Multiple and add instructions for Mips are currently available in ↵Vladimir Medic2015-02-251-14/+14
| | | | | | mips32r2/mips64r2 and later but should also be available in mips4, mips5, and mips64. This patch fixes the requested features and updates the corresponding test files. llvm-svn: 230500
* [X86][MMX] Reapply: Add MMX instructions to foldable tablesBruno Cardoso Lopes2015-02-251-0/+84
| | | | | | | | | | Reapply r230248. Teach the peephole optimizer to work with MMX instructions by adding entries into the foldable tables. This covers folding opportunities not handled during isel. llvm-svn: 230499
* [X86][MMX] Prevent MMX_MOVD64rm foldingBruno Cardoso Lopes2015-02-251-1/+0
| | | | | | | | | | | | | | | MMX_MOVD64rm zero-extends i32 load results into i64 registers. The peephole optimizer will try to fold it in other MMX foldable instructions, the wrong thing to do, since there's no MMX memory instruction that loads from i32 and does implict zero extension. Remove 'canFoldAsLoad' from MOVD64rm in order to prevent such folding. The current MMX tests already test this, but since there are no MMX instructions in the foldable tables yet, this did not trigger. This commit prepares the addition of those instructions. llvm-svn: 230498
* Improve handling of stack accesses in Thumb-1Renato Golin2015-02-254-12/+57
| | | | | | | | | | | | | | | | | Thumb-1 only allows SP-based LDR and STR to be word-sized, and SP-base LDR, STR, and ADD only allow offsets that are a multiple of 4. Make some changes to better make use of these instructions: * Use word loads for anyext byte and halfword loads from the stack. * Enforce 4-byte alignment on objects accessed in this way, to ensure that the offset is valid. * Do the same for objects whose frame index is used, in order to avoid having to use more than one ADD to generate the frame index. * Correct how many bits of offset we think AddrModeT1_s has. Patch by John Brawn. llvm-svn: 230496
* Silencing a "result of 32-bit shift implicitly converted to 64 bits (was ↵Aaron Ballman2015-02-251-1/+1
| | | | | | 64-bit shift intended?)" warning in MSVC; NFC. llvm-svn: 230489
* Silencing a -Wsign-compare warning triggered in MSVC; NFC.Aaron Ballman2015-02-251-1/+1
| | | | llvm-svn: 230488
* AVX-512: Gather and Scatter patternsElena Demikhovsky2015-02-253-44/+108
| | | | | | | | | | | | | | | Gather and scatter instructions additionally write to one of the source operands - mask register. In this case Gather has 2 destination values - the loaded value and the mask. Till now we did not support code gen pattern for gather - the instruction was generated from intrinsic only and machine node was hardcoded. When we introduce the masked_gather node, we need to select instruction automatically, in the standard way. I added a flag "hasTwoExplicitDefs" that allows to handle 2 destination operands. (Some code in the X86InstrFragmentsSIMD.td is commented out, just to split one big patch in many small patches) llvm-svn: 230471
* [PowerPC] Add support for the QPX vector instruction setHal Finkel2015-02-2519-67/+2675
| | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for the QPX vector instruction set, which is used by the enhanced A2 cores on the IBM BG/Q supercomputers. QPX vectors are 256 bytes wide, holding 4 double-precision floating-point values. Boolean values, modeled here as <4 x i1> are actually also represented as floating-point values (essentially { -1, 1 } for { false, true }). QPX shares many features with Altivec and VSX, but is distinct from both of them. One major difference is that, instead of adding completely-separate vector registers, QPX vector registers are extensions of the scalar floating-point registers (lane 0 is the corresponding scalar floating-point value). The operations supported on QPX vectors mirrors that supported on the scalar floating-point values (with some additional ones for permutations and logical/comparison operations). I've been maintaining this support out-of-tree, as part of the bgclang project, for several years. This is not the entire bgclang patch set, but is most of the subset that can be cleanly integrated into LLVM proper at this time. Adding this to the LLVM backend is part of my efforts to rebase bgclang to the current LLVM trunk, but is independently useful (especially for codes that use LLVM as a JIT in library form). The assembler/disassembler test coverage is complete. The CodeGen test coverage is not, but I've included some tests, and more will be added as follow-up work. llvm-svn: 230413
* Rename UpdateRegAllocHint to match style guidelines.Eric Christopher2015-02-242-2/+2
| | | | llvm-svn: 230357
* AArch64: Relax assert about large shift sizes.Matthias Braun2015-02-241-3/+9
| | | | | | | | | | The reason why these large shift sizes happen is because OpaqueConstants currently inhibit alot of DAG combining, but that has to be addressed in another commit (like the proposal in D6946). Differential Revision: http://reviews.llvm.org/D6940 llvm-svn: 230355
* R600/SI: Remove isel mubuf legalizationTom Stellard2015-02-242-130/+0
| | | | | | | We legalize mubuf instructions post-instruction selection, so this code is no longer needed. llvm-svn: 230352
* ARM: treat [N x i32] and [N x i64] as AAPCS composite typesTim Northover2015-02-243-61/+100
| | | | | | | | | | | The logic is almost there already, with our special homogeneous aggregate handling. Tweaking it like this allows front-ends to emit AAPCS compliant code without ever having to count registers or add discarded padding arguments. Only arrays of i32 and i64 are needed to model AAPCS rules, but I decided to apply the logic to all integer arrays for more consistency. llvm-svn: 230348
* simplify control flow; NFCSanjay Patel2015-02-241-8/+9
| | | | llvm-svn: 230342
* [x32] Mark RBX as reserved when EBX is the base pointer.Michael Kuperstein2015-02-241-1/+3
| | | | | | This should have gone into r230334. llvm-svn: 230339
* fix typo in comment; NFCSanjay Patel2015-02-241-1/+1
| | | | llvm-svn: 230338
* [x32] x32 should use ebx as the base pointer.Michael Kuperstein2015-02-241-8/+9
| | | | | | This fixes the original issue in PR22655, but not the secondary one. llvm-svn: 230334
* [mips] Reformat some TableGen definitions. NFC.Toma Tabacu2015-02-242-14/+18
| | | | | | | | | | | | | | Summary: Separated some instruction and pseudo-instruction definitions from InstAlias definitions, added banner for pseudo-instructions and removed a redundant whitespace from a pseudo-instruction definition. No functional change. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7552 llvm-svn: 230327
* [X86] Remove the AbsMem32 type from the assembly parser. Only really need ↵Craig Topper2015-02-242-12/+2
| | | | | | the 16-bit version which will automatically get prioritized over AbsMem. llvm-svn: 230313
* Beginning of alloca implementation for Mips fast-iselReed Kotler2015-02-241-20/+139
| | | | | | | | | | | | | | | | Summary: Begin to add various address modes; including alloca. Test Plan: Make sure there are no regressions in test-suite at O0/02 in mips32r1/r2 Reviewers: dsanders Reviewed By: dsanders Subscribers: echristo, rfuhler, llvm-commits Differential Revision: http://reviews.llvm.org/D6426 llvm-svn: 230300
* Fix handling of negative offsets for AddrModeT2_i8s4 in rewriteT2FrameIndex.Bob Wilson2015-02-241-5/+2
| | | | | | | | | | | | | | This is a follow up to r230233 to fix something that I noticed by inspection. The AddrModeT2_i8s4 addressing mode does not support negative offsets. I spent a good chunk of the day trying to come up with a testcase for this but was not successful. This addressing mode is used to spill and restore GPRPair registers in Thumb2 code and that does not happen often. We also make very limited used of negative offsets when lowering frame indexes. I am going ahead with the change anyway, because I am pretty confident that it is correct. I also added a missing assertion to check that the low bits of the scaled offset are zero. llvm-svn: 230297
* X86: Only use 'lea' in Win64 epilogues if a frame pointer existsDavid Majnemer2015-02-241-7/+21
| | | | | | | We can only use 'add' in epilogues, 'lea' is not permitted unless we've established a frame pointer in the prologue. llvm-svn: 230286
* X86: Use a smaller 'mov' instruction for stack probe callsDavid Majnemer2015-02-231-3/+13
| | | | | | | | | | | | | Prologue emission, in some cases, requires calls to a stack probe helper function. The amount of stack to probe is passed as a register argument in the Win64 ABI but the instruction sequence used is pessimistic: it assumes that the number of bytes to probe is greater than 4 GB. Instead, select a more appropriate opcode depending on the number of bytes we are going to probe. llvm-svn: 230270
* X86: Use 'mov' instead of 'lea' in Win64 SEH prologues when possibleDavid Majnemer2015-02-231-2/+5
| | | | | | | 'mov' and 'lea' are equivalent when the displacement applied with 'lea' is zero. However, 'mov' should encode smaller. llvm-svn: 230269
* X86: Explain why we cannot use a 'mov' in a Win64 epilogueDavid Majnemer2015-02-231-0/+6
| | | | llvm-svn: 230268
* X86: Consistently use 'epilogue' instead of 'epilog'David Majnemer2015-02-231-1/+1
| | | | llvm-svn: 230267
* [AsmPrinter] Access pointers to globals via pcrel GOT entriesBruno Cardoso Lopes2015-02-232-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Front-ends could use global unnamed_addr to hold pointers to other symbols, like @gotequivalent below: @foo = global i32 42 @gotequivalent = private unnamed_addr constant i32* @foo @delta = global i32 trunc (i64 sub (i64 ptrtoint (i32** @gotequivalent to i64), i64 ptrtoint (i32* @delta to i64)) to i32) The global @delta holds a data "PC"-relative offset to @gotequivalent, an unnamed pointer to @foo. The darwin/x86-64 assembly output for this follows: .globl _foo _foo: .long 42 .globl _gotequivalent _gotequivalent: .quad _foo .globl _delta _delta: .long _gotequivalent-_delta Since unnamed_addr indicates that the address is not significant, only the content, we can optimize the case above by replacing pc-relative accesses to "GOT equivalent" globals, by a PC relative access to the GOT entry of the final symbol instead. Therefore, "delta" can contain a pc relative relocation to foo's GOT entry and we avoid the emission of "gotequivalent", yielding the assembly code below: .globl _foo _foo: .long 42 .globl _delta _delta: .long _foo@GOTPCREL+4 There are a couple of advantages of doing this: (1) Front-ends that need to emit a great deal of data to store pointers to external symbols could save space by not emitting such "got equivalent" globals and (2) IR constructs combined with this opt opens a way to represent GOT pcrel relocations by using the LLVM IR, which is something we previously had no way to express. Differential Revision: http://reviews.llvm.org/D6922 rdar://problem/18534217 llvm-svn: 230264
* Revert "[X86][MMX] Add MMX instructions to foldable tables"Bruno Cardoso Lopes2015-02-231-82/+0
| | | | | | This reverts commit r230226 since it breaks win buildbots. llvm-svn: 230248
* Rewrite the global merge pass to be subprogram agnostic for now.Eric Christopher2015-02-236-23/+10
| | | | | | | | | | | | | It was previously using the subtarget to get values for the global offset without actually checking each function as it was generating code. Go ahead and solidify the current behavior and make the existing FIXMEs more prominent. As a note the ARM backend previously had a thumb1 and non-thumb1 set of defaults. Only the former was tested so I've changed the behavior to only use that for now. llvm-svn: 230245
* Prevent hoisting fmul from THEN/ELSE to IF if there is fmsub/fmadd opportunity.Chad Rosier2015-02-232-0/+31
| | | | | | | | | | | This patch adds the isProfitableToHoist API. For AArch64, we want to prevent a fmul from being hoisted in cases where it is more profitable to form a fmsub/fmadd. Phabricator Review: http://reviews.llvm.org/D7299 Patch by Lawrence Hu <lawrence@codeaurora.org> llvm-svn: 230241
* [mips] Honour -mno-odd-spreg for vector insert/extract when MSA is enabled.Daniel Sanders2015-02-232-5/+20
| | | | | | | | | | | | | | | | | | | Summary: -mno-odd-spreg prohibits the use of odd-numbered single-precision floating point registers. However, vector insert/extract was still using them when manipulating the subregisters of an MSA register. Fixed this by ensuring that insertion/extraction is only performed on even-numbered vector registers when -mno-odd-spreg is given. Reviewers: vmedic, sstankovic Reviewed By: sstankovic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7672 llvm-svn: 230235
* Fix incorrect immediate size for AddrModeT2_i8s4 in rewriteT2FrameIndex.Bob Wilson2015-02-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | The natural way to handle this addressing mode would be to say that it has 8 bits and gets scaled by 4, but since the MC layer is expecting the scaling to be already reflected in the immediate value, we have been setting the Scale to 1. That's fine, but then NumBits needs to be adjusted to reflect the effective increase in the range of the immediate. That adjustment was missing. The consequence is that the register scavenger can fail. The estimateRSStackSizeLimit() function in ARMFrameLowering.cpp correctly assumes that the AddrModeT2_i8s4 address mode can handle scaled offsets up to 1020. Under just the right circumstances, we fail to reserve space for the scavenger because it thinks that nothing will be needed. However, the overly pessimistic behavior in rewriteT2FrameIndex causes some frame indexes to be out of range and require scavenged registers, and so the scavenger asserts. Unfortunately I have not been able to come up with a testcase for this. I can only reproduce it on an internal branch where the frame layout and register allocation is slightly different than trunk. We really need a way to serialize MachineInstr-level IR to write reasonable tests for things like this. rdar://problem/19909005 llvm-svn: 230233
* [X86][MMX] Add MMX instructions to foldable tablesBruno Cardoso Lopes2015-02-231-0/+82
| | | | | | | | Teach the peephole optimizer to work with MMX instructions by adding entries into the foldable tables. This covers folding opportunities not handled during isel. llvm-svn: 230226
* [X86][MMX] Support folding loads in psll, psrl and psra intrinsicsBruno Cardoso Lopes2015-02-232-0/+21
| | | | llvm-svn: 230225
* AVX-512: recommitted 229837 + bugfix + testElena Demikhovsky2015-02-235-47/+87
| | | | llvm-svn: 230223
OpenPOWER on IntegriCloud