summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [PowerPC] Split the blr definition into BLR and BLR8Hal Finkel2015-01-135-6/+13
| | | | | | | | | | We really need a separate 64-bit version of this instruction so that it can be marked as clobbering LR8 (instead of just LR). No change in functionality (although the verifier might be slightly happier), however, it is required for stackmap/patchpoint support. Thus, this will be covered by stackmap test cases once those are added. llvm-svn: 225804
* [PowerPC] Add DWARF numbers for CA (XER), etc.Hal Finkel2015-01-131-5/+4
| | | | | | | | | | | | For registers that have DWARF numbers (like CA, which is really part of XER), add them. Also, RM is not an SPR, and the declaration hack (where it is declared as an SPR with an arbitrary number) is not needed, so just declare it as a register. NFC; although CA's register number will be needed when stackmap/patchpoint support is added. llvm-svn: 225800
* [mips][microMIPS] Fix issue with 16b instructions in jr instruction delay slotJozef Kolek2015-01-131-5/+16
| | | | | | | | | 16 bit instructions are not allowed in jr delay slot. Same stands for PseudoIndirectBranch and PseudoReturn. Differential Revision: http://reviews.llvm.org/D6815 llvm-svn: 225798
* Added TLI hook for isFPExtFree. Some of the FMA combine heuristics are now ↵Olivier Sallenave2015-01-132-0/+7
| | | | | | guarded with that hook. llvm-svn: 225795
* Peephole opt needs optimizeSelect() to keep track of newly created MIsMehdi Amini2015-01-132-3/+11
| | | | | | | | | | | | | | | Peephole optimizer is scanning a basic block forward. At some point it needs to answer the question "given a pointer to an MI in the current BB, is it located before or after the current instruction". To perform this, it keeps a set of the MIs already seen during the scan, if a MI is not in the set, it is assumed to be after. It means that newly created MIs have to be inserted in the set as well. This commit passes the set as an argument to the target-dependent optimizeSelect() so that it can properly update the set with the (potentially) newly created MIs. llvm-svn: 225772
* ARM: prepare prefix parsing for improved AAELF supportSaleem Abdulrasool2015-01-131-5/+43
| | | | | | | | AAELF specifies a number of ELF specific relocation types which have custom prefixes for the symbol reference. Switch the parser to be more table driven with an idea of file formats for which they apply. NFC. llvm-svn: 225758
* Rename llvm.recoverframeallocation to llvm.framerecoverReid Kleckner2015-01-131-1/+1
| | | | | | | | This name is less descriptive, but it sort of puts things in the 'llvm.frame...' namespace, relating it to frameallocate and frameaddress. It also avoids using "allocate" and "allocation" together. llvm-svn: 225752
* Add the llvm.frameallocate and llvm.recoverframeallocation intrinsicsReid Kleckner2015-01-132-0/+7
| | | | | | | | | | | | | | | | | | | | | These intrinsics allow multiple functions to share a single stack allocation from one function's call frame. The function with the allocation may only perform one allocation, and it must be in the entry block. Functions accessing the allocation call llvm.recoverframeallocation with the function whose frame they are accessing and a frame pointer from an active call frame of that function. These intrinsics are very difficult to inline correctly, so the intention is that they be introduced rarely, or at least very late during EH preparation. Reviewers: echristo, andrew.w.kaylor Differential Revision: http://reviews.llvm.org/D6493 llvm-svn: 225746
* R600/SI: Remove redundant setting expand on f64 vectorsMatt Arsenault2015-01-121-7/+0
| | | | | | | None of these are legal types already, so they default to Expand. llvm-svn: 225728
* [X86][SSE] Minor regression fix for r225551Simon Pilgrim2015-01-121-1/+2
| | | | | | r225551 vector byte shuffle optimization caused an assertion as fully zeroable vectors can be produced under certain circumstances. This fix drops the assert and returns a zero vector where the assert would have failed. llvm-svn: 225718
* [X86] Also create+widen FMIN/FMAX nodes for v2f32.Ahmed Bougacha2015-01-121-1/+18
| | | | | | | | | | | | | | | | This happens in the HINT benchmark, where the SLP-vectorizer created v2f32 fcmp/select code. The "correct" solution would have been to teach the vectorizer cost model that v2f32 isn't legal (because really, it isn't), but if we can vectorize we might as well do so. We legalize these v2f32 FMIN/FMAX nodes by widening to v4f32 later on. v3f32 were already widened to v4f32 by the generic unroll-and-build-vector legalization. rdar://15763436 Differential Revision: http://reviews.llvm.org/D6557 llvm-svn: 225691
* R600/SI: Use RegisterOperands to specify which operands can accept immediatesTom Stellard2015-01-1210-76/+68
| | | | | | | | | | | | There are some operands which can take either immediates or registers and we were previously using different register class to distinguish between operands that could take immediates and those that could not. This patch switches to using RegisterOperands which should simplify the backend by reducing the number of register classes and also make it easier to implement the assembler. llvm-svn: 225662
* Add r224985 back with two fixes.Rafael Espindola2015-01-127-182/+116
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One is that AArch64 has additional restrictions on when local relocations can be used. We have to take those into consideration when deciding to put a L symbol in the symbol table or not. The other is that ld64 requires the relocations to cstring to use linker visible symbols on AArch64. Thanks to Michael Zolotukhin for testing this! Remove doesSectionRequireSymbols. In an assembly expression like bar: .long L0 + 1 the intended semantics is that bar will contain a pointer one byte past L0. In sections that are merged by content (strings, 4 byte constants, etc), a single position in the section doesn't give the linker enough information. For example, it would not be able to tell a relocation must point to the end of a string, since that would look just like the start of the next. The solution used in ELF to use relocation with symbols if there is a non-zero addend. In MachO before this patch we would just keep all symbols in some sections. This would miss some cases (only cstrings on x86_64 were implemented) and was inefficient since most relocations have an addend of 0 and can be represented without the symbol. This patch implements the non-zero addend logic for MachO too. llvm-svn: 225644
* [mips][microMIPS] Implement BEQZ16 and BNEZ16 instructionsJozef Kolek2015-01-1210-0/+107
| | | | | | Differential Revision: http://reviews.llvm.org/D5271 llvm-svn: 225627
* [PowerPC] Fix calls to non-function objectsHal Finkel2015-01-121-5/+22
| | | | | | | | | | | | | | | | | | Looking at r225438 inspired me to see how the PowerPC backend handled the situation (calling a bitcasted TLS global), and it turns out we also produced an error (cannot select ...). What it means to "call" something that is not a function is implementation and platform specific, but in the name of doing something (besides crashing), this makes sure we do what GCC does (treat all such calls as calls through a function pointer -- meaning that the pointer is assumed, as is the convention on PPC, to point to a function descriptor structure holding the actual code address along with the function's TOC pointer and environment pointer). As GCC does, we now do the same for calling regular (non-TLS) non-function globals too. I'm not sure whether this is the most useful way to define the behavior, but at least we won't be alone. llvm-svn: 225617
* [X86][SSE] Minor fix to VPBLENDW AVX2 commutation.Simon Pilgrim2015-01-111-3/+3
| | | | | | D6015 / rL221313 enabled commutation for SSE immediate blend instructions, but due to a typo the AVX2 VPBLENDW ymm instructions weren't flagged as commutative along with the others in the tables, but were still being commuted in code and tested for. llvm-svn: 225612
* Revert most of r225597David Majnemer2015-01-114-71/+44
| | | | | | We can't rely on a DataLayout enlightened constant folder. llvm-svn: 225599
* X86: Properly decode shuffle masks when the constant pool type is weirdDavid Majnemer2015-01-114-66/+80
| | | | | | | | | | | | | It's possible for the constant pool entry for the shuffle mask to come from a completely different operation. This occurs when Constants have the same bit pattern but have different types. Make DecodePSHUFBMask tolerant of types which, after a bitcast, are appropriately sized vector types. This fixes PR22188. llvm-svn: 225597
* X86: teach X86TargetLowering about L,M,O constraintsSaleem Abdulrasool2015-01-111-0/+25
| | | | | | | | | Teach the ISelLowering for X86 about the L,M,O target specific constraints. Although, for the moment, clang performs constraint validation and prevents passing along inline asm which may have immediate constant constraints violated, the backend should be able to cope with the invalid inline asm a bit better. llvm-svn: 225596
* ARM: add support for segment base relocations (SBREL)Saleem Abdulrasool2015-01-112-0/+7
| | | | | | | | This adds support for parsing and emitting the SBREL relocation variant for the ARM target. Handling this relocation variant is necessary for supporting the full ARM ELF specification. Addresses PR22128. llvm-svn: 225595
* [X86][SSE] Improved (v)insertps shuffle matchingSimon Pilgrim2015-01-101-42/+82
| | | | | | | | | | | | In the current code we only attempt to match against insertps if we have exactly one element from the second input vector, irrespective of how much of the shuffle result is zeroable. This patch checks to see if there is a single non-zeroable element from either input that requires insertion. It also supports matching of cases where only one of the inputs need to be referenced. We also split insertps shuffle matching off into a new lowerVectorShuffleAsInsertPS function. Differential Revision: http://reviews.llvm.org/D6879 llvm-svn: 225589
* [PowerPC] Mark zext of a small scalar load as freeHal Finkel2015-01-102-0/+22
| | | | | | | | | | | | | This initial implementation of PPCTargetLowering::isZExtFree marks as free zexts of small scalar loads (that are not sign-extending). This callback is used by SelectionDAGBuilder's RegsForValue::getCopyToRegs, and thus to determine whether a zext or an anyext is used to lower illegally-typed PHIs. Because later truncates of zero-extended values are nops, this allows for the elimination of later unnecessary truncations. Fixes the initial complaint associated with PR22120. llvm-svn: 225584
* Remove some whitespace.Justin Hibbits2015-01-101-1/+1
| | | | llvm-svn: 225583
* Fully fix Bug #22115.Justin Hibbits2015-01-102-6/+53
| | | | | | | | | | | | | | | | | Summary: In the previous commit, the register was saved, but space was not allocated. This resulted in the parameter save area potentially clobbering r30, leading to nasty results. Test Plan: Tests updated Reviewers: hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6906 llvm-svn: 225573
* [PowerPC] Readjust the loop unrolling thresholdHal Finkel2015-01-102-4/+4
| | | | | | | | Now that the way that the partial unrolling threshold for small loops is used to compute the unrolling factor as been corrected, a slightly smaller threshold is preferable. This is expected; other targets may need to re-tune as well. llvm-svn: 225566
* [X86][SSE] Avoid vector byte shuffles with zero by using pshufb to create zerosSimon Pilgrim2015-01-091-12/+30
| | | | | | | | pshufb can shuffle in zero bytes as well as bytes from a source vector - we can use this to avoid having to shuffle 2 vectors and ORing the result when the used inputs from a vector are all zeroable. Differential Revision: http://reviews.llvm.org/D6878 llvm-svn: 225551
* Recommit r224935 with a fix for the ObjC++/AArch64 bug that that revisionLang Hames2015-01-096-30/+21
| | | | | | | | | | introduced. A test case for the bug was already committed in r225385. Patch by Rafael Espindola. llvm-svn: 225534
* [mips] Add support for accessing $gp as a named register.Daniel Sanders2015-01-092-0/+24
| | | | | | | | | | | | | | | | | | | | | Summary: Mips Linux uses $gp to hold a pointer to thread info structure and accesses it with a named register. This makes this work for LLVM. The N32 ABI doesn't quite work yet since the frontend generates incorrect IR for this case. It neglects to truncate the 64-bit GPR to a 32-bit value before converting to a pointer. Given correct IR (as in the testcase in this patch), it works correctly. Reviewers: sstankovic, vmedic, atanasyan Reviewed By: atanasyan Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6893 llvm-svn: 225529
* [PowerPC] Enable late partial unrolling on the POWER7Hal Finkel2015-01-093-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | The P7 benefits from not have really-small loops so that we either have multiple dispatch groups in the loop and/or the ability to form more-full dispatch groups during scheduling. Setting the partial unrolling threshold to 44 seems good, empirically, for the P7. Compared to using no late partial unrolling, this yields the following test-suite speedups: SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding -66.3253% +/- 24.1975% SingleSource/Benchmarks/Misc-C++/oopack_v1p8 -44.0169% +/- 29.4881% SingleSource/Benchmarks/Misc/pi -27.8351% +/- 12.2712% SingleSource/Benchmarks/Stanford/Bubblesort -30.9898% +/- 22.4647% I've speculatively added a similar setting for the P8. Also, I've noticed that the unroller does not quite calculate the unrolling factor correctly for really tiny loops because it neglects to account for the fact that not every loop body replicant contains an ending branch and counter increment. I'll fix that later. llvm-svn: 225522
* [mips] Add comment which explains why we need to change the assembler ↵Toma Tabacu2015-01-091-0/+6
| | | | | | options before and after inline asm blocks. NFC. llvm-svn: 225521
* ARM: add support for R_ARM_ABS16Saleem Abdulrasool2015-01-091-0/+8
| | | | | | Add support for R_ARM_ABS16 relocation mapping. Addresses PR22156. llvm-svn: 225510
* ARM: add support for R_ARM_ABS8 relocationsSaleem Abdulrasool2015-01-091-0/+8
| | | | | | Add support for R_ARM_ABS8 relocation. Addresses PR22126. llvm-svn: 225507
* [PowerPC] Add a flag for experimenting with subreg liveness trackingHal Finkel2015-01-092-0/+10
| | | | | | | This cannot yet be enabled by default, it causes ~50 miscompiles in the test suite. llvm-svn: 225497
* [PowerPC] Fold [sz]ext with fp_to_int lowering where possibleHal Finkel2015-01-092-4/+61
| | | | | | | On modern cores with lfiw[az]x, we can fold a sign or zero extension from i32 to i64 into the load necessary for an i64 -> fp conversion. llvm-svn: 225493
* [x86] Add a flag to control the vector shuffle legality predicates thatChandler Carruth2015-01-091-0/+23
| | | | | | | | | | complements the new vector shuffle lowering code path. This flag, naturally, is *off* because we've not tested or evaluated the results of this at all. However, the flag will make it much easier to evaluate whether we can be this aggressive and whether there are missing vector shuffle lowering optimizations. llvm-svn: 225491
* [PowerPC] Mark all instructions as non-cheap for MachineLICMHal Finkel2015-01-081-0/+9
| | | | | | | | | | | | MachineLICM uses a callback named hasLowDefLatency to determine if an instruction def operand has a 'low' latency. If all relevant operands have a 'low' latency, the instruction is considered too cheap to hoist out of loops even in low-register-pressure situations. On PowerPC cores, both the embedded cores and the others, there is no reason to believe that this is a good choice: all instructions have a cost inside a loop, and hoisting them when not limited by register pressure is a reasonable default. llvm-svn: 225471
* [ARM] Fix a bug in constant island pass that was triggering an assertion.Akira Hatanaka2015-01-081-1/+1
| | | | | | | | | | | | | | | | | | | The assert was being triggered when the distance between a constant pool entry and its user exceeded the maximally allowed distance after thumb2 branch shortening. A padding was inserted after a thumb2 branch instruction was shrunk, which caused the user to be out of range. This is wrong as the padding should have been inserted by the layout algorithm so that the distance between two instructions doesn't grow later during thumb2 instruction optimization. This commit fixes the code in ARMConstantIslands::createNewWater to call computeBlockSize and set BasicBlock::Unalign when a branch instruction is inserted to create new water after a basic block. A non-zero Unalign causes the worst-case padding to be inserted when adjustBBOffsetsAfter is called to recompute the basic block offsets. rdar://problem/19130476 llvm-svn: 225467
* Make the TargetMachine in MipsSubtarget a reference ratherEric Christopher2015-01-083-15/+15
| | | | | | than a pointer to make unifying code a bit easier. llvm-svn: 225459
* Update include - this class doesn't use the target machine, butEric Christopher2015-01-081-1/+1
| | | | | | only the subtarget. llvm-svn: 225458
* Fix a couple of odd formatting issues.Eric Christopher2015-01-081-6/+4
| | | | llvm-svn: 225457
* This routine is in InstrInfo, there's no need to access it again.Eric Christopher2015-01-081-8/+3
| | | | llvm-svn: 225456
* [X86] Reflow comment. NFC.Ahmed Bougacha2015-01-081-3/+4
| | | | llvm-svn: 225455
* Add saving and restoring of r30 to the prologue and epilogue, respectivelyJustin Hibbits2015-01-082-0/+17
| | | | | | | | | | | | | | | | Summary: The PIC additions didn't update the prologue and epilogue code to save and restore r30 (PIC base register). This does that. Test Plan: Tests updated. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6876 llvm-svn: 225450
* Fix large stack alignment codegen for ARM and Thumb2 targetsKristof Beyls2015-01-082-22/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This partially fixes PR13007 (ARM CodeGen fails with large stack alignment): for ARM and Thumb2 targets, but not for Thumb1, as it seems stack alignment for Thumb1 targets hasn't been supported at all. Producing an aligned stack pointer is done by zero-ing out the lower bits of the stack pointer. The BIC instruction was used for this. However, the immediate field of the BIC instruction only allows to encode an immediate that can zero out up to a maximum of the 8 lower bits. When a larger alignment is requested, a BIC instruction cannot be used; llvm was silently producing incorrect code in this case. This commit fixes code generation for large stack aligments by using the BFC instruction instead, when the BFC instruction is available. When not, it uses 2 instructions: a right shift, followed by a left shift to zero out the lower bits. The lowering of ARM::Int_eh_sjlj_dispatchsetup still has code that unconditionally uses BIC to realign the stack pointer, so it very likely has the same problem. However, I wasn't able to produce a test case for that. This commit adds an assert so that the compiler will fail the assert instead of silently generating wrong code if this is ever reached. llvm-svn: 225446
* R600/SI: Remove SIISelLowering::legalizeOperands()Tom Stellard2015-01-082-176/+1
| | | | | | | | | Its functionality has been replaced by calling SIInstrInfo::legalizeOperands() from SIISelLowering::AdjstInstrPostInstrSelection() and running the SIFoldOperands and SIShrinkInstructions passes. llvm-svn: 225445
* [X86] Don't try to generate direct calls to TLS globalsMichael Kuperstein2015-01-081-1/+2
| | | | | | | | | The call lowering assumes that if the callee is a global, we want to emit a direct call. This is correct for regular globals, but not for TLS ones. Differential Revision: http://reviews.llvm.org/D6862 llvm-svn: 225438
* [X86] Don't print 'dword ptr' or 'qword ptr' on the operand to some of the ↵Craig Topper2015-01-084-4/+14
| | | | | | LEA variants in Intel syntax. The memory operand is inherently unsized. llvm-svn: 225432
* [SelectionDAG] Allow targets to specify legality of extloads' resultAhmed Bougacha2015-01-0815-136/+190
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | type (in addition to the memory type). The *LoadExt* legalization handling used to only have one type, the memory type. This forced users to assume that as long as the extload for the memory type was declared legal, and the result type was legal, the whole extload was legal. However, this isn't always the case. For instance, on X86, with AVX, this is legal: v4i32 load, zext from v4i8 but this isn't: v4i64 load, zext from v4i8 Whereas v4i64 is (arguably) legal, even without AVX2. Note that the same thing was done a while ago for truncstores (r46140), but I assume no one needed it yet for extloads, so here we go. Calls to getLoadExtAction were changed to add the value type, found manually in the surrounding code. Calls to setLoadExtAction were mechanically changed, by wrapping the call in a loop, to match previous behavior. The loop iterates over the MVT subrange corresponding to the memory type (FP vectors, etc...). I also pulled neighboring setTruncStoreActions into some of the loops; those shouldn't make a difference, as the additional types are illegal. (e.g., i128->i1 truncstores on PPC.) No functional change intended. Differential Revision: http://reviews.llvm.org/D6532 llvm-svn: 225421
* X86: VZeroUpperInserter: shortcut should not trigger if we have any function ↵Matthias Braun2015-01-081-8/+12
| | | | | | live-ins. llvm-svn: 225419
* R600/SI: Commute instructions to enable more folding opportunitiesTom Stellard2015-01-072-19/+51
| | | | llvm-svn: 225410
OpenPOWER on IntegriCloud