path: root/llvm/lib/Target
* Add comment to explain a non-obvious setting; NFC. (Sanjay Patel, 2015-02-17; 1 file, -0/+6)
  This is paraphrased from Simon Pilgrim's comment in: http://reviews.llvm.org/D7492
  llvm-svn: 229566
* remove function names from comments; NFC (Sanjay Patel, 2015-02-17; 1 file, -38/+31)
  llvm-svn: 229558
* replace meaningless variable names; NFCI (Sanjay Patel, 2015-02-17; 1 file, -31/+31)
  llvm-svn: 229549
* R600/SI: Fix asan errors in SIFoldOperands (Tom Stellard, 2015-02-17; 1 file, -1/+2)
  We were trying to fold into implicit uses, which led to out-of-bounds access of the
  MCInstrDesc::OpInfo array.
  llvm-svn: 229533
* prevent folding a scalar FP load into a packed logical FP instruction (PR22371) (Sanjay Patel, 2015-02-17; 3 files, -16/+50)
  Change the memory operands in sse12_fp_packed_scalar_logical_alias from scalars to
  vectors. That's what the hardware packed logical FP instructions define: 128-bit memory
  operands. There are no scalar versions of these instructions... because this is x86.
  Generating the wrong code (folding a scalar load into a 128-bit load) is still possible
  using the peephole optimization pass and the load folding tables. We won't completely
  solve this bug until we either fix the lowering in fabs/fneg/fcopysign and any other
  places where scalar FP logic is created, or fix the load folding in
  foldMemoryOperandImpl() to make sure it isn't changing the size of the load.
  Differential Revision: http://reviews.llvm.org/D7474
  llvm-svn: 229531
* Make the ARM AsmPrinter independent of global subtarget initialization. (Eric Christopher, 2015-02-17; 3 files, -69/+92)
  Initialize the subtarget once per function and migrate Emit{Start|End}OfAsmFile to
  either use attributes on the TargetMachine or get information from the subtarget we'd
  use for assembling. One bit (getISAEncoding) touched the general AsmPrinter and the
  debug output. Handle this one by passing the function for the subprogram down and
  updating all callers and users. The top-level-ness of the ARM attribute output for
  assembly is, by nature, contrary to how we'd want to do this for an LTO situation where
  we have multiple cpu architectures, so this solution is good enough for now.
  llvm-svn: 229528
* R600/SI: Extend private extload pattern to include zext loads (Tom Stellard, 2015-02-17; 1 file, -4/+6)
  llvm-svn: 229507
* Prefer SmallVector::append/insert over push_back loops. (Benjamin Kramer, 2015-02-17; 7 files, -53/+19)
  Same functionality, but hoists the vector growth out of the loop.
  llvm-svn: 229500
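  As a minimal illustration of the pattern this commit describes (sketch code, not taken
  from the patch itself), appending a whole range at once lets the vector grow to its
  final size in one step instead of re-checking capacity on every push_back:

    #include "llvm/ADT/ArrayRef.h"
    #include "llvm/ADT/SmallVector.h"

    void copyValues(llvm::ArrayRef<int> Src, llvm::SmallVectorImpl<int> &Dst) {
      // Before: each push_back re-checks capacity and may reallocate.
      for (int V : Src)
        Dst.push_back(V);
      Dst.clear();

      // After: a single append grows the vector once for the whole range.
      Dst.append(Src.begin(), Src.end());
    }

  Both forms produce the same contents; append simply hoists the growth out of the loop,
  as the commit message says.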
* [X86] Silence -Wsign-compare warnings. (Andrea Di Biagio, 2015-02-17; 1 file, -2/+2)
  GCC 4.8 reported two new warnings due to comparisons between signed and unsigned integer
  expressions. The new warnings were accidentally introduced by revision 229480.
  Added explicit casts to silence the warnings. No functional change intended.
  llvm-svn: 229488
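  For illustration only (this is not the code from r229488), this is the shape of a
  signed/unsigned comparison that GCC 4.8 flags and the explicit cast used to silence it:

    #include <cstddef>

    bool reachedEnd(int Index, size_t NumElements) {
      // GCC -Wsign-compare: "comparison between signed and unsigned integer
      // expressions" if written as (Index >= NumElements). Index is known to be
      // non-negative here, so an explicit cast states the intent and silences it.
      return static_cast<size_t>(Index) >= NumElements;
    }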
* AVX-512: changes in intel_ocl_bi calling conventions (Elena Demikhovsky, 2015-02-17; 2 files, -14/+33)
  - added mask types v8i1 and v16i1 to possible function parameters
  - enabled passing 512-bit vectors in standard CC
  - added a test for KNL intel_ocl_bi conventions
  llvm-svn: 229482
* [X86] Combine vector anyext + and into a vector zext (Michael Kuperstein, 2015-02-17; 1 file, -9/+99)
  Vector zext tends to get legalized into a vector anyext, represented as a vector shuffle
  with an undef vector + a bitcast, that gets ANDed with a mask that zeroes the undef
  elements. Combine this into an explicit shuffle with a zero vector instead. This allows
  shuffle lowering to match it as a zext, instead of matching it as an anyext and emitting
  an explicit AND. This combine only covers a subset of the cases, but it's a start.
  Differential Revision: http://reviews.llvm.org/D7666
  llvm-svn: 229480
* Make the PowerPC AsmPrinter independent of global subtarget initialization. (Eric Christopher, 2015-02-17; 1 file, -15/+24)
  Initialize the subtarget once per function and migrate EmitStartOfAsmFile to either use
  attributes on the TargetMachine or get information from all of the various subtargets.
  llvm-svn: 229475
* Add a FIXME to move IsLittleEndian to the target machine. (Eric Christopher, 2015-02-17; 1 file, -0/+1)
  llvm-svn: 229472
* Move ABI handling and 64-bitness to the PowerPC target machine. (Eric Christopher, 2015-02-17; 5 files, -32/+42)
  This required changing how the computation of the ABI is handled and how some of the
  checks for ABI/target are done.
  llvm-svn: 229471
* [x86] Teach the unpack lowering to try wider element unpacks. (Chandler Carruth, 2015-02-17; 1 file, -16/+52)
  This allows it to match still more places where previously we would have to fall back on
  floating point shuffles or other more complex lowering strategies. I'm hoping to replace
  some of the hand-rolled unpack matching with this routine as it gets more and more
  clever.
  llvm-svn: 229463
* [PowerPC] Support non-direct-sub/superclass VSX copies (Hal Finkel, 2015-02-16; 1 file, -4/+4)
  Our register allocation has become better recently, it seems, and is now starting to
  generate cross-block copies into inflated register classes. These copies are not
  transformed into subregister insertions/extractions by the PPCVSXCopy class, and so need
  to be handled directly by PPCInstrInfo::copyPhysReg. The code to do this was *almost*
  there, but not quite: it was unnecessarily restricting itself to the direct
  sub/super-register-class case, not copying between, for example, something in VRRC and
  the lower half of VSRC, which are super-registers of F8RC. Triggering this behavior
  manually is difficult; I'm including two bugpoint-reduced test cases from the test
  suite.
  llvm-svn: 229457
* [Mips] Add .MIPS.options section descriptor kinds enumeration (Simon Atanasyan, 2015-02-16; 1 file, -1/+1)
  No functional changes.
  llvm-svn: 229452
* [ARM] Remove unused declaration. NFC. (Ahmed Bougacha, 2015-02-16; 1 file, -1/+0)
  GlobalMerge was moved to lib/CodeGen a while ago, and is no longer called
  "ARMGlobalMerge".
  llvm-svn: 229448
* [AVX512] Make 512b vector floating point rounds legal on AVX512. (Cameron McInally, 2015-02-16; 1 file, -0/+11)
  llvm-svn: 229445
* [X86][SSE] Add SSE MOVQ instructions to SSEPackedInt domain (Simon Pilgrim, 2015-02-16; 1 file, -10/+10)
  Patch to explicitly add the SSE MOVQ (rr, mr, rm) instructions to SSEPackedInt domain -
  prevents a number of costly domain switches.
  Differential Revision: http://reviews.llvm.org/D7600
  llvm-svn: 229439
* [X86] Remove the multiply by 8 that goes into the shift constant for X86ISD::VSHLDQ and X86ISD::VSRLDQ. (Craig Topper, 2015-02-16; 2 files, -36/+31)
  This simplifies the pattern matching in isel and allows these nodes to become the
  patterns embedded in the instruction.
  llvm-svn: 229431
* [X86] Remove x86.avx2.psll.dq.bs and x86.avx2.psrl.dq.bs intrinsics. (Craig Topper, 2015-02-16; 1 file, -7/+3)
  llvm-svn: 229430
* ARM: Transfer kill flag when lowering VSTMQIA to VSTMDIA. (Matthias Braun, 2015-02-16; 1 file, -1/+2)
  llvm-svn: 229425
* We require MSVC 1800 as our minimum, so these checks can safely go away; NFC. (Aaron Ballman, 2015-02-16; 1 file, -12/+7)
  (It seems this code has been copy/pasted around, unfortunately.)
  llvm-svn: 229417
* AArch64: Safely handle the incoming sret call argument. (Andrew Trick, 2015-02-16; 7 files, -28/+43)
  This adds a safe interface to the machine independent InputArg struct for accessing the
  index of the original (IR-level) argument. When a non-native return type is lowered, we
  generate the hidden machine-level sret argument on-the-fly. Before this fix, we were
  representing this argument as OrigArgIndex == 0, which is an outright lie. In particular
  this crashed in the AArch64 backend where we actually try to access the type of the
  original argument. Now we use a sentinel value for machine arguments that have no
  original argument index. AArch64, ARM, Mips, and PPC now check for this case before
  accessing the original argument.
  Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering
  llvm-svn: 229413
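  A generic sketch of the sentinel-value approach the message describes; the struct and
  names below are illustrative stand-ins, not the actual ISD::InputArg fields added by
  this patch:

    #include <climits>

    // Illustrative stand-in for a machine-level incoming argument descriptor.
    struct IncomingArg {
      // Sentinel: this machine argument was synthesized during call lowering
      // (e.g. a hidden sret pointer) and has no original IR-level argument.
      static const unsigned NoArgIndex = UINT_MAX;
      unsigned OrigArgIndex = NoArgIndex;

      bool hasOrigArgIndex() const { return OrigArgIndex != NoArgIndex; }
    };

    // A backend would check the sentinel before touching the original argument,
    // instead of trusting a bogus index of 0.
    bool canInspectOriginalArg(const IncomingArg &Arg) {
      return Arg.hasOrigArgIndex();
    }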
* [x86] Add a generic unpack-targeted lowering technique. (Chandler Carruth, 2015-02-16; 1 file, -0/+54)
  This can be used to generically lower blends and is particularly nice because it is
  available from SSE2 onward. This removes a lot of the remaining domain crossing blends
  in SSE2 code. I'm hoping to replace some of the "interleaved" lowering hacks with
  something closer to this which should be more principled. First, this needs to learn how
  to detect and use other interleavings besides that of the natural type provided. That
  will be a follow-up patch though.
  llvm-svn: 229378
* [x86] Add initial basic support for forming blends of v16i8 vectors. (Chandler Carruth, 2015-02-16; 1 file, -10/+22)
  This blend instruction is ... really lame. The register usage is insane. As a
  consequence this is probably only *barely* better than 2 pshufbs followed by a por, and
  that mostly because it only has to read from a single memory location. However, this
  doesn't fix as much as I kind of expected, so more to go. Pretty sure that the ordering
  and delegation of v16i8 is just really, really bad.
  llvm-svn: 229373
* [x86] Switch my usage of VariadicFunction to a "normal" variadic template now that we can use them. (Chandler Carruth, 2015-02-16; 1 file, -30/+37)
  This is, of course, horribly ugly because of the required recursive formulation.
  Suggestions for making it less ugly welcome.
  llvm-svn: 229367
* [X86] Add support for lowering shuffles to 256-bit PALIGNR instruction. (Craig Topper, 2015-02-16; 1 file, -78/+103)
  llvm-svn: 229359
* [x86] Teach the 128-bit vector shuffle lowering routines to take advantage of the existence of a reasonable blend instruction. (Chandler Carruth, 2015-02-16; 1 file, -3/+30)
  The 256-bit vector shuffle lowering has leveraged the general technique of decomposed
  shuffles and blends for quite some time, but this never made it back into the 128-bit
  code, and there are a large number of patterns where this is substantially better. For
  example, this removes almost all domain crossing in vector shuffles that involve some
  blend and some permutation with SSE4.1 and later. See the massive reduction in 'shufps'
  for integer test cases in this commit.
  This isn't perfect yet for a few reasons:
  1) The v8i16 shuffle lowering continues to plague me. We don't always form an
     unpack-based blend when that would be better. But the wins pretty drastically
     outstrip the losses here.
  2) The v16i8 shuffle lowering is just a disaster here. I never went and implemented
     blend support here for some terrible reason. I'll do that next probably. I've not
     updated it for now.
  More variations on this technique are coming as well -- we don't shuffle-into-unpack or
  shuffle-into-palignr, both of which would also be profitable.
  Note that some test cases grow significantly in the number of instructions, but I expect
  them to actually be faster. We use pshufd+pshufd+blendw instead of a single shufps, but
  the pshufd's are very likely to pipeline well (two ports on most modern Intel chips) and
  the blend is a *very* fast instruction. The domain switch penalty will essentially
  always be more than a blend instruction, which is the only increase in tree height.
  llvm-svn: 229350
* Removing LLVM_DELETED_FUNCTION, as MSVC 2012 was the last reason for requiring the macro. NFC; LLVM edition. (Aaron Ballman, 2015-02-15; 10 files, -20/+20)
  llvm-svn: 229340
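  As a rough sketch of what removing the macro looks like at a typical use site
  (illustrative class, not code from this commit): LLVM_DELETED_FUNCTION expanded to
  '= delete' on supporting compilers, and with MSVC 2013 (_MSC_VER 1800) as the floor
  that syntax can now be written directly.

    class NonCopyable {
    public:
      NonCopyable() = default;
      // Previously spelled via the portability macro:
      //   NonCopyable(const NonCopyable &) LLVM_DELETED_FUNCTION;
      // Now written directly, since every supported compiler accepts it:
      NonCopyable(const NonCopyable &) = delete;
      NonCopyable &operator=(const NonCopyable &) = delete;
    };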
* Removing LLVM_EXPLICIT, as MSVC 2012 was the last reason for requiring the macro. NFC; LLVM edition. (Aaron Ballman, 2015-02-15; 2 files, -2/+2)
  llvm-svn: 229335
* Coding style fixes to recent patches. NFC. (Simon Pilgrim, 2015-02-15; 1 file, -6/+6)
  llvm-svn: 229312
* [X86][AVX2] vpslldq/vpsrldq byte shifts for AVX2 (Simon Pilgrim, 2015-02-15; 2 files, -62/+79)
  This patch refactors the existing lowerVectorShuffleAsByteShift function to add support
  for 256-bit vectors on AVX2 targets. It also fixes a tablegen issue that prevented the
  lowering of vpslldq/vpsrldq vec256 instructions.
  Differential Revision: http://reviews.llvm.org/D7596
  llvm-svn: 229311
* [x86] Teach the decomposed shuffle/blend lowering to use an early blend when that will allow it to lower with a single permute instead of multiple permutes. (Chandler Carruth, 2015-02-15; 1 file, -3/+14)
  It tries to detect when it will only have to do a single permute in either case to
  maximize folding of loads and such. This cuts a *lot* of the avx2 shuffle permute counts
  in half. =]
  llvm-svn: 229309
* [x86] Teach the shuffle mask equivalence test to look through build vectors and detect equivalent inputs. (Chandler Carruth, 2015-02-15; 1 file, -52/+65)
  This lets the code match unpck-style instructions when only one of the inputs is lined
  up but the other input is a splat and so which lanes we pull from doesn't matter. Today,
  this doesn't really happen, but just by accident. I have a patch that normalizes how we
  shuffle splats, and with that patch this will be necessary for a lot of the mask
  equivalence tests to work. I don't really know how to write a test case for this
  specific change until the other change lands though.
  llvm-svn: 229307
* [x86] Tweak the ordering of unpack matching vs. element insertion, and don't try to do element insertion for non-zero-index floating point vectors. (Chandler Carruth, 2015-02-15; 1 file, -17/+21)
  We don't have any useful patterns or lowering for element insertion into high elements
  of a floating point vector, and the generic shuffle lowering will end up being better --
  namely it will fall back to unpck. But we should try to handle other forms of element
  insertion before matching unpck patterns. While this doesn't matter much right now, I'm
  working on a patch that makes unpck matching much more powerful, and that patch will
  break without this re-ordering.
  llvm-svn: 229306
* [x86] Stop shuffling zero vectors. =] (Chandler Carruth, 2015-02-15; 1 file, -0/+7)
  I was somewhat surprised this pattern really came up, but it does. It seems better to
  just directly handle it than try to special case every place where we end up forming a
  shuffle that devolves to a shuffle of a zero vector.
  llvm-svn: 229301
* [x86] Use a more helpful parenthesizing of these comparisons. Silences a -Wparentheses complaint from GCC. (Chandler Carruth, 2015-02-15; 1 file, -2/+2)
  llvm-svn: 229300
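  For illustration only (not the lines changed here): the classic -Wparentheses case is
  '&&' nested inside '||' without explicit grouping, and an added pair of parentheses
  keeps the same meaning while making the intended precedence obvious.

    bool inRangeOrForced(int X, int Lo, int Hi, bool Force) {
      // GCC suggests parentheses around '&&' within '||' if this is written as
      //   Force || Lo <= X && X <= Hi
      return Force || (Lo <= X && X <= Hi);
    }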
* [x86] When splitting 256-bit vectors into 128-bit vectors, don't extract subvectors from buildvectors. (Chandler Carruth, 2015-02-15; 1 file, -8/+37)
  That doesn't really make any sense and it breaks all of the down-stream matching of
  buildvectors to cleverly lower shuffles. With this, we now get the shift-based lowering
  of 256-bit vector shuffles with AVX1 when we split them into 128-bit vectors. We also do
  much better on the zero-extension patterns, although there remains quite a bit of room
  for improvement here.
  llvm-svn: 229299
* [x86] Make computing the zeroable elements slightly more powerful, at least in theory. (Chandler Carruth, 2015-02-15; 1 file, -3/+8)
  I don't actually have a test case that benefits from this, but theoretically, it could
  come up, and I don't want to try to think about whether this is the culprit or something
  else is, so I'd rather just make this code powerful. =/ Makes me sad that I can't really
  test it though.
  llvm-svn: 229298
* [x86] Add a slight variation on some of the other generic shuffle lowerings -- one which decomposes into an initial blend followed by a permute. (Chandler Carruth, 2015-02-15; 1 file, -0/+55)
  Particularly on newer chips, blends are handled independently of shuffles and so this is
  much less bottlenecked on the single port that floating point shuffles are executed with
  on Intel. I'll be adding this lowering to a bunch of other code paths in subsequent
  commits to handle still more places where we can effectively leverage blends when
  they're available in the ISA.
  llvm-svn: 229292
* [X86] Add assembly parser support for mnemonic aliases for AVX-512 vpcmp instructions. (Craig Topper, 2015-02-15; 1 file, -12/+39)
  llvm-svn: 229287
* [X86] Add assembler predicates for the rest of the AVX512 feature flags. (Craig Topper, 2015-02-15; 1 file, -6/+11)
  This makes the assembly matching consistent across all AVX512 instructions. Without this
  we were allowing some AVX512 instructions to be parsed always, but not the foundation
  instructions.
  llvm-svn: 229280
* [X86] Add the remaining 11 possible exact ModRM formats. (Craig Topper, 2015-02-15; 3 files, -103/+57)
  This makes their encodings linear, which can then be used to simplify some other code.
  llvm-svn: 229279
* [X86][XOP] Enable commutation for XOP instructions (Simon Pilgrim, 2015-02-14; 2 files, -68/+115)
  Patch to allow XOP instructions (integer comparison and integer multiply-add) to be
  commuted. The comparison instructions sometimes require the compare mode to be flipped,
  but the remaining instructions can use default commutation modes. This patch also sets
  the SSE domains of all the XOP instructions.
  Differential Revision: http://reviews.llvm.org/D7646
  llvm-svn: 229267
* [X86] Improve parsing support for AVX/SSE floating point compare instruction mnemonic aliases. (Craig Topper, 2015-02-14; 1 file, -19/+9)
  They'll now print with the alias the parser received instead of converting to the
  explicit immediate form.
  llvm-svn: 229266
* Target: Canonicalize access to function attributes, NFC (Duncan P. N. Exon Smith, 2015-02-14; 1 file, -4/+2)
  Canonicalize access to function attributes to use the simpler API.
    getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
      => getFnAttribute(Kind)
    getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
      => hasFnAttribute(Kind)
  llvm-svn: 229261
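  A small sketch of what the canonicalized call sites look like; the function and the
  attribute below are placeholders, not the ones touched by this commit:

    #include "llvm/IR/Attributes.h"
    #include "llvm/IR/Function.h"

    bool isMarkedNoInline(const llvm::Function &F) {
      // Verbose form being replaced:
      //   F.getAttributes().hasAttribute(llvm::AttributeSet::FunctionIndex,
      //                                  llvm::Attribute::NoInline)
      // Simpler, equivalent query directly on the Function:
      return F.hasFnAttribute(llvm::Attribute::NoInline);
    }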
* NVPTX: Canonicalize access to function attributes, NFC (Duncan P. N. Exon Smith, 2015-02-14; 1 file, -3/+1)
  Canonicalize access to function attributes to use the simpler API.
    getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
      => getFnAttribute(Kind)
    getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
      => hasFnAttribute(Kind)
  llvm-svn: 229260
* Line ending fix. NFC. (Simon Pilgrim, 2015-02-14; 1 file, -81/+81)
  llvm-svn: 229256