path: root/llvm/lib/Target
...
* [x86] Initial improvements to the new shuffle lowering for v16i8 (Chandler Carruth, 2014-07-10; 1 file, -10/+36)
  shuffles specifically for cases where a small subset of the elements in the
  input vector are actually used. This is specifically targeted at improving
  the shuffles generated for trunc operations, but also helps out splat-like
  operations. There is still some really low-hanging fruit here that I want
  to address, but this is a huge step in the right direction.
  llvm-svn: 212680
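  A minimal sketch of the kind of IR this targets (hypothetical example, not
  taken from the commit); the truncate only uses the low byte of each input
  lane, so it can be lowered as a cheap byte shuffle:

      define <16 x i8> @trunc_v16i16(<16 x i16> %x) {
        %t = trunc <16 x i16> %x to <16 x i8>
        ret <16 x i8> %t
      }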
* R600/SI: Add support for llvm.convert.{to|from}.fp16 (Matt Arsenault, 2014-07-10; 1 file, -2/+6)
  llvm-svn: 212676
* [x86] Refactor some of the new code for lowering v16i8 shuffles to (Chandler Carruth, 2014-07-10; 1 file, -17/+17)
  remove duplication and make it easier to select different strategies.
  No functionality changed.
  llvm-svn: 212674
* [SDAG] Make the new zext-vector-inreg node default to expand so targets (Chandler Carruth, 2014-07-09; 1 file, -1/+0)
  don't need to set it manually. This is based on feedback from Tom who
  pointed out that if every target needs to handle this we need to reach out
  to those maintainers. In fact, it doesn't make sense to duplicate
  everything when anything other than expand seems unlikely at this stage.
  llvm-svn: 212661
* AArch64: Better codegen for storing to __fp16. (Jim Grosbach, 2014-07-09; 1 file, -0/+40)
  Storing will generally be immediately preceded by rounding from an f32 or
  f64, so make sure to match those patterns directly and convert into the
  FPR16 register class rather than going through the integer GPRs. This also
  eliminates an extra step in the convert-from-f64 path, which was first
  converting to f32 and then to f16 from there.
  rdar://17594379
  llvm-svn: 212638
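  For reference, a sketch of the pattern involved, assuming the
  llvm.convert.to.fp16 intrinsic form clang used for __fp16 at the time; the
  round-then-store sequence is what now matches straight into FPR16:

      declare i16 @llvm.convert.to.fp16(float)

      define void @store_fp16(float %f, i16* %p) {
        ; round f32 to half, then store the 16-bit value
        %h = call i16 @llvm.convert.to.fp16(float %f)
        store i16 %h, i16* %p
        ret void
      }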
* TargetRegisterInfo: Remove function that fell out of use years ago. (Benjamin Kramer, 2014-07-09; 2 files, -19/+0)
  llvm-svn: 212636
* [X86] AVX512: Enable it in the Loop Vectorizer (Adam Nemet, 2014-07-09; 1 file, -1/+5)
  This lets us experiment with 512-bit vectorization without passing
  force-vector-width manually. The code generated for a simple integer
  memset loop is properly vectorized. Disassembly is still broken for it
  though :(.
  llvm-svn: 212634
* Make AArch64FastISel::EmitIntExt explicitly check its source and destination types (Louis Gerbarg, 2014-07-09; 1 file, -3/+8)
  This is a follow up to r212492. There should be no functional difference,
  but this patch makes it clear that SrcVT must be an i1/i8/i16/i32 and
  DestVT must be an i8/i16/i32/i64.
  rdar://17516686
  llvm-svn: 212633
* X86: When lowering v8i32 himuls use the correct shuffle masks for AVX2. (Benjamin Kramer, 2014-07-09; 1 file, -5/+13)
  Turns out my trick of using the same masks for SSE4.1 and AVX2 didn't work
  out, as we have to blend two vectors. While there, remove unnecessary
  cross-lane moves from the shuffles so the backend can lower them to
  palignr instead of vperm.
  Fixes PR20118, a miscompilation of vector sdiv by constant on AVX2.
  llvm-svn: 212611
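  A hypothetical reproducer in the spirit of PR20118 (not the actual test
  case): signed division by a constant is lowered via a high multiply, so a
  wrong shuffle mask after the multiply miscompiles code such as:

      define <8 x i32> @sdiv_by_7(<8 x i32> %x) {
        %d = sdiv <8 x i32> %x, <i32 7, i32 7, i32 7, i32 7,
                                 i32 7, i32 7, i32 7, i32 7>
        ret <8 x i32> %d
      }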
* [x86] Add a ZERO_EXTEND_VECTOR_INREG DAG node and use it when widening (Chandler Carruth, 2014-07-09; 1 file, -0/+1)
  vector types to be legal and a ZERO_EXTEND node is encountered.

  When we use widening to legalize vector types, extend nodes are a real
  challenge. Either the input or output is likely to be legal, but in many
  cases not both. As a consequence, we don't really have any way to
  represent this situation and the prior code in the widening legalization
  framework would just scalarize the extend operation completely.

  This patch introduces a new DAG node to represent doing a zero extend of a
  vector "in register". The core of the idea is to allow legal but different
  vector types in the input and output. The output vector must have fewer
  lanes but wider elements. The operation is defined to zero extend the low
  elements of the input to the size of the output elements, and drop all of
  the high elements which don't have a corresponding lane in the output
  vector.

  It also includes generic expansion of this node in terms of blending a
  zero vector into the high elements of the vector and bitcasting across.
  This in turn yields extremely nice code for x86 SSE2 when we use the new
  widening legalization logic in conjunction with the new shuffle lowering
  logic.

  There is still more to do here. We need to support sign extension, any
  extension, and potentially int-to-float conversions. My current plan is to
  continue using similar synthetic nodes to model each of these transitions
  with generic lowering code for each one.

  However, with this patch LLVM already reaches performance parity with GCC
  for the core C loops of the x264 code (assuming you disable the
  hand-written assembly versions) when compiling for SSE2 and SSE3
  architectures and enabling the new widening and lowering logic for
  vectors.

  Differential Revision: http://reviews.llvm.org/D4405
  llvm-svn: 212610
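  As a sketch of where the new node arises (hypothetical example, assuming
  the widening strategy is enabled):

      define <4 x i32> @zext_v4i8(<4 x i8> %x) {
        %z = zext <4 x i8> %x to <4 x i32>
        ret <4 x i32> %z
      }

  Here v4i8 widens to a legal v16i8, and the extend becomes a
  ZERO_EXTEND_VECTOR_INREG from the low four lanes of the widened vector,
  expandable as a blend with zero followed by a bitcast.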
* [mips][mips64r6] Correct select patterns that have the condition or true/false values backwards (Daniel Sanders, 2014-07-09; 2 files, -26/+26)
  Summary: This bug caused SingleSource/Regression/C/uint64_to_float and
  SingleSource/UnitTests/2002-05-02-CastTest3 to fail (among others).
  Differential Revision: http://reviews.llvm.org/D4388
  llvm-svn: 212608
* [mips][mips64r6] Correct cond names in the cmp.cond.[ds] instructions (Daniel Sanders, 2014-07-09; 2 files, -41/+42)
  Summary: It seems we accidentally read the wrong column of the table in
  the MIPS64r6 spec and used the names for c.cond.fmt instead of
  cmp.cond.fmt.
  Differential Revision: http://reviews.llvm.org/D4387
  llvm-svn: 212607
* [x86] Initialize a pointer to null to fix a bug in r212602. (Chandler Carruth, 2014-07-09; 1 file, -1/+1)
  This should restore GCC hosts (which happen to put the bad stuff into the
  pointer) and MSan, etc.
  llvm-svn: 212606
* [mips][mips64r6] Use JALR for indirect branches instead of JR (which is not available on MIPS32r6/MIPS64r6) (Daniel Sanders, 2014-07-09; 5 files, -25/+47)
  Summary: This completes the change to use JALR instead of JR on
  MIPS32r6/MIPS64r6.
  Reviewers: jkolek, vmedic, zoran.jovanovic, dsanders
  Reviewed By: dsanders
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D4269
  llvm-svn: 212605
* [mips][mips64r6] Use JALR for returns instead of JR (which is not available on MIPS32r6/MIPS64r6) (Daniel Sanders, 2014-07-09; 10 files, -26/+103)
  Summary: RET and RET_MM have been replaced by a pseudo named PseudoReturn.
  In addition, a version with a 64-bit GPR named PseudoReturn64 has been
  added.

  Instruction selection for a return matches RetRA, which is expanded post
  register allocation to PseudoReturn/PseudoReturn64. During MipsAsmPrinter,
  PseudoReturn/PseudoReturn64 are emitted as:
  - (JALR64 $zero, $rs) on MIPS64r6
  - (JALR $zero, $rs) on MIPS32r6
  - (JR_MM $rs) on microMIPS
  - (JR $rs) otherwise

  On MIPS32r6/MIPS64r6, 'jr $rs' is an alias for 'jalr $zero, $rs'. To aid
  development and review (specifically, to ensure all cases of jr are
  updated), these aliases are temporarily named 'r6.jr' instead of 'jr'. A
  follow-up patch will change them back to the correct mnemonic.

  Added (JALR $zero, $rs) to MipsNaClELFStreamer's definition of an indirect
  jump, and removed it from its definition of a call. Note: I haven't
  accounted for MIPS64 in MipsNaClELFStreamer since it doesn't appear to
  account for any MIPS64-specifics.

  The return instruction created as part of eh_return expansion is now
  expanded using expandRetRA() so we use the right return instruction on
  MIPS32r6/MIPS64r6 ('jalr $zero, $rs'). Also, fixed a misuse of isABI_N64()
  to detect 64-bit wide registers in expandEhReturn().

  Reviewers: jkolek, vmedic, mseaborn, zoran.jovanovic, dsanders
  Reviewed By: dsanders
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D4268
  llvm-svn: 212604
* [x86] Re-apply a variant of the x86 side of r212324 now that the rest (Chandler Carruth, 2014-07-09; 1 file, -74/+70)
  has settled without incident, removing the x86-specific and overly strict
  'isVectorSplat' routine in favor of generic and more powerful splat
  detection.

  The primary motivation and result of this is that the x86 backend can now
  see through splats which contain undef elements. This is essential if we
  are using a widening form of legalization, and I've updated a test case to
  also run in that mode as before this change the generated code for the
  test case was completely scalarized.

  This version of the patch much more carefully handles the undef lanes.
  - We aren't overly conservative about them in the shift lowering (where we
    will never use the splat itself).
  - One place where the splat would have been re-used by the existing code
    now explicitly constructs a new constant splat that will be safe.
  - The broadcast lowering is much more reasonable with undefs by doing a
    correct check of whether the splat is the only user of a loaded value,
    checking that the splat actually crosses multiple lanes before using a
    broadcast, and handling broadcasts of non-constant splats.

  As a consequence of the last bullet, the weird usage of vpshufd instead of
  vbroadcast is gone, and we actually can lower an AVX splat with
  vbroadcastss where before we emitted a really strange pattern of a vector
  load and a manual splat across the vector.
  llvm-svn: 212602
* MipsTargetStreamer.h: Avoid "using" to appease msc17. (NAKAMURA Takumi, 2014-07-08; 1 file, -1/+1)
  llvm-svn: 212577
* AArch64: Better codegen for loading from __fp16. (Jim Grosbach, 2014-07-08; 1 file, -0/+35)
  Loading will generally extend to an f32 or an f64, so make sure to match
  those patterns directly and load into the FPR16 register class rather than
  going through the integer GPRs. This also eliminates an extra step in the
  convert-to-f64 path, which was first converting to f32 and then to f64
  from there.
  rdar://17594379
  llvm-svn: 212573
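  The mirror image of the store case above, again assuming the 2014-era
  convert.from.fp16 intrinsic form (hypothetical sketch, old untyped-load
  syntax):

      declare float @llvm.convert.from.fp16(i16)

      define float @load_fp16(i16* %p) {
        ; load the 16-bit value, then extend half to f32
        %h = load i16* %p
        %f = call float @llvm.convert.from.fp16(i16 %h)
        ret float %f
      }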
* [PowerPC] Implement atomic NAND operations as actual NAND (Ulrich Weigand, 2014-07-08; 1 file, -4/+4)
  This changes the implementation of atomic NAND operations from "a & ~b"
  (compatible with GCC < 4.4) to actual "~(a & b)" (compatible with
  GCC >= 4.4). This is in line with the common-code and ARM back-end change
  implemented in r212433.
  llvm-svn: 212547
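  A minimal sketch of the semantics now implemented: atomicrmw nand stores
  ~(old & operand) and yields the old value:

      define i32 @nand(i32* %p, i32 %b) {
        ; atomically stores ~(%old & %b) back to %p
        %old = atomicrmw nand i32* %p, i32 %b seq_cst
        ret i32 %old
      }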
* [mips] Fixed struct/class mismatch introduced in r212522. (Daniel Sanders, 2014-07-08; 1 file, -1/+1)
  Clang emits a warning about this.
  llvm-svn: 212528
* Fix r212522 - [mips] Improve encapsulation of the .MIPS.abiflags implementation and limit scope of related enums (Daniel Sanders, 2014-07-08; 1 file, -0/+3)
  Added two lines that should have been in r212522.
  llvm-svn: 212523
* [mips] Improve encapsulation of the .MIPS.abiflags implementation and limit scope of related enums (Daniel Sanders, 2014-07-08; 7 files, -298/+353)
  Summary: Follow-on to r212519 to improve the encapsulation and limit the
  scope of the enums. Also merged two very similar parser functions, fixed a
  bug where ASEs were not being reported, and marked CPR1s as being 128-bit
  when MSA is enabled.
  Differential Revision: http://reviews.llvm.org/D4384
  llvm-svn: 212522
* Revert "Refactor ARM subarchitecture parsing"Renato Golin2014-07-081-78/+82
| | | | | | This reverts commit 7b4a6882467e7fef4516a0cbc418cbfce0fc6f6d. llvm-svn: 212521
* Truncate the immediate in logical operation to the register width (Arnaud A. de Grandmaison, 2014-07-08; 1 file, -2/+7)
  And continue to produce an error if the 32 most significant bits are not
  all ones or zeros.
  llvm-svn: 212520
* Mips.abiflags is a new implicitly generated section that will be present on all new modules. (Vladimir Medic, 2014-07-08; 5 files, -65/+527)
  The section contains a versioned data structure which essentially
  represents the information a program loader needs to determine the
  requirements of the application. This patch implements the mips.abiflags
  section and provides test cases for it.
  llvm-svn: 212519
* [x86,SDAG] Sink the logic for folding shuffles of splats more (Chandler Carruth, 2014-07-08; 1 file, -41/+0)
  aggressively from the x86 shuffle lowering to the generic SDAG vector
  shuffle formation code.

  This code already tried to fold away shuffles of splats! It just had lots
  of bugs and couldn't handle the case my new x86 shuffle lowering needed.

  First, it failed to correctly compute whether N2 was undef because it
  pre-computed this, then did transformations which could *make* N2 undef,
  then failed to ever re-consider the precomputed state.

  Second, it didn't look through bitcasts at all, even in the safe cases
  where they are just element-type bitcasts with no change to the number of
  elements.

  Third, it didn't handle all-zero bit casts nicely the way my code in the
  x86 side of things did, which is essential to getting good zext-shuffle
  lowerings.

  But all of these are generic. I just ported the code down to this layer
  and fixed the surrounding bugs. Tests exercising this in the x86 backend
  still pass and some silly code in widen_cast-6.ll gets better. I updated
  that test to be a bit more precise but it's still pretty unclear what the
  value of the test is in this day and age.
  llvm-svn: 212517
* [X86] AVX512: Only allow k1-k7 as predicates to vpcmp* (Adam Nemet, 2014-07-08; 1 file, -7/+7)
  As destination, k0 is allowed but not as predicate/writemask.
  I also modified the test to allow checking of error messages by the
  assembler. I applied a similar approach to the test ret.s in the same
  directory.
  llvm-svn: 212504
* [x86] Fix assertion failure caused by a wrong combine of PSHUFD nodes with different types. (Andrea Di Biagio, 2014-07-07; 1 file, -1/+1)
  When combining a sequence of two PSHUFD dag nodes into a single PSHUFD,
  make sure that we assign the correct type to the resulting PSHUFD.
  X86ISD::PSHUFD dag nodes can be either MVT::v4i32 or MVT::v4f32.

  Before this change, an assertion failure was triggered in method
  'DAGCombinerInfo::CombineTo' when trying to combine the shuffles from the
  test below into a single PSHUFD.

      define <4 x float> @test1(<4 x float> %V) {
        %1 = shufflevector <4 x float> %V, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1>
        %2 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1>
        ret <4 x float> %2
      }

  llvm-svn: 212498
* [FastISel][X86] Fix smul.with.overflow.i8 lowering. (Juergen Ributzka, 2014-07-07; 1 file, -3/+19)
  Add custom lowering code for signed multiply instruction selection,
  because the default FastISel instruction selection for ISD::MUL will use
  unsigned multiply for the i8 type and signed multiply for all other types.
  This would set the incorrect flags for the overflow check.
  This fixes <rdar://problem/17549300>
  llvm-svn: 212493
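  A minimal sketch of the intrinsic in question; e.g. 16 * 8 = 128 fits in
  an unsigned i8 but overflows a signed i8, so an unsigned multiply would
  set the wrong flags:

      declare { i8, i1 } @llvm.smul.with.overflow.i8(i8, i8)

      define i1 @overflows(i8 %a, i8 %b) {
        %res = call { i8, i1 } @llvm.smul.with.overflow.i8(i8 %a, i8 %b)
        %ovf = extractvalue { i8, i1 } %res, 1
        ret i1 %ovf
      }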
* Allow AArch64FastISel to degrade gracefully in the presence of an MVT::i128 (Louis Gerbarg, 2014-07-07; 1 file, -0/+6)
  Currently AArch64FastISel crashes if it tries to extend an integer into an
  MVT::i128. This can happen by creating 128-bit integers like so:

      typedef unsigned int uint128_t __attribute__((mode(TI)));
      typedef int sint128_t __attribute__((mode(TI)));

  This patch makes EmitIntExt check for their presence and then falls back
  to SelectionDAG. Tests included.
  rdar://17516686
  llvm-svn: 212492
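  A minimal IR sketch of an extension that now falls back cleanly instead of
  crashing (hypothetical, of the kind the mode(TI) typedefs produce):

      define i128 @widen(i32 %x) {
        %r = sext i32 %x to i128
        ret i128 %r
      }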
* Refactor ARM subarchitecture parsing (Renato Golin, 2014-07-07; 1 file, -82/+78)
  According to a FIXME in ARMMCTargetDesc.cpp the ARM version parsing should
  be in the Triple helper class.
  Patch by: Gabor Ballabas
  llvm-svn: 212479
* [PowerPC] Fix no-assert build (Ulrich Weigand, 2014-07-07; 1 file, -0/+1)
  r212476 caused a compile failure (unused variable) in a non-assertion
  build ...
  llvm-svn: 212477
* [PowerPC] Fix "byval align" argumentsUlrich Weigand2014-07-071-67/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | Arguments passed as "byval align" should get the specified alignment in the parameter save area. There was some code in PPCISelLowering.cpp that attempted to implement this, but this didn't work correctly: while code did update the ArgOffset value, it neglected to update the PtrOff value (which was already computed from the old ArgOffset), and it also neglected to update GPR_idx -- fields skipped due to alignment in the save area must likewise be skipped in GPRs. This patch fixes and simplifies this logic by: - handling argument offset alignment right at the beginning of argument processing, using a new helper routine CalculateStackSlotAlignment (this avoids having to update PtrOff and other derived values later on) - not tracking GPR_idx separately, but always computing the correct GPR_idx for each argument *from* its ArgOffset - removing some redundant computation in LowerFormalArguments: MinReservedArea must equal ArgOffset after argument processing, so there's no use in computing it twice. [This doesn't change the behavior of the current clang front-end, since that never creates "byval align" arguments at the moment. This will change with a follow-on patch, however.] llvm-svn: 212476
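  A sketch of the kind of argument affected (hypothetical type; per the note
  above, clang does not yet emit this): the byval copy must be placed on a
  16-byte boundary in the parameter save area, and any slots skipped for
  alignment must also be skipped in the GPR sequence:

      %struct.S = type { <4 x i32> }

      define void @callee(%struct.S* byval align 16 %s) {
        ret void
      }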
* [x86] Revert r212324 which was too aggressive w.r.t. allowing undef (Chandler Carruth, 2014-07-07; 1 file, -76/+75)
  lanes in vector splats.

  The core problem here is that undef lanes can't *unilaterally* be
  considered to contribute to splats. Their handling needs to be more
  cautious. There is also a reported failure of the nightly testers (thanks
  Tobias!) that may well stem from the same core issue. I'm going to fix
  this theoretical issue, factor the APIs a bit better, and then verify that
  I don't see anything bad with Tobias's reduction from the test suite
  before recommitting.

  Original commit message for r212324:
    [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for
    any constant, constant FP, or undef splat and to tolerate any undef
    lanes in a splat, then replace all uses of isSplatVector in X86's
    lowering with it.

    This fixes issues where undef lanes in an otherwise splat vector would
    prevent the splat logic from firing. It is a touch more awkward to use
    this interface, but it is much more accurate. Suggestions for better
    interface structuring welcome.

    With this fix, the code generated with the widening legalization
    strategy for widen_cast-4.ll is *dramatically* improved as the special
    lowering strategies for a v16i8 SRA kick in even though the high lanes
    are undef. We also get a slightly different choice for broadcasting an
    aligned memory location, and use vpshufd instead of vbroadcastss. This
    looks like a minor win for pipelining and domain crossing, but a minor
    loss for the number of micro-ops. I suspect it's a wash, but folks can
    easily tweak the lowering if they want.
  llvm-svn: 212475
* R600: Fix mishandling of load / store chains. (Matt Arsenault, 2014-07-07; 3 files, -36/+90)
  Fixes various bugs with reordering loads and stores. Scalarized vector
  loads weren't collecting the chains at all.
  llvm-svn: 212473
* Fix typo, weird indentation (Matt Arsenault, 2014-07-07; 1 file, -2/+4)
  llvm-svn: 212472
* X86: revert unintentional change to X86FastISel. (Tim Northover, 2014-07-07; 1 file, -1/+1)
  This crept in with r212443.
  llvm-svn: 212459
* [asan] Generate asm instrumentation in MC. (Evgeniy Stepanov, 2014-07-07; 1 file, -63/+308)
  Generate entire ASan asm instrumentation in MC without relying on runtime
  helper functions.
  Patch by Yuri Gorshenin.
  llvm-svn: 212455
* [x86] Teach the new vector shuffle lowering code to handle what is (Chandler Carruth, 2014-07-07; 1 file, -0/+41)
  essentially a DAG combine that never gets a chance to run.

  We might typically expect DAG combining to remove shuffles-of-splats and
  other similar patterns, but we don't get a chance to run the DAG combiner
  when we recursively form sub-shuffles during the lowering of a shuffle. So
  instead hand-roll a really important combine directly into the lowering
  code to detect shuffles-of-splats, especially shuffles of an all-zero
  splat which needn't even have the same element width, etc.

  This lets the new vector shuffle lowering handle shuffles which implement
  things like zero-extension really nicely. This will become even more
  important when I wire the legalization of zero-extension to vector
  shuffles with the new widening legalization strategy.
  llvm-svn: 212444
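  A sketch of a shuffle that implements a zero extension (hypothetical
  example): blending the low four bytes of %x with an all-zero vector yields
  the v4i32 zero extension of those bytes on a little-endian target:

      define <16 x i8> @zext_as_shuffle(<16 x i8> %x) {
        %z = shufflevector <16 x i8> %x, <16 x i8> zeroinitializer,
               <16 x i32> <i32 0, i32 16, i32 16, i32 16,
                           i32 1, i32 16, i32 16, i32 16,
                           i32 2, i32 16, i32 16, i32 16,
                           i32 3, i32 16, i32 16, i32 16>
        ret <16 x i8> %z
      }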
* CodeGen: it turns out that NAND is not the same thing as BIC. At all. (Tim Northover, 2014-07-07; 1 file, -1/+1)
  We've been performing the wrong operation on ARM for "atomicrmw nand" for
  years, since "a NAND b" is "~(a & b)" rather than ARM's very tempting
  "a & ~b". This bled over into the generic expansion pass.
  So I assume no-one has ever actually tried to do an atomic nand in the
  real world. Oh well.
  llvm-svn: 212443
* ARM: properly lower dllimport'ed global values (Saleem Abdulrasool, 2014-07-07; 1 file, -2/+23)
  This completes the handling for DLL import storage symbols when lowering
  instructions. A DLL import storage symbol must have an additional load
  performed prior to use. This is applicable to variables and functions.
  This is particularly important for non-function symbols as it is possible
  to handle function references by emitting a thunk which performs the
  translation from the unprefixed __imp_ symbol to the proper symbol
  (although, this is a non-optimal lowering). For a variable symbol, no such
  thunk can be accommodated.
  llvm-svn: 212431
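  A minimal sketch of the extra indirection (hypothetical 2014-era IR): the
  backend resolves @var through the linker-provided __imp_var pointer, hence
  one additional load:

      @var = external dllimport global i32

      define i32 @get() {
        ; lowers to a load of __imp_var, then a load through that pointer
        %v = load i32* @var
        ret i32 %v
      }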
* ARM: correctly mangle dllimport symbols (Saleem Abdulrasool, 2014-07-07; 2 files, -17/+40)
  Add support for tracking DLLImport storage class information on a
  per-symbol basis in the ARM instruction selection. Use that information to
  correctly mangle the symbol (dllimport symbols are referenced via
  *__imp_<name>).
  llvm-svn: 212430
* ARM: unify symbol name retrieval (Saleem Abdulrasool, 2014-07-07; 1 file, -5/+8)
  Ensure that all paths that retrieve the symbol name go through
  GetARMGVSymbol rather than getSymbol. This is desirable so that any global
  symbol mangling can be centralised to this function. The motivation for
  this is handling of symbols that are marked as having dllimport storage.
  Such a symbol requires an extra load that is currently handled in the
  backend and a __imp_ prefix on the symbol name.
  llvm-svn: 212429
* [AArch64] Normalize all constants to build a vector. (Kevin Qin, 2014-07-07; 1 file, -1/+27)
  The value of constant operands will be truncated to fit element width.
  llvm-svn: 212428
* AArch64: whitespace cleanup (Saleem Abdulrasool, 2014-07-06; 1 file, -1/+1)
  llvm-svn: 212420
* Use cast<> instead of dyn_cast + assert (Matt Arsenault, 2014-07-05; 1 file, -2/+1)
  llvm-svn: 212380
* Fix grammar (Matt Arsenault, 2014-07-05; 1 file, -1/+1)
  llvm-svn: 212379
* Add support for parsing the not operator in Microsoft inline assembly (Ehsan Akhgari, 2014-07-04; 1 file, -5/+36)
  This fixes http://llvm.org/PR20202
  llvm-svn: 212352
* [mips][mips64r6] Set ELF e_flags for MIPS32r6/MIPS64r6. Also do MIPS-I to MIPS-V (Daniel Sanders, 2014-07-04; 1 file, -1/+13)
  Differential Revision: http://reviews.llvm.org/D4386
  llvm-svn: 212346
* ARM: when falling back to scattered relocs, keep the type. (Tim Northover, 2014-07-04; 1 file, -3/+7)
  The linker relies on relocation type info (e.g. is it a branch?) to
  perform the correct actions, so we should keep that even when we end up
  using a scattered relocation for whatever reason.
  rdar://problem/17553104
  llvm-svn: 212333