summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86] Fix the inversion of low and high bits for the lowering of MUL_LOHI.Quentin Colombet2014-07-111-9/+41
| | | | | | | | Also add a few comments. <rdar://problem/17581756> llvm-svn: 212808
* [X86] AVX512: Improve readability of isCDisp8Adam Nemet2014-07-111-3/+12
| | | | | | | | No functional change. As I was trying to understand this function, I found that variables were reused with confusing names and the broadcast case was a bit too implicit. Hopefully, this is an improvement. llvm-svn: 212795
* [X86] AVX512: Simplify logic in isCDisp8Adam Nemet2014-07-111-6/+6
| | | | | | | | | | | | It was computing the VL/n case as: MemObjSize = VectorByteSize / ElemByteSize / Divider * ElemByteSize ElemByteSize not only falls out but VectorByteSize/Divider now actually matches the definition of VL/n. Also some formatting fixes. llvm-svn: 212794
* R600: Implement float to long/ulongJan Vesely2014-07-103-2/+18
| | | | | | | | | | | | | | Use alg. from LegalizeDAG.cpp Move Expand setting to SIISellowering v2: Extend existing tests instead of creating new ones v3: use separate LowerFPTOSINT function v4: use TargetLowering::expandFP_TO_SINT add comment about using FP_TO_SINT for uints Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 212773
* Use the integrated assembler by default on OpenBSD.Brad Smith2014-07-101-1/+2
| | | | llvm-svn: 212771
* [mips] Emit two CFI offset directives per double precision SDC1/LDC1Zoran Jovanovic2014-07-102-4/+21
| | | | | | | instead of just one for FR=1 registers Differential Revision: http://reviews.llvm.org/D4310 llvm-svn: 212769
* [X86] Mark pseudo instruction TEST8ri_NOEREX as hasSIdeEffects=0.Akira Hatanaka2014-07-102-2/+5
| | | | | | | | | Also, add a case clause in X86InstrInfo::shouldScheduleAdjacent to enable macro-fusion. <rdar://problem/15680770> llvm-svn: 212747
* Make it possible for the Subtarget to change between functionEric Christopher2014-07-109-56/+57
| | | | | | | | passes in the mips back end. This, unfortunately, required a bit of churn in the various predicates to use a pointer rather than a reference. llvm-svn: 212744
* Mips: Silence a -Wcovered-switch-defaultDavid Majnemer2014-07-101-2/+2
| | | | | | | | | Remove a default label which covered no enumerators, replace it with a llvm_unreachable. No functionality changed. llvm-svn: 212729
* [mips] Added FPXX modeless calling convention.Zoran Jovanovic2014-07-105-1/+18
| | | | | | Differential Revision: http://reviews.llvm.org/D4293 llvm-svn: 212726
* [AArch64] Add logical alias instructions to MC AsmParserArnaud A. de Grandmaison2014-07-103-14/+76
| | | | | | | | | | | | | | | | This patch teaches the AsmParser to accept some logical+immediate instructions and convert them as shown: bic Rd, Rn, #imm -> and Rd, Rn, #~imm bics Rd, Rn, #imm -> ands Rd, Rn, #~imm orn Rd, Rn, #imm -> orr Rd, Rn, #~imm eon Rd, Rn, #imm -> eor Rd, Rn, #~imm Those instructions are an alternate syntax available to assembly coders, and are needed in order to support code already compiling with some other assemblers. For example, the bic construct is used by the linux kernel. llvm-svn: 212722
* AArch64: correctly fast-isel i8 & i16 multipliesTim Northover2014-07-101-0/+1
| | | | | | | | We were asking for a register for type i8 or i16 which caused an assert. rdar://problem/17620015 llvm-svn: 212718
* [mips] Add support for -modd-spreg/-mno-odd-spregDaniel Sanders2014-07-1011-98/+250
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: When -mno-odd-spreg is in effect, 32-bit floating point values are not permitted in odd FPU registers. The option also prohibits 32-bit and 64-bit floating point comparison results from being written to odd registers. This option has three purposes: * It allows support for certain MIPS implementations such as loongson-3a that do not allow the use of odd registers for single precision arithmetic. * When using -mfpxx, -mno-odd-spreg is the default and this allows us to statically check that code is compliant with the O32 FPXX ABI since mtc1/mfc1 instructions to/from odd registers are guaranteed not to appear for any reason. Once this has been established, the user can then re-enable -modd-spreg to regain the use of all 32 single-precision registers. * When using -mfp64 and -mno-odd-spreg together, an O32 extension named O32 FP64A is used as the ABI. This is intended to provide almost all functionality of an FR=1 processor but can also be executed on a FR=0 core with the assistance of a hardware compatibility mode which emulates FR=0 behaviour on an FR=1 processor. * Added '.module oddspreg' and '.module nooddspreg' each of which update the .MIPS.abiflags section appropriately * Moved setFpABI() call inside emitDirectiveModuleFP() so that the caller doesn't have to remember to do it. * MipsABIFlags now calculates the flags1 and flags2 member on demand rather than trying to maintain them in the same format they will be emitted in. There is one portion of the -mfp64 and -mno-odd-spreg combination that is not implemented yet. Moves to/from odd-numbered double-precision registers must not use mtc1. I will fix this in a follow-up. Differential Revision: http://reviews.llvm.org/D4383 llvm-svn: 212717
* [x32] Add AsmBackend for X32 which uses ELF32 with x86_64 (the author is ↵Zinovy Nis2014-07-101-0/+14
| | | | | | | | | | Pavel Chupin). This is minimal change for backend required to have "hello world" compiled and working on x32 target (x86_64-linux-gnux32). More patches for x32 will follow. Differential Revision: http://reviews.llvm.org/D4181 llvm-svn: 212716
* [SystemZ] Use SystemZCallingConv.td to define callee-saved registersRichard Sandiford2014-07-105-16/+23
| | | | | | Just a clean-up. No behavioral change intended. llvm-svn: 212711
* [SystemZ] Tweak instruction format classificationsRichard Sandiford2014-07-102-53/+43
| | | | | | | | | | There's no real need to have Shift as a separate format type from Binary. The comments for other format types were too specific and in some cases no longer accurate. Just a clean-up, no behavioral change intended. llvm-svn: 212707
* [x86] Add another combine that is particularly useful for the new vectorChandler Carruth2014-07-101-0/+41
| | | | | | | | | | shuffle lowering: match shuffle patterns equivalent to an unpcklwd or unpckhwd instruction. This allows us to use generic lowering code for v8i16 shuffles and match the unpack pattern late. llvm-svn: 212705
* [SystemZ] Add MC support for LEDBRA, LEXBRA and LDXBRARichard Sandiford2014-07-101-0/+7
| | | | | | | These instructions aren't used for codegen since the original L*DB instructions are suitable for fround. llvm-svn: 212703
* [SystemZ] Avoid using i8 constants for immediate fieldsRichard Sandiford2014-07-105-58/+54
| | | | | | | | | | | | Immediate fields that have no natural MVT type tended to use i8 if the field was small enough. This was a bit confusing since i8 isn't a legal type for the target. Fields for short immediates in a 32-bit or 64-bit operation use i32 or i64 instead, so it would be better to do the same for all fields. No behavioral change intended. llvm-svn: 212702
* [SystemZ] Fix FPR dwarf numberingRichard Sandiford2014-07-101-1/+24
| | | | | | | | The dwarf FPR numbers are supposed to have the order F0, F2, F4, F6, F1, F3, F5, F7, F8, etc., which matches the pairing of registers for long doubles. E.g. a long double stored in F0 is paired with F2. llvm-svn: 212701
* Make it possible for ints/floats to return different values from ↵Daniel Sanders2014-07-101-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | getBooleanContents() Summary: On MIPS32r6/MIPS64r6, floating point comparisons return 0 or -1 but integer comparisons return 0 or 1. Updated the various uses of getBooleanContents. Two simplifications had to be disabled when float and int boolean contents differ: - ScalarizeVecRes_VSELECT except when the kind of boolean contents is trivially discoverable (i.e. when the condition of the VSELECT is a SETCC node). - visitVSELECT (select C, 0, 1) -> (xor C, 1). Come to think of it, this one could test for the common case of 'C' being a SETCC too. Preserved existing behaviour for all other targets and updated the affected MIPS32r6/MIPS64r6 tests. This also fixes the pi benchmark where the 'low' variable was counting in the wrong direction because it thought it could simply add the result of the comparison. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: hfinkel, jholewinski, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D4389 llvm-svn: 212697
* [x86] Expand the target DAG combining for PSHUFD nodes to be able toChandler Carruth2014-07-101-1/+34
| | | | | | | | | | combine into half-shuffles through unpack instructions that expand the half to a whole vector without messing with the dword lanes. This fixes some redundant instructions in splat-like lowerings for v16i8, which are now getting to be *really* nice. llvm-svn: 212695
* [x86] Tweak the v16i8 single input special case lowering for shufflesChandler Carruth2014-07-101-34/+44
| | | | | | | | | | | | | | | | | | | that splat i8s into i16s. Previously, we would try much too hard to arrange a sequence of i8s in one half of the input such that we could unpack them into i16s and shuffle those into place. This isn't always going to be a cheaper i8 shuffle than our other strategies. The case where it is always going to be cheaper is when we can arrange all the necessary inputs into one half using just i16 shuffles. It happens that viewing the problem this way also makes it much easier to produce an efficient set of shuffles to move the inputs into one half and then unpack them. With this, our splat code gets one step closer to being not terrible with the new experimental lowering strategy. It also exposes two combines missing which I will add next. llvm-svn: 212692
* [x86] Initial improvements to the new shuffle lowering for v16i8Chandler Carruth2014-07-101-10/+36
| | | | | | | | | | | | | shuffles specifically for cases where a small subset of the elements in the input vector are actually used. This is specifically targetted at improving the shuffles generated for trunc operations, but also helps out splat-like operations. There is still some really low-hanging fruit here that I want to address but this is a huge step in the right direction. llvm-svn: 212680
* R600/SI: Add support for llvm.convert.{to|from}.fp16Matt Arsenault2014-07-101-2/+6
| | | | llvm-svn: 212676
* [x86] Refactor some of the new code for lowering v16i8 shuffles toChandler Carruth2014-07-101-17/+17
| | | | | | | | remove duplication and make it easier to select different strategies. No functionality changed. llvm-svn: 212674
* [SDAG] Make the new zext-vector-inreg node default to expand so targetsChandler Carruth2014-07-091-1/+0
| | | | | | | | | | | don't need to set it manually. This is based on feedback from Tom who pointed out that if every target needs to handle this we need to reach out to those maintainers. In fact, it doesn't make sense to duplicate everything when anything other than expand seems unlikely at this stage. llvm-svn: 212661
* AArch64: Better codegen for storing to __fp16.Jim Grosbach2014-07-091-0/+40
| | | | | | | | | | | | | Storing will generally be immediately preceded by rounding from an f32 or f64, so make sure to match those patterns directly to convert into the FPR16 register class directly rather than going through the integer GPRs. This also eliminates an extra step in the convert-from-f64 path which was first converting to f32 and then to f16 from there. rdar://17594379 llvm-svn: 212638
* TargetRegisterInfo: Remove function that fell out of use years ago.Benjamin Kramer2014-07-092-19/+0
| | | | llvm-svn: 212636
* [X86] AVX512: Enable it in the Loop VectorizerAdam Nemet2014-07-091-1/+5
| | | | | | | | | | This lets us experiment with 512-bit vectorization without passing force-vector-width manually. The code generated for a simple integer memset loop is properly vectorized. Disassembly is still broken for it though :(. llvm-svn: 212634
* Make AArch64FastISel::EmitIntExt explicitly check its source and destination ↵Louis Gerbarg2014-07-091-3/+8
| | | | | | | | | | | | types This is a follow up to r212492. There should be no functional difference, but this patch makes it clear that SrcVT must be an i1/i8/16/i32 and DestVT must be an i8/i16/i32/i64. rdar://17516686 llvm-svn: 212633
* X86: When lowering v8i32 himuls use the correct shuffle masks for AVX2.Benjamin Kramer2014-07-091-5/+13
| | | | | | | | | | Turns out my trick of using the same masks for SSE4.1 and AVX2 didn't work out as we have to blend two vectors. While there remove unecessary cross-lane moves from the shuffles so the backend can lower it to palignr instead of vperm. Fixes PR20118, a miscompilation of vector sdiv by constant on AVX2. llvm-svn: 212611
* [x86] Add a ZERO_EXTEND_VECTOR_INREG DAG node and use it when wideningChandler Carruth2014-07-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vector types to be legal and a ZERO_EXTEND node is encountered. When we use widening to legalize vector types, extend nodes are a real challenge. Either the input or output is likely to be legal, but in many cases not both. As a consequence, we don't really have any way to represent this situation and the prior code in the widening legalization framework would just scalarize the extend operation completely. This patch introduces a new DAG node to represent doing a zero extend of a vector "in register". The core of the idea is to allow legal but different vector types in the input and output. The output vector must have fewer lanes but wider elements. The operation is defined to zero extend the low elements of the input to the size of the output elements, and drop all of the high elements which don't have a corresponding lane in the output vector. It also includes generic expansion of this node in terms of blending a zero vector into the high elements of the vector and bitcasting across. This in turn yields extremely nice code for x86 SSE2 when we use the new widening legalization logic in conjunction with the new shuffle lowering logic. There is still more to do here. We need to support sign extension, any extension, and potentially int-to-float conversions. My current plan is to continue using similar synthetic nodes to model each of these transitions with generic lowering code for each one. However, with this patch LLVM already reaches performance parity with GCC for the core C loops of the x264 code (assuming you disable the hand-written assembly versions) when compiling for SSE2 and SSE3 architectures and enabling the new widening and lowering logic for vectors. Differential Revision: http://reviews.llvm.org/D4405 llvm-svn: 212610
* [mips][mips64r6] Correct select patterns that have the condition or ↵Daniel Sanders2014-07-092-26/+26
| | | | | | | | | | true/false values backwards Summary: This bug caused SingleSource/Regression/C/uint64_to_float and SingleSource/UnitTests/2002-05-02-CastTest3 to fail (among others). Differential Revision: http://reviews.llvm.org/D4388 llvm-svn: 212608
* [mips][mips64r6] Correct cond names in the cmp.cond.[ds] instructionsDaniel Sanders2014-07-092-41/+42
| | | | | | | | | | Summary: It seems we accidentally read the wrong column of the table MIPS64r6 spec and used the names for c.cond.fmt instead of cmp.cond.fmt. Differential Revision: http://reviews.llvm.org/D4387 llvm-svn: 212607
* [x86] Initialize a pointer to null to fix a bug in r212602.Chandler Carruth2014-07-091-1/+1
| | | | | | | This should restore GCC hosts (which happen to put the bad stuff into the pointer) and MSan, etc. llvm-svn: 212606
* [mips][mips64r6] Use JALR for indirect branches instead of JR (which is not ↵Daniel Sanders2014-07-095-25/+47
| | | | | | | | | | | | | | | | | available on MIPS32r6/MIPS64r6) Summary: This completes the change to use JALR instead of JR on MIPS32r6/MIPS64r6. Reviewers: jkolek, vmedic, zoran.jovanovic, dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4269 llvm-svn: 212605
* [mips][mips64r6] Use JALR for returns instead of JR (which is not available ↵Daniel Sanders2014-07-0910-26/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | on MIPS32r6/MIPS64r6) Summary: RET, and RET_MM have been replaced by a pseudo named PseudoReturn. In addition a version with a 64-bit GPR named PseudoReturn64 has been added. Instruction selection for a return matches RetRA, which is expanded post register allocation to PseudoReturn/PseudoReturn64. During MipsAsmPrinter, this PseudoReturn/PseudoReturn64 are emitted as: - (JALR64 $zero, $rs) on MIPS64r6 - (JALR $zero, $rs) on MIPS32r6 - (JR_MM $rs) on microMIPS - (JR $rs) otherwise On MIPS32r6/MIPS64r6, 'jr $rs' is an alias for 'jalr $zero, $rs'. To aid development and review (specifically, to ensure all cases of jr are updated), these aliases are temporarily named 'r6.jr' instead of 'jr'. A follow up patch will change them back to the correct mnemonic. Added (JALR $zero, $rs) to MipsNaClELFStreamer's definition of an indirect jump, and removed it from its definition of a call. Note: I haven't accounted for MIPS64 in MipsNaClELFStreamer since it's doesn't appear to account for any MIPS64-specifics. The return instruction created as part of eh_return expansion is now expanded using expandRetRA() so we use the right return instruction on MIPS32r6/MIPS64r6 ('jalr $zero, $rs'). Also, fixed a misuse of isABI_N64() to detect 64-bit wide registers in expandEhReturn(). Reviewers: jkolek, vmedic, mseaborn, zoran.jovanovic, dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4268 llvm-svn: 212604
* [x86] Re-apply a variant of the x86 side of r212324 now that the restChandler Carruth2014-07-091-74/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | has settled without incident, removing the x86-specific and overly strict 'isVectorSplat' routine in favor of generic and more powerful splat detection. The primary motivation and result of this is that the x86 backend can now see through splats which contain undef elements. This is essential if we are using a widening form of legalization and I've updated a test case to also run in that mode as before this change the generated code for the test case was completely scalarized. This version of the patch much more carefully handles the undef lanes. - We aren't overly conservative about them in the shift lowering (where we will never use the splat itself). - One place where the splat would have been re-used by the existing code now explicitly constructs a new constant splat that will be safe. - The broadcast lowering is much more reasonable with undefs by doing a correct check of whether the splat is the only user of a loaded value, checking that the splat actually crosses multiple lanes before using a broadcast, and handling broadcasts of non-constant splats. As a consequence of the last bullet, the weird usage of vpshufd instead of vbroadcast is gone, and we actually can lower an AVX splat with vbroadcastss where before we emitted a really strange pattern of a vector load and a manual splat across the vector. llvm-svn: 212602
* MipsTargetStreamer.h: Avoid "using" to appease msc17.NAKAMURA Takumi2014-07-081-1/+1
| | | | llvm-svn: 212577
* AArch64: Better codegen for loading from __fp16.Jim Grosbach2014-07-081-0/+35
| | | | | | | | | | | | | Loading will generally extend to an f32 or an 64, so make sure to match those patterns directly to load into the FPR16 register class directly rather than going through the integer GPRs. This also eliminates an extra step in the convert-to-f64 path which was first converting to f32 and then to f64 from there. rdar://17594379 llvm-svn: 212573
* [PowerPC] Implement atomic NAND operations as actual NANDUlrich Weigand2014-07-081-4/+4
| | | | | | | | | | | This changes the implementation of atomic NAND operations from "a & ~b" (compatible with GCC < 4.4) to actual "~(a & b)" (compatible with GCC >= 4.4). This is in line with the common-code and ARM back-end change implemented in r212433. llvm-svn: 212547
* [mips] Fixed struct/class mismatch introduced in r212522.Daniel Sanders2014-07-081-1/+1
| | | | | | Clang emits a warning about this. llvm-svn: 212528
* Fix r212522 - [mips] Improve encapsulation of the .MIPS.abiflags ↵Daniel Sanders2014-07-081-0/+3
| | | | | | | | implementation and limit scope of related enums Added two lines that should have been in r212522. llvm-svn: 212523
* [mips] Improve encapsulation of the .MIPS.abiflags implementation and limit ↵Daniel Sanders2014-07-087-298/+353
| | | | | | | | | | | | | | | scope of related enums Summary: Follow on to r212519 to improve the encapsulation and limit the scope of the enums. Also merged two very similar parser functions, fixed a bug where ASE's were not being reported, and marked CPR1's as being 128-bit when MSA is enabled. Differential Revision: http://reviews.llvm.org/D4384 llvm-svn: 212522
* Revert "Refactor ARM subarchitecture parsing"Renato Golin2014-07-081-78/+82
| | | | | | This reverts commit 7b4a6882467e7fef4516a0cbc418cbfce0fc6f6d. llvm-svn: 212521
* Truncate the immediate in logical operation to the register widthArnaud A. de Grandmaison2014-07-081-2/+7
| | | | | | And continue to produce an error if the 32 most significant bits are not all ones or zeros. llvm-svn: 212520
* Mips.abiflags is a new implicitly generated section that will be present on ↵Vladimir Medic2014-07-085-65/+527
| | | | | | all new modules. The section contains a versioned data structure which represents essentially information to allow a program loader to determine the requirements of the application. This patch implements mips.abiflags section and provides test cases for it. llvm-svn: 212519
* [x86,SDAG] Sink the logic for folding shuffles of splats moreChandler Carruth2014-07-081-41/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | aggressively from the x86 shuffle lowering to the generic SDAG vector shuffle formation code. This code already tried to fold away shuffles of splats! It just had lots of bugs and couldn't handle the case my new x86 shuffle lowering needed. First, it failed to correctly compute whether N2 was undef because it pre-computed this, then did transformations which could *make* N2 undef, then failed to ever re-consider the precomputed state. Second, it didn't look through bitcasts at all, even in the safe cases where they are just element-type bitcasts with no change to the number of elements. Third, it didn't handle all-zero bit casts nicely the way my code in the x86 side of things did, which is essential to getting good zext-shuffle lowerings. But all of these are generic. I just ported the code down to this layer and fixed the surrounding bugs. Tests exercising this in the x86 backend still pass and some silly code in widen_cast-6.ll gets better. I updated that test to be a bit more precise but it's still pretty unclear what the value of the test is in this day and age. llvm-svn: 212517
* [X86] AVX512: Only allow k1-k7 as predicates to vpcmp*Adam Nemet2014-07-081-7/+7
| | | | | | | | | As destination k0 is allowed but not as predicate/writemask. I also modified the test to allow checking of error messages by the assembler. I applied a similar approach to the test ret.s in the same directory. llvm-svn: 212504
OpenPOWER on IntegriCloud