path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
...
* [x86] Start improving the matching of unpck instructions based on test cases from Halide folks. | Chandler Carruth | 2014-11-12 | 1 | -0/+6
    This initial step was extracted from a prototype change by Clay Wood to try and address regressions found with Halide and the new vector shuffle lowering.
    llvm-svn: 221779
* AVX-512: Intrinsics for ERI | Elena Demikhovsky | 2014-11-12 | 5 | -59/+94
    Covers three instructions (vrcp28, vrsqrt28, vexp2), vector forms only. The intrinsics include an SAE (Suppress All Exceptions) parameter.
    http://reviews.llvm.org/D6214
    llvm-svn: 221774
* Reverts r221772 which fails tests | Jingyue Wu | 2014-11-12 | 1 | -39/+2
    llvm-svn: 221773
* Disable indvar widening if arithmetic on the wider type is more expensive | Jingyue Wu | 2014-11-12 | 1 | -2/+39
    Summary: IndVarSimplify should not widen an indvar if arithmetic on the wider indvar is more expensive than on the narrower indvar. For instance, although NVPTX64 treats i64 as a legal type, an ADD on i64 is twice as expensive as one on i32, because the hardware needs to simulate a 64-bit integer using two 32-bit integers.
    Split from D6188, and based on D6195 which adds NVPTXTargetTransformInfo. Fixes PR21148.
    Test Plan: Added @indvar_32_bit, which verifies that we do not widen an indvar if arithmetic on the wider type is more expensive.
    Reviewers: jholewinski, eliben, meheff, atrick
    Reviewed By: atrick
    Subscribers: jholewinski, llvm-commits
    Differential Revision: http://reviews.llvm.org/D6196
    llvm-svn: 221772
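The profitability rule described in this commit can be sketched as a plain cost comparison. This is only an illustration: `AddCosts` and `shouldWidenIndVar` are hypothetical names, and the cost numbers are made up rather than taken from any real TargetTransformInfo implementation.

```cpp
// Hypothetical cost pair for one target; values are illustrative only.
struct AddCosts {
  unsigned Narrow; // cost of an ADD on the narrow type (e.g. i32)
  unsigned Wide;   // cost of an ADD on the wide type (e.g. i64)
};

// Widening is treated as acceptable only when arithmetic on the wide type
// is no more expensive than arithmetic on the narrow type.
constexpr bool shouldWidenIndVar(AddCosts C) { return C.Wide <= C.Narrow; }

// A target where a 64-bit ADD costs twice a 32-bit ADD: do not widen.
static_assert(!shouldWidenIndVar(AddCosts{1, 2}), "wide ADD is costlier");
// A target where both cost the same: widening is allowed.
static_assert(shouldWidenIndVar(AddCosts{1, 1}), "equal cost");
```

The actual pass queries the target's cost model; the sketch only shows the shape of the decision.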
* [PowerPC] Add vec_vsx_ld and vec_vsx_st intrinsics | Bill Schmidt | 2014-11-12 | 2 | -8/+29
    This patch enables the vec_vsx_ld and vec_vsx_st intrinsics for PowerPC, which provide programmer access to the lxvd2x, lxvw4x, stxvd2x, and stxvw4x instructions.
    New LLVM intrinsics are provided to represent these four instructions in IntrinsicsPowerPC.td. These are patterned after the similar intrinsics for lvx and stvx (Altivec). In PPCInstrVSX.td, these intrinsics are tied to the code gen patterns, with additional patterns to allow plain vanilla loads and stores to still generate these instructions.
    At -O1 and higher the intrinsics are immediately converted to loads and stores in InstCombineCalls.cpp. This will open up more optimization opportunities while still allowing the correct instructions to be generated. (Similar code exists for aligned Altivec loads and stores.)
    The new intrinsics are added to the code that checks for consecutive loads and stores in PPCISelLowering.cpp, as well as to PPCTargetLowering::getTgtMemIntrinsic().
    There's a new test to verify the correct instructions are generated. The loads and stores tend to be reordered, so the test just counts their number. It runs at -O2, as it's not very effective to test this at -O0, when many unnecessary loads and stores are generated.
    I ended up having to modify vsx-fma-m.ll. It turns out this test case is slightly unreliable, but I don't know a good way to prevent problems with it. The xvmaddmdp instructions read and write the same register, which is one of the multiplicands. Commutativity allows either to be chosen. If the FMAs are reordered differently than expected by the test, the register assignment can be different as a result. Hopefully this doesn't change often.
    There is a companion patch for Clang.
    llvm-svn: 221767
* Pass an ArrayRef to MCDisassembler::getInstruction. | Rafael Espindola | 2014-11-12 | 11 | -83/+68
    With this patch MCDisassembler::getInstruction takes an ArrayRef<uint8_t> instead of a MemoryObject.
    Even on X86 there is a maximum size an instruction can have. Given that, it seems way simpler and more efficient to just pass an ArrayRef to the disassembler instead of a MemoryObject and have it do a virtual call every time it wants some extra bytes.
    llvm-svn: 221751
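The design point above (hand the decoder a contiguous byte window rather than an object it must query through virtual calls) can be sketched as follows. `ByteView` is a hypothetical stand-in for `llvm::ArrayRef<uint8_t>`, and the one-opcode encoding is invented purely for illustration; it is not any real instruction set or the MCDisassembler interface.

```cpp
#include <cstddef>
#include <cstdint>

// Minimal non-owning view of bytes: a pointer plus a length.
struct ByteView {
  const uint8_t *Data;
  size_t Size;
};

// Toy decoder for an invented encoding: opcode 0x01 carries a one-byte
// operand, every other opcode is a single byte. Returns the instruction
// size in bytes, or 0 if the view is too short to hold the instruction.
inline size_t decodeSize(ByteView Bytes) {
  if (Bytes.Size == 0)
    return 0;
  size_t Needed = (Bytes.Data[0] == 0x01) ? 2 : 1;
  return Bytes.Size >= Needed ? Needed : 0;
}
```

Because the view already carries its length, the decoder can bounds-check with a simple comparison instead of asking another object for more bytes.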
* Remove a bit of dead code. | Rafael Espindola | 2014-11-12 | 3 | -19/+10
    Every "real" object file implements this and ptx doesn't use it.
    llvm-svn: 221746
* Initialize new subtarget feature variable for generating reciprocal estimate instructions. | Sanjay Patel | 2014-11-11 | 1 | -0/+1
    This was missed in r221706.
    llvm-svn: 221731
* [FastISel][AArch64] Add support for fabs intrinsic. | Juergen Ributzka | 2014-11-11 | 1 | -0/+26
    Lower the llvm.fabs intrinsic to the 'fabs' MI instruction.
    This fixes rdar://problem/18946552.
    llvm-svn: 221729
* Revert "IR: MDNode => Value" | Duncan P. N. Exon Smith | 2014-11-11 | 3 | -5/+4
    Instead, we're going to separate metadata from the Value hierarchy. See PR21532.
    This reverts commit r221375.
    This reverts commit r221373.
    This reverts commit r221359.
    This reverts commit r221167.
    This reverts commit r221027.
    This reverts commit r221024.
    This reverts commit r221023.
    This reverts commit r220995.
    This reverts commit r220994.
    llvm-svn: 221711
* Add Forward Control-Flow Integrity. | Tom Roeder | 2014-11-11 | 4 | -29/+18
    This commit adds a new pass that can inject checks before indirect calls to make sure that these calls target known locations. It supports three types of checks and, at compile time, it can take the name of a custom function to call when an indirect call check fails. The default failure function ignores the error and continues.
    This pass incidentally moves the function JumpInstrTables::transformType from private to public and makes it static (with a new argument that specifies the table type to use); this is so that the CFI code can transform function types at call sites to determine which jump-instruction table to use for the check at that site.
    Also, this removes support for jumptables in ARM, pending further performance analysis and discussion.
    Review: http://reviews.llvm.org/D4167
    llvm-svn: 221708
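The idea of checking an indirect call against known targets can be sketched in ordinary C++. This is not the pass's mechanism (the commit describes jump-instruction tables and three kinds of checks); the linear scan, the `Handler` type, and every function name below are hypothetical, and the failure policy here (return a fallback value) is a choice made only for the example.

```cpp
// Signature shared by all permitted indirect-call targets in this sketch.
using Handler = int (*)(int);

int twice(int X) { return 2 * X; }
int negate(int X) { return -X; }
int unlisted(int X) { return X + 1; } // deliberately not in the table

// Hypothetical table of known-good targets.
static const Handler KnownTargets[] = {twice, negate};

// True if F is one of the registered targets.
bool isKnownTarget(Handler F) {
  for (Handler T : KnownTargets)
    if (T == F)
      return true;
  return false;
}

// Performs the check before the indirect call. On failure this sketch
// returns Fallback instead of calling through the pointer.
int checkedCall(Handler F, int Arg, int Fallback) {
  if (!isKnownTarget(F))
    return Fallback;
  return F(Arg);
}
```

For example, `checkedCall(twice, 21, -1)` calls through the pointer, while `checkedCall(unlisted, 1, -1)` takes the failure path.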
* Use rcpss/rcpps (X86) to speed up reciprocal calcs (PR21385). | Sanjay Patel | 2014-11-11 | 4 | -1/+44
    This is a first step for generating SSE rcp instructions for reciprocal calcs when fast-math allows it. This is very similar to the rsqrt optimization enabled in D5658 ( http://reviews.llvm.org/rL220570 ).
    For now, be conservative and only enable this for AMD btver2 where performance improves significantly both in terms of latency and throughput.
    We may never enable this codegen for Intel Core* chips because the divider circuits are just too fast. On SandyBridge, divss can be as fast as 10 cycles versus the 21 cycle critical path for the rcp + mul + sub + mul + add estimate.
    Follow-on patches may allow configuration of the number of Newton-Raphson refinement steps, add AVX512 support, and enable the optimization for more chips.
    More background here: http://llvm.org/bugs/show_bug.cgi?id=21385
    Differential Revision: http://reviews.llvm.org/D6175
    llvm-svn: 221706
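The "mul + sub + mul + add" sequence mentioned above matches one Newton-Raphson step for a reciprocal, x1 = x0 + x0 * (1 - a * x0). A minimal scalar sketch follows; the starting estimate is passed in by the caller rather than produced by `rcpss`, and the function names are hypothetical.

```cpp
#include <cmath>

// One Newton-Raphson refinement step for 1/A, given an estimate X0.
inline float refineReciprocal(float A, float X0) {
  float E = 1.0f - A * X0; // mul, sub: residual of the current estimate
  return X0 + X0 * E;      // mul, add: corrected estimate
}

// How far A * X is from 1; zero would mean X is exactly 1/A.
inline float reciprocalResidual(float A, float X) {
  return std::fabs(A * X - 1.0f);
}
```

Each step roughly squares the relative error, so a starting estimate of 0.33 for 1/3 (about 1% off) comes out with a residual near 1e-4 after one step.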
* [PowerPC] Replace foul hackery with real calls to __tls_get_addr | Bill Schmidt | 2014-11-11 | 7 | -125/+82
    My original support for the general dynamic and local dynamic TLS models contained some fairly obtuse hacks to generate calls to __tls_get_addr when lowering a TargetGlobalAddress. Rather than generating real calls, special GET_TLS_ADDR nodes were used to wrap the calls and only reveal them at assembly time. I attempted to provide correct parameter and return values by chaining CopyToReg and CopyFromReg nodes onto the GET_TLS_ADDR nodes, but this was also not fully correct. Problems were seen with two back-to-back stores to TLS variables, where the call sequences ended up overlapping with unhappy results. Additionally, since these weren't real calls, the proper register side effects of a call were not recorded, so clobbered values were kept live across the calls.
    The proper thing to do is to lower these into calls in the first place. This is relatively straightforward; see the changes to PPCTargetLowering::LowerGlobalTLSAddress() in PPCISelLowering.cpp. The changes here are standard call lowering, except that we need to track the fact that these calls will require a relocation. This is done by adding a machine operand flag of MO_TLSLD or MO_TLSGD to the TargetGlobalAddress operand that appears earlier in the sequence.
    The calls to LowerCallTo() eventually find their way to LowerCall_64SVR4() or LowerCall_32SVR4(), which call FinishCall(), which calls PrepareCall(). In PrepareCall(), we detect the calls to __tls_get_addr and immediately snag the TargetGlobalTLSAddress with the annotated relocation information. This becomes an extra operand on the call following the callee, which is expected for nodes of type tlscall. We change the call opcode to CALL_TLS for this case. Back in FinishCall(), we change it again to CALL_NOP_TLS for 64-bit only, since we require a TOC-restore nop following the call for the 64-bit ABIs.
    During selection, patterns in PPCInstrInfo.td and PPCInstr64Bit.td convert the CALL_TLS nodes into BL_TLS nodes, and convert the CALL_NOP_TLS nodes into BL8_NOP_TLS nodes. This replaces the code removed from PPCAsmPrinter.cpp, as the BL_TLS or BL8_NOP_TLS nodes can now be emitted normally using their patterns and the associated printTLSCall print method.
    Finally, as a result of these changes, all references to get-tls-addr in its various guises are no longer used, so they have been removed.
    There are existing TLS tests to verify the changes haven't messed anything up. I've added one new test that verifies that the problem with the original code has been fixed.
    llvm-svn: 221703
* Use an 8-bit immediate when possible. | Rafael Espindola | 2014-11-11 | 1 | -2/+14
    This fixes pr21529.
    llvm-svn: 221700
* [X86][ELF] Fix PR20243 - leaf frame pointer bug with TLS access | Dario Domizioli | 2014-11-11 | 1 | -0/+1
    The ISel lowering for global TLS access in PIC mode was creating a pseudo instruction that is later expanded to a call, but the code was not setting the hasCalls flag in the MachineFrameInfo alongside the adjustsStack flag. This caused some functions to be mistakenly recognized as leaf functions, and this in turn affected the decision to eliminate the frame pointer.
    With the fix, hasCalls is properly set and the leaf frame pointer is correctly preserved.
    llvm-svn: 221695
* [mips] Add preliminary support for the MIPS II target. | Vasileios Kalintiris | 2014-11-11 | 3 | -5/+13
    Summary: This patch enables code generation for the MIPS II target. Pre-Mips32 targets don't have the MUL instruction, so we add the corresponding pattern that uses the MULT/MFLO combination to retrieve the product.
    This is WIP as we don't support code generation for select nodes due to the lack of conditional-move instructions.
    Reviewers: dsanders
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D6150
    llvm-svn: 221686
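The MULT/MFLO combination can be modelled in a few lines: MULT writes the 64-bit signed product into the HI/LO register pair, and MFLO reads back the low 32 bits, which is the value a three-operand MUL would produce. The sketch below is a software model with hypothetical names, not backend code.

```cpp
#include <cstdint>

// Model of the HI/LO register pair written by MULT.
struct HiLo {
  uint32_t Hi;
  uint32_t Lo;
};

// MULT: signed 32x32 -> 64-bit product, split across HI and LO.
inline HiLo mult(int32_t A, int32_t B) {
  uint64_t P = static_cast<uint64_t>(static_cast<int64_t>(A) *
                                     static_cast<int64_t>(B));
  return HiLo{static_cast<uint32_t>(P >> 32), static_cast<uint32_t>(P)};
}

// MFLO: read the LO register.
inline uint32_t mflo(HiLo R) { return R.Lo; }

// What a MUL would compute, expressed as MULT followed by MFLO.
inline uint32_t mulViaMultMflo(int32_t A, int32_t B) {
  return mflo(mult(A, B));
}
```

For instance, `mult(0x10000, 0x10000)` leaves 1 in HI and 0 in LO, so the MFLO result is 0, which is the expected wrapped 32-bit product.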
* [mips] Add hardware register name "hwr_ulr" ($29) | Vasileios Kalintiris | 2014-11-11 | 1 | -0/+1
    The canonical name when printing assembly is still $29. The reason is that GAS does not accept "$hwr_ulr" at the moment.
    This addresses the comments from r221307, which reverted the original commit r221299.
    llvm-svn: 221685
* [X86] Add missing check for 'isINSERTPSMask' in method 'isShuffleMaskLegal'. | Andrea Di Biagio | 2014-11-11 | 1 | -1/+2
    This helps the DAGCombiner to identify more opportunities to fold shuffles.
    llvm-svn: 221684
* Recommit "[mips] Add names and tests for the hardware registers" | Vasileios Kalintiris | 2014-11-11 | 2 | -2/+37
    The original commit r221299 was reverted in r221307. I removed the name "hwr_ulr" ($29) from the original commit because two tests were failing.
    llvm-svn: 221681
* Use uint64_t as the type for the X86 TSFlag format enum. Allows removal of the VEXShift hack that was used to access the higher bits of TSFlags. | Craig Topper | 2014-11-11 | 2 | -61/+62
    llvm-svn: 221673
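Why a 64-bit underlying type helps can be shown with a toy flag enum: once the enum is `uint64_t`, flags above bit 31 can be named and tested directly, with no separate shift to reach them. The names and bit positions below are hypothetical and do not reflect the real X86 TSFlags layout.

```cpp
#include <cstdint>

// Toy flag word with an explicit 64-bit underlying type.
enum ToyFlags : uint64_t {
  ToyFormMask = 0x3F,        // low bits: an instruction "form" field
  ToyHighFlag = 1ULL << 40   // a flag that lives above bit 31
};

// Tests the high flag directly on the 64-bit word.
inline bool hasHighFlag(uint64_t Flags) {
  return (Flags & ToyHighFlag) != 0;
}

// Truncating to 32 bits loses the high flag, which is why a narrower
// enum type would need a workaround to reach it.
inline bool survivesTruncation(uint64_t Flags) {
  return (static_cast<uint32_t>(Flags) & ToyHighFlag) != 0;
}
```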
* [X86] Fix pattern match for 32-to-64-bit zext in the presence of AssertSext | Michael Kuperstein | 2014-11-11 | 1 | -0/+1
    This fixes an issue with matching trunc -> assertsext -> zext on x86-64, which would not zero the high 32-bits. See PR20494 for details.
    Recommitting - This time, with a hopefully working test.
    Differential Revision: http://reviews.llvm.org/D6128
    llvm-svn: 221672
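The failure mode described here (a zero-extend that does not clear the high 32 bits of a value known to be sign-extended) can be reproduced with plain integer arithmetic. The helper names are hypothetical; the sketch models register contents and is not the selection pattern itself.

```cpp
#include <cstdint>

// A 32-bit value as it sits in a 64-bit register after sign extension.
inline uint64_t signExtended32(int32_t V) {
  return static_cast<uint64_t>(static_cast<int64_t>(V));
}

// Correct zext of the low 32 bits: the high half must be cleared.
inline uint64_t zext32(uint64_t Reg) { return Reg & 0xFFFFFFFFull; }

// The buggy shortcut: reuse the register unchanged, assuming the high
// bits are already zero. This is wrong for negative inputs.
inline uint64_t zext32Buggy(uint64_t Reg) { return Reg; }
```

For a negative input such as -1 the two versions disagree (0x00000000FFFFFFFF versus 0xFFFFFFFFFFFFFFFF); for non-negative inputs they happen to agree, which is what makes this kind of bug easy to miss.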
* [NVPTX] Remove dead code in NVPTXTargetTransformInfo (NFC) | Jingyue Wu | 2014-11-11 | 1 | -12/+2
    llvm-svn: 221668
* MCAsmParserExtension has a copy of the MCAsmParser. Use it. | Rafael Espindola | 2014-11-11 | 5 | -53/+176
    Base classes were storing a second copy.
    llvm-svn: 221667
* [X86] Custom lower UINT_TO_FP from v4f32 to v4i32, and for v8f32 to v8i32 if AVX2 is available. | Quentin Colombet | 2014-11-11 | 2 | -8/+126
    According to IACA, the new lowering has a throughput of 8 cycles instead of 13 with the previous one.
    Although this lowering kicks in for some SPEC benchmarks, the performance improvement was within the noise.
    Correctness testing has been done for the whole range of uint32_t with the following program:
        uint4 v = (uint4) {0,1,2,3};
        uint32_t i;
        //Check correctness over entire range for uint4 -> float4 conversion
        for( i = 0; i < 1U << (32-2); i++ )
        {
          float4 t = test(v);
          float4 c = correct(v);
          if( 0xf != _mm_movemask_ps( t == c ))
          {
            printf( "Error @ %vx: %vf vs. %vf\n", v, c, t);
            return -1;
          }
          v += 4;
        }
    Where "correct" is the old lowering and "test" the new one.
    The patch adds a test case for the two custom lowerings. It also modifies the vector cost model, which is why cast.ll and uitofp.ll are modified. 2009-02-26-MachineLICMBug.ll is also modified because we now hoist 7 instructions instead of 4 (3 more constant loads).
    <rdar://problem/18153096>
    llvm-svn: 221657
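For readers unfamiliar with why unsigned-to-float needs custom handling when only signed conversions are cheap, one well-known scalar technique is to split the value into two 16-bit halves, convert each with a signed conversion, and recombine. The sketch below illustrates that general idea only; it is an assumption for illustration and is not claimed to be the sequence this patch emits. The function name is hypothetical.

```cpp
#include <cstdint>

// Converts an unsigned 32-bit value to float using only signed
// int-to-float conversions. Both halves fit in 16 bits, so each
// conversion and the multiply by 65536 are exact; the final add
// performs the single rounding step.
inline float uintToFloatSplit(uint32_t V) {
  float Hi = static_cast<float>(static_cast<int32_t>(V >> 16));
  float Lo = static_cast<float>(static_cast<int32_t>(V & 0xFFFFu));
  return Hi * 65536.0f + Lo;
}
```

Because only the last operation rounds, the result agrees with a direct conversion, including for values at or above 2^31 where a naive signed conversion would go negative.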
* Reverting r221626 due to a too-strict test. | Michael Kuperstein | 2014-11-10 | 1 | -1/+0
    llvm-svn: 221629
* [AArch64][FastISel] Fix kill flags for integer extends. | Juergen Ributzka | 2014-11-10 | 1 | -0/+8
    In the case we optimize an integer extend away and replace it directly with the source register, we also have to clear all kill flags at all its uses.
    This is necessary, because the original IR instruction might be trivially dead, but we replaced it with a nop at MI level.
    llvm-svn: 221628
* [X86] Fix pattern match for 32-to-64-bit zext in the presence of AssertSext | Michael Kuperstein | 2014-11-10 | 1 | -0/+1
    This fixes an issue with matching trunc -> assertsext -> zext on x86-64, which would not zero the high 32-bits. See PR20494 for details.
    Differential Revision: http://reviews.llvm.org/D6128
    llvm-svn: 221626
* [NVPTX] Add an NVPTX-specific TargetTransformInfo | Jingyue Wu | 2014-11-10 | 5 | -12/+115
    Summary: It currently only implements hasBranchDivergence, and will be extended in later diffs.
    Split from D6188.
    Test Plan: make check-all
    Reviewers: jholewinski
    Reviewed By: jholewinski
    Subscribers: llvm-commits, meheff, eliben, jholewinski
    Differential Revision: http://reviews.llvm.org/D6195
    llvm-svn: 221619
* Misc style fixes. NFC. | Rafael Espindola | 2014-11-10 | 11 | -353/+275
    This fixes a few cases of:
    * Wrong variable name style.
    * Lines longer than 80 columns.
    * Repeated names in comments.
    * clang-format of the above.
    This makes the next patch a lot easier to read.
    llvm-svn: 221615
* [mips][microMIPS] Fix issue with delay slot filler and microMIPS | Zoran Jovanovic | 2014-11-10 | 1 | -11/+19
    Differential Revision: http://reviews.llvm.org/D6193
    llvm-svn: 221612
* [mips] Fix sret arguments for N32/N64 which were accidentally broken in r221534. | Daniel Sanders | 2014-11-10 | 1 | -0/+1
    llvm-svn: 221604
* R600: Remove unused define | Matt Arsenault | 2014-11-07 | 1 | -2/+0
    llvm-svn: 221543
* [mips] Promote i32 arguments to i64 for the N32/N64 ABI and fix <64-bit structs... | Daniel Sanders | 2014-11-07 | 5 | -43/+143
    Summary: ... and after all that refactoring, it's possible to distinguish softfloat floating point values from integers so this patch no longer breaks softfloat to do it.
    Remove direct handling of i32's in the N32/N64 ABI by promoting them to i64. This more closely reflects the ABI documentation and also fixes problems with stack arguments on big-endian targets.
    We now rely on signext/zeroext annotations (already generated by clang) and the Assert[SZ]ext nodes to avoid the introduction of unnecessary sign/zero extends.
    It was not possible to convert three tests to use signext/zeroext. These tests are bswap.ll, ctlz-v.ll, ctlz-v.ll. It's not possible to put signext on a vector type so we just accept the sign extends here for now. These tests don't pass the vectors the same way clang does (clang puts multiple elements in the same argument, these map 1 element to 1 argument) so we don't need to worry too much about it.
    With this patch, all known N32/N64 bugs should be fixed and we now pass the first 10,000 tests generated by ABITest.py.
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D6117
    llvm-svn: 221534
* [mips] Removed the remainder of MipsCC. NFC. | Daniel Sanders | 2014-11-07 | 2 | -39/+24
    Summary: One of the calls to AllocateStack (the one in LowerCall) doesn't look like it should be there but it was there before and removing it breaks the frame size calculation.
    Reviewers: vmedic, theraven
    Reviewed By: theraven
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D6116
    llvm-svn: 221529
* [mips] Remove MipsCC::reservedArgArea() in favour of MipsABIInfo::GetCalleeAllocdArgSizeInBytes(). NFC. | Daniel Sanders | 2014-11-07 | 4 | -18/+29
    Reviewers: theraven, vmedic
    Reviewed By: vmedic
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D6115
    llvm-svn: 221528
* MipsCCState.h: Use LLVM_DELETED_FUNCTION for msc17. | NAKAMURA Takumi | 2014-11-07 | 1 | -2/+2
    llvm-svn: 221527
* [mips] Move MipsCCState to a separate file and clang-formatted it. | Daniel Sanders | 2014-11-07 | 4 | -199/+260
    Summary: Depends on D6113
    Reviewers: theraven, vmedic
    Reviewed By: vmedic
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D6114
    llvm-svn: 221525
* [mips] Fix unused variable warnings introduced in r221521 | Daniel Sanders | 2014-11-07 | 1 | -9/+0
    llvm-svn: 221522
* [mips] Remove remaining use of MipsCC::intArgRegs() in favour of MipsABIInfo::GetByValArgRegs() and MipsABIInfo::GetVarArgRegs() | Daniel Sanders | 2014-11-07 | 4 | -15/+18
    Summary: Depends on D6112
    Reviewers: theraven, vmedic
    Reviewed By: vmedic
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D6113
    llvm-svn: 221521
* [mips] Remove MipsCC::getRegVT(). NFC | Daniel Sanders | 2014-11-07 | 2 | -22/+0
    Summary: It's no longer used.
    Reviewers: vmedic, theraven
    Reviewed By: theraven
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D6112
    llvm-svn: 221519
* [mips] Remove MipsCC::analyzeCallOperands in favour of CCState::AnalyzeCallOperands. NFC | Daniel Sanders | 2014-11-07 | 3 | -48/+39
    Summary: In addition to the usual f128 workaround, it was also necessary to provide a means of accessing ArgListEntry::IsFixed.
    Reviewers: theraven, vmedic
    Reviewed By: vmedic
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D6111
    llvm-svn: 221518
* [mips] Move SpecialCallingConv to MipsCCState and use it from tablegen-erated code. NFC | Daniel Sanders | 2014-11-07 | 3 | -46/+48
    Summary: In the long run, it should probably become a calling convention in its own right but for now just move it out of MipsISelLowering::analyzeCallOperands() so that we can drop this function in favour of CCState::AnalyzeCallOperands().
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D6085
    llvm-svn: 221517
* [mips] Removed IsVarArg from MipsISelLowering::analyzeCallOperands(). NFC. | Daniel Sanders | 2014-11-07 | 2 | -8/+6
    Summary: CCState objects already carry this information in their isVarArg() method.
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D6084
    llvm-svn: 221516
* [AArch64] Keep flags on condition vreg when instantiating a CB branch. | Ahmed Bougacha | 2014-11-07 | 1 | -1/+2
    Reversing a CB* instruction used to drop the flags on the condition. On the included testcase, this led to a read from an undefined vreg. Using addOperand keeps the flags, here <undef>.
    Differential Revision: http://reviews.llvm.org/D6159
    llvm-svn: 221507
* [X86][SSE] Vector integer/float conversion memory folding (cvttps2dq / cvttpd2dq) | Simon Pilgrim | 2014-11-06 | 1 | -2/+10
    Fixed an issue with the (v)cvttps2dq and (v)cvttpd2dq instructions being incorrectly put in the 2 source operand folding tables instead of the 1 source operand and added the missing SSE/AVX versions. Also added missing (v)cvtps2dq and (v)cvtpd2dq instructions to the folding tables.
    Differential Revision: http://reviews.llvm.org/D6001
    llvm-svn: 221489
* [X86] Add VFMADDSUB cases for the 213->231 custom inserter. | Ahmed Bougacha | 2014-11-06 | 1 | -0/+9
    Also add tests for vfmadd/vfmsub.
    llvm-svn: 221488
* [X86] Add missing FMA3 VFMADDSUB in the emitter. | Ahmed Bougacha | 2014-11-06 | 1 | -0/+8
    Also reuse the fma4 intrinsic test to cover fma3 instructions too.
    llvm-svn: 221487
* [Hexagon] Adding basic Hexagon ELF object emitter. | Colin LeMahieu | 2014-11-06 | 5 | -4/+182
    llvm-svn: 221465
* Clean up NVPTXLowerStructArgs.cpp. NFC | Eli Bendersky | 2014-11-06 | 1 | -54/+37
    * Remove unnecessary const_casts and C-style casts
    * Simplify attribute access code
    * Simplify ArrayRef creation
    * 80-col and clang-format
    llvm-svn: 221464
* [mips] Removed IsSoftFloat from MipsISelLowering::analyzeCallOperands(). NFC | Daniel Sanders | 2014-11-06 | 2 | -6/+5
    Summary: It isn't used anymore.
    Depends on D6081
    Reviewers: vmedic
    Reviewed By: vmedic
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D6083
    llvm-svn: 221463