summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [AMDGPU][MC][GFX9] Added integer clamping support for VOP3 opcodesDmitry Preobrazhensky2017-08-1613-45/+166
| | | | | | | | | | See Bug 34152: https://bugs.llvm.org//show_bug.cgi?id=34152 Reviewers: SamWot, artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D36674 llvm-svn: 311006
* [CostModel][X86][XOP] Improve costs for XOP shufflesSimon Pilgrim2017-08-161-0/+22
| | | | | | VPPERM/VPERMIL2PD/VPERMIL2PS all provide more effective 2-input shuffles than regular AVX instructions llvm-svn: 311005
* [mips] Handle variables with an explicit section and interactions with ↵Simon Dardis2017-08-161-0/+16
| | | | | | | | | | | | | | | | | | .sdata, .sbss If a variable has an explicit section such as .sdata or .sbss, it is placed in that section and accessed in a gp relative manner. This overrides the global -G setting. Otherwise if a variable has a explicit section attached to it, such as '.rodata' or '.mysection', it is not placed in the small data section. This also overrides the global -G setting. Reviewers: atanasyan, nitesh.jain Differential Revision: https://reviews.llvm.org/D36616 llvm-svn: 311001
* [ARM] Improve loop unrolling for Cortex-MSam Parker2017-08-161-6/+19
| | | | | | | | | | | - Set the default runtime unroll count to 4 and use the newly added UnrollRemainder option. - Create loop cost and force unroll for a cost less than 12. - Disable unrolling on Thumb1 only targets. Differential Revision: https://reviews.llvm.org/D36134 llvm-svn: 310997
* [AMDGPU] Eliminate no effect instructions before s_endpgmStanislav Mekhanoshin2017-08-161-3/+63
| | | | | | Differential Revision: https://reviews.llvm.org/D36585 llvm-svn: 310987
* Reapply "[GlobalISel] Remove the GISelAccessor API."Quentin Colombet2017-08-158-200/+75
| | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit r310425, thus reapplying r310335 with a fix for link issue of the AArch64 unittests on Linux bots when BUILD_SHARED_LIBS is ON. Original commit message: [GlobalISel] Remove the GISelAccessor API. Its sole purpose was to avoid spreading around ifdefs related to building global-isel. Since r309990, GlobalISel is not optional anymore, thus, we can get rid of this mechanism all together. NFC. ---- The fix for the link issue consists in adding the GlobalISel library in the list of dependencies for the AArch64 unittests. This dependency comes from the use of AArch64Subtarget that needs to know how to destruct the GISel related APIs when being detroyed. Thanks to Bill Seurer and Ahmed Bougacha for helping me reproducing and understand the problem. llvm-svn: 310969
* Revert r310919 - [globalisel][tablegen] Support zero-instruction emission.Daniel Sanders2017-08-151-11/+1
| | | | | | | | | | As expected, this failed on the windows bots but the instrumentation showed something interesting. The ADD8ri and INC8r rules are never directly compared on the windows machines. That implies that the issue lies in transitivity of the Compare predicate. I believe I've already verified that but maybe I missed something. llvm-svn: 310922
* Re-commit with some instrumentation: [globalisel][tablegen] Support ↵Daniel Sanders2017-08-151-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | zero-instruction emission. Summary: Support the case where an operand of a pattern is also the whole of the result pattern. In this case the original result and all its uses must be replaced by the operand. However, register class restrictions can require a COPY. This patch handles both cases by always emitting the copy and leaving it for the register allocator to optimize. The previous commit failed on the windows bots and this one is likely to fail on those same bots. However, the added instrumentation should reveal a particular isHigherPriorityThan() evaluation which I'm expecting to expose that these machines are weighing priority of two rules differently from the non-windows machines. Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar Subscribers: javed.absar, kristof.beyls, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D36084 llvm-svn: 310919
* [RISCV] Add RISCVInstPrinter and basic MC assembler testsAlex Bradbury2017-08-158-4/+141
| | | | | | | | | With the addition of RISCVInstPrinter, it is now possible to test the basic operation of the RISCV MC layer. Differential Revision: https://reviews.llvm.org/D23564 llvm-svn: 310917
* [MIPS] Implement support for -mstack-alignment.John Baldwin2017-08-146-16/+32
| | | | | | | | | | | | | | | | | | | | | Summary: This is modeled on the implementation for x86 which stores the command line option in a 'StackAlignOverride' field in MipsSubtarget and then uses this to compute a 'stackAlignment' value in MipsSubtarget::initializeSubtargetDependencies. The stackAlignment() method in MipsSubTarget is renamed to getStackAlignment() and returns the computed 'stackAlignment'. Reviewers: sdardis Reviewed By: sdardis Subscribers: llvm-commits, arichardson Differential Revision: https://reviews.llvm.org/D35874 llvm-svn: 310891
* IPRA: Allow target to enable IPRA by defaultMatt Arsenault2017-08-141-6/+0
| | | | llvm-svn: 310876
* [PowerPC] Add codegen for VSX word extract convert to FPLei Huang2017-08-141-0/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add codegen for VSX word extract conversion from signed/unsigned to single/double precision. For UINT_TO_FP: Extract word unsigned and convert to float was implemented in https://reviews.llvm.org/D20239. Here we will add the missing extract integer and conversion to double. This utilizes the new P9 instruction xxextractuw to extracting an integer element when the result will be converted to double thereby saving 2 direct moves (VSR <-> GPR). For SINT_TO_FP: We will implement the following sequence which will also reduce the number of instructions by saving 2 direct moves. v4i32->f32: xxspltw xvcvsxwsp xscvspdpn v4i32->f64: xxspltw xvcvsxwdp Differential Revision: https://reviews.llvm.org/D35859 llvm-svn: 310866
* Revert "Reland "[mips][mt][6/7] Add support for mftr, mttr instructions.""Simon Dardis2017-08-147-375/+1
| | | | | | | This reverts r310834. It didn't pacify the buildbot, FileCheck is still crashing. llvm-svn: 310854
* [x86] fold the mask op on 8- and 16-bit rotatesSanjay Patel2017-08-141-3/+39
| | | | | | | | | | | | | | | | | | | | | | | Ref the post-commit thread for r310770: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170807/478507.html The motivating cases as 'C' source examples can look like this: unsigned char rotate_right_8(unsigned char v, int shift) { // shift &= 7; v = ( v >> shift ) | ( v << ( 8 - shift ) ); return v; } https://godbolt.org/g/K6rc1A Notice that the source doesn't contain UB-safe masked shift amounts, but instcombine created those in order to produce narrow rotate patterns. This should be the last step needed to resolve PR34046: https://bugs.llvm.org/show_bug.cgi?id=34046 Differential Revision: https://reviews.llvm.org/D36644 llvm-svn: 310849
* [X86] Fix a place that was mishandling X86ISD::UMUL.Craig Topper2017-08-141-1/+1
| | | | | | | | According to the X86ISelLowering.h, UMUL results are low, high, and flags. But this place was treating result 1 or 2 as flags. Differential Revision: https://reviews.llvm.org/D36654 llvm-svn: 310846
* [X86] Remove flag setting ISD nodes from computeKnownBitsForTargetNodeCraig Topper2017-08-141-15/+0
| | | | | | | | | | | | | | | | | Summary: The flag result is an i32 type. But its only really used for connectivity. I don't think anything even assumes a particular format. We don't ever do any real operations on it. So known bits don't help us optimize anything. My main motivation is that the UMUL behavior is actually wrong. I was going to fix this in D36654, but then realized there was just no reason for it to be here. Reviewers: RKSimon, zvi, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36657 llvm-svn: 310845
* [AVX512] Make the itinerary parameter actually pass through the the ↵Craig Topper2017-08-141-2/+2
| | | | | | | | | | | | | | | | AVX512_maskable_common multiclass Summary: This looks to have been disconnected about 3 years ago in r219358. Reviewers: gadi.haber, RKSimon, zvi Reviewed By: gadi.haber Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36658 llvm-svn: 310844
* [AVX512] Remove leftover code for when i1 was a legal type from the fast ↵Craig Topper2017-08-141-14/+0
| | | | | | | | | | | | | | | | | | | isel load/store code. Summary: I don't think we need this code anymore. It only existed because i1 used to be legal. There's probably more unneeded code in fast isel still. Reviewers: guyblank, zvi Reviewed By: guyblank Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36652 llvm-svn: 310843
* Reland "[mips][mt][6/7] Add support for mftr, mttr instructions."Simon Dardis2017-08-147-1/+375
| | | | | | | | | | | | | | | | | | | | This adjusts the tests to hopfully pacify the llvm-clang-x86_64-expensive-checks-win buildbot. Unlike many other instructions, these instructions have aliases which take coprocessor registers, gpr register, accumulator (and dsp accumulator) registers, floating point registers, floating point control registers and coprocessor 2 data and control operands. For the moment, these aliases are treated as pseudo instructions which are expanded into the underlying instruction. As a result, disassembling these instructions shows the underlying instruction and not the alias. Reviewers: slthakur, atanasyan Differential Revision: https://reviews.llvm.org/D35253 llvm-svn: 310834
* [AArch64] Remove unused MC functionSam Parker2017-08-141-18/+0
| | | | | | | | | | | | An unused function warning was raised in https://bugs.llvm.org/show_bug.cgi?id=34178. The offending function, in AArch64MCCodeEmitter.cpp, was committed by me last week. Differential Revision: https://reviews.llvm.org/D36665 llvm-svn: 310823
* [AVX-512] Add hasSideEffects = 0 to the 8-bit and 16-bit register broadcasts.Craig Topper2017-08-141-1/+1
| | | | llvm-svn: 310813
* [X86] Remove unused argument from the vextract_for_size multiclass. NFCCraig Topper2017-08-141-14/+7
| | | | llvm-svn: 310812
* [AVX512] Remove comment I should have removed in r310808. NFCCraig Topper2017-08-141-3/+0
| | | | llvm-svn: 310811
* [PowerPC] Revert r310346 (and followups r310356 & r310424) whichChandler Carruth2017-08-141-132/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | introduce a miscompile bug. There appears to be a bug where the generated code to extract the sign bit doesn't work correctly for 32-bit inputs. I've replied to the original commit pointing out the problem. I think I see by inspection (and reading the manual for PPC) how to fix this, but I can't be 100% confident and I also don't know what the best way to test this is. Currently it seems nearly impossible to get the backend to hit this code path, but the patch autohr is likely in a better position to craft such test cases than I am, and based on where the bug is it should be easily done. Original commit message for r310346: """ [PowerPC] Eliminate compares - add i32 sext/zext handling for SETLE/SETGE Adds handling for SETLE/SETGE comparisons on i32 values. Furthermore, it adds the handling for the special case where RHS == 0. Differential Revision: https://reviews.llvm.org/D34048 """ llvm-svn: 310809
* [AVX512] Simplify the instruction defintion for VEXTRACT. NFCICraig Topper2017-08-141-33/+16
| | | | | | The comment about why we couldn't use avx512_maskable appears to have been incorrect. llvm-svn: 310808
* [ARM] Tidy-up Cortex-A15 DPR-SPR optimizer implementationJaved Absar2017-08-141-27/+12
| | | | | | | | | Modernise the code with range-loops etc Reviewed by: @fhahn, @rovka Differential Revision: https://reviews.llvm.org/D36502 llvm-svn: 310807
* [X86] Fix typo from r310794. Index = 0 should have been Index == 0.Craig Topper2017-08-131-2/+2
| | | | llvm-svn: 310801
* [X86] Remove unused pattern fragment that referenced MVT::i1. NFCCraig Topper2017-08-131-5/+0
| | | | llvm-svn: 310799
* [COFF, ARM64] Use '//' as comment character in assembly files in GNU ↵Martin Storsjo2017-08-133-2/+19
| | | | | | | | | | | | environments This allows using semicolons for bundling up more than one statement per line. This is used within the mingw-w64 project in some assembly files that contain code for multiple architectures. Differential Revision: https://reviews.llvm.org/D36366 llvm-svn: 310797
* [AVX512] Correct isExtractSubvectorCheap so that it will return the correct ↵Craig Topper2017-08-131-1/+7
| | | | | | | | | | | | answers for extracting 128-bits from a 512-bit vector and for mask registers. Previously it would not return true for extracting either of the upper quarters of a 512-bit registers. For mask registers we support extracting anything from index 0. And otherwise we only support extracting the upper half of a register. Differential Revision: https://reviews.llvm.org/D36638 llvm-svn: 310794
* [X86][ARM][TargetLowering] Add SrcVT to isExtractSubvectorCheapCraig Topper2017-08-134-4/+6
| | | | | | | | | | | | | | | | | Summary: Without the SrcVT its hard to know what is really being asked for. For example if your target has 128, 256, and 512 bit vectors. Maybe extracting 128 from 256 is cheap, but maybe extracting 128 from 512 is not. For x86 we do support extracting a quarter of a 512-bit register. But for i1 vectors we don't have isel patterns for extracting arbitrary pieces. So we need this to have a correct implementation of isExtractSubvectorCheap for mask vectors. Reviewers: RKSimon, zvi, efriedma Reviewed By: RKSimon Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D36649 llvm-svn: 310793
* [X86][SandyBridge] Additional updates to the SNB instructions scheduling ↵Gadi Haber2017-08-131-824/+988
| | | | | | | | | | | | | | | | information This is a continuation patch for commit r307529 which completely replaces the scheduling information for the SandyBridge architecture target by modifying the file X86SchedSandyBridge.td located under the X86 Target (see also https://reviews.llvm.org/D35019). In this patch we added the scheduling information of additional SNB instructions that were missing from the patch commit r307529, fixed the scheduling of several resource groups that include only port0 instead of port05 (i.e., port0 OR port5) and fixed several incorrect instructions' scheduling in the r307529 commit. The patch also includes the X87 instructions which were missing in previous patch commit r307529 as reported in bugzilla bug 34080. Reviewers: zvi, RKSimon, chandlerc, igorb, m_zuckerman, craig.topper, aymanmus, dim Differential Revision: https://reviews.llvm.org/D36388 llvm-svn: 310792
* [X86][AsmParser][AVX512] Error appropriately when K0 is tried as a write-maskCoby Tayree2017-08-131-1/+4
| | | | | | | | | K0 isn't expected as a write-mask, so provide a detailed error here, instead of the more generic one (invalid op for insn) Conforms with gas Differential Revision: https://reviews.llvm.org/D36570 llvm-svn: 310789
* [X86][AVX512] Add combine for TESTMGuy Blank2017-08-131-9/+16
| | | | | | | | | | | | Add an X86 combine for TESTM when one of the operands is a BUILD_VECTOR(0,0,...). TESTM op0, BUILD_VECTOR(0,0,...) -> BUILD_VECTOR(0,0,...) TESTM BUILD_VECTOR(0,0,...), op1 -> BUILD_VECTOR(0,0,...) Differential Revision: https://reviews.llvm.org/D36536 llvm-svn: 310787
* [X86] Early out of combineInsertSubvector for mask vectors.Craig Topper2017-08-121-1/+6
| | | | | | The combines here shouldn't be done for mask vectors, but it wasn't clear anything was preventing that. llvm-svn: 310786
* [X86] Fix bad comment. NFCCraig Topper2017-08-121-1/+1
| | | | llvm-svn: 310785
* [X86] When handling addcarry intrinsic, create the flag result with the ↵Craig Topper2017-08-121-2/+2
| | | | | | | | | | | | | | | | | | | | | correct type so we don't crash if we use a memory instruction Summary: Previously we were creating the flag result with MVT::Other which is interpretted as a Chain node. If we used a memory form of the instruction we would end up with a copyToReg that consumed the chain result of the adcx instruction instead of the flag result. Pretty sure we should be using MVT::i32 here, that's what we do other places we create these node types. We should probably consider this for 5.0 as well. Reviewers: RKSimon, zvi, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36645 llvm-svn: 310784
* [Triple] Add isThumb and isARM functions.Florian Hahn2017-08-123-11/+3
| | | | | | | | | | | | | | | | | | | Summary: isThumb returns true for Thumb triples (little and big endian), isARM returns true for ARM triples (little and big endian). There are a few more checks using arm/thumb that are not covered by those functions, e.g. that the architecture is either ARM or Thumb (little endian) or ARM/Thumb little endian only. Reviewers: javed.absar, rengolin, kristof.beyls, t.p.northover Reviewed By: rengolin Subscribers: llvm-commits, aemerson Differential Revision: https://reviews.llvm.org/D34682 llvm-svn: 310781
* D36604: PR34148: Do not assume we can use a copy relocation for an ↵Richard Smith2017-08-111-2/+3
| | | | | | | | | | | `external_weak` global An `external_weak` global may be intended to resolve as a null pointer if it's not defined, so it doesn't make sense to use a copy relocation for it. Differential Revision: https://reviews.llvm.org/D36604 llvm-svn: 310773
* [MIPS] Use ABI to determine stack alignment.John Baldwin2017-08-111-1/+3
| | | | | | | | | | | | | | | | Summary: The stack alignment depends on the ABI (16 bytes for N32 and N64 and 8 bytes for O32), not the CPU type. Reviewers: sdardis Reviewed By: sdardis Subscribers: atanasyan, arichardson, llvm-commits Differential Revision: https://reviews.llvm.org/D36326 llvm-svn: 310768
* [X86] Don't use fsin/fcos/fsincos instructions everCraig Topper2017-08-111-15/+12
| | | | | | | | | | | | | | | | | Summary: Previously we would use these instructions if sse was disabled and fastmath was enabled. As mentioned in D28335, this is a bad idea. Reviewers: efriedma, scanon, DavidKreitzer Reviewed By: DavidKreitzer Subscribers: zvi, llvm-commits Differential Revision: https://reviews.llvm.org/D36344 llvm-svn: 310762
* Fix access to undefined weak symbols in pic codeRafael Espindola2017-08-111-1/+17
| | | | | | | | | | When the access to a weak symbol is not a call, the access has to be able to produce the value 0 at runtime. We were sometimes producing code sequences where that was not possible if the code was leaded more than 4g away from 0. llvm-svn: 310756
* AMDGPU: Start adding tail call supportMatt Arsenault2017-08-119-21/+294
| | | | | | Handle the sibling call cases. llvm-svn: 310753
* Revert r310716 (and r310735): [globalisel][tablegen] Support ↵Daniel Sanders2017-08-111-11/+1
| | | | | | | | | | | zero-instruction emission. Two of the Windows bots are failing test\CodeGen\X86\GlobalISel\select-inc.mir which should not have been affected by the change. Reverting while I investigate. Also reverted r310735 because it builds on r310716. llvm-svn: 310745
* [mips] clang-format MipsSubtarget.cpp.John Baldwin2017-08-111-3/+3
| | | | | | This only fixes a few things and serves as my initial test commit. llvm-svn: 310742
* [AMDGPU] Fix santizer error after last commitStanislav Mekhanoshin2017-08-111-1/+0
| | | | | | Removed useless assert. llvm-svn: 310738
* [AMDGPU] Ported and adopted AMDLibCalls passStanislav Mekhanoshin2017-08-116-6/+2975
| | | | | | | | | | | | | | | | | | The pass does simplifications of well known AMD library calls. If given -amdgpu-prelink option it works in a pre-link mode which allows to reference new library functions which will be linked in later. In addition it also used to process traditional AMD option -fuse-native which allows to replace some of the functions with their fast native implementations from the library. The necessary glue to pass the prelink option and translate -fuse-native is to be added to the driver. Differential Revision: https://reviews.llvm.org/D36436 llvm-svn: 310731
* [AVX512] Remove and autoupgrade many of the broadcast intrinsicsCraig Topper2017-08-112-42/+1
| | | | | | | | | | | | | | | | | Summary: This autoupgrades most of the broadcast intrinsics. They've been unused in clang for some time. This leaves the 32x2 intrinsics because they are still used in clang. Reviewers: RKSimon, zvi, igorb Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36606 llvm-svn: 310725
* [x86] Enable some support for lowerVectorShuffleWithUndefHalf with AVX-512Craig Topper2017-08-111-2/+12
| | | | | | | | | | | | | | | | | | | | | Summary: This teaches 512-bit shuffles to detect unused halfs in order to reduce shuffle size. We may need to refine the 512-bit exit point. I couldn't remember if we had good cross lane shuffles for 8/16 bit with AVX-512 or not. I believe this is step towards being able to handle D36454 without a special case. From here we need to improve our ability to combine extract_subvector with insert_subvector and other extract_subvectors. And we need to support narrowing binary operations where we don't demand all elements. This may be improvements to DAGCombiner::narrowExtractedVectorBinOp(by recognizing an insert_subvector in addition to concat) or we may need a target specific combiner. Reviewers: RKSimon, zvi, delena, jbhateja Reviewed By: RKSimon, jbhateja Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36601 llvm-svn: 310724
* [x86] use more shift or LEA for select-of-constants (2nd try)Sanjay Patel2017-08-111-63/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous rev (r310208) failed to account for overflow when subtracting the constants to see if they're suitable for shift/lea. This version add a check for that and more test were added in r310490. We can convert any select-of-constants to math ops: http://rise4fun.com/Alive/d7d For this patch, I'm enhancing an existing x86 transform that uses fake multiplies (they always become shl/lea) to avoid cmov or branching. The current code misses cases where we have a negative constant and a positive constant, so this is just trying to plug that hole. The DAGCombiner diff prevents us from hitting a terrible inefficiency: we can start with a select in IR, create a select DAG node, convert it into a sext, convert it back into a select, and then lower it to sext machine code. Some notes about the test diffs: 1. 2010-08-04-MaskedSignedCompare.ll - We were creating control flow that didn't exist in the IR. 2. memcmp.ll - Choose -1 or 1 is the case that got me looking at this again. We could avoid the push/pop in some cases if we used 'movzbl %al' instead of an xor on a different reg? That's a post-DAG problem though. 3. mul-constant-result.ll - The trade-off between sbb+not vs. setne+neg could be addressed if that's a regression, but those would always be nearly equivalent. 4. pr22338.ll and sext-i1.ll - These tests have undef operands, so we don't actually care about these diffs. 5. sbb.ll - This shows a win for what is likely a common case: choose -1 or 0. 6. select.ll - There's another borderline case here: cmp+sbb+or vs. test+set+lea? Also, sbb+not vs. setae+neg shows up again. 7. select_const.ll - These are motivating cases for the enhancement; replace cmov with cheaper ops. Assembly differences between movzbl and xor to avoid a partial reg stall are caused later by the X86 Fixup SetCC pass. Differential Revision: https://reviews.llvm.org/D35340 llvm-svn: 310717
OpenPOWER on IntegriCloud