path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
...
* [X86] Allow any_extend to be combined with setcc on VLX targets.Craig Topper2018-01-261-0/+8
| | | | | | For VLX targets, getSetCCResultType returns vXi1, which prevents the target-independent DAG combine from doing this transform itself. llvm-svn: 323555
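| A minimal IR sketch of the sort of pattern involved (hypothetical function, not from the patch); at the DAG level the extend of the v4i1 compare result becomes an any_extend when the upper bits are unused, which the X86 combine can now fold into the setcc:
|
|   define <4 x i32> @cmp_ext(<4 x i32> %a, <4 x i32> %b) {
|     %c = icmp eq <4 x i32> %a, %b        ; v4i1 setcc on AVX512VL
|     %z = zext <4 x i1> %c to <4 x i32>   ; extend back to the vector width
|     ret <4 x i32> %z
|   }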
* [X86][AVX512] Add combining support for X86ISD::VTRUNCSSimon Pilgrim2018-01-261-7/+45
| | | | | | | | Similar to the existing support for X86ISD::VTRUNCUS. Differential Revision: https://reviews.llvm.org/D42544 llvm-svn: 323553
* [SelectionDAGISel] Add a debug print before call to Select. Adjust where ↵Craig Topper2018-01-2611-38/+0
| | | | | | | | | | | | blank lines are printed during the isel process to make things more sensibly grouped. Previously some targets printed their own message at the start of Select to indicate what they were selecting. For the targets that didn't, it meant there was no print of the root node before any custom handling in the target executed. So if the target did something custom and never called SelectNodeCommon, no print would be made. For the targets that did print a message in Select, if they didn't custom handle a node, SelectNodeCommon would reprint the root node before walking the isel table. It seems better to just print the message before the call to Select so all targets behave the same, and then remove the root node printing from SelectNodeCommon and just leave a message that says we're starting the table search. There were also some oddities in blank line behavior, usually due to a \n after a call to SelectionDAGNode::dump, which already inserted a new line. llvm-svn: 323551
* [X86] Add 'rdrnd' feature to silvermont to match recent gcc bug fix.Craig Topper2018-01-261-0/+1
| | | | | | gcc recently fixed this bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83546 llvm-svn: 323550
* [Hexagon] Fix an incorrect assertion in HexagonConstExtendersKrzysztof Parzyszek2018-01-261-8/+14
| | | | llvm-svn: 323548
* [x86] fix typo in comment; NFCSanjay Patel2018-01-261-1/+1
| | | | llvm-svn: 323545
* [X86][AVX] LowerBUILD_VECTORAsVariablePermute - add support for VPERMILPV to ↵Simon Pilgrim2018-01-261-0/+7
| | | | | | | | v4i32/v4f32 Extension to D42431, adding support for v4i32/v4f32 as well as v2i64/v2f64 now that D42308 has landed llvm-svn: 323542
* [X86][SSE] Don't coalesce v4i32 extractsSimon Pilgrim2018-01-261-96/+1
| | | | | | | | | | We currently coalesce v4i32 extracts from all 4 elements to 2 v2i64 extracts + shifts/sign-extends. This seems to have been added back in the days when we tended to spill vectors and reload scalars, or ended up with repeated shuffles moving everything down to the 0th index. I don't think either of these is likely these days, as we have better EXTRACT_VECTOR_ELT and VECTOR_SHUFFLE handling, and the existing code makes things very difficult for various vector and load combines. Differential Revision: https://reviews.llvm.org/D42308 llvm-svn: 323541
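| For reference, a minimal IR sketch (hypothetical function, not from the patch): four i32 extracts like these were previously coalesced into two v2i64 extracts plus shifts/sign-extends, and are now left as individual extracts:
|
|   define i32 @sum(<4 x i32> %v) {
|     %e0 = extractelement <4 x i32> %v, i32 0
|     %e1 = extractelement <4 x i32> %v, i32 1
|     %e2 = extractelement <4 x i32> %v, i32 2
|     %e3 = extractelement <4 x i32> %v, i32 3
|     %s0 = add i32 %e0, %e1
|     %s1 = add i32 %e2, %e3
|     %s  = add i32 %s0, %s1
|     ret i32 %s
|   }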
* [X86][SSE] Drop PMADDWD in lowerMulSimon Pilgrim2018-01-261-7/+0
| | | | | | As mentioned in D42258, we don't need this any more llvm-svn: 323540
* [AMDGPU][MC] Added validation of image dst/data size (must match dmask and tfe)Dmitry Preobrazhensky2018-01-261-0/+61
| | | | | | | | | See bug 36000: https://bugs.llvm.org/show_bug.cgi?id=36000 Differential Revision: https://reviews.llvm.org/D42483 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 323538
* [MIPS] Don't crash on unsized extern types with -mgpoptAlexander Richardson2018-01-261-0/+7
| | | | | | | | | | | | | | Summary: This fixes an assertion when building the FreeBSD MIPS64 kernel. Reviewers: atanasyan, sdardis, emaste Reviewed By: sdardis Subscribers: krytarowski, llvm-commits Differential Revision: https://reviews.llvm.org/D42571 llvm-svn: 323536
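| A minimal IR sketch of the trigger (hypothetical names, not from the kernel build): an external global of opaque, and therefore unsized, type, whose size the -mgpopt small-data heuristic previously asserted on:
|
|   %struct.device = type opaque
|   @dev = external global %struct.device   ; unsized extern type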
* [AMDGPU][MC] Added support of 64-bit image atomicsDmitry Preobrazhensky2018-01-265-17/+115
| | | | | | | | | See bug 35998: https://bugs.llvm.org/show_bug.cgi?id=35998 Differential Revision: https://reviews.llvm.org/D42469 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 323534
* [AMDGPU][MC] Enabled disassembler for image atomic operationsDmitry Preobrazhensky2018-01-261-12/+16
| | | | | | | | | See bug 35988: https://bugs.llvm.org/show_bug.cgi?id=35988 Differential Revision: https://reviews.llvm.org/D42186 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 323527
* [X86] Cleanup SDLoc arguments as mentioned on D42544Simon Pilgrim2018-01-261-6/+7
| | | | llvm-svn: 323526
* [AMDGPU] fix LDS f32 intrinsicsDaniil Fukalov2018-01-262-16/+19
| | | | | | | | | | | | - used a qualified pointer addrspace in the intrinsics class to avoid .f32 mangling - changed the overly generic 'atomic' mangling to 'ds' - added missing intrinsics to AMDGPUTTIImpl::getTgtMemIntrinsic Reviewed by: b-sumner Differential Revision: https://reviews.llvm.org/D42383 llvm-svn: 323516
* [ARM] Accept a subset of Thumb GPR register class when emitting an SP-relativeMomchil Velikov2018-01-261-2/+2
| | | | | | | | | | | | | load instruction The function `Thumb1InstrInfo::loadRegFromStackSlot` accepts only the `tGPR` register class. The function serves to emit a `tLDRspi` instruction and certainly any subset of the `tGPR` register class is a valid destination of the load. Differential revision: https://reviews.llvm.org/D42535 llvm-svn: 323514
* [ARM] Armv8.2-A FP16 code generation (part 1/3)Sjoerd Meijer2018-01-269-28/+166
| This is the groundwork for Armv8.2-A FP16 code generation.
|
| Clang passes and returns _Float16 values as floats, together with the required bitconverts and truncs etc. to implement correct AAPCS behaviour, see D42318. We will implement half-precision argument passing/returning lowering in the ARM backend soon, but for now this means that this:
|
|   _Float16 sub(_Float16 a, _Float16 b) {
|     return a + b;
|   }
|
| gets lowered to this:
|
|   define float @sub(float %a.coerce, float %b.coerce) {
|   entry:
|     %0 = bitcast float %a.coerce to i32
|     %tmp.0.extract.trunc = trunc i32 %0 to i16
|     %1 = bitcast i16 %tmp.0.extract.trunc to half
|     <SNIP>
|     %add = fadd half %1, %3
|     <SNIP>
|   }
|
| When FullFP16 is *not* supported, we don't make f16 a legal type, and we get legalization for "free", i.e. nothing changes and everything works as before. And also f16 argument passing/returning is handled.
|
| When FullFP16 is supported, we do make f16 a legal type, and have 2 places that we need to patch up: f16 argument passing and returning, which involves minor tweaks to avoid unnecessary code generation for some bitcasts.
|
| As a "demonstrator" that this works for the different FP16, FullFP16, softfp modes, etc., I've added match rules to the VSUB instruction description showing that we can codegen this instruction from IR, but more importantly, also to some conversion instructions. These conversions were causing issues before in the FP16 and FullFP16 cases. I've also added match rules to the VLDRH and VSTRH descriptions, so that we can actually compile the entire half-precision sub code example above. This showed that these loads and stores had the wrong addressing mode specified: AddrMode5 instead of AddrMode5FP16, which turned out not to be implemented at all, so that has also been added.
|
| This is the minimal patch that shows all the different moving parts. In patch 2/3 I will add some efficient lowering of bitcasts, and in patch 3/3 I will add the remaining Armv8.2-A FP16 instruction descriptions.
|
| Thanks to Sam Parker and Oliver Stannard for their help and reviews!
|
| Differential Revision: https://reviews.llvm.org/D38315
|
| llvm-svn: 323512
* [NFC] fix trivial typos in comments and documentsHiroshi Inoue2018-01-266-7/+7
| | | | | | "in in" -> "in", "on on" -> "on" etc. llvm-svn: 323508
* [RISCV] Encode RISCV specific ELF e_flags to RISCV Binary by RISCVTargetStreamerShiva Chen2018-01-266-0/+117
| | | | llvm-svn: 323507
* [X86] Remove dead code from LowerBUILD_VECTOR that tried to handle i64 ↵Craig Topper2018-01-261-21/+0
| | | | | | | | element type in 32-bit mode. Type legalization would prevent any i64 operands to the build_vector from existing before we get here. The coverage bots show this code as uncovered. llvm-svn: 323506
* [X86] Remove code from combineBitcastvxi1 that was needed to support the ↵Craig Topper2018-01-261-47/+0
| | | | | | | | previous native IR for kunpck intrinsics. The original autoupgrade for kunpck intrinsics used a bitcasted scalar shift, or, and. This combine would turn this into a concat_vectors. Now the kunpck intrinsics are autoupgraded to a vector shuffle that will become a concat_vectors. llvm-svn: 323504
* [X86] Remove unused intrinsic type handling. NFCCraig Topper2018-01-262-28/+2
| | | | llvm-svn: 323503
* [X86] Simplify condition in VSETCC. NFCCraig Topper2018-01-261-2/+1
| | | | | | This listed all legal 128-bit integer types individually, but since we already know we have a legal type and it's an integer, we can just check is128BitVector. llvm-svn: 323502
* [X86] Remove LowerVSETCC code for handling vXi1 setcc with vXi8/vXi16 input ↵Craig Topper2018-01-261-6/+3
| | | | | | | | type. NFC These kinds of setccs are promoted by a DAG combine before they ever get to legalization. llvm-svn: 323501
* [X86] Remove some dead code from LowerVSETCC. NFCCraig Topper2018-01-261-13/+0
| | | | | | This code was added in r321967, but ultimately I fixed the issue in the legalizer and this code was no longer required. llvm-svn: 323500
* [X86] Fix killed flag handling in X86FixupLea passSerguei Katkov2018-01-261-1/+2
| | | | | | | | | | | | | | | | When the pass performs the lea (%base,%index,1), %dst => mov %base,%dst; add %index,%dst modification, it should clear the killed flag on base if base is equal to index. Otherwise the verifier complains about the use of a killed register in the add instruction. Reviewers: lsaba, zvi, zansari, aaboud Reviewed By: lsaba Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42522 llvm-svn: 323497
* [AArch64] Enable aggressive FMA on T99 and provide AArch64 options for others.Joel Jones2018-01-254-0/+16
| | | | | | | | | | | | This patch enables aggressive FMA by default on T99, and provides a -mllvm option to enable the same on other AArch64 microarchitectures (-mllvm -aarch64-enable-aggressive-fma). A test case demonstrating the effects on T99 is included. Patch by: steleman (Stefan Teleman) Differential Revision: https://reviews.llvm.org/D40696 llvm-svn: 323474
* [X86] Teach Intel syntax InstPrinter to print lock prefixes that have been ↵Craig Topper2018-01-251-2/+2
| | | | | | | | parsed from the asm parser. The asm parser puts the lock prefix in the MCInst flags so we need to check that in addition to TSFlags. This matches what the ATT printer does. llvm-svn: 323469
* [X86] Combine two unnecessarily complicated ifs that had the same body. NFCCraig Topper2018-01-251-3/+1
| | | | llvm-svn: 323468
* [Hexagon] SETEQ and SETNE are valid integer condition codesKrzysztof Parzyszek2018-01-251-1/+2
| | | | llvm-svn: 323452
* [X86] Apply clang-format to detectUSatPattern. NFCI.Simon Pilgrim2018-01-251-5/+4
| | | | | | Cleanup from D42544 llvm-svn: 323439
* Revert "[Hexagon] Replace EmitFunctionEntryCode with a DAG preprocessing code"Krzysztof Parzyszek2018-01-252-22/+16
| | | | | | This reverts r323374. The fix needs a different approach. llvm-svn: 323438
* [X86] Expand IMUL/MUL instregexs in Intel scheduler models. Add load latency ↵Craig Topper2018-01-255-95/+95
| | | | | | | | | | | | to some of them in SkylakeClient model. The regular expressions and the imul names caused some instructions to be matched by multiple regexs creating unpredictable results. This changes them all to use explicit instrs instead. While doing this I also found that some instructions in Skylake were missing load latency so I fixed that too. llvm-svn: 323406
* [X86] Expand IMUL/MUL instregexs in Znver1 scheduler to show what's actually ↵Craig Topper2018-01-251-22/+16
| | | | | | | | | | implemented. The IMUL instruction names mixed with the prefix matching of the instregex lead to some strange matches. The worst being that several memory instructions are using the register form latency. I don't know what the right answer is, so I've left TODOs and will try to work with the AMD folks to get this cleaned up. llvm-svn: 323405
* [X86] Name the MMX phaddd instruction with 3 Ds instead of just 2. NFCCraig Topper2018-01-258-13/+13
| | | | llvm-svn: 323403
* [X86] Remove 64/128/256 from MMX/SSE/AVX instruction names for overall ↵Craig Topper2018-01-259-400/+400
| | | | | | | consistency. NFC MMX instructions all start with MMX_ so the 64 isn't needed for disambiguation. SSE/AVX1 instructions are assumed 128-bit so we don't need to say 128. AVX2 instructions should use a Y to indicate 256-bits. llvm-svn: 323402
* [X86] Remove unnecessary '_alt' and '_Int' from scheduler model regular ↵Craig Topper2018-01-255-18/+18
| | | | | | | | expressions. These were treated as optional suffixes, but the regular expressions are already prefix matches so this is unnecessary. It breaks the binary search optimization in tablegen due to the top level question mark. llvm-svn: 323401
* [Hexagon] Replace EmitFunctionEntryCode with a DAG preprocessing codeKrzysztof Parzyszek2018-01-242-16/+22
| | | | | | | | | The code in EmitFunctionEntryCode needs to know the maximum stack alignment, but it runs very early in the selection process (before lowering). The final stack alignment may change during lowering, so the code needs to be moved to where the alignment is known. llvm-svn: 323374
* [AArch64][GlobalISel] Fall back during AArch64 isel if we have a volatile load.Amara Emerson2018-01-241-0/+6
| | | | | | | | | | | | | | | The tablegen imported patterns for sext(load(a)) don't check for single uses of the load or delete the original after matching. As a result two loads are left in the generated code. This particular issue will be fixed by adding support for a G_SEXTLOAD opcode in future. There are however other potential issues around this that wouldn't be fixed by a G_SEXTLOAD, so until we have a proper solution we don't try to handle volatile loads at all in the AArch64 selector. Fixes/works around PR36018. llvm-svn: 323371
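| A minimal IR sketch of the problematic input (hypothetical function, not from the patch); the imported sext(load) pattern would have matched the load without checking uses or volatility, leaving two loads in the output:
|
|   define i64 @widen(i32* %p) {
|     %v = load volatile i32, i32* %p   ; must not be duplicated or folded away
|     %e = sext i32 %v to i64
|     ret i64 %e
|   }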
* [X86][SSE] Aggressively use PMADDWD for v4i32 multiplies with 17 or more ↵Simon Pilgrim2018-01-241-8/+20
| | | | | | | | | leading zeros As discussed in D41484, PMADDWD for 'zero extended' vXi32 is nearly always a better option than PMULLD: on SNB it results in code that isn't any faster, but isn't any slower either, so we may as well keep it. On KNL it only has half the throughput, so I've disabled it there - ideally there'd be a better way than this. Differential Revision: https://reviews.llvm.org/D42258 llvm-svn: 323367
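| For illustration, a minimal IR sketch (hypothetical function, not from the patch): masking both operands to 15 bits leaves at least 17 leading zeros per lane, so the signed 16x16->32 multiply that PMADDWD performs is exact:
|
|   define <4 x i32> @mul_narrow(<4 x i32> %a, <4 x i32> %b) {
|     %a15 = and <4 x i32> %a, <i32 32767, i32 32767, i32 32767, i32 32767>
|     %b15 = and <4 x i32> %b, <i32 32767, i32 32767, i32 32767, i32 32767>
|     %m = mul <4 x i32> %a15, %b15   ; can now lower to pmaddwd
|     ret <4 x i32> %m
|   }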
* [AMDGPU] Make sure all super regs of reserved regs are marked reserved.Geoff Berry2018-01-247-29/+30
| | | | | | | | | | | | | | | | | | | | | Summary: Move reserveRegisterTuples into AMDGPURegisterInfo and use it in R600RegisterInfo::getReservedRegs and R600InstrInfo::reserveIndirectRegisters to ensure that all super registers of reserved registers are also marked as reserved. Before this change, under certain circumstances, the registers %t1_x and %t1_xyzw would be marked as reserved, but %t1_xy and %t1_xyz would not be, leading to the register allocator sometimes assigning a register to %t1_xy, which is invalid since %t1_x is reserved. Reviewers: arsenm, tstellar, MatzeB, qcolombet Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D42448 llvm-svn: 323356
* [ARM] Expand long shifts for Thumb1 to __aeabi_ callsWeiming Zhao2018-01-241-0/+7
| | | | | | | | | | | | | | Summary: For long shifts, the inlined version takes about 20 instructions on Thumb1. To avoid the code bloat, expand to __aeabi_ calls if target is Thumb1. Reviewers: samparker Reviewed By: samparker Subscribers: samparker, aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D42401 llvm-svn: 323354
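| As a sketch (hypothetical function, not from the patch), a 64-bit shift such as the one below is now expanded on Thumb1 to a call to the matching EABI routine (__aeabi_llsl for a left shift) rather than ~20 inline instructions:
|
|   define i64 @shl64(i64 %x, i32 %n) {
|     %amt = zext i32 %n to i64
|     %r = shl i64 %x, %amt   ; becomes a call to __aeabi_llsl on Thumb1
|     ret i64 %r
|   }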
* [X86] Fix some inconsistencies in the itineraries and Sched for ↵Craig Topper2018-01-242-6/+6
| | | | | | | | (V)PEXTRW/(V)PINSRW The weirdest being that PEXTRWrr was tagged as a memory operation. llvm-svn: 323353
* [X86] Adjust names of PINSRW/PEXTRW intructions between MMX/SSE/AVX/AVX512 ↵Craig Topper2018-01-249-82/+74
| | | | | | for consistency and to maybe enable more regular expression compaction in the scheduler models. NFCI llvm-svn: 323352
* [X86] Remove '(_REV)?' from a bunch of scheduler regular expressions. NFCCraig Topper2018-01-245-195/+193
| | | | | | The regexs are treated as a prefix match already so the checking for optional text at the end provides no value. Instead it prevents the binary search optimization in tablegen from kicking in due to the top level question mark. llvm-svn: 323351
* [Hexagon] Run late copy propagation and dead code elimination passesKrzysztof Parzyszek2018-01-241-0/+5
| | | | llvm-svn: 323346
* [AArch64] Avoid unnecessary vector byte-swapping in big-endianPablo Barrio2018-01-241-12/+6
| Summary:
| Loads/stores of some NEON vector types are promoted to other vector types with different lane sizes but the same vector size. This is not a problem in little-endian, but in big-endian it requires additional byte reversals to preserve the lane ordering while keeping the right endianness of the data inside each lane. For example:
|
|   %1 = load <4 x half>, <4 x half>* %p
|
| results in the following assembly:
|
|   ld1 { v0.2s }, [x1]
|   rev32 v0.4h, v0.4h
|
| This patch changes the promotion of these loads/stores so that the actual vector load/store (LD1/ST1) takes care of the endianness correctly and there is no need for further byte reversals. The previous code now results in the following assembly:
|
|   ld1 { v0.4h }, [x1]
|
| Reviewers: olista01, SjoerdMeijer, efriedma
|
| Reviewed By: efriedma
|
| Subscribers: aemerson, rengolin, javed.absar, llvm-commits, kristof.beyls
|
| Differential Revision: https://reviews.llvm.org/D42235
|
| llvm-svn: 323325
* [Hexagon] Remove unused HexagonISD opcodes, NFCKrzysztof Parzyszek2018-01-244-28/+5
| | | | llvm-svn: 323324
* [X86][SSE] Avoid calls to combineX86ShufflesRecursively that can't combine ↵Simon Pilgrim2018-01-241-9/+14
| | | | | | | | | | | | to target shuffles (PR32037) Don't bother making recursive calls to combineX86ShufflesRecursively if we have more shuffle source operands than will be combined together with the remaining recursive depth. See https://bugs.llvm.org/show_bug.cgi?id=32037#c26 and https://bugs.llvm.org/show_bug.cgi?id=32037#c27 for the reduction in compile times from this patch. Differential Revision: https://reviews.llvm.org/D42378 llvm-svn: 323320
* [ARM] Call __chkstk for dynamic stack allocation in all windows environmentsMartin Storsjo2018-01-241-2/+2
| | | | | | | | | | | | | | | This matches what MSVC does for alloca() function calls on ARM. Even if MSVC doesn't support VLAs at the language level, it does support the alloca function. On the clang level, both the _alloca() (when emulating MSVC, which is what the alloca() function expands to) and __builtin_alloca() builtin functions, and VLAs, map to the same LLVM IR "alloca" function - so within LLVM they're not distinguishable from each other. Differential Revision: https://reviews.llvm.org/D42292 llvm-svn: 323308
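| A minimal IR sketch (hypothetical function and callee, not from the patch) of the dynamic allocation that triggers the __chkstk call when targeting Windows on ARM:
|
|   declare void @use(i8*)
|
|   define void @grab(i32 %n) {
|     %buf = alloca i8, i32 %n   ; dynamic size -> stack probe via __chkstk
|     call void @use(i8* %buf)
|     ret void
|   }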