path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
...
* [X86] Add patterns for combining movss+uint_to_fp into the intrinsic instructions under AVX512. | Craig Topper | 2018-05-13 | 1 | -0/+40
  This matches what we do for sint_to_fp.
  llvm-svn: 332205
* [X86] Remove and autoupgrade masked vpermd/vpermps intrinsics. | Craig Topper | 2018-05-13 | 1 | -4/+0
  llvm-svn: 332198
* AMDGPU: Rename OpenCL lowering pass to be R600 specific. | Matt Arsenault | 2018-05-13 | 4 | -10/+11
  This pass is a) broken. b) r600 specific. Fixing (a) is a bit more non-trivial, but fixing (b) is easy. Move this pass to being R600 only for now.
  This pass does pass all the unit tests, however clang no longer generates code that looks like the unit test input, so fixing the pass requires fixing the tests and the pass as one, and checking it works with clang still.
  Patch by Dave Airlie
  llvm-svn: 332196
* AMDGPU: Make undef legal for v2i16/v2f16 | Matt Arsenault | 2018-05-13 | 1 | -0/+3
  This is apparently necessary to stop undef from being turned into a build_vector of 0s.
  llvm-svn: 332195
* [X86] Add some load folding patterns for cvtsi2ss/sd into intrinsic instructions. | Craig Topper | 2018-05-13 | 2 | -0/+60
  llvm-svn: 332189
* [X86] Remove and autoupgrade legacy cvtss2sd intrinsics. | Craig Topper | 2018-05-13 | 1 | -13/+7
  llvm-svn: 332187
* [X86] Remove and autoupgrade cvtsi2ss/cvtsi2sd intrinsics to match what clang has used for a very long time. | Craig Topper | 2018-05-12 | 2 | -25/+13
  llvm-svn: 332186
* [x86] Remove a comment obviated by r330269. Should have deleted the | Chandler Carruth | 2018-05-12 | 1 | -5/+0
  comment in the same revision but missed it.
  Thanks to Dimitry Andric for catching this!
  llvm-svn: 332177
* Clear converters map after X86 Domain Reassignment to avoid crashes | Dimitry Andric | 2018-05-12 | 1 | -2/+1
  Summary: As reported in PR37264, in some cases the X86 Domain Reassignment `runOnMachineFunction()` is called twice. Because it only deletes the `.second` members of its `InstrConverterBaseMap`, and does not clean up the map itself, this can lead to double frees and crashes.
  Use `DeleteContainerSeconds()` instead, so the `Converters` map can safely be reinitialized and its members re-deleted for each X86 Domain Reassignment pass.
  Reviewers: guyblank, craig.topper
  Reviewed By: craig.topper
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D46425
  llvm-svn: 332176
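The hazard described in this entry can be illustrated with a small standalone sketch. It does not use the LLVM sources: `Converter`, `ConverterMap`, `deleteSecondsAndClear` and `runPassOnce` are hypothetical names, and the helper only mirrors the idea of deleting each mapped pointer and then emptying the container so that no dangling pointers survive into a second run.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for a converter object; it counts live instances
// so that leaks or double deletes become visible in the counter.
struct Converter {
  static int Live;
  Converter() { ++Live; }
  ~Converter() { --Live; }
};
int Converter::Live = 0;

using ConverterMap = std::map<int, Converter *>;

// Delete every mapped pointer and then empty the map. Clearing is the
// important part: without it the map would keep dangling pointers that a
// later cleanup would delete a second time.
void deleteSecondsAndClear(ConverterMap &M) {
  for (auto &Entry : M)
    delete Entry.second;
  M.clear();
}

// Simulates one invocation of a pass that owns the map: populate it,
// record how many converters were alive, then clean up.
int runPassOnce(ConverterMap &Converters) {
  Converters[0] = new Converter();
  Converters[1] = new Converter();
  int Peak = Converter::Live;
  deleteSecondsAndClear(Converters);
  return Peak;
}
```

Calling `runPassOnce` twice on the same map is safe in this sketch because the map is empty after each call, which is the property the commit message relies on.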
* [X86] Add WriteFCMOV scheduler class for x87 CMOVs | Simon Pilgrim | 2018-05-12 | 11 | -16/+15
  llvm-svn: 332173
* [mips] Initialize the long branch pass for testing purposes | Simon Dardis | 2018-05-12 | 3 | -2/+8
  llvm-svn: 332172
* [X86] Remove some unused masked conversion intrinsics that can be replaced with an older intrinsic and a select. | Craig Topper | 2018-05-12 | 1 | -18/+0
  This is what clang already uses.
  llvm-svn: 332170
* [AMDGPU] Fix amdgpu-waves-per-eu accounting in scheduler | Stanislav Mekhanoshin | 2018-05-12 | 2 | -3/+7
  We cannot query this attribute from a subtarget given a machine function. At this point the attribute itself is already unavailable and can only be obtained through MFI.
  Differential Revision: https://reviews.llvm.org/D46781
  llvm-svn: 332166
* AMDGPU/GlobalISel: Implement select() for >32-bit G_STORE | Tom Stellard | 2018-05-11 | 2 | -1/+28
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D46153
  llvm-svn: 332154
* AMDGPU/SI: Don't promote alloca to vector for AddrSpaceCast instruction. | Changpeng Fang | 2018-05-11 | 1 | -1/+0
  Summary: We have no logic to promote alloca to vector for an AddrSpaceCast instruction.
  Reviewer: arsenm
  Differential Revision: https://reviews.llvm.org/D45993
  llvm-svn: 332147
* [X86] Remove and autoupgrade a bunch of FMA intrinsics that are no longer used by clang. | Craig Topper | 2018-05-11 | 1 | -22/+0
  llvm-svn: 332146
* [AMDGPU] Fix compilation failure when IR contains comdat | Yaxun Liu | 2018-05-11 | 1 | -2/+0
  Remove a useless SwitchSection which also causes compilation failure when IR contains comdat.
  The SwitchSection is useless because the current section is already the correct text section for the function, so there is no need to switch.
  It causes compilation failure for comdat because functions with comdat have a specific text section, not the default .text section.
  Since HIP uses comdat, this bug caused failures for HIP.
  Differential Revision: https://reviews.llvm.org/D46770
  llvm-svn: 332137
* [X86][BtVer2] Model ymm move as double pumped instructions | Simon Pilgrim | 2018-05-11 | 1 | -7/+7
  We still need to handle mmx/xmm moves as 'decode-only' no-pipe instructions
  llvm-svn: 332109
* [RISCV] Support .option rvc and norvc assembler directives | Alex Bradbury | 2018-05-11 | 6 | -2/+123
  These directives allow the 'C' (compressed) extension to be enabled/disabled within a single file.
  Differential Revision: https://reviews.llvm.org/D45864
  Patch by Kito Cheng
  llvm-svn: 332107
* [X86][MMX] Tag MMX Move/Load/Store as WriteVec schedule classes | Simon Pilgrim | 2018-05-11 | 5 | -17/+5
  Fixes an issue on SLM/Btver2 where instructions were being treated as scalar loads/stores
  llvm-svn: 332104
* [AArch64] Fix performPostLD1Combine to check for constant lane index. | Geoff Berry | 2018-05-11 | 1 | -1/+10
  Summary: performPostLD1Combine in AArch64ISelLowering looks for vector insert_vector_elt of a loaded value which it can optimize into a single LD1LANE instruction. The code checking for the pattern was not checking if the lane index was a constant, which could cause two problems:
  - an assert when lowering the LD1LANE ISD node since it assumes a constant operand
  - an assert in isel if the lane index value depends on the post-incremented base register
  Both of these issues are avoided by simply checking that the lane index is a constant.
  Fixes bug 35822.
  Reviewers: t.p.northover, javed.absar
  Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits
  Differential Revision: https://reviews.llvm.org/D46591
  llvm-svn: 332103
* [mips] Rename Filler to MipsDelaySlotFiller and initialize the pass | Simon Dardis | 2018-05-11 | 3 | -26/+44
  llvm-svn: 332102
* [mips] Enable disassembly of fused (negative) multiply add/sub instructions | Simon Dardis | 2018-05-11 | 2 | -37/+45
  Reviewers: atanasyan, smaksimovic, abeserminji
  Differential Revision: https://reviews.llvm.org/D46392
  llvm-svn: 332097
* [X86][SLM] Vector stores only use the MEC port. | Simon Pilgrim | 2018-05-11 | 1 | -10/+10
  Confirmed by both Agner and Intel's AOM - the IEC/FPC are not required for pure load/stores (even if it's a partial update).
  Can't fix WriteStore until all RMW instructions are cleaned up though.
  llvm-svn: 332096
* [X86] Split WriteF/WriteVec Move/Load/Store scheduler classes by vector width | Simon Pilgrim | 2018-05-11 | 10 | -80/+137
  Fixes a SNB issue that was missing vlddqu/vmovntdqa ymm instructions
  llvm-svn: 332094
* [X86] Added scheduler helper classes to split move/load/store by size | Simon Pilgrim | 2018-05-11 | 4 | -198/+261
  Nothing uses this yet but this will allow us to specialize MMX/XMM/YMM/ZMM vector moves.
  llvm-svn: 332090
* AMDGPU/GlobalISel: Implement select() for 32-bit G_FPTOUI | Tom Stellard | 2018-05-11 | 3 | -0/+18
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D45883
  llvm-svn: 332082
* [X86] Remove and autoupgrade the avx512.mask.store.ss intrinsic. | Craig Topper | 2018-05-11 | 1 | -4/+0
  llvm-svn: 332079
* [STLExtras] Add distance() for ranges, pred_size(), and succ_size() | Vedant Kumar | 2018-05-10 | 1 | -2/+1
  This commit adds a wrapper for std::distance() which works with ranges. As it would be a common case to write `distance(predecessors(BB))`, this also introduces `pred_size()` and `succ_size()` helpers to make that easier to write.
  Differential Revision: https://reviews.llvm.org/D46668
  llvm-svn: 332057
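A range-based wrapper of the kind this entry describes might look roughly like the following standalone sketch. It is not the STLExtras code: `IterRange`, `makeRange`, `rangeDistance`, `Block`, `predecessorsOf` and `predSize` are hypothetical names chosen for illustration, and only `std::distance` is a real library call.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical minimal range type: a pair of iterators with begin()/end().
template <typename It> struct IterRange {
  It First, Last;
  It begin() const { return First; }
  It end() const { return Last; }
};

template <typename It> IterRange<It> makeRange(It First, It Last) {
  return {First, Last};
}

// Wrapper around std::distance() that accepts a whole range instead of
// two separate iterators.
template <typename Range>
auto rangeDistance(Range &&R) -> decltype(std::distance(R.begin(), R.end())) {
  return std::distance(R.begin(), R.end());
}

// Hypothetical basic block that stores the ids of its predecessors.
struct Block {
  std::vector<int> Preds;
};

IterRange<std::vector<int>::const_iterator> predecessorsOf(const Block &B) {
  return makeRange(B.Preds.begin(), B.Preds.end());
}

// Convenience helper in the spirit of pred_size(): counts predecessors
// without the caller spelling out the range and the distance call.
std::ptrdiff_t predSize(const Block &B) {
  return rangeDistance(predecessorsOf(B));
}
```

With these definitions a caller writes `predSize(B)` rather than the longer `rangeDistance(predecessorsOf(B))`, which is the convenience the commit message mentions.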
* [WebAssembly] Initial Disassembler. | Sam Clegg | 2018-05-10 | 5 | -14/+139
  This implements a new table-gen emitter to create tables for a wasm disassembler, and a disassembler to use them.
  Comes with 2 tests that test a few instructions manually. Is also able to disassemble large .wasm files with objdump reasonably.
  Not working so well, to be addressed in followups:
  - objdump appears to be passing an incorrect starting point.
  - since the disassembler works an instruction at a time, and it is disassembling stack instructions, it has no idea of pseudo register assignments. These registers are required for the instruction printing code that follows. For now, all such registers appear in the output as $0.
  Patch by Wouter van Oortmerssen
  Differential Revision: https://reviews.llvm.org/D45848
  llvm-svn: 332052
* [X86] Add new patterns for masked scalar load/store to match clang's codegen from r331958. | Craig Topper | 2018-05-10 | 1 | -0/+117
  Clang's codegen now uses 128-bit masked load/store intrinsics in IR. The backend will widen to 512-bits on AVX512F targets.
  So this patch adds patterns to detect codegen's widening and patterns for AVX512VL that don't get widened.
  We may be able to drop some of the old patterns, but I leave that for a future patch.
  llvm-svn: 332049
* AMDGPU/GlobalISel: Implement select() for G_BITCAST s32 <--> <2 x s16> | Tom Stellard | 2018-05-10 | 2 | -0/+21
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D45881
  llvm-svn: 332042
* AMDGPU/GlobalISel: Enable TableGen'd instruction selector | Tom Stellard | 2018-05-10 | 7 | -4/+132
  Reviewers: arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, mgorny, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D45994
  llvm-svn: 332039
* [X86] Initialize HasPTWRITE member of X86Subtarget | Gabor Buella | 2018-05-10 | 1 | -0/+1
  This was missing from r331961. Caught by sanitizer bots.
  llvm-svn: 332024
* [X86] Convert/Merge more instregex patterns to reduce InstrRW compile time. | Simon Pilgrim | 2018-05-10 | 6 | -243/+162
  Use instrs lists or merge multiple instregex patterns.
  llvm-svn: 332022
* [CGP] Split large data structures to sink more GEPs | Haicheng Wu | 2018-05-10 | 2 | -0/+7
  Accessing the members of large data structures needs a lot of GEPs, which usually have large offsets due to the size of the underlying data structure. If the offsets are too large to fit into the r+i addressing mode, these GEPs cannot be sunk to their users' blocks, and many extra registers are then needed to carry the values of these GEPs.
  This patch tries to split a large data struct starting from %base like the following.
  Before:
    BB0:
      %base =
    BB1:
      %gep0 = gep %base, off0
      %gep1 = gep %base, off1
      %gep2 = gep %base, off2
    BB2:
      %load1 = load %gep0
      %load2 = load %gep1
      %load3 = load %gep2
  After:
    BB0:
      %base =
      %new_base = gep %base, off0
    BB1:
      %new_gep0 = %new_base
      %new_gep1 = gep %new_base, off1 - off0
      %new_gep2 = gep %new_base, off2 - off0
    BB2:
      %load1 = load i32, i32* %new_gep0
      %load2 = load i32, i32* %new_gep1
      %load3 = load i32, i32* %new_gep2
  In the above example, the struct is split into two parts. The first part still starts from %base and the second part starts from %new_base. After the splitting, %new_gep1 and %new_gep2 have smaller offsets and can then be sunk to BB2 and folded into their users.
  The algorithm to split the data structure is simple and very similar to the work of merging SExts. First, it collects GEPs that have large offsets when iterating the blocks. Second, it splits the underlying data structures and updates the collected GEPs to use smaller offsets.
  Differential Revision: https://reviews.llvm.org/D42759
  llvm-svn: 332015
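The offset arithmetic in the Before/After example can be sketched outside the compiler as follows. This is only an illustration of the rebasing step (offN - off0), not the CodeGenPrepare implementation: `SplitResult`, `rebaseOffsets` and `fitsImmediate` are hypothetical names, and the immediate limit is an assumed parameter, whereas a real pass would ask the target which addressing modes are legal.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Result of rebasing a group of offsets that share one original base.
struct SplitResult {
  int64_t NewBaseOffset;        // off0: distance from %base to %new_base
  std::vector<int64_t> Rebased; // offN - off0 for every collected offset
};

// Use the first collected offset as the new base and express every
// offset relative to it, as in the "After" listing.
SplitResult rebaseOffsets(const std::vector<int64_t> &Offsets) {
  SplitResult R{Offsets.empty() ? 0 : Offsets.front(), {}};
  for (int64_t Off : Offsets)
    R.Rebased.push_back(Off - R.NewBaseOffset);
  return R;
}

// Assumed model of an r+i addressing mode: the offset must be
// non-negative and no larger than MaxImm (a hypothetical limit).
bool fitsImmediate(int64_t Off, int64_t MaxImm) {
  return Off >= 0 && Off <= MaxImm;
}
```

For instance, with an assumed limit of 4095, offsets 40000, 40008 and 40016 do not fit individually, but after rebasing they become 0, 8 and 16, all of which fit.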
* [X86][Znver1] Remove unnecessary SchedWritePMULLD InstRW overrides. | Simon Pilgrim | 2018-05-10 | 1 | -17/+2
  llvm-svn: 332006
* [X86][SNB] Fix typo in PEXTRDmr instregex, was missing VPEXTRDmr. | Simon Pilgrim | 2018-05-10 | 1 | -4/+2
  llvm-svn: 332002
* [X86] Split WriteVecALU/WriteVecLogic/WriteShuffle/WriteVarShuffle/WritePSADBW/WritePHAdd scheduler classes | Simon Pilgrim | 2018-05-10 | 12 | -341/+136
  Split off XMM classes from the default (MMX) classes.
  llvm-svn: 331999
* [mips] Accept 32-bit offsets for ld/sd/lld commands | Simon Atanasyan | 2018-05-10 | 2 | -4/+4
  This is a follow-up to rL330983.
  The patch teaches the ld, sd, and lld commands to accept 32-bit memory offsets by replacing the `mem_simm16` operand with `mem_simmptr`. In fact, these commands should accept 64-bit offsets, but such large offsets require a different command expansion and will be supported by a separate patch.
  Differential Revision: https://reviews.llvm.org/D46629
  llvm-svn: 331997
* [mips] Accept 32-bit offsets for lh and lhu commands | Simon Atanasyan | 2018-05-10 | 2 | -4/+4
  This is a follow-up to rL330983.
  The patch teaches the lh and lhu commands to accept 32-bit memory offsets by replacing the `mem_simm16` operand with `mem_simmptr`.
  Differential Revision: https://reviews.llvm.org/D46513
  llvm-svn: 331996
* [x86] fix fmaxnum/fminnum with nnan | Sanjay Patel | 2018-05-10 | 1 | -9/+13
  With nnan, there's no need for the masked merge / blend sequence (that probably costs much more than the min/max instruction).
  Somewhere between clang 5.0 and 6.0, we started producing these intrinsics for fmax()/fmin() in C source instead of libcalls or fcmp/select. The backend wasn't prepared for that, so we regressed perf in those cases.
  Note: it's possible that other targets have similar problems as seen here.
  Noticed while investigating PR37403 and related bugs: https://bugs.llvm.org/show_bug.cgi?id=37403
  The IR FMF propagation cases still don't work. There's a proposal that might fix those cases in D46563.
  llvm-svn: 331992
* [mips] Correct the predicates of cvt.fmt.fmt instructions | Simon Dardis | 2018-05-10 | 2 | -23/+24
  Reviewers: atanasyan, smaksimovic, abeserminji
  Differential Revision: https://reviews.llvm.org/D46390
  llvm-svn: 331969
* [X86] ptwrite intrinsic | Gabor Buella | 2018-05-10 | 4 | -13/+25
  Reviewers: craig.topper, RKSimon
  Reviewed By: craig.topper, RKSimon
  Differential Revision: https://reviews.llvm.org/D46539
  llvm-svn: 331961
* [NVPTX] Added a feature to use short pointers for const/local/shared AS. | Artem Belevich | 2018-05-09 | 8 | -61/+108
  Const/local/shared address spaces are all < 4GB and we can always use 32-bit pointers to access them. This has a substantial performance impact on kernels that use shared memory for intermediary results.
  The feature is disabled by default.
  Differential Revision: https://reviews.llvm.org/D46147
  llvm-svn: 331941
* [ARM] Add support for SETCCCARRY instead of SETCCE | Amaury Sechet | 2018-05-09 | 1 | -5/+12
  Summary: As per title. SETCCE is deprecated and will eventually be removed.
  Reviewers: rogfer01, efriedma, rengolin, javed.absar
  Subscribers: kristof.beyls, chrib, llvm-commits
  Differential Revision: https://reviews.llvm.org/D46512
  llvm-svn: 331929
* [AMDGPU] Support horizontal vectorization of min/max. | Farhana Aleen | 2018-05-09 | 3 | -1/+26
  Author: FarhanaAleen
  Reviewed By: rampitec
  Subscribers: AMDGPU
  Differential Revision: https://reviews.llvm.org/D46604
  llvm-svn: 331920
* AMDGPU: Ignore any_extend in mul24 combine | Matt Arsenault | 2018-05-09 | 1 | -0/+11
  If a multiply is truncated, SimplifyDemandedBits sometimes turns a zero_extend of the inputs into an any_extend, which makes the known bits computation unhelpful.
  Ignore these and compute known bits for the underlying value, since we insert the correct extend type after.
  llvm-svn: 331919
* [Hexagon] Add patterns for vector shift-and-accumulate | Krzysztof Parzyszek | 2018-05-09 | 1 | -0/+5
  llvm-svn: 331918
* AMDGPU: Handle partial shift reduction for variable shifts | Matt Arsenault | 2018-05-09 | 1 | -15/+22
  If the variable shift amount has known bits, we can still reduce the shift.
  llvm-svn: 331917