summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [WebAssembly][NFC] Remove WebAssemblyStackifier TableGen backendThomas Lively2018-10-225-32/+36
| | | | | | | | | | | | | | | | Summary: Replace its functionality with a TableGen InstrInfo relational instruction mapping. Although arguably more complex than the TableGen backend, the relational mapping is a smaller maintenance burden than a TableGen backend. Reviewers: aardappel, aheejin, dschuff Subscribers: mgorny, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53307 llvm-svn: 344962
* X86: add alias for pushfw/popfw in Intel modeTim Northover2018-10-221-0/+4
| | | | | | | | | A while ago we changed pushf and popf in Intel mode to generate pushfq and popfq. Unfortunately that left us with no way to get the 16-bit encoding in Intel mode so this patch adds pushfw and popfw as aliases there. llvm-svn: 344949
* Revert rL344931 from llvm/trunk: [X86][SSE] getTargetShuffleMaskIndices - ↵Simon Pilgrim2018-10-221-9/+10
| | | | | | | | | | allow opt-in support for whole undef shuffle mask elements We can't safely assume that certain RawMask entries are UNDEF as most variable shuffles ignore non-index bits - PSHUFB only works on i8 elts so it'd be safe to use but I'm intending to come up with an alternative approach that works for all. ........ Enable this for PSHUFB constant mask decoding and remove the ConstantPool DecodePSHUFBMask llvm-svn: 344937
* Revert rL344933 from llvm/trunk: [X86][SSE] Tidyup DecodeVPERMILPMask ↵Simon Pilgrim2018-10-222-5/+5
| | | | | | | | | | shuffle mask decoding We can't safely assume that certain RawMask entries are UNDEF as most variable shuffles ignore non-index bits. ........ Add support for UNDEF raw mask elements and remove the ConstantPool DecodeVPERMILPMask usage in X86ISelLowering.cpp llvm-svn: 344936
* [X86][SSE] Tidyup DecodeVPERMILPMask shuffle mask decodingSimon Pilgrim2018-10-222-5/+5
| | | | | | Add support for UNDEF raw mask elements and remove the ConstantPool DecodeVPERMILPMask usage in X86ISelLowering.cpp llvm-svn: 344933
* [X86][SSE] getTargetShuffleMaskIndices - allow opt-in support for whole ↵Simon Pilgrim2018-10-221-10/+9
| | | | | | | | undef shuffle mask elements Enable this for PSHUFB constant mask decoding and remove the ConstantPool DecodePSHUFBMask llvm-svn: 344931
* [X86] getTargetConstantBitsFromNode - handle extraction from larger constant ↵Simon Pilgrim2018-10-221-2/+3
| | | | | | | | pool entries First step towards removing X86ShuffleDecodeConstantPool usage from X86ISelLowering.cpp llvm-svn: 344924
* Revert r344877 "[X86] Stop promoting integer loads to vXi64"Craig Topper2018-10-229-631/+521
| | | | | | Sam McCall reported miscompiles in some tensorflow code. Reverting while I try to figure out. llvm-svn: 344921
* DAG: Change behavior of fminnum/fmaxnum nodesMatt Arsenault2018-10-229-47/+191
| | | | | | | | | | | Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs. There are some regressions from inserting unnecessary canonicalizes when these are matched from fast math fcmp + select which should be fixed in a future commit. llvm-svn: 344914
* [X86][SSE] getTargetShuffleMask - pull out repeated shuffle mask element ↵Simon Pilgrim2018-10-221-29/+22
| | | | | | size. NFCI. llvm-svn: 344910
* [X86] X86DAGToDAGISel: handle BZHI selection too, not just BEXTR.Roman Lebedev2018-10-225-26/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: As discussed in D52304 / IRC, we now have pattern matching for 'bit extract' in two places - tablegen and `X86DAGToDAGISel`. There are 4 patterns. And we will have a problem with `x & (-1 >> (32 - y))` pattern. * If the mask is one-use, then it is always unfolded into `x << (32 - y) >> (32 - y)` first. Thus, the existing test coverage is already broken. * If it is not one-use, then it is not unfolded, and is matched as BZHI. * If it is not one-use, we will not match it as BEXTR. And if it is one-use, it will have been unfolded already. So we will either not handle that pattern for BEXTR, or not have test coverage for it. This is bad. As discussed with @craig.topper, let's unify this matching, and do everything in `X86DAGToDAGISel`. Then we will not have code duplication, and will have proper test coverage. This indeed does not affect any tests, and this is great. It means that for these two patterns, the `X86DAGToDAGISel` is identical to the tablegen version. Please review carefully, i'm not fully sure about that intrinsic change, and introduction of the new `X86ISD` opcode. Reviewers: craig.topper, RKSimon, spatel Reviewed By: craig.topper Subscribers: llvm-commits, craig.topper Differential Revision: https://reviews.llvm.org/D53164 llvm-svn: 344904
* [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ((1 << nbits) + (-1)) ↵Roman Lebedev2018-10-221-3/+22
| | | | | | | | | | | | | | | | | | | pattern Summary: Trivial continuation of D52304. While this pattern is not canonical, we do select it in the BZHI case, so this should not be any different. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52348 llvm-svn: 344902
* Test commit: change comment.Petar Avramovic2018-10-221-1/+1
| | | | llvm-svn: 344900
* [PowerPC][NFC] Fix bugs in r+r to r+i conversionNemanja Ivanovic2018-10-222-13/+58
| | | | | | | | | | | | | | | | | | The D-Form VSX loads introduced in ISA 3.0 are not direct D-Form equivalent of the corresponding X-Forms since they only target the Altivec registers. Namely LXSSPX can load into any of the 64 VSX registers whereas LXSSP can only load into the upper 32 VSX registers. Similarly with the remaining affected instructions. There is currently no way that I can see to trigger the bug, but as we add other ways of exploiting these instructions, there may very well be instances that do. This is an NFC patch in practical terms since the changes it introduces can not be triggered without an MIR test. Differential revision: https://reviews.llvm.org/D53323 llvm-svn: 344894
* [X86] Add patterns for vector and/or/xor/andn with other types than vXi64.Craig Topper2018-10-222-0/+205
| | | | | | | | | | | | This makes fast isel treat all legal vector types the same way. Previously only vXi64 was in the fast-isel tables. This unfortunately prevents matching of andn by fast-isel for these types since the requires SelectionDAG. But we already had this issue for vXi64. So at least we're consistent now. Interestinly it looks like fast-isel can't handle instructions with constant vector arguments so the the not part of the andn patterns is selected with SelectionDAG. This explains why VPTERNLOG shows up in some of the tests. This is a subset of D53268. As I make progress on that, I will try to reduce the number of lines in the tablegen files. llvm-svn: 344884
* [X86] Stop promoting integer loads to vXi64Craig Topper2018-10-219-521/+631
| | | | | | | | | | | | | | | | | | | | | Summary: Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to remove the bitcast. I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping. I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size. I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D53306 llvm-svn: 344877
* Revert r344873 "foo"Craig Topper2018-10-214-62/+45
| | | | | | Rebase gone wrong left this in my tree. llvm-svn: 344875
* [X86] Remove SDIVREM8_SEXT_HREG/UDIVREM8_ZEXT_HREG and their associated DAG ↵Craig Topper2018-10-213-73/+54
| | | | | | | | | | | | | | | | | | | | | combine and target bits support. Use a post isel peephole instead. Summary: These nodes exist to overcome an isel problem where we can generate a zero extend of an AH register followed by an extract subreg, and another zero extend. The first zero extend exists to avoid a partial register update copying the AH register into the low 8-bits. The second zero extend exists if the user wanted the remainder zero extended. To make this work we had a DAG combine to morph the DIVREM opcode to a special opcode that included the extend. But then we had to add the new node to computeKnownBits and computeNumSignBits to process the extension portion. This patch instead removes all of that and adds a late peephole to detect the two extends. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53449 llvm-svn: 344874
* fooCraig Topper2018-10-214-45/+62
| | | | llvm-svn: 344873
* [X86][AVX] Enable lowerVectorShuffleAsLanePermuteAndPermute v16i16/v32i8 ↵Simon Pilgrim2018-10-211-2/+12
| | | | | | unary shuffle lowering llvm-svn: 344868
* [X86] Only extract constant pool shuffle mask data with zero offsetsSimon Pilgrim2018-10-212-2/+2
| | | | | | D53306 exposes an issue where we sometimes use constant pool data from bigger vectors than the target shuffle mask. This should be safe to do, but we have to be certain that we're using the bottom most part of the vector as the shuffle mask decoders have no way to peek into subvectors with non-zero offsets. llvm-svn: 344867
* [WebAssembly] Implement vector sext_inreg and tests with comparisonsThomas Lively2018-10-202-4/+9
| | | | | | | | | | | | Summary: Depends on D53251. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53252 llvm-svn: 344826
* [WebAssembly] Custom lower i64x2 constant shifts to avoid wrapThomas Lively2018-10-204-0/+55
| | | | | | | | | | | | Summary: Depends on D53057. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53251 llvm-svn: 344825
* AMDGPU: Add support pattern for SUB of one bitChangpeng Fang2018-10-191-0/+10
| | | | | | | | | | | | | Summary: Add selection patterns to support one bit Sub. Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D52946 llvm-svn: 344815
* [X86] Remove some left over code from when MVT:i1 was a legal type for AVX512.Craig Topper2018-10-191-3/+0
| | | | llvm-svn: 344813
* [X86] In PostprocessISelDAG, start from allnodes_end, not the root.Craig Topper2018-10-191-2/+1
| | | | | | | | | | There is no guarantee the root is at the end if isel created any nodes without morphing them. This includes the nodes created by manual isel from C++ code in X86ISelDAGToDAG. This is similar to r333415 from PowerPC which is where I originally stole the peephole loop from. I don't have a test case, but without this a future patch doesn't work which is how I found it. llvm-svn: 344808
* [WebAssembly] Handle undefined lane indices in SIMD patternsThomas Lively2018-10-192-2/+40
| | | | | | | | | | | | | | | | Summary: Undefined indices in shuffles can be used when not all lanes of the output vector will be used. This happens for example in the expansion of vector reduce operations. Regardless, undefs are legal as lane indices in IR and should be supported. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53057 llvm-svn: 344803
* [Hexagon] Remove support for V4Krzysztof Parzyszek2018-10-1923-757/+567
| | | | llvm-svn: 344791
* Use llvm::{all,any,none}_of instead std::{all,any,none}_of. NFCFangrui Song2018-10-192-13/+8
| | | | llvm-svn: 344774
* Revert r344693 ("[ARM] bottom-top mul support in ARMParallelDSP")Eli Friedman2018-10-181-194/+27
| | | | | | | Still causing failures on the polly-aosp buildbot; I'll follow up with a reduced testcase. llvm-svn: 344752
* [X86] Support for the mno-tls-direct-seg-refs flagKristina Brooks2018-10-181-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | Allows to disable direct TLS segment access (%fs or %gs). GCC supports a similar flag, it can be useful in some circumstances, e.g. when a thread context block needs to be updated directly from user space. More info and specific use cases: https://bugs.llvm.org/show_bug.cgi?id=16145 There is another revision for clang as well. Related: D53102 All X86 CodeGen tests appear to pass: ``` [46/47] Running lit suite /SourceCache/llvm-trunk-8.0/test/CodeGen Testing Time: 23.17s Expected Passes : 3801 Expected Failures : 15 Unsupported Tests : 8021 ``` Reviewed by: Craig Topper. Patch by nruslan (Ruslan Nikolaev). Differential Revision: https://reviews.llvm.org/D53103 llvm-svn: 344723
* AMDGPU: Avoid selecting ds_{read,write}2_b32 on SINicolai Haehnle2018-10-173-3/+26
| | | | | | | | | | | | | | | | | | | | | | Summary: To workaround a hardware issue in the (base + offset) calculation when base is negative. The impact on code quality should be limited since SILoadStoreOptimizer still runs afterwards and is able to combine loads/stores based on known sign information. This fixes visible corruption in Hitman on SI (easily reproducible by running benchmark mode). Change-Id: Ia178d207a5e2ac38ae7cd98b532ea2ae74704e5f Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99923 Reviewers: arsenm, mareko Subscribers: jholewinski, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53160 llvm-svn: 344698
* AMDGPU: Divergence-driven selection of scalar buffer load intrinsicsNicolai Haehnle2018-10-176-220/+90
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis. If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane. There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads. Change-Id: I170e6816323beb1348677b358c9d380865cd1a19 Reviewers: arsenm, alex-t, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53283 llvm-svn: 344696
* [ARM] bottom-top mul support in ARMParallelDSPSam Parker2018-10-171-27/+194
| | | | | | | | | | | | | | Previously reverted in rL343082. Original commit message: On failing to find sequences that can be converted into dual macs, try to find sequential 16-bit loads that are used by muls which we can then use smultb, smulbt, smultt with a wide load. Differential Revision: https://reviews.llvm.org/D51983 llvm-svn: 344693
* AMDGPU: Remove dead TableGen codeNicolai Haehnle2018-10-171-2/+0
| | | | | | | | | | | | | Summary: Change-Id: Ic1f2c1d0cf9e90a0baa9fc6bacd0d3c386069fb0 Reviewers: tpr Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53318 Change-Id: Ib4d143c898801e5cf6cb9999a495d62c91ae77fb llvm-svn: 344691
* [MIPS GlobalISel] Legalize constantsPetar Jovanovic2018-10-171-1/+24
| | | | | | | | | | Legalize s1, s8, s16 and s64 G_CONSTANT for MIPS32. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D53077 llvm-svn: 344684
* [ARM] Do not fuse VADD and VMUL, continued (2/2)Sjoerd Meijer2018-10-171-2/+4
| | | | | | | | | This is patch 2/2, following up on D53314, and is the functional change to prevent fusing mul + add sequences into VFMAs. Differential revision: https://reviews.llvm.org/D53315 llvm-svn: 344683
* [ARM] Follow up of rL344671, attempt to pacify a buildbotSjoerd Meijer2018-10-171-1/+1
| | | | | | It was rightfully complaining about an unpretty logical expression. llvm-svn: 344677
* [ARM][NFCI] Do not fuse VADD and VMUL, continued (1/2)Sjoerd Meijer2018-10-173-42/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a follow up of rL342874, which stopped fusing muls and adds into VMLAs for performance reasons on the Cortex-M4 and Cortex-M33. This is a serie of 2 patches, that is trying to achieve the same for VFMA. The second column in the table below shows what we were generating before rL342874, the third column what changed with rL342874, and the last column what we want to achieve with these 2 patches: -------------------------------------------------------- | Opt | < rL342874 | >= rL342874 | | |------------------------------------------------------| |-O3 | vmla | vmul | vmul | | | | vadd | vadd | |------------------------------------------------------| |-Ofast | vfma | vfma | vmul | | | | | vadd | |------------------------------------------------------| |-Oz | vmla | vmla | vmla | -------------------------------------------------------- This patch 1/2, is a cleanup of the spaghetti predicate logic on the different VMLA and VFMA codegen rules, so that we can make the final functional change in patch 2/2. This also fixes a typo in the regression test added in rL342874. Differential revision: https://reviews.llvm.org/D53314 llvm-svn: 344671
* [X86] Match (cmp (and (shr X, C), mask), 0) to BEXTR+TEST.Craig Topper2018-10-161-15/+32
| | | | | | | | | | Without this we match the CMP+AND to a TEST and then match the SHR separately. I'm trusting analyzeCompare to remove the TEST during the peephole pass. Otherwise we need to check the flag users to see if they only use the Z flag. This recovers a case lost by r344270. Differential Revision: https://reviews.llvm.org/D53310 llvm-svn: 344649
* Revert "[WebAssembly] LSDA info generation"Krasimir Georgiev2018-10-163-21/+3
| | | | | | | | This reverts commit r344575. Newly introduced test eh-lsda.ll.test fails with use-after-free under ASAN build. llvm-svn: 344639
* [PATCH] [NFC][AArch64] Fix refactoring of macro fusionEvandro Menezes2018-10-161-8/+4
| | | | | | Fix compiler error. llvm-svn: 344632
* [NFC][ARM] Refactor macro fusionEvandro Menezes2018-10-161-19/+5
| | | | | | Simplify code for wildcards. llvm-svn: 344625
* [NFC][AArch64] Refactor macro fusionEvandro Menezes2018-10-161-76/+90
| | | | | | Simplify API of checking functions. llvm-svn: 344624
* [X86] Fix Skylake ReadAfterLd for PADDrm etc.Simon Pilgrim2018-10-162-4/+8
| | | | | | Missed in rL343868 as due to their custom InstrRW. llvm-svn: 344600
* [mips][micromips] Fix how values in .gcc_except_table are calculatedAleksandar Beserminji2018-10-162-0/+10
| | | | | | | | | | | | | When a landing pad is calculated in a program that is compiled for micromips, it will point to an even address. Such an error will cause a segmentation fault, as the instructions in micromips are aligned on odd addresses. This patch sets the last bit of the offset where a landing pad is, to 1, which will effectively be an odd address and point to the instruction exactly. Differential Revision: https://reviews.llvm.org/D52985 llvm-svn: 344591
* [WebAssembly] LSDA info generationHeejin Ahn2018-10-163-3/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This adds support for LSDA (exception table) generation for wasm EH. Wasm EH mostly follows the structure of Itanium-style exception tables, with one exception: a call site table entry in wasm EH corresponds to not a call site but a landing pad. In wasm EH, the VM is responsible for stack unwinding. After an exception occurs and the stack is unwound, the control flow is transferred to wasm 'catch' instruction by the VM, after which the personality function is called from the compiler-generated code. (Refer to WasmEHPrepare pass for more information on this part.) This patch: - Changes wasm.landingpad.index intrinsic to take a token argument, to make this 1:1 match with a catchpad instruction - Stores landingpad index info and catch type info MachineFunction in before instruction selection - Lowers wasm.lsda intrinsic to an MCSymbol pointing to the start of an exception table - Adds WasmException class with overridden methods for table generation - Adds support for LSDA section in Wasm object writer Reviewers: dschuff, sbc100, rnk Subscribers: mgorny, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52748 llvm-svn: 344575
* [X86] Remove some isel patterns that shouldn't be possible.Craig Topper2018-10-152-6/+0
| | | | | | These included a bitcast of a load from v4f32 to v2f64, but DAG combine should have already changed the type of the load to remove the cast. llvm-svn: 344573
* [X86] Fix a bad bitcast in the load form of vXi16 uniform shift patterns for ↵Craig Topper2018-10-151-9/+10
| | | | | | EVEX encoded instructions. llvm-svn: 344563
* [AARCH64] Improve vector popcnt lowering with ADDLPSimon Pilgrim2018-10-151-12/+36
| | | | | | | | | | AARCH64 equivalent to D53257 - uses widening pairwise adds on vXi8 CTPOP to support i16/i32/i64 vectors. This is a blocker for generic vector CTPOP expansion (P32655) - this will remove the aarch64 diff from D53258. Differential Revision: https://reviews.llvm.org/D53259 llvm-svn: 344554
OpenPOWER on IntegriCloud