summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* [WebAssembly] Fix assembly printing of br_tableHeejin Ahn2018-10-231-0/+32
| | | | | | | | | | | | Summary: In `br_table's stack version asm string, \t was missing. Reviewers: aardappel Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53516 llvm-svn: 344981
* [WebAssembly] Added test for inline assembly roundtrip.Wouter van Oortmerssen2018-10-231-0/+43
| | | | | | | | | | | | | | | | | | | Summary: Due to previous work to make WebAssembly MC by default stack-only inline assembly now "just works" (previously it didn't since it had no way to know types of registers), so no further work required. So far we only have tests (in inline-asm.ll) which test with non-existing instructions, so this adds a test that roundtrips both the inline assembly and its surrounding code thru the assembler. Reviewers: dschuff, sunfish Subscribers: sbc100, jgravelle-google, eraman, aheejin, llvm-commits Differential Revision: https://reviews.llvm.org/D52914 llvm-svn: 344977
* [Intrinsic] Unigned Saturation Addition IntrinsicLeonard Chan2018-10-221-0/+157
| | | | | | | | | | | | Add an intrinsic that takes 2 integers and perform unsigned saturation addition on them. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D53340 llvm-svn: 344971
* X86: Do not optimize branches with undef eflags inputsMatthias Braun2018-10-221-0/+18
| | | | | | | | | | | | analyzeBranch()/insertBranch() etc. do not properly deal with an undef flag on the eflags input and used to produce invalid MIR. I don't see this ever affecting real world inputs (I don't think it is possible to produce undef flags with llvm IR), so I simply changed the code to bail out in this case. rdar://42122367 llvm-svn: 344970
* Recommit r344877 "[X86] Stop promoting integer loads to vXi64"Craig Topper2018-10-226-68/+79
| | | | | | | | | | | | | | | | | | | | | | | | I've included a fix to DAGCombiner::ForwardStoreValueToDirectLoad that I believe will prevent the previous miscompile. Original commit message: Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to rem I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping. I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the lo I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D53306 llvm-svn: 344965
* [x86] add test for PR25498 and complete checks; NFCSanjay Patel2018-10-221-14/+1213
| | | | | | Might as well test the actual codegen instead of just the absence of crashing. llvm-svn: 344955
* Reapply "[MachineCopyPropagation] Reimplement CopyTracker in terms of ↵Justin Bogner2018-10-222-0/+102
| | | | | | | | | | | | | | | | | | | register units" Recommits r342942, which was reverted in r343189, with a fix for an issue where we would propagate unsafely if we defined only the upper part of a register. Original message: Change the copy tracker to keep a single map of register units instead of 3 maps of registers. This gives a very significant compile time performance improvement to the pass. I measured a 30-40% decrease in time spent in MCP on x86 and AArch64 and much more significant improvements on out of tree targets with more registers. llvm-svn: 344942
* Revert r344877 "[X86] Stop promoting integer loads to vXi64"Craig Topper2018-10-226-79/+68
| | | | | | Sam McCall reported miscompiles in some tensorflow code. Reverting while I try to figure out. llvm-svn: 344921
* DAG: Change behavior of fminnum/fmaxnum nodesMatt Arsenault2018-10-2218-582/+2188
| | | | | | | | | | | Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs. There are some regressions from inserting unnecessary canonicalizes when these are matched from fast math fcmp + select which should be fixed in a future commit. llvm-svn: 344914
* [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ((1 << nbits) + (-1)) ↵Roman Lebedev2018-10-222-221/+108
| | | | | | | | | | | | | | | | | | | pattern Summary: Trivial continuation of D52304. While this pattern is not canonical, we do select it in the BZHI case, so this should not be any different. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52348 llvm-svn: 344902
* [PowerPC][NFC] Fix bugs in r+r to r+i conversionNemanja Ivanovic2018-10-221-12/+12
| | | | | | | | | | | | | | | | | | The D-Form VSX loads introduced in ISA 3.0 are not direct D-Form equivalent of the corresponding X-Forms since they only target the Altivec registers. Namely LXSSPX can load into any of the 64 VSX registers whereas LXSSP can only load into the upper 32 VSX registers. Similarly with the remaining affected instructions. There is currently no way that I can see to trigger the bug, but as we add other ways of exploiting these instructions, there may very well be instances that do. This is an NFC patch in practical terms since the changes it introduces can not be triggered without an MIR test. Differential revision: https://reviews.llvm.org/D53323 llvm-svn: 344894
* [X86] Add patterns for vector and/or/xor/andn with other types than vXi64.Craig Topper2018-10-223-6/+17
| | | | | | | | | | | | This makes fast isel treat all legal vector types the same way. Previously only vXi64 was in the fast-isel tables. This unfortunately prevents matching of andn by fast-isel for these types since the requires SelectionDAG. But we already had this issue for vXi64. So at least we're consistent now. Interestinly it looks like fast-isel can't handle instructions with constant vector arguments so the the not part of the andn patterns is selected with SelectionDAG. This explains why VPTERNLOG shows up in some of the tests. This is a subset of D53268. As I make progress on that, I will try to reduce the number of lines in the tablegen files. llvm-svn: 344884
* [X86] Stop promoting integer loads to vXi64Craig Topper2018-10-216-68/+79
| | | | | | | | | | | | | | | | | | | | | Summary: Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to remove the bitcast. I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping. I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size. I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D53306 llvm-svn: 344877
* [DAGCombiner] reduce insert+bitcast+extract vector ops to truncate (PR39016)Sanjay Patel2018-10-213-24/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | This is a late backend subset of the IR transform added with: D52439 We can confirm that the conversion to a 'trunc' is correct by running: $ opt -instcombine -data-layout="e" (assuming the IR transforms are correct; change "e" to "E" for big-endian) As discussed in PR39016: https://bugs.llvm.org/show_bug.cgi?id=39016 ...the pattern may emerge during legalization, so that's we are waiting for an insertelement to become a scalar_to_vector in the pattern matching here. The DAG allows for fun variations that are not possible in IR. Result types for extracts and scalar_to_vector don't necessarily match input types, so that means we have to be a bit more careful in the transform (see code comments). The tests show that we don't handle cases that require a shift (as we did in the IR version). I've left that as a potential follow-up because I'm not sure if that's a real concern at this late stage. Differential Revision: https://reviews.llvm.org/D53201 llvm-svn: 344872
* [X86][AVX] Enable lowerVectorShuffleAsLanePermuteAndPermute v16i16/v32i8 ↵Simon Pilgrim2018-10-215-166/+57
| | | | | | unary shuffle lowering llvm-svn: 344868
* DebugInfo: Use DW_OP_addrx in DWARFv5David Blaikie2018-10-201-2/+2
| | | | | | Reuse addresses in the address pool, even in non-split cases. llvm-svn: 344838
* [WebAssembly] Implement vector sext_inreg and tests with comparisonsThomas Lively2018-10-202-20/+875
| | | | | | | | | | | | Summary: Depends on D53251. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53252 llvm-svn: 344826
* [WebAssembly] Custom lower i64x2 constant shifts to avoid wrapThomas Lively2018-10-201-3/+26
| | | | | | | | | | | | Summary: Depends on D53057. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53251 llvm-svn: 344825
* [MachineCSE][GlobalISel] Making sure MachineCSE works mid-GlobalISel (again)Roman Tereshin2018-10-201-6/+32
| | | | | | | | | | | | | | | | | | | | | | | | Change of approach, it looks like it's a much better idea to deal with the vregs that have LLTs and reg classes both properly, than trying to avoid creating those across all GlobalISel passes and all targets. The change mostly touches MachineRegisterInfo::constrainRegClass, which is apparently only used by MachineCSE. The changes are NFC for any pipeline but one that contains MachineCSE mid-GlobalISel. NOTE on isCallerPreservedOrConstPhysReg change in MachineCSE: There is no test covering it as the only way to insert a new pass (MachineCSE) from a command line I know of is llc's -run-pass option, which only works with MIR, but MIRParser freezes reserved registers upon MachineFunctions creation, making it impossible to reproduce the state that exposes the issue. Reviwed By: aditya_nandakumar Differential Revision: https://reviews.llvm.org/D53144 llvm-svn: 344822
* AMDGPU: Add support pattern for SUB of one bitChangpeng Fang2018-10-192-0/+73
| | | | | | | | | | | | | Summary: Add selection patterns to support one bit Sub. Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D52946 llvm-svn: 344815
* [GISel]: Allow PHIs to be DCEdAditya Nandakumar2018-10-191-0/+1
| | | | | | | | | | | https://reviews.llvm.org/D53304 Currently dead phis are not cleaned up during DCE. This patch allows dead PHI and G_PHI insts to be deleted. Reviewed by: dsanders llvm-svn: 344811
* [WebAssembly] Handle undefined lane indices in SIMD patternsThomas Lively2018-10-191-0/+266
| | | | | | | | | | | | | | | | Summary: Undefined indices in shuffles can be used when not all lanes of the output vector will be used. This happens for example in the expansion of vector reduce operations. Regardless, undefs are legal as lane indices in IR and should be supported. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D53057 llvm-svn: 344803
* [Hexagon] Remove support for V4Krzysztof Parzyszek2018-10-1930-1374/+1352
| | | | llvm-svn: 344791
* [pipeliner] Fix test added in rL344748 to require assertsFangrui Song2018-10-191-0/+2
| | | | llvm-svn: 344775
* Revert r344693 ("[ARM] bottom-top mul support in ARMParallelDSP")Eli Friedman2018-10-1824-688/+0
| | | | | | | Still causing failures on the polly-aosp buildbot; I'll follow up with a reduced testcase. llvm-svn: 344752
* [Pipeliner] copyToPhi DAG Mutation to improve scheduling.Sumanth Gundapaneni2018-10-181-0/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | In a loop, create artificial dependences between the source of a COPY/REG_SEQUENCE to the use in next iteration. Eg: SRC ----Data Dep--> COPY COPY ---Anti Dep--> PHI (implies, to be used in next iteration) PHI ----Data Dep--> USE This patches creates USE ----Artificial Dep---> SRC This will effectively schedule the COPY late to eliminate additional copies. Before this patch, the schedule can be SRC, COPY, USE : The COPY is used in next iteration and it needs to be preserved. After this patch, the schedule can be USE, SRC, COPY : The COPY is used in next iteration and the live interval is reduced. Differential Revision: https://reviews.llvm.org/D53303 llvm-svn: 344748
* [X86] Support for the mno-tls-direct-seg-refs flagKristina Brooks2018-10-181-0/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | Allows to disable direct TLS segment access (%fs or %gs). GCC supports a similar flag, it can be useful in some circumstances, e.g. when a thread context block needs to be updated directly from user space. More info and specific use cases: https://bugs.llvm.org/show_bug.cgi?id=16145 There is another revision for clang as well. Related: D53102 All X86 CodeGen tests appear to pass: ``` [46/47] Running lit suite /SourceCache/llvm-trunk-8.0/test/CodeGen Testing Time: 23.17s Expected Passes : 3801 Expected Failures : 15 Unsupported Tests : 8021 ``` Reviewed by: Craig Topper. Patch by nruslan (Ruslan Nikolaev). Differential Revision: https://reviews.llvm.org/D53103 llvm-svn: 344723
* AMDGPU: Avoid selecting ds_{read,write}2_b32 on SINicolai Haehnle2018-10-171-0/+129
| | | | | | | | | | | | | | | | | | | | | | Summary: To workaround a hardware issue in the (base + offset) calculation when base is negative. The impact on code quality should be limited since SILoadStoreOptimizer still runs afterwards and is able to combine loads/stores based on known sign information. This fixes visible corruption in Hitman on SI (easily reproducible by running benchmark mode). Change-Id: Ia178d207a5e2ac38ae7cd98b532ea2ae74704e5f Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99923 Reviewers: arsenm, mareko Subscribers: jholewinski, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53160 llvm-svn: 344698
* StructurizeCFG: Simplify inserted PHI nodesNicolai Haehnle2018-10-173-9/+12
| | | | | | | | | | | | | | | Summary: This improves subsequent divergence analysis in some cases. Change-Id: I5e95e7ec7fd3fa80d414d1a53a02fea23e3d67d3 Reviewers: arsenm, rampitec Subscribers: jvesely, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D53316 llvm-svn: 344697
* AMDGPU: Divergence-driven selection of scalar buffer load intrinsicsNicolai Haehnle2018-10-172-8/+67
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis. If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane. There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads. Change-Id: I170e6816323beb1348677b358c9d380865cd1a19 Reviewers: arsenm, alex-t, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53283 llvm-svn: 344696
* [ARM] bottom-top mul support in ARMParallelDSPSam Parker2018-10-1724-0/+688
| | | | | | | | | | | | | | Previously reverted in rL343082. Original commit message: On failing to find sequences that can be converted into dual macs, try to find sequential 16-bit loads that are used by muls which we can then use smultb, smulbt, smultt with a wide load. Differential Revision: https://reviews.llvm.org/D51983 llvm-svn: 344693
* [MIPS GlobalISel] Legalize constantsPetar Jovanovic2018-10-172-0/+272
| | | | | | | | | | Legalize s1, s8, s16 and s64 G_CONSTANT for MIPS32. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D53077 llvm-svn: 344684
* [ARM] Do not fuse VADD and VMUL, continued (2/2)Sjoerd Meijer2018-10-171-0/+9
| | | | | | | | | This is patch 2/2, following up on D53314, and is the functional change to prevent fusing mul + add sequences into VFMAs. Differential revision: https://reviews.llvm.org/D53315 llvm-svn: 344683
* [ARM][NFCI] Do not fuse VADD and VMUL, continued (1/2)Sjoerd Meijer2018-10-171-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a follow up of rL342874, which stopped fusing muls and adds into VMLAs for performance reasons on the Cortex-M4 and Cortex-M33. This is a serie of 2 patches, that is trying to achieve the same for VFMA. The second column in the table below shows what we were generating before rL342874, the third column what changed with rL342874, and the last column what we want to achieve with these 2 patches: -------------------------------------------------------- | Opt | < rL342874 | >= rL342874 | | |------------------------------------------------------| |-O3 | vmla | vmul | vmul | | | | vadd | vadd | |------------------------------------------------------| |-Ofast | vfma | vfma | vmul | | | | | vadd | |------------------------------------------------------| |-Oz | vmla | vmla | vmla | -------------------------------------------------------- This patch 1/2, is a cleanup of the spaghetti predicate logic on the different VMLA and VFMA codegen rules, so that we can make the final functional change in patch 2/2. This also fixes a typo in the regression test added in rL342874. Differential revision: https://reviews.llvm.org/D53314 llvm-svn: 344671
* [X86] Match (cmp (and (shr X, C), mask), 0) to BEXTR+TEST.Craig Topper2018-10-161-4/+2
| | | | | | | | | | Without this we match the CMP+AND to a TEST and then match the SHR separately. I'm trusting analyzeCompare to remove the TEST during the peephole pass. Otherwise we need to check the flag users to see if they only use the Z flag. This recovers a case lost by r344270. Differential Revision: https://reviews.llvm.org/D53310 llvm-svn: 344649
* Revert "[WebAssembly] LSDA info generation"Krasimir Georgiev2018-10-162-242/+3
| | | | | | | | This reverts commit r344575. Newly introduced test eh-lsda.ll.test fails with use-after-free under ASAN build. llvm-svn: 344639
* [Intrinsic] Signed Saturation Addition IntrinsicLeonard Chan2018-10-161-0/+267
| | | | | | | | | | | Add an intrinsic that takes 2 integers and perform saturation addition on them. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D53053 llvm-svn: 344629
* [mips][micromips] Fix how values in .gcc_except_table are calculatedAleksandar Beserminji2018-10-161-0/+37
| | | | | | | | | | | | | When a landing pad is calculated in a program that is compiled for micromips, it will point to an even address. Such an error will cause a segmentation fault, as the instructions in micromips are aligned on odd addresses. This patch sets the last bit of the offset where a landing pad is, to 1, which will effectively be an odd address and point to the instruction exactly. Differential Revision: https://reviews.llvm.org/D52985 llvm-svn: 344591
* [WebAssembly] LSDA info generationHeejin Ahn2018-10-162-3/+242
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This adds support for LSDA (exception table) generation for wasm EH. Wasm EH mostly follows the structure of Itanium-style exception tables, with one exception: a call site table entry in wasm EH corresponds to not a call site but a landing pad. In wasm EH, the VM is responsible for stack unwinding. After an exception occurs and the stack is unwound, the control flow is transferred to wasm 'catch' instruction by the VM, after which the personality function is called from the compiler-generated code. (Refer to WasmEHPrepare pass for more information on this part.) This patch: - Changes wasm.landingpad.index intrinsic to take a token argument, to make this 1:1 match with a catchpad instruction - Stores landingpad index info and catch type info MachineFunction in before instruction selection - Lowers wasm.lsda intrinsic to an MCSymbol pointing to the start of an exception table - Adds WasmException class with overridden methods for table generation - Adds support for LSDA section in Wasm object writer Reviewers: dschuff, sbc100, rnk Subscribers: mgorny, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52748 llvm-svn: 344575
* StructurizeCFG,AMDGPU: Test case of a redundant phi and codegen consequencesNicolai Haehnle2018-10-151-0/+34
| | | | | Change-Id: I9681f9e41ca30f82576f3d1f965c3a550a34b171 llvm-svn: 344569
* [X86] Fix a bad bitcast in the load form of vXi16 uniform shift patterns for ↵Craig Topper2018-10-153-12/+6
| | | | | | EVEX encoded instructions. llvm-svn: 344563
* [X86] Add test cases showing failure to fold load into vpsrlw when EVEX ↵Craig Topper2018-10-153-27/+115
| | | | | | | | encoded instructions are used. There's a bad bitcast being used in the isel patterns for the vXi16 shift instructions. llvm-svn: 344562
* [X86] Disable the peephole pass on avx2-intrinsics-x86.ll and ↵Craig Topper2018-10-152-6/+6
| | | | | | avx512bw-intrinsics.ll to ensure any load folding tests are testing isel not load folding tables. llvm-svn: 344561
* [X86] Regenerate avx2-intrinsics-x86.ll to compress the 32 vs 64 bit mode ↵Craig Topper2018-10-151-1188/+559
| | | | | | checks. llvm-svn: 344560
* [AARCH64] Improve vector popcnt lowering with ADDLPSimon Pilgrim2018-10-151-114/+26
| | | | | | | | | | AARCH64 equivalent to D53257 - uses widening pairwise adds on vXi8 CTPOP to support i16/i32/i64 vectors. This is a blocker for generic vector CTPOP expansion (P32655) - this will remove the aarch64 diff from D53258. Differential Revision: https://reviews.llvm.org/D53259 llvm-svn: 344554
* AMDGPU: Generate .amdgcn_target for object code v3Konstantin Zhuravlyov2018-10-151-0/+58
| | | | | | Differential Revision: https://reviews.llvm.org/D53221 llvm-svn: 344552
* [SelectionDAG] allow FP binops in SimplifyDemandedVectorEltsSanjay Patel2018-10-153-26/+24
| | | | | | | | | | | | This is intended to make the backend on par with functionality that was added to the IR version of SimplifyDemandedVectorElts in: rL343727 ...and the original motivation is that we need to improve demanded-vector-elements in several ways to avoid problems that would be exposed in D51553. Differential Revision: https://reviews.llvm.org/D52912 llvm-svn: 344541
* [DAGCombiner] allow undef elts in vector fmul matchingSanjay Patel2018-10-151-6/+4
| | | | llvm-svn: 344534
* [AArch64] add tests for fmul x, -2.0 with undef elts; NFCSanjay Patel2018-10-151-10/+45
| | | | | | Also, add tests with commuted operands. There was no coverage for that case. llvm-svn: 344531
* [DAGCombiner] allow undef elts in vector fma matchingSanjay Patel2018-10-151-66/+126
| | | | llvm-svn: 344528
OpenPOWER on IntegriCloud