path: root/llvm/test/CodeGen
Commit message    Author    Age    Files    Lines
* [AArch64] Add basic support for Qualcomm's Saphira CPU.  (Chad Rosier, 2017-09-25, 4 files, -0/+4)
    llvm-svn: 314105
* [X86] Make IFMA instructions during isel so we can fold broadcast loads.  (Craig Topper, 2017-09-24, 1 file, -6/+3)
    This required changing the ISD opcode for these instructions to have the commutable operands first and the addend last. This way tablegen can autogenerate the additional patterns for us.
    llvm-svn: 314083
* [X86] Add tests to show missed opportunities to fold broadcast loads into IFMA instructions when the load is on operand1 of the intrinsic.  (Craig Topper, 2017-09-24, 1 file, -0/+85)
    We need to enable commuting during isel to catch this since the load folding tables can't handle broadcasts.
    llvm-svn: 314082
* [X86] Add IFMA instructions to the load folding tables and make them commutable for the multiply operands.  (Craig Topper, 2017-09-24, 1 file, -0/+70)
    llvm-svn: 314080
* [X86][SSE] Add more tests for shuffle combining with extracted vector elements (PR22415)  (Simon Pilgrim, 2017-09-24, 1 file, -0/+56)
    llvm-svn: 314077
* [X86][SSE] Add support for extending bool vectors bitcasted from scalars  (Simon Pilgrim, 2017-09-24, 3 files, -6582/+1099)
    This patch acts as a reverse to combineBitcastvxi1 - bitcasting a scalar integer to a boolean vector and extending it 'in place' to the requested legal type. Currently this doesn't handle AVX512 at all - but the current mask register approach is lacking for some cases.
    Differential Revision: https://reviews.llvm.org/D35320
    llvm-svn: 314076
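    For illustration only (an assumed C example, not part of this commit or its tests), the scalar-to-bool-vector pattern looks roughly like this, with each bit of the integer widened into an all-ones or all-zeros lane:

        #include <stdint.h>

        /* Assumed illustration: expand the low 8 bits of 'mask' into 8 boolean byte lanes. */
        void expand_mask_bits(uint8_t mask, int8_t lanes[8]) {
            for (int i = 0; i < 8; ++i)
                lanes[i] = ((mask >> i) & 1) ? -1 : 0;  /* -1 = all bits set in the lane */
        }

    In effect this is a bitcast of the scalar to a boolean vector followed by a sign extension to the requested legal vector type.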
* [PowerPC] Eliminate compares - add i64 sext/zext handling for SETLE/SETGE  (Nemanja Ivanovic, 2017-09-24, 4 files, -0/+524)
    As mentioned in https://reviews.llvm.org/D33718, this simply adds another pattern to the compare elimination sequence and is committed without a differential review.
    llvm-svn: 314073
* [AVX-512] Add pattern for selecting masked version of v8i32/v8f32 compare instructions when VLX isn't available.  (Craig Topper, 2017-09-24, 2 files, -70/+37)
    We use a v16i32/v16f32 compare instead and truncate the result. We already did this for the unmasked version, but were missing the version with 'and'.
    llvm-svn: 314072
* [X86] Regenerate i64 to v2f32 bitcast test  (Simon Pilgrim, 2017-09-23, 1 file, -3/+30)
    llvm-svn: 314068
* [x86] reduce 64-bit mask constant to 32-bits by right shifting  (Sanjay Patel, 2017-09-23, 1 file, -3/+3)
    This is a follow-up from D38181 (r314023). We have to put 64-bit constants into a register using a separate instruction, so we should try harder to avoid that.
    From what I see, we're not likely to encounter this pattern in the DAG because the upstream setcc combines from this don't (usually?) produce this pattern. If we fix that, then this will become more relevant. Since the cost of handling this case is just loosening the predicate of the existing fold, we might as well do it now.
    llvm-svn: 314064
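    As an assumed illustration of the transform (not the committed test case), shifting before masking lets the constant fit in 32 bits, so no separate move-immediate instruction is needed to materialize it:

        #include <stdint.h>

        /* Assumed example: the two functions compute the same value. */
        uint64_t mask_then_shift(uint64_t x) {
            return (x & 0xFFFFFF0000000000ULL) >> 40;  /* mask needs a 64-bit immediate */
        }

        uint64_t shift_then_mask(uint64_t x) {
            return (x >> 40) & 0xFFFFFFULL;            /* same result; mask now fits in 32 bits */
        }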
* [x86] add an add+shift test for follow-up suggestion from D38181; NFC  (Sanjay Patel, 2017-09-23, 1 file, -0/+21)
    llvm-svn: 314063
* [PowerPC] Eliminate compares - add i32 sext/zext handling for SETULT/SETUGT  (Nemanja Ivanovic, 2017-09-23, 14 files, -0/+1200)
    As mentioned in https://reviews.llvm.org/D33718, this simply adds another pattern to the compare elimination sequence and is committed without a differential revision.
    llvm-svn: 314062
* [PowerPC] Eliminate compares - add i32 sext/zext handling for SETULE/SETUGE  (Nemanja Ivanovic, 2017-09-23, 17 files, -26/+1428)
    As mentioned in https://reviews.llvm.org/D33718, this simply adds another pattern to the compare elimination sequence and is committed without a differential revision.
    llvm-svn: 314060
* [PowerPC] Eliminate compares - add i32 sext/zext handling for SETLT/SETGT  (Nemanja Ivanovic, 2017-09-23, 8 files, -7/+610)
    As mentioned in https://reviews.llvm.org/D33718, this simply adds another pattern to the compare elimination sequence and is committed without a differential revision.
    llvm-svn: 314055
* [x86] remove over-specified platform from test config  (Sanjay Patel, 2017-09-22, 1 file, -27/+92)
    llvm-svn: 314027
* [x86] swap order of srl (and X, C1), C2 when it saves size  (Sanjay Patel, 2017-09-22, 8 files, -288/+293)
    The (non-)obvious win comes from saving 3 bytes by using the 0x83 'and' opcode variant instead of 0x81. There are also better improvements based on known-bits that allow us to eliminate the mask entirely.
    As noted, this could be extended. There are potentially other wins from always shifting first, but doing that reveals a tangle of problems in other pattern matching. We do this transform generically in instcombine, but we often have icmp IR that doesn't match that pattern, so we must account for this in the backend.
    Differential Revision: https://reviews.llvm.org/D38181
    llvm-svn: 314023
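    A hedged example of the size win (my own, not taken from the changed tests): after the swap, the remaining mask fits the sign-extended 8-bit immediate form of AND (opcode 0x83) instead of the 32-bit immediate form (0x81):

        /* Assumed example: both functions return the same value. */
        unsigned and_then_shift(unsigned x) {
            return (x & 0x3F00u) >> 8;  /* mask does not fit an 8-bit immediate */
        }

        unsigned shift_then_and(unsigned x) {
            return (x >> 8) & 0x3Fu;    /* equivalent; 0x3f fits an 8-bit immediate */
        }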
* [XRay] support conditional return on PPC.  (Tim Shen, 2017-09-22, 2 files, -0/+111)
    Summary: Conditional returns were not taken into consideration at all. Implement them by turning them into jumps and normal returns. This means there is a slightly higher performance penalty for conditional returns, but this is the best we can do, and it still disturbs little of the rest.
    Reviewers: dberris, echristo
    Subscribers: sanjoy, nemanjai, hiraditya, kbarton, llvm-commits
    Differential Revision: https://reviews.llvm.org/D38102
    llvm-svn: 314005
* Check vector elements for equivalence in the HexagonVectorLoopCarriedReuse pass  (Pranav Bhandarkar, 2017-09-22, 1 file, -0/+86)
    If the two instructions being compared for equivalence have corresponding operands that are integer constants, then check their values to determine equivalence.
    Patch by Suyog Sarda!
    llvm-svn: 313993
* [x86] remove unnecessary OS specifier from test  (Sanjay Patel, 2017-09-22, 1 file, -190/+178)
    llvm-svn: 313986
* [x86] auto-generate complete checks; NFC  (Sanjay Patel, 2017-09-22, 1 file, -15/+59)
    llvm-svn: 313985
* [x86] update test to use FileCheck; NFC  (Sanjay Patel, 2017-09-22, 1 file, -1/+17)
    llvm-svn: 313984
* [X86] Combining CMOVs with [ANY,SIGN,ZERO]_EXTEND for cases where CMOV has constant arguments  (Alexander Ivchenko, 2017-09-22, 3 files, -124/+130)
    Combine CMOV[i16]<-[SIGN,ZERO,ANY]_EXTEND to [i32,i64] into CMOV[i32,i64]. One example of where it is useful is:
    before (20 bytes):
        <foo>:
        test   $0x1,%dil
        mov    $0x307e,%ax
        mov    $0xffff,%cx
        cmovne %ax,%cx
        movzwl %cx,%eax
        retq
    after (18 bytes):
        <foo>:
        test   $0x1,%dil
        mov    $0x307e,%ecx
        mov    $0xffff,%eax
        cmovne %ecx,%eax
        retq
    Reviewers: craig.topper, aaboud, spatel, RKSimon, zvi
    Reviewed By: spatel
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D36711
    llvm-svn: 313982
* Recommit r310809 with a fix for the spill problem  (Nemanja Ivanovic, 2017-09-22, 25 files, -96/+971)
    This patch re-commits the patch that was pulled out due to a problem it caused, but with a fix for the problem. The fix was reviewed separately by Eric Christopher and Hal Finkel.
    Differential Revision: https://reviews.llvm.org/D38054
    llvm-svn: 313978
* [ARM] Add missing selection patterns for vnmla  (Simon Pilgrim, 2017-09-22, 1 file, -2/+67)
    For the following function:
        double fn1(double d0, double d1, double d2) {
          double a = -d0 - d1 * d2;
          return a;
        }
    on ARM, LLVM generates code along the lines of
        vneg.f64 d0, d0
        vmls.f64 d0, d1, d2
    i.e., a negate and a multiply-subtract. The attached patch adds instruction selection patterns to allow it to generate the single instruction
        vnmla.f64 d0, d1, d2
    (multiply-add with negation) instead, like GCC does.
    Committed on behalf of @gergo- (Gergö Barany)
    Differential Revision: https://reviews.llvm.org/D35911
    llvm-svn: 313972
* [X86] Updating the test case for FMF propagation.  (Jatin Bhateja, 2017-09-22, 1 file, -2/+14)
    Differential Revision: https://reviews.llvm.org/D38163
    llvm-svn: 313964
* AArch64: support SwiftCC properly on AAPCS64  (Saleem Abdulrasool, 2017-09-22, 1 file, -0/+18)
    The previous SwiftCC support for AAPCS64 was partially correct. It set up swiftself parameters in the proper register but failed to set up swifterror in the correct register. This would break compilation of swift code for non-Darwin AAPCS64 conforming environments.
    llvm-svn: 313956
* [Hexagon] - Fix testcase for the HexagonVectorLoopCarriedReuse pass.  (Pranav Bhandarkar, 2017-09-21, 1 file, -0/+86)
    llvm-svn: 313936
* Revert "Add a testfile that I missed in a previous commit that added HexagonVectorLoopCarriedReuse pass"  (Rafael Espindola, 2017-09-21, 1 file, -86/+0)
    This reverts commit r313926. It was failing in some bots.
    llvm-svn: 313931
* Add a testfile that I missed in a previous commit that added HexagonVectorLoopCarriedReuse pass  (Pranav Bhandarkar, 2017-09-21, 1 file, -0/+86)
    llvm-svn: 313926
* [AArch64] Fix bug in store of vector 0 DAGCombine.  (Geoff Berry, 2017-09-21, 3 files, -6/+25)
    Summary: Avoid using XZR/WZR directly as operands to split stores of zero vectors. Doing so can lead to the XZR/WZR being used by an instruction that doesn't allow it (e.g. add). Fixes bug 34674.
    Reviewers: t.p.northover, efriedma, MatzeB
    Subscribers: aemerson, rengolin, javed.absar, mcrosier, eraman, llvm-commits, kristof.beyls
    Differential Revision: https://reviews.llvm.org/D38146
    llvm-svn: 313916
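    A minimal sketch of the kind of source that exercises this path (an assumed example, not the committed test): storing an all-zero, vector-sized object, which the AArch64 backend may split into scalar stores of the zero register:

        #include <string.h>

        /* Assumed illustration: the split zero stores may use XZR/WZR. */
        void store_zero_vector(void *p) {
            memset(p, 0, 16);
        }

    The fix keeps XZR/WZR out of instructions, such as the add the commit cites, that do not allow the zero register as an operand.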
* [SelectionDAG] Pick correct frame index in LowerArguments  (Bjorn Pettersson, 2017-09-21, 1 file, -0/+87)
    Summary: SelectionDAGISel::LowerArguments is associating arguments with frame indices (FuncInfo->setArgumentFrameIndex). That information is later on used by EmitFuncArgumentDbgValue to create DBG_VALUE instructions that denote that a variable can be found on the stack.
    I discovered that for our (big endian) out-of-tree target the association created by SelectionDAGISel::LowerArguments sometimes is wrong. I've seen this happen when a 64-bit value is passed on the stack. The argument will occupy two stack slots (frame index X, and frame index X+1). The fault is that a call to setArgumentFrameIndex is associating the 64-bit argument with frame index X+1. The effect is that the debug information (DBG_VALUE) will point at the least significant part of the argument on the stack. When printing the argument in a debugger I will get the wrong value.
    I managed to create a test case for PowerPC that seems to show the same kind of problem.
    The bugfix will look at the datalayout, taking endianness into account when examining a BUILD_PAIR node, assuming that the least significant part is in the first operand of the BUILD_PAIR. For big endian targets we should use the frame index from the second operand, as the most significant part will be stored at the lower address (using the highest frame index).
    Reviewers: bogner, rnk, hfinkel, sdardis, aprantl
    Reviewed By: aprantl
    Subscribers: nemanjai, aprantl, llvm-commits, igorb
    Differential Revision: https://reviews.llvm.org/D37740
    llvm-svn: 313901
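    As a hedged illustration (an assumed example, not the committed PowerPC test): on a 32-bit big-endian target, a 64-bit argument passed on the stack occupies two 4-byte slots, and the DBG_VALUE must reference the slot holding the most significant half:

        /* Assumed example; how many leading arguments force 'wide' onto the
           stack is target-dependent. The point is only that 'wide' ends up
           split across two frame indices. */
        long long read_stack_arg(int a, int b, int c, int d,
                                 int e, int f, int g, int h,
                                 long long wide) {
            return wide;
        }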
* [NVPTX] Implemented bar.warp.sync, barrier.sync, and vote{.sync} instructions/intrinsics/builtins.  (Artem Belevich, 2017-09-21, 2 files, -0/+97)
    Differential Revision: https://reviews.llvm.org/D38148
    llvm-svn: 313898
* [x86] add more tests for node-level FMF; NFC  (Sanjay Patel, 2017-09-21, 1 file, -0/+45)
    llvm-svn: 313893
* Fix buildbot failures, add mtriple to gpr-vsr-spill.ll  (Zaara Syeda, 2017-09-21, 1 file, -1/+1)
    llvm-svn: 313890
* [Power9] Spill gprs to vector registers rather than stack  (Zaara Syeda, 2017-09-21, 1 file, -0/+24)
    This patch updates register allocation to enable spilling gprs to volatile vector registers rather than the stack. It can be enabled for Power9 with option -ppc-enable-gpr-to-vsr-spills.
    Differential Revision: https://reviews.llvm.org/D34815
    llvm-svn: 313886
* [X86][SSE] Add PSHUFLW/PSHUFHW tests inspired by PR34686  (Simon Pilgrim, 2017-09-21, 2 files, -0/+96)
    llvm-svn: 313883
* [SystemZ] Improve optimizeCompareZero()  (Jonas Paulsson, 2017-09-21, 1 file, -0/+44)
    More conversions to load-and-test can be made with this patch by adding a forward search in optimizeCompareZero().
    Review: Ulrich Weigand
    https://reviews.llvm.org/D38076
    llvm-svn: 313877
* [X86] Adding a testpoint for fast-math flags propagation.  (Jatin Bhateja, 2017-09-21, 1 file, -0/+47)
    Reviewers: jbhateja
    Reviewed By: jbhateja
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D38127
    llvm-svn: 313869
* AMDGPU: Add option to stress calls  (Matt Arsenault, 2017-09-21, 1 file, -0/+36)
    This inverts the behavior of the AlwaysInline pass to mark every function not already marked alwaysinline as noinline.
    llvm-svn: 313865
* AMDGPU: Fix crash on immediate operand  (Matt Arsenault, 2017-09-21, 1 file, -0/+58)
    We can have a v_mac with an immediate src0. We can still fold if it's an inline immediate, otherwise it already uses the constant bus.
    llvm-svn: 313852
* [NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins.  (Artem Belevich, 2017-09-20, 1 file, -0/+94)
    Differential Revision: https://reviews.llvm.org/D38090
    llvm-svn: 313820
* AMDGPU: Start selecting v_mad_mixhi_f16  (Matt Arsenault, 2017-09-20, 2 files, -41/+212)
    llvm-svn: 313814
* X86: treat SwiftCC as Win64_CC on Win64  (Saleem Abdulrasool, 2017-09-20, 1 file, -0/+11)
    The Swift CC is identical to Win64 CC with the exception of swift error being passed in r12 which is a CSR. However, since this calling convention is only used in swift -> swift code, it does not impact interoperability and can be treated entirely as Win64 CC. We would previously incorrectly lower the frame setup as we did not treat the frame as conforming to Win64 specifications.
    llvm-svn: 313813
* AMDGPU: Start selecting v_mad_mixlo_f16  (Matt Arsenault, 2017-09-20, 2 files, -0/+281)
    Also add some tests that should be able to use v_mad_mixhi_f16, but do not yet. This is trickier because we don't really model the partial update of the register done by 16-bit instructions.
    llvm-svn: 313806
* AMDGPU: Fix encoding of op_sel for mad_mix* opcodes  (Matt Arsenault, 2017-09-20, 1 file, -24/+24)
    llvm-svn: 313797
* CodeGen: support SwiftError SwiftCC on Windows x64  (Saleem Abdulrasool, 2017-09-20, 1 file, -0/+18)
    Add support for passing SwiftError through a register on the Windows x64 calling convention. This allows the use of swifterror attributes on parameters which is used by the swift front end for the `Error` parameter. This partially enables building the swift standard library for Windows x86_64.
    llvm-svn: 313791
* [X86][SSE] Add PR22415 test case  (Simon Pilgrim, 2017-09-20, 1 file, -0/+22)
    llvm-svn: 313755
* Recommit [MachineCombiner] Update instruction depths incrementally for large BBs.  (Florian Hahn, 2017-09-20, 3 files, -1/+14)
    This version of the patch fixes an off-by-one error causing PR34596. We do not need to use std::next(BlockIter) when calling updateDepths, as BlockIter already points to the next element.
    Original commit message:
    > For large basic blocks with lots of combinable instructions, the
    > MachineTraceMetrics computations in MachineCombiner can dominate the compile
    > time, as computing the trace information is quadratic in the number of
    > instructions in a BB and it's relevant successors/predecessors.
    > In most cases, knowing the instruction depth should be enough to make
    > combination decisions. As we already iterate over all instructions in a basic
    > block, the instruction depth can be computed incrementally. This reduces the
    > cost of machine-combine drastically in cases where lots of instructions
    > are combined. The major drawback is that AFAIK, computing the critical path
    > length cannot be done incrementally. Therefore we only compute
    > instruction depths incrementally, for basic blocks with more
    > instructions than inc_threshold. The -machine-combiner-inc-threshold
    > option can be used to set the threshold and allows for easier
    > experimenting and checking if using incremental updates for all basic
    > blocks has any impact on the performance.
    >
    > Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn
    >
    > Reviewed By: fhahn
    >
    > Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits
    >
    > Differential Revision: https://reviews.llvm.org/D36619
    llvm-svn: 313751
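    A toy C sketch of the underlying idea (my own, not the LLVM implementation): when instructions are visited in block order, each one's depth follows from the already-known depths of the instructions defining its operands, so one forward pass suffices instead of recomputing whole-trace metrics for every combine:

        #define MAX_OPS 2

        /* Assumed, simplified instruction model for illustration only. */
        struct Instr {
            int def_of_operand[MAX_OPS]; /* index of defining instruction in the block, or -1 */
            int latency;
        };

        /* Fill depths[] incrementally in a single forward pass over the block. */
        void compute_depths(const struct Instr *block, int n, int *depths) {
            for (int i = 0; i < n; ++i) {
                int depth = 0;
                for (int op = 0; op < MAX_OPS; ++op) {
                    int def = block[i].def_of_operand[op];
                    if (def >= 0 && depths[def] + block[def].latency > depth)
                        depth = depths[def] + block[def].latency;
                }
                depths[i] = depth;
            }
        }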
* [IfConversion] Add testcases [NFC]  (Mikael Holmen, 2017-09-20, 6 files, -0/+211)
    These tests should have been included in r310697 / D34099 but apparently I missed them.
    llvm-svn: 313737
* AMDGPU: Match load d16 hi instructions  (Matt Arsenault, 2017-09-20, 5 files, -16/+525)
    Also starts selecting global loads for constant address in some cases. Some end up selecting to mubuf still, which requires investigation.
    We still get sub-optimal regalloc and extra waitcnts inserted due to not really tracking the liveness of the separate register halves.
    llvm-svn: 313716