summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* Revert "[LICM] Make LICM able to hoist phis"Benjamin Kramer2018-11-191-302/+15
| | | | | | This reverts commit r347190. llvm-svn: 347225
* [AMDGPU] Derive GCNSubtarget from MF to get overridden target featuresDavid Stuttard2018-11-191-2/+2
| | | | | | | | | | | | | | | | | | Summary: AMDGPUAsmPrinter has a getSTI function that derives a GCNSubtarget from the TM. However, this means that overridden target features are not detected and can result in incorrect behaviour. Switch to using STM which is a GCNSubtarget derived from the MF (used elsewhere in the same function). Change-Id: Ib6328ad667b7fcdc87e9c06344e59859207db9b0 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D54301 llvm-svn: 347221
* [LV] Avoid vectorizing unsafe dependencies in uniform addressAnna Thomas2018-11-192-8/+15
| | | | | | | | | | | | | | | | | | | Summary: Currently, when vectorizing stores to uniform addresses, the only instance we prevent vectorization is if there are multiple stores to the same uniform address causing an unsafe dependency. This patch teaches LAA to avoid vectorizing loops that have an unsafe cross-iteration dependency between a load and a store to the same uniform address. Fixes PR39653. Reviewers: Ayal, efriedma Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D54538 llvm-svn: 347220
* [LoopPass] fixing 'Modification' messages in -debug-pass=Executions for loop ↵Fedor Sergeev2018-11-191-2/+4
| | | | | | | | | | | | | passes Legacy loop pass manager is issuing "Made Modification" message after each Loop Pass run, however condition for issuing it is accumulated among all the runs. That leads to confusing 'modification' messages as soon as the first modification is done. Changing condition to be "current pass made modifications", similar to how it is being done in all other pass managers. llvm-svn: 347215
* [SelectionDAG] simplify select FP with undef conditionSanjay Patel2018-11-191-1/+1
| | | | llvm-svn: 347212
* [SelectionDAG] add simplifySelect() to reduce code duplication; NFCSanjay Patel2018-11-192-36/+27
| | | | | | This should be extended to handle FP and vectors in follow-up patches. llvm-svn: 347210
* Subject: [PATCH] [CodeGen] Add pass to combine interleaved loads.Martin Elshuber2018-11-194-1/+1371
| | | | | | | | | | | | | | This patch defines an interleaved-load-combine pass. The pass searches for ShuffleVector instructions that represent interleaved loads. Matches are converted such that they will be captured by the InterleavedAccessPass. The pass extends LLVMs capabilities to use target specific instruction selection of interleaved load patterns (e.g.: ld4 on Aarch64 architectures). Differential Revision: https://reviews.llvm.org/D52653 llvm-svn: 347208
* [ThinLTO] Fix comment. NFCEugene Leviant2018-11-191-1/+1
| | | | llvm-svn: 347207
* AMDGPU/InsertWaitcnts: Some more const-correctnessNicolai Haehnle2018-11-191-4/+4
| | | | | | | | | | Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54225 llvm-svn: 347192
* [ARM] Remove trunc sinks in ARM CGPSam Parker2018-11-191-73/+133
| | | | | | | | | | | | | | | | | | | | | | | | | | Truncs are treated as sources if their produce a value of the same type as the one we currently trying to promote. Truncs used to be considered as a sink if their operand was the same value type. We now allow smaller types in the search, so we should search through truncs that produce a smaller value. These truncs can then be converted to an AND mask. This leaves sinks as being: - points where the value in the register is being observed, such as an icmp, switch or store. - points where value types have to match, such as calls and returns. - zext are included to ease the transformation and are generally removed later on. During this change, it also became apart from truncating sinks was broken: if a sink used a source, its type information had already been lost by the time the truncation happens. So I've changed the method of caching the type information. Differential Revision: https://reviews.llvm.org/D54515 llvm-svn: 347191
* [LICM] Make LICM able to hoist phisJohn Brawn2018-11-191-15/+302
| | | | | | | | | | | | | | | The general approach taken is to make note of loop invariant branches, then when we see something conditional on that branch, such as a phi, we create a copy of the branch and (empty versions of) its successors and hoist using that. This has no impact by itself that I've been able to see, as LICM typically doesn't see such phis as they will have been converted into selects by the time LICM is run, but once we start doing phi-to-select conversion later it will be important. Differential Revision: https://reviews.llvm.org/D52827 llvm-svn: 347190
* [MSP430] Optimize srl/sra in case of A >> (8 + N)Anton Korobeynikov2018-11-191-2/+12
| | | | | | | | | | | There is no variable-length shifts on MSP430. Therefore "eat" 8 bits of shift via bswap & ext. Path by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D54623 llvm-svn: 347187
* Fix disturbing warning - NFCISerge Guelton2018-11-191-1/+1
| | | | llvm-svn: 347186
* [X86] Use a pcmpgt with 0 instead of psrad 31, to fill elements with the ↵Craig Topper2018-11-191-3/+3
| | | | | | | | sign bit in v4i32 MULH lowering. The shift requires a copy to avoid clobbering a register. Comparing with 0 uses an xor to produce 0 that will be overwritten with the compare results. So still requires 2 instructions, but should be one byte shorter since it doesn't need to encode an immediate. llvm-svn: 347185
* [LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switchesMax Kazantsev2018-11-191-0/+313
| | | | | | | | | | | | | | | | This patch introduces infrastructure and the simplest case for constant-folding of branch and switch instructions within loop into unconditional branches. It is useful as a cleanup for such passes as loop unswitching that sometimes produce such branches. Only the simplest case supported in this patch: after the folding, no block should become dead or stop being part of the loop. Support for more sophisticated cases will go separately in follow-up patches. Differential Revision: https://reviews.llvm.org/D54021 Reviewed By: anna llvm-svn: 347183
* [ProfileSummary] Standardize methods and fix commentVedant Kumar2018-11-199-17/+17
| | | | | | | | | | | | | | | | | | | | | Every Analysis pass has a get method that returns a reference of the Result of the Analysis, for example, BlockFrequencyInfo &BlockFrequencyInfoWrapperPass::getBFI(). I believe that ProfileSummaryInfo::getPSI() is the only exception to that, as it was returning a pointer. Another change is renaming isHotBB and isColdBB to isHotBlock and isColdBlock, respectively. Most methods use BB as the argument of variable names while methods usually refer to Basic Blocks as Blocks, instead of BB. For example, Function::getEntryBlock, Loop:getExitBlock, etc. I also fixed one of the comments. Patch by Rodrigo Caetano Rocha! Differential Revision: https://reviews.llvm.org/D54669 llvm-svn: 347182
* [X86] Use compare with 0 to fill an element with sign bits when sign ↵Craig Topper2018-11-191-4/+4
| | | | | | | | extending to v2i64 pre-sse4.1 Previously we used an arithmetic shift right by 31, but that requires a copy to preserve the input. So we might as well materialize a zero and compare to it since the comparison will overwrite the register that contains the zeros. This should be one byte shorter. llvm-svn: 347181
* [X86] Remove most of the SEXTLOAD Custom setOperationAction calls under ↵Craig Topper2018-11-191-7/+4
| | | | | | | | -x86-experimental-vector-widening-legalization. Leave just the v4i8->v4i64 and v8i8->v8i64, but only enable them on pre-sse4.1 targets when 64-bit mode is enabled. In those cases we end up creating sext loads that get scalarized to code that looks better than what we get from loading into a vector register and doing a multiple step sign extend using unpacks and shifts. llvm-svn: 347180
* [X86][SSE] Add SimplifyDemandedVectorElts support for SSE packed i2fp ↵Simon Pilgrim2018-11-181-0/+27
| | | | | | conversions. llvm-svn: 347177
* [X86] Add custom type legalization for extending v4i8/v4i16->v4i64.Craig Topper2018-11-181-8/+34
| | | | | | | | Pre-SSE4.1 sext_invec for v2i64 is complicated because we don't have a v2i64 sra instruction. So instead we sign extend to i32 using unpack and sra, then copy the elements and do a v4i32 sra to fill with sign bits, then interleave the i32 sign extend and the sign bits. So really we're doing to two sign extends but only using half of the v4i32 intermediate result. When the result is more than 128 bits, default type legalization would prefer to split the destination type all the way down to v2i64 with shuffles followed by v16i8/v8i16->v2i64 sext_inreg operations. This results in more instructions than necessary because we are only utilizing the lower 2 elements of the v4i32 intermediate result. Instead we can custom split a v4i8/v4i16->v4i64 sign_extend. Then we can sign extend v4i8/v4i16->v4i32 invec producing a full v4i32 result. Create the sign bit vector as a v4i32 then split and interleave with the sign bits using an punpackldq and punpackhdq. llvm-svn: 347176
* [X86][SSE] Add SimplifyDemandedVectorElts support for SSE splat-vector-shifts.Simon Pilgrim2018-11-181-0/+41
| | | | | | SSE vector shifts only use the bottom 64-bits of the shift amount vector. llvm-svn: 347173
* [X86] Disable combineToExtendVectorInReg under ↵Craig Topper2018-11-181-0/+74
| | | | | | | | | | | | | | -x86-experimental-vector-widening-legalization. Add custom type legalization for extends. If we widen illegal types instead of promoting, we should be able to rely on the type legalizer to create the vector_inreg operations for us with some caveats. This patch disables combineToExtendVectorInReg when we are using widening. I've enabled custom legalization for v8i8->v8i64 extends under avx512f since the type legalizer would want to create a vector_inreg with a v64i8 input type which isn't legal without avx512bw. So we go to v16i8 with custom code using the relaxation of rules we get from D54346. I've also enable custom legalization of v8i64 and v16i32 operations with with AVX. When the input type is 128 bits, the default splitting legalization would extend first 128->256, then do the a split to two 128 pieces. Extend each half to 256 and then concat the result. The custom legalization I've added instead uses a 128->256 bit vector_inreg extend that only reads the lower 64-bits for the low half of the split. Then shuffles the high 64-bits to the low 64-bits and does another vector_inreg extend. llvm-svn: 347172
* [X86] Lower v16i16->v8i16 truncate using an 'and' with 255, an ↵Craig Topper2018-11-181-0/+16
| | | | | | | | | | | | | | | | extract_subvector, and a packuswb instruction. Summary: This is an improvement over the two pshufbs and punpcklqdq we'd get otherwise. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54671 llvm-svn: 347171
* [DAG] add undef simplifications for select nodesSanjay Patel2018-11-182-14/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sadly, this duplicates (twice) the logic from InstSimplify. There might be some way to at least share the DAG versions of the code, but copying the folds seems to be the standard method to ensure that we don't miss these folds. Unlike in IR, we don't run DAGCombiner to fixpoint, so there's no way to ensure that we do these kinds of simplifications unless the code is repeated at node creation time and during combines. There were other tests that would become worthless with this improvement that I changed as pre-commits: rL347161 rL347164 rL347165 rL347166 rL347167 I'm not sure how to salvage the remaining tests (diffs in this patch). So the x86 tests verify that the new code is working as intended. The AMDGPU test is actually similar to my motivating case: we have some undef value that has survived to machine IR in an x86 test, and then it gets folded in some weird way, or we crash if we don't transfer the undef flag. But we would have been better off never getting to that point by doing these simplifications. This will lead back to PR32023 someday... https://bugs.llvm.org/show_bug.cgi?id=32023 llvm-svn: 347170
* Remove unused variable. NFCI.Simon Pilgrim2018-11-181-8/+9
| | | | llvm-svn: 347169
* [X86][SSE] Split IsSplatValue into GetSplatValue and IsSplatVectorSimon Pilgrim2018-11-181-36/+40
| | | | | | | | Refactor towards making this recursive (necessary for PR38243 rotation splat detection). IsSplatVector returns the original vector source of the splat and the splat index. GetSplatValue returns the scalar splatted value as an extraction from IsSplatVector. llvm-svn: 347168
* [X86][SSE] Relax IsSplatValue - remove the 'variable shift' limit on subtracts.Simon Pilgrim2018-11-181-7/+6
| | | | | | Means we don't use the per-lane-shifts as much when we can cheaply use the older splat-variable-shifts. llvm-svn: 347162
* [SelectionDAG] simplify code; NFCSanjay Patel2018-11-181-6/+5
| | | | llvm-svn: 347160
* [X86][SSE] Use raw shuffle mask decode in ↵Simon Pilgrim2018-11-181-4/+10
| | | | | | | | SimplifyDemandedVectorEltsForTargetNode (PR39549) We were using the 'normalized' shuffle mask from resolveTargetShuffleInputs, which replaces zero/undef inputs with sentinel values. For SimplifyDemandedVectorElts we need the raw mask so we can correctly demand those 'zero' inputs that got normalized away, this requires an extra bit of logic to locally normalize undef inputs. llvm-svn: 347158
* [WebAssembly] Add null streamer supportHeejin Ahn2018-11-182-0/+23
| | | | | | | | | | | | Summary: Now `llc -filetype=null` works. Reviewers: eush Subscribers: dschuff, jgravelle-google, sbc100, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D54660 llvm-svn: 347155
* [X86] Add -x86-experimental-vector-widening-legalization check to ↵Craig Topper2018-11-181-2/+5
| | | | | | | | combineSelect and combineSetCC to cover vXi16/vXi8 promotion without BWI. I don't yet have any test cases for this, but its the right thing to do based on log file inspection. llvm-svn: 347151
* [X86] Rename WidenMaskArithmetic->PromoteMaskArithmetic since we usually use ↵Craig Topper2018-11-181-4/+4
| | | | | | widen to refer to adding elements not making elements larger. NFC llvm-svn: 347150
* [X86] Don't use a pmaddwd for vXi32 multiply if the inputs are zero extends ↵Craig Topper2018-11-181-0/+10
| | | | | | | | from i8 or smaller without SSE4.1. Prefer to shrink the mul instead. The zero extend will require two stages of unpacks to implement. So its better to shrink the multiply using pmullw and then extend that result back to v4i32 using a single unpack. llvm-svn: 347149
* [CorrelatedValuePropagation] Preserve debug locations (PR38178)Vedant Kumar2018-11-181-15/+16
| | | | | | | | | Fix all of the missing debug location errors in CVP found by debugify. This includes the missing-location-after-udiv-truncation case described in llvm.org/PR38178. llvm-svn: 347147
* Fix bot failure from r347145Teresa Johnson2018-11-171-8/+7
| | | | | | | | The #if check around the statistics computation gave an error about the statistic being an unused variable. Instead, guard with AreStatisticsEnabled(). llvm-svn: 347146
* [ThinLTO] Add some stats for read only variable internalizationTeresa Johnson2018-11-171-0/+14
| | | | | | | | | | | | | | | Summary: Follow up to D49362 ([ThinLTO] Internalize read only globals). Add a statistic on the number of read only variables (only counting live variables since dead variables will be dropped anyway). Reviewers: evgeny777 Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D54642 llvm-svn: 347145
* [X86] Add support for matching PACKUSWB from a v64i8 shuffle.Craig Topper2018-11-171-0/+5
| | | | llvm-svn: 347143
* Move BuryPointer from Clang to LLVM for use in other LLVM toolsDavid Blaikie2018-11-172-0/+32
| | | | | | | Specifically planning to use this in llvm-symbolizer to remove the cost of cleanup there. llvm-svn: 347140
* llvm-symbolizer: Avoid calling getFromOffset when the index entry is already ↵David Blaikie2018-11-171-8/+14
| | | | | | | | | | | | available Especially for symbolizer it can be efficient to have to search through the entire index when it isn't needed - llvm-symbolizer looks up only a few CUs & already has an index available in getUnitForEntry, once it's passed down to DWARFUnitHeader::extract then there's no need for it to call getFromOffset. llvm-svn: 347134
* [X86] Don't extend v32i8 multiplies to v32i16 with avx512bw and ↵Craig Topper2018-11-171-1/+1
| | | | | | prefer-vector-width=256. llvm-svn: 347131
* Reverted r347092 due to the following build fails:Vyacheslav Zakharin2018-11-174-781/+30
| | | | | | | http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/8662 http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/26263 llvm-svn: 347129
* [X86] Use getUnpackl/getUnpackh instead of hardcoding a shuffle mask.Craig Topper2018-11-171-8/+5
| | | | llvm-svn: 347127
* Use llvm::copy. NFCFangrui Song2018-11-1721-37/+33
| | | | llvm-svn: 347126
* DAG combiner: fold (select, C, X, undef) -> XStanislav Mekhanoshin2018-11-161-0/+6
| | | | | | Differential Revision: https://reviews.llvm.org/D54646 llvm-svn: 347110
* [X86] Add custom promotion of narrow fp_to_uint/fp_to_sint operations under ↵Craig Topper2018-11-161-3/+48
| | | | | | | | | | -x86-experimental-vector-widening-legalization. This tries to force the result type to vXi32 followed by a truncate. This can help avoid scalarization that would otherwise occur. There's some annoying examples of an avx512 truncate instruction followed by a packus where we should really be able to just use one truncate. But overall this is still a net improvement. llvm-svn: 347105
* [X86] Qualify part of the masked gather handling in ReplaceNodeResults with ↵Craig Topper2018-11-161-21/+23
| | | | | | | | a getTypeAction call to know if we can use default legalization. If we managed to switch to -x86-experimental-vector-widening-legalization this block can be removed. llvm-svn: 347100
* [SimpleLoopUnswitch] adding cost multiplier to cap exponential unswitch withFedor Sergeev2018-11-161-2/+116
| | | | | | | | | | | | | | | | | | | | | We need to control exponential behavior of loop-unswitch so we do not get run-away compilation. Suggested solution is to introduce a multiplier for an unswitch cost that makes cost prohibitive as soon as there are too many candidates and too many sibling loops (meaning we have already started duplicating loops by unswitching). It does solve the currently known problem with compile-time degradation (PR 39544). Tests are built on top of a recently implemented CHECK-COUNT-<num> FileCheck directives. Reviewed By: chandlerc, mkazantsev Differential Revision: https://reviews.llvm.org/D54223 llvm-svn: 347097
* [X86] Remove a branch on SSE4.1 from LowerLoadCraig Topper2018-11-161-14/+2
| | | | | | We should be able to use getExtendInVec with or without sse4.1 to produce a SIGN_EXTEND_VECTOR_INREG. llvm-svn: 347095
* [LegalizeVectorOps] After custom legalizing an extending load or a ↵Craig Topper2018-11-161-2/+10
| | | | | | | | | | truncating store, make sure the custom code is also legal. For example, on X86 we emit a sign_extend_vector_inreg from LowerLoad and without sse4.1 this node will need further legalization. Previously this sign_extend_vector_inreg was being custom lowered during DAG legalization instead of vector op legalization. Unfortunately, this doesn't seem to matter for the output of any existing lit tests. llvm-svn: 347094
* [X86] In LowerLoad, fix assert messages and rename a variable that use Zize ↵Craig Topper2018-11-161-8/+8
| | | | | | instead of Size. NFC llvm-svn: 347093
OpenPOWER on IntegriCloud