summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* [X86][SSE] Optimize llvm.experimental.vector.reduce.xor.vXi1 parity ↵Simon Pilgrim2019-04-281-2/+12
| | | | | | | | | | reduction (PR38840) An xor reduction of a bool vector can be optimized to a parity check of the MOVMSK/BITCAST'd integer - if the population count is odd return 1, else return 0. Differential Revision: https://reviews.llvm.org/D61230 llvm-svn: 359396
* [X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result ↵Craig Topper2019-04-283-70/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MOVD/MOVQ and COPY_TO_REGCLASS instead Summary: The register form of these instructions are CodeGenOnly instructions that cover GR32->FR32 and GR64->FR64 bitcasts. There is a similar set of instructions for the opposite bitcast. Due to the patterns using bitcasts these instructions get marked as "bitcast" machine instructions as well. The peephole pass is able to look through these as well as other copies to try to avoid register bank copies. Because FR32/FR64/VR128 are all coalescable to each other we can end up in a situation where a GR32->FR32->VR128->FR64->GR64 sequence can be reduced to GR32->GR64 which the copyPhysReg code can't handle. To prevent this, this patch removes one set of the 'bitcast' instructions. So now we can only go GR32->VR128->FR32 or GR64->VR128->FR64. The instruction that converts from GR32/GR64->VR128 has no special significance to the peephole pass and won't be looked through. I guess the other option would be to add support to copyPhysReg to just promote the GR32->GR64 to a GR64->GR64 copy. The upper bits were basically undefined anyway. But removing the CodeGenOnly instruction in favor of one that won't be optimized seemed safer. I deleted the peephole test because it couldn't be made to work with the bitcast instructions removed. The load version of the instructions were unnecessary as the pattern that selects them contains a bitcasted load which should never happen. Fixes PR41619. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61223 llvm-svn: 359392
* Revert rL359389: [X86][SSE] Add support for <64 x i1> bool reductionSimon Pilgrim2019-04-271-12/+10
| | | | | | | | Minor generalization of the existing <32 x i1> pre-AVX2 split code. ........ Causing irregular buildbot failures. llvm-svn: 359391
* [X86][SSE] Add support for <64 x i1> bool reductionSimon Pilgrim2019-04-271-10/+12
| | | | | | Minor generalization of the existing <32 x i1> pre-AVX2 split code. llvm-svn: 359389
* [X86][AVX512] Improve vector bool reductionsSimon Pilgrim2019-04-271-11/+22
| | | | | | As predicate masks are legal on AVX512 targets, we avoid MOVMSK in these cases, but we can just bitcast the bool vector to the integer equivalent directly - avoiding expansion of the reduction to a shuffle pattern. llvm-svn: 359386
* [DJB] Fix variable case after D61178Fangrui Song2019-04-271-3/+3
| | | | llvm-svn: 359381
* [X86][AVX] Merge mask select with shuffles across extract_subvector (PR40332)Simon Pilgrim2019-04-271-4/+44
| | | | | | | | Fixes PR40332 in the limited case where we're selecting between a target shuffle and a zero vector. We can extend this in the future to handle more opcodes and non-zero selections. llvm-svn: 359378
* [MCA] Add field `IsEliminated` to class Instruction. NFCIAndrea Di Biagio2019-04-271-3/+3
| | | | llvm-svn: 359377
* [X86] Use MOVQ for i64 atomic_stores when SSE2 is enabledCraig Topper2019-04-275-19/+72
| | | | | | | | | | | | | | | | Summary: If we have SSE2 we can use a MOVQ to store 64-bits and avoid falling back to a cmpxchg8b loop. If its a seq_cst store we need to insert an mfence after the store. Reviewers: spatel, RKSimon, reames, jfb, efriedma Reviewed By: RKSimon Subscribers: hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60546 llvm-svn: 359368
* Revert "AMDGPU: Split block for si_end_cf"Mark Searles2019-04-275-128/+17
| | | | | | | | | | This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4. We discovered some internal test failures, so reverting for now. Differential Revision: https://reviews.llvm.org/D61213 llvm-svn: 359363
* [AMDGPU] gfx1010 VOPC implementationStanislav Mekhanoshin2019-04-268-361/+696
| | | | | | Differential Revision: https://reviews.llvm.org/D61208 llvm-svn: 359358
* [ORC] Add a 'plugin' interface to ObjectLinkingLayer for events/configuration.Lang Hames2019-04-262-51/+153
| | | | | | | | | | | | | ObjectLinkingLayer::Plugin provides event notifications when objects are loaded, emitted, and removed. It also provides a modifyPassConfig callback that allows plugins to modify the JITLink pass configuration. This patch moves eh-frame registration into its own plugin, and teaches llvm-jitlink to only add that plugin when performing execution runs on non-Windows platforms. This should allow us to re-enable the test case that was removed in r359198. llvm-svn: 359357
* [GlobalISel][AArch64] Use getConstantVRegValWithLookThrough for extractsJessica Paquette2019-04-261-38/+6
| | | | | | | | | | | | getConstantVRegValWithLookThrough does the same thing as the getConstantValueForReg function, and has more visibility across GISel. Plus, it supports looking through G_TRUNC, G_SEXT, and G_ZEXT. So, we get better code reuse and more functionality for free by using it. Add some test cases to select-extract-vector-elt.mir to show that we can now look through those instructions. llvm-svn: 359351
* [AsmPrinter] refactor to support %c w/ GlobalAddress'Nick Desaulniers2019-04-2618-82/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when printing the address of a MachineOperand::MO_GlobalAddress. Move that handling into a new overriden method in each base class. A virtual method was added to the base class for handling the generic case. Refactors a few subclasses to support the target independent %a, %c, and %n. The patch also contains small cleanups for AVRAsmPrinter and SystemZAsmPrinter. It seems that NVPTXTargetLowering is possibly missing some logic to transform GlobalAddressSDNodes for TargetLowering::LowerAsmOperandForConstraint to handle with "i" extended inline assembly asm constraints. Fixes: - https://bugs.llvm.org/show_bug.cgi?id=41402 - https://github.com/ClangBuiltLinux/linux/issues/449 Reviewers: echristo, void Reviewed By: void Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60887 llvm-svn: 359337
* [X86][AVX] Fold extract_subvector(broadcast(x)) -> broadcast(x) iff x has ↵Simon Pilgrim2019-04-261-0/+7
| | | | | | one use llvm-svn: 359332
* [AArch64][GlobalISel] Select G_BSWAP for vectors of s32 and s64Jessica Paquette2019-04-262-1/+38
| | | | | | | | | There are instructions for these, so mark them as legal. Select the correct instruction in AArch64InstructionSelector.cpp. Update select-bswap.mir and arm64-rev.ll to reflect the changes. llvm-svn: 359331
* [AMDGPU] gfx1010 VOP3 and VOP3P implementationStanislav Mekhanoshin2019-04-264-102/+281
| | | | | | Differential Revision: https://reviews.llvm.org/D61202 llvm-svn: 359328
* [DAGCombine] Cleanup visitEXTRACT_SUBVECTOR. NFCI.Simon Pilgrim2019-04-261-10/+11
| | | | | | Use ArrayRef::slice, reduce some rather awkward long lines for legibility and run clang-format. llvm-svn: 359326
* [ConstantRange] Add abs() supportNikita Popov2019-04-261-0/+31
| | | | | | | | | | | | | | Add support for abs() to ConstantRange. This will allow to handle SPF_ABS select flavor in LVI and will also come in handy as a primitive for the srem implementation. The implementation is slightly tricky, because a) abs of signed min is signed min and b) sign-wrapped ranges may have an abs() that is smaller than a full range, so we need to explicitly handle them. Differential Revision: https://reviews.llvm.org/D61084 llvm-svn: 359321
* [X86] Sink NoRegister creation for unused Base/Index registers into ↵Craig Topper2019-04-261-36/+26
| | | | | | getAddressOperands. NFCI llvm-svn: 359318
* [X86] Segment registers should have i16 type not i32.Craig Topper2019-04-261-1/+1
| | | | | | Probably doesn't really matter, but was inconsistent with the rest of the code. llvm-svn: 359317
* [AMDGPU] gfx1010 VOP2 changesStanislav Mekhanoshin2019-04-266-154/+605
| | | | | | Differential Revision: https://reviews.llvm.org/D61156 llvm-svn: 359316
* [PowerPC] Update P9 vector costs for insert/extract elementRoland Froese2019-04-261-0/+29
| | | | | | | | | | | The PPC vector cost model values for insert/extract element reflect older processors that lacked vector insert/extract and move-to/move-from VSR instructions. Update getVectorInstrCost to give appropriate values for when the newer instructions are present. Differential Revision: https://reviews.llvm.org/D60160 llvm-svn: 359313
* s/Dwarf 5/DWARF v5/ NFCFangrui Song2019-04-261-1/+1
| | | | llvm-svn: 359307
* Fix Wparentheses warning. NFCI.Simon Pilgrim2019-04-261-5/+5
| | | | llvm-svn: 359299
* [X86][SSE] Pull out OR(EXTRACTELT(X,0),OR(EXTRACTELT(X,1),...)) matching ↵Simon Pilgrim2019-04-261-50/+59
| | | | | | | | | | code from LowerVectorAllZeroTest Create a matchBitOpReduction helper that checks for the pattern with any opcode. First step towards reusing this code to recognize other scalar reduction patterns. llvm-svn: 359296
* caseFoldingDjbHash: simplify and make the US-ASCII fast path fasterFangrui Song2019-04-261-19/+16
| | | | | | | | | The slow path (with at least one non US-ASCII) will be slower but that doesn't matter. Differential Revision: https://reviews.llvm.org/D61178 llvm-svn: 359294
* [X86][SSE] Disable shouldFoldConstantShiftPairToMask for btver1/btver2 ↵Simon Pilgrim2019-04-264-2/+26
| | | | | | | | | | targets (PR40758) As detailed on PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency - so it doesn't make sense to replace a shl+lshr with a shift+and pair as it requires an additional mask (with the extra constant pool, loading and register pressure costs). Differential Revision: https://reviews.llvm.org/D61068 llvm-svn: 359293
* [X86][AVX] Combine shuffles extracted from a common vectorSimon Pilgrim2019-04-261-0/+45
| | | | | | | | A small step towards combining shuffles across vector sizes - this recognizes when a shuffle's operands are all extracted from the same larger source and tries to combine to an unary shuffle of that source instead. Fixes one of the test cases from PR34380. Differential Revision: https://reviews.llvm.org/D60512 llvm-svn: 359292
* [InferAddressSpaces] Add AS parameter to the pass factorySven van Haastregt2019-04-261-6/+11
| | | | | | | | | | | | | | This enables the pass to be used in the absence of TargetTransformInfo. When the argument isn't passed, the factory defaults to UninitializedAddressSpace and the flat address space is obtained from the TargetTransformInfo as before this change. Existing users won't have to change. Patch by Kevin Petit. Differential Revision: https://reviews.llvm.org/D60602 llvm-svn: 359290
* Fix alignment in AArch64InstructionSelector::emitConstantPoolEntry()Hans Wennborg2019-04-261-1/+1
| | | | | | | | | | | | | | The code was using the alignment of a pointer to the value, not the alignment of the constant itself. Maybe we got away with it so far because the pointer alignment is fairly high, but we did end up under-aligning <16 x i8> vectors, which was caught in the Chromium build after lld stopped over-aligning the .rodata.cst16 section in r356428. (See crbug.com/953815) Differential revision: https://reviews.llvm.org/D61124 llvm-svn: 359287
* [GlobalISel] Fix inserting copies in the right position for reg definitionsMarcello Maggioni2019-04-262-12/+38
| | | | | | | | | | | | | When constrainRegClass is called if the constraining happens on a use the COPY needs to be inserted before the instruction that contains the MachineOperand, but if we are constraining a definition it actually needs to be added after the instruction. In addition, the COPY needs to have its operands flipped (in the use case we are copying from the old unconstrained register to the new constrained register, while in the definition case we are copying from the new constrained register that the instruction defines to the old unconstrained register). llvm-svn: 359282
* [GlobalOpt] Swap the expensive check for cold calls with the cheap TTI checkJustin Bogner2019-04-261-2/+2
| | | | | | | | | | | isValidCandidateForColdCC is much more expensive than TTI.useColdCCForColdCall, which by default just returns false. Avoid doing this work if we're not going to look at the answer anyway. This change is NFC, but I see significant compile time improvements on some code with pathologically many functions. llvm-svn: 359253
* [ORC] Remove symbols from dependency lists when failing materialization.Lang Hames2019-04-251-2/+29
| | | | | | | | | When failing materialization of a symbol X, remove X from the dependants list of any of X's dependencies. This ensures that when X's dependencies are emitted (or fail themselves) they do not try to access the no-longer-existing MaterializationInfo for X. llvm-svn: 359252
* [CUDA] Implemented _[bi]mma* builtins.Artem Belevich2019-04-251-0/+2
| | | | | | | | | | | | | | | | These builtins provide access to the new integer and sub-integer variants of MMA (matrix multiply-accumulate) instructions provided by CUDA-10.x on sm_75 (AKA Turing) GPUs. Also added a feature for PTX 6.4. While Clang/LLVM does not generate any PTX instructions that need it, we still need to pass it through to ptxas in order to be able to compile code that uses the new 'mma' instruction as inline assembly (e.g used by NVIDIA's CUTLASS library https://github.com/NVIDIA/cutlass/blob/master/cutlass/arch/mma.h#L101) Differential Revision: https://reviews.llvm.org/D60279 llvm-svn: 359248
* PTX 6.3 extends `wmma` instruction to support s8/u8/s4/u4/b1 -> s32.Artem Belevich2019-04-253-54/+243
| | | | | | | | | | | | All of the new instructions are still handled mostly by tablegen. I've slightly refactored the code to drive intrinsic/instruction generation from a master list of supported variants, so all irregularities have to be implemented in one place only. The test generation script wmma.py has been refactored in a similar way. Differential Revision: https://reviews.llvm.org/D60015 llvm-svn: 359247
* [NVPTX] generate correct MMA instruction mnemonics with PTX63+.Artem Belevich2019-04-254-129/+170
| | | | | | | | | | | PTX 6.3 requires using ".aligned" in the MMA instruction names. In order to generate correct name, now we pass current PTX version to each instruction as an extra constant operand and InstPrinter adjusts its output accordingly. Differential Revision: https://reviews.llvm.org/D59393 llvm-svn: 359246
* [NVPTX] Refactor generation of MMA intrinsics and instructions. NFC.Artem Belevich2019-04-251-329/+183
| | | | | | | | | | | | | | | Generalized constructions of 'fragments' of MMA operations to provide common primitives for construction of the ops. This will make it easier to add new variants of the instructions that operate on integer types. Use nested foreach loops which makes it possible to better control naming of the intrinsics. This patch does not affect LLVM's output, so there are no test changes. Differential Revision: https://reviews.llvm.org/D59389 llvm-svn: 359245
* [Object][XCOFF] Add intial support for section header table.Sean Fertile2019-04-251-29/+83
| | | | | | | | | Adds a representation of the section header table to XCOFFObjectFile, and implements enough to dump the section headers with llvm-obdump. Differential Revision: https://reviews.llvm.org/D60784 llvm-svn: 359244
* [AMDGPU] gfx1010 - fix ubsan failureStanislav Mekhanoshin2019-04-251-1/+0
| | | | | | | Revert DecoderNamespace in one place for now. It will need more changes to properly work. llvm-svn: 359239
* Assigning to a local object in a return statement prevents copy elision. NFC.David Blaikie2019-04-253-6/+10
| | | | | | | | | | | | | | | | I added a diagnostic along the lines of `-Wpessimizing-move` to detect `return x = y` suppressing copy elision, but I don't know if the diagnostic is really worth it. Anyway, here are the places where my diagnostic reported that copy elision would have been possible if not for the assignment. P1155R1 in the post-San-Diego WG21 (C++ committee) mailing discusses whether WG21 should fix this pitfall by just changing the core language to permit copy elision in cases like these. (Kona update: The bulk of P1155 is proceeding to CWG review, but specifically *not* the parts that explored the notion of permitting copy-elision in these specific cases.) Reviewed By: dblaikie Author: Arthur O'Dwyer Differential Revision: https://reviews.llvm.org/D54885 llvm-svn: 359236
* [GlobalISel][AArch64] Make G_EXTRACT_VECTOR_ELT legal for v8s16sJessica Paquette2019-04-251-2/+2
| | | | | | | | This case was missing before, so we couldn't legalize it. Add it to AArch64LegalizerInfo.cpp and update select-extract-vector-elt.mir. llvm-svn: 359231
* [ObjC][ARC] Let ARC optimizer bail out if the number of pointer statesAkira Hatanaka2019-04-251-2/+42
| | | | | | | | | | | | | | | | | | | | it keeps track of becomes too large ARC optimizer does a top-down and a bottom-up traversal of the whole function to pair up retain and release instructions and remove them. This can be expensive if the number of instructions in the function and pointer states it tracks are large since it has to look at each pointer state and determine whether the instruction being visited can potentially use the pointer. This patch adds a command line option that sets a limit to the number of pointers it tracks. rdar://problem/49477063 Differential Revision: https://reviews.llvm.org/D61100 llvm-svn: 359226
* [AMDGPU] gfx1010 VOP1 instructionsStanislav Mekhanoshin2019-04-256-102/+306
| | | | | | Differential Revision: https://reviews.llvm.org/D61099 llvm-svn: 359225
* [AMDGPU] gfx1010 utility functionsStanislav Mekhanoshin2019-04-254-29/+90
| | | | | | Differential Revision: https://reviews.llvm.org/D61094 llvm-svn: 359224
* [GlobalISel][AArch64] Add generic legalization rule for extendsJessica Paquette2019-04-251-2/+23
| | | | | | | | | | | | | This adds a legalization rule for G_ZEXT, G_ANYEXT, and G_SEXT which allows extends whenever the types will fit in registers (or the source is an s1). Update tests. Add GISel checks throughout all of arm64-vabs.ll, where we now select a good portion of the code. Add GISel checks to arm64-subvector-extend.ll, which has a good number of vector extends in it. Differential Revision: https://reviews.llvm.org/D60889 llvm-svn: 359222
* [SelectionDAG][X86] Use stack load/store in PromoteIntRes_BITCAST when the ↵Craig Topper2019-04-251-15/+18
| | | | | | | | | | | | | | | | | | input needs to be be split and the output type is a vector. We had special case handling here, but it uses a scalar any_extend for the promotion then bitcasts to the final type. This won't split up the input data into multiple promoted elements like we need. This patch falls back to doing the conversion through memory. Fixes PR41594 which I believe was reflected in the bitcast-vector-bool.ll changes. The changes to vector-half-conversions.ll are fixing a previously unknown miscompile from this issue. Differential Revision: https://reviews.llvm.org/D61114 llvm-svn: 359219
* [Evaluator] Walk initial elements when handling load through bitcastRobert Lougher2019-04-251-38/+65
| | | | | | | | | | | | | | | | | | | | | | When evaluating a store through a bitcast, the evaluator tries to move the bitcast from the pointer onto the stored value. If the cast is invalid, it tries to "introspect" the type to get a valid cast by obtaining a pointer to the initial element (if the type is nested, this may require walking several initial elements). In some situations it is possible to get a bitcast on a load (e.g. with unions, where the bitcast may not be the same type as the store). However, equivalent logic to the store to introspect the type is missing. This patch add this logic. Note, when developing the patch I was unhappy with adding similar logic directly to the load case as it could get out of step. Instead, I have abstracted the "introspection" into a helper function, with the specifics being handled by a passed-in lambda function. Differential Revision: https://reviews.llvm.org/D60793 llvm-svn: 359205
* [GlobalISel][AArch64] Legalize G_FNEARBYINTJessica Paquette2019-04-253-1/+5
| | | | | | | | | Add legalizer support for G_FNEARBYINT. It's the same as G_FCEIL etc. Since the importer allows us to automatically select this after legalization, also add tests for selection etc. Also update arm64-vfloatintrinsics.ll. llvm-svn: 359204
* [GlobalISel] Add IRTranslator support for G_FNEARBYINTJessica Paquette2019-04-251-0/+2
| | | | | | | | | Translate llvm.nearbyint into G_FNEARBYINT as a simple intrinsic. Update arm64-irtranslator.ll. Differential Revision: https://reviews.llvm.org/D60922 llvm-svn: 359203
OpenPOWER on IntegriCloud