summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [JumpThreading][NFC] Rename LoadInst variablesBrian M. Rzycki2018-01-291-43/+46
| | | | | | | | | | | | | | | | | | Summary: The JumpThreading pass has several locations where to the variable name LI refers to a LoadInst type. This is confusing and inhibits the ability to use LI for LoopInfo as a member of the JumpThreading class. Minor formatting and comments were also altered to reflect this change. Reviewers: dberlin, kuba, spop, sebpop Reviewed by: sebpop Subscribers: sebpop, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D42601 llvm-svn: 323695
* [X86] Emit 11-byte or 15-byte NOPs on recent AMD targets, else default to ↵Simon Pilgrim2018-01-294-7/+40
| | | | | | | | | | | | 10-byte NOPs (PR22965) We currently emit up to 15-byte NOPs on all targets (apart from Silvermont), which stalls performance on some targets with decoders that struggle with 2 or 3 more '66' prefixes. This patch flags recent AMD targets (btver1/znver1) to still emit 15-byte NOPs and bdver* targets to emit 11-byte NOPs. All other targets now emit 10-byte NOPs apart from SilverMont CPUs which still emit 7-byte NOPS. Differential Revision: https://reviews.llvm.org/D42616 llvm-svn: 323693
* [ARM][GISel] PR35965 Constrain RegClasses of nested instructions built from ↵Daniel Sanders2018-01-292-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Dst Pattern Summary: Apparently, we missed on constraining register classes of VReg-operands of all the instructions built from a destination pattern but the root (top-level) one. The issue exposed itself while selecting G_FPTOSI for armv7: the corresponding pattern generates VTOSIZS wrapped into COPY_TO_REGCLASS, so top-level COPY_TO_REGCLASS gets properly constrained, while nested VTOSIZS (or rather its destination virtual register to be exact) does not. Fixing this by issuing GIR_ConstrainSelectedInstOperands for every nested GIR_BuildMI. https://bugs.llvm.org/show_bug.cgi?id=35965 rdar://problem/36886530 Patch by Roman Tereshin Reviewers: dsanders, qcolombet, rovka, bogner, aditya_nandakumar, volkan Reviewed By: dsanders, qcolombet, rovka Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D42565 llvm-svn: 323692
* [DWARFv5] Re-enable dumping a line table with no CU.Paul Robinson2018-01-293-21/+37
| | | | | | | | | | | r323476 added support for DW_FORM_line_strp, and incorrectly made that depend on having a DWARFUnit available. We shouldn't be tracking .debug_line_str in DWARFUnit after all. After this patch, I can do an NFC follow up and undo a bunch of the "plumbing" part of r323476. Differential Revision: https://reviews.llvm.org/D42609 llvm-svn: 323691
* [X86] Avoid using high register trick for test instructionAmaury Sechet2018-01-294-68/+22
| | | | | | | | | | | | | | | Summary: It seems it's main effect is to create addition copies when values are inr register that do not support this trick, which increase register pressure and makes the code bigger. The main noteworthy regression I was able to observe was pattern of the type (setcc (trunc (and X, C)), 0) where C is such as it would benefit from the hi register trick. To prevent this, a new pattern is added to materialize such pattern using a 32 bits test. This has the added benefit of working with any constant that is materializable as a 32bits immediate, not just the ones that can leverage the high register trick, as demonstrated by the test case in test-shrink.ll using the constant 2049 . Reviewers: craig.topper, niravd, spatel, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42646 llvm-svn: 323690
* [globalisel][legalizer] Change identity() to changeTo() to clarify that it ↵Daniel Sanders2018-01-291-1/+1
| | | | | | | | | | | | | changes things. NFC Prior to committing r323681, we decided to change pick() to identity() since it wasn't clear from the name what pick() did. However, identity() isn't a very good name either since it implies that no changes are made. For some reason, naming it changeTo() didn't occur to me until just after the commit. This should resolve the lack of clarity that pick() had while still implying that it changes the MIR. llvm-svn: 323689
* [CodeGen] Simplify conditional. NFCShoaib Meenai2018-01-291-1/+1
| | | | | | | | | | | Rafael pointed out that `hasInternalLinkage() || hasPrivateLinkage()` is equivalent to `hasLocalLinkage()` in post-commit review. I'm intentionally not updating the comment, partly because I like it being explicit, and partly because "global symbols with local linkage" sounds like an oxymoron. llvm-svn: 323688
* [AArch64] Change the filename of the Exynos M1 scheduling defsEvandro Menezes2018-01-292-3/+3
| | | | | | After request by Matthias Braun in https://reviews.llvm.org/D42387. llvm-svn: 323686
* Revert "AArch64: Omit callframe setup/destroy when not necessary"Jun Bum Lim2018-01-291-13/+5
| | | | | | | | This reverts commit r322917 due to multiple performance regressions in spec2006 and spec2017. XFAILed llvm/test/CodeGen/AArch64/big-callframe.ll which initially motivated this change. llvm-svn: 323683
* [globalisel][legalizer] Adapt LegalizerInfo to support inter-type ↵Daniel Sanders2018-01-295-304/+484
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dependencies and other things. Summary: As discussed in D42244, we have difficulty describing the legality of some operations. We're not able to specify relationships between types. For example, declaring the following setAction({..., 0, s32}, Legal) setAction({..., 0, s64}, Legal) setAction({..., 1, s32}, Legal) setAction({..., 1, s64}, Legal) currently declares these type combinations as legal: {s32, s32} {s64, s32} {s32, s64} {s64, s64} but we currently have no means to say that, for example, {s64, s32} is not legal. Some operations such as G_INSERT/G_EXTRACT/G_MERGE_VALUES/ G_UNMERGE_VALUES have relationships between the types that are currently described incorrectly. Additionally, G_LOAD/G_STORE currently have no means to legalize non-atomics differently to atomics. The necessary information is in the MMO but we have no way to use this in the legalizer. Similarly, there is currently no way for the register type and the memory type to differ so there is no way to cleanly represent extending-load/truncating-store in a way that can't be broken by optimizers (resulting in illegal MIR). It's also difficult to control the legalization strategy. We've added support for legalizing non-power of 2 types but there's still some hardcoded assumptions about the strategy. The main one I've noticed is that type0 is always legalized before type1 which is not a good strategy for `type0 = G_EXTRACT type1, ...` if you need to widen the container. It will converge on the same result eventually but it will take a much longer route when legalizing type0 than if you legalize type1 first. Lastly, the definition of legality and the legalization strategy is kept separate which is not ideal. It's helpful to be able to look at a one piece of code and see both what is legal and the method the legalizer will use to make illegal MIR more legal. This patch adds a layer onto the LegalizerInfo (to be removed when all targets have been migrated) which resolves all these issues. Here are the rules for shift and division: for (unsigned BinOp : {G_LSHR, G_ASHR, G_SDIV, G_UDIV}) getActionDefinitions(BinOp) .legalFor({s32, s64}) // If type0 is s32/s64 then it's Legal .clampScalar(0, s32, s64) // If type0 is <s32 then WidenScalar to s32 // If type0 is >s64 then NarrowScalar to s64 .widenScalarToPow2(0) // Round type0 scalars up to powers of 2 .unsupported(); // Otherwise, it's unsupported This describes everything needed to both define legality and describe how to make illegal things legal. Here's an example of a complex rule: getActionDefinitions(G_INSERT) .unsupportedIf([=](const LegalityQuery &Query) { // If type0 is smaller than type1 then it's unsupported return Query.Types[0].getSizeInBits() <= Query.Types[1].getSizeInBits(); }) .legalIf([=](const LegalityQuery &Query) { // If type0 is s32/s64/p0 and type1 is a power of 2 other than 2 or 4 then it's legal // We don't need to worry about large type1's because unsupportedIf caught that. const LLT &Ty0 = Query.Types[0]; const LLT &Ty1 = Query.Types[1]; if (Ty0 != s32 && Ty0 != s64 && Ty0 != p0) return false; return isPowerOf2_32(Ty1.getSizeInBits()) && (Ty1.getSizeInBits() == 1 || Ty1.getSizeInBits() >= 8); }) .clampScalar(0, s32, s64) .widenScalarToPow2(0) .maxScalarIf(typeInSet(0, {s32}), 1, s16) // If type0 is s32 and type1 is bigger than s16 then NarrowScalar type1 to s16 .maxScalarIf(typeInSet(0, {s64}), 1, s32) // If type0 is s64 and type1 is bigger than s32 then NarrowScalar type1 to s32 .widenScalarToPow2(1) // Round type1 scalars up to powers of 2 .unsupported(); This uses a lambda to say that G_INSERT is unsupported when type0 is bigger than type1 (in practice, this would be a default rule for G_INSERT). It also uses one to describe the legal cases. This particular predicate is equivalent to: .legalFor({{s32, s1}, {s32, s8}, {s32, s16}, {s64, s1}, {s64, s8}, {s64, s16}, {s64, s32}}) In terms of performance, I saw a slight (~6%) performance improvement when AArch64 was around 30% ported but it's pretty much break even right now. I'm going to take a look at constexpr as a means to reduce the initialization cost. Future work: * Make it possible for opcodes to share rulesets. There's no need for G_LSHR/G_ASHR/G_SDIV/G_UDIV to have separate rule and ruleset objects. There's no technical barrier to this, it just hasn't been done yet. * Replace the type-index numbers with an enum to get .clampScalar(Type0, s32, s64) * Better names for things like .maxScalarIf() (clampMaxScalar?) and the vector rules. * Improve initialization cost using constexpr Possible future work: * It's possible to make these rulesets change the MIR directly instead of returning a description of how to change the MIR. This should remove a little overhead caused by parsing the description and routing to the right code, but the real motivation is that it removes the need for LegalizeAction::Custom. With Custom removed, there's no longer a requirement that Custom legalization change the opcode to something that's considered legal. Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, reames, bogner Reviewed By: bogner Subscribers: hintonda, bogner, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D42251 llvm-svn: 323681
* [MachineVerifier] Add check that renamable operands aren't reserved registers.Geoff Berry2018-01-291-6/+8
| | | | | | | | | | | | Summary: Reviewers: qcolombet, MatzeB Subscribers: arsenm, sdardis, nhaehnle, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D42449 llvm-svn: 323676
* [AMDGPU][X86][Mips] Make sure renamable bit not set for reserved regsGeoff Berry2018-01-294-4/+26
| | | | | | | | | Summary: Fix a few places that were modifying code after register allocation to set the renamable bit correctly to avoid failing the validation added in D42449. llvm-svn: 323675
* Move getPlatformFlags to ELFObjectFileBase and simplify.Rafael Espindola2018-01-292-12/+9
| | | | | | | This removes a few std::error_code results that were ignored on every call. llvm-svn: 323674
* [X86] Don't create SHRUNKBLEND when the condition is used by the true or ↵Craig Topper2018-01-291-2/+3
| | | | | | | | | | false operand of the vselect. Fixes PR34592. Differential Revision: https://reviews.llvm.org/D42628 llvm-svn: 323672
* [globalisel] Make LegalizerInfo::LegalizeAction available outside of ↵Daniel Sanders2018-01-296-51/+59
| | | | | | | | | | | | LegalizerInfo. NFC Summary: The improvements to the LegalizerInfo discussed in D42244 require that LegalizerInfo::LegalizeAction be available for use in other classes. As such, it needs to be moved out of LegalizerInfo. This has been done separately to the next patch to minimize the noise in that patch. llvm-svn: 323669
* [AccelTable] Workaround for MSVC bugJonas Devlieghere2018-01-291-5/+32
| | | | | | | | Microsoft Visual Studio rejects the static constexpr static list of atoms even though it's valid C++. This provides a workaround to unbreak the bots. llvm-svn: 323667
* [SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle.Alexey Bataev2018-01-291-133/+382
| | | | | | | | | | | | | | | | | Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counter as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323662
* [AccelTable] Try making MSVC happyJonas Devlieghere2018-01-291-4/+4
| | | | | | | | MSVC complains that the constexpr "expression did not evaluate to a constant". Trying to make it happy by adding a `const` specifier as suggested in https://stackoverflow.com/questions/37574343. llvm-svn: 323659
* [dsymutil] Generate Apple accelerator tablesJonas Devlieghere2018-01-291-0/+14
| | | | | | | | | | | This patch adds support for generating accelerator tables in dsymutil. This feature was already present in our internal repository but not yet upstreamed because it requires changes to the Apple accelerator table implementation. Differential revision: https://reviews.llvm.org/D42501 llvm-svn: 323655
* [NFC] Rename DwarfAccelTable and move header.Jonas Devlieghere2018-01-295-391/+6
| | | | | | | | | | This patch renames DwarfAccelTable.{h,cpp} to AccelTable.{h,cpp} and moves the header to the include dir so it is accessible by the dsymutil implementation. Differential revision: https://reviews.llvm.org/D42529 llvm-svn: 323654
* [NFC] Refactor Apple Accelerator TablesJonas Devlieghere2018-01-294-318/+360
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch refactors the way data is stored in the accelerator table and makes them truly generic. There have been several attempts to do this in the past: - D8215 & D8216: Using a union and partial hardcoding. - D11805: Using inheritance. - D42246: Using a callback. In the end I didn't like either of them, because for some reason or another parts of it felt hacky or decreased runtime performance. I didn't want to completely rewrite them as I was hoping that we could reuse parts for the successor in the DWARF standard. However, it seems less and less likely that there will be a lot of opportunities for sharing code and/or an interface. Originally I choose to template the whole class, because it introduces no performance overhead compared to the original implementation. We ended up settling on a hybrid between a templated method and a virtual call to emit the data. The motivation is that we don't want to increase code size for a feature that should soon be superseded by the DWARFv5 accelerator tables. While the code will continue to be used for compatibility, it won't be on the hot path. Furthermore this does not regress performance compared to Apple's internal implementation that already uses virtual calls for this. A quick summary for why these changes are necessary: dsymutil likes to reuse the current implementation of the Apple accelerator tables. However, LLDB expects a slightly different interface than what is currently emitted. Additionally, in dsymutil we only have offsets and no actual DIEs. Although the patch suggests a lot of code has changed, this change is pretty straightforward: - We created an abstract class `AppleAccelTableData` to serve as an interface for the different data classes. - We created two implementations of this class, one for type tables and one for everything else. There will be a third one for dsymutil that takes just the offset. - We use the supplied class to deduct the atoms for the header which makes the structure of the table fully self contained, although not enforced by the interface as was the case for the fully templated approach. - We renamed the prefix from DWARF- to Apple- to make space for the future implementation of .debug_names. This change is NFC and relies on the existing tests. Differential revision: https://reviews.llvm.org/D42334 llvm-svn: 323653
* [AMDGPU][MC] Corrected parsing of image opcode modifiers r128 and d16Dmitry Preobrazhensky2018-01-292-2/+14
| | | | | | | | | | | See bugs 36092, 36093: https://bugs.llvm.org/show_bug.cgi?id=36092 https://bugs.llvm.org/show_bug.cgi?id=36093 Differential Revision: https://reviews.llvm.org/D42583 Reviewers: vpykhtin, artem.tamazov, arsenm llvm-svn: 323651
* Fix windows test failure caused by r323638Pavel Labath2018-01-291-1/+1
| | | | | | | | | | | | | | The test was failing because of an incorrect sizeof check in the name index parsing code. This code was meant to check that we have enough input to parse the fixed-size part of the dwarf header, which it did by comparing the input to sizeof(Header). Originally struct Header only contained the fixed-size part, but during review, we've moved additional members into it, which rendered the sizeof check invalid. I resolve this by moving the fixed-size part to a separate struct and updating the sizeof-expression to use that. llvm-svn: 323648
* [AArch64][AsmParser] NFC: Generalize LogicalImm[Not](32|64) codeSander de Smalen2018-01-295-72/+38
| | | | | | | | | | | | | | | | Summary: All variants of isLogicalImm[Not](32|64) can be combined into a single templated function, same for printLogicalImm(32|64). By making it use a template instead, further SVE patches can use it for other data types as well (e.g. 8, 16 bits). Reviewers: fhahn, rengolin, aadg, echristo, kristof.beyls, samparker Reviewed By: samparker Subscribers: aemerson, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D42294 llvm-svn: 323646
* [DebugInfo] Fix fragment offset emission order for symbol locationsMikael Holmen2018-01-291-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: When emitting the location for a global variable with fragmented debug expressions, make sure that the offset pieces, which represent optimized-out parts of the variable, are emitted before their succeeding fragments' expressions. Previously, if the succeeding fragment's location was a symbol, the offset piece was emitted after, rather than before, that symbol's expression. This effectively meant that the symbols were associated with the wrong parts of the variable. This fixes PR36085. Patch by: David Stenberg Reviewers: aprantl, probinson, dblaikie Reviewed By: aprantl Subscribers: JDevlieghere, llvm-commits Tags: #debug-info Differential Revision: https://reviews.llvm.org/D42527 llvm-svn: 323644
* [Sparc] Account for bias in stack readjustmentJonas Devlieghere2018-01-291-5/+24
| | | | | | | | | | | | | | | | | | | Summary: This was broken long ago in D12208, which failed to account for the fact that 64-bit SPARC uses a stack bias of 2047, and it is the *unbiased* value which should be aligned, not the biased one. This was seen to be an issue with Rust. Patch by: jrtc27 (James Clarke) Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: jacob_hansen, JDevlieghere, fhahn, fedor.sergeev, llvm-commits Differential Revision: https://reviews.llvm.org/D39425 llvm-svn: 323643
* Fix build broken by r323641Pavel Labath2018-01-291-1/+1
| | | | | | | | The call to ScopedPrinter::printNumber with size_t argument was ambiguous (I think) on 32-bit builds. Explicitly cast to a 64-bit int to avoid this. llvm-svn: 323642
* Refactor dwarfdump -apple-names outputPavel Labath2018-01-291-56/+79
| | | | | | | | | | | | | | | | | | Summary: This modifies the dwarfdump output to align it with the new .debug_names dump. It also renames two header fields to match similar fields in the dwarf5 header. A couple of tests needed to be updated to match new output. The changes were fairly straight-forward, although not really automatable. Reviewers: JDevlieghere, aprantl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42415 llvm-svn: 323641
* [ARM] FP16Pat and FullFP16Pat patterns. NFC.Sjoerd Meijer2018-01-292-10/+16
| | | | | | | | | Create and use FP16Pat FullFP16Pat helper patterns to make the difference explicit. Differential Revision: https://reviews.llvm.org/D42634 llvm-svn: 323640
* [DebugInfo] Basic .debug_names dumping supportPavel Labath2018-01-292-18/+438
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This commit renames DWARFAcceleratorTable to AppleAcceleratorTable to free up the first name as an interface for the different accelerator tables. Then I add a DWARFDebugNames class for the dwarf5 table. Presently, the only common functionality of the two classes is the dump() method, because this is the only method that was necessary to implement dwarfdump -debug-names; and because the rest of the AppleAcceleratorTable interface does not directly transfer to the dwarf5 tables (the main reason for that is that the present interface assumes the tables are homogeneous, but the dwarf5 tables can have different keys associated with each entry). I expect to make the common interface richer as I add more functionality to the new class (and invent a way to represent it in generic way). In terms of sharing the implementation, I found the format of the two tables sufficiently different to frustrate any attempts to have common parsing or dumping code, so presently the implementations share just low level code for formatting dwarf constants. Reviewers: vleschuk, JDevlieghere, clayborg, aprantl, probinson, echristo, dblaikie Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42297 llvm-svn: 323638
* [X86FixupBWInsts] Fix miscompilation if sibling sub-register is live.Andrei Elovikov2018-01-291-9/+10
| | | | | | | | | | | | | | Summary: The issues was found during D40524. Reviewers: andrew.w.kaylor, craig.topper, MatzeB Reviewed By: andrew.w.kaylor Subscribers: aivchenk, llvm-commits Differential Revision: https://reviews.llvm.org/D42533 llvm-svn: 323635
* [AArch64] Generate the CASP instruction for 128-bit cmpxchgOliver Stannard2018-01-292-3/+91
| | | | | | | | | | | | | | | | The Large System Extension added an atomic compare-and-swap instruction that operates on a pair of 64-bit registers, which we can use to implement a 128-bit cmpxchg. Because i128 is not a legal type for AArch64 we have to do all of the instruction selection in C++, and the instruction requires even/odd register pairs, so we have to wrap it in REG_SEQUENCE and EXTRACT_SUBREG nodes. This is very similar to what we do for 64-bit cmpxchg in the ARM backend. Differential revision: https://reviews.llvm.org/D42104 llvm-svn: 323634
* [ThinLTO] - Stop internalizing and drop non-prevailing symbols.George Rimar2018-01-294-28/+90
| | | | | | | | | | | Implementation marks non-prevailing symbols as not live in the summary. Then them are dropped in backends. Fixes https://bugs.llvm.org/show_bug.cgi?id=35938 Differential revision: https://reviews.llvm.org/D42107 llvm-svn: 323633
* [X86] Make foldLogicOfSetCCs work better for vectors pre legal types/operationsCraig Topper2018-01-291-1/+1
| | | | | | | | | | | | | | | | | Summary: There's a check in the code to only check getSetCCResultType after LegalOperations or if the type is MVT::i1. But the i1 check is only allowing scalar types through. I think it should check that the scalar type is MVT::i1 so that it will work for vectors. The changed test already does this combine with AVX512VL where getSetCCResultType returns vXi1. But with avx512f and no VLX getSetCCResultType returns a type matching the width of the input type. Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42619 llvm-svn: 323631
* [CVP] Don't Replace incoming values from unreachable blocks with undef.Davide Italiano2018-01-291-24/+4
| | | | | | | | | | This pretty much reverts r322006, except that we keep the test, because we work around the issue exposed in a different way (a recursion limit in value tracking). There's still probably some sequence that exposes this problem, and the proper way to fix that for somebody who has time is outlined in the code review. llvm-svn: 323630
* [NFC] fix trivial typos in comments and documentsHiroshi Inoue2018-01-2911-11/+11
| | | | | | "to to" -> "to" llvm-svn: 323628
* [InlineCost] Mark functions accessing varargs as not viable.Florian Hahn2018-01-281-6/+12
| | | | | | | | | | | | | This prevents functions accessing varargs from being inlined if they have the alwaysinline attribute. Reviewers: efriedma, rnk, davide Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D42556 llvm-svn: 323619
* [Support] Move DJB hash to support. NFCJonas Devlieghere2018-01-285-8/+25
| | | | | | | | | | | This patch moves the DJB hash to support. This is consistent with other hashing algorithms living there. The hash is used by the DWARF accelerator tables. We're doing this now because the hashing function is needed by dsymutil and we don't want to link against libBinaryFormat. Differential revision: https://reviews.llvm.org/D42594 llvm-svn: 323616
* [X86] Fix a crash that can occur in combineExtractVectorElt due to not ↵Craig Topper2018-01-281-2/+3
| | | | | | checking the width of a ConstantSDNode before calling getConstantOperandVal. llvm-svn: 323614
* [X86] Remove VPTESTM/VPTESTNM ISD opcodes. Use isel patterns matching cmpm ↵Craig Topper2018-01-285-93/+76
| | | | | | eq/ne with immallzeros. llvm-svn: 323612
* [X86] Add patterns for using masked vptestnmd for 256-bit vectors without VLX.Craig Topper2018-01-271-13/+24
| | | | | | We can widen the mask and extract it back down. llvm-svn: 323610
* [X86] Use vptestm/vptestnm for comparisons with zero to avoid creating a ↵Craig Topper2018-01-271-0/+7
| | | | | | | | | | | | zero vector. We can use the same input for both operands to get a free compare with zero. We already use this trick in a couple places where we explicitly create PTESTM with the same input twice. This generalizes it. I'm hoping to remove the ISD opcodes and move this to isel patterns like we do for scalar cmp/test. llvm-svn: 323605
* [X86] Remove X86ISD::PCMPGTM/PCMPEQM and instead just use X86ISD::PCMPM and ↵Craig Topper2018-01-275-43/+33
| | | | | | | | | | pattern match the immediate value during isel. Legalization is still biased to turn LT compares in to GT by swapping operands to avoid needing extra isel patterns to commute. I'm hoping to remove TESTM/TESTNM next and this should simplify that by making EQ/NE more similar. llvm-svn: 323604
* [X86][SSE] Simplify demanded elements from BROADCAST shuffle source.Simon Pilgrim2018-01-271-0/+30
| | | | | | | | If broadcasting from another shuffle, attempt to simplify it. We can probably generalize this a lot more (embedding in combineX86ShufflesRecursively), but BROADCAST is one of the more troublesome as it accepts inputs of different sizes to the result. llvm-svn: 323602
* Add IRBuilder API to create memcpy/memmove calls with differing source and ↵Daniel Neilson2018-01-272-15/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | dest alignments Summary: This change is step two in the series of changes to remove alignment argument from memcpy/memmove/memset in favour of alignment attributes. Steps: Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. ( rL322965 ) Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. Step 3) Update Clang to use the new IRBuilder API. Step 4) Update Polly to use the new IRBuilder API. Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get|set]Alignment() to use getDestAlignment() and getSourceAlignment() instead. Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get|set]Alignment() methods. Reference http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html llvm-svn: 323597
* [TargetLowering] Teach TargetLowering::SimplifySetCC to simplify setcc of ↵Craig Topper2018-01-271-14/+16
| | | | | | | | vXi1 vectors into logic ops. This transform was already being done for setcc of scalar i1. This extends it to vectors. llvm-svn: 323585
* [SelectionDAG] Make DAGTypeLegalizer::PromoteSetCCOperands handle ↵Craig Topper2018-01-271-4/+4
| | | | | | | | SETEQ/SETNE correctly for vector types. The code was using getValueSizeInBits and combining with the result of a call to DAG.ComputeNumSignBits. But for vector types getValueSizeInBits returns the width of the full vector while ComputeNumSignBits is going to give a number no larger than the width of a single element. So we should be using getScalarValueSizeInBits to get the element width. llvm-svn: 323583
* [GlobalISel][Legalizer] Convert the FP constants to the right APFloat type ↵Amara Emerson2018-01-271-1/+18
| | | | | | | | | | | for G_FCONSTANT. We weren't converting the immediate ConstantFP during legalization, which caused the wrong bit patterns to be emitted for half type FP constants. Fixes PR36106. llvm-svn: 323582
* Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements ↵Alexey Bataev2018-01-271-369/+131
| | | | | | | | as shuffle." This reverts commit r323530 to fix possible problems in users code. llvm-svn: 323581
* Revert "[SLP] Removed the warning about unused variable, NFC."Alexey Bataev2018-01-271-1/+1
| | | | | | This reverts commit r323533 to fix possible problems in users code. llvm-svn: 323580
OpenPOWER on IntegriCloud